GDAL behaviour for /vsis3/, /vsicurl/, Icechunk, and most cloud-native workflows is governed by config options like AWS_NO_SIGN_REQUEST, AWS_REGION, GDAL_NUM_THREADS, CPL_VSIL_CURL_USE_HEAD, etc. Currently users set these via gdal.SetConfigOption(...) before calling gdalxarray, or via environment variables. Both work, but neither is ideal:
- gdal.SetConfigOption has global, process-wide scope and persists across calls. Easy to forget; surprising when stale.
- Environment variables are out-of-band and don't appear in code, making notebooks and scripts non-reproducible without external context.
Worked example of current usage:
from osgeo import gdal
gdal.SetConfigOption("AWS_NO_SIGN_REQUEST", "YES")
gdal.SetConfigOption("AWS_REGION", "us-west-2")
import gdalxarray
from gdalxarray import GDALBackendEntrypoint
backend = GDALBackendEntrypoint()
xds = backend.open_dataset(
    "/vsis3/dynamical-ecmwf-aifs-single/ecmwf-aifs-single-forecast/v0.1.0.icechunk",
    multidim=True,
)
The auth/region setup is structurally separate from the open call, which makes the code less self-documenting than it could be.
Options to discuss
1. Pass-through kwarg on open_dataset
xds = backend.open_dataset(
    url,
    multidim=True,
    gdal_config={"AWS_NO_SIGN_REQUEST": "YES", "AWS_REGION": "us-west-2"},
)
Scoped via the gdal.config_options(...) context manager (GDAL ≥ 3.5) so settings don't leak globally, and self-documenting in the call. The downside is API surface: it adds another kwarg to maintain.
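A minimal sketch of how the scoping in option 1 might look, to make the trade-off concrete. The function name `open_with_gdal_config` and the `opener` argument are hypothetical and not part of gdalxarray; the only GDAL call assumed is the `gdal.config_options(...)` context manager mentioned above, imported lazily so the no-config path does not touch GDAL at all.

```python
from contextlib import nullcontext


def _config_scope(gdal_config):
    """Return a context manager that applies gdal_config for one block."""
    if not gdal_config:
        # Nothing to set: avoid importing GDAL just to do nothing.
        return nullcontext()
    # Assumption: GDAL Python bindings >= 3.5, which provide config_options().
    from osgeo import gdal
    return gdal.config_options(dict(gdal_config))


def open_with_gdal_config(opener, url, gdal_config=None, **kwargs):
    """Call opener(url, **kwargs) with gdal_config applied only for the call.

    `opener` stands in for something like backend.open_dataset (hypothetical
    wiring). Caveat: any read deferred until after this function returns
    runs outside the scope and will not see these options.
    """
    with _config_scope(gdal_config):
        return opener(url, **kwargs)
```

The caveat in the docstring is probably the main design question for this option: if the returned dataset reads lazily, the options may need to be stored and re-applied around each read rather than only around the open.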
2. Document the env-var pattern as the recommended path
No code change in gdalxarray. README has a "common config" section showing env vars for AWS anonymous, NCI THREDDS, Pawsey, GCS, etc. Lowest-friction implementation but pushes the documentation burden onto users.
3. Module-level helper for common profiles
import gdalxarray
gdalxarray.config.aws_anonymous(region="us-west-2")
gdalxarray.config.nci_friendly() # GDAL_NUM_THREADS=4, etc.
xds = backend.open_dataset(url, multidim=True)
Curated presets for the common cases observed during 0.2.0 development (NCI THREDDS rate-limit recipe, anonymous S3, anonymous GCS, etc.). Convenient, but arguably overreach: gdalxarray would be taking opinions on what defaults users want.
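One way to soften the overreach concern, sketched under assumptions: presets could return plain dicts instead of mutating global state, so they compose with a pass-through kwarg like the one in option 1. The function names mirror the example above but are hypothetical, `merge_profiles` is invented for illustration, and the only values used are the ones already mentioned in this issue.

```python
def aws_anonymous(region):
    """Options for unsigned (anonymous) S3 access in a given region."""
    return {"AWS_NO_SIGN_REQUEST": "YES", "AWS_REGION": region}


def nci_friendly(num_threads=4):
    """Options for the NCI case; GDAL config values are always strings."""
    return {"GDAL_NUM_THREADS": str(num_threads)}


def merge_profiles(*profiles):
    """Combine several option dicts; later profiles win on key conflicts."""
    merged = {}
    for profile in profiles:
        merged.update(profile)
    return merged


# Hypothetical usage: build the dict explicitly, then pass it wherever
# config is accepted (for example a gdal_config kwarg).
config = merge_profiles(aws_anonymous("us-west-2"), nci_friendly())
```

Because the presets only build data, users can print, edit, or ignore them, which keeps the opinionated part optional.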
4. No-op — users use gdal.SetConfigOption or env vars as they do today
Document the patterns in the README, point at the GDAL config docs, don't add API.
Considerations
- The hypertidy philosophy leans toward primitives and thin wrappers — argues against (3).
- The "remote ops are the primary use case" observation argues for (1), which keeps configuration inline with the open call.
- Context-managed scoping (whichever option) avoids the "stale global" pitfall.
- Related: # covers which specific options matter for performance and reliability.
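The scoping point in the list above also applies to the env-var route. A minimal stdlib-only sketch, assuming GDAL consults the process environment at the time it looks up an option; `scoped_env` is a hypothetical helper, not an existing gdalxarray or GDAL function.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(options):
    """Temporarily set environment variables, restoring prior state on exit."""
    # Remember what was there before (None means "was not set").
    previous = {key: os.environ.get(key) for key in options}
    os.environ.update(options)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value
```

Used as `with scoped_env({"AWS_NO_SIGN_REQUEST": "YES"}): ...`, the setting is visible in the code and gone afterwards. Note that environment variables are process-wide, so this is not safe to rely on across threads.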
Questions
- Is gdal_config worth the API surface, or is gdal.SetConfigOption close enough?