Describe the problem
to_geotiff(data, path, color_ramp='viridis') on a dask-backed DataArray executes the source graph twice. The streaming writer (_write_streaming) computes each chunk once to write pixels; _write_sidecars then calls _finite_stats (xrspatial/geotiff/_symbology.py), which runs dask.compute over the same source to get min/max/mean/stddev for the PAM stats and the QML ramp bounds.
Measured with a counting map_blocks layer on a 1024x1024 float64 source in 16 chunks:
| write | chunk executions |
|---|---|
| to_geotiff(data, path) | 16 |
| to_geotiff(data, path, color_ramp='viridis') | 32 |
| to_geotiff(data, path, color_ramp='viridis', color_ramp_range=(0, 2)) | 16 |
The docstring documents the extra pass and points at color_ramp_range as the escape hatch. But the streaming writer already materializes every pixel exactly once (row bands when a band fits the buffer budget, row x column segments on wide rasters, strip bands with tiled=False), so the stats pass can ride along with the write instead of re-reading the source.
Proposed fix
- Give _write_streaming an optional chunk_observer callback, called with each materialized buffer right after compute. That's before the out_dtype cast and the NaN->sentinel restore, so the observer sees logical values.
- Add a small accumulator in _symbology.py: count/min/max/mean/M2 per buffer, combined with Chan's parallel variance formula. Population stddev at the end, same as _finite_stats (ddof=0).
- In to_geotiff, when color_ramp is set on the dask streaming path and no color_ramp_range was given, thread the accumulator through the write and hand its result to write_symbology_sidecars instead of re-running _finite_stats.
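A rough sketch of what the accumulator and the observer hook could look like. Everything named here (RunningStats, observe, write_streaming_sketch) is hypothetical and not the existing xrspatial API. It is written in pure Python over flat sequences to stay self-contained; the real version would presumably use NumPy reductions on each 2D buffer. It also assumes non-finite values are skipped, which is what the name _finite_stats suggests but is not confirmed here.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical streaming accumulator (illustrative names only)."""
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def observe(self, values):
        """Fold one buffer into the running stats.

        Assumption: non-finite values (NaN, +/-inf) are ignored.
        """
        finite = [float(v) for v in values if math.isfinite(v)]
        n_b = len(finite)
        if n_b == 0:
            return
        # Per-buffer summary: count, mean, M2.
        mean_b = math.fsum(finite) / n_b
        m2_b = math.fsum((v - mean_b) ** 2 for v in finite)
        # Chan et al. combine of (count, mean, M2) for two partitions.
        n_a = self.count
        n = n_a + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / n
        self.m2 += m2_b + delta * delta * n_a * n_b / n
        self.count = n
        self.minimum = min(self.minimum, min(finite))
        self.maximum = max(self.maximum, max(finite))

    def stddev(self):
        """Population standard deviation (ddof=0); NaN if nothing was seen."""
        return math.sqrt(self.m2 / self.count) if self.count else math.nan


def write_streaming_sketch(buffers, chunk_observer=None):
    """Toy stand-in for the write loop, not the real _write_streaming."""
    for buf in buffers:
        if chunk_observer is not None:
            # Called before any dtype cast or sentinel restore,
            # so the observer sees logical values.
            chunk_observer(buf)
        # ... cast, restore sentinel, write pixels (omitted) ...


nan = float("nan")
stats = RunningStats()
write_streaming_sketch([[1.0, 2.0, nan], [3.0, 4.0], [nan]],
                       chunk_observer=stats.observe)
```

Because the combine step only needs (count, mean, M2) per buffer, the result does not depend on how the writer slices the raster into row bands, segments, or strips, up to floating-point rounding.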
The GPU writer and VRT tiled paths keep the current behavior: the GPU writer fully materializes anyway, and VRT writes per-tile through a different code path. The docstring should say so.
Found by /sweep-performance against the geotiff module (2026-07-01). This costs wall-clock time and IO (the source is read twice), not memory. Affects dask+numpy and dask+cupy on the streaming CPU write path.