Summary
The dask backends in xrspatial/contour.py materialize every chunk's contour output in a single dask.compute(*all_results) call and build the per-chunk task graph with nested Python for loops over row/column chunks. For large dask-backed inputs this makes peak memory scale with the total contour complexity rather than with chunk size, and it puts heavy pressure on the dask scheduler.
Affected code
_contours_dask (xrspatial/contour.py, lines ~418-455)
_contours_dask_cupy (xrspatial/contour.py, lines ~458-487)
Details
1. Single dask.compute materializes all chunk results
Both _contours_dask and _contours_dask_cupy build a list of delayed _process_chunk_numpy / _process_chunk_cupy tasks, then call:
chunk_results = dask.compute(*all_results)
This forces the scheduler to execute and return every chunk's contour geometry to the client process at once. The subsequent merge/deduplication step then builds numpy arrays holding every segment across the whole raster:
seg_rows = np.array(all_segs_r, dtype=np.float64)
seg_cols = np.array(all_segs_c, dtype=np.float64)
So peak memory is proportional to the total number of contour segments, not to the per-chunk buffer size. The module's own docstring already notes this limitation, but it means the dask backend is not safe for very large (e.g. 30 TB-class) terrain workloads.
2. Python loops over chunk axes
_contours_dask / _contours_dask_cupy also iterate over orig_row_chunks and orig_col_chunks in Python to build all_results. For a raster with many small chunks this serializes graph construction and is slower than a vectorized approach.
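As a rough illustration of what a vectorized approach could look like, the sketch below derives every chunk's (row, column) start offset from the dask chunk tuples in one shot. It assumes the nested loops mainly do offset bookkeeping, which this issue does not show; `chunk_offsets` is a hypothetical helper, not an existing function in xrspatial/contour.py. One delayed task per chunk would still have to be created, so this removes only the per-iteration Python arithmetic.

```python
import numpy as np


def chunk_offsets(row_chunks, col_chunks):
    """Return an (n_chunks, 2) array of (row_start, col_start) pairs.

    Hypothetical helper: row_chunks / col_chunks are the per-axis chunk
    sizes, e.g. the two tuples in a 2-D dask array's .chunks attribute.
    Chunks are listed in row-major order.
    """
    # Start of each chunk = cumulative size of all chunks before it.
    row_starts = np.concatenate(([0], np.cumsum(row_chunks)[:-1]))
    col_starts = np.concatenate(([0], np.cumsum(col_chunks)[:-1]))
    # Cartesian product of row and column starts, without a Python loop.
    rr, cc = np.meshgrid(row_starts, col_starts, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)
```

For example, `chunk_offsets((2, 3), (4, 1, 5))` yields six offset pairs, one per chunk, which could then be passed to the per-chunk tasks.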
Impact
- OOM risk: large dask inputs can exhaust client memory during the merge step, because every segment is held at once.
- Scheduler pressure: a single dask.compute call with one delayed object per chunk can create a very large task graph and a single synchronization point.
Possible fixes
- Compute chunk results in batches instead of all at once, bounding the amount of intermediate geometry held in memory.
- Consider building the per-chunk task graph with vectorized indexing/offsets rather than nested Python loops.
- Optionally accumulate per-level segment signatures incrementally so the final global merge does not need to hold every segment simultaneously.
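The batching idea in the first bullet can be sketched as follows. This is a minimal sketch, not the module's implementation: `iter_batches`, `compute_in_batches`, and the `consume` callback are hypothetical names, and the structure of each chunk result is left opaque because this issue does not describe it. The `compute` argument defaults to `dask.compute` and is injectable only so the helper can be exercised without a scheduler.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def compute_in_batches(delayed_chunks, batch_size, consume, compute=None):
    """Compute delayed chunk tasks batch by batch (hypothetical helper).

    consume(result) is called once per chunk result, in input order, so
    the caller decides what to keep. This function itself holds at most
    batch_size computed results at a time.
    """
    if compute is None:
        import dask  # lazy import: only needed for the default path
        compute = dask.compute
    for batch in iter_batches(delayed_chunks, batch_size):
        # One bounded compute call per batch instead of one global call.
        for result in compute(*batch):
            consume(result)
```

Batching only bounds memory if `consume` reduces each result (for instance, by folding it into an incremental per-level structure as the third bullet suggests); appending every result to a list would reproduce the original peak.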
Audit context
Found during the 2026-06-15 performance sweep against xrspatial/contour.py.