Skip to content

contour: dask backends materialize all chunk results in a single compute #3333

@Melissari1997

Description

@Melissari1997

Summary

The dask backends in xrspatial/contour.py materialize every chunk's contour output in a single dask.compute(*all_results) call and build the per-chunk task graph with nested Python for loops over row/column chunks. For large dask-backed inputs this makes peak memory scale with the total contour complexity rather than with chunk size, and it puts heavy pressure on the dask scheduler.

Affected code

  • _contours_dask (xrspatial/contour.py, lines ~418-455)
  • _contours_dask_cupy (xrspatial/contour.py, lines ~458-487)

Details

1. Single dask.compute materializes all chunk results

Both _contours_dask and _contours_dask_cupy build a list of delayed _process_chunk_numpy / _process_chunk_cupy tasks, then call:

chunk_results = dask.compute(*all_results)

This forces the scheduler to execute and return every chunk's contour geometry to the client process at once. The subsequent merge/deduplication step then builds numpy arrays holding every segment across the whole raster:

seg_rows = np.array(all_segs_r, dtype=np.float64)
seg_cols = np.array(all_segs_c, dtype=np.float64)

So peak memory is proportional to the total number of contour segments, not to the per-chunk buffer size. The module's own docstring already notes this limitation, but it means the dask backend is not safe for very large (e.g. 30 TB-class) terrain workloads.

2. Python loops over chunk axes

_contours_dask / _contours_dask_cupy also iterate over orig_row_chunks and orig_col_chunks in Python to build all_results. For a raster with many small chunks this serializes graph construction and is slower than a vectorized approach.

Impact

  • OOM risk: dask inputs can run out of client memory during the merge step.
  • Scheduler pressure: a single dask.compute call with one delayed object per chunk can create a very large task graph and a single synchronization point.

Possible fixes

  • Compute chunk results in batches instead of all at once, bounding the amount of intermediate geometry held in memory.
  • Consider building the per-chunk task graph with vectorized indexing/offsets rather than nested Python loops.
  • Optionally accumulate per-level segment signatures incrementally so the final global merge does not need to hold every segment simultaneously.

Audit context

Found during the 2026-06-15 performance sweep against xrspatial/contour.py.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions