contour: dask backends materialize all chunk results in a single compute

## Summary
The dask backends in `xrspatial/contour.py` materialize every chunk's contour output in a single `dask.compute(*all_results)` call and build the per-chunk task graph with nested Python `for` loops over row/column chunks. For large dask-backed inputs this makes peak memory scale with the total contour complexity rather than with chunk size, and it puts heavy pressure on the dask scheduler.

## Affected code
- `_contours_dask` (`xrspatial/contour.py`, lines ~418-455)
- `_contours_dask_cupy` (`xrspatial/contour.py`, lines ~458-487)

## Details

### 1. Single `dask.compute` materializes all chunk results
Both `_contours_dask` and `_contours_dask_cupy` build a list of delayed `_process_chunk_numpy` / `_process_chunk_cupy` tasks, then call:

```python
chunk_results = dask.compute(*all_results)
```

This forces the scheduler to execute and return every chunk's contour geometry to the client process at once. The subsequent merge/deduplication step then builds numpy arrays holding every segment across the whole raster:

```python
seg_rows = np.array(all_segs_r, dtype=np.float64)
seg_cols = np.array(all_segs_c, dtype=np.float64)
```

So peak memory is proportional to the total number of contour segments, not to the per-chunk buffer size. The module's own docstring already notes this limitation, but it means the dask backend is not safe for very large (e.g. 30 TB-class) terrain workloads.

### 2. Python loops over chunk axes
`_contours_dask` / `_contours_dask_cupy` also iterate over `orig_row_chunks` and `orig_col_chunks` in Python to build `all_results`. For a raster with many small chunks this serializes graph construction and is slower than a vectorized approach.

## Impact
- **OOM risk:** dask inputs can run out of client memory during the merge step.
- **Scheduler pressure:** a single `dask.compute` call with one delayed object per chunk can create a very large task graph and a single synchronization point.

## Possible fixes
- Compute chunk results in batches instead of all at once, bounding the amount of intermediate geometry held in memory.
- Consider building the per-chunk task graph with vectorized indexing/offsets rather than nested Python loops.
- Optionally accumulate per-level segment signatures incrementally so the final global merge does not need to hold every segment simultaneously.

## Audit context
Found during the 2026-06-15 performance sweep against `xrspatial/contour.py`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

contour: dask backends materialize all chunk results in a single compute #3333

Summary

Affected code

Details

1. Single `dask.compute` materializes all chunk results

2. Python loops over chunk axes

Impact

Possible fixes

Audit context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

contour: dask backends materialize all chunk results in a single compute #3333

Description

Summary

Affected code

Details

1. Single dask.compute materializes all chunk results

2. Python loops over chunk axes

Impact

Possible fixes

Audit context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. Single `dask.compute` materializes all chunk results