morphology: memory guard raises false MemoryError on lazy dask rasters #3401

@brendancol

Description

morph_erode, morph_dilate, morph_opening, and morph_closing raise a spurious MemoryError on large lazy dask-backed rasters. The dask backend processes the array chunk by chunk and never materializes a full padded copy, so the guard is rejecting work that would run fine.

Cause

_dispatch() in xrspatial/morphology.py calls _check_kernel_memory(rows, cols, ky, kx, name) with rows, cols = agg.shape (the full logical shape). The guard computes required = padded_rows * padded_cols * 16 and raises MemoryError when that exceeds 50% of host RAM.

That is correct for the eager numpy and cupy backends, which allocate a full padded float64 copy of the input. It is wrong for the dask backends: map_overlap allocates only a padded copy of each chunk, so peak memory scales with chunk size (and the number of chunks processed concurrently), not with the full array. The guard's own docstring says it budgets "a padded copy of the input", and only the eager backends allocate one.
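To make the gap concrete, here is a rough sketch of the arithmetic for the reproduction case. It uses the `padded_rows * padded_cols * 16` formula quoted above. The helper name `padded_bytes_sketch` is hypothetical, and padding by `k // 2` cells per side is an assumption about how the padded shape is derived, not something confirmed from xrspatial/morphology.py.

```python
def padded_bytes_sketch(rows, cols, ky, kx, bytes_per_cell=16):
    """Estimate bytes for a padded buffer (hypothetical helper).

    Assumes k // 2 cells of padding on each side; the 16 bytes per
    cell comes from the formula quoted in this issue.
    """
    pad_y, pad_x = ky // 2, kx // 2
    return (rows + 2 * pad_y) * (cols + 2 * pad_x) * bytes_per_cell


# What the guard budgets today: the full logical shape.
full_array = padded_bytes_sketch(200_000, 200_000, 3, 3)

# What one dask task would need under the same formula: a single chunk.
one_chunk = padded_bytes_sketch(2048, 2048, 3, 3)

print(f"full array: {full_array / 1e9:.1f} GB")
print(f"one chunk:  {one_chunk / 1e6:.1f} MB")
```

Under these assumptions the full-shape budget is about 640 GB while a single 2048 x 2048 chunk needs about 67 MB. Several chunks may be in flight at once, so the real peak is some multiple of the per-chunk figure, but it stays far below the full-array estimate.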

Reproduction

import numpy as np
import dask.array as da
import xarray as xr
from xrspatial.morphology import morph_erode

arr = da.zeros((200000, 200000), chunks=(2048, 2048), dtype="float64")
raster = xr.DataArray(arr, dims=["y", "x"])
morph_erode(raster, kernel=np.ones((3, 3), dtype=np.uint8), boundary="reflect")

Raises:

MemoryError: erode(): kernel shape (3, 3) on a (200000, 200000) raster needs ~640.0 GB of padded float64 memory, but only 45.9 GB is available. Use a smaller kernel.

Fix

Skip the full-shape memory guard when the input is dask-backed. The per-chunk allocation is bounded by chunk size, so the full-array budget does not apply. Keep the guard for the eager numpy and cupy backends.
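A minimal sketch of the proposed behavior, not the actual patch. The function name `check_kernel_memory_sketch`, the `lazy` flag, and the `available_bytes` parameter are all hypothetical stand-ins. In the real `_dispatch()` the flag would presumably come from a backend check such as `isinstance(agg.data, dask.array.Array)`. The `k // 2` padding is an assumption as well.

```python
def check_kernel_memory_sketch(rows, cols, ky, kx, available_bytes, lazy=False):
    """Raise MemoryError if an eager padded copy would not fit.

    Hypothetical stand-in for the guard described in this issue.
    When ``lazy`` is True (dask-backed input) the full-shape check is
    skipped, because allocation happens per chunk.
    """
    if lazy:
        return  # dask backends: the full-array budget does not apply

    # Assumed padding of k // 2 cells per side; 16 bytes per cell as in the issue.
    padded_rows = rows + 2 * (ky // 2)
    padded_cols = cols + 2 * (kx // 2)
    required = padded_rows * padded_cols * 16

    # The issue states the guard trips at 50% of host RAM.
    if required > 0.5 * available_bytes:
        raise MemoryError(
            f"kernel shape ({ky}, {kx}) on a ({rows}, {cols}) raster needs "
            f"~{required / 1e9:.1f} GB of padded float64 memory"
        )


# Eager path: the 200000 x 200000 case is still rejected.
try:
    check_kernel_memory_sketch(200_000, 200_000, 3, 3, available_bytes=45_900_000_000)
    eager_raised = False
except MemoryError:
    eager_raised = True

# Lazy path: the same shape passes the guard.
check_kernel_memory_sketch(200_000, 200_000, 3, 3, available_bytes=45_900_000_000, lazy=True)
```

Passing the flag in explicitly keeps the guard itself backend-agnostic. An alternative would be to run the same check against the largest chunk shape instead of skipping it entirely, which would still catch pathologically large chunks.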

Backends affected

dask+numpy and dask+cupy. Eager numpy and cupy behavior is preserved.

Found by /sweep-performance.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), dask (Dask backend / chunked arrays), performance (PR touches performance-sensitive code), sweep-performance (Found by /sweep-performance)
