Description
morph_erode, morph_dilate, morph_opening, and morph_closing raise a spurious MemoryError on large lazy dask-backed rasters. The dask backend processes the array chunk by chunk and never materializes a full padded copy, so the kernel memory guard rejects work that would run fine.
Cause
_dispatch() in xrspatial/morphology.py calls _check_kernel_memory(rows, cols, ky, kx, name) with rows, cols = agg.shape (the full logical shape). The guard computes required = padded_rows * padded_cols * 16 and raises MemoryError when that exceeds 50% of host RAM.
That is correct for the eager numpy and cupy backends, which allocate a full padded float64 copy of the input. It is wrong for the dask backends: map_overlap allocates only a padded copy of each chunk, so peak memory scales with chunk size, not the full array. The guard's own docstring says it budgets "a padded copy of the input" -- only the eager backends do that.
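The gap between the two budgets can be sketched with plain arithmetic, using the shapes from the reproduction in this report. The 16-bytes-per-element factor is the one quoted from the guard above; the helper name padded_bytes and the assumption that padding is half the kernel size on each side are illustrative, not taken from xrspatial.

```python
BYTES_PER_ELEMENT = 16  # factor quoted in this report's guard formula


def padded_bytes(rows, cols, ky, kx):
    """Bytes for one padded copy of a rows x cols block (hypothetical helper)."""
    # Assumption: the pad is half the kernel size on each side.
    pad_y, pad_x = ky // 2, kx // 2
    return (rows + 2 * pad_y) * (cols + 2 * pad_x) * BYTES_PER_ELEMENT


# What the guard budgets today: the full logical shape.
full_array = padded_bytes(200_000, 200_000, 3, 3)

# What a dask backend would allocate for one chunk in flight.
one_chunk = padded_bytes(2048, 2048, 3, 3)

print(f"full array: {full_array / 1e9:.1f} GB")  # full array: 640.0 GB
print(f"one chunk:  {one_chunk / 1e6:.1f} MB")   # one chunk:  67.2 MB
```

Actual peak memory under dask also depends on how many chunks the scheduler processes concurrently, so the per-chunk figure is a lower bound on real usage rather than an exact budget.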
Reproduction
import numpy as np
import dask.array as da
import xarray as xr
from xrspatial.morphology import morph_erode
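# Lazy 200000 x 200000 float64 array (~320 GB logical size); nothing is allocated here.
arr = da.zeros((200000, 200000), chunks=(2048, 2048), dtype="float64")
raster = xr.DataArray(arr, dims=["y", "x"])
# The guard fires at call time, before any chunk is computed.
morph_erode(raster, kernel=np.ones((3, 3), dtype=np.uint8), boundary="reflect")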
Raises:
MemoryError: erode(): kernel shape (3, 3) on a (200000, 200000) raster needs ~640.0 GB of padded float64 memory, but only 45.9 GB is available. Use a smaller kernel.
Fix
Skip the full-shape memory guard when the input is dask-backed. The per-chunk allocation is bounded by chunk size, so the full-array budget does not apply. Keep the guard for the eager numpy and cupy backends.
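A minimal sketch of that control flow, not the actual patch: check_kernel_memory and guard_if_eager below are stand-ins written for this example, the padding rule is assumed, and available memory is passed in explicitly so the sketch stays deterministic. In the real dispatcher the is_dask flag would presumably come from a check such as isinstance(agg.data, dask.array.Array).

```python
def check_kernel_memory(rows, cols, ky, kx, name, available_bytes):
    """Stand-in for the guard described above (not the library function)."""
    # Assumption: the pad is half the kernel size on each side.
    pad_y, pad_x = ky // 2, kx // 2
    required = (rows + 2 * pad_y) * (cols + 2 * pad_x) * 16
    # The report states the guard trips at 50% of host RAM.
    if required > 0.5 * available_bytes:
        raise MemoryError(
            f"{name}(): needs ~{required / 1e9:.1f} GB of padded memory"
        )


def guard_if_eager(shape, kernel_shape, name, is_dask, available_bytes):
    """Hypothetical wrapper: run the full-shape guard only for eager backends."""
    if is_dask:
        # Per-chunk allocation is bounded by chunk size, so skip the guard.
        return
    rows, cols = shape
    ky, kx = kernel_shape
    check_kernel_memory(rows, cols, ky, kx, name, available_bytes)


available = int(45.9e9)  # figure taken from the error message in this report

# Dask-backed input: no exception, the guard is skipped.
guard_if_eager((200_000, 200_000), (3, 3), "erode", True, available)

# Eager input of the same shape: the guard still raises.
try:
    guard_if_eager((200_000, 200_000), (3, 3), "erode", False, available)
    eager_raised = False
except MemoryError:
    eager_raised = True
```

Keeping the guard function itself unchanged and branching in the caller means the eager numpy and cupy paths behave exactly as before.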
Backends affected
dask+numpy and dask+cupy. Eager numpy and cupy behavior is preserved.
Found by /sweep-performance.