Skip to content

cost_distance: Dijkstra heap undersized at height*width causes out-of-bounds writes #3369

@brendancol

Description

@brendancol

Summary

The numba Dijkstra kernels in cost_distance allocate their binary min-heap at exactly height * width entries, which is too small. A lazy-deletion min-heap enqueues a pixel every time its tentative distance improves, so a pixel can be pushed several times before it settles. On grids with strongly varying friction the total push count exceeds height * width, and _heap_push then writes past the end of the h_keys/h_rows/h_cols arrays. That is an out-of-bounds write into numba-managed memory.

Two kernels have the bad sizing:

  • _cost_distance_kernel (numpy backend, dask map_overlap chunks, and the dask+cupy CPU fallback)
  • _cost_distance_tile_kernel (iterative dask path), which is worse because phase 2 also seeds boundary pixels on top of the relaxation pushes.

The CuPy relaxation kernel is a parallel Bellman-Ford and does not use this heap, so the GPU path is unaffected.

Reproduction

The iterative dask path aborts the interpreter on non-uniform friction. Comparing the dask iterative result against the single-tile numpy result over ~120 random grids (height/width 10-20, uneven chunks, friction in [0.1, 9], 4- and 8-connectivity) reliably ends in corrupted size vs prev_size / free(): invalid pointer and a SIGABRT core dump (exit 134).

A plain heapq Dijkstra confirms the push count: on a 6x6 grid (capacity 36) it reaches 44 pushes, already past the allocated height * width.

The numpy single-tile path does not crash on small grids because the overflow happens to land on an adjacent allocation, but it is still undefined behavior and corrupts memory on larger or differently-laid-out inputs.

Fix

Size the heap to the real upper bound on pushes: the directed edge count plus one seed per pixel, height * width * (n_neighbors + 1). For the tile kernel, add the boundary-seed headroom + 2 * (width + height) + 4. The worst observed push count is about 1.9x height * width, well inside n_neighbors * height * width.

Impact

Memory corruption or a process abort on the numpy and dask backends whenever friction is non-uniform, which is the common real-world case. Severity: critical.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions