The lazy-backend memory claim (only the requested region is materialized) is verified today only against the conftest fixtures and one xarray tutorial dataset. Those are all small enough that a regression that quietly buffers the whole result would still fit in RAM and look fine.
What we want is a benchmark / smoke test that runs on a real-world-sized Dataset (a few GB on disk, multiple chunks) and asserts:
- peak RSS for a single-region access stays bounded by chunk size, not result size,
- streaming aggregation does not blow up while reducing a long axis,
- compute() time scales linearly in the number of materialized rows.
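A minimal, stdlib-only sketch of the first two assertions plus the scaling check is below. It is not the real backend: `iter_chunks` is a hypothetical stand-in that yields one chunk of a long axis at a time, `buffered_sum` models the regression we want to catch, and `tracemalloc` peak bytes are used as a proxy for peak RSS (they only see Python-level allocations). The `4 * chunk_bytes` budget and the chunk sizes are assumptions chosen for illustration.

```python
import math
import sys
import time
import tracemalloc
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_chunks(n_chunks: int, chunk_rows: int) -> Iterator[List[float]]:
    # Hypothetical lazy backend: yields one chunk at a time, never the whole result.
    for i in range(n_chunks):
        yield [float(i)] * chunk_rows


def streaming_sum(chunks: Iterator[List[float]]) -> float:
    # Reduces chunk by chunk; at most the current and the previous chunk are alive.
    total = 0.0
    for chunk in chunks:
        total += sum(chunk)
    return total


def buffered_sum(chunks: Iterator[List[float]]) -> float:
    # The regression to catch: concatenate every chunk first, then reduce.
    everything = [x for chunk in chunks for x in chunk]
    return sum(everything)


def peak_traced_bytes(fn: Callable[[], object]) -> Tuple[object, int]:
    # Peak Python heap usage while fn runs (a proxy for RSS, not RSS itself).
    tracemalloc.start()
    tracemalloc.reset_peak()  # Python 3.9+
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def loglog_slope(sizes: Sequence[int], seconds: Sequence[float]) -> float:
    # Least-squares slope of log(time) against log(rows); about 1.0 means linear.
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    # Minimum wall-clock time over several runs, to damp scheduler noise.
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    CHUNK_ROWS, N_CHUNKS = 10_000, 50  # hypothetical sizes
    chunk_bytes = sys.getsizeof([0.0] * CHUNK_ROWS)
    _, peak = peak_traced_bytes(lambda: streaming_sum(iter_chunks(N_CHUNKS, CHUNK_ROWS)))
    assert peak < 4 * chunk_bytes, "streaming reduction should stay within a few chunks"
    sizes = [100_000, 200_000, 400_000]
    times = [best_time(lambda n=n: streaming_sum(iter_chunks(n // CHUNK_ROWS, CHUNK_ROWS)))
             for n in sizes]
    print(f"peak={peak} bytes, slope={loglog_slope(sizes, times):.2f}")
```

In a real perf test the chunk generator would be replaced by the lazy Dataset, and the peak would ideally be sampled as true process RSS (for example with psutil from a background thread). The slope check should use a loose tolerance, since wall-clock timings on shared CI runners are noisy.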
Likely lives under perf_tests/, gated behind a marker so CI does not always pay the download cost.
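One way to do the gating is the standard pytest pattern of a custom marker plus an opt-in command-line flag, sketched below as a possible `perf_tests/conftest.py`. The marker name `perf` and the flag `--run-perf` are hypothetical; the hooks themselves (`pytest_addoption`, `pytest_configure`, `pytest_collection_modifyitems`) are regular pytest hooks.

```python
# perf_tests/conftest.py (hypothetical location and names)
import pytest

PERF_MARKER = "perf"      # hypothetical marker name
PERF_FLAG = "--run-perf"  # hypothetical opt-in flag


def pytest_addoption(parser):
    # Opt-in flag so CI only pays the download cost when explicitly asked.
    parser.addoption(PERF_FLAG, action="store_true", default=False,
                     help="run perf tests that need a multi-GB dataset")


def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` accepts it.
    config.addinivalue_line(
        "markers", f"{PERF_MARKER}: large-data perf test, skipped unless {PERF_FLAG} is given")


def pytest_collection_modifyitems(config, items):
    # Without the flag, attach a skip mark to every test carrying the perf marker.
    if config.getoption(PERF_FLAG):
        return
    skip_perf = pytest.mark.skip(reason=f"needs {PERF_FLAG}")
    for item in items:
        if PERF_MARKER in item.keywords:
            item.add_marker(skip_perf)
```

Tests would then be decorated with `@pytest.mark.perf` and run explicitly with `pytest perf_tests/ --run-perf`.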
Spun out of #167.