Measure to_dataset peak memory + wall time on non-toy data #172

Description

@ghostiee-11

The lazy-backend memory claim (only the requested region is materialized) is currently verified only against the conftest fixtures and one xarray tutorial dataset. All of these are small enough that a regression which quietly buffers the whole result would still fit in RAM and pass unnoticed.

What we want is a benchmark / smoke test that runs on a real-world-sized Dataset (a few GB on disk, multiple chunks) and asserts:

  • peak RSS for a single-region access stays bounded by chunk size, not result size,
  • streaming aggregation does not blow up while reducing a long axis,
  • compute() wall time scales linearly with the number of materialized rows.
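
As a rough illustration of the first two assertions, here is a minimal, stdlib-only sketch of the measurement pattern. Everything in it is hypothetical: `measure`, `chunks`, `streaming_sum`, and `buffered_sum` are stand-ins, not part of this project, and the chunk sizes are deliberately tiny. It uses `tracemalloc`, which only tracks memory allocated through Python's allocators, so allocations made directly by native libraries may not be counted. A real benchmark would probably want peak RSS from a fresh subprocess per case instead.

```python
import time
import tracemalloc
from typing import Callable, Iterator, Tuple

CHUNK_BYTES = 256 * 1024  # illustrative chunk size, far smaller than real data
N_CHUNKS = 16             # illustrative chunk count


def chunks() -> Iterator[bytes]:
    """Hypothetical stand-in for a lazy backend: yields one chunk at a time."""
    for i in range(N_CHUNKS):
        yield bytes([i]) * CHUNK_BYTES


def streaming_sum() -> int:
    """Reduce chunk by chunk; only about two chunks are alive at once."""
    total = 0
    for chunk in chunks():
        total += sum(chunk)
    return total


def buffered_sum() -> int:
    """The regression we want to catch: materialize everything, then reduce."""
    return sum(b"".join(chunks()))


def measure(fn: Callable[[], int]) -> Tuple[int, int, float]:
    """Return (result, peak traced bytes, wall seconds) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak, elapsed


s_result, s_peak, s_time = measure(streaming_sum)
b_result, b_peak, b_time = measure(buffered_sum)

# Streaming peak is bounded by chunk size, not by result size.
assert s_peak < 3 * CHUNK_BYTES
# The buffering variant holds at least the full result in memory.
assert b_peak >= N_CHUNKS * CHUNK_BYTES
```

The useful property of this shape is that the bound is expressed in chunks rather than in absolute bytes, so the same assertion would still make sense on a multi-GB dataset. The wall-time value is returned but not asserted here; a linearity check would need several input sizes and a tolerance, which this sketch does not attempt.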

Likely lives under perf_tests/, gated behind a marker so CI does not always pay the download cost.
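
One possible way to do the gating, sketched as a `conftest.py` under `perf_tests/`. It follows the usual pytest pattern of an opt-in command-line flag plus a custom marker. The marker name `perf` and the flag `--run-perf` are assumptions made for illustration, not names this repository already defines.

```python
import pytest


def pytest_addoption(parser):
    # Hypothetical opt-in flag; perf tests are skipped unless it is passed.
    parser.addoption(
        "--run-perf",
        action="store_true",
        default=False,
        help="run perf tests that download a multi-GB dataset",
    )


def pytest_configure(config):
    # Register the (assumed) marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "perf: slow benchmark that needs a large dataset"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-perf"):
        return  # flag given: leave perf tests enabled
    skip_perf = pytest.mark.skip(reason="needs --run-perf")
    for item in items:
        if "perf" in item.keywords:
            item.add_marker(skip_perf)
```

With this in place, a test decorated with `@pytest.mark.perf` would be skipped in the default CI run and executed only when invoked as `pytest --run-perf perf_tests/`.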

Spun out of #167.
