managed_microsimulation cannot run a local dataset file #415

@MaxGhenis

Problem

managed_microsimulation (and the underlying resolve_managed_dataset_reference) only accept a managed dataset name from the release manifest, or a remote URI (hf://, gs://). There is no way to run a simulation on a local dataset file — a build artifact the caller produced themselves (e.g. a downstream pipeline's per-year Stage-output H5 that is not part of any release manifest).

Passing a local path:

managed_microsimulation(dataset="/tmp/build/2026.h5", allow_unmanaged=True)

raises ValueError: Unknown dataset '/tmp/build/2026.h5' for country 'us'. Known datasets: [...].

The allow_unmanaged=True flag only relaxes the check for URIs (references containing ://), not for local paths, and a file:// URI still fails downstream with FileNotFoundError.

Impact

Local build-and-score pipelines — for example projecting the certified base to future years and then scoring reforms on the resulting local H5s — cannot go through the managed wrapper at all. They have to construct policyengine_us.Microsimulation directly, which bypasses the provenance recording and runtime-model pinning that the managed path exists to enforce.

Proposed fix

Accept a local filesystem path in resolve_managed_dataset_reference when allow_unmanaged=True (the same explicit opt-in already required for unmanaged URIs). materialize_dataset_source already passes non-URI paths through unchanged, so the simulation constructs normally and the provenance bundle is recorded (managed_by=policyengine.py). Passing a local path without allow_unmanaged=True should raise an actionable error instead of the generic "Unknown dataset" message.
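
The resolution order described above could look roughly like the following. This is a minimal sketch, not the actual policyengine.py implementation: the function name resolve_dataset_reference_sketch, the KNOWN_DATASETS mapping, the path-detection heuristic, and the error wording are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical stand-in for the release manifest (names and URI are made up).
KNOWN_DATASETS = {"example_managed_dataset": "hf://example-org/example-repo/example.h5"}


def resolve_dataset_reference_sketch(dataset: str, allow_unmanaged: bool = False) -> str:
    """Illustrative resolver: managed name, unmanaged URI, or local path."""
    # 1. Managed names from the manifest always resolve.
    if dataset in KNOWN_DATASETS:
        return KNOWN_DATASETS[dataset]

    # 2. Remote URIs (anything containing "://") require the explicit opt-in.
    if "://" in dataset:
        if not allow_unmanaged:
            raise ValueError(
                f"'{dataset}' is an unmanaged URI; pass allow_unmanaged=True to use it."
            )
        return dataset

    # 3. Proposed: path-like references are local files under the same opt-in.
    #    Assumed heuristic: absolute, contains a directory part, or has a suffix.
    path = Path(dataset).expanduser()
    looks_like_path = path.is_absolute() or len(path.parts) > 1 or path.suffix != ""
    if looks_like_path:
        if not allow_unmanaged:
            raise ValueError(
                f"'{dataset}' looks like a local file path, not a managed dataset. "
                "Pass allow_unmanaged=True to run on a local file."
            )
        if not path.is_file():
            raise FileNotFoundError(f"Local dataset file not found: {path}")
        return str(path)

    # 4. Anything else keeps the generic error.
    raise ValueError(
        f"Unknown dataset '{dataset}'. Known datasets: {sorted(KNOWN_DATASETS)}"
    )
```

Checking that the file exists at resolution time is a design choice in this sketch: it surfaces a missing artifact before the simulation is constructed, rather than as a FileNotFoundError further downstream.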
