S1 — Decide the no-fabricated-zeros guard for FAO feature columns (C-26) #52

Description

@Polichinel

Epic: #51 · risk-register C-26 (Tier 1)

Problem

pipeline-core's data fetch applies df.fillna(0.0) unconditionally to all features (views-pipeline-core modules/dataloaders/dataloaders.py:~1208). This repo consumes the result in unfao.py:_read_historical_data (L45-60). A datafactory assembly gap (an unharvested month, a failed source) therefore arrives as lr_ged_sb/ns/os = 0.0, which reads as "zero fatalities". _validate (unfao.py:147-172) checks only the 9 metadata columns and never the features, so the fabricated zeros reach FAO with no error signal.
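To make the missing signal concrete, here is a minimal sketch of a feature-level check of the kind _validate does not currently perform. It is not code from unfao.py: the function name `all_zero_months` and the column names `month_id` and `cell_id` are assumptions for illustration, and only the three feature names come from this issue. It flags months in which every feature value across every cell is exactly 0.0, which is a heuristic hint of a fill and not proof of one.

```python
import pandas as pd

# Feature columns named in this issue.
FEATURES = ["lr_ged_sb", "lr_ged_ns", "lr_ged_os"]


def all_zero_months(df: pd.DataFrame, month_col: str = "month_id") -> list[int]:
    """Return months in which every feature value, across all rows, is exactly 0.0.

    Hypothetical helper; `month_col` is an assumed column name.
    """
    # Per row: are all feature columns zero?
    row_all_zero = df[FEATURES].eq(0.0).all(axis=1)
    # Per month: are all rows of that month all-zero?
    per_month = row_all_zero.groupby(df[month_col]).all()
    return sorted(per_month[per_month].index.tolist())


# Illustrative data only: month 541 is zero everywhere, month 540 is not.
df = pd.DataFrame({
    "month_id": [540, 540, 541, 541],
    "cell_id": [1, 2, 1, 2],
    "lr_ged_sb": [0.0, 1.2, 0.0, 0.0],
    "lr_ged_ns": [0.0, 0.0, 0.0, 0.0],
    "lr_ged_os": [0.3, 0.0, 0.0, 0.0],
})
suspicious = all_zero_months(df)
```

With this toy input, `suspicious` contains only month 541. A check like this cannot catch a partial gap (some cells filled, others real), which is why it is at best a weak signal.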

Why this is a decision, not a straight implementation

By the time this repo receives the DataFrame, the NaNs are already 0.0 — a filled zero is indistinguishable from a real zero. The only bounding signal — the zarr's last_valid_month_id attribute — is not reachable in this repo (verified: zero references in views_postprocessing/; it lives in the upstream zarr/dataloader). So the fix is mostly upstream, and this story decides the split.
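The loss of information can be shown in a few lines. This is a sketch on made-up data, not the pipeline-core code path: `cell_id` is an assumed column name, and the only call taken from the issue is `fillna(0.0)`. It shows that the NaN mask is the last point at which a gap and a real zero can be told apart.

```python
import numpy as np
import pandas as pd

# Illustrative data: cell 1 has a true zero, cell 2 was never harvested (NaN).
raw = pd.DataFrame({"cell_id": [1, 2], "lr_ged_sb": [0.0, np.nan]})

# The mask must be captured before the fill; afterwards it cannot be rebuilt.
missing_mask = raw["lr_ged_sb"].isna()

# The unconditional fill described in this issue.
filled = raw.fillna(0.0)

# Both rows now carry the same feature value.
rows_identical = bool(filled["lr_ged_sb"].iloc[0] == filled["lr_ged_sb"].iloc[1])
fill_count = int(missing_mask.sum())
```

Here `rows_identical` is True and `fill_count` is 1. Anything downstream of the fill sees only `filled`, so a guard in this repo needs either the mask or a bound such as last_valid_month_id passed along with the data.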

Decision to make (and record in C-26 + here)

  • (a) pipeline-core companion (preferred): replace the unconditional fill with a bounded fill keyed on last_valid_month_id (fill only outside the declared valid range, fail on NaN inside it) and log a fill count. Open the companion in views-pipeline-core.
  • (b) this-repo guard (only if the signal is surfaced): have the loader expose last_valid_month_id (or the valid range) to the manager, so that _read/_validate can drop or fail on any (cell, month) row inside the valid range that arrived as a fill.
  • (c) interim this-repo signal: a feature-column anomaly check (a per-target zero-fraction log, or a not-all-zero-per-month check). Weak, but better than nothing until (a) lands.
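Option (a) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline-core implementation: the function name `bounded_fill`, its signature, and the `month_id` column are hypothetical, and "valid range" is taken to mean every month at or before last_valid_month_id (the issue does not define a lower bound).

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def bounded_fill(df, features, last_valid_month_id, month_col="month_id"):
    """Fill NaN with 0.0 only after last_valid_month_id; raise on NaN at or before it.

    Hypothetical helper. Returns the filled copy and the number of filled values.
    """
    # Assumption: the valid range is every month <= last_valid_month_id.
    inside = df[month_col] <= last_valid_month_id
    nan_mask = df[features].isna()

    # A NaN inside the valid range is an assembly gap, not a zero: fail loudly.
    n_bad = int(nan_mask[inside].any(axis=1).sum())
    if n_bad:
        raise ValueError(f"{n_bad} rows have NaN features inside the valid range")

    # Every remaining NaN lies outside the valid range and may be filled.
    fill_count = int(nan_mask.to_numpy().sum())
    logger.info("filled %d NaN feature values outside the valid range", fill_count)

    out = df.copy()
    out[features] = out[features].fillna(0.0)
    return out, fill_count


# Illustrative data: month 542 lies beyond the declared last valid month.
df = pd.DataFrame({
    "month_id": [540, 541, 542],
    "lr_ged_sb": [0.0, 2.0, np.nan],
})
filled, n = bounded_fill(df, ["lr_ged_sb"], last_valid_month_id=541)
```

With last_valid_month_id=541 the single NaN in month 542 is filled and counted. Calling the same function with last_valid_month_id=542 raises ValueError, because the NaN then falls inside the valid range. Returning the fill count alongside the frame is one way to give callers the signal that option (b) depends on.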

Work

  • Confirm the current un_fao fetch path (datafactory vs viewser) and whether fillna(0.0) is still unconditional in the installed pipeline-core.
  • Decide (a)/(b)/(c); write the decision into register C-26 and this issue.
  • If (a)/(b): open the views-pipeline-core companion issue (bounded fill + expose last_valid_month_id) and link it here.

Acceptance criteria

  • A written decision on how fabricated zeros are prevented/detected (in C-26 + this issue).
  • The pipeline-core companion exists and is linked (if upstream work is needed).
  • The this-repo portion (if any) is scoped into a follow-up implementation note.

Dependencies

Likely spawns a views-pipeline-core companion. No code in this repo until the decision is made.

Metadata

Labels: planning (Investigation/spike/decision work), story (A single reviewable unit of an epic)
