Epic: #51 · risk-register C-26 (Tier 1)
Problem
pipeline-core's data fetch applies df.fillna(0.0) unconditionally to all features (views-pipeline-core modules/dataloaders/dataloaders.py:~1208). This repo consumes the result at unfao.py:_read_historical_data (L45-60). A datafactory assembly gap (unharvested month, failed source) therefore becomes lr_ged_sb/ns/os = 0.0, which reads as "zero fatalities". _validate (unfao.py:147-172) checks only the 9 metadata columns, never the features, so the fabricated zeros reach FAO with no error signal.
Why this is a decision, not a straight implementation
By the time this repo receives the DataFrame, the NaNs are already 0.0 — a filled zero is indistinguishable from a real zero. The only bounding signal — the zarr's last_valid_month_id attribute — is not reachable in this repo (verified: zero references in views_postprocessing/; it lives in the upstream zarr/dataloader). So the fix is mostly upstream, and this story decides the split.
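A minimal sketch of why the fill cannot be undone downstream, using a toy frame. The column names month_id and lr_ged_sb follow this issue; the values and the was_filled mask are made up for illustration and do not reflect anything pipeline-core records today.

```python
import numpy as np
import pandas as pd

# Toy data: month 3 holds a real zero, month 4 is an assembly gap (NaN).
raw = pd.DataFrame(
    {
        "month_id": [1, 2, 3, 4],
        "lr_ged_sb": [1.2, 0.7, 0.0, np.nan],
    }
)

# The gap is only visible before the fill, so any mask must be taken here.
was_filled = raw["lr_ged_sb"].isna()

# Same call as the unconditional fill described above.
filled = raw.fillna(0.0)

# After the fill, the real zero and the gap are both plain 0.0.
zeros_after_fill = filled.loc[filled["lr_ged_sb"] == 0.0, "month_id"].tolist()
gap_months = raw.loc[was_filled, "month_id"].tolist()

print("zero months after fill:", zeros_after_fill)
print("months that were actually gaps:", gap_months)
```

Only the pre-fill mask separates the two cases, which is why any guard in this repo depends on upstream surfacing either that mask or the valid range.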
Decision to make (and record in C-26 + here)
- (a) pipeline-core companion (preferred): stop the unconditional fill and use a bounded fill based on last_valid_month_id (fill only outside the declared valid range, fail on NaN inside it), and log a fill count. Open the companion in views-pipeline-core.
- (b) this-repo guard (if the signal is surfaced): have the loader expose last_valid_month_id/valid range to the manager; _read/_validate drop or fail on (cell, month) inside the valid range that arrived as fill.
- (c) interim this-repo signal: a feature-column anomaly check (per-target zero-fraction log / not-all-zero-per-month). Weak, but better than nothing until (a) lands.
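Options (a) and (c) could look roughly like the sketch below. Everything here is hypothetical: bounded_fill, zero_fraction_by_month and all_zero_months are illustrative names, not existing functions in either repo. The sketch also assumes last_valid_month_id is an integer upper bound, so "outside the valid range" means month_id greater than it. The real semantics need to be confirmed upstream.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def bounded_fill(df, feature_cols, last_valid_month_id, month_col="month_id"):
    """Hypothetical option (a): fill NaN only beyond last_valid_month_id."""
    # Assumption: the valid range is every month up to and including the bound.
    inside = df[month_col] <= last_valid_month_id
    nan_mask = df[feature_cols].isna()

    # NaN inside the declared valid range is an assembly gap: fail loudly.
    bad_rows = nan_mask[inside].any(axis=1)
    if bad_rows.any():
        months = sorted(df.loc[inside, month_col][bad_rows].unique().tolist())
        raise ValueError(f"NaN inside valid range for month_id(s) {months}")

    # Beyond the range the fill is expected; count it and log it.
    fill_count = int(nan_mask[~inside].to_numpy().sum())
    logger.info(
        "filled %d feature cells beyond month_id %d",
        fill_count,
        last_valid_month_id,
    )

    out = df.copy()
    out[feature_cols] = out[feature_cols].fillna(0.0)
    return out, fill_count


def zero_fraction_by_month(df, feature_cols, month_col="month_id"):
    """Hypothetical option (c): share of exact zeros per feature and month."""
    return df[feature_cols].eq(0.0).groupby(df[month_col]).mean()


def all_zero_months(df, feature_cols, month_col="month_id"):
    """Months in which every feature is 0.0 for every row (suspicious)."""
    fractions = zero_fraction_by_month(df, feature_cols, month_col)
    suspicious = fractions.eq(1.0).all(axis=1)
    return fractions[suspicious].index.tolist()


if __name__ == "__main__":
    cols = ["lr_ged_sb", "lr_ged_ns"]
    demo = pd.DataFrame(
        {
            "month_id": [1, 1, 2, 2, 3, 3],
            "lr_ged_sb": [0.5, 0.0, 0.0, 0.0, np.nan, np.nan],
            "lr_ged_ns": [0.1, 0.2, 0.0, 0.0, np.nan, np.nan],
        }
    )
    filled, count = bounded_fill(demo, cols, last_valid_month_id=2)
    print(count, all_zero_months(filled, cols))
```

Note that the option (c) check flags a month that is all zero for real reasons just as readily as a filled one, which is the weakness the list above already names; it is a log signal, not a gate.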
Work
- … un_fao fetch path (datafactory vs viewser) and whether fillna(0.0) is still unconditional in the installed pipeline-core.
- … last_valid_month_id) and link it here.
Acceptance criteria
Dependencies
Likely spawns a views-pipeline-core companion. No code in this repo until the decision is made.