Epic: FAO delivery input-integrity — complete-coverage contract + no silent bad data #51

Description

@Polichinel

Parent: umbrella #20. Source: the input-integrity cluster (register C-25/C-26/C-30/C-34/C-15) hardened by two expert-code-reviews + a /falsify pass (2026-06-25/26).

Problem

The FAO delivery trusts its upstream inputs with no completeness or identity contract. The only gate (unfao.py:_validate, L147-172) checks the 9 GAUL metadata columns for nulls — nothing else. So bad data reaches FAO silently: fabricated zeros from missing months (C-26, Tier 1), the wrong forecast file (C-25), unassigned cells at global scale (C-30, Tier 1), partial coverage from a wrong region (C-34), and no provenance to audit any of it (C-15).

Why it matters

Two Tier-1 risks deliver wrong numbers that a partner acts on, with no error signal. This matters most ahead of the imminent global switch (REGION africa_me_legacy → land_gaul, views-models#127). A fail-loud mechanism already exists (the enricher left-merges, unmatched rows become NaN and crash, and unmapped counts are logged); this epic adds the missing contracts, built right.

Desired end state

  • Every delivered cell is verified complete for the configured region; observed history is never fabricated; the forecast input's identity is verified; the delivery carries structured provenance.
  • All guards are representation-agnostic invariants in a reusable views_postprocessing/delivery/ package; pandas is isolated to one views_postprocessing/unfao/extraction.py seam; guards are called by — never methods of — the manager.
  • The pipeline is ready for the land_gaul global switch with no silent-bad-data paths.
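
To make "observed history is never fabricated" concrete, here is a minimal sketch of an observed-range guard. It assumes the upstream source can report a `last_valid_month_id` (the value the S2 companion is meant to surface); the function name, the exception class, and the signature are hypothetical and not part of the existing code.

```python
from __future__ import annotations

from typing import Iterable


class ObservedRangeError(ValueError):
    """Raised when observed months extend past what the source actually holds."""


def check_observed_range(
    requested_month_ids: Iterable[int], last_valid_month_id: int
) -> None:
    """Fail loudly if any requested observed month is later than the last valid one.

    Hypothetical guard: takes only primitives, so it does not depend on pandas.
    """
    beyond = sorted(m for m in set(requested_month_ids) if m > last_valid_month_id)
    if beyond:
        # Report a count and a short sample instead of silently filling zeros.
        raise ObservedRangeError(
            f"{len(beyond)} observed month(s) beyond "
            f"last_valid_month_id={last_valid_month_id}: {beyond[:5]}"
        )
```

The point of the sketch is the failure mode: a month with no observed data raises an error instead of reaching the partner as a zero.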

Design contract — NON-NEGOTIABLE (every story obeys it; enforced by tests/test_input_integrity_design_contract.py)

① Two homes. Representation-free invariants + constants in delivery/ (partner-agnostic, reusable by the coming UN agencies); the pandas→primitives seam in unfao/extraction.py (FAO-local). Contract and representation never share a module.
② Primitives are the abstraction. Invariants take set[int] / np.ndarray / scalars / dicts (DIP). Extraction isolated in one module; pandas→frames swap = a new extraction module, invariants untouched (OCP). No premature Extractor Protocol (YAGNI/ISP) — it's a migration, not a coexistence.
③ Called, never inherited. Guards live outside the inherited class; the manager does extract → call pure guards → raise. The manager's LSP/SDP/SAP violation is DEFERRED to C-40 — this epic does not claim to fix it.
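
The three rules can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: `check_complete_coverage`, `extract_cell_ids`, `deliver`, and the `cell_id` key are all hypothetical names, and a list of dicts stands in for the pandas frame so the example runs without third-party packages. Treating unexpected cells as an error alongside missing ones is also an assumption.

```python
from __future__ import annotations

from typing import Iterable, Mapping


class CoverageError(ValueError):
    """Raised when delivered cells do not match the configured region exactly."""


# Rule 1 and 2: a representation-free invariant (would live under delivery/).
def check_complete_coverage(delivered_ids: set[int], expected_ids: set[int]) -> None:
    """Pure guard on primitives: no pandas, no manager state."""
    missing = expected_ids - delivered_ids
    unexpected = delivered_ids - expected_ids
    if missing or unexpected:
        raise CoverageError(
            f"{len(missing)} missing cell(s), {len(unexpected)} unexpected cell(s)"
        )


# Rule 1: the representation-aware seam (would live in unfao/extraction.py).
def extract_cell_ids(rows: Iterable[Mapping[str, int]]) -> set[int]:
    """Stand-in extractor; the real seam would read these ids from a pandas frame."""
    return {int(row["cell_id"]) for row in rows}


# Rule 3: the manager extracts, calls the pure guard, and lets it raise.
def deliver(rows: Iterable[Mapping[str, int]], expected_ids: set[int]) -> int:
    delivered_ids = extract_cell_ids(rows)
    check_complete_coverage(delivered_ids, expected_ids)
    return len(delivered_ids)
```

Under this layout, swapping the representation only replaces `extract_cell_ids`; the guard and its tests stay unchanged.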

Scope

In: the delivery/ scaffold + unfao/extraction.py seam; the five guards (coverage, observed-range, identity, land_gaul-pin, provenance); package hygiene (delete orphaned unfao/frames.py + dead unfao/mapping/; fix the "frame" name overload); end-to-end tests.
Out: the pandas→views-frames migration of the delivery (C-40 / pipeline-core-gated); the manager de-inheritance (C-40); the datafactory NaN refactor; the per-cell coverage mask (Case B — deferred behind a tripwire); timeouts (C-13/#11, C-28) and the multi-store refactor (C-33) — different clusters.

Stories (implementation order)

Dependencies

S0 ──► S1 ──► S4
   ├─► S2 (also blocked on a pipeline-core companion: surface last_valid_month_id)
   ├─► S3
   └─► S5
{S1..S5} ──► S6
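
The diagram can be checked mechanically. The sketch below transcribes it into a dependency map and derives one valid implementation order with the standard library's `graphlib`; the `DEPENDS_ON` name is illustrative, and the external pipeline-core companion that also blocks S2 is not modelled.

```python
from graphlib import TopologicalSorter

# Each story maps to the set of stories it depends on, read off the diagram.
DEPENDS_ON = {
    "S1": {"S0"},
    "S2": {"S0"},
    "S3": {"S0"},
    "S5": {"S0"},
    "S4": {"S1"},
    "S6": {"S1", "S2", "S3", "S4", "S5"},
}

# static_order() yields every story after all of its prerequisites
# and raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order it prints starts with S0 and ends with S6; the relative order of S2, S3, and S5 is free, which is what allows them to proceed in parallel once S0 lands.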

Epic acceptance criteria

  • delivery/ holds representation-free invariants (no import pandas anywhere under it); unfao/extraction.py is the only representation-aware code; guards are called, not inherited.
  • The five checks are enforced: complete coverage (S1), no fabricated observed months (S2), forecast identity (S3), land_gaul 64,736 + 82 (S4), structured provenance (S5).
  • Package hygiene done (orphaned frames.py + dead mapping/ gone; no "frame" name overload).
  • tests/test_input_integrity_design_contract.py stubs have flipped from xfail to real assertions; full suite green; ready for the land_gaul switch.
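
The first criterion (no pandas import anywhere under delivery/) lends itself to a static check. The following is a sketch of how such a test helper might look, assuming a plain AST scan is acceptable; `find_pandas_imports` is a hypothetical name, and nothing here is taken from the actual tests/test_input_integrity_design_contract.py.

```python
from __future__ import annotations

import ast
from pathlib import Path


def find_pandas_imports(package_dir: Path) -> list[str]:
    """Return 'path:line' for every pandas import found under package_dir.

    Files are parsed, not executed, so the scan has no side effects.
    """
    offenders: list[str] = []
    for path in sorted(package_dir.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            else:
                continue
            # Match both "import pandas" and "from pandas.x import y".
            if any(n == "pandas" or n.startswith("pandas.") for n in names):
                offenders.append(f"{path}:{node.lineno}")
    return offenders
```

A contract test could then assert that the returned list is empty for the delivery/ directory. Note that a scan like this only catches direct imports; pandas pulled in transitively through another module would need a separate check.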
