Panel-id rename-onto-occupied collisions survive in bespoke panel_ids.py (Niger, GhanaLSS; Burkina_Faso latent) #548

@ligon

Follow-up from the panel-id v2 red-team (PRs #547 / #546). The #504 update_id guard and the #536 Mali full-cover cur_set fix the rename-onto-occupied → groupby().first() data-loss class for update_id-routed countries (Malawi) and Mali. The same class survives, pre-existing and untouched, in countries whose bespoke panel_ids.py emits updated_ids.json directly (bypassing update_id):

  • GhanaLSS: 2 duplicate (i,t) tuples in sample/plot_features, corresponding to the 2 reused-target ids in updated_ids['1988-89'] (101332←[204932,204922], 114008←[255728,255718]). A clear case of rename-onto-occupied.
  • Niger: 59 duplicate (i,t) tuples in sample/plot_features, all in wave 2014-15. The two red-team lenses disagree on the mechanism: one read 52 of the 59 as id_walk rename targets (rename-onto-occupied); the other traced them to EXTENSION split-offs (the cover has EXTENSION ∈ {0,1,2}, and 59 (GRAPPE,MENAGE) pairs appear more than once) that are already duplicated before id_walk runs. Needs a dig to confirm which.
  • Burkina_Faso (latent): builds cur_set from panel-only candidates (updated_ids['2021-22'].keys()), the unfixed pattern. It is safe today only because s00_me_bfa2021.dta contains exclusively panel households (full-cover == panel-only); a future cover with non-panel rows would silently resurface the Mali bug. Cheap hardening: build cur_set from the full cover, analogous to #536.
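
To make the GhanaLSS case concrete, here is a minimal sketch of how two sources renamed onto one target produce duplicate (i,t) keys. It assumes updated_ids[wave] can be read as a source-id → target-id dict (the issue only shows the target←sources direction); the helper names apply_renames and duplicate_keys are hypothetical and are not part of the repository.

```python
from collections import Counter

def apply_renames(rows, renames):
    """Map each household id through `renames`, leaving unmapped ids as-is."""
    return [(renames.get(i, i), t) for i, t in rows]

def duplicate_keys(keys):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(keys)
    return {k: n for k, n in counts.items() if n > 1}

# The two reused targets reported for GhanaLSS 1988-89, written as an
# assumed source -> target mapping.
renames = {204932: 101332, 204922: 101332, 255728: 114008, 255718: 114008}

# One (i, t) row per source household in the wave.
rows = [(old, "1988-89") for old in renames]

dups = duplicate_keys(apply_renames(rows, renames))
print(dups)
```

Each duplicated key stands for two distinct households; any later step that keeps one row per (i,t), such as groupby().first(), would retain only one of them.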

Recommended: port #536's full-cover cur_set guard into the Niger and GhanaLSS bespoke scripts, harden Burkina_Faso defensively, and audit the Senegal/Tanzania/Ethiopia bespoke scripts (clean today, same pattern).
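
The difference between a panel-only and a full-cover cur_set can be sketched as follows. This is an illustration of the idea only, not the #536 implementation: the function find_occupied_targets, the dict-shaped renames argument, and the household ids are all hypothetical.

```python
def find_occupied_targets(renames, cover_ids):
    """Return {target: [sources]} for renames that land on an id already in use.

    `renames` is an assumed source -> target mapping; `cover_ids` is every
    household id present in the current wave's cover.
    """
    cur_set = set(cover_ids)
    collisions = {}
    for old, new in renames.items():
        if new == old:
            continue
        # A target is occupied if a household in the cover already holds that
        # id and is not itself renamed away to a different id.
        if new in cur_set and renames.get(new, new) == new:
            collisions.setdefault(new, []).append(old)
    return collisions

# Hypothetical wave: A1 is a panel household renamed to B7, but B7 is
# already the id of a non-panel household in the same cover.
renames = {"A1": "B7"}
full_cover = ["A1", "B7", "C3"]

# Panel-only candidates (the unfixed pattern) cannot see B7.
panel_only = find_occupied_targets(renames, renames.keys())
# The full cover exposes the collision.
guarded = find_occupied_targets(renames, full_cover)
print(panel_only, guarded)
```

With the panel-only set the check passes vacuously, which is why Burkina_Faso is safe only while its cover contains no non-panel rows.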

Separate related observation (distinct class, file separately if confirmed): roster + household_characteristics emit ['t','i','pid'] person-level collisions in non-panel waves — Mali 32026, Niger 210, GhanaLSS 10. Pre-existing; not panel-id i-rename. Likely a roster pid-multiplicity bug.

All residuals are byte-identical on development (pre-existing) — they do not block #547/#546.
