Pragmatic, re-runnable data cleanup for the relay directory (catch-up on the mess predating the harvest hardening; prevention is already in place so it won't re-spike).
Categories
- bad: fails safeRelayUrl (malformed / SSRF / non-ws) → delete
- dup: multiple docs normalize to the same canonical URL → keep richest (max checksTotal, then freshest), delete the rest
- renorm: kept doc's stored relay ≠ canonical → rewrite to canonical
- stale (optional, --stale=N): kept doc not checked in N days → delete
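
The categories above can be sketched as a single pure pass over the docs. This is a minimal illustration, not the code in src/hoses/relays.js. It assumes a doc shape of {_id, relay, checksTotal, lastChecked} with lastChecked in epoch milliseconds, and it uses a hypothetical canonicalOrNull helper as a stand-in for safeRelayUrl (scheme check only, no SSRF filtering). The {doc, to} shape of a rename entry is also an assumption.

```javascript
// Hypothetical stand-in for safeRelayUrl: returns a canonical URL string,
// or null when the input is rejected. Only the scheme is checked here;
// the real SSRF checks are NOT reproduced.
function canonicalOrNull(raw) {
  try {
    const u = new URL(String(raw).trim());
    if (u.protocol !== 'ws:' && u.protocol !== 'wss:') return null;
    u.hash = '';
    return u.toString().replace(/\/$/, ''); // drop one trailing slash
  } catch {
    return null; // not parseable as a URL
  }
}

// Sketch of the planner. Assumed doc fields: _id, relay, checksTotal,
// lastChecked (epoch ms). Pure: no I/O, and `now` is injected.
function planRelaySweepSketch(docs, { staleDays = null, now = Date.now() } = {}) {
  const bad = [];
  const groups = new Map(); // canonical URL -> docs that normalize to it

  for (const doc of docs) {
    const canonical = canonicalOrNull(doc.relay);
    if (canonical === null) {
      bad.push(doc);
      continue;
    }
    if (!groups.has(canonical)) groups.set(canonical, []);
    groups.get(canonical).push(doc);
  }

  const dups = [];
  const renames = [];
  const stale = [];
  const cutoff = staleDays == null ? null : now - staleDays * 24 * 60 * 60 * 1000;

  for (const [canonical, group] of groups) {
    // Richest first: max checksTotal, then freshest lastChecked.
    group.sort((a, b) =>
      ((b.checksTotal ?? 0) - (a.checksTotal ?? 0)) ||
      ((b.lastChecked ?? 0) - (a.lastChecked ?? 0)));
    const [kept, ...rest] = group;
    dups.push(...rest);

    if (cutoff !== null && (kept.lastChecked ?? 0) < cutoff) {
      stale.push(kept); // will be deleted, so no rename is planned for it
      continue;
    }
    if (kept.relay !== canonical) renames.push({ doc: kept, to: canonical });
  }

  return { bad, dups, renames, stale };
}
```

Skipping the rename for a stale kept doc is a choice made in this sketch: a doc that is about to be deleted does not need its URL rewritten.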
Design
planRelaySweep(docs, {staleDays, now}) in src/hoses/relays.js — pure, returns {bad, dups, renames, stale}; unit-tested.
scripts/sweep-relays.js — Mongo I/O only; dry by default, --apply to execute, --stale=N optional. Idempotent / re-runnable.
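
As a rough sketch of how the script side could stay thin, the flag handling and the translation from plan to write operations can both be kept as pure functions, leaving only the actual Mongo calls outside. The names parseSweepArgs, summarizePlan and planToBulkOps are hypothetical, as are the _id and relay field names and the {doc, to} shape of a rename entry. The operation objects follow the MongoDB bulkWrite format (deleteOne / updateOne).

```javascript
// Hypothetical flag parser: dry by default, --apply to execute,
// --stale=N optional (N must be a positive integer).
function parseSweepArgs(argv) {
  const opts = { apply: false, staleDays: null };
  for (const arg of argv) {
    if (arg === '--apply') {
      opts.apply = true;
    } else if (arg.startsWith('--stale=')) {
      const n = Number(arg.slice('--stale='.length));
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error(`invalid --stale value: ${arg}`);
      }
      opts.staleDays = n;
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  return opts;
}

// Category counts for the dry-run output.
function summarizePlan(plan) {
  return `bad=${plan.bad.length} dups=${plan.dups.length} ` +
    `renames=${plan.renames.length} stale=${plan.stale.length}`;
}

// Turn a plan into bulkWrite-style operations. Assumes each doc carries
// an _id and each rename entry looks like { doc, to }.
function planToBulkOps(plan) {
  const deletes = [...plan.bad, ...plan.dups, ...plan.stale].map((doc) => ({
    deleteOne: { filter: { _id: doc._id } },
  }));
  const updates = plan.renames.map(({ doc, to }) => ({
    updateOne: { filter: { _id: doc._id }, update: { $set: { relay: to } } },
  }));
  // Deletes come first so a duplicate is gone before its survivor is
  // rewritten to the shared canonical URL.
  return [...deletes, ...updates];
}
```

In a dry run only summarizePlan would be printed; with --apply the operations would be handed to the collection's bulkWrite. Re-running after a successful apply should produce an empty plan, which is what makes the sweep idempotent.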
Acceptance
- Dry run prints category counts; --apply deletes bad+dups(+stale) and normalizes kept URLs.
- planRelaySweep unit-tested (dedup keeps richest, bad dropped, renorm detected, stale by age).
- npm test green.