scripts: relay directory data sweep (bad URLs, dedup, normalize, stale) #29

@melvincarvalho

Description

Pragmatic, re-runnable data cleanup for the relay directory. This is catch-up on the mess that predates the harvest hardening; prevention is already in place, so it won't re-spike.

Categories

  • bad: fails safeRelayUrl (malformed / SSRF / non-ws) → delete
  • dup: multiple docs normalize to the same canonical URL → keep richest (max checksTotal, then freshest), delete the rest
  • renorm: kept doc's stored relay ≠ canonical → rewrite to canonical
  • stale (opt, --stale=N): kept doc not checked in N days → delete
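
To make the four categories concrete, here is a minimal sketch of how such a planner could classify documents. It is not the code in src/hoses/relays.js. The `safeRelayUrl` below is a simplified stand-in for the real helper, the `lastChecked` field (epoch milliseconds), the `{doc, to}` shape of `renames`, and the choice to skip renaming a doc that is about to be deleted as stale are all assumptions made for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Simplified stand-in for the project's safeRelayUrl (assumption): returns a
// canonical ws(s) URL string, or null when the input is unusable.
function safeRelayUrl(raw) {
  let u;
  try {
    u = new URL(String(raw).trim());
  } catch {
    return null; // malformed
  }
  if (u.protocol !== 'ws:' && u.protocol !== 'wss:') return null; // non-ws
  // Crude SSRF guard for the sketch only; a real check would be much stricter.
  if (u.hostname === 'localhost' || u.hostname === '127.0.0.1') return null;
  // URL parsing already lowercases scheme and host; drop trailing slashes and any fragment.
  const path = u.pathname.replace(/\/+$/, '');
  return `${u.protocol}//${u.host}${path}${u.search}`;
}

// Pure planner: no I/O, only classifies the docs it is given.
function planRelaySweep(docs, { staleDays = null, now = Date.now() } = {}) {
  const bad = [];
  const groups = new Map(); // canonical URL -> docs that normalize to it

  for (const doc of docs) {
    const canonical = safeRelayUrl(doc.relay);
    if (canonical === null) {
      bad.push(doc);
      continue;
    }
    if (!groups.has(canonical)) groups.set(canonical, []);
    groups.get(canonical).push(doc);
  }

  const dups = [];
  const renames = [];
  const stale = [];
  for (const [canonical, group] of groups) {
    // Richest first: max checksTotal, then freshest (hypothetical lastChecked).
    const [keep, ...rest] = [...group].sort(
      (a, b) =>
        (b.checksTotal || 0) - (a.checksTotal || 0) ||
        (b.lastChecked || 0) - (a.lastChecked || 0)
    );
    dups.push(...rest);
    if (staleDays !== null && now - (keep.lastChecked || 0) > staleDays * DAY_MS) {
      stale.push(keep); // will be deleted, so no rename is planned for it
    } else if (keep.relay !== canonical) {
      renames.push({ doc: keep, to: canonical });
    }
  }
  return { bad, dups, renames, stale };
}
```

Because the planner takes `now` as a parameter and never touches the database, the same input always yields the same plan, which is what makes it straightforward to unit-test.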

Design

  • planRelaySweep(docs, {staleDays, now}) in src/hoses/relays.js: pure, returns {bad, dups, renames, stale}; unit-tested.
  • scripts/sweep-relays.js — Mongo I/O only; dry by default, --apply to execute, --stale=N optional. Idempotent / re-runnable.
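
The flag handling described for the script (dry by default, --apply to execute, --stale=N optional) could look roughly like the sketch below. `parseSweepArgs` is a hypothetical helper name, and rejecting unknown flags and non-positive values of N are assumptions, not behavior confirmed by this issue.

```javascript
// Hypothetical argument parser for a script like scripts/sweep-relays.js.
function parseSweepArgs(argv) {
  // Dry run unless --apply is given; stale sweep is off unless --stale=N is given.
  const opts = { apply: false, staleDays: null };
  for (const arg of argv) {
    if (arg === '--apply') {
      opts.apply = true;
    } else if (arg.startsWith('--stale=')) {
      const n = Number(arg.slice('--stale='.length));
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error(`--stale expects a positive integer number of days, got "${arg}"`);
      }
      opts.staleDays = n;
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  return opts;
}

// Typical call site: parseSweepArgs(process.argv.slice(2))
```

Defaulting `apply` to false means a typo or a forgotten flag results in a harmless dry run rather than a destructive one.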

Acceptance

  • Dry run prints category counts; --apply deletes bad+dups(+stale) and normalizes kept URLs.
  • planRelaySweep unit-tested (dedup keeps richest, bad dropped, renorm detected, stale by age).
  • npm test green.
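
As a sketch of the first acceptance item, the plan can be turned into a one-line count summary for the dry run and into a list of operations in the shape MongoDB's `bulkWrite` accepts for `--apply`. The function names, the `_id` filter, and the `{doc, to}` shape of `renames` are assumptions for illustration; the real script may issue its writes differently.

```javascript
// Dry-run output: one count per category.
function summarizePlan(plan) {
  return ['bad', 'dups', 'renames', 'stale']
    .map((k) => `${k}=${plan[k].length}`)
    .join(' ');
}

// --apply: translate the plan into bulkWrite-style operations.
// Assumes each doc carries an _id and renames are { doc, to } pairs.
function planToBulkOps(plan) {
  const deletions = [...plan.bad, ...plan.dups, ...plan.stale].map((doc) => ({
    deleteOne: { filter: { _id: doc._id } },
  }));
  const updates = plan.renames.map(({ doc, to }) => ({
    updateOne: { filter: { _id: doc._id }, update: { $set: { relay: to } } },
  }));
  // Deletions first, so a duplicate holding the canonical URL is gone
  // before the kept doc is rewritten to that URL.
  return [...deletions, ...updates];
}
```

Running the same sweep a second time should produce an empty plan and therefore an empty operation list, which is one way to check the idempotence claim.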
