Prober: track membership by lastChecked (decouple from uptime) + reset corrupted counters #35

@melvincarvalho

Public-tab uptime is still wrong for most relays. Only the 8 allowlist relays had their counters reset; the other ~718 verified relays still carry degraded-bomb-sweep ratios (e.g. 13/32 ≈ 41%), so their cumulative uptime stays stuck low even as clean daily probes accrue.

Fix needs two parts:

  1. Decouple prober membership from the uptime stat: target relays where lastChecked exists (i.e. we have probed them before) instead of checksOnline ≥ 1. This survives a counter reset and keeps offline-but-known relays in rotation, so they can recover. Brand-new harvested candidates (no lastChecked) are still excluded.
  2. Reset the corrupted counters (checksOnline/checksTotal/uptime) for the tracked set as an ops step, then re-probe, so that uptime rebuilds clean from honest daily samples.
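
To make part 2 concrete, here is a minimal in-memory sketch of the reset and re-probe cycle. Only the field names (checksOnline, checksTotal, uptime, lastChecked) come from this issue; the `RelayRecord` shape, the helper names, the example URL and dates, and the choice of `null` for "no samples yet" are assumptions for illustration, not the prober's actual code.

```typescript
// Hypothetical record shape; only the field names are taken from the issue.
interface RelayRecord {
  url: string;
  checksOnline: number;
  checksTotal: number;
  uptime: number | null; // checksOnline / checksTotal; null until the first sample
  lastChecked?: Date;
}

// Zero the counters but keep lastChecked, so the relay stays a sweep target.
function resetCounters(r: RelayRecord): RelayRecord {
  return { ...r, checksOnline: 0, checksTotal: 0, uptime: null };
}

// Fold one probe result into the counters and recompute uptime.
function recordProbe(r: RelayRecord, online: boolean, now: Date): RelayRecord {
  const checksOnline = r.checksOnline + (online ? 1 : 0);
  const checksTotal = r.checksTotal + 1;
  return { ...r, checksOnline, checksTotal, uptime: checksOnline / checksTotal, lastChecked: now };
}

// A relay carrying the degraded 13/32 ratio from the issue text.
const degraded: RelayRecord = {
  url: "wss://relay.example", // placeholder URL
  checksOnline: 13,
  checksTotal: 32,
  uptime: 13 / 32,
  lastChecked: new Date("2024-01-01T00:00:00Z"), // placeholder date
};

const clean = resetCounters(degraded);
const reprobed = recordProbe(clean, true, new Date("2024-01-02T00:00:00Z"));
console.log(clean.checksTotal, reprobed.uptime);
```

Because `resetCounters` leaves lastChecked in place, a reset record still satisfies the membership rule from part 1, which is why the two parts need to ship together.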

Change (this PR)

  • sweepTargets: query { lastChecked: { $exists: true } } instead of { checksOnline: { $gte: 1 } }.
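
As a rough illustration of what the filter change selects, the sketch below mirrors both filters as plain predicates over sample documents. The two filter objects are the ones quoted above; the `RelayDoc` type, the predicate names, and the sample relays are hypothetical, and the predicates only approximate MongoDB matching (for instance, `$exists: true` also matches a field stored as null).

```typescript
// Hypothetical document shape for illustration.
type RelayDoc = { url: string; checksOnline?: number; lastChecked?: Date };

// The two filters from this change, as they would be passed to a find().
const oldFilter = { checksOnline: { $gte: 1 } };
const newFilter = { lastChecked: { $exists: true } };

// In-memory approximations of those filters.
const matchesOld = (d: RelayDoc): boolean => d.checksOnline !== undefined && d.checksOnline >= 1;
const matchesNew = (d: RelayDoc): boolean => d.lastChecked !== undefined;

const probedAt = new Date("2024-01-01T00:00:00Z"); // placeholder date
const docs: RelayDoc[] = [
  { url: "healthy", checksOnline: 20, lastChecked: probedAt },
  { url: "just-reset", checksOnline: 0, lastChecked: probedAt },
  { url: "offline-but-known", checksOnline: 0, lastChecked: probedAt },
  { url: "new-candidate" }, // harvested, never probed
];

const oldTargets = docs.filter(matchesOld).map((d) => d.url);
const newTargets = docs.filter(matchesNew).map((d) => d.url);
console.log(oldTargets); // ["healthy"]
console.log(newTargets); // ["healthy", "just-reset", "offline-but-known"]
```

Under the old filter, a relay whose counters were just reset would drop out of rotation and never be probed again; under the new one it stays in, while never-probed candidates remain excluded.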

The reset and re-probe are done as an ops step after deploy.

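Note: immediately after the reset, each relay's uptime rests on a single sample, so on day 1 it reads 100% if the probe found the relay online and 0% if it found it offline. The figure becomes meaningful as samples accrue over subsequent days.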
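
The arithmetic behind the reset can be sketched as follows, assuming uptime is simply checksOnline / checksTotal. The helper names and the 90% target are assumptions chosen for illustration; the 13/32 starting ratio is the example from the description.

```typescript
// Uptime as a whole-number percentage, assuming uptime = checksOnline / checksTotal.
function uptimePercent(checksOnline: number, checksTotal: number): number {
  return Math.round((100 * checksOnline) / checksTotal);
}

// Number of consecutive online probes needed before the cumulative ratio
// reaches targetPercent. Uses integer arithmetic to avoid rounding issues.
function probesToReach(checksOnline: number, checksTotal: number, targetPercent: number): number {
  if (targetPercent <= 0 || targetPercent >= 100) {
    throw new RangeError("targetPercent must be strictly between 0 and 100");
  }
  let n = 0;
  while ((checksOnline + n) * 100 < targetPercent * (checksTotal + n)) {
    n += 1;
  }
  return n;
}

// Without a reset: starting from 13/32, clean daily probes to reach 90%.
console.log(probesToReach(13, 32, 90)); // 158

// After a reset, day 1 has exactly one sample.
console.log(uptimePercent(1, 1)); // 100 (probe found the relay online)
console.log(uptimePercent(0, 1)); // 0 (probe found the relay offline)
```

At one probe per day, climbing from 13/32 back to 90% would take 158 consecutive online days, which is why accruing clean samples alone does not repair the public figure.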
