
[pull] main from CopilotKit:main #342

Merged: pull[bot] merged 177 commits into TheTechOddBug:main from CopilotKit:main on Jun 12, 2026.

pull[bot] commented Jun 12, 2026:

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


jpr5 and others added 30 commits June 10, 2026 22:35
Add written_by/state_written_at columns (PB migrations) and stamp every
status write with a stable host-derived writer identity. The status
writer now detects cross-writer state flips and foreign writes (the
anti-dual-writer flap-comb defense), normalizes observedAt to PB-safe
RFC-3339 shapes before date-field writes, and classifies writer errors
honestly (401 auth vs 403 permission split, new pb_not_found reason).
Pin the writer-identity stamping, flip/foreign-write warn paths
(TTL re-warns, self-write memory cap eviction), date normalization
shapes, overlay write outcomes, idempotency latches, and error
classification against fail-loud fake-PB fixtures hardened against
silent divergence from real PocketBase behavior.
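The flip/foreign-write detection, with its TTL re-warns and self-write memory cap, can be pictured as a small in-memory sketch. Everything here is hypothetical: `FlipDetector`, the `StatusRow` shape, and the rule that a "foreign write" means another writer overwrote a key this writer wrote last are assumptions, not the status writer's real API. Warnings are collected in an array so the behavior is easy to inspect.

```typescript
// Hypothetical row shape; the real status columns are written_by / state_written_at.
interface StatusRow {
  key: string;
  state: string;
  writtenBy: string;
}

class FlipDetector {
  private readonly lastWarnAt = new Map<string, number>();
  // Keys this writer wrote most recently; Map insertion order doubles as recency.
  private readonly selfWrites = new Map<string, string>();
  private evictionWarned = false;
  readonly warnings: string[] = [];

  constructor(
    private readonly self: string,
    private readonly ttlMs: number,
    private readonly maxSelfWrites: number,
  ) {}

  /** Inspect the currently stored row before overwriting it with `nextState`. */
  beforeWrite(existing: StatusRow | undefined, nextState: string, now: number): void {
    if (!existing || existing.writtenBy === this.self) return;
    if (this.selfWrites.has(existing.key)) {
      // Another writer overwrote a row this writer wrote last.
      this.warn(existing.key, `foreign write on ${existing.key} by ${existing.writtenBy}`, now);
    } else if (existing.state !== nextState) {
      // Another writer's state disagrees with the state about to be written.
      this.warn(existing.key, `cross-writer flip on ${existing.key}: ${existing.state} -> ${nextState}`, now);
    }
  }

  recordWrite(key: string, state: string): void {
    this.selfWrites.delete(key); // re-insert so this key becomes the most recent
    this.selfWrites.set(key, state);
    if (this.selfWrites.size > this.maxSelfWrites) {
      const oldest = this.selfWrites.keys().next().value as string;
      this.selfWrites.delete(oldest);
      if (!this.evictionWarned) {
        this.evictionWarned = true; // warn once, when eviction first starts
        this.warnings.push(`self-write memory cap (${this.maxSelfWrites}) reached; evicting oldest keys`);
      }
    }
  }

  private warn(key: string, msg: string, now: number): void {
    const last = this.lastWarnAt.get(key);
    if (last !== undefined && now - last < this.ttlMs) return; // deduped within the TTL
    this.lastWarnAt.set(key, now); // once the TTL lapses, a sustained fight re-warns
    this.warnings.push(msg);
  }
}
```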
Normalize padded projected keys to trimmed canonical form, skip
blank keys loudly, and replace ambiguous outcomes with explicit
discriminators: honest outage-skip/empty-projection semantics,
droppedCommError surfaced whenever the comm error misses the
aggregate row, trusted-negative duplicate preference scoped to
cell-vs-cell (no aggregate-row impersonation), and comm-error
identity asserts converted to discriminators so the consumer
cannot hot-loop.
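A minimal sketch of the key-normalization step, assuming projected rows arrive as `{ key, value }` pairs (a hypothetical shape). Blank keys are skipped with a warning instead of being dropped silently. For brevity the sketch resolves post-trim duplicates as first-wins; that is a simplification, not the aggregator's actual trusted-negative duplicate preference.

```typescript
// Hypothetical projection shape used only for this illustration.
interface Projection {
  key: string;
  value: unknown;
}

function normalizeProjectedKeys(
  rows: readonly Projection[],
  warn: (msg: string) => void,
): Map<string, unknown> {
  const out = new Map<string, unknown>();
  for (const row of rows) {
    const canonical = row.key.trim(); // padded keys collapse to their trimmed form
    if (canonical === "") {
      warn(`skipping blank projected key ${JSON.stringify(row.key)}`); // skip loudly
      continue;
    }
    if (out.has(canonical)) {
      // Simplified duplicate policy for the sketch: keep the first, warn on the rest.
      warn(`duplicate projected key ${canonical} after trimming; keeping the first`);
      continue;
    }
    out.set(canonical, row.value);
  }
  return out;
}
```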
Persist thrown driver errors to PB in the runDriverInputs catch,
carry WriteOutcome discriminators through the CLI summary (dropped
write counts with correct pluralization, no duplicate cause
clauses), and pin the runner/results behavior with StatusWriter
contract-typed stubs so writer contract drift is compile-checked
at the stub site.
…nment

Wire the legacy monolith scheduler's status writer with an explicit
writtenBy:"legacy" identity, correct the dual-writer dedupe comments
to describe the real upsert-collapse (and why it is not safe), stamp
the alert engine's synthesized cron outcome persisted:false honestly,
and align alert/orchestrator/probe test fixtures with the real
StatusWriter and OverlayWriteOutcome contracts.
…rects (SU-13)

Carry backendHostPattern + docsHost in the shell runtime config (no longer
baked from registry.json at Docker build time). Derive demo backend URLs at
runtime from the pattern; issue docs-host 301s from middleware with a runtime
DOCS_HOST so a misconfigured value can no longer 500 every docs route.
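Deriving a demo backend URL at runtime from the pattern might look like the sketch below. `backendUrlFor`, `SLUG_RE`, the `showcase-{slug}.example.com` pattern, and the https scheme are illustrative assumptions. The normalization (trim, scheme strip, trailing-slash strip) follows the steps the later commits list for generator parity, and `split`/`join` is used instead of `String.replace` so `$`-patterns in the replacement are never expanded.

```typescript
// Hypothetical slug charset guard: lowercase letters, digits, and dashes.
const SLUG_RE = /^[a-z0-9][a-z0-9-]*$/;

function normalizePattern(raw: string): string {
  return raw
    .trim() // whitespace paste artifacts
    .replace(/^https?:\/\//i, "") // scheme strip
    .replace(/\/+$/, ""); // trailing-slash strip
}

function backendUrlFor(rawPattern: string, slug: string): string {
  if (!SLUG_RE.test(slug)) throw new Error(`invalid slug: ${JSON.stringify(slug)}`);
  const pattern = normalizePattern(rawPattern);
  if (!pattern.includes("{slug}")) throw new Error("backend host pattern has no {slug} placeholder");
  // split/join substitutes every occurrence and treats the slug as literal text.
  return `https://${pattern.split("{slug}").join(slug)}`;
}
```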

Validate NEXT_PUBLIC_LOCAL_BACKENDS and empty overrides; guard the backend
host pattern against silent env misconfigs. Reword the stale demo-page
comment about backend URL derivation. Pin the registry slug set and SSR
placeholder URL composition; fix env/spy/global leaks in runtime-config
test cleanup.

Squash of the initial runtime-URL refactor cluster:
- feat(showcase): carry backendHostPattern + docsHost in shell runtime config
- fix(showcase): derive demo backend URLs at runtime instead of baked registry values
- fix(showcase): issue docs-host 301s from middleware with runtime DOCS_HOST
- fix(showcase): never let a misconfigured DOCS_HOST 500 every docs route
- fix(showcase): guard backend host pattern against silent env misconfigs
- fix(showcase): validate NEXT_PUBLIC_LOCAL_BACKENDS values and empty overrides
- docs(showcase): reword stale demo-page comment about backend URL derivation
- test(showcase): fix env/spy/global leaks in runtime-config test cleanup
- test(showcase): pin registry slug set and SSR placeholder URL composition
…g (SU-2/8/11/14/15/16/17/18/19/20)

Resolve SEO redirect destinations against the docs host (SU-17); forward
the query string on SEO redirects (SU-16); match bare paths on wildcard
SEO sources (SU-19). Collapse duplicate slashes in docs-host redirect
destinations (SU-13). Regression test for /shared//evil.com open redirect
(SU-18). Emit 308 for docs-host redirects to match next.config parity
(SU-2). Add a path boundary to the api matcher exclusion (SU-15). Loud
guard when registry yields zero framework slugs (SU-20). Keep PostHog
capture alive via event.waitUntil (SU-14). Note docs-host redirects are
untracked by design (SU-8). Cover docs-host redirects at the middleware
level (SU-11).
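The matching rules above (first match wins, case-insensitive matching, trailing- and duplicate-slash normalization, bare-path matches on wildcard sources with a prefix boundary, query forwarding, and destinations resolved against the docs host) can be sketched in one function. `SeoRedirect`, `resolveSeoRedirect`, and the `/:path*` handling are hypothetical simplifications, not the shell's `buildRedirectLookup`; the 308 status and PostHog capture are left out.

```typescript
interface SeoRedirect {
  source: string; // exact path, or a prefix ending in "/:path*"
  destination: string; // docs-host path; may reuse ":path*"
}

const WILDCARD = "/:path*";

// Request-time normalization: collapse duplicate slashes, drop a trailing slash.
function normalizePath(p: string): string {
  const collapsed = p.replace(/\/{2,}/g, "/");
  return collapsed.length > 1 ? collapsed.replace(/\/+$/, "") : collapsed;
}

function toDocsUrl(docsHost: string, path: string, search: string): string {
  const url = new URL(`https://${docsHost}`);
  url.pathname = path; // the pathname setter can never change the host
  url.search = search; // forward the original query string
  return url.toString();
}

function resolveSeoRedirect(
  table: readonly SeoRedirect[],
  docsHost: string,
  pathname: string,
  search: string,
): string | null {
  const path = normalizePath(pathname);
  const lower = path.toLowerCase(); // case-insensitive matching
  for (const { source, destination } of table) {
    // Table order decides: the first matching entry wins.
    const src = normalizePath(source).toLowerCase();
    if (src.endsWith(WILDCARD)) {
      const prefix = src.slice(0, -WILDCARD.length);
      // Match the bare prefix, or the prefix plus a "/" boundary ("/guidesfoo" must not match).
      if (lower !== prefix && !lower.startsWith(`${prefix}/`)) continue;
      const rest = path.slice(prefix.length); // original case: "" or "/..."
      // split/join rather than String.replace, so "$&" or "$1" in the path stays literal.
      return toDocsUrl(docsHost, destination.split(WILDCARD).join(rest), search);
    }
    if (lower === src) return toDocsUrl(docsHost, destination, search);
  }
  return null;
}
```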

Squash of the SEO-table + matcher-hardening cluster.
…rect builder safety (SU2-A/B + CR2-C)

SU2-B series — runtime-config / backend-url env robustness:
- Correct the Edge-safety story in runtime-config (SU2-B1)
- Stop per-request FATAL-CONFIG spam for unset BASE_URL (SU2-B2)
- Prepend https:// to a scheme-less POSTHOG_HOST (SU2-B3)
- Trim whitespace paste artifacts in env values and host patterns (SU2-B4)
- Memoize parseLocalBackends and warn once per value (SU2-B5)
- Make {slug} substitution immune to $-patterns (SU2-B6)
- Harden the client runtime-config reader (SU2-B7)
- runtime-config hardening batch (SU2-B8)
- Validate local-ports.json before baking NEXT_PUBLIC_LOCAL_BACKENDS (SU2-B9)
- test: warn-once assertions retry-safe; stop console leaks (SU2-B10)
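The memoize-and-warn-once behavior (SU2-B5), together with the later local-backend hardening (http(s)-only overrides, `__proto__` kept as data, memo key committed only after the value computes), can be sketched as below. The JSON-object format of the env value, the function signature, and the module-level memo variables are assumptions for illustration only.

```typescript
// Hypothetical memo state for a JSON env value such as
// NEXT_PUBLIC_LOCAL_BACKENDS='{"some-slug":"http://localhost:8000"}'.
let memoKey: string | undefined;
let memoValue: ReadonlyMap<string, string> | undefined;
const warnedValues = new Set<string>();

function parseLocalBackends(
  raw: string | undefined,
  warn: (msg: string) => void,
): ReadonlyMap<string, string> {
  const key = raw ?? "";
  if (memoValue !== undefined && memoKey === key) return memoValue;

  // Warn at most once per distinct env value, even across repeated calls.
  const warnOnce = (msg: string): void => {
    if (warnedValues.has(key)) return;
    warnedValues.add(key);
    warn(msg);
  };

  // A Map keeps keys like "__proto__" as plain data instead of touching a prototype.
  const result = new Map<string, string>();
  if (key.trim() !== "") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(key);
    } catch {
      parsed = undefined;
    }
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      warnOnce("local backends value is not a JSON object; ignoring it");
    } else {
      for (const [slug, url] of Object.entries(parsed)) {
        const trimmed = typeof url === "string" ? url.trim() : "";
        if (!/^https?:\/\//i.test(trimmed)) {
          warnOnce(`ignoring non-http(s) or empty override for ${slug}`);
          continue;
        }
        result.set(slug, trimmed);
      }
    }
  }

  // Commit the memo only after the value has fully computed, so a throw
  // above can never leave the key cached against a stale value.
  memoKey = key;
  memoValue = result;
  return result;
}
```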

CR2-C series — test infrastructure:
- Generate registry.json in a vitest globalSetup (CR2-C1)
- Stop ambient POSTHOG_KEY firing real fetches in middleware tests (CR2-C2)
- Assert the production slug set, not a re-derivation (CR2-C3)
- Make the registry generator subprocess robust (CR2-C4)
- Middleware/wiring test hygiene batch (CR2-C5)

SU2-A series — redirect-layer & PostHog capture:
- Stop $-pattern expansion in wildcard redirect substitution (SU2-A1)
- Surface PostHog capture failures once per failure class (SU2-A2)
- Duplicate exact redirect sources are first-match-wins (SU2-A3)
- Resolve runtime config once per redirected request (SU2-A4)
- Include destination host in seo_redirect capture (SU2-A5)
- Normalize scheme-less POSTHOG_HOST at the capture use site (SU2-A6)
- Correct redirect-layer comments and guard wildcard prefix boundary (SU2-A7)
- Cover docs-host hardening branches, compile matcher via path-to-regexp (SU2-A8)
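A sketch of a POSTHOG_HOST normalizer that combines several of the rules listed across these commits: trim, prepend `https://` to scheme-less hosts (`http://` for loopback), reject non-http(s) schemes and userinfo, and drop query/fragment while keeping a reverse-proxy path. `normalizePosthogHost` returning `null` for "reject" and the loopback regex are assumptions; the real readers also branch rejection reasons and log, which is omitted here.

```typescript
const LOOPBACK = /^(localhost|127\.0\.0\.1|\[::1\])(:\d+)?(\/|$)/i;

function normalizePosthogHost(raw: string | undefined): string | null {
  const trimmed = (raw ?? "").trim().replace(/\/+$/, "");
  if (trimmed === "") return null;
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const withScheme = hasScheme
    ? trimmed
    : `${LOOPBACK.test(trimmed) ? "http" : "https"}://${trimmed}`;
  let url: URL;
  try {
    url = new URL(withScheme);
  } catch {
    return null; // unparseable
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null; // non-http(s) scheme
  if (url.username !== "" || url.password !== "") return null; // userinfo credentials
  if (url.hostname === "") return null; // degenerate host
  // Keep a reverse-proxy path such as /ingest, but drop query, fragment, and trailing slashes.
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.origin}${path}`;
}
```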
…g (SU2-stragglers + SU5 + SU6-A/B)

Round-by-round CR convergence covering the redirect builder, the middleware
matcher, the docs-host self-loop guard, and the runtime-config env readers.

Highlights:
- Clear module-load warns after fresh middleware import
- Validate SET BASE_URL values (scheme-less/degenerate/garbage) with
  sentinel fallback + once-guarded FATAL log
- Normalize path/query/fragment-bearing DOCS_HOST to origin; reject
  non-http(s) schemes; branch rejection reasons
- Harden POSTHOG_HOST (degenerate-host/scheme rejection); expose
  posthogKey via readEnvPair semantics
- Reject a DOCS_HOST equal to the shell's own host (redirect-loop guard,
  authority compare)
- Warn on missing local-ports.json under SHOWCASE_LOCAL=1 and validate
  TCP port range; extract helper for tests
- backend-url hardening — slug charset guard, frozen local-backends memo,
  pattern path-segment warn, local-override URL validation
- Client config fail-loud covers all four URL fields with type checks
- Make RuntimeConfig.posthogKey optional — absence is a valid state, not
  a wiring bug
- Drop R15/R17 and guard /integrations from SEO redirects
- Dedup duplicate wildcard prefixes with first-match-wins warn
- Validate malformed SEO entries at lookup-build time
- Restore case-insensitive redirect matching parity
- Normalize trailing slashes before redirect matching
- Keep the framework segment on F13, pin MG3 case fix
- Read posthogKey from runtime config in middleware, not raw process.env
- Fall back to the default backend host pattern for degenerate values
- Disable docs redirects when the default fallback collides with the
  shell host
- Bring validateBaseUrl to parity with its sibling readers
- Strip query/fragment from POSTHOG_HOST while keeping reverse-proxy paths
- Restrict local backend overrides to http(s) URLs
- Add server-only guard to runtime-config
- Harden localBackendsEnv failure posture
- Hoist /integrations namespace guard above the docs-host redirect
- Validate seo-redirect sources and cross-kind shadowing in
  buildRedirectLookup
- Unify slash normalization for middleware matching
- Lowercase-normalize REGISTRY_FRAMEWORK_SLUGS at construction
- Escalate missing POSTHOG_KEY to console.error in production
- Skip all redirect steps when docs redirects are disabled (sentinel
  consumer)
- Reject userinfo credentials in DOCS_HOST, POSTHOG_HOST, and the backend
  host pattern
- Branch dev-vs-prod logging in readDocsHost and fatalPatternOnce
- Prepend http:// (not https://) to scheme-less loopback hosts
- Round-5 micro-finding batch across the URL config libs
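The docs-host redirect-loop guard compares authorities (host plus non-default port), and a later commit normalizes trailing-dot FQDN spellings on both sides. A minimal sketch, assuming both values may arrive with or without a scheme; `canonicalAuthority` and `isDocsSelfLoop` are hypothetical names.

```typescript
function canonicalAuthority(hostOrUrl: string): string | null {
  const s = hostOrUrl.trim();
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(s) ? s : `https://${s}`;
  let url: URL;
  try {
    url = new URL(withScheme);
  } catch {
    return null;
  }
  // URL already lowercases the hostname; also fold "host." into "host".
  const host = url.hostname.replace(/\.$/, "");
  // url.port is "" for the scheme's default port, so ":443" compares equal to no port.
  return url.port ? `${host}:${url.port}` : host;
}

function isDocsSelfLoop(docsHost: string, shellHost: string): boolean {
  const a = canonicalAuthority(docsHost);
  const b = canonicalAuthority(shellHost);
  return a !== null && a === b;
}
```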

SU5-A1..A7 — registry safety, // reject, builder lint batch (case-
insensitive :path*, same-destination twin allowlist, original-case
divergence remainder), matcher api boundary, generator+vitest infra, test
hygiene + empty docs-host guard, comment batch.

SU6-A1..A6 — reject miscased :path* tokens, warn on tokenless wildcards,
normalize redirect-destination comparisons like request time, reject
destinations containing "//", surface missing POSTHOG_KEY at config-
resolution time, compile matcher harness like Next's runtime, type
parse/tokensToRegexp in the path-to-regexp shim, keep buildRedirectLookup
JSDoc attached.

SU6-B1..B7 — reject query/fragment/userinfo in pattern and local-override
URL gates, return parsed-normalized URL form from validation success
paths, distinguish unset/blank/padded SHOWCASE_LOCAL states, warn when
SHOWCASE_LOCAL is set to a value other than 1, validate {slug} placeholder
in generate-registry, mirror middleware drop semantics in the wiring
test's registry re-derivation, pin the noStore spy and calls to one fresh
module instance in the Edge-path test.
…-side parity (SU7-F1/F2/F3)

SU7-F1 — backend host pattern hardening:
- F1.1  Reject bare trailing ?/# in the backend host pattern
- F1.2  Strip internal tab/CR/LF from the backend host pattern
- F1.3  Warn when ignoring an empty-string local backend override
- F1.4  Reject empty-userinfo @ in the backend host pattern authority
- F1.5  Keep __proto__ keys as data in local-backend maps
- F1.6  Commit the local-backends memo key only after the value computes
- F1.7  Trim local backend overrides before validation and name the real
        rejection
- F1.8  Honest FATAL when the pattern host is a stray scheme fragment
- F1.9  Canonicalize the pattern authority for parity with the override
        path
- F1.10 Acknowledge the staging-to-prod fail-open in the pattern fallback
- F1.11 Harden backend-url/local-backends-env test hygiene

SU7-F2 — runtime-config & client-config edge cases:
- F2.1  Branch POSTHOG_HOST rejection reasons (scheme/degenerate/parse-
        failure) instead of the catch-all mislabel
- F2.2  Reject loopback BASE_URL/DOCS_HOST in production instead of the
        silent http:// prepend
- F2.3  Key the DOCS_HOST fallback once-guard on (mode, shellHost, value)
        and mode-prefix all value-only guard keys
- F2.4  Reject a present-but-empty posthogKey in the client config reader
- F2.5  Drop the trailing slash from SSR_PLACEHOLDER_URL for structural
        parity with server values
- F2.6  Attribute the DOCS_HOST slash-strip to readDocsHost itself
- F2.7  Normalize trailing-dot FQDN spellings in the docs self-host loop
        guard (both compare sides)
- F2.8  Harden console spies to capture all log args; pin the full all-env
        config shape; converge SSR simulation on vi.stubGlobal

SU7-F3 — script-side parity, table classification & test isolation:
- F3 #1  Handle a missing reference integration per the error contract
- F3 #2  Port the runtime backend-host-pattern normalization into the
         generator — scheme/trailing-slash strip, degenerate fallback,
         NEXT_PUBLIC fallback
- F3 #3  Treat non-mapping manifest parses (empty/null/scalar/array YAML)
         as validation errors, not TypeErrors
- F3 #4  Label a missing/unreadable constraints.yaml per the stderr+exit(1)
         error contract
- F3 #5  Align atomic-write tmp naming with the test harness straggler-
         sweep convention; guard main() on direct invocation
- F3 #6  Correct the determineCellStatus unshipped docstring; replace
         stale hardcoded cell counts with formulas
- F3 #7  Isolate the pattern suite on a per-suite tmpdir harness; snapshot
         the generator's full write set
- F3 #8  Classify discarded duplicate wildcards as duplicates — hoist the
         owner check above the destination warns
- F3 #9  Reject a root ("/") EXACT seo-redirect source — homepage-hijack
         twin of the root-wildcard guard
- F3 #10 Reject seo-redirect entries with non-printable-ASCII source/
         destination — close the silent-dead-entry class
- F3 #11 Strip trailing slashes in normalizePosthogHost before the scheme
         test
- F3 #12 Message-filter the empty-slug-set error count; pin the single
         matcher entry
Pure prettier-style rewrite (line-wrapping, single-line collapses, multi-
line callback indentation). No semantic changes — confirmed via tsc clean,
oxlint clean, vitest 235/235 pass in showcase/shell.

Touched files:
- showcase/scripts/__tests__/generate-registry-pattern.test.ts
- showcase/scripts/generate-registry.ts
- showcase/shell/src/lib/backend-url.ts
- showcase/shell/src/lib/backend-url.test.ts
- showcase/shell/src/lib/docs-redirects.test.ts
- showcase/shell/src/lib/local-backends-env.test.ts
- showcase/shell/src/lib/runtime-config.ts
- showcase/shell/src/lib/runtime-config.test.ts
- showcase/shell/src/lib/runtime-url-wiring.test.ts
- showcase/shell/src/middleware.ts
- showcase/shell/src/middleware.test.ts
- showcase/shell/vitest.global-setup.ts
## Summary

Anti-dual-writer defense for the showcase `status` collection — the
flap-comb incident class where the legacy monolith scheduler and the
fleet writer fight over the same rows.

- **Writer identity columns**: new `written_by` / `state_written_at`
columns on `status` (two PB migrations, idempotency-symmetric up/down
paths). Every write stamps a stable host-derived writer identity; the
legacy orchestrator wires `writtenBy: "legacy"` explicitly.
- **Flip / foreign-write detection**: the status writer detects
cross-writer state flips inside a validated window and warns
(TTL-deduped re-warns so sustained fighting stays visible; one-time warn
when the self-write memory cap starts evicting; same-identity replica
hint).
- **PB-safe date normalization**: `observedAt` is normalized to RFC-3339
PB-safe shapes before any date-field write (zone-less shapes, colonless
offsets rejected) — no PB 400 unlatch-retry hot loops.
- **Aggregator honesty**: padded/blank projected keys normalized or
skipped loudly; explicit outcome discriminators replace ambiguous
asserts (no consumer hot-loop); `droppedCommError` surfaced whenever a
comm error misses the aggregate row; trusted-negative duplicate handling
scoped to cell-vs-cell so no cell can impersonate an aggregate row.
- **CLI persistence honesty**: thrown driver errors are persisted to PB
in the `runDriverInputs` catch; the summary carries `WriteOutcome`
discriminators (dropped-write counts, correct pluralization); writer
error classification documented honestly (401 vs 403 split, new
`pb_not_found` reason).
- **Alert/fixture truthfulness**: the alert engine's synthesized cron
outcome is stamped `persisted: false`; alert/orchestrator/probe test
fixtures align with the real `StatusWriter` / `OverlayWriteOutcome`
contracts, with fail-loud fake-PB fixtures hardened against silent
divergence from real PocketBase.
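The PB-safe date normalization can be sketched as a strict RFC-3339 gate followed by a UTC re-render. The accepted regex, the `toPbDate` name, and the target layout `YYYY-MM-DD HH:MM:SS.sssZ` (assumed to be PocketBase's stored date form) are assumptions; the point is that zone-less and colonless-offset shapes are rejected up front instead of reaching PB and failing with a 400.

```typescript
// Accepts "Z" or "±HH:MM" offsets; rejects zone-less and colonless offsets.
const RFC3339 = /^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d{1,3})?(Z|[+-]\d{2}:\d{2})$/;

function toPbDate(observedAt: string): string | null {
  const s = observedAt.trim();
  if (!RFC3339.test(s)) return null; // e.g. "2026-06-11T12:00:00" or "...+0200"
  const ms = Date.parse(s.replace(" ", "T"));
  if (Number.isNaN(ms)) return null; // well-shaped but impossible, e.g. month 13
  // Assumed PB layout: UTC with a space separator, "YYYY-MM-DD HH:MM:SS.sssZ".
  return new Date(ms).toISOString().replace("T", " ");
}
```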

Hardened over 10 CR rounds (~60 reviewer reports), converged to zero
findings.

## Test plan

- [x] Full harness vitest suite: 124 test files / 2378 tests passing
- [x] `tsc --noEmit` and build typecheck (`tsconfig.build.json`) clean
- [x] oxfmt clean on all touched files; oxlint 0 errors
- [x] Known flake note: `probe-invoker.test.ts` wall-clock assertion
(elapsed 101ms vs <100ms bound) fired once under full-suite load and
passes in isolation (67/67) — pre-existing timing sensitivity, unrelated
to this diff
…s, and middleware/builder hardening (SU) (#5401)

## Summary

End-to-end hardening of the showcase shell's URL plane:

- Carries `backendHostPattern` + `docsHost` in the shell runtime config
(no longer baked from `registry.json` at Docker build time). Derives
demo backend URLs at runtime from the pattern, so a new pattern via env
reconfigures every integration on the next deploy without a registry
rebuild.
- Issues docs-host 301/308 redirects from middleware with a runtime
`DOCS_HOST`; misconfigured values no longer 500 every docs route — they
fall back to a sentinel that disables the docs-redirect step.
- Hardens the redirect table builder + matcher: first-match-wins for
duplicate exact sources, deduped wildcard prefixes with warn, malformed
entries rejected at lookup-build time, case-insensitive matching parity,
trailing-slash normalization, structural `/integrations` namespace guard
above the docs-host redirect (closes R15/R17 hijack class).
- Hardens runtime-config + backend-url env readers:
scheme/whitespace/control-char normalization, query/fragment/userinfo
rejection on `DOCS_HOST` / `POSTHOG_HOST` / backend-host pattern /
local-override URLs, present-but-empty `posthogKey` rejection, prod
loopback `BASE_URL`/`DOCS_HOST` rejection (no silent `http://` prepend),
once-guarded FATAL logging (no per-request spam).
- Brings the build-time twin in `showcase/scripts/generate-registry.ts`
to parity with the runtime normalizer (scheme/trailing-slash strip,
degenerate fallback, `NEXT_PUBLIC` fallback, slug validation).
- Surfaces PostHog capture failures once per failure class; keeps
capture alive across the redirect via `event.waitUntil`; missing
`POSTHOG_KEY` is `console.error` in production and surfaces at
config-resolution time (not first redirect).
- Open-redirect hardening on `/shared//evil.com` (SU-18); `//` rejected
at both source and destination; root `/` exact-source and `/:path*`
wildcard sources rejected; non-printable-ASCII source/destination
rejected.

Stream: SU (shell-runtime-urls). Subject groups
SU-2/8/11/13/14/15/16/17/18/19/20 (initial), SU2-A/B (env-hardening),
CR2-C (test infra), SU5-A1..A7 (registry/builder/lint/matcher boundary),
SU6-A1..A6 (request-time normalization parity), SU6-B1..B7
(parsed-normalized return forms, SHOWCASE_LOCAL states, generator
`{slug}` validation), SU7-F1..F3 (final-round backend-pattern + POSTHOG
+ table + script-side parity).

## Test plan

- [x] `pnpm test` in `showcase/shell` — 7 files, 235/235 pass
- [x] `pnpm test` in `showcase/scripts` — 51 files, 1857/1857 pass (one
pre-existing flake in `generate-registry.test.ts` unrelated to this
diff; passes on subsequent runs, classic vitest fork-reuse cross-file
pollution)
- [x] `pnpm exec tsc --noEmit` in `showcase/shell` — clean
- [x] `oxlint` on every changed file — 0 warnings, 0 errors
- [x] `oxfmt --check` on every changed file — clean
- [x] `pnpm build` in `showcase/shell` (full Next 15.x build with
registry+demo-content+starter-content+search-index generation) — clean
- [ ] CI green on the PR — confirm via `gh pr checks` after push
…rve low-frequency jobs

claimNext listed ONE global oldest-50 pending page; with the d4+d5
producers ticking every 15min against 2 serial browser workers, a
persistent backlog permanently saturated that page and e2e-demos jobs
never entered the candidate set (prod: all 18 e2e-demos jobs pending
forever; staging: 3,734 pending, oldest 22h).

claimNext now discovers the distinct families present in pending
(oldest first, one perPage=1 query per family) and tries them in
round-robin rotation across calls, listing a per-family candidate page
for each. Every discovered family is attempted before giving up, so no
family starves while any of its jobs are claimable. The S0 CAS
exactly-one-winner semantics and the per-page anti-herd shuffle are
unchanged; only candidate SELECTION changed.
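The round-robin selection can be sketched with injected callbacks standing in for the PB queries. `FamilyRotatingClaimer`, the `PendingJob` shape, and the callback signatures are hypothetical, and the per-page anti-herd shuffle is omitted. The rotation cursor advances past the family that just won, so the next call starts with a different family even if the first one still has the oldest backlog.

```typescript
interface PendingJob {
  id: string;
  family: string;
  created: number;
}

class FamilyRotatingClaimer {
  private cursor = 0;

  constructor(
    private readonly listFamilies: () => string[], // distinct pending families, oldest first
    private readonly listPending: (family: string) => PendingJob[], // per-family candidate page
    private readonly tryClaim: (job: PendingJob) => boolean, // CAS: true for exactly one winner
  ) {}

  claimNext(): PendingJob | null {
    const families = this.listFamilies();
    for (let i = 0; i < families.length; i++) {
      const family = families[(this.cursor + i) % families.length];
      for (const job of this.listPending(family)) {
        if (this.tryClaim(job)) {
          // The next call starts after this family, so no family starves.
          this.cursor = (this.cursor + i + 1) % families.length;
          return job;
        }
      }
    }
    return null; // every discovered family was attempted
  }
}
```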
Producers enqueued a fresh batch every scheduled tick regardless of
whether the family's previous batch had even been claimed — against 2
serial browser workers the queue compounded without bound (staging:
3,734 pending), feeding the claim-page starvation.

A scheduled tick now skips its family's batch when that family already
has pending (unclaimed) jobs, bounding the per-family backlog to one
batch, with a structured fleet.producer.skipped-for-backlog log
(family, pendingCount, skippedJobs) and a skippedForBacklog count on
TickResult. The check is per family, fails OPEN on a count blip (a PB
read failure must never stop production), and is BYPASSED by
operator-triggered ticks (explicit intent wins; the trigger CLI treats
0 enqueued as failure). Backed by a new
FleetQueueClient.countPendingForFamily (server-side totalItems count of
a family's pending rows).
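The per-family backlog gate reduces to a small decision function. In this sketch `backlogGate` and `GateResult` are hypothetical, and the pending-count callback is kept synchronous for brevity where a real PB count would be async. The three branches mirror the text: operator-triggered ticks bypass the gate, a count failure fails open, and any pending rows skip the batch.

```typescript
interface GateResult {
  enqueue: boolean;
  reason: string;
}

function backlogGate(
  family: string,
  operatorTriggered: boolean,
  countPendingForFamily: (family: string) => number,
): GateResult {
  if (operatorTriggered) return { enqueue: true, reason: "operator-bypass" }; // explicit intent wins
  let pending: number;
  try {
    pending = countPendingForFamily(family);
  } catch {
    // Fail OPEN: a PB read failure must never stop production.
    return { enqueue: true, reason: "count-failed-open" };
  }
  return pending > 0
    ? { enqueue: false, reason: `skipped-for-backlog (family=${family}, pendingCount=${pending})` }
    : { enqueue: true, reason: "no-backlog" };
}
```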
…ins structurally

sweepExpired only reclaimed claimed/running leases — a pending row had
no terminal path, so an accumulated backlog (staging: 3,734 pending,
oldest 22h) could only drain through 2 serial workers and effectively
never did.

The sweep now also expires pending jobs older than expiryPeriods x
their family's production period (default 3 periods; the family has
enqueued fresher batches since, so the stale job's result would be
ancient data). Each stale row is first CLAIMED via the S0 CAS under a
synthetic stale-pending-sweeper id — so the delete can never race a
worker (exactly-one-winner) — then deleted; a failed delete self-heals
via normal lease expiry + re-queue. Unparseable created timestamps are
conservatively skipped (delete is destructive). Policy is configurable
via FleetQueueClientConfig.stalePending; the control-plane wires the
real per-family cadences (FLEET_FAMILY_PERIODS_MS: d4/d5 15min, d6/
e2e-demos hourly) so 15min families expire on a 45min window. No comm
error is synthesized for expired-pending rows (they never ran); the
count surfaces as SweepResult.expiredPending and in the producer's
sweep log.
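The staleness rule is `now - created > expiryPeriods × period`; with the default of 3 periods, a 15-minute family expires after 3 × 15 = 45 minutes and an hourly family after 3 hours. A sketch of that predicate follows. The short keys (`d4`, `d5`, `d6`, `e2e-demos`) are placeholders for the real probe-key family names, and the 1h default for unknown families follows the later commit on `stalePendingFilters`.

```typescript
// Placeholder keys; the real FLEET_FAMILY_PERIODS_MS uses full probe-key family names.
const FAMILY_PERIODS_MS = new Map<string, number>([
  ["d4", 15 * 60_000],
  ["d5", 15 * 60_000],
  ["d6", 60 * 60_000],
  ["e2e-demos", 60 * 60_000],
]);
const DEFAULT_PERIOD_MS = 60 * 60_000; // fallback for families missing from the map

function isStalePending(family: string, created: string, now: number, expiryPeriods = 3): boolean {
  const createdMs = Date.parse(created);
  // Unparseable timestamps are conservatively skipped: the delete is destructive.
  if (Number.isNaN(createdMs)) return false;
  const period = FAMILY_PERIODS_MS.get(family) ?? DEFAULT_PERIOD_MS;
  return now - createdMs > expiryPeriods * period;
}
```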
…e lease phase just re-queued

sweepExpired's lease phase re-queues an expired-lease row to pending and
emits worker-reclaimed-pending ("back in flight"), but the stale-pending
phase of the SAME call lists pending fresh and ages rows off PB's system
`created` (the ORIGINAL enqueue time) — so a long-claimed job was
re-queued then immediately claimed-and-deleted, falsifying the comm
error and nulling downstream aggregate-key resolution on the deleted
row. Track the ids re-queued in this sweep and exclude them from this
call's stale phase; a truly stale job ages out on the next sweep. The
`created` anchor is kept (re-anchoring needs a column; out of scope).
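The fix amounts to passing the lease phase's re-queued ids into the stale phase of the same sweep. A sketch of the stale phase with that exclusion and the CAS-claim-then-delete order; `stalePhase`, the `Row` shape, and the callback signatures are hypothetical stand-ins for the queue client's PB calls.

```typescript
interface Row {
  id: string;
  family: string;
  created: string;
}

function stalePhase(
  pending: readonly Row[],
  requeuedThisSweep: ReadonlySet<string>,
  isStale: (row: Row) => boolean,
  casClaim: (id: string, holder: string) => boolean,
  del: (id: string) => boolean,
): number {
  let expired = 0;
  for (const row of pending) {
    // The lease phase just re-queued this row; a truly stale job ages out next sweep.
    if (requeuedThisSweep.has(row.id)) continue;
    if (!isStale(row)) continue;
    // Claim first so the delete never races a worker (exactly one CAS winner).
    if (!casClaim(row.id, "stale-pending-sweeper")) continue;
    // A failed delete self-heals via normal lease expiry and re-queue.
    if (del(row.id)) expired++;
  }
  return expired;
}
```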
…e-sweeper rows

When the stale-pending sweep CAS-claims a row under stale-pending-sweeper
and the delete fails, the next lease sweep treated the expired sweeper
lease like a crashed worker's: it re-queued the row AND synthesized a
worker-reclaimed-pending comm error — a gray "re-queued / back in flight"
dashboard overlay for stale garbage mid-deletion, attributed to a
non-existent worker. The lease sweep now special-cases rows held by the
stale-pending sweeper: still re-queued (the self-healing delete-retry
contract is unchanged) but silently — no comm error, no reclaimed count,
just a stale-sweeper-retry-requeue debug line. The lease holder is
snapshotted before the release CAS so attribution reflects who held the
expired lease, not the post-release row.
…ses beyond the page are reclaimed

The sweepExpired lease phase listed claimed/running rows with perPage 50
and NO sort: with >50 such rows (mass worker crash), PB's unspecified
default order could return the same 50 live-lease rows every sweep,
leaving expired leases beyond the page permanently orphaned with zero
signal. Sort by lease_expires_at ascending (indexed) so the most-expired
rows always head the page, and WARN when the page is full so truncation
is observable. Single page per sweep is kept deliberately — the sort
guarantees progressive forward drain.
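A sketch of the sorted, single-page lease listing with the full-page warning, done in memory over a hypothetical `LeaseRow` shape (the real query would sort by `lease_expires_at` server-side). Because the most-expired rows always head the page, each sweep makes forward progress even when more rows exist than fit on the page.

```typescript
interface LeaseRow {
  id: string;
  leaseExpiresAt: number; // epoch ms
}

function leasePage(rows: readonly LeaseRow[], perPage: number, warn: (msg: string) => void): LeaseRow[] {
  // Ascending by lease expiry: the most-expired leases always head the page.
  const page = [...rows].sort((a, b) => a.leaseExpiresAt - b.leaseExpiresAt).slice(0, perPage);
  // A full page means rows beyond it may exist; make the truncation observable.
  if (page.length === perPage) warn(`lease page full (${perPage}); more claimed/running rows may remain`);
  return page;
}
```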
… one

A single 50-row page per sweep was far too slow for the incident the
stale-pending drain exists for: against the 3,734-row staging backlog at
~10 sweeps/hour that is ~75 sweeps, or ~7.5 hours of drain. The sweep now loops candidate
pages (re-listing page 1 — deletes shift pagination) up to a cap of 10
pages / 500 rows per sweep, draining the same backlog in well under an
hour while bounding a single sweep's PB load. CAS-claim-then-delete per
row is unchanged; a pass that expires nothing terminates the loop.
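The arithmetic behind the change: at 50 rows per sweep, the 3,734-row backlog needs ⌈3734 / 50⌉ = 75 sweeps, about 7.5 hours at ~10 sweeps/hour; at up to 10 pages × 50 = 500 rows per sweep it needs ⌈3734 / 500⌉ = 8 sweeps. The loop can be sketched as follows; `drainStalePending` and its two callbacks are hypothetical stand-ins for the PB list query and the per-row CAS-claim-then-delete.

```typescript
const MAX_PAGES = 10;
const PER_PAGE = 50;

function drainStalePending(
  listStaleFirstPage: (perPage: number) => string[], // ids of stale candidates
  expireOne: (id: string) => boolean, // CAS-claim then delete; true if expired
): number {
  let total = 0;
  for (let pass = 0; pass < MAX_PAGES; pass++) {
    // Always re-list page 1: deletes shift every later page.
    const ids = listStaleFirstPage(PER_PAGE);
    if (ids.length === 0) break;
    let expiredThisPass = 0;
    for (const id of ids) if (expireOne(id)) expiredThisPass++;
    total += expiredThisPass;
    if (expiredThisPass === 0) break; // no progress: stop rather than spin on the same page
  }
  return total;
}
```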
filterBackloggedFamilies ran BEFORE maybeSweep, so the tick whose own
stale-pending drain cleared a family's backlog still counted the
about-to-be-expired rows and skipped that family — production resumed a
full cron period late. Reordered tick() to sweep first; the cadence gate
and fail-open semantics (maybeSweep swallows sweep failures) are
unchanged, only the order moved.
…ucers

Only the d6 producer was built with onSweepCommErrors, but all four family
producers run the same GLOBAL queue.sweepExpired on their own crons — and the
sweep's S0 CAS means whichever producer ticks first wins each expired job's
reclaim, along with its synthesized comm error. With smoke/demos/deep sweeping
far more often than d6's hourly :40, the worker-reclaimed-pending dashboard
overlay (and stale-pending telemetry) was dropped on ~11 of 12 sweeps, since
job-producer's maybeSweep forwards comm errors only when the sink is wired.

Share the ONE control-plane sink (surfaceSweepCommErrors -> aggregator) across
all four producers and correct the now-false "preserves the current behavior"
comment. Sweeps remain CAS-safe across producers: the S0 CAS guarantees exactly
one producer reclaims (and forwards) each expired job, and the surfacing leg is
best-effort per error, so the shared sink introduces no double-write.

Red-green: new runControlPlane REQ-B test drives the SMOKE producer's tick and
asserts its swept overlay reaches the status row (RED against d6-only wiring,
GREEN after). The test doUnmocks/re-mocks everything it touches so it passes in
isolation despite the file's leaked doMock factories.
…ed-pending semantics

The sweep no longer synthesizes worker-crashed-mid-job (it cannot tell a
crash from a platform teardown); it re-queues the job and emits the neutral
worker-reclaimed-pending kind. Update the queue-client module header, the
contracts kind/heartbeat/SweepResult/sweepExpired docs, the job-producer
sink/tick docs, the sweep test fixtures and titles, and the dashboard's
mirrored kind description. worker-crashed-mid-job is now documented as the
worker self-observed in-driver crash only. No runtime behavior changes.
…er tick outcome

maybeSweep's catch arm returned the same shape as a clean zero-reclaim
sweep (sweptExpired: true, reclaimed: 0), so a thrown sweepExpired call
was indistinguishable from success in the TickResult and the
tick-complete log. Add sweepFailed to the sweep outcome, TickResult,
and the tick-complete log; the cadence latch is unchanged (a failed
sweep still consumes its window so a persistently-failing sweep cannot
fire on every tick).
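A sketch of the cadence-latched sweep with the new failure discriminator. `SweepGate` and the `SweepOutcome` shape are hypothetical. The latch is consumed before the sweep runs, so a throwing sweep still waits out its window, while `sweepFailed` keeps a thrown sweep distinguishable from a clean zero-reclaim one.

```typescript
interface SweepOutcome {
  sweptExpired: boolean;
  sweepFailed: boolean;
  reclaimed: number;
}

class SweepGate {
  private lastSweepAt = -Infinity;

  constructor(private readonly cadenceMs: number) {}

  maybeSweep(now: number, sweep: () => number): SweepOutcome {
    if (now - this.lastSweepAt < this.cadenceMs) {
      return { sweptExpired: false, sweepFailed: false, reclaimed: 0 }; // not due yet
    }
    this.lastSweepAt = now; // consume the window even if the sweep throws
    try {
      return { sweptExpired: true, sweepFailed: false, reclaimed: sweep() };
    } catch {
      // Swallowed (fail-open), but reported distinctly from a clean zero.
      return { sweptExpired: true, sweepFailed: true, reclaimed: 0 };
    }
  }
}
```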
The EnqueueJobInput.leaseSeconds field was dead on the only consumer path: queue-client's enqueue()
destructures only `payload` and never reads leaseSeconds (the claim
lease comes from the WORKER side — claimNext(workerId, leaseSeconds)
with the worker-loop's DEFAULT_LEASE_SECONDS). No production call site
ever set it, so wiring it up would add a knob nothing needs; delete is
chosen over wire-it.

Call-site enumeration (all removed):
- contracts.ts EnqueueJobInput.leaseSeconds (declaration; never read by
  queue-client.ts enqueue, the sole FleetQueueClient.enqueue impl)
- job-producer.ts ServiceJobSpec.leaseSeconds (only producer thereof)
- job-producer.ts toEnqueueInput() spec.leaseSeconds -> input.leaseSeconds
  threading (only writer of the field)
- job-producer.test.ts spec fixture leaseSeconds: 600 + the
  `expect(input.leaseSeconds).toBe(600)` assertion that legitimized the
  dead plumbing

Worker-side lease plumbing (claimNext/renewLease/worker-loop
leaseSeconds) is unrelated and untouched.
…umerators' families

stalePendingFilters silently falls back to the 1h default period for
any family missing from FLEET_FAMILY_PERIODS_MS, so a typo'd key (e.g.
"d5" vs "d5-single-pill-e2e") would never throw — it would just quietly
mis-size that family's stale-pending drain window. Lock the map's keys
to the probe-key families derived by RUNNING the four real enumerator
factories against a fake discovery source, so either side drifting
breaks the test. Also document the known d6 FLEET_PRODUCER_CRON
override drift limitation on the map (an env override changes d6's real
cadence without updating the nominal period).
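The lock-the-keys test boils down to a two-way set difference between the period map's keys and the families the enumerators actually produce. A sketch of that comparison; `diffFamilyKeys` is a hypothetical helper, and in the real test the family list comes from running the four enumerator factories against a fake discovery source.

```typescript
function diffFamilyKeys(periods: Record<string, number>, families: Iterable<string>): string[] {
  const expected = new Set(families);
  const actual = new Set(Object.keys(periods));
  const problems: string[] = [];
  // A family with no period would silently fall back to the 1h default.
  for (const f of expected) if (!actual.has(f)) problems.push(`missing period for family ${f}`);
  // A key matching no family is dead config, typically a typo.
  for (const k of actual) if (!expected.has(k)) problems.push(`period key ${k} matches no enumerated family`);
  return problems;
}
```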
- orchestrator.test.ts: file-level afterEach doUnmocks queue-client,
  status-writer, and result-consumer (vi.doMock factories persist across
  the file; resetModules clears the module cache, not the mock
  registry) so a leaked stub can't poison later tests. Full file
  re-run: 100/100 pass with the leaks closed — no test was depending on
  a leaked factory.
- orchestrator.test.ts: the R5-G4 webhook-secret tests now save/restore
  POCKETBASE_URL like the HF13-A2 pattern instead of unconditionally
  deleting it in finally.
- job-producer.test.ts: the no-warm test stubbed a local fetch spy it
  never wired in (vacuously zero calls); stub GLOBAL fetch via
  vi.stubGlobal (+ vi.unstubAllGlobals in afterEach) and assert the
  unconfigured producer never falls back to it.
- queue-client.test.ts: famOf re-implemented probeKeyFamily; import the
  production helper from contracts so the tests can't drift from the
  real family rule.
- control-plane.test.ts: the invalid-cron latch test now also retries
  start() on the FAILED instance and asserts it throws again (a
  stuck-true latch would make the retry a silent no-op).
jpr5 and others added 27 commits June 11, 2026 20:42
…ong-expired carve-out

Sibling of the queue-client prose fix (CF7 #10), which flagged this
contract doc as describing only the drain phase: despite the name, the
lease phase's long-expired carve-out also claim-deletes claimed/running
rows (stale created-age, long-expired or unparseable lease) into this
count — no re-queue, no comm error, no reclaimed increment.
… grafts older signal (CF8 F3)

The cold-load comm-error supplemental fetch runs CONCURRENTLY with the bulk
pages, so the bulk copy of an aggregate row can be NEWER than the supplemental
snapshot (the row's state changed between the two reads). The previous merge
replaced the bulk row unconditionally — regressing state/observed_at and
potentially fail_count back to the older supplemental values until the row's
next SSE delta (long for slow-cadence aggregates).

Add a freshness guard: when the supplemental row is strictly older (by
observed_at), keep the newer bulk row INTACT — signal-less rather than
chimera (newer core + stale signal). A chimera row would be silently swallowed
by the reducer's signal-PRESENCE no-op check; a signal-less bulk row lets
the next SSE delta restore the real current signal via the
undefined→defined presence flip.

Equal timestamps and unparseable timestamps both prefer the supplemental
(signal-bearing) row — only POSITIVELY-stale supplemental is suppressed,
preserving the cold-load comm-error overlay intent of CF7-F3 #1.
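The freshness guard is a three-way timestamp rule. In the sketch below, `mergeSupplemental` and the `StatusSnapshot` shape are hypothetical. Only a supplemental row that is strictly older than the bulk row is suppressed; equal or unparseable timestamps fall through to the signal-bearing supplemental row.

```typescript
interface StatusSnapshot {
  id: string;
  observedAt: string;
  state: string;
  signal?: unknown; // comm-error signal, present only on the supplemental fetch
}

function mergeSupplemental(bulk: StatusSnapshot, supplemental: StatusSnapshot): StatusSnapshot {
  const b = Date.parse(bulk.observedAt);
  const s = Date.parse(supplemental.observedAt);
  // Positively stale supplemental: keep the newer bulk row intact (signal-less, never a chimera).
  if (!Number.isNaN(b) && !Number.isNaN(s) && s < b) return bulk;
  // Equal or unparseable timestamps prefer the signal-bearing supplemental row.
  return supplemental;
}
```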
… tick-result doc reconciliations (Procedure 3 promotions)

PROMOTE_TO_A (defense-in-depth, exploit-class):
- fleet-health.ts reclaim list interpolated workerId raw into the PB filter
  literal. workerId is DB-sourced (read back from the workers roster row),
  not a compile-time constant, and the same field is escaped via JSON.stringify
  at orchestrator.ts:3240 — but the reclaim path was missing the same
  hardening. A double-quote in worker_id (corrupt row, buggy self-registration)
  would either make the list call throw (silently skipping this worker's
  reclaim every cycle) or widen the filter to claim other workers' jobs. Match the sibling
  escape pattern.
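The escaping difference can be illustrated with a hypothetical pair of filter builders (`unsafeReclaimFilter`, `safeReclaimFilter`). The single `worker_id` clause is an assumption and is not the real reclaim filter. `JSON.stringify` wraps the value in double quotes and backslash-escapes any embedded `"`, which mirrors the pattern the commit cites at orchestrator.ts:3240.

```typescript
// UNSAFE: interpolates the raw value, so an embedded `"` ends the literal early.
function unsafeReclaimFilter(workerId: string): string {
  return `worker_id = "${workerId}"`;
}

// Escaped: JSON.stringify emits a double-quoted literal with `"` and `\` backslash-escaped.
function safeReclaimFilter(workerId: string): string {
  return `worker_id = ${JSON.stringify(workerId)}`;
}

const corrupt = 'w1" || worker_id != "';
console.log(unsafeReclaimFilter(corrupt)); // worker_id = "w1" || worker_id != ""
console.log(safeReclaimFilter(corrupt)); // worker_id = "w1\" || worker_id != \""
```

With the corrupt id, the unsafe builder's `||` clause widens the filter to every row with a non-empty `worker_id`. The escaped builder keeps the whole value inside one string literal.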

PROMOTE_TO_B (doc reconciliations exposed by the CF round-8 audit):
- queue-client.ts COUNT-NAME CAVEAT was stale — the cross-referenced
  contracts.ts doc was already updated (commit 80c5c94) but the queue-client
  side still asked a future maintainer to make the edit that had already
  landed.
- WarmHealthConfig + JobProducerOptions.warmHealth docs said the producer
  warms 'every enumerated backend' / 'each enumerated spec'; the implementation
  warms gate.specs (post-validation, post-backlog-gate). A fully-backlogged
  tick warms nothing. Updated both interface-level docs to match.
- TickResult.reclaimedIndeterminate said 'Of reclaimed, the reclaims...' —
  but the field is DISJOINT from reclaimed (sibling SweepResult contract +
  the queue-client both state a thrown release lands here exclusively).
  Rewrote to 'In addition to reclaimed, ...'.
- TickResult.skippedForBacklog doc named only the dedupe-gate contributor;
  the in-function comment correctly documents the fail-CLOSED poisoned-count
  fold-in. Expanded the field doc to name both contributors, and to clarify
  that the fail-OPEN leg lands in backlogGateFailedOpen separately.
- SweepResult.commErrors pairing equation
  (commErrors.length === reclaimed + reclaimedIndeterminate) was asserted
  unconditionally on the shared contract, but reclaimedIndeterminate is
  optional and fakes may not report the split. Scoped to 'implementations
  that report the split'.
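A sketch of the scoped pairing check, assuming a simplified `SweepResultLike` shape. The shape is hypothetical, and the real `SweepResult` carries more fields. Implementations that omit `reclaimedIndeterminate` make no claim about the split, so the check passes vacuously for them.

```typescript
// Hypothetical, simplified view of the shared contract's fields.
interface SweepResultLike {
  reclaimed: number;
  reclaimedIndeterminate?: number; // optional: fakes may not report the split
  commErrors: readonly unknown[];
}

// The pairing equation is only asserted for implementations that report the split.
function commErrorsPairingHolds(result: SweepResultLike): boolean {
  if (result.reclaimedIndeterminate === undefined) return true;
  return result.commErrors.length === result.reclaimed + result.reclaimedIndeterminate;
}
```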
…see the D4 rung

Mirrors cell-model resolveD4 fold semantics (worst-state-wins, 1h stale
window, rank-based missing-chat collapse, chat-wins tie-break pinned by
row-identity assertions); d4 is informational — rollup contributors
unchanged.
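The following is a rough sketch of a worst-state-wins fold with the 1h stale window and a chat-wins tie-break. It rests on several assumptions. The three-state vocabulary, the `D4Row` shape, the choice to exclude stale rows, and the name `foldD4` are all hypothetical. The sketch also does not model the rank-based missing-chat collapse. Because it returns the winning row object itself rather than a copy, a test can pin the tie-break by row identity.

```typescript
type D4State = "green" | "amber" | "red"; // assumed vocabulary
interface D4Row {
  source: "chat" | "other"; // assumed discriminator for the chat-wins tie-break
  state: D4State;
  observedAtMs: number;
}

const STATE_RANK: Record<D4State, number> = { green: 0, amber: 1, red: 2 };
const STALE_WINDOW_MS = 60 * 60 * 1000; // the 1h window named in the commit

// Worst state wins among fresh rows; on a state tie a chat row displaces a non-chat row.
function foldD4(rows: readonly D4Row[], nowMs: number): D4Row | undefined {
  let winner: D4Row | undefined;
  for (const row of rows) {
    if (nowMs - row.observedAtMs > STALE_WINDOW_MS) continue; // assumed: stale rows drop out
    if (winner === undefined) {
      winner = row;
      continue;
    }
    const diff = STATE_RANK[row.state] - STATE_RANK[winner.state];
    const chatWinsTie = diff === 0 && row.source === "chat" && winner.source !== "chat";
    if (diff > 0 || chatWinsTie) winner = row;
  }
  return winner;
}
```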
…and scopes the rollup line honestly

D4 row inserted with worst-state strictness; e2e relabeled "E2E (Demo)"
atomically (label-derived testids); rollup line relabeled "Service
(health + e2e)"; headline regression pins pill-red <-> visible red popup
row via buildCellModel.chipColor cross-assert; green-badge tests
FRESH-pinned.
…rid, legacy cells, and legend

Grid d3 badge API->E2E, legacy e2e badge RT->E2E, legend D6 "Parity
(PR)"->"Parity (Reference)", legend D4 chip copy 6h->1h (real window);
d4 grid "RT" and d2 "API" unchanged.
The auth-middleware-presence regex in queue-client.test.ts hook-parity
suite was anchored to the single-line `}, $apis.requireAdminAuth());`
closer. After oxfmt rewrote fleet-claim.pb.js to its multi-line form
(`},\n  $apis.requireAdminAuth(),\n);`) the regex stopped matching
and the test asserted 0 routes were guarded — a false alarm.

Broaden the regex to accept both the single-line closer and the
formatter's split form; the structural intent (each routerAdd
handler-end is followed by the requireAdminAuth middleware) is
unchanged.
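A regex along these lines accepts both closer forms. The exact pattern in queue-client.test.ts is not shown, so `AUTH_CLOSER` and `countGuardedRoutes` are illustrative. The optional trailing comma and the `\s*` on either side of it absorb the formatter's line breaks.

```typescript
// Accepts `}, $apis.requireAdminAuth());` and the split `},\n  $apis.requireAdminAuth(),\n);`.
const AUTH_CLOSER = /\},\s*\$apis\.requireAdminAuth\(\),?\s*\);/g;

// Counts handler-ends that are followed by the requireAdminAuth middleware.
function countGuardedRoutes(source: string): number {
  return source.match(AUTH_CLOSER)?.length ?? 0;
}
```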
…, dashboard drilldown D4 parity (#5399)

## Summary

Bundles two related dashboard/harness lanes that landed together once
both reached green:

1. **CF #18 fairness / claim-fair lane** — the 23-commit base
(`fix/fleet-claim-fairness` rebased onto its CF7 integration tip
`80c5c9402`) carrying claim-spike, cf3/cf4/cf5/cf6/cf7 hardening waves,
plus the **CF8 micro-fix** for the round-8 supplemental-merge finding:

- **CF8 F3 supplemental-merge freshness guard** (`useLiveStatus.ts`):
the cold-load comm-error supplemental fetch runs CONCURRENTLY with the
bulk pages, so the bulk copy of an aggregate row can be NEWER than the
supplemental snapshot. The previous merge replaced the bulk row
unconditionally, regressing `state`/`observed_at` to stale values until
the row's next SSE delta (long for slow-cadence aggregates). Added
`supplementalRowIsOlder` — when the supplemental row is strictly older
the newer bulk row stays INTACT (signal-less) rather than being grafted
into a chimera (newer core + stale signal) that the reducer's
signal-PRESENCE no-op check would silently swallow.

- **Procedure 3 promotions** (bucket-c/d audit over the CF round-8
ledger): one functional fix (escape `workerId` in fleet-health's reclaim
list filter — matches the `JSON.stringify` pattern used at
orchestrator.ts:3240 for the same field; a `"`-bearing worker_id would
otherwise break out of the literal) plus five doc reconciliations on the
producer/contract surfaces flagged across slot1/slot2/slot4/slot5 (stale
queue-client cross-reference, WarmHealthConfig doc vs `gate.specs`
reality, TickResult.reclaimedIndeterminate "Of reclaimed" → "In addition
to reclaimed", TickResult.skippedForBacklog poisoned-count contributor,
SweepResult.commErrors equation scoping).

2. **Dashboard drilldown D4 parity** — three commits making the
dashboard drilldown surface the D4 rung the same way the grid does:

- `feat(showcase): add resolveD4Row + CellState.d4 so the drilldown can
see the D4 rung` — exposes the D4 row on `CellState`, modeled after
`resolveD3Row`.
- `fix(showcase): drilldown shows the D4 row, de-crosses the e2e label,
and scopes the rollup line honestly` — renders D4 in the drilldown
panel, fixes the crossed e2e label, scopes the rollup line to its real
source set.
- `fix(showcase): unify dimension naming on the legend taxonomy across
grid, legacy cells, and legend` — taxonomy cleanup so legend, grid, and
legacy cells use one set of labels.

## Verification

- Dashboard `npx tsc --noEmit`: clean (only the pre-existing
missing-`@/data/*.json` errors that exist on `main`).
- Dashboard `vitest run`: 59 files / 1012 tests passing, 1 skipped.
- Harness `npx tsc --noEmit`: clean.
- Harness `vitest run`: 121/122 files passing; 1 failed = `probe-invoker
times out invoker-level even when driver ignores abortSignal` — the
NAMED known wall-clock flake (per CF7 integration verification), re-run
in isolation: 67/67 PASS.
- Rebase of drilldown-parity onto the updated `fix/cf8-m1` tip: ZERO
conflicts (the two lanes touch disjoint files).

## Test plan

- [ ] CI green on the PR
- [ ] Dashboard drilldown shows D4 rung in addition to D3/D5/D6
- [ ] Legend taxonomy reads the same in legend, grid, drilldown
- [ ] (post-merge, in showcase) cold-load comm-error overlay still
paints; a stale supplemental no longer regresses a freshly-failed
aggregate row
…bility

Adds run-id/family/worker-id columns to probe_jobs and resource_snapshots,
plus the EnqueueJobInput.family + FleetQueueClient.pruneAged contracts and
the hoisted deriveHealth primitive that downstream projections share. The
fleet-claim PB hook stamps run-id/family at claim time so every later
projection has a stable join key. probes/run-history is updated to read
the new columns.
…rojection

queue-client stamps run_id/family onto every enqueue so downstream
projections can attribute jobs to a family-scoped batch; pruneAged
retention legs land here (the d6 producer owns the call, per §4.2).
result-aggregator computes redsIntroduced/redsCleared from claimed/
sequenced job state into probe_runs summary.
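One plausible way to derive the two counts is a set difference between the red cells of the previous run and those of the current run. This is an assumption: the commit only says the aggregator computes the counts from claimed/sequenced job state. `diffReds` is a hypothetical helper.

```typescript
// Hypothetical helper: each set holds the keys of cells that were red in that run.
function diffReds(
  previousReds: ReadonlySet<string>,
  currentReds: ReadonlySet<string>,
): { redsIntroduced: number; redsCleared: number } {
  let redsIntroduced = 0;
  let redsCleared = 0;
  for (const key of currentReds) if (!previousReds.has(key)) redsIntroduced++; // newly red
  for (const key of previousReds) if (!currentReds.has(key)) redsCleared++; // no longer red
  return { redsIntroduced, redsCleared };
}
```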
…ojection

The §5.1 FLEET_FAMILIES registry and the §5.2.1 family-summary projection
that derives per-family outcome / inflight / lastRun / lastSuccessAt from
the PB-backed batches. The memoized variant fan-outs PB reads at most once
per TTL regardless of viewer count — the SAME instance is shared by the
/api/runs routes and the family-silence monitor so a dashboard poll and a
monitor evaluation inside the same TTL cost one PB fan-out total.
job-producer takes a family option and stamps it on every enqueue; the
prune-ownership key is the d6 producer's family. family-silence-monitor
rides the existing fleet-health interval (no extra timer to tear down),
keys a 6h rate limit and a recovered one-shot per family, fails open on
PB-down (the meta-alert path must still fire), and renders alert text
from closed-vocabulary parts only (§5.2.1 redaction). control-plane
fire-and-forgets familySilence.tick(now) each fleet-health cycle.
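This is a minimal sketch of a TTL memoizer that hands every caller inside the same window the same promise, so a dashboard poll and a monitor evaluation share one load. The generic `createMemoized` is hypothetical and stands in for `createMemoizedFamilySummary`, whose real signature is not shown. Evicting a rejected load is a design choice of this sketch; the commit does not say the real implementation does this.

```typescript
// Hypothetical stand-in for createMemoizedFamilySummary: one shared load per TTL window.
function createMemoized<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: { at: number; value: Promise<T> } | undefined;
  return () => {
    const t = now();
    // Inside the window every caller (dashboard poll or monitor tick) gets the same
    // promise, including one that is still in flight.
    if (cached !== undefined && t - cached.at < ttlMs) return cached.value;
    const value = load();
    cached = { at: t, value };
    // Evict a rejected load so the next caller retries instead of replaying the error.
    value.catch(() => {
      if (cached?.value === value) cached = undefined;
    });
    return value;
  };
}
```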
…stamp

Read-only /api/runs routes mounted on the control-plane role, backed by
the SHARED memoized family-summary instance (one PB fan-out per TTL).
Bounds: per-route memo, request rate-limit. /health gains the
fleetRuns.lastEvaluatedAt stamp from the family-silence monitor as the
§9 compensating control for a wedged monitor (an external poll detects
the wedge — the monitor cannot report its own host's death).
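An external poller could judge the `/health` stamp roughly as follows. This assumes `fleetRuns.lastEvaluatedAt` is an ISO-8601 string and that the poller chooses its own `maxAgeMs`. Both the payload shape and `monitorLooksWedged` are hypothetical. A missing or unparseable stamp is treated as wedged, because a monitor that never stamps is exactly the failure this control is meant to catch.

```typescript
// Hypothetical shape of the /health field described above.
interface HealthPayload {
  fleetRuns?: { lastEvaluatedAt?: string };
}

function monitorLooksWedged(health: HealthPayload, nowMs: number, maxAgeMs: number): boolean {
  const stamp = health.fleetRuns?.lastEvaluatedAt;
  if (stamp === undefined) return true; // never stamped: fail closed
  const at = Date.parse(stamp);
  if (Number.isNaN(at)) return true; // unreadable stamp: fail closed
  return nowMs - at > maxAgeMs;
}
```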
…to runControlPlane

PRODUCER_FAMILY_WIRING (drift-locked set-equal to FLEET_FAMILIES via a unit
test) drives every buildJobProducer call site; the boot-resolved
worker-stale-after window is threaded through BOTH fleet-health and the
shared family-summary projection so they judge staleness against the same
window. Triggered CLI control-plane runs route through familyForLevel so a
registry rename breaks loudly instead of silently enqueueing jobs invisible
to the projection. Test queues across the harness gain a no-op pruneAged
for the new contract.
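The drift lock and the loud lookup reduce to two small checks. This sketch passes the wiring and the registry in as arguments because their real contents are not shown. `sameMembers` and this three-argument `familyForLevel` are hypothetical stand-ins, not the harness's actual signatures.

```typescript
// Set equality, as a drift-lock unit test would assert between wiring and registry.
function sameMembers(a: Iterable<string>, b: Iterable<string>): boolean {
  const left = new Set(a);
  const right = new Set(b);
  if (left.size !== right.size) return false;
  for (const x of left) if (!right.has(x)) return false;
  return true;
}

// Throws instead of returning a family the projection would not know about.
function familyForLevel(
  level: string,
  wiring: ReadonlyMap<string, string>,
  registry: ReadonlySet<string>,
): string {
  const family = wiring.get(level);
  if (family === undefined) throw new Error(`no producer family wired for level "${level}"`);
  if (!registry.has(family)) throw new Error(`family "${family}" is not in the registry`);
  return family;
}
```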
Adds the dashboard Ops worker-runs section — family table, worker strip,
run-history drill-down, D0-from-staleness vs D0-from-failure family
annotation with clock glyph, and the per-family silence banner on the
coverage tab. Wires the data layer: DTOs, /api/runs fetchers, polling
hook, and a worker-runs context provider. cell-drilldown / cell-pieces
gain family-aware rendering.
… covers

Adds a 689-line integration test that exercises the full queue lifecycle
to the /api/runs projection (enqueue → claim → terminal → projection)
across all four families. Updates the railway-envs golden + verify-deploy
drivers regression test to account for the new fleet-runs route surface.
…t tests

- queue-client.ts: add missing closing brace at EOF (TS1005 after rebase)
- job-producer.test.ts: add required `family: "d6"` to producer fixtures
  and remove duplicate `logger` key in startedProducer
- result-aggregator.test.ts: align with per-row try/catch + dedup-lookup
  behavior introduced in 0b2f613 — add `persisted: true` and
  `writeOverlay` to test writers, update B6 contract expectations
- queue-client.test.ts: drop a stray blank line
…ily-silence Slack alerting (#5400)

## Summary

Adds end-to-end **run-visibility** to the showcase fleet — a new data
path from queue enqueue → per-family projection → dashboard + Slack
alerting:

- **Contracts + PB schema**: `EnqueueJobInput.family`,
`FleetQueueClient.pruneAged`, run-id/family/worker-id columns on
`probe_jobs` / `resource_snapshots`, fleet-claim hook stamps at claim
time.
- **Queue + aggregator**: queue-client stamps run-id/family on enqueue;
d6 producer owns `pruneAged` retention; aggregator computes
`redsIntroduced`/`redsCleared` into `probe_runs`.
- **Run-view projection (`run-view.ts`)**: §5.1 `FLEET_FAMILIES`
registry + §5.2.1 family-summary projection.
`createMemoizedFamilySummary` bounds PB load at ~one fan-out per TTL
regardless of viewer count.
- **Producer wiring**: `PRODUCER_FAMILY_WIRING` drift-locks the four
producers' family ids set-equal to the registry (unit-tested);
CLI-triggered runs route through `familyForLevel` so a registry rename
breaks loudly.
- **Family-silence monitor (§9)**: Slack alerting for the one incident
class transition-keyed rules are blind to (a silent family produces no
row transitions). Rides the existing fleet-health interval (no extra
timer), per-family 6h rate-limit + recovered one-shot + boot-grace,
fails open on PB-down for the meta-alert. Closed-vocabulary redaction.
- **HTTP `/api/runs`** + **`/health.fleetRuns.lastEvaluatedAt`**:
read-only fleet-runs routes mounted unconditionally on the CP role,
backed by the SAME memoized summary the monitor uses. `/health` stamp is
the §9 compensating control so an external poll can detect a wedged
monitor.
- **Orchestrator wiring**: boot-resolved `workerStaleAfterMs` threaded
through both fleet-health and the projection so they judge staleness
against the same window; test queues across the harness gain a no-op
`pruneAged`.
- **Dashboard**: worker-runs Ops section (family table, worker strip,
run-history drill-down), D0-from-staleness vs D0-from-failure family
annotation, per-family silence banner on the coverage tab; data layer
(DTOs, fetchers, polling hook, context provider).
- **Integration test**: 689-line end-to-end test exercising the full
queue lifecycle → /api/runs projection across all four families.

This is the merged scope of the **runviz blitz** lanes T1-T15.
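Here is a sketch of the per-family rate-limit and recovered one-shot bookkeeping. It assumes a boolean `silent` verdict per tick and an in-memory `Map` keyed by family. The names are hypothetical, and boot-grace is omitted. The sketch keeps `lastAlertAtMs` across a recovery so that a flapping family still respects the 6h window; that behavior is a choice made here, not something stated in the PR.

```typescript
const RATE_LIMIT_MS = 6 * 60 * 60 * 1000; // per-family 6h window from the PR text

interface FamilyAlertState {
  lastAlertAtMs?: number;
  alerting: boolean; // true between a sent alert and its recovery notice
}

type SilenceDecision = "alert" | "suppressed" | "recovered" | "none";

function decideSilenceAlert(
  states: Map<string, FamilyAlertState>,
  family: string,
  silent: boolean,
  nowMs: number,
): SilenceDecision {
  let state = states.get(family);
  if (state === undefined) {
    state = { alerting: false };
    states.set(family, state);
  }
  if (silent) {
    if (state.lastAlertAtMs !== undefined && nowMs - state.lastAlertAtMs < RATE_LIMIT_MS) {
      return "suppressed"; // still inside this family's window
    }
    state.lastAlertAtMs = nowMs;
    state.alerting = true;
    return "alert";
  }
  if (state.alerting) {
    state.alerting = false; // one-shot: only the first non-silent tick reports recovery
    return "recovered";
  }
  return "none";
}
```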

## Test plan

- [x] `pnpm typecheck` on `showcase/harness` and
`showcase/shell-dashboard` — clean
- [x] `vitest run` on full harness suite — 2276 pass / 3 pre-existing
probe-pool timing flakes (NOT in diff; subject is "probe-pool timing
fragility", a different PR's concern)
- [x] `oxfmt --check showcase/` — all green
- [x] Integration test `run-visibility.integration.test.ts` passes —
exercises queue → projection across all four families
- [ ] CI on PR HEAD — pending push, monitored after open

## Known follow-up (bucket-d)

- Probe-pool timing-sensitive flakes in
`src/probes/helpers/browser-pool.test.ts` (FIX#4a self-heal) and
`src/probes/loader/probe-invoker.test.ts` (timeout assertions) —
different test fails on each run; subject is probe-pool timing
fragility, not runviz.

## Deviation note

The runviz ship-finisher session ran in a Claude Agent SDK harness
without an Agent dispatch tool, so the standard 7-agent CR-loop could
not be dispatched. Inline review was performed against the cr-loop
subject-manifest + four-bucket partition: T1-T15 each landed via their
own per-task review on the blitz integration branch before reaching this
PR, and T8/T15 (the two final-merged streams) were validated via
typecheck-clean + targeted-tests-pass + cross-module wiring inspection.
A heavyweight 7-agent CR can be run against this PR if desired.
@pull pull Bot locked and limited conversation to collaborators Jun 12, 2026
@pull pull Bot added the ⤵️ pull label Jun 12, 2026
@pull pull Bot merged commit 1e9d77e into TheTechOddBug:main Jun 12, 2026

2 participants