Why
The event-driven keepalive loop is the canonical re-dispatch path (root workflow .github/workflows/agents-keepalive-loop.yml, "Agents Keepalive Loop"; consumer templates/consumer-repo/.github/workflows/agents-81-gate-followups.yml). Its
decision is made by evaluateKeepaliveLoop in .github/scripts/keepalive_loop.js:2253. docs/keepalive/GoalsAndPlumbing.md
documents three operator stop/throttle controls that this function is supposed to
honor:
§4 line 82: "Respect the agents:paused label, which blocks all keepalive
activity."
§4 line 83 / §4 lines 85-88: "After repeated failures (default: 3), the loop
pauses and adds needs-human label" and resumption requires removing needs-human.
§3 line 74: "Respect agents:max-parallel:<K> when present (integer 1-5)" — a
per-PR run cap.
Verified: evaluateKeepaliveLoop's decision tree enforces none of these.
The action is chosen at keepalive_loop.js:2624-2635, with keepaliveEnabled = config.keepalive_enabled && hasAgentLabel
(keepalive_loop.js:2399). There is no read of agents:paused, no read of needs-human, and no run-cap parse anywhere in the function.
Specific wrong/missing behavior:
agents:paused is ignored. grep -n "agents:paused" .github/scripts/keepalive_loop.js
returns nothing; the literal is defined only in the orchestrator-path file .github/scripts/keepalive_gate.js:12 (const PAUSE_LABEL = 'agents:paused').
In agents-keepalive-loop.yml the label appears only inside the
state-fingerprint hash — where adding it changes the hash and therefore
guarantees the run is not deduped away. A PR with agent:codex + agents:paused + green Gate + unchecked tasks dispatches normally.
needs-human does not pause at dispatch. The label is added at keepalive_loop.js:3998-4002 (inside the if (stop) block of updateKeepaliveLoopSummary), but evaluateKeepaliveLoop never consults needs-human or state.failure.count when choosing the action. On the next
green Gate (or any label event — adding needs-human itself flips the
fingerprint), gate-green + tasks-remaining → run again. The "pause" only
holds while the Gate stays red.
No run cap. No agents:max-parallel:<K> (or any K override) is parsed in
the loop path; the only throttle is the per-PR concurrency group plus the
runner-dispatch debounce.
This is a current break (verified by reading the live decision tree, not a
latent edge case): the primary operator stop control (agents:paused) and the
documented failure-pause (needs-human) do not stop the event-driven loop. agents:paused and needs-human only take effect on the 30-minute orchestrator
path (keepalive_gate.js:1002), not on the event-driven loop that fires on every
Gate completion.
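To make the dedupe point concrete, here is a toy model of a state fingerprint. It is only a sketch: the real fingerprint is computed in agents-keepalive-loop.yml and its inputs may differ, and the names fingerprint, isDuplicate, headSha, and gateConclusion are assumptions made for illustration. It shows why adding a label that feeds the hash produces a fresh fingerprint, so the run is evaluated rather than deduped.

```javascript
// Toy model only: the real fingerprint lives in agents-keepalive-loop.yml.
// All names here are hypothetical.
function fingerprint({ headSha, gateConclusion, labels }) {
  // Sort a copy so label order does not affect the key.
  return [headSha, gateConclusion, [...labels].sort().join(',')].join('|');
}

const seen = new Set();

// Returns true when this exact state was already processed.
function isDuplicate(state) {
  const key = fingerprint(state);
  if (seen.has(key)) return true;
  seen.add(key);
  return false;
}

const base = { headSha: 'abc123', gateConclusion: 'success', labels: ['agent:codex'] };
const first = isDuplicate(base);   // new state, not a duplicate
const repeat = isDuplicate(base);  // same state again, deduped
// Adding agents:paused changes the key, so the run is NOT deduped.
const afterPause = isDuplicate({ ...base, labels: [...base.labels, 'agents:paused'] });

console.log({ first, repeat, afterPause });
```

In this model the pause label can only ever cause an extra evaluation; nothing in the dedupe layer stops a dispatch, which is why the guard has to live in the decision function.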
Scope
Enforce the documented stop/throttle controls at the top of the loop's decision
function, evaluateKeepaliveLoop in .github/scripts/keepalive_loop.js:
Return a non-dispatching action (action:'skip') when agents:paused is
present on the PR.
Return action:'skip' when needs-human is present, or when state.failure.count >= failureThreshold (the failure_threshold config,
default 3, already parsed in this function).
Parse and honor a per-PR run cap label and skip when at/over cap.
The labels array (lowercased label names) is already computed at keepalive_loop.js:2318-2320, and the keepalive state (with failure.count) is
already loaded in this function — both are available at the top of the tree.
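The guard order described in this section could be sketched roughly as follows. This is a minimal illustration, not the code in keepalive_loop.js: earlyStopReason and decide are hypothetical helpers, the input field names are assumptions, and the real function returns a larger object than { action, reason }.

```javascript
// Hypothetical sketch; names and return shape are not taken from keepalive_loop.js.
const PAUSE_LABEL = 'agents:paused';
const NEEDS_HUMAN_LABEL = 'needs-human';

// Returns a skip reason when an operator stop control applies, else null.
// `labels` is assumed to be the already-lowercased label-name array.
function earlyStopReason({ labels, failureCount, failureThreshold }) {
  if (labels.includes(PAUSE_LABEL)) return 'paused';
  if (labels.includes(NEEDS_HUMAN_LABEL)) return 'needs-human';
  if (Number.isFinite(failureCount) && failureCount >= failureThreshold) {
    return 'failure-threshold';
  }
  return null;
}

// Stand-in for the top of the decision function.
function decide(input) {
  const reason = earlyStopReason(input);
  if (reason) return { action: 'skip', reason };
  // Simplified fall-through: green Gate plus remaining tasks dispatches.
  return input.gateGreen && input.tasksRemaining > 0
    ? { action: 'run', reason: 'ready' }
    : { action: 'wait', reason: 'not-ready' };
}

const paused = decide({
  labels: ['agent:codex', 'agents:paused'],
  failureCount: 0,
  failureThreshold: 3,
  gateGreen: true,
  tasksRemaining: 2,
});
console.log(paused); // { action: 'skip', reason: 'paused' }
```

Checking the stop controls before any Gate or task logic is what makes the skip independent of the Gate conclusion.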
Non-Goals
Do NOT modify the orchestrator-path enforcement in .github/scripts/keepalive_gate.js or .github/scripts/keepalive_orchestrator_gate_runner.js; those already honor agents:paused/run-cap and stay as-is.
Do NOT change the state-fingerprint hashing in agents-keepalive-loop.yml
(lines around the agents:paused/needs-human hash entries); this fix is in
the JS decision function, not the dedupe layer.
Do NOT remove the needs-human / agent:needs-attention add behavior at keepalive_loop.js:3985-4002; this issue adds the consume side and does not
touch escalation.
Do NOT reconcile the separate label-name drift (agents:max-parallel vs the
orchestrator's agents:max-runs) or the agents:keepalive activation
question in LABELS.md here — pick ONE run-cap label name, wire it, and note the
choice; the doc/label reconciliation is a separate issue.
Scaffold-only completion does NOT count: adding a paused/needs-human
variable that is read but does not change the returned action, or a test
that asserts the label is present rather than that it forces skip, is a
failure of this issue. The deliberate-break acceptance criterion below must be
demonstrated, and the new test must drive evaluateKeepaliveLoop end-to-end
(not a helper in isolation).
Tasks
In evaluateKeepaliveLoop (.github/scripts/keepalive_loop.js:2253),
before the action-selection block at lines 2624-2635, add an early agents:paused guard: when the lowercased labels array (built at keepalive_loop.js:2318-2320) includes agents:paused, return { action: 'skip', reason: 'paused', ... } with the same return shape used by
the existing skip path (keepalive_loop.js:2823-2872).
In the same function, add a needs-human / failure-threshold guard:
return { action: 'skip', reason: 'needs-human' } (or 'failure-threshold')
when labels includes needs-human OR when the loaded keepalive state's failure.count >= failureThreshold (the failure_threshold value parsed by parseConfig, default 3 per keepalive_loop.js:1604-1607). This must short-circuit
the action regardless of Gate conclusion.
Parse a per-PR run-cap label in evaluateKeepaliveLoop (define the prefix
constant near the other label constants, e.g. const RUN_CAP_PREFIX = ...),
clamp to 1-5, and skip with reason:'run-cap-reached' when in-progress run
count is at/over the cap. Name the chosen label in ## Implementation Notes;
if reusing the orchestrator's existing agents:max-runs: prefix
(keepalive_gate.js:9), import or re-declare it consistently.
Ensure the new skip reasons are treated as neutral (no failure-count
increment, no PR-comment noise) consistent with the existing neutral-stop
handling at keepalive_loop.js:3135 and the §5 No-Noise policy
(docs/keepalive/GoalsAndPlumbing.md:97).
Extend .github/scripts/__tests__/keepalive-loop.test.js with the
deliberate-break test described in Acceptance Criteria, modeled on the
existing "evaluateKeepaliveLoop waits when agent label is missing"
(keepalive-loop.test.js:180) and "... skips when keepalive is disabled"
(keepalive-loop.test.js:200) cases, using the buildGithubStub helper
(keepalive-loop.test.js:23).
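For the run-cap task, the parse-and-clamp step might look like the sketch below. It assumes the orchestrator's agents:max-runs: prefix is reused, which this issue leaves open. parseRunCap and runCapReached are hypothetical names, and treating a missing or unparseable label as "no cap" is likewise an assumption.

```javascript
// Assumption: reuses the orchestrator's prefix; the issue leaves the label choice open.
const RUN_CAP_PREFIX = 'agents:max-runs:';
const RUN_CAP_MIN = 1;
const RUN_CAP_MAX = 5;

// Returns the cap from the first parseable label, clamped to 1-5, or null if none.
function parseRunCap(labels) {
  for (const label of labels) {
    if (!label.startsWith(RUN_CAP_PREFIX)) continue;
    const raw = label.slice(RUN_CAP_PREFIX.length).trim();
    if (!/^\d+$/.test(raw)) continue; // ignore non-integer suffixes
    const value = Number.parseInt(raw, 10);
    return Math.min(RUN_CAP_MAX, Math.max(RUN_CAP_MIN, value));
  }
  return null;
}

// True when a cap is set and the in-progress run count is at or over it.
function runCapReached(labels, inProgressRuns) {
  const cap = parseRunCap(labels);
  return cap !== null && inProgressRuns >= cap;
}

console.log(parseRunCap(['agent:codex', 'agents:max-runs:9'])); // 5 (clamped)
console.log(runCapReached(['agents:max-runs:2'], 2));           // true
```

A decision function using this would return the skip with reason 'run-cap-reached' when runCapReached is true; how the in-progress run count is obtained is outside this sketch.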
Acceptance Criteria
New named test in .github/scripts/__tests__/keepalive-loop.test.js, e.g. "evaluateKeepaliveLoop skips when agents:paused is present": builds a PR via buildGithubStub with labels ['agent:codex','agents:paused'], a green
Gate run, and at least one unchecked task, calls evaluateKeepaliveLoop,
and asserts result.action === 'skip' and result.reason === 'paused'. A
parallel case asserts result.action === 'skip' for a PR carrying needs-human (green Gate + unchecked tasks). Run via node --test .github/scripts/__tests__/keepalive-loop.test.js → both pass.
Deliberate-break gate: after implementing the guard, temporarily
comment out the new agents:paused early-return at the top of evaluateKeepaliveLoop (keepalive_loop.js:2253). With the guard removed, the
new agents:paused test must FAIL — concretely, result.action comes back
as 'run'/'fix' (a dispatch) instead of 'skip', proving the test catches a
loop that ignores the pause label. Capture the FAIL output, then restore the
guard so the test passes again.
The existing 102 passing cases in .github/scripts/__tests__/keepalive-loop.test.js still pass after the change
(node --test .github/scripts/__tests__/keepalive-loop.test.js shows fail 0),
confirming the new guards do not regress the existing wait/skip/run/fix/verify
decisions.
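The deliberate-break gate can be illustrated with a self-contained stand-in. This is a sketch of the idea only: evaluateStub is not evaluateKeepaliveLoop, the guardEnabled flag merely simulates commenting out the early return, and the real test must go through buildGithubStub and the exported function.

```javascript
// Stand-in decision function; guardEnabled=false simulates removing the guard.
function evaluateStub({ labels, gateGreen, uncheckedTasks }, guardEnabled = true) {
  if (guardEnabled && labels.includes('agents:paused')) {
    return { action: 'skip', reason: 'paused' };
  }
  if (gateGreen && uncheckedTasks > 0) return { action: 'run', reason: 'ready' };
  return { action: 'wait', reason: 'not-ready' };
}

// The check the new test makes: returns null on pass, a message on failure.
function checkPausedSkips(evaluate) {
  const result = evaluate({
    labels: ['agent:codex', 'agents:paused'],
    gateGreen: true,
    uncheckedTasks: 1,
  });
  if (result.action !== 'skip' || result.reason !== 'paused') {
    return `expected skip/paused, got ${result.action}/${result.reason}`;
  }
  return null;
}

const withGuard = checkPausedSkips((pr) => evaluateStub(pr, true));
const withoutGuard = checkPausedSkips((pr) => evaluateStub(pr, false));
console.log({ withGuard, withoutGuard });
// { withGuard: null, withoutGuard: 'expected skip/paused, got run/ready' }
```

The point of the gate is the second result: a test that still passes with the guard removed is asserting the wrong thing.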
Implementation Notes
Confirmed-green local baseline (node v24.3.0) from the Workflows repo root: node --test .github/scripts/__tests__/keepalive-loop.test.js → pass 102 fail 0.
evaluateKeepaliveLoop is exported at keepalive_loop.js:4679 (module.exports
begins there); the test harness imports it at keepalive-loop.test.js:12.
The lowercased label list is already available inside the function at keepalive_loop.js:2318-2320; reuse it rather than re-fetching labels.
Grounding docs: docs/keepalive/GoalsAndPlumbing.md:74 (run cap), :82
(agents:paused), :83 and :85-88 (needs-human pause/resume), :97 (no
PR-comment noise on skip). docs/LABELS.md:25 (agents:paused "Pauses
keepalive loop on PR"), :453-464, :546.
Orchestrator-path reference for the same semantics (do not edit, use as the
contract): keepalive_gate.js:12 (PAUSE_LABEL), keepalive_gate.js:1002
(hasPauseLabel), keepalive_gate.js:9 (MAX_RUNS_PREFIX = 'agents:max-runs:').