Summary
deploybot react is the privileged orchestrator the GitHub Action runs on every delivery event. It already computes a rich result object (promoted, drain, integrations, release, top-level state) and prints it as JSON — but only to stdout inside one of many interleaved Actions runs, and it always exits 0. As a result, the worst operational failure mode is invisible: when the pipeline is paused, every triggered react no-ops, returns exit 0, and CI stays green, so a stuck pipeline can go unnoticed indefinitely.
This issue proposes rendering the result react already produces into the surfaces humans and agents actually look at — a GitHub Actions step summary, a sticky PR/commit status, and a non-green CI signal when paused or timed-out — plus a run_id to correlate the event fan-in. This is additive, needs no schema or architecture change, and strengthens the existing --json / notify / comment-marker patterns.
Problem statement
- command_react returns/prints a structured object but only to stdout, and the top-level main() returns 0 for every non-exception path. (cli.py command_react, main)
- When pipeline_control() is paused, react returns {"state": "paused", "reason": ...} and exits 0; the reason lives only in a deploybot-control:v1 comment marker, and nothing surfaces it proactively. (records.py control_body)
- The workflow uses concurrency: group=deploybot-${{ github.repository }}, cancel-in-progress: false and fans in 6 event families, so debugging "why didn't PR #42 merge?" means diffing multiple interleaved runs with no correlation id. (examples/github-workflow.yml)
- follow_release can return timed-out, but only to stdout with exit 0, so a stuck deploy looks successful. (pipeline.py follow_release)
- The Action invokes react and currently writes no $GITHUB_STEP_SUMMARY. (action.yml)
There is no place an operator can glance at to see what the latest react pass did or why the pipeline is stuck.
Proposed behavior
Render the existing command_react result into operator-visible surfaces:
- Actions step summary: when GITHUB_STEP_SUMMARY is set, append a Markdown table each pass with state, promoted, merged, waiting [{number, reason}], integration PRs, and release/timeout. Built from the result object react already returns.
- Visible paused / timed-out signal: when state == "paused" or release.state == "timed-out", make the run non-green (documented non-zero exit and/or a failing check-run). A normal empty pass stays exit 0. The JSON state field remains authoritative for agents.
- Sticky status surface: upsert a single "DeployBot status" comment (or commit status) summarizing the latest pass, including, when paused, the reason and the deploybot control unpause remedy. Reuse the marker-upsert machinery in records.py.
- run_id correlation: compute once per pass (sha256(repo:utc_now)[:12], mirroring the intent_id pattern in command_request) and include it in the result JSON and in every notify() payload emitted during that pass.
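The step-summary bullet above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper names (render_step_summary, write_step_summary) are hypothetical, and the result shape is assumed (promoted and integrations as lists of PR numbers, merged and waiting nested under drain).

```python
import os
from typing import Any, Mapping


def _cell(value: Any) -> str:
    """Render one value for a Markdown table cell; a raw pipe would break the row."""
    if isinstance(value, (list, tuple)):
        text = ", ".join(str(item) for item in value)
    else:
        text = str(value)
    return text.replace("|", "\\|") or "-"


def render_step_summary(result: Mapping[str, Any]) -> str:
    """Build a two-column Markdown table from the (assumed) react result shape."""
    drain = result.get("drain") or {}
    release = result.get("release") or {}
    waiting = [
        f"#{entry.get('number')} ({entry.get('reason')})"
        for entry in drain.get("waiting", [])
    ]
    rows = [
        ("state", result.get("state", "unknown")),
        ("promoted", result.get("promoted", [])),
        ("merged", drain.get("merged", [])),
        ("waiting", waiting),
        ("integration PRs", result.get("integrations", [])),
        ("release", release.get("state", "none")),
    ]
    lines = ["| field | value |", "| --- | --- |"]
    lines += [f"| {name} | {_cell(value)} |" for name, value in rows]
    return "\n".join(lines) + "\n"


def write_step_summary(result: Mapping[str, Any], environ: Mapping[str, str] = os.environ) -> bool:
    """Append the table to $GITHUB_STEP_SUMMARY; do nothing when it is unset."""
    path = environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # running locally or outside Actions
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(render_step_summary(result))
    return True
```

Appending rather than overwriting matches how Actions treats the summary file, and passing environ explicitly keeps the guard easy to exercise in unit tests.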
CLI / API / config changes
- No new subcommand required; behavior attaches to the existing react flow.
- Add run_id to command_react's result dict and thread it into notify() payloads.
- Step-summary writing handled in the composite Action's final step (or behind a GITHUB_STEP_SUMMARY check in the CLI).
- Optional: a non-zero exit policy for react when paused/timed-out.
- No .mergequeue.toml schema changes. No marker schema changes.
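One way the optional exit policy might look is sketched below. The function name, the strict switch, and the specific exit codes (3 and 4) are assumptions for illustration; the issue only asks that the codes be non-zero and documented.

```python
from typing import Any, Mapping

EXIT_OK = 0
EXIT_PAUSED = 3      # hypothetical value; any documented non-zero code would do
EXIT_TIMED_OUT = 4   # hypothetical value


def react_exit_code(result: Mapping[str, Any], strict: bool = True) -> int:
    """Map a react result to a process exit code.

    Only paused and timed-out are non-zero; an empty or normal pass stays 0.
    With strict=False the current always-0 behavior is preserved.
    """
    if not strict:
        return EXIT_OK
    if result.get("state") == "paused":
        return EXIT_PAUSED
    release = result.get("release") or {}
    if release.get("state") == "timed-out":
        return EXIT_TIMED_OUT
    return EXIT_OK
```

Keeping the mapping in a pure function means main() would only need to return its value, and the policy can be tested without invoking the CLI.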
Backward compatibility
Purely additive. Default text/JSON stdout is unchanged aside from the new run_id field; the step summary, sticky status, and exit-code policy are additive and auto-detected (e.g., only when GITHUB_STEP_SUMMARY is present). Commit-pinned workflows are unaffected. Safe to ship in a minor release.
Telemetry / logging needs
- No external telemetry.
- Reuse the existing notify() webhook for run_id-stamped events.
- Reuse the existing comment-marker upsert for the sticky status surface.
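A minimal sketch of the run_id idea, assuming the seed is the literal string "repo:utc_now" with an ISO 8601 timestamp. The helper names (compute_run_id, stamp_payload) are hypothetical, and the real notify() signature is not shown in this issue, so the sketch only prepares the payload.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict, Mapping, Optional


def compute_run_id(repo: str, now: Optional[datetime] = None) -> str:
    """Return the first 12 hex chars of sha256("<repo>:<utc timestamp>")."""
    moment = now or datetime.now(timezone.utc)
    seed = f"{repo}:{moment.isoformat()}"
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]


def stamp_payload(payload: Mapping[str, Any], run_id: str) -> Dict[str, Any]:
    """Return a copy of a notify payload carrying run_id; the input is not mutated."""
    return {**payload, "run_id": run_id}


# Compute once per pass, then reuse the same id for the result and every event.
run_id = compute_run_id("octo-org/example-repo")  # placeholder repository name
event = stamp_payload({"event": "promoted", "number": 42}, run_id)
```

Accepting now as a parameter makes the id reproducible in tests while production callers use the current UTC time.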
Acceptance criteria
- Each react pass writes a Markdown table to $GITHUB_STEP_SUMMARY when set; behavior is unchanged when unset.
- A paused pipeline shows the pause reason and the deploybot control unpause remedy in the summary/status surface and makes the Actions run non-green; a normal empty pass stays green / exit 0.
- A follow timeout renders as a visible non-green signal and appears in the summary, distinguishable from verified.
- react's JSON includes a run_id, and the same id appears in any notify() payloads emitted during that pass.
- waiting[] entries carry the existing classify() reason strings (e.g., "CI is not complete", "head changed after it was queued").
- Default stdout output (text and --json) is byte-for-byte unchanged aside from the additive run_id field.
- Tests cover the new behavior, including run_id propagation into notify and sticky-comment upsert, following the existing unittest / patch("agent_merge_queue...") style in tests/test_cli.py.
Risks & mitigations
- Noisy sticky comment → upsert a single comment in place rather than posting per pass; only update when content changes.
- Unexpected non-zero exits breaking existing automation → scope non-zero strictly to paused and timed-out; document it; keep empty/normal passes exit 0.
- Step-summary unavailable outside Actions → guard on GITHUB_STEP_SUMMARY presence; no-op locally.
- Scope creep into concurrency/idempotency fixes → explicitly out of scope here; this issue only surfaces existing state (duplicates become visible first, then fixable).
Out of scope (possible follow-ons)
- Idempotency keys (per run_id + batch fingerprint) to prevent duplicate integration PRs / CI dispatches under event bursts.
- Emitting paused and follow timeouts as notify() events for alerting.
- Expanded react orchestration tests (promote→drain→overlap-integrate→follow, timed-out branch).