Summary
Add a read-only deploybot doctor command (plus a matching MCP diagnose tool) that runs an ordered set of environment and policy checks and reports ✓ / ⚠ / ✗ with a remediation hint per item.
DeployBot's value depends on GitHub state it does not own — gh presence, auth, token scopes, queue labels, exact required-check display names, trusted-actor logins, and a protective ruleset — yet these are validated lazily. The dominant failure mode for an alpha tool is therefore silent or cryptic misconfiguration (a raw gh stderr mid-operation, or a PR that waits forever), not a bug in the well-tested queue engine. doctor turns "it silently doesn't merge" into "here is exactly what to fix" in one command, with zero new dependencies or infrastructure.
Problem statement
Setup and ongoing reliability hinge on external GitHub state that DeployBot checks only at the point of use:
action.yml simply pip installs and runs drain --json — no preflight. (action.yml#L14-L20)
GitHub._run shells out to gh for every call; a non-zero exit surfaces as raw stderr. (cli.py#L586-L597)
trusted_actors / coordinator_actors / allowed_reviewers are free-text logins, only shape-checked — never verified against GitHub, so a typo silently breaks all marker trust. (config.py#L189-L209)
required_checks names must exactly match GitHub check display names; a mismatch makes a PR wait forever with a generic "is not complete." (cli.py#L530-L535)
The security model (independent ruleset, no-bypass merge credential) is prose-only with nothing to detect drift. (README.md#L49-L60)
Labels must be created via a separate ensure-labels step or every PR reports "queue authorization label is missing." (cli.py#L516-L517)
There is no single, fast, read-only way to confirm "this repo is correctly wired for DeployBot."
Proposed behavior
A new read-only deploybot doctor command runs the checks below, each independently degradable so one failure doesn't abort the rest. It exits 0 when there are no hard ✗ (warnings allowed) and non-zero when any ✗ is present.
1. gh --version resolvable; else ✗ "install GitHub CLI."
2. gh auth status succeeds; report the active account; ✗ with a gh auth login hint on failure.
3. load_config() parses; surface ConfigError as a clean ✗ (no traceback).
4. Resolve owner/name (repo view) and confirm read access.
5. Queue/blocked labels exist; ⚠ "run deploybot ensure-labels" if missing.
6. Trusted/coordinator/reviewer logins: resolve @repository-owner; verify each login exists and (best-effort) has access; ✗ on an unknown login (the silent killer).
7. Required checks (best-effort): sample recent runs on the base branch (or an open queued PR) and ⚠ if a configured required_checks name is never observed, which likely indicates a display-name mismatch.
8. Branch protection / ruleset (best-effort): fetch protection/rulesets for base_branch; ⚠ if the configured required checks aren't independently enforced, or if DeployBot's identity could bypass.
Checks 7–8 are advisory (⚠ only) and must never hard-fail.
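The degrade-independently behavior and the exit-code rule described above might be sketched as follows. This is an illustration under assumptions, not DeployBot's code: Row, run_checks, exit_code, and the advisory parameter are hypothetical names chosen for the sketch.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Row:
    """One doctor result (hypothetical shape mirroring {check, status, detail, hint})."""
    check: str
    status: str  # "ok", "warn", or "fail"
    detail: str = ""
    hint: str = ""


def run_checks(
    checks: Iterable[Tuple[str, Callable[[], Row]]],
    advisory: frozenset = frozenset(),
) -> List[dict]:
    """Run every check in order; a crash in one check never aborts the rest."""
    rows = []
    for name, fn in checks:
        try:
            row = fn()
        except Exception as exc:  # convert any failure into a row, not a traceback
            row = Row(name, "fail", detail=str(exc))
        if name in advisory and row.status == "fail":
            row.status = "warn"  # advisory checks are capped at a warning
        rows.append(asdict(row))
    return rows


def exit_code(rows: List[dict]) -> int:
    """Non-zero only when at least one row is a hard failure."""
    return 1 if any(r["status"] == "fail" for r in rows) else 0
```

Passing the names of the advisory checks in advisory keeps the "must never hard-fail" rule in one place instead of relying on each check to behave.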
CLI / API / config changes
New subparser doctor with --json (mirrors plan/status conventions); no positional args.
New command_doctor(client, *, json_output) returning a list of {check, status, detail, hint} dicts; text mode renders the same data.
New MCP tool diagnose(repository=None, config=None) calling doctor --json, keeping human/agent parity with the rest of mcp_server.py.
No .mergequeue.toml schema changes.
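As a rough illustration of the CLI surface, the sketch below wires a doctor subcommand with a --json flag using argparse and renders the same rows in either mode. The existing parser layout in cli.py is not shown in this issue, so build_parser, render, and SYMBOLS are hypothetical stand-ins.

```python
import argparse
import json
from typing import List

# Hypothetical mapping from status values to the symbols used in text mode.
SYMBOLS = {"ok": "✓", "warn": "⚠", "fail": "✗"}


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser with only the proposed subcommand (assumed layout)."""
    parser = argparse.ArgumentParser(prog="deploybot")
    sub = parser.add_subparsers(dest="command", required=True)
    doctor = sub.add_parser("doctor", help="run read-only environment and policy checks")
    doctor.add_argument("--json", action="store_true", dest="json_output")
    return parser


def render(rows: List[dict], json_output: bool) -> str:
    """Text mode and JSON mode are two views of the same list of rows."""
    if json_output:
        return json.dumps(rows, ensure_ascii=False, indent=2)
    lines = []
    for row in rows:
        line = f"{SYMBOLS[row['status']]} {row['check']}: {row['detail']}"
        if row["hint"]:
            line += f" (hint: {row['hint']})"
        lines.append(line)
    return "\n".join(lines)
```

Because both modes read from one list, the MCP tool can consume the JSON form and see exactly what a human sees in the terminal.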
Backward compatibility
Purely additive: a new command + new MCP tool. No change to existing markers, batch format, other commands' exit codes, or config shape. Safe to ship in a minor release (e.g., v0.2.0); commit-pinned workflows are unaffected.
Telemetry / logging needs
No external telemetry.
Reuse the existing --json output pattern for a severity-tagged report.
Each gh probe doctor makes must catch QueueError / non-zero exit and convert it into a ✗ row instead of crashing — this also seeds a reusable "non-fatal probe" helper for future commands.
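One possible shape for the "non-fatal probe" helper is sketched below. It assumes a probe is a zero-argument callable that returns a detail string; QueueError here is a local stand-in for the project's exception, and probe and gh_version are hypothetical names.

```python
import subprocess
from typing import Callable


class QueueError(Exception):
    """Stand-in for the project's QueueError, defined here only to stay self-contained."""


def probe(check: str, run: Callable[[], str], hint: str = "") -> dict:
    """Run one probe and always return a row instead of raising."""
    try:
        detail = run()
    except FileNotFoundError:
        # subprocess raises this when the gh executable is not on PATH.
        return {"check": check, "status": "fail",
                "detail": "GitHub CLI not found", "hint": "install GitHub CLI"}
    except (QueueError, subprocess.CalledProcessError) as exc:
        # Prefer captured stderr when the exception carries it.
        detail = (getattr(exc, "stderr", None) or str(exc)).strip()
        return {"check": check, "status": "fail", "detail": detail, "hint": hint}
    return {"check": check, "status": "ok", "detail": (detail or "").strip(), "hint": ""}


def gh_version() -> str:
    """Example probe body: a single lightweight gh call."""
    done = subprocess.run(["gh", "--version"], check=True, capture_output=True, text=True)
    return done.stdout
```

A call such as probe("gh", gh_version) then yields a ✗ row on a machine without gh rather than a traceback.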
Acceptance criteria
deploybot doctor exits 0 on a correctly configured repo and prints one ✓ line per check.
With no gh on PATH, doctor prints a single ✗ "GitHub CLI not found" with an install hint and exits non-zero — without a Python traceback.
With gh present but unauthenticated, the auth check is ✗ with a gh auth login hint; later network-dependent checks degrade to ⚠ "skipped (no auth)" rather than crashing.
A malformed .mergequeue.toml yields a single ✗ carrying the ConfigError message (no traceback), and remaining checks still run where possible.
Missing queue/blocked labels produce a ⚠ recommending deploybot ensure-labels; after running it, doctor reports ✓.
A trusted_actors entry that is not a real GitHub login produces a ✗ naming the offending login; @repository-owner resolves correctly against owner/name and passes.
A required_checks name not present in observed base-branch/PR check runs produces a ⚠ flagging a possible display-name mismatch (advisory only, never exit-failing).
deploybot doctor --json emits a stable array of {check, status, detail, hint} objects with status ∈ {ok, warn, fail}; exit code is non-zero iff any status == "fail".
The MCP diagnose tool returns the same JSON payload as doctor --json.
Unit tests cover: missing-gh, unauth, bad-config, missing-labels, unknown trusted actor, check-name mismatch, and exit-code semantics — following the existing unittest / patch("agent_merge_queue.cli...") style in tests/test_cli.py.
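A test for the missing-gh criterion could look roughly like the sketch below. To stay self-contained it defines a hypothetical check_gh_present and patches shutil.which directly; in the real suite the patch target would presumably live under agent_merge_queue.cli, as the existing tests do.

```python
import shutil
import unittest
from unittest.mock import patch


def check_gh_present() -> dict:
    """Hypothetical first doctor check: is gh on PATH?"""
    if shutil.which("gh") is None:
        return {"check": "gh", "status": "fail",
                "detail": "GitHub CLI not found", "hint": "install GitHub CLI"}
    return {"check": "gh", "status": "ok", "detail": "gh found on PATH", "hint": ""}


class DoctorGhCheckTest(unittest.TestCase):
    def test_missing_gh_is_a_fail_row(self):
        # Simulate a machine without gh; the check must return a row, not raise.
        with patch("shutil.which", return_value=None):
            row = check_gh_present()
        self.assertEqual(row["status"], "fail")
        self.assertIn("install", row["hint"])

    def test_present_gh_is_ok(self):
        with patch("shutil.which", return_value="/usr/bin/gh"):
            self.assertEqual(check_gh_present()["status"], "ok")
```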
Risks & mitigations
False alarms on rulesets/check names → mark checks 7–8 best-effort/⚠ only; document as advisory.
Token lacks admin scope to read protection → treat "cannot read" as ⚠ "insufficient scope to verify," not ✗.
Extra gh calls / rate → each probe is a single lightweight call; gate the heavier check-name sampling behind an available open queued PR.
Scope creep into auto-fixing → MVP is strictly read-only; only reference ensure-labels as a hint, never invoke it.
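To make the advisory-only mitigation concrete, here is a minimal sketch of the required-checks comparison that can return ok or warn but never fail. The function name, the convention that observed is None when check runs could not be read, and the wording of the details are assumptions for illustration.

```python
from typing import Iterable, Optional


def required_checks_row(required: Iterable[str], observed: Optional[Iterable[str]]) -> dict:
    """Compare configured names with observed check names; advisory, so never "fail"."""
    if observed is None:
        # The probe could not read check runs (for example, a token scope issue).
        return {"check": "required_checks", "status": "warn",
                "detail": "could not read check runs",
                "hint": "insufficient scope to verify"}
    # Names must match exactly, so the comparison is case-sensitive.
    missing = sorted(set(required) - set(observed))
    if missing:
        return {"check": "required_checks", "status": "warn",
                "detail": "never observed: " + ", ".join(missing),
                "hint": "possible display-name mismatch"}
    return {"check": "required_checks", "status": "ok",
            "detail": "all configured names observed", "hint": ""}
```

For example, a configured name of "Build" with only "build" observed is reported as a warning, which is the mismatch that otherwise leaves a PR waiting forever.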
Out of scope (possible follow-ons)
deploybot doctor --fix for safe, confirmed remediations.
--log-level / structured logging for diagnosing drain failures in CI.
A doctor summary line in status/plan output (e.g., "⚠ 1 setup issue — run deploybot doctor").