A2A Fleet: multi-executor hardening roadmap (Phase 2 enablers + bulletproofing test matrix)
Context
After the Phase 1 retest on a2a_fleet v0.8.15 (commit 2d6c877ca, 2026-06-23), the plugin is in good shape for oc_receiver end-to-end coding work: it drove issues NousResearch#201 and #122 to merged PRs in Interstellar-code/hermes-switchui (PR NousResearch#276, NousResearch#281), in a single turn each, using client.send_message_and_wait().
The Phase 1 retest surfaced 3 real ergonomics gaps that v0.8.15 closed (workdir auto-injection in role_text_for(), send_message_and_wait helper, dup-dispatch guard returning JSON-RPC -32001).
This issue tracks the remaining structured gaps plus an intentional multi-executor test matrix for the next phase of hardening. Goal: make the plugin bulletproof regardless of which receiver is on the other end.
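The dup-dispatch guard mentioned above answers with JSON-RPC error code -32001. The sketch below shows one way a caller might tell that case apart from other failures, assuming the reply follows the standard JSON-RPC 2.0 error-object shape (`error.code`, `error.message`). The helper name `classify_reply` and the sample replies are hypothetical and are not part of a2a_fleet.

```python
import json

# Code named in this issue; JSON-RPC 2.0 reserves -32000..-32099 for
# implementation-defined server errors.
DUP_DISPATCH_CODE = -32001


def classify_reply(raw: str) -> str:
    """Return 'ok', 'duplicate', or 'error' for one JSON-RPC 2.0 reply.

    Hypothetical helper for illustration only.
    """
    reply = json.loads(raw)
    error = reply.get("error")
    if error is None:
        return "ok"
    if error.get("code") == DUP_DISPATCH_CODE:
        return "duplicate"
    return "error"


# Illustrative replies; real a2a_fleet replies may carry more fields.
dup = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "error": {"code": -32001, "message": "duplicate dispatch"}}
)
ok = json.dumps({"jsonrpc": "2.0", "id": 2, "result": {"status": "done"}})
print(classify_reply(dup), classify_reply(ok))
```

Treating a duplicate as its own outcome lets an orchestrator skip the retry instead of counting it as a receiver failure.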
Phase 2 enablers (priority order)
Structured task payload for fleet_send: blocker for coding-loop integration. Current briefs are free text. The coding-loop needs {context, expected_output, reply_to, deadline_s} so that scope, acceptance criteria, and reply format are passed in machine-parseable form. Right now OpenCode re-derives what the orchestrator already knows. Explicitly deferred in the v0.8.15 commit message.
response_handler profile routing: currently response_handler is a global config value. Different peers (claude, codex, opencode, agy, hermes-to-hermes) want different handlers per dispatch.
Per-peer cost / turn budget: track tokens + turns per peer to prevent runaway usage. Explicitly deferred in the v0.8.15 commit message.
Coding-loop executor: a2a_opencode mode (coding-loop calls fleet_send / send_message_and_wait instead of running the agent locally). Same intake guard, same worktree convention, same PR criteria. Needs #1 (feat(workflow-engine): plugin contract refactor, phases 1-6) to land first.
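To make the first enabler concrete, here is a minimal sketch of what a structured brief could look like. Only the four field names (context, expected_output, reply_to, deadline_s) come from this issue. The class name `TaskPayload`, the validation rules, and the sample values are assumptions, and nothing here reflects an existing fleet_send signature.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskPayload:
    """Hypothetical structured brief; field names are taken from the issue text."""

    context: str
    expected_output: str
    reply_to: str
    deadline_s: int

    def __post_init__(self) -> None:
        # Assumed validation rules: reject empty text fields and non-positive deadlines.
        if self.deadline_s <= 0:
            raise ValueError("deadline_s must be positive")
        for name in ("context", "expected_output", "reply_to"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")

    def to_json(self) -> str:
        """Serialize to a stable JSON string that a receiver can parse."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TaskPayload":
        """Parse a JSON brief; unknown or missing keys raise TypeError."""
        return cls(**json.loads(raw))


# Sample values are illustrative only.
payload = TaskPayload(
    context="Fix the failing test described in issue #122",
    expected_output="URL of a PR that is ready to merge",
    reply_to="orchestrator",
    deadline_s=300,
)
wire = payload.to_json()
print(wire)
```

A receiver that parses this with `TaskPayload.from_json(wire)` gets scope, acceptance criteria, and reply format without re-deriving them from free text.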
Multi-executor test matrix (upcoming, in priority order)
Current oc_receiver was tested happy-path only. The other three managed modes will likely expose protocol-level gaps the opencode test didn't trigger.
| Executor | Why test | Known risks to watch |
| --- | --- | --- |
| Claude Code (cc_receiver) | Different harness, different JSON-RPC quirks | First-encounter protocol gaps likely. Closely related: see #120 for cross-profile peering work. |
| Codex CLI (codex_receiver) | codex-cli version drift already seen in v0.8.5 | Parse failures on output schema; need a pinned version + documented rationale. Prior failure mode captured in references/codex-receiver-error-parsing-pitfall.md. |
| Antigravity (agy_receiver) | Newer, less battle-tested | Expect different failure modes; baseline not yet established. See #109 (atomic last_stdout + prefix_drifted flag) and #71 (A2A Handshake v2) for the underlying extractor-drift fix path. |
| Hermes-to-Hermes (profile ↔ profile) | Both sides write /transcripts/<peer>.jsonl | Transcript collision risk: A2A-originated turns vs locally-originated turns must be namespaced cleanly. See #120 for the A2A bind-race + cross-profile work this requires. |
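One possible way to keep A2A-originated and locally-originated turns apart in a shared /transcripts/<peer>.jsonl is to tag every record with its origin. The sketch below is an assumption, not the plugin's actual transcript schema: the `origin` field, its two values, and both helper names are hypothetical. It also does not address concurrent writers, which is the bind-race part of #120.

```python
import json
import tempfile
from pathlib import Path

ORIGINS = {"a2a", "local"}  # assumed namespace labels


def append_turn(path: Path, origin: str, text: str) -> None:
    """Append one JSONL record tagged with where the turn came from."""
    if origin not in ORIGINS:
        raise ValueError(f"unknown origin: {origin}")
    record = {"origin": origin, "text": text}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_turns(path: Path, origin: str) -> list[str]:
    """Return the text of every turn recorded under the given origin."""
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r["text"] for r in records if r["origin"] == origin]


# Use a throwaway directory so the example does not touch real transcripts.
with tempfile.TemporaryDirectory() as tmp:
    transcript = Path(tmp) / "peer.jsonl"
    append_turn(transcript, "a2a", "remote brief")
    append_turn(transcript, "local", "typed by the operator")
    a2a_turns = read_turns(transcript, "a2a")
    local_turns = read_turns(transcript, "local")
print(a2a_turns, local_turns)
```

Separate files per origin would work as well; a tagged single file keeps the original turn ordering visible, which may matter when debugging a collision.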
Cross-cutting risks during multi-executor testing
codex-cli version drift: pin a version and document why
Profile misroute in spawners: a peer using the wrong profile means the response silently goes nowhere
300s turn cap: all receivers inherit this; non-oc receivers may need faster models or a finer task breakdown (already validated: model='zai-coding-plan/glm-5-turbo' completes a 9-step dispatch in ~140s, whereas glm-5.1 xhigh times out)
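The 300s turn cap suggests splitting long dispatches before they are sent. The sketch below greedily groups steps so that each batch's estimated duration stays within the cap, with some headroom. Only the 300s figure comes from this issue; the function name `batch_steps`, the 0.8 headroom factor, and the per-step estimates are assumptions.

```python
TURN_CAP_S = 300  # cap stated in this issue


def batch_steps(
    step_estimates_s: list[float],
    cap_s: float = TURN_CAP_S,
    headroom: float = 0.8,
) -> list[list[int]]:
    """Greedily group step indices so each batch fits within cap_s * headroom."""
    limit = cap_s * headroom
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0.0
    for index, estimate in enumerate(step_estimates_s):
        if estimate > limit:
            raise ValueError(f"step {index} alone exceeds the budget of {limit}s")
        # Close the current batch when adding this step would overflow it.
        if current and used + estimate > limit:
            batches.append(current)
            current, used = [], 0.0
        current.append(index)
        used += estimate
    if current:
        batches.append(current)
    return batches


# Nine steps at an assumed 40s each: six fit within the 240s budget,
# and the remaining three go into a second dispatch.
plan = batch_steps([40.0] * 9)
print(plan)
```

Each inner list would become one dispatch, so a slow model gets two shorter turns instead of one turn that times out.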
What is already shipped (do NOT re-flag these as gaps)
workdir auto-injection in role_text_for() ✓ v0.8.15
client.send_message_and_wait() helper at client.py:186 ✓ v0.8.15
Dup-dispatch guard returning JSON-RPC -32001 ✓ v0.8.15
deploy_oc_receiver ✓
A2A_OC_TOKEN_<peer> auth env-var chain ✓
Multi-repo port allocation via allocate_band_port ✓
Acceptance criteria for closing this issue
All four Phase 2 enablers either implemented or explicitly moved to a follow-up issue with a clear blocker
Live fleet_send round-trips validated for all four managed modes against a real coding task (not just PONG); see the "Real-work validation" criterion in the skill's "v0.8.5 — Live test matrix" section
a2a-fleet-deploy updated to v1.0+ with all rows in the test matrix moved to "shipped" or to follow-up issues
Related issues
Local references
a2a-fleet-deploy v0.9.0 (updated 2026-06-23 with the live future-to-do section)
references/v0.8.15-coding-task-retest.md (PR fix: codex model discovery on Python 3.10 (tomllib fallback) NousResearch/hermes-agent#281)
references/codex-receiver-error-parsing-pitfall.md
references/fleet-send-failure-decision-tree.md