# A2A Fleet — multi-executor hardening roadmap (Phase 2 enablers + bulletproofing test matrix)

## Context

After the Phase 1 retest on **a2a_fleet v0.8.15** (commit `2d6c877ca`, 2026-06-23), the plugin is in good shape for **oc_receiver** end-to-end coding work: it drove issues #201 and #122 to merged PRs in `Interstellar-code/hermes-switchui` (PR #276, #281), each in a single turn, using `client.send_message_and_wait()`.

The Phase 1 retest surfaced three real ergonomics gaps, all closed in v0.8.15: workdir auto-injection in `role_text_for()`, the `send_message_and_wait` helper, and the dup-dispatch guard returning JSON-RPC `-32001`.

This issue tracks the **remaining structured gaps** plus an **intentional multi-executor test matrix** for the next phase of hardening. Goal: make the plugin bulletproof regardless of which receiver is on the other end.

## Phase 2 enablers (priority order)

1. **Structured task payload for `fleet_send`** (*blocker for coding-loop integration*). Briefs are currently free text. The coding-loop needs `{context, expected_output, reply_to, deadline_s}` so that scope, acceptance criteria, and reply format are machine-parseable. Right now OpenCode re-derives what the orchestrator already knows. Explicitly deferred in the v0.8.15 commit message.

2. **`response_handler` profile routing**: `response_handler` is currently a global config value, but different peers (claude, codex, opencode, agy, hermes-to-hermes) want different handlers per dispatch.

3. **Per-peer cost / turn budget**: track tokens and turns per peer to prevent runaway usage. Explicitly deferred in the v0.8.15 commit message.

4. **Coding-loop `executor: a2a_opencode` mode**: the coding-loop calls `fleet_send` / `send_message_and_wait` instead of running the agent locally. Same intake guard, same worktree convention, same PR criteria. Depends on enabler 1 above (structured task payload), not on issue #1.
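
To make enabler 1 concrete, here is a minimal sketch of what a structured brief could look like. Only the four field names (`context`, `expected_output`, `reply_to`, `deadline_s`) come from this issue; the `TaskPayload` class, its validation rules, and the JSON wire form are assumptions for illustration, not the plugin's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskPayload:
    """Hypothetical structured brief; only the field names come from the issue."""

    context: str          # scope the orchestrator already knows
    expected_output: str  # acceptance criteria and reply format
    reply_to: str         # where the receiver should report back
    deadline_s: int       # time budget for the turn, in seconds

    def __post_init__(self) -> None:
        # Reject briefs a receiver could not act on.
        if self.deadline_s <= 0:
            raise ValueError("deadline_s must be positive")
        for name in ("context", "expected_output", "reply_to"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")

    def to_json(self) -> str:
        # Stable key order keeps payloads easy to diff in transcripts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TaskPayload":
        # Unknown or missing keys raise TypeError, so schema drift fails loudly.
        return cls(**json.loads(raw))
```

A receiver that parses the brief with `TaskPayload.from_json()` gets the scope and acceptance criteria directly instead of re-deriving them from free text. How such a payload would be attached to `fleet_send` is left open here.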

## Multi-executor test matrix (upcoming, in priority order)

So far, `oc_receiver` has been tested on the happy path only. The other three managed modes will likely expose protocol-level gaps that the opencode test did not trigger.

| Executor | Why test | Known risks to watch |
|---|---|---|
| **Claude Code** (`cc_receiver`) | Different harness, different JSON-RPC quirks | First-encounter protocol gaps likely. Closely related: see #120 for cross-profile peering work. |
| **Codex CLI** (`codex_receiver`) | codex-cli version drift already seen in v0.8.5 | Parse failures on output schema; need a pinned version + documented rationale. Prior failure mode captured in `references/codex-receiver-error-parsing-pitfall.md`. |
| **Antigravity** (`agy_receiver`) | Newer, less battle-tested | Expect different failure modes; baseline not yet established. See #109 (atomic `last_stdout` + `prefix_drifted` flag) and #71 (A2A Handshake v2) for the underlying extractor-drift fix path. |
| **Hermes-to-Hermes** (profile ↔ profile) | Both sides write `/transcripts/<peer>.jsonl` | Transcript collision risk: A2A-originated turns vs locally-originated turns must be namespaced cleanly. See #120 for the A2A bind-race + cross-profile work this requires. |
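
One possible way to keep the Hermes-to-Hermes row's transcript collision testable is to tag every JSONL record with its origin, so both kinds of turn can share `<peer>.jsonl` and still be separated on read. The sketch below assumes this tagging approach; the `origin` field, the `"a2a"` / `"local"` labels, and both helper functions are hypothetical and are not the plugin's current transcript format. It also does not address concurrent writers.

```python
import json
from pathlib import Path

# Hypothetical origin labels; the plugin may namespace turns differently.
ORIGINS = ("a2a", "local")


def append_turn(transcript_dir: Path, peer: str, origin: str, text: str) -> Path:
    """Append one turn to <transcript_dir>/<peer>.jsonl, tagged with its origin."""
    if origin not in ORIGINS:
        raise ValueError(f"unknown origin: {origin!r}")
    path = transcript_dir / f"{peer}.jsonl"
    record = {"origin": origin, "peer": peer, "text": text}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def read_turns(path: Path, origin: str) -> list[dict]:
    """Return only the turns that were written with the given origin."""
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r for r in records if r["origin"] == origin]
```

A collision test could then write interleaved A2A-originated and locally-originated turns for the same peer and assert that `read_turns()` returns each set intact.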

## Cross-cutting risks during multi-executor testing

- **codex-cli version drift**: pin a version and document why.
- **Profile misroute in spawners**: a peer using the wrong profile means the response silently goes nowhere.
- **300s turn cap**: all receivers inherit this; non-oc receivers may need faster models or smaller task breakdowns (already validated: `model='zai-coding-plan/glm-5-turbo'` completes a 9-step dispatch in ~140s, whereas `glm-5.1 xhigh` times out).
- **Hermes-to-Hermes transcript collision**: verify that namespacing cleanly separates A2A-originated from locally-originated turns.
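
For the 300s turn cap, a shorter task breakdown can be planned ahead of dispatch. The sketch below greedily packs ordered steps into batches that each fit under the cap with a safety margin. Only the 300s figure comes from this issue; `plan_dispatches`, the per-step estimates, and the 0.8 margin are hypothetical.

```python
TURN_CAP_S = 300  # turn cap inherited by all receivers (from the list above)


def plan_dispatches(step_estimates_s, cap_s=TURN_CAP_S, margin=0.8):
    """Group ordered steps into batches whose estimated time fits one turn.

    step_estimates_s: estimated seconds per step, in execution order.
    Returns a list of batches, each a list of step indices.
    """
    budget = cap_s * margin  # leave headroom for model and network variance
    batches, current, used = [], [], 0.0
    for i, est in enumerate(step_estimates_s):
        if est > budget:
            raise ValueError(f"step {i} ({est}s) exceeds the per-turn budget of {budget}s")
        # Start a new batch when adding this step would overflow the budget.
        if current and used + est > budget:
            batches.append(current)
            current, used = [], 0.0
        current.append(i)
        used += est
    if current:
        batches.append(current)
    return batches
```

With these assumed defaults, three 100s steps split into two dispatches (`[[0, 1], [2]]`), while nine steps of about 16s each stay in a single dispatch.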

## What is *already shipped* (do NOT re-flag these as gaps)

- `workdir` auto-injection in `role_text_for()` ✓ v0.8.15
- `client.send_message_and_wait()` helper at `client.py:186` ✓ v0.8.15
- dup-dispatch guard returning JSON-RPC `-32001` ✓ v0.8.15
- One-line `deploy_oc_receiver` ✓
- Per-mode port bands (9300 / 9310 / 9320 / 9330) ✓
- `A2A_OC_TOKEN_<peer>` auth env-var chain ✓
- Transcript persistence ✓
- Multi-repo port allocation via `allocate_band_port` ✓
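
On the caller side, the dup-dispatch guard can be told apart from other failures by its error code. The sketch below assumes the receiver replies with a standard JSON-RPC 2.0 error object. Only the `-32001` code comes from this issue; `classify_response` and its return labels are hypothetical and not part of `client.py`.

```python
import json

DUP_DISPATCH_CODE = -32001  # code returned by the dup-dispatch guard (from the list above)


def classify_response(raw: str) -> str:
    """Return 'ok', 'duplicate', or 'error' for a JSON-RPC 2.0 response body."""
    msg = json.loads(raw)
    error = msg.get("error")
    if error is None:
        return "ok"  # a success response carries "result" and no "error"
    if error.get("code") == DUP_DISPATCH_CODE:
        return "duplicate"  # same task already dispatched; do not retry blindly
    return "error"
```

Treating `duplicate` separately lets an orchestrator wait on the in-flight task instead of counting the reply as a failed dispatch.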

## Acceptance criteria for closing this issue

- All four Phase 2 enablers either implemented or explicitly moved to a follow-up issue with a clear blocker
- Live `fleet_send` round-trips validated for **all four** managed modes against a real coding task (not just PONG) — see "Real-work validation" criterion in the skill's "v0.8.5 — Live test matrix" section
- At least one cross-profile Hermes↔Hermes exchange closed (depends on #120)
- Cross-cutting risks documented as either "tested" or filed as discrete follow-up issues
- Skill `a2a-fleet-deploy` updated to v1.0+ with all rows in the test matrix moved to "shipped" or to follow-up issues

## Related issues

- #120 — Enable Hermes↔Hermes A2A peering (response_handler: agent across profiles) + fix A2A bind-race
- #109 — Proposal: atomic last_stdout persistence + prefix_drifted flag
- #71 — [RFC] A2A Bidirectional Session Handshake: Orchestrator-Worker Protocol v2
- #108 — (referenced by #109) agy receiver extractor drift
- #97 — codex CLI drift
- #98 — HERMES_HOME inheritance

## Local references

- Skill: `a2a-fleet-deploy` v0.9.0 (updated 2026-06-23 with the live future-to-do section)
- Worked example for v0.8.15 single-turn dispatch: `references/v0.8.15-coding-task-retest.md` (PR #281)
- Codex output parsing pitfall: `references/codex-receiver-error-parsing-pitfall.md`
- Failure-mode decision tree: `references/fleet-send-failure-decision-tree.md`

