
A2A Fleet — multi-executor hardening roadmap (Phase 2 enablers + bulletproofing test matrix) #146


@Interstellar-code


Context

After the Phase 1 retest on a2a_fleet v0.8.15 (commit 2d6c877ca, 2026-06-23), the plugin is in good shape for oc_receiver end-to-end coding work: it drove issues NousResearch#201 and #122 to merged PRs in Interstellar-code/hermes-switchui (PRs NousResearch#276 and NousResearch#281), each in a single turn using client.send_message_and_wait().

The Phase 1 retest surfaced three real ergonomics gaps, all closed in v0.8.15: workdir auto-injection in role_text_for(), the send_message_and_wait helper, and a dup-dispatch guard that returns JSON-RPC error -32001.
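As a minimal sketch of how a dup-dispatch guard of this kind can answer, assuming it keys on a task id (this issue does not say what v0.8.15 actually keys on), the Python below returns a JSON-RPC 2.0 error object with code -32001. That code sits in the -32000 to -32099 range the JSON-RPC 2.0 spec reserves for implementation-defined server errors. DispatchGuard, try_begin, finish, and the error message text are hypothetical names, not the plugin's API.

```python
import threading
from typing import Any, Optional

# JSON-RPC 2.0 reserves -32000..-32099 for implementation-defined server errors.
DUP_DISPATCH_CODE = -32001


class DispatchGuard:
    """Hypothetical guard: rejects a second dispatch for a task still in flight."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: set[str] = set()

    def try_begin(self, request_id: Any, task_id: str) -> Optional[dict]:
        """Return None if the dispatch may proceed, else a JSON-RPC error response."""
        with self._lock:
            if task_id in self._in_flight:
                return {
                    "jsonrpc": "2.0",
                    "id": request_id,
                    "error": {
                        "code": DUP_DISPATCH_CODE,
                        "message": "duplicate dispatch",  # assumed wording
                        "data": {"task_id": task_id},
                    },
                }
            self._in_flight.add(task_id)
            return None

    def finish(self, task_id: str) -> None:
        """Release the task id once the turn completes (or fails)."""
        with self._lock:
            self._in_flight.discard(task_id)
```

Returning an error object instead of raising keeps the guard usable directly inside a JSON-RPC request handler, and the lock makes the check-and-insert atomic when two dispatches race.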

This issue tracks the remaining structured gaps plus an intentional multi-executor test matrix for the next phase of hardening. Goal: make the plugin bulletproof regardless of which receiver is on the other end.

Phase 2 enablers (priority order)

  1. Structured task payload for fleet_send (blocker for coding-loop integration). Current briefs are free text. The coding-loop needs {context, expected_output, reply_to, deadline_s} so that scope, acceptance criteria, and reply format are machine-parseable; today OpenCode re-derives what the orchestrator already knows. Explicitly deferred in the v0.8.15 commit message.

  2. response_handler profile routing. response_handler is currently a global config value, but different peers (claude, codex, opencode, agy, hermes-to-hermes) need different handlers per dispatch.

  3. Per-peer cost / turn budget: track tokens and turns per peer to prevent runaway loops. Explicitly deferred in the v0.8.15 commit message.

  4. Coding-loop executor: a2a_opencode mode. The coding-loop calls fleet_send / send_message_and_wait instead of running the agent locally, with the same intake guard, worktree convention, and PR criteria. Blocked on #1 (feat(workflow-engine): plugin contract refactor, phases 1-6).
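Enabler 1 could take roughly the following shape. This is a sketch under assumptions, not the planned schema: TaskPayload and TURN_CAP_S are hypothetical names, and bounding deadline_s by the 300s turn cap (listed under cross-cutting risks) is an assumption. Only the four field names come from this issue.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: deadline_s may not exceed the 300 s per-turn cap receivers inherit.
TURN_CAP_S = 300
FIELDS = ("context", "expected_output", "reply_to", "deadline_s")


@dataclass(frozen=True)
class TaskPayload:
    """Hypothetical structured brief for fleet_send."""
    context: str          # scope: repo, issue, files in play
    expected_output: str  # acceptance criteria the receiver must meet
    reply_to: str         # where / in what format to report the result
    deadline_s: int       # wall-clock budget for the turn, in seconds

    def __post_init__(self) -> None:
        # Reject empty text fields so the receiver never has to guess scope.
        for name in ("context", "expected_output", "reply_to"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")
        if not 0 < self.deadline_s <= TURN_CAP_S:
            raise ValueError(f"deadline_s must be in (0, {TURN_CAP_S}]")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "TaskPayload":
        data = json.loads(raw)
        # Unknown keys are ignored; missing keys raise KeyError.
        return cls(**{k: data[k] for k in FIELDS})
```

Validating in __post_init__ means a malformed brief fails on the orchestrator side before dispatch, rather than surfacing as a confused receiver turn.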
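For enabler 2, one plausible precedence is: a per-dispatch override, then a per-peer profile, then the existing global value. The function and mapping below are hypothetical; only the handler value "agent" appears in this issue (via #120), and the other handler names are placeholders.

```python
from typing import Mapping, Optional


def resolve_response_handler(
    peer: str,
    global_default: str,
    peer_profiles: Mapping[str, str],
    dispatch_override: Optional[str] = None,
) -> str:
    """Hypothetical resolver: most specific setting wins."""
    if dispatch_override:
        return dispatch_override           # set on this one dispatch
    return peer_profiles.get(peer, global_default)  # per-peer, else global


# Placeholder profile table; real handler names would come from config.
PEER_PROFILES = {"hermes": "agent", "codex": "codex_handler"}
```

Falling back to the global value keeps today's behaviour for any peer that has no profile entry.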
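For enabler 3, a per-peer ledger can be checked before each dispatch and updated after each turn. This is a minimal sketch assuming token counts are available per turn; PeerBudgetTracker, Budget, BudgetExceeded, and the limits are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_tokens: int
    max_turns: int


class BudgetExceeded(RuntimeError):
    pass


class PeerBudgetTracker:
    """Hypothetical per-peer ledger: check() before dispatch, record() after."""

    def __init__(self, limits: dict[str, Budget], default: Budget) -> None:
        self._limits = limits
        self._default = default
        self._tokens: dict[str, int] = defaultdict(int)
        self._turns: dict[str, int] = defaultdict(int)

    def _limit(self, peer: str) -> Budget:
        return self._limits.get(peer, self._default)

    def check(self, peer: str) -> None:
        """Raise before dispatch if the peer has already spent its budget."""
        limit = self._limit(peer)
        if self._turns[peer] >= limit.max_turns:
            raise BudgetExceeded(f"{peer}: turn budget {limit.max_turns} exhausted")
        if self._tokens[peer] >= limit.max_tokens:
            raise BudgetExceeded(f"{peer}: token budget {limit.max_tokens} exhausted")

    def record(self, peer: str, tokens_used: int) -> None:
        """Charge one completed turn and its token usage to the peer."""
        self._turns[peer] += 1
        self._tokens[peer] += tokens_used
```

Checking before dispatch (rather than after) stops a runaway peer at the next turn boundary instead of letting one more turn run over budget.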

Multi-executor test matrix (upcoming, in priority order)

The current oc_receiver was tested on the happy path only. The other three managed modes will likely expose protocol-level gaps that the opencode test didn't trigger.

| Executor | Why test | Known risks to watch |
|---|---|---|
| Claude Code (cc_receiver) | Different harness, different JSON-RPC quirks | First-encounter protocol gaps likely. Closely related: see #120 for cross-profile peering work. |
| Codex CLI (codex_receiver) | codex-cli version drift already seen in v0.8.5 | Parse failures on output schema; need a pinned version + documented rationale. Prior failure mode captured in references/codex-receiver-error-parsing-pitfall.md. |
| Antigravity (agy_receiver) | Newer, less battle-tested | Expect different failure modes; baseline not yet established. See #109 (atomic last_stdout + prefix_drifted flag) and #71 (A2A Handshake v2) for the underlying extractor-drift fix path. |
| Hermes-to-Hermes (profile ↔ profile) | Both sides write /transcripts/<peer>.jsonl | Transcript collision risk: A2A-originated turns vs locally-originated turns must be namespaced cleanly. See #120 for the A2A bind-race + cross-profile work this requires. |

Cross-cutting risks during multi-executor testing

  • codex-cli version drift — pin a version, document why
  • Profile misroute in spawners — peer using wrong profile → response goes nowhere silently
  • 300s turn cap: all receivers inherit it, so non-oc receivers may need faster models or a finer task breakdown (already validated: model='zai-coding-plan/glm-5-turbo' completes a 9-step dispatch in ~140s, whereas glm-5.1 xhigh times out)
  • Hermes-to-Hermes transcript collision — verify namespacing handles A2A-originated vs locally-originated turns cleanly
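For the Hermes-to-Hermes collision risk, one way to namespace turns is to tag every JSONL record with its origin and give each (origin, peer) pair its own turn counter, so an A2A turn and a local turn can never share an id. This is an illustrative sketch; TranscriptNamespacer, turns_from, and the record fields are hypothetical and are not the plugin's transcript format.

```python
import itertools
import json
from collections import defaultdict
from typing import Iterable, Iterator

ORIGINS = ("a2a", "local")


class TranscriptNamespacer:
    """Hypothetical: one counter per (origin, peer) so turn ids never collide."""

    def __init__(self) -> None:
        # defaultdict calls itertools.count() -> a fresh counter starting at 0.
        self._counters = defaultdict(itertools.count)

    def record(self, peer: str, origin: str, role: str, text: str) -> str:
        """Return one JSONL line for <peer>.jsonl with a namespaced turn_id."""
        if origin not in ORIGINS:
            raise ValueError(f"unknown origin {origin!r}")
        n = next(self._counters[(origin, peer)])
        entry = {
            "turn_id": f"{origin}:{peer}:{n}",
            "origin": origin,
            "peer": peer,
            "role": role,
            "text": text,
        }
        return json.dumps(entry, sort_keys=True)


def turns_from(lines: Iterable[str], origin: str) -> Iterator[dict]:
    """Replay only the turns that came from one origin."""
    for line in lines:
        entry = json.loads(line)
        if entry["origin"] == origin:
            yield entry
```

Because the origin is both a field and part of the id, a reader can separate the two streams even if both sides append to the same file.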

What is already shipped (do NOT re-flag these as gaps)

  • workdir auto-injection in role_text_for() ✓ v0.8.15
  • client.send_message_and_wait() helper at client.py:186 ✓ v0.8.15
  • dup-dispatch guard returning JSON-RPC -32001 ✓ v0.8.15
  • One-line deploy_oc_receiver
  • Per-mode port bands (9300 / 9310 / 9320 / 9330) ✓
  • A2A_OC_TOKEN_<peer> auth env-var chain ✓
  • Transcript persistence ✓
  • Multi-repo port allocation via allocate_band_port
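To make the port-band idea concrete, here is a hedged sketch of picking a free port inside one band. It is not allocate_band_port: pick_band_port is a hypothetical stand-in, the 10-port band width is an assumption inferred from the 9300 / 9310 / 9320 / 9330 spacing, and the mode-to-band mapping is left out because this issue does not state it.

```python
from typing import Collection

# Band bases listed in this issue; which mode owns which band is not shown here.
BAND_BASES = (9300, 9310, 9320, 9330)
BAND_WIDTH = 10  # assumption: each band spans 10 ports, matching the spacing


def pick_band_port(band_base: int, in_use: Collection[int]) -> int:
    """Hypothetical: return the lowest port in the band not already claimed."""
    if band_base not in BAND_BASES:
        raise ValueError(f"{band_base} is not a known band base")
    for port in range(band_base, band_base + BAND_WIDTH):
        if port not in in_use:
            return port
    raise RuntimeError(f"band {band_base} exhausted ({BAND_WIDTH} ports in use)")
```

Here in_use stands for ports already claimed by other repos' receivers; a real allocator would also need to confirm the OS can actually bind the chosen port.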

Acceptance criteria for closing this issue

  • All four Phase 2 enablers either implemented or explicitly moved to a follow-up issue with a clear blocker
  • Live fleet_send round-trips validated for all four managed modes against a real coding task (not just PONG) — see "Real-work validation" criterion in the skill's "v0.8.5 — Live test matrix" section
  • At least one cross-profile Hermes↔Hermes exchange closed (depends on #120: Enable Hermes↔Hermes A2A peering (response_handler: agent across profiles) + fix A2A bind-race)
  • Cross-cutting risks documented as either "tested" or filed as discrete follow-up issues
  • Skill a2a-fleet-deploy updated to v1.0+ with all rows in the test matrix moved to "shipped" or to follow-up issues

Related issues

Local references
