Matrix Coder workflows (ralph/autopilot/ultrawork/ultraqa) are persona text only — no programmatic loop, iteration cap, or cross-turn state #129

Description

@Interstellar-code

Epic: #76
Affected phases: #115 (Phase 3 — Workflow/loop skills)

What was claimed

Phase 3 (#115) stated:

A user can run an end-to-end autonomous coding loop (e.g. autopilot or Ralph) that self-corrects via verify and is observable on Switch UI.

The Phase 3 scope listed concrete loop semantics:

  • Ralph — repeat `executor → verify` until verification passes, bounded, with stop criteria
  • autopilot — full chain `plan → executor → test → review → verify` end-to-end
  • ultrawork — parallel high-throughput fan-out of roles across files/topics via `delegate_task`
  • ultraqa — `test → verify → fix` cycle until the suite is green or the 5-cycle cap is reached
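Written down as data, each claimed behaviour is an ordered role sequence plus a loop flag and a cap, which makes it easier to see what a driver would have to enforce. This is a hypothetical sketch: `WorkflowSpec` and `WORKFLOWS` do not exist in the plugin, the role list for `ultrawork` is a placeholder (the scope does not say which roles it fans out), and the cap of 5 on `ralph` assumes the default cap proposed for the loop driver rather than anything the Ralph persona states.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WorkflowSpec:
    """Hypothetical declarative description of one Phase 3 workflow."""

    name: str
    steps: Tuple[str, ...]   # specialist roles, in dispatch order
    loop: bool = False       # repeat the steps until the stop condition holds
    parallel: bool = False   # fan the roles out concurrently instead of chaining
    max_iterations: int = 1  # hard cap on how many times the steps may run


# Role sequences taken from the Phase 3 scope; caps and ultrawork roles are assumptions.
WORKFLOWS: Dict[str, WorkflowSpec] = {
    # Assumed cap: the scope only says "bounded".
    "ralph": WorkflowSpec("ralph", ("executor", "verify"), loop=True, max_iterations=5),
    "autopilot": WorkflowSpec("autopilot", ("plan", "executor", "test", "review", "verify")),
    # Placeholder role: the scope does not name the roles ultrawork fans out.
    "ultrawork": WorkflowSpec("ultrawork", ("executor",), parallel=True),
    "ultraqa": WorkflowSpec("ultraqa", ("test", "verify", "fix"), loop=True, max_iterations=5),
}
```

None of these fields has a programmatic counterpart in `core/` today; the persona files express them only as instructions to the model.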

What actually exists

All four workflow `.md` persona files are present and well-written:

  • `plugins/matrix_coder/personas/workflows/ralph.md`
  • `plugins/matrix_coder/personas/workflows/autopilot.md`
  • `plugins/matrix_coder/personas/workflows/ultrawork.md`
  • `plugins/matrix_coder/personas/workflows/ultraqa.md`

When a user sends `matrix ralph: fix auth tests`, `core/harness.py` loads `ralph.md` via `registry.load_persona()`, composes it with the base contracts, and injects the resulting text into the current turn via the `pre_llm_call` hook. The agent then reads procedural instructions and may attempt to self-direct.
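For contrast, the path just described fits in one function that returns a string. This minimal model assumes the `matrix <workflow>: <goal>` trigger shape from the example; `PERSONAS`, `BASE_CONTRACTS`, `TRIGGER`, and this `pre_llm_call` signature are hypothetical simplifications, not the real `core/harness.py`, `registry.load_persona()`, or hook API.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the four persona files and the base contracts.
PERSONAS = {
    "ralph": "# Ralph\nRepeat executor -> verify until verification passes.",
    "autopilot": "# Autopilot\nRun plan -> executor -> test -> review -> verify.",
    "ultrawork": "# Ultrawork\nFan roles out across files with delegate_task.",
    "ultraqa": "# UltraQA\nCycle test -> verify -> fix, at most 5 times.",
}
BASE_CONTRACTS = "# Base contracts\nFollow the Matrix Coder contracts."

# Assumed trigger shape: "matrix <workflow>: <goal>".
TRIGGER = re.compile(
    r"^matrix\s+(ralph|autopilot|ultrawork|ultraqa):\s*(.+)$", re.IGNORECASE
)


def pre_llm_call(message: str) -> Optional[str]:
    """Return text to inject into this turn, or None for a non-trigger turn."""
    match = TRIGGER.match(message.strip())
    if match is None:
        return None  # nothing injected, and nothing remembered from earlier turns
    workflow, goal = match.group(1).lower(), match.group(2)
    # Compose contracts + persona + goal into one string; no loop, counter, or state.
    return f"{BASE_CONTRACTS}\n\n{PERSONAS[workflow]}\n\nGoal: {goal}"
```

Calling it with `matrix ralph: fix auth tests` returns a string containing `Ralph`, which is the property `test_real_loader_workflow_composes` asserts; calling it with any unrelated follow-up message returns `None`, so the workflow context is gone on the next turn.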

Nothing more happens programmatically. Specifically:

  • No Python code in `core/` implements any loop, gate-progression, or iteration
  • `delegate_task` appears only inside persona text as an instruction to the agent — no code calls it
  • There is no iteration counter, no 5-cycle cap, no stop-criteria check
  • There is no cross-turn state: if the agent produces a response and the next user message is unrelated, the workflow context is gone
  • The `_inject_persona` hook clears after every non-trigger turn — a multi-turn loop would need to re-trigger or persist state
  • The only test for workflows is `test_real_loader_workflow_composes`, which asserts `"Ralph" in out` (persona text was injected), not that any loop structure ran

The existing code is correct for single-turn persona injection. It is not an implementation of the loop/orchestration semantics Phase 3 described.

What "done" looks like

A complete Phase 3 implementation requires at minimum:

  1. Loop driver in `core/harness.py` — a `run_workflow(workflow, goal, session_id)` function that:

    • Dispatches the appropriate sequence of specialist roles in order (or in parallel for ultrawork)
    • Tracks iteration count and enforces the configured cap (default 5)
    • Checks a stop condition (e.g. `verify` returns pass) between iterations
    • Returns a `SpecialistResult` summarising the final outcome
  2. Cross-turn state — workflow sessions need to survive across turns. Options: store active workflow state in `HermesBridge` (keyed by session_id), or require the full loop to complete within a single invocation via `delegate_task`.

  3. `delegate_task` integration — `ultrawork` fan-out requires actual calls to `delegate_task` from Python, not just instructing the agent to do so in persona text.

  4. Kanban child cards per iteration — Phase 2 (Matrix Coder Phase 2 — Observability: Kanban audit mirror #114) specified child cards per specialist dispatch; currently only a single parent card is created per invocation. Loop iterations should open child cards.

  5. Tests — test that the loop iterates, that the cap fires, that a passing `verify` step halts the loop, and that a blocked gate surfaces correctly.
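The requirements above can be sketched together as one driver. This is a minimal sketch under stated assumptions: `dispatch` is a hypothetical callable standing in for `delegate_task` and returning `"pass"`, `"fail"`, or `"blocked"`; the extra `steps`, `dispatch`, `max_iterations`, and `parallel` parameters extend the proposed `run_workflow(workflow, goal, session_id)` signature so the loop can be tested without a model; the `SpecialistResult` fields, the `(iteration, role, status)` child-card tuples, and the in-memory `SESSIONS` dict (a stand-in for state held in `HermesBridge`) are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical dispatcher: runs one role against the goal and returns
# "pass", "fail", or "blocked". In the plugin this would wrap delegate_task.
Dispatch = Callable[[str, str], str]
Card = Tuple[int, str, str]  # (iteration, role, status) for one child card


@dataclass
class SpecialistResult:
    # Field names are assumptions; the issue only names the type.
    workflow: str
    status: str  # "passed", "blocked", or "capped"
    iterations: int
    child_cards: List[Card]


@dataclass
class WorkflowState:
    workflow: str
    iteration: int = 0
    child_cards: List[Card] = field(default_factory=list)


# In-memory stand-in for state keyed by session_id (e.g. held by HermesBridge).
SESSIONS: Dict[str, WorkflowState] = {}


def _stop_condition_met(outcomes: List[Tuple[str, str]]) -> bool:
    """True if the last verify passed, or every role passed when no verify ran."""
    verdicts = [status for role, status in outcomes if role == "verify"]
    if verdicts:
        return verdicts[-1] == "pass"
    return all(status == "pass" for _, status in outcomes)


def run_workflow(workflow: str, goal: str, session_id: str,
                 steps: Sequence[str], dispatch: Dispatch, *,
                 max_iterations: int = 5, parallel: bool = False) -> SpecialistResult:
    # Resume an interrupted session for this id, or start a fresh one.
    state = SESSIONS.setdefault(session_id, WorkflowState(workflow))
    status = "capped"  # outcome if the loop exhausts max_iterations
    while state.iteration < max_iterations:
        state.iteration += 1
        if parallel:
            # ultrawork-style fan-out: dispatch every role concurrently.
            with ThreadPoolExecutor() as pool:
                statuses = list(pool.map(lambda role: dispatch(role, goal), steps))
            outcomes = list(zip(steps, statuses))
        else:
            outcomes = []
            for role in steps:
                result = dispatch(role, goal)
                outcomes.append((role, result))
                # A blocked gate or a passing verify decides this iteration.
                if result == "blocked" or (role == "verify" and result == "pass"):
                    break
        # One child card per specialist dispatch, tagged with its iteration.
        state.child_cards.extend((state.iteration, role, s) for role, s in outcomes)
        if any(s == "blocked" for _, s in outcomes):
            status = "blocked"
            break
        if _stop_condition_met(outcomes):
            status = "passed"
            break
        # Otherwise (e.g. verify failed) run the steps again, up to the cap.
    SESSIONS.pop(session_id, None)  # finished sessions leave no state behind
    return SpecialistResult(workflow, status, state.iteration, state.child_cards)
```

With this sketch, a `ralph` run whose `verify` fails twice and then passes ends with `status == "passed"` after 3 iterations and 6 child cards, while a `verify` that never passes stops at the cap with `status == "capped"`. Keeping the stop check in Python rather than in persona prose is what makes those outcomes testable.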

Impact

Users invoking `matrix ralph`, `matrix autopilot`, `matrix ultrawork`, or `matrix ultraqa` get the workflow persona injected for that turn and the agent may attempt to self-direct — but there is no guarantee of gate-by-gate progression, no iteration cap, no automatic stop on success, and no multi-turn coordination. The behaviour depends entirely on the model following the persona instructions in a single context window.

Metadata

  • Assignees: no one assigned
  • Labels: bug (Something isn't working), plugin (Plugin-related)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests