- Demo is deployed and accessible at `/demos/subagents` on the dashboard host
- Agent backend is healthy (`/api/health`); `ANTHROPIC_API_KEY` is set on Railway; the FastAPI backend exposes `POST /subagents`
- Navigate to `/demos/subagents`; verify the page renders within 3s with a left-side delegation log and a right-side `CopilotChat` pane
- Verify `data-testid="delegation-log"` is visible with heading "Sub-agent delegations"
- Verify `data-testid="delegation-count"` reads "0 calls" on first load
- Verify the empty-state placeholder reads "Ask the supervisor to complete a task. Every sub-agent it calls will appear here."
- Verify the chat input placeholder is "Give the supervisor a task..."
- Verify all 3 suggestion pills are visible with verbatim titles: "Write a blog post", "Explain a topic", "Summarize a topic"
- Click the "Write a blog post" suggestion (cold-exposure training prompt)
- Within 5s verify `data-testid="supervisor-running"` appears with the "Supervisor running" pulse indicator
- Within 30s verify at least one `data-testid="delegation-entry"` appears with badge "Research" and `data-testid="delegation-status"` initially reading `running`
- Verify the entry's status flips to `completed` once the sub-agent returns, and that the `result` text is visible inside the entry's white inner panel
- Within 60s total verify additional delegation entries appear in order: Research -> Writing -> Critique (3 entries total in most cases)
- Verify `data-testid="delegation-count"` updates to match the number of entries (e.g. "3 calls")
- Verify the supervisor's final chat reply includes a brief summary of the produced deliverable
- Each `data-testid="delegation-entry"` shows: a `#N` index, a sub-agent badge with the correct emoji (🔎 Research / ✍️ Writing / 🧐 Critique), a status chip, the task text after "Task:", and the sub-agent's result rendered with whitespace preserved
- Hover the "Supervisor running" chip while a delegation is in flight; verify the pulse animation is present (no static-only state)
- Click "Explain a topic" (LLM tool calling prompt) and wait for completion
- Verify the writing entry's `result` is a single polished paragraph (the writing sub-agent's signature)
- Verify the research entry's `result` is a bulleted list of 3-5 facts (the research sub-agent's signature)
- Verify the critique entry's `result` contains 2-3 actionable critiques
- After the first run completes, click "Summarize a topic" (reusable rockets)
- Verify NEW delegation entries are appended to the existing list (count keeps growing); this confirms `state["delegations"]` accumulates across turns within the same thread
- Send an empty message; verify it is a no-op
- If a sub-agent call fails (e.g. due to upstream rate limit), verify the failing entry is rendered with status `failed` and a result line starting with "sub-agent call failed:"; this confirms the fail-loud path
- Verify DevTools -> Console shows no uncaught errors during any flow above
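
The result-shape and fail-loud checks above can be approximated in code when scripting this checklist. The following is a minimal sketch, not part of the demo: the helper names are hypothetical, the "single paragraph" and "bulleted list" heuristics are assumptions about how the sub-agents format their output, and only the 3-5 range and the "sub-agent call failed:" prefix come from the steps above.

```python
import re

# Prefix quoted from the checklist; everything else here is a hypothetical helper.
FAILURE_PREFIX = "sub-agent call failed:"

# Assumption: list items start with '-', '*', '•', or a number like '1.' / '1)'.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")


def bullet_lines(text: str) -> list[str]:
    """Return the lines of `text` that look like list items."""
    return [line for line in text.splitlines() if _BULLET.match(line)]


def looks_like_research(result: str) -> bool:
    """Research signature: a bulleted list of 3-5 facts."""
    return 3 <= len(bullet_lines(result)) <= 5


def looks_like_writing(result: str) -> bool:
    """Writing signature: one paragraph (no blank-line breaks, no list items)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", result.strip()) if p.strip()]
    return len(paragraphs) == 1 and not bullet_lines(result)


def is_fail_loud(status: str, result: str) -> bool:
    """Fail-loud path: status is 'failed' and the result starts with the prefix."""
    return status == "failed" and result.lstrip().startswith(FAILURE_PREFIX)
```

These heuristics only check shape. Whether a paragraph is "polished" or a critique is "actionable" still needs a human reviewer.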
- Page loads within 3 seconds
- First delegation entry appears within 30 seconds of submitting a non-trivial task
- Each delegation entry transitions from `running` -> `completed` (or `failed`) and the count badge stays in sync with `state["delegations"].length`
- Supervisor's final chat reply summarises the work and arrives within 90 seconds of submission
- No UI layout breaks, no uncaught console errors
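
The status-transition and count-sync criteria above can be expressed as two small checks over values scraped from the page. This is a sketch under assumptions: the function names are hypothetical, the status history is assumed to come from polling `data-testid="delegation-status"` over time, and accepting a singular "1 call" label is a guess, since the checklist only shows "0 calls" and "3 calls".

```python
import re

# Terminal statuses named in the checklist.
TERMINAL = {"completed", "failed"}


def valid_status_history(history: list[str]) -> bool:
    """True if polled statuses read running -> completed or running -> failed."""
    if not history or history[0] != "running":
        return False
    # Collapse consecutive duplicates, since polling repeats the same status.
    collapsed = [s for i, s in enumerate(history) if i == 0 or s != history[i - 1]]
    return len(collapsed) == 2 and collapsed[1] in TERMINAL


def count_from_label(label: str) -> int:
    """Parse the badge text, e.g. '3 calls' -> 3."""
    match = re.match(r"^\s*(\d+)\s+calls?\s*$", label)
    if match is None:
        raise ValueError(f"unexpected count label: {label!r}")
    return int(match.group(1))


def badge_in_sync(label: str, delegations: list) -> bool:
    """True if the badge number equals the number of delegation entries."""
    return count_from_label(label) == len(delegations)
```

For example, `valid_status_history(["running", "running", "completed"])` passes, while a history that never leaves `running` or that changes after reaching a terminal status is rejected.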