Problem
startClaudeAgent() in src/server/claude-agent.ts (~lines 254-264) returns { ok: true, message: 'starting' } immediately after spawning the gateway process, without waiting for the health probe to pass. If the gateway then fails to come up healthy, the caller/UI sees ok:true and can appear stuck or static with no error surfaced.
Background
This is the residual hardening noted when closing #198 / #199 / #200. Those issues shared a root cause — the hermes vs claude binary-name mismatch + a hardcoded port 8642 — which was fixed in v2.3.30 (commit 74bb80fc). In the normal case the gateway now starts, so this optimistic-success path is not hit. But the false-positive-success behavior itself was not changed, so a genuine startup failure still presents as a silent stuck state rather than an actionable error.
Proposed fix
After spawning, poll isClaudeAgentHealthy(resolveGatewayUrl()) (both helpers already exist as of v2.3.30) with a bounded timeout:
- resolves healthy within timeout → return { ok: true, message: 'started' }
- never healthy → return { ok: false, error: '<actionable message: gateway spawned but did not become healthy on <url> within <N>s>' }
so the UI can show a real error / retry affordance instead of hanging on an optimistic success.
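
The bounded poll could look roughly like the sketch below. This is an illustration, not the existing code: waitForHealthy, WaitOptions, timeoutMs and intervalMs are hypothetical names, and the probe is assumed to return a Promise<boolean>, which may not match the real isClaudeAgentHealthy signature.

```typescript
// Sketch only: waitForHealthy and its option names are hypothetical,
// not part of the current claude-agent.ts.
type StartResult =
  | { ok: true; message: string }
  | { ok: false; error: string };

interface WaitOptions {
  timeoutMs: number;  // upper bound on how long to keep polling
  intervalMs: number; // pause between probes
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForHealthy(
  url: string,
  probe: (url: string) => Promise<boolean>, // assumed shape of the health probe
  { timeoutMs, intervalMs }: WaitOptions,
): Promise<StartResult> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A rejected probe (e.g. connection refused) counts as "not healthy yet".
    const healthy = await probe(url).catch(() => false);
    if (healthy) return { ok: true, message: 'started' };

    const remaining = deadline - Date.now();
    if (remaining <= 0) break;
    await sleep(Math.min(intervalMs, remaining));
  }
  return {
    ok: false,
    error: `gateway spawned but did not become healthy on ${url} within ${timeoutMs / 1000}s`,
  };
}
```

Under those assumptions, startClaudeAgent() would return waitForHealthy(resolveGatewayUrl(), isClaudeAgentHealthy, { ... }) after spawning instead of the immediate { ok: true, message: 'starting' }. Note that the deadline is only checked between probes, so a probe that hangs would also need its own request timeout.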
Acceptance
- A gateway that spawns but never serves /health yields ok:false with a clear error, not ok:true.
- The happy path (gateway becomes healthy) is unchanged.
- Covered by a unit test (spawn stub + health probe that never passes → ok:false within timeout).
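
One possible shape for that unit test is sketched here without a test framework. It assumes a hypothetical dependency-injected variant (startWithDeps, StartDeps) so that the spawn and the health probe can be stubbed; the real startClaudeAgent() may expose a different seam, and the URL is a placeholder.

```typescript
// Hypothetical test seam: the real startClaudeAgent() may not accept injected dependencies.
type StartResult =
  | { ok: true; message: string }
  | { ok: false; error: string };

interface StartDeps {
  spawnGateway: () => void;                      // stub for the process spawn
  isHealthy: (url: string) => Promise<boolean>;  // stub for the health probe
  url: string;
  timeoutMs: number;
  intervalMs: number;
}

async function startWithDeps(deps: StartDeps): Promise<StartResult> {
  deps.spawnGateway();
  const deadline = Date.now() + deps.timeoutMs;
  while (Date.now() < deadline) {
    // A rejected probe is treated the same as an unhealthy one.
    if (await deps.isHealthy(deps.url).catch(() => false)) {
      return { ok: true, message: 'started' };
    }
    await new Promise<void>((resolve) => setTimeout(resolve, deps.intervalMs));
  }
  return {
    ok: false,
    error: `gateway spawned but did not become healthy on ${deps.url} within ${deps.timeoutMs / 1000}s`,
  };
}

// Spawn stub + probe that never passes: expect ok:false within the bounded timeout.
async function testNeverHealthy(): Promise<void> {
  let spawned = 0;
  const startedAt = Date.now();
  const result = await startWithDeps({
    spawnGateway: () => { spawned += 1; },
    isHealthy: async () => false,
    url: 'http://gateway.invalid', // placeholder URL, never contacted by the stub
    timeoutMs: 100,
    intervalMs: 10,
  });
  const elapsedMs = Date.now() - startedAt;

  if (spawned !== 1) throw new Error('spawn stub should be called exactly once');
  if (result.ok) throw new Error('expected ok:false when the probe never passes');
  if (!result.error.includes('did not become healthy')) throw new Error('error should be actionable');
  if (elapsedMs > 1000) throw new Error('should return close to the configured timeout');
}
```

A short timeoutMs keeps the test fast; a fake clock would make the elapsed-time check deterministic, at the cost of more setup.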