Gateway: startClaudeAgent() reports ok:true/"starting" without confirming health passes #201

@Interstellar-code

Description

Problem

startClaudeAgent() in src/server/claude-agent.ts (~lines 254-264) returns { ok: true, message: 'starting' } immediately after spawning the gateway process, without waiting for the health probe to pass. If the gateway then fails to become healthy, the caller still sees ok:true, so the UI can sit in a stuck, unchanging state with no error surfaced.

Background

This is the residual hardening noted when closing #198 / #199 / #200. Those issues shared a root cause — the hermes vs claude binary-name mismatch + a hardcoded port 8642 — which was fixed in v2.3.30 (commit 74bb80fc). In the normal case the gateway now starts, so this optimistic-success path is not hit. But the false-positive-success behavior itself was not changed, so a genuine startup failure still presents as a silent stuck state rather than an actionable error.

Proposed fix

After spawning, poll isClaudeAgentHealthy(resolveGatewayUrl()) (both helpers already exist as of v2.3.30) with a bounded timeout:

  • resolves healthy within timeout → return { ok: true, message: 'started' }
  • never healthy within timeout → return { ok: false, error: 'gateway spawned but did not become healthy on <url> within <N>s' } (an actionable message naming the URL and the timeout)

so the UI can show a real error / retry affordance instead of hanging on an optimistic success.
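
A minimal sketch of the polling described above, not the actual patch. The names startAndConfirmHealthy and StartDeps, the injected-dependency shape, and the 15 s / 500 ms defaults are assumptions for illustration. isHealthy and resolveUrl stand in for the existing isClaudeAgentHealthy and resolveGatewayUrl helpers, whose real signatures are not shown in this issue.

```typescript
// Sketch only: the real startClaudeAgent() signature and spawn logic are not
// shown in this issue, so every dependency is injected as a stand-in.
type StartResult =
  | { ok: true; message: string }
  | { ok: false; error: string };

interface StartDeps {
  spawnGateway: () => void;                      // stand-in for the existing spawn step
  isHealthy: (url: string) => Promise<boolean>;  // stand-in for isClaudeAgentHealthy
  resolveUrl: () => string;                      // stand-in for resolveGatewayUrl
  timeoutMs?: number;                            // assumed default, not from the issue
  intervalMs?: number;                           // assumed default, not from the issue
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function startAndConfirmHealthy(deps: StartDeps): Promise<StartResult> {
  const { spawnGateway, isHealthy, resolveUrl, timeoutMs = 15_000, intervalMs = 500 } = deps;

  spawnGateway();
  const url = resolveUrl();
  const deadline = Date.now() + timeoutMs;

  while (true) {
    // A probe that throws (for example, connection refused) counts as "not healthy yet".
    let healthy = false;
    try {
      healthy = await isHealthy(url);
    } catch {
      healthy = false;
    }
    if (healthy) return { ok: true, message: 'started' };

    const remaining = deadline - Date.now();
    if (remaining <= 0) break;
    // Never sleep past the deadline, so the total wait stays bounded.
    await sleep(Math.min(intervalMs, remaining));
  }

  return {
    ok: false,
    error: `gateway spawned but did not become healthy on ${url} within ${timeoutMs / 1000}s`,
  };
}
```

Injecting the probe and spawn step keeps the sketch testable without a real gateway. Treating a thrown probe as "not yet healthy" is a design choice here: a connection refused during startup is expected and should not abort the wait early.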

Acceptance

  • A gateway that spawns but never serves /health yields ok:false with a clear error, not ok:true.
  • The happy path (gateway becomes healthy) is unchanged.
  • Covered by a unit test (spawn stub + health probe that never passes → ok:false within timeout).
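
The acceptance criteria could be expressed as a small framework-free check along these lines. This is a sketch under assumptions: checkAcceptance, ProbeStubs, and StartFn are hypothetical names, and it presumes the start function accepts injected spawn and probe stubs, which the issue does not confirm. In the real repository this would more likely live in whatever test runner the project already uses.

```typescript
// Hypothetical shapes: the issue does not say how startClaudeAgent() would
// receive a spawn stub or a health probe, so both are injected here.
type StartResult =
  | { ok: true; message: string }
  | { ok: false; error: string };

interface ProbeStubs {
  spawn: () => void;                         // spawn stub, starts no real process
  probe: (url: string) => Promise<boolean>;  // stand-in for the /health probe
  timeoutMs: number;
}

type StartFn = (stubs: ProbeStubs) => Promise<StartResult>;

// Returns a list of violated criteria; an empty list means all checks passed.
async function checkAcceptance(start: StartFn): Promise<string[]> {
  const failures: string[] = [];

  // Case 1: the process spawns but /health never passes.
  let spawnCalls = 0;
  const t0 = Date.now();
  const stuck = await start({
    spawn: () => { spawnCalls += 1; },
    probe: async () => false,
    timeoutMs: 100,
  });
  const elapsed = Date.now() - t0;
  if (spawnCalls === 0) failures.push('spawn stub was never called');
  if (stuck.ok) failures.push('never-healthy gateway reported ok:true');
  else if (stuck.error.trim() === '') failures.push('ok:false returned without an error message');
  // Generous slack over the 100 ms timeout to avoid flakiness on slow machines.
  if (elapsed > 1000) failures.push(`took ${elapsed}ms, far beyond the 100ms timeout`);

  // Case 2: the probe passes on the third attempt.
  let probeCalls = 0;
  const healthy = await start({
    spawn: () => {},
    probe: async () => { probeCalls += 1; return probeCalls >= 3; },
    timeoutMs: 2000,
  });
  if (!healthy.ok) failures.push('gateway that became healthy reported ok:false');

  return failures;
}
```

Run against the current optimistic behavior, a check like this would be expected to report the never-healthy case as a failure, which is the regression the unit test is meant to guard against.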

Labels: bug
