Streaming reasoning event translation (SSE) #19

@HXYerror

Description

Part of #1. Depends on #3.

Goal

The Responses API streams a richer event vocabulary than chat/completions. We need a faithful translator on both consumer surfaces:

  • /v1/responses clients should see canonical OpenAI Responses SSE events
  • /v1/messages clients should see Anthropic SSE with thinking_delta interleaved correctly

Event vocabulary to handle

From upstream (forward as-is on /v1/responses, translate on /v1/messages):

| Event | Forwarded? | Maps to (Anthropic) |
| --- | --- | --- |
| `response.created` | | initial `message_start` |
| `response.in_progress` | | drop / `ping` |
| `response.output_item.added` (type=`reasoning`) | | `content_block_start` `{ type: 'thinking' }` |
| `response.reasoning.delta` | | `thinking_delta` |
| `response.reasoning_summary_text.delta` | | additional `thinking_delta` (configurable channel) |
| `response.reasoning.done` | | `signature_delta` (carry `encrypted_content`) + `content_block_stop` |
| `response.output_item.added` (type=`message`) | | `content_block_start` `{ type: 'text' }` |
| `response.output_text.delta` | | `text_delta` |
| `response.output_text.done` | | `content_block_stop` |
| `response.output_item.added` (type=`function_call`) | | `content_block_start` `{ type: 'tool_use' }` |
| `response.function_call_arguments.delta` | | `input_json_delta` |
| `response.output_item.done` | | `content_block_stop` |
| `response.completed` | | `message_delta` (usage) + `message_stop` |
| `response.failed` | | `error` envelope |
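The delta rows in the table are a straight 1:1 type rename, so they can live in a plain lookup table before any stateful handling kicks in. A minimal sketch (the `DELTA_MAP` / `mapDeltaType` names are illustrative, not from the codebase):

```typescript
// 1:1 delta-type renames from the mapping table above. Stateful events
// (output_item.added, reasoning.done, completed, ...) need the translator's
// state machine instead and are deliberately absent here.
const DELTA_MAP: Record<string, string> = {
  "response.reasoning.delta": "thinking_delta",
  "response.reasoning_summary_text.delta": "thinking_delta",
  "response.output_text.delta": "text_delta",
  "response.function_call_arguments.delta": "input_json_delta",
};

// Returns the Anthropic delta type, or undefined for events that are not
// simple renames and must go through the state machine.
function mapDeltaType(upstreamType: string): string | undefined {
  return DELTA_MAP[upstreamType];
}
```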

Tasks

  • Implement a state machine in src/routes/responses/stream.ts for the OpenAI-shape passthrough (mostly forward + minimal normalization)
  • Implement the Responses → Anthropic translator in src/routes/messages/responses-stream-translation.ts (file created in #7, "Reasoning types & reasoning_effort passthrough in chat-completions") so that it carefully sequences content_block_start/delta/stop per item index
  • Handle out-of-order item indexes (Responses uses output_index and content_index; Anthropic uses a single index per content block — must allocate Anthropic indexes in arrival order)
  • Either buffer partial JSON for tool arguments as function_call_arguments.delta events accumulate, or stream input_json_delta chunks through faithfully
  • Always emit a final message_stop even on stream errors, so Claude Code clients don't hang
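The out-of-order index task above boils down to a small allocator: Responses addresses blocks by (output_index, content_index) pairs, while Anthropic wants one monotonically increasing index per content block, assigned in arrival order. A sketch under that assumption (class and method names are hypothetical):

```typescript
// Allocates Anthropic content-block indexes in arrival order, keyed by the
// upstream (output_index, content_index) pair. The first pair ever seen gets
// index 0, regardless of how large its output_index is.
class BlockIndexAllocator {
  private next = 0;
  private map = new Map<string, number>();

  // Returns a stable Anthropic index for the upstream coordinates,
  // allocating the next free index on first sight.
  indexFor(outputIndex: number, contentIndex = 0): number {
    const key = `${outputIndex}:${contentIndex}`;
    let idx = this.map.get(key);
    if (idx === undefined) {
      idx = this.next++;
      this.map.set(key, idx);
    }
    return idx;
  }
}
```

Because the allocation is keyed, later deltas for the same upstream block always resolve to the same Anthropic index, which is what lets content_block_delta events be routed correctly even when items interleave.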

Acceptance criteria

  • Manual test: stream a gpt-5.3-codex agent loop with tool use through /v1/messages, confirm Claude Code renders thinking, tool calls, and text in correct order
  • Event ordering tests added to tests/ covering reasoning-then-text, text-then-tool, tool-then-text-then-reasoning sequences
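The ordering tests can share one invariant checker: every content_block_delta must fall between its block's start and stop, and message_stop must be final. A sketch of such a helper for tests/ (the event shape and helper name are assumptions, not the existing test API):

```typescript
// Minimal shape of a translated Anthropic SSE event for ordering checks.
interface SseEvent {
  type: string;
  index?: number;
}

// Throws if the event sequence violates Anthropic SSE ordering:
// deltas outside their block's start/stop window, events after
// message_stop, or a stream that never emits message_stop.
function assertWellOrdered(events: SseEvent[]): void {
  const open = new Set<number>();
  let stopped = false;
  for (const e of events) {
    if (stopped) throw new Error(`event ${e.type} after message_stop`);
    switch (e.type) {
      case "content_block_start":
        open.add(e.index!);
        break;
      case "content_block_delta":
        if (!open.has(e.index!)) throw new Error(`delta for closed block ${e.index}`);
        break;
      case "content_block_stop":
        open.delete(e.index!);
        break;
      case "message_stop":
        stopped = true;
        break;
    }
  }
  if (!stopped) throw new Error("missing final message_stop");
}
```

Each listed sequence (reasoning-then-text, text-then-tool, tool-then-text-then-reasoning) can then be a fixture array fed through the translator and into this checker.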

File pointers

Labels

  • reasoning: Reasoning / thinking / encrypted_content
  • responses-api: OpenAI /v1/responses API support