Background
GitHub Copilot's upstream now exposes both `/chat/completions` and `/responses`. Codex-family models (`gpt-5-codex`, `gpt-5.1-codex`, `gpt-5.1-codex-max`, `gpt-5.3-codex`) and o-pro models (`o1-pro`, `o3-pro`) are `/responses`-only on the upstream — they cannot be reached via `/chat/completions` at all. Other reasoning models (`gpt-5`, `gpt-5.1`, `gpt-5.2`, `o1`, `o3`, `o4-mini`) work on `/chat/completions` but in a degraded mode: there is no way to round-trip `reasoning.encrypted_content`, so multi-turn reasoning quality and cost both regress.
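The degraded mode can be made concrete with a sketch: on `/responses`, prior-turn reasoning items carry `encrypted_content` and can simply be echoed back as next-turn input, whereas `/chat/completions` has no field to return them in. The item shapes below are simplified assumptions for illustration, not copilot-api or upstream types.

```typescript
// Assumed, simplified shapes of Responses API conversation items.
type ReasoningItem = { type: "reasoning"; encrypted_content: string };
type MessageItem = {
  type: "message";
  role: "user" | "assistant";
  content: string;
};
type ResponsesItem = ReasoningItem | MessageItem;

// Build the next-turn `input` array for POST /responses, echoing prior
// reasoning items so the upstream can reuse them instead of re-thinking.
// On /chat/completions there is simply no slot for the reasoning item.
function buildNextTurnInput(
  priorOutput: ResponsesItem[],
  userMessage: string,
): ResponsesItem[] {
  return [
    ...priorOutput, // includes reasoning items with encrypted_content
    { type: "message", role: "user", content: userMessage },
  ];
}

const prior: ResponsesItem[] = [
  { type: "reasoning", encrypted_content: "gAAAA..." },
  { type: "message", role: "assistant", content: "Done." },
];
const input = buildNextTurnInput(prior, "Now refactor it.");
```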
copilot-api 0.7.0 is a strict Chat Completions–only proxy. A full audit confirms:

- No `/v1/responses` Hono route (src/server.ts:17–31)
- No upstream `${copilotBaseUrl}/responses` call site (src/services/copilot/)
- Zero matches for `reasoning`, `reasoning_content`, `reasoning_effort`, `encrypted_content`, `output_text`, `response_id` across the entire repo
- The Anthropic adapter explicitly suppresses `thinking` block output (src/routes/messages/non-stream-translation.ts:305)
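For the last point, here is a minimal sketch of what emitting `thinking` blocks instead of suppressing them could look like. The function and type names are hypothetical, not the adapter's real ones.

```typescript
// Hypothetical translation step: map a reasoning summary from the upstream
// into an Anthropic-style `thinking` content block rather than dropping it.
type AnthropicBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

function translateOutput(
  reasoningSummary: string | undefined,
  text: string,
): AnthropicBlock[] {
  const blocks: AnthropicBlock[] = [];
  // Today the adapter suppresses this channel; the fix would emit it.
  if (reasoningSummary) {
    blocks.push({ type: "thinking", thinking: reasoningSummary });
  }
  blocks.push({ type: "text", text });
  return blocks;
}
```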
Impact on users
| Use case | Today |
| --- | --- |
| `gpt-5-codex` / `5.1-codex` / `5.3-codex` / `-codex-max` | ❌ unreachable |
| `o1-pro`, `o3-pro` | ❌ unreachable |
| `gpt-5` / o-series multi-turn reasoning | ⚠️ `encrypted_content` lost, re-thinks every turn |
| Codex CLI (`local_shell`, `apply_patch`, `previous_response_id`) | ❌ impossible |
| Claude Code → Copilot reasoning models with thinking visibility | ❌ adapter drops it |
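The unreachable rows come down to upstream endpoint selection. A minimal routing sketch, with the model lists taken from the issue text and the function name assumed:

```typescript
// Models that the upstream serves only via /responses (per the issue text).
const RESPONSES_ONLY = new Set([
  "gpt-5-codex",
  "gpt-5.1-codex",
  "gpt-5.1-codex-max",
  "gpt-5.3-codex",
  "o1-pro",
  "o3-pro",
]);

// Pick the upstream path for a given model; everything else stays on
// /chat/completions for backward compatibility.
function upstreamPath(model: string): "/responses" | "/chat/completions" {
  return RESPONSES_ONLY.has(model) ? "/responses" : "/chat/completions";
}
```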
Sub-issues (suggested implementation order)

- #2 `/v1/responses` route scaffolding
- #15 `encrypted_content` for multi-turn reasoning
- #16 `reasoning_effort` passthrough in chat-completions
- #17 `/v1/messages` → Responses API adapter (Claude Code on Codex)
- #18 `thinking` blocks from reasoning channel
- #20 `Copilot-Vision-Request` header for image inputs in Responses path
- #21 `previous_response_id` stateful chain support
Dependency graph
```
#2 (route) ──┐
             ├─► #14 (routing) ─► #17 (anthropic→responses) ─► #18 (thinking) ─► #22 (tests)
#3 (client) ─┤         │                                           │
             │         └─► #19 (streaming) ────────────────────────┘
             ├─► #15 (encrypted_content)
             ├─► #20 (vision header)
             └─► #21 (previous_response_id)
```
#16 (chat-completions reasoning_effort) — independent, can land anytime
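For the streaming leg (#19), one plausible shape is translating Responses API SSE events into Chat Completions-style chunks. The event and field names below follow the public OpenAI Responses API; the chunk type is a simplified assumption, not copilot-api's.

```typescript
// Two representative Responses API streaming events (public event names).
type ResponsesEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.completed" };

// Simplified Chat Completions streaming chunk.
type ChatChunk = {
  choices: { delta: { content?: string }; finish_reason: string | null }[];
};

// Translate one Responses event into the chunk a /chat/completions
// client expects: text deltas become content deltas, completion
// becomes a finish_reason of "stop".
function toChatChunk(event: ResponsesEvent): ChatChunk {
  if (event.type === "response.output_text.delta") {
    return {
      choices: [{ delta: { content: event.delta }, finish_reason: null }],
    };
  }
  return { choices: [{ delta: {}, finish_reason: "stop" }] };
}
```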
References
- `litellm/llms/github_copilot/responses/transformation.py` (`GithubCopilotResponsesAPIConfig(OpenAIResponsesAPIConfig)`)
- `litellm/llms/anthropic/experimental_pass_through/responses_adapters/transformation.py`