Model-to-endpoint routing (chat vs responses) #5

Part of #1.

Goal

Decide per model whether to dispatch to upstream `/chat/completions` or `/responses`. Codex-family models on the Copilot upstream are `/responses`-only: calling `/chat/completions` with `gpt-5.3-codex` or `gpt-5.1-codex-max` fails. Conversely, `gpt-4o` and models with `claude-` or `gemini-` prefixes only work on `/chat/completions`.

Current state

`src/routes/models/route.ts:16–24` re-emits every upstream model verbatim with no metadata. There is no `mode` field tracked anywhere in the codebase.

Tasks

  • Add a model-mode classifier. Three reasonable options (pick one):
    1. Static map in `src/lib/model-modes.ts` keyed by model id prefix (`gpt-5.3-codex`, `gpt-5.1-codex-max`, `gpt-5.1-codex`, `gpt-5-codex`, `o1-pro`, `o3-pro` → `responses`; everything else → `chat`); a sketch of this option follows the list
    2. Heuristic based on substring match on `codex` and suffix `-pro` for o-series
    3. Capability-based by reading `capabilities.supports.tools` / `capabilities.type` from the upstream `/models` response
  • Wire the classifier into:
    • `/v1/chat/completions`: if model is `responses`-only, return a structured 400 with a helpful error pointing to `/v1/responses`
    • `/v1/responses`: if model is `chat`-only, internally translate to chat-completions and bridge the response back (or return 400, depending on policy; recommended: translate when feasible, otherwise 400)
    • `/v1/messages` (Anthropic): use the classifier to decide whether to dispatch via chat-completions or Responses (this is needed by the Anthropic→Responses adapter, see separate issue)
  • Surface `mode` in `/v1/models` response so clients/dashboards can introspect
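
A minimal sketch of option 1 plus the chat-completions guard, assuming hypothetical names (`src/lib/model-modes.ts`, `getModelMode`, `responsesOnlyError`) that do not exist in the codebase yet; how the guard is wired into the route handlers depends on the framework used in `src/routes/`:

```ts
// src/lib/model-modes.ts (hypothetical module)
export type ModelMode = "chat" | "responses";

// Prefixes of models that only accept /responses on the Copilot upstream.
// A plain prefix check suffices because every prefix here maps to the same
// mode; switch to longest-prefix matching if a prefix ever maps to `chat`.
const RESPONSES_ONLY_PREFIXES = [
  "gpt-5.3-codex",
  "gpt-5.1-codex-max",
  "gpt-5.1-codex",
  "gpt-5-codex",
  "o1-pro",
  "o3-pro",
];

export function getModelMode(modelId: string): ModelMode {
  return RESPONSES_ONLY_PREFIXES.some((p) => modelId.startsWith(p))
    ? "responses"
    : "chat";
}

// Structured 400 body for /v1/chat/completions when a responses-only
// model is requested; shaped like the OpenAI error envelope.
export function responsesOnlyError(modelId: string) {
  return {
    error: {
      type: "invalid_request_error",
      code: "model_requires_responses_api",
      param: "model",
      message:
        `Model "${modelId}" is only served via /v1/responses on this proxy; ` +
        `send the request to /v1/responses instead of /v1/chat/completions.`,
    },
  };
}
```

The `/v1/responses` and `/v1/messages` handlers would call `getModelMode` the same way; whether a `chat`-only model gets translated or rejected there is the policy question noted above, and `/v1/models` can attach `mode: getModelMode(m.id)` to each entry it re-emits.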

Acceptance criteria

  • `POST /v1/chat/completions` with `model: "gpt-5.3-codex"` returns 400 with a message naming `/v1/responses` (see the test sketch after this list)
  • `POST /v1/responses` with `model: "gpt-4o"` either translates to chat-completions or returns 400
  • `GET /v1/models` shows a `mode` field per model
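
A rough shape for tests covering these criteria, assuming a running proxy at `BASE_URL` and Node's built-in `fetch` and `assert`; the model ids and endpoints come from the criteria above, everything else is illustrative:

```ts
import assert from "node:assert/strict";

const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";

// 1. Responses-only model on the chat endpoint -> structured 400
//    whose message points the caller at /v1/responses.
const chatRes = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "gpt-5.3-codex",
    messages: [{ role: "user", content: "ping" }],
  }),
});
assert.equal(chatRes.status, 400);
const chatBody = await chatRes.json();
assert.match(chatBody.error.message, /\/v1\/responses/);

// 2. Chat-only model on /v1/responses -> either a translated 200 or a 400,
//    depending on the policy chosen above.
const respRes = await fetch(`${BASE_URL}/v1/responses`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o", input: "ping" }),
});
assert.ok(respRes.status === 200 || respRes.status === 400);

// 3. Every model listed by /v1/models carries a mode field.
const models = await (await fetch(`${BASE_URL}/v1/models`)).json();
for (const m of models.data) {
  assert.ok(m.mode === "chat" || m.mode === "responses");
}
```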

Reference

litellm: `model_prices_and_context_window.json` keys `github_copilot/gpt-5.3-codex` (`mode: "responses"`), `github_copilot/gpt-5.1-codex-max` (`mode: "responses"`).
litellm gate: `ProviderConfigManager.github_copilot_supports_responses_api(model)` (PR #19650)
