Model-to-endpoint routing (chat vs responses) #5

Part of #1.

Goal

Decide per model whether to dispatch to upstream `/chat/completions` or `/responses`. Codex-family models on the Copilot upstream are `/responses`-only: calling `/chat/completions` with `gpt-5.3-codex` or `gpt-5.1-codex-max` fails. Conversely, `gpt-4o` and models with `claude-` or `gemini-` prefixes only work on `/chat/completions`.

Current state

`src/routes/models/route.ts:16–24` re-emits every upstream model verbatim with no metadata. There is no `mode` field tracked anywhere in the codebase.

Tasks

  • Add a model-mode classifier. Three reasonable options (pick one):
    1. Static map in `src/lib/model-modes.ts` keyed by model id prefix (`gpt-5.3-codex`, `gpt-5.1-codex-max`, `gpt-5.1-codex`, `gpt-5-codex`, `o1-pro`, `o3-pro` → `responses`; everything else → `chat`); a sketch of this option follows the list
    2. Heuristic based on substring match on `codex` and suffix `-pro` for o-series
    3. Capability-based by reading `capabilities.supports.tools` / `capabilities.type` from the upstream `/models` response
  • Wire the classifier into:
    • `/v1/chat/completions`: if model is `responses`-only, return a structured 400 with a helpful error pointing to `/v1/responses`
    • `/v1/responses`: if model is `chat`-only, internally translate to chat-completions and bridge the response back (or return 400, depending on policy; recommended: translate when feasible, otherwise 400)
    • `/v1/messages` (Anthropic): use the classifier to decide whether to dispatch via chat-completions or Responses (this is needed by the Anthropic→Responses adapter, see separate issue)
  • Surface `mode` in `/v1/models` response so clients/dashboards can introspect
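
A minimal sketch of option 1 plus the chat-completions guard, assuming hypothetical names (`src/lib/model-modes.ts`, `getModelMode`, `responsesOnlyError`) that do not exist in the codebase yet; how the guard is wired into the route handlers depends on the framework used in `src/routes/`:

```ts
// src/lib/model-modes.ts (hypothetical module)
export type ModelMode = "chat" | "responses";

// Prefixes of models that only accept /responses on the Copilot upstream.
// A plain prefix check suffices because every prefix here maps to the same
// mode; switch to longest-prefix matching if a prefix ever maps to `chat`.
const RESPONSES_ONLY_PREFIXES = [
  "gpt-5.3-codex",
  "gpt-5.1-codex-max",
  "gpt-5.1-codex",
  "gpt-5-codex",
  "o1-pro",
  "o3-pro",
];

export function getModelMode(modelId: string): ModelMode {
  return RESPONSES_ONLY_PREFIXES.some((p) => modelId.startsWith(p))
    ? "responses"
    : "chat";
}

// Structured 400 body for /v1/chat/completions when a responses-only
// model is requested; shaped like the OpenAI error envelope.
export function responsesOnlyError(modelId: string) {
  return {
    error: {
      type: "invalid_request_error",
      code: "model_requires_responses_api",
      param: "model",
      message:
        `Model "${modelId}" is only served via /v1/responses on this proxy; ` +
        `send the request to /v1/responses instead of /v1/chat/completions.`,
    },
  };
}
```

The `/v1/responses` and `/v1/messages` handlers would call `getModelMode` the same way; whether a `chat`-only model gets translated or rejected there is the policy question noted above, and `/v1/models` can attach `mode: getModelMode(m.id)` to each entry it re-emits.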

Acceptance criteria

  • `POST /v1/chat/completions` with `model: "gpt-5.3-codex"` returns 400 with a message naming `/v1/responses` (see the test sketch after this list)
  • `POST /v1/responses` with `model: "gpt-4o"` either translates to chat-completions or returns 400
  • `GET /v1/models` shows a `mode` field per model
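
A rough shape for tests covering these criteria, assuming a running proxy at `BASE_URL` and Node's built-in `fetch` and `assert`; the model ids and endpoints come from the criteria above, everything else is illustrative:

```ts
import assert from "node:assert/strict";

const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";

// 1. Responses-only model on the chat endpoint -> structured 400
//    whose message points the caller at /v1/responses.
const chatRes = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "gpt-5.3-codex",
    messages: [{ role: "user", content: "ping" }],
  }),
});
assert.equal(chatRes.status, 400);
const chatBody = await chatRes.json();
assert.match(chatBody.error.message, /\/v1\/responses/);

// 2. Chat-only model on /v1/responses -> either a translated 200 or a 400,
//    depending on the policy chosen above.
const respRes = await fetch(`${BASE_URL}/v1/responses`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o", input: "ping" }),
});
assert.ok(respRes.status === 200 || respRes.status === 400);

// 3. Every model listed by /v1/models carries a mode field.
const models = await (await fetch(`${BASE_URL}/v1/models`)).json();
for (const m of models.data) {
  assert.ok(m.mode === "chat" || m.mode === "responses");
}
```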

Reference

litellm: `model_prices_and_context_window.json` keys `github_copilot/gpt-5.3-codex` (`mode: "responses"`), `github_copilot/gpt-5.1-codex-max` (`mode: "responses"`).
litellm gate: `ProviderConfigManager.github_copilot_supports_responses_api(model)` (PR #19650)
