Copilot API

A reverse-engineered proxy that turns the GitHub Copilot API into fully compatible OpenAI and Anthropic endpoints — letting you use Copilot with any tool that speaks either protocol, including Claude Code.

Features

Triple API Compatibility — OpenAI Chat Completions, OpenAI Responses API, and Anthropic Messages API, all backed by GitHub Copilot
Claude Code Integration — Interactive model selector (--claude-code) copies a ready-to-paste launch command; full support for thinking blocks, typed tools, token counting, and auto-compaction
Automatic Endpoint Routing — Models that only support /responses (e.g. gpt-5.4-mini) are transparently routed through the Responses API with bidirectional translation
Web Search — Two-pass search via Tavily (free) or Brave Search — the proxy intercepts search tool calls, fetches live results, and injects them for the model
Smart Context Management — Auto-switches to the largest-context model when token count exceeds the requested model's window; image stripping cascade on 413 errors to trigger compaction
Rate Limiting — Interval-based and sliding-window burst limiting with configurable wait-or-reject behavior
Usage Dashboard — Web UI showing Copilot quota, premium interactions, and detailed usage stats
Manual Approval Mode — Interactively approve/deny each request (--manual)
Docker & npx — Run anywhere: from source, via npx copilot-api@latest, or as a Docker container
Proxy Support — HTTP/HTTPS proxy via environment variables with per-URL routing

Demo

copilot-api-demo.mp4

Architecture

High-Level Request Flow

┌─────────────────────────────────────────────────────────────────────┐
│                          Clients                                    │
│  Claude Code · Cursor · OpenAI SDK · Anthropic SDK · Any HTTP       │
└──────────┬──────────────────┬──────────────────┬────────────────────┘
           │                  │                  │
           ▼                  ▼                  ▼
┌──────────────────┐ ┌───────────────┐ ┌─────────────────────┐
│ POST /v1/messages│ │ POST /v1/chat │ │ POST /v1/responses  │
│ (Anthropic API)  │ │ /completions  │ │ (Responses API)     │
└────────┬─────────┘ │ (OpenAI API)  │ └──────────┬──────────┘
         │           └───────┬───────┘            │
         ▼                   │                    ▼
┌──────────────────┐         │         ┌──────────────────────┐
│ Anthropic→OpenAI │         │         │ Responses↔CC         │
│ Translation      │         │         │ Translation          │
│ (bidirectional)  │         │         │ (bidirectional)      │
└────────┬─────────┘         │         └──────────┬───────────┘
         │                   │                    │
         ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Middleware Pipeline                           │
│  Rate Limiter → Burst Limiter → Manual Approval → Token Counter     │
│  → Model Selector → Web Search Interceptor → Image Validator        │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Copilot Service Layer                            │
│  ┌────────────────────────┐  ┌────────────────────────────────┐     │
│  │ POST /chat/completions │  │ POST /responses                │     │
│  │ (default endpoint)     │  │ (gpt-5.x, o-series models)    │     │
│  └────────────┬───────────┘  └────────────────┬───────────────┘     │
│               └────────────────┬──────────────┘                     │
│                                ▼                                    │
│              api.githubcopilot.com                                  │
│              api.business.githubcopilot.com                         │
│              api.enterprise.githubcopilot.com                       │
└─────────────────────────────────────────────────────────────────────┘

Translation Layers

The proxy translates between three API protocols (Anthropic Messages, OpenAI Chat Completions, and OpenAI Responses) in real time, for both streaming and non-streaming responses:

┌─────────────────────────────────────────────────────────┐
│               Anthropic Messages API                     │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Request: Anthropic → OpenAI                     │    │
│  │  • System blocks → system message               │    │
│  │  • Content blocks (text, image, doc, tool_result)│    │
│  │  • Thinking blocks → reasoning_content          │    │
│  │  • Typed tools (bash, text_editor, web_search)  │    │
│  │  • Tool choice (auto/any/tool/none)             │    │
│  │  • Model name normalization                     │    │
│  │  • Tool result compression (>20K chars)         │    │
│  │  • Image validation & stripping cascade         │    │
│  ├─────────────────────────────────────────────────┤    │
│  │ Response: OpenAI → Anthropic                    │    │
│  │  • SSE: message_start → content_block_start →   │    │
│  │    content_block_delta → content_block_stop →    │    │
│  │    message_delta → message_stop                  │    │
│  │  • reasoning_content → thinking blocks          │    │
│  │  • Tool calls → tool_use content blocks         │    │
│  │  • Truncated tool call detection                │    │
│  │  • Deferred finish_reason (waits for usage)     │    │
│  │  • 10s keepalive pings, 90s stall timeout       │    │
│  └─────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────┤
│               Responses API ↔ Chat Completions           │
│  ┌─────────────────────────────────────────────────┐    │
│  │ • Auto-routes models by supported_endpoints     │    │
│  │ • Claude models → Chat Completions translation  │    │
│  │ • gpt-5/o-series → Responses API translation   │    │
│  │ • Streaming event translation both directions   │    │
│  │ • JSON repair for truncated tool arguments      │    │
│  │ • Reasoning/thinking delta handling             │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
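
The streaming half of the Anthropic translation can be pictured as a small state machine. The sketch below is a simplified illustration, not the proxy's actual code: it turns a list of OpenAI-style text deltas into the Anthropic event order shown in the diagram. The `AnthropicEvent` type and `toAnthropicEvents` function are hypothetical names, and real events carry more fields (message IDs, usage, and so on).

```typescript
// Hypothetical, simplified event shapes; real Anthropic events carry more fields.
type AnthropicEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | { type: "content_block_delta"; index: number; text: string }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; stop_reason: string }
  | { type: "message_stop" };

// Convert OpenAI-style text deltas into the Anthropic SSE event order.
function toAnthropicEvents(deltas: string[], finishReason: string): AnthropicEvent[] {
  const events: AnthropicEvent[] = [{ type: "message_start" }];
  let blockOpen = false;

  for (const text of deltas) {
    if (text.length === 0) continue; // skip empty chunks
    if (!blockOpen) {
      // Open the text block lazily, on the first non-empty delta.
      events.push({ type: "content_block_start", index: 0 });
      blockOpen = true;
    }
    events.push({ type: "content_block_delta", index: 0, text });
  }

  if (blockOpen) events.push({ type: "content_block_stop", index: 0 });

  // Map OpenAI finish_reason values onto Anthropic stop_reason values.
  const stopReason =
    finishReason === "length" ? "max_tokens"
    : finishReason === "tool_calls" ? "tool_use"
    : "end_turn";

  events.push({ type: "message_delta", stop_reason: stopReason });
  events.push({ type: "message_stop" });
  return events;
}
```

Emitting `message_delta` only after the loop mirrors the "deferred finish_reason" idea in the diagram: the stop reason is sent once, after all content blocks have closed.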

Authentication Flow

┌──────────┐    Device Code     ┌──────────┐    Poll Token     ┌──────────┐
│  Client   │ ───────────────► │  GitHub   │ ───────────────► │  GitHub  │
│  (CLI)    │ ◄─────────────── │  OAuth    │ ◄─────────────── │  Token   │
│           │  user_code +     │  Device   │   access_token   │  (OAuth) │
│           │  verification_url│  Flow     │                  │          │
└──────────┘                   └──────────┘                   └────┬─────┘
                                                                   │
                                     Stored at                     │
                          ~/.local/share/copilot-api/              │
                                  github_token                     │
                                                                   │
                                                                   ▼
┌──────────┐   Auto-refresh    ┌──────────────────────────────────────────┐
│  Copilot  │ ◄──────────────  │  GET /copilot_internal/v2/token         │
│  JWT      │  (refresh_in     │  Authorization: token <github_token>    │
│  Token    │   - 60 seconds)  │  → returns JWT with expiry              │
└──────────┘                   └──────────────────────────────────────────┘
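
The refresh timing in the diagram reduces to a little arithmetic. The following is a minimal sketch under stated assumptions: the `CopilotTokenResponse` shape, the helper names, and the one-second floor are illustrative choices, not taken from the project source.

```typescript
// Assumed response shape for GET /copilot_internal/v2/token.
interface CopilotTokenResponse {
  token: string;      // the Copilot JWT
  expires_at: number; // expiry as Unix seconds (assumed field name)
  refresh_in: number; // seconds until a refresh is recommended
}

// Header for the token exchange, as shown in the diagram.
function tokenRequestHeaders(githubToken: string): Record<string, string> {
  return { Authorization: `token ${githubToken}` };
}

// Refresh 60 seconds before refresh_in elapses; the 1-second floor is an
// assumption to avoid a zero or negative delay.
function refreshDelayMs(refreshIn: number): number {
  return Math.max(refreshIn - 60, 1) * 1000;
}

// Absolute time (ms since epoch) at which the next refresh should run.
function nextRefreshAt(response: CopilotTokenResponse, nowMs: number): number {
  return nowMs + refreshDelayMs(response.refresh_in);
}
```

A caller would typically pass `refreshDelayMs(response.refresh_in)` to a timer and repeat the exchange when it fires.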

Web Search Architecture

┌──────────────────────────────────────────────────────────────────┐
│                    Two-Pass Web Search Flow                       │
│                                                                  │
│  Client Request (with web_search tool)                           │
│        │                                                         │
│        ▼                                                         │
│  ┌─────────────┐   Is it a web search?   ┌────────────────────┐ │
│  │  Interceptor │ ─────────────────────► │ Pass 1: Non-stream │ │
│  │  Detection   │   typed tool or         │ call to Copilot    │ │
│  │              │   recognized name       │ (asks what to      │ │
│  └──────────────┘                         │  search)           │ │
│                                           └────────┬───────────┘ │
│                                                    │             │
│                                                    ▼             │
│                                 ┌────────────────────────────┐   │
│                                 │  Execute Search             │   │
│                                 │  Tavily (preferred)         │   │
│                                 │   or Brave Search           │   │
│                                 │  5s timeout, max 5 results  │   │
│                                 └────────────┬───────────────┘   │
│                                              │                   │
│                                              ▼                   │
│                                 ┌────────────────────────────┐   │
│                                 │  Pass 2: Full call          │   │
│                                 │  Injects search results     │   │
│                                 │  tool_choice: "none"        │   │
│                                 │  (original stream mode)     │   │
│                                 └────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘

Project Structure

src/
├── main.ts                          # CLI entry point (citty subcommands)
├── start.ts                         # Server startup, auth, caching
├── auth.ts                          # Standalone OAuth device flow
├── server.ts                        # Hono app, route registration, middleware
│
├── routes/
│   ├── completions/                 # POST /v1/chat/completions
│   │   └── handler.ts
│   ├── responses/                   # POST /v1/responses (Responses API)
│   │   └── handler.ts
│   ├── messages/                    # POST /v1/messages (Anthropic API)
│   │   ├── handler.ts              #   request orchestration, retries, error handling
│   │   ├── non-stream-translation.ts  # Anthropic ↔ OpenAI (non-streaming)
│   │   ├── stream-translation.ts      # Anthropic ↔ OpenAI (SSE streaming)
│   │   ├── count-tokens.ts            # /v1/messages/count_tokens
│   │   └── anthropic-types.ts         # TypeScript types
│   ├── models/                      # GET /v1/models
│   ├── embeddings/                  # POST /v1/embeddings
│   ├── usage/                       # GET /usage
│   └── token/                       # GET /token
│
├── services/
│   ├── copilot/
│   │   ├── create-chat-completions.ts  # Core fetch to Copilot API
│   │   ├── create-embeddings.ts
│   │   ├── get-models.ts               # Model list + context window helpers
│   │   └── responses-translation.ts    # Responses ↔ Chat Completions translation
│   ├── github/
│   │   ├── get-copilot-token.ts        # JWT token exchange + auto-refresh
│   │   ├── get-copilot-usage.ts        # Quota/usage stats
│   │   ├── get-device-code.ts          # OAuth device flow
│   │   ├── get-user.ts                 # GitHub user info
│   │   └── poll-access-token.ts        # OAuth polling
│   └── web-search/
│       ├── interceptor.ts              # Two-pass search orchestration
│       ├── brave.ts                    # Brave Search provider
│       ├── tavily.ts                   # Tavily provider
│       ├── system-prompt.ts            # Search instruction injection
│       └── tool-definition.ts          # Tool detection & definition
│
└── lib/
    ├── api-config.ts                # Copilot API URLs & VS Code impersonation headers
    ├── error.ts                     # HTTPError, Anthropic error formatting
    ├── model-selector.ts            # Auto-switch to largest-context model
    ├── rate-limit.ts                # Interval + burst rate limiters
    ├── request-logger.ts            # Colored terminal logging middleware
    ├── session-id.ts                # Claude Code session ID extraction
    ├── shell.ts                     # Cross-shell env var generation
    ├── state.ts                     # Global mutable runtime state
    ├── token.ts                     # Token persistence & refresh
    ├── tokenizer.ts                 # gpt-tokenizer token counting
    ├── proxy.ts                     # HTTP proxy support (undici)
    ├── approval.ts                  # Interactive request approval
    └── paths.ts                     # Data directory paths

Prerequisites

Bun 1.2 or later
GitHub account with an active Copilot subscription (Individual, Business, or Enterprise)

Installation

bun install

Quick Start

# Via npx (no clone needed)
npx copilot-api@latest start

# From source
bun run dev    # development with watch mode
bun run start  # production

On first run, the proxy triggers GitHub's device-code OAuth flow — follow the on-screen URL to authorize.

Quick Start (Windows)

The included start.bat handles everything automatically:

Create a .env file in the project root (see Environment Variables)
Double-click start.bat or run it from a terminal

The script will load env vars, build if needed, show the active search provider, start the server, and open the Usage Dashboard in your browser.

Environment Variables

Create a .env file in the project root. It is gitignored.

# Web Search (optional — pick one)
TAVILY_API_KEY=tvly-...          # Preferred: free at tavily.com (1,000 req/mo)
BRAVE_API_KEY=BSA...             # Alternative: brave.com/search/api

# Proxy (optional)
HTTP_PROXY=http://proxy:8080
HTTPS_PROXY=http://proxy:8080

Provider priority: If both keys are set, Tavily is used.
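
The priority rule can be expressed as a short selection function. This is a sketch only: `pickSearchProvider` and the `SearchProvider` type are hypothetical names, and treating a missing key as "search disabled" is an assumption.

```typescript
type SearchProvider = "tavily" | "brave" | null;

// Pick a provider from environment-style key/value pairs.
// Tavily wins when both keys are set, matching the priority rule above.
function pickSearchProvider(env: Record<string, string | undefined>): SearchProvider {
  const tavily = env["TAVILY_API_KEY"];
  const brave = env["BRAVE_API_KEY"];
  if (tavily !== undefined && tavily.trim() !== "") return "tavily";
  if (brave !== undefined && brave.trim() !== "") return "brave";
  return null; // no key configured (assumed to mean web search is off)
}
```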

Command Structure

Command	Description
`start`	Start the proxy server (handles auth if needed)
`auth`	Run GitHub OAuth flow without starting the server
`check-usage`	Show Copilot quota/usage in the terminal
`debug`	Display version, runtime, paths, and auth status

Start Command Options

Option	Alias	Default	Description
`--port`	`-p`	`4141`	Port to listen on
`--verbose`	`-v`	`false`	Enable verbose logging
`--account-type`	`-a`	`individual`	`individual`, `business`, or `enterprise`
`--manual`	—	`false`	Require interactive approval for each request
`--rate-limit`	`-r`	—	Minimum seconds between requests
`--wait`	`-w`	`false`	Queue requests instead of rejecting when rate limited
`--burst-count`	—	—	Max requests in burst window
`--burst-window`	—	—	Burst window duration in seconds
`--github-token`	`-g`	—	Provide a pre-existing GitHub token (skip OAuth)
`--claude-code`	`-c`	`false`	Interactive Claude Code setup wizard
`--show-token`	—	`false`	Display tokens in logs for debugging
`--proxy-env`	—	`false`	Use `HTTP_PROXY`/`HTTPS_PROXY` from environment
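
To make the two rate-limiting options concrete, here is a minimal sketch of an interval limiter (`--rate-limit`) and a sliding-window burst limiter (`--burst-count`, `--burst-window`). The class names and the "return milliseconds to wait" convention are assumptions for illustration; the project's own implementation in lib/rate-limit.ts may differ. The CLI takes seconds, so a caller would multiply by 1000 first.

```typescript
// Interval limiter: enforce a minimum gap between consecutive requests.
class IntervalLimiter {
  private lastMs: number | null = null;
  private readonly minGapMs: number;

  constructor(minGapMs: number) {
    this.minGapMs = minGapMs;
  }

  // Returns 0 if the request may proceed now, otherwise the ms left to wait.
  check(nowMs: number): number {
    if (this.lastMs !== null && nowMs - this.lastMs < this.minGapMs) {
      return this.minGapMs - (nowMs - this.lastMs);
    }
    this.lastMs = nowMs;
    return 0;
  }
}

// Sliding-window burst limiter: at most maxCount requests per windowMs.
class BurstLimiter {
  private timestamps: number[] = [];
  private readonly maxCount: number;
  private readonly windowMs: number;

  constructor(maxCount: number, windowMs: number) {
    this.maxCount = maxCount;
    this.windowMs = windowMs;
  }

  // Returns 0 if the request may proceed now, otherwise the ms left to wait.
  check(nowMs: number): number {
    // Drop timestamps that have slid out of the window.
    this.timestamps = this.timestamps.filter((t) => nowMs - t < this.windowMs);
    if (this.timestamps.length < this.maxCount) {
      this.timestamps.push(nowMs);
      return 0;
    }
    const oldest = this.timestamps[0] ?? nowMs;
    return oldest + this.windowMs - nowMs;
  }
}
```

With `--wait`, a caller would sleep for the returned number of milliseconds and try again; without it, a non-zero result would translate into a rejected request.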

Auth Command Options

Option	Alias	Default	Description
`--verbose`	`-v`	`false`	Verbose logging
`--show-token`	—	`false`	Show token after auth

Debug Command Options

Option	Default	Description
`--json`	`false`	Output as JSON

API Endpoints

OpenAI Compatible

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (streaming & non-streaming)
`/v1/responses`	POST	OpenAI Responses API
`/v1/models`	GET	List available models (with context window metadata)
`/v1/embeddings`	POST	Generate embedding vectors

Anthropic Compatible

Endpoint	Method	Description
`/v1/messages`	POST	Anthropic Messages API (full protocol translation)
`/v1/messages/count_tokens`	POST	Token counting with model-specific scaling

Utility

Endpoint	Method	Description
`/usage`	GET	Copilot quota and usage statistics
`/token`	GET	Current Copilot JWT token

All OpenAI-compatible endpoints are also available without the /v1/ prefix; for example, the Responses API is served at both /responses and /v1/responses.
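
As a quick illustration of calling the proxy from code, the sketch below builds a Chat Completions request for either path variant. `buildChatRequest` is a hypothetical helper, and the example assumes the default port (4141) and that the proxy does not require an Authorization header from local clients.

```typescript
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a non-streaming chat completion request against the proxy.
function buildChatRequest(
  baseUrl: string,
  model: string,
  prompt: string,
  useV1Prefix: boolean = true,
): ChatRequest {
  const path = useV1Prefix ? "/v1/chat/completions" : "/chat/completions";
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + path,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    },
  };
}

// Usage (requires a running proxy):
//   const { url, init } = buildChatRequest("http://localhost:4141", "gpt-4.1", "Hello");
//   const res = await fetch(url, init);
//   const data = await res.json();
```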

Web Search

The proxy performs real-time web searches using a two-pass architecture:

Pass 1 — Copilot determines what to search (non-streaming call)
Search — Proxy fetches results from Tavily or Brave (5s timeout, max 5 results)
Pass 2 — Copilot generates a response using the injected search results

Each web search uses 2–3 internal Copilot API calls.

Setup

Tavily (Recommended, Free) — Sign up at app.tavily.com, add TAVILY_API_KEY to .env

Brave Search — Sign up at brave.com/search/api, add BRAVE_API_KEY to .env

Trigger Conditions

Path 1 (zero-cost): Client sends a typed Anthropic web search tool (type: "web_search_20250305")
Path 2 (preflight): Client sends a tool with a recognized name and the last user message appears to need real-time info

Recognized names: web_search, internet_search, brave_search, bing_search, google_search, find_online, internet_research
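
The two trigger paths can be sketched as a single detection function. This is an illustration under assumptions: `detectSearchTrigger` and `ToolLike` are hypothetical names, names are matched exactly (case-sensitive), and the Path 2 check on whether the last user message needs real-time information is deliberately left out.

```typescript
// Minimal view of a tool definition; real tools carry schemas and descriptions.
interface ToolLike {
  type?: string;
  name?: string;
}

const RECOGNIZED_NAMES = new Set<string>([
  "web_search",
  "internet_search",
  "brave_search",
  "bing_search",
  "google_search",
  "find_online",
  "internet_research",
]);

type SearchTrigger = "typed" | "named" | null;

// "typed": Path 1, an Anthropic typed web search tool is present.
// "named": Path 2 candidate, a tool with a recognized name is present.
function detectSearchTrigger(tools: ToolLike[]): SearchTrigger {
  if (tools.some((t) => t.type === "web_search_20250305")) return "typed";
  if (tools.some((t) => t.name !== undefined && RECOGNIZED_NAMES.has(t.name))) {
    return "named";
  }
  return null;
}
```

A "named" result would still need the preflight step before a search is actually performed.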

Using with Claude Code

Interactive Setup

npx copilot-api@latest start --claude-code

Select a primary model and a small/fast model. A ready-to-paste launch command is copied to your clipboard.

Manual Setup

Create .claude/settings.json in your project root:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:4141",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "gpt-4.1",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gpt-4.1",
    "ANTHROPIC_SMALL_FAST_MODEL": "gpt-4.1",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-4.1",
    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  },
  "permissions": {
    "deny": ["WebSearch"]
  }
}

More options: Claude Code settings · IDE integrations

Advanced Features

Automatic Endpoint Routing

Each model declares its supported_endpoints. When a model does not support /chat/completions (e.g. some gpt-5.x variants), the proxy routes the request through the Responses API and translates in both directions. The reverse also holds: Claude models requested through the Responses API are translated to Chat Completions.
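
A routing decision of this kind might look like the sketch below. The exact strings stored in `supported_endpoints`, the fallback when the list is missing, and the function names are all assumptions made for illustration.

```typescript
interface ModelInfo {
  id: string;
  supported_endpoints?: string[]; // assumed to hold paths like "/responses"
}

type Upstream = "/chat/completions" | "/responses";

// Choose which Copilot endpoint to call for a given model.
function pickUpstream(model: ModelInfo): Upstream {
  const endpoints = model.supported_endpoints;
  // No metadata: fall back to the default endpoint (assumed behavior).
  if (endpoints === undefined || endpoints.length === 0) return "/chat/completions";
  if (endpoints.includes("/chat/completions")) return "/chat/completions";
  if (endpoints.includes("/responses")) return "/responses";
  return "/chat/completions";
}

// Translation is needed whenever the client's protocol differs from the upstream.
function needsTranslation(clientEndpoint: Upstream, model: ModelInfo): boolean {
  return pickUpstream(model) !== clientEndpoint;
}
```

Under these assumptions, a Chat Completions client asking for a Responses-only model gets `needsTranslation(...) === true`, and so does a Responses client asking for a model that only supports Chat Completions.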

Context Overflow Auto-Switch

When the estimated token count exceeds the requested model's context window, the proxy auto-switches to the available model with the largest context window. This prevents context-window errors without client-side changes.
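
The selection rule is simple enough to sketch directly. `selectModel` and `ModelWindow` are hypothetical names, the model IDs in the usage note are placeholders, and leaving unknown models untouched is an assumption.

```typescript
interface ModelWindow {
  id: string;
  contextWindow: number; // maximum context size in tokens
}

// Return the model to use: the requested one if the prompt fits,
// otherwise the model with the largest context window.
function selectModel(
  requestedId: string,
  estimatedTokens: number,
  models: ModelWindow[],
): string {
  const requested = models.find((m) => m.id === requestedId);
  // Unknown model or prompt fits: leave the request untouched.
  if (requested === undefined || estimatedTokens <= requested.contextWindow) {
    return requestedId;
  }
  const largest = models.reduce(
    (best, m) => (m.contextWindow > best.contextWindow ? m : best),
    requested,
  );
  return largest.id;
}
```

For example, with placeholder models `small` (128,000 tokens) and `large` (1,000,000 tokens), a 200,000-token request for `small` would be switched to `large`.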

Image Handling

413 Stripping Cascade: On payload-too-large errors, the proxy retries by progressively stripping images: older images first → all images → trigger compaction
Proactive Trimming: Set IMAGE_CONTEXT_TRIMMING_ENABLED=1 to auto-trim processed images beyond a message threshold
Validation: Rejects PNG images smaller than 4×4 pixels
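
The 413 stripping cascade can be read as a sequence of progressively smaller payloads. The sketch below is one plausible interpretation, not the project's code: "older images" is taken to mean every image except those in the most recent message that contains any, and all type and function names are hypothetical.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

const hasImage = (m: ChatMessage): boolean =>
  m.content.some((b) => b.type === "image");

const stripImages = (m: ChatMessage): ChatMessage => ({
  ...m,
  content: m.content.filter((b) => b.type !== "image"),
});

// Stage 1: keep images only in the most recent message that has any.
function stripOlderImages(messages: ChatMessage[]): ChatMessage[] {
  let lastWithImage = -1;
  messages.forEach((m, i) => {
    if (hasImage(m)) lastWithImage = i;
  });
  return messages.map((m, i) => (i === lastWithImage ? m : stripImages(m)));
}

// Stage 2: remove every image.
function stripAllImages(messages: ChatMessage[]): ChatMessage[] {
  return messages.map(stripImages);
}

// Payload for each retry after a 413. null means the stages are exhausted
// and the caller should trigger compaction instead of retrying.
function payloadForAttempt(
  messages: ChatMessage[],
  attempt: number,
): ChatMessage[] | null {
  if (attempt === 0) return messages;
  if (attempt === 1) return stripOlderImages(messages);
  if (attempt === 2) return stripAllImages(messages);
  return null;
}
```

This sketch can leave a message with an empty content array; a real implementation would likely substitute a placeholder text block.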

Large Edit Guidance

When file-edit tools (Edit, Write, MultiEdit) are present and the model's max output is under 32K tokens, the proxy injects a system message warning about output limits — helping models plan chunked edits instead of overflowing.
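
The condition for injecting the guidance can be written as a small predicate. This is a sketch with assumptions: "32K" is read as 32,000 tokens (it could equally mean 32,768), and the function name and message wording are invented for illustration.

```typescript
const EDIT_TOOLS = ["Edit", "Write", "MultiEdit"];
const OUTPUT_LIMIT_THRESHOLD = 32000; // "32K" read as 32,000 (assumption)

// True when an edit tool is offered and the model's output budget is small.
function needsEditGuidance(toolNames: string[], maxOutputTokens: number): boolean {
  const hasEditTool = toolNames.some((name) => EDIT_TOOLS.includes(name));
  return hasEditTool && maxOutputTokens < OUTPUT_LIMIT_THRESHOLD;
}

// Hypothetical wording for the injected system message.
function editGuidance(maxOutputTokens: number): string {
  return (
    `Output is limited to about ${maxOutputTokens} tokens. ` +
    "Split large file edits into several smaller tool calls."
  );
}
```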

Empty Response Recovery

If Copilot returns an empty response (common with some model backends), the proxy retries up to 2 times and falls back to a synthetic response explaining the failure.

Truncated Tool Call Detection

When a model's output is cut off mid-tool-call, the proxy detects the truncation and returns an explanatory text block with end_turn instead of a malformed tool_use block.
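
One simple way to detect a truncated tool call is to check whether its argument string parses as JSON. The sketch below uses that heuristic; it is an assumption about the approach, not the project's confirmed logic, and the names and message text are hypothetical.

```typescript
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments as streamed by the model
}

type AnthropicBlock =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "text"; text: string };

interface TranslatedCall {
  block: AnthropicBlock;
  stop_reason: "tool_use" | "end_turn";
}

// Heuristic: arguments that fail to parse as JSON were cut off mid-stream.
function parseArguments(raw: string): { ok: true; value: unknown } | { ok: false } {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return { ok: false };
  }
}

function translateToolCall(call: ToolCall): TranslatedCall {
  const parsed = parseArguments(call.arguments);
  if (!parsed.ok) {
    // Return plain text instead of a malformed tool_use block.
    return {
      block: {
        type: "text",
        text: `The call to ${call.name} was cut off before its arguments were complete.`,
      },
      stop_reason: "end_turn",
    };
  }
  return {
    block: { type: "tool_use", id: call.id, name: call.name, input: parsed.value },
    stop_reason: "tool_use",
  };
}
```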

Token Counting & Compaction Scaling

Token counts include overhead estimates for typed tools (bash: 700 tokens, text_editor: 700 tokens, etc.), custom tools, and attachments. Counts are then scaled per model family (Claude ×1.2, Grok ×1.03, others dynamically) so that Claude Code triggers compaction at an appropriate point.
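
The arithmetic can be sketched as follows. Only the two overhead values and the two fixed factors come from the text above; matching families by ID prefix, applying the factor after adding overhead, rounding, and using 1.0 as a stand-in for the dynamic case are all assumptions.

```typescript
// Per-tool overhead in tokens; only these two values are stated in the text.
const TYPED_TOOL_OVERHEAD: Record<string, number> = {
  bash: 700,
  text_editor: 700,
};

// Scaling factor per model family, matched by ID prefix (assumed).
function scaleFactor(model: string): number {
  const id = model.toLowerCase();
  if (id.startsWith("claude")) return 1.2;
  if (id.startsWith("grok")) return 1.03;
  return 1.0; // placeholder: other factors are derived dynamically
}

// Estimated count = (base tokens + typed-tool overhead) x family factor.
function estimateTokens(
  model: string,
  baseTokens: number,
  typedTools: string[],
): number {
  const overhead = typedTools.reduce(
    (sum, tool) => sum + (TYPED_TOOL_OVERHEAD[tool] ?? 0),
    0,
  );
  return Math.round((baseTokens + overhead) * scaleFactor(model));
}
```

Under these assumptions, 1,000 base tokens plus one bash tool on a Claude model comes to (1000 + 700) × 1.2 = 2,040 tokens.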

Docker

Build & Run

docker build -t copilot-api .

mkdir -p ./copilot-data
docker run -p 4141:4141 -v "$(pwd)/copilot-data:/root/.local/share/copilot-api" copilot-api

With Environment Variables

docker run -p 4141:4141 \
  -e GH_TOKEN=your_github_token \
  -e TAVILY_API_KEY=tvly-... \
  copilot-api

Docker Compose

version: "3.8"
services:
  copilot-api:
    build: .
    ports:
      - "4141:4141"
    environment:
      - GH_TOKEN=your_github_token_here
      - TAVILY_API_KEY=tvly-your-key-here
    restart: unless-stopped

The Docker image features multi-stage builds, a non-root user, health checks, and pinned base images.

Using with npx

npx copilot-api@latest start                    # basic
npx copilot-api@latest start --port 8080         # custom port
npx copilot-api@latest start --account-type business  # business plan
npx copilot-api@latest auth                      # auth only
npx copilot-api@latest check-usage               # quota info
npx copilot-api@latest debug --json              # diagnostics

Usage Dashboard

After starting the server, the console displays a URL to the web-based usage dashboard:

https://ericc-ch.github.io/copilot-api?endpoint=http://localhost:4141/usage

The dashboard shows usage quotas (Chat, Completions, Premium), detailed statistics, and supports custom endpoints via the ?endpoint= parameter. On Windows, start.bat opens it automatically.

Running from Source

bun install           # install dependencies
bun run dev           # development (watch mode)
bun run start         # production
bun run build         # compile to dist/
bun run typecheck     # type check
bun run lint:all      # lint all files
bun run knip          # find unused exports/dead code

Tips

Rate limiting: --rate-limit 30 enforces a 30s gap. Add --wait to queue instead of reject. Use --burst-count and --burst-window for sliding-window limits.
Business/Enterprise: Always pass --account-type business or enterprise — it changes the Copilot API base URL.
Web search cost: Each search uses 2–3 internal API calls. Monitor your quota.
Token persistence: Stored at ~/.local/share/copilot-api/github_token. Use auth to regenerate.
Proxy: Set HTTP_PROXY/HTTPS_PROXY and pass --proxy-env to route through a corporate proxy.
