Background
No usage data is recorded today. Token usage is available in the `usage` field for OpenAI Chat Completions and in `usage.input_tokens`/`usage.output_tokens` for Anthropic Messages — but ONLY in non-streaming responses by default.
⚠️ Critical caveat (per backend review B1)
Streaming token counts will NOT arrive unless we explicitly request them.
- OpenAI Chat Completions streaming: requires `stream_options: {include_usage: true}` in the outbound request; usage arrives in a final chunk where `choices: []`. The proxy currently never sets this flag — we must inject it.
- Anthropic streaming: token counts arrive in `message_start` (input) and `message_delta` (output) events — different events entirely. The Anthropic translator at `src/routes/messages/` translates Anthropic→OpenAI before sending upstream, so we get OpenAI-shaped usage back.
- Older Copilot models that ignore `stream_options` will produce zero-token rows — set the `usage_unknown=1` flag rather than silently writing 0.
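A minimal sketch of the injection and final-chunk handling, assuming an OpenAI-shaped request body; the function and type names here are illustrative, not the proxy's actual code:

```typescript
// Illustrative types for the outbound Chat Completions payload.
type ChatCompletionsBody = {
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
  [key: string]: unknown;
};

function ensureIncludeUsage(body: ChatCompletionsBody): ChatCompletionsBody {
  // Only relevant for streaming requests, and never clobber a caller's
  // explicit stream_options (they may have set include_usage: false on purpose).
  if (body.stream && body.stream_options?.include_usage === undefined) {
    return {
      ...body,
      stream_options: { ...body.stream_options, include_usage: true },
    };
  }
  return body;
}

// With include_usage set, OpenAI sends one final chunk whose choices array is
// empty and whose usage field carries the totals.
type StreamChunk = {
  choices: unknown[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
};

function usageFromChunk(chunk: StreamChunk) {
  return chunk.choices.length === 0 && chunk.usage ? chunk.usage : undefined;
}
```

Checking `include_usage === undefined` (rather than truthiness) is what makes "not already set" precise: a caller who explicitly sent `include_usage: false` is left alone.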
Goal
Record one event per proxied request; enforce retention via an hourly sweeper.
Tasks
Migration `003_events.sql`:

```sql
CREATE TABLE events (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  ts                INTEGER NOT NULL,
  key_id            TEXT    NOT NULL, -- '__noauth__' sentinel for --no-auth (per backend #17)
  model             TEXT    NOT NULL, -- user-facing alias name
  upstream_model    TEXT    NOT NULL, -- post-alias-resolution name
  prompt_tokens     INTEGER,
  completion_tokens INTEGER,
  status            INTEGER NOT NULL, -- HTTP status code
  latency_ms        INTEGER NOT NULL,
  error             TEXT,             -- short error tag, NOT body
  usage_unknown     INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX idx_events_ts ON events(ts);
CREATE INDEX idx_events_key_ts ON events(key_id, ts);
CREATE INDEX idx_events_model_ts ON events(model, ts);
```
- `src/middleware/telemetry.ts`: wraps proxy routes; records start time; in `finally`, captures upstream `usage` and inserts the row
- Inject `stream_options.include_usage = true` into outbound Chat Completions when `stream=true` and not already set; document in code with a link back to this issue
- For the Anthropic translator, parse `message_start`/`message_delta` to extract input/output tokens before forwarding
- Never write request/response bodies here; that's F4.A's job
- Hourly sweeper with wall-clock anchoring (per backend B7): `setInterval` anchored to the hour boundary via `Date.now() % 3_600_000`; "I just woke up" detection if the observed delta exceeds 2× the expected interval (laptop suspend); `DELETE FROM events WHERE ts < ?` in `LIMIT 1000` chunks, yielding to the event loop between batches; log the deleted row count
- Telemetry failures NEVER break the proxied request (catch + log)
- Tests: success/4xx/5xx rows; streaming with and without `include_usage` support; retention sweep deletes only stale rows; sweep doesn't block writers
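The translator-side extraction could look like the sketch below. The event shapes follow the Anthropic Messages streaming format (`message_start` carries `message.usage.input_tokens`; `message_delta` carries a cumulative `usage.output_tokens`); the simplified event type and the `accumulateUsage` name are illustrative, not the translator's actual API:

```typescript
// Simplified view of Anthropic SSE events — only the fields we read.
type AnthropicEvent = {
  type: string;
  message?: { usage: { input_tokens: number } }; // present on message_start
  usage?: { output_tokens: number };             // present on message_delta
};

function accumulateUsage(
  events: AnthropicEvent[],
): { input_tokens?: number; output_tokens?: number } {
  const acc: { input_tokens?: number; output_tokens?: number } = {};
  for (const ev of events) {
    if (ev.type === "message_start" && ev.message) {
      acc.input_tokens = ev.message.usage.input_tokens;
    } else if (ev.type === "message_delta" && ev.usage) {
      // message_delta's output count is cumulative, so the last one wins.
      acc.output_tokens = ev.usage.output_tokens;
    }
  }
  return acc;
}
```

If the stream ends with either field still unset (e.g. an upstream that never sent the events), that is the case where the row should be written with `usage_unknown=1`.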
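The sweeper's anchoring and chunked delete can be sketched as follows. `deleteChunk` is a stand-in for the real statement (`DELETE FROM events WHERE ts < ? LIMIT 1000`, returning rows removed); all names are illustrative, not `src/services/retention.ts`'s actual API:

```typescript
const HOUR_MS = 3_600_000;
const CHUNK = 1000;

// Milliseconds until the next top-of-hour, so the first tick lands on the
// boundary. Exactly on a boundary, schedules a full hour ahead.
function msUntilNextHour(now: number): number {
  return HOUR_MS - (now % HOUR_MS);
}

// Suspend detection: if the gap since the last tick is more than twice the
// expected interval, the process likely slept (laptop suspend) and should
// sweep immediately instead of waiting for the next boundary.
function wokeFromSuspend(lastTick: number, now: number, interval = HOUR_MS): boolean {
  return now - lastTick > 2 * interval;
}

// Chunked retention sweep: delete in LIMIT-sized batches, yielding to the
// event loop between batches so concurrent writers are never starved.
async function sweep(
  cutoffTs: number,
  deleteChunk: (cutoff: number) => Promise<number>,
): Promise<number> {
  let total = 0;
  for (;;) {
    const n = await deleteChunk(cutoffTs);
    total += n;
    if (n < CHUNK) break;                       // last partial batch: done
    await new Promise((r) => setImmediate(r));  // yield between batches
  }
  return total; // caller logs the deleted row count
}
```

Returning the total lets the caller satisfy the "log row count" requirement without the sweep itself knowing about the logger.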
Acceptance criteria
- A successful chat completion writes exactly one row with non-null token counts
- A streaming completion to a model that doesn't return usage writes `usage_unknown=1` instead of silent zeros
- DB size grows linearly with traffic, bounded by retention (verified via fixture)
- A telemetry write error does NOT propagate to the client response
Part of #23. Depends on F2.A, F2.C.
File pointers
- `src/lib/migrations/003_events.sql`, `src/middleware/telemetry.ts`, `src/services/retention.ts`, `tests/telemetry.test.ts`
- `src/server.ts`, `src/services/copilot/create-chat-completions.ts` (inject `stream_options`), `src/routes/messages/stream-translation.ts` (parse Anthropic usage events)

Dependencies
Depends on F2.A, F2.C. Blocks F3.B.