F3.A — events table + telemetry middleware + retention sweep #34

@HXYerror

Description

Part of #23. Depends on F2.A, F2.C.

Background

No usage data is recorded today. Token counts are available in the `usage` field of OpenAI Chat Completions responses and in `usage.input_tokens`/`usage.output_tokens` for Anthropic Messages, but by default ONLY in non-streaming responses.

⚠️ Critical caveat (per backend review B1)

Streaming token counts will NOT arrive unless we explicitly request them.

  • OpenAI Chat Completions streaming: requires stream_options: {include_usage: true} in the outbound request; usage arrives in a final chunk where choices: []. The proxy currently never sets this flag — we must inject it.
  • Anthropic streaming: token counts arrive in message_start (input) and message_delta (output) events — different events entirely. The Anthropic translator at src/routes/messages/ translates Anthropic→OpenAI before sending upstream, so we get OpenAI-shaped usage back.
  • Older Copilot models that ignore `stream_options` will produce zero-token rows; set `usage_unknown=1` on the row rather than silently writing 0.
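The injection and detection described above can be sketched as follows. This is illustrative only: the helper names (`withUsageInStream`, `isUsageChunk`) and payload shape are assumptions, not existing repo code; the `stream_options.include_usage` flag and the empty-`choices` usage chunk are OpenAI Chat Completions API behavior.

```typescript
// Hypothetical helpers; names and types are assumptions for this sketch.
interface ChatCompletionPayload {
  model: string;
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
  [key: string]: unknown;
}

// Ensure include_usage is set on outbound streaming payloads without
// clobbering a value the caller set explicitly.
export function withUsageInStream(payload: ChatCompletionPayload): ChatCompletionPayload {
  if (!payload.stream) return payload; // non-streaming: usage arrives in the body already
  if (payload.stream_options?.include_usage !== undefined) return payload; // caller chose
  return {
    ...payload,
    stream_options: { ...payload.stream_options, include_usage: true },
  };
}

// Per the OpenAI API, the usage arrives in a final chunk with an empty
// `choices` array when include_usage was requested.
export function isUsageChunk(chunk: { choices?: unknown[]; usage?: unknown }): boolean {
  return Array.isArray(chunk.choices) && chunk.choices.length === 0 && chunk.usage != null;
}
```

If no chunk ever satisfies `isUsageChunk` (older Copilot models), the middleware would write the row with `usage_unknown=1`.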

Goal

Record one event per proxied request; enforce retention via an hourly sweeper.

Tasks

  • Migration 003_events.sql:
    CREATE TABLE events (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      ts INTEGER NOT NULL,
      key_id TEXT NOT NULL,             -- '__noauth__' sentinel for --no-auth (per backend #17)
      model TEXT NOT NULL,              -- user-facing alias name
      upstream_model TEXT NOT NULL,     -- post-alias-resolution name
      prompt_tokens INTEGER,
      completion_tokens INTEGER,
      status INTEGER NOT NULL,          -- HTTP status code
      latency_ms INTEGER NOT NULL,
      error TEXT,                       -- short error tag, NOT body
      usage_unknown INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX idx_events_ts ON events(ts);
    CREATE INDEX idx_events_key_ts ON events(key_id, ts);
    CREATE INDEX idx_events_model_ts ON events(model, ts);
  • src/middleware/telemetry.ts: wraps the proxy routes; records the start time; in a finally block, captures upstream usage and inserts one row
  • Inject stream_options.include_usage = true into outbound Chat Completions when stream=true and not already set; document in code with a link back to this issue
  • For Anthropic translator, parse message_start / message_delta to extract input/output tokens before forwarding
  • Never write request/response bodies here; that's F4.A's job
  • Hourly sweeper with wall-clock anchoring (per backend B7): schedule each run at the top of the hour (delay computed from Date.now() % 3_600_000 rather than a bare setInterval, which drifts); "I just woke up" detection when the observed delta exceeds 2× the expected interval (laptop suspend); DELETE FROM events WHERE ts < ? in LIMIT 1000 chunks, yielding to the event loop between batches; log the deleted row count
  • Telemetry failures NEVER break the proxied request (catch + log)
  • Tests: success/4xx/5xx rows; streaming with and without include_usage support; retention sweep deletes only stale rows; sweep doesn't block writers
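A minimal sketch of the sweeper logic from the task list, assuming a hypothetical `EventsDb.deleteOlderThan` wrapper around the chunked `DELETE ... LIMIT` statement (function and interface names are illustrative, not repo code):

```typescript
// Illustrative sweeper sketch; EventsDb, sweepOnce, and msUntilNextHour
// are assumed names, not existing code in src/services/retention.ts.
interface EventsDb {
  // Executes DELETE FROM events WHERE ts < ? LIMIT ?; returns rows deleted.
  deleteOlderThan(cutoffTs: number, limit: number): number;
}

const HOUR_MS = 3_600_000;
const BATCH = 1000;

// Delete stale rows in LIMIT-ed chunks, yielding to the event loop between
// batches so a long sweep never blocks concurrent telemetry writers.
export async function sweepOnce(db: EventsDb, retentionMs: number): Promise<number> {
  const cutoff = Date.now() - retentionMs;
  let total = 0;
  for (;;) {
    const n = db.deleteOlderThan(cutoff, BATCH);
    total += n;
    if (n < BATCH) break; // nothing (or little) left to delete
    await new Promise((resolve) => setImmediate(resolve)); // yield between batches
  }
  return total;
}

// Anchor the next run to the wall-clock hour boundary. A scheduler using
// this would also compare actual vs expected wake-up time and sweep
// immediately when the delta exceeds 2x the interval (laptop suspend).
export function msUntilNextHour(now = Date.now()): number {
  return HOUR_MS - (now % HOUR_MS);
}
```

Anchoring on the remainder of `Date.now()` keeps runs aligned to the hour even across restarts, which a plain `setInterval(fn, HOUR_MS)` does not.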

Acceptance criteria

  • A successful chat completion writes exactly one row with non-null token counts
  • A streaming completion to a model that doesn't return usage writes usage_unknown=1 instead of silent zeros
  • DB size grows linearly with traffic, bounded by retention (verified via fixture)
  • Telemetry write error does NOT propagate to the client response
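The two non-negotiable invariants above (exactly one row per request; telemetry failures never reach the client) can be sketched as a generic middleware shape. The signature below is a simplified stand-in, not the project's actual router API:

```typescript
// Illustrative middleware sketch; the (req, next) signature and InsertEvent
// type are assumptions standing in for the real framework types.
type InsertEvent = (row: { ts: number; status: number; latency_ms: number }) => void;

export function telemetry(insertEvent: InsertEvent) {
  return async (
    req: { url: string },
    next: () => Promise<{ status: number }>,
  ): Promise<{ status: number }> => {
    const start = Date.now();
    let status = 500; // assume failure until the handler tells us otherwise
    try {
      const res = await next();
      status = res.status;
      return res;
    } finally {
      // finally guarantees exactly one row, whether next() returned or threw.
      try {
        insertEvent({ ts: start, status, latency_ms: Date.now() - start });
      } catch (err) {
        // Telemetry write errors are logged, never rethrown, so they
        // cannot alter the response already being returned to the client.
        console.error("telemetry insert failed:", err);
      }
    }
  };
}
```

The inner try/catch is what makes the last acceptance criterion hold: even if the DB insert throws, the `return res` from the outer try has already fixed the client response.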

File pointers

  • New: src/lib/migrations/003_events.sql, src/middleware/telemetry.ts, src/services/retention.ts, tests/telemetry.test.ts
  • Touch: src/server.ts, src/services/copilot/create-chat-completions.ts (inject stream_options), src/routes/messages/stream-translation.ts (parse Anthropic usage events)

Dependencies

Depends on F2.A, F2.C. Blocks F3.B.
