| 1 | +# RFC: Opt-In Proxy Exchange Capture |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Add an opt-in runtime feature that persists proxied request and response exchanges to disk as JSONL. The feature is disabled by default and does not change existing proxy behavior or terminal usage logging. |
| 6 | + |
| 7 | +## Motivation |
| 8 | + |
| 9 | +The proxy currently provides terminal summaries for request usage, but not a durable record of payloads, response IDs, tool-call traffic, or translated streamed output. That makes debugging translation issues and upstream behavior harder than it needs to be. |
| 10 | + |
| 11 | +An opt-in capture mode gives us persistent, structured traces without making default runs noisy or risky. |
| 12 | + |
| 13 | +## Goals |
| 14 | + |
| 15 | +- Keep capture disabled by default. |
| 16 | +- Persist request and final response bodies for proxied model traffic. |
| 17 | +- Preserve current client-visible behavior, especially for streaming endpoints. |
| 18 | +- Redact sensitive values before writing to disk. |
| 19 | +- Keep storage append-friendly and easy to inspect manually. |
| 20 | +- Reduce disk growth for older captures automatically. |
| 21 | + |
| 22 | +## Non-Goals |
| 23 | + |
| 24 | +- Capturing GitHub auth and token refresh flows. |
| 25 | +- Persisting raw SSE streams byte-for-byte. |
| 26 | +- Replacing existing terminal logging. |
| 27 | +- Building a UI or search layer for captured data. |
| 28 | + |
| 29 | +## User Interface |
| 30 | + |
| 31 | +New `start` options: |
| 32 | + |
| 33 | +- `--capture` |
| 34 | +- `--capture-path <path>` |
| 35 | + |
| 36 | +Behavior: |
| 37 | + |
| 38 | +- `--capture` enables persistence. |
| 39 | +- `--capture-path` overrides the default JSONL location. |
| 40 | +- Without `--capture`, no exchanges are written to disk. |
| 41 | +- In default-path mode, the current day's file stays as plain `.jsonl` and prior-day files are automatically gzipped. |
| 42 | + |
| 43 | +Default path: |
| 44 | + |
| 45 | +```text |
| 46 | +~/.local/share/copilot-api/captures/YYYY-MM-DD.jsonl |
| 47 | +``` |
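The flag behavior and default path above can be sketched as a small resolver. This is a minimal illustration, not the code in `src/start.ts`: the names `defaultCapturePath` and `resolveCapturePath` are hypothetical, and using the local calendar day (rather than UTC) for the file name is an assumption the RFC does not settle.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: builds the default daily capture path.
// Assumption: the date uses the local calendar day; the RFC does not say local vs UTC.
function defaultCapturePath(now: Date = new Date()): string {
  const yyyy = String(now.getFullYear());
  const mm = String(now.getMonth() + 1).padStart(2, "0");
  const dd = String(now.getDate()).padStart(2, "0");
  return join(
    homedir(),
    ".local",
    "share",
    "copilot-api",
    "captures",
    `${yyyy}-${mm}-${dd}.jsonl`,
  );
}

// Hypothetical resolver: returns undefined when capture is disabled,
// so callers can skip persistence entirely without --capture.
function resolveCapturePath(
  opts: { capture: boolean; capturePath?: string },
  now: Date = new Date(),
): string | undefined {
  if (!opts.capture) return undefined;
  return opts.capturePath ?? defaultCapturePath(now);
}
```

Returning `undefined` when `--capture` is absent keeps the "disabled by default" rule in one place, even if a custom path was supplied.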
| 48 | + |
| 49 | +## Design |
| 50 | + |
| 51 | +A dedicated capture module is responsible for: |
| 52 | + |
| 53 | +- generating and carrying request correlation IDs |
| 54 | +- appending JSONL records |
| 55 | +- redacting token-like values |
| 56 | +- measuring request and response byte sizes |
| 57 | +- reconstructing final streamed responses for persistence |
| 58 | +- compressing prior-day default capture files to `.jsonl.gz` |
| 59 | + |
| 60 | +Capture records are intentionally aligned with the proxy's effective upstream interaction, not a byte-for-byte copy of the original inbound request. In practice that means: |
| 61 | + |
| 62 | +- request fields that the proxy normalizes or adds, such as normalized model names or inferred limits, are reflected in the saved record |
| 63 | +- compatibility routes may persist translated request and response shapes rather than the original client-facing wire format |
| 64 | +- the goal of capture is operational debugging of what the proxy effectively sent and received, not raw inbound replay fidelity |
| 65 | + |
| 66 | +Each persisted record includes: |
| 67 | + |
| 68 | +- timestamp |
| 69 | +- route and HTTP method |
| 70 | +- upstream target path |
| 71 | +- model and reasoning level when available |
| 72 | +- local request ID |
| 73 | +- upstream status |
| 74 | +- upstream response ID when available |
| 75 | +- `previous_response_id` when available |
| 76 | +- request bytes |
| 77 | +- response bytes |
| 78 | +- request body |
| 79 | +- final response body |
| 80 | +- usage totals when available |
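As a rough sketch of how the fields above might map onto one JSONL line, the following uses a hypothetical `CaptureRecord` type. The field names, the `newRequestId` helper, and the choice to measure bytes on the serialized JSON body are assumptions for illustration and are not taken from `src/lib/exchange-capture.ts`.

```typescript
import { randomUUID } from "node:crypto";
import { appendFileSync, mkdirSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical record shape; field names are illustrative only.
interface CaptureRecord {
  timestamp: string;
  route: string;
  method: string;
  upstreamPath: string;
  model?: string;
  reasoningLevel?: string;
  requestId: string;
  upstreamStatus: number;
  upstreamResponseId?: string;
  previousResponseId?: string;
  requestBytes: number;
  responseBytes: number;
  requestBody: unknown;
  responseBody: unknown;
  usage?: Record<string, number>;
}

// Local correlation ID for one proxied exchange (assumed to be a UUID here).
function newRequestId(): string {
  return randomUUID();
}

// Byte size of a body once serialized as UTF-8 JSON.
function jsonByteLength(body: unknown): number {
  return Buffer.byteLength(JSON.stringify(body) ?? "", "utf8");
}

// One record becomes exactly one line: JSON.stringify escapes embedded newlines.
function toJsonlLine(record: CaptureRecord): string {
  return JSON.stringify(record) + "\n";
}

// Append-only write; creates the capture directory on first use.
function appendRecord(filePath: string, record: CaptureRecord): void {
  mkdirSync(dirname(filePath), { recursive: true });
  appendFileSync(filePath, toJsonlLine(record), "utf8");
}
```

Because each record is a single line, the file stays append-friendly and can be inspected with ordinary line-oriented tools.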
| 81 | + |
| 82 | +For compatibility routes, the saved bodies should be interpreted as the proxy's translated interaction model. They are useful for debugging behavior through the proxy, but they are not guaranteed to exactly match the original client payload shape. |
| 83 | + |
| 84 | +## Streaming |
| 85 | + |
| 86 | +For streaming endpoints, the proxy continues forwarding events unchanged to the client. In parallel, it reconstructs a final logical response for persistence. |
| 87 | + |
| 88 | +This applies to: |
| 89 | + |
| 90 | +- OpenAI chat completions streams |
| 91 | +- Responses API streams routed back to chat completions |
| 92 | +- Anthropic message streams translated from chat completions |
| 93 | + |
| 94 | +The design intentionally stores the final reconstructed response instead of raw SSE chunks. That keeps logs compact and directly usable, at the cost of losing exact chunk timing and wire-level fidelity. |
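To illustrate the reconstruction idea, here is a minimal accumulator for OpenAI-style chat completion chunks. It is a sketch under assumptions: the `StreamAccumulator` class and its method names are hypothetical, only text content is merged (tool-call deltas and usage are omitted for brevity), and the real module may handle the Responses API and Anthropic routes differently.

```typescript
// Minimal subset of an OpenAI-style chat completion chunk; only fields used below.
interface ChatChunk {
  id: string;
  model: string;
  choices: Array<{
    index: number;
    delta: { role?: string; content?: string | null };
    finish_reason: string | null;
  }>;
}

interface ReconstructedChoice {
  index: number;
  role: string;
  content: string;
  finish_reason: string | null;
}

// Hypothetical accumulator: observes chunks as they are forwarded and
// builds one final logical response. It never mutates the chunks it sees.
class StreamAccumulator {
  private id = "";
  private model = "";
  private readonly choices = new Map<number, ReconstructedChoice>();

  push(chunk: ChatChunk): void {
    if (!this.id) this.id = chunk.id;
    if (!this.model) this.model = chunk.model;
    for (const part of chunk.choices) {
      const choice: ReconstructedChoice = this.choices.get(part.index) ?? {
        index: part.index,
        role: "assistant",
        content: "",
        finish_reason: null,
      };
      if (part.delta.role) choice.role = part.delta.role;
      if (part.delta.content) choice.content += part.delta.content;
      if (part.finish_reason) choice.finish_reason = part.finish_reason;
      this.choices.set(part.index, choice);
    }
  }

  // Final logical response, with choices ordered by index.
  result(): { id: string; model: string; choices: ReconstructedChoice[] } {
    const choices = Array.from(this.choices.values()).sort(
      (a, b) => a.index - b.index,
    );
    return { id: this.id, model: this.model, choices };
  }
}
```

In use, the route handler would call `push` on each parsed chunk right after forwarding the original event, then persist `result()` once the stream ends, so the client-visible stream is unaffected.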
| 95 | + |
| 96 | +## Storage Lifecycle |
| 97 | + |
| 98 | +The active capture file remains uncompressed so the proxy can append to it efficiently. |
| 99 | + |
| 100 | +When capture uses the default daily path: |
| 101 | + |
| 102 | +- today's file is written as `YYYY-MM-DD.jsonl` |
| 103 | +- older `.jsonl` files are automatically gzip-compressed to `.jsonl.gz` |
| 104 | +- compression runs when capture writes first begin, either in a new process or after the date rolls over to a new day |
| 105 | +- compressed files are not reopened for append |
| 106 | + |
| 107 | +This keeps the hot file simple while reducing long-term storage usage for older days. |
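The lifecycle above could look roughly like the following. The function names are hypothetical, and the sketch reads each file fully into memory and uses synchronous calls for brevity; an actual implementation might prefer streaming compression for large captures.

```typescript
import { readdirSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { gzipSync } from "node:zlib";

// Matches only plain daily files such as `2024-01-04.jsonl`.
const DAILY_FILE = /^(\d{4}-\d{2}-\d{2})\.jsonl$/;

// True when `fileName` is a daily capture dated before `today` (`YYYY-MM-DD`).
// ISO-formatted dates order correctly under plain string comparison.
function isPriorDayCapture(fileName: string, today: string): boolean {
  const day = DAILY_FILE.exec(fileName)?.[1];
  return day !== undefined && day < today;
}

// Hypothetical sweep: gzip every prior-day `.jsonl` in `dir` and return
// the paths of the `.jsonl.gz` files that were written.
function compressPriorDays(dir: string, today: string): string[] {
  const compressed: string[] = [];
  for (const name of readdirSync(dir)) {
    if (!isPriorDayCapture(name, today)) continue;
    const source = join(dir, name);
    const target = `${source}.gz`;
    writeFileSync(target, gzipSync(readFileSync(source)));
    unlinkSync(source); // remove the plain file only after the gzip exists
    compressed.push(target);
  }
  return compressed;
}
```

Files that are already `.jsonl.gz`, today's file, and files with custom names never match the pattern, which is consistent with compression applying only to the default daily layout.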
| 108 | + |
| 109 | +## Redaction |
| 110 | + |
| 111 | +Before any exchange is written to disk, the capture layer redacts: |
| 112 | + |
| 113 | +- authorization headers |
| 114 | +- token-like keys |
| 115 | +- token-like string values |
| 116 | + |
| 117 | +This keeps the default capture mode suitable for debugging without writing obvious credentials to disk. |
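A recursive redaction pass along these lines is sketched below. The key and value patterns are assumptions chosen for illustration (the RFC does not define "token-like"), and the `redact` helper is hypothetical. The key pattern deliberately matches names ending in `token` but not `tokens`, so usage fields such as `max_tokens` survive.

```typescript
const REDACTED = "[REDACTED]";

// Assumed sensitive key names: `authorization`, `token`, `secret`, `password`,
// `api_key` and variants, either alone or after a `-`/`_` separator.
const SENSITIVE_KEY =
  /(?:^|[-_])(?:authorization|token|secret|password|api[-_]?key)$/i;

// Assumed token-like values: bearer credentials, GitHub-style `ghX_` tokens,
// and `sk-` prefixed keys. Real tokens may use other formats.
const TOKEN_LIKE_VALUE =
  /\b(?:Bearer\s+\S+|gh[opusr]_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9_-]{20,})/g;

// Returns a redacted deep copy; the input value is never mutated.
function redact(value: unknown, key?: string): unknown {
  if (key !== undefined && SENSITIVE_KEY.test(key)) return REDACTED;
  if (typeof value === "string") {
    return value.replace(TOKEN_LIKE_VALUE, REDACTED);
  }
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = redact(v, k);
    return out;
  }
  return value;
}
```

Pattern-based redaction only catches credentials that look like the patterns it knows, which is why the RFC's wording about "obvious credentials" is the right level of promise.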
| 118 | + |
| 119 | +## Scope |
| 120 | + |
| 121 | +Capture is implemented for proxied model traffic: |
| 122 | + |
| 123 | +- `/v1/chat/completions` |
| 124 | +- `/v1/responses` |
| 125 | +- `/v1/messages` |
| 126 | +- `/v1/embeddings` |
| 127 | + |
| 128 | +Auth and token management flows are intentionally out of scope. |
| 129 | + |
| 130 | +## Change Summary |
| 131 | + |
| 132 | +Implemented changes on branch `feat/proxy-exchange-capture`: |
| 133 | + |
| 134 | +- added CLI flags in `src/start.ts` |
| 135 | +- added capture config to runtime state |
| 136 | +- added default capture path handling |
| 137 | +- added `src/lib/exchange-capture.ts` for persistence, redaction, and stream reconstruction |
| 138 | +- added automatic gzip compression for prior daily capture files in default-path mode |
| 139 | +- threaded request IDs and upstream status through Copilot service calls |
| 140 | +- integrated persistence into chat, responses, anthropic, and embeddings routes |
| 141 | +- documented the feature in `README.md` |
| 142 | +- added unit coverage for redaction and stream reconstruction |
| 143 | +- added unit coverage for previous-day gzip compression |
| 144 | + |
| 145 | +## Validation |
| 146 | + |
| 147 | +Validated in the feature worktree with: |
| 148 | + |
| 149 | +- `bun test` |
| 150 | +- `bunx --bun tsc --noEmit` |
| 151 | +- `bunx --bun eslint --cache src tests` |
| 152 | +- `bunx --bun tsdown` |