test: harden session analysis against real-world edge cases by dbroeglin · Pull Request #25 · dbroeglin/github-copilot-lab

dbroeglin · 2026-06-27T19:47:41Z

Continues the session-analysis hardening from #24. I mined the local Copilot
session store (129 sessions) for edge cases, then encoded the interesting ones
as synthetic, anonymized, simplified tests — real payload structure,
invented IDs, no prose, trimmed counts. No session is copied verbatim.

Edge cases mined and covered

| Pattern | Source observation | Test |
| --- | --- | --- |
| Stale prompt-cache compaction | `cacheReadTokens: 0`, full context re-billed as `cacheWriteTokens`, non-default live Opus `costPerBatch` rates | `test_stale_compaction_*` |
| Multiple compactions | one session had 4 `compaction_complete` events | `test_multiple_compactions_accumulate_and_track_peak` |
| `compaction_start`/`compaction_complete` peak | both contribute to context peak | `test_compaction_start_and_complete_peak_interplay` |
| API timeout | `CAPIError … operation timed out [ETIMEDOUT]; retried 5 times` | `test_api_timeout_*`, `test_timeout_error_does_not_override_authoritative_shutdown` |
| Socket drop + abort | `SocketError: other side closed`, `abort/user_initiated` | `test_socket_drop_then_abort_is_handled` |
| Failed to list models | `session.error/query` | `test_failed_to_list_models_error_is_inert` |
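As a rough illustration of the multiple-compaction and peak-interplay rows, the accumulation could be sketched as below. This is not the repository's implementation: `summarize_compactions`, `CompactionStats`, and the payload fields `contextTokens` and `nanoAiu` are hypothetical names chosen for the example; only the event types `compaction_start` and `compaction_complete` come from the table.

```python
from dataclasses import dataclass


@dataclass
class CompactionStats:
    n_compactions: int = 0   # number of compaction_complete events seen
    peak_context: int = 0    # largest context size reported by any compaction event
    compaction_aiu: int = 0  # summed cost of all completed compactions


def summarize_compactions(events):
    """Accumulate compaction statistics over a list of event dicts.

    Assumed (hypothetical) event shape:
    {"type": str, "data": {"contextTokens": int, "nanoAiu": int}}
    """
    stats = CompactionStats()
    for ev in events:
        kind = ev.get("type")
        if kind not in ("compaction_start", "compaction_complete"):
            continue
        data = ev.get("data", {})
        # Both start and complete events may raise the context peak.
        stats.peak_context = max(stats.peak_context, data.get("contextTokens", 0))
        if kind == "compaction_complete":
            # Only completed compactions are counted and billed.
            stats.n_compactions += 1
            stats.compaction_aiu += data.get("nanoAiu", 0)
    return stats
```

Under these assumptions, a session with several compactions reports the count, the summed cost, and the maximum context size seen across all start and complete events.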

Load-bearing realism: the stale-compaction costPerBatch rates and token counts
price out to exactly the totalNanoAiu from the source session, so the test
drives real rate-reconciliation math rather than an invented total.
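The reconciliation idea can be sketched as follows, assuming a rate is expressed as nano-AIU per fixed-size batch of tokens. That interpretation of `costPerBatch`, the `batch_size` parameter, the token-kind keys, and the helper names `price_nano_aiu` and `reconciles` are all assumptions made for illustration, not the project's actual schema or API.

```python
from fractions import Fraction


def price_nano_aiu(token_counts, cost_per_batch, batch_size):
    """Price token counts with per-batch rates, using exact arithmetic.

    token_counts and cost_per_batch are dicts keyed by a token kind
    (hypothetical keys such as "input" or "cacheWrite").
    """
    total = Fraction(0)
    for kind, count in token_counts.items():
        # tokens * (nano-AIU per batch) / (tokens per batch)
        total += Fraction(count * cost_per_batch[kind], batch_size)
    return total


def reconciles(token_counts, cost_per_batch, batch_size, total_nano_aiu):
    """Return True when the priced total matches the reported total exactly."""
    return price_nano_aiu(token_counts, cost_per_batch, batch_size) == total_nano_aiu
```

Using `Fraction` avoids floating-point rounding, so an exact match against a recorded total is a meaningful assertion rather than an approximate one.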

Behavior locked in

- `rates_from_compaction` reads live, non-default rates; the cold-cache AIU
  split differs from the default-rate split (which is why reading live rates
  matters).
- Transient API failures (timeout / socket / abort) with no shutdown degrade
  gracefully: tallied in `event_type_counts`, economics stays `None`, per-message
  token counts still recovered, and a final shutdown remains authoritative.
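The graceful-degradation behavior might look roughly like this in a simplified analyzer. The function `analyze`, the event types `assistant.message` and `session.shutdown`, and the fields `outputTokens` and `totalNanoAiu` as used here are hypothetical stand-ins; the real analyzer and event schema may differ.

```python
from collections import Counter


def analyze(events):
    """Summarize a session without fabricating economics.

    Economics is only populated from a shutdown event; errors and aborts
    are counted but otherwise inert.
    """
    counts = Counter(ev["type"] for ev in events)
    economics = None
    output_tokens = 0
    for ev in events:
        data = ev.get("data", {})
        if ev["type"] == "assistant.message":
            # Per-message token counts are recovered even if the session never shuts down.
            output_tokens += data.get("outputTokens", 0)
        elif ev["type"] == "session.shutdown":
            # The shutdown totals are authoritative, regardless of earlier errors.
            economics = {"totalNanoAiu": data.get("totalNanoAiu")}
    return {
        "event_type_counts": dict(counts),
        "economics": economics,
        "output_tokens": output_tokens,
    }
```

In this sketch a timeout or abort only ever changes `event_type_counts`, which is the property the new tests pin down.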

9 new tests; full suite 212 passed. Ruff clean + formatted. Tests only.

Add offline edge-case tests derived from patterns observed in real local Copilot sessions, using synthetic/anonymized/simplified payloads that keep the real event structure but carry invented IDs and trimmed counts (no session is copied verbatim):

- Stale prompt-cache compaction: cache_read=0 with the whole context re-billed as cache_write, carrying the non-default live Opus costPerBatch rates. The rates + token counts price out to exactly the source session's totalNanoAiu, so rate-reconciliation is exercised for real. Asserts rates_from_compaction reads the live (non-default) rates and the cold-cache AIU split differs from the default-rate split.
- Multiple compactions in one long session: n_compactions accumulation, peak context = max pre-compaction size, compaction_aiu summation.
- compaction_start + compaction_complete peak interplay.
- Transient API failures with no shutdown degrade gracefully: API timeout (ETIMEDOUT, "retried 5 times"), socket drop, and user abort are tallied in event_type_counts but never fabricate economics; per-message token counts are still recovered. A timeout earlier in a session does not suppress the final shutdown's authoritative totals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dbroeglin merged commit 4b6bcb5 into main Jun 28, 2026
7 checks passed

dbroeglin merged 1 commit into main from dbroeglin/session-edge-case-tests
