Skip to content

test: harden session analysis against real-world edge cases#25

Merged
dbroeglin merged 1 commit into
mainfrom
dbroeglin/session-edge-case-tests
Jun 28, 2026
Merged

test: harden session analysis against real-world edge cases#25
dbroeglin merged 1 commit into
mainfrom
dbroeglin/session-edge-case-tests

Conversation

@dbroeglin

Copy link
Copy Markdown
Owner

Continues the session-analysis hardening from #24. I mined the local Copilot
session store (129 sessions) for edge cases, then encoded the interesting ones
as synthetic, anonymized, simplified tests — real payload structure,
invented IDs, no prose, trimmed counts. No session is copied verbatim.

Edge cases mined and covered

Pattern Source observation Test
Stale prompt-cache compaction cacheReadTokens: 0, full context re-billed as cacheWriteTokens, non-default live Opus costPerBatch rates test_stale_compaction_*
Multiple compactions one session had 4 compaction_complete events test_multiple_compactions_accumulate_and_track_peak
compaction_start/complete peak both contribute to context peak test_compaction_start_and_complete_peak_interplay
API timeout CAPIError … operation timed out [ETIMEDOUT]; retried 5 times test_api_timeout_*, test_timeout_error_does_not_override_authoritative_shutdown
Socket drop + abort SocketError: other side closed, abort/user_initiated test_socket_drop_then_abort_is_handled
Failed to list models session.error/query test_failed_to_list_models_error_is_inert

Load-bearing realism: the stale-compaction costPerBatch rates and token counts
price out to exactly the totalNanoAiu from the source session, so the test
drives real rate-reconciliation math rather than an invented total.

Behavior locked in

  • rates_from_compaction reads live, non-default rates; the cold-cache AIU
    split differs from the default-rate split (which is why reading live rates
    matters).
  • Transient API failures (timeout / socket / abort) with no shutdown degrade
    gracefully: tallied in event_type_counts, economics stays None, per-message
    token counts still recovered, and a final shutdown remains authoritative.

9 new tests; full suite 212 passed. Ruff clean + formatted. Tests only.

Add offline edge-case tests derived from patterns observed in real local
Copilot sessions, using synthetic/anonymized/simplified payloads that keep the
real event structure but carry invented IDs and trimmed counts (no session is
copied verbatim):

- Stale prompt-cache compaction: cache_read=0 with the whole context re-billed
  as cache_write, carrying the non-default live Opus costPerBatch rates. The
  rates + token counts price out to exactly the source session's totalNanoAiu,
  so rate-reconciliation is exercised for real. Asserts rates_from_compaction
  reads the live (non-default) rates and the cold-cache AIU split differs from
  the default-rate split.
- Multiple compactions in one long session: n_compactions accumulation, peak
  context = max pre-compaction size, compaction_aiu summation.
- compaction_start + compaction_complete peak interplay.
- Transient API failures with no shutdown degrade gracefully: API timeout
  (ETIMEDOUT, "retried 5 times"), socket drop, and user abort are tallied in
  event_type_counts but never fabricate economics; per-message token counts are
  still recovered. A timeout earlier in a session does not suppress the final
  shutdown's authoritative totals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dbroeglin dbroeglin merged commit 4b6bcb5 into main Jun 28, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant