test: harden session analysis against real-world edge cases #25
Merged
Add offline edge-case tests derived from patterns observed in real local Copilot sessions, using synthetic/anonymized/simplified payloads that keep the real event structure but carry invented IDs and trimmed counts (no session is copied verbatim):

- Stale prompt-cache compaction: cache_read=0 with the whole context re-billed as cache_write, carrying the non-default live Opus costPerBatch rates. The rates and token counts price out to exactly the source session's totalNanoAiu, so rate reconciliation is exercised for real. Asserts that rates_from_compaction reads the live (non-default) rates and that the cold-cache AIU split differs from the default-rate split.
- Multiple compactions in one long session: n_compactions accumulation, peak context = max pre-compaction size, compaction_aiu summation.
- compaction_start + compaction_complete peak interplay.
- Transient API failures with no shutdown degrade gracefully: API timeout (ETIMEDOUT, "retried 5 times"), socket drop, and user abort are tallied in event_type_counts but never fabricate economics; per-message token counts are still recovered. A timeout earlier in a session does not suppress the final shutdown's authoritative totals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
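The compaction bookkeeping described above (count every compaction, keep the peak pre-compaction context, sum the compaction cost) can be sketched as a single pass over the event stream. This is a minimal sketch, not the project's implementation: the event shape, the field names `contextTokens`, `preCompactionTokens` and `nanoAiu`, the `CompactionStats`/`summarize_compactions` names, and the rule that a `compaction_start` also feeds the peak are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompactionStats:
    n_compactions: int = 0
    peak_context: int = 0
    compaction_nano_aiu: int = 0  # summed as integers to avoid float drift

    @property
    def compaction_aiu(self) -> float:
        # Convert the integer nano-AIU total to AIU only at the end.
        return self.compaction_nano_aiu / 1_000_000_000


def summarize_compactions(events: list[dict]) -> CompactionStats:
    """Hypothetical event shape: {"type": str, "data": dict}."""
    stats = CompactionStats()
    for event in events:
        kind = event["type"]
        data = event.get("data", {})
        if kind == "compaction_start":
            # Assumption: a start event reports the context size before
            # compaction, so it can raise the peak even without a complete.
            stats.peak_context = max(stats.peak_context, data.get("contextTokens", 0))
        elif kind == "compaction_complete":
            stats.n_compactions += 1
            # Peak context is the largest pre-compaction size seen so far.
            stats.peak_context = max(stats.peak_context, data.get("preCompactionTokens", 0))
            stats.compaction_nano_aiu += data.get("nanoAiu", 0)
    return stats
```

Keeping the cost as an integer nano-AIU sum and converting once in a property is one way to make a summation test exact; the real code may store AIU differently.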
Continues the session-analysis hardening from #24. I mined the local Copilot
session store (129 sessions) for edge cases, then encoded the interesting ones
as synthetic, anonymized, simplified tests: real payload structure,
invented IDs, no prose, trimmed counts. No session is copied verbatim.
Edge cases mined and covered
| Real-session signal | Test(s) |
| --- | --- |
| `cacheReadTokens: 0`, full context re-billed as `cacheWriteTokens`, non-default live Opus `costPerBatch` rates | `test_stale_compaction_*` |
| `compaction_complete` events | `test_multiple_compactions_accumulate_and_track_peak`, `test_compaction_start_and_complete_peak_interplay` |
| `CAPIError … operation timed out [ETIMEDOUT]; retried 5 times` | `test_api_timeout_*`, `test_timeout_error_does_not_override_authoritative_shutdown` |
| `SocketError: other side closed`, `abort`/`user_initiated` | `test_socket_drop_then_abort_is_handled` |
| `session.error`/`query` | `test_failed_to_list_models_error_is_inert` |

Load-bearing realism: the stale-compaction `costPerBatch` rates and token counts price out to exactly the `totalNanoAiu` from the source session, so the test drives real rate-reconciliation math rather than an invented total.
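The reconciliation idea can be illustrated with exact arithmetic: price each token bucket with the rates carried on the compaction event and compare the sum with the session total. In this sketch only the `costPerBatch` name comes from the PR; its inner keys, the 1,000-token batch size, the units (nano-AIU per batch), the rate values, and the helpers `live_rates`, `price_split` and `reconciles` are hypothetical, and the numbers are invented rather than taken from the source session.

```python
from fractions import Fraction

# Hypothetical pricing model: rates are nano-AIU per batch of BATCH_TOKENS tokens.
BATCH_TOKENS = 1_000
DEFAULT_RATES = {"input": 10, "cache_read": 1, "cache_write": 12, "output": 50}


def live_rates(compaction_data: dict) -> dict:
    """Prefer rates carried on the event; fall back to defaults per key."""
    return {**DEFAULT_RATES, **compaction_data.get("costPerBatch", {})}


def price_split(tokens: dict, rates: dict, batch_tokens: int = BATCH_TOKENS) -> dict:
    """Per-bucket cost in nano-AIU, kept exact with Fraction (no rounding)."""
    return {kind: Fraction(count * rates[kind], batch_tokens) for kind, count in tokens.items()}


def reconciles(tokens: dict, rates: dict, total_nano_aiu: int,
               batch_tokens: int = BATCH_TOKENS) -> bool:
    """True when the priced buckets sum to exactly the reported total."""
    return sum(price_split(tokens, rates, batch_tokens).values()) == total_nano_aiu


# Stale-cache shape: nothing read from cache, the whole context re-billed as cache writes.
stale_tokens = {"input": 2_000, "cache_read": 0, "cache_write": 150_000, "output": 3_000}
rates = live_rates({"costPerBatch": {"input": 15, "cache_read": 2, "cache_write": 19, "output": 75}})

live_ok = reconciles(stale_tokens, rates, 3_105)              # True: 30 + 0 + 2850 + 225
default_ok = reconciles(stale_tokens, DEFAULT_RATES, 3_105)   # False: defaults price to 1970
```

Because a cold cache pushes almost all tokens into the cache-write bucket, the split is dominated by that one rate, which is why pricing with defaults instead of the live rates visibly misses the total.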
Behavior locked in
- `rates_from_compaction` reads live, non-default rates; the cold-cache AIU split differs from the default-rate split (which is why reading live rates matters).
- Transient API failures with no shutdown degrade gracefully: they are tallied in `event_type_counts`, economics stays `None`, per-message token counts are still recovered, and a final shutdown remains authoritative.
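The graceful-degradation contract can be sketched as: count every event type, keep per-message token counts, and set the economics only from a shutdown event, leaving them `None` otherwise. Apart from `session.error` and `abort`, which the PR mentions, the event names (`assistant.message`, `session.shutdown`), the `outputTokens`/`totalNanoAiu` field layout, and the `SessionSummary`/`summarize` names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionSummary:
    event_type_counts: Counter = field(default_factory=Counter)
    message_output_tokens: list = field(default_factory=list)
    # None means "unknown": without a shutdown there is no authoritative total.
    total_nano_aiu: Optional[int] = None


def summarize(events: list) -> SessionSummary:
    summary = SessionSummary()
    for event in events:
        kind = event["type"]
        data = event.get("data", {})
        # Every event, including errors and aborts, is tallied.
        summary.event_type_counts[kind] += 1
        if kind == "assistant.message" and "outputTokens" in data:
            # Per-message counts survive even if the session never shuts down.
            summary.message_output_tokens.append(data["outputTokens"])
        elif kind == "session.shutdown":
            # Only a shutdown sets the economics; errors never touch them.
            summary.total_nano_aiu = data["totalNanoAiu"]
    return summary
```

Keeping `None` rather than `0` is the point of the design: a session that died on a timeout has unknown cost, not zero cost, and a later shutdown still supplies the real total.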
9 new tests; full suite 212 passed. Ruff clean + formatted. Tests only.