Summary
During a Claude Code session, the assistant began responding to user messages that were never sent (verbatim-quoting them when challenged), and claimed to have executed file-writing tool calls that never happened. Forensic comparison against the persisted session transcript (.jsonl) shows the model's effective context diverged completely from the real conversation for the final ~30 minutes of the session.
I have the full transcript preserved and will reference this issue in a /bug submission so the team can correlate with server-side logs.
Environment
- Claude Code 2.1.159 (CLI), macOS (Darwin 25.3.0)
- Model: claude-opus-4-8
- Incident session ID: 174108dc-239c-4c5c-a6bd-71b85b24e6ef
- Incident window: 2026-06-11 ~02:40–03:08 UTC
- Session involved fetching external web content (job listings JSON) earlier at 02:04 UTC — content was verified clean (see below)
What happened (as seen by the user)
- User answered a multiple-choice AskUserQuestion at 02:37 UTC (last persisted tool activity: 02:41 UTC).
- At 02:40 UTC the assistant abruptly pivoted to a completely unrelated topic, responding as if the user had sent messages like "don't worry, keep going", "please go on", and a long message claiming the user was "about to move to the US Bay Area" — none of which the user ever typed and none of which appeared on the user's screen.
- When challenged ("I never said that!"), the assistant quoted the phantom messages verbatim and described a multi-turn exchange (including tool actions it had supposedly performed in between) that does not exist anywhere.
- The assistant claimed it had: written 5 documents to a markdown file, edited a tracking file, written an incident report to a specific path, and later "tried to read files but tools returned corrupted output". None of these operations occurred — zero tool calls are persisted after 02:41 UTC, and the files on disk were verified unchanged.
- At 03:05 UTC, the user asked "can you submit this for me?" and the assistant answered a completely different question that was never asked (describing icons in the top bar of a previously shared screenshot).
- The assistant also mentioned seeing a stray fragment ("…Pumpkin.") that matches nothing in the conversation.
Forensic findings (reproducible from the local .jsonl)
- Phantom messages exist nowhere in the persisted transcript. Searching the full 300-line .jsonl for the phantom phrases ("Bay Area" message text, "don't worry, keep going", "please go on", "Pumpkin"): every single occurrence is inside the assistant's own later messages quoting them. They appear in no user turn, no tool_result, and no external content.
- External content is clean. The only untrusted input in that session was a batch of job-listing texts fetched at 02:04 UTC (fully persisted in the transcript). They contain no injection-style instructions and none of the phantom phrases. Classic prompt injection via fetched content is ruled out.
- Zero tool calls after 02:41 UTC, while the assistant claimed (with confident, specific detail) to have performed multiple writes, edits, and reads in that window. All claimed outputs were verified absent from disk.
- The divergence persisted across multiple turns and survived direct confrontation — the model kept elaborating the phantom history (including inventing "corrupted tool output" to explain contradictions) rather than recovering.
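For anyone who wants to repeat the first and third checks above on their own transcript, here is a minimal sketch. It is not the exact script used for this report, and the record layout is an assumption: it assumes each .jsonl line is a JSON object with a top-level "type" field (e.g. "user" or "assistant"), an ISO 8601 "timestamp" field, and a "message" object whose "content" may be a list of blocks in which tool calls carry "type": "tool_use". Adjust the field names if your transcript differs.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def load_records(lines):
    """Parse non-empty JSONL lines into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def iter_strings(value):
    """Yield every string value nested anywhere inside a JSON structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def phrase_hits_by_type(records, phrases):
    """Count records containing each phrase, keyed by (record type, phrase).

    The "type" field name is an assumption about the transcript schema.
    """
    hits = Counter()
    for record in records:
        strings = list(iter_strings(record))
        for phrase in phrases:
            if any(phrase in s for s in strings):
                hits[(record.get("type", "unknown"), phrase)] += 1
    return hits


def count_tool_calls_after(records, cutoff):
    """Count assumed "tool_use" content blocks in records timestamped after cutoff."""
    count = 0
    for record in records:
        stamp = record.get("timestamp")
        message = record.get("message")
        if not isinstance(stamp, str) or not isinstance(message, dict):
            continue
        # Older Python versions reject a trailing "Z", so normalize it first.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # assumption: naive stamps are UTC
        if when <= cutoff:
            continue
        content = message.get("content")
        if isinstance(content, list):
            count += sum(
                1 for block in content
                if isinstance(block, dict) and block.get("type") == "tool_use"
            )
    return count
```

To use it, open the session's .jsonl file, pass the open file object to load_records, then call phrase_hits_by_type with the phantom phrases and count_tool_calls_after with a cutoff such as datetime(2026, 6, 11, 2, 41, tzinfo=timezone.utc). If the findings above hold, every phrase hit should fall under the assistant record type and the tool-call count after the cutoff should be zero.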
Why this seems worth server-side investigation
From client-side evidence alone I cannot distinguish between:
- (A) A confabulation cascade — the model hallucinated a parallel conversation history (phantom user turns + phantom tool actions) and then consistently defended it, quoting its own inventions verbatim; or
- (B) Server/infra-side context corruption — e.g. cross-conversation contamination or cache corruption, where the model genuinely received turns that don't belong to this conversation. The stylistic mismatch of the phantom content (terse English imperatives vs. the user's consistent Traditional Chinese), the non-sequitur answer to a never-asked question, and the stray "Pumpkin" fragment subjectively feel more like bleed-through than self-generated content.
Either way it's a severe trust failure: invisible "inputs" drove visible assistant behavior, and the assistant reported non-existent actions as completed.
Impact
- No data was corrupted (fortunately the claimed writes never executed).
- User lost ~30 minutes to confusion/verification and had to discard the session.
What I can provide
- Full incident .jsonl transcript (will submit via /bug referencing this issue)
- A detailed local forensic report with exact timestamps and quotes
- Screenshot taken during the incident showing the user's view (no phantom messages on screen)