Assistant responded to phantom user messages absent from transcript and claimed tool calls that never executed (context desync/confabulation) #67484

## Summary

During a Claude Code session, the assistant began responding to **user messages that were never sent** (verbatim-quoting them when challenged), and **claimed to have executed file-writing tool calls that never happened**. Forensic comparison against the persisted session transcript (`.jsonl`) shows the model's effective context diverged completely from the real conversation for the final ~30 minutes of the session.

I have the full transcript preserved and will reference this issue in a `/bug` submission so the team can correlate with server-side logs.

## Environment

- Claude Code **2.1.159** (CLI), macOS (Darwin 25.3.0)
- Model: **claude-opus-4-8**
- Incident session ID: `174108dc-239c-4c5c-a6bd-71b85b24e6ef`
- Incident window: 2026-06-11 ~02:40–03:08 UTC
- Session involved fetching external web content (job listings JSON) earlier at 02:04 UTC — content was verified clean (see below)

## What happened (as seen by the user)

1. User answered a multiple-choice `AskUserQuestion` at 02:37 UTC (last persisted tool activity: 02:41 UTC).
2. At about 02:40 UTC the assistant abruptly pivoted to a completely unrelated topic, responding as if the user had sent messages like *"don't worry, keep going"*, *"please go on"*, and a long message claiming the user was *"about to move to the US Bay Area"*. The user never typed any of these, and none of them appeared on the user's screen.
3. When challenged ("I never said that!"), the assistant quoted the phantom messages **verbatim** and described a multi-turn exchange (including tool actions it had supposedly performed in between) that does not exist anywhere.
4. The assistant claimed it had: written 5 documents to a markdown file, edited a tracking file, written an incident report to a specific path, and later "tried to read files but tools returned corrupted output". **None of these operations occurred** — zero tool calls are persisted after 02:41 UTC, and the files on disk were verified unchanged.
5. At 03:05 UTC, the user asked "can you submit this for me?" and the assistant answered a **completely different question that was never asked** (describing icons in the top bar of a previously shared screenshot).
6. The assistant also mentioned seeing a stray fragment ("…Pumpkin.") that matches nothing in the conversation.
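For anyone who wants to repeat the "zero tool calls after 02:41 UTC" check from item 4 on their own transcript, here is a minimal sketch. It is not the exact procedure used for this report. It assumes each `.jsonl` record has a top-level ISO 8601 `timestamp` and that tool calls appear as content blocks with `type == "tool_use"` under `message.content`; those field names are assumptions about the transcript layout and may need adjusting. The helper name `tool_calls_after` is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' or a missing zone as UTC."""
    when = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when


def tool_calls_after(lines: Iterable[str], cutoff: datetime) -> list[tuple[datetime, str]]:
    """Return (timestamp, tool name) for every tool_use block later than cutoff.

    ASSUMPTION: records look like
    {"timestamp": "...", "message": {"content": [{"type": "tool_use", "name": "..."}]}}.
    """
    calls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict) or "timestamp" not in record:
            continue  # records without a timestamp cannot be placed on the timeline
        when = parse_ts(record["timestamp"])
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content cannot hold tool_use blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use" and when > cutoff:
                calls.append((when, block.get("name", "?")))
    return calls
```

To use it, open the transcript with `open(path, encoding="utf-8")`, pass the file object as `lines`, and pick a `cutoff` just after the last tool activity you trust. An empty result is consistent with the assistant's claimed writes and edits never having been issued as tool calls.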

## Forensic findings (reproducible from the local `.jsonl`)

1. **Phantom messages exist nowhere in the persisted transcript.** A search of the full 300-line `.jsonl` for the phantom phrases (the "Bay Area" message text, "don't worry, keep going", "please go on", "Pumpkin") finds every occurrence inside the *assistant's own later messages* quoting them. They appear in no user turn, no `tool_result`, and no external content.
2. **External content is clean.** The only untrusted input in that session was a batch of job-listing texts fetched at 02:04 UTC (fully persisted in the transcript). They contain no injection-style instructions and none of the phantom phrases. Classic prompt injection via fetched content is ruled out.
3. **Zero tool calls after 02:41 UTC**, while the assistant claimed (with confident, specific detail) to have performed multiple writes, edits, and reads in that window. All claimed outputs were verified absent from disk.
4. The divergence persisted across multiple turns and survived direct confrontation — the model kept elaborating the phantom history (including inventing "corrupted tool output" to explain contradictions) rather than recovering.
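The phrase search in finding 1 can be reproduced roughly as sketched below. This illustrates the idea and is not the script used for this report. It assumes each record exposes `message.role` and that tool results are content blocks with `type == "tool_result"` inside user-role messages; both are assumptions about the transcript layout. The names `locate_phrases` and `record_source` are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside a decoded JSON value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def record_source(record: dict) -> str:
    """Classify a record as 'assistant', 'user', 'tool_result', or 'other'.

    ASSUMPTION: the role lives at message.role, and tool results are
    content blocks with type == "tool_result".
    """
    message = record.get("message")
    if not isinstance(message, dict):
        return "other"
    content = message.get("content")
    if isinstance(content, list) and any(
        isinstance(block, dict) and block.get("type") == "tool_result"
        for block in content
    ):
        return "tool_result"
    role = message.get("role")
    return role if isinstance(role, str) else "other"


def locate_phrases(lines: Iterable[str], phrases: list[str]) -> dict[str, Counter]:
    """Count, per phrase, how many records of each source contain it (case-insensitive)."""
    hits: dict[str, Counter] = {phrase: Counter() for phrase in phrases}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        text = "\n".join(iter_strings(record)).lower()
        source = record_source(record)
        for phrase in phrases:
            if phrase.lower() in text:
                hits[phrase][source] += 1
    return hits
```

Run against the transcript file, a result where every phrase is counted only under `assistant` matches the finding above: the phrases exist solely in the assistant's own output. Any count under `user` or `tool_result` would instead point to the phrase having entered through a real input.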

## Why this seems worth server-side investigation

From client-side evidence alone I cannot distinguish between:

- **(A) A confabulation cascade** — the model hallucinated a parallel conversation history (phantom user turns + phantom tool actions) and then consistently defended it, quoting its own inventions verbatim; or
- **(B) Server/infra-side context corruption** — e.g. cross-conversation contamination or cache corruption, where the model genuinely received turns that don't belong to this conversation. The stylistic mismatch of the phantom content (terse English imperatives vs. the user's consistent Traditional Chinese), the non-sequitur answer to a never-asked question, and the stray "Pumpkin" fragment subjectively feel more like bleed-through than self-generated content.

Either way it's a severe trust failure: invisible "inputs" drove visible assistant behavior, and the assistant reported non-existent actions as completed.

## Impact

- No data was corrupted (fortunately the claimed writes never executed).
- User lost ~30 minutes to confusion/verification and had to discard the session.

## What I can provide

- Full incident `.jsonl` transcript (will submit via `/bug` referencing this issue)
- A detailed local forensic report with exact timestamps and quotes
- Screenshot taken during the incident showing the user's view (no phantom messages on screen)

