feat(fixtures): blocks array for tool-first / interleaved ordering across providers (#274) #283
Merged
commit: |
…ization (#274) Introduce the fixture blocks array type with validation, and normalize loader/factory paths so block-ordered fixtures flow through consistently.
…cross providers (#274) Emit fixture blocks in their declared array order for both streaming and non-streaming paths across the Anthropic, OpenAI, Gemini, Ollama, Responses, and WebSocket providers.
…l-first fixtures (#274) Preserve the observed streaming block order during collapse so the recorder writes tool-first fixtures faithfully.
…nd back-compat (#274) Add and extend tests covering block-ordered replay per provider, recorder capture, end-to-end flow, and backward compatibility with legacy fixtures.
…bservability (#274) Document the fixture blocks array, per-provider ordering behavior, and the observability surface, and record the change in the changelog.
b7bcd59 to 2d132df
This was referenced Jun 28, 2026
Closed
jpr5 added a commit that referenced this pull request Jun 28, 2026
…-class + docs (#285)

## What

Completes the (still-unreleased) #274 `blocks` fixture-ordering feature so it works **uniformly across every provider**, and makes blocks-only fixtures **first-class**. Builds on #283 (blocks v1) and #282 (async video), both already on `main` under `[Unreleased]`.

No version bump: stays `1.34.0`. The dedicated `chore: release v1.35.0` PR owns the bump.

## Why

#283 shipped `blocks` for a subset of providers and left the record-side capture + several provider builders out of scope, with blocks-only fixtures treated as by-design-unsupported. To ship the feature as *complete* for 1.35.0, it needs to: capture/replay block order on the remaining providers, accept a fixture that is *just* `{ blocks: [...] }`, and document the feature where authors actually look.

## Changes

**Feature completion**

- **First-class blocks-only fixtures**: a response may be `{ blocks: [...] }` with no `content`/`toolCalls`; the shared recognizer was relaxed *additively* (legacy recognition byte-identical).
- **Record side**: block-order capture added to the Cohere, Bedrock (event-stream), and Gemini-Interactions collapsers; `normalizeToolArguments` zero-arg (`"{}"`) fix. Gemini-Interactions is args-only by design (step-index can't reconcile arrival-order blocks).
- **Replay builders**: Cohere, Bedrock (invoke), Bedrock-Converse, and Gemini-Interactions now honor `blocks` ordering (streaming + non-streaming where the wire allows).
- **toolCall `arguments`** may be a JSON object or a string (objects auto-stringify), consistently across top-level toolCalls and block toolCalls.
- **Validation hardening**: `validateBlocks` rejects empty-text blocks and warns on blocks/content divergence.

**Completion gaps found + fixed during code review** (each with a local red→green on the real surface)

- Ollama **non-streaming** dropped the entire payload for a blocks-only fixture (F0 made it newly reachable); it now backfills content/tool_calls from blocks.
- Realtime + Gemini-Live WS surfaces silently dropped blocks-only payloads; they now honor blocks.
- Programmatic `addFixture` with object block args returned HTTP 500 (bypassed normalize); `resolveFixtureBlocks` now tolerates object args.
- A valid `{ content: "", blocks: [...] }` fixture spuriously hard-errored at load; the error is suppressed when non-empty blocks drive output.
- Gemini-Live empty-text block leaked past `truncateAfterChunks` and shifted recorded-timing indices; the guard now skips emission cleanly.

**Docs**

- `docs/fixtures/index.html`: per-provider observability matrix, blocks-only authoring, reasoning cells.
- Authoring on-ramps: `write-fixtures` skill/command, `docs/examples` (worked **tool-first** example), `docs/record-replay`, root + pytest READMEs.
- A runnable `fixtures/examples/llm/blocks-tool-first.json` with a permanent loadability test.

## Wire limitations (documented, not gaps)

OpenAI-chat, Ollama, and Cohere **non-streaming** expose separate content/tool_calls fields, so order is not observable on the wire: payload is delivered, order is a no-op. Captured in the observability matrix.

## Test plan

- Full suite: **144 files / 4244 tests** pass
- `tsc --noEmit`, `eslint`, `prettier --check`, `build`, `test:exports` (node10/node16-CJS/ESM/bundler) all green
- Every behavioral fix verified red→green on the real surface (live endpoint / real WS / real loader), independently re-confirmed in review

## Known non-blocking follow-ups

Pre-existing items (predate this branch) and minor polish are tracked but intentionally out of scope: a shared stringify helper to de-dup the three normalize sites, JSON-serializability check for object args at load time, and assorted pre-existing collapser/request-converter edge cases.
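As a rough illustration of the `arguments` handling described in the commit message above (a JSON object or a string, objects auto-stringified, zero-arg calls normalized to `"{}"`), a shared helper of the kind mentioned in the follow-ups might look like the sketch below. The name `stringifyToolArguments` and the treatment of missing or empty-string input are assumptions made for illustration; this is not the project's `normalizeToolArguments`.

```typescript
// Hypothetical input type: fixture authors may supply a JSON string,
// a plain object, or nothing at all for a zero-argument tool call.
type ToolArgumentsInput = string | Record<string, unknown> | undefined;

// Illustrative helper (assumed name, not the project's API).
// Always returns a JSON string suitable for a toolCall `arguments` field.
function stringifyToolArguments(args: ToolArgumentsInput): string {
  // Assumption: a missing value means "no arguments".
  if (args === undefined) {
    return "{}";
  }
  if (typeof args === "string") {
    // Assumption: an empty or whitespace-only string also means "no arguments".
    return args.trim() === "" ? "{}" : args;
  }
  // Objects are serialized so every downstream site sees a string.
  return JSON.stringify(args);
}

// Example inputs covering the three cases.
const fromObject = stringifyToolArguments({ city: "Paris" });
const fromString = stringifyToolArguments('{"city":"Paris"}');
const fromNothing = stringifyToolArguments(undefined);
```

Routing both top-level toolCalls and block toolCalls through one function like this is one way to keep the two paths consistent.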
Closes #274.
Adds an optional `blocks` array to the content+toolCalls fixture response so a fixture can express ordered text / tool-call blocks, streamed in array order. This enables tool-call-before-text ("tool-first") and interleaved ordering, which the previous `{ content, toolCalls }` shape (always text-first) could not express.

## What's in it
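To make the shape concrete, here is a minimal sketch of a tool-first fixture and of the precedence rule this PR describes (non-empty `blocks` win, otherwise the legacy text-first order applies). The type names and the `emissionOrder` helper are hypothetical and only illustrate the idea; they are not the project's `resolveFixtureBlocks`.

```typescript
// Hypothetical types mirroring the block shape described in this PR.
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = {
  type: "toolCall";
  name: string;
  arguments: string; // JSON-encoded arguments
  id?: string;
};
type FixtureBlock = TextBlock | ToolCallBlock;

interface FixtureResponse {
  content?: string;
  toolCalls?: { name: string; arguments: string; id?: string }[];
  blocks?: FixtureBlock[];
}

// Illustrative helper (assumed name): returns the order in which a response
// would be emitted. Non-empty `blocks` take precedence; an absent or empty
// array falls back to the legacy order (text first, then tool calls).
function emissionOrder(response: FixtureResponse): FixtureBlock[] {
  if (response.blocks && response.blocks.length > 0) {
    return response.blocks;
  }
  const legacy: FixtureBlock[] = [];
  if (response.content) {
    legacy.push({ type: "text", text: response.content });
  }
  for (const call of response.toolCalls ?? []) {
    legacy.push({ type: "toolCall", ...call });
  }
  return legacy;
}

// A tool-first fixture: the tool call is declared before the text.
const toolFirst: FixtureResponse = {
  blocks: [
    { type: "toolCall", name: "get_weather", arguments: '{"city":"Paris"}' },
    { type: "text", text: "Checking the weather now." },
  ],
};

// Evaluates to ["toolCall", "text"].
const toolFirstOrder = emissionOrder(toolFirst).map((b) => b.type);

// The legacy shape always evaluates to ["text", "toolCall"].
const legacyOrder = emissionOrder({
  content: "Checking the weather now.",
  toolCalls: [{ name: "get_weather", arguments: '{"city":"Paris"}' }],
}).map((b) => b.type);
```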
- `blocks` fixture field (`[{ type: "text", text } | { type: "toolCall", name, arguments, id? }]`) on the content+toolCalls response, with a `resolveFixtureBlocks` validator and load-time validation in the JSON fixture loader + factory normalization (object args auto-stringified).
- When `blocks` is present it takes precedence and streams in array order; when absent (or empty `[]`), the existing legacy path runs byte-for-byte unchanged. Backward-compatible → minor release.
- The recorder writes `blocks` only when the real upstream stream was genuinely tool-first/interleaved (text-first streams keep the legacy shape → existing recordings byte-identical). Collapsed `blocks` and the flat `toolCalls` are reconciled (same order/identity, `arguments` normalized to `"{}"`).

## Per-provider observability (documented honestly)
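The sketch below illustrates why ordering can be non-observable on a wire format that keeps text and tool calls in separate fields. `ChatMessage` is a simplified stand-in for a chat-completions-style non-streaming message, and `collapseToChatMessage` is a hypothetical helper, not code from this PR.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Simplified stand-in: text and tool calls live in separate fields,
// so the message carries no record of their relative order.
interface ChatMessage {
  content: string | null;
  tool_calls: { function: { name: string; arguments: string } }[];
}

// Illustrative helper (assumed name): walks the blocks in array order and
// routes each one to its field.
function collapseToChatMessage(blocks: Block[]): ChatMessage {
  const textParts: string[] = [];
  const tool_calls: ChatMessage["tool_calls"] = [];
  for (const block of blocks) {
    if (block.type === "text") {
      textParts.push(block.text);
    } else {
      tool_calls.push({
        function: { name: block.name, arguments: block.arguments },
      });
    }
  }
  return {
    content: textParts.length > 0 ? textParts.join("") : null,
    tool_calls,
  };
}

const callBlock: Block = { type: "toolCall", name: "lookup", arguments: "{}" };
const replyBlock: Block = { type: "text", text: "Done." };

// Tool-first and text-first inputs collapse to the same message.
const toolFirstMessage = collapseToChatMessage([callBlock, replyBlock]);
const textFirstMessage = collapseToChatMessage([replyBlock, callBlock]);
const sameOnTheWire =
  JSON.stringify(toolFirstMessage) === JSON.stringify(textFirstMessage);
```

Under these assumptions the payload is delivered in full either way; only the relative order is lost, which matches the "degenerate" classification for chat-completions.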
- Anthropic / Responses / Gemini: full tool-first (ordered arrays).
- Ollama: partial.
- OpenAI chat-completions: degenerate. `delta.content` and `delta.tool_calls` are separate channels with no client-observable interleaving (emitted in order, documented as non-observable).

## Verification
- `tsc --noEmit` clean.
- Fixed, each with a red-green test: the `blocks` gate, the non-streaming paths, load-time validation timing, and collapser ordering/`arguments` fidelity.

## Version
Backward-compatible → minor (1.35.0) when released. Per repo convention the version bump + CHANGELOG heading are cut in a dedicated release PR; this PR adds entries under `## [Unreleased]` and does not bump the version surfaces.

## Known follow-ups (not blocking)
- … `normalizeToolArguments`) to the Cohere / Bedrock / Gemini-Interactions collapsers (out of this PR's provider scope).
- `responsesUsage` 0/0/0 for Gemini-style usage; a broken `@copilotkit/aimock/server` import in the responses-api doc; `ws-responses` request-field whitelist / `streamingProfile`; stale "usage zeroed" chat doc line.

🤖 Generated with Claude Code