Complete #274 blocks feature across all providers + blocks-only first-class + docs #285
Merged
commit: |
…tent/toolCalls required) (#274)

A fixture authored as `{ "blocks": [ ... ] }` with NO `content` and NO `toolCalls` is now recognized, matched, and streamed in block order. Previously the recognizer required both content + toolCalls, so a blocks-only fixture fell through every guard and the server answered 500.

This is a pure RELAXATION of recognition: it recognizes MORE and never reclassifies an existing fixture. Existing `content`, `toolCalls`, and `content`+`toolCalls` fixtures keep byte-identical recognition and streaming.

- types: `ContentWithToolCallsResponse.content`/`toolCalls` (and the on-disk `FixtureFileContentWithToolCallsResponse` counterparts) are now optional, so a blocks-only shape type-checks without `as any`.
- guard: `isContentWithToolCallsResponse` matches when content+toolCalls are both present (legacy/combined, byte-identical clause) OR a non-empty `blocks` array is present (new). A blocks-only fixture cannot be claimed by an earlier/looser guard (`isAudioResponse` needs `audio`; `isTextResponse` needs string `content` and `!toolCalls`; `isToolCallResponse` needs a `toolCalls` array), so guard order and existing classification are unchanged.
- loader: gate the content/toolCalls validation branch on the fields actually being present so a blocks-only fixture loads (blocks validated separately).
- resolveFixtureBlocks: return a defensive COPY of the array, not the caller's reference.
- handlers: coalesce `content ?? ""` / `toolCalls ?? []` at every `isContentWithToolCallsResponse` call site. This is a no-op for legacy fixtures (the guard guarantees the fields), keeping behavior byte-identical, while letting the blocks-honoring builders serve blocks-only via their existing blocks branch.

Builders for the 5 #274 providers are untouched (they already branch on blocks).
RED/GREEN: a new e2e test loads a real on-disk blocks-only JSON fixture (`[toolCall, text]`, no content/toolCalls) through the real loader and serves it on Anthropic and Responses; RED before the change (500/no-match), GREEN after (tool_use/function_call streamed before text). Full suite stays green.
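The relaxed guard described above can be sketched as follows. This is a minimal illustration, not the project's code: the type shapes and the `isContentWithToolCallsSketch` name are assumptions made up for this example, and only the two-clause logic (legacy content+toolCalls OR a non-empty `blocks` array) comes from the commit message.

```typescript
// Hypothetical shapes for illustration; the real types live in the project source.
type FixtureBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

interface ToolCall {
  name: string;
  arguments: string;
}

interface ContentWithToolCallsResponse {
  content?: string; // optional so a blocks-only shape type-checks
  toolCalls?: ToolCall[]; // optional for the same reason
  blocks?: FixtureBlock[];
}

// Assumed name; mirrors the two clauses the commit describes.
function isContentWithToolCallsSketch(r: unknown): r is ContentWithToolCallsResponse {
  if (typeof r !== "object" || r === null) return false;
  const o = r as Record<string, unknown>;
  // Legacy/combined clause: both fields present.
  const legacy = typeof o.content === "string" && Array.isArray(o.toolCalls);
  // New clause: a non-empty blocks array is a complete shape on its own.
  const blocksOnly = Array.isArray(o.blocks) && o.blocks.length > 0;
  return legacy || blocksOnly;
}
```

Because the new clause is OR-ed onto the old one, every fixture the legacy clause accepted is still accepted, which is the "pure relaxation" property the commit claims.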
…GI collapsers (#274)

Extend the #274 OrderAtom + buildOrderedBlocks + normalizeToolArguments pattern to the three remaining collapsers.

Cohere v2 SSE: push a text atom on content-delta and a toolCall atom at tool-call-start (referencing the mutated accumulator); at finalize emit ordered blocks for interleaved streams and route flat toolCalls[].arguments through normalizeToolArguments (zero-arg "" -> "{}").

Bedrock EventStream (binary, both Anthropic-native and Converse branches): push text/toolCall atoms at the frame-order sites and add buildOrderedBlocks + normalizeToolArguments. The tool-bearing return previously DROPPED accumulated content; spread it when present so interleaved blocks are persistable (recorder only emits blocks when content+toolCalls coexist) and the text is no longer silently lost.

Gemini-Interactions: ARGS-ONLY. Normalize the legacy 1.x string branch and the 2.x final push so zero/whitespace args persist "{}". Block-capture is intentionally omitted: the step-index-sorted finalizer (interleaved with arrival-pushed 1.x calls) cannot be reconciled with arrival-order atoms by identity, so blocks could disagree with the flat toolCalls (F4/F5). A negative test asserts blocks stays undefined for an interleaved stream.

Tests: zero-arg -> "{}" for all three; Cohere/Bedrock tool-first / text-after-tool ordering, blocks<->flat consistency, and round-trip JSON validity; text-first stays block-free (byte-identical).
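A rough sketch of the atom pattern described above, under stated assumptions: the type shapes and the `...Sketch` function names are invented for illustration, and the "emit blocks only when text follows a tool call" rule is one reading of "text-first stays block-free", not a confirmed detail of the real `buildOrderedBlocks`.

```typescript
// Hypothetical shapes; the project's OrderAtom may differ.
type ToolAccumulator = { name: string; arguments: string };
type OrderAtom =
  | { kind: "text"; text: string }
  | { kind: "toolCall"; call: ToolAccumulator }; // reference to the mutated accumulator
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Zero-arg or whitespace-only arguments persist as "{}" so they stay valid JSON.
function normalizeToolArgumentsSketch(raw: string): string {
  return raw.trim() === "" ? "{}" : raw;
}

// Assumption: blocks are only worth persisting when some text arrives after a
// tool call; a plain text-first stream returns undefined (block-free).
function buildOrderedBlocksSketch(atoms: OrderAtom[]): Block[] | undefined {
  const blocks: Block[] = [];
  let sawTool = false;
  let textAfterTool = false;
  for (const atom of atoms) {
    if (atom.kind === "text") {
      if (sawTool) textAfterTool = true;
      const last = blocks[blocks.length - 1];
      // Merge adjacent text deltas into one text block.
      if (last !== undefined && last.type === "text") last.text += atom.text;
      else blocks.push({ type: "text", text: atom.text });
    } else {
      sawTool = true;
      // Arguments are read at finalize time, after the accumulator has filled in.
      blocks.push({
        type: "toolCall",
        name: atom.call.name,
        arguments: normalizeToolArgumentsSketch(atom.call.arguments),
      });
    }
  }
  return textAfterTool ? blocks : undefined;
}
```

Holding a reference to the accumulator, rather than a copy, is what lets the atom be pushed at tool-call-start while the argument fragments are still arriving.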
The Cohere v2 content+toolCalls replay builders ignored `response.blocks`
and always emitted text-then-tools. Add a blocks branch (mirroring the
OpenAI/Anthropic/Gemini providers) that, when `blocks` is present, emits
events in the blocks' array order so a tool-first / interleaved fixture
streams its tool call before its text. Cohere v2 SSE events are ordered,
so tool-first is wire-expressible.
The streaming builder emits a single tool-plan-delta before the first
toolCall block, then per-block tool-call / content events in array order.
The non-streaming builder derives content and tool_calls FROM blocks (so a
blocks-only fixture still produces correct output) but makes no ordering
guarantee, because Cohere's non-streaming response keeps text and tool
calls in separate fields — their relative order is not observable.
Legacy `{ content, toolCalls }` fixtures (no blocks) keep the unchanged
text-first path. Wire `response.blocks` from the dispatch site into both
builders.
…der (#274) Converse content is positional/ordered: the non-stream `output.message.content` array and the indexed stream `contentBlockStart`/`contentBlockDelta` events both express tool-first ordering. Branch the content+toolCalls builders on `response.blocks`: when present, emit `text` -> `{text}` and `toolCall` -> `{toolUse}` in the fixture's array order (non-stream array + stream contentBlock events, indices in encounter order after any leading reasoning block); when absent, the legacy text-first path is unchanged. Also satisfies the post-F0 blocks-only shape (no content/toolCalls) since the dispatch falls back to empty content/toolCalls and the blocks path drives output. Adds fixture-blocks-bedrock-converse.test.ts (real LLMock): tool-first emits toolUse before text in the non-stream array and the stream contentBlock events, a blocks-only fixture streams tool-first, and a no-blocks fixture keeps the unchanged legacy text-first ordering.
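The `text` -> `{text}` and `toolCall` -> `{toolUse}` mapping for the non-stream content array might look like the sketch below. The function name, the `Block` shape, and the `tooluse_N` id scheme are assumptions for illustration; the real builder also handles a leading reasoning block and the streamed `contentBlock*` events, which are omitted here.

```typescript
// Hypothetical fixture block shape.
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Simplified Converse content entries: a text entry or a toolUse entry.
type ConverseContent =
  | { text: string }
  | { toolUse: { toolUseId: string; name: string; input: unknown } };

// Emits entries in the fixture's array order, so a tool-first fixture
// produces a toolUse entry before the text entry.
function buildConverseContentSketch(blocks: Block[]): ConverseContent[] {
  let toolCount = 0;
  return blocks.map((block): ConverseContent => {
    if (block.type === "text") return { text: block.text };
    return {
      toolUse: {
        toolUseId: `tooluse_${toolCount++}`, // made-up id scheme
        name: block.name,
        input: JSON.parse(block.arguments), // arguments are stored as a JSON string
      },
    };
  });
}
```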
#274) Bedrock invoke emits Anthropic-style content arrays and binary content_block_* stream events, both positional, so tool-first ordering is wire-expressible. The invoke builders now branch on a non-empty response.blocks: they resolve the ordered blocks and emit text/tool_use in array order (tool-first capable) across BOTH the non-streaming content[] array and the streaming content_block_* events. When blocks is absent or empty, the legacy text-first { content, toolCalls } path is unchanged. Also supports blocks-only fixtures (blocks present, no content/toolCalls): output is emitted purely from the ordered blocks. Bedrock is therefore no longer a scoped-out consumer of ordered blocks; the scoped-out coverage drops its Bedrock case (Cohere and Gemini Interactions remain scoped out) and the new behavior is covered by fixture-blocks-bedrock.test.ts (tool-first NS+stream, blocks-only NS+stream, no-blocks back-compat).
…uilder (#274) The Gemini Interactions SDK 2.x step protocol is index/step-addressed and ordered, so tool-first IS wire-expressible on REPLAY. Branch the combined content+toolCalls builder on a non-empty `response.blocks`: emit one step per block in array order (text -> model_output step, toolCall -> function_call step) for both the non-stream `steps[]` body and the streamed `step.*` brackets. When `blocks` is absent or empty the legacy text-first path runs byte-for-byte unchanged. A non-empty `blocks` array also drives blocks-only fixtures (no content/toolCalls), which now stream and serve tool-first. Adds fixture-blocks-gemini-interactions.test.ts (real LLMock over /v1beta/interactions) covering tool-first ordering (non-stream + streamed), blocks-only tool-first, and no-blocks back-compat. GI is no longer a scoped-out consumer, so its case is removed from fixture-blocks-scoped-out (Bedrock + Cohere remain scoped-out).
…ivergence (#274)

Complete the validateBlocks validator for two P2 gaps:

- Reject `{type:"text", text:""}` blocks at LOAD time. An empty-text block produces a meaningless/spurious wire chunk on replay; this mirrors the existing content/toolCalls empty-string rejection idiom.
- Warn (not error) when a fixture carries BOTH `blocks` AND legacy `content`/`toolCalls` that disagree. Builders stream `blocks` and ignore the redundant legacy fields, so divergence is a silent footgun. Compare concatenated text-block text vs `content`, and ordered toolCall-block names vs `toolCalls` names. Stay silent on the clean blocks-only path.

Does not change recognition or break blocks-only / content+toolCalls fixtures.
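The two validation rules above can be illustrated with a small sketch. The `Fixture` and `Report` shapes, the message strings, and the `validateBlocksSketch` name are assumptions; only the rules themselves (error on empty text blocks, warn on divergence, silent for blocks-only) are taken from the commit message.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Hypothetical fixture and report shapes.
interface Fixture {
  content?: string;
  toolCalls?: { name: string }[];
  blocks?: Block[];
}
interface Report {
  errors: string[];
  warnings: string[];
}

function validateBlocksSketch(fx: Fixture): Report {
  const report: Report = { errors: [], warnings: [] };
  const blocks = fx.blocks ?? [];
  if (blocks.length === 0) return report;

  let joinedText = "";
  const blockToolNames: string[] = [];
  blocks.forEach((block, i) => {
    if (block.type === "text") {
      // Hard error: an empty text block would replay as a spurious chunk.
      if (block.text === "") report.errors.push(`blocks[${i}]: text is empty string`);
      joinedText += block.text;
    } else {
      blockToolNames.push(block.name);
    }
  });

  // Warnings only fire when the legacy mirror fields are present and disagree.
  if (fx.content !== undefined && fx.content !== joinedText) {
    report.warnings.push("blocks text diverges from content");
  }
  if (fx.toolCalls !== undefined) {
    const legacyNames = fx.toolCalls.map((t) => t.name);
    if (JSON.stringify(legacyNames) !== JSON.stringify(blockToolNames)) {
      report.warnings.push("blocks toolCall names diverge from toolCalls");
    }
  }
  return report;
}
```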
…g observability (#274) Extend the per-provider block-order observability table to cover Cohere, Bedrock invoke, Bedrock Converse, and Gemini Interactions, splitting the shapes whose ordering is wire-observable (Full) from the separate-field non-streaming shapes where order is not on the wire (Non-observable). Classifications verified against each provider's builder. Add a "Blocks-only fixtures (first-class)" section showing a clean tool-first fixture authored with only a `blocks` array (no content/toolCalls), plus a validation note. Expand the recording note to cover Cohere/Bedrock record-side capture and the Gemini-Interactions args-normalization-only exception. CHANGELOG: add Unreleased/Added entries for blocks-only first-class, replay ordering across Cohere/Bedrock/Bedrock-Converse/Gemini-Interactions, Cohere/Bedrock record-capture, and validateBlocks load-time hardening.
…ks-only payload) (#274) The content+toolCalls handler on the OpenAI Realtime WS surface read only `response.content ?? ""` / `response.toolCalls ?? []` and ignored `response.blocks`. Post-F0, `isContentWithToolCallsResponse` matches a blocks-only fixture, so such a fixture streamed an empty text message and dropped every block — a silent empty payload — and a combined `{content,toolCalls,blocks}` fixture lost block ordering vs the HTTP path. Add a branch-not-replace `streamRealtimeBlocks` path: when `response.blocks` is non-empty, resolve via `resolveFixtureBlocks` and emit one output item per block in array order, each with an incrementing `output_index`. The Realtime protocol sequences output items on the wire with explicit `output_index`, so block order — including tool-before-text — IS observable to a client and is honored. The legacy `{content,toolCalls}` text-first path is unchanged when no blocks are present. Tests: a blocks-only `{blocks:[toolCall,text]}` fixture now streams a non-empty payload (tool item first, then text) instead of timing out on a dropped payload; combined and legacy fixtures keep their text-first output.
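As a rough illustration of "one output item per block in array order, each with an incrementing `output_index`", the planning step could be sketched like this. The item shapes are deliberately simplified and the `planRealtimeItemsSketch` name is hypothetical; the real `streamRealtimeBlocks` path emits the full Realtime event sequence around each item, which is not reproduced here.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Simplified stand-ins for Realtime output items.
type RealtimeItemSketch =
  | { type: "message"; role: "assistant"; text: string }
  | { type: "function_call"; name: string; arguments: string };

interface PlannedOutputItem {
  output_index: number;
  item: RealtimeItemSketch;
}

// The array index doubles as output_index, so block order is visible on the wire.
function planRealtimeItemsSketch(blocks: Block[]): PlannedOutputItem[] {
  return blocks.map((block, output_index): PlannedOutputItem => ({
    output_index,
    item:
      block.type === "text"
        ? { type: "message", role: "assistant", text: block.text }
        : { type: "function_call", name: block.name, arguments: block.arguments },
  }));
}
```

With a `[toolCall, text]` fixture this plans the function call at index 0 and the message at index 1, which is the tool-before-text ordering the test in the commit asserts.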
…locks-only payload) (#274) The content+toolCalls branch read only `content ?? ""` / `toolCalls ?? []` and ignored `response.blocks`. Post-F0 `isContentWithToolCallsResponse` matches a blocks-only fixture, so on the Gemini Live WS surface a blocks-only fixture streamed an EMPTY payload (silent drop) and combined fixtures lost block ordering. Add a branch-not-replace blocks path: when `response.blocks` is non-empty, resolve via `resolveFixtureBlocks` and emit Gemini Live WS messages in block ARRAY ORDER — a text block becomes one-or-more `serverContent.modelTurn.parts[{text}]` messages, a toolCall block becomes a `toolCall.functionCalls` message — then a terminal `serverContent.turnComplete`. The protocol expresses ordering via sequential messages, so tool-before-text (and any interleaving) is honored, matching the HTTP gemini.ts blocks branch. Conversation history accumulates the equivalent assistant turn. Fixtures without `blocks` keep the unchanged legacy text-first path.
…n unparseable tool args (#274) When a streamed 2.x tool call's accumulated arguments_delta fragments fail JSON.parse at finalize (truncated/interrupted stream), the collapser incremented droppedChunks but still persisted the invalid-JSON string as the tool call's `arguments`. That fixture then fails validateFixtures on reload, whose loader does JSON.parse(tc.arguments). Fall back to "{}" on parse failure so the persisted `arguments` is always valid JSON, consistent with the empty/unusable-args path and the OpenAI/Anthropic sibling collapsers. The droppedChunks accounting and firstDroppedSample warning are unchanged; the valid-args path is unchanged. Update the stale gemini-interactions assertion that codified the old corrupt-persist behavior to assert the "{}" fallback + JSON-parseability.
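The fallback described above amounts to "persist the joined fragments only if they parse". A minimal sketch, assuming a hypothetical `finalizeToolArgumentsSketch` helper and a boolean `dropped` flag standing in for the real `droppedChunks` accounting:

```typescript
interface FinalizedArguments {
  arguments: string; // always valid JSON
  dropped: boolean; // true when the accumulated fragments had to be discarded
}

function finalizeToolArgumentsSketch(fragments: string[]): FinalizedArguments {
  const joined = fragments.join("");
  // Empty or whitespace-only arguments normalize to an empty object.
  if (joined.trim() === "") return { arguments: "{}", dropped: false };
  try {
    JSON.parse(joined); // validity check only; the original string is kept
    return { arguments: joined, dropped: false };
  } catch {
    // Truncated or interrupted stream: fall back so the fixture reloads cleanly.
    return { arguments: "{}", dropped: true };
  }
}
```

Whether the empty-arguments case counts as a dropped chunk in the real collapser is not stated in the commit, so the `dropped: false` there is an assumption.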
…l-only blocks (#274) The non-streaming buildCohereContentWithToolCallsResponse unconditionally pushed a { type: "text", text: "" } content entry even when blocks were tool-only (no text block), emitting an empty text entry real Cohere would not produce. It also called resolveFixtureBlocks(blocks) twice. Resolve the blocks once and reuse the result; emit a text content entry only when an actual text block exists. Legacy (no-blocks) and streaming paths unchanged.
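A sketch of the fixed derivation: walk the resolved blocks once, and add a text content entry only if a text block was actually seen. The message shape is a simplified approximation of a Cohere v2 assistant message, and the function name, the `call_N` ids, and the choice of an empty `content` array for tool-only blocks are all assumptions.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Simplified approximation of a Cohere v2 assistant message.
interface CohereMessageSketch {
  role: "assistant";
  content: { type: "text"; text: string }[];
  tool_calls: { id: string; type: "function"; function: { name: string; arguments: string } }[];
}

function buildCohereMessageSketch(blocks: Block[]): CohereMessageSketch {
  const message: CohereMessageSketch = { role: "assistant", content: [], tool_calls: [] };
  let text = "";
  let sawText = false;
  for (const block of blocks) {
    if (block.type === "text") {
      text += block.text;
      sawText = true;
    } else {
      message.tool_calls.push({
        id: `call_${message.tool_calls.length}`, // made-up id scheme
        type: "function",
        function: { name: block.name, arguments: block.arguments },
      });
    }
  }
  // Tool-only blocks produce no text entry at all, rather than an empty one.
  if (sawText) message.content.push({ type: "text", text });
  return message;
}
```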
… provider matrix (#274)

Cohere fully supports reasoning on both record and replay, so its Reasoning cell was inaccurate under the matrix's own per-builder verification framing:

- record: collapseCohereSSE captures content-delta thinking blocks into reasoning (src/stream-collapse.ts:1032-1033)
- replay: resolveReasoningForModel feeds every Cohere builder, which renders reasoning as a leading text block (src/cohere.ts:296, 416, 503, 631; resolve at 1125, 1187, 1239)

Change Cohere from em-dash to Yes.

Gemini Interactions is genuinely asymmetric: its collapser assembles thought_summary deltas into reasoning on record (src/stream-collapse.ts:1593-1600), but the replay builders take only content and never re-emit reasoning (src/gemini-interactions.ts has no reasoning path). Represent this as 'Record only' with a footnote rather than a flat em-dash.

No version bump (still 1.34.0). Docs-only: no tests run.
…bility prose The observability table classified Ollama (streaming) as "Partial" but the prose vocabulary only defined "Full" and "Non-observable", leaving the term undefined. Ollama streaming is genuinely a third category: chunk arrival order is carried on the wire (not Non-observable), but the structure is not positionally indexed and some clients reassemble positionally, so it is weaker than the Full providers — matching the "PARTIALLY observable" framing already documented in src/ollama.ts. Add the missing one-line definition so every cell value maps to a defined term.
…nt/tool_calls The non-streaming /api/chat builder buildOllamaChatContentWithToolCallsResponse never received response.blocks, so a blocks-only fixture (blocks present, no content/toolCalls) rendered as an empty turn: message.content "" and tool_calls []. Both payloads were silently dropped. Streaming already handled blocks correctly; only the stream:false path regressed (#274 F0). Pass response.blocks into the builder and, when blocks are present, backfill content from concatenated text blocks and tool_calls from toolCall blocks, mirroring the streaming derivation. The non-streaming wire shape has no positional array, so this is order-free as documented in the existing NOTE; ordering semantics are unchanged. Legacy (no-blocks) callers stay byte-identical. Adds non-streaming red-green coverage to fixture-blocks-ollama.test.ts: a blocks-only fixture now backfills content "hi" and a single tool_call, and a legacy no-blocks fixture is verified unchanged.
…cateAfterChunks In the WS gemini-live blocks content+toolCalls path, an empty text block hit a guard that emitted a useless empty modelTurn message then continued WITHOUT chunkIndex++ or interruption.tick(). With truncateAfterChunks set, that empty emission spent no truncate tick, so the following block leaked past the cutoff; the skipped chunkIndex also shifted recordedTimings indexing for every subsequent block. An empty text block carries no wire content, so emit nothing and spend no chunk. Non-empty-block output stays byte-identical; chunk and timing accounting is now correct. Adds a ws-test-client getMessages() snapshot accessor and gemini-live WS tests proving the empty-text block no longer leaks the trailing toolCall under truncateAfterChunks:1, with a non-empty control and the existing back-compat path unchanged.
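The accounting rule ("an empty text block emits nothing and spends no chunk") can be sketched with a simplified emit loop. This is an illustration only: the string messages, the `emitBlocksSketch` name, and the "at most `truncateAfterChunks` messages" cutoff check are assumptions, and the real handler uses `interruption.tick()` and `recordedTimings` rather than a bare counter.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// Returns the messages that would reach the client, as simple strings.
function emitBlocksSketch(blocks: Block[], truncateAfterChunks?: number): string[] {
  const sent: string[] = [];
  let chunkIndex = 0;
  for (const block of blocks) {
    // Empty text carries no wire content: skip it without touching chunkIndex.
    if (block.type === "text" && block.text === "") continue;
    // Stop once the configured number of chunks has been spent.
    if (truncateAfterChunks !== undefined && chunkIndex >= truncateAfterChunks) break;
    sent.push(block.type === "text" ? `text:${block.text}` : `toolCall:${block.name}`);
    chunkIndex++; // every emitted message spends exactly one chunk
  }
  return sent;
}
```

Keeping emission and `chunkIndex++` paired means the number of messages on the wire and the number of chunks spent can never drift apart, which is the invariant the fix restores.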
…n blocks present
Two F0-reachable defects on the `blocks` path:
BUG A (programmatic crash): addFixture/addFixtures/prependFixture store raw
fixtures with no normalizeResponse pass, so a toolCall block with OBJECT
`arguments` reached resolveFixtureBlocks unchanged. That resolver required a
string and threw, so real dispatch returned HTTP >= 500. Make resolveFixtureBlocks
tolerant: stringify object/array `arguments` into a fresh per-block copy
(mirroring normalizeResponse's JSON.stringify) so the programmatic path is safe.
String args stay byte-identical, so the file-load path is unchanged. A missing or
non-string/non-object `arguments` still throws (new message wording).
BUG C (spurious hard error): a fixture { content: "", blocks: [...] } raised a
"content is empty string" hard error at validate (fired in both the isTextResponse
and isContentWithToolCallsResponse branches) even though builders stream `blocks`
and ignore the legacy `content` mirror when a non-empty blocks array is present.
Suppress the empty-content error in both branches when non-empty blocks drive the
output. Fixtures WITHOUT blocks still error.
The empty-TEXT-block rule (validateBlocks rejecting {type:"text",text:""}) is
intentionally strict and is left unchanged.
Adds permanent tests in blocks-fixture-tolerance.test.ts and updates the
resolveFixtureBlocks missing-arguments assertion to the new message.
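The tolerance added for BUG A could be sketched as below. The type names, the error wording, and the `resolveFixtureBlocksSketch` name are placeholders; what comes from the commit is the behavior: string arguments pass through unchanged, object/array arguments are stringified into a fresh per-block copy, and anything else still throws.

```typescript
// Hypothetical input (file/programmatic form) and output (normalized) shapes.
type FileBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string | object };
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

function resolveFixtureBlocksSketch(blocks: FileBlock[]): Block[] {
  // map() builds a new array of new objects, so callers never share references.
  return blocks.map((block, i): Block => {
    if (block.type === "text") return { type: "text", text: block.text };
    const args: unknown = block.arguments;
    if (typeof args === "string") {
      return { type: "toolCall", name: block.name, arguments: args };
    }
    if (typeof args === "object" && args !== null) {
      // Objects and arrays are stringified, mirroring the load-time normalize pass.
      return { type: "toolCall", name: block.name, arguments: JSON.stringify(args) };
    }
    throw new Error(`blocks[${i}]: toolCall arguments must be a string or object`);
  });
}
```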
…erence Add a Blocks (ordered text / tool-call streaming) subsection under Response Types covering the ordered blocks array, precedence over content/toolCalls, blocks-only first-class fixtures, per-entry arguments object/string handling, and a tool-first worked example. Extend the JSON auto-stringify note to cover block arguments. The command file is a symlink to SKILL.md, so both stay byte-identical.
Point users from the recording docs, root README, and pytest README to the full ordered-blocks reference at /fixtures#ordered-blocks, rather than re-documenting it. The recorder captures block order for genuinely tool-first/interleaved streams across OpenAI, Anthropic, Gemini, Ollama, Cohere, and Bedrock; note the Gemini Interactions record-time args-only caveat and link to the per-provider observability matrix.
Authors looking for copy-pasteable examples had no blocks fixture to crib from: the docs examples page only showed content/toolCalls shapes, which cannot express a tool call before text or interleaved ordering. Add a tool-first blocks example to docs/examples/index.html (matching the existing fixture markup and voice, with a short note on why blocks matters and the blocks-only shorthand) and ship a runnable fixtures/examples/llm/blocks-tool-first.json an author can copy verbatim. Cover the shipped fixture with a permanent test that loads it through the real loader (loadFixtureFile), runs validateFixtures, and asserts the resolved blocks are tool-first — so the example cannot silently rot. Verified red on a text-first mutation, green as shipped.
…aming comment resolveFixtureBlocks accepts and stringifies object toolCall `arguments`, but its parameter was typed FixtureBlock[] (arguments: string) so the object-tolerance invariant was type-invisible. Widen the input to the existing on-disk FixtureFileBlock[] form (arguments: string | object | array), which mirrors how the file-form response types relax their input. The RETURN type stays FixtureBlock[] (arguments: string), which callers rely on; the text and string-arguments branches now build freshly-normalized FixtureBlock values so the narrowing is honest with no `as any`. FixtureBlock is a structural subtype of FixtureFileBlock, so all existing callers keep compiling. Also reword the Ollama non-streaming NOTE header: scope the no-op invariant to block ORDER (unobservable on the non-streaming wire) rather than the stale claim that text+tool_calls are kept unchanged — the body now backfills content/tool_calls from blocks, which the body comment already states. Type/comment-only; no runtime behavior change.
70b5891 to 17166ca
jpr5 added a commit that referenced this pull request on Jun 28, 2026
## Release v1.35.0

Version bump cut off latest `main` (includes #285 blocks-completion and #286 video-testid).

### Version surfaces bumped (1.34.0 → 1.35.0)

These five literal build surfaces track the aimock npm release line and were bumped together:

1. `package.json`: `"version"` → `1.35.0`
2. `.claude-plugin/plugin.json`: `"version"` → `1.35.0`
3. `.claude-plugin/marketplace.json`: `plugins[0].source.version` → `^1.35.0` (caret preserved)
4. `charts/aimock/Chart.yaml`: `appVersion` → `"1.35.0"`
5. `packages/aimock-pytest/src/aimock_pytest/_version.py`: `AIMOCK_VERSION` → `"1.35.0"`

### Deliberately NOT changed (independently versioned)

Verified against release history: these do **not** track the aimock release line and follow their own cadence, so they are left as-is:

- `charts/aimock/Chart.yaml` `version:` stays **`0.1.0`** (Helm chart version, independent of the app it packages)
- `packages/aimock-pytest/pyproject.toml` `version` stays **`0.4.0`** (the Python package's own release cadence, distinct from the npm release it pins via `AIMOCK_VERSION`)

### Hygiene fix

- `packages/aimock-pytest/README.md`: corrected the stale `--aimock-version` default from `1.33.0` → `1.35.0` to match `AIMOCK_VERSION`.

### CHANGELOG

- Renamed the accumulated `## [Unreleased]` section to `## [1.35.0] - 2026-06-27` and inserted a fresh empty `## [Unreleased]` above it. Bullet contents unchanged.

### What's in 1.35.0

- **`blocks` feature completed across all providers**: ordered text/tool-call block streaming now honored on replay across Anthropic, OpenAI (Responses + chat-completions), Gemini, Ollama, Cohere, Bedrock (invoke + Converse), and Gemini Interactions, with record-side block capture where the wire protocol allows (#274).
- **Blocks-only fixtures are first-class**: a non-empty `blocks` array is a complete response shape on its own (no `content`/`toolCalls` required); `validateBlocks` rejects malformed arrays at load time (#274).
- **Veo / Grok multi-tenant testId isolation**: native Google Veo and xAI Grok Imagine async video lifecycle mocks with record-mode live proxying and per-tenant testId isolation (#278, #282).

### Publishing

Publishing is **not** performed here. The release is cut and verified; the actual publish is the manual `npm run release` step, pending maintainer action.

### Verification

- `npm run build`: clean (exit 0)
- `npx vitest run`: **4250/4250 passed**, 144/144 files (exit 0)
- `npm run test:exports`: all 🟢 (node10 / node16 CJS+ESM / bundler, across all entry points)
- Type-emitting build clean (`.d.ts` emit succeeded; no separate `tsc` script in this package)
## What

Completes the (still-unreleased) #274 `blocks` fixture-ordering feature so it works uniformly across every provider, and makes blocks-only fixtures first-class. Builds on #283 (blocks v1) and #282 (async video), both already on `main` under `[Unreleased]`.

No version bump: stays `1.34.0`. The dedicated `chore: release v1.35.0` PR owns the bump.

## Why

#283 shipped `blocks` for a subset of providers and left the record-side capture + several provider builders out of scope, with blocks-only fixtures treated as by-design-unsupported. To ship the feature as complete for 1.35.0, it needs to: capture/replay block order on the remaining providers, accept a fixture that is just `{ blocks: [...] }`, and document the feature where authors actually look.

## Changes

### Feature completion

- `{ blocks: [...] }` with no `content`/`toolCalls`; the shared recognizer was relaxed additively (legacy recognition byte-identical).
- `normalizeToolArguments` zero-arg (`"{}"`) fix. Gemini-Interactions is args-only by design (step-index can't reconcile arrival-order blocks).
- `blocks` ordering (streaming + non-streaming where the wire allows).
- `arguments` may be a JSON object or a string (objects auto-stringify), consistently across top-level toolCalls and block toolCalls.
- `validateBlocks` rejects empty-text blocks and warns on blocks/content divergence.

### Completion gaps found + fixed during code review (each with a local red→green on the real surface)

- `addFixture` with object block args returned HTTP 500 (bypassed normalize); `resolveFixtureBlocks` now tolerates object args.
- `{ content: "", blocks: [...] }` fixture spuriously hard-errored at load; suppressed when non-empty blocks drive output.
- `truncateAfterChunks` and shifted recorded-timing indices; guard now skips emission cleanly.

### Docs

- `docs/fixtures/index.html`: per-provider observability matrix, blocks-only authoring, reasoning cells.
- `write-fixtures` skill/command, `docs/examples` (worked tool-first example), `docs/record-replay`, root + pytest READMEs.
- `fixtures/examples/llm/blocks-tool-first.json` with a permanent loadability test.

### Wire limitations (documented, not gaps)

OpenAI-chat, Ollama, and Cohere non-streaming expose separate content/tool_calls fields, so order is not observable on the wire: the payload is delivered, and order is a no-op. Captured in the observability matrix.

## Test plan

- `tsc --noEmit`, `eslint`, `prettier --check`, `build`, `test:exports` (node10/node16-CJS/ESM/bundler) all green

### Known non-blocking follow-ups

Pre-existing items (predate this branch) and minor polish are tracked but intentionally out of scope: a shared stringify helper to de-dup the three normalize sites, JSON-serializability check for object args at load time, and assorted pre-existing collapser/request-converter edge cases.