Merged
22 commits
f3ff455
feat(fixtures): recognize blocks-only fixtures as first-class (no con…
jpr5 Jun 28, 2026
d7c4409
feat(record): capture block order + normalize args in Cohere/Bedrock/…
jpr5 Jun 28, 2026
0fdf05f
feat(fixtures): honor blocks ordering in Cohere replay builder (#274)
jpr5 Jun 28, 2026
3f21f6e
feat(fixtures): honor blocks ordering in Bedrock-Converse replay buil…
jpr5 Jun 28, 2026
341d405
feat(fixtures): honor blocks ordering in Bedrock invoke replay builde…
jpr5 Jun 28, 2026
651c2cc
feat(fixtures): honor blocks ordering in Gemini-Interactions replay b…
jpr5 Jun 28, 2026
c948ace
feat(fixtures): reject empty-text blocks and warn on blocks/content d…
jpr5 Jun 28, 2026
7d4bce7
docs(fixtures): document blocks-only authoring + per-provider orderin…
jpr5 Jun 28, 2026
2d60e3f
docs(fixtures): document empty-text rejection + blocks/content diverg…
jpr5 Jun 28, 2026
2a66b0a
fix(fixtures): honor blocks on the Realtime WS surface (no empty bloc…
jpr5 Jun 28, 2026
dfd765d
fix(fixtures): honor blocks on the Gemini Live WS surface (no empty b…
jpr5 Jun 28, 2026
a9a7c90
fix(record): Gemini-Interactions collapser falls back to valid JSON o…
jpr5 Jun 28, 2026
4ec104c
fix(fixtures): Cohere non-streaming omits spurious empty text for too…
jpr5 Jun 28, 2026
1d4c5b2
docs(fixtures): correct Cohere/Gemini-Interactions reasoning cells in…
jpr5 Jun 28, 2026
132d86a
docs(fixtures): define orphaned "Partial" block-order term in observa…
jpr5 Jun 28, 2026
230511b
fix(fixtures): Ollama non-streaming blocks-only no longer drops conte…
jpr5 Jun 28, 2026
070973b
fix(fixtures): gemini-live empty-text block no longer leaks past trun…
jpr5 Jun 28, 2026
967169f
fix(fixtures): tolerate object toolCall args; allow empty content whe…
jpr5 Jun 28, 2026
a8c13aa
docs: document blocks fixture feature in write-fixtures authoring ref…
jpr5 Jun 28, 2026
0a63cad
docs: add discoverability pointers for ordered blocks feature
jpr5 Jun 28, 2026
ca71318
docs: add worked tool-first blocks example and loadable example fixture
jpr5 Jun 28, 2026
17166ca
refactor(fixtures): tidy blocks-path type honesty and Ollama non-stre…
jpr5 Jun 28, 2026
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,10 @@
- Optional `blocks` array on the combined `content` + `toolCalls` fixture shape lets a fixture express ordered text/tool-call blocks (`{type:"text",text}` | `{type:"toolCall",name,arguments,id?}`); when present it takes precedence over `{content, toolCalls}` for stream order, enabling tool-first and interleaved ordering. Legacy `{content, toolCalls}` fixtures are unchanged (#274)
- All five providers stream combined responses in fixture block order: Anthropic, OpenAI Responses, and Gemini are fully observable; Ollama is best-effort (clients may reassemble positionally); OpenAI chat-completions emits in order but is degenerate (`delta.content`/`delta.tool_calls` are separate channels the client merges) (#274)
- Recorder captures block order and persists `blocks` only when the recorded upstream stream was genuinely tool-first or interleaved; text-first streams keep the legacy `{content, toolCalls}` shape so golden recordings round-trip byte-identically (#274)
- Blocks-only fixtures are first-class: a non-empty `blocks` array is a complete response shape on its own, with no `content`/`toolCalls` required — builders derive the aggregate from the blocks and `validateFixtures()` accepts the shape (#274)
- Block ordering is now honored on replay across the remaining providers — Cohere (streaming), Bedrock invoke, Bedrock Converse, and Gemini Interactions — so a tool-first or interleaved fixture streams its tool call ahead of its text wherever the wire protocol can express it (#274)
- Record-side block capture extends to the Cohere and Bedrock collapsers; Gemini Interactions normalizes tool-call arguments only and does not reorder blocks on capture (its step-index protocol can't reconcile arrival-order blocks at record time), while replay still honors a hand-authored `blocks` array (#274)
- `validateBlocks` rejects a malformed `blocks` array at load time — non-array, non-object entries, a `type` other than `text`/`toolCall`, a non-string or empty-string text block, or a `toolCall` block missing a name or carrying non-JSON arguments — and warns when a fixture carries both `blocks` and divergent `content`/`toolCalls`, so a bad array never reaches a builder mid-dispatch (#274)

## [1.34.0] - 2026-06-24

1 change: 1 addition & 0 deletions README.md
@@ -64,6 +64,7 @@ Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or
- **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
- **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
- **[Response Overrides](https://aimock.copilotkit.dev/fixtures)** — Control `id`, `model`, `usage`, `finishReason` in fixture responses
- **[Ordered Blocks](https://aimock.copilotkit.dev/fixtures#ordered-blocks)** — A `blocks` array streams text and tool calls in any order (tool-first or interleaved); blocks-only fixtures are first-class, and the recorder captures order from genuinely tool-first/interleaved streams
- **[Streaming Usage Chunks](https://aimock.copilotkit.dev/streaming-physics)** — `stream_options.include_usage` support emits a final chunk with token counts, matching OpenAI's streaming usage protocol
- **[Rate Limiting Headers](https://aimock.copilotkit.dev/chaos-testing)** — `x-ratelimit-*` headers on every response and `Retry-After` on 429 errors for testing retry/backoff logic
- **Zero dependencies** — Everything from Node.js builtins
27 changes: 27 additions & 0 deletions docs/examples/index.html
@@ -154,6 +154,33 @@ <h3>Sequential Responses</h3>
}</code></pre>
</div>

<h3>Tool-First Blocks</h3>
<p>
Plain <code>content</code> + <code>toolCalls</code> can't express a tool call
<em>before</em> text, or text and tools interleaved in a specific order &mdash; the
builders always emit text first. Use <code>blocks</code> to pin the exact emission order.
A fixture can be <strong>blocks-only</strong> (<code>response: { blocks: [...] }</code>
with no <code>content</code>/<code>toolCalls</code>), as shown here.
</p>
<div class="code-block">
<div class="code-block-header">
fixtures/examples/llm/blocks-tool-first.json <span class="lang-tag">json</span>
</div>
<pre><code>{
<span class="key">"fixtures"</span>: [
{
<span class="key">"match"</span>: { <span class="key">"userMessage"</span>: <span class="str">"what's the weather in NYC?"</span> },
<span class="key">"response"</span>: {
<span class="key">"blocks"</span>: [
{ <span class="key">"type"</span>: <span class="str">"toolCall"</span>, <span class="key">"name"</span>: <span class="str">"get_weather"</span>, <span class="key">"arguments"</span>: <span class="str">"{\"city\": \"NYC\"}"</span> },
{ <span class="key">"type"</span>: <span class="str">"text"</span>, <span class="key">"text"</span>: <span class="str">"Let me check the weather in NYC for you."</span> }
]
}
}
]
}</code></pre>
</div>

<!-- ─── Protocol Configs ───────────────────────────────────── -->

<h2>Protocol Configs</h2>
155 changes: 132 additions & 23 deletions docs/fixtures/index.html
@@ -384,41 +384,76 @@ <h2 id="ordered-blocks">Ordered blocks (tool-first &amp; interleaved streaming)<
<p>
When <code>blocks</code> is present it <strong>takes precedence</strong> over the
<code>content</code> and <code>toolCalls</code> fields for stream ordering: the blocks are
- streamed in array order. (Keep <code>content</code> and <code>toolCalls</code> populated
- as well &mdash; they remain the canonical aggregate for replay and for consumers that do
- not read <code>blocks</code>.) When <code>blocks</code> is <strong>absent</strong>, legacy
+ streamed in array order. When <code>blocks</code> is <strong>absent</strong>, legacy
<code>{ content, toolCalls }</code> fixtures stream exactly as before &mdash; text-first,
byte-identical to prior releases. The field is purely additive.
</p>

<h3 id="blocks-only">Blocks-only fixtures (first-class)</h3>
<p>
A fixture can be written with <strong>only</strong> a <code>blocks</code> array &mdash; no
<code>content</code> or <code>toolCalls</code> needed. A non-empty
<code>blocks</code> array is a first-class response shape: the builders derive the
aggregate text and tool calls from the blocks themselves, and
<code>validateFixtures()</code> accepts it without requiring the legacy fields. This is
the cleanest way to author a tool-first or interleaved response &mdash; you express the
order once, in one place, with no duplicated aggregate to keep in sync.
</p>
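As a rough illustration of "the builders derive the aggregate text and tool calls from the blocks themselves", the sketch below folds a `blocks` array into a `{ content, toolCalls }` pair. The names `Aggregate` and `deriveAggregate` are hypothetical, and concatenating text blocks in array order with no separator is an assumption; the real builders may join text differently.

```typescript
// Hypothetical block and aggregate types for illustration only.
type Block =
  | { type: "text"; text: string }
  | {
      type: "toolCall";
      name: string;
      arguments: string | Record<string, unknown>;
      id?: string;
    };

interface ToolCall {
  name: string;
  arguments: string | Record<string, unknown>;
  id?: string;
}

interface Aggregate {
  content: string;
  toolCalls: ToolCall[];
}

// Fold ordered blocks into the legacy aggregate shape.
// Assumption: text blocks are concatenated in array order without a separator.
function deriveAggregate(blocks: Block[]): Aggregate {
  const aggregate: Aggregate = { content: "", toolCalls: [] };
  for (const block of blocks) {
    if (block.type === "text") {
      aggregate.content += block.text;
    } else {
      const call: ToolCall = { name: block.name, arguments: block.arguments };
      if (block.id !== undefined) {
        call.id = block.id;
      }
      aggregate.toolCalls.push(call);
    }
  }
  return aggregate;
}

// A tool-first, blocks-only response still yields a complete aggregate.
const derived = deriveAggregate([
  { type: "toolCall", name: "get_weather", arguments: { city: "SF" }, id: "call_1" },
  { type: "text", text: "Here is the weather." },
]);
```

The order information lives only in `blocks`; the derived aggregate is what a consumer that never reads `blocks` would see.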
<div class="code-block">
<div class="code-block-header">tool-first.json <span class="lang-tag">json</span></div>
<pre><code>{
- <span class="prop">"content"</span>: <span class="str">"Here is the weather."</span>,
- <span class="prop">"toolCalls"</span>: [
- { <span class="prop">"name"</span>: <span class="str">"get_weather"</span>, <span class="prop">"arguments"</span>: { <span class="prop">"city"</span>: <span class="str">"SF"</span> } }
- ],
<span class="prop">"blocks"</span>: [
{ <span class="prop">"type"</span>: <span class="str">"toolCall"</span>, <span class="prop">"name"</span>: <span class="str">"get_weather"</span>, <span class="prop">"arguments"</span>: { <span class="prop">"city"</span>: <span class="str">"SF"</span> }, <span class="prop">"id"</span>: <span class="str">"call_1"</span> },
{ <span class="prop">"type"</span>: <span class="str">"text"</span>, <span class="prop">"text"</span>: <span class="str">"Here is the weather."</span> }
]
}</code></pre>
</div>
<p>
- The example above streams the <code>get_weather</code> tool call <em>before</em> the text.
- For an interleaved stream, list blocks in the desired order, e.g.
- <code>[toolCall, text, toolCall]</code>.
+ The example above streams the <code>get_weather</code> tool call <em>before</em> the text,
+ with no separate <code>content</code> / <code>toolCalls</code> fields. For an interleaved
+ stream, list blocks in the desired order, e.g. <code>[toolCall, text, toolCall]</code>.
</p>
<p>
You may still supply <code>content</code> and <code>toolCalls</code> alongside
<code>blocks</code> if you want an explicit aggregate &mdash; for example to assert a
specific merged shape independently of the order. Both forms are supported;
<code>blocks</code> always wins for stream ordering.
</p>
<div class="info-box">
<p>
<strong>Validation:</strong> <code>validateFixtures()</code> checks a
<code>blocks</code>
array at load time so a malformed array is rejected before it reaches a builder &mdash;
<code>blocks</code> must be an array; each entry must be an object with
<code>type</code> <code>"text"</code> or <code>"toolCall"</code>; a
<code>text</code> block needs a non-empty string <code>text</code>; a
<code>toolCall</code> block needs a non-empty <code>name</code>,
<code>arguments</code> that are a valid-JSON string or an object, and an optional string
<code>id</code>. If a fixture carries both <code>blocks</code> and legacy
<code>content</code>/<code>toolCalls</code> that disagree, loading warns (the redundant
legacy fields are ignored in favor of <code>blocks</code>).
</p>
</div>
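The validation rules listed above can be approximated with a small checker. This is a sketch only: `checkBlocks` and its error strings are hypothetical and are not the library's `validateBlocks`. Treating "an object" as a plain non-array object is an assumption, and the blocks/content divergence warning is not modeled.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Returns a list of human-readable problems; an empty list means the array passed.
function checkBlocks(blocks: unknown): string[] {
  if (!Array.isArray(blocks)) {
    return ["blocks must be an array"];
  }
  const errors: string[] = [];
  blocks.forEach((entry: unknown, index: number) => {
    if (!isPlainObject(entry)) {
      errors.push(`blocks[${index}] must be an object`);
      return;
    }
    const kind = entry["type"];
    if (kind === "text") {
      const text = entry["text"];
      if (typeof text !== "string" || text === "") {
        errors.push(`blocks[${index}].text must be a non-empty string`);
      }
    } else if (kind === "toolCall") {
      const name = entry["name"];
      if (typeof name !== "string" || name === "") {
        errors.push(`blocks[${index}].name must be a non-empty string`);
      }
      const args = entry["arguments"];
      const argsOk =
        isPlainObject(args) || (typeof args === "string" && isValidJson(args));
      if (!argsOk) {
        errors.push(`blocks[${index}].arguments must be valid JSON or an object`);
      }
      const id = entry["id"];
      if (id !== undefined && typeof id !== "string") {
        errors.push(`blocks[${index}].id must be a string when present`);
      }
    } else {
      errors.push(`blocks[${index}].type must be "text" or "toolCall"`);
    }
  });
  return errors;
}
```

Collecting every problem instead of stopping at the first is a design choice of this sketch; it lets a fixture author fix a malformed file in one pass.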

<h3>Per-provider observability</h3>
<p>
How faithfully &ldquo;tool-first&rdquo; / interleaved order is observable depends on each
- provider's wire protocol. The mock always emits chunks in block order; what a client can
- <em>reconstruct</em> from those chunks varies:
+ provider's wire protocol &mdash; and, for some providers, on whether the request is
+ streaming. The mock always emits in block order; what a client can
+ <em>reconstruct</em> from the result varies. A shape is <strong>Full</strong> when the
+ wire carries the blocks in a single positionally-ordered structure (indexed content
+ blocks, ordered <code>output</code> items, ordered steps); it is
+ <strong>Non-observable</strong> when text and tool calls land in
+ <em>separate</em> top-level fields that the client merges without a shared order. It is
+ <strong>Partial</strong> when block order <em>is</em> carried on the wire (chunk arrival
+ order) but the structure is not positionally indexed, so some clients reassemble
+ positionally rather than honoring arrival order &mdash; observable best-effort, not
+ guaranteed. The classifications below were verified against each provider's builder.
</p>
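To see why separate fields make order Non-observable, consider a deliberately simplified client that merges streamed deltas into one message. The `Delta` shape and `mergeDeltas` below are hypothetical stand-ins, not the actual OpenAI chunk schema or any SDK's merge routine.

```typescript
// Simplified stand-in for a streamed delta: text and tool calls travel in separate fields.
interface Delta {
  content?: string;
  toolCallName?: string;
}

interface Merged {
  content: string;
  toolCalls: string[];
}

// Merge deltas the way a client with separate channels would:
// text is appended to one field, tool calls to another, with no shared position.
function mergeDeltas(deltas: Delta[]): Merged {
  const merged: Merged = { content: "", toolCalls: [] };
  for (const delta of deltas) {
    if (delta.content !== undefined) {
      merged.content += delta.content;
    }
    if (delta.toolCallName !== undefined) {
      merged.toolCalls.push(delta.toolCallName);
    }
  }
  return merged;
}

const textFirst = mergeDeltas([
  { content: "Here is the weather." },
  { toolCallName: "get_weather" },
]);
const toolFirst = mergeDeltas([
  { toolCallName: "get_weather" },
  { content: "Here is the weather." },
]);

// Both arrival orders collapse to the same merged message.
const indistinguishable = JSON.stringify(textFirst) === JSON.stringify(toolFirst);
```

The arrival order is still visible to a test that inspects raw chunks, which is why wire order can be asserted even though the merged result cannot express it.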
<table class="endpoint-table">
<thead>
<tr>
- <th>Provider</th>
+ <th>Provider / shape</th>
<th>Block-order support</th>
<th>Notes</th>
</tr>
@@ -429,7 +464,8 @@ <h3>Per-provider observability</h3>
<td>Full</td>
<td>
Typed <code>text</code> / <code>tool_use</code> content blocks at incrementing
- indices &mdash; tool-first and interleaved are natively observable.
+ indices &mdash; tool-first and interleaved are natively observable, streaming and
+ non-streaming alike.
</td>
</tr>
<tr>
@@ -450,7 +486,45 @@ <h3>Per-provider observability</h3>
</td>
</tr>
<tr>
- <td>Ollama</td>
+ <td>Gemini Interactions (replay)</td>
<td>Full</td>
<td>
One step per block in array order &mdash; a <code>function_call</code> step takes a
lower <code>index</code> than a later <code>model_output</code> step, streaming
(<code>step.*</code> events) and non-streaming (<code>steps[]</code>) alike. Record
side is args-normalization only &mdash; see the note below.
</td>
</tr>
<tr>
<td>Bedrock invoke</td>
<td>Full</td>
<td>
Mirrors the Anthropic Messages content array: ordered
<code>text</code> / <code>tool_use</code> entries non-streaming, indexed
<code>content_block_*</code> events streaming &mdash; tool-first is wire-expressible
on both.
</td>
</tr>
<tr>
<td>Bedrock Converse</td>
<td>Full</td>
<td>
Positional <code>content[]</code> blocks non-streaming, indexed
<code>contentBlock*</code> events (carrying <code>contentBlockIndex</code>)
streaming &mdash; a <code>toolUse</code> can precede the text on both.
</td>
</tr>
<tr>
<td>Cohere (streaming)</td>
<td>Full</td>
<td>
SSE emits <code>content-*</code> and <code>tool-call-*</code> events in block array
order, each carrying an <code>index</code> &mdash; tool-first / interleaved is
observable on the stream.
</td>
</tr>
<tr>
<td>Ollama (streaming)</td>
<td>Partial</td>
<td>
A <code>tool_calls</code> chunk can be emitted before content on the wire, but some
@@ -459,12 +533,33 @@ <h3>Per-provider observability</h3>
</tr>
<tr>
<td>OpenAI chat-completions</td>
- <td>Degenerate</td>
+ <td>Non-observable</td>
<td>
<code>delta.content</code> and <code>delta.tool_calls</code> (streaming), or
<code>message.content</code> and <code>message.tool_calls</code> (non-streaming),
are separate channels/fields the client merges. The mock emits in block order and
the streamed wire order is assertable, but the merge is <em>not</em> positionally
interleaved, so tool-first is not semantically observable to clients on this
channel.
</td>
</tr>
<tr>
<td>Cohere (non-streaming)</td>
<td>Non-observable</td>
<td>
- <code>delta.content</code> and <code>delta.tool_calls</code> are separate channels
- the client merges. The mock emits chunks in block order (and the wire order is
- assertable), but the merge is <em>not</em> positionally interleaved, so tool-first
- is not semantically observable to clients on this channel.
+ The non-streaming body keeps text in <code>message.content[]</code> and tool calls
+ in the separate <code>message.tool_calls[]</code> field &mdash; the relative order
+ of a text vs. a toolCall block is not on the wire. Use the streaming shape when
+ order matters.
</td>
</tr>
<tr>
<td>Ollama (non-streaming)</td>
<td>Non-observable</td>
<td>
The aggregated reply carries <code>message.content</code> and
<code>message.tool_calls</code> as separate fields &mdash; no positional ordering
between a text and a toolCall block. Use the streaming shape when order matters.
</td>
</tr>
</tbody>
@@ -476,7 +571,16 @@ <h3>Per-provider observability</h3>
<em>genuinely</em> tool-first or interleaved (a tool-call delta arrives before the first
content delta, or content arrives after a tool-call delta). Ordinary text-then-tools
streams are saved in the legacy <code>{ content, toolCalls }</code> shape with no
- <code>blocks</code> key, so existing golden recordings round-trip byte-identically.
+ <code>blocks</code> key, so existing golden recordings round-trip byte-identically. The
+ Cohere and Bedrock collapsers capture block order this way alongside the original
+ providers.
</p>
<p>
<strong>Gemini Interactions</strong> is the exception: its record-side collapser
normalizes tool-call arguments only and does not reorder blocks on capture &mdash; its
step-index protocol can't reconcile arrival-order blocks at record time. Ordering is
still honored on <em>replay</em> from a hand-authored <code>blocks</code> fixture; it is
simply not reconstructed automatically from a recording.
</p>
</div>

@@ -789,12 +893,12 @@ <h2>Provider Support Matrix</h2>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
- <td>&mdash;</td>
+ <td>Record only<sup>&dagger;</sup></td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
- <td>Yes</td>
+ <td>&mdash;</td>
</tr>
<tr>
<td>Web Searches</td>
@@ -828,6 +932,11 @@ <h2>Provider Support Matrix</h2>
<sup>*</sup> Azure inherits OpenAI&rsquo;s override support because Azure OpenAI routes
through the OpenAI Chat Completions response format internally.
</p>
<p class="footnote">
<sup>&dagger;</sup> Gemini Interactions captures reasoning on record (its collapser
assembles <code>thought_summary</code> deltas into <code>reasoning</code>), but its replay
builders do not re-emit reasoning, so a replayed turn carries none.
</p>
</main>
<aside class="page-toc" id="page-toc"></aside>
</div>
19 changes: 19 additions & 0 deletions docs/record-replay/index.html
@@ -429,6 +429,25 @@ <h2>Stream Collapsing</h2>
simple <code>{ content }</code> or <code>{ toolCalls }</code> fixture response.
</p>

<h2 id="recording-block-order">Recording Block Order</h2>
<p>
When a recorded stream is <em>genuinely</em> tool-first or interleaved &mdash; a tool-call
delta arrives before the first content delta, or content arrives after a tool-call delta
&mdash; the collapser preserves that arrival order as a
<a href="/fixtures#ordered-blocks"><code>blocks</code></a> array on the fixture. This
works across OpenAI, Anthropic, Gemini, Ollama, Cohere, and Bedrock. Ordinary
text-then-tools streams are saved in the legacy <code>{ content, toolCalls }</code> shape
with no <code>blocks</code> key, so existing recordings round-trip byte-identically.
</p>
<p>
<strong>Gemini Interactions is the exception:</strong> its record-side collapser
normalizes tool-call arguments only and does not reorder blocks on capture. Ordering is
still honored on replay from a hand-authored <code>blocks</code> fixture; it is simply not
reconstructed automatically from a recording. See the
<a href="/fixtures#ordered-blocks">per-provider observability matrix</a> for how
faithfully block order is reconstructable on each provider's wire.
</p>
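The "genuinely tool-first or interleaved" test described in this section can be sketched as a single pass over the arrival order of deltas. `needsBlocks` and `DeltaKind` are hypothetical names, not the recorder's API. The sketch also assumes that a stream with no content after any tool-call delta (text-only, tool-only, or text-then-tools) keeps the legacy shape.

```typescript
// Arrival-order tag for each recorded delta (hypothetical representation).
type DeltaKind = "content" | "toolCall";

// True when the recording would need a `blocks` array to preserve order:
// some content delta arrives after a tool-call delta. This covers both the
// tool-first case and the interleaved case described above.
function needsBlocks(arrival: DeltaKind[]): boolean {
  let sawToolCall = false;
  for (const kind of arrival) {
    if (kind === "toolCall") {
      sawToolCall = true;
    } else if (sawToolCall) {
      return true;
    }
  }
  return false;
}

const textThenTools = needsBlocks(["content", "content", "toolCall"]);
const toolFirst = needsBlocks(["toolCall", "content"]);
const interleaved = needsBlocks(["content", "toolCall", "content"]);
```

Under this predicate an ordinary text-then-tools recording never gains a `blocks` key, which is the property that keeps existing recordings byte-identical.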

<h2>Header Forwarding</h2>
<p>
When proxying to upstream providers, aimock forwards the original request's headers except
13 changes: 13 additions & 0 deletions fixtures/examples/llm/blocks-tool-first.json
@@ -0,0 +1,13 @@
{
"fixtures": [
{
"match": { "userMessage": "what's the weather in NYC?" },
"response": {
"blocks": [
{ "type": "toolCall", "name": "get_weather", "arguments": "{\"city\": \"NYC\"}" },
{ "type": "text", "text": "Let me check the weather in NYC for you." }
]
}
}
]
}
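This fixture writes `arguments` as a JSON string, while the docs example in the same change writes it as an object, and validation accepts both forms. A consumer that wants one canonical form could normalize roughly as follows. `argumentsToJsonString` is a hypothetical helper, and choosing the string form as canonical is an assumption rather than documented aimock behavior.

```typescript
// Normalize tool-call arguments to a compact JSON string.
// A string input is parsed and re-serialized, so invalid JSON throws here
// and insignificant whitespace is dropped.
function argumentsToJsonString(args: string | Record<string, unknown>): string {
  if (typeof args === "string") {
    return JSON.stringify(JSON.parse(args));
  }
  return JSON.stringify(args);
}

// The string form used in this fixture and the object form used in the docs
// example normalize to the same value.
const fromString = argumentsToJsonString("{\"city\": \"NYC\"}");
const fromObject = argumentsToJsonString({ city: "NYC" });
```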