feat(fixtures): blocks array for tool-first / interleaved ordering across providers (#274)#283

Merged
jpr5 merged 5 commits into main from feat/fixture-block-ordering
Jun 28, 2026
@jpr5 jpr5 commented Jun 27, 2026

Contributor

Closes #274.

Adds an optional blocks array to the content+toolCalls fixture response so a fixture can express ordered text / tool-call blocks, streamed in array order — enabling tool-call-before-text ("tool-first") and interleaved ordering that the previous { content, toolCalls } shape (always text-first) could not.
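As a rough illustration of the shape described above, the sketch below models a fixture response with an optional blocks array and derives the emission order from it. The type names (`FixtureBlock`, `FixtureResponse`), the `emissionOrder` helper, and the sample `get_weather` tool are hypothetical and are not taken from the PR's code; only the block fields (`type`, `text`, `name`, `arguments`, `id`) come from the description.

```typescript
// Hypothetical types mirroring the block shapes named in the PR description.
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = { type: "toolCall"; name: string; arguments: string; id?: string };
type FixtureBlock = TextBlock | ToolCallBlock;

interface FixtureResponse {
  content?: string;
  toolCalls?: { name: string; arguments: string; id?: string }[];
  blocks?: FixtureBlock[];
}

// Assumed rule: non-empty blocks win and are emitted in array order;
// otherwise fall back to the legacy order (text first, then tool calls).
function emissionOrder(response: FixtureResponse): FixtureBlock[] {
  if (response.blocks && response.blocks.length > 0) {
    return response.blocks;
  }
  const legacy: FixtureBlock[] = [];
  if (response.content) {
    legacy.push({ type: "text", text: response.content });
  }
  for (const call of response.toolCalls ?? []) {
    legacy.push({ type: "toolCall", ...call });
  }
  return legacy;
}

// A tool-first fixture: the tool call is declared before the text.
const toolFirstFixture: FixtureResponse = {
  content: "It is sunny in Paris.",
  toolCalls: [{ name: "get_weather", arguments: '{"city":"Paris"}' }],
  blocks: [
    { type: "toolCall", name: "get_weather", arguments: '{"city":"Paris"}' },
    { type: "text", text: "It is sunny in Paris." },
  ],
};

// Tool call first, then text.
console.log(emissionOrder(toolFirstFixture).map((block) => block.type));
```

With `blocks` omitted or set to `[]`, the same helper returns the text block followed by the tool calls, which is the ordering the legacy shape always produced.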

What's in it

  • blocks fixture field ([{ type: "text", text } | { type: "toolCall", name, arguments, id? }]) on the content+toolCalls response, with a resolveFixtureBlocks validator and load-time validation in the JSON fixture loader + factory normalization (object args auto-stringified).
  • Branch-not-replace: when blocks is present it takes precedence and streams in array order; when absent (or empty []), the existing legacy path runs byte-for-byte unchanged. Backward-compatible → minor release.
  • All five providers, streaming AND non-streaming: Anthropic, OpenAI chat, Gemini, Ollama, Responses — plus the WebSocket Responses surface.
  • Recorder: stream collapsers capture cross-channel block order and persist blocks only when the real upstream stream was genuinely tool-first/interleaved (text-first streams keep the legacy shape → existing recordings byte-identical). Collapsed blocks and the flat toolCalls are reconciled (same order/identity, arguments normalized to "{}").
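The recorder rule in the last bullet can be sketched as follows. This is a minimal illustration under assumptions, not the PR's collapser: `isTextFirst` and `collapse` are hypothetical names, "text-first" is assumed to mean that no text block arrives after a tool call, and an empty arguments string is assumed to be the zero-argument case that gets normalized to "{}".

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string; id?: string };

interface Recorded {
  content: string;
  toolCalls: { name: string; arguments: string; id?: string }[];
  blocks?: Block[];
}

// Assumed definition: a stream is text-first when no text follows a tool call,
// so the legacy { content, toolCalls } shape already expresses it.
function isTextFirst(blocks: Block[]): boolean {
  let seenToolCall = false;
  for (const block of blocks) {
    if (block.type === "toolCall") seenToolCall = true;
    else if (seenToolCall) return false;
  }
  return true;
}

// Collapse the observed block sequence into a recorded fixture response.
function collapse(observed: Block[]): Recorded {
  // Normalize empty tool arguments to "{}" (assumption about the zero-arg case).
  const normalized: Block[] = observed.map((block): Block =>
    block.type === "toolCall" && block.arguments.trim() === ""
      ? { ...block, arguments: "{}" }
      : block,
  );

  // Derive the flat fields from the same normalized sequence so that
  // blocks and toolCalls agree on order, identity, and arguments.
  let content = "";
  const toolCalls: Recorded["toolCalls"] = [];
  for (const block of normalized) {
    if (block.type === "text") {
      content += block.text;
    } else {
      toolCalls.push({
        name: block.name,
        arguments: block.arguments,
        ...(block.id !== undefined ? { id: block.id } : {}),
      });
    }
  }

  // Persist blocks only when the legacy shape would lose ordering information.
  return isTextFirst(normalized)
    ? { content, toolCalls }
    : { content, toolCalls, blocks: normalized };
}
```

Deriving both `blocks` and `toolCalls` from one normalized sequence is one simple way to keep the two views reconciled; the actual collapsers may do this differently.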

Per-provider observability (documented honestly)

  • Anthropic / Responses / Gemini: full tool-first (ordered arrays).
  • Ollama: partial.
  • OpenAI chat-completions: degenerate. delta.content and delta.tool_calls are separate channels with no client-observable interleaving (emitted in order, documented as non-observable).
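To illustrate why ordering is observable on an array-based wire, the sketch below maps fixture blocks onto an Anthropic-style `content` array, where each element is either a `text` or a `tool_use` item. The `Block` type, the `toAnthropicContent` helper, and the fallback id scheme are hypothetical simplifications and are not the PR's builder code.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string; id?: string };

// Simplified view of the content items in an Anthropic-style message.
type AnthropicContent =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Map blocks one-to-one, preserving array order.
function toAnthropicContent(blocks: Block[]): AnthropicContent[] {
  return blocks.map((block, index): AnthropicContent =>
    block.type === "text"
      ? { type: "text", text: block.text }
      : {
          type: "tool_use",
          // Hypothetical fallback id for blocks that do not declare one.
          id: block.id ?? `toolu_${index}`,
          name: block.name,
          input: JSON.parse(block.arguments),
        },
  );
}

const call: Block = { type: "toolCall", name: "get_weather", arguments: '{"city":"Paris"}' };
const text: Block = { type: "text", text: "It is sunny in Paris." };

// The same two blocks in a different order yield a different payload,
// so a client can tell tool-first from text-first.
console.log(toAnthropicContent([call, text]).map((item) => item.type)); // tool_use, then text
console.log(toAnthropicContent([text, call]).map((item) => item.type)); // text, then tool_use
```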

Verification

  • Full suite 4125 passing; build + exports + lint + format + tsc --noEmit clean.
  • Red-green tests per builder (streaming + non-streaming), the collapser/recorder, the loader, plus e2e (JSON fixture → tool-first wire), scoped-out-consumer safety, and legacy back-compat.
  • Reviewed via a 3-round, 11-agent-per-round CR loop converged to zero (Procedure 3 bucket-c audit: 0 promotions). Round 1 caught real gaps — empty-blocks gate, the non-streaming paths, load-time validation timing, collapser ordering/arguments fidelity — each fixed with a red-green test.

Version

Backward-compatible → minor (1.35.0) when released. Per repo convention the version bump + CHANGELOG heading are cut in a dedicated release PR; this PR adds entries under ## [Unreleased] and does not bump the version surfaces.

Known follow-ups (not blocking)

  • Extend block capture (and zero-arg normalizeToolArguments) to the Cohere / Bedrock / Gemini-Interactions collapsers (out of this PR's provider scope).
  • Pre-existing, unrelated: responsesUsage 0/0/0 for Gemini-style usage; a broken @copilotkit/aimock/server import in the responses-api doc; ws-responses request-field whitelist / streamingProfile; stale "usage zeroed" chat doc line.

🤖 Generated with Claude Code

pkg-pr-new Bot commented Jun 27, 2026

npm i https://pkg.pr.new/@copilotkit/aimock@283

commit: 2d132df

jpr5 added 5 commits June 27, 2026 21:25
…ization (#274)

Introduce the fixture blocks array type with validation, and normalize
loader/factory paths so block-ordered fixtures flow through consistently.
…cross providers (#274)

Emit fixture blocks in their declared array order for both streaming and
non-streaming paths across the Anthropic, OpenAI, Gemini, Ollama, Responses,
and WebSocket providers.
…l-first fixtures (#274)

Preserve the observed streaming block order during collapse so the recorder
writes tool-first fixtures faithfully.
…nd back-compat (#274)

Add and extend tests covering block-ordered replay per provider, recorder
capture, end-to-end flow, and backward compatibility with legacy fixtures.
…bservability (#274)

Document the fixture blocks array, per-provider ordering behavior, and the
observability surface, and record the change in the changelog.
@jpr5 jpr5 force-pushed the feat/fixture-block-ordering branch from b7bcd59 to 2d132df Compare June 28, 2026 04:27
@jpr5 jpr5 merged commit 36abb2d into main Jun 28, 2026
23 checks passed
@jpr5 jpr5 deleted the feat/fixture-block-ordering branch June 28, 2026 04:30
jpr5 added a commit that referenced this pull request Jun 28, 2026
…-class + docs (#285)

## What

Completes the (still-unreleased) #274 `blocks` fixture-ordering feature
so it works **uniformly across every provider**, and makes blocks-only
fixtures **first-class**. Builds on #283 (blocks v1) and #282 (async
video), both already on `main` under `[Unreleased]`.

No version bump — stays `1.34.0`. The dedicated `chore: release v1.35.0`
PR owns the bump.

## Why

#283 shipped `blocks` for a subset of providers and left the record-side
capture + several provider builders out of scope, with blocks-only
fixtures treated as by-design-unsupported. To ship the feature as
*complete* for 1.35.0, it needs to: capture/replay block order on the
remaining providers, accept a fixture that is *just* `{ blocks: [...]
}`, and document the feature where authors actually look.

## Changes

**Feature completion**
- **First-class blocks-only fixtures** — a response may be `{ blocks:
[...] }` with no `content`/`toolCalls`; the shared recognizer was
relaxed *additively* (legacy recognition byte-identical).
- **Record side** — block-order capture added to the Cohere, Bedrock
(event-stream), and Gemini-Interactions collapsers;
`normalizeToolArguments` zero-arg ("{}") fix. Gemini-Interactions is
args-only by design (step-index can't reconcile arrival-order blocks).
- **Replay builders** — Cohere, Bedrock (invoke), Bedrock-Converse, and
Gemini-Interactions now honor `blocks` ordering (streaming +
non-streaming where the wire allows).
- **toolCall `arguments`** may be a JSON object or a string (objects
auto-stringify), consistently across top-level toolCalls and block
toolCalls.
- **Validation hardening** — `validateBlocks` rejects empty-text blocks
and warns on blocks/content divergence.
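
A minimal sketch of the argument handling and validation rules listed above, written under assumptions: `stringifyArguments` and `checkBlocks` are hypothetical helpers (not the PR's `normalizeToolArguments` or `validateBlocks`), and "divergence" is assumed to mean that a non-empty `content` differs from the concatenated text blocks.

```typescript
type RawBlock =
  | { type: "text"; text: string }
  | {
      type: "toolCall";
      name: string;
      // Authors may write arguments as a JSON object or as a string.
      arguments: string | Record<string, unknown>;
      id?: string;
    };

// Objects are stringified; strings pass through unchanged.
function stringifyArguments(args: string | Record<string, unknown>): string {
  return typeof args === "string" ? args : JSON.stringify(args);
}

interface CheckResult {
  errors: string[];
  warnings: string[];
}

// Reject empty text blocks; warn when content and blocks disagree.
function checkBlocks(blocks: RawBlock[], content?: string): CheckResult {
  const result: CheckResult = { errors: [], warnings: [] };
  let joinedText = "";

  blocks.forEach((block, index) => {
    if (block.type !== "text") return;
    if (block.text === "") {
      result.errors.push(`blocks[${index}]: text block must not be empty`);
    }
    joinedText += block.text;
  });

  // An absent or empty content is treated as "blocks drive the output".
  if (content !== undefined && content !== "" && content !== joinedText) {
    result.warnings.push("content differs from the concatenated text blocks");
  }
  return result;
}

console.log(stringifyArguments({ city: "Paris" })); // {"city":"Paris"}
console.log(checkBlocks([{ type: "text", text: "" }]).errors.length); // 1
```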

**Completion gaps found + fixed during code review** (each with a local
red→green on the real surface)
- Ollama **non-streaming** dropped the entire payload for a blocks-only
fixture (F0 made it newly reachable) — now backfills content/tool_calls
from blocks.
- Realtime + Gemini-Live WS surfaces silently dropped blocks-only
payloads — now honor blocks.
- Programmatic `addFixture` with object block args returned HTTP 500
(bypassed normalize) — `resolveFixtureBlocks` now tolerates object args.
- A valid `{ content: "", blocks: [...] }` fixture spuriously
hard-errored at load — suppressed when non-empty blocks drive output.
- Gemini-Live empty-text block leaked past `truncateAfterChunks` and
shifted recorded-timing indices — guard now skips emission cleanly.
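
The backfill idea behind the Ollama non-streaming fix can be sketched like this. It is an illustration only: `backfillFromBlocks` and the simplified `FlatMessage` shape are hypothetical and do not reproduce any provider's exact wire format.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string; id?: string };

// Simplified message with separate content and tool_calls fields.
interface FlatMessage {
  content: string;
  tool_calls: { name: string; arguments: string }[];
}

// Derive the flat fields from a blocks-only fixture so the payload is not dropped.
function backfillFromBlocks(blocks: Block[]): FlatMessage {
  const message: FlatMessage = { content: "", tool_calls: [] };
  for (const block of blocks) {
    if (block.type === "text") {
      message.content += block.text;
    } else {
      message.tool_calls.push({ name: block.name, arguments: block.arguments });
    }
  }
  return message;
}

const callBlock: Block = { type: "toolCall", name: "get_weather", arguments: '{"city":"Paris"}' };
const textBlock: Block = { type: "text", text: "It is sunny in Paris." };

const fromToolFirst = backfillFromBlocks([callBlock, textBlock]);
const fromTextFirst = backfillFromBlocks([textBlock, callBlock]);

// Both orderings collapse to the same message: the payload is delivered,
// but the relative order of text and tool call is not visible in this shape.
console.log(JSON.stringify(fromToolFirst) === JSON.stringify(fromTextFirst)); // true
```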

**Docs**
- `docs/fixtures/index.html`: per-provider observability matrix,
blocks-only authoring, reasoning cells.
- Authoring on-ramps: `write-fixtures` skill/command, `docs/examples`
(worked **tool-first** example), `docs/record-replay`, root + pytest
READMEs.
- A runnable `fixtures/examples/llm/blocks-tool-first.json` with a
permanent loadability test.

## Wire limitations (documented, not gaps)

OpenAI-chat, Ollama, and Cohere **non-streaming** expose separate
content/tool_calls fields, so order is not observable on the wire —
payload is delivered, order is a no-op. Captured in the observability
matrix.

## Test plan

- Full suite: **144 files / 4244 tests** pass
- `tsc --noEmit`, `eslint`, `prettier --check`, `build`, `test:exports`
(node10/node16-CJS/ESM/bundler) all green
- Every behavioral fix verified red→green on the real surface (live
endpoint / real WS / real loader), independently re-confirmed in review

## Known non-blocking follow-ups

Pre-existing items (predate this branch) and minor polish are tracked
but intentionally out of scope: a shared stringify helper to de-dup the
three normalize sites, JSON-serializability check for object args at
load time, and assorted pre-existing collapser/request-converter edge
cases.
Development

Successfully merging this pull request may close these issues.

Fixtures can't express tool-call-before-text ("tool-first") ordering; recorder collapses block order

1 participant