Skip to content

Commit 5d0f9ec

Browse files
committed
fix(showcase/aimock): drop stale turnIndex:0 gate from multi-turn D6 pill fixtures
D6 drives each feature's pills as sequential turns in one chat thread. aimock matches turnIndex against the count of assistant messages in history, so a "turnIndex": 0 gate on a per-pill tool-call leg only matches the FIRST pill — pills 2+ (turnIndex 1/2/3) match no fixture, 503 under strict, and the harness reports "timeout: assistant did not respond". Remove the turnIndex:0 gate from the multi-turn tool-call legs of frontend-tools (sunset/forest/cosmic) and tool-rendering-reasoning-chain (the three chained pills: Compare AAPL/MSFT, compare-to-smaller dice, weather-there flights) so they match on their distinct userMessage substrings + context, mirroring the already-green langgraph-python fixtures. Each pill's substring is unambiguous and its toolCallId-keyed follow-up fixture is ordered before the de-gated first leg (first-match wins), so follow-up narration still wins and there is no cross-matching. The reasoning-chain "Find flights from SFO to JFK." pill keeps its turnIndex:0 gate to match the green langgraph-python reference exactly. Proven locally with aimock --strict: multi-turn requests for these pills returned 503 with the gate (negative control) and now return 200 across all turns with the correct fixture/narration.
1 parent b14ba93 commit 5d0f9ec

16 files changed

Lines changed: 16 additions & 64 deletions

showcase/aimock/d6/agno/frontend-tools.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
"_comment": "frontend-tools 'Sunset' pill — 1st leg: emit change_background tool call only. NB: do NOT include `content` here. agent_framework_openai's ChatCompletions client splits an assistant message that carries BOTH `content` and `tool_calls` into two separate history entries (one with content, one with tool_calls), so on the follow-up leg aimock sees the standalone content message and re-matches THIS fixture by `userMessage` substring → another change_background tool call → infinite loop. The toolCallId-keyed follow-up fixture immediately above provides the post-tool narration text.",
2121
"match": {
2222
"userMessage": "Make the background a sunset gradient",
23-
"turnIndex": 0,
2423
"context": "agno"
2524
},
2625
"response": {
@@ -49,7 +48,6 @@
4948
"_comment": "frontend-tools 'Forest' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
5049
"match": {
5150
"userMessage": "deep green forest gradient",
52-
"turnIndex": 0,
5351
"context": "agno"
5452
},
5553
"response": {
@@ -78,7 +76,6 @@
7876
"_comment": "frontend-tools 'Cosmic' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
7977
"match": {
8078
"userMessage": "navy → magenta cosmic gradient",
81-
"turnIndex": 0,
8279
"context": "agno"
8380
},
8481
"response": {
@@ -94,4 +91,4 @@
9491
}
9592
}
9693
]
97-
}
94+
}

showcase/aimock/d6/agno/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "agno"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "agno"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "agno"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

showcase/aimock/d6/built-in-agent/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "built-in-agent"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "built-in-agent"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "built-in-agent"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

showcase/aimock/d6/claude-sdk-typescript/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "claude-sdk-typescript"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "claude-sdk-typescript"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "claude-sdk-typescript"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

showcase/aimock/d6/crewai-crews/frontend-tools.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
"_comment": "frontend-tools 'Sunset' pill — 1st leg: emit change_background tool call only. NB: do NOT include `content` here. agent_framework_openai's ChatCompletions client splits an assistant message that carries BOTH `content` and `tool_calls` into two separate history entries (one with content, one with tool_calls), so on the follow-up leg aimock sees the standalone content message and re-matches THIS fixture by `userMessage` substring → another change_background tool call → infinite loop. The toolCallId-keyed follow-up fixture immediately above provides the post-tool narration text.",
2121
"match": {
2222
"userMessage": "Make the background a sunset gradient",
23-
"turnIndex": 0,
2423
"context": "crewai-crews"
2524
},
2625
"response": {
@@ -49,7 +48,6 @@
4948
"_comment": "frontend-tools 'Forest' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
5049
"match": {
5150
"userMessage": "deep green forest gradient",
52-
"turnIndex": 0,
5351
"context": "crewai-crews"
5452
},
5553
"response": {
@@ -78,7 +76,6 @@
7876
"_comment": "frontend-tools 'Cosmic' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
7977
"match": {
8078
"userMessage": "navy → magenta cosmic gradient",
81-
"turnIndex": 0,
8279
"context": "crewai-crews"
8380
},
8481
"response": {
@@ -94,4 +91,4 @@
9491
}
9592
}
9693
]
97-
}
94+
}

showcase/aimock/d6/crewai-crews/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "crewai-crews"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "crewai-crews"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "crewai-crews"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

showcase/aimock/d6/langgraph-fastapi/frontend-tools.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
"_comment": "frontend-tools 'Sunset' pill — 1st leg: emit change_background tool call only. NB: do NOT include `content` here. agent_framework_openai's ChatCompletions client splits an assistant message that carries BOTH `content` and `tool_calls` into two separate history entries (one with content, one with tool_calls), so on the follow-up leg aimock sees the standalone content message and re-matches THIS fixture by `userMessage` substring → another change_background tool call → infinite loop. The toolCallId-keyed follow-up fixture immediately above provides the post-tool narration text.",
2121
"match": {
2222
"userMessage": "Make the background a sunset gradient",
23-
"turnIndex": 0,
2423
"context": "langgraph-fastapi"
2524
},
2625
"response": {
@@ -49,7 +48,6 @@
4948
"_comment": "frontend-tools 'Forest' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
5049
"match": {
5150
"userMessage": "deep green forest gradient",
52-
"turnIndex": 0,
5351
"context": "langgraph-fastapi"
5452
},
5553
"response": {
@@ -78,7 +76,6 @@
7876
"_comment": "frontend-tools 'Cosmic' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
7977
"match": {
8078
"userMessage": "navy → magenta cosmic gradient",
81-
"turnIndex": 0,
8279
"context": "langgraph-fastapi"
8380
},
8481
"response": {
@@ -94,4 +91,4 @@
9491
}
9592
}
9693
]
97-
}
94+
}

showcase/aimock/d6/langgraph-fastapi/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "langgraph-fastapi"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "langgraph-fastapi"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "langgraph-fastapi"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

showcase/aimock/d6/langroid/frontend-tools.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
"_comment": "frontend-tools 'Sunset' pill — 1st leg: emit change_background tool call only. NB: do NOT include `content` here. agent_framework_openai's ChatCompletions client splits an assistant message that carries BOTH `content` and `tool_calls` into two separate history entries (one with content, one with tool_calls), so on the follow-up leg aimock sees the standalone content message and re-matches THIS fixture by `userMessage` substring → another change_background tool call → infinite loop. The toolCallId-keyed follow-up fixture immediately above provides the post-tool narration text.",
2121
"match": {
2222
"userMessage": "Make the background a sunset gradient",
23-
"turnIndex": 0,
2423
"context": "langroid"
2524
},
2625
"response": {
@@ -49,7 +48,6 @@
4948
"_comment": "frontend-tools 'Forest' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
5049
"match": {
5150
"userMessage": "deep green forest gradient",
52-
"turnIndex": 0,
5351
"context": "langroid"
5452
},
5553
"response": {
@@ -78,7 +76,6 @@
7876
"_comment": "frontend-tools 'Cosmic' pill — 1st leg: emit change_background tool call only. See the Sunset fixture above for why we drop `content`.",
7977
"match": {
8078
"userMessage": "navy → magenta cosmic gradient",
81-
"turnIndex": 0,
8279
"context": "langroid"
8380
},
8481
"response": {
@@ -94,4 +91,4 @@
9491
}
9592
}
9693
]
97-
}
94+
}

showcase/aimock/d6/langroid/tool-rendering-reasoning-chain.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,6 @@
4040
"_comment": "tool-rendering-reasoning-chain pill 1 (stocks) — first leg: emit get_stock_price(AAPL). The `Compare AAPL and MSFT stocks` substring is unique to this demo's pill prompt — other integrations' reasoning-chain demos still send the older `How is AAPL doing?` prompt — so we don't need a tool-name gate and the fixture stays scoped to langgraph-python without affecting fleet-wide aimock traffic. MUST appear before the bare 'AAPL' fixtures later in this file.",
4141
"match": {
4242
"userMessage": "Compare AAPL and MSFT stocks",
43-
"turnIndex": 0,
4443
"context": "langroid"
4544
},
4645
"response": {
@@ -89,7 +88,6 @@
8988
"_comment": "tool-rendering-reasoning-chain pill 2 (dice) — first leg: emit roll_dice(sides=20). Unique substring 'compare it to a smaller one' keeps us scoped to this demo (other integrations' dice pills still send 'Roll a 20-sided die for me.' and fall through to the older 5x roll_d20 fixtures further down). NB: aimock's `toolName` gate is a tool-LIST gate, not a tool-CALL gate; we don't use it here because the reasoning-chain agents across the fleet all register `roll_dice`, so a `toolName: roll_dice` claim would NOT have distinguished this demo from the others.",
9089
"match": {
9190
"userMessage": "compare it to a smaller one",
92-
"turnIndex": 0,
9391
"context": "langroid"
9492
},
9593
"response": {
@@ -138,7 +136,6 @@
138136
"_comment": "tool-rendering-reasoning-chain pill 3 (flights + destination weather) — first leg: emit search_flights(SFO,JFK). The 'show me the weather there' substring is unique to this demo's pill, so the basic tool-rendering demo's bare 'Find flights from SFO to JFK.' prompt (and the equivalent prompts in other integrations' reasoning-chain demos that have not yet been ported to the chained phrasing) flow through to the basic single-tool flights fixture later in the file.",
139137
"match": {
140138
"userMessage": "show me the weather there",
141-
"turnIndex": 0,
142139
"context": "langroid"
143140
},
144141
"response": {
@@ -190,4 +187,4 @@
190187
}
191188
}
192189
]
193-
}
190+
}

0 commit comments

Comments
 (0)