forked from CopilotKit/CopilotKit
subagents.json
385 lines (385 loc) · 22.4 KB
{
"_meta": {
"description": "D6 fixtures for langgraph-python / subagents",
"sourceFile": "d5-all.json",
"created": "2026-05-21"
},
"fixtures": [
{
"match": {
"userMessage": "Research the benefits of remote work and draft a one-paragraph summary",
"hasToolResult": false,
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_research_agent_001",
"name": "research_agent",
"arguments": "{\"task\":\"Benefits of remote work\"}"
}
]
}
},
{
"_comment": "Nested: research sub-agent single-turn LLM call",
"match": {
"userMessage": "Benefits of remote work",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "- Eliminates commute, returning ~10 hours/week to employees\n- Surveys consistently show higher job satisfaction among remote workers\n- Employers gain access to a geographically unbounded talent pool\n- Reduced office overhead (rent, utilities, maintenance)\n- Trade-offs: ad-hoc collaboration, mentorship of junior staff, and cultural cohesion degrade without intentional replacement rituals"
}
},
{
"match": {
"userMessage": "Research the benefits of remote work and draft a one-paragraph summary",
"turnIndex": 1,
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_writing_agent_001",
"name": "writing_agent",
"arguments": "{\"task\":\"One-paragraph summary on the benefits of remote work, grounded in the research facts.\\n\\nFacts:\\n- Eliminates commute, returning ~10 hours per week\\n- Higher reported job satisfaction in repeated surveys\\n- Wider talent pool for employers, no geographic limit\\n- Reduced office overhead\\n- Trade-offs: collaboration, mentorship, cultural cohesion\"}"
}
]
}
},
{
"_comment": "Nested: writing sub-agent single-turn LLM call",
"match": {
"userMessage": "One-paragraph summary on the benefits of remote work",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "Remote work returns roughly ten hours a week to employees by eliminating the commute, and repeated surveys show meaningfully higher job satisfaction among remote workers. Employers benefit too: a geographically unbounded talent pool and lower office overhead. The honest counterweight is that ad-hoc collaboration, mentorship of junior staff, and cultural cohesion all degrade without intentional rituals to replace what an office provided implicitly."
}
},
{
"match": {
"userMessage": "Research the benefits of remote work and draft a one-paragraph summary",
"turnIndex": 2,
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_critique_agent_001",
"name": "critique_agent",
"arguments": "{\"task\":\"Remote work returns roughly ten hours a week to employees by eliminating the commute, and repeated surveys show meaningfully higher job satisfaction among remote workers. Employers benefit too: a geographically unbounded talent pool and lower office overhead. The honest counterweight is that ad-hoc collaboration, mentorship of junior staff, and cultural cohesion all degrade without intentional rituals to replace what an office provided implicitly.\"}"
}
]
}
},
{
"_comment": "Nested: critique sub-agent single-turn LLM call",
"match": {
"userMessage": "Remote work returns roughly ten hours",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "1. Add a specific citation or date range for the surveys mentioned — 'repeated surveys' is vague and weakens the empirical claim.\n2. The ten-hours figure needs sourcing; without it the opening reads as an unsupported assertion rather than a research finding.\n3. The final sentence about 'intentional rituals' is abstract — one concrete example (e.g., scheduled pair-programming sessions, virtual coffee chats) would make the counterweight more actionable."
}
},
{
"match": {
"userMessage": "Research the benefits of remote work and draft a one-paragraph summary",
"turnIndex": 3,
"context": "langgraph-python"
},
"response": {
"content": "Here is the summary, after research → drafting → critique:\n\nRemote work returns roughly ten hours a week to employees by eliminating the commute, and repeated surveys show meaningfully higher job satisfaction among remote workers. Employers benefit too: a geographically unbounded talent pool and lower office overhead. The honest counterweight is that ad-hoc collaboration, mentorship of junior staff, and cultural cohesion all degrade without intentional rituals to replace what an office provided implicitly."
}
},
{
"_comment": "Subagents pill 1 — 'Write a blog post' / cold exposure training. Drives supervisor → research_agent → writing_agent → critique_agent → final reply, plus three nested sub-agent turns.",
"match": {
"userMessage": "Produce a short blog post about the benefits of cold exposure training",
"hasToolResult": false,
"toolName": "research_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p1_research_001",
"name": "research_agent",
"arguments": "{\"task\":\"Cold exposure training key facts\"}"
}
]
}
},
{
"_comment": "Nested: research sub-agent returns deterministic facts about cold exposure training.",
"match": {
"userMessage": "Cold exposure training key facts",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "- Brief cold immersion (cold showers, ice baths) triggers a sympathetic-nervous-system response that releases noradrenaline\n- Repeated exposure is associated with improved self-reported mood and stress tolerance\n- Activates brown adipose tissue, modestly increasing basal metabolic rate\n- May reduce post-exercise muscle soreness when used as a recovery modality\n- Health risk for people with cardiovascular conditions; sessions should be short (1-3 minutes) and supervised at first"
}
},
{
"match": {
"userMessage": "Produce a short blog post about the benefits of cold exposure training",
"turnIndex": 1,
"toolName": "writing_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p1_writing_001",
"name": "writing_agent",
"arguments": "{\"task\":\"Short blog-post paragraph on the benefits of cold exposure training, grounded in the research facts.\\n\\nFacts:\\n- Brief cold immersion triggers a noradrenaline release\\n- Repeated exposure improves self-reported mood and stress tolerance\\n- Activates brown adipose tissue, modestly raises basal metabolic rate\\n- May reduce post-exercise muscle soreness\\n- Cardiovascular risk; keep early sessions short and supervised\"}"
}
]
}
},
{
"_comment": "Nested: writing sub-agent returns deterministic prose for cold exposure.",
"match": {
"userMessage": "Short blog-post paragraph on the benefits of cold exposure training",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "Cold exposure training — short, deliberate plunges into cold water or showers — has earned a foothold in modern recovery routines for reasons grounded in physiology rather than folklore. Each immersion produces a measurable surge of noradrenaline, the same chemistry that underpins the lift practitioners report in mood and focus afterward; with repetition, that response is associated with greater day-to-day stress tolerance. Cold also activates brown adipose tissue and can blunt post-exercise soreness, making it a low-cost adjunct for active people. The honest caveat is cardiovascular risk: keep early sessions to one to three minutes, and if you have a heart condition, get a green light before you start."
}
},
{
"match": {
"userMessage": "Produce a short blog post about the benefits of cold exposure training",
"turnIndex": 2,
"toolName": "critique_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p1_critique_001",
"name": "critique_agent",
"arguments": "{\"task\":\"Critique the cold-exposure blog draft. Draft: Cold exposure training — short, deliberate plunges into cold water or showers — has earned a foothold in modern recovery routines for reasons grounded in physiology rather than folklore. Each immersion produces a measurable surge of noradrenaline...\"}"
}
]
}
},
{
"_comment": "Nested: critique sub-agent returns deterministic critique of cold-exposure draft.",
"match": {
"userMessage": "Critique the cold-exposure blog draft",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "1. The phrase 'measurable surge of noradrenaline' should cite a study or rough magnitude — without a number, the claim reads as marketing rather than evidence.\n2. 'Modern recovery routines' is filler; lead with the physiological mechanism instead and let the cultural framing follow.\n3. The cardiovascular caveat is buried at the end. Move it earlier or make it a standalone closing line so a reader who skims still sees it."
}
},
{
"match": {
"userMessage": "Produce a short blog post about the benefits of cold exposure training",
"turnIndex": 3,
"context": "langgraph-python"
},
"response": {
"content": "Here is the polished blog post on cold exposure training, after research → drafting → critique:\n\nCold exposure training — short, deliberate plunges into cold water or showers — produces a measurable noradrenaline surge that practitioners feel as a mood and focus lift; repeated exposure is associated with greater stress tolerance, brown-adipose activation, and reduced post-exercise soreness. Keep early sessions to one to three minutes, and if you have a cardiovascular condition, clear it with a clinician first."
}
},
{
"_comment": "Subagents pill 2 — 'Explain a topic' / LLM tool calling.",
"match": {
"userMessage": "Explain how large language models handle tool calling",
"hasToolResult": false,
"toolName": "research_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p2_research_001",
"name": "research_agent",
"arguments": "{\"task\":\"How LLMs handle tool calling\"}"
}
]
}
},
{
"_comment": "Nested: research sub-agent for LLM tool calling.",
"match": {
"userMessage": "How LLMs handle tool calling",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "- The model is shown a tool schema (name, description, JSON-schema parameters) inside the system or developer prompt at request time\n- During decoding, instead of emitting natural-language text, the model emits a structured tool_call block (function name + JSON-encoded arguments)\n- The application runs the tool, packages the result into a tool message, and resends the full conversation so the model can continue\n- Modern decoders use constrained decoding or grammars to keep the arguments syntactically valid JSON\n- The model decides on tool use turn-by-turn — there is no out-of-band channel; tool calls are just a different message role in the same chat thread"
}
},
{
"match": {
"userMessage": "Explain how large language models handle tool calling",
"turnIndex": 1,
"toolName": "writing_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p2_writing_001",
"name": "writing_agent",
"arguments": "{\"task\":\"One-paragraph explanation of how LLMs handle tool calling, grounded in the research.\\n\\nFacts:\\n- Tool schemas (name, description, JSON-schema params) are passed in the prompt\\n- Models emit a structured tool_call block instead of text\\n- Application runs the tool and replays the result as a tool message\\n- Constrained decoding keeps arguments valid JSON\\n- Tool use is decided turn-by-turn in the same chat thread\"}"
}
]
}
},
{
"_comment": "Nested: writing sub-agent for LLM tool calling.",
"match": {
"userMessage": "One-paragraph explanation of how LLMs handle tool calling",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "Large language models handle tool calling by treating tools as a structured extension of the chat protocol rather than a separate channel. At request time the application supplies each tool's name, description, and JSON-schema parameters in the prompt; during decoding the model can emit a tool_call block — a function name plus JSON-encoded arguments — instead of plain text, with constrained decoding keeping the arguments syntactically valid. The application then executes the tool and replays the result back as a tool-role message, and the model continues the conversation from there. The decision to call a tool is made turn-by-turn, so a single user request can fan out into a chain of tool calls that the model orchestrates as it reads each result."
}
},
{
"match": {
"userMessage": "Explain how large language models handle tool calling",
"turnIndex": 2,
"toolName": "critique_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p2_critique_001",
"name": "critique_agent",
"arguments": "{\"task\":\"Critique the LLM tool-calling explanation draft. Draft: Large language models handle tool calling by treating tools as a structured extension of the chat protocol rather than a separate channel. At request time the application supplies each tool's name, description, and JSON-schema parameters in the prompt...\"}"
}
]
}
},
{
"_comment": "Nested: critique sub-agent for LLM tool calling.",
"match": {
"userMessage": "Critique the LLM tool-calling explanation draft",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "1. The opening contrast 'rather than a separate channel' assumes the reader already knows what a 'separate channel' would mean — either drop the contrast or give a one-clause example (e.g., 'rather than a side API the model talks to in parallel').\n2. 'Constrained decoding keeping the arguments syntactically valid' is technically correct but vague; mention that this is what makes the JSON parseable on the application side.\n3. The final sentence introduces multi-tool chains without saying who controls the loop — clarify that the application is the runtime that decides whether to keep going, not the model itself."
}
},
{
"match": {
"userMessage": "Explain how large language models handle tool calling",
"turnIndex": 3,
"context": "langgraph-python"
},
"response": {
"content": "Here is the explanation of LLM tool calling, after research → drafting → critique:\n\nLLMs treat tool calling as part of the chat protocol: each tool's name, description, and JSON-schema parameters are passed in the prompt, and during decoding the model can emit a structured tool_call block (function name + JSON arguments) instead of plain text. The application — not the model — runs the tool, returns the result as a tool-role message, and decides whether to keep looping. Constrained decoding keeps the arguments valid JSON the application can parse, and the model orchestrates multi-step chains turn-by-turn from inside the same conversation."
}
},
{
"_comment": "Subagents pill 3 — 'Summarize a topic' / reusable rockets. Note: the original Railway-side bug for this pill was the concurrent-update on `delegations`; the agent state reducer fix unblocks this fixture chain.",
"match": {
"userMessage": "Summarize the current state of reusable rockets",
"hasToolResult": false,
"toolName": "research_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p3_research_001",
"name": "research_agent",
"arguments": "{\"task\":\"Current state of reusable rockets\"}"
}
]
}
},
{
"_comment": "Nested: research sub-agent for reusable rockets.",
"match": {
"userMessage": "Current state of reusable rockets",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "- SpaceX Falcon 9 routinely lands and re-flies first stages; individual boosters have flown more than 20 missions each\n- Falcon Heavy reuses both side boosters; the center core has been recovered on a subset of flights\n- Rocket Lab's Electron has demonstrated mid-air booster catch but routine reuse is still in development\n- SpaceX Starship is targeting full reuse of both stages; orbital test flights are ongoing as of 2024-2025\n- Reuse is the dominant lever on launch cost: Falcon 9 list pricing is set well below expendable competitors largely because of stage recovery"
}
},
{
"match": {
"userMessage": "Summarize the current state of reusable rockets",
"turnIndex": 1,
"toolName": "writing_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p3_writing_001",
"name": "writing_agent",
"arguments": "{\"task\":\"One polished paragraph summarizing the current state of reusable rockets, grounded in the research.\\n\\nFacts:\\n- Falcon 9 first stages routinely re-fly, some 20+ flights\\n- Falcon Heavy reuses side boosters; center core recovered sometimes\\n- Rocket Lab Electron demonstrating mid-air catch, reuse still in development\\n- SpaceX Starship targeting full reuse of both stages, in flight test\\n- Reuse drives launch cost downward\"}"
}
]
}
},
{
"_comment": "Nested: writing sub-agent for reusable rockets.",
"match": {
"userMessage": "One polished paragraph summarizing the current state of reusable rockets",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "Reusable rockets have shifted from a research goal to the default cost lever in commercial spaceflight. SpaceX's Falcon 9 routinely lands and re-flies its first stage — individual boosters have now flown twenty-plus missions — and Falcon Heavy reuses both side boosters with intermittent recovery of the center core. Smaller-class operators like Rocket Lab have demonstrated mid-air booster catch but have not yet made reuse routine, while SpaceX's Starship is in active flight testing toward full two-stage reuse. The economic consequence is already visible: Falcon 9 list pricing sits well below expendable competitors precisely because the dominant cost — building a fresh first stage every flight — has been amortized across many missions."
}
},
{
"match": {
"userMessage": "Summarize the current state of reusable rockets",
"turnIndex": 2,
"toolName": "critique_agent",
"context": "langgraph-python"
},
"response": {
"toolCalls": [
{
"id": "call_d5_subagents_p3_critique_001",
"name": "critique_agent",
"arguments": "{\"task\":\"Critique the reusable rockets summary draft. Draft: Reusable rockets have shifted from a research goal to the default cost lever in commercial spaceflight. SpaceX's Falcon 9 routinely lands and re-flies its first stage — individual boosters have now flown twenty-plus missions...\"}"
}
]
}
},
{
"_comment": "Nested: critique sub-agent for reusable rockets.",
"match": {
"userMessage": "Critique the reusable rockets summary draft",
"turnIndex": 0,
"context": "langgraph-python"
},
"response": {
"content": "1. 'Default cost lever' is jargon that pre-supposes the reader already accepts the framing — open instead with the concrete result (Falcon 9 reflight count) and let the framing emerge.\n2. The Starship sentence is hedged ('in active flight testing toward full reuse') in a way that obscures the actual milestone reached as of writing — name the latest test outcome or drop the clause.\n3. The closing economic claim asserts pricing is 'well below expendable competitors' without a reference price; one number (e.g., $/kg-to-LEO) would land the point much harder than the qualitative claim alone."
}
},
{
"match": {
"userMessage": "Summarize the current state of reusable rockets",
"turnIndex": 3,
"context": "langgraph-python"
},
"response": {
"content": "Here is the polished summary on reusable rockets, after research → drafting → critique:\n\nReusable rockets are now mainstream in commercial spaceflight. SpaceX's Falcon 9 lands and re-flies its first stage routinely — individual boosters have flown twenty-plus missions — and Falcon Heavy reuses both side boosters. Rocket Lab has demonstrated mid-air Electron booster catch but reuse is not yet routine, while SpaceX Starship is in active orbital flight testing with full two-stage reuse as the target. The economic impact is already priced in: Falcon 9 sits well below expendable competitors per kilogram to low Earth orbit because amortizing a recovered first stage across many missions removes the largest single cost from the launch."
}
}
]
}