none approach drops tool_calls on streaming requests
Repo: https://github.com/algorithmicsuperintelligence/optillm
Version tested: 0.3.15
File: optillm/server.py
Summary
When OptiLLM runs with approach none (direct pass-through — no optimization prefix on the model name), streaming requests that include tools do not return tool_calls to the client.
The proxy buffers the upstream response, extracts only assistant text, and synthesizes a single SSE chunk with finish_reason: "stop". OpenAI-compatible agent clients (Zed, Cursor, custom tool loops) see the announcement text but never receive tool metadata, so tool execution never starts.
none_approach() itself is implemented correctly as a transparent proxy — the bug is in how /v1/chat/completions handles the none branch when stream: true.
Steps to reproduce
- Start OptiLLM pointing at any OpenAI-compatible upstream that supports tool calling:
optillm --base-url http://127.0.0.1:4000/v1 --port 8000
- Send a streaming chat completion with tools (model has no optimization prefix → routes to none):
curl -s -N http://127.0.0.1:8000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Run echo hello with the shell tool"}],
"tools": [{
"type": "function",
"function": {
"name": "shell",
"description": "Run a shell command",
"parameters": {
"type": "object",
"properties": {"command": {"type": "string"}},
"required": ["command"]
}
}
}],
"tool_choice": "auto",
"stream": true
}'
Actual behavior
OptiLLM returns a single synthesized chunk:
{"choices":[{"delta":{"role":"assistant","content":"Sure! Let me run that..."},"finish_reason":"stop"}]}
Then [DONE]. No tool_calls in the stream.
Expected behavior
OptiLLM should forward upstream SSE chunks verbatim, including tool_calls deltas and finish_reason: "tool_calls", e.g.:
{"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...","function":{"name":"shell","arguments":""}}]}}]}
Hitting the same upstream directly (bypassing OptiLLM) with stream: true produces the expected tool-call chunks.
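To make the difference concrete, here is a minimal sketch of what a tool-calling client does with the stream: it concatenates tool_calls deltas by index and reads the final finish_reason. The helper name collect_tool_calls is hypothetical and not part of OptiLLM or any SDK; it only assumes OpenAI-style `data:` lines.

```python
import json

def collect_tool_calls(sse_lines):
    """Reassemble streamed tool calls from OpenAI-style SSE `data:` lines.

    Hypothetical helper for illustration; not part of OptiLLM.
    Returns (list of tool calls ordered by index, last finish_reason seen).
    """
    calls = {}            # tool call index -> accumulated fields
    finish_reason = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                entry = calls.setdefault(
                    tc["index"], {"id": None, "name": "", "arguments": ""}
                )
                if tc.get("id"):
                    entry["id"] = tc["id"]
                fn = tc.get("function") or {}
                # name and arguments arrive as string fragments to concatenate
                entry["name"] += fn.get("name") or ""
                entry["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return [calls[i] for i in sorted(calls)], finish_reason
```

Fed the synthesized chunk from "Actual behavior", this returns an empty list and "stop", which is why the agent loop never starts a tool.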
Root cause
In proxy(), the none branch currently does:
execute_single_approach() # reconstructs messages from parse_conversation() text
→ none_approach(stream=False) # stream stripped from kwargs
→ extract_contents(result) # text from choices[0].message.content only
→ generate_streaming_response() # fake SSE with finish_reason: "stop"
Problems:
- generate_streaming_response() only emits text; it has no concept of tool_calls.
- Upstream is always called non-streaming for the none path (kwargs.pop('stream', None) in execute_single_approach).
- Messages are reconstructed from parse_conversation(), which flattens user/assistant text and drops tool role messages and prior tool_calls, breaking multi-turn agent loops.
Secondary issue (non-streaming)
Some providers return assistant text in choices[0] and tool_calls in choices[1]. OptiLLM returns the raw response, so clients that only read choices[0] miss tools. (Similar to goose#6369.)
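One possible shape for the merge is sketched below. It works on a plain dict form of the response, and the function name merge_split_tool_choices and its policy are assumptions made for illustration, not existing OptiLLM code. In particular, it drops any later choice that carries tool_calls, along with whatever content that choice had.

```python
import copy

def merge_split_tool_choices(response):
    """Return a copy of `response` with tool_calls from later choices
    folded into choices[0]. Hypothetical helper, not OptiLLM code."""
    merged = copy.deepcopy(response)  # never mutate the caller's response
    choices = merged.get("choices") or []
    if len(choices) < 2:
        return merged
    first = choices[0]
    message = first.setdefault("message", {})
    if message.get("tool_calls"):
        # choices[0] already has tool calls: treat this as a normal
        # multi-choice response and leave it alone.
        return merged
    moved = []
    kept = [first]
    for choice in choices[1:]:
        extra = (choice.get("message") or {}).get("tool_calls")
        if extra:
            moved.extend(extra)   # fold into choices[0]
        else:
            kept.append(choice)   # unrelated choice, keep as-is
    if moved:
        message["tool_calls"] = moved
        first["finish_reason"] = "tool_calls"
        merged["choices"] = kept
    return merged
```

Whether to rewrite finish_reason, and what to do with content in the merged-away choice, are judgment calls a real patch would need to settle.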
Impact
Any OpenAI-compatible client using OptiLLM as a proxy with:
- stream: true
- tools / tool_choice
- approach none (unprefixed model name, or explicit none- prefix)
…will fail to execute tools. This affects IDE agents, MCP integrations, and custom agent frameworks.
Optimization approaches (rto-, cot_reflection-, etc.) are unaffected — they don't claim to be transparent proxies.
Proposed fix
For operation == 'SINGLE' and approaches[0] == 'none' only:
- stream: true → call none_approach upstream with stream=True and yield SSE chunks verbatim (generate_stream_passthrough).
- stream: false → call none_approach with the original request messages (not reconstructed), then optionally merge split tool choices into choices[0].
- Leave generate_streaming_response() unchanged for optimization approaches that produce text output.
A patch is attached in none-passthrough-tool-calls.patch (~70 lines, optillm/server.py only).
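For readers without the patch at hand, the streaming half could look roughly like the sketch below. It is not the attached patch. The name generate_stream_passthrough comes from the proposal above, but the signature is an assumption: the function takes any iterable of upstream chunks, either plain dicts or OpenAI SDK chunk objects (pydantic models that expose model_dump()).

```python
import json

def generate_stream_passthrough(upstream_chunks):
    """Re-emit each upstream chunk as its own SSE event, keeping every
    field (including tool_calls deltas and finish_reason), then [DONE].

    Sketch only: the signature is assumed, not taken from the patch.
    """
    for chunk in upstream_chunks:
        # SDK chunk objects are converted to dicts; plain dicts pass through.
        if hasattr(chunk, "model_dump"):
            data = chunk.model_dump(exclude_unset=True)
        else:
            data = chunk
        yield f"data: {json.dumps(data)}\n\n"
    yield "data: [DONE]\n\n"
```

Note that this re-serializes each chunk rather than copying upstream bytes, so it preserves fields but not exact formatting. The generator would then be returned as a text/event-stream response from the /v1/chat/completions handler.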
Workaround (today)
Bypass OptiLLM for tool-heavy agent sessions and point clients directly at the upstream OpenAI-compatible endpoint.