# `none` approach drops `tool_calls` on streaming requests

**Repo:** https://github.com/algorithmicsuperintelligence/optillm  
**Version tested:** 0.3.15  
**File:** `optillm/server.py`  
**Issue:** #312

## Summary

When OptiLLM runs with approach `none` (direct pass-through — no optimization prefix on the model name), **streaming requests that include `tools` do not return `tool_calls` to the client**.

The proxy buffers the upstream response, extracts only the assistant text, and synthesizes a single SSE chunk with `finish_reason: "stop"`. OpenAI-compatible agent clients (Zed, Cursor, custom tool loops) see the assistant's introductory text (for example, "Sure! Let me run that...") but never receive the `tool_calls` payload, so tool execution never starts.

`none_approach()` itself is implemented correctly as a transparent proxy — the bug is in how `/v1/chat/completions` handles the `none` branch when `stream: true`.

## Steps to reproduce

1. Start OptiLLM pointing at any OpenAI-compatible upstream that supports tool calling:

   ```bash
   optillm --base-url http://127.0.0.1:4000/v1 --port 8000
   ```

2. Send a **streaming** chat completion with tools (model has no optimization prefix → routes to `none`):

   ```bash
   curl -s -N http://127.0.0.1:8000/v1/chat/completions \
     -H "Authorization: Bearer $OPENAI_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "gpt-4o",
       "messages": [{"role": "user", "content": "Run echo hello with the shell tool"}],
       "tools": [{
         "type": "function",
         "function": {
           "name": "shell",
           "description": "Run a shell command",
           "parameters": {
             "type": "object",
             "properties": {"command": {"type": "string"}},
             "required": ["command"]
           }
         }
       }],
       "tool_choice": "auto",
       "stream": true
     }'
   ```

## Actual behavior

OptiLLM returns a single synthesized chunk:

```json
{"choices":[{"delta":{"role":"assistant","content":"Sure! Let me run that..."},"finish_reason":"stop"}]}
```

Then `[DONE]`. **No `tool_calls` in the stream.**

## Expected behavior

OptiLLM should forward upstream SSE chunks verbatim, including `tool_calls` deltas and `finish_reason: "tool_calls"`, e.g.:

```json
{"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...","function":{"name":"shell","arguments":""}}]}}]}
```

Hitting the same upstream **directly** (bypassing OptiLLM) with `stream: true` produces the expected tool-call chunks.
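
To compare the two behaviors mechanically, the stream can be checked with a small script. The following is a minimal sketch, not part of OptiLLM: `collect_tool_calls` is a hypothetical helper that assumes OpenAI-style SSE lines (`data: {...}`, terminated by `data: [DONE]`) and accumulates any `tool_calls` deltas it finds.

```python
import json


def collect_tool_calls(sse_lines):
    """Accumulate tool_calls deltas from OpenAI-style SSE lines.

    Hypothetical helper for reproducing this issue.
    Returns (calls_by_index, last_finish_reason).
    """
    calls = {}
    finish_reason = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    tc.get("index", 0), {"id": None, "name": "", "arguments": ""}
                )
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Argument JSON arrives in fragments; concatenate them in order.
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return calls, finish_reason
```

Fed the output of the `curl -N` command above (for example via `collect_tool_calls(sys.stdin)`), an empty `calls` dict with a `"stop"` finish reason matches the behavior reported here, while a populated dict with `"tool_calls"` matches what the upstream returns when queried directly.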

## Root cause

In `proxy()`, the `none` branch currently does:

```
execute_single_approach()     # reconstructs messages from parse_conversation() text
  → none_approach(stream=False)  # stream stripped from kwargs
  → extract_contents(result)       # text from choices[0].message.content only
  → generate_streaming_response()  # fake SSE with finish_reason: "stop"
```

Problems:

1. **`generate_streaming_response()` only emits text** — it has no concept of `tool_calls`.
2. **Upstream is always called non-streaming** for the `none` path (`kwargs.pop('stream', None)` in `execute_single_approach`).
3. **Messages are reconstructed** from `parse_conversation()`, which flattens user/assistant text and drops `tool` role messages and prior `tool_calls` — breaking multi-turn agent loops.
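
The effect of the first problem can be illustrated in isolation. The snippet below is a hypothetical stand-in, not OptiLLM's actual code: `synthesize_text_chunk` only mimics the text-only path described above (read `choices[0].message.content`, emit one chunk with `finish_reason: "stop"`) to show where the tool call is lost.

```python
import json


def synthesize_text_chunk(response: dict) -> str:
    """Hypothetical stand-in for the text-only streaming path (not OptiLLM code)."""
    message = response["choices"][0]["message"]
    text = message.get("content") or ""  # message["tool_calls"] is never read
    chunk = {
        "choices": [
            {
                "delta": {"role": "assistant", "content": text},
                "finish_reason": "stop",  # hard-coded, regardless of upstream
            }
        ]
    }
    return f"data: {json.dumps(chunk)}\n\n"


# A non-streaming upstream response that does contain a tool call.
upstream_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Sure! Let me run that...",
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "shell",
                            "arguments": '{"command": "echo hello"}',
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}

event = synthesize_text_chunk(upstream_response)
print(event)
```

The upstream answer carries both text and a tool call, but only the text survives the round trip, and the client is told the turn ended normally.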

## Secondary issue (non-streaming)

Some providers return assistant text in `choices[0]` and `tool_calls` in `choices[1]`. OptiLLM returns the raw response unchanged, so clients that only read `choices[0]` miss the tool calls. (Similar to [goose#6369](https://github.com/block/goose/commit/9aee76391c298a17785aae3c21835b4d66e8d5f9).)
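
One possible normalization is sketched below. `merge_split_tool_calls` is a hypothetical helper, not taken from the attached patch, and it assumes the request used `n=1`, so that any extra choices are a provider quirk rather than genuine alternative completions.

```python
import copy


def merge_split_tool_calls(response: dict) -> dict:
    """Fold tool_calls found in later choices into choices[0].

    Hypothetical helper. Assumes n=1, i.e. extra choices are a provider
    artifact and not alternative completions. Returns a modified copy.
    """
    merged = copy.deepcopy(response)
    choices = merged.get("choices") or []
    if len(choices) < 2:
        return merged  # nothing to merge

    first = choices[0].get("message") or {}
    choices[0]["message"] = first
    collected = list(first.get("tool_calls") or [])
    for extra in choices[1:]:
        collected.extend((extra.get("message") or {}).get("tool_calls") or [])

    if collected:
        first["tool_calls"] = collected
        # Signal to the client that it should execute tools next.
        choices[0]["finish_reason"] = "tool_calls"
    return merged
```

The later choices are left in place so that nothing the upstream sent is discarded; a stricter variant could drop them once their `tool_calls` have been merged.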

## Impact

Any OpenAI-compatible client using OptiLLM as a proxy with:

- `stream: true`
- `tools` / `tool_choice`
- approach `none` (unprefixed model name, or explicit `none-` prefix)

…will fail to execute tools. This affects IDE agents, MCP integrations, and custom agent frameworks.

Optimization approaches (`rto-`, `cot_reflection-`, etc.) are unaffected — they don't claim to be transparent proxies.

## Proposed fix

For `operation == 'SINGLE' and approaches[0] == 'none'` only:

1. **`stream: true`** → call `none_approach` upstream with `stream=True` and yield SSE chunks verbatim (`generate_stream_passthrough`).
2. **`stream: false`** → call `none_approach` with the **original request `messages`** (not reconstructed), then optionally merge `tool_calls` that the provider split into a separate choice back into `choices[0]`.
3. Leave `generate_streaming_response()` unchanged for optimization approaches that produce text output.
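
The streaming half of this proposal could look roughly like the sketch below. Only the name `generate_stream_passthrough` comes from this issue; the body is an assumption about its shape, not the contents of the attached patch. It assumes the upstream iterator yields either plain dicts or pydantic models exposing `model_dump_json()`, as the chunk objects of the OpenAI Python SDK do.

```python
import json
from typing import Any, Iterable, Iterator


def generate_stream_passthrough(upstream_chunks: Iterable[Any]) -> Iterator[str]:
    """Re-emit each upstream chunk as one SSE event without rewriting its fields.

    Sketch only: the real patch may differ.
    """
    for chunk in upstream_chunks:
        if hasattr(chunk, "model_dump_json"):
            # pydantic model, e.g. a chunk object from the OpenAI Python SDK
            payload = chunk.model_dump_json()
        else:
            # plain dict, e.g. already-decoded JSON
            payload = json.dumps(chunk)
        yield f"data: {payload}\n\n"
    # OpenAI-compatible streams end with a [DONE] sentinel.
    yield "data: [DONE]\n\n"
```

Because nothing is extracted or rebuilt, `tool_calls` deltas and `finish_reason: "tool_calls"` reach the client as the upstream sent them. Serializing a pydantic model may add explicit `null` fields, so the output is field-for-field equivalent rather than byte-identical; forwarding the raw upstream bytes would be the strictest form of pass-through.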

A patch is attached in `none-passthrough-tool-calls.patch` (~70 lines, `optillm/server.py` only).

## Workaround (today)

Bypass OptiLLM for tool-heavy agent sessions and point clients directly at the upstream OpenAI-compatible endpoint.

