[backend] refactor: route web chat through GatewayRunner pipeline for parity with Telegram/Slack/Discord #188

## Problem

The Web UI chat currently uses a **separate, thin code path** (`ApiServerAdapter._handle_chat_completions()` → `_create_agent()` → `agent.run_conversation()`) that bypasses the full gateway pipeline used by all messaging platforms (Telegram, Slack, Discord).

This means the Web UI is missing critical infrastructure that messaging platforms get for free:

| Capability | Telegram | Slack/Discord | **Web UI** |
|---|---|---|---|
| Persistent sessions (auto-reset, expiry) | ✅ | ✅ | ⚠️ opt-in only |
| Session context injection | ✅ rich | ✅ rich | ❌ none |
| Session hygiene (auto-compress oversized transcripts) | ✅ | ✅ | ❌ |
| Plugin hooks (`pre_gateway_dispatch`) | ✅ | ✅ | ❌ |
| Auto-skills (topic/channel bindings) | ✅ | ✅ | ❌ |
| Command handling (`/stop`, `/new`, `/steer`, `/queue`) | ✅ | ✅ | ❌ |
| Interrupt support | ✅ | ✅ | ❌ |
| Agent caching (LRU + idle TTL warm starts) | ✅ | ✅ | ❌ fresh per request |
| Provider routing (allow/ignore/order/sort) | ✅ | ✅ | ⚠️ basic |
| Typing indicators | ✅ | ✅ | ❌ |

## Architecture

### Current: Two Separate Paths

```
Web UI:
  POST /v1/chat/completions
    → ApiServerAdapter._handle_chat_completions()
      → _create_agent() [separate, thin]
      → _run_agent()
        → agent.run_conversation()
      ← JSON/SSE response

Telegram/Slack/Discord:
  Platform SDK event
    → PlatformAdapter._handle_*_message()
      → _build_message_event()
      → self.handle_message(event)
        → BasePlatformAdapter.handle_message()
          → _process_message_background()
            → GatewayRunner._handle_message()
              ┌─────────────────────────────────────────┐
              │ FULL GATEWAY PIPELINE                   │
              │ • Auth / pairing                        │
              │ • Session persistence                   │
              │ • build_session_context()               │
              │ • Plugin hooks                          │
              │ • Auto-skills                           │
              │ • Session hygiene (auto-compression)    │
              │ • Vision enrichment                     │
              │ • _run_agent() with agent caching       │
              │ • run_conversation()                    │
              └─────────────────────────────────────────┘
            ← agent_result
            → Response delivery (platform-specific)
```

### Proposed: Shared Pipeline, Forked Delivery

```
All paths:
  → GatewayRunner._handle_message()
    → [full gateway pipeline]
    ← agent_result

Delivery (platform-specific):
  Telegram → MarkdownV2, chunking, typing, media extraction
  Slack    → Block Kit, threads, reactions
  Web UI   → SSE stream, JSON, no formatting/chunking
```

## Implementation Plan

### Phase 1: Route Web Chat Through Gateway Pipeline

**File: `hermes-agent/gateway/platforms/api_server.py`**

The `/api/sessions/{id}/chat/stream` endpoint should:

1. Build a `MessageEvent` from the web chat request (mapping session headers to `SessionSource`)
2. Route it through `self.handle_message(event)` like other platform adapters
3. Let the gateway pipeline handle session management, context injection, hygiene, etc.
4. Capture the agent result and stream it back as SSE

```python
# Current (thin path):
agent = self._create_agent(ephemeral_system_prompt=system_prompt, session_id=session_id)
result = await self._run_agent(user_message=user_message, ...)

# Proposed (gateway path):
# _build_message_event() would be new on the API server, modeled on the helper in telegram.py
event = self._build_message_event(session_id, user_message, system_prompt, ...)
await self.handle_message(event)  # → full gateway pipeline (GatewayRunner._handle_message)
# → SSE events from streaming callbacks
```
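Step 4 ("capture the agent result") needs care, because `BasePlatformAdapter.handle_message()` hands work to `_process_message_background()` and may return before the agent finishes. One way to bridge that is a per-request future that the web delivery tail resolves. The sketch below illustrates this pattern under stated assumptions. `MessageEvent` here is a simplified stand-in, not the gateway's real class. `PendingResults`, `chat_via_gateway`, and `fake_handle_message` are hypothetical names. It also assumes one in-flight request per session.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class MessageEvent:
    # Hypothetical stand-in for the gateway's MessageEvent; real fields differ.
    session_id: str
    text: str
    platform: str = "switchui"


class PendingResults:
    """Hypothetical registry: session_id -> Future resolved on reply delivery.

    Assumes at most one in-flight request per session.
    """

    def __init__(self) -> None:
        self._futures: Dict[str, "asyncio.Future[str]"] = {}

    def expect(self, session_id: str) -> "asyncio.Future[str]":
        fut = asyncio.get_running_loop().create_future()
        self._futures[session_id] = fut
        return fut

    def resolve(self, session_id: str, reply: str) -> None:
        # Called from the web delivery tail once the pipeline produces a reply.
        fut = self._futures.pop(session_id, None)
        if fut is not None and not fut.done():
            fut.set_result(reply)

    def discard(self, session_id: str, fut: "asyncio.Future[str]") -> None:
        # Only drop the entry if it still belongs to this request.
        if self._futures.get(session_id) is fut:
            del self._futures[session_id]


async def chat_via_gateway(
    pending: PendingResults,
    handle_message: Callable[[MessageEvent], Awaitable[None]],
    session_id: str,
    user_message: str,
    timeout: float = 300.0,
) -> str:
    # Register before dispatching so a fast reply cannot slip past us.
    fut = pending.expect(session_id)
    try:
        # handle_message() may only schedule background work and return,
        # so the reply is awaited on the Future, not on this call.
        await handle_message(MessageEvent(session_id=session_id, text=user_message))
        return await asyncio.wait_for(fut, timeout)
    finally:
        pending.discard(session_id, fut)


async def demo() -> str:
    pending = PendingResults()
    background = set()

    async def fake_handle_message(event: MessageEvent) -> None:
        # Simulates the gateway: the real work runs in a background task.
        async def work() -> None:
            await asyncio.sleep(0)
            pending.resolve(event.session_id, f"echo: {event.text}")

        task = asyncio.create_task(work())
        background.add(task)  # keep a reference until the task finishes
        task.add_done_callback(background.discard)

    return await chat_via_gateway(pending, fake_handle_message, "s1", "hi")


print(asyncio.run(demo()))  # echo: hi
```

Registering the future before calling `handle_message()` avoids a race where the reply is delivered before anyone is waiting for it. The `finally` block cleans up after timeouts without removing a newer request's entry.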

### Phase 2: SessionSource Mapping

Map web UI headers to `SessionSource`:

- `X-Hermes-Session-Id` → `session_id`
- `X-Hermes-Session-Key` → stable memory scope key
- Platform = `"switchui"` (new value) or reuse `"api_server"`

### Phase 3: Platform Toolsets Config

Add `platform_toolsets.switchui` to config.yaml so the web chat has its own toolset configuration, matching the pattern used by `platform_toolsets.telegram`.
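A minimal sketch of how a per-platform toolset lookup could resolve once `platform_toolsets.switchui` exists. The toolset names are hypothetical placeholders. Falling back to a caller-supplied default when the platform has no entry is an assumption about the desired behaviour, not a description of the current loader.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Python equivalent of a config.yaml shaped like (toolset names hypothetical):
#
#   platform_toolsets:
#     telegram: [web_search, files]
#     switchui: [web_search, files, code_exec]
CONFIG: Dict[str, Any] = {
    "platform_toolsets": {
        "telegram": ["web_search", "files"],
        "switchui": ["web_search", "files", "code_exec"],
    }
}


def resolve_toolsets(config: Mapping[str, Any], platform: str, default: Sequence[str]) -> List[str]:
    # A missing section or missing platform key falls back to the default (an assumption).
    per_platform = config.get("platform_toolsets") or {}
    toolsets = per_platform.get(platform)
    return list(default if toolsets is None else toolsets)
```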

### Phase 4: Streaming Callbacks Through Gateway Path

Ensure `stream_delta_callback`, `tool_start_callback`, and `tool_complete_callback` are wired into the agent when dispatched through `handle_message()`, so SSE events still flow to the Web UI.
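One way to keep SSE flowing through the gateway path is to bind the three callbacks to an `asyncio.Queue` that the HTTP handler drains into Server-Sent Events frames. The sketch below assumes the callback signatures (`text`, `name`/`args`, `name`/`result`). It also assumes the agent may fire them from a worker thread, which is why it uses `loop.call_soon_threadsafe`. `SSEBridge` is a hypothetical name. The demo needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Dict, List

_DONE = object()  # sentinel marking the end of the stream


class SSEBridge:
    """Hypothetical adapter from agent callbacks to SSE frames.

    Must be constructed inside a running event loop.
    """

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._loop = asyncio.get_running_loop()

    def _put(self, item: Any) -> None:
        # Safe from any thread: the put is scheduled onto the loop's thread.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, item)

    # Assumed callback signatures; the real agent's may differ.
    def stream_delta_callback(self, text: str) -> None:
        self._put({"event": "delta", "data": {"text": text}})

    def tool_start_callback(self, name: str, args: Dict[str, Any]) -> None:
        self._put({"event": "tool_start", "data": {"name": name, "args": args}})

    def tool_complete_callback(self, name: str, result: str) -> None:
        self._put({"event": "tool_complete", "data": {"name": name, "result": result}})

    def close(self) -> None:
        self._put(_DONE)

    async def frames(self) -> AsyncIterator[str]:
        # Yield one SSE frame per queued event until close() is called.
        while True:
            item = await self._queue.get()
            if item is _DONE:
                return
            yield f"event: {item['event']}\ndata: {json.dumps(item['data'])}\n\n"


async def demo() -> List[str]:
    bridge = SSEBridge()

    def agent_worker() -> None:
        # Stands in for run_conversation() firing callbacks off the event loop.
        bridge.tool_start_callback("search", {"q": "x"})
        bridge.stream_delta_callback("Hel")
        bridge.stream_delta_callback("lo")
        bridge.close()

    await asyncio.to_thread(agent_worker)
    return [frame async for frame in bridge.frames()]
```

In the endpoint, each yielded frame would be written to the streaming HTTP response. The same bridge instance is what gets passed into the agent when it is created on the gateway path.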

### Phase 5: Override Delivery Tail

The Web UI delivery differs from messaging platforms:

- No MarkdownV2 conversion (Web UI renders its own markdown)
- No message chunking
- No `MEDIA:<path>` extraction (handled differently)
- No typing indicators (Web UI has its own loading state)
- JSON/SSE response format

This can be achieved either by:
- Having the web endpoint provide its own `_message_handler` callback, or
- Detecting `platform="switchui"` and skipping platform-specific formatting
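The second option amounts to forking only the final delivery step by platform. The sketch below shows that dispatch. The registry, function names, and the reduction of messaging delivery to plain chunking are all illustrative assumptions. The real Telegram adapter also performs MarkdownV2 conversion and `MEDIA:<path>` extraction, which are omitted here.

```python
from typing import Callable, Dict, List

DeliveryFn = Callable[[str], List[str]]


def web_delivery(text: str) -> List[str]:
    # Web UI renders its own markdown and has no message size cap:
    # hand the reply back untouched as a single payload for JSON/SSE.
    return [text]


def chunked_delivery(limit: int) -> DeliveryFn:
    # Messaging-style delivery reduced to fixed-width chunking; a real adapter
    # would also convert markdown and extract media before sending.
    def deliver(text: str) -> List[str]:
        return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]
    return deliver


DELIVERY: Dict[str, DeliveryFn] = {
    "switchui": web_delivery,
    "telegram": chunked_delivery(4096),  # Telegram caps a text message at 4096 characters
}


def deliver(platform: str, text: str) -> List[str]:
    # The shared pipeline produced `text`; only this last step forks per platform.
    return DELIVERY[platform](text)
```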

## Expected Benefits

After implementation, Web UI chat will immediately gain:

1. **Warm agent starts** — LRU cache with idle TTL (no cold start per request)
2. **Session context** — "You are on Switch UI. User: Rohit..." injected into the system prompt
3. **Auto-compression** — sessions that grow too large are compressed once they reach the 85% threshold
4. **Plugin hooks** — full `pre_gateway_dispatch` plugin chain
5. **Interrupt support** — `/stop` and agent interruption work
6. **Provider routing** — per-session model/provider routing applies
7. **Auto-skills** — topic/channel skill bindings work if configured
8. **Session hygiene** — consistent with messaging platforms
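For readers unfamiliar with the "LRU cache with idle TTL" behind warm starts, here is a generic sketch of the data structure. It is not the gateway's actual implementation. `AgentCache`, `get_or_create`, and the injectable `clock` are hypothetical. Keys would plausibly be session IDs, and values would be agent instances.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Tuple, TypeVar

T = TypeVar("T")


class AgentCache(Generic[T]):
    """Generic LRU cache whose entries also expire after an idle period."""

    def __init__(self, max_size: int, idle_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Maps key -> (value, last_used); insertion order doubles as recency order.
        self._items: "OrderedDict[Hashable, Tuple[T, float]]" = OrderedDict()
        self.max_size = max_size
        self.idle_ttl = idle_ttl
        self._clock = clock

    def get_or_create(self, key: Hashable, factory: Callable[[], T]) -> T:
        now = self._clock()
        self._evict_idle(now)
        if key in self._items:
            value, _ = self._items.pop(key)   # warm start: reuse the existing agent
        else:
            value = factory()                 # cold start: build a new agent
        self._items[key] = (value, now)       # most recently used goes last
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)   # drop the least recently used entry
        return value

    def _evict_idle(self, now: float) -> None:
        expired = [k for k, (_, last) in self._items.items() if now - last > self.idle_ttl]
        for k in expired:
            del self._items[k]

    def __len__(self) -> int:
        return len(self._items)
```

The size bound caps memory under many concurrent sessions. The idle TTL releases agents for sessions the user has abandoned, even when the cache is not full.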

## Acceptance Criteria

- [ ] Web chat routes through `GatewayRunner._handle_message()`
- [ ] Session context is injected (platform, user, workspace)
- [ ] Agent caching works (warm starts, not fresh per request)
- [ ] Session hygiene auto-compresses oversized transcripts
- [ ] SSE streaming still works for tool events and token deltas
- [ ] `/stop` and interrupt work from the Web UI
- [ ] No regression in existing Web UI functionality
- [ ] `/v1/chat/completions` (OpenAI-compatible) continues to work as-is for external clients

## Notes

- The `/v1/chat/completions` endpoint (OpenAI-compatible API) should remain unchanged — it serves external clients that expect the OpenAI protocol
- The `/api/sessions/{id}/chat/stream` endpoint (used by Switch UI) is the one to refactor
- The gateway pipeline was designed to be platform-agnostic above the delivery layer; the API server just never connected to it
- This is primarily a gateway-side change; minimal changes needed in the Switch UI frontend

## Related Code

- **Thin path (to be replaced):** `hermes-agent/gateway/platforms/api_server.py` → `_handle_chat_completions()`, `_create_agent()`, `_run_agent()`
- **Full pipeline (to be reused):** `hermes-agent/gateway/run.py` → `_handle_message()`, `_handle_message_with_agent()`, `_run_agent()`
- **Base adapter (routing):** `hermes-agent/gateway/platforms/base.py` → `handle_message()`, `_process_message_background()`
- **Telegram adapter (reference):** `hermes-agent/gateway/platforms/telegram.py` → `_handle_text_message()`, `_build_message_event()`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[backend] refactor: route web chat through GatewayRunner pipeline for parity with Telegram/Slack/Discord #188

Problem

Architecture

Current: Two Separate Paths

Proposed: Shared Pipeline, Forked Delivery

Implementation Plan

Phase 1: Route Web Chat Through Gateway Pipeline

Phase 2: SessionSource Mapping

Phase 3: Platform Toolsets Config

Phase 4: Streaming Callbacks Through Gateway Path

Phase 5: Override Delivery Tail

Expected Benefits

Acceptance Criteria

Notes

Related Code

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Capability	Telegram	Slack/Discord	Web UI
Persistent sessions (auto-reset, expiry)	✅	✅	⚠️ opt-in only
Session context injection	✅ rich	✅ rich	❌ none
Session hygiene (auto-compress oversized transcripts)	✅	✅	❌
Plugin hooks (`pre_gateway_dispatch`)	✅	✅	❌
Auto-skills (topic/channel bindings)	✅	✅	❌
Command handling (`/stop`, `/new`, `/steer`, `/queue`)	✅	✅	❌
Interrupt support	✅	✅	❌
Agent caching (LRU + idle TTL warm starts)	✅	✅	❌ fresh per request
Provider routing (allow/ignore/order/sort)	✅	✅	⚠️ basic
Typing indicators	✅	✅	❌

[backend] refactor: route web chat through GatewayRunner pipeline for parity with Telegram/Slack/Discord #188

Description

Problem

Architecture

Current: Two Separate Paths

Proposed: Shared Pipeline, Forked Delivery

Implementation Plan

Phase 1: Route Web Chat Through Gateway Pipeline

Phase 2: SessionSource Mapping

Phase 3: Platform Toolsets Config

Phase 4: Streaming Callbacks Through Gateway Path

Phase 5: Override Delivery Tail

Expected Benefits

Acceptance Criteria

Notes

Related Code

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions