Problem
The Web UI chat currently uses a separate, thin code path (ApiServerAdapter._handle_chat_completions() → _create_agent() → agent.run_conversation()) that bypasses the full gateway pipeline used by all messaging platforms (Telegram, Slack, Discord).
This means the Web UI is missing critical infrastructure that messaging platforms get for free:
| Capability | Telegram | Slack/Discord | Web UI |
| --- | --- | --- | --- |
| Persistent sessions (auto-reset, expiry) | ✅ | ✅ | ⚠️ opt-in only |
| Session context injection | ✅ rich | ✅ rich | ❌ none |
| Session hygiene (auto-compress oversized transcripts) | ✅ | ✅ | ❌ |
| Plugin hooks (pre_gateway_dispatch) | ✅ | ✅ | ❌ |
| Auto-skills (topic/channel bindings) | ✅ | ✅ | ❌ |
| Command handling (/stop, /new, /steer, /queue) | ✅ | ✅ | ❌ |
| Interrupt support | ✅ | ✅ | ❌ |
| Agent caching (LRU + idle TTL warm starts) | ✅ | ✅ | ❌ fresh per request |
| Provider routing (allow/ignore/order/sort) | ✅ | ✅ | ⚠️ basic |
| Typing indicators | ✅ | ✅ | ❌ |
Architecture
Current: Two Separate Paths
Web UI:
  POST /v1/chat/completions
    → ApiServerAdapter._handle_chat_completions()
      → _create_agent() [separate, thin]
      → _run_agent()
        → agent.run_conversation()
    ← JSON/SSE response

Telegram/Slack/Discord:
  Platform SDK event
    → PlatformAdapter._handle_*_message()
      → _build_message_event()
      → self.handle_message(event)
        → BasePlatformAdapter.handle_message()
          → _process_message_background()
            → GatewayRunner._handle_message()
              ┌─────────────────────────────────────────┐
              │          FULL GATEWAY PIPELINE          │
              │  • Auth / pairing                       │
              │  • Session persistence                  │
              │  • build_session_context()              │
              │  • Plugin hooks                         │
              │  • Auto-skills                          │
              │  • Session hygiene (auto-compression)   │
              │  • Vision enrichment                    │
              │  • _run_agent() with agent caching      │
              │  • run_conversation()                   │
              └─────────────────────────────────────────┘
            ← agent_result
    → Response delivery (platform-specific)
Proposed: Shared Pipeline, Forked Delivery
All paths:
  → GatewayRunner._handle_message()
    → [full gateway pipeline]
  ← agent_result

Delivery (platform-specific):
  Telegram → MarkdownV2, chunking, typing, media extraction
  Slack    → Block Kit, threads, reactions
  Web UI   → SSE stream, JSON, no formatting/chunking
Implementation Plan
Phase 1: Route Web Chat Through Gateway Pipeline
File: hermes-agent/gateway/platforms/api_server.py
The /api/sessions/{id}/chat/stream endpoint should:
- Build a MessageEvent from the web chat request (mapping session headers to SessionSource)
- Route it through self.handle_message(event) like other platform adapters
- Let the gateway pipeline handle session management, context injection, hygiene, etc.
- Capture the agent result and stream it back as SSE
# Current (thin path):
agent = self._create_agent(ephemeral_system_prompt=system_prompt, session_id=session_id)
result = await self._run_agent(user_message=user_message, ...)
# Proposed (gateway path):
event = self._build_message_event(session_id, user_message, system_prompt, ...)
await self.handle_message(event) # → full gateway pipeline
# → SSE events from streaming callbacks
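As a rough illustration of the first and last steps above (building an event, then encoding results as SSE), here is a minimal sketch. `WebChatEvent`, `build_message_event`, and the request body keys are hypothetical stand-ins and not the gateway's actual `MessageEvent` API; only the SSE frame layout (an `event:` line, a `data:` line, then a blank line) follows the standard wire format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WebChatEvent:
    """Hypothetical stand-in for the gateway's MessageEvent."""
    session_id: str
    text: str
    platform: str = "switchui"
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_message_event(session_id: str, body: Dict[str, Any]) -> WebChatEvent:
    """Translate a web chat request body into a pipeline event.

    The "message" and "system_prompt" keys are assumed names for this sketch.
    """
    return WebChatEvent(
        session_id=session_id,
        text=body["message"],
        metadata={"system_prompt": body.get("system_prompt")},
    )


def sse_frame(event: str, data: Dict[str, Any]) -> str:
    """Encode one Server-Sent Events frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

In the real adapter the event would be handed to `self.handle_message(event)`, and each streaming callback would emit one frame like the above.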
Phase 2: SessionSource Mapping
Map web UI headers to SessionSource:
- X-Hermes-Session-Id → session_id
- X-Hermes-Session-Key → stable memory scope key
- Platform = "switchui" (new value) or reuse "api_server"
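The mapping could look roughly like the sketch below. `SessionSourceSketch` is a hypothetical subset of `SessionSource` (the real class may carry different fields), and treating the session ID header as required is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SessionSourceSketch:
    """Hypothetical subset of SessionSource; field names are assumptions."""
    platform: str
    session_id: str
    session_key: Optional[str]


def source_from_headers(
    headers: Mapping[str, str], platform: str = "switchui"
) -> SessionSourceSketch:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    session_id = lowered.get("x-hermes-session-id")
    if not session_id:
        # Assumption: a request without a session ID is rejected.
        raise ValueError("X-Hermes-Session-Id header is required")
    return SessionSourceSketch(
        platform=platform,
        session_id=session_id,
        session_key=lowered.get("x-hermes-session-key"),
    )
```

Passing `platform="api_server"` instead of the default covers the "reuse the existing value" option without changing the mapping logic.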
Phase 3: Platform Toolsets Config
Add platform_toolsets.switchui to config.yaml so the web chat has its own toolset configuration, matching the pattern used by platform_toolsets.telegram.
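To make the lookup concrete, the sketch below resolves a platform's toolsets from an already-parsed config. The shape of `platform_toolsets` (a mapping from platform name to a list of toolset names) and the toolset names themselves are assumptions; the actual config.yaml schema may differ.

```python
from typing import Any, Dict, List, Sequence

# Assumed shape of the parsed config.yaml; toolset names are placeholders.
config: Dict[str, Any] = {
    "platform_toolsets": {
        "telegram": ["toolset_a", "toolset_b"],
        "switchui": ["toolset_a"],
    }
}


def toolsets_for(
    cfg: Dict[str, Any], platform: str, default: Sequence[str] = ()
) -> List[str]:
    """Return the toolsets configured for a platform, or the default if unset."""
    return list(cfg.get("platform_toolsets", {}).get(platform, default))
```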
Phase 4: Streaming Callbacks Through Gateway Path
Ensure stream_delta_callback, tool_start_callback, and tool_complete_callback are wired into the agent when dispatched through handle_message(), so SSE events still flow to the Web UI.
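One way to bridge these callbacks to an SSE response is an `asyncio.Queue` that the callbacks fill and the HTTP handler drains. The sketch below assumes simple single-argument callback signatures and uses `fake_agent` as a stand-in for the agent run dispatched through `handle_message()`; neither reflects the real agent interface.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple

Event = Tuple[str, str]  # (event name, payload)


def make_callbacks(queue: "asyncio.Queue[Optional[Event]]"):
    """Build the three callbacks; each pushes one event onto the queue."""
    def stream_delta_callback(text: str) -> None:
        queue.put_nowait(("delta", text))

    def tool_start_callback(name: str) -> None:
        queue.put_nowait(("tool_start", name))

    def tool_complete_callback(name: str) -> None:
        queue.put_nowait(("tool_complete", name))

    return stream_delta_callback, tool_start_callback, tool_complete_callback


async def fake_agent(on_delta, on_tool_start, on_tool_complete) -> None:
    """Stand-in for the agent run; fires the callbacks in a fixed order."""
    on_tool_start("search")
    on_tool_complete("search")
    for piece in ("Hel", "lo"):
        on_delta(piece)
        await asyncio.sleep(0)  # yield control, as a real agent would


async def stream_events() -> AsyncIterator[Event]:
    queue: "asyncio.Queue[Optional[Event]]" = asyncio.Queue()
    callbacks = make_callbacks(queue)

    async def run() -> None:
        try:
            await fake_agent(*callbacks)
        finally:
            queue.put_nowait(None)  # sentinel: the agent has finished

    task = asyncio.create_task(run())
    while (item := await queue.get()) is not None:
        yield item
    await task  # surface any exception raised by the agent


async def collect() -> List[Event]:
    return [event async for event in stream_events()]
```

If the agent invokes callbacks from a worker thread rather than the event loop, `put_nowait` would need to be scheduled with `loop.call_soon_threadsafe`, since `asyncio.Queue` is not thread-safe.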
Phase 5: Override Delivery Tail
The Web UI delivery differs from messaging platforms:
- No MarkdownV2 conversion (Web UI renders its own markdown)
- No message chunking
- No MEDIA:<path> extraction (handled differently)
- No typing indicators (Web UI has its own loading state)
- JSON/SSE response format
This can be achieved by:
- The web endpoint providing its own _message_handler callback
- Or by detecting platform="switchui" and skipping platform-specific formatting
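A sketch of the first option, where the web adapter supplies its own delivery step instead of inheriting the messaging one. The class names, the `deliver` hook, and the escaping in `format_text` are all hypothetical placeholders for whatever the base adapter actually does.

```python
from typing import Any, Dict, List


class BaseAdapterSketch:
    """Hypothetical base adapter with a messaging-style delivery tail."""
    chunk_limit = 4096  # illustrative per-message size limit

    def format_text(self, text: str) -> str:
        # Placeholder for platform markup conversion (escapes "." only).
        return text.replace(".", "\\.")

    def deliver(self, text: str) -> List[Dict[str, Any]]:
        formatted = self.format_text(text)
        return [
            {"type": "message", "text": formatted[i:i + self.chunk_limit]}
            for i in range(0, len(formatted), self.chunk_limit)
        ]


class WebAdapterSketch(BaseAdapterSketch):
    """Web UI delivery: raw text, one payload, no formatting or chunking."""

    def deliver(self, text: str) -> List[Dict[str, Any]]:
        return [{"type": "final", "text": text}]
```

Overriding the hook keeps platform checks out of the shared pipeline; the second option (branching on `platform="switchui"`) trades that isolation for a smaller diff.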
Expected Benefits
After implementation, Web UI chat will immediately gain:
- Warm agent starts: LRU cache with idle TTL (no cold start per request)
- Session context: "You are on Switch UI. User: Rohit..." injected into system prompt
- Auto-compression: sessions that grow too large get compressed at 85% threshold
- Plugin hooks: full pre_gateway_dispatch plugin chain
- Interrupt support: /stop and agent interruption work
- Provider routing: per-session model/provider routing applies
- Auto-skills: topic/channel skill bindings work if configured
- Session hygiene: consistent with messaging platforms
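For readers unfamiliar with the caching behaviour behind "warm agent starts", the sketch below shows the general LRU-plus-idle-TTL technique. It is not the gateway's implementation; the class name, default sizes, and injectable clock are assumptions chosen to keep the example testable.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class AgentCacheSketch:
    """LRU cache whose entries also expire after a period of inactivity."""

    def __init__(
        self,
        max_size: int = 2,
        idle_ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_size = max_size
        self._idle_ttl = idle_ttl
        self._clock = clock
        # key -> (agent, last_used); insertion order doubles as recency order.
        self._entries: "OrderedDict[str, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        agent, last_used = entry
        now = self._clock()
        if now - last_used > self._idle_ttl:
            del self._entries[key]  # idle too long: force a cold start
            return None
        self._entries[key] = (agent, now)
        self._entries.move_to_end(key)  # mark as most recently used
        return agent

    def put(self, key: str, agent: Any) -> None:
        self._entries[key] = (agent, self._clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

A request handler would call `get(session_id)` first and only build a new agent (then `put` it) on a miss, which is what removes the per-request cold start.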
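Acceptance Criteria
- Web UI chat requests are routed through GatewayRunner._handle_message()
- /stop and interrupt work from the Web UI
- /v1/chat/completions (OpenAI-compatible) continues to work as-is for external clients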
Notes
- The /v1/chat/completions endpoint (OpenAI-compatible API) should remain unchanged; it serves external clients that expect the OpenAI protocol
- The /api/sessions/{id}/chat/stream endpoint (used by Switch UI) is the one to refactor
- The gateway pipeline was designed to be platform-agnostic above the delivery layer; the API server just never connected to it
- This is primarily a gateway-side change; minimal changes needed in the Switch UI frontend
Related Code
- Thin path (to be replaced): hermes-agent/gateway/platforms/api_server.py → _handle_chat_completions(), _create_agent(), _run_agent()
- Full pipeline (to be reused): hermes-agent/gateway/run.py → _handle_message(), _handle_message_with_agent(), _run_agent()
- Base adapter (routing): hermes-agent/gateway/platforms/base.py → handle_message(), _process_message_background()
- Telegram adapter (reference): hermes-agent/gateway/platforms/telegram.py → _handle_text_message(), _build_message_event()