Summary
Add an /improve page to Switch UI that monitors per-profile / per-skill agent health and drives the self-improving harness loop (Karpathy autoresearch ratchet) as a human-in-the-loop approval queue. This page is a thin consumer of the agent-side engine API (/api/improve/*) — all heavy machinery (metrics store, eval runner, meta-agent proposer, git ratchet, experiment state machine) lives in the hermes-agent plugin.
Companion / dependency: agent engine issue → Interstellar-code/hermes-agent#133. This UI issue is blocked on that plugin's /api/improve/* surface.
This is the UI half of docs/self-improving-agent-proposal.md (§4, §6c). Engine/agent work is out of scope here.
Architecture
- Consume /api/improve/* via the existing dashboard-proxy pattern (same as jobs/kanban).
- Capability-probe the endpoint: hide the /improve nav + page when the agent plugin isn't installed/enabled (mirror jobs/kanban probing).
- No business logic in the UI: it renders proposals, scores, diffs, history, and posts approve/reject/pause actions back to the API.
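A minimal sketch of the capability-probe gating described above, assuming the probe result reaches the UI as an HTTP status (or null when the request fails outright). The names `ProbeStatus`, `NavItem`, `requiresImprove`, `isImproveAvailable`, and `visibleNavItems` are hypothetical and are not taken from the existing jobs/kanban code; the rule that only a 2xx response means "plugin present" is also an assumption.

```typescript
// Hypothetical probe result: HTTP status of a probe request against
// /api/improve/*, or null when the request itself failed.
type ProbeStatus = number | null;

// Hypothetical nav entry; requiresImprove marks items that depend on the plugin.
interface NavItem {
  path: string;
  label: string;
  requiresImprove?: boolean;
}

// Assumption: any 2xx means the plugin is installed and enabled. Everything
// else (404 from a missing plugin, 5xx, network failure) hides the page.
function isImproveAvailable(status: ProbeStatus): boolean {
  return status !== null && status >= 200 && status < 300;
}

// Drop plugin-dependent nav items when the probe says the plugin is absent.
function visibleNavItems(items: NavItem[], status: ProbeStatus): NavItem[] {
  const available = isImproveAvailable(status);
  return items.filter((item) => !item.requiresImprove || available);
}
```

Keeping the decision in a pure function means the route guard and the nav can share it, and the UI holds no state beyond the last probe result.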
Page structure (/improve route)
1. Per-profile observability (P0 — zero-risk, immediate value)
Scorecard built from data the dashboard already fetches, scoped per profile:
- sessions/day, error+warn rate, cost trend, token efficiency, retries.
- Per-skill row (P4): invocations, error rate (logs↔sessions correlated), retries, avg tokens/cost, last-used, trend.
- Trends sourced from the agent metrics store via API (survives restarts).
- This section alone answers "is my harness degrading?" and ships before any loop.
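One way the per-profile scorecard could be derived on the client is sketched below. The `SessionRecord` shape and every field name in it are illustrative assumptions, not the dashboard's actual session schema, and sessions/day is averaged over days that had at least one session.

```typescript
// Hypothetical session record; field names are illustrative only.
interface SessionRecord {
  profile: string;
  day: string; // ISO date such as "2025-01-31"
  hadErrorOrWarn: boolean;
  retries: number;
  tokens: number;
}

interface Scorecard {
  profile: string;
  sessions: number;
  sessionsPerDay: number; // averaged over days with at least one session
  errorWarnRate: number; // fraction of sessions with an error or warning
  totalRetries: number;
  avgTokens: number;
}

function buildScorecard(profile: string, records: SessionRecord[]): Scorecard {
  const own = records.filter((r) => r.profile === profile);
  const n = own.length;

  // Count distinct active days for this profile.
  const seenDays: Record<string, boolean> = {};
  own.forEach((r) => {
    seenDays[r.day] = true;
  });
  const days = Object.keys(seenDays).length;

  const sum = (f: (r: SessionRecord) => number): number =>
    own.reduce((acc, r) => acc + f(r), 0);

  return {
    profile,
    sessions: n,
    sessionsPerDay: days === 0 ? 0 : n / days,
    errorWarnRate: n === 0 ? 0 : own.filter((r) => r.hadErrorOrWarn).length / n,
    totalRetries: sum((r) => r.retries),
    avgTokens: n === 0 ? 0 : sum((r) => r.tokens) / n,
  };
}
```

Cost trend and the per-skill rows would follow the same pattern with an extra grouping key; they are left out to keep the sketch short.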
2. Proposal queue (state: proposed)
Card per pending experiment:
- Side-by-side old vs new diff (SOUL.md / profile system_prompt): exactly one atomic change, highlighted.
- Meta-agent rationale.
- Offline eval table: per-scenario pass/fail before vs after, aggregate delta, eval token cost.
- Actions: Approve / Reject / Edit-then-approve (reject is logged so the idea isn't re-proposed).
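The offline eval table on each card boils down to a small summary over per-scenario results. The sketch below assumes the API returns one before/after pass flag per scenario; `ScenarioResult`, `EvalSummary`, and `summarizeEval` are hypothetical names. In the shipped page this aggregation would more likely arrive precomputed from the engine, since the UI is meant to hold no business logic, so treat it as a description of what the table shows rather than where it is computed.

```typescript
// Hypothetical per-scenario row of the offline eval table.
interface ScenarioResult {
  scenario: string;
  passedBefore: boolean;
  passedAfter: boolean;
}

interface EvalSummary {
  passRateBefore: number;
  passRateAfter: number;
  delta: number; // passRateAfter minus passRateBefore
  regressions: string[]; // passed before, fail after
  fixes: string[]; // failed before, pass after
}

function summarizeEval(results: ScenarioResult[]): EvalSummary {
  const n = results.length;
  const rate = (pick: (r: ScenarioResult) => boolean): number =>
    n === 0 ? 0 : results.filter(pick).length / n;

  const passRateBefore = rate((r) => r.passedBefore);
  const passRateAfter = rate((r) => r.passedAfter);

  return {
    passRateBefore,
    passRateAfter,
    delta: passRateAfter - passRateBefore,
    regressions: results
      .filter((r) => r.passedBefore && !r.passedAfter)
      .map((r) => r.scenario),
    fixes: results
      .filter((r) => !r.passedBefore && r.passedAfter)
      .map((r) => r.scenario),
  };
}
```

Listing regressions separately matters for the reviewer: a positive aggregate delta can still hide a scenario that used to pass and now fails.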
3. Observation window (state: live)
- Progress: "12 / 30 sessions observed" (window configurable per profile).
- Live metrics vs baseline: error/warn rate, completion, retries, token efficiency, periodic LLM-judge spot-scores.
- This is the second verification stage (production proof, beyond offline eval).
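A rough sketch of how the observation-window card might render progress and compare live metrics against the baseline. All type and function names here are hypothetical, and the `lowerIsBetter` flag is an assumption about how metric direction could be encoded (error rate and retries improve when they fall, completion improves when it rises).

```typescript
// Hypothetical window state for one live experiment.
interface WindowState {
  observed: number;
  windowSize: number; // configurable per profile
}

interface WindowProgress {
  label: string;
  fraction: number; // clamped to the range 0..1
  complete: boolean;
}

function windowProgress(w: WindowState): WindowProgress {
  const fraction = w.windowSize <= 0 ? 0 : Math.min(w.observed / w.windowSize, 1);
  return {
    label: w.observed + " / " + w.windowSize + " sessions observed",
    fraction,
    complete: w.windowSize > 0 && w.observed >= w.windowSize,
  };
}

// Hypothetical live-vs-baseline reading for a single metric.
interface MetricReading {
  name: string;
  baseline: number;
  live: number;
  lowerIsBetter: boolean;
}

type Direction = "better" | "worse" | "flat";

function compareToBaseline(m: MetricReading): Direction {
  if (m.live === m.baseline) return "flat";
  const decreased = m.live < m.baseline;
  // A decrease is good exactly when lower values are better for this metric.
  return decreased === m.lowerIsBetter ? "better" : "worse";
}
```

This only drives presentation (progress bar, up/down colouring); the keep-or-revert verdict itself stays with the engine.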
4. Verdict + History
- results.tsv-equivalent: every experiment with diff, offline + live scores, verdict, cost.
Controls
- Pause/resume per profile (pause stops new proposals; in-flight experiment finishes its window).
- One-experiment-in-flight-per-profile reflected in UI state.
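The two control rules can be reflected in UI state with a small helper, sketched below. Only the `proposed` and `live` states come from this issue; the terminal states `kept`, `reverted`, and `rejected` are placeholders, and treating both `proposed` and `live` as "in flight" is an assumption. The engine remains the source of truth, so this would only decide what the page enables or greys out.

```typescript
// "proposed" and "live" appear in this issue; the other states are hypothetical.
type ExperimentState = "proposed" | "live" | "kept" | "reverted" | "rejected";

interface Experiment {
  id: string;
  profile: string;
  state: ExperimentState;
}

interface ProfileControl {
  profile: string;
  paused: boolean;
}

// Assumption: an experiment is in flight while it is proposed or live.
function inFlight(profile: string, experiments: Experiment[]): Experiment | undefined {
  return experiments.filter(
    (e) => e.profile === profile && (e.state === "proposed" || e.state === "live"),
  )[0];
}

// New proposals are blocked while the profile is paused or already has an
// experiment in flight. Pausing never touches the in-flight experiment.
function canPropose(control: ProfileControl, experiments: Experiment[]): boolean {
  return !control.paused && inFlight(control.profile, experiments) === undefined;
}
```

For example, a paused profile with a live experiment still shows that experiment's observation card, while the proposal queue for that profile displays a paused state.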
Phasing (UI-side, tracks agent phases)
- P0 — observability page (consume metrics endpoints). Ship first, standalone value.
- P1 — surface scenario-suite runs + results (manual "run eval" trigger).
- P2 — proposal queue + approve/reject/edit + observation-window cards.
- P3 — memory-hygiene / USER.md staleness views (separate metrics).
- P4 — extend scorecards + queue to skills/plugins (reuses everything).
Out of scope (separate issue)
- The engine, metrics store, eval runner, meta-agent, git ratchet, and /api/improve/* implementation → Interstellar-code/hermes-agent#133.
Reference
docs/self-improving-agent-proposal.md — §4 (what Switch UI already has: profile file read/write, analytics/logs/sessions APIs), §6c (/improve experiment lifecycle spec). Never-upstream fork differentiator ("Skill Health / Agent Improvement" page).