[/improve] Self-improving agent UI — per-profile health + proposal/approval queue #206

## Summary

Add an `/improve` page to Switch UI that monitors per-profile / per-skill agent health and drives the **self-improving harness loop** (Karpathy autoresearch ratchet) as a human-in-the-loop approval queue. This page is a **thin consumer** of the agent-side engine API (`/api/improve/*`) — all heavy machinery (metrics store, eval runner, meta-agent proposer, git ratchet, experiment state machine) lives in the hermes-agent plugin.

**Companion / dependency:** agent engine issue → `Interstellar-code/hermes-agent#133`. This UI issue is blocked on that plugin's `/api/improve/*` surface.

This is the **UI half** of `docs/self-improving-agent-proposal.md` (§4, §6c). Engine/agent work is out of scope here.

## Architecture

- Consume `/api/improve/*` via the **existing dashboard-proxy pattern** (same as `jobs`/`kanban`).
- **Capability-probe** the endpoint — hide the `/improve` nav + page when the agent plugin isn't installed/enabled (mirror jobs/kanban probing).
- No business logic in the UI: it renders proposals, scores, diffs, history, and posts approve/reject/pause actions back to the API.
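A minimal TypeScript sketch of the capability probe. It assumes a hypothetical `/api/improve/status` probe path (the real routes are defined by `Interstellar-code/hermes-agent#133`). It also assumes that the dashboard proxy answers 404 or 501 when the plugin is missing or disabled. The status mapping is illustrative, not a copy of the existing jobs/kanban probe.

```typescript
// Hypothetical probe path; the real /api/improve/* routes come from
// Interstellar-code/hermes-agent#133 and may differ.
const IMPROVE_PROBE_PATH = "/api/improve/status";

// Minimal response shape so the sketch does not depend on DOM/Node fetch typings.
interface ProbeResponse {
  ok: boolean;
  status: number;
}
type Fetcher = (url: string) => Promise<ProbeResponse>;

type ImproveCapability = "available" | "absent" | "unknown";

// Map the probe's HTTP result to a capability state.
// Assumption: the proxy returns 404 or 501 when the plugin is missing/disabled.
function classifyProbe(res: ProbeResponse): ImproveCapability {
  if (res.ok) return "available";
  if (res.status === 404 || res.status === 501) return "absent";
  // Other 4xx/5xx: do not conclude the plugin is absent.
  return "unknown";
}

async function probeImprove(fetcher: Fetcher): Promise<ImproveCapability> {
  try {
    return classifyProbe(await fetcher(IMPROVE_PROBE_PATH));
  } catch {
    // Network failure reaching the proxy itself.
    return "unknown";
  }
}

// Render the /improve nav entry and route only when the probe succeeded.
function showImproveNav(cap: ImproveCapability): boolean {
  return cap === "available";
}
```

In the app, `fetcher` would be the same proxied fetch wrapper that jobs/kanban already use. Keeping it injectable lets the probe be tested without a running agent.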

## Page structure (`/improve` route)

### 1. Per-profile observability (P0 — zero-risk, immediate value)
Scorecard built from data the dashboard already fetches, scoped **per profile**:
- sessions/day, error+warn rate, cost trend, token efficiency, retries.
- Per-skill row (P4): invocations, error rate (logs↔sessions correlated), retries, avg tokens/cost, last-used, trend.
- Trends sourced from the agent metrics store via API (survives restarts).
- **This section alone answers "is my harness degrading?" and ships before any loop.**
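The scorecard arithmetic is simple enough to sketch. The `SessionRow` fields below are hypothetical, not the dashboard's real sessions/analytics payload. "Error+warn rate" is read here as the fraction of sessions with at least one error or warning, which is an assumption. Multi-day trends would still come from the agent metrics store rather than from this client-side rollup.

```typescript
// Hypothetical session row; field names are illustrative only.
interface SessionRow {
  profile: string;
  errors: number;
  warnings: number;
  retries: number;
  tokens: number;
  costUsd: number;
}

interface Scorecard {
  profile: string;
  sessionsPerDay: number;
  errorWarnRate: number; // fraction of sessions with >= 1 error or warning (assumed definition)
  avgRetries: number;
  avgTokens: number;
  totalCostUsd: number;
}

// Group rows by profile and compute one scorecard per profile over a `days`-day window.
function buildScorecards(rows: SessionRow[], days: number): Scorecard[] {
  if (days <= 0) throw new RangeError("days must be positive");
  const byProfile = new Map<string, SessionRow[]>();
  for (const r of rows) {
    const list = byProfile.get(r.profile) ?? [];
    list.push(r);
    byProfile.set(r.profile, list);
  }
  const sum = (list: SessionRow[], pick: (r: SessionRow) => number): number =>
    list.reduce((acc, r) => acc + pick(r), 0);

  const cards: Scorecard[] = [];
  byProfile.forEach((list, profile) => {
    const n = list.length; // >= 1: a profile only appears if it has a row
    cards.push({
      profile,
      sessionsPerDay: n / days,
      errorWarnRate: list.filter((r) => r.errors + r.warnings > 0).length / n,
      avgRetries: sum(list, (r) => r.retries) / n,
      avgTokens: sum(list, (r) => r.tokens) / n,
      totalCostUsd: sum(list, (r) => r.costUsd),
    });
  });
  return cards.sort((a, b) => a.profile.localeCompare(b.profile));
}
```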

### 2. Proposal queue (state: `proposed`)
Card per pending experiment:
- Side-by-side **old vs new diff** (SOUL.md / profile `system_prompt`) — exactly one atomic change, highlighted.
- Meta-agent rationale.
- Offline eval table: per-scenario pass/fail **before vs after**, aggregate delta, eval token cost.
- Actions: **Approve / Reject / Edit-then-approve** (reject is logged so the idea isn't re-proposed).
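If the API returns per-scenario rows, the before-vs-after table and its aggregate delta reduce to counting. The sketch assumes a hypothetical `ScenarioResult` shape and measures the delta in passing scenarios. If the engine already reports an aggregate score, the UI should display that score rather than recompute it. Regressions are listed separately because a delta of 0 can hide a swap (one scenario fixed, another broken).

```typescript
// Hypothetical per-scenario row from a proposal's offline eval.
interface ScenarioResult {
  scenario: string;
  before: boolean; // passed on the current baseline
  after: boolean;  // passed with the proposed change applied
}

type ScenarioChange = "fixed" | "regressed" | "still-passing" | "still-failing";

function classifyScenario(r: ScenarioResult): ScenarioChange {
  if (r.before && r.after) return "still-passing";
  if (!r.before && !r.after) return "still-failing";
  return r.after ? "fixed" : "regressed";
}

interface EvalSummary {
  passBefore: number;
  passAfter: number;
  delta: number;       // passAfter - passBefore
  fixed: string[];
  regressed: string[]; // highlighted even when delta >= 0
}

function summarizeEval(rows: ScenarioResult[]): EvalSummary {
  const passBefore = rows.filter((r) => r.before).length;
  const passAfter = rows.filter((r) => r.after).length;
  const fixed = rows.filter((r) => classifyScenario(r) === "fixed").map((r) => r.scenario);
  const regressed = rows.filter((r) => classifyScenario(r) === "regressed").map((r) => r.scenario);
  return { passBefore, passAfter, delta: passAfter - passBefore, fixed, regressed };
}
```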

### 3. Observation window (state: `live`)
- Progress: "12 / 30 sessions observed" (window configurable per profile).
- Live metrics vs baseline: error/warn rate, completion, retries, token efficiency, periodic LLM-judge spot-scores.
- This is the **second verification stage** (production proof, beyond offline eval).
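A sketch of the rendering helpers for a `live` card, assuming a hypothetical payload with an observed/target session count and baseline/live metric pairs. The `trendingWorse` flag only colors a cell. The live verdict itself stays with the engine, consistent with the no-business-logic rule.

```typescript
// Hypothetical observation-window payload for a `live` experiment.
interface WindowProgress {
  observed: number; // sessions observed so far
  target: number;   // per-profile window size, e.g. 30
}

function progressLabel(p: WindowProgress): string {
  return `${Math.min(p.observed, p.target)} / ${p.target} sessions observed`;
}

// Fraction for a progress bar, clamped to [0, 1].
function progressFraction(p: WindowProgress): number {
  return p.target > 0 ? Math.min(p.observed / p.target, 1) : 0;
}

interface MetricComparison {
  name: string;
  baseline: number;
  live: number;
  lowerIsBetter: boolean; // true for error/warn rate and retries
}

// Percent change vs baseline; null when the baseline is 0 (ratio undefined).
function relativeChangePct(m: MetricComparison): number | null {
  if (m.baseline === 0) return null;
  return ((m.live - m.baseline) * 100) / m.baseline;
}

// Display-only hint for coloring a cell; the verdict comes from the engine.
function trendingWorse(m: MetricComparison): boolean {
  return m.lowerIsBetter ? m.live > m.baseline : m.live < m.baseline;
}
```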

### 4. Verdict + History
- Verified → baseline vN+1; regressed → auto-revert notification ("experiment #14 reverted: error rate +18%").
- History tab: `results.tsv`-equivalent — every experiment with diff, offline + live scores, verdict, cost.
- **Baseline version curve per profile** — score ratcheting up over time.
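History rows and the revert notification can be modeled as plain data. In this sketch, the state names `rejected`, `verified`, and `reverted` and the `HistoryRow` fields are assumptions; only `proposed` and `live` appear in this issue. The helpers format the revert notice and pick the verified points for the baseline curve.

```typescript
// Experiment states; `rejected`, `verified` and `reverted` are assumed names.
type ExperimentState = "proposed" | "rejected" | "live" | "verified" | "reverted";

// Hypothetical history row, roughly one line of a results.tsv-style log.
interface HistoryRow {
  id: number;
  profile: string;
  state: ExperimentState;
  baselineVersion: number | null; // N+1 once verified
  score: number | null;           // aggregate score as reported by the engine
  regression?: { metric: string; changePct: number };
}

// Text for the auto-revert notification.
function revertNotice(row: HistoryRow): string | null {
  if (row.state !== "reverted" || row.regression === undefined) return null;
  const { metric, changePct } = row.regression;
  const sign = changePct > 0 ? "+" : "";
  return `experiment #${row.id} reverted: ${metric} ${sign}${changePct}%`;
}

// Points for the per-profile baseline version curve, oldest version first.
function baselineCurve(rows: HistoryRow[], profile: string): { version: number; score: number }[] {
  const points: { version: number; score: number }[] = [];
  for (const r of rows) {
    if (r.profile === profile && r.state === "verified" && r.baselineVersion !== null && r.score !== null) {
      points.push({ version: r.baselineVersion, score: r.score });
    }
  }
  return points.sort((a, b) => a.version - b.version);
}
```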

### Controls
- **Pause/resume per profile** (pause stops new proposals; in-flight experiment finishes its window).
- One-experiment-in-flight-per-profile reflected in UI state.
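A sketch of how the controls could derive their state from a hypothetical per-profile payload. It assumes that approving a proposal makes it `live`, so Approve is disabled while another experiment for the same profile is live. The engine is assumed to enforce the same rule, and the UI only mirrors it. Pausing does not gate Approve here, since the issue only says pause stops new proposals.

```typescript
type ExpState = "proposed" | "rejected" | "live" | "verified" | "reverted";

// Hypothetical per-profile control payload.
interface ProfileControls {
  profile: string;
  paused: boolean;
  experiments: { id: number; state: ExpState }[];
}

// The single experiment currently in its observation window, if any.
function inFlight(p: ProfileControls): { id: number; state: ExpState } | undefined {
  return p.experiments.find((e) => e.state === "live");
}

// Approve is enabled only for a proposed experiment when nothing is live.
function canApprove(p: ProfileControls, id: number): boolean {
  const target = p.experiments.find((e) => e.id === id);
  return target !== undefined && target.state === "proposed" && inFlight(p) === undefined;
}

function pauseToggleLabel(p: ProfileControls): string {
  return p.paused ? "Resume proposals" : "Pause proposals";
}

// Banner explaining that pausing does not cut short the in-flight window.
function pauseBanner(p: ProfileControls): string | null {
  if (!p.paused) return null;
  const live = inFlight(p);
  return live
    ? `Paused: no new proposals; experiment #${live.id} will finish its observation window.`
    : "Paused: no new proposals.";
}
```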

## Phasing (UI-side, tracks agent phases)

- **P0** — observability page (consume metrics endpoints). Ship first, standalone value.
- **P1** — surface scenario-suite runs + results (manual "run eval" trigger).
- **P2** — proposal queue + approve/reject/edit + observation-window cards.
- **P3** — memory-hygiene / USER.md staleness views (separate metrics).
- **P4** — extend scorecards + queue to skills/plugins (reuses everything).

## Out of scope (separate issue)

- The engine, metrics store, eval runner, meta-agent, git ratchet, and `/api/improve/*` implementation → `Interstellar-code/hermes-agent#133`.

## Reference

`docs/self-improving-agent-proposal.md`: §4 (what Switch UI already has: profile file read/write, analytics/logs/sessions APIs) and §6c (`/improve` experiment lifecycle spec). This page is a never-upstream fork differentiator (the "Skill Health / Agent Improvement" page).