Epic / tracking issue.
Context
The history browse CLI (`list`/`runs`/`show`/`compare`/`clean`) was cut from the evals release (merged in 54d6037): functional and green, but its output proved hard to read and the UX was not yet stabilised. We shipped the settled parts and deferred the browse UI.
What stays (do NOT re-litigate):
- The writer is always-on: every run appends one entry to `.protest/history.jsonl` (`HistoryPlugin` + `protest/history/`, schema_version=1). Data accumulates from day one.
- The cut reader is preserved verbatim on branch `archive/history-cli` (commit d80588d) — resurrect subcommands from there.
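Since the JSONL file is what accumulates, any reintroduced subcommand can start as a thin layer over it. A minimal sketch of such a reader, assuming only that each line is one JSON object carrying a `schema_version` field; the name `read_history` and the skip-on-unknown-version behaviour are illustrative assumptions, not what lives on `archive/history-cli`:

```python
import json
from pathlib import Path

# Only schema_version=1 is mentioned in this issue; extend as the format evolves.
SUPPORTED_SCHEMA_VERSIONS = {1}


def read_history(path):
    """Return history entries, skipping blank, corrupt, or unknown-version lines."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Tolerate e.g. a partially written last line instead of crashing.
            continue
        if isinstance(entry, dict) and entry.get("schema_version") in SUPPORTED_SCHEMA_VERSIONS:
            entries.append(entry)
    return entries
```

Skipping unreadable lines rather than raising is a design choice for a browse view; a stricter reader could report them instead.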
Strategy
Reintroduce one subcommand at a time, driven by real usage, each one polished and legible before it ships. The stable contract to protect is the JSONL format, not the views.
Design backlog (fix as part of reintroduction)
- #104: `compare` should diff score deltas, not just pass/fail verdicts (highest value)
Smaller items (no separate issue yet):
Legibility (the reason it was cut)
The standout offender was the `list` "Scores" column rendering unlabeled arrows (`↗↗→`) — three arrows for three scores with no way to tell which is which. Whatever comes back must be readable at a glance: label the arrows, add an inline legend for the `+ - ⟳ * ✗` markers, and surface what `(scoring modified)` means without a docs trip.
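One possible shape for a readable view is to compute per-score deltas between two runs (the #104 item) and print each arrow next to the score it belongs to. This is a rough sketch under stated assumptions: the input shape (case name mapped to a dict of score name to float), the function names, the `↘` arrow, and the 0.01 tolerance are all hypothetical and not taken from `protest/history/`.

```python
def diff_scores(old, new):
    """Return {case: {score_name: new - old}} for scores present in both runs."""
    deltas = {}
    for case in sorted(old.keys() & new.keys()):
        shared = sorted(old[case].keys() & new[case].keys())
        case_delta = {name: new[case][name] - old[case][name] for name in shared}
        if case_delta:
            deltas[case] = case_delta
    return deltas


def trend_arrow(delta, tolerance=0.01):
    """Map a score delta to an arrow; changes within the tolerance count as flat."""
    if delta > tolerance:
        return "↗"
    if delta < -tolerance:
        return "↘"
    return "→"


def format_scores(deltas):
    """Render each score as 'name arrow signed-delta' so no arrow is unlabeled."""
    return "  ".join(
        f"{name} {trend_arrow(delta)} {delta:+.2f}" for name, delta in deltas.items()
    )


# Hypothetical data: one case with three scores across two runs.
old_run = {"case_a": {"accuracy": 0.5, "tone": 0.5, "brevity": 1.0}}
new_run = {"case_a": {"accuracy": 0.75, "tone": 1.0, "brevity": 1.0}}

for case, deltas in diff_scores(old_run, new_run).items():
    print(f"{case}: {format_scores(deltas)}")
```

Keeping the score name beside each arrow removes the need to remember column order, at the cost of a wider column.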