Reporting: machine-readable per-case results (verdict + scores + cost) alongside the .md artifacts #114

### Context

The per-case `.md` artifacts are excellent for human debugging (I used them to diagnose an LLM agent overwriting a property mid-run). But when I wanted the **pass/fail matrix programmatically** (which cases failed, and with which scores), I had to grep `.protest/results/<run>/<case>.md` and `.protest/last_run_stdout`. The latter was truncated and full of ANSI escape codes, so greps for the verdict lines didn't match cleanly.

### Ask

A structured, stable run summary alongside the `.md` files, e.g. `.protest/results/<run>/summary.json`:

```json
{
  "suite": "atelier",
  "passed": 9,
  "total": 10,
  "cost": 0.0052,
  "cases": [
    {"name": "check_compatible_no_alert", "passed": false,
     "scores": {"alert_ok": false}, "reason": "..."}
  ]
}
```

This makes downstream tooling (custom dashboards, gating scripts, agentic workflows that read results) trivial, with no need to parse Rich/ANSI output.
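To illustrate the kind of consumer this would enable, here is a minimal sketch of a gating script. It assumes the `summary.json` shape proposed above; the function names (`load_summary`, `failed_cases`, `gate`) and the optional `max_cost` budget are hypothetical and not part of any existing API.

```python
import json
from pathlib import Path
from typing import Optional


def load_summary(path: Path) -> dict:
    """Read a run summary file (assumed to follow the JSON shape proposed above)."""
    return json.loads(path.read_text(encoding="utf-8"))


def failed_cases(summary: dict) -> list:
    """Return the case entries whose `passed` flag is false."""
    return [case for case in summary.get("cases", []) if not case.get("passed", False)]


def gate(summary: dict, max_cost: Optional[float] = None) -> int:
    """Return an exit code: 0 if every case passed and cost is within budget, else 1.

    `max_cost` is a hypothetical budget knob added for illustration.
    """
    failures = failed_cases(summary)
    for case in failures:
        print(f"FAIL {case['name']}: scores={case.get('scores', {})} "
              f"reason={case.get('reason', '')}")

    over_budget = max_cost is not None and summary.get("cost", 0.0) > max_cost
    if over_budget:
        print(f"Cost {summary.get('cost')} exceeds budget {max_cost}")

    return 1 if failures or over_budget else 0
```

A CI step could then call `sys.exit(gate(load_summary(path)))` with `path` pointing at the run's `summary.json`, with no ANSI stripping or regex matching involved.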

### Relation to #32

#32 (JUnit XML) targets CI consumption of *tests*; this is a richer, **eval-aware** JSON (per-case scores + cost + reason) for interactive/programmatic inspection. They could share a serializer.
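As a rough sketch of what a shared serializer could look like: one in-memory result model with two emitters, one for the JSON summary proposed here and one for a JUnit-style XML document. The class and method names (`CaseResult`, `RunSummary`, `to_json`, `to_junit_xml`) are hypothetical, and the XML uses only the common `testsuite`/`testcase`/`failure` convention rather than whatever #32 ends up specifying.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class CaseResult:
    """One eval case: verdict, per-evaluator scores, and a failure reason."""
    name: str
    passed: bool
    scores: dict = field(default_factory=dict)
    reason: str = ""


@dataclass
class RunSummary:
    """A whole run; both output formats are derived from this single model."""
    suite: str
    cost: float
    cases: list = field(default_factory=list)

    def to_json(self) -> str:
        # Eval-aware output: keeps scores, cost, and reason per case.
        doc = {
            "suite": self.suite,
            "passed": sum(1 for c in self.cases if c.passed),
            "total": len(self.cases),
            "cost": self.cost,
            "cases": [asdict(c) for c in self.cases],
        }
        return json.dumps(doc, indent=2)

    def to_junit_xml(self) -> str:
        # CI-oriented output: only verdicts and failure messages survive.
        failures = sum(1 for c in self.cases if not c.passed)
        suite = ET.Element("testsuite", name=self.suite,
                           tests=str(len(self.cases)), failures=str(failures))
        for c in self.cases:
            testcase = ET.SubElement(suite, "testcase", name=c.name)
            if not c.passed:
                ET.SubElement(testcase, "failure", message=c.reason)
        return ET.tostring(suite, encoding="unicode")
```

The point of the split is that the JUnit emitter drops the eval-specific fields (scores, cost) while the JSON emitter keeps them, so neither format constrains the other.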

