Reporting: machine-readable per-case results (verdict + scores + cost) alongside the .md artifacts #114

Description

@renaudcepre

Context

The per-case .md artifacts are excellent for human debugging (I used them to diagnose an LLM agent overwriting a property mid-run). But when I wanted the pass/fail matrix programmatically (which cases failed, with which scores), I had to grep .protest/results/<run>/<case>.md and .protest/last_run_stdout. The latter was truncated and full of ANSI escape codes, and my grep patterns for the verdict lines didn't match cleanly.

Ask

A structured, stable run summary alongside the .md files, e.g. .protest/results/<run>/summary.json:

```json
{
  "suite": "atelier",
  "passed": 9,
  "total": 10,
  "cost": 0.0052,
  "cases": [
    {"name": "check_compatible_no_alert", "passed": false,
     "scores": {"alert_ok": false}, "reason": "..."}
  ]
}
```
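Here is a minimal sketch of how a runner might emit this file. It assumes a hypothetical in-memory result model: `CaseResult`, `RunSummary` and `write_summary` are illustrative names, not protest's actual API. The sketch derives the counts from the case list so `passed` and `total` can never disagree with `cases`. It also writes to a temporary name and then renames the file, so a tool polling the directory never reads a half-written summary.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Union


@dataclass
class CaseResult:
    # Hypothetical per-case record; field names mirror the proposed JSON.
    name: str
    passed: bool
    scores: Dict[str, Union[bool, float]] = field(default_factory=dict)
    reason: Optional[str] = None


@dataclass
class RunSummary:
    # Hypothetical run-level record.
    suite: str
    cost: float
    cases: List[CaseResult]

    def to_dict(self) -> dict:
        # Derive counts from the cases so they cannot drift out of sync.
        return {
            "suite": self.suite,
            "passed": sum(c.passed for c in self.cases),
            "total": len(self.cases),
            "cost": self.cost,
            "cases": [asdict(c) for c in self.cases],
        }


def write_summary(summary: RunSummary, run_dir: Path) -> Path:
    """Write summary.json next to the per-case .md files in run_dir."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "summary.json"
    # Write to a temp file, then rename it over the target in one step.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(summary.to_dict(), indent=2), encoding="utf-8")
    tmp.replace(path)
    return path
```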

A summary like this would make downstream tooling (custom dashboards, gating scripts, agentic workflows that read results) trivial to build, with no parsing of Rich/ANSI output.
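For instance, a gating script could look roughly like the sketch below. It assumes the file follows the shape proposed above. The function names and the thresholds (a 90% pass rate and a 0.01 cost budget) are hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import List, Optional


def failing_cases(summary: dict) -> List[str]:
    """Names of cases whose verdict is 'failed'."""
    return [c["name"] for c in summary.get("cases", []) if not c["passed"]]


def gate(summary: dict, min_pass_rate: float = 1.0,
         max_cost: Optional[float] = None) -> List[str]:
    """Return human-readable problems; an empty list means the run passes the gate."""
    problems = []
    total = summary["total"]
    rate = summary["passed"] / total if total else 0.0
    if rate < min_pass_rate:
        failed = ", ".join(failing_cases(summary))
        problems.append(f"pass rate {rate:.0%} < {min_pass_rate:.0%}; failed: {failed}")
    if max_cost is not None and summary["cost"] > max_cost:
        problems.append(f"cost {summary['cost']} exceeds budget {max_cost}")
    return problems


def main(argv: List[str]) -> int:
    # argv[0] is the path to a run's summary.json (hypothetical CLI shape).
    data = json.loads(Path(argv[0]).read_text(encoding="utf-8"))
    issues = gate(data, min_pass_rate=0.9, max_cost=0.01)
    for msg in issues:
        print(msg, file=sys.stderr)
    return 1 if issues else 0
```

To use it as a CI step, add `sys.exit(main(sys.argv[1:]))` at the bottom and pass it the path to a run's `summary.json`. The non-zero exit code is what fails the job.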

Relation to #32

#32 (JUnit XML) targets CI consumption of tests; this is a richer, eval-aware JSON (per-case scores + cost + reason) for interactive/programmatic inspection. They could share a serializer.
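One possible shape for that shared serializer is a single in-memory run model rendered to both formats. The sketch below uses hypothetical `Case`/`Run` types and `to_json`/`to_junit` functions. JUnit XML has no native slot for scores or cost, so this sketch carries them as `<property>` elements: cost at suite level, and scores per testcase, similar to what pytest's `record_property` emits. Whether a given CI tool displays per-testcase properties is an assumption worth checking.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Case:
    # Hypothetical shared per-case model.
    name: str
    passed: bool
    scores: Dict[str, object] = field(default_factory=dict)
    reason: Optional[str] = None


@dataclass
class Run:
    suite: str
    cost: float
    cases: List[Case]


def to_json(run: Run) -> str:
    """Eval-aware summary: verdicts plus scores, reason and cost."""
    return json.dumps({
        "suite": run.suite,
        "passed": sum(c.passed for c in run.cases),
        "total": len(run.cases),
        "cost": run.cost,
        "cases": [{"name": c.name, "passed": c.passed,
                   "scores": c.scores, "reason": c.reason} for c in run.cases],
    }, indent=2)


def to_junit(run: Run) -> str:
    """CI-oriented JUnit XML built from the same model."""
    failures = sum(not c.passed for c in run.cases)
    suite = ET.Element("testsuite", name=run.suite,
                       tests=str(len(run.cases)), failures=str(failures))
    # No native JUnit field for cost: carry it as a suite-level property.
    props = ET.SubElement(suite, "properties")
    ET.SubElement(props, "property", name="cost", value=str(run.cost))
    for c in run.cases:
        tc = ET.SubElement(suite, "testcase", name=c.name, classname=run.suite)
        if c.scores:
            # Per-case scores as testcase properties (not every consumer shows these).
            tc_props = ET.SubElement(tc, "properties")
            for key, val in c.scores.items():
                ET.SubElement(tc_props, "property", name=key, value=str(val))
        if not c.passed:
            ET.SubElement(tc, "failure", message=c.reason or "case failed")
    return ET.tostring(suite, encoding="unicode")
```

Keeping the verdict logic in one model means the two outputs cannot disagree on which cases failed. Only the rendering differs.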

Labels: enhancement (New feature or request), reporting (Test reporting and output)