Dashboard labels heuristic scores as "model confidence" — could mislead engineers into over-trusting it

## What's wrong

There's no trained machine learning model running in this pipeline today. Detection uses adaptive thresholding plus HSV color masking (`src/detection/contour.py`), and failure-type classification is a hand-written rule list (`src/classification/rules.py`). The "confidence" numbers shown throughout the dashboard and PDF reports come from formulas like:

```python
return round(min(1.0, 0.35 + 0.4 * hue_score + 0.25 * sat_score), 4)
```

These are reasonable engineering heuristics, but they are not calibrated probabilities from a trained model — there's no guarantee that "confidence 0.8" means "80% of the time this is correct" the way a properly calibrated classifier's output would.
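To make the calibration point concrete, here is a minimal sketch of a reliability check, assuming a set of defects had been manually reviewed so each score could be paired with a correct/incorrect flag. Nothing like this exists in the repo as far as this issue knows; `reliability_table`, `scores`, and `correct` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def reliability_table(scores, correct, n_bins=5):
    """Bin scores and compare the mean score per bin with observed accuracy.

    scores:  iterable of floats in [0, 1] (the displayed "confidence")
    correct: iterable of bools from manual review (hypothetical labels)
    Returns a list of (bin_index, count, mean_score, accuracy) tuples.
    """
    bins = defaultdict(list)
    for score, ok in zip(scores, correct):
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((score, ok))

    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_score = sum(s for s, _ in pairs) / len(pairs)
        accuracy = sum(1 for _, ok in pairs if ok) / len(pairs)
        table.append((idx, len(pairs), mean_score, accuracy))
    return table
```

For a calibrated score, `mean_score` and `accuracy` would roughly agree in every bin. The heuristic formula gives no reason to expect that, which is the core of this issue.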

The UI and exports currently present this number as "model confidence" / "mean confidence" without distinguishing it from a real model score. (YOLOv8 and SAM wrappers do exist in `src/detection/yolo.py` and `src/segmentation/sam.py` for when trained weights are eventually available — at that point this label would become accurate.)

## Why it matters

`AGENTS.md`'s own design philosophy says to "distinguish model estimates from confirmed findings" and the README states "all model outputs are estimates." A confidence score that looks like ML output but is actually a hand-tuned formula blurs that distinction for engineering or HSE users reading exported reports, and the risk grows once this tool's outputs start feeding into re-completion or regulatory decisions.

## What needs to happen

- Relabel "confidence" wherever it's shown (dashboard metrics, tooltips, per-defect tables, PDF reports) to make clear it's a rule-based heuristic score, not a trained model probability — e.g. "rule-based confidence" or "heuristic match score" instead of "model confidence."
- Add a short note to the tooltip/caption (and README "Key Metrics" table) explaining what the score is actually derived from.
- When YOLOv8/SAM are eventually used with trained weights (`ready=True`), the label can switch back to "model confidence" for those runs, since at that point it would be accurate — so this should be conditional on which detector/classifier actually produced the result, not a blanket rename.
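One possible shape for the conditional labelling described in the list above, as a rough sketch rather than a proposal for the final API. `ScoreSource`, `confidence_label`, and `aggregate_label` are hypothetical names, and the `trained` flag is assumed to mirror the wrappers' `ready=True` state; the exact label wording is open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreSource:
    """Describes what produced a score (hypothetical structure)."""
    name: str      # e.g. "contour", "rules", "yolo", "sam"
    trained: bool  # assumed True only when trained weights are loaded


def confidence_label(source: ScoreSource) -> str:
    """Pick the per-defect label based on what actually produced the score."""
    return "model confidence" if source.trained else "heuristic match score"


def aggregate_label(sources) -> str:
    """Pick the label for an averaged metric across one run's results."""
    labels = {confidence_label(s) for s in sources}
    if not labels:
        raise ValueError("at least one score source is required")
    if len(labels) == 1:
        return f"mean {labels.pop()}"
    # Averaging heuristic and model scores together should not be
    # presented as either one.
    return "mean score (mixed heuristic and model sources)"
```

Keeping the decision in one function would let the dashboard, tooltips, and PDF export share the same wording instead of each hard-coding a label.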
