What's wrong
There's no trained machine learning model running in this pipeline today. Detection is adaptive-threshold + HSV color masking (src/detection/contour.py), and failure-type classification is a hand-written rule list (src/classification/rules.py). The "confidence" numbers shown throughout the dashboard and PDF reports are formulas like:
    return round(min(1.0, 0.35 + 0.4 * hue_score + 0.25 * sat_score), 4)
These are reasonable engineering heuristics, but they are not calibrated probabilities from a trained model — there's no guarantee that "confidence 0.8" means "80% of the time this is correct" the way a properly calibrated classifier's output would.
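To make the point concrete, here is a minimal sketch that wraps the quoted formula in a standalone function and evaluates it at its extremes. The function name `heuristic_confidence` is hypothetical, and the sketch assumes `hue_score` and `sat_score` both lie in [0, 1], which the issue does not state.

```python
def heuristic_confidence(hue_score: float, sat_score: float) -> float:
    """Formula quoted in this issue; inputs are assumed to lie in [0, 1]."""
    return round(min(1.0, 0.35 + 0.4 * hue_score + 0.25 * sat_score), 4)


# Evaluate the formula at the corners and midpoint of the assumed input range.
lowest = heuristic_confidence(0.0, 0.0)   # neither hue nor saturation matches
middle = heuristic_confidence(0.5, 0.5)   # partial match on both
highest = heuristic_confidence(1.0, 1.0)  # full match on both

print(lowest, middle, highest)
```

Under that assumption the score can never drop below 0.35, even when neither input matches at all. The floor and the 0.4 / 0.25 weights are hand-picked constants, so nothing ties a given value to an observed rate of correct detections.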
The UI and exports currently present this number as "model confidence" / "mean confidence" without distinguishing it from a real model score. (YOLOv8 and SAM wrappers do exist in src/detection/yolo.py and src/segmentation/sam.py for when trained weights are eventually available — at that point this label would become accurate.)
Why it matters
AGENTS.md's own design philosophy says to "distinguish model estimates from confirmed findings" and the README states "all model outputs are estimates." A confidence score that looks like ML output but is actually a hand-tuned formula is exactly the kind of thing that can quietly erode that distinction for engineering or HSE users reading exported reports, especially once this tool's outputs start feeding into re-completion or regulatory decisions.
What needs to happen
- Relabel "confidence" wherever it's shown (dashboard metrics, tooltips, per-defect tables, PDF reports) to make clear it's a rule-based heuristic score, not a trained model probability — e.g. "rule-based confidence" or "heuristic match score" instead of "model confidence."
- Add a short note to the tooltip/caption (and README "Key Metrics" table) explaining what the score is actually derived from.
- When YOLOv8/SAM are eventually used with trained weights (ready=True), the label can switch back to "model confidence" for those runs, since at that point it would be accurate. The relabeling should therefore be conditional on which detector/classifier actually produced the result, not a blanket rename.
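One possible shape for the conditional label is sketched below. Everything here is hypothetical: the `ScoreOrigin` record, the source identifiers, and the `confidence_label` helper are illustration only, and the sketch assumes each result can carry the name of the component that produced it along with that component's `ready` flag.

```python
from dataclasses import dataclass

# Hypothetical identifiers for components that wrap a trained model.
# The real pipeline may name or track these differently.
MODEL_SOURCES = {"yolo", "sam"}


@dataclass(frozen=True)
class ScoreOrigin:
    source: str  # which detector/classifier produced the score
    ready: bool  # True only when trained weights were actually loaded


def confidence_label(origin: ScoreOrigin) -> str:
    """Pick the display label for a score based on what produced it."""
    if origin.source in MODEL_SOURCES and origin.ready:
        return "model confidence"
    # Contour/HSV detection, rule-based classification, or a model
    # wrapper without trained weights all get the heuristic label.
    return "rule-based confidence"


print(confidence_label(ScoreOrigin("contour", ready=False)))
print(confidence_label(ScoreOrigin("yolo", ready=True)))
```

Defaulting to the heuristic label means a new or unrecognized source is never presented as a trained model by accident.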