The rubric scoring mode in backend/services/judge.py hardcodes hallucination_detected: false:
else:  # rubric
    prompt = f"""...
{{
  "score": <float 0.0 to 1.0>,
  "passed": <true if score >= 0.7>,
  "hallucination_detected": false,
  "hallucination_explanation": null,
  ...
}}"""
This means rubric-scored test cases -- creative writing, open-ended reasoning, tone evaluation -- cannot flag hallucinated content even when the model introduces spurious facts.
Why this matters: A rubric-scored case like 'Explain the difference between REST and GraphQL' has no single expected answer, so exact-match hallucination detection doesn't apply. But the rubric can specify 'must accurately describe HTTP methods' -- if the model invents a non-existent HTTP status code, that hallucination should be flagged.
Suggestion: Add rubric-adaptive hallucination detection. Include in the rubric judge prompt: 'Check whether the output contains any factual claims that are verifiably false in the real world, independent of the rubric criteria.' This catches factual hallucinations even where there is no expected_output to compare against.
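A minimal sketch of what the adjusted rubric prompt could look like. The actual prompt text in judge.py is elided above, so build_rubric_prompt, its parameters, and the surrounding wording are hypothetical; only the factual-check sentence and the field names come from this issue.

```python
# Instruction text quoted from the suggestion above.
FACTUAL_CHECK = (
    "Check whether the output contains any factual claims that are "
    "verifiably false in the real world, independent of the rubric criteria."
)


def build_rubric_prompt(rubric: str, model_output: str) -> str:
    """Hypothetical helper (not the real judge.py code).

    Builds a rubric judge prompt whose response schema leaves
    hallucination_detected for the judge to decide instead of
    hardcoding it to false.
    """
    return f"""Score the output against the rubric.

Rubric:
{rubric}

Output:
{model_output}

{FACTUAL_CHECK}

Respond with JSON only:
{{
  "score": <float 0.0 to 1.0>,
  "passed": <true if score >= 0.7>,
  "hallucination_detected": <true if any claim is verifiably false, else false>,
  "hallucination_explanation": <string naming the false claim, or null>,
  "reasoning": <string>
}}"""
```

The only behavioral change relative to the excerpt is that the two hallucination fields become placeholders the judge must fill in, so an invented HTTP status code can be reported even when the rubric score is otherwise high.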
Also make hallucination_detected non-optional in rubric mode. The judge's response schema should include the triple (hallucination_detected, hallucination_explanation, reasoning) in all four scoring modes. Rubric mode cannot do string-level comparison, but it can do factual-claim verification against world knowledge.
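One way to enforce the non-optional triple is to validate the judge's JSON before using it. This is a sketch under assumptions: parse_judge_response and REQUIRED_FIELDS are hypothetical names, and the real response handling in judge.py may be structured differently.

```python
import json

# The triple this issue proposes every scoring mode should return.
REQUIRED_FIELDS = (
    "hallucination_detected",
    "hallucination_explanation",
    "reasoning",
)


def parse_judge_response(raw: str) -> dict:
    """Hypothetical parser: reject judge output that omits a hallucination field."""
    data = json.loads(raw)

    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"judge response missing fields: {missing}")

    # The flag must be a real boolean, not a string such as "false".
    if not isinstance(data["hallucination_detected"], bool):
        raise ValueError("hallucination_detected must be a boolean")

    # A flagged hallucination without an explanation is not actionable.
    if data["hallucination_detected"] and not data["hallucination_explanation"]:
        raise ValueError("hallucination_explanation is required when flagged")

    return data
```

Failing loudly on a missing field makes a silently disabled check visible, rather than letting the absence of a flag read as "no hallucination".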
Alternatively, if hallucination detection is intentionally disabled for rubric mode, document this as a known limitation in the README scoring table -- users relying on rubric scoring may not realize they have no hallucination coverage.