Skip to content

Calibrate Claude relevance/specificity scores across queries #16

@vl-coding

Description

@vl-coding

Justification scores (1-10) aren't calibrated across queries -- an 8 for one query isn't comparable to an 8 for another, even though scores are surfaced to users.

Explore rubric-based scoring, fixed reference anchors in the prompt, or a calibration pass.

See docs/LIMITATIONS.md, Models section, Claude justifications.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions