
Rebalance filter priority_score: raise keyword-overlap weight / lower 0.25·credibility blend #44

Description

@smodee

Context

Follow-up split out of #13 (closed) and PR #30. PR #30 promoted ~22 national/international outlets from unknown (domain_score 0.2) to Tier 3 trusted_media (0.6) so legitimate outbreak reporting was no longer floored below the filter's credibility threshold. It also added a topical relevance term to the search-stage score (#4).

Problem

The relevance term added in #30 lives in search-stage ranking only; it does not feed the filter's keep decision. The filter's compute_priority_score still gives credibility a heavy weight (the 0.25·credibility blend). Now that reputable outlets sit at Tier 3 (0.6), that credibility weight can push off-topic pieces from those same outlets over the keep threshold on authority alone.
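To put a number on the authority-only effect, assume the credibility input to compute_priority_score is the domain_score itself (the issue does not state this mapping, so treat it as an assumption). The Tier 3 promotion then adds a fixed amount to every affected item's score, regardless of topical overlap:

```latex
\begin{aligned}
0.25 \times 0.2 &= 0.05 \quad \text{(unknown)} \\
0.25 \times 0.6 &= 0.15 \quad \text{(Tier 3 trusted\_media)} \\
\Delta &= 0.15 - 0.05 = 0.10
\end{aligned}
```

Under that assumption, any off-topic item whose pre-promotion score sat less than 0.10 below the keep threshold is flipped from drop to keep by the promotion alone.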

Observed during the #3/#4/#13 review: an H5N1 run kept an off-topic CBS transcript, admitted on credibility rather than relevance. This is the direct trade-off of the Tier 3 promotions that PR #30 itself acknowledged ("Known interaction to flag for reviewers" in #30).
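One way to check whether an item like that CBS transcript was admitted on credibility rather than relevance is to split its score into per-term contributions and test whether removing the credibility term would flip the keep decision. This is a hypothetical sketch that assumes compute_priority_score is a weighted linear sum; the term names keyword_overlap and other_heuristics, every weight except the 0.25 credibility weight, the 0.30 threshold, and the feature values are made up for illustration and are not taken from bioscancast.

```python
# Hypothetical sketch: assumes compute_priority_score is a weighted linear sum.
# Only the 0.25 credibility weight comes from the issue; the other names,
# weights, and threshold are made up for illustration.
WEIGHTS = {"keyword_overlap": 0.50, "other_heuristics": 0.25, "credibility": 0.25}
KEEP_THRESHOLD = 0.30  # hypothetical keep threshold


def explain_keep(features, weights=WEIGHTS, threshold=KEEP_THRESHOLD):
    """Split the score into per-term contributions and flag items that are
    kept only because of the credibility term."""
    parts = {name: weights[name] * features[name] for name in weights}
    score = sum(parts.values())
    kept = score >= threshold
    # Would the item still clear the threshold with credibility contributing nothing?
    kept_without_credibility = score - parts["credibility"] >= threshold
    return {
        "score": score,
        "kept": kept,
        "credibility_decisive": kept and not kept_without_credibility,
        "parts": parts,
    }


# Made-up features for an off-topic article: low keyword overlap.
off_topic = {"keyword_overlap": 0.1, "other_heuristics": 0.5}
tier3 = explain_keep({**off_topic, "credibility": 0.6})    # trusted_media after PR #30
unknown = explain_keep({**off_topic, "credibility": 0.2})  # old "unknown" domain_score

if __name__ == "__main__":
    print(tier3)
    print(unknown)
```

With these made-up numbers the off-topic item scores 0.325 at Tier 3 and is kept only because of the credibility term, while the same item at the old unknown domain_score scores 0.225 and is dropped.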

Proposed change

In bioscancast/stages/filtering/heuristics.py (compute_priority_score), with the heuristic weights in stages/filtering/config.py:

  • Raise the keyword-overlap / relevance weight in the filter's keep decision, and/or
  • Lower the 0.25·credibility blend weight,

so that an item from a Tier 3 domain no longer clears the keep threshold on authority alone when its topical overlap is low.
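A minimal sketch of the proposed rebalance, applying both options at once: the keyword-overlap weight goes up, the credibility weight goes down, and the weights still sum to 1.0 so scores stay on the same scale and the keep threshold need not move. The weight dictionaries, the feature names keyword_overlap and other_heuristics, the 0.30 threshold, and the example items are all hypothetical; only the current 0.25 credibility weight and the 0.6 / 0.2 domain scores come from this issue and PR #30.

```python
import math

# Hypothetical weights; only the current 0.25 credibility weight is from the issue.
CURRENT_WEIGHTS = {"keyword_overlap": 0.50, "other_heuristics": 0.25, "credibility": 0.25}
REBALANCED_WEIGHTS = {"keyword_overlap": 0.65, "other_heuristics": 0.25, "credibility": 0.10}
KEEP_THRESHOLD = 0.30  # hypothetical keep threshold

TIER3 = 0.6    # trusted_media domain_score (PR #30)
UNKNOWN = 0.2  # unknown domain_score


def priority_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear blend of heuristic features, each assumed to lie in [0, 1]."""
    return sum(weights[name] * features[name] for name in weights)


def keeps(features: dict[str, float], weights: dict[str, float],
          threshold: float = KEEP_THRESHOLD) -> bool:
    """Keep decision: score at or above the threshold."""
    return priority_score(features, weights) >= threshold


# Both weight sets sum to 1.0, so the score scale (and threshold) is unchanged.
for w in (CURRENT_WEIGHTS, REBALANCED_WEIGHTS):
    assert math.isclose(sum(w.values()), 1.0)

# Made-up example items.
off_topic_tier3 = {"keyword_overlap": 0.1, "other_heuristics": 0.5, "credibility": TIER3}
on_topic_tier3 = {"keyword_overlap": 0.6, "other_heuristics": 0.5, "credibility": TIER3}
on_topic_unknown = {"keyword_overlap": 0.6, "other_heuristics": 0.5, "credibility": UNKNOWN}

if __name__ == "__main__":
    for label, item in [("off-topic Tier 3", off_topic_tier3),
                        ("on-topic Tier 3", on_topic_tier3),
                        ("on-topic unknown", on_topic_unknown)]:
        print(label, keeps(item, CURRENT_WEIGHTS), keeps(item, REBALANCED_WEIGHTS))
```

The checks a regression test might encode here are that the off-topic Tier 3 item is kept under the current weights but dropped under the rebalanced ones, while on-topic items from both Tier 3 and unknown domains are still kept.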

Notes / guardrails

Full method + numbers: data/investigations/findings-issues-3-4-13.md.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
