stale search results due to no freshness awareness #14

@ayush00git

The search_docs tool ranks results purely by vector similarity, with no awareness of when a document was last synced. When a user asks about "current updates" or "recent activity", the retriever surfaces the most semantically similar chunks regardless of recency, so newer issues and PRs get missed in favour of older ones that happen to match better semantically.

The root cause is twofold. First, the query engine in src/rag/query.ts uses a fixed similarityTopK of 3 with no metadata filtering, so there is no way to bias results toward recently ingested or recently updated documents. Second, the ingested documents originally had no created_at, updated_at, or ingested_at fields in their metadata, which meant Qdrant had no date payload to filter or sort on even if the retriever wanted to.
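To make the two gaps concrete, here is a minimal sketch of what date-aware retrieval could look like. It is not the code in src/rag/query.ts: the `DocMetadata` shape, `buildRecencyFilter`, and `rerankByFreshness` are hypothetical names, the timestamps are assumed to be stored as Unix seconds, and the similarity scores are assumed to be non-negative. The filter object follows Qdrant's payload filter layout (`must` / `key` / `range`), built here as a plain object so the example runs without any client library.

```typescript
// Hypothetical metadata shape; the field names come from the issue text,
// the Unix-seconds representation is an assumption.
interface DocMetadata {
  created_at: number;
  updated_at: number;
  ingested_at: number;
}

interface ScoredChunk {
  text: string;
  score: number; // vector similarity, assumed non-negative, higher is better
  metadata: DocMetadata;
}

const SECONDS_PER_DAY = 86400;

// Builds a Qdrant-style payload filter that keeps only points whose
// updated_at is within the last maxAgeDays.
function buildRecencyFilter(nowSeconds: number, maxAgeDays: number) {
  const cutoff = nowSeconds - maxAgeDays * SECONDS_PER_DAY;
  return { must: [{ key: "updated_at", range: { gte: cutoff } }] };
}

// Re-ranks candidates by similarity multiplied by an exponential
// time decay: a chunk loses half its weight every halfLifeDays.
function rerankByFreshness(
  chunks: ScoredChunk[],
  nowSeconds: number,
  halfLifeDays = 30,
  topK = 3,
): ScoredChunk[] {
  const halfLifeSeconds = halfLifeDays * SECONDS_PER_DAY;
  return chunks
    .map((chunk) => {
      const ageSeconds = Math.max(0, nowSeconds - chunk.metadata.updated_at);
      const freshness = Math.pow(0.5, ageSeconds / halfLifeSeconds);
      return { chunk, blended: chunk.score * freshness };
    })
    .sort((a, b) => b.blended - a.blended)
    .slice(0, topK)
    .map((entry) => entry.chunk);
}
```

One possible use is to fetch more than three candidates from the vector store, then pass them through `rerankByFreshness` so that the final top 3 favours recently updated documents. A hard filter and a soft decay behave differently: the filter drops old documents entirely, while the decay only demotes them.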

The consequence is that a user asking about open issues or the latest PRs could get confidently wrong answers: the system would cite old closed issues or outdated PR states without any indication that the data might be stale. There was also no mechanism to tell the user how long ago a given issue or PR was last synced, or to prompt them to re-sync if newer activity existed on GitHub.
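The missing staleness signal could be sketched roughly as follows. This is only an illustration under assumptions: `describeStaleness`, the `StalenessInfo` shape, and the 24-hour default threshold are hypothetical, and `ingested_at` is assumed to be stored as Unix seconds.

```typescript
interface StalenessInfo {
  ageHours: number;
  isStale: boolean;
  notice: string;
}

// Turns a document's last sync time into a short notice that can be
// appended to an answer. staleAfterHours is an arbitrary example threshold.
function describeStaleness(
  ingestedAtSeconds: number,
  nowSeconds: number,
  staleAfterHours = 24,
): StalenessInfo {
  const ageHours = Math.max(
    0,
    Math.floor((nowSeconds - ingestedAtSeconds) / 3600),
  );
  const isStale = ageHours >= staleAfterHours;
  const notice = isStale
    ? `Last synced ${ageHours} hours ago; newer activity may exist on GitHub. Consider re-syncing.`
    : `Last synced ${ageHours} hours ago.`;
  return { ageHours, isStale, notice };
}
```

Calling this with each cited chunk's `ingested_at` would let the response show how old the cited data is and suggest a re-sync once the threshold is crossed.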
