Skip to content

[FEATURE] Semantic vector search via semantic_query MCP tool #11

Description

@Wolfvin

Problem

Current search is regex/BM25 — purely lexical. If an agent is looking for 'authentication logic' but the actual function is named verify_jwt_claims, grep and symbol search will miss it. Agents end up reading many files manually to find the right entry point.

Proposed Change

Add a semantic_query MCP tool:

semantic_query(query='user authentication flow', top_k=10)

Returns ranked symbols/files with similarity scores.

Implementation Options

Option A (zero-dependency): TF-IDF over docstrings + symbol names + surrounding comments. Fast, no model needed, good enough for most cases.

Option B (local embeddings): Use sentence-transformers with a small model (all-MiniLM-L6, ~80MB). Store 384-dim float32 vectors in SQLite blob column. Cosine sim at query time.

Option C (code-specific): Use nomic-embed-code or similar code embedding model.

Recommend starting with Option A as non-breaking enhancement, Option B as optional install.

Why This Matters

Agents spend most tokens on exploration when they don't know where something is. Semantic search collapses 'find the right file' from 5-10 read calls to 1 query.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions