Summary
CodeLens scan is single-threaded. Add --jobs N parallelism, regex prefilter (skip files that cannot match), AST disk cache (avoid re-parsing across commands), and optional shared daemon (1 process per project, N MCP clients). Target: 2-4x speedup on 8-core for scan, <2x slowdown for rule matching vs search regex.
Worker consensus (6 reports)
| Worker | Source | Contribution |
|---|---|---|
| CodeGraph | update!/CodeLens_CodeGraph_Upgrade_Analysis.md #5 | Worker-thread pool: query pool (CPU-heavy codelens_explore in worker threads) + parse pool (ProcessPoolExecutor for tree-sitter, not thread-safe). 2-4x speedup on 8-core. CODELENS_QUERY_POOL_SIZE / CODELENS_PARSE_WORKERS env vars. |
| CodeGraph | same file #4 | Shared daemon architecture: 1 detached codelens serve --mcp --daemon per project root, N concurrent MCP clients over Unix socket. 1 watcher, 1 SQLite WAL writer, 1 tree-sitter warm-up. Idle timeout 300s. |
| CodeGraph | same file #6 | Watchdog stack: PPID watchdog (orphan detection), liveness watchdog (heartbeat), stale stdin teardown. |
| Opengrep | update!/CodeLens_Opengrep_Upgrade_Analysis.md #59 | Parallelism / --jobs N for scan. Python multiprocessing.Pool (tree-sitter not thread-safe). |
| Semgrep | update!/CodeLens_Upgrade_Issues_from_Semgrep.md CL-009 | Pre-filtering optimization: derive a "fast regex" from each rule pattern (e.g. eval(...) → eval), run ripgrep to filter candidate files. Skips 90%+ of files in <1s. 3x speedup. --no-prefilter to disable. |
| Semgrep | same file CL-020 | Disk cache for AST parse results: ~/.codelens/cache/ keyed by SHA-256 of file content + parser version. Auto-evict >30 days. codelens cache clear / codelens cache stats. --no-cache for benchmarks. |
| UBS | update!/CodeLens_UBS_Upgrade_Analysis.md #21 | --jobs=N (0=auto, 1=deterministic for CI, 16=explicit) + --only=LANG filter (only scan Python+Rust). 2-4x speedup on multi-core. |
| RepoAudit | update!/CodeLens_Upgrade_Issues_from_RepoAudit.md CL-042 | LLM response cache + token cost tracking (related: disk cache pattern reused for LLM). |
Proposed phased scope
Phase 1 — Regex prefilter (P1, 1-2 weeks, quick win)
- New scripts/prefilter.py
- Analyze each rule pattern at load time, extract literal tokens (identifiers, strings)
- Build prefilter regex from tokens
- Run ripgrep subprocess to filter candidate files before AST parse
- Stats in output: {prefilter: {total_files, passed, skipped, time_ms}}
- --no-prefilter flag to disable
- Target: 3x speedup on 5000-file repo with 100+ rules
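The prefilter steps above can be sketched as follows. This is a minimal illustration and not the planned scripts/prefilter.py: the helper names (literal_tokens, build_prefilter, prefilter_files) and the `$NAME` metavariable syntax are assumptions, and the file check uses Python's `re` in-process so the example stays self-contained, where the proposal calls for a ripgrep subprocess.

```python
from __future__ import annotations

import re
import time
from pathlib import Path

# Identifier-like literals of 3+ characters; shorter ones filter too little.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")
# Assumed metavariable syntax ($X, $CMD): these match anything, so drop them.
_METAVAR = re.compile(r"\$[A-Z_][A-Z0-9_]*")


def literal_tokens(pattern: str) -> list[str]:
    """Extract literal identifiers from a rule pattern, e.g. 'eval(...)' -> ['eval']."""
    return _TOKEN.findall(_METAVAR.sub(" ", pattern))


def build_prefilter(patterns: list[str]) -> re.Pattern | None:
    """Build one regex that any file matchable by at least one rule must satisfy.

    Returns None when some rule has no literal token: that rule could match
    any file, so skipping files would be unsound.
    """
    chosen = set()
    for pattern in patterns:
        tokens = literal_tokens(pattern)
        if not tokens:
            return None
        chosen.add(max(tokens, key=len))  # one required token per rule
    if not chosen:
        return None
    return re.compile("|".join(re.escape(t) for t in sorted(chosen)))


def prefilter_files(files: list[str], patterns: list[str]):
    """Return (candidate files, stats dict shaped like the proposed output)."""
    regex = build_prefilter(patterns)
    start = time.perf_counter()
    passed = [
        f for f in files
        if regex is None or regex.search(Path(f).read_text(errors="ignore"))
    ]
    stats = {
        "total_files": len(files),
        "passed": len(passed),
        "skipped": len(files) - len(passed),
        "time_ms": round((time.perf_counter() - start) * 1000, 3),
    }
    return passed, {"prefilter": stats}
```

The key property is that the prefilter is conservative: it may pass files that no rule ends up matching, but it should never skip a file that a rule could match. Swapping the in-process search for ripgrep would mean passing the same regex to `rg --files-with-matches` and reading the candidate list from its output.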
Phase 2 — --jobs N parallelism (P1, 1-2 weeks)
- concurrent.futures.ProcessPoolExecutor for CPU-bound parse (tree-sitter not thread-safe)
- --jobs N flag (0=auto-detect cpu_count, 1=single-threaded for CI determinism, N=explicit)
- JOBS env var
- Worker entry point takes (task_id, file_path, language), returns (task_id, ExtractionResult)
- --only=LANG[,LANG,...] filter (skip irrelevant parsers)
- Per-worker recycle (WASM memory grows but never shrinks)
- 2-stage retry (fresh worker, then comment-stripped)
- Target: 2-4x speedup on 8-core
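A rough sketch of how the --jobs resolution and the worker entry point could fit together. Several details are assumptions made for illustration: the flag taking precedence over the JOBS env var, the function names (resolve_jobs, parse_task, scan), and a plain dict standing in for ExtractionResult. The tree-sitter parse is replaced by a byte count so the example runs without extra dependencies.

```python
from __future__ import annotations

import os
from concurrent.futures import ProcessPoolExecutor


def resolve_jobs(flag: int | None, env: dict[str, str] | None = None) -> int:
    """Resolve the worker count: --jobs flag first, then JOBS, then auto (0)."""
    env = dict(os.environ) if env is None else env
    value = flag if flag is not None else int(env.get("JOBS", "0"))
    if value < 0:
        raise ValueError("--jobs must be >= 0")
    return value if value > 0 else (os.cpu_count() or 1)


def parse_task(task: tuple[int, str, str]) -> tuple[int, dict]:
    """Worker entry point: (task_id, file_path, language) -> (task_id, result).

    The dict is a stand-in for ExtractionResult; a real worker would run the
    tree-sitter parser for `language` here.
    """
    task_id, file_path, language = task
    with open(file_path, "rb") as fh:
        data = fh.read()
    return task_id, {"path": file_path, "language": language, "bytes": len(data)}


def scan(tasks: list[tuple[int, str, str]], jobs: int) -> list[tuple[int, dict]]:
    """Run all tasks and return results ordered by task_id."""
    if jobs == 1:
        # Single-threaded path: no pool, same process, reproducible for CI.
        results = [parse_task(t) for t in tasks]
    else:
        with ProcessPoolExecutor(max_workers=jobs) as pool:
            results = list(pool.map(parse_task, tasks))
    return sorted(results, key=lambda r: r[0])
```

Sorting by task_id keeps the output order independent of which worker finishes first. For the per-worker recycle bullet, Python 3.11+ offers a `max_tasks_per_child` argument on ProcessPoolExecutor, which may be one way to bound memory growth in long-lived workers. When calling scan with jobs > 1 from a script, the call should sit under an `if __name__ == "__main__":` guard so that spawned workers can import the module safely.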
Phase 3 — AST disk cache (P1, 1 week)
- New scripts/disk_cache.py
- Cache at ~/.codelens/cache/ keyed by SHA-256 of file content + parser version
- Store pickled AST
- Auto-evict entries >30 days
- codelens cache clear / codelens cache stats commands
- --no-cache flag for benchmarks
- Hit ratio exposed in output
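One possible shape for the cache described above, as a sketch under stated assumptions. The class name DiskCache, its method names, the `.pkl` file suffix, and the use of file mtime for eviction are all hypothetical; the proposal only fixes the key (SHA-256 of content + parser version), the pickled payload, and the 30-day eviction.

```python
from __future__ import annotations

import hashlib
import pickle
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 24 * 3600  # auto-evict entries older than 30 days


class DiskCache:
    def __init__(self, root: Path, parser_version: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.parser_version = parser_version
        self.hits = 0
        self.misses = 0

    def _path(self, content: bytes) -> Path:
        # Key covers both parser version and file content, so a parser
        # upgrade invalidates old entries without an explicit clear.
        digest = hashlib.sha256()
        digest.update(self.parser_version.encode("utf-8"))
        digest.update(b"\0")
        digest.update(content)
        return self.root / f"{digest.hexdigest()}.pkl"

    def get_or_parse(self, content: bytes, parse):
        """Return the cached AST for `content`, calling parse(content) on a miss."""
        path = self._path(content)
        if path.exists():
            self.hits += 1
            return pickle.loads(path.read_bytes())
        self.misses += 1
        ast = parse(content)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(pickle.dumps(ast))
        tmp.replace(path)  # rename into place so readers never see a partial file
        return ast

    def evict_stale(self, now: float | None = None) -> int:
        """Delete entries not modified within MAX_AGE_SECONDS; return the count."""
        now = time.time() if now is None else now
        removed = 0
        for entry in self.root.glob("*.pkl"):
            if now - entry.stat().st_mtime > MAX_AGE_SECONDS:
                entry.unlink()
                removed += 1
        return removed

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because `pickle.loads` can execute arbitrary code, this design is only appropriate while the cache directory is writable solely by the current user. Whether real tree-sitter trees can be pickled directly is not established here; the stored object may need to be an extracted, plain-data form of the AST.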
Phase 4 — Shared daemon (P2, 3-4 weeks, depends on Phase 2)
- codelens serve --mcp --daemon: detached process per project root
- Unix-domain socket (Linux/macOS) or named pipe (Windows)
- N concurrent MCP clients share 1 engine (1 watcher, 1 SQLite WAL writer, 1 tree-sitter warm-up)
- Daemon registry in ~/.codelens/daemons/ keyed by SHA-256 of project root path
- codelens daemons command (list/stop)
- Idle timeout 300s
- CODELENS_NO_DAEMON=1 opt-out
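The registry keying and the opt-out could look roughly like the sketch below. The function names and the `.sock` file suffix are assumptions for illustration; only the registry directory, the SHA-256-of-root key, and the CODELENS_NO_DAEMON variable come from the proposal.

```python
from __future__ import annotations

import hashlib
import os
from pathlib import Path

REGISTRY = Path.home() / ".codelens" / "daemons"


def daemon_key(project_root: str | os.PathLike) -> str:
    """SHA-256 of the resolved project root, so equivalent paths share a daemon."""
    resolved = Path(project_root).resolve()
    return hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()


def daemon_socket_path(project_root, registry=REGISTRY) -> Path:
    """Where a client would look for this project's daemon socket (assumed layout)."""
    return Path(registry) / f"{daemon_key(project_root)}.sock"


def should_use_daemon(env: dict[str, str] | None = None) -> bool:
    """Honor the CODELENS_NO_DAEMON=1 opt-out."""
    env = dict(os.environ) if env is None else env
    return env.get("CODELENS_NO_DAEMON") != "1"
```

Resolving the path before hashing means `proj` and `proj/` map to the same daemon. One design point worth checking early: Unix-domain socket paths are limited to roughly 100 bytes on Linux and macOS, and a full 64-character hex digest under a long home directory can exceed that, so a truncated digest or a shorter runtime directory may be needed.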
Phase 5 — Watchdog stack (P2, 2 weeks, depends on Phase 4)
- PPID watchdog: orphan detection via os.getppid() polling (POSIX) or parent liveness (Windows)
- Liveness watchdog: separate process, parent writes heartbeat byte to child's stdin every 1s, child SIGKILLs parent if no byte within 30s
- Stale stdin teardown: listen for stdin error event, destroy stream on terminal event
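The timing logic behind the first two watchdogs can be sketched as below. This only shows the decision checks, not process spawning, stdin plumbing, or signal delivery; HeartbeatMonitor and parent_is_gone are hypothetical names, and the injectable clock exists purely to make the timeout testable.

```python
from __future__ import annotations

import os
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports when the timeout has elapsed."""

    def __init__(self, timeout_s: float = 30.0, clock=time.monotonic):
        self._clock = clock
        self._timeout = timeout_s
        self._last = clock()

    def beat(self) -> None:
        # Call whenever a heartbeat byte arrives on stdin.
        self._last = self._clock()

    def expired(self) -> bool:
        # True once more than timeout_s has passed since the last beat.
        return self._clock() - self._last > self._timeout


def parent_is_gone(original_ppid: int) -> bool:
    """POSIX orphan check: an orphaned process is re-parented, so getppid() changes.

    On Windows the parent id does not change this way, which is why the
    proposal calls for a separate parent-liveness check there.
    """
    return os.getppid() != original_ppid
```

A watchdog loop would record `os.getppid()` at startup, poll `parent_is_gone` on an interval, and call `beat()` from the stdin reader. Using a monotonic clock avoids false timeouts when the wall clock is adjusted.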
Acceptance criteria
Related