Follow-up from #127 (opt-in URL dedup cache, Codex review round-5).
Current behavior: the cache dedupes only against already-persisted captures. Two identical /web/capture requests that both arrive before either one persists doc-index.json miss the lookup and run separate browser captures, producing two doc_ids. The result is correct (no wrong data), but the dedup is ineffective for a concurrent multi-agent burst, which degrades to the pre-cache behavior.
Proposal: serialize cache misses per composite key (url + content_type + extract_tables + lang) with a per-key async lock, and recheck the cache under that lock before launching the browser, mirroring the docreader's per-doc_id WeakValueDictionary lock registry.
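A minimal sketch of the proposed shape, assuming an asyncio service. It is not the docreader's actual registry: the names `capture_deduped`, `_lock_for`, `lookup`, and `browse_and_persist` are hypothetical stand-ins for the real cache lookup and browser capture, and the key layout is only what the proposal lists.

```python
import asyncio
import weakref
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical composite key: (url, content_type, extract_tables, lang).
CacheKey = Tuple[str, str, bool, str]

# Weak registry: an entry disappears once no coroutine still holds its lock.
_key_locks: "weakref.WeakValueDictionary[CacheKey, asyncio.Lock]" = (
    weakref.WeakValueDictionary()
)


def _lock_for(key: CacheKey) -> asyncio.Lock:
    """Return the shared lock for this key, creating it on first use.

    There is no await between the get and the set, so within a single
    event loop two coroutines cannot create different locks for one key.
    """
    lock = _key_locks.get(key)
    if lock is None:
        lock = asyncio.Lock()
        _key_locks[key] = lock
    return lock


async def capture_deduped(
    key: CacheKey,
    lookup: Callable[[CacheKey], Optional[str]],
    browse_and_persist: Callable[[CacheKey], Awaitable[str]],
) -> str:
    """Return a doc_id, running at most one capture per key at a time."""
    hit = lookup(key)  # fast path: already persisted, no lock needed
    if hit is not None:
        return hit

    lock = _lock_for(key)  # local strong reference keeps the entry alive
    async with lock:
        hit = lookup(key)  # recheck: a concurrent request may have persisted
        if hit is not None:
            return hit
        # The lock is held across the slow browse+persist on purpose, so
        # waiters for the same key find the persisted capture on recheck.
        return await browse_and_persist(key)
```

In this sketch, requests for different keys never block each other, and a burst of identical requests queues behind the first capture and then returns its doc_id from the recheck. It assumes `browse_and_persist` updates whatever `lookup` reads before it returns; if the capture raises, the next waiter retries the capture itself.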
Why deferred: sequential repeats (capture now, re-request later) are the common case and already hit the cache; the concurrent-burst window is narrow, and a correct keyed-lock registry held across the slow browse+persist is non-trivial. Not blocking the opt-in v1.
Scope: services/browser/mantisfetch_browser/__init__.py capture() cache-miss path.