Summary
StateClient.get(max_count=N, metadata=...) materializes the entire matching history on every call (just to sort by timestamp and slice the newest N). Called frequently on a large collection, this causes unbounded process-memory growth.
Observed in the plantbot exhibition deployment: the agent became unresponsive (and the Raspberry Pi host unreachable) ~1h after every boot. main.py grew to ~2.3 GB RSS (~1.9 GB anonymous) and exhausted RAM + swap, leading to thrashing / hang. All modules run as threads in one process (Agent.start), so they share that heap.
Root cause
get() first calls collection.get(include=['metadatas'], where=metadata) with no limit, materializing the ids and metadata of every matching row, sorts them client-side, then fetches the full data for only the top N. The first pass is O(collection size) per call. conversation_prompter calls it every 1 s for 3 kinds; the collection has 8k+ rows and grows monotonically (it persists across restarts), so each call materializes an ever-larger set and RSS ratchets up (CPython and glibc rarely return freed heap to the OS).
This continues the work in ccf4313 (optimize state get function), which removed the heavy embeddings/documents from the first pass but left the full metadata scan.
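To make the cost concrete, here is a rough reconstruction of the two-pass pattern described above. It is a sketch, not the actual StateClient code: CountingCollection is a hypothetical in-memory stand-in (not chromadb), and get_newest_full_scan is an illustrative name. The stand-in only records how many rows each get() returns.

```python
class CountingCollection:
    """Hypothetical in-memory stand-in; records how many rows each get() returns."""

    def __init__(self, rows):
        # Each row is a dict: {"id": str, "metadata": dict, "document": str}
        self._rows = list(rows)
        self.rows_returned = []

    def get(self, ids=None, where=None, include=("metadatas",)):
        rows = self._rows
        if ids is not None:
            wanted = set(ids)
            rows = [r for r in rows if r["id"] in wanted]
        if where:
            rows = [r for r in rows
                    if all(r["metadata"].get(k) == v for k, v in where.items())]
        self.rows_returned.append(len(rows))
        result = {"ids": [r["id"] for r in rows]}
        if "metadatas" in include:
            result["metadatas"] = [r["metadata"] for r in rows]
        if "documents" in include:
            result["documents"] = [r["document"] for r in rows]
        return result


def get_newest_full_scan(collection, max_count, where):
    """Two-pass pattern as described: unbounded metadata scan, then fetch top N."""
    # Pass 1: no limit, so every matching row's id + metadata is materialized.
    scan = collection.get(where=where, include=("metadatas",))
    ranked = sorted(zip(scan["ids"], scan["metadatas"]),
                    key=lambda pair: pair[1]["timestamp"], reverse=True)
    top_ids = [row_id for row_id, _ in ranked[:max_count]]
    # Pass 2: full data for the top N only.
    return collection.get(ids=top_ids, include=("metadatas", "documents"))


rows = [{"id": f"s{i}", "metadata": {"kind": "chat", "timestamp": i},
         "document": f"state {i}"} for i in range(8000)]
collection = CountingCollection(rows)
newest = get_newest_full_scan(collection, max_count=5, where={"kind": "chat"})
print(collection.rows_returned)  # [8000, 5]: pass 1 touches every matching row
```

Pass 2 is already bounded by max_count; only pass 1 scales with the collection.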
Evidence (chromadb 0.5.23, collection = 8059 rows)
- Looping the current get() (3 kinds x 150 = 450 calls) -> +67 MB, monotonic, no plateau (~0.15 MB/call, ~0.45 MB per round of 3 kinds). At 1 Hz that is ~27 MB/min ~= 1.6 GB/hr, consistent with the ~1h-to-OOM timeline.
- gc.collect() reclaims most of it -> allocation churn outpacing GC, not a C-level leak.
- A bounded fetch (limit=10) -> +0 MB.
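For anyone wanting to repeat this kind of check, a minimal sketch of a growth probe follows. It is not the harness used for the numbers above: it uses tracemalloc, which tracks Python-level allocations rather than RSS, and measure_growth and make_cycle are hypothetical names. It reports growth before and after gc.collect(), which is the distinction the second bullet relies on.

```python
import gc
import tracemalloc


def measure_growth(fn, calls):
    """Call fn() `calls` times; return (growth, growth_after_gc) in bytes."""
    gc.collect()  # start from a clean slate so earlier garbage is not counted
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(calls):
            fn()
        grown, _ = tracemalloc.get_traced_memory()
        gc.collect()  # frees anything held only by reference cycles
        settled, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return grown - baseline, settled - baseline


def make_cycle():
    # Hypothetical workload: a 100 kB buffer kept alive only by a self-reference.
    node = {"payload": bytearray(100_000)}
    node["self"] = node


growth, growth_after_gc = measure_growth(make_cycle, calls=20)
print(f"before gc.collect(): {growth / 1e6:.1f} MB, "
      f"after: {growth_after_gc / 1e6:.1f} MB")
```

If most of the growth disappears after gc.collect(), the memory was held by collectable garbage rather than leaked at the C level.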
Proposed fix
When max_count is set, avoid the full scan. States are appended in time order, so the newest rows sit at the tail in insertion order: fetch a small tail window via offset/limit (a few x max_count), then sort by timestamp and slice. Validated against this collection: it returns exactly the same newest N as today, and drops growth from +67 MB to +0.3 MB over the same 450 calls.
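A minimal sketch of the tail-window idea, under stated assumptions: FakeCollection is a hypothetical in-memory stand-in that exposes count() and get(limit=, offset=) in insertion order, newest_ids and window_factor are illustrative names, and the metadata filter is left out for brevity. It is not the code in this PR.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in exposing count() and get(limit=, offset=)."""

    def __init__(self):
        self._rows = []  # (id, metadata) tuples in insertion order

    def add(self, row_id, metadata):
        self._rows.append((row_id, metadata))

    def count(self):
        return len(self._rows)

    def get(self, limit=None, offset=0):
        end = None if limit is None else offset + limit
        window = self._rows[offset:end]
        return {"ids": [row_id for row_id, _ in window],
                "metadatas": [meta for _, meta in window]}


def newest_ids(collection, max_count, window_factor=4):
    """Return ids of the newest max_count rows, reading only a tail window."""
    window = max_count * window_factor
    offset = max(0, collection.count() - window)
    tail = collection.get(limit=window, offset=offset)
    ranked = sorted(zip(tail["ids"], tail["metadatas"]),
                    key=lambda pair: pair[1]["timestamp"], reverse=True)
    return [row_id for row_id, _ in ranked[:max_count]]


collection = FakeCollection()
for i in range(8000):
    collection.add(f"s{i}", {"timestamp": i})
# One row arrives slightly out of order; sorting the window still places it.
collection.add("late", {"timestamp": 7998.5})
print(newest_ids(collection, max_count=3))  # ['s7999', 'late', 's7998']
```

The window factor is the safety margin: a row inserted further out of order than the window would be missed. With a metadata filter, the tail offset would have to be computed from the number of matching rows, which this sketch does not attempt.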
Related (separate issue, not this PR)
Each StateClient builds its own DefaultEmbeddingFunction (all-MiniLM ONNX) in-process (~842 MB measured); multiple modules in one process multiply this. Will file separately.