Summary
Every `StateClient` builds its own embedding function in `__init__`. With the default `DefaultEmbeddingFunction`, this loads the all-MiniLM ONNX model and onnxruntime in-process. Because `Agent.start()` runs each module as a thread in one process, and `_run_module_process` creates a fresh `StateClient` per module, a single agent ends up with N copies of the embedding stack in one heap.
This is the dominant baseline memory cost (separate from the per-call leak in #56). On the plantbot deployment (Raspberry Pi 4, ~8 modules) it is a large part of why `main.py` sits at multiple GB.
Running ~8 module `StateClient`s in one process multiplies the 842 MB RSS baseline measured for a single default-embedder `StateClient` (chromadb 0.5.23).
The local CPU embedding on every `add()`/`query()` also drives high CPU, contributing to codex/vision timeouts and thermal throttling on the Pi.
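The duplication can be reproduced without chromadb. In this sketch, `FakeEmbedder`, `run_module` and `start_agent` are hypothetical stand-ins for the real embedding function, `Agent._run_module_process` and `Agent.start()`, and `StateClient` is reduced to the one behaviour described above: it builds its embedder in `__init__`.

```python
import threading
from typing import List


class FakeEmbedder:
    """Hypothetical stand-in for an expensive embedder such as the default
    MiniLM one; it only counts how many times it has been "loaded"."""

    loads = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        with FakeEmbedder._lock:
            FakeEmbedder.loads += 1  # real code would load ONNX weights here


class StateClient:
    """Simplified client: builds its own embedder in __init__."""

    def __init__(self) -> None:
        self.embedder = FakeEmbedder()


def run_module(clients: List[StateClient], lock: threading.Lock) -> None:
    # Mirrors the per-module pattern: every module thread gets a fresh client.
    client = StateClient()
    with lock:
        clients.append(client)


def start_agent(n_modules: int) -> List[StateClient]:
    clients: List[StateClient] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=run_module, args=(clients, lock))
               for _ in range(n_modules)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return clients


clients = start_agent(8)
print(FakeEmbedder.loads)  # 8: one embedder load per module thread
```

With one client per module thread, the load count tracks the module count, which is the N-copies effect described above.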
Why it is not a trivial flag flip
The per-module `StateClient` in `Agent._run_module_process` is created with no `embedder` argument, so it always uses the default; modules cannot currently opt out of it from user code.
`embedder="none"` (`DummyEmbeddingFunction`) makes `add()` store dummy vectors, so semantic `query()`/retrieval stops working for that client. Many modules only ever call `get()` (recency) and never `query()`, but some (e.g. deliberate-recall) need real embeddings. So this needs a design decision, not a blanket switch.
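To see why dummy vectors break `query()`, assume (hypothetically; the real `DummyEmbeddingFunction` may use different placeholder values) that every text maps to the same fixed vector. Every stored document is then exactly as similar to every query, so a nearest-neighbour ranking has nothing to rank by. `dummy_embed` and `cosine` are illustrative helpers, not project code.

```python
import math
from typing import List, Sequence


def dummy_embed(texts: Sequence[str], dim: int = 4) -> List[List[float]]:
    # Assumption: one fixed vector for every input, i.e. no semantic signal.
    return [[1.0] * dim for _ in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


docs = ["watered the basil", "camera saw a cat", "soil moisture low"]
doc_vecs = dummy_embed(docs)
query_vec = dummy_embed(["when did I water the plants?"])[0]

scores = [cosine(query_vec, v) for v in doc_vecs]
print(scores)  # [1.0, 1.0, 1.0]: every document ties with the query
```

Recency reads via `get()` never compare vectors, which is why the `get()`-only modules are the natural candidates for `embedder="none"`, while deliberate-recall is not.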
Possible directions (for discussion)
Share one embedding function per process (a module-level singleton keyed by embedder config) so N `StateClient`s reuse one model instead of N.
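A minimal sketch of the shared-embedder direction, assuming the embedder config can be reduced to a hashable key such as the embedder name. `get_shared_embedder`, `_EMBEDDERS` and the `factory` parameter are hypothetical; in the real code the factory would build whatever `StateClient.__init__` builds today.

```python
import threading
from typing import Callable, Dict, Hashable, List

# Hypothetical process-wide cache: one embedder instance per config key.
_EMBEDDERS: Dict[Hashable, object] = {}
_EMBEDDERS_LOCK = threading.Lock()


def get_shared_embedder(config: Hashable, factory: Callable[[], object]) -> object:
    """Return the process-wide embedder for `config`, building it at most once."""
    with _EMBEDDERS_LOCK:
        if config not in _EMBEDDERS:
            # Built under the lock, so concurrent module threads wait for this
            # single load instead of each loading their own copy.
            _EMBEDDERS[config] = factory()
        return _EMBEDDERS[config]


class StateClient:
    def __init__(self, embedder_config: Hashable,
                 factory: Callable[[], object]) -> None:
        # Reuse the process-wide instance instead of building a private one.
        self.embedder = get_shared_embedder(embedder_config, factory)


loads: List[str] = []


def load_minilm() -> object:
    loads.append("minilm")  # real code would construct the ONNX-backed embedder
    return object()


clients = [StateClient("default", load_minilm) for _ in range(8)]
print(len(loads))  # 1: eight clients, one load
```

Holding the lock while the factory runs is deliberate: module threads that start together wait for the first load rather than racing to build duplicates. A shared instance is then called from several module threads at once, so it would need to be safe for concurrent use; that is worth confirming for the embedding functions in use before adopting this.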
Lazy-load the embedder on first `add()`/`query()`, so read-only (`get()`-only) modules never pay for it.
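A sketch of lazy loading, under the same caveats: `LazyStateClient`, its in-memory `_rows` list (a stand-in for the collection) and the toy embedder in the usage lines are hypothetical. The embedder is built on the first `add()` or `query()`; `get()` never builds it.

```python
import threading
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Embedder = Callable[[Sequence[str]], List[Vector]]


class LazyStateClient:
    """Hypothetical client that defers building its embedder until needed."""

    def __init__(self, embedder_factory: Callable[[], Embedder]) -> None:
        self._factory = embedder_factory
        self._embedder: Optional[Embedder] = None
        self._lock = threading.Lock()
        self._rows: List[Tuple[str, Vector]] = []  # stand-in for the collection

    def _embedder_or_load(self) -> Embedder:
        # Double-checked locking: build at most once, even across threads.
        if self._embedder is None:
            with self._lock:
                if self._embedder is None:
                    self._embedder = self._factory()
        return self._embedder

    def get(self, n: int) -> List[str]:
        # Recency read: returns the last n texts without touching the embedder.
        return [text for text, _ in self._rows[-n:]] if n > 0 else []

    def add(self, text: str) -> None:
        vector = self._embedder_or_load()([text])[0]
        self._rows.append((text, vector))

    def query(self, text: str) -> Optional[str]:
        # Nearest stored text by squared Euclidean distance.
        q = self._embedder_or_load()([text])[0]
        if not self._rows:
            return None
        return min(self._rows,
                   key=lambda row: sum((a - b) ** 2 for a, b in zip(row[1], q)))[0]


loads: List[int] = []


def toy_factory() -> Embedder:
    loads.append(1)  # real code would construct the heavy embedder here
    return lambda texts: [[float(len(t))] for t in texts]


reader = LazyStateClient(toy_factory)
print(reader.get(5), len(loads))  # [] 0: a get()-only client never loads

writer = LazyStateClient(toy_factory)
writer.add("watered the basil")
writer.query("basil")
print(len(loads))  # 1: loaded once, on the first add()
```

Lazy loading and the shared cache can be combined: the factory handed to each client could itself be the shared lookup, so only the first writer in the process pays for the model and read-only modules pay nothing.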
Let modules inherit the agent's `state_embedder` choice (thread it through `Agent.start` -> `_run_module_process`), so a deployment can pick `none`/`gemma`/`openai` globally.
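A sketch of threading the choice through, assuming `Agent` holds a `state_embedder` string and `StateClient` accepts it as `embedder`. The constructor signatures, the module callables and the `seen` recorder are hypothetical; only the names `Agent.start`, `_run_module_process` and `state_embedder` come from this issue.

```python
import threading
from typing import Callable, Dict, List


class StateClient:
    def __init__(self, embedder: str = "default") -> None:
        # Stand-in: the real client would build the matching embedding function.
        self.embedder = embedder


ModuleFn = Callable[[StateClient], None]


class Agent:
    def __init__(self, modules: Dict[str, ModuleFn],
                 state_embedder: str = "default") -> None:
        self.modules = modules
        self.state_embedder = state_embedder

    def _run_module_process(self, fn: ModuleFn, state_embedder: str) -> None:
        # The per-module client now receives the agent-level choice
        # instead of silently falling back to the default.
        fn(StateClient(embedder=state_embedder))

    def start(self) -> None:
        threads: List[threading.Thread] = [
            threading.Thread(target=self._run_module_process,
                             args=(fn, self.state_embedder), name=name)
            for name, fn in self.modules.items()
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # joined only so this sketch is deterministic


seen: Dict[str, str] = {}
seen_lock = threading.Lock()


def make_module(name: str) -> ModuleFn:
    def module(client: StateClient) -> None:
        with seen_lock:
            seen[name] = client.embedder
    return module


agent = Agent({name: make_module(name) for name in ("vision", "deliberate-recall")},
              state_embedder="none")
agent.start()
print(seen)  # every module's client was built with the agent's choice
```

As noted above for `embedder="none"`, a single global choice still reaches modules such as deliberate-recall that need real embeddings, so this direction complements the shared and lazy embedder ideas rather than replacing them.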
Evidence (chromadb 0.5.23)
`StateClient` (default embedder) -> 842 MB RSS.
`embedder="none"` -> 93 MB RSS.
Filed as the follow-up promised in #57.