
Each StateClient loads its own embedding model in-process (~842MB) -> multiplied across module threads #58

Description

@atsmsmr

Summary

Every StateClient builds its own embedding function in __init__. With the default DefaultEmbeddingFunction, this loads the all-MiniLM-L6-v2 ONNX model plus onnxruntime in-process. Because Agent.start() runs each module as a thread in one process, and _run_module_process creates a fresh StateClient per module, a single agent with N modules ends up with N copies of the embedding stack in one heap.

This is the dominant baseline memory cost (separate from the per-call leak in #56). On the plantbot deployment (Raspberry Pi 4, ~8 modules) it is a large part of why main.py sits at multiple GB.

Evidence (chromadb 0.5.23)

  • Creating one StateClient (default embedder) -> 842 MB RSS.
  • A client with embedder="none" -> 93 MB RSS.
  • ~8 module StateClients in one process multiply the 842 MB baseline.
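
For anyone who wants to reproduce numbers like these, here is a minimal sketch of a measurement helper. It is not the script used for the figures above: it reports growth in peak RSS via the standard-library resource module (Unix only), and `build` is a hypothetical stand-in for whatever constructs the client, since the StateClient constructor signature is not shown in this issue.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def rss_cost_mb(build) -> float:
    """Return how much peak RSS grew while build() ran.

    `build` is a hypothetical zero-argument callable; in this project it
    would wrap the StateClient constructor.
    """
    before = peak_rss_mb()
    obj = build()  # keep a reference so the object is alive when we measure
    after = peak_rss_mb()
    del obj
    return after - before
```

Run each variant in a fresh process, for example `rss_cost_mb(lambda: make_client())` with a hypothetical `make_client`, because peak RSS never decreases and a second measurement in the same process would understate the cost.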

Running the embedding locally on the CPU for every add()/query() also drives high CPU usage, which contributes to codex/vision timeouts and thermal throttling on the Pi.

Why it is not a trivial flag flip

  • The per-module StateClient in Agent._run_module_process is created with no embedder argument, so it always uses the default. Modules therefore cannot currently opt out from user code.
  • embedder="none" (DummyEmbeddingFunction) makes add() store dummy vectors, so semantic query() / retrieval stops working for that client. Many modules only ever call get() (recency) and never query(), but some (e.g. deliberate-recall) need real embeddings. So this needs a design decision, not a blanket switch.
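
To illustrate the second point, here is a small standalone sketch of why content-independent vectors break semantic retrieval. It assumes the dummy embedder returns the same fixed vector for every text; the exact vectors DummyEmbeddingFunction produces are not shown in this issue, and `dummy_embed` and the sample documents are made up for illustration.

```python
import math


def dummy_embed(texts):
    # Assumption: every text maps to the same fixed vector, whatever it says.
    return [[1.0, 0.0, 0.0] for _ in texts]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


docs = ["soil moisture is low", "camera saw a person", "battery at 40 percent"]
doc_vectors = dummy_embed(docs)
query_vector = dummy_embed(["is the plant dry?"])[0]

# Score every stored document against the query.
scores = [cosine(query_vector, v) for v in doc_vectors]

# All documents tie, so whichever one comes back "nearest" is arbitrary.
all_tied = len(set(scores)) == 1
```

Under that assumption, recency-style get() calls are unaffected because they never look at the vectors, while query() has nothing to rank on.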

Possible directions (for discussion)

  1. Share one embedding function per process (module-level singleton keyed by embedder config) so N StateClients reuse one model instead of N.
  2. Lazy-load the embedder on first add()/query(), so read-only (get()-only) modules never pay for it.
  3. Let modules inherit the agent's state_embedder choice (thread it through Agent.start -> _run_module_process), so a deployment can pick none/gemma/openai globally.
  4. Reconsider whether recency-only state should live in a vector store at all (related discussion in #56, "StateClient.get() scans the whole collection on every call -> unbounded memory growth").
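
Directions 1 and 2 could be combined roughly as follows. This is only a sketch for discussion, not existing code: `get_shared_embedder`, `LazyEmbedder`, the cache key, and `fake_factory` are all hypothetical names, and whether StateClient can accept such an object as its embedder is an assumption. The call parameter is named `input` to mirror the calling convention of chromadb embedding functions.

```python
import threading
from typing import Any, Callable, Dict, Hashable, List

# Process-wide cache: one embedder instance per configuration key.
_EMBEDDERS: Dict[Hashable, Any] = {}
_EMBEDDERS_LOCK = threading.Lock()


def get_shared_embedder(key: Hashable, factory: Callable[[], Any]) -> Any:
    """Return the embedder for `key`, calling `factory` at most once per process."""
    with _EMBEDDERS_LOCK:
        if key not in _EMBEDDERS:
            _EMBEDDERS[key] = factory()
        return _EMBEDDERS[key]


class LazyEmbedder:
    """Callable wrapper that loads the shared model only on first use."""

    def __init__(self, key: Hashable, factory: Callable[[], Any]) -> None:
        self._key = key
        self._factory = factory

    def __call__(self, input: List[str]):
        return get_shared_embedder(self._key, self._factory)(input)


# Demo with a fake model so the sketch runs without chromadb installed.
load_count = 0


def fake_factory():
    global load_count
    load_count += 1  # stands in for the expensive model load
    return lambda input: [[float(len(text))] for text in input]


# Eight "module" embedders, as in the plantbot deployment.
clients = [LazyEmbedder("default", fake_factory) for _ in range(8)]
loads_before_use = load_count  # nothing loaded yet

first = clients[0](["hi"])
second = clients[1](["hello"])
loads_after_use = load_count  # one load shared by all wrappers
```

The lock is held while the factory runs, which keeps two module threads from loading the same model concurrently at the cost of serializing loads for different keys. Get()-only modules never call the wrapper, so they would never trigger a load. Direction 3 would then reduce to choosing the key and factory from the agent's state_embedder setting.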

Filed as the follow-up promised in #57.
