Each StateClient loads its own embedding model in-process (~842MB) -> multiplied across module threads #58

## Summary
Every `StateClient` builds its own embedding function in `__init__`. With the default `DefaultEmbeddingFunction`, this loads the all-MiniLM ONNX model and onnxruntime **in-process**. Because `Agent.start()` runs each module as a **thread in one process** and `_run_module_process` creates a fresh `StateClient` per module, a single agent with N modules ends up with N copies of the embedding stack in one heap.

This is the dominant *baseline* memory cost (separate from the per-call leak in #56). On the plantbot deployment (Raspberry Pi 4, ~8 modules) it is a large part of why `main.py` sits at multiple GB.

## Evidence (chromadb 0.5.23)
- Creating **one** `StateClient` (default embedder) -> **842 MB RSS**.
- A client with `embedder="none"` -> **93 MB RSS**.
- ~8 module `StateClient`s in one process multiply the 842 MB baseline.
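
A minimal sketch of how figures like these can be gathered, using only the standard library. The helper names `peak_rss_mb` and `measure` are hypothetical and not part of the project, and the allocation in the demo is a stand-in for the real client constructor. Note that this reports *peak* RSS, which may differ from whatever tool produced the numbers above.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(factory):
    """Call factory() once and return (object, growth of peak RSS in MB)."""
    before = peak_rss_mb()
    obj = factory()
    after = peak_rss_mb()
    return obj, after - before


if __name__ == "__main__":
    # Stand-in allocation; replace the lambda with the real client
    # constructor to measure the embedding stack itself.
    _, grew = measure(lambda: list(range(2_000_000)))
    print(f"peak RSS grew by {grew:.0f} MB")
```

Running each configuration in a fresh interpreter keeps the measurements independent, since peak RSS never decreases within a process.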

Running the embedding locally on the CPU for every `add()`/`query()` also drives high CPU usage, which contributes to codex/vision timeouts and thermal throttling on the Pi.

## Why it is not a trivial flag flip
- The per-module `StateClient` in `Agent._run_module_process` is created with no embedder argument, so it always uses the default; user code currently has no way to opt a module out.
- `embedder="none"` (DummyEmbeddingFunction) makes `add()` store dummy vectors, so semantic `query()` / retrieval stops working for that client. Many modules only ever call `get()` (recency) and never `query()`, but some (e.g. deliberate-recall) need real embeddings. So this needs a design decision, not a blanket switch.
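
To illustrate why dummy vectors break semantic retrieval, here is a small self-contained sketch. `dummy_embed` is a hypothetical stand-in that assumes the dummy embedder returns the same vector for every input; the actual `DummyEmbeddingFunction` may behave differently in detail. The documents and query are made up.

```python
import math


def dummy_embed(texts):
    # Assumption: every input maps to the same placeholder vector.
    return [[1.0, 0.0, 0.0] for _ in texts]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


docs = ["water the basil", "camera offline", "soil moisture low"]
doc_vecs = dummy_embed(docs)
query_vec = dummy_embed(["is the plant thirsty?"])[0]

# Every document scores identically, so similarity ranking carries no signal.
scores = [cosine(query_vec, vec) for vec in doc_vecs]
```

Recency-style access does not depend on these vectors at all, which is why `get()`-only modules would be unaffected by such a switch.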

## Possible directions (for discussion)
1. **Share one embedding function per process** (module-level singleton keyed by embedder config) so N `StateClient`s reuse one model instead of N.
2. **Lazy-load** the embedder on first `add()`/`query()`, so read-only (`get()`-only) modules never pay for it.
3. Let modules **inherit the agent's `state_embedder`** choice (thread it through `Agent.start` -> `_run_module_process`), so a deployment can pick `none`/`gemma`/`openai` globally.
4. Reconsider whether recency-only state should live in a vector store at all (related discussion in #56).
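
Directions 1 and 2 can be combined. The following is a rough sketch under stated assumptions, not the project's implementation: `get_shared_embedder`, `LazyEmbedder`, and the `factory` callable are hypothetical names, and the wrapper's call signature is not guaranteed to match what chromadb expects from an embedding function.

```python
import threading

# Process-wide cache: one embedder instance per configuration key.
_EMBEDDERS = {}
_EMBEDDERS_LOCK = threading.Lock()


def get_shared_embedder(key, factory):
    """Return the shared embedder for `key`, building it at most once."""
    with _EMBEDDERS_LOCK:
        if key not in _EMBEDDERS:
            # The lock is held during the load, so concurrent module
            # threads wait instead of loading a second copy.
            _EMBEDDERS[key] = factory()
        return _EMBEDDERS[key]


class LazyEmbedder:
    """Callable wrapper that defers the model load until first use."""

    def __init__(self, key, factory):
        self._key = key
        self._factory = factory

    def __call__(self, texts):
        embed = get_shared_embedder(self._key, self._factory)
        return embed(texts)
```

With this shape, a client that only ever reads by recency never triggers `factory()`, and N clients that do embed share one model keyed by configuration. Holding the lock while loading is the simplest correct choice; it serializes the first load but avoids duplicate copies.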

Filed as the follow-up promised in #57.
