Problem
Basic Memory's experimental LiteLLM embedding provider currently supports provider/model selection, dimensions, role-specific input_type, batching, and request concurrency, but it does not expose an api_base / custom endpoint setting for embedding calls.
That makes it awkward or impossible to use Basic Memory with OpenAI-compatible embedding servers that are not hosted at the default provider endpoint.
A concrete example is running llama-server --embedding directly and exposing its OpenAI-compatible /v1/embeddings API. LiteLLM supports this shape by calling embeddings with an openai/... model string and an api_base, but Basic Memory does not currently provide a config field that reaches litellm.aembedding(...).
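For illustration, the LiteLLM call shape described above might look like the following minimal sketch. The endpoint URL, port, model name, and placeholder API key are assumptions for a local llama-server instance, not values taken from Basic Memory, and build_embedding_params and embed are hypothetical helpers.

```python
from typing import Any, Optional


def build_embedding_params(
    texts: list,
    # Assumed model name; the "openai/" prefix selects LiteLLM's OpenAI-compatible route.
    model: str = "openai/local-embedding-model",
    # Assumed address of a local llama-server exposing /v1/embeddings.
    api_base: Optional[str] = "http://localhost:8080/v1",
    # Placeholder value; whether a key is checked depends on how the server is started.
    api_key: Optional[str] = "sk-no-key-required",
) -> dict:
    """Build keyword arguments for litellm.aembedding(...)."""
    params: dict = {"model": model, "input": texts}
    if api_base is not None:
        params["api_base"] = api_base
    if api_key is not None:
        params["api_key"] = api_key
    return params


async def embed(texts: list) -> Any:
    # Imported lazily so the helper above can be used without litellm installed.
    import litellm

    # Returns LiteLLM's embedding response; the vectors are under response.data.
    return await litellm.aembedding(**build_embedding_params(texts))
```

With a server running at the assumed address, `asyncio.run(embed(["hello world"]))` would send the request to that endpoint instead of the default provider URL.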
Why this matters
This is useful for local and self-hosted embedding deployments.
For example, Ollama can accept batched embedding input, but in some local setups it still runs embedding work through a single llama.cpp slot, which makes large Basic Memory reindexes extremely slow. Running llama-server --embedding directly with multiple slots can be materially faster while still exposing an OpenAI-compatible embeddings endpoint.
Current code shape
From a quick read of current main:
- src/basic_memory/repository/litellm_provider.py
  - LiteLLMEmbeddingProvider accepts api_key, timeout, dimensions, batch size, concurrency, and role settings.
  - _embed() builds params for litellm.aembedding(...).
  - The params include model, input, drop_params, timeout, optionally dimensions, api_key, and input_type.
  - There is no api_base.
- src/basic_memory/config.py
  - Semantic embedding config includes provider, model, dimensions, dimension forwarding, batch size, request concurrency, document/query input types, sync batch size, and FastEmbed runtime knobs.
  - There is no endpoint/base URL field for semantic embedding providers.
- src/basic_memory/repository/embedding_provider_factory.py
  - The factory passes LiteLLM model, batch size, request concurrency, input types, and dimension forwarding into LiteLLMEmbeddingProvider.
  - There is no endpoint/base URL field in the provider cache key or provider constructor call.
Suggested solution
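Add an api_base option to the semantic embedding config, pass it through the provider factory into LiteLLMEmbeddingProvider, and forward it to litellm.aembedding(...) when it is set.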
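One way the wiring could look is sketched below. This is not Basic Memory's actual code: EmbeddingSettings, provider_cache_key, and build_litellm_params are hypothetical stand-ins for the config model, the factory cache key, and the params built in _embed(), and the default values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical stand-in for the semantic embedding config."""

    model: str
    api_base: Optional[str] = None  # proposed new field; None keeps the default endpoint
    api_key: Optional[str] = None
    dimensions: Optional[int] = None
    timeout: float = 30.0  # assumed default, for illustration only


def provider_cache_key(settings: EmbeddingSettings) -> tuple:
    # Including api_base means providers pointing at different endpoints
    # do not share one cached instance.
    return (settings.model, settings.api_base, settings.dimensions)


def build_litellm_params(settings: EmbeddingSettings, texts: list) -> dict:
    """Build the keyword arguments that would be passed to litellm.aembedding(...)."""
    params: dict = {
        "model": settings.model,
        "input": texts,
        "drop_params": True,  # value assumed; the issue only says the key is present
        "timeout": settings.timeout,
    }
    # Optional values are forwarded only when configured.
    if settings.dimensions is not None:
        params["dimensions"] = settings.dimensions
    if settings.api_key is not None:
        params["api_key"] = settings.api_key
    if settings.api_base is not None:
        params["api_base"] = settings.api_base
    return params
```

Forwarding api_base only when it is set would leave existing configurations unchanged, since requests without it keep going to the provider's default endpoint.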