[FEATURE] Add api_base support for LiteLLM semantic embedding providers #1005

## Problem

Basic Memory's experimental LiteLLM embedding provider currently supports provider/model selection, dimensions, role-specific `input_type`, batching, and request concurrency, but it does not expose an `api_base` / custom endpoint setting for embedding calls.

That makes it awkward or impossible to use Basic Memory with OpenAI-compatible embedding servers that are not hosted at the default provider endpoint.

A concrete example is running `llama-server --embedding` directly and exposing its OpenAI-compatible `/v1/embeddings` API. LiteLLM supports this shape by calling embeddings with an `openai/...` model string and an `api_base`, but Basic Memory does not currently provide a config field that reaches `litellm.aembedding(...)`.
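For reference, a minimal sketch of the LiteLLM call shape described above. The port, the model name after the `openai/` prefix, the placeholder API key, and both helper names are assumptions for illustration; none of this is existing Basic Memory code.

```python
import asyncio
from typing import Any


def llama_server_embedding_kwargs(
    texts: list[str],
    api_base: str = "http://localhost:8080/v1",  # assumed local llama-server address
    model: str = "openai/local-embedding-model",  # assumed model name; "openai/" selects the OpenAI-compatible route
) -> dict[str, Any]:
    """Build the keyword arguments for litellm.aembedding against a custom endpoint."""
    return {
        "model": model,
        "input": list(texts),
        "api_base": api_base,
        # Placeholder value: assumes the local server does not check the key.
        "api_key": "not-needed",
    }


async def embed_with_llama_server(texts: list[str]) -> list[list[float]]:
    """Send one batched embedding request to the endpoint named by api_base."""
    import litellm  # imported lazily so the kwargs helper works without litellm installed

    response = await litellm.aembedding(**llama_server_embedding_kwargs(texts))
    return [item["embedding"] for item in response.data]


# Usage, with a llama-server embedding instance running locally:
# vectors = asyncio.run(embed_with_llama_server(["first note", "second note"]))
```

The request only reaches the local server because `api_base` is passed explicitly, and that argument is what Basic Memory currently has no way to set.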

## Why this matters

This is useful for local and self-hosted embedding deployments.

For example, Ollama can accept batched embedding input, but in some local setups it still runs embedding work through a single llama.cpp slot, which makes large Basic Memory reindexes extremely slow. Running `llama-server --embedding` directly with multiple slots can be materially faster while still exposing an OpenAI-compatible embeddings endpoint.

## Current code shape

From a quick read of current `main`:

- `src/basic_memory/repository/litellm_provider.py`
  - `LiteLLMEmbeddingProvider` accepts `api_key`, `timeout`, dimensions, batch size, concurrency, and role settings.
  - `_embed()` builds params for `litellm.aembedding(...)`.
  - The params include `model`, `input`, `drop_params`, `timeout`, optionally `dimensions`, `api_key`, and `input_type`.
  - There is no `api_base`.

- `src/basic_memory/config.py`
  - Semantic embedding config includes provider, model, dimensions, dimension forwarding, batch size, request concurrency, document/query input types, sync batch size, and FastEmbed runtime knobs.
  - There is no endpoint/base URL field for semantic embedding providers.

- `src/basic_memory/repository/embedding_provider_factory.py`
  - The factory passes LiteLLM model, batch size, request concurrency, input types, and dimension forwarding into `LiteLLMEmbeddingProvider`.
  - There is no endpoint/base URL field in the provider cache key or provider constructor call.

## Suggested solution

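Add an optional `api_base` setting to the semantic embedding config, pass it through the provider factory (including the provider cache key) into `LiteLLMEmbeddingProvider`, and forward it to `litellm.aembedding(...)` when it is set.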
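A rough sketch of how the change could be wired, assuming a params shape like the one listed under "Current code shape". `EmbeddingSettings`, `build_embedding_params`, and `provider_cache_key` are hypothetical names for illustration, not the actual Basic Memory classes or functions, and the default timeout is made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical stand-in for the LiteLLM-related semantic embedding config."""

    model: str
    timeout: float = 30.0  # assumed default, for illustration only
    dimensions: Optional[int] = None
    api_key: Optional[str] = None
    input_type: Optional[str] = None
    api_base: Optional[str] = None  # proposed new field


def build_embedding_params(settings: EmbeddingSettings, texts: list[str]) -> dict[str, Any]:
    """Build kwargs for litellm.aembedding, adding optional values only when set."""
    params: dict[str, Any] = {
        "model": settings.model,
        "input": list(texts),
        "drop_params": True,
        "timeout": settings.timeout,
    }
    optional = {
        "dimensions": settings.dimensions,
        "api_key": settings.api_key,
        "input_type": settings.input_type,
        "api_base": settings.api_base,
    }
    # Leaving unset values out keeps current behavior unchanged when api_base is not configured.
    params.update({key: value for key, value in optional.items() if value is not None})
    return params


def provider_cache_key(settings: EmbeddingSettings) -> tuple:
    """Include api_base so providers pointing at different endpoints are not shared."""
    return (settings.model, settings.dimensions, settings.input_type, settings.api_base)
```

Including `api_base` in the cache key matters because two configurations with the same model name but different endpoints would otherwise resolve to the same cached provider.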
