Location
router/main.py (inside classify_request, sync_adaptive_router_roster, get_best_free_model, proxy_models, proxy_memory, get_llamacpp_metrics, check_http_endpoint, and _register_ollama_models_in_db)
Problem Description
Although a shared client is available through get_http_client(), several routing functions create a new httpx.AsyncClient() in an async with block on every call. Each short-lived client owns its own connection pool, so no connection is ever reused: every call opens a new connection and closes it when the block exits. Under heavy concurrency the closed sockets accumulate in TIME_WAIT and can lead to socket exhaustion.
Fix Overview
Remove the per-call context managers and use the shared global client returned by get_http_client() instead. Where a function needs a custom timeout, pass it directly on the individual request call.
Proposed Implementation Steps
- Open router/main.py.
- Locate all local async with httpx.AsyncClient(...) as client: context managers.
- Replace each one with a reference to get_http_client().
- Pass the function's specific timeout to the method call (e.g. client.get(url, timeout=...)).
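The "locate" step can be partly automated. The following is a minimal sketch, assuming the per-call clients are all written as a single-line async with httpx.AsyncClient(...) statement; the helper names find_local_clients and scan_file are hypothetical and not part of the project.

```python
import re
from pathlib import Path
from typing import List

# Matches the opening of a per-call client, e.g.
# "async with httpx.AsyncClient(timeout=2.0) as client:"
LOCAL_CLIENT = re.compile(r"async\s+with\s+httpx\.AsyncClient\s*\(")


def find_local_clients(source: str) -> List[int]:
    """Return the 1-based line numbers that open a per-call AsyncClient."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if LOCAL_CLIENT.search(line)
    ]


def scan_file(path: str) -> List[int]:
    """Convenience wrapper: read a file from disk and scan its contents."""
    return find_local_clients(Path(path).read_text(encoding="utf-8"))
```

Calling scan_file("router/main.py") from the repository root would list the lines still to be converted. A statement split across several lines would be missed by this pattern, so a manual check is still advisable.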
Code Example / Diff
Example: Fixing get_best_free_model
@@ -1235,3 +1235,3 @@
     try:
-        async with httpx.AsyncClient(timeout=2.0) as client:
-            r = await client.get("https://openrouter.ai/api/v1/models")
+        client = get_http_client()
+        r = await client.get("https://openrouter.ai/api/v1/models", timeout=2.0)