Whisper RAM gate: stop TTS server unconditionally loading large-v3 (~3GB) #367

@dotdevdotdev

Carved from #366 (Part 2). Independent — no dependency on #365, ship first.

Problem

The lifespan startup in agentwire/tts_server.py loads WhisperModel("large-v3") unconditionally on every boot (~3GB+ RAM), even when host STT is served elsewhere (moonshine on :8101) or is not needed at all.

210  print("Loading Whisper model (large-v3)...")
211  try:
212      whisper_model = WhisperModel("large-v3", device="cuda", compute_type="float16")
213  except (ValueError, RuntimeError):
215      whisper_model = WhisperModel("large-v3", device="cpu", compute_type="int8")
216  print("Whisper model loaded!")

(WhisperModel is imported at tts_server.py:38; whisper_model is initialized to None at :74.) /transcribe (:436-440) already returns 503 when whisper_model is None, and /health reports whisper availability (:478). So the off-state is already handled downstream; only the load itself is unconditional.

Fix (no compat — pre-launch)

Gate lines 210–216 (both the cuda branch and the cpu-fallback branch) behind env AGENTWIRE_TTS_WHISPER (default off → leave whisper_model = None). Optionally lazy-import WhisperModel so the TTS venv doesn't need faster-whisper installed when off.
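A minimal sketch of the gate, not the final patch. Assumptions: the helper names `whisper_enabled` and `load_whisper` are hypothetical, the set of accepted truthy values is a guess (this issue only specifies `AGENTWIRE_TTS_WHISPER=1`), and the `env` / `model_factory` parameters exist only so the logic can be exercised without faster-whisper installed.

```python
import os

# Assumed truthy spellings; the issue itself only names "1".
_TRUTHY = {"1", "true", "yes", "on"}


def whisper_enabled(env=None):
    """Return True only when AGENTWIRE_TTS_WHISPER is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("AGENTWIRE_TTS_WHISPER", "").strip().lower() in _TRUTHY


def load_whisper(env=None, model_factory=None):
    """Return a loaded model when the gate is on, otherwise None."""
    if not whisper_enabled(env):
        # Default path: skip the load so whisper_model stays None.
        return None

    if model_factory is None:
        # Lazy import: faster-whisper is only required when the gate is on.
        from faster_whisper import WhisperModel
        model_factory = WhisperModel

    print("Loading Whisper model (large-v3)...")
    try:
        model = model_factory("large-v3", device="cuda", compute_type="float16")
    except (ValueError, RuntimeError):
        # Same cpu fallback as the current startup code.
        model = model_factory("large-v3", device="cpu", compute_type="int8")
    print("Whisper model loaded!")
    return model
```

In the lifespan startup this would reduce to something like `whisper_model = load_whisper()`, leaving the existing /transcribe and /health handling of `None` untouched.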

Acceptance

  • Default boot (no AGENTWIRE_TTS_WHISPER): TTS server starts with whisper_model = None, no large-v3 load, ~3GB RAM freed; /transcribe → 503, /health shows whisper unavailable.
  • AGENTWIRE_TTS_WHISPER=1: large-v3 loads as before (cuda, then cpu fallback intact); /transcribe works.
  • No other startup behavior changes.

Files

  • agentwire/tts_server.py — gate 210–216, import 38, init 74.
