Carved from #366 (Part 2). Independent — no dependency on #365, ship first.
Problem
agentwire/tts_server.py lifespan startup loads WhisperModel("large-v3") on every boot, unconditionally (~3GB+ RAM) — even when host STT is served elsewhere (moonshine on :8101) or not needed at all.
210 print("Loading Whisper model (large-v3)...")
211 try:
212 whisper_model = WhisperModel("large-v3", device="cuda", compute_type="float16")
213 except (ValueError, RuntimeError):
215 whisper_model = WhisperModel("large-v3", device="cpu", compute_type="int8")
216 print("Whisper model loaded!")
(import at tts_server.py:38; whisper_model initialized None at :74.) /transcribe (:436-440) already 503s when whisper_model is None; /health reports it (:478). So the off-state is already handled downstream — only the load itself is unconditional.
Fix (no compat — pre-launch)
Gate lines 210–216 (both the cuda branch and the cpu-fallback branch) behind env AGENTWIRE_TTS_WHISPER (default off → leave whisper_model = None). Optionally lazy-import WhisperModel so the TTS venv doesn't need faster-whisper installed when off.
Acceptance
Files
agentwire/tts_server.py — gate 210–216, import 38, init 74.
Carved from #366 (Part 2). Independent — no dependency on #365, ship first.
Problem
agentwire/tts_server.pylifespan startup loadsWhisperModel("large-v3")on every boot, unconditionally (~3GB+ RAM) — even when host STT is served elsewhere (moonshine on:8101) or not needed at all.(import at
tts_server.py:38;whisper_modelinitializedNoneat:74.)/transcribe(:436-440) already 503s whenwhisper_model is None;/healthreports it (:478). So the off-state is already handled downstream — only the load itself is unconditional.Fix (no compat — pre-launch)
Gate lines 210–216 (both the cuda branch and the cpu-fallback branch) behind env
AGENTWIRE_TTS_WHISPER(default off → leavewhisper_model = None). Optionally lazy-importWhisperModelso the TTS venv doesn't need faster-whisper installed when off.Acceptance
AGENTWIRE_TTS_WHISPER): TTS server starts withwhisper_model = None, no large-v3 load, ~3GB RAM freed;/transcribe→ 503,/healthshows whisper unavailable.AGENTWIRE_TTS_WHISPER=1: large-v3 loads as before (cuda, then cpu fallback intact);/transcribeworks.Files
agentwire/tts_server.py— gate 210–216, import 38, init 74.