
Voice/STT overhaul: moonshine host STT, free Whisper RAM, transcribe-to-stdout, toggle+voice-pick Hammerspoon PTT #366

@jordan-piinpoint

Capturing a session's worth of local voice/STT work so it can be upstreamed for everyone. Verified against current main; see the code-review corrections at the end (notably: the moonshine backend is already shipped; only docs + the stdout path + the RAM gate are net-new).

Summary

Four parts — three concrete repo edits (1–3 + docs), one out-of-repo reference (4):

  1. moonshine as host STT on :8101: already in repo; needs docs + a service wrapper.
  2. Stop the TTS server (:8100) unconditionally loading WhisperModel("large-v3") — gate behind AGENTWIRE_TTS_WHISPER (default off). CONFIRMED unconditional load.
  3. New agentwire listen stop --stdout transcribe-to-stdout mode — net-new.
  4. Hammerspoon PTT rewrite (toggle + voice target-picker) — lives in ~/.hammerspoon/init.lua, ship as a reference example.

Depends on #365. Recommend splitting into 3 issues (below).

Part 1 — moonshine host STT (already shipped; docs + service)

agentwire stt start --backend moonshine is already wired end-to-end:

  • stt/engine.py: :17 (KNOWN_BACKENDS), :20-39 (_load_moonshine), :89-101 (auto tries moonshine first; clear install hint), :140-147 (transcribe branch).
  • __main__.py: :1740-1767 (cmd_stt_start env-passes STT_BACKEND/MOONSHINE_MODEL, port :8101), :10898-10911 (stt start/serve parsers already have --backend/--model/--port).

Left to do: docs (agentwire-cli/agentwire-config skills + a wiki page for --backend moonshine on :8101 and stt.moonshine_model) and a launchd service for :8101 (follow-up).

Part 2 — Whisper RAM gate (concrete; confirmed)

agentwire/tts_server.py lifespan startup loads large-v3 every boot, unconditionally (~3GB+):

210  print("Loading Whisper model (large-v3)...")
211  try:
212      whisper_model = WhisperModel("large-v3", device="cuda", compute_type="float16")
213  except (ValueError, RuntimeError):
215      whisper_model = WhisperModel("large-v3", device="cpu", compute_type="int8")
216  print("Whisper model loaded!")

(WhisperModel is imported at tts_server.py:38.) /transcribe (:436-440) already returns 503 when whisper_model is None, and health reports it (:478).
Fix: gate lines 210–216 (both cuda+cpu branches) behind AGENTWIRE_TTS_WHISPER (default off → leave whisper_model=None, init at :74). Optionally lazy-import WhisperModel so the TTS venv doesn't need it when off.
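
A minimal sketch of the gate, not the actual patch. It assumes WhisperModel comes from the faster_whisper package (the constructor arguments in the snippet above match that library, but the issue does not name the import) and that "1/true/yes/on" count as enabled; both the helper names and the accepted values are hypothetical. The loader parameter exists only so the cuda→cpu fallback can be exercised without the real model.

```python
import os

# Assumption: these spellings mean "on"; the issue only says "default off".
_TRUTHY = {"1", "true", "yes", "on"}


def whisper_enabled(env=None):
    """Return True only when AGENTWIRE_TTS_WHISPER is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("AGENTWIRE_TTS_WHISPER", "").strip().lower() in _TRUTHY


def load_whisper(env=None, loader=None):
    """Load large-v3 when the gate is on; otherwise return None."""
    if not whisper_enabled(env):
        # Leaving the model as None keeps the existing 503 path in /transcribe.
        return None
    if loader is None:
        # Lazy import (assumed package) so the TTS venv does not need it when off.
        from faster_whisper import WhisperModel as loader
    print("Loading Whisper model (large-v3)...")
    try:
        model = loader("large-v3", device="cuda", compute_type="float16")
    except (ValueError, RuntimeError):
        # Same CPU fallback as today; it sits inside the gate with the cuda branch.
        model = loader("large-v3", device="cpu", compute_type="int8")
    print("Whisper model loaded!")
    return model
```

In the lifespan startup this would reduce to something like whisper_model = load_whisper(), with both branches skipped when the variable is unset.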

Part 3 — transcribe-to-stdout (net-new)

agentwire/listen.py:208 stop_recording(session, voice_prompt=True, type_at_cursor=False) has two output branches: the type_at_cursor paste (303–341) and the default send-to-tmux (342–391). Add a third transcribe_only branch right after log(f"Transcribed: {text}") at line 301 that prints the raw text to stdout and returns 0 before the type_at_cursor dispatch. It inherits the stt.backend: custom + stt.url (:8101) requirement (:275-283). CLI: in cmd_listen_stop (__main__.py:6561-6566), pass transcribe_only=getattr(args,'stdout',False); in the listen stop parser (:11246-11251), add --stdout. Note that cmd_listen_toggle (:6575-6582) also calls stop_recording, so decide whether --stdout is stop-only or also reachable from toggle.
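
The branch order described above could look roughly like this. It is a stand-alone sketch: dispatch_transcript and build_parser are hypothetical stand-ins for the tail of stop_recording and the listen stop parser, and the returned branch names are illustrative only. The one property it tries to show is that the stdout branch runs first and emits nothing but the transcript.

```python
import argparse
import sys


def dispatch_transcript(text, transcribe_only=False, type_at_cursor=False, out=sys.stdout):
    """Choose the output branch for a finished transcript.

    Hypothetical stand-in for the code after the "Transcribed:" log line.
    """
    if transcribe_only:
        # New branch: raw text only, checked before the other two.
        print(text, file=out)
        return "stdout"
    if type_at_cursor:
        return "type_at_cursor"  # existing paste branch
    return "tmux"  # existing default branch


def build_parser():
    """Hypothetical slice of the listen stop parser with the new flag."""
    parser = argparse.ArgumentParser(prog="agentwire listen stop")
    parser.add_argument(
        "--stdout",
        action="store_true",
        help="print the transcript to stdout instead of pasting or sending to tmux",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    dispatch_transcript("example transcript", transcribe_only=getattr(args, "stdout", False))
```

One thing worth checking when wiring this in: if log() also writes to stdout, callers that capture the output would see the log line mixed in with the transcript.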

Part 4 — Hammerspoon reference (out-of-repo, docs only)

Toggle-based PTT + voice target-picker lives in ~/.hammerspoon/init.lua (not version-controlled). Ship as a reference example (docs/wiki/voice/hammerspoon-ptt.md or examples/). The repo already accommodates Hammerspoon as an external caller (listen.py:54-59 path fallbacks; type_at_cursor shells hs -c at 303–341). Bake in the gotchas the author found: hs.chooser:choices() is a setter (keep choices in a var); hs.chooser:select(row) fires completion + closes (use to auto-confirm); character-level fuzzy match (Levenshtein + per-word containment bonus) so STT typos match.
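
For the docs page, the fuzzy-match idea can be illustrated outside Lua. The following is a Python sketch of "Levenshtein + per-word containment bonus", not a port of the author's init.lua: the normalization, the 0.25 bonus weight, and the function names are all assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def score(heard, candidate, word_bonus=0.25):
    """Character-level similarity plus a bonus per heard word found in the candidate."""
    heard = heard.lower().strip()
    candidate = candidate.lower().strip()
    longest = max(len(heard), len(candidate)) or 1
    similarity = 1.0 - levenshtein(heard, candidate) / longest
    bonus = sum(word_bonus for word in heard.split() if word in candidate)
    return similarity + bonus


def best_match(heard, candidates):
    """Return the candidate with the highest score (candidates must be non-empty)."""
    return max(candidates, key=lambda c: score(heard, c))


if __name__ == "__main__":
    # An STT typo still lands on the intended target.
    print(best_match("agent wier", ["agentwire", "dotfiles", "website"]))  # agentwire
```

The containment bonus is what lets a transcript with a stray space or one mangled word still outrank unrelated session names of similar length.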

Split recommendation

Dependencies & follow-ups

  • Depends on #365 ("STT: stt.backend config field is overloaded (host-shim flag vs engine selector)"); the stt.backend/stt.engine split removes the --backend moonshine workaround. The exception is 366a, which is independent.
  • Follow-up: launchd service for :8101 (stt start runs in tmux; dies on reboot).
  • Follow-up: /health 2s probe flake — listen.py:111-117 fast-fails after a 2s health timeout before sending audio; on a cold server this spuriously reports "unavailable." Tune timeout / retry / drop the pre-probe.
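
One of the options in the last follow-up (retry instead of a single 2s fast-fail) might be sketched as below. This is not the current listen.py code: wait_for_health, the attempt count, and the linear backoff are hypothetical, and the URL in the usage note is only an assumed shape for a health endpoint on :8101.

```python
import time
import urllib.error
import urllib.request


def _http_probe(url, timeout):
    """Return True if the URL answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


def wait_for_health(url, attempts=3, timeout=2.0, backoff=1.0, probe=None, sleep=time.sleep):
    """Probe a cold server a few times before reporting it unavailable."""
    probe = probe or _http_probe
    for attempt in range(attempts):
        if probe(url, timeout):
            return True
        if attempt < attempts - 1:
            # Linear backoff between attempts; no sleep after the last one.
            sleep(backoff * (attempt + 1))
    return False
```

A caller would use something like wait_for_health("http://localhost:8101/health") before sending audio; dropping the pre-probe entirely, as the bullet also suggests, avoids the extra latency at the cost of a less specific error.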

Files

agentwire/tts_server.py (Part 2: 210–216 + import 38) · agentwire/listen.py (Part 3: stop_recording @208, hook after @301; /health @111-117 follow-up) · agentwire/__main__.py (Part 3: cmd_listen_stop @6561-6566, parser @11246-11251; Part 1 shipped surface @1740-1767/@10898-10911) · agentwire/stt/engine.py (Part 1, already supports moonshine — no edit) · docs/skills.


Code-review corrections (vs original): (1) Part 1 is not net-new: agentwire stt start --backend moonshine + the backend already ship on main; Part 1's real work is docs + a launchd follow-up. (2) The Whisper load has a cuda→cpu fallback (tts_server.py:212-215), so the gate must wrap both branches. (3) stop_recording has no transcribe_only param today (signature (session, voice_prompt=True, type_at_cursor=False)); Part 3 adds it. (4) Recommend splitting into 366a/b/c so the RAM win can land immediately without waiting on #365.
