Capturing a session's worth of local voice/STT work to upstream for everyone. Verified against current main — see corrections (notably: the moonshine backend is already shipped; only docs + the stdout path + the RAM gate are net-new).
Summary
Four parts — three concrete repo edits (1–3 + docs), one out-of-repo reference (4):
1. moonshine as host STT on :8101: already in repo; needs docs + a service wrapper.
2. Stop the TTS server (:8100) from unconditionally loading WhisperModel("large-v3"): gate it behind AGENTWIRE_TTS_WHISPER (default off). CONFIRMED unconditional load.
3. New agentwire listen stop --stdout transcribe-to-stdout mode: net-new.
4. Hammerspoon PTT rewrite (toggle + voice target-picker): lives in ~/.hammerspoon/init.lua; ship as a reference example.
Depends on #365. Recommend splitting into 3 issues (below).
Part 1: moonshine host STT (already shipped; docs + service)
agentwire stt start --backend moonshine is already wired end-to-end:
stt/engine.py: :17 (KNOWN_BACKENDS), :20-39 (_load_moonshine), :89-101 (auto tries moonshine first; clear install hint), :140-147 (transcribe branch).
__main__.py: :1740-1767 (cmd_stt_start env-passes STT_BACKEND/MOONSHINE_MODEL, port :8101), :10898-10911 (stt start/serve parsers already have --backend/--model/--port).
Left to do: docs (agentwire-cli/agentwire-config skills + a wiki page for --backend moonshine on :8101 and stt.moonshine_model) and a launchd service for :8101 (follow-up).
Part 2 — Whisper RAM gate (concrete; confirmed)
agentwire/tts_server.py lifespan startup loads large-v3 on every boot, unconditionally (~3GB+; import at tts_server.py:38). /transcribe (:436-440) already returns 503 when whisper_model is None, and health reports it (:478).
Fix: gate lines 210–216 (both the cuda and cpu branches) behind AGENTWIRE_TTS_WHISPER (default off, which leaves whisper_model=None as initialized at :74). Optionally lazy-import WhisperModel so the TTS venv doesn't need it when the gate is off.
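The gate described above could look roughly like the following. This is a minimal sketch, not the actual tts_server.py code: the helper names (whisper_enabled, load_whisper_if_enabled), the injected load_model callable, and the set of values treated as "on" are all assumptions made for illustration. Only the env var name, the default-off behavior, and the cuda-then-cpu fallback come from the issue.

```python
import os

# Assumption: the issue does not say which strings count as "on".
_TRUTHY = {"1", "true", "yes", "on"}


def whisper_enabled(env=None):
    """Return True only when AGENTWIRE_TTS_WHISPER is explicitly switched on."""
    env = os.environ if env is None else env
    return env.get("AGENTWIRE_TTS_WHISPER", "").strip().lower() in _TRUTHY


def load_whisper_if_enabled(load_model, env=None):
    """Gate the whole cuda-then-cpu load behind the env var.

    load_model is a hypothetical callable that takes a device name. In the
    real server it would wrap WhisperModel("large-v3", ...), with the import
    done lazily inside the callable so the dependency is only needed when
    the gate is on.
    """
    if not whisper_enabled(env):
        # whisper_model stays None, so /transcribe keeps returning 503.
        return None
    try:
        return load_model("cuda")
    except Exception:
        # Mirrors the existing fallback: retry on CPU if the CUDA load fails.
        return load_model("cpu")
```

Wrapping both branches in one function keeps the gate in a single place, so neither the cuda path nor the cpu fallback can run when the variable is unset.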
Part 3 — transcribe-to-stdout (net-new)
agentwire/listen.py:208 stop_recording(session, voice_prompt=True, type_at_cursor=False) has two output branches: the type_at_cursor paste (303–341) and the default send-to-tmux (342–391). Add a third transcribe_only branch right after log(f"Transcribed: {text}") at line 301, before the type_at_cursor dispatch: print the raw text to stdout and return 0. The new branch inherits the stt.backend: custom + stt.url (:8101) requirement (:275-283).
CLI: in cmd_listen_stop (__main__.py:6561-6566), pass transcribe_only=getattr(args, 'stdout', False); in the listen stop parser (:11246-11251), add --stdout. Note that cmd_listen_toggle (:6575-6582) also calls stop_recording, so decide whether --stdout is stop-only or also reachable from toggle.
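A rough sketch of how the third branch and the flag might fit together. finish_transcription is a hypothetical stand-in for the tail of stop_recording after transcription, the two private helpers are placeholders for the existing branches, and build_parser only mimics the shape of the real listen stop parser. It also assumes log() does not write to stdout, since the point of the mode is that stdout carries nothing but the transcript.

```python
import argparse
import sys


def _paste_at_cursor(text):
    """Placeholder for the existing type_at_cursor branch."""
    return 0


def _send_to_tmux(text):
    """Placeholder for the existing default send-to-tmux branch."""
    return 0


def finish_transcription(text, transcribe_only=False, type_at_cursor=False, out=None):
    """Hypothetical stand-in for what stop_recording does once text is known."""
    out = sys.stdout if out is None else out
    if transcribe_only:
        # New branch: raw transcript to stdout, then exit code 0.
        print(text, file=out)
        return 0
    if type_at_cursor:
        return _paste_at_cursor(text)
    return _send_to_tmux(text)


def build_parser():
    """Toy parser with the same command shape as `agentwire listen stop`."""
    parser = argparse.ArgumentParser(prog="agentwire")
    commands = parser.add_subparsers(dest="command")
    listen = commands.add_parser("listen")
    listen_commands = listen.add_subparsers(dest="listen_command")
    stop = listen_commands.add_parser("stop")
    stop.add_argument("--stdout", action="store_true",
                      help="print the transcript to stdout instead of dispatching it")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["listen", "stop", "--stdout"])
    # Same lookup the issue proposes for cmd_listen_stop.
    finish_transcription("hello world", transcribe_only=getattr(args, "stdout", False))
```

Reading the flag with getattr and a False default means a caller whose parser never defines --stdout, such as toggle, falls through to the existing branches unchanged.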
Part 4 — Hammerspoon reference (out-of-repo, docs only)
Toggle-based PTT + voice target-picker lives in ~/.hammerspoon/init.lua (not version-controlled). Ship as a reference example (docs/wiki/voice/hammerspoon-ptt.md or examples/). The repo already accommodates Hammerspoon as an external caller (listen.py:54-59 path fallbacks; type_at_cursor shells hs -c at 303–341). Bake in the gotchas the author found: hs.chooser:choices() is a setter (keep choices in a var); hs.chooser:select(row) fires completion + closes (use to auto-confirm); character-level fuzzy match (Levenshtein + per-word containment bonus) so STT typos match.
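To make the fuzzy-match gotcha concrete, here is one way the scoring could be sketched. It is written in Python for illustration only; the author's version lives in Lua inside init.lua and is not shown in this issue. The normalization, the word_bonus weight, and the function names are assumptions.

```python
def levenshtein(a, b):
    """Classic edit distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def score(query, candidate, word_bonus=0.25):
    """Character-level similarity plus a bonus per query word found in the candidate."""
    q = query.lower().strip()
    c = candidate.lower().strip()
    longest = max(len(q), len(c)) or 1
    similarity = 1.0 - levenshtein(q, c) / longest
    bonus = sum(word_bonus for word in q.split() if word in c)
    return similarity + bonus


def best_match(query, candidates):
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda cand: score(query, cand))
```

With session names such as "agentwire", "dotfiles" and "website", a misheard "agent wyre" still lands on "agentwire": the edit distance is small and "agent" earns the containment bonus.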
Follow-up: launchd service for :8101 (stt start runs in tmux; dies on reboot).
Follow-up: /health 2s probe flake. listen.py:111-117 fast-fails after a 2s health timeout before sending audio; on a cold server this spuriously reports "unavailable." Options: tune the timeout, retry, or drop the pre-probe.
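One of the options above, retrying the probe before giving up, might be sketched like this. The function names, the attempt count, and the linear backoff are assumptions for illustration, not the current listen.py behavior; the URL in the usage note is likewise assumed from the :8101 port and /health path mentioned in this issue.

```python
import time
import urllib.request


def probe_once(url, timeout):
    """Single health check; any network error or timeout counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(url, attempts=3, timeout=2.0, backoff=1.0,
                       probe=probe_once, sleep=time.sleep):
    """Retry the probe a few times so a cold server is not reported as down."""
    for attempt in range(attempts):
        if probe(url, timeout):
            return True
        if attempt < attempts - 1:
            # Linear backoff between attempts: 1x, 2x, ... the base delay.
            sleep(backoff * (attempt + 1))
    return False
```

A caller would use something like wait_until_healthy("http://localhost:8101/health") before sending audio. The probe and sleep parameters are injectable so the retry logic can be tested without a running server.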
Code-review corrections (vs original): (1) Part 1 is not net-new: agentwire stt start --backend moonshine and the backend already ship on main, so Part 1's real work is docs + a launchd follow-up. (2) The Whisper load has a cuda→cpu fallback (tts_server.py:212-215), so the gate must wrap both branches. (3) stop_recording has no transcribe_only param today (signature (session, voice_prompt=True, type_at_cursor=False)); Part 3 adds it. (4) Recommend splitting into 366a/b/c so the RAM win can land immediately without waiting on #365.
Split recommendation
366a (feature:tts, area:tech-debt): tiny, self-contained, no #365 dependency; ship first.
366b (feature:stt): docs for the already-shipped backend + the new --stdout path. Depends on #365.
366c (area:docs, feature:stt): pure docs/example; depends on #366b's --stdout.
Dependencies & follow-ups
Depends on #365 (STT: stt.backend config field is overloaded (host-shim flag vs engine selector)); the stt.backend/stt.engine split removes the --backend moonshine workaround. 366a is the exception: it is independent.
Follow-ups: the launchd service for :8101 and the /health 2s probe flake (listen.py:111-117), both described above.
Files
agentwire/tts_server.py (Part 2: 210–216 + import 38)
agentwire/listen.py (Part 3: stop_recording @208, hook after @301; /health @111-117 follow-up)
agentwire/__main__.py (Part 3: cmd_listen_stop @6561-6566, parser @11246-11251; Part 1 shipped surface @1740-1767/@10898-10911)
agentwire/stt/engine.py (Part 1, already supports moonshine; no edit)
docs/skills