
Operationalize typed operations and engines (follow-up to #688) #689

Description

@tony

Summary

This issue operationalizes the architecture proposed in #688. Where #688 establishes
what the typed-operation and engine substrate should be, this issue commits to how
it gets built: a phased, independently-mergeable sequence, a concrete reuse plan that
graduates the strongest pieces of the existing prototype branches, and a set of resolved
decisions.

It also records one deliberate revision to #688 (see "Revision: mode lives in the
type"), the analysis requested alongside the design — shortcomings, sunset candidates,
concepts to revisit, a typing-for-transparency plan, and a documentation plan — and the
trunk debt the study surfaced.

All of this lands under libtmux.experimental.{ops,engines} first and touches no
existing public API. Promotion to libtmux.{ops,engines} happens only after the
cross-engine contract suite and a downstream green run (tmuxp + libtmux-mcp) pass.

Relationship to #688

Inherited from #688 unchanged:

  • The split into inert typed operations (libtmux.ops) and execution engines
    (libtmux.engines), with the object API as a compatibility facade.
  • Operations are immutable, serializable, version-aware, and carry their result type.
  • Engine interfaces are typing.Protocols, not base classes.
  • The core stays stdlib-dataclass-only; Pydantic/MCP schemas live at the edges.
  • The engine families: classic subprocess, control mode, asyncio, async control, lazy,
    async-lazy, concrete.

Revision: mode lives in the type, not a runtime-bound attribute

#688 states "Control mode should be an engine choice rather than a separate object
hierarchy … a sync Server can be bound to ControlModeEngine." This issue revises
that: the execution mode is encoded in the facade type, e.g. AsyncSession,
LazyControlWindow, AsyncLazyControlWindow, rather than carried as a runtime engine
attribute on a single Server class.

The reason is return-type honesty. A single Session.new_window() cannot be
statically typed to return a live Window, a deferred LazyWindow plan node, or
an Awaitable[AsyncWindow] depending on a runtime flag; the type system cannot express
that. Encoding the mode in the class gives each method exactly one statically-known
return type, and async's def/async def coloring stops being a codegen problem and
becomes "a different facade tree over the same spine." Every facade family (sync/async ×
eager/lazy × classic/control) is a thin layer over one shared spine of pure typed
operations. Nothing is generated at runtime; everything is statically typed so checkers
and IDEs see real return types.
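
A minimal sketch of the return-type argument, assuming three stand-in facade classes. The class bodies, the hard-coded ids, and the slot name are hypothetical placeholders, not the planned implementation; the point is only that each new_window() has exactly one annotated return type.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Stand-in for a live window (not libtmux.Window)."""

    window_id: str


@dataclass(frozen=True)
class AsyncWindow:
    """Stand-in for a window produced by the async facade."""

    window_id: str


@dataclass(frozen=True)
class LazyWindow:
    """Stand-in for a deferred plan node: nothing has been executed yet."""

    slot: str


class Session:
    def new_window(self) -> Window:
        # A real facade would dispatch through its engine here.
        return Window("@1")


class LazySession:
    def new_window(self) -> LazyWindow:
        # Records intent only; "w0" is a made-up slot name.
        return LazyWindow("w0")


class AsyncSession:
    async def new_window(self) -> AsyncWindow:
        await asyncio.sleep(0)  # placeholder for awaiting an async engine
        return AsyncWindow("@1")
```

A type checker reading LazySession().new_window() sees LazyWindow without inspecting any runtime flag, which is the property a single Session class with a bound engine could not offer.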

Architecture

Three layers, bottom-up:

  1. The spine — libtmux.experimental.ops (pure, no I/O). Inert typed operation
    values, a typed result hierarchy, a registry, and serialization. This is the single
    source of truth that static checkers consume and that fixture-based, parametrizable,
    sync- and async-agnostic tests run against without a tmux server.
  2. Engines — libtmux.experimental.engines. Each engine owns a transport, an
    execution policy, and its own error policy. Engines execute rendered operations and
    return typed results. A TmuxEngine Protocol keeps them interchangeable.
  3. Facade families. Per-mode object families (Server/Session/Window/Pane/
    Client) built on the spine. The classic family reproduces today's behavior exactly;
    newer families add async, lazy, and control-mode variants.
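
As a rough illustration of layer 2, the sketch below shows how a typing.Protocol can keep engines interchangeable without a shared base class. The field names on CommandRequest and CommandResult and the EchoEngine class are assumptions made for this example; the types on libtmux-protocol-engines may look different.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class CommandRequest:
    argv: tuple[str, ...]  # assumed shape: the rendered tmux arguments


@dataclass(frozen=True)
class CommandResult:
    argv: tuple[str, ...]
    returncode: int
    stdout: tuple[str, ...] = ()
    stderr: tuple[str, ...] = ()


@runtime_checkable
class TmuxEngine(Protocol):
    """Structural interface: any object with a matching run() qualifies."""

    def run(self, request: CommandRequest) -> CommandResult: ...


class EchoEngine:
    """Toy engine with no transport; it reports the argv it was handed."""

    def run(self, request: CommandRequest) -> CommandResult:
        return CommandResult(
            argv=request.argv,
            returncode=0,
            stdout=(" ".join(request.argv),),
        )


def execute(engine: TmuxEngine, request: CommandRequest) -> CommandResult:
    # Callers depend on the Protocol only, never on a concrete engine class.
    return engine.run(request)
```

EchoEngine never inherits from TmuxEngine, yet execute(EchoEngine(), ...) type-checks and isinstance(EchoEngine(), TmuxEngine) is true at runtime because of @runtime_checkable.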

The spine: operations as the source of truth

An operation is a frozen stdlib dataclass carrying everything an engine needs to render,
validate, and type a command — but it never dispatches:

  • kind: a Literal[...] discriminator that is simultaneously the registry key and the
    result-type link (a discriminated union static checkers can narrow).
  • scope: one of server, session, window, pane, client.
  • target: a closed sum — SessionId('$N'), WindowId('@N'), PaneId('%N'),
    Special(token) over tmux's enumerated target tokens, and SlotRef(slot, suffix) for
    deferred refs to values that do not exist yet.
  • args, effects, and safety, where safety is Literal['readonly', 'mutating', 'destructive'].
  • min_version and a flag_version_map consulted at render time, replacing the ~49
    inline has_gte_version(...) literals scattered across the object methods today.

Operation[ResultT] is generic over its result type, so engine.run(op) is statically
known to return that operation's result.
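
One possible shape for such a value, as a sketch only. The SplitWindow fields, the trimmed two-member Target union, the "pane" scope, and the run() placeholder are illustrative assumptions; final class and field names are deferred by this issue.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union

ResultT = TypeVar("ResultT")


@dataclass(frozen=True)
class PaneId:
    value: str  # a tmux pane id of the form '%N'


@dataclass(frozen=True)
class SlotRef:
    slot: str  # deferred reference to a value that does not exist yet
    suffix: str = ""


Target = Union[PaneId, SlotRef]  # trimmed version of the closed target sum


@dataclass(frozen=True)
class SplitWindowResult:
    pane_id: str


class Operation(Generic[ResultT]):
    """Marker base that ties an operation class to its result type."""


@dataclass(frozen=True)
class SplitWindow(Operation[SplitWindowResult]):
    target: Target
    horizontal: bool = False
    kind: Literal["split-window"] = "split-window"
    scope: Literal["pane"] = "pane"  # assumed scope for this sketch
    safety: Literal["readonly", "mutating", "destructive"] = "mutating"


def run(op: Operation[ResultT]) -> ResultT:
    """Placeholder entry point: a checker infers ResultT from the argument."""
    raise NotImplementedError("no engine is wired up in this sketch")


# Building the value performs no I/O; it can be compared, hashed, and stored.
op = SplitWindow(target=PaneId("%1"), horizontal=True)
```

With these annotations a static checker treats run(op) as returning SplitWindowResult, and assigning to op.horizontal raises dataclasses.FrozenInstanceError.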

Results and the failure model (per-engine)

Result types share one shape (operation, argv, status, returncode, stdout,
stderr, payload), with specialized payloads (SplitWindowResult,
CapturePaneResult, etc.). Error policy is owned by the engine, not global:

  • The classic engine reproduces today's libtmux.{Server,Session,Window,Pane}
    behavior exactly — it returns live ORM objects and raises in-facade as it does now.
    This is the compatibility contract downstreams depend on.
  • Newer engines return typed result objects (Go/Rust-style: the error is data on the
    result) with an opt-in result.raise_for_status() that raises a typed
    TmuxCommandError.

This mirrors CPython's subprocess.CompletedProcess — a plain value plus a short
check_returncode() — and matches libtmux's existing reality: tmux_cmd already never
raises on a tmux-side error; raising is layered on by ~92 explicit raise_if_stderr()
call sites today. Engine-broken conditions (missing binary, lost socket, protocol
desync) always raise, on every engine — those are distinct from a tmux command failing,
which is data on the returned result.

One tmux-specific wrinkle to settle before the result type is frozen: tmux frequently
signals failure via stderr text while exiting 0, and the has-session path folds
stderr into stdout. raise_for_status() for tmux therefore considers non-empty stderr,
not returncode alone — a deliberate divergence from CPython's returncode-only test, to be
documented on the method.
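
The stderr-aware check can be sketched as follows. This is an assumption-laden illustration: the Result fields are a subset of the shape listed above, the failed property is a hypothetical helper, and the stderr text in the usage note is canned.

```python
from __future__ import annotations

from dataclasses import dataclass


class TmuxCommandError(Exception):
    """Raised only when the caller opts in via raise_for_status()."""

    def __init__(self, result: Result) -> None:
        message = " ".join(result.stderr) or f"exit status {result.returncode}"
        super().__init__(message)
        self.result = result


@dataclass(frozen=True)
class Result:
    argv: tuple[str, ...]
    returncode: int
    stdout: tuple[str, ...] = ()
    stderr: tuple[str, ...] = ()

    @property
    def failed(self) -> bool:
        # Unlike CompletedProcess.check_returncode(), non-empty stderr counts
        # as failure even when the exit status is 0.
        return self.returncode != 0 or any(line.strip() for line in self.stderr)

    def raise_for_status(self) -> None:
        if self.failed:
            raise TmuxCommandError(self)
```

For example, Result(("some-command",), 0, stderr=("illustrative error text",)) can be inspected freely as data, and only raise_for_status() turns it into a TmuxCommandError.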

Engine seam: no injection

There is no runtime engine injected into the existing Server. The existing
libtmux.{Server,Session,Window,Pane} are untouched. New work is parallel,
engine-typed facade families that you import directly; each binds the shared spine to its
engine. This cleanly sidesteps the dual-dispatch hazard (today neo.fetch_objs and
get_version bypass Server.cmd) because the new families route reads, writes, and
version queries through their engine from the start.

Prior art to graduate

The study found the two halves of #688 already built — on different branches — and the
typed-operation substrate (the actual heart of #688) absent everywhere. Plan:

| Asset | Source branch | Disposition |
| --- | --- | --- |
| TmuxEngine Protocol, CommandRequest/CommandResult, EngineSpec + name-keyed registry | libtmux-protocol-engines (src/libtmux/engines/{base,registry,subprocess}.py) | Graduate as the canonical engine lineage |
| Bytes-based ControlParser (typed notifications, octal unescape), persistent control engine with weakref.finalize lifecycle, Subscription | libtmux-protocol-engines (src/libtmux/engines/control_mode/*) | Graduate near-verbatim in Phase 4 |
| Native imsg client (v8) | libtmux-protocol-engines (src/libtmux/engines/imsg/*) | Easter-egg engine (Phase 6): opt-in, v8-only, no-attach, separately test-gated |
| Sans-I/O drive() resolution generator (sync and async drivers diverge only at runner.cmd vs await runner.cmd), forward-ref/SlotRef capture, fail-closed COMMAND_SPECS scope/chainability registry | chainable-commands-experiment-00 (src/libtmux/_experimental/chain/{_resolve,ir,plan,chain}.py) | Graduate drive() + the registry concept; build the typed-op values net-new |
| Persistent tmux -C runner with %begin/%end/%error parsing | chainable-commands-experiment-00 (src/libtmux/_experimental/chain/control.py) | Superseded by the protocol-engines control parser |
| Async subprocess via create_subprocess_exec | asyncio branch | Basis for the real asyncio engine (not to_thread) |

What the spine adds net-new: the typed Operation value (kind/scope/effects/result-type/
version metadata), the typed Result hierarchy with raise_for_status(), the operation
registry keyed by kind, and stdlib serialization. None of these exist on any branch.
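
To make the sans-I/O idea concrete, here is a simplified sketch of a drive()-style generator with a sync and an async driver. It is not the code from chainable-commands-experiment-00: the "{prev}" placeholder is a hypothetical stand-in for SlotRef, and the runner callables are assumptions.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Generator, Sequence


def drive(
    commands: Sequence[tuple[str, ...]],
) -> Generator[tuple[str, ...], str, list[str]]:
    """Yield each argv to run, receive its output, return every output.

    The generator performs no I/O, so the resolution logic is shared.
    """
    outputs: list[str] = []
    last = ""
    for argv in commands:
        # Resolve a forward reference to the previous command's output.
        resolved = tuple(last if part == "{prev}" else part for part in argv)
        last = yield resolved
        outputs.append(last)
    return outputs


def run_sync(
    commands: Sequence[tuple[str, ...]],
    runner: Callable[[tuple[str, ...]], str],
) -> list[str]:
    gen = drive(commands)
    try:
        argv = next(gen)
        while True:
            argv = gen.send(runner(argv))
    except StopIteration as stop:
        return stop.value


async def run_async(
    commands: Sequence[tuple[str, ...]],
    runner: Callable[[tuple[str, ...]], Awaitable[str]],
) -> list[str]:
    gen = drive(commands)
    try:
        argv = next(gen)
        while True:
            argv = gen.send(await runner(argv))  # the only divergence
    except StopIteration as stop:
        return stop.value
```

The two drivers differ in a single await, which mirrors the runner.cmd versus await runner.cmd divergence noted in the table.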

Typing for transparency

  • Frozen dataclass operations with a discriminated kind: Literal[...] so runtime
    dispatch and static narrowing share one source of truth.
  • Operation[ResultT] generic linking each kind to its result subclass; the result type
    is carried as metadata, not regenerated per call.
  • Targets as a closed sum (ids, Special literal tokens from tmux's target tables, and
    SlotRef), so an illegal target is a type error, not a runtime surprise.
  • Effects and safety carried as typed flags, so MCP annotations and safety tiers derive
    from the operation rather than a hand-maintained table downstream.
  • The registry as the single source of truth: one entry per kind with scope,
    chainability, result type, min_version, flag_version_map, effects, safety, and a
    primitive-vs-composite marker (an operation that wraps one tmux command vs one composed
    from others, e.g. a synthesized has-server check).
  • A typed status: Literal['complete', 'failed', 'skipped', 'unknown'] on the result, so
    cross-engine equality and serialization assert on an enum, not ad-hoc returncode reads.
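
A sketch of how one registry entry could drive both fail-closed lookup and version-gated rendering. The OpSpec fields are a subset of those listed above, and the version numbers are placeholders that have not been checked against tmux's changelog. Dropping a flag the running tmux is too old for is one possible policy, chosen here only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Mapping, Sequence


@dataclass(frozen=True)
class OpSpec:
    kind: str
    scope: str
    safety: str
    min_version: tuple[int, int]
    flag_version_map: Mapping[str, tuple[int, int]] = field(default_factory=dict)


# Placeholder versions: illustrative only, not verified tmux facts.
REGISTRY: dict[str, OpSpec] = {
    "split-window": OpSpec(
        kind="split-window",
        scope="pane",
        safety="mutating",
        min_version=(1, 8),
        flag_version_map={"-Z": (3, 1)},
    ),
}


def lookup(kind: str) -> OpSpec:
    """Fail closed: an unregistered kind is an error, never a passthrough."""
    try:
        return REGISTRY[kind]
    except KeyError:
        raise LookupError(f"unregistered operation kind: {kind!r}") from None


def render(
    kind: str, flags: Sequence[str], tmux_version: tuple[int, int]
) -> tuple[str, ...]:
    spec = lookup(kind)
    if tmux_version < spec.min_version:
        raise RuntimeError(f"{kind} requires tmux >= {spec.min_version}")
    # Keep a flag only if the running tmux is new enough for it.
    kept = [
        flag
        for flag in flags
        if spec.flag_version_map.get(flag, spec.min_version) <= tmux_version
    ]
    return (kind, *kept)
```

Because the gates live in data, the same table can feed a documentation catalog or an audit script without touching the rendering code.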

Serialization

Operations and results serialize to/from plain dicts with no live objects, subprocess
handles, or event-loop references — stable kind/scope/target/args/status/payload only.
Round-trip tests guard the schema. An optional edge module exposes Pydantic/JSON-Schema/
MCP schemas behind an extra; the core never imports Pydantic.
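
A round-trip of this kind might look like the sketch below. The CapturePane fields and the to_dict/from_dict method names are assumptions for illustration; only the stdlib is used, in line with the stdlib-only core.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CapturePane:
    target: str  # a pane id such as "%0", kept as a plain string here
    start: int = 0
    kind: str = "capture-pane"
    scope: str = "pane"

    def to_dict(self) -> dict[str, Any]:
        # Only plain, JSON-safe values: no handles, no live objects.
        return {
            "kind": self.kind,
            "scope": self.scope,
            "target": self.target,
            "args": {"start": self.start},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> CapturePane:
        if data.get("kind") != "capture-pane":
            raise ValueError(f"unexpected kind: {data.get('kind')!r}")
        return cls(target=data["target"], start=data["args"]["start"])


op = CapturePane(target="%0", start=-10)
wire = json.dumps(op.to_dict())  # what would cross a process boundary
restored = CapturePane.from_dict(json.loads(wire))
```

Frozen dataclasses compare by value, so restored == op is the whole round-trip assertion.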

Testing

A single cross-engine contract suite is the promotion gate. One operation spec is
parametrized over engines (a provider fixture, stable ids) and asserts: result equality
across engines, serialization round-trips, version-gated rendering, deferred refs across
scopes, and the full status vocabulary (complete/failed/skipped/unknown).
Environment-conditional engines (control, imsg) use pytest.param(marks=...), never
collection errors. Async variants use pytest-asyncio; the shared drive() core means the
resolution logic is tested once and reused by both sync and async drivers. A concrete,
no-tmux engine backs doctests so every example executes without +SKIP.
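
As an illustration of the last point, a deterministic engine can answer from a canned table so that docstring examples execute anywhere. The ConcreteEngine constructor, the canned-table format, and the returncode of 1 for unknown commands are assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Result:
    argv: tuple[str, ...]
    returncode: int
    stdout: tuple[str, ...] = ()


class ConcreteEngine:
    """Deterministic engine that answers from a canned table, no tmux needed.

    >>> engine = ConcreteEngine({("display-message", "-p", "#{pane_id}"): ("%0",)})
    >>> engine.run(("display-message", "-p", "#{pane_id}")).stdout
    ('%0',)
    >>> engine.run(("kill-server",)).returncode
    1
    """

    def __init__(self, canned: Mapping[tuple[str, ...], tuple[str, ...]]) -> None:
        self._canned = dict(canned)

    def run(self, argv: tuple[str, ...]) -> Result:
        if argv in self._canned:
            return Result(argv, 0, self._canned[argv])
        # Unknown commands fail deterministically instead of reaching tmux.
        return Result(argv, 1)
```

The docstring examples pass under the stdlib doctest runner without a server, so no +SKIP directive is needed.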

Documentation

Operations warrant a dedicated, autogenerated catalog rather than hand-written pages. The
plan is a custom Sphinx domain (tmuxop) — the same pattern the maintainer's gp-sphinx
already ships (an argparse domain plus a registry→autodoc package) — registering object
types operation/result/scope with cross-reference roles and an auto-generated index.
A catalog directive walks the operation registry and emits one entry per operation with
its scope, minimum tmux version and per-flag version gates, effects/safety tier,
primitive-vs-composite marker, and cross-linked result type. Because the registry is the
single source of truth for both runtime and docs, the drift seen today (e.g. formats.py
reference lists vs neo.Obj fields) cannot recur. Catalog examples are driven by the
concrete engine so they run under the project's no-+SKIP doctest rule. Performance is
described as capability ("control mode pipelines batches over one connection"), never as a
fragile metric.

Phased plan

Each phase is independently mergeable, keeps the suite green, and changes no existing API.

  • Phase 0 — Packages + engine contract. Create libtmux.experimental.{ops,engines};
    port the TmuxEngine Protocol, CommandRequest/CommandResult, and EngineSpec +
    registry from libtmux-protocol-engines; add Result.raise_for_status() and
    exc.TmuxCommandError. Exit: suite + mypy + doctests + build-docs green; zero public
    API change.
  • Phase 1 — Inert ops spine (pure, no tmux). The frozen Operation value, the typed
    Result hierarchy, the operation registry, serialization, and version-gated render,
    with seed operations (split-window, capture-pane, send-keys, select-layout).
    Exit: 100% of this phase's tests run with no tmux server; serialization round-trips;
    registry fails closed; mypy clean.
  • Phase 2 — Classic engine + classic facade slice. The classic subprocess engine and
    a classic facade that runs one operation end-to-end (split-window) returning a live
    Pane, identical in signature, return type, timing, and raise behavior to today.
    Exit: parity verified; tmuxp + libtmux-mcp split flows green via a branch pin.
  • Phase 3 — Concrete engine + contract suite (the gate). A deterministic no-tmux
    engine and the parametrized contract matrix that becomes the promotion gate. Exit:
    contract suite green across classic + concrete for all Phase-1 operations.
  • Phase 4 — Control-mode parser then engine. Graduate the bytes ControlParser and
    the persistent control engine + Subscription, gated by --engine=. Exit: the
    fixture and contract suites pass under --engine=control_mode; results equal classic.
  • Phase 5 — asyncio engine + lazy/async-lazy facades. A real async engine
    (create_subprocess_exec, cancellation — not to_thread) and the lazy/async-lazy
    facades over the shared drive() core; AsyncLazyControlWindow and siblings
    materialize here. Exit: contract suite parametrized across classic/concrete/control/
    asyncio with result equality; cancellation leaks no subprocess.
  • Phase 6 — imsg easter-egg engine. The native binary client as a registered, opt-in,
    separately test-gated engine — proof that the operation/result contract is
    transport-agnostic (same SplitWindowResult from subprocess, tmux -C, or the binary
    peer protocol). Exit: opt-in tests green; framed as "native client, v8 only, no
    attach."

Nothing leaves experimental until the contract suite and a tmuxp + libtmux-mcp
downstream green run pass.

Shortcomings in current libtmux (motivation)

Surfaced by the study; these are why the substrate is needed.

  • Build is fused to dispatch. subprocess.Popen(...).communicate() runs inside
    tmux_cmd.__init__ (src/libtmux/common.py), so no object exists between "argv built"
    and "process run" — nothing can introspect, validate, batch, serialize, or dry-run an
    operation.
  • No typed result, scattered failure policy. tmux_cmd exposes only
    cmd/stdout/stderr/returncode and never raises on a tmux error; raising is bolted on by
    ~92 raise_if_stderr() call sites. There is no check_returncode()-style opt-in.
  • Version gating scattered. ~49 inline has_gte_version(...) literals with no
    capability table; neo.py independently re-gates format tokens. Flag/version drift is
    unauditable.
  • Dual dispatch paths. neo.fetch_objs and get_version construct tmux_cmd
    directly, re-implementing socket-flag insertion, so any engine injected at Server.cmd
    would silently miss all reads. (The engine-typed-facade model avoids this by routing
    everything through the engine from the start.)
  • Server.cmd() has no timeout, forcing libtmux-mcp to shell out in several modules.
  • No serialization and no per-operation metadata (effects/safety/scope/primitive-vs-
    composite), so libtmux-mcp hand-maintains a large parallel layer (tags, annotation
    presets, an exception-to-result classifier).
  • client scope is unmodeled in both the ORM operation surface and the chain
    prototype's scope literal.

Sunset candidates

  • The _experimental.chain package layout and its per-scope Bound*Commands namespaces
    with a .raw() escape hatch — keep the tests as a porting checklist; replace the layout
    with a single registry-driven typed-op builder so typing is the default path.
  • The asyncio.to_thread async executors and the asyncio-2 branch (the latter deletes
    large test suites) — replace with a real async engine.
  • The half-wired async-first codegen pipeline (common_async.py) — replace with
    hand-written async behind the engine Protocol.
  • The control-mode branch's string-based _internal/engines prototype — strictly
    inferior to the protocol-engines bytes parser; mine only for feature ideas.
  • The CommandResultLike three-field passthrough as the result contract — replaced by the
    typed Result base with raise_for_status().

Concepts to revisit

  • The cmd() error contract and whether raise_for_status() trips on non-empty stderr
    (the tmux divergence above).
  • neo vs ops: whether ops absorbs reads (neo becomes the read half over the same
    engine) — the single biggest seam decision.
  • has_minimum_version/has_gte_version scatter → a declarative capability table, while
    preserving warn-and-degrade behavior and stacklevel at the user-facing boundary.
  • Eager ORM semantics vs lazy/plan timing — lazy must stay strictly additive and opt-in.
  • Engine exception taxonomy — a single umbrella with meaningful subclasses (timeout vs
    connection-lost vs desync) plus a retryable/expected classification.
  • Whether attach/interactive operations need a non-capturing engine mode or stay a
    documented special case for v1.
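
For the exception-taxonomy item, one hypothetical shape is sketched below to make the question concrete. The class names, the retryable attribute, and the choice of which conditions count as retryable are all placeholders, not decisions made by this issue.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class TmuxEngineError(Exception):
    """Umbrella for engine-broken conditions (not tmux command failures)."""

    retryable = False


class EngineTimeout(TmuxEngineError):
    retryable = True  # placeholder classification


class EngineConnectionLost(TmuxEngineError):
    retryable = True  # placeholder classification


class EngineProtocolDesync(TmuxEngineError):
    retryable = False  # stream state is unknown, so retrying is unsafe here


def call_with_retry(call: Callable[[], T], attempts: int = 2) -> T:
    """Retry only errors that classify themselves as retryable."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TmuxEngineError as exc:
            if not exc.retryable or attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")
```

Callers can catch the umbrella TmuxEngineError for a coarse policy or a subclass for a precise one, and a retry helper needs no knowledge of individual engines.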

Trunk debt (noted, not actioned)

The study surfaced pre-existing debt in trunk, recorded here for visibility only — no
action is proposed under this issue per the repository's trunk-cleanup policy: a dead
Pane.split branch using substring rather than list-membership matching; 43 inert
DeprecatedError shells; formats.py reference lists drifted from neo.Obj fields; and
EnvironmentMixin duplicate logic with a dead ternary.

Decisions

Resolved for this work:

  • Code lives under libtmux.experimental.{ops,engines} now; promotion to
    libtmux.{ops,engines} only after the contract suite + downstream green run.
  • Mode is encoded in the facade type (the revision above), realized as thin engine-typed
    facades over one shared spine — no runtime code generation.
  • Error policy is per-engine: classic reproduces today's behavior; newer engines return
    typed results with opt-in raise_for_status().
  • The core is stdlib-dataclass-only; Pydantic/MCP schemas live behind an optional extra.
  • The registry/kind set is explicitly unstable while under experimental, frozen at
    promotion.
  • imsg ships as an opt-in, test-gated easter-egg engine.

Deferred (as in #688): the complete operation registry; final class names for every
operation/result; rollback/compensation semantics for multi-operation plans; control-mode
subscription/event-streaming public API; op-version coexistence (op:name@version) until
a real need appears.

Prior art and references

In-repo prototype branches: chainable-commands-experiment-00,
libtmux-protocol-engines, control-mode, asyncio, libtmux-async-first-codegen.

Refs above are pinned to git tags where the checkout sat on one, otherwise a 7-char
commit ref.
