Skip to content

RFC: extract shared atomic_write BEGIN IMMEDIATE transaction primitive into db.py #3

@faxik

Description

@faxik

Problem

The "atomic write under BEGIN IMMEDIATE" transaction pattern is copy-pasted verbatim in two places:

  • merge.py:239-289 — the lock-acquire critical section inside merge.merge() (NOT start_session).
  • milestones.py:812-851pull_next.

Both repeat the identical save/restore + BEGIN IMMEDIATE / COMMIT / ROLLBACK boilerplate:

saved_isolation = conn.isolation_level
conn.isolation_level = None
try:
    conn.execute("BEGIN IMMEDIATE")
    try:
        ... critical section ...
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
finally:
    conn.isolation_level = saved_isolation

CLAUDE.md even documents the coupling: milestones "reuses the merge.py:239-289 save/restore pattern." That is coupling-by-convention — a bug fixed in one copy won't reach the other. This is a shallow, duplicated chunk of the most error-prone code in the codebase (manual transaction control). In-process, pure-SQLite, trivially deepenable.

The crux: three exit paths

The critical section has three exit paths, not two — which is why a naive context manager is wrong:

  1. Completion → COMMIT.
  2. Exception → ROLLBACK, re-raise.
  3. Voluntary abort returning a value WITHOUT raisingmerge.py:251 (lock_held), merge.py:268 (main_moved), and milestones.py:825 (return None when no candidate) all do conn.execute("ROLLBACK"); return <value>.

The abort decision must happen inside the lock (TOCTOU): the lock row + current_main_head_fn() in merge, and candidate selection in pull_next, are the lock-protected state.

Proposed Interface

A single context manager atomic_write(conn) in db.py (infrastructure layer), yielding a Tx handle whose .abort(value) covers exit path 3.

class _Abort(Exception):           # private sentinel, only raised via Tx.abort()
    def __init__(self, value): self.value = value

@dataclass
class Tx:
    aborted: bool = False
    abort_value: Any = None
    def abort(self, value=None) -> NoReturn:
        self.abort_value = value; self.aborted = True
        raise _Abort(value)

def _safe_rollback(conn):          # never masks an in-flight exception
    try: conn.execute("ROLLBACK")
    except sqlite3.OperationalError: pass

@contextmanager
def atomic_write(conn) -> Iterator[Tx]:
    saved = conn.isolation_level
    conn.isolation_level = None
    tx = Tx()
    try:
        conn.execute("BEGIN IMMEDIATE")
        try:
            yield tx
        except _Abort:
            _safe_rollback(conn)                 # abort: swallow
        except BaseException:
            _safe_rollback(conn); raise          # real error: rollback + propagate
        else:
            conn.execute("COMMIT")               # COMMIT in else: never double-rolls-back
    finally:
        conn.isolation_level = saved

Trivial happy path (the dominant future caller — close-gate, batch ops):

with atomic_write(conn):
    conn.execute(...)
# committed; isolation restored. zero ceremony.

Abort-with-value (the two current sites):

with atomic_write(conn) as tx:
    ...
    if no_candidate:
        tx.abort(None)        # rolls back, leaves the block
    ...
if tx.aborted:
    return tx.abort_value
return success_value

What it hides: the isolation_level save/None/restore dance, BEGIN IMMEDIATE issuance, the three-way exit fork, COMMIT-failure handling, and the class of bug where a newly-added early return forgets its ROLLBACK.

Dependency Strategy

In-process — merged directly. Pure SQLite over the connection the caller already holds; no I/O, no new third-party deps (contextlib, dataclasses, sqlite3 only). Both modules already symbol-import from codebugs.db (merge.py:429, milestones.py:1713); the refactor extends those lines with atomic_write, Tx. No import cycle — db.py imports domain modules only lazily inside _ensure_modules_loaded(). Concurrency stays SQLite file-lock (WAL + BEGIN IMMEDIATE).

Also: make db.connect() set PRAGMA busy_timeout=5000 explicitly (behavior-preserving — it's already Python's sqlite3.connect default) so cross-connection contention serializes rather than relying on an implicit default.

Testing Strategy

New boundary tests (mostly :memory: — BEGIN IMMEDIATE runs in-memory, as existing test_merge.py proves; only the contention test needs tmp_path):

  • happy path commits + persists; isolation_level restored.
  • tx.abort(v) rolls back (rows not persisted), returns v, sets tx.aborted.
  • body raising a real exception rolls back + re-raises.
  • COMMIT-failure (deferred constraint) re-raises the real error, not a "no transaction active" rollback error.
  • (tmp_path, two db.connect() conns) second atomic_write serializes against the first up to busy_timeout.

Old tests to delete: none — there are no shallow-helper tests for this pattern today. The existing behavioral tests of merge.merge() and milestones.pull_next are the regression guarantee: they must pass unchanged after the refactor.

Implementation Recommendations

  • Owns: the manual-transaction lifecycle for a single SQLite write critical section — isolation save/restore, BEGIN IMMEDIATE, the COMMIT/ROLLBACK/abort fork.
  • Hides: that there are three exit paths and that one of them must roll back yet still return a value; the COMMIT-failure masking trap; the autocommit toggling required to issue manual BEGIN.
  • Exposes: atomic_write(conn) -> Tx and tx.abort(value). _Abort stays private to db.py.
  • Migration: replace the two copy-pasted blocks; preserve each site's post-block re-read (milestones._get_item_by_ref stays after the with, post-commit). Future write-critical sections adopt atomic_write instead of re-pasting the pattern.

Adversarial review

Stress-tested by a 3-agent adversarial review (adversary / defender / judge). Verdict: 8/10 — fix mandatory items, then ship. 0 FATAL; all @contextmanager/_Abort/BaseException control-flow semantics verified correct against CPython; 132-test baseline green.

Findings fixed into the design before this RFC:

  • COMMIT-failure bug — original "COMMIT inside the body try" let a failed COMMIT trigger a second ROLLBACK that masks the real error. Fixed via else: branch + _safe_rollback (judge built+ran the fix).
  • busy_timeout — adversary's "BEGIN IMMEDIATE will raise not block" was refuted (Python's default 5s timeout serializes, proven by the passing two-thread test test_milestones.py:801); documentation gap conceded → explicit PRAGMA added.
  • Doc-accuracy: correct function name (merge.merge()), import claim, :memory: test strategy.

Full design + corrections appendix: docs/superpowers/specs/2026-06-18-atomic-write-primitive-design.md.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions