Problem
The "atomic write under BEGIN IMMEDIATE" transaction pattern is copy-pasted verbatim in two places:
merge.py:239-289 — the lock-acquire critical section inside merge.merge() (NOT start_session).
milestones.py:812-851 — pull_next.
Both repeat the identical save/restore + BEGIN IMMEDIATE / COMMIT / ROLLBACK boilerplate:
saved_isolation = conn.isolation_level
conn.isolation_level = None
try:
conn.execute("BEGIN IMMEDIATE")
try:
... critical section ...
conn.execute("COMMIT")
except Exception:
conn.execute("ROLLBACK")
raise
finally:
conn.isolation_level = saved_isolation
CLAUDE.md even documents the coupling: milestones "reuses the merge.py:239-289 save/restore pattern." That is coupling-by-convention — a bug fixed in one copy won't reach the other. This is a shallow, duplicated chunk of the most error-prone code in the codebase (manual transaction control). In-process, pure-SQLite, trivially deepenable.
The crux: three exit paths
The critical section has three exit paths, not two — which is why a naive context manager is wrong:
- Completion → COMMIT.
- Exception → ROLLBACK, re-raise.
- Voluntary abort returning a value WITHOUT raising —
merge.py:251 (lock_held), merge.py:268 (main_moved), and milestones.py:825 (return None when no candidate) all do conn.execute("ROLLBACK"); return <value>.
The abort decision must happen inside the lock (TOCTOU): the lock row + current_main_head_fn() in merge, and candidate selection in pull_next, are the lock-protected state.
Proposed Interface
A single context manager atomic_write(conn) in db.py (infrastructure layer), yielding a Tx handle whose .abort(value) covers exit path 3.
class _Abort(Exception): # private sentinel, only raised via Tx.abort()
def __init__(self, value): self.value = value
@dataclass
class Tx:
aborted: bool = False
abort_value: Any = None
def abort(self, value=None) -> NoReturn:
self.abort_value = value; self.aborted = True
raise _Abort(value)
def _safe_rollback(conn): # never masks an in-flight exception
try: conn.execute("ROLLBACK")
except sqlite3.OperationalError: pass
@contextmanager
def atomic_write(conn) -> Iterator[Tx]:
saved = conn.isolation_level
conn.isolation_level = None
tx = Tx()
try:
conn.execute("BEGIN IMMEDIATE")
try:
yield tx
except _Abort:
_safe_rollback(conn) # abort: swallow
except BaseException:
_safe_rollback(conn); raise # real error: rollback + propagate
else:
conn.execute("COMMIT") # COMMIT in else: never double-rolls-back
finally:
conn.isolation_level = saved
Trivial happy path (the dominant future caller — close-gate, batch ops):
with atomic_write(conn):
conn.execute(...)
# committed; isolation restored. zero ceremony.
Abort-with-value (the two current sites):
with atomic_write(conn) as tx:
...
if no_candidate:
tx.abort(None) # rolls back, leaves the block
...
if tx.aborted:
return tx.abort_value
return success_value
What it hides: the isolation_level save/None/restore dance, BEGIN IMMEDIATE issuance, the three-way exit fork, COMMIT-failure handling, and the class of bug where a newly-added early return forgets its ROLLBACK.
Dependency Strategy
In-process — merged directly. Pure SQLite over the connection the caller already holds; no I/O, no new third-party deps (contextlib, dataclasses, sqlite3 only). Both modules already symbol-import from codebugs.db (merge.py:429, milestones.py:1713); the refactor extends those lines with atomic_write, Tx. No import cycle — db.py imports domain modules only lazily inside _ensure_modules_loaded(). Concurrency stays SQLite file-lock (WAL + BEGIN IMMEDIATE).
Also: make db.connect() set PRAGMA busy_timeout=5000 explicitly (behavior-preserving — it's already Python's sqlite3.connect default) so cross-connection contention serializes rather than relying on an implicit default.
Testing Strategy
New boundary tests (mostly :memory: — BEGIN IMMEDIATE runs in-memory, as existing test_merge.py proves; only the contention test needs tmp_path):
- happy path commits + persists;
isolation_level restored.
tx.abort(v) rolls back (rows not persisted), returns v, sets tx.aborted.
- body raising a real exception rolls back + re-raises.
- COMMIT-failure (deferred constraint) re-raises the real error, not a "no transaction active" rollback error.
- (
tmp_path, two db.connect() conns) second atomic_write serializes against the first up to busy_timeout.
Old tests to delete: none — there are no shallow-helper tests for this pattern today. The existing behavioral tests of merge.merge() and milestones.pull_next are the regression guarantee: they must pass unchanged after the refactor.
Implementation Recommendations
- Owns: the manual-transaction lifecycle for a single SQLite write critical section — isolation save/restore,
BEGIN IMMEDIATE, the COMMIT/ROLLBACK/abort fork.
- Hides: that there are three exit paths and that one of them must roll back yet still return a value; the COMMIT-failure masking trap; the autocommit toggling required to issue manual BEGIN.
- Exposes:
atomic_write(conn) -> Tx and tx.abort(value). _Abort stays private to db.py.
- Migration: replace the two copy-pasted blocks; preserve each site's post-block re-read (
milestones._get_item_by_ref stays after the with, post-commit). Future write-critical sections adopt atomic_write instead of re-pasting the pattern.
Adversarial review
Stress-tested by a 3-agent adversarial review (adversary / defender / judge). Verdict: 8/10 — fix mandatory items, then ship. 0 FATAL; all @contextmanager/_Abort/BaseException control-flow semantics verified correct against CPython; 132-test baseline green.
Findings fixed into the design before this RFC:
- COMMIT-failure bug — original "COMMIT inside the body
try" let a failed COMMIT trigger a second ROLLBACK that masks the real error. Fixed via else: branch + _safe_rollback (judge built+ran the fix).
busy_timeout — adversary's "BEGIN IMMEDIATE will raise not block" was refuted (Python's default 5s timeout serializes, proven by the passing two-thread test test_milestones.py:801); documentation gap conceded → explicit PRAGMA added.
- Doc-accuracy: correct function name (
merge.merge()), import claim, :memory: test strategy.
Full design + corrections appendix: docs/superpowers/specs/2026-06-18-atomic-write-primitive-design.md.
🤖 Generated with Claude Code
Problem
The "atomic write under
BEGIN IMMEDIATE" transaction pattern is copy-pasted verbatim in two places:merge.py:239-289— the lock-acquire critical section insidemerge.merge()(NOTstart_session).milestones.py:812-851—pull_next.Both repeat the identical save/restore + BEGIN IMMEDIATE / COMMIT / ROLLBACK boilerplate:
CLAUDE.mdeven documents the coupling: milestones "reuses themerge.py:239-289save/restore pattern." That is coupling-by-convention — a bug fixed in one copy won't reach the other. This is a shallow, duplicated chunk of the most error-prone code in the codebase (manual transaction control). In-process, pure-SQLite, trivially deepenable.The crux: three exit paths
The critical section has three exit paths, not two — which is why a naive context manager is wrong:
merge.py:251(lock_held),merge.py:268(main_moved), andmilestones.py:825(return Nonewhen no candidate) all doconn.execute("ROLLBACK"); return <value>.The abort decision must happen inside the lock (TOCTOU): the lock row +
current_main_head_fn()in merge, and candidate selection inpull_next, are the lock-protected state.Proposed Interface
A single context manager
atomic_write(conn)indb.py(infrastructure layer), yielding aTxhandle whose.abort(value)covers exit path 3.Trivial happy path (the dominant future caller — close-gate, batch ops):
Abort-with-value (the two current sites):
What it hides: the
isolation_levelsave/None/restore dance,BEGIN IMMEDIATEissuance, the three-way exit fork, COMMIT-failure handling, and the class of bug where a newly-added early return forgets itsROLLBACK.Dependency Strategy
In-process — merged directly. Pure SQLite over the connection the caller already holds; no I/O, no new third-party deps (
contextlib,dataclasses,sqlite3only). Both modules already symbol-import fromcodebugs.db(merge.py:429,milestones.py:1713); the refactor extends those lines withatomic_write, Tx. No import cycle —db.pyimports domain modules only lazily inside_ensure_modules_loaded(). Concurrency stays SQLite file-lock (WAL +BEGIN IMMEDIATE).Also: make
db.connect()setPRAGMA busy_timeout=5000explicitly (behavior-preserving — it's already Python'ssqlite3.connectdefault) so cross-connection contention serializes rather than relying on an implicit default.Testing Strategy
New boundary tests (mostly
:memory:— BEGIN IMMEDIATE runs in-memory, as existingtest_merge.pyproves; only the contention test needstmp_path):isolation_levelrestored.tx.abort(v)rolls back (rows not persisted), returnsv, setstx.aborted.tmp_path, twodb.connect()conns) secondatomic_writeserializes against the first up tobusy_timeout.Old tests to delete: none — there are no shallow-helper tests for this pattern today. The existing behavioral tests of
merge.merge()andmilestones.pull_nextare the regression guarantee: they must pass unchanged after the refactor.Implementation Recommendations
BEGIN IMMEDIATE, the COMMIT/ROLLBACK/abort fork.atomic_write(conn) -> Txandtx.abort(value)._Abortstays private todb.py.milestones._get_item_by_refstays after thewith, post-commit). Future write-critical sections adoptatomic_writeinstead of re-pasting the pattern.Adversarial review
Stress-tested by a 3-agent adversarial review (adversary / defender / judge). Verdict: 8/10 — fix mandatory items, then ship. 0 FATAL; all
@contextmanager/_Abort/BaseExceptioncontrol-flow semantics verified correct against CPython; 132-test baseline green.Findings fixed into the design before this RFC:
try" let a failed COMMIT trigger a second ROLLBACK that masks the real error. Fixed viaelse:branch +_safe_rollback(judge built+ran the fix).busy_timeout— adversary's "BEGIN IMMEDIATE will raise not block" was refuted (Python's default 5s timeout serializes, proven by the passing two-thread testtest_milestones.py:801); documentation gap conceded → explicit PRAGMA added.merge.merge()), import claim,:memory:test strategy.Full design + corrections appendix:
docs/superpowers/specs/2026-06-18-atomic-write-primitive-design.md.🤖 Generated with Claude Code