Skip to content

Fix lock auto-cleanup race: a waiter can unlink a freshly-acquired live lock, breaking mutual exclusion #491

Description

@dotdevdotdev

Goal

Make stale-lock recovery safe so two concurrent ensure runs can never both believe they hold a session lock.

Why it matters

In the blocking-wait path (locking.py:104-111), when a waiter sees the holder PID is dead it calls remove_lock(session) which unconditionally lock_path.unlink() (:308-326) WITHOUT re-checking the lock is still unheld. Between get_lock_holder() reading a dead PID and the unlink, a different process can open the same path, acquire the flock, and write its own PID; the racing waiter then deletes that live holder's lock file, and both proceed on different inodes each thinking they hold the lock. is_session_locked also has a TOCTOU window (acquire-then-immediately-release flock, :158-159). The module exists precisely to guarantee one ensure per session (docstring). Scope: only the opt-in --wait-lock path triggers this; the interleave is rare; this code is untested (test_locking.py covers only path-sanitization). But it's a correctness hole in a safety primitive — duplicate dispatch / racing send-keys / clobbered state.

Proposed approach

Preferred: drop the unlink-based recovery entirely and rely on flock's kernel-level auto-release on holder death — a dead PID's flock is already released, so the next LOCK_EX|LOCK_NB poll simply succeeds without any file deletion (the manual cleanup is both unnecessary and unsafe). Minimal: replace the blind remove_lock in the dead-holder branch with remove_stale_lock (which already gates on is_session_locked at :303), and re-check liveness under the about-to-be-held flock. Add a regression test for the dead-holder + concurrent-acquire interleave.

Effort / risk

Effort M. Risk: medium — concurrency code; test carefully. The flock-only approach is the cleanest and removes the most code (fits No-Backwards-Compat).

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:bugNon-app work category: bugbugSomething isn't workingfeature:sessionsApplication feature

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions