Summary
Tina4 v3.13's migration runner opens a transaction for the migration-tracking lookup queries and never closes/commits it. With psycopg2 (autocommit off by default) the connection is then left in idle in transaction state indefinitely. Over time these leak across pod restarts and eventually exhaust the postgres connection pool — at which point new boots fail with:
FATAL: remaining connection slots are reserved for roles with the SUPERUSER attribute
Tina4's autodiscovery opens a fresh connection per src/app/*.py it loads, so when the pool is saturated every module load fails and src/routes/* never register. The HTTP server still answers /health-check (probe passes → pod marked Ready), but every actual app route returns 404 — a silent "ready but broken" state.
Versions
tina4_python 3.13.9
psycopg2 (postgres driver, transactions=on by default)
- Python 3.13 / Alpine
- PostgreSQL 14 / 16 (reproduced on both)
Reproduction
- Boot a Tina4 v3.13 app against a postgres DB
- Look at
pg_stat_activity shortly after startup:
SELECT pid, state, age(now(), xact_start) AS tx_age, substr(query, 1, 80)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND application_name = '' -- the app
ORDER BY backend_start;
You'll see at least one session like:
pid state last_query tx_age
2371466 idle in transaction SELECT MAX(batch) as max_batch FROM tina4_migration 8m
The transaction stays open for the lifetime of the connection (we've observed sessions sitting in this state for 2+ days, eventually being killed by idle_in_transaction_session_timeout).
Smoking gun in source
Two unwrapped reads in tina4_python/migration/runner.py:
- Line 297 —
_get_next_batch:
row = db.fetch_one("SELECT MAX(batch) as max_batch FROM tina4_migration")
- Line 491:
row = db.fetch_one("SELECT MAX(batch) as max_batch FROM tina4_migration WHERE passed = 1")
Neither path commits or rolls back, and the db.fetch_one helper appears to start an implicit transaction (psycopg2 default). The transaction therefore sits open holding row-level locks on tina4_migration and a connection slot.
Impact
- Each fresh pod start leaks (at minimum) one
idle in transaction connection.
- On a busy cluster with frequent restarts, the pool fills in days. We hit
FATAL: remaining connection slots are reserved after ~89 restarts over 38 hours on a postgres with max_connections=100.
- Symptoms are confusing because the pod looks healthy: liveness/readiness pass (just
/health-check), but actual app routes 404 because the module autodiscovery silently failed during boot when connections were saturated.
Workaround we're using
Set a per-role timeout so postgres auto-kills the zombies:
ALTER ROLE <app_role> SET idle_in_transaction_session_timeout = '300s';
This stops the leak from saturating the pool, but doesn't fix the underlying transaction-handling bug.
Suggested fix
Wrap the read-only lookups in explicit with db.transaction(): ... (or commit immediately), or run them in autocommit mode since they're plain SELECTs that don't need transactional isolation. Same treatment likely needed in the analogous spots in tina4_python/database/ where helper-style db.fetch_one/db.fetch_all start implicit transactions.
Happy to send a PR if the team confirms the intended fix shape.
Summary
Tina4 v3.13's migration runner opens a transaction for the migration-tracking lookup queries and never closes/commits it. With
psycopg2(autocommit off by default) the connection is then left inidle in transactionstate indefinitely. Over time these leak across pod restarts and eventually exhaust the postgres connection pool — at which point new boots fail with:Tina4's autodiscovery opens a fresh connection per
src/app/*.pyit loads, so when the pool is saturated every module load fails andsrc/routes/*never register. The HTTP server still answers/health-check(probe passes → pod marked Ready), but every actual app route returns 404 — a silent "ready but broken" state.Versions
tina4_python3.13.9psycopg2(postgres driver, transactions=on by default)Reproduction
pg_stat_activityshortly after startup:You'll see at least one session like:
The transaction stays open for the lifetime of the connection (we've observed sessions sitting in this state for 2+ days, eventually being killed by
idle_in_transaction_session_timeout).Smoking gun in source
Two unwrapped reads in
tina4_python/migration/runner.py:_get_next_batch:Neither path commits or rolls back, and the
db.fetch_onehelper appears to start an implicit transaction (psycopg2 default). The transaction therefore sits open holding row-level locks ontina4_migrationand a connection slot.Impact
idle in transactionconnection.FATAL: remaining connection slots are reservedafter ~89 restarts over 38 hours on a postgres withmax_connections=100./health-check), but actual app routes 404 because the module autodiscovery silently failed during boot when connections were saturated.Workaround we're using
Set a per-role timeout so postgres auto-kills the zombies:
This stops the leak from saturating the pool, but doesn't fix the underlying transaction-handling bug.
Suggested fix
Wrap the read-only lookups in explicit
with db.transaction(): ...(or commit immediately), or run them in autocommit mode since they're plain SELECTs that don't need transactional isolation. Same treatment likely needed in the analogous spots intina4_python/database/where helper-styledb.fetch_one/db.fetch_allstart implicit transactions.Happy to send a PR if the team confirms the intended fix shape.