Summary
Google sign-in returns 500 Internal server error (unhandled exception) when a users row already exists for the account's id but has a NULL google_id — i.e. a stub created by graph_service.ensure_user_exists. The new-user path does a blind INSERT and looks users up only by google_id, so it can neither find nor adopt the stub and collides on the primary key. The result is a permanent sign-in 500 loop for the affected account.
Discovered 2026-06-30 while re-testing onboarding on staging (after deleting a user to start fresh).
Root cause
The new-user branch in routes/auth.py:
routes/auth.py:379 — looks up the existing user by google_id only:
existing = table("users").select("id,is_approved", filters={"google_id": f"eq.{google_id}"})
routes/auth.py:403 — if not found, does a blind insert by id:
table("users").insert({"id": user_id, "email": ..., "google_id": google_id, ...})
A stub user with the same id but google_id IS NULL is created by:
services/graph_service.py:81 (ensure_user_exists, called from get_graph at graph_service.py:145):
table("users").insert({"id": user_id, "streak_count": 0})
Because the stub's google_id is NULL, the lookup at auth.py:379 misses it, control falls to the insert at auth.py:403, and the insert collides on users_pkey. The httpx raise_for_status() turns the PostgREST 409 into an unhandled exception → main.py unhandled_exception_handler → 500.
How a stub gets created before sign-in
user_id is deterministic: user_{google_id} (auth.py:400), and the sapling_session cookie is HMAC-signed and survives a DB delete. So after deleting a user:
- A still-"logged in" tab calls the graph endpoint →
get_graph → ensure_user_exists → inserts the stub.
- The user signs in with Google → 409 → 500 (loop).
This is the exact path hit during onboarding re-testing, but any future flow that calls ensure_user_exists for a not-yet-OAuth'd id reproduces it.
Reproduction
Confirmed against staging by replaying the two inserts:
stub created (id only, google_id null)
REPRODUCED 500 cause -> HTTP 409: {"code":"23505",
"details":"Key (id)=(...) already exists.",
"message":"duplicate key value violates unique constraint \"users_pkey\""}
Impact
- Any account whose row is deleted (or otherwise stub-created before OAuth) is locked out with a 500 until manually repaired.
- Manual recovery today:
UPDATE users SET google_id='<sub>', is_approved=true WHERE id='user_<sub>' (forces sign-in down the working UPDATE path), or delete the stub and clear the sapling_session cookie before re-signing-in.
Proposed fix
Make user creation idempotent. Options (pick one):
- Upsert on
id at auth.py:403 (on_conflict="id") so an existing stub gets google_id/email/auth_provider merged instead of colliding.
- Look up by the deterministic
id (user_{google_id}) instead of (or in addition to) google_id, and UPDATE a stub in place — adopting the stub into a full user.
Either way, a stub should be promoted to a real user rather than causing a 409.
Further testing / follow-ups
References
routes/auth.py:356-410 (new-user path), :379 (lookup), :403 (insert), :425 (/pending gate)
services/graph_service.py:70-83 (ensure_user_exists), :145 (get_graph caller)
main.py:140-146 (unhandled exception → 500 + request_id)
- Related shipped fix: onboarding completion persistence (
830555e, branch fix/onboarding-completion-persistence)
Summary
Google sign-in returns
500 Internal server error(unhandled exception) when ausersrow already exists for the account's id but has aNULLgoogle_id— i.e. a stub created bygraph_service.ensure_user_exists. The new-user path does a blindINSERTand looks users up only bygoogle_id, so it can neither find nor adopt the stub and collides on the primary key. The result is a permanent sign-in 500 loop for the affected account.Discovered 2026-06-30 while re-testing onboarding on staging (after deleting a user to start fresh).
Root cause
The new-user branch in
routes/auth.py:routes/auth.py:379— looks up the existing user bygoogle_idonly:routes/auth.py:403— if not found, does a blind insert by id:A stub user with the same
idbutgoogle_id IS NULLis created by:services/graph_service.py:81(ensure_user_exists, called fromget_graphatgraph_service.py:145):Because the stub's
google_idis NULL, the lookup at auth.py:379 misses it, control falls to the insert at auth.py:403, and the insert collides onusers_pkey. Thehttpxraise_for_status()turns the PostgREST 409 into an unhandled exception →main.pyunhandled_exception_handler→500.How a stub gets created before sign-in
user_idis deterministic:user_{google_id}(auth.py:400), and thesapling_sessioncookie is HMAC-signed and survives a DB delete. So after deleting a user:get_graph→ensure_user_exists→ inserts the stub.This is the exact path hit during onboarding re-testing, but any future flow that calls
ensure_user_existsfor a not-yet-OAuth'd id reproduces it.Reproduction
Confirmed against staging by replaying the two inserts:
Impact
UPDATE users SET google_id='<sub>', is_approved=true WHERE id='user_<sub>'(forces sign-in down the working UPDATE path), or delete the stub and clear thesapling_sessioncookie before re-signing-in.Proposed fix
Make user creation idempotent. Options (pick one):
idat auth.py:403 (on_conflict="id") so an existing stub getsgoogle_id/email/auth_providermerged instead of colliding.id(user_{google_id}) instead of (or in addition to)google_id, andUPDATEa stub in place — adopting the stub into a full user.Either way, a stub should be promoted to a real user rather than causing a 409.
Further testing / follow-ups
google_idNULL) → OAuth sign-in succeeds and promotes the stub (no 409).ensure_user_existsshould set anything beyond{id, streak_count}(and/or whether sign-out should clearsapling_sessionso deleted accounts don't linger).References
routes/auth.py:356-410(new-user path),:379(lookup),:403(insert),:425(/pendinggate)services/graph_service.py:70-83(ensure_user_exists),:145(get_graphcaller)main.py:140-146(unhandled exception → 500 +request_id)830555e, branchfix/onboarding-completion-persistence)