Skip to content

Auth: sign-in 500s (409 duplicate key) when a stub users row exists — make user creation idempotent #285

Description

@AndresL230

Summary

Google sign-in returns 500 Internal server error (unhandled exception) when a users row already exists for the account's id but has a NULL google_id — i.e. a stub created by graph_service.ensure_user_exists. The new-user path does a blind INSERT and looks users up only by google_id, so it can neither find nor adopt the stub and collides on the primary key. The result is a permanent sign-in 500 loop for the affected account.

Discovered 2026-06-30 while re-testing onboarding on staging (after deleting a user to start fresh).

Root cause

The new-user branch in routes/auth.py:

  • routes/auth.py:379 — looks up the existing user by google_id only:
    existing = table("users").select("id,is_approved", filters={"google_id": f"eq.{google_id}"})
  • routes/auth.py:403 — if not found, does a blind insert by id:
    table("users").insert({"id": user_id, "email": ..., "google_id": google_id, ...})

A stub user with the same id but google_id IS NULL is created by:

  • services/graph_service.py:81 (ensure_user_exists, called from get_graph at graph_service.py:145):
    table("users").insert({"id": user_id, "streak_count": 0})

Because the stub's google_id is NULL, the lookup at auth.py:379 misses it, control falls to the insert at auth.py:403, and the insert collides on users_pkey. The httpx raise_for_status() turns the PostgREST 409 into an unhandled exception → main.py unhandled_exception_handler500.

How a stub gets created before sign-in

user_id is deterministic: user_{google_id} (auth.py:400), and the sapling_session cookie is HMAC-signed and survives a DB delete. So after deleting a user:

  1. A still-"logged in" tab calls the graph endpoint → get_graphensure_user_exists → inserts the stub.
  2. The user signs in with Google → 409 → 500 (loop).

This is the exact path hit during onboarding re-testing, but any future flow that calls ensure_user_exists for a not-yet-OAuth'd id reproduces it.

Reproduction

Confirmed against staging by replaying the two inserts:

stub created (id only, google_id null)
REPRODUCED 500 cause -> HTTP 409: {"code":"23505",
  "details":"Key (id)=(...) already exists.",
  "message":"duplicate key value violates unique constraint \"users_pkey\""}

Impact

  • Any account whose row is deleted (or otherwise stub-created before OAuth) is locked out with a 500 until manually repaired.
  • Manual recovery today: UPDATE users SET google_id='<sub>', is_approved=true WHERE id='user_<sub>' (forces sign-in down the working UPDATE path), or delete the stub and clear the sapling_session cookie before re-signing-in.

Proposed fix

Make user creation idempotent. Options (pick one):

  1. Upsert on id at auth.py:403 (on_conflict="id") so an existing stub gets google_id/email/auth_provider merged instead of colliding.
  2. Look up by the deterministic id (user_{google_id}) instead of (or in addition to) google_id, and UPDATE a stub in place — adopting the stub into a full user.

Either way, a stub should be promoted to a real user rather than causing a 409.

Further testing / follow-ups

  • Unit/integration test: stub exists (google_id NULL) → OAuth sign-in succeeds and promotes the stub (no 409).
  • Test: brand-new user (no stub) still creates correctly.
  • Test: existing fully-provisioned user still takes the UPDATE path.
  • Decide whether ensure_user_exists should set anything beyond {id, streak_count} (and/or whether sign-out should clear sapling_session so deleted accounts don't linger).
  • Confirm the same race can't strand a genuinely-new first-time user.

References

  • routes/auth.py:356-410 (new-user path), :379 (lookup), :403 (insert), :425 (/pending gate)
  • services/graph_service.py:70-83 (ensure_user_exists), :145 (get_graph caller)
  • main.py:140-146 (unhandled exception → 500 + request_id)
  • Related shipped fix: onboarding completion persistence (830555e, branch fix/onboarding-completion-persistence)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions