feat(portal): security hardening — origin checks + token auth (close the no-auth gap behind the bind default) #247

@dotdevdotdev

Goal

Finish the portal security hardening the council review flagged (#245 / conscience's audit). PR #246 shipped the bind default (127.0.0.1), which protects strangers who install — but two real vectors remain:

  1. CSRF/browser vector (affects everyone, including loopback-only users): SECURITY.md admits no Origin/CSRF checks are enforced on state-changing POSTs. A malicious page open in a browser on the trusted machine can fire POSTs at https://localhost:8765/api/* (scheduler control, missions dispatch, project deletion, artifact upload) — the loopback bind does nothing against this, because the attacker's code is already running inside the perimeter.
  2. LAN exposure has zero protection once opted in: the moment a user sets server.host: 0.0.0.0 for phone access (the product's headline feature), anyone who can reach the port can drive every endpoint. Opting into the demo experience currently means opting out of all protection. The Phase-1 demo video (#245) will advertise this mode; it must be honest before the launch spikes.

Scope

In:

  • Origin validation on all state-changing HTTP requests and the WebSocket upgrade
  • Bearer-token auth for API + WebSocket, enforced for non-loopback binds
  • Portal UI: one-time token entry per device
  • Docs: README trust model + SECURITY.md updated to the new posture

Out:

  • Full user/identity system, sessions, MFA — Cloudflare Tunnel + Zero Trust remains the recommended pattern for internet exposure
  • TLS changes (existing self-signed cert flow unchanged)
  • TTS/STT service auth (separate trust domain, out of SECURITY.md scope)

Approach

Task 1 — Origin check (the CSRF guard; small, ships first)

  • On every state-changing request (POST/PUT/DELETE under /api/*) and on WebSocket upgrade, validate the Origin header:
    • Absent Origin → allow. CSRF is a browser vector; curl/CLI/scripts don't send Origin and must keep working.
    • Present Origin → must match the portal's own scheme+host+port, localhost/127.0.0.1 equivalents, or an entry in a new server.allowed_origins config list (required for Cloudflare Tunnel users, whose browser origin is the tunnel domain).
    • Mismatch → 403 with a log line naming the origin.
  • Config: server.allowed_origins: [] (list of exact origins, e.g. https://portal.example.com).
  • This protects loopback-only users immediately and costs nothing in UX.
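
The Task 1 rules could be sketched roughly as follows. This is a minimal illustration, not the portal's actual code: the function names, the `portal_origin` parameter, and the choice to treat localhost/127.0.0.1 as equivalent only when scheme and port match the portal's own are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical names; only the allow/deny policy comes from the issue text.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}
STATE_CHANGING_METHODS = {"POST", "PUT", "DELETE"}


def needs_origin_check(method, path, is_websocket_upgrade=False):
    """State-changing /api/* requests and WebSocket upgrades get checked."""
    if is_websocket_upgrade:
        return True
    return method.upper() in STATE_CHANGING_METHODS and path.startswith("/api/")


def normalize_origin(origin):
    """Return (scheme, host, port) for an Origin value, or None if unparseable."""
    try:
        parts = urlsplit(origin.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None  # also rejects the literal "null" origin
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    return (parts.scheme, parts.hostname.lower(), port)


def origin_allowed(origin, portal_origin, allowed_origins=()):
    """Apply the absent / match / allowlist rules to one Origin header value."""
    if origin is None:
        return True  # absent Origin: curl, CLI tools, and scripts keep working
    got = normalize_origin(origin)
    if got is None:
        return False
    own = normalize_origin(portal_origin)
    if got == own:
        return True
    # Assumption: loopback aliases count only with the portal's scheme and port.
    if own and got[0] == own[0] and got[2] == own[2] and got[1] in LOOPBACK_HOSTS:
        return True
    # Exact-origin allowlist, e.g. a Cloudflare Tunnel domain.
    return got in {normalize_origin(o) for o in allowed_origins}
```

A request handler would call `needs_origin_check` first and, when it returns true and `origin_allowed` returns false, respond with 403 and log the offending origin.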

Task 2 — Token auth (makes 0.0.0.0 honest)

  • server.auth_token in config. Auto-generate (32+ bytes urlsafe) on first agentwire init / first portal start if absent; agentwire portal token prints it, --rotate regenerates.
  • Enforcement policy: required whenever the bind is non-loopback. Portal refuses to start on 0.0.0.0 without a token configured (clear error telling the user to run agentwire portal token). Loopback binds: token optional (origin check already covers the browser vector; local processes are inside the trust boundary anyway).
  • Transport: Authorization: Bearer <token> on HTTP; for WebSocket, send the token at connect time (subprotocol or first-message auth, whichever the current WS framing makes cleaner). Compare tokens in constant time.
  • Portal UI: on 401, show a token-entry screen once per device; store in localStorage; attach to all subsequent requests/WS connects. The phone flow becomes: open portal → paste token once → done.
  • Local callers (hooks, queue-processor, CLI helpers that hit the portal API) read the token from ~/.agentwire/config.yaml — same machine, no new secret distribution.
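
The server-side pieces of Task 2 (token generation, the refuse-to-start rule, and constant-time Bearer comparison) might look something like the sketch below. All function names are hypothetical, and treating unresolvable hostnames as non-loopback is a conservative assumption of this example rather than something the issue specifies.

```python
import hmac
import ipaddress
import secrets


def generate_token(nbytes=32):
    """Create a URL-safe token from at least 32 random bytes."""
    return secrets.token_urlsafe(nbytes)


def is_loopback_bind(host):
    """True for localhost and loopback IPs (127.0.0.0/8, ::1)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Assumption: any other hostname is treated as non-loopback.
        return False


def check_startup(host, auth_token):
    """Refuse to start on a non-loopback bind when no token is configured."""
    if not is_loopback_bind(host) and not auth_token:
        raise SystemExit(
            f"Refusing to bind {host} without server.auth_token; "
            "run `agentwire portal token` to create one."
        )


def bearer_ok(authorization_header, expected_token):
    """Validate an 'Authorization: Bearer <token>' header value."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # hmac.compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

A handler on a non-loopback bind would return 401 whenever `bearer_ok` is false; the same comparison can back the WebSocket check regardless of whether the token arrives via subprotocol or first message.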

Task 3 — Docs to match

  • README network & trust model note: update from "there is no auth yet" to the new posture (token required for LAN, origin checks always on).
  • SECURITY.md trust model section rewritten accordingly; keep the Cloudflare Tunnel + Zero Trust recommendation for anything internet-facing.
  • agentwire-config skill: document server.allowed_origins + server.auth_token.

Phases

  • Phase 1 — Origin validation on state-changing /api/* + WS upgrade, server.allowed_origins config, tests (absent-origin allow, match allow, mismatch 403, tunnel-domain allowlist)
  • Phase 2 — Token generation + storage, Bearer enforcement on API + WS for non-loopback binds, refuse-to-start rule, agentwire portal token CLI, constant-time compare, tests
  • Phase 3 — Portal token-entry UI (once per device, localStorage), 401 flow, phone-flow verification
  • Phase 4 — README / SECURITY.md / config-skill updates

Verification

  • Origin: curl -X POST localhost:8765/api/... (no Origin) still works; same request with Origin: https://evil.example → 403; with the portal's own origin → 200; tunnel domain in allowed_origins → 200.
  • Token: portal on 0.0.0.0 with no token refuses to start with an actionable error; with token, unauthenticated /api/* → 401, Bearer-authed → 200; WS connect without token rejected.
  • Phone flow: fresh mobile browser → token prompt → paste once → full portal works across reloads.
  • Local tooling: hooks/queue-processor/CLI paths that call the portal still function with token enforcement on.
  • Docs: README trust-model note no longer says "no auth"; SECURITY.md matches the implementation.

Related: #245 (Phase 0 shipped the bind default in #246; this issue closes the remaining gap before the launch spikes).

Built by dotdev.dev