diff --git a/deploy/systemd/README.md b/deploy/systemd/README.md index a95f18b..f5b86ec 100644 --- a/deploy/systemd/README.md +++ b/deploy/systemd/README.md @@ -17,6 +17,18 @@ them to fit their host layout. | `user/security-scanner-scan-all.service` | **No-sudo** user-level variant (`%h`-based, runs as the invoking user). | | `user/security-scanner-scan-all.timer` | Schedules the user `.service` (default: every 2 hours). | +**Scale worker pool + periodic jobs (M3, see §9):** + +| Path | Purpose | +| --- | --- | +| `security-scanner-scan-worker@.service` | Instanced daemon template; `scan-worker@1..N` are N independent processes, each a distinct fence-token holder (FR-4). | +| `scan-worker.target` | Brings the whole worker pool up/down at once. | +| `security-scanner-lease-reaper.{service,timer}` | Reclaims expired job + repo leases (FR-6) on a timer. | +| `security-scanner-incr-poll.{service,timer}` | `discover-updates --enqueue --from-catalog --ls-remote-skip` (FR-2). | +| `security-scanner-baseline.{service,timer}` | Per-repo baseline `ScanJob` enqueue (FR-3). | +| `security-scanner-freshness-eval.{service,timer}` | Per-repo staleness detector + `BREACH_COUNTER` rollup (FR-8). | +| `security-scanner-catalog-reconcile.{service,timer}` | Org catalog reconcile (FR-1). **Governance-gated: keep DISABLED until GATE 2** (default provider refuses live fetch). | + Two flavors: - **System-level** (`§4`) — requires root, runs as a dedicated `scanner` user, full hardening. Use for shared/production hosts. @@ -319,7 +331,63 @@ manually if you also want to clean state. --- -## 9. Related documents +## 9. Scale worker pool (M3) — N processes + periodic timers + +The scale redesign (design.md v2, FR-4) replaces the single weekly `scan-all` +oneshot with a **queue + N-worker-pool** model: a per-repo job queue, N +independent worker processes draining it, and several periodic timers feeding +and maintaining the queue. 
The `scan-all` units above still work; the units in +this section are the scale path. + +> **Box-gated.** The deployment box is OFFLINE. These artifacts are what a future +> box deploy instantiates; DEPLOYED behavior (N live processes, `Restart=on-failure` +> recovery on a real crash, and the real cadence values) is NOT proven here. The +> `OnCalendar=` values in every timer are **GATE-1 placeholders** — the box load +> gate sets the real cadences (poll interval, baseline window, N). Do not treat +> them as load-validated. + +### 9a. Worker pool + +`security-scanner-scan-worker@.service` is an instanced (templated) unit. The +systemd instance name `%i` is threaded into `--worker-id scan-worker@%i`, so +`scan-worker@1 .. scan-worker@N` run as N independent OS processes, each a +distinct RepoLease fence-token holder. The RepoLease CAS (M2) guarantees two +instances never scan the same repo concurrently (FR-4). + +Bring up N instances (pick N from the box load gate): + +```bash +sudo systemctl enable --now security-scanner-scan-worker@{1..8} # example: 8 workers +sudo systemctl enable --now scan-worker.target # group start/stop +# stop the whole pool: +sudo systemctl stop scan-worker.target +``` + +Each instance is `Type=simple` (long-running daemon, polls until `SIGTERM`) with +`Restart=on-failure`; a crashed instance is restarted by systemd and its stranded +leases are reclaimed by the lease-reaper timer below. + +### 9b. Periodic timers + +```bash +sudo systemctl enable --now security-scanner-lease-reaper.timer +sudo systemctl enable --now security-scanner-incr-poll.timer +sudo systemctl enable --now security-scanner-baseline.timer +sudo systemctl enable --now security-scanner-freshness-eval.timer +``` + +### 9c. catalog-reconcile — DISABLED until GATE 2 + +Do **not** `systemctl enable` `security-scanner-catalog-reconcile.timer` yet. 
The +`reconcile` command's default org-list provider is a governance-gated stub that +REFUSES to fetch live GitHub (a live org GET is gated to a human PR + the +autopilot `ghas-live-fetch-or-mutation-required` stop-condition, GATE 2). As +shipped, the unit fails closed; enabling the timer early only schedules failing +runs. Enable it only after GATE 2 clears and a live provider is wired. + +--- + +## 10. Related documents - `docs/workbench/adrs/ADR-20260531-periodic-multi-repo-scan-catalog.md` - `docs/workbench/specs/2026-05-31-scan-all-and-target-catalog.md` diff --git a/deploy/systemd/scan-worker.target b/deploy/systemd/scan-worker.target new file mode 100644 index 0000000..4367dac --- /dev/null +++ b/deploy/systemd/scan-worker.target @@ -0,0 +1,17 @@ +[Unit] +Description=security-scanner incremental worker pool (N scan-worker@ instances) +Documentation=https://github.com/source-security-dev/security-scanner +# Aggregates the N independent scan-worker@ daemon instances into one start/stop +# unit so an operator brings the whole pool up or down at once. +# +# Bring up N instances (choose N from the box load gate — N is box-gated, not +# invented here): +# systemctl enable --now security-scanner-scan-worker@{1..N} +# systemctl enable --now scan-worker.target +# +# Each scan-worker@i WantedBy=scan-worker.target, so enabling the instances wires +# them into this target. 
Stop the whole pool with: +# systemctl stop scan-worker.target + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-baseline.service b/deploy/systemd/security-scanner-baseline.service new file mode 100644 index 0000000..9a34cca --- /dev/null +++ b/deploy/systemd/security-scanner-baseline.service @@ -0,0 +1,41 @@ +[Unit] +Description=security-scanner baseline (enqueue per-repo baseline ScanJobs, FR-3) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot + +# --- Operator: replace placeholders below --- +User=scanner +Group=scanner +WorkingDirectory=/opt/security-scanner + +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +EnvironmentFile=-/etc/security-scanner/scm.env + +# Enqueue one low-priority baseline ScanJob per INCLUDED catalog repo (NOT +# scan-all; per-repo queue baseline, SC-3). Backpressure skips this run when the +# pending backlog is over threshold; the rolling 1/N slice covers the catalog +# across runs. The ExecStart below passes no --rolling-offset, leaving rolling at +# its default divisor; GATE 1 sets the real divisor/offset rotation. +ExecStart=/usr/bin/uv run security-scanner baseline + +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. 
+LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-baseline.timer b/deploy/systemd/security-scanner-baseline.timer new file mode 100644 index 0000000..9a926f6 --- /dev/null +++ b/deploy/systemd/security-scanner-baseline.timer @@ -0,0 +1,23 @@ +[Unit] +Description=Scheduler for security-scanner baseline enqueue +Documentation=https://github.com/source-security-dev/security-scanner + +[Timer] +# PLACEHOLDER CADENCE — set by GATE 1 (box online load validation). The design +# default is a weekly full-scan window (DEFAULT_BASELINE_CADENCE_HOURS = 24*7), +# mirroring the pre-scale weekly full scan, but the real 500-repo baseline window +# (rolling-slice rotation vs queue depth) is a load-gate decision; do NOT invent +# a load-validated value. Placeholder: weekly, Sunday 04:00. +OnCalendar=Sun *-*-* 04:00:00 + +# If the host was off when the timer fired, run as soon as possible afterwards. +Persistent=true + +# Randomized delay so a fleet doesn't all enqueue baselines at the same instant. +RandomizedDelaySec=900 + +# Bind to the matching .service unit (same basename). 
+Unit=security-scanner-baseline.service + +[Install] +WantedBy=timers.target diff --git a/deploy/systemd/security-scanner-catalog-reconcile.service b/deploy/systemd/security-scanner-catalog-reconcile.service new file mode 100644 index 0000000..a33c076 --- /dev/null +++ b/deploy/systemd/security-scanner-catalog-reconcile.service @@ -0,0 +1,56 @@ +[Unit] +Description=security-scanner catalog-reconcile (org catalog reconcile + coverage gap, FR-1) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target + +# ============================================================================= +# GOVERNANCE GATE 2 — DO NOT ENABLE THIS UNIT (or its .timer) UNTIL GATE 2 CLEARS +# ----------------------------------------------------------------------------- +# The reconcile command's DEFAULT org-list provider is a governance-gated stub +# (GovernanceGatedOrgRepoListProvider) that REFUSES to fetch live GitHub: a live +# org GET is gated to a human PR + the autopilot +# `ghas-live-fetch-or-mutation-required` stop-condition (GATE 2). As shipped, +# `security-scanner reconcile` (the ExecStart below) will FAIL CLOSED with an +# "inject a provider" error rather than reach live GitHub by accident. +# +# So this unit is intentionally INERT until GATE 2: enabling the timer before the +# gate clears only produces failing oneshot runs (no live fetch, no mutation). +# After GATE 2, the operator wires the GATE-2 live provider (out of scope here) +# and only THEN enables the timer. 
+# ============================================================================= + +[Service] +Type=oneshot + +# --- Operator: replace placeholders below --- +User=scanner +Group=scanner +WorkingDirectory=/opt/security-scanner + +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +EnvironmentFile=-/etc/security-scanner/scm.env + +# Reconcile the org catalog (FR-1) and thread the coverage gap into the freshness +# rollup (--evaluate-freshness materializes BREACH_COUNTER.coverage_gap). NOTE: +# the default provider refuses live fetch (GATE 2 above) — this run fails closed +# until a GATE-2 live provider is wired. +ExecStart=/usr/bin/uv run security-scanner reconcile --evaluate-freshness + +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. +LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-catalog-reconcile.timer b/deploy/systemd/security-scanner-catalog-reconcile.timer new file mode 100644 index 0000000..c4ca8cb --- /dev/null +++ b/deploy/systemd/security-scanner-catalog-reconcile.timer @@ -0,0 +1,29 @@ +[Unit] +Description=Scheduler for security-scanner catalog-reconcile +Documentation=https://github.com/source-security-dev/security-scanner + +# ============================================================================= +# GOVERNANCE GATE 2 — DO NOT `systemctl enable` THIS TIMER UNTIL GATE 2 CLEARS. 
+# The bound .service refuses live org fetch until a GATE-2 live provider is wired +# (see security-scanner-catalog-reconcile.service). Enabling this timer early +# only schedules failing fail-closed runs. Keep it DISABLED until GATE 2. +# ============================================================================= + +[Timer] +# PLACEHOLDER CADENCE — set by GATE 1 (box load validation) AND only meaningful +# once GATE 2 clears. The design soft target is hourly org reconcile (Data Flow +# step 1), but the real cadence (GitHub rate-limit budget vs catalog drift) is a +# load-gate decision; do NOT invent a load-validated value. Placeholder: hourly. +OnCalendar=*-*-* *:00:00 + +# If the host was off when the timer fired, run as soon as possible afterwards. +Persistent=true + +# Randomized delay so a fleet doesn't all hit the GitHub org API at once. +RandomizedDelaySec=300 + +# Bind to the matching .service unit (same basename). +Unit=security-scanner-catalog-reconcile.service + +[Install] +WantedBy=timers.target diff --git a/deploy/systemd/security-scanner-freshness-eval.service b/deploy/systemd/security-scanner-freshness-eval.service new file mode 100644 index 0000000..6610c1c --- /dev/null +++ b/deploy/systemd/security-scanner-freshness-eval.service @@ -0,0 +1,42 @@ +[Unit] +Description=security-scanner freshness-eval (per-repo staleness detector + BREACH_COUNTER, FR-8) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot + +# --- Operator: replace placeholders below --- +User=scanner +Group=scanner +WorkingDirectory=/opt/security-scanner + +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +EnvironmentFile=-/etc/security-scanner/scm.env + +# Scheduled staleness DETECTOR: 
staleness is the absence of a worker event, so it +# cannot hang off worker writes — it runs on a timer, enumerates REPO_HEALTH, +# evaluates per-repo breaches against both thresholds, and materializes the +# BREACH_COUNTER rollup the read API consumes O(1). The --poll-interval-hours / +# --baseline-cadence-hours / --margin-hours thresholds default to placeholders +# (the load gate sets the real cadence values). +ExecStart=/usr/bin/uv run security-scanner freshness-eval + +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. +LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-freshness-eval.timer b/deploy/systemd/security-scanner-freshness-eval.timer new file mode 100644 index 0000000..2b40a0a --- /dev/null +++ b/deploy/systemd/security-scanner-freshness-eval.timer @@ -0,0 +1,20 @@ +[Unit] +Description=Scheduler for security-scanner freshness-eval +Documentation=https://github.com/source-security-dev/security-scanner + +[Timer] +# PLACEHOLDER CADENCE — set by GATE 1 (box online load validation). The detector +# only needs to run often enough to surface a breach within the alert SLA; the +# real cadence (vs REPO_HEALTH scan cost at 500 repos) is a load-gate decision, +# do NOT invent a load-validated value. Placeholder: every 10 minutes. +OnCalendar=*-*-* *:0/10:00 +OnBootSec=180 + +# If the host was off when the timer fired, run as soon as possible afterwards. +Persistent=true + +# Bind to the matching .service unit (same basename). 
+Unit=security-scanner-freshness-eval.service + +[Install] +WantedBy=timers.target diff --git a/deploy/systemd/security-scanner-incr-poll.service b/deploy/systemd/security-scanner-incr-poll.service new file mode 100644 index 0000000..37d46f6 --- /dev/null +++ b/deploy/systemd/security-scanner-incr-poll.service @@ -0,0 +1,55 @@ +[Unit] +Description=security-scanner incr-poll (discover changed refs + enqueue scan jobs, FR-2) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot + +# --- Operator: replace placeholders below --- +User=scanner +Group=scanner +WorkingDirectory=/opt/security-scanner + +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +EnvironmentFile=-/etc/security-scanner/scm.env + +# Poll the INCLUDED catalog (M1) repos, probe ref SHAs with git ls-remote and +# skip repos whose refs are unchanged (SC-6a poll-storm mitigation), and enqueue +# one SCAN_JOB per newly observed unscanned commit. The scan-worker@ pool drains +# the queue. +# +# --cadence-seconds is the SC-6d cadence budget: a poll cycle whose wall-time +# exceeds it fires a cadence-overrun ALERT (to the notification-log seam) instead +# of silently falling behind. The value below is a GATE 1 PLACEHOLDER (set by box +# online load validation, like the timer OnCalendar cadences) — it is NOT a +# load-validated number. Tune it to the incr-poll.timer OnCalendar period at GATE 1. +ExecStart=/usr/bin/uv run security-scanner discover-updates \ + --enqueue \ + --from-catalog \ + --ls-remote-skip \ + --cadence-seconds 300 + +# discover-updates exit codes: 0 = ok, 1 = fatal, 2 = at least one repo's fetch +# failed but others completed. Treat 2 as non-failure so one bad repo does not +# fail the whole poll. 
+SuccessExitStatus=0 2 + +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. +LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-incr-poll.timer b/deploy/systemd/security-scanner-incr-poll.timer new file mode 100644 index 0000000..cfcfc94 --- /dev/null +++ b/deploy/systemd/security-scanner-incr-poll.timer @@ -0,0 +1,23 @@ +[Unit] +Description=Scheduler for security-scanner incr-poll +Documentation=https://github.com/source-security-dev/security-scanner + +[Timer] +# PLACEHOLDER CADENCE — set by GATE 1 (box online load validation). The design's +# soft target is ~5min incremental poll cadence (Data Flow step 2), but the real +# 500-repo poll interval (ls-remote storm vs freshness) is a load-gate decision; +# do NOT invent a load-validated value. Placeholder: every 5 minutes. +OnCalendar=*-*-* *:0/5:00 +OnBootSec=120 + +# If the host was off when the timer fired, run as soon as possible afterwards. +Persistent=true + +# Small randomized delay so a fleet doesn't all ls-remote at the same instant. +RandomizedDelaySec=30 + +# Bind to the matching .service unit (same basename). 
+Unit=security-scanner-incr-poll.service + +[Install] +WantedBy=timers.target diff --git a/deploy/systemd/security-scanner-lease-reaper.service b/deploy/systemd/security-scanner-lease-reaper.service new file mode 100644 index 0000000..9536ba0 --- /dev/null +++ b/deploy/systemd/security-scanner-lease-reaper.service @@ -0,0 +1,41 @@ +[Unit] +Description=security-scanner lease-reaper (reclaim expired job + repo leases, FR-6) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot + +# --- Operator: replace placeholders below --- +User=scanner +Group=scanner +WorkingDirectory=/opt/security-scanner + +# DynamoDB-compatible backend connection (reaper is dynamodb-only). +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +EnvironmentFile=-/etc/security-scanner/scm.env + +# One bounded sweep: return expired leased jobs to pending (fence-bumped), +# dead-letter jobs past max_lease_expiries, release expired repo leases. This is +# the recovery path when a scan-worker@ instance crashes (its leases expire and +# are reclaimed here). +ExecStart=/usr/bin/uv run security-scanner reap-expired-leases + +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. 
+LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=multi-user.target diff --git a/deploy/systemd/security-scanner-lease-reaper.timer b/deploy/systemd/security-scanner-lease-reaper.timer new file mode 100644 index 0000000..b96d06f --- /dev/null +++ b/deploy/systemd/security-scanner-lease-reaper.timer @@ -0,0 +1,20 @@ +[Unit] +Description=Scheduler for security-scanner lease-reaper +Documentation=https://github.com/source-security-dev/security-scanner + +[Timer] +# PLACEHOLDER CADENCE — set by GATE 1 (box online load validation). The reaper +# must run well within the worker lease window (DEFAULT_LEASE_SECONDS=300s) so a +# crashed worker's leases are reclaimed promptly; do NOT invent a load-validated +# value here. Placeholder: every 2 minutes (*:0/2 = minute 0,2,4,...). +OnCalendar=*-*-* *:0/2:00 +OnBootSec=60 + +# If the host was off when the timer fired, run as soon as possible afterwards. +Persistent=true + +# Bind to the matching .service unit (same basename). +Unit=security-scanner-lease-reaper.service + +[Install] +WantedBy=timers.target diff --git a/deploy/systemd/security-scanner-scan-worker@.service b/deploy/systemd/security-scanner-scan-worker@.service new file mode 100644 index 0000000..468555b --- /dev/null +++ b/deploy/systemd/security-scanner-scan-worker@.service @@ -0,0 +1,80 @@ +[Unit] +Description=security-scanner incremental scan-worker instance %i (queue-draining daemon) +Documentation=https://github.com/source-security-dev/security-scanner +After=network-online.target +Wants=network-online.target +# Propagate stop/restart of scan-worker.target to this instance; WantedBy= alone +# only pulls instances in on start, so stopping the target would leave them running. +PartOf=scan-worker.target + +[Service] +# Long-running queue-draining daemon (NOT oneshot like scan-all): each instance +# polls the queue, leases jobs, and scans until SIGTERM. N instances run as +# independent OS processes (scan-worker@1 .. scan-worker@N); RepoLease (CAS + +# fence, M2) prevents two instances scanning the same repo concurrently (FR-4). 
+Type=simple + +# --- Operator: replace placeholders below --- +# User/Group that owns the catalog, cache, and notification log. +User=scanner +Group=scanner + +# Project root (where pyproject.toml lives). uv resolves the venv from here. +WorkingDirectory=/opt/security-scanner + +# DynamoDB-compatible backend connection (scan-worker is dynamodb-only). +Environment=SECURITY_SCANNER_STORAGE_BACKEND=dynamodb +Environment=SECURITY_SCANNER_DYNAMO_TABLE=security-scanner +Environment=SECURITY_SCANNER_DYNAMO_ENDPOINT=http://127.0.0.1:8000 +Environment=SECURITY_SCANNER_AWS_REGION=ap-northeast-2 + +# SCM auth — required for private repos. Use systemd credentials or an +# EnvironmentFile for production; the inline form below is for quickstart. +EnvironmentFile=-/etc/security-scanner/scm.env +# Or inline (less secure): +# Environment=GH_TOKEN= +# Environment=GITLAB_TOKEN= + +# The systemd instance name %i becomes the worker id stamped on every lease, so +# scan-worker@1 .. scan-worker@N are N distinct fence-token holders. --daemon +# polls until SIGTERM; the notification-log path matches the scan-all unit so +# external tooling (Promtail, Vector, etc.) tails one location. +ExecStart=/usr/bin/uv run security-scanner scan-worker \ + --daemon \ + --worker-id scan-worker@%i \ + --notification-log ${LOGS_DIRECTORY}/scan-worker.log.jsonl + +# Worker-crash recovery: systemd restarts a crashed instance; the lease-reaper +# timer (security-scanner-lease-reaper.timer) fence-bumps and reclaims the leases +# the dead instance held. DEPLOYED Restart behavior is box-gated (offline box). +Restart=on-failure +RestartSec=5 + +# Graceful drain: SIGTERM lets the daemon finish its in-flight job + release its +# repo lease before exit (the worker installs a SIGINT/SIGTERM shutdown handler). 
+KillSignal=SIGTERM +TimeoutStopSec=330 + +# scan-worker exit-code semantics mirror the worker CLI: +# 0 = clean drain / shutdown (no permanent failures) +# 1 = fatal storage/runtime error +# 2 = at least one job dead-lettered this run (alertable, not a crash) +# Treat 2 as a successful daemon exit so Restart=on-failure does NOT loop on a +# poison job that was correctly dead-lettered. +SuccessExitStatus=0 2 + +# Hardening (loosen if your environment forbids any of these). +NoNewPrivileges=yes +PrivateTmp=yes +ProtectSystem=strict +ProtectHome=read-only +# systemd creates the per-service log and cache dirs (named "security-scanner" +# under its managed state roots) with correct ownership and auto-grants this unit +# RW to them, even under ProtectSystem=strict — so no explicit ReadWritePaths is +# needed and no machine-local absolute path is hardcoded here. +# LogsDirectory also exports LOGS_DIRECTORY (the absolute log dir), which the +# ExecStart --notification-log expands at runtime. +LogsDirectory=security-scanner +CacheDirectory=security-scanner + +[Install] +WantedBy=scan-worker.target diff --git a/dynamodb-local-metadata.json b/dynamodb-local-metadata.json new file mode 100644 index 0000000..ba34020 --- /dev/null +++ b/dynamodb-local-metadata.json @@ -0,0 +1 @@ +{"installationId":"eb528f78-f923-41e3-bff5-93acfdaa91b4","telemetryEnabled":"true"} \ No newline at end of file diff --git a/eval/verifier-corpus/harness/candidates.jsonl b/eval/verifier-corpus/harness/candidates.jsonl index 74cce63..88889b2 100644 --- a/eval/verifier-corpus/harness/candidates.jsonl +++ b/eval/verifier-corpus/harness/candidates.jsonl @@ -1,22 +1,22 @@ -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:4790348bcad6a6223dca2af688c82c3d8c2a430a94734b9cc2c1078e688b4822"}, "findingId": "finding_aae8718fae0433dd", "fingerprint": 
"[\"synthetic-org/verifier-harness\",\"config/positive.env\",4,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/positive.env", "lineEnd": null, "lineStart": 4}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:36697503eb7b8e666cbf9826cedc2968069ddca3ddf0cb7f24f685af4dd999a7"}, "findingId": "finding_24e40c8b470bd6a3", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/database.env\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/database.env", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:563816a7ca1664428aea4b64630d829f1147ce6f888d123e69c1e412e75dec9e"}, "findingId": "finding_2ee79410cd93d57c", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/settings.toml\",10,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "config/settings.toml", "lineEnd": null, "lineStart": 10}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", 
"scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:05a58bbd6d879652cdffa48cfdeff1b3ede451bd2dda187ed82d67b76ebb214a"}, "findingId": "finding_a678f78485e50ab0", "fingerprint": "[\"synthetic-org/verifier-harness\",\"settings/prod.yaml\",5,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "settings/prod.yaml", "lineEnd": null, "lineStart": 5}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:200378dd0b07270c269d93eb6a6d4ce082b876a9a617d14c0d6bbbf5c8784009"}, "findingId": "finding_ff80007ea24ebd40", "fingerprint": "[\"synthetic-org/verifier-harness\",\"src/app/secrets.py\",42,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "src/app/secrets.py", "lineEnd": null, "lineStart": 42}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": 
"salted-sha256:0428bb3795c96342d2e5094f8a685c7db4498570dd82f28fa6642414749ae902"}, "findingId": "finding_ffd80950f49a3db2", "fingerprint": "[\"synthetic-org/verifier-harness\",\"internal/auth.go\",18,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "internal/auth.go", "lineEnd": null, "lineStart": 18}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:0497dd5b129740eea129100c4b10d9a474df27715c5f58f0d6c1fa58782efd34"}, "findingId": "finding_eee7d32f875d1e7e", "fingerprint": "[\"synthetic-org/verifier-harness\",\"lib/client.rb\",7,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "lib/client.rb", "lineEnd": null, "lineStart": 7}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:5188f495c0e5f4127adfb06b8c1874cdc79c825fa359ba59592183fd35028a39"}, "findingId": "finding_aa1515e56f8c78a2", "fingerprint": "[\"synthetic-org/verifier-harness\",\"services/payment.ts\",25,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "services/payment.ts", "lineEnd": null, "lineStart": 25}, "repo": {"branch": null, "commit": null, "fullName": 
"synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:b201e8ee66351cf1d2865230fdb3d92e697b095da0f5f82f8da70ddfe2fc6c4c"}, "findingId": "finding_3fae230b287cbfb8", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/example.md\",5,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "docs/example.md", "lineEnd": null, "lineStart": 5}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:8632a394f8f7b33c2eaf08dafffbd2e2e643deb3a3110ab29e9a9ac4e6d5455d"}, "findingId": "finding_f5629a9839992c40", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/setup.md\",12,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "docs/setup.md", "lineEnd": null, "lineStart": 12}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": 
{"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:e20883ba227c9148bc1192464d81655f905ec019a68edd60beba403158fdda60"}, "findingId": "finding_629466f031c50d61", "fingerprint": "[\"synthetic-org/verifier-harness\",\"README.rst\",4,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "README.rst", "lineEnd": null, "lineStart": 4}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:f22a423f1f94bc681bfcec5a6ceb0641c30818d884020cd09ba12c0885aa1302"}, "findingId": "finding_02cc46475bfdd191", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/api/reference.txt\",9,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "docs/api/reference.txt", "lineEnd": null, "lineStart": 9}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:ee4e3939d723306817c21510dec775623e8ff04beabc2b4b02df59050b3eda59"}, "findingId": "finding_a46388d7dd53fd56", "fingerprint": "[\"synthetic-org/verifier-harness\",\"examples/quickstart.py\",6,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "examples/quickstart.py", "lineEnd": null, 
"lineStart": 6}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:6e131c45305ab502454b2d15d5b575ce9be937230284c98e09ff9ae63fe9143c"}, "findingId": "finding_5fa7b574d009b3ed", "fingerprint": "[\"synthetic-org/verifier-harness\",\"samples/config.env\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "samples/config.env", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:857364ec729f1044484d6008d5059cdda214e293e98b3ce2a7f7680b8fe6ba34"}, "findingId": "finding_ea73afe698f3d962", "fingerprint": "[\"synthetic-org/verifier-harness\",\"test/fixtures/creds.json\",2,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "test/fixtures/creds.json", "lineEnd": null, "lineStart": 2}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", 
"verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:933a4f9eb729537d9494458accaf80c49ceb42ec54bc01fc026af28fa5d01d18"}, "findingId": "finding_cdb1a4dec3338070", "fingerprint": "[\"synthetic-org/verifier-harness\",\"tests/test_login.py\",30,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "tests/test_login.py", "lineEnd": null, "lineStart": 30}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:8e4a339f33bc167a25bf9779f15d7a135d4024a8a80147675763295d87ba3bcb"}, "findingId": "finding_9a09f843316d378e", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/legacy.env\",8,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/legacy.env", "lineEnd": null, "lineStart": 8}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:dc02430c55c4e47098c3e717b0fba3f9598616ebc2b94ff0ea6bdf17572626a7"}, "findingId": "finding_5edd52c83c9d4c88", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/template.toml\",6,\"synthetic-api-key\"]", 
"gitleaks": null, "location": {"filePath": "config/template.toml", "lineEnd": null, "lineStart": 6}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:15f3e7400e3a2aa1f4da8f9abf3fa23a2de27bf227c418104b08cf8fff11c239"}, "findingId": "finding_40ee8887f222fbd0", "fingerprint": "[\"synthetic-org/verifier-harness\",\"src/utils/format.py\",15,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "src/utils/format.py", "lineEnd": null, "lineStart": 15}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:922626e7698512c84d1cd796836be9bdb55d019d374811af959b4b2e7676d88e"}, "findingId": "finding_e634b9be1f9231d8", "fingerprint": "[\"synthetic-org/verifier-harness\",\"internal/consts.go\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "internal/consts.go", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", 
"sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:ba447519f43eda1b96a5adc9c885539391eddfccc83c40f6adfacb69b287b78f"}, "findingId": "finding_223611539975a65b", "fingerprint": "[\"synthetic-org/verifier-harness\",\"data/blob.bin\",1,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "data/blob.bin", "lineEnd": null, "lineStart": 1}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} -{"category": "SECRET", "confidence": "MEDIUM", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:f67156ede8c78a437d075be8251a6b1146d095262a67ab9b14541866abe31515"}, "findingId": "finding_34de71c589210c56", "fingerprint": "[\"synthetic-org/verifier-harness\",\"Makefile\",9,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "Makefile", "lineEnd": null, "lineStart": 9}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:4790348bcad6a6223dca2af688c82c3d8c2a430a94734b9cc2c1078e688b4822"}, "findingId": "finding_aae8718fae0433dd", 
"fingerprint": "[\"synthetic-org/verifier-harness\",\"config/positive.env\",4,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/positive.env", "lineEnd": null, "lineStart": 4}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:36697503eb7b8e666cbf9826cedc2968069ddca3ddf0cb7f24f685af4dd999a7"}, "findingId": "finding_24e40c8b470bd6a3", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/database.env\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/database.env", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:563816a7ca1664428aea4b64630d829f1147ce6f888d123e69c1e412e75dec9e"}, "findingId": "finding_2ee79410cd93d57c", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/settings.toml\",10,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "config/settings.toml", "lineEnd": null, "lineStart": 10}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": 
"synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:05a58bbd6d879652cdffa48cfdeff1b3ede451bd2dda187ed82d67b76ebb214a"}, "findingId": "finding_a678f78485e50ab0", "fingerprint": "[\"synthetic-org/verifier-harness\",\"settings/prod.yaml\",5,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "settings/prod.yaml", "lineEnd": null, "lineStart": 5}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:200378dd0b07270c269d93eb6a6d4ce082b876a9a617d14c0d6bbbf5c8784009"}, "findingId": "finding_ff80007ea24ebd40", "fingerprint": "[\"synthetic-org/verifier-harness\",\"src/app/secrets.py\",42,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "src/app/secrets.py", "lineEnd": null, "lineStart": 42}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": 
"MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:0428bb3795c96342d2e5094f8a685c7db4498570dd82f28fa6642414749ae902"}, "findingId": "finding_ffd80950f49a3db2", "fingerprint": "[\"synthetic-org/verifier-harness\",\"internal/auth.go\",18,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "internal/auth.go", "lineEnd": null, "lineStart": 18}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:0497dd5b129740eea129100c4b10d9a474df27715c5f58f0d6c1fa58782efd34"}, "findingId": "finding_eee7d32f875d1e7e", "fingerprint": "[\"synthetic-org/verifier-harness\",\"lib/client.rb\",7,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "lib/client.rb", "lineEnd": null, "lineStart": 7}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:5188f495c0e5f4127adfb06b8c1874cdc79c825fa359ba59592183fd35028a39"}, "findingId": "finding_aa1515e56f8c78a2", "fingerprint": 
"[\"synthetic-org/verifier-harness\",\"services/payment.ts\",25,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "services/payment.ts", "lineEnd": null, "lineStart": 25}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:b201e8ee66351cf1d2865230fdb3d92e697b095da0f5f82f8da70ddfe2fc6c4c"}, "findingId": "finding_3fae230b287cbfb8", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/example.md\",5,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "docs/example.md", "lineEnd": null, "lineStart": 5}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:8632a394f8f7b33c2eaf08dafffbd2e2e643deb3a3110ab29e9a9ac4e6d5455d"}, "findingId": "finding_f5629a9839992c40", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/setup.md\",12,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "docs/setup.md", "lineEnd": null, "lineStart": 12}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": 
{"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:e20883ba227c9148bc1192464d81655f905ec019a68edd60beba403158fdda60"}, "findingId": "finding_629466f031c50d61", "fingerprint": "[\"synthetic-org/verifier-harness\",\"README.rst\",4,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "README.rst", "lineEnd": null, "lineStart": 4}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:f22a423f1f94bc681bfcec5a6ceb0641c30818d884020cd09ba12c0885aa1302"}, "findingId": "finding_02cc46475bfdd191", "fingerprint": "[\"synthetic-org/verifier-harness\",\"docs/api/reference.txt\",9,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "docs/api/reference.txt", "lineEnd": null, "lineStart": 9}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": 
{"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:ee4e3939d723306817c21510dec775623e8ff04beabc2b4b02df59050b3eda59"}, "findingId": "finding_a46388d7dd53fd56", "fingerprint": "[\"synthetic-org/verifier-harness\",\"examples/quickstart.py\",6,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "examples/quickstart.py", "lineEnd": null, "lineStart": 6}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:6e131c45305ab502454b2d15d5b575ce9be937230284c98e09ff9ae63fe9143c"}, "findingId": "finding_5fa7b574d009b3ed", "fingerprint": "[\"synthetic-org/verifier-harness\",\"samples/config.env\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "samples/config.env", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:857364ec729f1044484d6008d5059cdda214e293e98b3ce2a7f7680b8fe6ba34"}, "findingId": "finding_ea73afe698f3d962", "fingerprint": "[\"synthetic-org/verifier-harness\",\"test/fixtures/creds.json\",2,\"synthetic-api-key\"]", 
"gitleaks": null, "location": {"filePath": "test/fixtures/creds.json", "lineEnd": null, "lineStart": 2}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:933a4f9eb729537d9494458accaf80c49ceb42ec54bc01fc026af28fa5d01d18"}, "findingId": "finding_cdb1a4dec3338070", "fingerprint": "[\"synthetic-org/verifier-harness\",\"tests/test_login.py\",30,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "tests/test_login.py", "lineEnd": null, "lineStart": 30}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:8e4a339f33bc167a25bf9779f15d7a135d4024a8a80147675763295d87ba3bcb"}, "findingId": "finding_9a09f843316d378e", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/legacy.env\",8,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "config/legacy.env", "lineEnd": null, "lineStart": 8}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, 
"severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:dc02430c55c4e47098c3e717b0fba3f9598616ebc2b94ff0ea6bdf17572626a7"}, "findingId": "finding_5edd52c83c9d4c88", "fingerprint": "[\"synthetic-org/verifier-harness\",\"config/template.toml\",6,\"synthetic-api-key\"]", "gitleaks": null, "location": {"filePath": "config/template.toml", "lineEnd": null, "lineStart": 6}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-api-key", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:15f3e7400e3a2aa1f4da8f9abf3fa23a2de27bf227c418104b08cf8fff11c239"}, "findingId": "finding_40ee8887f222fbd0", "fingerprint": "[\"synthetic-org/verifier-harness\",\"src/utils/format.py\",15,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "src/utils/format.py", "lineEnd": null, "lineStart": 15}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, 
"secretHash": "salted-sha256:922626e7698512c84d1cd796836be9bdb55d019d374811af959b4b2e7676d88e"}, "findingId": "finding_e634b9be1f9231d8", "fingerprint": "[\"synthetic-org/verifier-harness\",\"internal/consts.go\",3,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "internal/consts.go", "lineEnd": null, "lineStart": 3}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:ba447519f43eda1b96a5adc9c885539391eddfccc83c40f6adfacb69b287b78f"}, "findingId": "finding_223611539975a65b", "fingerprint": "[\"synthetic-org/verifier-harness\",\"data/blob.bin\",1,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "data/blob.bin", "lineEnd": null, "lineStart": 1}, "repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}} +{"category": "SECRET", "confidence": "MEDIUM", "disposition": "unreviewed", "evidence": {"contextArtifactRef": null, "redacted": true, "secretHash": "salted-sha256:f67156ede8c78a437d075be8251a6b1146d095262a67ab9b14541866abe31515"}, "findingId": "finding_34de71c589210c56", "fingerprint": "[\"synthetic-org/verifier-harness\",\"Makefile\",9,\"synthetic-fake-token\"]", "gitleaks": null, "location": {"filePath": "Makefile", "lineEnd": null, "lineStart": 9}, 
"repo": {"branch": null, "commit": null, "fullName": "synthetic-org/verifier-harness"}, "ruleId": "synthetic-fake-token", "scan": {"rulePackVersion": "secret-rules-0.1.0", "scanRunId": "scan_harness"}, "severity": "HIGH", "sourceTool": "gitleaks", "sourceToolVersion": null, "status": "OPEN", "triage": {"reason": null, "verdict": "NEEDS_REVIEW", "verifier": null}}
diff --git a/src/security_scanner/baseline/ghas_api/__init__.py b/src/security_scanner/baseline/ghas_api/__init__.py
index 5a5ffeb..d82589f 100644
--- a/src/security_scanner/baseline/ghas_api/__init__.py
+++ b/src/security_scanner/baseline/ghas_api/__init__.py
@@ -1,4 +1,11 @@
-"""Read-only GHAS Secret Scanning API import and normalization."""
+"""Read-only GHAS Secret Scanning API import and normalization.
+
+FR-12 (scale redesign #2): this module is RETAINED UNCHANGED by #2. No new GHAS
+automation (webhook receiver / periodic reconcile job) is added — that is
+governance human-PR gated and out of scope. The M1 ``CATALOG`` org repo list
+(``runtime/catalog_reconcile.py``) is the shared org input axis for both gitleaks
+coverage and any FUTURE GHAS reconcile; #2 adds no GHAS wiring to it.
+"""
 
 from __future__ import annotations
 
diff --git a/src/security_scanner/cli/app.py b/src/security_scanner/cli/app.py
index 5f0687f..3235390 100644
--- a/src/security_scanner/cli/app.py
+++ b/src/security_scanner/cli/app.py
@@ -16,6 +16,8 @@
     doctor,
     migrate,
     quickstart,
+    read_api,
+    reconcile,
     report,
     scan,
     scan_health,
@@ -37,6 +39,8 @@
     targets,
     migrate,
     disposition,
+    reconcile,
+    read_api,
 )
 
 
diff --git a/src/security_scanner/cli/commands/read_api.py b/src/security_scanner/cli/commands/read_api.py
new file mode 100644
index 0000000..f513087
--- /dev/null
+++ b/src/security_scanner/cli/commands/read_api.py
@@ -0,0 +1,82 @@
+"""read-api subcommand: read-only scanner read API snapshot (FR-9, M7).
+
+Emits one dashboard-refresh snapshot (freshness rollup + coverage + queue
+backlog, and optionally disposition-filtered findings) as JSON on stdout. This is
+the read query-layer CONTRACT surface the M8 dashboard consumes; it does NOT
+start a network server — live HTTP serving (and real authn) is deploy-gated (see
+``runtime.read_api`` trust model / ``ReadApiServerConfig``). Each panel is read
+cost-bounded: freshness O(1) (materialized BREACH_COUNTER), coverage ≤ org size
+(CATALOG), backlog O(status-partitions) (status COUNT, never a full-table Scan).
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+
+from security_scanner.cli._args import add_incremental_storage_args
+from security_scanner.cli._store import dynamodb_config_from_args, store_from_args
+from security_scanner.core.finding.model import Disposition
+from security_scanner.runtime.finding_query import FindingQueryRequest
+from security_scanner.runtime.read_api import read_dashboard_snapshot
+
+_VALID_DISPOSITIONS = tuple(d.value for d in Disposition)
+
+
+def register(subparsers) -> None:
+    parser = subparsers.add_parser(
+        "read-api",
+        help=(
+            "Emit one read-API snapshot (freshness rollup + coverage + queue "
+            "backlog, optionally findings) as JSON; read-only, no server bound."
+        ),
+    )
+    add_incremental_storage_args(parser)
+    parser.add_argument(
+        "--include-findings",
+        action="store_true",
+        help=(
+            "Also include the (public-safe, redacted) findings panel. Off by "
+            "default so the snapshot stays a cheap live poll."
+        ),
+    )
+    parser.add_argument(
+        "--disposition",
+        action="append",
+        choices=_VALID_DISPOSITIONS,
+        metavar="DISPOSITION",
+        help=(
+            "Restrict the findings panel to these dispositions (repeatable: "
+            f"{', '.join(_VALID_DISPOSITIONS)}). Implies --include-findings."
+ ), + ) + parser.set_defaults(func=cmd_read_api) + + +def cmd_read_api(args: argparse.Namespace) -> int: + """Build and print one read-API snapshot as JSON (FR-9).""" + if args.storage_backend != "dynamodb": + print( + "error: read-api supports --storage-backend dynamodb only", + file=sys.stderr, + ) + return 2 + + findings_request: FindingQueryRequest | None = None + if args.include_findings or args.disposition: + findings_request = FindingQueryRequest( + storage_backend=args.storage_backend, + dynamodb_config=dynamodb_config_from_args(args), + dispositions=args.disposition, + ) + + try: + store = store_from_args(args) + snapshot = read_dashboard_snapshot(store, findings_request=findings_request) + except Exception as exc: # noqa: BLE001 - fatal storage/runtime error. + print(f"error: read-api failed: {exc}", file=sys.stderr) + return 1 + + print(json.dumps(snapshot.to_dict(), indent=2, sort_keys=True)) + return 0 diff --git a/src/security_scanner/cli/commands/reconcile.py b/src/security_scanner/cli/commands/reconcile.py new file mode 100644 index 0000000..3ccd2a4 --- /dev/null +++ b/src/security_scanner/cli/commands/reconcile.py @@ -0,0 +1,146 @@ +"""reconcile subcommand: reconcile the org catalog + thread the coverage gap. + +Runs the M1 catalog reconcile (FR-1, design Data Flow step 1) against an +INJECTABLE org-list provider, then computes the coverage gap and feeds it into +M5's freshness evaluator so the materialized ``BREACH_COUNTER.coverage_gap`` +reflects org repos not yet covered/scanned. + +GOVERNANCE: the default provider is a stub that REFUSES to fetch live GitHub. A +live org GET is gated to a human PR + the autopilot +``ghas-live-fetch-or-mutation-required`` stop-condition (GATE 2). Until that +clears, a caller MUST inject a provider (a fixture, or the GATE-2 live one-liner) +— so nothing reaches live GitHub by accident from this command. 
+""" + +from __future__ import annotations + +import argparse +import datetime as dt +import sys + +from security_scanner.cli._args import add_incremental_storage_args +from security_scanner.cli._store import store_from_args +from security_scanner.runtime.catalog_reconcile import ( + GovernanceGatedOrgRepoListProvider, + OrgRepoListProvider, + coverage_gap_from_store, + run_catalog_reconcile, +) +from security_scanner.runtime.scan_health import ( + DEFAULT_BASELINE_CADENCE_HOURS, + DEFAULT_MARGIN_HOURS, + DEFAULT_POLL_INTERVAL_HOURS, + FreshnessThresholds, + run_freshness_evaluator, +) + + +def register(subparsers) -> None: + parser = subparsers.add_parser( + "reconcile", + help=( + "Reconcile the org catalog (FR-1) and thread the coverage gap into " + "the freshness rollup. Live org fetch is governance-gated; inject a " + "provider." + ), + ) + add_incremental_storage_args(parser) + parser.add_argument( + "--opt-out", + metavar="URL", + action="append", + default=[], + help="Repo URL to exclude from coverage (recorded with a reason, not " + "dropped). Repeatable.", + ) + parser.add_argument( + "--evaluate-freshness", + action="store_true", + help="After reconcile, run the freshness evaluator with the computed " + "coverage gap so BREACH_COUNTER.coverage_gap is materialized.", + ) + # Same float-hours config surface scan-health uses (FR-8); only consulted when + # --evaluate-freshness is set. Concrete cadences stay load-gate decisions. 
+ parser.add_argument( + "--poll-interval-hours", + type=float, + default=DEFAULT_POLL_INTERVAL_HOURS, + metavar="HOURS", + ) + parser.add_argument( + "--baseline-cadence-hours", + type=float, + default=DEFAULT_BASELINE_CADENCE_HOURS, + metavar="HOURS", + ) + parser.add_argument( + "--margin-hours", + type=float, + default=DEFAULT_MARGIN_HOURS, + metavar="HOURS", + ) + parser.set_defaults(func=cmd_reconcile, org_list_provider=None) + + +def cmd_reconcile( + args: argparse.Namespace, + *, + org_list_provider: OrgRepoListProvider | None = None, +) -> int: + """Reconcile the catalog, compute the coverage gap, optionally evaluate. + + ``org_list_provider`` is injectable (tests pass a fixture; an in-process + caller can pass the GATE-2 live provider). It defaults to the + governance-gated stub that refuses to fetch, so the CLI never reaches live + GitHub by accident — the stub raises a clear "inject a provider" error. + """ + if args.storage_backend != "dynamodb": + print( + "error: reconcile supports --storage-backend dynamodb only", + file=sys.stderr, + ) + return 2 + + provider = ( + org_list_provider + if org_list_provider is not None + else getattr(args, "org_list_provider", None) + or GovernanceGatedOrgRepoListProvider() + ) + now = dt.datetime.now(dt.UTC) + + try: + store = store_from_args(args) + summary = run_catalog_reconcile( + store, + provider, + opt_out=args.opt_out, + now=now, + ) + coverage_gap = coverage_gap_from_store(store) + if args.evaluate_freshness: + thresholds = FreshnessThresholds.from_cadences( + poll_interval_hours=args.poll_interval_hours, + baseline_cadence_hours=args.baseline_cadence_hours, + margin_hours=args.margin_hours, + ) + run_freshness_evaluator( + store, + now=now, + thresholds=thresholds, + coverage_gap=coverage_gap, + ) + except Exception as exc: # noqa: BLE001 - fatal storage/provider error. 
+ # A governance-gated default provider raises here with the "inject a + # provider" message; a transient provider failure also surfaces here and, + # per the additive invariant, did NOT delete any existing catalog row. + print(f"error: reconcile failed: {exc}", file=sys.stderr) + return 1 + + print( + f"reconcile: OK {summary.total} repos " + f"({summary.added} added / {summary.updated} updated / " + f"{summary.excluded} excluded), coverage gap {coverage_gap}", + file=sys.stdout, + ) + return 0 diff --git a/src/security_scanner/cli/commands/report.py b/src/security_scanner/cli/commands/report.py index 36529b6..63eb7cf 100644 --- a/src/security_scanner/cli/commands/report.py +++ b/src/security_scanner/cli/commands/report.py @@ -228,7 +228,16 @@ def cmd_evaluate(args: argparse.Namespace) -> int: def cmd_compare_ghas(args: argparse.Namespace) -> int: - """Compare local findings against GitHub Secret Scanning alerts.""" + """Compare local findings against GitHub Secret Scanning alerts. + + FR-12 (scale redesign #2, GHAS retained): this compare-ghas surface and the + ``baseline/ghas_api/`` module it uses are RETAINED UNCHANGED by #2 — no new + GHAS automation (webhook receiver / periodic reconcile job) is added, since + those are governance human-PR gated and out of scope. The M1 catalog org repo + list is the SHARED input axis for both gitleaks coverage and any future GHAS + reconcile, but #2 does not wire compare-ghas to the catalog (it still reads + ``list_scan_targets`` here). Behavior below is intentionally left as-is.
+ """ if args.source == "csv": print( "error: CSV GHAS comparison is phased out; use --source github", diff --git a/src/security_scanner/cli/commands/scan.py b/src/security_scanner/cli/commands/scan.py index ccdd328..a2e8099 100644 --- a/src/security_scanner/cli/commands/scan.py +++ b/src/security_scanner/cli/commands/scan.py @@ -1,8 +1,9 @@ -"""scan / discover-updates / scan-worker / queue-status / residual / scan-all subcommands.""" +"""scan / discover-updates / scan-worker / reap-expired-leases / queue-status / scan-all.""" from __future__ import annotations import argparse +import datetime as dt import os import signal import sys @@ -16,6 +17,21 @@ DISCOVERY_SCANNER_VERSION, ) from security_scanner.cli._store import dynamodb_config_from_args, store_from_args +from security_scanner.runtime.alert_sink import ( + DEFAULT_RENOTIFY_WINDOW_HOURS, + AlertDispatcher, + InMemoryAlertStateStore, + NotificationLogAlertSink, + alert_from_cadence_overrun, +) +from security_scanner.runtime.baseline_enqueue import ( + DEFAULT_BACKPRESSURE_THRESHOLD, + DEFAULT_ROLLING_DIVISOR, + BaselineEnqueueRequest, + BaselineEnqueueSummary, + BaselineScannerConfig, + run_baseline_enqueue, +) from security_scanner.runtime.branch_residual import ( BranchResidual, ResidualDiff, @@ -30,6 +46,8 @@ IncrementalDiscoveryRequest, IncrementalDiscoverySummary, SubprocessGitDiscovery, + catalog_repo_targets, + evaluate_poll_cadence, run_incremental_discovery, ) from security_scanner.runtime.local_scan import ( @@ -38,6 +56,7 @@ run_local_scan, ) from security_scanner.runtime.notification_log import DEFAULT_NOTIFICATION_LOG_PATH +from security_scanner.runtime.poll_fetch import SubprocessLsRemoteRunner from security_scanner.runtime.queue_status import ( QueueStatusRequest, read_queue_status, @@ -67,6 +86,14 @@ SCAN_ALL_LOCK_PATH: Path = DEFAULT_SCAN_ALL_LOCK_PATH +# incr-poll cadence budget (SC-6d): a poll cycle whose wall-time exceeds this +# many seconds is falling behind its cadence and must ALERT, 
not silently fall +# behind. 0 disables the check. The CONCRETE budget is a GATE-1 load-validation +# decision (mirrors the systemd timer OnCalendar placeholders); this is only a +# disabled-by-default placeholder so the mechanism is wired but no invented +# load-validated number is baked in. +DEFAULT_POLL_CADENCE_SECONDS = 0.0 + def register(subparsers) -> None: scan_parser = subparsers.add_parser( @@ -120,9 +147,72 @@ def register(subparsers) -> None: metavar="PATTERN", help="Remote ref glob to observe; may be repeated.", ) + discover_parser.add_argument( + "--from-catalog", + action="store_true", + help="Use the INCLUDED CATALOG repos (M1) as the repo set instead of the " + "manifest scan-target list (the scale incr-poll repo source).", + ) + discover_parser.add_argument( + "--ls-remote-skip", + action="store_true", + help="Probe remote ref SHAs with git ls-remote and skip fetching repos " + "whose refs are unchanged (SC-6a poll-storm mitigation).", + ) + discover_parser.add_argument( + "--cadence-seconds", + type=float, + default=DEFAULT_POLL_CADENCE_SECONDS, + metavar="SECONDS", + help="Poll cadence budget in seconds (SC-6d): a cycle whose wall-time " + "exceeds this fires a cadence-overrun ALERT instead of silently falling " + "behind. 0 disables the check. The concrete budget is a GATE-1 " + f"load-validation placeholder (default: {DEFAULT_POLL_CADENCE_SECONDS:g}).", + ) + discover_parser.add_argument( + "--notification-log", + metavar="FILE", + default=str(DEFAULT_NOTIFICATION_LOG_PATH), + help="Path to the JSONL notification log a cadence-overrun alert is " + "appended to (default: " + "~/.local/state/security-scanner/scan-all.log.jsonl). 
Real " + "Slack/email/webhook delivery is deploy-gated.", + ) add_incremental_storage_args(discover_parser) discover_parser.set_defaults(func=cmd_discover_updates) + baseline_parser = subparsers.add_parser( + "baseline", + help="Enqueue one low-priority baseline ScanJob per INCLUDED catalog " + "repo (NOT scan-all; per-repo queue baseline, SC-3).", + ) + baseline_parser.add_argument( + "--backpressure-threshold", + type=int, + default=DEFAULT_BACKPRESSURE_THRESHOLD, + metavar="N", + help="Skip baseline enqueue this run when the pending backlog exceeds N " + f"(default: {DEFAULT_BACKPRESSURE_THRESHOLD}).", + ) + baseline_parser.add_argument( + "--rolling-divisor", + type=int, + default=DEFAULT_ROLLING_DIVISOR, + metavar="N", + help="Enqueue only 1/N of the catalog this run (rolling baseline " + f"fallback; default: {DEFAULT_ROLLING_DIVISOR}, 1 disables rolling).", + ) + baseline_parser.add_argument( + "--rolling-offset", + type=int, + default=0, + metavar="N", + help="Rolling-baseline slice offset for this invocation (a timer " + "advances it across runs to cover the whole catalog).", + ) + add_incremental_storage_args(baseline_parser) + baseline_parser.set_defaults(func=cmd_baseline) + scan_worker_parser = subparsers.add_parser( "scan-worker", help="Lease and process incremental scan jobs.", @@ -167,6 +257,14 @@ def register(subparsers) -> None: add_incremental_storage_args(scan_worker_parser) scan_worker_parser.set_defaults(func=cmd_scan_worker) + reaper_parser = subparsers.add_parser( + "reap-expired-leases", + help="Reclaim expired job + repo leases once (the lease-reaper timer " + "entrypoint, FR-6).", + ) + add_incremental_storage_args(reaper_parser) + reaper_parser.set_defaults(func=cmd_reap_expired_leases) + queue_status_parser = subparsers.add_parser( "queue-status", help="Show incremental queue job and lease counts.", @@ -312,8 +410,17 @@ def _render_scan_result(result) -> None: ) -def cmd_discover_updates(args: argparse.Namespace) -> int: - """Discover 
changed refs and optionally enqueue incremental scan jobs.""" +def cmd_discover_updates( + args: argparse.Namespace, *, clock=None, now_factory=utc_now_iso +) -> int: + """Discover changed refs and optionally enqueue incremental scan jobs. + + ``clock`` is the injectable monotonic-seconds seam (defaults to + ``time.monotonic``) the SC-6d cadence-overrun measurement reads; ``now_factory`` + stamps the alert ``event_at``. Both are injected the same way the other + timer/scan-all entrypoints inject time so the deployed path is testable + without wall-clock sleeps. + """ if args.storage_backend != "dynamodb": print( "error: discover-updates supports --storage-backend dynamodb only", @@ -321,8 +428,21 @@ def cmd_discover_updates(args: argparse.Namespace) -> int: ) return 2 + if clock is None: + import time + + clock = time.monotonic + + started = clock() try: store = store_from_args(args) + # Repo source (M4 catalog seam): --from-catalog feeds the INCLUDED + # CATALOG repos (the scale incr-poll source); otherwise the legacy + # manifest scan-target list is used via store.list_scan_targets(). + targets = catalog_repo_targets(store) if args.from_catalog else None + # ls-remote skip (SC-6a): only injected on opt-in so the legacy path + # (fetch every target) stays the default for manifest callers/tests. + ls_remote = SubprocessLsRemoteRunner() if args.ls_remote_skip else None summary = run_incremental_discovery( IncrementalDiscoveryRequest( mode=( @@ -339,20 +459,89 @@ def cmd_discover_updates(args: argparse.Namespace) -> int: rule_pack_version=RULE_PACK_VERSION, scanner_config_hash=DISCOVERY_SCANNER_CONFIG_HASH, ), + targets=targets, max_targets=args.max_targets, ref_patterns=tuple(args.ref_pattern or DEFAULT_REF_PATTERNS), + ls_remote=ls_remote, ) ) except Exception as exc: # noqa: BLE001 - fatal catalog/storage/runtime error. 
print(f"error: discovery failed: {exc}", file=sys.stderr) return 1 + # SC-6d: the deployed poll cycle must ALERT when it cannot keep up with its + # cadence, not silently fall behind (the exact staleness failure mode the + # redesign exists to prevent). Measure the discovery cycle wall-time via the + # injectable clock and, on overrun, route through the SAME M9 dispatcher/sink + # the freshness-eval timer uses so the cadence-overrun shares one alert + # vocabulary and de-dup/re-notify policy. Alerting never fails the poll. + _alert_poll_cadence_overrun( + store, + cycle_seconds=max(0.0, clock() - started), + cadence_seconds=args.cadence_seconds, + targets=summary.targets, + notification_log=args.notification_log, + renotify_window_hours=getattr( + args, "renotify_window_hours", DEFAULT_RENOTIFY_WINDOW_HOURS + ), + now_factory=now_factory, + ) + _render_discovery_summary(summary) return 2 if summary.has_partial_failure else 0 +def _alert_poll_cadence_overrun( + store, + *, + cycle_seconds: float, + cadence_seconds: float, + targets: int, + notification_log: str, + renotify_window_hours: float, + now_factory, +) -> None: + """Fire a cadence-overrun alert to the M9 sink when the poll cycle overran. + + Builds the M4 ``PollCadenceSignal`` from the measured cycle wall-time and, + only on overrun, routes it via ``alert_from_cadence_overrun`` through the M9 + ``AlertDispatcher`` to the ``NotificationLogAlertSink`` — the same + notification-log seam and de-dup/re-notify policy the freshness-eval timer uses. The + de-dup state lives on the store's ALERT_STATE when the store exposes it (the + durable dynamodb path), else an in-memory store for fakes/single runs. A + within-budget cycle (or ``cadence_seconds <= 0``) fires nothing, and any + alerting failure is swallowed so it never breaks the poll cycle.
+ """ + try: + signal = evaluate_poll_cadence( + cycle_seconds=cycle_seconds, + cadence_seconds=cadence_seconds, + targets=targets, + ) + now = now_factory() + if isinstance(now, str): + now = dt.datetime.fromisoformat(now) + alert = alert_from_cadence_overrun(signal, now=now) + if alert is None: + return + state = ( + store + if hasattr(store, "read_alert_state") and hasattr(store, "put_alert_state") + else InMemoryAlertStateStore() + ) + dispatcher = AlertDispatcher( + sink=NotificationLogAlertSink(notification_log), + state=state, + window_hours=renotify_window_hours, + ) + dispatcher.dispatch(alert, now=now) + except Exception: # noqa: BLE001 - alerting must never break the poll cycle. + return + + def _render_discovery_summary(summary: IncrementalDiscoverySummary) -> None: print(f"targets: {summary.targets}") + print(f"skipped idle (ls-remote): {summary.skipped_idle}") print(f"fetch ok: {summary.fetch_ok}") print(f"fetch failed: {summary.fetch_failed_count}") for failure in summary.fetch_failed: @@ -363,6 +552,61 @@ def _render_discovery_summary(summary: IncrementalDiscoverySummary) -> None: print(f"skipped non-fast-forward: {summary.skipped_non_fast_forward}") +def cmd_baseline(args: argparse.Namespace) -> int: + """Enqueue one low-priority baseline ScanJob per INCLUDED catalog repo (SC-3). + + NOT scan-all: this builds per-repo queue jobs the N-worker pool drains, with + priority separation, backpressure, and a rolling 1/N slice. See + ``runtime.baseline_enqueue`` for why scan-all is not reused. 
+ """ + if args.storage_backend != "dynamodb": + print( + "error: baseline supports --storage-backend dynamodb only", + file=sys.stderr, + ) + return 2 + + try: + store = store_from_args(args) + summary = run_baseline_enqueue( + BaselineEnqueueRequest( + catalog_store=store, + queue_store=store, + scanner=BaselineScannerConfig( + scanner_name=DISCOVERY_SCANNER_NAME, + scanner_version=DISCOVERY_SCANNER_VERSION, + rule_pack_version=RULE_PACK_VERSION, + scanner_config_hash=DISCOVERY_SCANNER_CONFIG_HASH, + ), + backpressure_threshold=args.backpressure_threshold, + rolling_divisor=args.rolling_divisor, + rolling_offset=args.rolling_offset, + ) + ) + except Exception as exc: # noqa: BLE001 - fatal catalog/storage/runtime error. + print(f"error: baseline enqueue failed: {exc}", file=sys.stderr) + return 1 + + _render_baseline_summary(summary) + return 0 + + +def _render_baseline_summary(summary: BaselineEnqueueSummary) -> None: + print(f"included repos: {summary.included_repos}") + print( + f"rolling slice: 1/{summary.rolling_divisor} " + f"offset {summary.rolling_offset} -> {summary.selected_repos} repo(s)" + ) + print(f"pending backlog: {summary.backlog}") + if summary.throttled: + print("throttled: backlog over threshold, baseline enqueue skipped") + print("jobs enqueued: 0") + return + print("throttled: no") + print(f"jobs enqueued: {summary.jobs_enqueued}") + print(f"duplicates skipped: {summary.duplicates_skipped}") + + def cmd_scan_worker(args: argparse.Namespace) -> int: """Process queued incremental scan jobs once or as a polling daemon.""" if not args.once and not args.daemon: @@ -441,6 +685,35 @@ def _render_scan_worker_summary(summary: ScanWorkerSummary) -> None: print(f"dead-lettered: {summary.dead_lettered}") +def cmd_reap_expired_leases(args: argparse.Namespace) -> int: + """Run one lease-reaper sweep (the ``lease-reaper`` timer entrypoint, FR-6). 
+ + M2 added the ``reap_expired_leases`` store op but deliberately left the timer + wiring to M3; this is the CLI surface the ``lease-reaper`` systemd timer + calls on a cadence. A single bounded sweep returns expired leased jobs to + pending (fence-bumped), dead-letters jobs past ``max_lease_expiries``, and + releases expired repo leases, then exits. + """ + if args.storage_backend != "dynamodb": + print( + "error: reap-expired-leases supports --storage-backend dynamodb only", + file=sys.stderr, + ) + return 2 + + try: + store = store_from_args(args) + summary = store.reap_expired_leases(now=dt.datetime.now(dt.UTC)) + except Exception as exc: # noqa: BLE001 - fatal storage/runtime error. + print(f"error: reap-expired-leases failed: {exc}", file=sys.stderr) + return 1 + + print(f"jobs returned to pending: {summary.jobs_returned_to_pending}") + print(f"jobs dead-lettered: {summary.jobs_dead_lettered}") + print(f"repo leases released: {summary.repo_leases_released}") + return 0 + + def cmd_queue_status(args: argparse.Namespace) -> int: """Read incremental queue status counts.""" if args.storage_backend != "dynamodb": diff --git a/src/security_scanner/cli/commands/scan_health.py b/src/security_scanner/cli/commands/scan_health.py index 2da420d..5af90e1 100644 --- a/src/security_scanner/cli/commands/scan_health.py +++ b/src/security_scanner/cli/commands/scan_health.py @@ -1,8 +1,10 @@ -"""scan-health subcommand: fail closed when the latest scan run is stale. +"""scan-health subcommand: fail closed when ANY repo is stale (per-repo, M5). -A downstream publisher (e.g. the ops ``publish-reposcan`` job) runs this as a -precondition so it refuses to republish the local DB once scans stop -succeeding, instead of silently serving stale findings. +Replaces the global single-record gate. 
A downstream consumer runs this as a +precondition; it now evaluates EACH repo's REPO_HEALTH against two thresholds +(incremental + baseline) so a single fresh repo can no longer mask 499 stale +ones (the silent-staleness bug). It reports per-repo breach counts and exits +non-zero when any repo breaches. """ from __future__ import annotations @@ -13,32 +15,129 @@ from security_scanner.cli._args import add_incremental_storage_args from security_scanner.cli._store import store_from_args -from security_scanner.runtime.scan_health import evaluate_scan_freshness +from security_scanner.runtime.alert_sink import ( + DEFAULT_RENOTIFY_WINDOW_HOURS, + AlertDispatcher, + NotificationLogAlertSink, + run_freshness_evaluator_with_alerts, +) +from security_scanner.runtime.notification_log import DEFAULT_NOTIFICATION_LOG_PATH +from security_scanner.runtime.scan_health import ( + DEFAULT_BASELINE_CADENCE_HOURS, + DEFAULT_MARGIN_HOURS, + DEFAULT_POLL_INTERVAL_HOURS, + FreshnessThresholds, + evaluate_freshness_breaches, +) -DEFAULT_MAX_AGE_HOURS = 26.0 + +def _add_threshold_args(parser: argparse.ArgumentParser) -> None: + """Add the shared ``--*-hours`` freshness threshold knobs (FR-8). + + Reused by both ``scan-health`` (read-only gate) and ``freshness-eval`` + (scheduled detector that materializes BREACH_COUNTER). The CONCRETE cadence + numbers are load-gate decisions; these stay configurable defaults (no + invented load-validated values), and the margin reuses the legacy 24h+2h + idiom. + """ + parser.add_argument( + "--poll-interval-hours", + type=float, + default=DEFAULT_POLL_INTERVAL_HOURS, + metavar="HOURS", + help=( + "Incremental poll cadence; a repo breaches when its last incremental " + "success is older than poll-interval + margin " + f"(default: {DEFAULT_POLL_INTERVAL_HOURS:g})." 
+ ), + ) + parser.add_argument( + "--baseline-cadence-hours", + type=float, + default=DEFAULT_BASELINE_CADENCE_HOURS, + metavar="HOURS", + help=( + "Baseline full-scan cadence; a repo breaches when its last full-scan " + "success is older than baseline-cadence + margin " + f"(default: {DEFAULT_BASELINE_CADENCE_HOURS:g})." + ), + ) + parser.add_argument( + "--margin-hours", + type=float, + default=DEFAULT_MARGIN_HOURS, + metavar="HOURS", + help=( + "Grace margin added to both cadences before a repo is stale " + f"(default: {DEFAULT_MARGIN_HOURS:g})." + ), + ) def register(subparsers) -> None: parser = subparsers.add_parser( "scan-health", - help="Gate on scan-run freshness; exit non-zero when the last scan is stale.", + help=( + "Gate on per-repo scan freshness; exit non-zero when any repo is stale." + ), ) add_incremental_storage_args(parser) + _add_threshold_args(parser) + parser.set_defaults(func=cmd_scan_health) + + eval_parser = subparsers.add_parser( + "freshness-eval", + help=( + "Run the scheduled freshness evaluator: enumerate REPO_HEALTH, compute " + "per-repo breaches, and materialize the BREACH_COUNTER rollup (FR-8)." + ), + ) + add_incremental_storage_args(eval_parser) + _add_threshold_args(eval_parser) + _add_alert_args(eval_parser) + eval_parser.set_defaults(func=cmd_freshness_eval) + + +def _add_alert_args(parser: argparse.ArgumentParser) -> None: + """Add the M9 alert-sink knobs to the ``freshness-eval`` timer (FR-8). + + The evaluator is the DETECTOR; M9 turns each detection into an ACTIVE alert + on the existing notification-log seam. ``--notification-log`` selects the + default JSONL sink path (real Slack/email/webhook channels are a deploy-gated + one-line ``AlertSink``, not wired here). ``--renotify-window-hours`` is the + de-dup/re-notify window: a persistently-stale ``(repo, kind)`` re-fires after + the window so it can never be forgotten but does not spam every tick. 
+ ``--backlog-alert-threshold`` arms the queue-backlog-growth alert (0 = off). + """ + parser.add_argument( + "--notification-log", + metavar="FILE", + default=str(DEFAULT_NOTIFICATION_LOG_PATH), + help="Path to the JSONL notification log alerts are appended to " + "(default: ~/.local/state/security-scanner/scan-all.log.jsonl). Real " + "Slack/email/webhook delivery is deploy-gated.", + ) parser.add_argument( - "--max-age-hours", + "--renotify-window-hours", type=float, - default=DEFAULT_MAX_AGE_HOURS, + default=DEFAULT_RENOTIFY_WINDOW_HOURS, metavar="HOURS", - help=( - "Maximum age of the latest successful scan before it is considered " - f"stale (default: {DEFAULT_MAX_AGE_HOURS:g})." - ), + help="Suppress a repeat alert for the same (repo, kind) within this " + "window, but re-fire after it so a persistently stale repo is never " + f"silently forgotten (default: {DEFAULT_RENOTIFY_WINDOW_HOURS:g}).", + ) + parser.add_argument( + "--backlog-alert-threshold", + type=int, + default=0, + metavar="N", + help="Alert when the queue backlog (pending+leased) grows past N " + "(default: 0 = disabled).", ) - parser.set_defaults(func=cmd_scan_health) def cmd_scan_health(args: argparse.Namespace) -> int: - """Read the latest scan-run health record and gate on its freshness.""" + """Evaluate every repo's freshness and gate on per-repo breaches.""" if args.storage_backend != "dynamodb": print( "error: scan-health supports --storage-backend dynamodb only", @@ -48,15 +147,119 @@ def cmd_scan_health(args: argparse.Namespace) -> int: try: store = store_from_args(args) - latest = store.read_latest_scan_run_health() + health_records = store.read_all_repo_health() except Exception as exc: # noqa: BLE001 - fatal storage/runtime error. 
print(f"error: scan-health failed: {exc}", file=sys.stderr) return 1 - verdict = evaluate_scan_freshness( - latest, + thresholds = FreshnessThresholds.from_cadences( + poll_interval_hours=args.poll_interval_hours, + baseline_cadence_hours=args.baseline_cadence_hours, + margin_hours=args.margin_hours, + ) + evaluation = evaluate_freshness_breaches( + health_records, now=dt.datetime.now(dt.UTC), - max_age_hours=args.max_age_hours, + thresholds=thresholds, + ) + counter = evaluation.counter + + if not health_records: + # No repo health at all: fail closed (nothing has ever scanned). + print( + "scan-health: STALE no REPO_HEALTH records found (no repo has scanned)", + file=sys.stderr, + ) + return 1 + + if counter.total_breaches == 0: + print( + f"scan-health: OK {counter.repos_evaluated} repos fresh " + "(0 incremental / 0 baseline breaches)", + file=sys.stdout, + ) + return 0 + + breached_ids = ", ".join(sorted(breach.repo_id for breach in evaluation.breaches)) + print( + f"scan-health: STALE {counter.total_breaches}/{counter.repos_evaluated} " + f"repos breached ({counter.incremental_breaches} incremental / " + f"{counter.baseline_breaches} baseline): {breached_ids}", + file=sys.stderr, + ) + return 1 + + +def _coverage_gap_for(store) -> int | None: + """Best-effort coverage gap from CATALOG vs REPO_HEALTH (M1->M5/M9 seam). + + Returns the count of INCLUDED org repos with no REPO_HEALTH row yet, or None + when the store has no CATALOG (pre-M1 stores / fixture stores without a + catalog) so the coverage-gap alert seam stays explicit rather than reporting + a misleading 0. + """ + from security_scanner.runtime.catalog_reconcile import coverage_gap_from_store + + if not hasattr(store, "read_all_catalog_entries"): + return None + try: + return coverage_gap_from_store(store) + except Exception: # noqa: BLE001 - coverage is advisory; never fail the timer. 
+ return None + + +def cmd_freshness_eval(args: argparse.Namespace) -> int: + """Run the scheduled freshness evaluator AND fire alerts (the F3 fix, M9). + + Unlike ``scan-health`` (a read-only gate), this is the staleness DETECTOR: it + enumerates REPO_HEALTH, evaluates per-repo breaches, and WRITES the + materialized ``BREACH_COUNTER`` rollup the read API consumes O(1). M5 added + ``run_freshness_evaluator``; M3 wired the CLI surface + timer; M9 connects the + detections to a pluggable ALERT SINK so a scheduled detection of a stale repo + RESULTS IN an active alert (per-repo SLA breach, coverage gap, dead-letter / + backlog growth), de-duped/re-notified per (repo, kind) via durable + ALERT_STATE. It always exits 0 on a successful pass (detection, not a gate); + a fatal storage error exits 1. + """ + if args.storage_backend != "dynamodb": + print( + "error: freshness-eval supports --storage-backend dynamodb only", + file=sys.stderr, + ) + return 2 + + thresholds = FreshnessThresholds.from_cadences( + poll_interval_hours=args.poll_interval_hours, + baseline_cadence_hours=args.baseline_cadence_hours, + margin_hours=args.margin_hours, + ) + try: + store = store_from_args(args) + sink = NotificationLogAlertSink(args.notification_log) + dispatcher = AlertDispatcher( + sink=sink, + state=store, + window_hours=args.renotify_window_hours, + ) + evaluation = run_freshness_evaluator_with_alerts( + store, + now=dt.datetime.now(dt.UTC), + thresholds=thresholds, + dispatcher=dispatcher, + coverage_gap=_coverage_gap_for(store), + backlog_threshold=args.backlog_alert_threshold, + ) + except Exception as exc: # noqa: BLE001 - fatal storage/runtime error. 
+ print(f"error: freshness-eval failed: {exc}", file=sys.stderr) + return 1 + + counter = evaluation.counter + print( + f"freshness-eval: {counter.repos_evaluated} repos evaluated, " + f"{counter.total_breaches} breached " + f"({counter.incremental_breaches} incremental / " + f"{counter.baseline_breaches} baseline); BREACH_COUNTER materialized, " + "alerts dispatched", + file=sys.stdout, ) - print(verdict.message, file=sys.stdout if verdict.ok else sys.stderr) - return 0 if verdict.ok else 1 + return 0 diff --git a/src/security_scanner/core/finding/model.py b/src/security_scanner/core/finding/model.py index d046c71..45c43cf 100644 --- a/src/security_scanner/core/finding/model.py +++ b/src/security_scanner/core/finding/model.py @@ -22,6 +22,7 @@ import hashlib import json import os +from collections.abc import Iterable from dataclasses import dataclass, field @@ -72,12 +73,114 @@ class Verdict(str, enum.Enum): FALSE_POSITIVE = "FALSE_POSITIVE" +class Disposition(str, enum.Enum): + """Dashboard/read-API-facing disposition vocabulary (FR-11, F6). + + This is the *presentation* name set requested by the scale-redesign design + (``verified | false_positive | unreviewed``). It is NOT a third, clashing + source of truth: every value maps 1:1 onto the canonical :class:`Verdict` + enum the verifier/LLM track already produces, via ``DISPOSITION_TO_VERDICT`` + / ``VERDICT_TO_DISPOSITION``. ``Finding.disposition`` is derived from + ``triage.verdict`` so the two can never drift, and the verifier track fills + the field for free by writing a verdict (no translation at the write site). + """ + VERIFIED = "verified" + FALSE_POSITIVE = "false_positive" + UNREVIEWED = "unreviewed" + + +# Single source of truth for the verdict <-> disposition reconciliation (F6). +# Total and bijective over the three Verdict values, so every verifier verdict +# has exactly one disposition and vice versa. 
Mapping fixed by design.md: +# verified <-> true_positive, false_positive <-> false_positive, +# unreviewed <-> needs_review. +DISPOSITION_TO_VERDICT: dict[str, str] = { + Disposition.VERIFIED.value: Verdict.TRUE_POSITIVE.value, + Disposition.FALSE_POSITIVE.value: Verdict.FALSE_POSITIVE.value, + Disposition.UNREVIEWED.value: Verdict.NEEDS_REVIEW.value, +} +VERDICT_TO_DISPOSITION: dict[str, str] = { + verdict: disposition for disposition, verdict in DISPOSITION_TO_VERDICT.items() +} + +# Default disposition for a finding that has not been triaged. Tied to the +# Triage.verdict default (NEEDS_REVIEW) so the model has one default, not two. +DEFAULT_DISPOSITION = VERDICT_TO_DISPOSITION[Verdict.NEEDS_REVIEW.value] + + +def disposition_for_verdict(verdict: str) -> str: + """Map a canonical :class:`Verdict` value to its disposition name (F6). + + Unknown/unmapped verdicts fall back to the default ``unreviewed`` so a + forward-compatible verdict added later never crashes the read/dashboard + path; the mapping for the three known verdicts is exact. + """ + return VERDICT_TO_DISPOSITION.get(verdict, DEFAULT_DISPOSITION) + + +def verdict_for_disposition(disposition: str) -> str: + """Map a disposition name back to its canonical :class:`Verdict` value (F6).""" + try: + return DISPOSITION_TO_VERDICT[disposition] + except KeyError as exc: + raise ValueError( + f"Invalid disposition {disposition!r}. " + f"Must be one of: {sorted(DISPOSITION_TO_VERDICT)}" + ) from exc + + +# Dashboard default emphasis order (FR-11 / Q10): surface unreviewed + verified +# (the signal an operator must act on), de-emphasize false_positive (noise) by +# sorting it last. Used by ``order_findings_for_dashboard``. +DASHBOARD_DISPOSITION_ORDER: tuple[str, ...] 
= ( + Disposition.UNREVIEWED.value, + Disposition.VERIFIED.value, + Disposition.FALSE_POSITIVE.value, +) + + +def filter_findings_by_disposition( + findings: Iterable[Finding], dispositions: Iterable[str] +) -> list[Finding]: + """Return the findings whose disposition is in *dispositions* (FR-11 filter). + + The read API (M7) and the dashboard consume this to slice findings by + disposition. Validates the requested names against the allowed set so a + typo'd filter fails loudly instead of silently returning nothing. + """ + requested = set(dispositions) + unknown = requested - _VALID_DISPOSITIONS + if unknown: + raise ValueError( + f"Invalid disposition filter {sorted(unknown)}. " + f"Must be one of: {sorted(_VALID_DISPOSITIONS)}" + ) + return [finding for finding in findings if finding.disposition in requested] + + +def order_findings_for_dashboard( + findings: Iterable[Finding], +) -> list[Finding]: + """Order findings by dashboard emphasis (FR-11 / Q10). + + ``unreviewed`` and ``verified`` first (the actionable signal), then + ``false_positive`` last (de-emphasized noise). Stable within each + disposition group so any prior ordering of equals is preserved. 
+ """ + rank = {value: index for index, value in enumerate(DASHBOARD_DISPOSITION_ORDER)} + return sorted( + findings, + key=lambda finding: rank.get(finding.disposition, len(rank)), + ) + + # Fast lookup sets for validation _VALID_CATEGORIES: frozenset[str] = frozenset(v.value for v in Category) _VALID_SEVERITIES: frozenset[str] = frozenset(v.value for v in Severity) _VALID_CONFIDENCE_LEVELS: frozenset[str] = frozenset(v.value for v in ConfidenceLevel) _VALID_STATUSES: frozenset[str] = frozenset(v.value for v in Status) _VALID_VERDICTS: frozenset[str] = frozenset(v.value for v in Verdict) +_VALID_DISPOSITIONS: frozenset[str] = frozenset(v.value for v in Disposition) # --------------------------------------------------------------------------- @@ -464,6 +567,23 @@ def __post_init__(self) -> None: ) # triage.verdict is validated by Triage.__post_init__ + # ------------------------------------------------------------------ + # Disposition (FR-11, F6) — derived view over the canonical triage verdict + # ------------------------------------------------------------------ + + @property + def disposition(self) -> str: + """Dashboard/read-API disposition derived from ``triage.verdict`` (F6). + + Not an independently-stored field: it is a 1:1 projection of the + canonical verdict via ``VERDICT_TO_DISPOSITION``. An untriaged finding's + verdict defaults to ``NEEDS_REVIEW`` so its disposition defaults to + ``unreviewed``. Keeping it derived means the verifier track fills this + field for free by writing a verdict — there is no second value to keep + in sync and nothing for ``from_dict`` to trust from older payloads. 
+ """ + return disposition_for_verdict(self.triage.verdict) + # ------------------------------------------------------------------ # Factory # ------------------------------------------------------------------ @@ -573,6 +693,10 @@ def to_dict(self) -> dict: "fingerprint": self.fingerprint, "status": self.status, "triage": self.triage.to_dict(), + # Read-API/dashboard disposition (FR-11, F6). Emitted as a derived + # projection of triage.verdict, NOT a second stored field: from_dict + # ignores it (see below) so wire and model can never disagree. + "disposition": self.disposition, "scan": self.scan.to_dict(), "gitleaks": self.gitleaks.to_redacted_dict() if self.gitleaks else None, } @@ -582,6 +706,12 @@ def from_dict(cls, data: dict) -> "Finding": """Deserialise from a camelCase dict (e.g. parsed from JSON). Raises ValueError for unrecognised enum values (via __post_init__). + + Backward/forward compatible on ``disposition`` (FR-11, F6): a legacy + finding written before this field existed simply has no ``disposition`` + key, and a newer one carries the derived value — either way it is + ignored here and re-derived from ``triage.verdict``, so the field is + never a second source of truth that could decode out of sync. + """ return cls( finding_id=data["findingId"], diff --git a/src/security_scanner/runtime/alert_sink.py b/src/security_scanner/runtime/alert_sink.py new file mode 100644 index 0000000..54329ee --- /dev/null +++ b/src/security_scanner/runtime/alert_sink.py @@ -0,0 +1,604 @@ +"""Pluggable alert sink + detector wiring (FR-8 alerts, M9, the F3 fix). + +The whole scale-redesign exists because a prior silent-staleness incident went +unnoticed (design Q7: "silence is structurally impossible"). M5 built the scheduled +freshness-evaluator that DETECTS staleness on a timer and exposes an +``on_breaches`` hook but implements NO sink; M4 built a cadence-overrun signal +routed to the notification-log seam.
**M9 turns those detections into an ACTIVE +alert** that reaches a sink — not a dashboard nobody opens — so staleness can +never be silent. + +What this module provides +------------------------- +1. ``AlertSink`` — a Protocol with a single ``emit(alert)`` method. The default + :class:`NotificationLogAlertSink` reuses the EXISTING append-only JSONL + notification-log seam (``notification_log.py``); it does NOT invent a parallel + substrate. :class:`RecordingAlertSink` is the in-memory fake the tests assert + against. Real channels (Slack/email/webhook) are a one-line ``AlertSink`` + implementation left for the deploy gate — NOT implemented here (live external + delivery is a side effect needing channel config + human confirmation). + +2. ``Alert`` — one actionable alert carrying enough context (``repo_id``, + ``kind``, age/threshold detail) to act on without re-evaluating. + +3. ``AlertDispatcher`` — the de-dup / re-notify policy (design Data Flow step 6: + "a de-dup/re-notify cadence so a persistently stale repo neither spams nor goes silent"). It suppresses + a repeat alert for the same ``(repo, kind)`` within a re-notify window but + RE-FIRES after the window, so a persistently-stale repo neither spams every + cycle nor goes silent. Its only state is a last-alerted timestamp per + ``(repo, kind)`` persisted in the ``ALERT_STATE`` entity (idempotent, like the + rest of the single-table writes). + +4. Detector → alert mappers: per-repo incremental/baseline SLA breach, coverage + gap, dead-letter / queue-backlog growth, and the M4 cadence-overrun, all + funneled through the dispatcher to the sink.
+""" + +from __future__ import annotations + +import datetime as dt +from collections.abc import Sequence +from dataclasses import dataclass, field +from typing import Any, Protocol, runtime_checkable + +from security_scanner.runtime.incremental_discovery import PollCadenceSignal +from security_scanner.storage.base import RepoFreshnessBreach + +# --- alert kinds ----------------------------------------------------------- +# Stable string tags (not an enum) so a kind written to ALERT_STATE / the +# notification log and one read back share one vocabulary, mirroring the +# JOB_TYPE_* / SCAN_JOB_STATUS_* constant idiom already in the codebase. +ALERT_KIND_INCREMENTAL_BREACH = "incremental_sla_breach" +ALERT_KIND_BASELINE_BREACH = "baseline_sla_breach" +ALERT_KIND_COVERAGE_GAP = "coverage_gap" +ALERT_KIND_DEAD_LETTER = "dead_letter_increase" +ALERT_KIND_QUEUE_BACKLOG = "queue_backlog_growth" +ALERT_KIND_CADENCE_OVERRUN = "cadence_overrun" + +# Synthetic repo_id for org-wide (non-per-repo) alerts so the (repo, kind) +# de-dup key is well-defined for coverage-gap / dead-letter / backlog / cadence +# signals that are not scoped to a single repository. +ALERT_SCOPE_ORG = "__org__" + + +@dataclass(frozen=True) +class Alert: + """One actionable alert routed to a sink (FR-8). + + ``repo_id`` + ``kind`` form the de-dup identity. ``detail`` carries the + actionable context (age/threshold/counts) so an operator — or a future + Slack/email body — needs nothing else to act. ``event_at`` is the ISO + timestamp the alert was raised. ``message`` is a one-line human summary. + """ + + repo_id: str + kind: str + message: str + event_at: str + detail: dict[str, Any] = field(default_factory=dict) + + @property + def dedup_key(self) -> tuple[str, str]: + """Return the ``(repo_id, kind)`` identity the dispatcher de-dups on.""" + return (self.repo_id, self.kind) + + def to_record(self) -> dict[str, Any]: + """Return the JSONL notification-log record for this alert. 
+ + Shares the ``type`` / ``event_at`` shape of the existing + notification-log record builders so the default sink reuses that seam. + """ + return { + "type": "alert", + "kind": self.kind, + "event_at": self.event_at, + "repo_id": self.repo_id, + "message": self.message, + "detail": dict(self.detail), + } + + +@runtime_checkable +class AlertSink(Protocol): + """Pluggable alert delivery seam (FR-8). + + One method so a real channel (Slack/email/webhook) is a one-line + implementation added at the deploy gate. ``emit`` MUST NOT raise on a + delivery problem — an alert-channel failure must never break the evaluator + timer that produced it (the same fail-soft contract the notification log + already keeps). + """ + + def emit(self, alert: Alert) -> None: + """Deliver one alert. Fail soft; never raise on a delivery problem.""" + + +class NotificationLogAlertSink: + """Default sink: append the alert to the existing notification-log JSONL. + + Reuses ``notification_log.write_record`` (the same append-only seam + ``scan-all`` and the M4 cadence-overrun already write to) rather than a new + substrate. ``write_record`` already degrades gracefully on OSError (stderr + warning, no raise), so a channel problem never breaks the evaluator. + """ + + def __init__(self, path, *, writer=None) -> None: + from pathlib import Path + + from security_scanner.runtime.notification_log import ( + DEFAULT_NOTIFICATION_LOG_PATH, + write_record, + ) + + resolved = path if path is not None else DEFAULT_NOTIFICATION_LOG_PATH + # The CLI passes a string path; ``write_record`` needs a Path (it touches + # ``path.parent``). Coerce here so any string/Path caller works. + self._path = Path(resolved) + self._writer = writer or write_record + + def emit(self, alert: Alert) -> None: + self._writer(self._path, alert.to_record()) + + +class RecordingAlertSink: + """In-memory recording sink for tests (the fake sink M9's evidence uses). 
+ + Records every emitted alert so a test can assert WHICH alerts fired with + WHAT context — the F3 evidence is that a scheduled detection of a stale repo + RESULTS IN an alert reaching the sink, not merely sink plumbing existing. + """ + + def __init__(self) -> None: + self.alerts: list[Alert] = [] + + def emit(self, alert: Alert) -> None: + self.alerts.append(alert) + + # Convenience accessors the tests read. + @property + def kinds(self) -> list[str]: + return [a.kind for a in self.alerts] + + def for_repo(self, repo_id: str) -> list[Alert]: + return [a for a in self.alerts if a.repo_id == repo_id] + + def of_kind(self, kind: str) -> list[Alert]: + return [a for a in self.alerts if a.kind == kind] + + +# --- de-dup / re-notify state --------------------------------------------- + + +@runtime_checkable +class AlertStateStore(Protocol): + """Durable last-alerted timestamp per ``(repo, kind)`` (de-dup state, M9). + + The ONLY state the de-dup/re-notify policy needs. Kept minimal and + idempotent like the rest of the single-table writes: read the last-alerted + ISO timestamp for a ``(repo, kind)`` and overwrite it when an alert fires. + """ + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + """Return the last-alerted ISO timestamp for ``(repo, kind)`` or None.""" + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + """Record that ``(repo, kind)`` was alerted at ``alerted_at`` (ISO).""" + + +# Default re-notify window: a persistently-stale repo re-fires roughly daily, so +# it can never be forgotten, but does not spam every (e.g. 5-min) evaluator tick. +# Like the freshness cadences, the CONCRETE value is a load-gate tuning decision; +# this stays a configurable default (no invented load-validated number). +DEFAULT_RENOTIFY_WINDOW_HOURS = 24.0 + + +class InMemoryAlertStateStore: + """In-process ``AlertStateStore`` for tests and single-process runs. + + A plain dict keyed on ``(repo_id, kind)``. 
The durable DynamoDB-backed + implementation lives on the store (``ALERT_STATE`` entity); this is the + no-DB substrate the M9 tests drive. + """ + + def __init__(self) -> None: + self._state: dict[tuple[str, str], str] = {} + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + return self._state.get((repo_id, kind)) + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + self._state[(repo_id, kind)] = alerted_at + + +@dataclass +class AlertDispatcher: + """De-dup / re-notify gate in front of a sink (design Data Flow step 6). + + Policy (stated): for each ``(repo_id, kind)`` the dispatcher tracks the last + time it alerted. An incoming alert is: + + - SUPPRESSED when the same ``(repo, kind)`` alerted within the re-notify + window (``now - last_alerted <= window``) — a persistently-stale repo + does not spam every evaluation cycle; + - DELIVERED when it has never alerted for that ``(repo, kind)`` OR the + window has elapsed (``now - last_alerted > window``) — so a still-stale + repo RE-FIRES after the window and can never be silently forgotten. + + Suppression is strictly per ``(repo, kind)``: suppressing repo A's alert + never suppresses repo B's, which is the exact incident shape (one fresh repo + must not mask a different stale repo). Delivery updates the last-alerted + timestamp so the window restarts from the delivered alert. + """ + + sink: AlertSink + state: AlertStateStore + window_hours: float = DEFAULT_RENOTIFY_WINDOW_HOURS + + def dispatch(self, alert: Alert, *, now: dt.datetime) -> bool: + """Route one alert through the de-dup gate. 
Return True when delivered.""" + last_iso = self.state.read_alert_state(alert.repo_id, alert.kind) + if last_iso is not None and not self._window_elapsed(last_iso, now): + return False + self.sink.emit(alert) + self.state.put_alert_state(alert.repo_id, alert.kind, alert.event_at) + return True + + def dispatch_all( + self, alerts: Sequence[Alert], *, now: dt.datetime + ) -> list[Alert]: + """Dispatch many alerts, returning the ones actually delivered.""" + delivered: list[Alert] = [] + for alert in alerts: + if self.dispatch(alert, now=now): + delivered.append(alert) + return delivered + + def _window_elapsed(self, last_iso: str, now: dt.datetime) -> bool: + from security_scanner.storage.adapters.nosql_db.items import datetime_from_iso + + last = datetime_from_iso(last_iso) + age_hours = (now - last).total_seconds() / 3600.0 + # window_hours <= 0 disables suppression entirely (always re-fire). + if self.window_hours <= 0: + return True + return age_hours > self.window_hours + + +# --- detector -> alert mappers --------------------------------------------- + + +def _now_iso(now: dt.datetime) -> str: + if now.tzinfo is None: + raise ValueError( + "now must be timezone-aware (UTC); a naive datetime would be " + "silently reinterpreted as local time by astimezone" + ) + return now.astimezone(dt.timezone.utc).replace(microsecond=0).isoformat() + + +def alerts_from_breaches( + breaches: Sequence[RepoFreshnessBreach], *, now: dt.datetime +) -> list[Alert]: + """Map per-repo freshness breaches to alerts (incremental + baseline SLA). + + One repo can breach BOTH classes, so it yields one alert per breached class + — each de-dups on its own ``(repo, kind)`` so a baseline-only re-fire does + not depend on the incremental one. Each alert carries the breached class and + the repo's last-successful timestamps so an operator sees exactly what is + stale without re-querying. 
+ """ + event_at = _now_iso(now) + alerts: list[Alert] = [] + for breach in breaches: + if breach.incremental: + alerts.append( + Alert( + repo_id=breach.repo_id, + kind=ALERT_KIND_INCREMENTAL_BREACH, + message=( + f"repo {breach.repo_id} incremental SLA breach: " + f"last incremental success " + f"{breach.last_successful_incremental_at or 'NEVER'}" + ), + event_at=event_at, + detail={ + "lastSuccessfulIncrementalAt": ( + breach.last_successful_incremental_at + ), + }, + ) + ) + if breach.baseline: + alerts.append( + Alert( + repo_id=breach.repo_id, + kind=ALERT_KIND_BASELINE_BREACH, + message=( + f"repo {breach.repo_id} baseline SLA breach: " + f"last full-scan success " + f"{breach.last_successful_full_scan_at or 'NEVER'}" + ), + event_at=event_at, + detail={ + "lastSuccessfulFullScanAt": ( + breach.last_successful_full_scan_at + ), + }, + ) + ) + return alerts + + +def alert_from_coverage_gap( + coverage_gap: int | None, *, now: dt.datetime +) -> Alert | None: + """Map a coverage gap (org repos not yet covered) to an alert, else None. + + ``coverage_gap`` is the org-N-vs-covered-M half (CATALOG vs REPO_HEALTH, + M1->M5 seam). A gap of 0 (or an unknown ``None``) raises nothing: the mapper stays silent + when fully covered or when the gap is not known. A positive gap means included org repos have never + been scanned, the "silent coverage gap" the design calls out, so it alerts. + """ + if not coverage_gap: # None or 0 + return None + return Alert( + repo_id=ALERT_SCOPE_ORG, + kind=ALERT_KIND_COVERAGE_GAP, + message=( + f"coverage gap: {coverage_gap} included org repo(s) not yet covered" + ), + event_at=_now_iso(now), + detail={"coverageGap": coverage_gap}, + ) + + +def alert_from_dead_letter_growth( + *, + current: int, + previous: int | None, + now: dt.datetime, +) -> Alert | None: + """Map a dead-letter increase to an alert, else None. + + A growing dead-letter count means jobs are terminally failing (the design's + starvation / max-lease-expiry dead-letter path).
Alerts when ``current`` + exceeds the ``previous`` observed count (a real increase, not a steady + backlog), so a one-off dead-letter is surfaced but a stable count does not + re-fire every tick. ``previous`` is None on the first observation: a nonzero + first reading still alerts (fail toward visibility). + """ + baseline = previous if previous is not None else 0 + if current <= baseline: + return None + return Alert( + repo_id=ALERT_SCOPE_ORG, + kind=ALERT_KIND_DEAD_LETTER, + message=( + f"dead-letter increase: {current} dead-lettered job(s) " + f"(was {baseline})" + ), + event_at=_now_iso(now), + detail={"deadLetter": current, "previousDeadLetter": baseline}, + ) + + +def alert_from_backlog_growth( + *, + current: int, + previous: int | None, + threshold: int, + now: dt.datetime, +) -> Alert | None: + """Map queue-backlog growth past a threshold to an alert, else None. + + Backlog (pending + leased work) that grows past ``threshold`` means the + worker pool is falling behind — a precursor to staleness. Alerts only when + the backlog both EXCEEDS ``threshold`` and GREW vs the previous observation, + so a transient depth or a draining backlog is not noise. ``threshold <= 0`` + disables the check. + """ + if threshold <= 0 or current <= threshold: + return None + baseline = previous if previous is not None else 0 + if current <= baseline: + return None + return Alert( + repo_id=ALERT_SCOPE_ORG, + kind=ALERT_KIND_QUEUE_BACKLOG, + message=( + f"queue backlog growth: {current} job(s) pending+leased " + f"exceeds threshold {threshold} (was {baseline})" + ), + event_at=_now_iso(now), + detail={ + "backlog": current, + "previousBacklog": baseline, + "threshold": threshold, + }, + ) + + +def run_freshness_evaluator_with_alerts( + store, + *, + now: dt.datetime, + thresholds, + dispatcher: AlertDispatcher, + coverage_gap: int | None = None, + backlog_threshold: int = 0, +): + """Run the scheduled evaluator AND fire alerts to the sink (M9 — the F3 fix). 
+ + This is the function that ties "M9 done" to a FIRED alert: it runs M5's + ``run_freshness_evaluator`` (which DETECTS per-repo staleness on a timer and + materializes the BREACH_COUNTER) and, via the ``on_breaches`` hook, funnels + every detection through the de-dup ``dispatcher`` to the sink. So a scheduled + detection of a stale repo RESULTS IN an alert reaching the sink, not merely + sink plumbing existing. + + Triggers wired here, all through the same dispatcher (one de-dup/re-notify + policy): + + - per-repo incremental SLA breach (``on_breaches`` -> alerts_from_breaches) + - per-repo baseline SLA breach (same) + - coverage gap (org repos not yet covered) + - dead-letter increase (terminal-failure growth) + - queue-backlog growth past threshold + + The M4 cadence-overrun is fired from the deployed incr-poll command + (``cli.commands.scan.cmd_discover_updates``, which times its discovery cycle + and routes an overrun through ``alert_from_cadence_overrun`` to this same + dispatcher/sink), so it is not duplicated here. ``store`` may optionally expose + ``read_queue_backlog`` (dead-letter / backlog signals); when absent those + triggers are skipped (the freshness store contract does not require them). + Returns the M5 ``FreshnessEvaluation`` so the CLI keeps reporting the rollup. 
+ """ + from security_scanner.runtime.scan_health import run_freshness_evaluator + + def _on_breaches(breaches): + dispatcher.dispatch_all( + alerts_from_breaches(breaches, now=now), now=now + ) + + evaluation = run_freshness_evaluator( + store, + now=now, + thresholds=thresholds, + coverage_gap=coverage_gap, + on_breaches=_on_breaches, + ) + + coverage_alert = alert_from_coverage_gap(coverage_gap, now=now) + if coverage_alert is not None: + dispatcher.dispatch(coverage_alert, now=now) + + _dispatch_queue_alerts( + store, dispatcher, now=now, backlog_threshold=backlog_threshold + ) + return evaluation + + +def _dispatch_queue_alerts( + store, + dispatcher: AlertDispatcher, + *, + now: dt.datetime, + backlog_threshold: int, +) -> None: + """Fire dead-letter / backlog-growth alerts off the queue backlog counter. + + Reuses M7's O(status-partitions) ``read_queue_backlog`` (never a full-table + Scan) and compares against the last observed counts persisted in ALERT_STATE + (the same de-dup substrate), so a steady backlog does not re-fire every tick + but a real GROWTH does. No-op when the store does not expose the backlog read + (the freshness contract does not mandate it). + """ + read_backlog = getattr(store, "read_queue_backlog", None) + if read_backlog is None: + return + try: + backlog = read_backlog() + except Exception: # noqa: BLE001 - alerting must never break the evaluator. + return + + counts = backlog.job_counts_by_status + dead_letter = int(counts.get("dead_letter", 0)) + prev_dead = _read_int_state(dispatcher.state, ALERT_KIND_DEAD_LETTER) + dead_alert = alert_from_dead_letter_growth( + current=dead_letter, previous=prev_dead, now=now + ) + if dead_alert is not None and dispatcher.dispatch(dead_alert, now=now): + # Persist the observed count as the new baseline so a stable count does + # not re-fire; dispatch already stamped the (repo, kind) alert time. 
+ _write_int_state(dispatcher.state, ALERT_KIND_DEAD_LETTER, dead_letter) + + prev_backlog = _read_int_state(dispatcher.state, ALERT_KIND_QUEUE_BACKLOG) + backlog_alert = alert_from_backlog_growth( + current=backlog.backlog, + previous=prev_backlog, + threshold=backlog_threshold, + now=now, + ) + if backlog_alert is not None and dispatcher.dispatch(backlog_alert, now=now): + _write_int_state( + dispatcher.state, ALERT_KIND_QUEUE_BACKLOG, backlog.backlog + ) + + +# Count baselines (dead-letter / backlog) are stashed in the SAME ALERT_STATE +# substrate under an org-scoped ":count" key so the dead-letter/backlog +# growth comparison needs no second entity. They are stored as the ISO-less raw +# count string; helpers below read/write them defensively. +def _count_key(kind: str) -> tuple[str, str]: + return (ALERT_SCOPE_ORG, f"{kind}:count") + + +def _read_int_state(state: AlertStateStore, kind: str) -> int | None: + repo_id, count_kind = _count_key(kind) + raw = state.read_alert_state(repo_id, count_kind) + if raw is None: + return None + try: + return int(raw) + except (TypeError, ValueError): + return None + + +def _write_int_state(state: AlertStateStore, kind: str, value: int) -> None: + repo_id, count_kind = _count_key(kind) + state.put_alert_state(repo_id, count_kind, str(value)) + + +def alert_from_cadence_overrun( + signal: PollCadenceSignal, *, now: dt.datetime +) -> Alert | None: + """Map an M4 poll cadence-overrun signal to an alert, else None. + + Routes the existing ``PollCadenceSignal`` (M4) THROUGH the M9 sink so the + cadence-overrun shares the de-dup/re-notify policy and one alert vocabulary, + rather than only landing on the raw notification-log seam. A non-overrun + cycle raises nothing (healthy cycle stays silent). 
+ """ + if not signal.overrun: + return None + return Alert( + repo_id=ALERT_SCOPE_ORG, + kind=ALERT_KIND_CADENCE_OVERRUN, + message=( + f"poll cadence overrun: cycle {signal.cycle_seconds:g}s exceeds " + f"cadence {signal.cadence_seconds:g}s by " + f"{signal.overrun_seconds:g}s ({signal.targets} targets)" + ), + event_at=_now_iso(now), + detail={ + "cycleSeconds": signal.cycle_seconds, + "cadenceSeconds": signal.cadence_seconds, + "overrunSeconds": signal.overrun_seconds, + "targets": signal.targets, + }, + ) + + +__all__ = [ + "ALERT_KIND_BASELINE_BREACH", + "ALERT_KIND_CADENCE_OVERRUN", + "ALERT_KIND_COVERAGE_GAP", + "ALERT_KIND_DEAD_LETTER", + "ALERT_KIND_INCREMENTAL_BREACH", + "ALERT_KIND_QUEUE_BACKLOG", + "ALERT_SCOPE_ORG", + "DEFAULT_RENOTIFY_WINDOW_HOURS", + "Alert", + "AlertDispatcher", + "AlertSink", + "AlertStateStore", + "InMemoryAlertStateStore", + "NotificationLogAlertSink", + "RecordingAlertSink", + "alert_from_backlog_growth", + "alert_from_cadence_overrun", + "alert_from_coverage_gap", + "alert_from_dead_letter_growth", + "alerts_from_breaches", + "run_freshness_evaluator_with_alerts", +] diff --git a/src/security_scanner/runtime/baseline_enqueue.py b/src/security_scanner/runtime/baseline_enqueue.py new file mode 100644 index 0000000..37592a4 --- /dev/null +++ b/src/security_scanner/runtime/baseline_enqueue.py @@ -0,0 +1,293 @@ +"""Baseline per-repo enqueue for the scale queue (M4, SC-3). + +The reviewer's SC-3 (blocker): the v1 claim "baseline reuses scan-all" is FALSE. +``runtime.scan_all.run_scan_all`` is a SERIAL, in-process batch — it fetches and +scans every enabled target inside one process under one global flock, writes a +notification-log summary, and NEVER enqueues a ``ScanJob``. It has no per-repo +queue granularity, no priority, no backpressure, and no way for the N-worker pool +to drain it concurrently. So baseline at 500+ repos CANNOT be scan-all. 
+ +This module is the NEW baseline path the design's Data Flow step 3 calls for: +enumerate the INCLUDED CATALOG and enqueue ONE ``ScanJob(job_type="baseline")`` +PER REPO so the same worker pool that drains incremental jobs also drains +baseline jobs, with three reviewer-mandated controls: + + * **priority separation** — baseline jobs carry a HIGH numeric priority + (``BASELINE_JOB_PRIORITY``) so they sort AFTER incremental jobs in the + ascending ``gsi1sk`` dequeue order (lower number served first). A baseline + flood therefore cannot starve incremental change detection. + * **backpressure** — baseline enqueue is THROTTLED (skipped this invocation) + when the pending-queue backlog already exceeds a threshold, so baseline does + not pile work onto an already-saturated queue. + * **rolling baseline** — a deterministic 1/Nth-of-repos-per-invocation slice is + enqueued each run, so the full catalog is covered over N invocations even + when the worker pool cannot full-scan all repos within one window (graceful + degradation, design "rolling-baseline fallback for a missed window"). + +We deliberately do NOT call ``run_scan_all`` here (SC-3): it is the wrong shape +(serial in-process, no queue, no per-repo lease/priority). Reusing it would +re-introduce the single-process bottleneck the scale redesign exists to remove. +""" + +from __future__ import annotations + +import datetime as dt +import hashlib +from collections.abc import Callable, Sequence +from dataclasses import dataclass, field + +from security_scanner.storage.adapters.nosql_db.items import ( + SCAN_JOB_STATUS_PENDING, + repo_id_for_scan_target_url, + scan_job_id_for, +) +from security_scanner.storage.base import ( + JOB_TYPE_BASELINE, + CatalogStore, + IncrementalScanStore, + ScanJob, +) + +# Baseline jobs are LOW precedence: the queue sorts gsi1sk ascending on +# ``nextAttemptAt#priority:08d#...`` (lower number dequeues first), and +# incremental jobs use priority 100 (incremental_discovery.DEFAULT_JOB_PRIORITY).
+# Baseline uses a much HIGHER number so every incremental job is served before any +# baseline job — incremental change detection is never starved by a baseline +# flood (SC-3 priority separation). +BASELINE_JOB_PRIORITY = 900 +DEFAULT_MAX_ATTEMPTS = 3 +# A baseline job has no specific commit (it is a full-tree scan), so its scan +# "commit" slot is a stable per-repo marker. Using a fixed sentinel keeps the +# derived job_id deterministic per repo per invocation window, so re-running +# baseline enqueue for an already-queued repo is an idempotent no-op rather than +# a duplicate job. +BASELINE_COMMIT_SENTINEL = "baseline" + +# Default pending-backlog threshold above which baseline enqueue is throttled +# (skipped this invocation). LOGIC default only; the real threshold is a box-gate +# decision tied to worker-pool throughput, so it stays a parameter. +DEFAULT_BACKPRESSURE_THRESHOLD = 1000 +# Default rolling-baseline divisor: cover 1/7th of the catalog per invocation so +# a daily timer completes a full pass weekly (mirrors the pre-scale weekly full +# scan). LOGIC default only; the real divisor is a box-gate decision. +DEFAULT_ROLLING_DIVISOR = 7 + + +@dataclass(frozen=True) +class BaselineScannerConfig: + """Scanner tuple stamped onto baseline jobs (mirrors discovery's tuple).""" + + scanner_name: str + scanner_version: str + rule_pack_version: str + scanner_config_hash: str + + +@dataclass(frozen=True) +class BaselineEnqueueRequest: + """Inputs for one baseline enqueue invocation (SC-3). + + ``catalog_store`` supplies the INCLUDED repo set (M1); ``queue_store`` is the + incremental queue the baseline jobs are enqueued onto and whose backlog drives + backpressure. ``rolling_divisor`` and ``rolling_offset`` select the + deterministic 1/Nth slice for this invocation; a timer wrapper advances the + offset across runs so the whole catalog is covered over ``rolling_divisor`` + invocations. 
+ """ + + catalog_store: CatalogStore + queue_store: IncrementalScanStore + scanner: BaselineScannerConfig + backpressure_threshold: int = DEFAULT_BACKPRESSURE_THRESHOLD + rolling_divisor: int = DEFAULT_ROLLING_DIVISOR + rolling_offset: int = 0 + now_factory: Callable[[], dt.datetime] = lambda: dt.datetime.now(dt.UTC).replace( + microsecond=0 + ) + + +@dataclass(frozen=True) +class BaselineEnqueueSummary: + """Operator-facing outcome of one baseline enqueue invocation.""" + + included_repos: int + selected_repos: int + jobs_enqueued: int = 0 + duplicates_skipped: int = 0 + throttled: bool = False + backlog: int = 0 + rolling_divisor: int = 1 + rolling_offset: int = 0 + selected_repo_ids: list[str] = field(default_factory=list) + + +def _pending_backlog(store: IncrementalScanStore, now: dt.datetime) -> int: + """Return the pending-queue depth (the backpressure signal). + + Backpressure measures the PENDING backlog — work waiting to be leased — from + the same maintained queue-status the read API/queue-status command read, so + baseline throttling and operator visibility agree on one number. + """ + status = store.get_queue_status(now) + return status.job_counts_by_status.get(SCAN_JOB_STATUS_PENDING, 0) + + +def select_rolling_slice( + repo_ids: Sequence[str], + *, + divisor: int, + offset: int, +) -> list[str]: + """Deterministically select 1/divisor of the repos for this invocation (SC-3). + + Each repo is assigned a stable bucket in ``[0, divisor)`` by hashing its + repo_id (so the bucketing is independent of catalog ordering and stable across + runs), and this invocation enqueues only the repos whose bucket equals + ``offset % divisor``. Over ``divisor`` invocations with advancing offsets every + bucket — hence every included repo — is covered exactly once. ``divisor <= 1`` + disables rolling and selects every repo (a single full pass). 
+ """ + if divisor <= 1: + return list(repo_ids) + target_bucket = offset % divisor + selected: list[str] = [] + for repo_id in repo_ids: + digest = hashlib.sha256(repo_id.encode("utf-8")).hexdigest() + bucket = int(digest, 16) % divisor + if bucket == target_bucket: + selected.append(repo_id) + return selected + + +def _baseline_job_for_repo( + *, + repo_id: str, + repo_url: str, + scanner: BaselineScannerConfig, + now: dt.datetime, +) -> ScanJob: + """Build one low-priority baseline ScanJob for a repo (no specific commit).""" + job_id = scan_job_id_for( + repo_id=repo_id, + commit_sha=BASELINE_COMMIT_SENTINEL, + scanner_name=scanner.scanner_name, + scanner_version=scanner.scanner_version, + rule_pack_version=scanner.rule_pack_version, + scanner_config_hash=scanner.scanner_config_hash, + ) + return ScanJob( + job_id=job_id, + repo_id=repo_id, + repo_url=repo_url, + ref_name="refs/remotes/origin/HEAD", + old_sha=None, + new_sha=BASELINE_COMMIT_SENTINEL, + commit_sha=BASELINE_COMMIT_SENTINEL, + commit_range=None, + scanner_name=scanner.scanner_name, + scanner_version=scanner.scanner_version, + rule_pack_version=scanner.rule_pack_version, + scanner_config_hash=scanner.scanner_config_hash, + priority=BASELINE_JOB_PRIORITY, + status=SCAN_JOB_STATUS_PENDING, + # A baseline completion advances lastSuccessfulFullScanAt (FR-7/SC-5), + # NOT the incremental field — that is what job_type selects downstream. + job_type=JOB_TYPE_BASELINE, + attempts=0, + max_attempts=DEFAULT_MAX_ATTEMPTS, + worker_id=None, + lease_until=None, + next_attempt_at=now, + created_at=now, + updated_at=now, + ) + + +def run_baseline_enqueue( + request: BaselineEnqueueRequest, +) -> BaselineEnqueueSummary: + """Enqueue one low-priority baseline ScanJob per selected INCLUDED repo (SC-3). + + Steps: + + 1. Enumerate the CATALOG and keep INCLUDED repos (opt-outs are skipped). + 2. Apply the deterministic rolling-baseline slice (1/divisor this run). + 3. 
BACKPRESSURE: if the pending backlog already exceeds the threshold, skip + enqueue entirely this invocation (``throttled=True``) so baseline never + piles onto a saturated queue. + 4. Otherwise enqueue one ``ScanJob(job_type="baseline", priority=high)`` per + selected repo. Enqueue is idempotent (deterministic job_id + the store's + conditional put), so a repo already queued is counted as a duplicate skip + rather than double-enqueued. + + Deliberately does NOT call ``run_scan_all`` (SC-3): scan-all is a serial + in-process batch with no queue/priority/lease, which is exactly the + single-process bottleneck the queue replaces. + """ + now = _now(request) + included = [ + entry + for entry in request.catalog_store.read_all_catalog_entries() + if entry.included + ] + included_by_id = { + repo_id_for_scan_target_url(entry.repo_url): entry for entry in included + } + included_ids = list(included_by_id) + + selected_ids = select_rolling_slice( + included_ids, + divisor=request.rolling_divisor, + offset=request.rolling_offset, + ) + + backlog = _pending_backlog(request.queue_store, now) + if backlog > request.backpressure_threshold: + # Throttle: the queue is already backed up past the threshold. Enqueue + # nothing this invocation; the next invocation re-evaluates. Incremental + # detection keeps draining ahead of any deferred baseline work. 
+ return BaselineEnqueueSummary( + included_repos=len(included), + selected_repos=len(selected_ids), + jobs_enqueued=0, + duplicates_skipped=0, + throttled=True, + backlog=backlog, + rolling_divisor=request.rolling_divisor, + rolling_offset=request.rolling_offset, + selected_repo_ids=list(selected_ids), + ) + + jobs_enqueued = 0 + duplicates_skipped = 0 + for repo_id in selected_ids: + entry = included_by_id[repo_id] + job = _baseline_job_for_repo( + repo_id=repo_id, + repo_url=entry.repo_url, + scanner=request.scanner, + now=now, + ) + if request.queue_store.enqueue_commit_scan_job(job): + jobs_enqueued += 1 + else: + duplicates_skipped += 1 + + return BaselineEnqueueSummary( + included_repos=len(included), + selected_repos=len(selected_ids), + jobs_enqueued=jobs_enqueued, + duplicates_skipped=duplicates_skipped, + throttled=False, + backlog=backlog, + rolling_divisor=request.rolling_divisor, + rolling_offset=request.rolling_offset, + selected_repo_ids=list(selected_ids), + ) + + +def _now(request: BaselineEnqueueRequest) -> dt.datetime: + value = request.now_factory() + if value.tzinfo is None: + value = value.replace(tzinfo=dt.UTC) + return value.astimezone(dt.UTC).replace(microsecond=0) diff --git a/src/security_scanner/runtime/catalog_reconcile.py b/src/security_scanner/runtime/catalog_reconcile.py new file mode 100644 index 0000000..10ea765 --- /dev/null +++ b/src/security_scanner/runtime/catalog_reconcile.py @@ -0,0 +1,225 @@ +"""Catalog reconcile operation (FR-1, M1, design Data Flow step 1). + +Reconciles the org repository list into the ``CATALOG`` entity on a timer. The +org list comes from an INJECTABLE provider, never a live ``gh api`` call inside +this module: the design's Open Question flags that a live org-list GET may trip +the autopilot ``ghas-live-fetch-or-mutation-required`` stop-condition, and +FR-12/governance gates live GitHub fetches to a human PR (GATE 2). 
So this module +is implemented against an ``OrgRepoListProvider`` interface and exercised with a +FIXTURE org list; the live provider is a one-liner seam left for GATE 2. + +Two invariants from the design: + + - **opt-out is recorded, not dropped**: an opted-out repo is written with + ``included=False`` + an ``excluded_reason`` so it never silently disappears + from coverage accounting. + - **additive on transient failure**: if the provider errors or returns a + partial list, existing CATALOG entries are NEVER deleted — a repo is never + silently dropped from coverage. Only the provider failure is surfaced (the + timer wrapper turns a persistent failure into a coverage-gap alert). +""" + +from __future__ import annotations + +import datetime as dt +from collections.abc import Iterable +from dataclasses import dataclass +from typing import Protocol, runtime_checkable + +from security_scanner.storage.adapters.nosql_db.items import ( + datetime_to_iso, + repo_id_for_scan_target_url, +) +from security_scanner.storage.base import CatalogEntry, CatalogStore + + +@dataclass(frozen=True) +class OrgRepo: + """One repository as reported by an org-list provider. + + The provider yields only the repo's clone/HTML URL; the reconcile derives the + canonical ``repo_id`` from it with the SAME ``repo_id_for_scan_target_url`` + hash the incremental queue already uses, so a catalog repo_id and a ScanJob + repo_id for the same URL agree. + """ + + repo_url: str + + +@runtime_checkable +class OrgRepoListProvider(Protocol): + """Injectable org repo-list source (the governance seam, FR-1/GATE 2). + + Mirrors the injectable shape of ``baseline.ghas_api.GhApiRunner``: a small + interface the reconcile depends on so tests inject a FIXTURE list and the + live ``gh api`` implementation is a one-liner added only once the governance + gate (GATE 2) clears. ``list_org_repos`` raises to signal a transient/total + fetch failure; the reconcile treats that as additive (keeps existing rows). 
+ """ + + def list_org_repos(self) -> list[OrgRepo]: + """Return the org's repositories, or raise on fetch failure.""" + + +class GovernanceGatedOrgRepoListProvider: + """Default provider that REFUSES to fetch (GATE 2 governance seam). + + Wired as the CLI default so nothing calls live GitHub by accident. A live + org GET is gated to a human PR + must clear the autopilot + ``ghas-live-fetch-or-mutation-required`` stop-condition (design Open + Question / GATE 2). Until then, callers must inject a provider (a fixture in + tests; the GATE-2 live ``gh api`` one-liner in production). + """ + + def list_org_repos(self) -> list[OrgRepo]: + raise RuntimeError( + "live org repo-list fetch is governance-gated (FR-12/GATE 2): " + "inject an OrgRepoListProvider (fixture in tests, gated live " + "gh-api provider in production) instead of fetching here" + ) + + +class FixtureOrgRepoListProvider: + """In-memory provider over a fixed URL list (tests + local runs). + + The fixture substrate the M1 logic is exercised against while live org fetch + stays gated (GATE 2). Construct from repo URLs; raises nothing. + """ + + def __init__(self, repo_urls: Iterable[str]) -> None: + self._repos = [OrgRepo(repo_url=url) for url in repo_urls] + + def list_org_repos(self) -> list[OrgRepo]: + return list(self._repos) + + +@dataclass(frozen=True) +class ReconcileSummary: + """Accounting for one reconcile pass (FR-1). + + ``total`` is every CATALOG row after the pass (org size axis, the N in the + design's "M of N org repos scanned"). ``excluded`` are opt-out rows kept with a + reason. ``added`` newly first-seen this pass; ``updated`` pre-existing rows + whose ``last_reconciled`` advanced. 
+ """ + + added: int + updated: int + excluded: int + total: int + + +def run_catalog_reconcile( + store: CatalogStore, + org_list_provider: OrgRepoListProvider, + *, + opt_out: Iterable[str] = (), + now: dt.datetime, + excluded_reason: str = "opt-out", +) -> ReconcileSummary: + """Reconcile the org repo list into the CATALOG entity (FR-1, M1). + + Pulls the org list from the INJECTED provider, applies the ``opt_out`` + exclusion list (matched by repo_id, derived from each opt-out URL the same + way as a provider URL), and UPSERTS one ``CatalogEntry`` per org repo: + + - a repo not yet in CATALOG is added with ``first_seen = now`` and + ``included`` reflecting the opt-out list; + - a repo already in CATALOG keeps its original ``first_seen`` and advances + ``last_reconciled`` (opt-out membership is re-evaluated each pass so a + newly opted-out / opted-back-in repo flips ``included`` without losing + its history). + + Additive on transient failure: if ``list_org_repos`` raises, NO catalog row + is deleted; the error propagates so the timer wrapper can raise a coverage-gap + alert (design step 1: "only persistent failures raise a coverage-gap alert"). + Repos that vanish from a successful (full) list are likewise NOT deleted here, + so a repo is never silently dropped from coverage; pruning a truly-removed + repo is a deliberate later op, not a side effect of one possibly-partial + reconcile. + """ + now_iso = datetime_to_iso(now) + opt_out_ids = {repo_id_for_scan_target_url(url) for url in opt_out} + + # Snapshot existing rows once so first_seen is preserved without a per-repo + # round trip, and so an existing row's identity (added vs updated) is known. 
+ existing = _existing_by_repo_id(store) + + added = 0 + updated = 0 + excluded = 0 + seen_ids: set[str] = set() + for repo in org_list_provider.list_org_repos(): + repo_id = repo_id_for_scan_target_url(repo.repo_url) + if repo_id in seen_ids: + # Provider returned the same repo twice (paginated dupes); the first + # occurrence already produced a row, so skip the duplicate. + continue + seen_ids.add(repo_id) + + is_excluded = repo_id in opt_out_ids + prior = existing.get(repo_id) + first_seen = prior.first_seen if prior is not None else now_iso + entry = CatalogEntry( + repo_id=repo_id, + repo_url=repo.repo_url, + included=not is_excluded, + first_seen=first_seen, + last_reconciled=now_iso, + excluded_reason=excluded_reason if is_excluded else None, + ) + store.put_catalog_entry(entry) + + if prior is None: + added += 1 + else: + updated += 1 + if is_excluded: + excluded += 1 + + # ``total`` is the org size axis (N): every catalog row after the pass, + # including rows that pre-existed but did not appear in this (possibly + # partial) list; they were NOT dropped (additive invariant). + total = len(set(existing) | seen_ids) + return ReconcileSummary( + added=added, updated=updated, excluded=excluded, total=total + ) + + +def compute_coverage_gap( + catalog_entries: Iterable[CatalogEntry], + covered_repo_ids: Iterable[str], +) -> int: + """Coverage gap = included org repos with no successful scan yet (M1→M5 seam). + + Precise definition, consistent with design "M of N org repos scanned": of the + INCLUDED catalog repos (N minus opt-outs), how many have NOT yet been + covered, i.e. have no REPO_HEALTH record (have never been successfully scanned). + ``covered_repo_ids`` is the set of repo_ids that DO have a REPO_HEALTH row. + + Opt-out (``included=False``) repos are excluded from the denominator: they are + intentionally not scanned, so they are not a coverage gap. 
The result feeds + M5's freshness evaluator ``coverage_gap`` seam, materializing into + ``BREACH_COUNTER.coverage_gap`` as "org repos not yet covered/scanned". + """ + covered = set(covered_repo_ids) + return sum( + 1 + for entry in catalog_entries + if entry.included and entry.repo_id not in covered + ) + + +def coverage_gap_from_store(store) -> int: + """Compute the coverage gap from a store that has CATALOG + REPO_HEALTH. + + Convenience wiring for the freshness timer / CLI: enumerates CATALOG (bounded + by org size) and the REPO_HEALTH ids, then applies ``compute_coverage_gap``. + The store must implement ``read_all_catalog_entries`` (CatalogStore) and + ``read_all_repo_health`` (RepoHealthStore). + """ + covered = {health.repo_id for health in store.read_all_repo_health()} + return compute_coverage_gap(store.read_all_catalog_entries(), covered) + + +def _existing_by_repo_id(store: CatalogStore) -> dict[str, CatalogEntry]: + return {entry.repo_id: entry for entry in store.read_all_catalog_entries()} diff --git a/src/security_scanner/runtime/finding_query.py b/src/security_scanner/runtime/finding_query.py index 19269ac..e8415f5 100644 --- a/src/security_scanner/runtime/finding_query.py +++ b/src/security_scanner/runtime/finding_query.py @@ -2,11 +2,15 @@ from __future__ import annotations +from collections.abc import Sequence from dataclasses import dataclass from pathlib import Path from typing import Callable -from security_scanner.core.finding.model import Finding +from security_scanner.core.finding.model import ( + Finding, + filter_findings_by_disposition, +) from security_scanner.storage.base import FindingReader from security_scanner.storage.adapters.nosql_db.transport import ( DynamoDbCompatibleConfig, @@ -25,6 +29,10 @@ class FindingQueryRequest: jsonl_path: str | Path | None = None scan_run_id: str | None = None dynamodb_config: DynamoDbCompatibleConfig | None = None + # Optional disposition filter (FR-11). 
When set, only findings whose + # disposition is in this set are returned; this is the read-API seam M7 + # consumes to power the dashboard's disposition filter. None = no filter. + dispositions: Sequence[str] | None = None def _reader_for_request( @@ -53,5 +61,9 @@ def read_findings( """Read findings for a query request through the explicit reader seam.""" reader = store or _reader_for_request(request, store_factory) if request.scan_run_id: - return reader.read_for_scan_run(request.scan_run_id) - return reader.read_all() + findings = reader.read_for_scan_run(request.scan_run_id) + else: + findings = reader.read_all() + if request.dispositions is not None: + findings = filter_findings_by_disposition(findings, request.dispositions) + return findings diff --git a/src/security_scanner/runtime/incremental_discovery.py b/src/security_scanner/runtime/incremental_discovery.py index 2b00df0..dcc8ed2 100644 --- a/src/security_scanner/runtime/incremental_discovery.py +++ b/src/security_scanner/runtime/incremental_discovery.py @@ -10,6 +10,14 @@ from typing import Callable, Protocol, Sequence from security_scanner.catalog.scan_target import ScanTarget +from security_scanner.runtime.poll_fetch import ( + FetchExecutor, + FetchOutcome, + FetchTask, + LsRemoteRunner, + SerialFetchExecutor, + decide_ls_remote_skip, +) from security_scanner.storage.adapters.nosql_db.items import ( SCAN_JOB_STATUS_PENDING, normalize_scan_target_url, @@ -17,6 +25,8 @@ scan_job_id_for, ) from security_scanner.storage.base import ( + JOB_TYPE_INCREMENTAL, + CatalogStore, IncrementalScanStore, RefState, ScanJob, @@ -27,8 +37,16 @@ DISCOVERY_MODE_INITIALIZE = "initialize" DISCOVERY_MODE_ENQUEUE = "enqueue" DEFAULT_REF_PATTERNS = ("refs/remotes/origin/*",) +# Incremental jobs dequeue ahead of baseline jobs. The queue sorts gsi1sk +# ascending on ``nextAttemptAt#priority#...`` (lower priority number = served +# first), so incremental MUST carry a LOWER numeric priority than baseline. 
See +# baseline.BASELINE_JOB_PRIORITY for the low-precedence counterpart (SC-3). DEFAULT_JOB_PRIORITY = 100 DEFAULT_MAX_ATTEMPTS = 3 +# Default ceiling on simultaneous in-flight fetches for the bounded pool (SC-6b). +# This is a LOGIC default only; the real pool size is a box-gate decision (do not +# invent a load-validated value) so it stays injectable per invocation. +DEFAULT_FETCH_CONCURRENCY = 8 class GitDiscoveryError(RuntimeError): @@ -134,6 +152,112 @@ def _run(self, repo_path: Path, args: Sequence[str]) -> str: return result.stdout +def catalog_repo_targets(store: CatalogStore) -> list[ScanTarget]: + """Return INCLUDED catalog repos as discovery targets (M1->M4 seam). + + incr-poll's repo set is the org CATALOG (M1), NOT the manual + ``targets.local.yaml`` manifest: the design's Data Flow step 2 says discovery + iterates the CATALOG. Opt-out rows (``included=False``) are skipped — they are + intentionally not scanned — so an excluded repo never enters discovery. The + catalog ``repo_url`` is mapped onto the same ``ScanTarget`` currency the + per-repo loop already speaks, keeping one clean seam: a catalog-fed run and a + legacy manifest-fed run share the identical downstream loop. + """ + return [ + ScanTarget(url=entry.repo_url, name=entry.repo_url, enabled=True) + for entry in store.read_all_catalog_entries() + if entry.included + ] + + +class CadenceOverrun(RuntimeError): + """Marker type so callers can distinguish an overrun from a fatal error.""" + + +@dataclass(frozen=True) +class PollCadenceSignal: + """Cadence-overrun signal for one poll cycle (SC-6d). + + The reviewer's SC-6d: a poll cycle that cannot keep up with its cadence must + ALERT, not silently fall behind (that silent fall-behind is exactly the + staleness failure mode #2 exists to prevent). This is an advisory SIGNAL + object — modelled on M5's ``RepoFreshnessBreach``/``BreachCounter`` evaluator + pattern — carrying whether the cycle overran and by how much. 
It does NOT wire + a sink; the sink is M9. A caller hands it to the existing notification-log / + ``on_overrun`` seam. + """ + + cycle_seconds: float + cadence_seconds: float + overrun: bool + targets: int + + @property + def overrun_seconds(self) -> float: + """Return how far the cycle ran past its cadence (0 when within budget).""" + return max(0.0, self.cycle_seconds - self.cadence_seconds) + + +def evaluate_poll_cadence( + *, + cycle_seconds: float, + cadence_seconds: float, + targets: int, +) -> PollCadenceSignal: + """Build a cadence-overrun signal for one poll cycle (SC-6d). + + Pure function (mirrors M5's ``evaluate_repo_freshness``): a cycle whose wall + time exceeds its cadence budget is flagged ``overrun=True`` so the caller + surfaces it via the alert seam instead of letting the poller silently fall + behind. ``cadence_seconds <= 0`` disables the check (an un-budgeted manual + run never reports overrun). + """ + overrun = cadence_seconds > 0 and cycle_seconds > cadence_seconds + return PollCadenceSignal( + cycle_seconds=cycle_seconds, + cadence_seconds=cadence_seconds, + overrun=overrun, + targets=targets, + ) + + +def alert_poll_cadence_overrun( + signal: PollCadenceSignal, + *, + notification_log_path: Path, + event_at: str, + notification_writer: Callable[[Path, dict], None] | None = None, +) -> bool: + """Push an overrun signal to the EXISTING notification-log seam (SC-6d). + + Mirrors M5's ``on_breaches`` hook: the evaluator produces a signal and this + thin adapter routes it to the existing notification-log alert seam (the same + append-only JSONL ``scan-all`` uses) rather than a brand-new sink — the + pluggable sink is M9. No-op (returns False) when the cycle did not overrun, so + a healthy cycle is silent and only a real fall-behind alerts. Returns True + when an alert record was written. 
+ """ + if not signal.overrun: + return False + from security_scanner.runtime.notification_log import ( + cadence_overrun_record, + write_record, + ) + + writer = notification_writer or write_record + writer( + Path(notification_log_path), + cadence_overrun_record( + event_at=event_at, + cycle_seconds=signal.cycle_seconds, + cadence_seconds=signal.cadence_seconds, + overrun_seconds=signal.overrun_seconds, + targets=signal.targets, + ), + ) + return True + + @dataclass(frozen=True) class DiscoveryScannerConfig: """Scanner tuple used to dedupe discovery-created jobs.""" @@ -163,6 +287,10 @@ class IncrementalDiscoverySummary: jobs_enqueued: int = 0 ledger_skipped: int = 0 skipped_non_fast_forward: int = 0 + # SC-6a: repos whose ls-remote SHAs matched the REF_STATE cursor, so the + # fetch was skipped. The whole point of the skip pass is that this number is + # large (most repos are idle between polls) and ``fetch_ok`` is small. + skipped_idle: int = 0 @property def fetch_failed_count(self) -> int: @@ -177,46 +305,180 @@ def has_partial_failure(self) -> bool: @dataclass(frozen=True) class IncrementalDiscoveryRequest: - """Inputs for incremental discovery orchestration.""" + """Inputs for incremental discovery orchestration. + + The repo set is supplied via ``targets`` (the catalog-fed list from + ``catalog_repo_targets`` for incr-poll, or the legacy + ``store.list_scan_targets()`` manifest list). When ``targets`` is ``None`` the + request falls back to ``store.list_scan_targets()`` so pre-M4 manifest callers + keep working unchanged. + + ``ls_remote`` + ``fetch_executor`` + ``fetch_concurrency`` enable the SC-6 + poll path: when ``ls_remote`` is supplied, a skip pass probes remote SHAs and + only CHANGED repos are fetched, via a bounded-concurrency executor. When + ``ls_remote`` is ``None`` the legacy serial per-target fetch runs (every + target fetched), so old callers/tests are unaffected. 
+ """ mode: str store: IncrementalScanStore fetch_repo: Callable[[str], Path] git: GitDiscovery scanner: DiscoveryScannerConfig + targets: Sequence[ScanTarget] | None = None max_targets: int | None = None ref_patterns: Sequence[str] = DEFAULT_REF_PATTERNS + ls_remote: LsRemoteRunner | None = None + fetch_executor: FetchExecutor = field(default_factory=SerialFetchExecutor) + fetch_concurrency: int = DEFAULT_FETCH_CONCURRENCY now_factory: Callable[[], dt.datetime] = lambda: dt.datetime.now(dt.UTC).replace( microsecond=0 ) +def _resolve_targets( + request: IncrementalDiscoveryRequest, +) -> list[ScanTarget]: + """Resolve the repo set: explicit ``targets`` else the legacy manifest list. + + incr-poll passes the catalog-fed ``targets`` (see ``catalog_repo_targets``); + pre-M4 manifest callers leave it ``None`` and fall back to + ``store.list_scan_targets()``. Either way only enabled targets are kept and + ``max_targets`` truncates the result. + """ + source = ( + list(request.targets) + if request.targets is not None + else request.store.list_scan_targets() + ) + targets = [target for target in source if target.enabled] + if request.max_targets is not None: + targets = targets[: request.max_targets] + return targets + + +def _cursor_shas_for( + request: IncrementalDiscoveryRequest, repo_id: str +) -> dict[str, str]: + """Read the REF_STATE cursor SHAs for one repo (ls-remote skip input). + + Returns ``{ref_name: last_seen_sha}`` for the refs the ls-remote skip pass + compares against. ``ref_patterns`` are typically GLOBS + (``refs/remotes/origin/*``), so we cannot resolve them to concrete ref names + via a point lookup. Instead we read EVERY stored ref state for the repo and + keep the ones whose concrete name matches ``ref_patterns`` (concrete names + and globs both, via :func:`_ref_matches_patterns`). A repo with no matching + cursor reads as "absent", which ``decide_ls_remote_skip`` treats as changed. 
+ """ + cursor: dict[str, str] = {} + for state in request.store.list_ref_states(repo_id): + if _ref_matches_patterns(state.ref_name, request.ref_patterns): + cursor[state.ref_name] = state.last_seen_sha + return cursor + + +def _plan_fetches( + request: IncrementalDiscoveryRequest, + targets: Sequence[ScanTarget], +) -> tuple[list[ScanTarget], int]: + """ls-remote skip pass (SC-6a): split targets into fetch-now vs idle-skip. + + When no ``ls_remote`` runner is injected, EVERY target is fetched (legacy + behaviour) and the idle-skip count is 0. Otherwise each target is probed and + only CHANGED repos are returned for fetching; the rest are counted as + ``skipped_idle``. + """ + if request.ls_remote is None: + return list(targets), 0 + + to_fetch: list[ScanTarget] = [] + skipped_idle = 0 + for target in targets: + repo_id = repo_id_for_scan_target_url(target.url) + decision = decide_ls_remote_skip( + repo_id=repo_id, + repo_url=target.url, + ls_remote=request.ls_remote, + cursor_shas=_cursor_shas_for(request, repo_id), + patterns=request.ref_patterns, + ) + if decision.changed: + to_fetch.append(target) + else: + skipped_idle += 1 + return to_fetch, skipped_idle + + +def _run_fetches( + request: IncrementalDiscoveryRequest, + targets: Sequence[ScanTarget], +) -> tuple[list[tuple[ScanTarget, Path]], list[FetchFailure]]: + """Bounded concurrent fetch (SC-6b) returning (target, repo_path) pairs. + + Each target is fetched through ``request.fetch_repo`` inside the injected + bounded executor (``SerialFetchExecutor`` by default). Per-repo failures are + isolated into ``FetchFailure`` so one bad repo never aborts the cycle; the + fetched repo path is the poll-mirror path the ref-observation phase reads. 
+ """ + by_url = {target.url: target for target in targets} + tasks = [ + FetchTask(repo_id=repo_id_for_scan_target_url(t.url), repo_url=t.url) + for t in targets + ] + + def _fetch_one(task: FetchTask) -> FetchOutcome: + repo_path = request.fetch_repo(task.repo_url) + return FetchOutcome( + repo_id=task.repo_id, + repo_url=task.repo_url, + ok=True, + repo_path=repo_path, + ) + + outcomes = request.fetch_executor.map_bounded( + _fetch_one, tasks, max(request.fetch_concurrency, 1) + ) + + fetched: list[tuple[ScanTarget, Path]] = [] + fetch_failed: list[FetchFailure] = [] + for outcome in outcomes: + target = by_url[outcome.repo_url] + if outcome.ok and outcome.repo_path is not None: + fetched.append((target, outcome.repo_path)) + else: + fetch_failed.append( + FetchFailure(target=target, error=outcome.error or "fetch failed") + ) + return fetched, fetch_failed + + def run_incremental_discovery( request: IncrementalDiscoveryRequest, ) -> IncrementalDiscoverySummary: - """Fetch enabled targets, observe refs, and optionally enqueue commit jobs.""" + """Fetch changed targets, observe refs, and optionally enqueue commit jobs. + + SC-6 poll path (when ``ls_remote`` is injected): an ls-remote skip pass drops + idle repos, then a bounded concurrent pool fetches only the changed ones into + the poll-mirror cache before refs are observed. The legacy path (no + ``ls_remote``) fetches every target serially, unchanged. 
+ """ if request.mode not in {DISCOVERY_MODE_INITIALIZE, DISCOVERY_MODE_ENQUEUE}: raise ValueError(f"unsupported discovery mode: {request.mode}") - targets = [target for target in request.store.list_scan_targets() if target.enabled] - if request.max_targets is not None: - targets = targets[: request.max_targets] + targets = _resolve_targets(request) summary = IncrementalDiscoverySummary(targets=len(targets)) - fetch_ok = 0 fetch_failed: list[FetchFailure] = [] refs_observed = 0 jobs_enqueued = 0 ledger_skipped = 0 skipped_non_fast_forward = 0 - for target in targets: - try: - repo_path = request.fetch_repo(target.url) - except Exception as exc: # noqa: BLE001 - per-target failures are isolated. - fetch_failed.append(FetchFailure(target=target, error=str(exc))) - continue + to_fetch, skipped_idle = _plan_fetches(request, targets) + fetched, fetch_failed = _run_fetches(request, to_fetch) + fetch_ok = 0 + for target, repo_path in fetched: repo_id = repo_id_for_scan_target_url(target.url) repo_url = normalize_scan_target_url(target.url) try: @@ -289,6 +551,7 @@ def run_incremental_discovery( jobs_enqueued=jobs_enqueued, ledger_skipped=ledger_skipped, skipped_non_fast_forward=skipped_non_fast_forward, + skipped_idle=skipped_idle, ) @@ -343,6 +606,9 @@ def _scan_job_for_commit( scanner_config_hash=scanner.scanner_config_hash, priority=DEFAULT_JOB_PRIORITY, status=SCAN_JOB_STATUS_PENDING, + # Discovery enqueues INCREMENTAL jobs (SC-6): a completion advances + # lastSuccessfulIncrementalAt and these dequeue ahead of baseline. 
+ job_type=JOB_TYPE_INCREMENTAL, attempts=0, max_attempts=DEFAULT_MAX_ATTEMPTS, worker_id=None, diff --git a/src/security_scanner/runtime/local_scan.py b/src/security_scanner/runtime/local_scan.py index ace98b8..9781286 100644 --- a/src/security_scanner/runtime/local_scan.py +++ b/src/security_scanner/runtime/local_scan.py @@ -5,20 +5,22 @@ import datetime as dt import subprocess import uuid +from collections.abc import Callable from dataclasses import dataclass, field from pathlib import Path -from typing import Callable, Protocol +from typing import Protocol from security_scanner.core.finding.model import Finding from security_scanner.runtime.branch_residual import finding_with_context from security_scanner.scanners.gitleaks.scanner import GitleaksScanner +from security_scanner.storage.adapters.nosql_db.items import repo_id_for_local_target from security_scanner.storage.adapters.nosql_db.transport import ( DynamoDbCompatibleConfig, ) from security_scanner.storage.base import ( + JOB_TYPE_BASELINE, + RepoHealthStore, ScanResultWriter, - ScanRunHealth, - ScanRunHealthStore, TargetScanResult, ) from security_scanner.storage.factory import create_finding_store @@ -30,7 +32,6 @@ ) from security_scanner.workspace.clone_manager import CloneError, LocalCloneManager - RULE_PACK_VERSION = "secret-rules-0.1.0" @@ -193,6 +194,7 @@ def run_local_scan( scanned = 0 total_findings = 0 target_results: list[LocalScanTargetResult] = [] + scanned_target_names: list[str] = [] for target in targets: try: @@ -247,6 +249,7 @@ def run_local_scan( ) scanned += 1 total_findings += len(findings) + scanned_target_names.append(target.name) target_results.append( LocalScanTargetResult( target_name=target.name, @@ -256,18 +259,19 @@ def run_local_scan( ) ) - # Record a freshness marker only when the run completed without raising, - # so a downstream publisher can fail closed once scans stop succeeding. 
- if isinstance(store, ScanRunHealthStore): - store.put_scan_run_health( - ScanRunHealth( - scan_run_id=scan_run_id, - completed_at_iso=scan_at_iso, - targets_total=len(targets), - targets_scanned=scanned, - findings_total=total_findings, + # Record PER-REPO freshness for each successfully-scanned target (SC-5), + # replacing the global SCAN_HEALTH singleton: a local full-batch scan is a + # full-scan completion, so advance each repo's ``lastSuccessfulFullScanAt`` + # via the store's attribute-scoped advancing-only conditional write. A single + # fresh repo can no longer mask a target that silently stopped scanning, + # because each repo carries its own timestamp. + if isinstance(store, RepoHealthStore): + for target_name in scanned_target_names: + store.advance_repo_health( + repo_id_for_local_target(target_name), + job_type=JOB_TYPE_BASELINE, + completed_at=scan_at_iso, ) - ) return LocalScanResult( manifest_path=request.manifest_path, diff --git a/src/security_scanner/runtime/notification_log.py b/src/security_scanner/runtime/notification_log.py index 41593ed..f7d1cdc 100644 --- a/src/security_scanner/runtime/notification_log.py +++ b/src/security_scanner/runtime/notification_log.py @@ -141,3 +141,28 @@ def fatal_error_record( "error": error, "stage": stage, } + + +def cadence_overrun_record( + *, + event_at: str, + cycle_seconds: float, + cadence_seconds: float, + overrun_seconds: float, + targets: int, +) -> dict[str, Any]: + """Build a `cadence_overrun` alert record for incr-poll (SC-6d). + + The poller cannot keep up with its cadence: the cycle ran longer than its + budget. This surfaces that as an ALERT on the existing notification-log seam + (not a new sink — the sink is M9) so the poller never silently falls behind, + which is exactly the silent-staleness failure mode #2 exists to prevent. 
+ """ + return { + "type": "cadence_overrun", + "event_at": event_at, + "cycle_seconds": cycle_seconds, + "cadence_seconds": cadence_seconds, + "overrun_seconds": overrun_seconds, + "targets": targets, + } diff --git a/src/security_scanner/runtime/poll_fetch.py b/src/security_scanner/runtime/poll_fetch.py new file mode 100644 index 0000000..2284da9 --- /dev/null +++ b/src/security_scanner/runtime/poll_fetch.py @@ -0,0 +1,284 @@ +"""Poll-side fetch coordination for incr-poll (M4, SC-6). + +The reviewer's SC-6 finding: the naive incr-poll does 500 SERIAL +``git fetch --all --prune`` every poll cycle (infeasible at 500+ repos) and +shares the worker clone cache, so a poller fetch and a worker gitleaks scan can +touch the same ``.git`` concurrently (pack-file race). This module is the M4 +LOGIC fix for all four SC-6 sub-points: + + (a) **ls-remote skip** — before fetching a repo, an INJECTABLE ``LsRemoteRunner`` + reports the remote ref SHAs; when every observed ref already matches the + REF_STATE cursor the repo is IDLE and the fetch is SKIPPED. Most repos are + idle between polls, so this collapses the per-cycle work from "fetch all + 500" to "fetch only the handful that moved". + (b) **bounded concurrent fetch** — fetch is modelled as a bounded pool with an + INJECTABLE executor (``FetchExecutor``) instead of a serial loop, so the + real thread/process pool can be sized at the box gate while the LOGIC + (bounded concurrency, per-repo isolation, failure isolation) is tested + deterministically with a synchronous fake executor. + (c) **cache isolation** — the poller fetches into a SEPARATE poll-mirror cache + root, never the worker checkout root, so a poller fetch and a worker scan + never touch the same ``.git`` directory concurrently. The seam is a pair of + cache roots; the concrete mirror-vs-checkout sync (or a fetch-under-lease + alternative) is wired by the caller. 
+ (d) cadence-overrun is surfaced as a SIGNAL by the discovery layer (see + ``incremental_discovery.evaluate_poll_cadence``), not silently dropped. + +The runners/executor mirror the existing injection pattern (``GitDiscovery``, +``GhApiRunner``, ``fetch_repo``): tests inject fakes, and the live subprocess +implementations are thin seams. No real network or git is invoked here when a +fake runner/executor is supplied. +""" + +from __future__ import annotations + +import subprocess +from collections.abc import Callable, Mapping, Sequence +from dataclasses import dataclass, field +from pathlib import Path +from typing import Protocol + + +class LsRemoteError(RuntimeError): + """Raised when an ls-remote probe fails for a repo (isolated per repo).""" + + +class LsRemoteRunner(Protocol): + """Injectable ``git ls-remote`` probe (SC-6a). + + Reports the current remote ref SHAs WITHOUT fetching objects, so the poller + can decide whether a repo moved before paying for a fetch. The production + implementation shells out to ``git ls-remote``; tests inject a fake that + returns a canned ref->sha map (no network). + """ + + def ls_remote( + self, repo_url: str, patterns: Sequence[str] + ) -> dict[str, str]: + """Return ``{ref_name: commit_sha}`` for refs matching ``patterns``.""" + + +class SubprocessLsRemoteRunner: + """``LsRemoteRunner`` backed by ``git ls-remote`` (live seam). + + ``git ls-remote ...`` lists remote refs and their object + names with no object transfer, which is exactly the cheap "did anything + move?" probe SC-6a needs. Patterns are passed through to the remote so the + server filters; the result is normalized to the same ``refs/remotes/origin/*`` + namespace REF_STATE uses so the SHAs are directly comparable to the cursor. + """ + + def ls_remote( + self, repo_url: str, patterns: Sequence[str] + ) -> dict[str, str]: + # ls-remote yields server refs (refs/heads/*, refs/tags/*); REF_STATE + # cursors live under refs/remotes/origin/*. 
We translate refs/heads/* + # to refs/remotes/origin/* so a server head SHA compares directly to + # the cursor a prior fetch recorded. + cmd = ["git", "ls-remote", repo_url, *patterns] + try: + result = subprocess.run( + cmd, check=True, capture_output=True, text=True + ) + except FileNotFoundError as exc: + raise LsRemoteError("git binary not found on PATH") from exc + except subprocess.CalledProcessError as exc: + detail = (exc.stderr or exc.stdout or "no process output").strip() + raise LsRemoteError( + f"git ls-remote failed with exit code {exc.returncode}: {detail}" + ) from exc + + refs: dict[str, str] = {} + for line in result.stdout.splitlines(): + if not line.strip(): + continue + sha, _, ref = line.partition("\t") + ref = ref.strip() + if not ref or ref.endswith("^{}"): + # Peeled tag lines (refs/tags/v1^{}) duplicate the tag; skip. + continue + mapped = _map_server_ref(ref) + if mapped is None: + continue + refs[mapped] = sha.strip() + return refs + + +def _map_server_ref(server_ref: str) -> str | None: + """Map a server ref (refs/heads/*) to the origin remote namespace.""" + head_prefix = "refs/heads/" + if server_ref.startswith(head_prefix): + return "refs/remotes/origin/" + server_ref[len(head_prefix) :] + # HEAD and tags are not part of the origin remote-tracking namespace + # REF_STATE records; leave them out so the comparison stays apples-to-apples. + return None + + +@dataclass(frozen=True) +class CacheRoots: + """Cache isolation seam between poller fetch and worker scan (SC-6c). + + ``poll_mirror_root`` is where the poller fetches; ``worker_checkout_root`` is + where workers run gitleaks. They are DISTINCT directories so a poller fetch + and a worker scan never share a ``.git`` concurrently (the pack-file race the + reviewer flagged). The concrete mirror->checkout promotion (or a + fetch-under-repo-lease alternative) is a box-gated operational detail; this + object just makes the two roots an explicit, non-overlapping seam.
+ """ + + poll_mirror_root: Path + worker_checkout_root: Path + + def __post_init__(self) -> None: + if self.poll_mirror_root == self.worker_checkout_root: + raise ValueError( + "poll mirror and worker checkout roots must differ so a poller " + "fetch and a worker scan never touch the same .git (SC-6c)" + ) + + +@dataclass(frozen=True) +class FetchTask: + """One repo's fetch unit for the bounded pool.""" + + repo_id: str + repo_url: str + + +@dataclass(frozen=True) +class FetchOutcome: + """Result of one repo's fetch (success carries the poll-mirror path).""" + + repo_id: str + repo_url: str + ok: bool + repo_path: Path | None = None + error: str | None = None + + +FetchOne = Callable[[FetchTask], FetchOutcome] + + +class FetchExecutor(Protocol): + """Injectable bounded-concurrency executor (SC-6b). + + ``map_bounded`` applies ``fn`` to each task with AT MOST ``max_concurrency`` + in flight. The production executor is a real thread/process pool whose size + is set at the box gate; tests inject a synchronous fake that records the + observed concurrency so the bounded-pool LOGIC is verified without real + threads. Implementations MUST isolate per-task failures: a raising ``fn`` for + one task must not abort the others. + """ + + def map_bounded( + self, + fn: FetchOne, + tasks: Sequence[FetchTask], + max_concurrency: int, + ) -> list[FetchOutcome]: + """Run ``fn`` over ``tasks`` with bounded concurrency, preserving order.""" + + +class SerialFetchExecutor: + """Default in-process executor (one task at a time). + + A correct ``FetchExecutor`` that ignores ``max_concurrency`` and runs tasks + sequentially. It exists so callers and tests have a zero-dependency default; + the real bounded thread pool is dropped in at the box gate. Per-task failures + are isolated: a raising ``fn`` is converted into a failed ``FetchOutcome``. 
+ """ + + def map_bounded( + self, + fn: FetchOne, + tasks: Sequence[FetchTask], + max_concurrency: int, + ) -> list[FetchOutcome]: + outcomes: list[FetchOutcome] = [] + for task in tasks: + outcomes.append(_run_one_isolated(fn, task)) + return outcomes + + +def _run_one_isolated(fn: FetchOne, task: FetchTask) -> FetchOutcome: + try: + return fn(task) + except Exception as exc: # noqa: BLE001 - isolate per-repo fetch failure. + return FetchOutcome( + repo_id=task.repo_id, + repo_url=task.repo_url, + ok=False, + error=str(exc), + ) + + +@dataclass(frozen=True) +class LsRemoteSkipDecision: + """Per-repo decision: does this repo need a fetch this cycle? (SC-6a).""" + + repo_id: str + repo_url: str + changed: bool + error: str | None = None + + +def decide_ls_remote_skip( + *, + repo_id: str, + repo_url: str, + ls_remote: LsRemoteRunner, + cursor_shas: Mapping[str, str], + patterns: Sequence[str], +) -> LsRemoteSkipDecision: + """Decide whether ``repo`` moved since its REF_STATE cursor (SC-6a). + + Probes the remote ref SHAs and compares them to ``cursor_shas`` (the + last-seen SHA per ref from REF_STATE). The repo is treated as CHANGED (needs + a fetch) when ANY observed ref SHA differs from the cursor, or a ref the + cursor knows is missing, or a brand-new ref appears. It is SKIPPED (idle) only + when every probed ref SHA matches the cursor exactly. A probe error is + fail-safe: it returns ``changed=True`` so a flaky ls-remote never silently + suppresses a fetch (we would rather over-fetch one repo than miss a change). + """ + try: + observed = ls_remote.ls_remote(repo_url, patterns) + except Exception as exc: # noqa: BLE001 - fail-safe: probe error => fetch. + return LsRemoteSkipDecision( + repo_id=repo_id, + repo_url=repo_url, + changed=True, + error=str(exc), + ) + + # A repo we have never observed (no cursor) is always "changed" so its + # initial fetch happens. 
Otherwise compare ref-by-ref over the UNION of + # cursor refs and observed refs: an added ref, a removed ref, or a moved SHA + # all count as changed. + if not cursor_shas: + return LsRemoteSkipDecision( + repo_id=repo_id, repo_url=repo_url, changed=True + ) + + all_refs = set(cursor_shas) | set(observed) + changed = any( + cursor_shas.get(ref) != observed.get(ref) for ref in all_refs + ) + return LsRemoteSkipDecision( + repo_id=repo_id, repo_url=repo_url, changed=changed + ) + + +@dataclass(frozen=True) +class PollFetchPlan: + """Outcome of the ls-remote skip pass before any fetch runs (SC-6a).""" + + to_fetch: list[FetchTask] = field(default_factory=list) + skipped_idle: list[str] = field(default_factory=list) + + @property + def fetch_count(self) -> int: + return len(self.to_fetch) + + @property + def skipped_count(self) -> int: + return len(self.skipped_idle) diff --git a/src/security_scanner/runtime/read_api.py b/src/security_scanner/runtime/read_api.py new file mode 100644 index 0000000..5e6bf06 --- /dev/null +++ b/src/security_scanner/runtime/read_api.py @@ -0,0 +1,465 @@ +"""Scanner read API query layer (FR-9, M7, design Data Flow step 7). + +A read-only surface the live admin dashboard (M8, separate sub-project) consumes. +It exposes four panels, each backed by a cost-bounded read so the always-on, +frequently-polled dashboard never pays an O(table) price: + + 1. findings (+ disposition filter) — reuses M6 ``read_findings`` / + ``FindingQueryRequest.dispositions``; DTOs are public-safe (no raw secret). + 2. freshness rollup — reads the materialized ``BREACH_COUNTER`` + (M5) O(1); never re-enumerates REPO_HEALTH per request (F5). + 3. coverage (org N / covered M) — from the CATALOG entity (M1), bounded by + org size (≤ N), off the dashboard hot path. + 4. queue backlog — per-status ``Select=COUNT`` over the + status GSI partitions (SC-7), O(status-partitions), NEVER a full-table Scan. 
+ +Trust model (F9) +---------------- +This is an always-on service that exposes secret-finding metadata, so: + + - **Bind localhost/internal only.** Any HTTP exposure binds to + ``READ_API_DEFAULT_HOST`` (127.0.0.1) by default; binding to a routable + interface is an explicit, deploy-gated operator decision, not the default. + - **Payloads are public-safe / redacted.** Finding DTOs are built from + ``Finding.to_dict()``, which already strips raw secrets (only the salted + ``secretHash`` survives) and redacts the scanner-native gitleaks payload + (``to_redacted_dict`` nulls ``match``/``secret``). The read API additionally + projects findings onto a small fixed allowlist of fields + (:func:`finding_to_public_dto`) so a future, less-redacted snapshot field can + never leak through this surface by accident. + - **Authn.** The localhost bind IS the trust boundary for the M7 contract: the + admin UI (M8) runs co-located and is the only consumer. Real authentication + (and any non-localhost serving) is DEPLOY-GATED — see + :class:`ReadApiServerConfig` and :func:`build_read_api_wsgi_app`; M7 ships and + tests the query-layer CONTRACT, not a network-exposed authenticated server. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any + +from security_scanner.core.finding.model import Finding +from security_scanner.runtime.catalog_reconcile import coverage_gap_from_store +from security_scanner.runtime.finding_query import ( + FindingQueryRequest, + read_findings, +) +from security_scanner.storage.base import BreachCounter, CatalogEntry, QueueBacklog + +# Default bind host for any HTTP exposure of the read API (F9 trust model). The +# read API surfaces secret-finding metadata, so it binds to loopback only by +# default; exposing it on a routable interface is an explicit, deploy-gated +# operator choice (and requires real authn — see ReadApiServerConfig). 
+READ_API_DEFAULT_HOST = "127.0.0.1" + +# Public-safe finding fields the read API projects onto. An allowlist (not a +# denylist) so a future snapshot field added upstream cannot silently leak: it is +# simply absent from the DTO until deliberately added here. Every field below is +# already redaction-safe in ``Finding.to_dict()`` output. +_PUBLIC_FINDING_FIELDS = ( + "findingId", + "category", + "sourceTool", + "ruleId", + "severity", + "confidence", + "status", + "disposition", + "fingerprint", +) + + +# --------------------------------------------------------------------------- +# DTOs (public-safe; the M8 dashboard consumes these) +# --------------------------------------------------------------------------- + + +@dataclass(frozen=True) +class FindingSummaryDto: + """One public-safe finding row for the read API / dashboard (FR-9/FR-11). + + Carries identity + triage/disposition + location metadata, but NEVER the raw + secret or raw scanner match: those are stripped by ``Finding.to_dict()`` and + the field allowlist here. ``repo`` is the ``owner/repo`` full name (public), + ``file_path``/``line_start`` are location metadata an operator needs to find + and rotate the leak, not the secret itself. + """ + + finding_id: str + repo: str + rule_id: str + severity: str + confidence: str + status: str + disposition: str + file_path: str + line_start: int + secret_hash: str | None + + def to_dict(self) -> dict[str, Any]: + """Return the camelCase wire form for the dashboard.""" + return { + "findingId": self.finding_id, + "repo": self.repo, + "ruleId": self.rule_id, + "severity": self.severity, + "confidence": self.confidence, + "status": self.status, + "disposition": self.disposition, + "filePath": self.file_path, + "lineStart": self.line_start, + "secretHash": self.secret_hash, + } + + +@dataclass(frozen=True) +class FreshnessRollupDto: + """Freshness rollup panel read O(1) from the materialized BREACH_COUNTER (F5). 
+ + Mirrors ``BreachCounter`` but is its own read-API DTO so the wire contract is + decoupled from the storage record. ``available`` is False when no evaluator + pass has run yet (no BREACH_COUNTER row); the dashboard renders "not yet + evaluated" rather than a misleading all-zero "everything fresh". + """ + + available: bool + incremental_breaches: int = 0 + baseline_breaches: int = 0 + total_breaches: int = 0 + repos_evaluated: int = 0 + evaluated_at: str | None = None + coverage_gap: int | None = None + + def to_dict(self) -> dict[str, Any]: + return { + "available": self.available, + "incrementalBreaches": self.incremental_breaches, + "baselineBreaches": self.baseline_breaches, + "totalBreaches": self.total_breaches, + "reposEvaluated": self.repos_evaluated, + "evaluatedAt": self.evaluated_at, + "coverageGap": self.coverage_gap, + } + + +@dataclass(frozen=True) +class CoverageDto: + """Coverage panel: org N total / covered M / opt-out / gap (FR-9, from M1). + + ``org_total`` is every CATALOG row (the N in "M of org N scanned"); ``included`` + is N minus opt-outs; ``covered`` is included repos that have a REPO_HEALTH row + (have scanned at least once); ``coverage_gap`` is included-but-not-covered. + Bounded by org size, computed off the dashboard hot path.
+ """ + + org_total: int + included: int + excluded: int + covered: int + coverage_gap: int + + def to_dict(self) -> dict[str, Any]: + return { + "orgTotal": self.org_total, + "included": self.included, + "excluded": self.excluded, + "covered": self.covered, + "coverageGap": self.coverage_gap, + } + + +@dataclass(frozen=True) +class QueueBacklogDto: + """Queue backlog panel computed O(status-partitions), never O(table) (SC-7).""" + + backlog: int + job_counts_by_status: dict[str, int] + + def to_dict(self) -> dict[str, Any]: + return { + "backlog": self.backlog, + "jobCountsByStatus": dict(self.job_counts_by_status), + } + + +# --------------------------------------------------------------------------- +# Public-safe finding projection +# --------------------------------------------------------------------------- + + +def finding_to_public_dto(finding: Finding) -> FindingSummaryDto: + """Project a Finding onto the public-safe read-API DTO (F9 redaction). + + Routes through ``Finding.to_dict()`` (raw secret already absent; only the + salted ``secretHash`` survives, gitleaks payload redacted) and then keeps only + the field allowlist. The location ``filePath``/``lineStart`` are operator + metadata (where to rotate), not the secret value. 
+ """ + data = finding.to_dict() + public = {key: data[key] for key in _PUBLIC_FINDING_FIELDS if key in data} + evidence = data.get("evidence") or {} + location = data.get("location") or {} + return FindingSummaryDto( + finding_id=public.get("findingId", finding.finding_id), + repo=finding.repo.full_name, + rule_id=public.get("ruleId", finding.rule_id), + severity=public.get("severity", finding.severity), + confidence=public.get("confidence", finding.confidence), + status=public.get("status", finding.status), + disposition=public.get("disposition", finding.disposition), + file_path=location.get("filePath", finding.location.file_path), + line_start=int(location.get("lineStart", finding.location.line_start)), + secret_hash=evidence.get("secretHash"), + ) + + +# --------------------------------------------------------------------------- +# Read query layer (the four panels) +# --------------------------------------------------------------------------- + + +def read_findings_panel( + request: FindingQueryRequest, + *, + store: Any | None = None, + store_factory: Any | None = None, +) -> list[FindingSummaryDto]: + """Findings panel (FR-9/FR-11): findings sliced by disposition, redacted. + + Reuses M6's ``read_findings`` (which applies ``request.dispositions``) and + then projects each Finding to the public-safe DTO so the read API never + re-implements the filter and never emits a raw secret. + """ + kwargs: dict[str, Any] = {} + if store is not None: + kwargs["store"] = store + if store_factory is not None: + kwargs["store_factory"] = store_factory + findings = read_findings(request, **kwargs) + return [finding_to_public_dto(finding) for finding in findings] + + +def read_freshness_rollup(store: Any) -> FreshnessRollupDto: + """Freshness rollup panel (F5): read the materialized BREACH_COUNTER O(1). + + Consumes the M5 evaluator's materialized rollup via ``read_breach_counter``; + it does NOT enumerate REPO_HEALTH per request. 
When no evaluator pass has run + yet (no counter), returns ``available=False`` so the dashboard distinguishes + "not yet evaluated" from "everything fresh". + """ + counter: BreachCounter | None = store.read_breach_counter() + if counter is None: + return FreshnessRollupDto(available=False) + return FreshnessRollupDto( + available=True, + incremental_breaches=counter.incremental_breaches, + baseline_breaches=counter.baseline_breaches, + total_breaches=counter.total_breaches, + repos_evaluated=counter.repos_evaluated, + evaluated_at=counter.evaluated_at, + coverage_gap=counter.coverage_gap, + ) + + +def read_coverage(store: Any) -> CoverageDto: + """Coverage panel (FR-9, M1): org N / included / covered / gap. + + Enumerates CATALOG (bounded by org size) for the N axis and reuses M1's + ``coverage_gap_from_store`` for the included-but-not-covered count. Runs off + the dashboard hot path; not the per-poll panel. + """ + catalog: list[CatalogEntry] = list(store.read_all_catalog_entries()) + org_total = len(catalog) + included = sum(1 for entry in catalog if entry.included) + excluded = org_total - included + covered_ids = {health.repo_id for health in store.read_all_repo_health()} + covered = sum( + 1 for entry in catalog if entry.included and entry.repo_id in covered_ids + ) + return CoverageDto( + org_total=org_total, + included=included, + excluded=excluded, + covered=covered, + coverage_gap=coverage_gap_from_store(store), + ) + + +def read_queue_backlog_panel(store: Any) -> QueueBacklogDto: + """Queue backlog panel (SC-7): O(status-partitions), never a full-table Scan. + + Delegates to the store's ``read_queue_backlog`` (per-status ``Select=COUNT`` + over the status GSI partitions). This is the path the live dashboard polls; + it must not be the legacy ``get_queue_status`` full-table Scan. 
+ """ + backlog: QueueBacklog = store.read_queue_backlog() + return QueueBacklogDto( + backlog=backlog.backlog, + job_counts_by_status=dict(backlog.job_counts_by_status), + ) + + +# --------------------------------------------------------------------------- +# Snapshot assembler (one dashboard refresh) +# --------------------------------------------------------------------------- + + +@dataclass(frozen=True) +class ReadApiSnapshot: + """One dashboard-refresh payload: the four panels assembled together (FR-9). + + ``findings`` is omitted from the cheap always-poll snapshot by default (it can + be O(findings)); the dashboard requests it on demand with a disposition + filter. The freshness/coverage/backlog panels are the cost-bounded live panels. + """ + + freshness: FreshnessRollupDto + coverage: CoverageDto + backlog: QueueBacklogDto + findings: list[FindingSummaryDto] | None = None + + def to_dict(self) -> dict[str, Any]: + payload: dict[str, Any] = { + "freshness": self.freshness.to_dict(), + "coverage": self.coverage.to_dict(), + "backlog": self.backlog.to_dict(), + } + if self.findings is not None: + payload["findings"] = [dto.to_dict() for dto in self.findings] + return payload + + +def read_dashboard_snapshot( + store: Any, + *, + findings_request: FindingQueryRequest | None = None, +) -> ReadApiSnapshot: + """Assemble one dashboard snapshot from the read query layer (FR-9). + + The three live panels (freshness/coverage/backlog) are always built from + cost-bounded reads (BREACH_COUNTER O(1), CATALOG ≤N, status COUNT + O(status-partitions)). ``findings`` is included only when a + ``findings_request`` is supplied, so the always-on poll stays cheap and the + findings panel is an explicit, disposition-filtered request. 
+ """ + findings = ( + read_findings_panel(findings_request, store=store) + if findings_request is not None + else None + ) + return ReadApiSnapshot( + freshness=read_freshness_rollup(store), + coverage=read_coverage(store), + backlog=read_queue_backlog_panel(store), + findings=findings, + ) + + +# --------------------------------------------------------------------------- +# Trust model: deploy-gated HTTP wrapper (F9) +# --------------------------------------------------------------------------- + + +@dataclass(frozen=True) +class ReadApiServerConfig: + """Bind/authn config for any HTTP exposure of the read API (F9 trust model). + + Defaults are SAFE: ``host`` is loopback only. Two invariants are enforced by + :func:`validate_read_api_server_config` and must hold before a real socket is + bound (a DEPLOY-GATED step M7 does not perform): + + - binding to a non-loopback host REQUIRES ``require_auth=True`` (a routable + read API exposing secret-finding metadata must authenticate); + - the localhost-only default needs no authn because the loopback bind IS + the trust boundary and the co-located admin UI is the only consumer. + """ + + host: str = READ_API_DEFAULT_HOST + port: int = 8787 + require_auth: bool = False + + @property + def is_loopback(self) -> bool: + """Return whether the bind host is loopback-only.""" + return self.host in ("127.0.0.1", "::1", "localhost") + + +def validate_read_api_server_config(config: ReadApiServerConfig) -> None: + """Enforce the F9 trust invariants before any real bind (deploy-gated). + + Raises ``ValueError`` when a non-loopback bind is requested without authn: + serving secret-finding metadata on a routable interface unauthenticated is + exactly what the trust model forbids. The localhost default passes. 
+ """ + if not config.is_loopback and not config.require_auth: + raise ValueError( + "read API refuses a non-loopback bind without authentication: " + f"host={config.host!r} exposes secret-finding metadata; set " + "require_auth=True (and wire real authn) for any routable bind. " + "Default localhost bind needs no authn (loopback is the boundary)." + ) + + +def build_read_api_wsgi_app(store: Any): + """Build a thin read-only WSGI app over the read query layer (F9). + + A deliberately minimal in-process wrapper so the SAME query-layer contract can + be served if/when a real bind is deploy-gated on. It does NOT bind a socket — + serving (and real authentication) is the deploy-gated step; this returns a + WSGI callable a test or a co-located UI can drive in-process. Routes: + + GET /freshness -> freshness rollup (BREACH_COUNTER O(1)) + GET /coverage -> coverage (CATALOG ≤N) + GET /backlog -> queue backlog (status COUNT, no Scan) + GET /snapshot -> the three live panels + + All responses are JSON built from the public-safe DTOs. 
+ """ + import json + + routes = { + "/freshness": lambda: read_freshness_rollup(store).to_dict(), + "/coverage": lambda: read_coverage(store).to_dict(), + "/backlog": lambda: read_queue_backlog_panel(store).to_dict(), + "/snapshot": lambda: read_dashboard_snapshot(store).to_dict(), + } + + def app(environ: dict[str, Any], start_response): + if environ.get("REQUEST_METHOD", "GET") != "GET": + body = json.dumps({"error": "method not allowed"}).encode("utf-8") + start_response( + "405 Method Not Allowed", + [("Content-Type", "application/json")], + ) + return [body] + path = environ.get("PATH_INFO", "/") + handler = routes.get(path) + if handler is None: + body = json.dumps({"error": "not found", "path": path}).encode("utf-8") + start_response("404 Not Found", [("Content-Type", "application/json")]) + return [body] + body = json.dumps(handler()).encode("utf-8") + start_response("200 OK", [("Content-Type", "application/json")]) + return [body] + + return app + + +__all__ = [ + "READ_API_DEFAULT_HOST", + "CoverageDto", + "FindingSummaryDto", + "FreshnessRollupDto", + "QueueBacklogDto", + "ReadApiServerConfig", + "ReadApiSnapshot", + "build_read_api_wsgi_app", + "finding_to_public_dto", + "read_coverage", + "read_dashboard_snapshot", + "read_findings_panel", + "read_freshness_rollup", + "read_queue_backlog_panel", + "validate_read_api_server_config", +] diff --git a/src/security_scanner/runtime/scan_health.py b/src/security_scanner/runtime/scan_health.py index a54e461..ddf2af2 100644 --- a/src/security_scanner/runtime/scan_health.py +++ b/src/security_scanner/runtime/scan_health.py @@ -1,23 +1,213 @@ -"""Scan-run freshness gate runtime. +"""Per-repo freshness evaluation + scheduled breach rollup (FR-7/FR-8, M5). -Reads the latest scan-run health record (written by ``run_local_scan`` on a -successful run) and decides whether a downstream publisher may proceed. 
The -gate fails closed: a missing or stale record yields an ``ok=False`` verdict so -a publisher republishing the local DB cannot silently serve stale findings. +Replaces the global single-record freshness gate. Each repository owns a +``REPO_HEALTH`` record with two "last successful" timestamps; freshness is +evaluated PER REPO against two explicit thresholds so a single fresh repo can no +longer mask 499 stale ones (the silent-staleness bug). The scheduled evaluator +is the thing that DETECTS staleness on a timer — staleness is the *absence* of a +worker event, so it cannot be hung off worker writes — and it materializes a +``BREACH_COUNTER`` rollup the future read API reads O(1) (F5/SC-7). + +The legacy ``evaluate_scan_freshness`` global verdict is retained for any caller +still on the single-record path, but it is no longer the freshness source of +truth (see ``evaluate_repo_freshness`` / ``run_freshness_evaluator``). """ from __future__ import annotations import datetime as dt +from collections.abc import Sequence from dataclasses import dataclass -from security_scanner.storage.adapters.nosql_db.items import datetime_from_iso -from security_scanner.storage.base import ScanRunHealth +from security_scanner.storage.adapters.nosql_db.items import datetime_from_iso, now_iso +from security_scanner.storage.base import ( + BreachCounter, + RepoFreshnessBreach, + RepoHealth, + ScanRunHealth, +) + +# Default thresholds reuse the existing scan_health margin idiom: the legacy gate +# allowed 24h cadence + 2h margin (DEFAULT_MAX_AGE_HOURS=26.0 = 24h+2h). We keep +# that 2h margin and expose configurable cadence/margin via the same +# ``--max-age-hours``-style float-hours config surface. The CONCRETE cadence +# numbers (poll interval, baseline window) are load-gate decisions (do not invent +# load-validated values); these defaults are placeholders that stay configurable. 
+DEFAULT_MARGIN_HOURS = 2.0 +# Incremental poll cadence default: discovery polls ~5min soft (design Data Flow +# step 2). Kept generous (1h) so a normally-healthy repo is not flagged on a +# single missed poll; the load gate sets the real value. +DEFAULT_POLL_INTERVAL_HOURS = 1.0 +# Baseline full-scan cadence default: weekly (24h*7), mirroring the pre-scale +# weekly full scan. The load gate sets the real 500-repo window. +DEFAULT_BASELINE_CADENCE_HOURS = 24.0 * 7.0 + + +@dataclass(frozen=True) +class FreshnessThresholds: + """Two explicit per-repo freshness thresholds (FR-8/F4). + + ``incremental_max_age_hours`` = poll_interval + margin and + ``baseline_max_age_hours`` = baseline_cadence + margin. Both are expressed in + float hours so they share the existing ``--max-age-hours`` config surface. + """ + + incremental_max_age_hours: float + baseline_max_age_hours: float + + @classmethod + def from_cadences( + cls, + *, + poll_interval_hours: float = DEFAULT_POLL_INTERVAL_HOURS, + baseline_cadence_hours: float = DEFAULT_BASELINE_CADENCE_HOURS, + margin_hours: float = DEFAULT_MARGIN_HOURS, + ) -> FreshnessThresholds: + """Build thresholds from cadence + a shared margin (the 26h=24h+2h idiom).""" + return cls( + incremental_max_age_hours=poll_interval_hours + margin_hours, + baseline_max_age_hours=baseline_cadence_hours + margin_hours, + ) + + +def _age_hours(timestamp_iso: str | None, now: dt.datetime) -> float | None: + """Return age in hours of an ISO timestamp, or None when never recorded.""" + if timestamp_iso is None: + return None + return (now - datetime_from_iso(timestamp_iso)).total_seconds() / 3600.0 + + +def evaluate_repo_freshness( + health: RepoHealth, + *, + now: dt.datetime, + thresholds: FreshnessThresholds, +) -> RepoFreshnessBreach | None: + """Evaluate one repo against both thresholds (FR-8/F4). 
+ + Concrete breach expressions (fail closed on a never-recorded class): + + incremental_breach = lastSuccessfulIncrementalAt is None + OR now - lastSuccessfulIncrementalAt + > poll_interval + margin + baseline_breach = lastSuccessfulFullScanAt is None + OR now - lastSuccessfulFullScanAt + > baseline_cadence + margin + + Returns ``None`` when the repo is fresh on BOTH classes, else a + ``RepoFreshnessBreach`` flagging which threshold(s) it crossed. + """ + incremental_age = _age_hours(health.last_successful_incremental_at, now) + baseline_age = _age_hours(health.last_successful_full_scan_at, now) + + incremental_breach = ( + incremental_age is None + or incremental_age > thresholds.incremental_max_age_hours + ) + baseline_breach = ( + baseline_age is None or baseline_age > thresholds.baseline_max_age_hours + ) + if not incremental_breach and not baseline_breach: + return None + return RepoFreshnessBreach( + repo_id=health.repo_id, + incremental=incremental_breach, + baseline=baseline_breach, + last_successful_incremental_at=health.last_successful_incremental_at, + last_successful_full_scan_at=health.last_successful_full_scan_at, + ) + + +@dataclass(frozen=True) +class FreshnessEvaluation: + """Outcome of one scheduled freshness-evaluator pass (FR-8/F3).""" + + counter: BreachCounter + breaches: list[RepoFreshnessBreach] + + +def evaluate_freshness_breaches( + health_records: Sequence[RepoHealth], + *, + now: dt.datetime, + thresholds: FreshnessThresholds, + coverage_gap: int | None = None, +) -> FreshnessEvaluation: + """Compute per-repo breaches and the materialized rollup (pure function). + + Enumerates the supplied REPO_HEALTH records, evaluates each against both + thresholds, and aggregates a ``BreachCounter``. ``coverage_gap`` is the + org-N-vs-covered-M half (M1 CATALOG); pass it through when known, else leave + ``None`` so the seam is explicit. No alert sink is invoked here — the breach + list is returned for a pluggable sink wired in M9. 
+ """ + breaches: list[RepoFreshnessBreach] = [] + incremental_breaches = 0 + baseline_breaches = 0 + for health in health_records: + breach = evaluate_repo_freshness(health, now=now, thresholds=thresholds) + if breach is None: + continue + breaches.append(breach) + if breach.incremental: + incremental_breaches += 1 + if breach.baseline: + baseline_breaches += 1 + counter = BreachCounter( + incremental_breaches=incremental_breaches, + baseline_breaches=baseline_breaches, + total_breaches=len(breaches), + repos_evaluated=len(health_records), + evaluated_at=now_iso(), + coverage_gap=coverage_gap, + ) + return FreshnessEvaluation(counter=counter, breaches=breaches) + + +def run_freshness_evaluator( + store, + *, + now: dt.datetime, + thresholds: FreshnessThresholds, + coverage_gap: int | None = None, + on_breaches=None, +) -> FreshnessEvaluation: + """Scheduled freshness-evaluator operation (FR-8/F3/SC-7). + + Callable from a future timer (the design's ``freshness-eval`` timer). It is + the staleness DETECTOR: because staleness is the absence of a worker event, + it runs on a schedule and enumerates REPO_HEALTH itself rather than being + triggered by worker writes. It: + + 1. enumerates every REPO_HEALTH record, + 2. evaluates per-repo breaches against both thresholds, + 3. writes the materialized ``BREACH_COUNTER`` rollup (read API reads O(1)), + 4. hands the breach list to an optional ``on_breaches`` hook. + + ``on_breaches`` is a clean seam for the M9 alert sink; this function does NOT + implement the sink. ``coverage_gap`` is the M1 CATALOG seam (org N vs covered + M); pass it when CATALOG exists, else it stays ``None``. 
+ """ + evaluation = evaluate_freshness_breaches( + store.read_all_repo_health(), + now=now, + thresholds=thresholds, + coverage_gap=coverage_gap, + ) + store.put_breach_counter(evaluation.counter) + if on_breaches is not None and evaluation.breaches: + on_breaches(evaluation.breaches) + return evaluation + + +# --------------------------------------------------------------------------- +# Legacy single-record gate (retained for back-compat; no longer SoT). +# --------------------------------------------------------------------------- @dataclass(frozen=True) class ScanFreshnessVerdict: - """Outcome of a scan-run freshness check.""" + """Outcome of a (legacy) global scan-run freshness check.""" ok: bool message: str @@ -29,10 +219,12 @@ def evaluate_scan_freshness( now: dt.datetime, max_age_hours: float, ) -> ScanFreshnessVerdict: - """Return whether the latest scan run is fresh enough to trust. + """Legacy global freshness verdict over a single ScanRunHealth record. - Fails closed when no record exists or the newest record is older than - ``max_age_hours``. + Retained so callers on the pre-M5 single-record path keep working, but this + is NO LONGER the freshness source of truth: a single global timestamp can + mask stale repos, which is exactly what per-repo ``evaluate_repo_freshness`` + fixes. Fails closed on a missing or stale record. """ if latest is None: return ScanFreshnessVerdict( @@ -61,3 +253,20 @@ def evaluate_scan_freshness( f"targets) [run {latest.scan_run_id}]" ), ) + + +# Re-export for callers importing thresholds/breach types from the runtime module. 
+__all__ = [ + "DEFAULT_BASELINE_CADENCE_HOURS", + "DEFAULT_MARGIN_HOURS", + "DEFAULT_POLL_INTERVAL_HOURS", + "FreshnessThresholds", + "FreshnessEvaluation", + "RepoFreshnessBreach", + "RepoHealth", + "ScanFreshnessVerdict", + "evaluate_freshness_breaches", + "evaluate_repo_freshness", + "evaluate_scan_freshness", + "run_freshness_evaluator", +] diff --git a/src/security_scanner/runtime/scan_worker.py b/src/security_scanner/runtime/scan_worker.py index e647aac..979bcd4 100644 --- a/src/security_scanner/runtime/scan_worker.py +++ b/src/security_scanner/runtime/scan_worker.py @@ -5,9 +5,10 @@ import datetime as dt import time import uuid +from collections.abc import Callable from dataclasses import dataclass from pathlib import Path -from typing import Callable, Protocol +from typing import Protocol from security_scanner.core.finding.model import Finding from security_scanner.core.scan.options import ScanOptions @@ -22,7 +23,6 @@ ScanLedgerEntry, ) - DEFAULT_LEASE_SECONDS = 300 DEFAULT_RETRY_DELAY_SECONDS = 60 @@ -99,16 +99,21 @@ def run_scan_worker_once(request: ScanWorkerRequest) -> ScanWorkerSummary: findings=[], ledger=_ledger_for_job(job, scanned_at=now, finding_count=0), ) + _advance_repo_health(request, job, completed_at=now) completed += 1 continue - if not request.store.acquire_repo_lease( + repo_fence = request.store.acquire_repo_lease( job.repo_id, worker_id, request.lease_seconds, - ): + ) + if not repo_fence: + # FR-6 skip-bug fix: one unavailable repo must NOT stop the whole + # invocation. Return just this job to pending and CONTINUE to the + # next job so the N-process worker pool keeps draining work. 
request.store.return_job_to_pending(job.job_id, "repo lease unavailable") - break + continue try: repo_path = request.fetch_repo(job.repo_url) @@ -125,9 +130,7 @@ def run_scan_worker_once(request: ScanWorkerRequest) -> ScanWorkerSummary: ) branch = branch_from_ref(job.ref_name) findings = [ - finding_with_context( - finding, commit=job.commit_sha, branch=branch - ) + finding_with_context(finding, commit=job.commit_sha, branch=branch) for finding in findings ] scanned_at = _now(request) @@ -140,20 +143,28 @@ def run_scan_worker_once(request: ScanWorkerRequest) -> ScanWorkerSummary: finding_count=len(findings), ), ) + _advance_repo_health(request, job, completed_at=scanned_at) completed += 1 except Exception as exc: # noqa: BLE001 - scanner/runtime failure is retryable until exhausted. if job.attempts + 1 >= job.max_attempts: dead_lettered += 1 else: retryable += 1 + # Fence the failure write with the leased job's worker_id+fence so a + # reaped/slow original worker cannot stamp a failure over a job the + # reaper already returned to another worker (FR-6/SC-2). request.store.record_retryable_failure( job.job_id, error=str(exc), next_attempt_at=_now(request) + dt.timedelta(seconds=request.retry_delay_seconds), + worker_id=worker_id, + fence=job.fence, ) finally: - request.store.release_repo_lease(job.repo_id, worker_id) + # Release only the repo lease this worker still owns: the repo fence + # rejects a stale release after the lease was reaped + re-acquired. + request.store.release_repo_lease(job.repo_id, worker_id, fence=repo_fence) return ScanWorkerSummary( leased=leased_count, @@ -223,6 +234,23 @@ def make_default_scanner() -> GitleaksScanner: return GitleaksScanner() +def _advance_repo_health( + request: ScanWorkerRequest, job: ScanJob, *, completed_at: dt.datetime +) -> None: + """Advance per-repo freshness after a successful completion (FR-7/SC-5). 
+ + Keyed by ``job.job_type`` so an incremental completion advances the + incremental field and a baseline completion the full-scan field, via the + store's attribute-scoped advancing-only conditional write. Guarded with + ``getattr`` so a minimal store fake that predates REPO_HEALTH still runs the + worker; a real store always implements it. + """ + advance = getattr(request.store, "advance_repo_health", None) + if advance is None: + return + advance(job.repo_id, job_type=job.job_type, completed_at=completed_at) + + def _scan_run_id_for_job(job: ScanJob) -> str: return f"scan_run_{job.job_id}" @@ -247,8 +275,6 @@ def _ledger_for_job( ) - - def _now(request: ScanWorkerRequest) -> dt.datetime: value = request.now_factory() if value.tzinfo is None: diff --git a/src/security_scanner/storage/adapters/nosql_db/access.py b/src/security_scanner/storage/adapters/nosql_db/access.py index cda7def..59fb105 100644 --- a/src/security_scanner/storage/adapters/nosql_db/access.py +++ b/src/security_scanner/storage/adapters/nosql_db/access.py @@ -194,3 +194,28 @@ def query_all_pages( next_key = response.get("LastEvaluatedKey") if next_key is None: return items[:limit] if limit is not None else items + + +def query_count_all_pages(table: Any, **query_args: Any) -> int: + """Return the server-side ``Count`` of a ``Select=COUNT`` query (SC-7). + + Issues a ``Select="COUNT"`` query that returns only the per-page item count, + never the item payloads, and sums it across pages. This is the read-API queue + backlog primitive: counting one status GSI partition is O(matching keys on + that partition) on the server and transfers no item bodies; it is NOT a + full-table ``Scan`` and never materializes the SCAN_JOB rows. The caller fans + this out across the (bounded) set of status partitions so the backlog read is + O(status-partitions), independent of total table size.
+ """ + total = 0 + next_key: dict[str, Any] | None = None + while True: + page_args = dict(query_args) + page_args["Select"] = "COUNT" + if next_key is not None: + page_args["ExclusiveStartKey"] = next_key + response = table.query(**page_args) + total += int(response.get("Count", 0)) + next_key = response.get("LastEvaluatedKey") + if next_key is None: + return total diff --git a/src/security_scanner/storage/adapters/nosql_db/items.py b/src/security_scanner/storage/adapters/nosql_db/items.py index dc8ddde..c515f8e 100644 --- a/src/security_scanner/storage/adapters/nosql_db/items.py +++ b/src/security_scanner/storage/adapters/nosql_db/items.py @@ -25,9 +25,13 @@ repo_axis_projection_for_item, ) from security_scanner.storage.base import ( + JOB_TYPE_INCREMENTAL, + BreachCounter, + CatalogEntry, FindingStateEvent, GhasAlertRecord, RefState, + RepoHealth, RepoLease, ScanJob, ScanLedgerEntry, @@ -79,6 +83,61 @@ class ScanRunSummary: # _secret_evidence_link_pk (not the removed GSI2 key). SECRET_EVIDENCE_LIST_PK = "SECRET_EVIDENCE#ALL" SCAN_HEALTH_PK = "SCAN_HEALTH" +# Per-repo freshness (FR-7): each repo owns one REPO_HEALTH#/META row, +# replacing the SCAN_HEALTH singleton's single-partition hot-spot. +REPO_HEALTH_SK = "META" +# Materialized freshness rollup the evaluator writes and the read API reads O(1) +# (F5/SC-7). Singleton row; the evaluator overwrites it each timer tick. +BREACH_COUNTER_PK = "BREACH_COUNTER" +BREACH_COUNTER_SK = "META" +# Org catalog membership (FR-1, M1): one CATALOG#/META row per org repo. +# The shared org repo list axis for gitleaks coverage and any future GHAS +# reconcile; reconciled on a timer from an injectable org-list provider. +CATALOG_SK = "META" +# Alert de-dup / re-notify state (FR-8 alerts, M9): one ALERT_STATE# row +# per repo, with one SK per alert kind, holding the last-alerted ISO timestamp. +# The freshness-evaluator's de-dup policy reads/writes this so a persistently +# stale repo neither spams every cycle nor goes silent. 
Idempotent overwrite. +ALERT_STATE_SK_PREFIX = "ALERT_KIND#" +# DynamoDB attribute names for the two per-repo freshness timestamps. Shared by +# the item mapping and the attribute-scoped conditional UpdateItem so the write +# condition and the stored field can never drift apart. +REPO_HEALTH_INCREMENTAL_ATTR = "lastSuccessfulIncrementalAt" +REPO_HEALTH_FULL_SCAN_ATTR = "lastSuccessfulFullScanAt" + + +def repo_health_pk(repo_id: str) -> str: + """Return the single-table partition key for one repo's freshness row.""" + return f"REPO_HEALTH#{repo_id}" + + +def catalog_pk(repo_id: str) -> str: + """Return the single-table partition key for one repo's catalog row.""" + return f"CATALOG#{repo_id}" + + +def alert_state_pk(repo_id: str) -> str: + """Return the partition key for one repo's alert de-dup state (M9).""" + return f"ALERT_STATE#{repo_id}" + + +def alert_state_sk(kind: str) -> str: + """Return the per-kind sort key for one repo's alert de-dup state (M9).""" + return f"{ALERT_STATE_SK_PREFIX}{kind}" + + +def repo_health_freshness_attr(job_type: str) -> str: + """Return the freshness attribute a completion of ``job_type`` advances. + + Anything other than the explicit baseline class advances the incremental + field; this keeps the default (and every legacy job decoded as incremental) + on the incremental timestamp without a second branch. + """ + from security_scanner.storage.base import JOB_TYPE_BASELINE + + if job_type == JOB_TYPE_BASELINE: + return REPO_HEALTH_FULL_SCAN_ATTR + return REPO_HEALTH_INCREMENTAL_ATTR def now_iso() -> str: @@ -129,6 +188,17 @@ def repo_id_for_scan_target_url(url: str) -> str: return f"repo_{digest}" +def repo_id_for_local_target(target_name: str) -> str: + """Return the per-repo freshness ID for a local (manifest) scan target. + + Local full-batch scans key on the manifest target name (they have no URL). 
+ Each scanned target gets its OWN REPO_HEALTH row so the local-scan writer + advances per-repo freshness instead of one global record (SC-5). + """ + digest = hashlib.sha256(target_name.encode("utf-8")).hexdigest()[:24] + return f"repo_local_{digest}" + + def scan_job_id_for( *, repo_id: str, @@ -281,6 +351,100 @@ def scan_run_health_from_item(item: dict[str, Any]) -> ScanRunHealth: ) +def repo_health_to_item(record: RepoHealth) -> dict[str, Any]: + """Map a per-repo freshness record into its REPO_HEALTH#/META item. + + Only non-None timestamps are written (``without_none``), so a brand-new repo + that has only ever had an incremental success carries no full-scan attribute + — which is exactly what the ``attribute_not_exists`` arm of the conditional + UpdateItem keys on. + """ + return without_none( + { + "PK": repo_health_pk(record.repo_id), + "SK": REPO_HEALTH_SK, + "entityType": "REPO_HEALTH", + "repoId": record.repo_id, + REPO_HEALTH_INCREMENTAL_ATTR: record.last_successful_incremental_at, + REPO_HEALTH_FULL_SCAN_ATTR: record.last_successful_full_scan_at, + } + ) + + +def repo_health_from_item(item: dict[str, Any]) -> RepoHealth: + """Reconstruct a per-repo freshness record from a table item.""" + return RepoHealth( + repo_id=item["repoId"], + last_successful_incremental_at=item.get(REPO_HEALTH_INCREMENTAL_ATTR), + last_successful_full_scan_at=item.get(REPO_HEALTH_FULL_SCAN_ATTR), + ) + + +def breach_counter_to_item(counter: BreachCounter) -> dict[str, Any]: + """Map the materialized freshness rollup into its singleton item.""" + return without_none( + { + "PK": BREACH_COUNTER_PK, + "SK": BREACH_COUNTER_SK, + "entityType": "BREACH_COUNTER", + "incrementalBreaches": counter.incremental_breaches, + "baselineBreaches": counter.baseline_breaches, + "totalBreaches": counter.total_breaches, + "reposEvaluated": counter.repos_evaluated, + "evaluatedAt": counter.evaluated_at, + "coverageGap": counter.coverage_gap, + } + ) + + +def breach_counter_from_item(item: 
dict[str, Any]) -> BreachCounter: + """Reconstruct the materialized freshness rollup from a table item.""" + coverage_gap = item.get("coverageGap") + return BreachCounter( + incremental_breaches=int(item.get("incrementalBreaches", 0)), + baseline_breaches=int(item.get("baselineBreaches", 0)), + total_breaches=int(item.get("totalBreaches", 0)), + repos_evaluated=int(item.get("reposEvaluated", 0)), + evaluated_at=item["evaluatedAt"], + coverage_gap=int(coverage_gap) if coverage_gap is not None else None, + ) + + +def catalog_entry_to_item(entry: CatalogEntry) -> dict[str, Any]: + """Map one org catalog membership row into the NoSQL item shape (FR-1, M1). + + ``excludedReason`` is only present for opt-out repos (``without_none`` drops + it when None), so an included repo carries no reason attribute. ``firstSeen`` + is written verbatim from the (reconcile-merged) entry — the reconcile, not the + item mapping, preserves the original first_seen across passes. + """ + return without_none( + { + "PK": catalog_pk(entry.repo_id), + "SK": CATALOG_SK, + "entityType": "CATALOG", + "repoId": entry.repo_id, + "repoUrl": entry.repo_url, + "included": entry.included, + "excludedReason": entry.excluded_reason, + "firstSeen": entry.first_seen, + "lastReconciled": entry.last_reconciled, + } + ) + + +def catalog_entry_from_item(item: dict[str, Any]) -> CatalogEntry: + """Reconstruct an org catalog membership row from a table item.""" + return CatalogEntry( + repo_id=item["repoId"], + repo_url=item["repoUrl"], + included=bool(item.get("included", True)), + first_seen=item["firstSeen"], + last_reconciled=item["lastReconciled"], + excluded_reason=item.get("excludedReason"), + ) + + def ghas_alert_to_item(alert: GhasAlertRecord) -> dict[str, Any]: """Map redacted GHAS alert metadata into the NoSQL item shape.""" fetched_at = datetime_to_iso(alert.fetched_at) @@ -492,6 +656,7 @@ def scan_job_to_item(job: ScanJob) -> dict[str, Any]: "scannerConfigHash": job.scanner_config_hash, "priority": 
job.priority, "status": job.status, + "jobType": job.job_type, "attempts": job.attempts, "maxAttempts": job.max_attempts, "workerId": job.worker_id, @@ -500,6 +665,9 @@ def scan_job_to_item(job: ScanJob) -> dict[str, Any]: "createdAt": created_at, "updatedAt": updated_at, "lastError": job.last_error, + "fence": job.fence, + "leaseExpiryCount": job.lease_expiry_count, + "maxLeaseExpiries": job.max_lease_expiries, } ) # Hot queue partitions (pending/leased) are sharded (D1); cold terminal @@ -532,6 +700,9 @@ def scan_job_from_item(item: dict[str, Any]) -> ScanJob: scanner_config_hash=item["scannerConfigHash"], priority=int(item["priority"]), status=item["status"], + # Pre-M5 job items carry no jobType; decode them as incremental so they + # advance the incremental freshness field, matching how they were enqueued. + job_type=item.get("jobType", JOB_TYPE_INCREMENTAL), attempts=int(item.get("attempts", 0)), max_attempts=int(item.get("maxAttempts", 3)), worker_id=item.get("workerId"), @@ -540,6 +711,10 @@ def scan_job_from_item(item: dict[str, Any]) -> ScanJob: created_at=datetime_from_iso(item["createdAt"]), updated_at=datetime_from_iso(item["updatedAt"]), last_error=item.get("lastError"), + # fence/lease-expiry counters default for items written pre-fencing. + fence=int(item.get("fence", 0)), + lease_expiry_count=int(item.get("leaseExpiryCount", 0)), + max_lease_expiries=int(item.get("maxLeaseExpiries", 5)), ) @@ -605,6 +780,7 @@ def repo_lease_to_item(lease: RepoLease) -> dict[str, Any]: "workerId": lease.worker_id, "leaseUntil": lease_until, "updatedAt": updated_at, + "fence": lease.fence, } @@ -615,6 +791,8 @@ def repo_lease_from_item(item: dict[str, Any]) -> RepoLease: worker_id=item["workerId"], lease_until=datetime_from_iso(item["leaseUntil"]), updated_at=datetime_from_iso(item["updatedAt"]), + # default for leases written before fencing existed. 
+ fence=int(item.get("fence", 0)), ) diff --git a/src/security_scanner/storage/adapters/nosql_db/store.py b/src/security_scanner/storage/adapters/nosql_db/store.py index 3e4de46..7b34a10 100644 --- a/src/security_scanner/storage/adapters/nosql_db/store.py +++ b/src/security_scanner/storage/adapters/nosql_db/store.py @@ -3,8 +3,10 @@ from __future__ import annotations import datetime as dt +import hashlib +import random from collections import Counter -from collections.abc import Iterable, Sequence +from collections.abc import Callable, Iterable, Sequence from dataclasses import replace from typing import Any @@ -24,9 +26,18 @@ items_to_state_events, merge_finding_states, query_all_pages, + query_count_all_pages, scan_all_pages, ) +from security_scanner.storage.adapters.nosql_db.axis_core import ( + axis_shard, + bucket_width, +) from security_scanner.storage.adapters.nosql_db.items import ( + BREACH_COUNTER_PK, + BREACH_COUNTER_SK, + CATALOG_SK, + REPO_HEALTH_SK, SCAN_HEALTH_PK, SCAN_JOB_STATUS_COMPLETED, SCAN_JOB_STATUS_DEAD_LETTER, @@ -35,6 +46,13 @@ STATE_SCOPE_GLOBAL, RepoMetadata, ScanRunSummary, + alert_state_pk, + alert_state_sk, + breach_counter_from_item, + breach_counter_to_item, + catalog_entry_from_item, + catalog_entry_to_item, + catalog_pk, counts_by_category, datetime_to_iso, finding_state_event_to_item, @@ -43,6 +61,10 @@ new_state_event_seq, ref_state_from_item, ref_state_to_item, + repo_health_freshness_attr, + repo_health_from_item, + repo_health_pk, + repo_lease_from_item, repo_lease_to_item, repo_metadata_to_item, scan_job_from_item, @@ -61,6 +83,7 @@ SCAN_DATE_AXIS, SCAN_JOB_AXIS, TARGET_LIST_AXIS, + sharded_list_axis_pk, ) from security_scanner.storage.adapters.nosql_db.list_axis_reader import ( read_list_axis, @@ -74,10 +97,15 @@ make_boto3_resource_and_client, ) from security_scanner.storage.base import ( + BreachCounter, + CatalogEntry, FindingStateEvent, GhasAlertRecord, + QueueBacklog, QueueStatus, + ReapSummary, RefState, + RepoHealth, 
RepoLease, ScanJob, ScanLedgerEntry, @@ -93,6 +121,23 @@ {Status.RESOLVED.value, Status.FALSE_POSITIVE.value, Status.IGNORED.value} ) +# Bounded ordered dequeue head-window size (FR-5/SC-1). Small constant so each +# poll reads at most K per status shard instead of the whole partition. Sized a +# few multiples of N workers so concurrent workers have distinct head candidates +# to spread across (SC-4) without over-reading. Tunable; load gate confirms. +DEFAULT_DEQUEUE_WINDOW = 32 + +# Jittered CAS-loss backoff bounds (SC-4). Kept tiny: the goal is to desynchronize +# N workers that lost the same head, not to throttle throughput. +_CAS_BACKOFF_MIN_SECONDS = 0.0 +_CAS_BACKOFF_MAX_SECONDS = 0.05 + +# Exponential starvation backoff for a job whose lease keeps expiring (SC-8). The +# nextAttemptAt push grows with lease_expiry_count so a poison job stops +# head-of-line-blocking; capped so a transiently-slow repo recovers in bounded time. +_STARVATION_BACKOFF_BASE_SECONDS = 30 +_STARVATION_BACKOFF_CAP_SECONDS = 3600 + class DynamoDbCompatibleFindingStore: """FindingStore implementation backed by a DynamoDB-compatible endpoint.""" @@ -168,6 +213,168 @@ def read_latest_scan_run_health(self) -> ScanRunHealth | None: return None return max(records, key=lambda record: record.completed_at_iso) + def advance_repo_health( + self, + repo_id: str, + *, + job_type: str, + completed_at: str | dt.datetime, + ) -> None: + """Advance one repo's freshness field with a conditional UpdateItem. + + Attribute-scoped (SC-5): the update touches ONLY the one timestamp + attribute for this ``job_type`` (incremental vs baseline), so a + concurrent incremental + baseline completion can never clobber each + other's field. 
The condition ``attribute_not_exists(#at) OR :t > #at`` + makes the write advancing-only, so an out-of-order (older) completion is + a conditional no-op and cannot regress a newer timestamp — and the + per-commit write amplification from one-job-per-commit discovery is a + cheap rejected CAS when not newer. + """ + completed_iso = _decided_at_iso(completed_at) + attr = repo_health_freshness_attr(job_type) + try: + self._table.update_item( + Key={"PK": repo_health_pk(repo_id), "SK": REPO_HEALTH_SK}, + UpdateExpression="SET #at = :t, entityType = :etype, repoId = :rid", + ConditionExpression="attribute_not_exists(#at) OR :t > #at", + ExpressionAttributeNames={"#at": attr}, + ExpressionAttributeValues={ + ":t": completed_iso, + ":etype": "REPO_HEALTH", + ":rid": repo_id, + }, + ) + except Exception as exc: + # An older or equal timestamp loses the advancing-only CAS: that is + # the intended monotonic no-op, not an error. + if _is_conditional_check_failure(exc): + return + raise + + def read_repo_health(self, repo_id: str) -> RepoHealth | None: + response = self._table.get_item( + Key={"PK": repo_health_pk(repo_id), "SK": REPO_HEALTH_SK} + ) + item = response.get("Item") + if not item or item.get("entityType") != "REPO_HEALTH": + return None + return repo_health_from_item(item) + + def read_repo_health_batch( + self, repo_ids: Iterable[str] + ) -> dict[str, RepoHealth]: + """Batch-read repo freshness rows, chunked at the 100-key BatchGet cap.""" + unique_ids = list(dict.fromkeys(repo_ids)) + out: dict[str, RepoHealth] = {} + for start in range(0, len(unique_ids), 100): + chunk = unique_ids[start : start + 100] + keys = [ + {"PK": repo_health_pk(repo_id), "SK": REPO_HEALTH_SK} + for repo_id in chunk + ] + request_items = {self.config.table_name: {"Keys": keys}} + while request_items: + response = self._resource.batch_get_item(RequestItems=request_items) + for item in response.get("Responses", {}).get( + self.config.table_name, [] + ): + if item.get("entityType") == 
"REPO_HEALTH": + record = repo_health_from_item(item) + out[record.repo_id] = record + request_items = response.get("UnprocessedKeys", {}) + return out + + def read_all_repo_health(self) -> list[RepoHealth]: + """Enumerate every REPO_HEALTH row (scheduled-evaluator full scan).""" + items = scan_all_pages( + self._table, + FilterExpression="entityType = :entity_type", + ExpressionAttributeValues={":entity_type": "REPO_HEALTH"}, + ) + return [repo_health_from_item(item) for item in items] + + def put_breach_counter(self, counter: BreachCounter) -> None: + """Overwrite the materialized freshness rollup (evaluator-owned).""" + self._table.put_item(Item=breach_counter_to_item(counter)) + + def read_breach_counter(self) -> BreachCounter | None: + """Read the materialized freshness rollup O(1) (read-API path, SC-7).""" + response = self._table.get_item( + Key={"PK": BREACH_COUNTER_PK, "SK": BREACH_COUNTER_SK} + ) + item = response.get("Item") + if not item or item.get("entityType") != "BREACH_COUNTER": + return None + return breach_counter_from_item(item) + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + """Return the last-alerted ISO timestamp for ``(repo, kind)`` (M9). + + The de-dup/re-notify substrate: the freshness-evaluator reads this to + decide whether a repeat alert for the same ``(repo, kind)`` is within the + re-notify window (suppress) or past it (re-fire). Absent row => never + alerted => the alert fires. + """ + response = self._table.get_item( + Key={"PK": alert_state_pk(repo_id), "SK": alert_state_sk(kind)} + ) + item = response.get("Item") + if not item or item.get("entityType") != "ALERT_STATE": + return None + return item.get("alertedAt") + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + """Record that ``(repo, kind)`` alerted at ``alerted_at`` (ISO) (M9). 
+ + Idempotent overwrite (last-writer-wins on the one-row identity), like the + BREACH_COUNTER materialize: the dispatcher owns the cadence decision, the + store just persists the most recent alert time per ``(repo, kind)``. + """ + self._table.put_item( + Item={ + "PK": alert_state_pk(repo_id), + "SK": alert_state_sk(kind), + "entityType": "ALERT_STATE", + "repoId": repo_id, + "alertKind": kind, + "alertedAt": alerted_at, + } + ) + + def put_catalog_entry(self, entry: CatalogEntry) -> None: + """Upsert one CATALOG row (FR-1, M1). + + Plain put: the reconcile operation has already merged first_seen and + last_reconciled, so the whole row is replaced last-writer-wins. The + reconcile, not the store, owns the additive first_seen-preserving merge. + """ + self._table.put_item(Item=catalog_entry_to_item(entry)) + + def read_catalog_entry(self, repo_id: str) -> CatalogEntry | None: + """Read one repository's catalog membership row (FR-1, M1).""" + response = self._table.get_item( + Key={"PK": catalog_pk(repo_id), "SK": CATALOG_SK} + ) + item = response.get("Item") + if not item or item.get("entityType") != "CATALOG": + return None + return catalog_entry_from_item(item) + + def read_all_catalog_entries(self) -> list[CatalogEntry]: + """Enumerate every CATALOG row (bounded by org size, reconcile-only). + + Enumeration is bounded by org size (≤ N) and runs on the reconcile / + coverage timer, not the read API hot path, so a filtered scan is fine + (the read API reads the materialized BREACH_COUNTER instead). 
+ """ + items = scan_all_pages( + self._table, + FilterExpression="entityType = :entity_type", + ExpressionAttributeValues={":entity_type": "CATALOG"}, + ) + return [catalog_entry_from_item(item) for item in items] + def put_scan_target(self, target: ScanTarget) -> None: self._table.put_item(Item=scan_target_to_item(target)) @@ -244,20 +451,41 @@ def lease_next_scan_job( now: dt.datetime, *, include_legacy: bool = False, + dequeue_window: int = DEFAULT_DEQUEUE_WINDOW, + rng: random.Random | None = None, + jitter_sleep: Callable[[float], None] | None = None, ) -> ScanJob | None: + """Bounded ordered dequeue + CAS lease (FR-5/SC-1/SC-4). + + Replaces the prior full-partition read+sort. Each sharded status + partition (pending, leased) is read with a per-shard ``Limit`` and + k-way merged into the global ordered head window (``read_list_axis_ordered`` + ascending on ``nextAttemptAt#priority#createdAt#jobId``); only that + bounded window of candidates is materialized, never the whole partition. + + Contention spread (SC-4): rather than every worker CAS-racing the single + FIFO head, each worker first drains its preferred shard + (``worker_id``-derived) within the window and only steals other shards + when its own is empty; a CAS loss randomizes the remaining order and + applies a small jittered backoff. 
+ """ + now = _ensure_utc(now) - candidates = [ - *self._read_scan_jobs_by_status( - SCAN_JOB_STATUS_PENDING, include_legacy=include_legacy - ), - *self._read_scan_jobs_by_status( - SCAN_JOB_STATUS_LEASED, include_legacy=include_legacy - ), - ] - candidates.sort(key=_scan_job_lease_sort_key) - for job in candidates: - if not _scan_job_is_lease_eligible(job, now): - continue + rng = rng or random.Random() + sleep = jitter_sleep or (lambda _seconds: None) + window = max(dequeue_window, 1) + + candidates = self._read_lease_candidate_window( + now=now, window=window, include_legacy=include_legacy + ) + eligible = [job for job in candidates if _scan_job_is_lease_eligible(job, now)] + if not eligible: + return None + + ordered = self._spread_candidates(eligible, worker_id) + contended = False + for job in iter(lambda: ordered.pop(0) if ordered else None, None): + if contended: + sleep(_jitter_backoff_seconds(rng)) leased = self._try_lease_scan_job( job=job, worker_id=worker_id, @@ -266,8 +494,58 @@ def lease_next_scan_job( ) if leased is not None: return leased + # CAS loss: another worker took this candidate. Randomize the rest + # so N workers don't all retry the same next head (thundering herd). + contended = True + rng.shuffle(ordered) return None + def _read_lease_candidate_window( + self, + *, + now: dt.datetime, + window: int, + include_legacy: bool, + ) -> list[ScanJob]: + """Return the bounded global head window across pending+leased shards. + + Each status partition is read ordered+limited (``read_list_axis_ordered`` + ascending), then the two status streams are merged and truncated to + ``window`` by the same lease sort key the GSI sort key encodes. The read + is bounded by ``window`` per status, never a full-partition scan.
+ """ + pending = self._read_scan_jobs_by_status_bounded( + SCAN_JOB_STATUS_PENDING, limit=window, include_legacy=include_legacy + ) + leased = self._read_scan_jobs_by_status_bounded( + SCAN_JOB_STATUS_LEASED, limit=window, include_legacy=include_legacy + ) + merged = [*pending, *leased] + merged.sort(key=_scan_job_lease_sort_key) + return merged[:window] + + @staticmethod + def _spread_candidates(eligible: list[ScanJob], worker_id: str) -> list[ScanJob]: + """Order the window so a worker drains its preferred shard first (SC-4). + + Jobs are partitioned by their SCAN_JOB_AXIS shard; the worker's preferred + shard (``hash(worker_id) % shard_count``) is drained ahead of the rest so + concurrent workers spread their CAS attempts across distinct head + candidates instead of all racing the single FIFO head. Within each group + the FIFO (lease-sort) order is preserved, so fairness/priority hold. + """ + shard_count = SCAN_JOB_AXIS.shard_count + preferred = ( + int(hashlib.sha256(worker_id.encode("utf-8")).hexdigest(), 16) % shard_count + ) + preferred_bucket = f"{preferred:0{bucket_width(shard_count)}d}" + mine: list[ScanJob] = [] + others: list[ScanJob] = [] + for job in eligible: + bucket = axis_shard(job.job_id, shard_count=shard_count) + (mine if bucket == preferred_bucket else others).append(job) + return [*mine, *others] + def complete_processed_job( self, job: ScanJob, @@ -287,17 +565,25 @@ def complete_processed_job( updated_at=ledger.scanned_at, last_error=None, ) + # Fence the completion on the LEASED job's worker_id + fence (FR-6, + # SC-2): a reaped/slow original worker carries a stale fence and is + # rejected, so it cannot stamp a stale completion over a job another + # worker already re-ran. ``:completed`` keeps re-completion idempotent. 
 try: self._table.put_item( Item=scan_job_to_item(completed_job), ConditionExpression=( "attribute_exists(PK) AND attribute_exists(SK) AND " - "(#status = :leased OR #status = :completed)" + "(#status = :completed OR " + "(#status = :leased AND workerId = :worker_id " + "AND fence = :fence))" ), ExpressionAttributeNames={"#status": "status"}, ExpressionAttributeValues={ ":leased": SCAN_JOB_STATUS_LEASED, ":completed": SCAN_JOB_STATUS_COMPLETED, + ":worker_id": job.worker_id, + ":fence": job.fence, }, ) except Exception as exc: @@ -310,10 +596,27 @@ def record_retryable_failure( job_id: str, error: str, next_attempt_at: dt.datetime, + *, + worker_id: str | None = None, + fence: int | None = None, ) -> None: job = self._get_scan_job(job_id) if job is None: return + # Fence the failure write on the leased worker_id+fence (FR-6): a stale + # worker whose lease was reaped and re-leased to another must not mutate + # the job the new holder now owns. None worker_id/fence keeps the legacy + # unfenced path for callers that do not carry a lease identity. + if ( + worker_id is not None + and fence is not None + and not ( + job.status == SCAN_JOB_STATUS_LEASED + and job.worker_id == worker_id + and job.fence == fence + ) + ): + return attempts = job.attempts + 1 now = _now() if attempts >= job.max_attempts: @@ -374,13 +677,23 @@ def acquire_repo_lease( repo_id: str, worker_id: str, lease_seconds: int, - ) -> bool: + ) -> int | None: + """Acquire (or steal an expired) repo lease and return the fence token. + + The fence is the stored row's fence + 1 (1 when no row exists), so + stealing an expired lease mints a larger token. ``release_repo_lease`` CASes + on this fence so a reaped worker cannot delete the new holder's lease. + Returns ``None`` when a live worker still holds the lease. 
+ """ now = _now() + existing = self._get_repo_lease(repo_id) + fence = (existing.fence + 1) if existing is not None else 1 lease = RepoLease( repo_id=repo_id, worker_id=worker_id, lease_until=now + dt.timedelta(seconds=lease_seconds), updated_at=now, + fence=fence, ) try: self._table.put_item( @@ -388,24 +701,41 @@ def acquire_repo_lease( ConditionExpression="attribute_not_exists(PK) OR leaseUntil <= :now", ExpressionAttributeValues={":now": datetime_to_iso(now)}, ) - return True + return fence except Exception as exc: if _is_conditional_check_failure(exc): - return False + return None raise - def release_repo_lease(self, repo_id: str, worker_id: str) -> None: + def release_repo_lease( + self, repo_id: str, worker_id: str, *, fence: int | None = None + ) -> None: + if fence is None: + condition = "workerId = :worker_id" + values: dict[str, Any] = {":worker_id": worker_id} + else: + condition = "workerId = :worker_id AND fence = :fence" + values = {":worker_id": worker_id, ":fence": fence} try: self._table.delete_item( Key={"PK": f"REPO_LEASE#{repo_id}", "SK": "META"}, - ConditionExpression="workerId = :worker_id", - ExpressionAttributeValues={":worker_id": worker_id}, + ConditionExpression=condition, + ExpressionAttributeValues=values, ) except Exception as exc: if _is_conditional_check_failure(exc): return raise + def _get_repo_lease(self, repo_id: str) -> RepoLease | None: + response = self._table.get_item( + Key={"PK": f"REPO_LEASE#{repo_id}", "SK": "META"} + ) + item = response.get("Item") + if not item or item.get("entityType") != "REPO_LEASE": + return None + return repo_lease_from_item(item) + def get_queue_status(self, now: dt.datetime) -> QueueStatus: now = _ensure_utc(now) job_items = scan_all_pages( @@ -437,6 +767,70 @@ def get_queue_status(self, now: dt.datetime) -> QueueStatus: expired_repo_leases=expired_repo_leases, ) + def count_scan_jobs_by_status(self, status: str) -> int: + """Count jobs in one status WITHOUT reading their item bodies (SC-7). 
+ + The read-API backlog path. Uses ``Select=COUNT`` over the GSI1 status + axis so the server returns only a count, never the SCAN_JOB rows: + + - hot statuses (``pending``/``leased``) are sharded across + ``SCAN_JOB_AXIS``; fan a per-shard ``Select=COUNT`` query over the + (small, fixed) shard set and sum — O(shard_count) queries, each + O(matching keys on that shard partition), no item transfer; + - cold terminal statuses (``completed``/``dead_letter``) live on the + single unsharded ``SCAN_JOB_STATUS#`` GSI1 partition, counted + with one ``Select=COUNT`` query. + + This never calls ``scan()``; it is the O(status-partitions) replacement + for the legacy ``get_queue_status`` full-table Scan the live dashboard + used to poll. + """ + if status in (SCAN_JOB_STATUS_PENDING, SCAN_JOB_STATUS_LEASED): + width = bucket_width(SCAN_JOB_AXIS.shard_count) + partition_root = f"SCAN_JOB_STATUS#{status}" + total = 0 + for bucket in range(SCAN_JOB_AXIS.shard_count): + partition = sharded_list_axis_pk( + SCAN_JOB_AXIS, partition_root, f"{bucket:0{width}d}" + ) + total += query_count_all_pages( + self._table, + IndexName=GSI1_NAME, + KeyConditionExpression=f"{SCAN_JOB_AXIS.gsi_pk_field} = :pk", + ExpressionAttributeValues={":pk": partition}, + ) + return total + return query_count_all_pages( + self._table, + IndexName=GSI1_NAME, + KeyConditionExpression="gsi1pk = :pk", + ExpressionAttributeValues={":pk": f"SCAN_JOB_STATUS#{status}"}, + ) + + def read_queue_backlog(self) -> QueueBacklog: + """Return queue backlog counts O(status-partitions), never O(table) (SC-7). + + The always-on read-API/dashboard backlog path. Counts each known status + via ``count_scan_jobs_by_status`` (per-status ``Select=COUNT``), so the + whole read is bounded by the status set times the fixed shard count and + transfers no SCAN_JOB item bodies — unlike ``get_queue_status``, it issues + zero full-table ``Scan`` calls. ``backlog`` is the actionable depth + (pending + leased) the dashboard headlines. 
+ """ + counts = { + status: self.count_scan_jobs_by_status(status) + for status in ( + SCAN_JOB_STATUS_PENDING, + SCAN_JOB_STATUS_LEASED, + SCAN_JOB_STATUS_COMPLETED, + SCAN_JOB_STATUS_DEAD_LETTER, + ) + } + backlog = ( + counts[SCAN_JOB_STATUS_PENDING] + counts[SCAN_JOB_STATUS_LEASED] + ) + return QueueBacklog(job_counts_by_status=counts, backlog=backlog) + def write_scan_result(self, result: TargetScanResult) -> None: findings = list(result.findings) self.extend(findings) @@ -832,6 +1226,141 @@ def _read_scan_jobs_by_status( ) return items_to_scan_jobs(items) + def _read_scan_jobs_by_status_bounded( + self, status: str, *, limit: int, include_legacy: bool = False + ) -> list[ScanJob]: + """Read at most ``limit`` head jobs of one sharded status (FR-5/SC-1). + + Uses the EXISTING ``read_list_axis_ordered`` k-way merge over the GSI1 + status axis with ``ScanIndexForward`` ascending (oldest-available first) + and a per-shard ``Limit``, so the read is bounded to the head window + instead of the whole partition. No new GSI: the gsi1sk sort key + ``nextAttemptAt#priority#createdAt#jobId`` already encodes FIFO order. + """ + items = read_list_axis_ordered( + self._table, + spec=SCAN_JOB_AXIS, + partition_root=f"SCAN_JOB_STATUS#{status}", + gsi_sk_prefix=None, + limit=limit, + descending=False, + include_legacy=include_legacy, + ) + return items_to_scan_jobs(items) + + def reap_expired_leases(self, now: dt.datetime) -> ReapSummary: + """Reclaim expired job + repo leases on a timer (FR-6 lease-reaper). + + - Expired LEASED jobs (``leaseUntil <= now``) are returned to pending + with their fence bumped (fencing out the slow/crashed prior holder so + its late completion is rejected) and ``lease_expiry_count`` raised. + A job past ``max_lease_expiries`` is dead-lettered instead of looping + forever (SC-8); otherwise ``nextAttemptAt`` is pushed by exponential + starvation backoff so it stops head-of-line-blocking the queue. 
+ - Expired repo leases (``leaseUntil <= now``) are deleted so a crashed + worker cannot strand a repo. Each write is conditional, so a worker + that renews/completes between the read and the write simply wins the + CAS and the reaper no-ops for that item ("expired" reclaim is distinct + from normal completion). + """ + now = _ensure_utc(now) + returned = dead_lettered = repo_released = 0 + + for job in self._read_scan_jobs_by_status(SCAN_JOB_STATUS_LEASED): + if job.lease_until is None or job.lease_until > now: + continue + if self._reap_expired_job(job, now=now): + if job.lease_expiry_count + 1 >= job.max_lease_expiries: + dead_lettered += 1 + else: + returned += 1 + + lease_items = scan_all_pages( + self._table, + FilterExpression="entityType = :entity_type", + ExpressionAttributeValues={":entity_type": "REPO_LEASE"}, + ) + for lease in items_to_repo_leases(lease_items): + if lease.lease_until > now: + continue + if self._reap_expired_repo_lease(lease, now=now): + repo_released += 1 + + return ReapSummary( + jobs_returned_to_pending=returned, + jobs_dead_lettered=dead_lettered, + repo_leases_released=repo_released, + ) + + def _reap_expired_job(self, job: ScanJob, *, now: dt.datetime) -> bool: + """Fence-bump an expired leased job back to pending (or dead-letter it). + + CASes on the still-leased, still-expired, still-same-fence row so a job + the original worker renews or completes first is left untouched. 
+ """ + expiry_count = job.lease_expiry_count + 1 + if expiry_count >= job.max_lease_expiries: + updated = replace( + job, + status=SCAN_JOB_STATUS_DEAD_LETTER, + worker_id=None, + lease_until=None, + fence=job.fence + 1, + lease_expiry_count=expiry_count, + updated_at=now, + last_error="lease expired beyond max_lease_expiries", + ) + else: + updated = replace( + job, + status=SCAN_JOB_STATUS_PENDING, + worker_id=None, + lease_until=None, + fence=job.fence + 1, + lease_expiry_count=expiry_count, + next_attempt_at=now + _starvation_backoff(expiry_count), + updated_at=now, + last_error="lease expired; reclaimed by reaper", + ) + try: + self._table.put_item( + Item=scan_job_to_item(updated), + ConditionExpression=( + "#status = :leased AND leaseUntil <= :now AND fence = :fence" + ), + ExpressionAttributeNames={"#status": "status"}, + ExpressionAttributeValues={ + ":leased": SCAN_JOB_STATUS_LEASED, + ":now": datetime_to_iso(now), + ":fence": job.fence, + }, + ) + return True + except Exception as exc: + if _is_conditional_check_failure(exc): + return False + raise + + def _reap_expired_repo_lease(self, lease: RepoLease, *, now: dt.datetime) -> bool: + """Delete an expired repo lease iff it is still the same expired holder.""" + try: + self._table.delete_item( + Key={"PK": f"REPO_LEASE#{lease.repo_id}", "SK": "META"}, + ConditionExpression=( + "leaseUntil <= :now AND workerId = :worker_id AND fence = :fence" + ), + ExpressionAttributeValues={ + ":now": datetime_to_iso(now), + ":worker_id": lease.worker_id, + ":fence": lease.fence, + }, + ) + return True + except Exception as exc: + if _is_conditional_check_failure(exc): + return False + raise + def _try_lease_scan_job( self, *, @@ -840,12 +1369,27 @@ def _try_lease_scan_job( lease_seconds: int, now: dt.datetime, ) -> ScanJob | None: + # Mint a strictly larger fence on every (re)lease so a previous holder's + # completion/failure write (carrying the old fence) is rejected — the + # double-scan + stale-completion guard 
(FR-6/SC-2). Reclaiming an EXPIRED + # lease here bumps lease_expiry_count (SC-8) so a job whose lease keeps + # expiring is backed off and finally dead-lettered, never head-of-line + # blocking; a fresh pending lease leaves the counter untouched. + is_expiry_reclaim = ( + job.status == SCAN_JOB_STATUS_LEASED and job.lease_until is not None + ) leased = replace( job, status=SCAN_JOB_STATUS_LEASED, worker_id=worker_id, lease_until=now + dt.timedelta(seconds=lease_seconds), updated_at=now, + fence=job.fence + 1, + lease_expiry_count=( + job.lease_expiry_count + 1 + if is_expiry_reclaim + else job.lease_expiry_count + ), ) try: self._table.put_item( @@ -963,6 +1507,30 @@ def _now() -> dt.datetime: return dt.datetime.now(dt.timezone.utc).replace(microsecond=0) +def _jitter_backoff_seconds(rng: random.Random) -> float: + """Return a small uniform jitter to desynchronize CAS-loss retries (SC-4).""" + return rng.uniform(_CAS_BACKOFF_MIN_SECONDS, _CAS_BACKOFF_MAX_SECONDS) + + +def _starvation_backoff(lease_expiry_count: int) -> dt.timedelta: + """Return the exponential nextAttemptAt push for a repeatedly-expiring job. + + The FIRST expiry (count == 1) gets NO push: a single expiry is usually a + normal crash/restart and a healthy worker should reclaim it immediately. The + push only kicks in on REPEATED expiry (count >= 2) and doubles each time + (capped), so a poison job stops head-of-line blocking the FIFO head while a + transiently slow repo still recovers quickly (SC-8). 
+ """ + if lease_expiry_count <= 1: + return dt.timedelta(0) + exponent = lease_expiry_count - 2 + seconds = min( + _STARVATION_BACKOFF_BASE_SECONDS * (2**exponent), + _STARVATION_BACKOFF_CAP_SECONDS, + ) + return dt.timedelta(seconds=seconds) + + def _scan_job_is_lease_eligible(job: ScanJob, now: dt.datetime) -> bool: """Return whether a job can be leased at this instant.""" if job.status == SCAN_JOB_STATUS_PENDING: diff --git a/src/security_scanner/storage/base.py b/src/security_scanner/storage/base.py index ce28568..00f55c9 100644 --- a/src/security_scanner/storage/base.py +++ b/src/security_scanner/storage/base.py @@ -11,6 +11,12 @@ from security_scanner.catalog.scan_target import ScanTarget from security_scanner.core.finding.model import Finding +# Job classes that decide which per-repo freshness field a completion advances +# (FR-7/SC-5). Kept as plain string constants so a ScanJob.job_type written by +# discovery/baseline enqueue and decoded from a table item share one vocabulary. +JOB_TYPE_INCREMENTAL = "incremental" +JOB_TYPE_BASELINE = "baseline" + @dataclass(frozen=True) class TargetScanResult: @@ -102,6 +108,24 @@ class ScanJob: created_at: dt.datetime updated_at: dt.datetime last_error: str | None = None + # Job class that decides WHICH per-repo freshness field a successful + # completion advances (FR-7/SC-5): an ``incremental`` completion advances + # ``lastSuccessfulIncrementalAt``, a ``baseline`` completion advances + # ``lastSuccessfulFullScanAt``. Discovery enqueues incremental jobs today; + # the baseline enqueue path (M4) sets ``baseline``. Defaults to + # ``incremental`` so every pre-existing job/item decodes unchanged. + job_type: str = JOB_TYPE_INCREMENTAL + # Monotonic fencing token (FR-6/SC-2). Bumped on each (re)lease and on + # reaper reclaim so a reaped/slow original worker's late completion or + # retryable-failure write is rejected by a workerId+fence CAS. 
Defaults to + # 0 for items written before fencing existed (backward-compatible decode). + fence: int = 0 + # Lease-expiry count is SEPARATE from attempts (SC-8). A healthy long scan + # whose lease expired is reclaimed without consuming a failure attempt; a + # job whose lease keeps expiring is backed off and finally dead-lettered on + # max_lease_expiries so it cannot head-of-line-block the queue forever. + lease_expiry_count: int = 0 + max_lease_expiries: int = 5 @property def ledger_key(self) -> ScanLedgerKey: @@ -124,6 +148,11 @@ class RepoLease: worker_id: str lease_until: dt.datetime updated_at: dt.datetime + # Monotonic fencing token (FR-6/SC-2). Incremented on every (re)acquire and + # on reaper reclaim. release_repo_lease CASes on workerId+fence so a worker + # that already lost the lease (reaped/expired and re-acquired by another) + # cannot delete the new holder's lease. Defaults to 0 for legacy items. + fence: int = 0 @dataclass(frozen=True) @@ -135,6 +164,37 @@ class QueueStatus: expired_repo_leases: int +@dataclass(frozen=True) +class QueueBacklog: + """Read-API queue backlog counts computed WITHOUT a full-table Scan (SC-7). + + The live dashboard polls the backlog panel, so it must not pay the O(table) + cost the legacy ``get_queue_status`` does (two full-table Scans). This DTO is + produced from per-status ``Select=COUNT`` queries over the status GSI + partitions, so the read is O(status-partitions) — bounded by the (small, + fixed) shard count times the status set, never by total table size. + + ``job_counts_by_status`` carries one entry per known status (pending, leased, + completed, dead_letter). ``backlog`` is the actionable-depth shorthand the + dashboard surfaces: pending + leased (work not yet terminal). It deliberately + does NOT carry expired-lease counts: those require reading each leased row's + ``leaseUntil`` against now (O(leased) item bodies), which is the reaper/timer + path, not the always-on dashboard poll. 
+ """ + + job_counts_by_status: dict[str, int] + backlog: int + + +@dataclass(frozen=True) +class ReapSummary: + """Outcome of one lease-reaper sweep (FR-6).""" + + jobs_returned_to_pending: int = 0 + jobs_dead_lettered: int = 0 + repo_leases_released: int = 0 + + @dataclass(frozen=True) class ScanRunHealth: """One-per-invocation marker that a scan run completed successfully. @@ -151,6 +211,78 @@ class ScanRunHealth: findings_total: int +@dataclass(frozen=True) +class RepoHealth: + """Per-repository freshness marker (FR-7), one ``REPO_HEALTH#`` row. + + Replaces the global ``SCAN_HEALTH`` singleton: each repo carries its own two + "last successful" timestamps so a single fresh repo can no longer mask 499 + stale ones (the silent-staleness bug). ``None`` means that scan class has + never succeeded for this repo. Per FR-7 scope this holds ONLY freshness + timestamps; finding counts are derived at read time, never cached here. + """ + + repo_id: str + last_successful_incremental_at: str | None = None + last_successful_full_scan_at: str | None = None + + +@dataclass(frozen=True) +class RepoFreshnessBreach: + """One per-repo freshness breach emitted by the scheduled evaluator (FR-8). + + ``incremental``/``baseline`` say which threshold(s) the repo breached so a + downstream alert sink (M9) can route without re-evaluating. This is an + advisory signal object; it carries no sink wiring. + """ + + repo_id: str + incremental: bool + baseline: bool + last_successful_incremental_at: str | None + last_successful_full_scan_at: str | None + + +@dataclass(frozen=True) +class BreachCounter: + """Materialized freshness rollup (F5/SC-7) the read API reads O(1). + + The scheduled freshness evaluator (FR-8) recomputes this on its timer so the + future read API/dashboard never full-enumerates REPO_HEALTH per request. + ``coverage_gap`` is the org-N-vs-covered-M half that depends on the CATALOG + entity (M1, not built here); it stays ``None`` until that seam is filled. 
+ """ + + incremental_breaches: int + baseline_breaches: int + total_breaches: int + repos_evaluated: int + evaluated_at: str + coverage_gap: int | None = None + + +@dataclass(frozen=True) +class CatalogEntry: + """One org repository's catalog membership row (FR-1, M1). + + ``CATALOG#``/``META``. The catalog is the shared org repo list axis + for gitleaks coverage today and any future GHAS reconcile (FR-12). One row + per org repo, reconciled on a timer from an org-list provider. + + ``included`` is False for opt-out repos: an opted-out repo is RECORDED with an + ``excluded_reason`` rather than dropped, so it never silently disappears from + coverage accounting. ``first_seen`` is set once on the row's first reconcile; + ``last_reconciled`` advances every reconcile pass the repo is still present. + """ + + repo_id: str + repo_url: str + included: bool + first_seen: str + last_reconciled: str + excluded_reason: str | None = None + + @dataclass(frozen=True) class FindingStateEvent: """Append-only lifecycle transition for one finding disposition.""" @@ -312,6 +444,14 @@ def list_scan_targets(self) -> list[ScanTarget]: def get_ref_state(self, repo_id: str, ref_name: str) -> RefState | None: """Return the last observed state for a repository ref.""" + def list_ref_states(self, repo_id: str) -> list[RefState]: + """Return every stored ref state for one repository. + + Powers the ls-remote skip cursor (SC-6a): ``ref_patterns`` are typically + globs, so the poller reads all ref states for a repo and matches concrete + ref names client-side rather than resolving each glob via a point lookup. + """ + def put_ref_state(self, state: RefState) -> None: """Persist the last observed state for a repository ref.""" @@ -335,15 +475,44 @@ def complete_processed_job( findings: Sequence[Finding], ledger: ScanLedgerEntry, ) -> None: - """Persist findings, ledger, then mark the job completed.""" + """Persist findings, ledger, then mark the job completed. 
+ + Completion CASes on the leased job's ``worker_id`` AND ``fence`` so a + reaped/slow original worker cannot complete a job another worker has + already re-run (double-scan + stale-completion race). + """ + + def advance_repo_health( + self, + repo_id: str, + *, + job_type: str, + completed_at: str | dt.datetime, + ) -> None: + """Advance one repo's freshness field after a successful completion. + + Attribute-scoped, advancing-only conditional write (FR-7/SC-5): the + worker calls this after ``complete_processed_job`` so a successful scan + records per-repo freshness. ``job_type`` selects the incremental vs + baseline field; the write never clobbers the other field and never + regresses a newer timestamp. + """ def record_retryable_failure( self, job_id: str, error: str, next_attempt_at: dt.datetime, + *, + worker_id: str | None = None, + fence: int | None = None, ) -> None: - """Return a failed job to pending or dead-letter it when attempts exhaust.""" + """Return a failed job to pending or dead-letter it when attempts exhaust. + + When ``worker_id``/``fence`` are supplied the write is fenced: a stale + worker whose lease was already reclaimed is rejected and does not mutate + the job another worker now owns. + """ def move_job_to_dead_letter(self, job_id: str, error: str) -> None: """Move a job to the terminal failure state.""" @@ -356,14 +525,53 @@ def acquire_repo_lease( repo_id: str, worker_id: str, lease_seconds: int, - ) -> bool: - """Acquire a bounded repository lease when absent or expired.""" + ) -> int | None: + """Acquire a bounded repository lease when absent or expired. + + Returns the monotonically increasing fence token granted to the caller + on success, or ``None`` when the lease is held by a live worker. The + caller carries the fence and passes it to ``release_repo_lease``. 
+ """ - def release_repo_lease(self, repo_id: str, worker_id: str) -> None: - """Release a repository lease only when owned by the worker.""" + def release_repo_lease( + self, repo_id: str, worker_id: str, *, fence: int | None = None + ) -> None: + """Release a repository lease only when owned by the worker. + + When ``fence`` is supplied the delete CASes on ``workerId`` AND + ``fence`` so a worker that no longer owns the lease (it was reaped and + re-acquired) cannot delete the new holder's lease. + """ + + def reap_expired_leases(self, now: dt.datetime) -> ReapSummary: + """Reclaim expired job and repo leases (FR-6 lease-reaper). + + Returns expired leased jobs to pending while incrementing their fence + (fencing out the prior holder) and releases expired repo leases. This is + the timer-driven reclaim path; it is distinct from normal completion. + """ def get_queue_status(self, now: dt.datetime) -> QueueStatus: - """Return queue status counts and expired lease counts.""" + """Return queue status counts and expired lease counts. + + Legacy full-table-Scan visibility path retained for its existing callers + (the ``queue-status`` CLI / reaper observability). The always-on read-API + backlog path is ``read_queue_backlog`` below, which avoids the Scan. + """ + + def count_scan_jobs_by_status(self, status: str) -> int: + """Count jobs in one status without reading item bodies (SC-7). + + Uses ``Select=COUNT`` over the status GSI partition(s) so the server + returns only a count. O(status-partitions); never a full-table Scan. + """ + + def read_queue_backlog(self) -> QueueBacklog: + """Return read-API queue backlog counts O(status-partitions) (SC-7). + + The always-on dashboard backlog path: per-status ``Select=COUNT`` over the + status GSI partitions, never the O(table) ``get_queue_status`` Scan. 
+ """ @runtime_checkable @@ -381,6 +589,86 @@ def read_latest_scan_run_health(self) -> ScanRunHealth | None: """Return the most recently completed scan-run health record.""" +@runtime_checkable +class RepoHealthStore(Protocol): + """Per-repo freshness capability (FR-7/SC-5). + + A store implements this when it can advance one repo's freshness field with + an attribute-scoped, advancing-only conditional write and report repo health + for the scheduled evaluator's rollup. + """ + + def advance_repo_health( + self, + repo_id: str, + *, + job_type: str, + completed_at: str | dt.datetime, + ) -> None: + """Advance one repo's freshness timestamp for ``job_type``. + + Uses an attribute-scoped conditional UpdateItem (``SET ...At = :t`` IF + ``attribute_not_exists`` OR ``:t > existing``) so an ``incremental`` and + a ``baseline`` completion never clobber each other's field, and an + out-of-order (older) completion cannot regress a newer timestamp. The + advancing-only condition also makes per-commit write amplification cheap: + a no-op when the candidate is not newer. + """ + + def read_repo_health(self, repo_id: str) -> RepoHealth | None: + """Return one repository's freshness record, or None when absent.""" + + def read_repo_health_batch( + self, repo_ids: Iterable[str] + ) -> dict[str, RepoHealth]: + """Return freshness records for the given repos, keyed by repo_id.""" + + def read_all_repo_health(self) -> list[RepoHealth]: + """Enumerate every REPO_HEALTH record (evaluator-only full scan).""" + + +@runtime_checkable +class AlertStateStore(Protocol): + """Alert de-dup / re-notify state capability (FR-8 alerts, M9). + + A store implements this when it can persist the last-alerted timestamp per + ``(repo_id, kind)`` so the scheduled freshness-evaluator's de-dup/re-notify + policy can suppress a repeat alert within a re-notify window but re-fire after + it. The state is a single ISO timestamp per identity; idempotent overwrite. 
+ """ + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + """Return the last-alerted ISO timestamp for ``(repo, kind)`` or None.""" + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + """Record that ``(repo, kind)`` was alerted at ``alerted_at`` (ISO).""" + + +@runtime_checkable +class CatalogStore(Protocol): + """Org catalog membership capability (FR-1, M1). + + A store implements this when it can upsert one catalog entry, read one back, + and enumerate every catalog row. Enumeration is bounded by org size (≤ N), so + a full scan is acceptable here — unlike per-request read paths, the reconcile + and coverage computation run on a timer, not on the read API hot path. + """ + + def put_catalog_entry(self, entry: CatalogEntry) -> None: + """Upsert one CATALOG row (last-writer-wins on the whole row). + + The reconcile operation has already merged first_seen/last_reconciled, so + a plain put is correct: the reconcile, not the store, owns the additive + first_seen-preserving merge. + """ + + def read_catalog_entry(self, repo_id: str) -> CatalogEntry | None: + """Return one repository's catalog row, or None when absent.""" + + def read_all_catalog_entries(self) -> list[CatalogEntry]: + """Enumerate every CATALOG row (bounded by org size, reconcile-only).""" + + @runtime_checkable class GhasAlertStore(Protocol): """Durable redacted GHAS alert storage capability.""" diff --git a/tests/test_alert_sink.py b/tests/test_alert_sink.py new file mode 100644 index 0000000..907a51d --- /dev/null +++ b/tests/test_alert_sink.py @@ -0,0 +1,476 @@ +"""M9 alert-sink tests (FR-8 alerts, the F3 silent-staleness fix). + +The evidence M9 ties "done" to: a SCHEDULED detection of a stale repo RESULTS IN +an alert reaching the sink — not merely sink plumbing existing. 
These tests drive +the real evaluator (``run_freshness_evaluator_with_alerts``) against a fake store +and a recording sink (no DynamoDB, no live channel) and assert WHICH alerts fired +with WHAT context, that the de-dup/re-notify policy holds, that a fresh repo fires +NOTHING, and — mirroring the incident — that one fresh repo does NOT suppress the +alert for a different stale repo. +""" + +from __future__ import annotations + +import datetime as dt + +import pytest + +from security_scanner.runtime.alert_sink import ( + _now_iso, + ALERT_KIND_BASELINE_BREACH, + ALERT_KIND_CADENCE_OVERRUN, + ALERT_KIND_COVERAGE_GAP, + ALERT_KIND_DEAD_LETTER, + ALERT_KIND_INCREMENTAL_BREACH, + ALERT_KIND_QUEUE_BACKLOG, + ALERT_SCOPE_ORG, + Alert, + AlertDispatcher, + InMemoryAlertStateStore, + NotificationLogAlertSink, + RecordingAlertSink, + alert_from_cadence_overrun, + run_freshness_evaluator_with_alerts, +) +from security_scanner.runtime.incremental_discovery import evaluate_poll_cadence +from security_scanner.runtime.scan_health import FreshnessThresholds +from security_scanner.storage.base import BreachCounter, QueueBacklog, RepoHealth + + +def _now() -> dt.datetime: + return dt.datetime(2026, 6, 20, 12, 0, 0, tzinfo=dt.UTC) + + +def _thresholds() -> FreshnessThresholds: + # Generous cadences so a recently-scanned repo is fresh and a never-scanned + # one fails closed. + return FreshnessThresholds.from_cadences( + poll_interval_hours=1.0, + baseline_cadence_hours=24.0, + margin_hours=2.0, + ) + + +def _fresh(repo_id: str, now: dt.datetime) -> RepoHealth: + recent = (now - dt.timedelta(minutes=5)).isoformat() + return RepoHealth( + repo_id=repo_id, + last_successful_incremental_at=recent, + last_successful_full_scan_at=recent, + ) + + +def _stale(repo_id: str) -> RepoHealth: + # Never recorded => fail-closed breach on BOTH classes. 
+ return RepoHealth(repo_id=repo_id) + + +def _incremental_only_stale(repo_id: str, now: dt.datetime) -> RepoHealth: + # Fresh baseline, stale incremental (incremental success 10h ago > 1h+2h). + return RepoHealth( + repo_id=repo_id, + last_successful_incremental_at=(now - dt.timedelta(hours=10)).isoformat(), + last_successful_full_scan_at=(now - dt.timedelta(minutes=5)).isoformat(), + ) + + +class FakeAlertStore: + """Fake store implementing the freshness + alert-state + backlog seams. + + No DynamoDB: enumerates injected REPO_HEALTH, records the BREACH_COUNTER, and + holds per-(repo, kind) alert de-dup state in a dict (the durable ALERT_STATE + substrate's in-memory stand-in). Optionally returns a QueueBacklog so the + dead-letter / backlog-growth triggers can be exercised. + """ + + def __init__( + self, + health: list[RepoHealth], + *, + backlog: QueueBacklog | None = None, + ) -> None: + self._health = health + self.put_counters: list[BreachCounter] = [] + self._alert_state: dict[tuple[str, str], str] = {} + self._backlog = backlog + + def read_all_repo_health(self) -> list[RepoHealth]: + return list(self._health) + + def put_breach_counter(self, counter: BreachCounter) -> None: + self.put_counters.append(counter) + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + return self._alert_state.get((repo_id, kind)) + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + self._alert_state[(repo_id, kind)] = alerted_at + + # Only present when a backlog is injected, so the orchestrator's getattr + # probe skips the queue triggers otherwise. 
+ def read_queue_backlog(self) -> QueueBacklog: + assert self._backlog is not None + return self._backlog + + +def _run(store: FakeAlertStore, *, now=None, **kwargs): + now = now or _now() + sink = RecordingAlertSink() + dispatcher = AlertDispatcher(sink=sink, state=store, window_hours=24.0) + evaluation = run_freshness_evaluator_with_alerts( + store, + now=now, + thresholds=_thresholds(), + dispatcher=dispatcher, + **kwargs, + ) + return sink, evaluation + + +# --- the F3 evidence: a stale repo detected by the evaluator FIRES an alert --- + + +def test_stale_repo_fires_incremental_and_baseline_alerts_with_context(): + store = FakeAlertStore([_stale("repo_stale")]) + sink, evaluation = _run(store) + + assert ALERT_KIND_INCREMENTAL_BREACH in sink.kinds + assert ALERT_KIND_BASELINE_BREACH in sink.kinds + # Context is actionable: repo_id + which threshold + the (None=NEVER) age. + inc = sink.of_kind(ALERT_KIND_INCREMENTAL_BREACH)[0] + assert inc.repo_id == "repo_stale" + assert inc.detail["lastSuccessfulIncrementalAt"] is None + assert "repo_stale" in inc.message + # The materialized rollup was still written (M5 behavior preserved). + assert len(store.put_counters) == 1 + assert evaluation.counter.total_breaches == 1 + + +def test_incremental_only_breach_fires_only_incremental_alert(): + now = _now() + store = FakeAlertStore([_incremental_only_stale("repo_x", now)]) + sink, _ = _run(store, now=now) + + assert sink.kinds == [ALERT_KIND_INCREMENTAL_BREACH] + assert sink.of_kind(ALERT_KIND_BASELINE_BREACH) == [] + + +# --- a fresh repo fires NOTHING (silence only when truly fresh) --------------- + + +def test_fresh_repo_fires_no_alert(): + now = _now() + store = FakeAlertStore([_fresh("repo_fresh", now)]) + sink, evaluation = _run(store, now=now) + + assert sink.alerts == [] + assert evaluation.counter.total_breaches == 0 + # BREACH_COUNTER still materialized (zero breaches) — the read API distinguishes + # "evaluated, all fresh" from "never evaluated". 
+ assert len(store.put_counters) == 1 + + +# --- the incident shape: one fresh repo must NOT suppress a different stale one + + +def test_one_fresh_repo_does_not_suppress_a_different_stale_repo(): + now = _now() + store = FakeAlertStore([_fresh("repo_fresh", now), _stale("repo_stale")]) + sink, _ = _run(store, now=now) + + fired_repo_ids = {a.repo_id for a in sink.alerts} + # The fresh repo fires nothing; the stale repo still fires. This is exactly + # the silent-staleness incident: a fresh repo can no longer mask a stale one. + assert "repo_fresh" not in fired_repo_ids + assert "repo_stale" in fired_repo_ids + assert sink.for_repo("repo_stale") # at least one alert reached the sink + + +# --- coverage gap fires ------------------------------------------------------- + + +def test_coverage_gap_fires_org_scoped_alert(): + now = _now() + store = FakeAlertStore([_fresh("repo_fresh", now)]) + sink, _ = _run(store, now=now, coverage_gap=3) + + gap = sink.of_kind(ALERT_KIND_COVERAGE_GAP) + assert len(gap) == 1 + assert gap[0].repo_id == ALERT_SCOPE_ORG + assert gap[0].detail["coverageGap"] == 3 + + +def test_zero_coverage_gap_fires_nothing(): + now = _now() + store = FakeAlertStore([_fresh("repo_fresh", now)]) + sink, _ = _run(store, now=now, coverage_gap=0) + + assert sink.of_kind(ALERT_KIND_COVERAGE_GAP) == [] + + +# --- dead-letter increase fires ---------------------------------------------- + + +def test_dead_letter_increase_fires(): + now = _now() + backlog = QueueBacklog( + job_counts_by_status={ + "pending": 0, + "leased": 0, + "completed": 5, + "dead_letter": 2, + }, + backlog=0, + ) + store = FakeAlertStore([_fresh("repo_fresh", now)], backlog=backlog) + sink, _ = _run(store, now=now) + + dead = sink.of_kind(ALERT_KIND_DEAD_LETTER) + assert len(dead) == 1 + assert dead[0].detail["deadLetter"] == 2 + + +def test_dead_letter_steady_count_does_not_refire_within_window(): + now = _now() + backlog = QueueBacklog( + job_counts_by_status={"completed": 0, 
"dead_letter": 2}, + backlog=0, + ) + store = FakeAlertStore([_fresh("repo_fresh", now)], backlog=backlog) + sink = RecordingAlertSink() + dispatcher = AlertDispatcher(sink=sink, state=store, window_hours=24.0) + + # First pass: dead-letter rose 0 -> 2, fires. + run_freshness_evaluator_with_alerts( + store, now=now, thresholds=_thresholds(), dispatcher=dispatcher + ) + # Second pass shortly after: SAME count, within window -> no new alert. + run_freshness_evaluator_with_alerts( + store, + now=now + dt.timedelta(minutes=5), + thresholds=_thresholds(), + dispatcher=dispatcher, + ) + assert len(sink.of_kind(ALERT_KIND_DEAD_LETTER)) == 1 + + +# --- queue-backlog growth fires past threshold ------------------------------- + + +def test_queue_backlog_growth_fires_past_threshold(): + now = _now() + backlog = QueueBacklog( + job_counts_by_status={"pending": 80, "leased": 30}, + backlog=110, + ) + store = FakeAlertStore([_fresh("repo_fresh", now)], backlog=backlog) + sink, _ = _run(store, now=now, backlog_threshold=100) + + grew = sink.of_kind(ALERT_KIND_QUEUE_BACKLOG) + assert len(grew) == 1 + assert grew[0].detail["backlog"] == 110 + assert grew[0].detail["threshold"] == 100 + + +def test_queue_backlog_below_threshold_fires_nothing(): + now = _now() + backlog = QueueBacklog( + job_counts_by_status={"pending": 10, "leased": 5}, + backlog=15, + ) + store = FakeAlertStore([_fresh("repo_fresh", now)], backlog=backlog) + sink, _ = _run(store, now=now, backlog_threshold=100) + + assert sink.of_kind(ALERT_KIND_QUEUE_BACKLOG) == [] + + +# --- cadence-overrun routes to the sink -------------------------------------- + + +def test_cadence_overrun_routes_to_sink(): + signal = evaluate_poll_cadence( + cycle_seconds=400.0, cadence_seconds=300.0, targets=500 + ) + alert = alert_from_cadence_overrun(signal, now=_now()) + assert alert is not None + assert alert.kind == ALERT_KIND_CADENCE_OVERRUN + assert alert.detail["overrunSeconds"] == 100.0 + + sink = RecordingAlertSink() + state = 
InMemoryAlertStateStore() + dispatcher = AlertDispatcher(sink=sink, state=state, window_hours=24.0) + assert dispatcher.dispatch(alert, now=_now()) is True + assert sink.of_kind(ALERT_KIND_CADENCE_OVERRUN) + + +def test_cadence_within_budget_routes_nothing(): + signal = evaluate_poll_cadence( + cycle_seconds=200.0, cadence_seconds=300.0, targets=10 + ) + assert alert_from_cadence_overrun(signal, now=_now()) is None + + +# --- de-dup: suppress a repeat within the window ----------------------------- + + +def test_dedup_suppresses_repeat_within_renotify_window(): + now = _now() + store = FakeAlertStore([_stale("repo_stale")]) + sink = RecordingAlertSink() + dispatcher = AlertDispatcher(sink=sink, state=store, window_hours=24.0) + + run_freshness_evaluator_with_alerts( + store, now=now, thresholds=_thresholds(), dispatcher=dispatcher + ) + first = len(sink.of_kind(ALERT_KIND_INCREMENTAL_BREACH)) + assert first == 1 + + # Re-evaluate 1h later: still stale, but inside the 24h re-notify window. + run_freshness_evaluator_with_alerts( + store, + now=now + dt.timedelta(hours=1), + thresholds=_thresholds(), + dispatcher=dispatcher, + ) + # No second incremental alert for the same (repo, kind): not spamming. + assert len(sink.of_kind(ALERT_KIND_INCREMENTAL_BREACH)) == 1 + + +# --- re-notify: fire again after the window ---------------------------------- + + +def test_renotify_fires_again_after_window_elapses(): + now = _now() + store = FakeAlertStore([_stale("repo_stale")]) + sink = RecordingAlertSink() + dispatcher = AlertDispatcher(sink=sink, state=store, window_hours=24.0) + + run_freshness_evaluator_with_alerts( + store, now=now, thresholds=_thresholds(), dispatcher=dispatcher + ) + assert len(sink.of_kind(ALERT_KIND_INCREMENTAL_BREACH)) == 1 + + # Re-evaluate 25h later: still stale and PAST the 24h window -> re-fires, so a + # persistently-stale repo can never be silently forgotten. 
+ run_freshness_evaluator_with_alerts( + store, + now=now + dt.timedelta(hours=25), + thresholds=_thresholds(), + dispatcher=dispatcher, + ) + assert len(sink.of_kind(ALERT_KIND_INCREMENTAL_BREACH)) == 2 + + +def test_dedup_is_per_repo_kind_not_global(): + now = _now() + # Two stale repos: an alert for repo_a must not suppress repo_b's alert, and + # an incremental kind must not suppress the baseline kind for the same repo. + store = FakeAlertStore([_stale("repo_a"), _stale("repo_b")]) + sink, _ = _run(store, now=now) + + assert sink.for_repo("repo_a") + assert sink.for_repo("repo_b") + repo_a_kinds = {a.kind for a in sink.for_repo("repo_a")} + assert ALERT_KIND_INCREMENTAL_BREACH in repo_a_kinds + assert ALERT_KIND_BASELINE_BREACH in repo_a_kinds + + +# --- default sink reuses the notification-log JSONL seam ---------------------- + + +def test_notification_log_sink_writes_alert_record(): + written: list[tuple] = [] + + def fake_writer(path, record): + written.append((path, record)) + + sink = NotificationLogAlertSink("/tmp/does-not-matter.jsonl", writer=fake_writer) + alert = Alert( + repo_id="repo_z", + kind=ALERT_KIND_INCREMENTAL_BREACH, + message="stale", + event_at=_now().isoformat(), + detail={"k": "v"}, + ) + sink.emit(alert) + + assert len(written) == 1 + _, record = written[0] + assert record["type"] == "alert" + assert record["kind"] == ALERT_KIND_INCREMENTAL_BREACH + assert record["repo_id"] == "repo_z" + assert record["detail"] == {"k": "v"} + + +def test_notification_log_sink_default_path_is_the_existing_seam(): + from security_scanner.runtime.notification_log import ( + DEFAULT_NOTIFICATION_LOG_PATH, + ) + + sink = NotificationLogAlertSink(None, writer=lambda p, r: None) + assert sink._path == DEFAULT_NOTIFICATION_LOG_PATH + + +# --- durable ALERT_STATE store round-trip (de-dup state persistence) ---------- + + +def _durable_store(): + from security_scanner.storage.dynamodb_compatible.store import ( + DynamoDbCompatibleConfig, + 
DynamoDbCompatibleFindingStore, + ) + from tests.test_dynamodb_compatible_store import ( + FakeDynamoClient, + FakeDynamoResource, + FakeDynamoTable, + ) + + table = FakeDynamoTable() + return DynamoDbCompatibleFindingStore( + DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"), + resource=FakeDynamoResource(table), + client=FakeDynamoClient(table), + ) + + +def test_store_alert_state_round_trips_per_repo_kind(): + store = _durable_store() + # Absent before any write. + assert store.read_alert_state("repo_a", ALERT_KIND_INCREMENTAL_BREACH) is None + + store.put_alert_state("repo_a", ALERT_KIND_INCREMENTAL_BREACH, _now().isoformat()) + assert ( + store.read_alert_state("repo_a", ALERT_KIND_INCREMENTAL_BREACH) + == _now().isoformat() + ) + # Different kind for the same repo is a distinct identity (no cross-clobber). + assert store.read_alert_state("repo_a", ALERT_KIND_BASELINE_BREACH) is None + # Different repo is a distinct identity too. + assert store.read_alert_state("repo_b", ALERT_KIND_INCREMENTAL_BREACH) is None + + +def test_store_alert_state_overwrite_is_idempotent(): + store = _durable_store() + kind = ALERT_KIND_INCREMENTAL_BREACH + store.put_alert_state("repo_a", kind, "2026-06-20T12:00:00+00:00") + store.put_alert_state("repo_a", kind, "2026-06-21T12:00:00+00:00") + assert store.read_alert_state("repo_a", kind) == "2026-06-21T12:00:00+00:00" + + +def test_store_satisfies_alert_state_protocol(): + from security_scanner.storage.base import AlertStateStore + + assert isinstance(_durable_store(), AlertStateStore) + + +def test_now_iso_normalizes_aware_datetime_to_utc(): + aware = dt.datetime(2026, 6, 20, 7, 0, 0, tzinfo=dt.timezone(dt.timedelta(hours=5))) + assert _now_iso(aware) == "2026-06-20T02:00:00+00:00" + + +def test_now_iso_rejects_naive_datetime(): + # A naive datetime would be silently reinterpreted as local time by + # astimezone(); _now_iso fails closed instead of emitting a wrong timestamp. 
+ naive = dt.datetime(2026, 6, 20, 12, 0, 0) + with pytest.raises(ValueError, match="timezone-aware"): + _now_iso(naive) diff --git a/tests/test_catalog_reconcile.py b/tests/test_catalog_reconcile.py new file mode 100644 index 0000000..b3d0adc --- /dev/null +++ b/tests/test_catalog_reconcile.py @@ -0,0 +1,322 @@ +"""M1 catalog reconcile (FR-1): CATALOG entity, additive reconcile, coverage gap. + +No docker: reuses the existing fake boto3 resource/client/table from +``test_dynamodb_compatible_store`` so the reconcile, the CATALOG item mapping, +and the coverage-gap → BREACH_COUNTER seam are all exercised on the real store. +The org list is a FIXTURE — live org fetch stays governance-gated (GATE 2). +""" + +from __future__ import annotations + +import datetime as dt + +import pytest + +from security_scanner.runtime.catalog_reconcile import ( + FixtureOrgRepoListProvider, + GovernanceGatedOrgRepoListProvider, + OrgRepoListProvider, + compute_coverage_gap, + coverage_gap_from_store, + run_catalog_reconcile, +) +from security_scanner.runtime.scan_health import ( + FreshnessThresholds, + run_freshness_evaluator, +) +from security_scanner.storage.adapters.nosql_db.items import ( + catalog_entry_from_item, + catalog_entry_to_item, + repo_id_for_scan_target_url, +) +from security_scanner.storage.base import ( + JOB_TYPE_INCREMENTAL, + CatalogEntry, + CatalogStore, +) +from security_scanner.storage.dynamodb_compatible.store import ( + DynamoDbCompatibleConfig, + DynamoDbCompatibleFindingStore, +) +from tests.test_dynamodb_compatible_store import ( + FakeDynamoClient, + FakeDynamoResource, + FakeDynamoTable, +) + + +def _store() -> DynamoDbCompatibleFindingStore: + table = FakeDynamoTable() + return DynamoDbCompatibleFindingStore( + DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"), + resource=FakeDynamoResource(table), + client=FakeDynamoClient(table), + ) + + +_NOW = dt.datetime(2026, 6, 20, 12, 0, tzinfo=dt.UTC) +_LATER = dt.datetime(2026, 6, 20, 13, 0, 
tzinfo=dt.UTC) +_NOW_ISO = "2026-06-20T12:00:00+00:00" +_LATER_ISO = "2026-06-20T13:00:00+00:00" + +_URL_A = "https://github.com/org/repo-a" +_URL_B = "https://github.com/org/repo-b" +_URL_C = "https://github.com/org/repo-c" + + +class _RaisingProvider: + """Provider that simulates a transient/total org-list fetch failure.""" + + def list_org_repos(self): + raise RuntimeError("org list fetch failed (rate-limited)") + + +# --------------------------------------------------------------------------- +# CATALOG item mapping +# --------------------------------------------------------------------------- + + +def test_catalog_entry_item_round_trips(): + entry = CatalogEntry( + repo_id="repo_x", + repo_url=_URL_A, + included=True, + first_seen=_NOW_ISO, + last_reconciled=_LATER_ISO, + excluded_reason=None, + ) + item = catalog_entry_to_item(entry) + assert item["PK"] == "CATALOG#repo_x" + assert item["SK"] == "META" + assert item["entityType"] == "CATALOG" + assert "excludedReason" not in item # dropped when included + assert catalog_entry_from_item(item) == entry + + +def test_catalog_entry_item_keeps_excluded_reason(): + entry = CatalogEntry( + repo_id="repo_x", + repo_url=_URL_A, + included=False, + first_seen=_NOW_ISO, + last_reconciled=_NOW_ISO, + excluded_reason="opt-out", + ) + item = catalog_entry_to_item(entry) + assert item["included"] is False + assert item["excludedReason"] == "opt-out" + assert catalog_entry_from_item(item) == entry + + +def test_store_implements_catalog_store_protocol(): + assert isinstance(_store(), CatalogStore) + + +def test_store_catalog_round_trip_and_enumerate(): + store = _store() + store.put_catalog_entry( + CatalogEntry("repo_a", _URL_A, True, _NOW_ISO, _NOW_ISO, None) + ) + store.put_catalog_entry( + CatalogEntry("repo_b", _URL_B, False, _NOW_ISO, _NOW_ISO, "opt-out") + ) + assert store.read_catalog_entry("repo_a").repo_url == _URL_A + assert store.read_catalog_entry("repo_missing") is None + all_ids = {e.repo_id for e in 
store.read_all_catalog_entries()} + assert all_ids == {"repo_a", "repo_b"} + + +# --------------------------------------------------------------------------- +# Reconcile: new repo auto-included +# --------------------------------------------------------------------------- + + +def test_new_repo_is_auto_included_with_first_seen_set(): + store = _store() + provider = FixtureOrgRepoListProvider([_URL_A, _URL_B]) + summary = run_catalog_reconcile(store, provider, now=_NOW) + + assert summary.added == 2 + assert summary.updated == 0 + assert summary.excluded == 0 + assert summary.total == 2 + + entry = store.read_catalog_entry(repo_id_for_scan_target_url(_URL_A)) + assert entry.included is True + assert entry.excluded_reason is None + assert entry.first_seen == _NOW_ISO + assert entry.last_reconciled == _NOW_ISO + + +# --------------------------------------------------------------------------- +# Reconcile: opt-out is recorded with a reason, NOT dropped +# --------------------------------------------------------------------------- + + +def test_opt_out_repo_is_excluded_with_reason_not_dropped(): + store = _store() + provider = FixtureOrgRepoListProvider([_URL_A, _URL_B]) + summary = run_catalog_reconcile( + store, provider, opt_out=[_URL_B], now=_NOW + ) + + assert summary.total == 2 # both rows still present + assert summary.excluded == 1 + + excluded = store.read_catalog_entry(repo_id_for_scan_target_url(_URL_B)) + assert excluded is not None # NOT dropped + assert excluded.included is False + assert excluded.excluded_reason == "opt-out" + + +def test_opt_out_flips_membership_on_re_reconcile_keeping_first_seen(): + store = _store() + provider = FixtureOrgRepoListProvider([_URL_A]) + run_catalog_reconcile(store, provider, now=_NOW) + # repo-a later opted out: membership flips, first_seen preserved. 
+ run_catalog_reconcile(store, provider, opt_out=[_URL_A], now=_LATER) + + entry = store.read_catalog_entry(repo_id_for_scan_target_url(_URL_A)) + assert entry.included is False + assert entry.excluded_reason == "opt-out" + assert entry.first_seen == _NOW_ISO # preserved across passes + assert entry.last_reconciled == _LATER_ISO # advanced + + +# --------------------------------------------------------------------------- +# Reconcile: existing entry updated (first_seen preserved, last_reconciled bumped) +# --------------------------------------------------------------------------- + + +def test_existing_repo_is_updated_preserving_first_seen(): + store = _store() + provider = FixtureOrgRepoListProvider([_URL_A]) + run_catalog_reconcile(store, provider, now=_NOW) + summary = run_catalog_reconcile(store, provider, now=_LATER) + + assert summary.added == 0 + assert summary.updated == 1 + entry = store.read_catalog_entry(repo_id_for_scan_target_url(_URL_A)) + assert entry.first_seen == _NOW_ISO # not reset + assert entry.last_reconciled == _LATER_ISO # advanced + + +def test_duplicate_repo_in_provider_list_is_collapsed(): + store = _store() + provider = FixtureOrgRepoListProvider([_URL_A, _URL_A]) + summary = run_catalog_reconcile(store, provider, now=_NOW) + assert summary.added == 1 + assert summary.total == 1 + + +# --------------------------------------------------------------------------- +# Reconcile: ADDITIVE on transient failure — existing entries NOT dropped +# --------------------------------------------------------------------------- + + +def test_transient_failure_does_not_drop_existing_entries(): + store = _store() + # First a successful reconcile populates two repos. 
+ run_catalog_reconcile( + store, FixtureOrgRepoListProvider([_URL_A, _URL_B]), now=_NOW + ) + assert {e.repo_id for e in store.read_all_catalog_entries()} == { + repo_id_for_scan_target_url(_URL_A), + repo_id_for_scan_target_url(_URL_B), + } + + # A later reconcile whose provider RAISES must not delete anything. + with pytest.raises(RuntimeError, match="org list fetch failed"): + run_catalog_reconcile(store, _RaisingProvider(), now=_LATER) + + surviving = {e.repo_id for e in store.read_all_catalog_entries()} + assert surviving == { + repo_id_for_scan_target_url(_URL_A), + repo_id_for_scan_target_url(_URL_B), + } # additive: nothing dropped on failure + + +def test_partial_list_does_not_drop_absent_existing_repos(): + store = _store() + run_catalog_reconcile( + store, FixtureOrgRepoListProvider([_URL_A, _URL_B]), now=_NOW + ) + # A later (partial) list omits repo-b; it must NOT be dropped from coverage. + summary = run_catalog_reconcile( + store, FixtureOrgRepoListProvider([_URL_A]), now=_LATER + ) + # total still reflects both rows (additive); only repo-a was in this list. + assert summary.total == 2 + assert store.read_catalog_entry(repo_id_for_scan_target_url(_URL_B)) is not None + + +# --------------------------------------------------------------------------- +# Coverage gap definition + threading into BREACH_COUNTER via the M5 evaluator +# --------------------------------------------------------------------------- + + +def test_compute_coverage_gap_counts_included_repos_without_health(): + entries = [ + CatalogEntry("repo_a", _URL_A, True, _NOW_ISO, _NOW_ISO, None), + CatalogEntry("repo_b", _URL_B, True, _NOW_ISO, _NOW_ISO, None), + CatalogEntry("repo_c", _URL_C, False, _NOW_ISO, _NOW_ISO, "opt-out"), + ] + # repo_a covered; repo_b not covered; repo_c excluded (not counted). 
+ assert compute_coverage_gap(entries, covered_repo_ids={"repo_a"}) == 1 + + +def test_excluded_repos_are_not_a_coverage_gap(): + entries = [ + CatalogEntry("repo_c", _URL_C, False, _NOW_ISO, _NOW_ISO, "opt-out"), + ] + assert compute_coverage_gap(entries, covered_repo_ids=set()) == 0 + + +def test_coverage_gap_threads_into_breach_counter_via_evaluator(): + store = _store() + # Two included org repos in the catalog. + run_catalog_reconcile( + store, FixtureOrgRepoListProvider([_URL_A, _URL_B]), now=_NOW + ) + # Only repo-a has ever successfully scanned (has a REPO_HEALTH row). + store.advance_repo_health( + repo_id_for_scan_target_url(_URL_A), + job_type=JOB_TYPE_INCREMENTAL, + completed_at=_NOW_ISO, + ) + + gap = coverage_gap_from_store(store) + assert gap == 1 # repo-b is included but uncovered + + run_freshness_evaluator( + store, + now=_LATER, + thresholds=FreshnessThresholds( + incremental_max_age_hours=2.0, baseline_max_age_hours=10.0 + ), + coverage_gap=gap, + ) + persisted = store.read_breach_counter() + assert persisted is not None + assert persisted.coverage_gap == 1 + + +# --------------------------------------------------------------------------- +# Governance seam: the default provider refuses to fetch live GitHub +# --------------------------------------------------------------------------- + + +def test_governance_gated_provider_refuses_live_fetch(): + provider = GovernanceGatedOrgRepoListProvider() + assert isinstance(provider, OrgRepoListProvider) + with pytest.raises(RuntimeError, match="governance-gated"): + provider.list_org_repos() + + +def test_reconcile_with_gated_default_provider_raises_and_writes_nothing(): + store = _store() + with pytest.raises(RuntimeError, match="governance-gated"): + run_catalog_reconcile( + store, GovernanceGatedOrgRepoListProvider(), now=_NOW + ) + assert store.read_all_catalog_entries() == [] # nothing written diff --git a/tests/test_cli.py b/tests/test_cli.py index 4db5694..868cdc5 100644 --- a/tests/test_cli.py 
+++ b/tests/test_cli.py @@ -708,7 +708,9 @@ def test_subcommand_registration_order_is_stable(): assert list(subparsers_action.choices) == [ "scan", "discover-updates", + "baseline", "scan-worker", + "reap-expired-leases", "queue-status", "residual", "residual-diff", @@ -716,6 +718,7 @@ def test_subcommand_registration_order_is_stable(): "import-sarif", "scan-vuln", "scan-health", + "freshness-eval", "report", "gate", "evaluate", @@ -734,4 +737,6 @@ def test_subcommand_registration_order_is_stable(): "backfill-repo-axis", "backfill-list-axis", "disposition", + "reconcile", + "read-api", ] diff --git a/tests/test_cli_reconcile.py b/tests/test_cli_reconcile.py new file mode 100644 index 0000000..5c96fdd --- /dev/null +++ b/tests/test_cli_reconcile.py @@ -0,0 +1,113 @@ +"""CLI tests for the M1 ``reconcile`` subcommand (FR-1). + +Covers the governance default: the CLI's default org-list provider REFUSES to +fetch live GitHub, so a plain ``reconcile`` invocation fails loudly with an +"inject a provider" message and writes nothing. Also covers the injected-fixture +happy path (the seam tests use to drive reconcile without touching GitHub). 
+""" + +from __future__ import annotations + +import pytest + +from security_scanner.cli import main +from security_scanner.cli.commands import reconcile as reconcile_cmd +from security_scanner.runtime.catalog_reconcile import FixtureOrgRepoListProvider +from security_scanner.storage.dynamodb_compatible.store import ( + DynamoDbCompatibleConfig, + DynamoDbCompatibleFindingStore, +) +from tests.test_dynamodb_compatible_store import ( + FakeDynamoClient, + FakeDynamoResource, + FakeDynamoTable, +) + +_URL_A = "https://github.com/org/repo-a" +_URL_B = "https://github.com/org/repo-b" + + +def _real_store() -> DynamoDbCompatibleFindingStore: + table = FakeDynamoTable() + return DynamoDbCompatibleFindingStore( + DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"), + resource=FakeDynamoResource(table), + client=FakeDynamoClient(table), + ) + + +@pytest.fixture +def fake_store(monkeypatch): + store = _real_store() + monkeypatch.setattr( + "security_scanner.cli._store.create_finding_store", + lambda backend, **kwargs: store, + ) + return store + + +def test_reconcile_jsonl_backend_rejected(fake_store, capsys): + code = main(["reconcile", "--storage-backend", "jsonl"]) + assert code == 2 + assert "dynamodb only" in capsys.readouterr().err + + +def test_reconcile_default_provider_refuses_live_fetch(fake_store, capsys): + # No provider injected -> the governance-gated default stub refuses to fetch. + code = main(["reconcile", "--storage-backend", "dynamodb"]) + assert code == 1 + assert "governance-gated" in capsys.readouterr().err + # Nothing was written: the gated provider raised before any catalog upsert. + assert fake_store.read_all_catalog_entries() == [] + + +def test_reconcile_with_injected_provider_writes_catalog(fake_store, capsys): + # The seam: inject a fixture provider via the parser default so the dispatch + # (args.func(args)) runs reconcile without reaching live GitHub. 
+ provider = FixtureOrgRepoListProvider([_URL_A, _URL_B]) + code = reconcile_cmd.cmd_reconcile( + _ns(storage_backend="dynamodb", opt_out=[], evaluate_freshness=False), + org_list_provider=provider, + ) + assert code == 0 + out = capsys.readouterr().out + assert "reconcile: OK 2 repos" in out + assert "coverage gap 2" in out # both included, none scanned yet + assert len(fake_store.read_all_catalog_entries()) == 2 + + +def test_reconcile_records_opt_out_via_cli(fake_store, capsys): + provider = FixtureOrgRepoListProvider([_URL_A, _URL_B]) + code = reconcile_cmd.cmd_reconcile( + _ns(storage_backend="dynamodb", opt_out=[_URL_B], evaluate_freshness=False), + org_list_provider=provider, + ) + assert code == 0 + assert "1 excluded" in capsys.readouterr().out + entries = {e.repo_id: e for e in fake_store.read_all_catalog_entries()} + excluded = [e for e in entries.values() if not e.included] + assert len(excluded) == 1 + assert excluded[0].excluded_reason == "opt-out" + + +class _Ns: + def __init__(self, **kwargs): + self.__dict__.update(kwargs) + + +def _ns(**kwargs): + # Defaults the reconcile command reads; overridden by kwargs as needed. + base = { + "storage_backend": "dynamodb", + "opt_out": [], + "evaluate_freshness": False, + "poll_interval_hours": 1.0, + "baseline_cadence_hours": 168.0, + "margin_hours": 2.0, + "dynamodb_table": None, + "dynamodb_endpoint_url": None, + "dynamodb_region": None, + "org_list_provider": None, + } + base.update(kwargs) + return _Ns(**base) diff --git a/tests/test_cli_scan_worker.py b/tests/test_cli_scan_worker.py index 09baf67..e902b95 100644 --- a/tests/test_cli_scan_worker.py +++ b/tests/test_cli_scan_worker.py @@ -41,16 +41,19 @@ def has_scan_ledger(self, key): return key in self.ledger_keys def acquire_repo_lease(self, repo_id, worker_id, lease_seconds): - return self.repo_lease_available + # New contract: fence token on success, None on failure (FR-6/SC-2). 
+ return 1 if self.repo_lease_available else None - def release_repo_lease(self, repo_id, worker_id): + def release_repo_lease(self, repo_id, worker_id, *, fence=None): return None def complete_processed_job(self, job, findings, ledger): self.completed.append((job, list(findings), ledger)) self.ledger_keys.add(ledger.key) - def record_retryable_failure(self, job_id, error, next_attempt_at): + def record_retryable_failure( + self, job_id, error, next_attempt_at, *, worker_id=None, fence=None + ): self.retry_failures.append((job_id, error, next_attempt_at)) def return_job_to_pending(self, job_id, reason): diff --git a/tests/test_cli_timer_entrypoints.py b/tests/test_cli_timer_entrypoints.py new file mode 100644 index 0000000..3e3be99 --- /dev/null +++ b/tests/test_cli_timer_entrypoints.py @@ -0,0 +1,411 @@ +"""CLI tests for the M3 timer entrypoints: reap-expired-leases + freshness-eval. + +M2 added the ``reap_expired_leases`` store op and M5 added +``run_freshness_evaluator``; both deliberately left the timer wiring to M3. These +tests exercise the new CLI surfaces the ``lease-reaper`` and ``freshness-eval`` +systemd timers invoke, using injected fake stores (no DynamoDB), mirroring the +existing scan-worker CLI test pattern. 
+""" + +from __future__ import annotations + +import datetime as dt + +from security_scanner.cli import main +from security_scanner.storage.base import ( + BreachCounter, + ReapSummary, + RepoHealth, +) + +# --- reap-expired-leases ---------------------------------------------------- + + +class FakeReaperStore: + def __init__(self, summary: ReapSummary) -> None: + self._summary = summary + self.reap_calls: list[dt.datetime] = [] + + def reap_expired_leases(self, now: dt.datetime) -> ReapSummary: + self.reap_calls.append(now) + return self._summary + + +def _patch_store(monkeypatch, store) -> None: + monkeypatch.setattr( + "security_scanner.cli._store.create_finding_store", + lambda backend, **kwargs: store, + ) + + +def test_reap_expired_leases_runs_one_sweep_and_reports(monkeypatch, capsys): + store = FakeReaperStore( + ReapSummary( + jobs_returned_to_pending=3, + jobs_dead_lettered=1, + repo_leases_released=2, + ) + ) + _patch_store(monkeypatch, store) + + exit_code = main(["reap-expired-leases", "--storage-backend", "dynamodb"]) + + out = capsys.readouterr().out + assert exit_code == 0 + assert len(store.reap_calls) == 1 + assert "jobs returned to pending: 3" in out + assert "jobs dead-lettered: 1" in out + assert "repo leases released: 2" in out + + +def test_reap_expired_leases_rejects_jsonl_backend(capsys): + exit_code = main(["reap-expired-leases", "--storage-backend", "jsonl"]) + + assert exit_code == 2 + assert "dynamodb only" in capsys.readouterr().err + + +def test_reap_expired_leases_reports_fatal_error(monkeypatch, capsys): + class BoomStore: + def reap_expired_leases(self, now): + raise RuntimeError("synthetic storage failure") + + _patch_store(monkeypatch, BoomStore()) + + exit_code = main(["reap-expired-leases", "--storage-backend", "dynamodb"]) + + assert exit_code == 1 + assert "reap-expired-leases failed" in capsys.readouterr().err + + +# --- freshness-eval --------------------------------------------------------- + + +class FakeFreshnessStore: + 
def __init__(self, health: list[RepoHealth]) -> None: + self._health = health + self.put_counters: list[BreachCounter] = [] + # M9: the freshness-eval CLI now routes detections through the alert + # dispatcher, which reads/writes per-(repo, kind) de-dup state on the + # store. A fresh store has no prior alert state, so every alert fires. + self._alert_state: dict[tuple[str, str], str] = {} + + def read_all_repo_health(self) -> list[RepoHealth]: + return list(self._health) + + def put_breach_counter(self, counter: BreachCounter) -> None: + self.put_counters.append(counter) + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + return self._alert_state.get((repo_id, kind)) + + def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + self._alert_state[(repo_id, kind)] = alerted_at + + +def _stale_health(repo_id: str) -> RepoHealth: + # Never-recorded timestamps => fail-closed breach on both classes. + return RepoHealth( + repo_id=repo_id, + last_successful_incremental_at=None, + last_successful_full_scan_at=None, + ) + + +def _fresh_health(repo_id: str, now: dt.datetime) -> RepoHealth: + recent = (now - dt.timedelta(minutes=1)).isoformat() + return RepoHealth( + repo_id=repo_id, + last_successful_incremental_at=recent, + last_successful_full_scan_at=recent, + ) + + +def test_freshness_eval_materializes_breach_counter(monkeypatch, capsys): + store = FakeFreshnessStore([_stale_health("repo_a"), _stale_health("repo_b")]) + _patch_store(monkeypatch, store) + + exit_code = main(["freshness-eval", "--storage-backend", "dynamodb"]) + + out = capsys.readouterr().out + # Detection (not a gate): a successful pass exits 0 even with breaches. + assert exit_code == 0 + # The materialized rollup the read API consumes was written exactly once. 
+ assert len(store.put_counters) == 1 + counter = store.put_counters[0] + assert counter.repos_evaluated == 2 + assert counter.total_breaches == 2 + assert "2 repos evaluated, 2 breached" in out + assert "BREACH_COUNTER materialized" in out + + +def test_freshness_eval_all_fresh_writes_zero_breach_counter(monkeypatch, capsys): + now = dt.datetime.now(dt.UTC) + store = FakeFreshnessStore([_fresh_health("repo_a", now)]) + _patch_store(monkeypatch, store) + + exit_code = main(["freshness-eval", "--storage-backend", "dynamodb"]) + + out = capsys.readouterr().out + assert exit_code == 0 + assert len(store.put_counters) == 1 + assert store.put_counters[0].total_breaches == 0 + assert "1 repos evaluated, 0 breached" in out + + +def test_freshness_eval_fires_alert_to_notification_log_for_stale_repo( + monkeypatch, tmp_path, capsys +): + """The F3 tie at the CLI surface: a scheduled detection of a stale repo + RESULTS IN an alert reaching the notification-log sink — not merely the + BREACH_COUNTER being written.""" + import json + + store = FakeFreshnessStore([_stale_health("repo_stale")]) + _patch_store(monkeypatch, store) + log_path = tmp_path / "alerts.jsonl" + + exit_code = main( + [ + "freshness-eval", + "--storage-backend", + "dynamodb", + "--notification-log", + str(log_path), + ] + ) + + assert exit_code == 0 + assert "alerts dispatched" in capsys.readouterr().out + # An alert record actually reached the sink (the existing JSONL seam). 
+ records = [ + json.loads(line) for line in log_path.read_text().splitlines() if line.strip() + ] + assert records, "expected at least one alert record on the notification log" + kinds = {r["kind"] for r in records if r["type"] == "alert"} + assert "incremental_sla_breach" in kinds + assert any(r["repo_id"] == "repo_stale" for r in records) + + +def test_freshness_eval_fresh_repo_fires_no_alert(monkeypatch, tmp_path): + """Silence only when truly fresh: a fresh repo writes the counter but no + alert record reaches the sink.""" + now = dt.datetime.now(dt.UTC) + store = FakeFreshnessStore([_fresh_health("repo_fresh", now)]) + _patch_store(monkeypatch, store) + log_path = tmp_path / "alerts.jsonl" + + exit_code = main( + [ + "freshness-eval", + "--storage-backend", + "dynamodb", + "--notification-log", + str(log_path), + ] + ) + + assert exit_code == 0 + # No alert file written at all (no alert ever emitted). + assert not log_path.exists() or log_path.read_text().strip() == "" + + +def test_freshness_eval_rejects_jsonl_backend(capsys): + exit_code = main(["freshness-eval", "--storage-backend", "jsonl"]) + + assert exit_code == 2 + assert "dynamodb only" in capsys.readouterr().err + + +def test_freshness_eval_reports_fatal_error(monkeypatch, capsys): + class BoomStore: + def read_all_repo_health(self): + raise RuntimeError("synthetic storage failure") + + _patch_store(monkeypatch, BoomStore()) + + exit_code = main(["freshness-eval", "--storage-backend", "dynamodb"]) + + assert exit_code == 1 + assert "freshness-eval failed" in capsys.readouterr().err + + +# --- incr-poll cadence-overrun (SC-6d, the M4->M9 deployed-path tie) --------- +# +# These prove the DEPLOYED incr-poll path (``discover-updates`` — what the +# incr-poll.service timer invokes) FIRES a cadence-overrun alert to the M9 sink +# when a poll cycle exceeds its cadence budget, and fires nothing when it does +# not. 
They drive the SAME entrypoint function ``args.func`` resolves to +# (``cmd_discover_updates``) built from the timer's exact argv, injecting the +# monotonic ``clock`` seam to make the cycle wall-time deterministic without a +# real sleep. This is the gap the unit-level M4 mechanism tests did NOT cover: +# that the deployed command actually measures and routes the overrun. + +from pathlib import Path # noqa: E402 + +from security_scanner.catalog.scan_target import ScanTarget # noqa: E402 +from security_scanner.cli.app import build_parser # noqa: E402 +from security_scanner.cli.commands.scan import cmd_discover_updates # noqa: E402 +from security_scanner.runtime.incremental_discovery import GitRef # noqa: E402 +from security_scanner.storage.base import RefState, ScanJob, ScanLedgerKey # noqa: E402 + +_POLL_TARGET = ScanTarget( + url="https://github.com/example-org/poll-repo", + name="example-org/poll-repo", + enabled=True, +) +_POLL_REF = "refs/remotes/origin/main" +_POLL_SHA = "a" * 40 + + +class _FakePollStore: + """Minimal IncrementalScanStore + ALERT_STATE fake for the poll path.""" + + def __init__(self) -> None: + self.ref_states: dict[tuple[str, str], RefState] = {} + self.jobs: dict[str, ScanJob] = {} + self.ledger: set[ScanLedgerKey] = set() + self._alert_state: dict[tuple[str, str], str] = {} + + def list_scan_targets(self) -> list[ScanTarget]: + return [_POLL_TARGET] + + def get_ref_state(self, repo_id: str, ref_name: str) -> RefState | None: + return self.ref_states.get((repo_id, ref_name)) + + def put_ref_state(self, state: RefState) -> None: + self.ref_states[(state.repo_id, state.ref_name)] = state + + def has_scan_ledger(self, key: ScanLedgerKey) -> bool: + return key in self.ledger + + def enqueue_commit_scan_job(self, job: ScanJob) -> bool: + if job.job_id in self.jobs: + return False + self.jobs[job.job_id] = job + return True + + def read_alert_state(self, repo_id: str, kind: str) -> str | None: + return self._alert_state.get((repo_id, kind)) + + 
def put_alert_state(self, repo_id: str, kind: str, alerted_at: str) -> None: + self._alert_state[(repo_id, kind)] = alerted_at + + +class _FakePollGit: + def __init__(self, repo_path: Path) -> None: + self.repo_path = repo_path + + def fetch(self, repo_path: Path) -> None: + return None + + def list_remote_refs(self, repo_path: Path, patterns) -> list[GitRef]: + return [GitRef(ref_name=_POLL_REF, commit_sha=_POLL_SHA)] + + def is_ancestor(self, repo_path, old_sha, new_sha) -> bool: + return True + + def list_new_commits(self, repo_path, old_sha, new_sha) -> list[str]: + return [] + + +def _poll_args(tmp_path, *, cadence_seconds: float): + """Parse the EXACT incr-poll.service argv into resolved args. + + Uses the real parser so the test exercises the deployed entrypoint + (``args.func is cmd_discover_updates``), with a temp notification-log path. + """ + log_path = tmp_path / "alerts.jsonl" + parser = build_parser() + args = parser.parse_args( + [ + "discover-updates", + "--initialize", + "--from-catalog", + "--storage-backend", + "dynamodb", + "--cadence-seconds", + str(cadence_seconds), + "--notification-log", + str(log_path), + ] + ) + assert args.func is cmd_discover_updates # it IS the deployed timer entrypoint + return args, log_path + + +def _patch_poll_discovery(monkeypatch, tmp_path): + store = _FakePollStore() + repo_path = tmp_path / "poll-repo" + git = _FakePollGit(repo_path) + monkeypatch.setattr( + "security_scanner.cli._store.create_finding_store", + lambda backend, **kwargs: store, + ) + monkeypatch.setattr( + "security_scanner.runtime.incremental_discovery.catalog_repo_targets", + lambda s: [_POLL_TARGET], + ) + monkeypatch.setattr( + "security_scanner.cli.commands.scan.catalog_repo_targets", + lambda s: [_POLL_TARGET], + ) + monkeypatch.setattr( + "security_scanner.cli.commands.scan.fetch_or_clone", lambda url: repo_path + ) + monkeypatch.setattr( + "security_scanner.cli.commands.scan.SubprocessGitDiscovery", lambda: git + ) + return store + + +def 
_read_alert_records(log_path: Path) -> list[dict]: + import json + + if not log_path.exists(): + return [] + return [ + json.loads(line) + for line in log_path.read_text().splitlines() + if line.strip() + ] + + +def test_discover_updates_deployed_path_fires_cadence_overrun_alert( + monkeypatch, tmp_path, capsys +): + """The deployed incr-poll command (what the timer runs) FIRES a + cadence-overrun alert to the M9 sink when the poll cycle exceeds its + cadence budget.""" + _patch_poll_discovery(monkeypatch, tmp_path) + args, log_path = _poll_args(tmp_path, cadence_seconds=300.0) + + # Monotonic clock: started=0.0, finished=600.0 => 600s cycle > 300s budget. + ticks = iter([0.0, 600.0]) + exit_code = cmd_discover_updates(args, clock=lambda: next(ticks)) + + assert exit_code == 0 + records = _read_alert_records(log_path) + assert records, "expected a cadence-overrun alert on the notification log" + overruns = [r for r in records if r.get("kind") == "cadence_overrun"] + assert overruns, "deployed poll path did not fire a cadence_overrun alert" + detail = overruns[0]["detail"] + assert detail["cycleSeconds"] == 600.0 + assert detail["cadenceSeconds"] == 300.0 + assert detail["overrunSeconds"] == 300.0 + + +def test_discover_updates_deployed_path_within_budget_fires_nothing( + monkeypatch, tmp_path +): + """A within-budget poll cycle on the deployed path stays SILENT: no alert + record reaches the sink.""" + _patch_poll_discovery(monkeypatch, tmp_path) + args, log_path = _poll_args(tmp_path, cadence_seconds=300.0) + + # started=0.0, finished=10.0 => 10s cycle, well within the 300s budget. 
+ ticks = iter([0.0, 10.0]) + exit_code = cmd_discover_updates(args, clock=lambda: next(ticks)) + + assert exit_code == 0 + assert _read_alert_records(log_path) == [] diff --git a/tests/test_dynamodb_compatible_store.py b/tests/test_dynamodb_compatible_store.py index 564592c..c09775d 100644 --- a/tests/test_dynamodb_compatible_store.py +++ b/tests/test_dynamodb_compatible_store.py @@ -12,6 +12,7 @@ from boto3.dynamodb.types import TypeDeserializer from security_scanner.core.finding.model import ( + Disposition, Finding, GitleaksFindingPayload, Status, @@ -104,7 +105,16 @@ def put_item(self, *, Item: dict, **kwargs) -> dict: ): raise self.ConditionalCheckFailedException("conditional check failed") self.put_calls.append(Item) - self.items.append(Item) + # Real PutItem REPLACES the item at (PK, SK); model that so a re-put of the + # same key (e.g. catalog reconcile re-upsert) overwrites instead of leaving + # a stale duplicate that get_item/scan would then read. ``put_calls`` still + # records every call for write-sequence assertions. + for index, existing in enumerate(self.items): + if existing.get("PK") == Item["PK"] and existing.get("SK") == Item["SK"]: + self.items[index] = Item + break + else: + self.items.append(Item) return {"ResponseMetadata": {"HTTPStatusCode": 200}} def batch_writer(self): @@ -124,6 +134,60 @@ def get_item(self, *, Key: dict, **kwargs) -> dict: return {"Item": item} return {} + def update_item( + self, + *, + Key: dict, + UpdateExpression: str, + ConditionExpression: str | None = None, + ExpressionAttributeNames: dict | None = None, + ExpressionAttributeValues: dict | None = None, + **kwargs, + ) -> dict: + """Minimal SET-only UpdateItem with attribute-scoped advancing CAS. + + Supports exactly the shapes the REPO_HEALTH conditional advance uses: + ``SET a = :x, b = :y`` assignments and a condition of the form + ``attribute_not_exists(#at) OR :t > #at`` (upsert-on-absent, advance + only when strictly newer). 
Anything else is unsupported on purpose so a + test that diverges fails loudly rather than silently passing. + """ + names = ExpressionAttributeNames or {} + values = ExpressionAttributeValues or {} + existing = self.get_item(Key=Key).get("Item") + + if ConditionExpression is not None: + if not self._evaluate_advancing_condition( + ConditionExpression, names, values, existing + ): + raise self.ConditionalCheckFailedException("conditional check failed") + + item = existing if existing is not None else {**Key} + for assignment in UpdateExpression.removeprefix("SET ").split(", "): + target, value_key = assignment.split(" = ", 1) + attr = names.get(target, target) + item[attr] = values[value_key] + if existing is None: + self.items.append(item) + return {"ResponseMetadata": {"HTTPStatusCode": 200}} + + @staticmethod + def _evaluate_advancing_condition( + condition: str, names: dict, values: dict, existing: dict | None + ) -> bool: + # Only the REPO_HEALTH advancing-CAS shape is modeled: + # "attribute_not_exists(#at) OR :t > #at". + absent_clause, _, advance_clause = condition.partition(" OR ") + attr_token = absent_clause[ + absent_clause.index("(") + 1 : absent_clause.index(")") + ].strip() + attr = names.get(attr_token, attr_token) + if existing is None or attr not in existing: + return True + value_token, _, attr_again = advance_clause.partition(" > ") + candidate = values[value_token.strip()] + return candidate > existing[attr] + def query(self, **kwargs) -> dict: self.query_calls.append(kwargs) index_name = kwargs.get("IndexName") @@ -151,6 +215,12 @@ def query(self, **kwargs) -> dict: ) if "Limit" in kwargs: items = items[: kwargs["Limit"]] + # Model Select=COUNT: real DynamoDB returns only a Count and transfers no + # item bodies. The read-API backlog path (SC-7) relies on this so it never + # materializes SCAN_JOB rows. Returning no "Items" also lets a test prove + # the count path is NOT the item-reading/Scan path. 
+ if kwargs.get("Select") == "COUNT": + return {"Count": len(items), "ScannedCount": len(items)} return {"Items": items} def scan(self, **kwargs) -> dict: @@ -1279,6 +1349,42 @@ def test_set_finding_disposition_updates_state_and_appends_event(): ] == {"N": "1"} +def test_disposition_surfaces_through_read_after_verifier_write(): + """A verifier verdict write updates the finding's disposition on read (FR-11/F6). + + M6 seam check: the verifier track writes only a verdict via + set_finding_disposition (no separate disposition field). The merge-on-read + path overlays FINDING_STATE.triage onto the snapshot, so read_for_scan_run + returns the finding with its disposition derived from the NEW verdict — the + field the dashboard filters on tracks the verifier with zero translation. + """ + finding = _make(triage_verdict=Verdict.NEEDS_REVIEW.value) + table = FakeDynamoTable(finding_to_items(finding)) + store = DynamoDbCompatibleFindingStore( + DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"), + resource=FakeDynamoResource(table), + client=FakeDynamoClient(table), + ) + + # Before any verdict, the finding reads back as unreviewed. 
+ [before] = store.read_for_scan_run(SCAN_RUN_ID) + assert before.disposition == Disposition.UNREVIEWED.value + + store.set_finding_disposition( + finding.finding_id, + status=Status.OPEN.value, + verdict=Verdict.TRUE_POSITIVE.value, + actor="lfm2.5-thinking", + source="verifier", + reason="Synthetic real secret.", + repo=finding.repo.full_name, + rule_id=finding.rule_id, + ) + + [after] = store.read_for_scan_run(SCAN_RUN_ID) + assert after.disposition == Disposition.VERIFIED.value + + def test_set_finding_disposition_retries_optimistic_conflict(): finding = _make(triage_verdict=Verdict.NEEDS_REVIEW.value) table = FakeDynamoTable(finding_to_items(finding)) diff --git a/tests/test_dynamodb_local_integration.py b/tests/test_dynamodb_local_integration.py new file mode 100644 index 0000000..7dd0747 --- /dev/null +++ b/tests/test_dynamodb_local_integration.py @@ -0,0 +1,124 @@ +"""Real DynamoDB-compatible engine integration smoke (Segment A blockers). + +Skipped by default. Runs against a REAL DynamoDB Local (or ScyllaDB Alternator) +endpoint when ``RUN_DDB_LOCAL_SMOKE=1``, exercising the four review-blocker +mechanisms against the actual engine rather than the in-memory fakes: + +- SC-1 bounded ordered dequeue (real GSI1 query + Limit + conditional lease) +- SC-2 lease fencing (real ConditionExpression rejects a stale holder) +- SC-5 REPO_HEALTH attribute-scoped conditional UpdateItem (no clobber/regress) +- SC-7 queue backlog via real ``Select=COUNT`` (no full-table Scan) + +Run: + RUN_DDB_LOCAL_SMOKE=1 SECURITY_SCANNER_DYNAMO_ENDPOINT=http://localhost:4567 \ + uv run pytest tests/test_dynamodb_local_integration.py -q -v +""" + +from __future__ import annotations + +import datetime as dt +import os +import time +import uuid + +import pytest + +from security_scanner.storage.adapters.nosql_db.transport import DynamoDbCompatibleConfig +from security_scanner.storage.base import ScanJob +from security_scanner.storage.factory import create_finding_store + +pytestmark = 
pytest.mark.skipif( + os.environ.get("RUN_DDB_LOCAL_SMOKE") != "1", + reason="real DynamoDB-Local integration; set RUN_DDB_LOCAL_SMOKE=1", +) + +NOW = dt.datetime(2026, 6, 20, tzinfo=dt.timezone.utc) + + +def _store(): + endpoint = os.environ.get( + "SECURITY_SCANNER_DYNAMO_ENDPOINT", "http://localhost:4567" + ) + cfg = DynamoDbCompatibleConfig( + table_name=f"SegA_{uuid.uuid4().hex[:10]}", + endpoint_url=endpoint, + region_name="us-east-1", + aws_access_key_id="dummy", + aws_secret_access_key="dummy", + ) + store = create_finding_store("dynamodb", dynamodb_config=cfg) + store.ensure_table() + return store + + +def _job(commit_sha: str, *, priority: int = 100) -> ScanJob: + jid = f"job_{commit_sha}" + return ScanJob( + job_id=jid, + repo_id="repo-1", + repo_url="https://example.com/org/repo-1.git", + ref_name="refs/remotes/origin/main", + old_sha="0" * 40, + new_sha=commit_sha, + commit_sha=commit_sha, + commit_range=f"{'0' * 40}..{commit_sha}", + scanner_name="gitleaks", + scanner_version="8.0.0", + rule_pack_version="rp1", + scanner_config_hash="cfg1", + priority=priority, + status="pending", + attempts=0, + max_attempts=3, + worker_id=None, + lease_until=None, + next_attempt_at=NOW, + created_at=NOW, + updated_at=NOW, + ) + + +def test_real_engine_segment_a_blockers(): + store = _store() + + # --- SC-1 + SC-7: enqueue, real Select=COUNT backlog, bounded dequeue --- + for i in range(3): + assert store.enqueue_commit_scan_job(_job(f"{i:040d}")) is True + + assert store.count_scan_jobs_by_status("pending") == 3 + backlog = store.read_queue_backlog() + assert backlog.job_counts_by_status.get("pending") == 3 + assert backlog.job_counts_by_status.get("leased", 0) == 0 + + leased = store.lease_next_scan_job("worker-a", lease_seconds=300, now=NOW) + assert leased is not None and leased.status == "leased" + assert store.count_scan_jobs_by_status("pending") == 2 + assert store.count_scan_jobs_by_status("leased") == 1 + assert store.read_queue_backlog().backlog == 3 # 
pending(2)+leased(1) + + # --- SC-2: lease fencing rejects a stale holder (real ConditionExpression) --- + fence_a = store.acquire_repo_lease("repo-fence", "worker-a", lease_seconds=1) + assert isinstance(fence_a, int) + time.sleep(1.3) # let worker-a's lease expire + fence_b = store.acquire_repo_lease("repo-fence", "worker-b", lease_seconds=300) + assert isinstance(fence_b, int) and fence_b > fence_a + # stale worker-a tries to release with its old fence -> must NOT delete worker-b's lease + store.release_repo_lease("repo-fence", "worker-a", fence=fence_a) + # worker-b still holds it -> a fresh acquire by worker-c is refused + assert store.acquire_repo_lease("repo-fence", "worker-c", lease_seconds=300) is None + + # --- SC-5: REPO_HEALTH attribute-scoped conditional UpdateItem (no clobber/regress) --- + t_incr = "2026-06-20T01:00:00+00:00" + t_base = "2026-06-20T02:00:00+00:00" + store.advance_repo_health("repo-h", job_type="incremental", completed_at=t_incr) + store.advance_repo_health("repo-h", job_type="baseline", completed_at=t_base) + health = store.read_repo_health("repo-h") + assert health is not None + # both fields preserved (baseline write did NOT clobber the incremental field) + assert health.last_successful_incremental_at == t_incr + assert health.last_successful_full_scan_at == t_base + # an OLDER incremental completion must not regress the newer timestamp + store.advance_repo_health( + "repo-h", job_type="incremental", completed_at="2026-06-19T00:00:00+00:00" + ) + assert store.read_repo_health("repo-h").last_successful_incremental_at == t_incr diff --git a/tests/test_finding.py b/tests/test_finding.py index 6496bb9..51647c0 100644 --- a/tests/test_finding.py +++ b/tests/test_finding.py @@ -16,8 +16,12 @@ import pytest from security_scanner.core.finding.model import ( + DEFAULT_DISPOSITION, + DISPOSITION_TO_VERDICT, + VERDICT_TO_DISPOSITION, Category, ConfidenceLevel, + Disposition, Evidence, Finding, GitleaksFindingPayload, @@ -31,7 +35,11 @@ 
combine_repo_fingerprint, compute_finding_id, compute_fingerprint, + disposition_for_verdict, + filter_findings_by_disposition, hash_secret, + order_findings_for_dashboard, + verdict_for_disposition, ) @@ -533,3 +541,132 @@ def test_roundtrip_with_none_optional_fields(self): assert restored.repo.branch is None assert restored.repo.commit is None assert restored.evidence.context_artifact_ref is None + + +# --------------------------------------------------------------------------- +# Disposition field (M6 / FR-11 / F6) +# --------------------------------------------------------------------------- + +class TestDispositionMapping: + """Verdict <-> disposition reconciliation table (F6) is correct + total.""" + + def test_mapping_matches_design_vocabulary(self): + # Exact pairs the design pins: verified<->true_positive, + # false_positive<->false_positive, unreviewed<->needs_review. + assert DISPOSITION_TO_VERDICT == { + "verified": Verdict.TRUE_POSITIVE.value, + "false_positive": Verdict.FALSE_POSITIVE.value, + "unreviewed": Verdict.NEEDS_REVIEW.value, + } + + def test_mapping_is_total_over_every_verdict(self): + # Every canonical Verdict the verifier track can produce maps to a + # disposition — no verdict is left untranslatable. 
+ for verdict in list(Verdict): + assert verdict.value in VERDICT_TO_DISPOSITION + + def test_mapping_is_total_over_every_disposition(self): + for disposition in list(Disposition): + assert disposition.value in DISPOSITION_TO_VERDICT + + def test_mapping_is_bijective_roundtrip(self): + for disposition in list(Disposition): + verdict = verdict_for_disposition(disposition.value) + assert disposition_for_verdict(verdict) == disposition.value + for verdict in list(Verdict): + disposition = disposition_for_verdict(verdict.value) + assert verdict_for_disposition(disposition) == verdict.value + + def test_default_disposition_is_unreviewed(self): + assert DEFAULT_DISPOSITION == Disposition.UNREVIEWED.value + assert disposition_for_verdict(Verdict.NEEDS_REVIEW.value) == "unreviewed" + + def test_unknown_verdict_falls_back_to_default(self): + assert disposition_for_verdict("SOME_FUTURE_VERDICT") == DEFAULT_DISPOSITION + + def test_verdict_for_unknown_disposition_raises(self): + with pytest.raises(ValueError, match="Invalid disposition"): + verdict_for_disposition("not_a_disposition") + + +class TestFindingDisposition: + def test_finding_defaults_to_unreviewed(self): + # An untriaged finding (verdict defaults to NEEDS_REVIEW) is unreviewed. 
+ assert _make().disposition == Disposition.UNREVIEWED.value + + def test_true_positive_finding_is_verified(self): + f = _make(triage_verdict=Verdict.TRUE_POSITIVE.value) + assert f.disposition == Disposition.VERIFIED.value + + def test_false_positive_finding_is_false_positive(self): + f = _make(triage_verdict=Verdict.FALSE_POSITIVE.value) + assert f.disposition == Disposition.FALSE_POSITIVE.value + + def test_disposition_emitted_in_to_dict(self): + d = _make(triage_verdict=Verdict.TRUE_POSITIVE.value).to_dict() + assert d["disposition"] == Disposition.VERIFIED.value + + def test_disposition_round_trips_through_dict(self): + f = _make(triage_verdict=Verdict.TRUE_POSITIVE.value) + restored = Finding.from_dict(f.to_dict()) + assert restored.disposition == Disposition.VERIFIED.value + + def test_legacy_finding_without_disposition_decodes_to_default(self): + # A finding written before this field existed has no "disposition" key. + d = _make().to_dict() + del d["disposition"] + restored = Finding.from_dict(d) + assert restored.disposition == Disposition.UNREVIEWED.value + + def test_stale_disposition_in_payload_is_ignored_not_trusted(self): + # disposition is derived from triage.verdict, never trusted from the + # wire: a payload whose disposition disagrees with its verdict decodes + # to the verdict-derived value, so the two can never drift. 
+ d = _make(triage_verdict=Verdict.FALSE_POSITIVE.value).to_dict() + d["disposition"] = "verified" # deliberately inconsistent + restored = Finding.from_dict(d) + assert restored.disposition == Disposition.FALSE_POSITIVE.value + + def test_disposition_does_not_break_equality_roundtrip(self): + f = _make(triage_verdict=Verdict.TRUE_POSITIVE.value) + assert Finding.from_dict(f.to_dict()) == f + + +class TestDispositionFilter: + def _findings_by_disposition(self): + return { + "unreviewed": _make(line_start=1), + "verified": _make(line_start=2, triage_verdict=Verdict.TRUE_POSITIVE.value), + "false_positive": _make( + line_start=3, triage_verdict=Verdict.FALSE_POSITIVE.value + ), + } + + def test_filter_returns_only_requested_disposition(self): + by_disp = self._findings_by_disposition() + result = filter_findings_by_disposition( + by_disp.values(), [Disposition.FALSE_POSITIVE.value] + ) + assert result == [by_disp["false_positive"]] + + def test_filter_dashboard_default_surfaces_unreviewed_and_verified(self): + by_disp = self._findings_by_disposition() + result = filter_findings_by_disposition( + by_disp.values(), + [Disposition.UNREVIEWED.value, Disposition.VERIFIED.value], + ) + assert result == [by_disp["unreviewed"], by_disp["verified"]] + assert by_disp["false_positive"] not in result + + def test_filter_rejects_unknown_disposition(self): + with pytest.raises(ValueError, match="Invalid disposition filter"): + filter_findings_by_disposition([], ["bogus"]) + + def test_dashboard_order_deemphasizes_false_positive(self): + by_disp = self._findings_by_disposition() + # Feed in worst-emphasis-first to prove the helper reorders. 
+ ordered = order_findings_for_dashboard( + [by_disp["false_positive"], by_disp["verified"], by_disp["unreviewed"]] + ) + dispositions = [f.disposition for f in ordered] + assert dispositions == ["unreviewed", "verified", "false_positive"] diff --git a/tests/test_finding_query_runtime.py b/tests/test_finding_query_runtime.py index b4ab8c4..820d2c2 100644 --- a/tests/test_finding_query_runtime.py +++ b/tests/test_finding_query_runtime.py @@ -2,6 +2,7 @@ from __future__ import annotations +from security_scanner.core.finding.model import Disposition, Finding, Verdict from security_scanner.runtime.finding_query import ( FindingQueryRequest, read_findings, @@ -9,6 +10,20 @@ from security_scanner.storage.adapters.nosql_db.transport import DynamoDbCompatibleConfig +def _finding(line_start: int, verdict: str) -> Finding: + return Finding.create( + repo_full_name="fake-org/fake-repo", + rule_id="aws-access-key-id", + file_path="config/settings.py", + line_start=line_start, + raw_secret="AKIAFAKEEXAMPLE000000", + source_tool="gitleaks", + scan_run_id="scan_abc12345", + rule_pack_version="secret-rules-0.1.0", + triage_verdict=verdict, + ) + + def test_read_findings_reads_all_when_scan_run_id_absent(): """Query runtime reads the full store when no scan-run filter is requested.""" findings = [object()] @@ -163,3 +178,53 @@ def fake_store_factory(backend, **kwargs): assert result == findings assert calls == [("dynamodb-compatible", {"dynamodb_config": None})] + + +def test_read_findings_filters_by_disposition_when_requested(): + """The read-API seam slices findings by disposition for the dashboard (FR-11).""" + unreviewed = _finding(1, Verdict.NEEDS_REVIEW.value) + verified = _finding(2, Verdict.TRUE_POSITIVE.value) + false_positive = _finding(3, Verdict.FALSE_POSITIVE.value) + + class FakeReader: + def read_all(self): + return [unreviewed, verified, false_positive] + + def read_for_scan_run(self, scan_run_id): + raise AssertionError(scan_run_id) + + result = read_findings( + 
FindingQueryRequest( + storage_backend="jsonl", + jsonl_path="findings.jsonl", + dispositions=[Disposition.UNREVIEWED.value, Disposition.VERIFIED.value], + ), + store=FakeReader(), + ) + + assert result == [unreviewed, verified] + + +def test_read_findings_disposition_filter_applies_to_scan_run_reads(): + """Disposition filtering composes with the scan-run filter.""" + verified = _finding(2, Verdict.TRUE_POSITIVE.value) + false_positive = _finding(3, Verdict.FALSE_POSITIVE.value) + + class FakeReader: + def read_all(self): + raise AssertionError("scan-run query should not call read_all") + + def read_for_scan_run(self, scan_run_id): + return [verified, false_positive] + + result = read_findings( + FindingQueryRequest( + storage_backend="dynamodb", + scan_run_id="scan_abc12345", + dynamodb_config=DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"), + dispositions=[Disposition.FALSE_POSITIVE.value], + ), + store=FakeReader(), + ) + + assert result == [false_positive] diff --git a/tests/test_incremental_discovery.py b/tests/test_incremental_discovery.py index 1c4dc13..3924bad 100644 --- a/tests/test_incremental_discovery.py +++ b/tests/test_incremental_discovery.py @@ -43,6 +43,13 @@ def list_scan_targets(self) -> list[ScanTarget]: def get_ref_state(self, repo_id: str, ref_name: str) -> RefState | None: return self.ref_states.get((repo_id, ref_name)) + def list_ref_states(self, repo_id: str) -> list[RefState]: + return [ + state + for (rid, _ref), state in self.ref_states.items() + if rid == repo_id + ] + def put_ref_state(self, state: RefState) -> None: self.ref_states[(state.repo_id, state.ref_name)] = state diff --git a/tests/test_incremental_scan_smoke.py b/tests/test_incremental_scan_smoke.py index f6f74c6..8c5304c 100644 --- a/tests/test_incremental_scan_smoke.py +++ b/tests/test_incremental_scan_smoke.py @@ -120,6 +120,9 @@ def record_retryable_failure( job_id: str, error: str, next_attempt_at: dt.datetime, + *, + worker_id: str | None = None, + fence: 
int | None = None, ) -> None: raise AssertionError(f"unexpected retryable failure: {job_id} {error}") @@ -134,18 +137,27 @@ def acquire_repo_lease( repo_id: str, worker_id: str, lease_seconds: int, - ) -> bool: + ) -> int | None: + existing = self.repo_leases.get(repo_id) + fence = (existing.fence + 1) if existing is not None else 1 self.repo_leases[repo_id] = RepoLease( repo_id=repo_id, worker_id=worker_id, lease_until=NOW + dt.timedelta(seconds=lease_seconds), updated_at=NOW, + fence=fence, ) - return True + return fence - def release_repo_lease(self, repo_id: str, worker_id: str) -> None: + def release_repo_lease( + self, repo_id: str, worker_id: str, *, fence: int | None = None + ) -> None: lease = self.repo_leases.get(repo_id) - if lease and lease.worker_id == worker_id: + if ( + lease + and lease.worker_id == worker_id + and (fence is None or lease.fence == fence) + ): del self.repo_leases[repo_id] def get_queue_status(self, now: dt.datetime) -> QueueStatus: diff --git a/tests/test_incremental_scan_storage.py b/tests/test_incremental_scan_storage.py index 907ff19..104cfa9 100644 --- a/tests/test_incremental_scan_storage.py +++ b/tests/test_incremental_scan_storage.py @@ -4,10 +4,10 @@ import datetime as dt import hashlib +import random from security_scanner.core.finding.model import Finding from security_scanner.storage.adapters.nosql_db.items import ( - datetime_to_iso, ref_state_from_item, ref_state_to_item, repo_id_for_scan_target_url, @@ -32,7 +32,6 @@ ScanLedgerEntry, ) - NOW = dt.datetime(2026, 6, 12, 9, 0, tzinfo=dt.UTC) REPO_URL = "https://github.com/example-org/example-repo" REPO_ID = repo_id_for_scan_target_url(REPO_URL) @@ -110,6 +109,14 @@ def query(self, **kwargs) -> dict: key=lambda item: item.get(sk_attr, ""), reverse=not kwargs.get("ScanIndexForward", True), ) + if "Limit" in kwargs: + # honor the per-shard bound so bounded-dequeue boundedness is real. 
+ items = items[: kwargs["Limit"]] + # Select=COUNT returns only a count and no item bodies, modeling real + # DynamoDB so the SC-7 read-API backlog path can prove it never reads + # SCAN_JOB rows (and never Scans the table). + if kwargs.get("Select") == "COUNT": + return {"Count": len(items), "ScannedCount": len(items)} return {"Items": [dict(item) for item in items]} def scan(self, **kwargs) -> dict: @@ -142,7 +149,38 @@ def _condition_allows(self, kwargs: dict, existing: dict | None) -> bool: if expression == "attribute_not_exists(PK) OR leaseUntil <= :now": return existing is None or existing["leaseUntil"] <= values[":now"] if expression == "workerId = :worker_id": - return existing is not None and existing.get("workerId") == values[":worker_id"] + return ( + existing is not None + and existing.get("workerId") == values[":worker_id"] + ) + if expression == "workerId = :worker_id AND fence = :fence": + # fence-aware repo-lease release: only the current holder (same + # worker AND same fence) may delete the lease (FR-6/SC-2). + return ( + existing is not None + and existing.get("workerId") == values[":worker_id"] + and existing.get("fence") == values[":fence"] + ) + if expression == ( + "leaseUntil <= :now AND workerId = :worker_id AND fence = :fence" + ): + # reaper repo-lease delete: still expired AND still the same holder. + return ( + existing is not None + and existing.get("leaseUntil", "") <= values[":now"] + and existing.get("workerId") == values[":worker_id"] + and existing.get("fence") == values[":fence"] + ) + if expression == ( + "#status = :leased AND leaseUntil <= :now AND fence = :fence" + ): + # reaper job reclaim: still leased, still expired, same fence. 
+ return ( + existing is not None + and existing.get("status") == values[":leased"] + and existing.get("leaseUntil", "") <= values[":now"] + and existing.get("fence") == values[":fence"] + ) if "nextAttemptAt <= :now" in expression: if existing is None: return False @@ -153,11 +191,18 @@ def _condition_allows(self, kwargs: dict, existing: dict | None) -> bool: existing.get("status") == values[":leased"] and existing.get("leaseUntil", "") <= values[":now"] ) - if "#status = :leased OR #status = :completed" in expression: - return existing is not None and existing.get("status") in { - values[":leased"], - values[":completed"], - } + if "#status = :completed OR" in expression and "fence = :fence" in expression: + # fence-aware completion: idempotent re-complete OR the leased job + # still owned by this worker+fence (FR-6/SC-2). + if existing is None: + return False + if existing.get("status") == values[":completed"]: + return True + return ( + existing.get("status") == values[":leased"] + and existing.get("workerId") == values[":worker_id"] + and existing.get("fence") == values[":fence"] + ) raise AssertionError(f"unsupported condition expression: {expression}") @@ -288,9 +333,9 @@ def test_incremental_item_mappers_round_trip_all_entity_types(): def test_repo_id_and_job_id_are_deterministic_from_contract_fields(): normalized = "https://github.com/example-org/example-repo" - expected_repo_id = "repo_" + hashlib.sha256( - normalized.encode("utf-8") - ).hexdigest()[:24] + expected_repo_id = ( + "repo_" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:24] + ) assert repo_id_for_scan_target_url(f"{normalized}/") == expected_repo_id assert repo_id_for_scan_target_url(f"{normalized}?tab=code") == expected_repo_id @@ -382,8 +427,10 @@ def test_expired_job_lease_can_be_reclaimed(): def test_repo_lease_acquire_release_and_expired_count_are_conditional(): store, table = _make_store() - assert store.acquire_repo_lease(REPO_ID, "worker-a", lease_seconds=60) is True - 
assert store.acquire_repo_lease(REPO_ID, "worker-b", lease_seconds=60) is False + # acquire now returns the granted fence token (truthy) or None (FR-6/SC-2). + first_fence = store.acquire_repo_lease(REPO_ID, "worker-a", lease_seconds=60) + assert first_fence == 1 + assert store.acquire_repo_lease(REPO_ID, "worker-b", lease_seconds=60) is None store.release_repo_lease(REPO_ID, "worker-b") assert table.get_item(Key={"PK": f"REPO_LEASE#{REPO_ID}", "SK": "META"}).get("Item") @@ -396,9 +443,11 @@ def test_repo_lease_acquire_release_and_expired_count_are_conditional(): worker_id="worker-old", lease_until=dt.datetime(2000, 1, 1, tzinfo=dt.UTC), updated_at=dt.datetime(2000, 1, 1, tzinfo=dt.UTC), + fence=7, ) table.put_item(Item=repo_lease_to_item(expired)) - assert store.acquire_repo_lease(REPO_ID, "worker-c", lease_seconds=60) is True + # stealing an expired lease mints a strictly larger fence (prior + 1). + assert store.acquire_repo_lease(REPO_ID, "worker-c", lease_seconds=60) == 8 status = store.get_queue_status(now=dt.datetime(2100, 1, 1, tzinfo=dt.UTC)) assert status.expired_repo_leases == 1 @@ -555,7 +604,9 @@ def test_ledger_present_leased_job_completes_without_rewriting_findings(): table.put_item(Item=scan_ledger_entry_to_item(ledger)) before = len(table.put_calls) - store.complete_processed_job(leased, findings=[_make_finding(leased)], ledger=ledger) + store.complete_processed_job( + leased, findings=[_make_finding(leased)], ledger=ledger + ) written_types = [item["entityType"] for item in table.put_calls[before:]] assert written_types == ["SCAN_JOB"] @@ -605,3 +656,401 @@ def test_rule_pack_version_change_invalidates_ledger_and_triggers_rescan(): scan_jobs = [item for item in table.items if item["entityType"] == "SCAN_JOB"] assert len(scan_jobs) == 1 assert scan_jobs[0]["rulePackVersion"] == "secret-rules-0.2.0" + + +# --- M2: fencing token (FR-6/SC-2) ---------------------------------------- + + +def test_scan_job_item_defaults_fence_for_legacy_items(): + # 
backward-compatible decode: a SCAN_JOB item written before fencing has no + # fence/leaseExpiryCount attributes and must decode with safe defaults. + item = scan_job_to_item(_make_job()) + for attr in ("fence", "leaseExpiryCount", "maxLeaseExpiries"): + item.pop(attr, None) + + decoded = scan_job_from_item(item) + assert decoded.fence == 0 + assert decoded.lease_expiry_count == 0 + assert decoded.max_lease_expiries == 5 + + +def test_lease_bumps_fence_so_completion_is_fenced(): + store, table = _make_store() + job = _make_job() + store.enqueue_commit_scan_job(job) # fence 0 at enqueue + + leased = store.lease_next_scan_job("worker-a", lease_seconds=60, now=NOW) + assert leased is not None + # (re)lease minted a strictly larger fence carried by the worker. + assert leased.fence == 1 + + +def test_fence_rejects_stale_workers_completion_after_reclaim(): + # The double-scan + stale-completion race (FR-6/SC-2): worker-a leases, its + # lease is reaped + re-leased to worker-b, then the slow worker-a tries to + # complete with its now-stale fence and MUST be rejected. + store, table = _make_store() + job = _make_job() + store.enqueue_commit_scan_job(job) + + a_leased = store.lease_next_scan_job("worker-a", lease_seconds=1, now=NOW) + assert a_leased is not None and a_leased.fence == 1 + + # lease expires; reaper reclaims and fences worker-a out. + later = NOW + dt.timedelta(seconds=5) + summary = store.reap_expired_leases(now=later) + assert summary.jobs_returned_to_pending == 1 + + # worker-b re-leases (fence advances again). + b_leased = store.lease_next_scan_job("worker-b", lease_seconds=60, now=later) + assert b_leased is not None and b_leased.fence == 3 # +1 reap, +1 re-lease + + # stale worker-a completion with fence=1 is rejected; the job is NOT marked + # completed and stays leased by worker-b.
+ store.complete_processed_job( + a_leased, findings=[_make_finding(a_leased)], ledger=_make_ledger(a_leased) + ) + current = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{job.job_id}", "SK": "META"})["Item"] + ) + assert current.status == "leased" + assert current.worker_id == "worker-b" + assert current.fence == 3 + + # the legitimate current holder (worker-b) can still complete. + store.complete_processed_job( + b_leased, findings=[_make_finding(b_leased)], ledger=_make_ledger(b_leased) + ) + status = store.get_queue_status(now=later + dt.timedelta(minutes=10)) + assert status.job_counts_by_status == {"completed": 1} + + +def test_fence_rejects_stale_retryable_failure_after_reclaim(): + store, table = _make_store() + job = _make_job() + store.enqueue_commit_scan_job(job) + a_leased = store.lease_next_scan_job("worker-a", lease_seconds=1, now=NOW) + assert a_leased is not None + + later = NOW + dt.timedelta(seconds=5) + store.reap_expired_leases(now=later) + b_leased = store.lease_next_scan_job("worker-b", lease_seconds=60, now=later) + assert b_leased is not None + + # stale worker-a failure write with its old fence is a no-op. + store.record_retryable_failure( + job.job_id, + error="stale failure from reaped worker", + next_attempt_at=later + dt.timedelta(minutes=5), + worker_id="worker-a", + fence=a_leased.fence, + ) + current = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{job.job_id}", "SK": "META"})["Item"] + ) + assert current.status == "leased" + assert current.worker_id == "worker-b" + assert current.attempts == 0 # the stale failure did not consume an attempt + assert current.last_error != "stale failure from reaped worker" + + +def test_fence_rejects_stale_repo_lease_release_after_reclaim(): + # release_repo_lease must not delete a lease the worker no longer owns: after + # worker-a's lease is stolen (expired) by worker-c with a larger fence, a + # late release from worker-a (fence=1) leaves worker-c's lease intact. 
+ store, table = _make_store() + a_fence = store.acquire_repo_lease(REPO_ID, "worker-a", lease_seconds=0) + assert a_fence == 1 + + # lease is immediately expired (lease_seconds=0); worker-c steals it. + c_fence = store.acquire_repo_lease(REPO_ID, "worker-c", lease_seconds=60) + assert c_fence == 2 + + store.release_repo_lease(REPO_ID, "worker-a", fence=a_fence) + held = table.get_item(Key={"PK": f"REPO_LEASE#{REPO_ID}", "SK": "META"}).get("Item") + assert held is not None + assert held["workerId"] == "worker-c" + assert held["fence"] == 2 + + # the real owner can release. + store.release_repo_lease(REPO_ID, "worker-c", fence=c_fence) + assert not table.get_item(Key={"PK": f"REPO_LEASE#{REPO_ID}", "SK": "META"}) + + +# --- M2: lease-reaper (FR-6) ---------------------------------------------- + + +def test_reaper_reclaims_expired_job_lease_and_fences_old_holder(): + store, table = _make_store() + expired = _make_job( + status="leased", + worker_id="worker-old", + lease_until=NOW - dt.timedelta(seconds=1), + ) + expired = ScanJob(**{**expired.__dict__, "fence": 4}) + table.put_item(Item=scan_job_to_item(expired)) + + summary = store.reap_expired_leases(now=NOW) + + assert summary.jobs_returned_to_pending == 1 + assert summary.jobs_dead_lettered == 0 + reclaimed = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{expired.job_id}", "SK": "META"})["Item"] + ) + assert reclaimed.status == "pending" + assert reclaimed.worker_id is None + assert reclaimed.fence == 5 # fenced out the prior holder + assert reclaimed.lease_expiry_count == 1 + # attempts is NOT consumed by a lease expiry (distinct from failure). 
+ assert reclaimed.attempts == 0 + + +def test_reaper_releases_expired_repo_lease_and_leaves_live_one(): + store, table = _make_store() + table.put_item( + Item=repo_lease_to_item( + RepoLease( + repo_id=REPO_ID, + worker_id="worker-dead", + lease_until=NOW - dt.timedelta(seconds=1), + updated_at=NOW, + fence=3, + ) + ) + ) + live_repo = "repo_live00000000000000000001" + table.put_item( + Item=repo_lease_to_item( + RepoLease( + repo_id=live_repo, + worker_id="worker-live", + lease_until=NOW + dt.timedelta(minutes=5), + updated_at=NOW, + fence=1, + ) + ) + ) + + summary = store.reap_expired_leases(now=NOW) + + assert summary.repo_leases_released == 1 + assert not table.get_item(Key={"PK": f"REPO_LEASE#{REPO_ID}", "SK": "META"}) + assert table.get_item(Key={"PK": f"REPO_LEASE#{live_repo}", "SK": "META"}).get( + "Item" + ) + + +def test_reaper_does_not_touch_live_job_lease(): + store, table = _make_store() + live = _make_job( + status="leased", + worker_id="worker-busy", + lease_until=NOW + dt.timedelta(minutes=10), + ) + table.put_item(Item=scan_job_to_item(live)) + + summary = store.reap_expired_leases(now=NOW) + + assert summary == summary.__class__() # all zero + current = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{live.job_id}", "SK": "META"})["Item"] + ) + assert current.status == "leased" + assert current.worker_id == "worker-busy" + + +# --- M2: bounded ordered dequeue (FR-5/SC-1) ------------------------------ + + +def _pending_job(commit_digit: str, next_attempt_at: dt.datetime) -> ScanJob: + return _make_job(commit_sha=commit_digit * 40, next_attempt_at=next_attempt_at) + + +def test_bounded_candidate_window_is_globally_ordered_and_bounded(): + # FR-5/SC-1: the candidate window is the GLOBAL FIFO head (earliest + # nextAttemptAt first), truncated to `window`, read with a per-shard Limit + # and never via a full-partition SCAN. 
+ store, table = _make_store() + base = NOW - dt.timedelta(minutes=20) + jobs = [_pending_job(str(i), base + dt.timedelta(minutes=i)) for i in range(1, 7)] + for job in jobs: + store.enqueue_commit_scan_job(job) + + table.query_calls.clear() + table.scan_calls.clear() + + window = store._read_lease_candidate_window(now=NOW, window=3, include_legacy=False) + + # exactly the 3 earliest-available jobs, in FIFO order. + assert [j.job_id for j in window] == [j.job_id for j in jobs[:3]] + # bounded: a per-shard Limit was sent, and NO full-table SCAN_JOB scan ran. + assert any(call.get("Limit") == 3 for call in table.query_calls) + assert all( + call["ExpressionAttributeValues"].get(":entity_type") != "SCAN_JOB" + for call in table.scan_calls + ) + + +def test_single_worker_leases_head_of_its_preferred_shard_order(): + # With one worker, the leased job is the head of that worker's spread order + # (preferred shard first, FIFO within), drawn from the bounded window — and + # it is necessarily a member of the global head window (no full read). 
+ store, _ = _make_store() + base = NOW - dt.timedelta(minutes=20) + jobs = [_pending_job(str(i), base + dt.timedelta(minutes=i)) for i in range(1, 7)] + for job in jobs: + store.enqueue_commit_scan_job(job) + + window = store._read_lease_candidate_window(now=NOW, window=6, include_legacy=False) + expected_head = store._spread_candidates(window, "worker-a")[0] + + leased = store.lease_next_scan_job("worker-a", lease_seconds=60, now=NOW) + + assert leased is not None + assert leased.job_id == expected_head.job_id + + +def test_bounded_dequeue_window_does_not_read_whole_partition(): + store, table = _make_store() + base = NOW - dt.timedelta(minutes=30) + for i in range(1, 13): + store.enqueue_commit_scan_job( + _pending_job(f"{i:x}"[-1], base + dt.timedelta(minutes=i)) + ) + + table.query_calls.clear() + store.lease_next_scan_job("worker-a", lease_seconds=60, now=NOW, dequeue_window=3) + + # every queue query carried a Limit (== window) — never an unbounded read. + queue_queries = [ + call + for call in table.query_calls + if str(call["ExpressionAttributeValues"].get(":pk", "")).startswith( + "SCAN_JOB_STATUS#" + ) + ] + assert queue_queries + assert all(call.get("Limit") == 3 for call in queue_queries) + + +def test_cas_loss_applies_jittered_backoff_and_recovers(monkeypatch): + # SC-4: when a CAS attempt loses the head to another worker, the dequeue + # applies a (deterministic, injected) jittered backoff and retries the next + # candidate rather than giving up or hot-looping. 
+ store, _ = _make_store() + base = NOW - dt.timedelta(minutes=10) + for i in range(1, 4): + store.enqueue_commit_scan_job( + _pending_job(str(i), base + dt.timedelta(minutes=i)) + ) + + real_try = store._try_lease_scan_job + state = {"first": True} + + def flaky_try(**kwargs): + if state["first"]: + state["first"] = False + return None # simulate losing the head to a concurrent worker + return real_try(**kwargs) + + monkeypatch.setattr(store, "_try_lease_scan_job", flaky_try) + sleeps: list[float] = [] + + leased = store.lease_next_scan_job( + "worker-a", + lease_seconds=60, + now=NOW, + rng=random.Random(0), + jitter_sleep=sleeps.append, + ) + + assert leased is not None # recovered on a later candidate + assert len(sleeps) == 1 # exactly one jittered backoff after the CAS loss + assert 0.0 <= sleeps[0] <= 0.05 + + +def test_concurrent_workers_do_not_double_lease_the_same_job(): + store, _ = _make_store() + job = _make_job() + store.enqueue_commit_scan_job(job) + + results = [ + store.lease_next_scan_job(f"worker-{i}", lease_seconds=60, now=NOW) + for i in range(8) + ] + leased = [r for r in results if r is not None] + + # exactly one worker wins the single job; the rest get None (CAS losers). + assert len(leased) == 1 + assert leased[0].status == "leased" + + +def test_dequeue_spreads_across_workers_via_preferred_shard(): + # with many distinct jobs, two workers leasing concurrently should be able to + # each take a job (preferred-shard spread reduces head-of-line contention). 
+ store, _ = _make_store() + base = NOW - dt.timedelta(minutes=20) + for i in range(1, 9): + store.enqueue_commit_scan_job( + _pending_job(f"{i:x}"[-1], base + dt.timedelta(minutes=i)) + ) + + a = store.lease_next_scan_job("worker-a", lease_seconds=60, now=NOW) + b = store.lease_next_scan_job("worker-b", lease_seconds=60, now=NOW) + + assert a is not None and b is not None + assert a.job_id != b.job_id + + +# --- M2: starvation backoff (SC-8) ---------------------------------------- + + +def test_repeated_lease_expiry_backs_off_next_attempt_off_the_head(): + store, table = _make_store() + job = _make_job() + store.enqueue_commit_scan_job(job) + + # lease + let it expire + reap, twice; nextAttemptAt should be pushed forward + # so the job stops sitting on the FIFO head. + t = NOW + for _ in range(2): + leased = store.lease_next_scan_job("worker-x", lease_seconds=1, now=t) + assert leased is not None + t = t + dt.timedelta(seconds=5) + store.reap_expired_leases(now=t) + + reclaimed = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{job.job_id}", "SK": "META"})["Item"] + ) + assert reclaimed.status == "pending" + assert reclaimed.lease_expiry_count == 2 + # pushed strictly past the reap instant (exponential starvation backoff). + assert reclaimed.next_attempt_at > t + + +def test_lease_expiry_beyond_max_routes_to_dead_letter_separate_from_attempts(): + store, table = _make_store() + # max_lease_expiries=2 so the 2nd expiry reap dead-letters; attempts stays 0 + # (lease-expiry budget is SEPARATE from the failure-attempts budget). 
+ job = _make_job(max_attempts=3) + job = ScanJob(**{**job.__dict__, "max_lease_expiries": 2}) + store.enqueue_commit_scan_job(job) + + t = NOW + dead = False + for _ in range(2): + leased = store.lease_next_scan_job("worker-x", lease_seconds=1, now=t) + if leased is None: + break + t = t + dt.timedelta(seconds=5) + summary = store.reap_expired_leases(now=t) + if summary.jobs_dead_lettered: + dead = True + + assert dead is True + final = scan_job_from_item( + table.get_item(Key={"PK": f"SCAN_JOB#{job.job_id}", "SK": "META"})["Item"] + ) + assert final.status == "dead_letter" + assert final.attempts == 0 # not a failure attempt + assert final.lease_expiry_count == 2 diff --git a/tests/test_m4_poll_baseline.py b/tests/test_m4_poll_baseline.py new file mode 100644 index 0000000..c9101ee --- /dev/null +++ b/tests/test_m4_poll_baseline.py @@ -0,0 +1,781 @@ +"""M4 (FR-2/FR-3, SC-3/SC-6): catalog-driven poll + baseline queue routing. + +Autonomously-verifiable LOGIC with injectable fakes — no docker, no real git, +no network. Covers the reviewer's SC-3 (baseline per-repo enqueue + priority + +backpressure + rolling) and SC-6 (ls-remote skip, bounded concurrent fetch, +cache isolation, cadence-overrun alert) findings. 
+""" + +from __future__ import annotations + +import datetime as dt +from pathlib import Path + +import pytest + +from security_scanner.catalog.scan_target import ScanTarget +from security_scanner.runtime.baseline_enqueue import ( + BASELINE_JOB_PRIORITY, + BaselineEnqueueRequest, + BaselineScannerConfig, + run_baseline_enqueue, + select_rolling_slice, +) +from security_scanner.runtime.incremental_discovery import ( + DEFAULT_JOB_PRIORITY, + DEFAULT_REF_PATTERNS, + DISCOVERY_MODE_ENQUEUE, + DISCOVERY_MODE_INITIALIZE, + DiscoveryScannerConfig, + GitRef, + IncrementalDiscoveryRequest, + PollCadenceSignal, + _cursor_shas_for, + alert_poll_cadence_overrun, + catalog_repo_targets, + evaluate_poll_cadence, + run_incremental_discovery, +) +from security_scanner.runtime.poll_fetch import ( + CacheRoots, + FetchOutcome, + FetchTask, + SerialFetchExecutor, + SubprocessLsRemoteRunner, + decide_ls_remote_skip, +) +from security_scanner.storage.adapters.nosql_db.items import ( + repo_id_for_scan_target_url, +) +from security_scanner.storage.base import ( + JOB_TYPE_BASELINE, + JOB_TYPE_INCREMENTAL, + CatalogEntry, + QueueStatus, + RefState, + ScanJob, + ScanLedgerKey, +) + +NOW = dt.datetime(2026, 6, 20, 10, 0, tzinfo=dt.UTC) +REF_MAIN = "refs/remotes/origin/main" +OLD_SHA = "a" * 40 +NEW_SHA = "c" * 40 + +INCLUDED_A = "https://github.com/example-org/repo-a" +INCLUDED_B = "https://github.com/example-org/repo-b" +EXCLUDED_C = "https://github.com/example-org/opted-out" + + +# --------------------------------------------------------------------------- +# Fakes +# --------------------------------------------------------------------------- + + +class FakeStore: + """Fake catalog + incremental queue store for M4 logic tests.""" + + def __init__( + self, + *, + catalog: list[CatalogEntry] | None = None, + targets: list[ScanTarget] | None = None, + ) -> None: + self.catalog = catalog or [] + self.targets = targets or [] + self.ref_states: dict[tuple[str, str], RefState] = {} + 
self.jobs: dict[str, ScanJob] = {} + self.ledger: set[ScanLedgerKey] = set() + self.pending_backlog = 0 + + # CatalogStore + def read_all_catalog_entries(self) -> list[CatalogEntry]: + return list(self.catalog) + + # IncrementalScanStore (subset used by discovery/baseline) + def list_scan_targets(self) -> list[ScanTarget]: + return list(self.targets) + + def get_ref_state(self, repo_id: str, ref_name: str) -> RefState | None: + return self.ref_states.get((repo_id, ref_name)) + + def list_ref_states(self, repo_id: str) -> list[RefState]: + return [ + state + for (rid, _ref), state in self.ref_states.items() + if rid == repo_id + ] + + def put_ref_state(self, state: RefState) -> None: + self.ref_states[(state.repo_id, state.ref_name)] = state + + def has_scan_ledger(self, key: ScanLedgerKey) -> bool: + return key in self.ledger + + def enqueue_commit_scan_job(self, job: ScanJob) -> bool: + if job.job_id in self.jobs: + return False + self.jobs[job.job_id] = job + return True + + def get_queue_status(self, now: dt.datetime) -> QueueStatus: + return QueueStatus( + job_counts_by_status={"pending": self.pending_backlog}, + expired_job_leases=0, + expired_repo_leases=0, + ) + + +class FakeGit: + def __init__(self) -> None: + self.refs_by_path: dict[Path, list[GitRef]] = {} + self.ancestor: dict[tuple[Path, str, str], bool] = {} + self.commits: dict[tuple[Path, str, str], list[str]] = {} + + def fetch(self, repo_path: Path) -> None: + return None + + def list_remote_refs(self, repo_path: Path, patterns) -> list[GitRef]: + return self.refs_by_path.get(repo_path, []) + + def is_ancestor(self, repo_path: Path, old_sha: str, new_sha: str) -> bool: + return self.ancestor[(repo_path, old_sha, new_sha)] + + def list_new_commits(self, repo_path, old_sha, new_sha) -> list[str]: + return self.commits[(repo_path, old_sha, new_sha)] + + +class FakeLsRemote: + """Fake ls-remote runner returning a canned {url: {ref: sha}} map.""" + + def __init__(self, by_url: dict[str, dict[str, str]]) 
-> None: + self.by_url = by_url + self.calls: list[str] = [] + + def ls_remote(self, repo_url: str, patterns): + self.calls.append(repo_url) + return dict(self.by_url.get(repo_url, {})) + + +class RecordingFetchExecutor: + """Executor that records peak in-flight to prove the bound is honored.""" + + def __init__(self, max_concurrency_seen: list[int]) -> None: + self.max_concurrency_seen = max_concurrency_seen + self.fetched_urls: list[str] = [] + + def map_bounded(self, fn, tasks, max_concurrency): + self.max_concurrency_seen.append(max_concurrency) + outcomes = [] + for task in tasks: + self.fetched_urls.append(task.repo_url) + outcomes.append(fn(task)) + return outcomes + + +def _scanner() -> DiscoveryScannerConfig: + return DiscoveryScannerConfig( + scanner_name="gitleaks", + scanner_version="unknown", + rule_pack_version="secret-rules-0.1.0", + scanner_config_hash="default", + ) + + +def _baseline_scanner() -> BaselineScannerConfig: + return BaselineScannerConfig( + scanner_name="gitleaks", + scanner_version="unknown", + rule_pack_version="secret-rules-0.1.0", + scanner_config_hash="default", + ) + + +def _catalog() -> list[CatalogEntry]: + return [ + CatalogEntry( + repo_id=repo_id_for_scan_target_url(INCLUDED_A), + repo_url=INCLUDED_A, + included=True, + first_seen="2026-06-01T00:00:00+00:00", + last_reconciled="2026-06-20T00:00:00+00:00", + ), + CatalogEntry( + repo_id=repo_id_for_scan_target_url(INCLUDED_B), + repo_url=INCLUDED_B, + included=True, + first_seen="2026-06-01T00:00:00+00:00", + last_reconciled="2026-06-20T00:00:00+00:00", + ), + CatalogEntry( + repo_id=repo_id_for_scan_target_url(EXCLUDED_C), + repo_url=EXCLUDED_C, + included=False, + first_seen="2026-06-01T00:00:00+00:00", + last_reconciled="2026-06-20T00:00:00+00:00", + excluded_reason="opt-out", + ), + ] + + +# --------------------------------------------------------------------------- +# 1. 
Catalog-driven discovery (FR-2) +# --------------------------------------------------------------------------- + + +def test_catalog_repo_targets_skips_excluded_repos(): + store = FakeStore(catalog=_catalog()) + targets = catalog_repo_targets(store) + urls = {t.url for t in targets} + assert urls == {INCLUDED_A, INCLUDED_B} + assert EXCLUDED_C not in urls + + +def test_discovery_consumes_catalog_targets_not_manifest(): + # Manifest list is intentionally different from the catalog to prove the + # catalog seam (passed via targets=) wins. + store = FakeStore( + catalog=_catalog(), + targets=[ScanTarget(url="https://github.com/old/manifest", name="m")], + ) + git = FakeGit() + repo_path = Path("/mirror/a") + git.refs_by_path[repo_path] = [GitRef(ref_name=REF_MAIN, commit_sha=NEW_SHA)] + + summary = run_incremental_discovery( + IncrementalDiscoveryRequest( + mode=DISCOVERY_MODE_INITIALIZE, + store=store, + fetch_repo=lambda url: repo_path, + git=git, + scanner=_scanner(), + targets=catalog_repo_targets(store), + now_factory=lambda: NOW, + ) + ) + + # Two INCLUDED catalog repos, excluded one skipped; manifest repo ignored. + assert summary.targets == 2 + + +# --------------------------------------------------------------------------- +# 2. 
ls-remote skip (SC-6a) +# --------------------------------------------------------------------------- + + +def test_decide_ls_remote_skip_unchanged_is_not_changed(): + decision = decide_ls_remote_skip( + repo_id="repo_x", + repo_url=INCLUDED_A, + ls_remote=FakeLsRemote({INCLUDED_A: {REF_MAIN: OLD_SHA}}), + cursor_shas={REF_MAIN: OLD_SHA}, + patterns=(REF_MAIN,), + ) + assert decision.changed is False + + +def test_decide_ls_remote_skip_moved_ref_is_changed(): + decision = decide_ls_remote_skip( + repo_id="repo_x", + repo_url=INCLUDED_A, + ls_remote=FakeLsRemote({INCLUDED_A: {REF_MAIN: NEW_SHA}}), + cursor_shas={REF_MAIN: OLD_SHA}, + patterns=(REF_MAIN,), + ) + assert decision.changed is True + + +def test_decide_ls_remote_skip_probe_error_is_failsafe_changed(): + class Boom: + def ls_remote(self, repo_url, patterns): + raise RuntimeError("ls-remote boom") + + decision = decide_ls_remote_skip( + repo_id="repo_x", + repo_url=INCLUDED_A, + ls_remote=Boom(), + cursor_shas={REF_MAIN: OLD_SHA}, + patterns=(REF_MAIN,), + ) + # Fail-safe: a probe error never silently suppresses a fetch. + assert decision.changed is True + assert decision.error is not None + + +def test_subprocess_ls_remote_passes_patterns_to_git(monkeypatch): + # Regression: patterns were accepted but never appended to the git command, + # so the remote returned ALL refs instead of the requested subset. 
+ captured: dict[str, list[str]] = {} + + class _Result: + stdout = "" + + def fake_run(cmd, **kwargs): + captured["cmd"] = list(cmd) + return _Result() + + monkeypatch.setattr( + "security_scanner.runtime.poll_fetch.subprocess.run", fake_run + ) + SubprocessLsRemoteRunner().ls_remote( + INCLUDED_A, ["refs/heads/main", "refs/heads/release/*"] + ) + + assert captured["cmd"] == [ + "git", + "ls-remote", + INCLUDED_A, + "refs/heads/main", + "refs/heads/release/*", + ] + + +def test_discovery_ls_remote_unchanged_skips_fetch_no_job(): + store = FakeStore(catalog=_catalog()) + repo_id_a = repo_id_for_scan_target_url(INCLUDED_A) + repo_id_b = repo_id_for_scan_target_url(INCLUDED_B) + # Both repos have a cursor at OLD_SHA; ls-remote reports OLD_SHA -> idle. + for rid, url in ((repo_id_a, INCLUDED_A), (repo_id_b, INCLUDED_B)): + store.put_ref_state( + RefState( + repo_id=rid, + repo_url=url, + ref_name=REF_MAIN, + last_seen_sha=OLD_SHA, + updated_at=NOW, + ) + ) + ls_remote = FakeLsRemote( + {INCLUDED_A: {REF_MAIN: OLD_SHA}, INCLUDED_B: {REF_MAIN: OLD_SHA}} + ) + fetched_urls: list[str] = [] + + def fetch_repo(url: str) -> Path: + fetched_urls.append(url) + return Path("/mirror") / url.rsplit("/", 1)[-1] + + summary = run_incremental_discovery( + IncrementalDiscoveryRequest( + mode=DISCOVERY_MODE_ENQUEUE, + store=store, + fetch_repo=fetch_repo, + git=FakeGit(), + scanner=_scanner(), + targets=catalog_repo_targets(store), + ref_patterns=(REF_MAIN,), + ls_remote=ls_remote, + now_factory=lambda: NOW, + ) + ) + + assert summary.skipped_idle == 2 + assert summary.fetch_ok == 0 + assert fetched_urls == [] # SKIP: no fetch ran for idle repos + assert summary.jobs_enqueued == 0 + + +def test_cursor_shas_for_glob_pattern_yields_non_empty_cursor(): + # Regression: with a GLOB ref pattern (the default), _cursor_shas_for used to + # skip every pattern and return an empty cursor, defeating the ls-remote skip. 
+ store = FakeStore(catalog=_catalog()) + repo_id = repo_id_for_scan_target_url(INCLUDED_A) + store.put_ref_state( + RefState( + repo_id=repo_id, + repo_url=INCLUDED_A, + ref_name=REF_MAIN, # refs/remotes/origin/main matches the glob + last_seen_sha=OLD_SHA, + updated_at=NOW, + ) + ) + request = IncrementalDiscoveryRequest( + mode=DISCOVERY_MODE_ENQUEUE, + store=store, + fetch_repo=lambda url: Path("/mirror"), + git=FakeGit(), + scanner=_scanner(), + targets=catalog_repo_targets(store), + ref_patterns=DEFAULT_REF_PATTERNS, # ("refs/remotes/origin/*",) + ) + + cursor = _cursor_shas_for(request, repo_id) + + assert cursor == {REF_MAIN: OLD_SHA} + + +def test_discovery_ls_remote_glob_pattern_unchanged_skips_fetch(): + # End-to-end: an unchanged repo matched by a GLOB ref pattern must be skipped + # (no fetch, no job). Pre-fix this always fetched because the cursor was empty. + store = FakeStore(catalog=_catalog()) + for url in (INCLUDED_A, INCLUDED_B): + store.put_ref_state( + RefState( + repo_id=repo_id_for_scan_target_url(url), + repo_url=url, + ref_name=REF_MAIN, + last_seen_sha=OLD_SHA, + updated_at=NOW, + ) + ) + ls_remote = FakeLsRemote( + {INCLUDED_A: {REF_MAIN: OLD_SHA}, INCLUDED_B: {REF_MAIN: OLD_SHA}} + ) + fetched_urls: list[str] = [] + + def fetch_repo(url: str) -> Path: + fetched_urls.append(url) + return Path("/mirror") / url.rsplit("/", 1)[-1] + + summary = run_incremental_discovery( + IncrementalDiscoveryRequest( + mode=DISCOVERY_MODE_ENQUEUE, + store=store, + fetch_repo=fetch_repo, + git=FakeGit(), + scanner=_scanner(), + targets=catalog_repo_targets(store), + ref_patterns=DEFAULT_REF_PATTERNS, # glob, the common case + ls_remote=ls_remote, + now_factory=lambda: NOW, + ) + ) + + assert summary.skipped_idle == 2 + assert summary.fetch_ok == 0 + assert fetched_urls == [] + assert summary.jobs_enqueued == 0 + + +def test_discovery_ls_remote_changed_fetches_and_enqueues_incremental_job(): + store = FakeStore( + catalog=[ + CatalogEntry( + 
repo_id=repo_id_for_scan_target_url(INCLUDED_A), + repo_url=INCLUDED_A, + included=True, + first_seen="2026-06-01T00:00:00+00:00", + last_reconciled="2026-06-20T00:00:00+00:00", + ) + ] + ) + repo_id = repo_id_for_scan_target_url(INCLUDED_A) + store.put_ref_state( + RefState( + repo_id=repo_id, + repo_url=INCLUDED_A, + ref_name=REF_MAIN, + last_seen_sha=OLD_SHA, + updated_at=NOW, + ) + ) + repo_path = Path("/mirror/repo-a") + git = FakeGit() + git.refs_by_path[repo_path] = [GitRef(ref_name=REF_MAIN, commit_sha=NEW_SHA)] + git.ancestor[(repo_path, OLD_SHA, NEW_SHA)] = True + git.commits[(repo_path, OLD_SHA, NEW_SHA)] = [NEW_SHA] + ls_remote = FakeLsRemote({INCLUDED_A: {REF_MAIN: NEW_SHA}}) + + summary = run_incremental_discovery( + IncrementalDiscoveryRequest( + mode=DISCOVERY_MODE_ENQUEUE, + store=store, + fetch_repo=lambda url: repo_path, + git=git, + scanner=_scanner(), + targets=catalog_repo_targets(store), + ref_patterns=(REF_MAIN,), + ls_remote=ls_remote, + now_factory=lambda: NOW, + ) + ) + + assert summary.skipped_idle == 0 + assert summary.fetch_ok == 1 + assert summary.jobs_enqueued == 1 + job = next(iter(store.jobs.values())) + assert job.job_type == JOB_TYPE_INCREMENTAL + assert job.priority == DEFAULT_JOB_PRIORITY + + +# --------------------------------------------------------------------------- +# 3. Bounded concurrent fetch + cache isolation (SC-6b/c) +# --------------------------------------------------------------------------- + + +def test_bounded_executor_receives_concurrency_and_fetches_changed_only(): + store = FakeStore(catalog=_catalog()) + repo_id_a = repo_id_for_scan_target_url(INCLUDED_A) + # repo-a idle (cursor matches), repo-b new (no cursor) -> only b fetched. 
+    store.put_ref_state(
+        RefState(
+            repo_id=repo_id_a,
+            repo_url=INCLUDED_A,
+            ref_name=REF_MAIN,
+            last_seen_sha=OLD_SHA,
+            updated_at=NOW,
+        )
+    )
+    ls_remote = FakeLsRemote(
+        {INCLUDED_A: {REF_MAIN: OLD_SHA}, INCLUDED_B: {REF_MAIN: NEW_SHA}}
+    )
+    seen: list[int] = []
+    executor = RecordingFetchExecutor(seen)
+
+    summary = run_incremental_discovery(
+        IncrementalDiscoveryRequest(
+            mode=DISCOVERY_MODE_INITIALIZE,
+            store=store,
+            fetch_repo=lambda url: Path("/mirror") / url.rsplit("/", 1)[-1],
+            git=FakeGit(),
+            scanner=_scanner(),
+            targets=catalog_repo_targets(store),
+            ref_patterns=(REF_MAIN,),
+            ls_remote=ls_remote,
+            fetch_executor=executor,
+            fetch_concurrency=4,
+            now_factory=lambda: NOW,
+        )
+    )
+
+    assert summary.skipped_idle == 1
+    assert seen == [4]  # the bound was passed to the executor
+    assert executor.fetched_urls == [INCLUDED_B]  # only the changed repo
+
+
+def test_serial_executor_isolates_per_repo_fetch_failure():
+    executor = SerialFetchExecutor()
+
+    def fn(task: FetchTask) -> FetchOutcome:
+        if task.repo_url.endswith("/repo-b"):
+            raise RuntimeError("boom")
+        return FetchOutcome(
+            repo_id=task.repo_id,
+            repo_url=task.repo_url,
+            ok=True,
+            repo_path=Path("/mirror") / task.repo_url.rsplit("/", 1)[-1],
+        )
+
+    tasks = [
+        FetchTask(repo_id="a", repo_url=INCLUDED_A),
+        FetchTask(repo_id="b", repo_url=INCLUDED_B),
+    ]
+    outcomes = executor.map_bounded(fn, tasks, max_concurrency=2)
+    assert outcomes[0].ok is True
+    assert outcomes[1].ok is False
+    assert outcomes[1].error is not None
+
+
+def test_cache_roots_must_differ_for_isolation():
+    # SC-6c: poll mirror and worker checkout must not be the same .git tree.
+    CacheRoots(
+        poll_mirror_root=Path("/cache/poll"),
+        worker_checkout_root=Path("/cache/worker"),
+    )
+    with pytest.raises(ValueError):
+        CacheRoots(
+            poll_mirror_root=Path("/cache/same"),
+            worker_checkout_root=Path("/cache/same"),
+        )
+
+
+# ---------------------------------------------------------------------------
+# 4. Baseline per-repo enqueue (SC-3)
+# ---------------------------------------------------------------------------
+
+
+def test_baseline_enqueues_one_low_priority_job_per_included_repo():
+    store = FakeStore(catalog=_catalog())
+    summary = run_baseline_enqueue(
+        BaselineEnqueueRequest(
+            catalog_store=store,
+            queue_store=store,
+            scanner=_baseline_scanner(),
+            rolling_divisor=1,  # full pass
+            now_factory=lambda: NOW,
+        )
+    )
+
+    assert summary.included_repos == 2  # excluded repo not counted
+    assert summary.jobs_enqueued == 2
+    assert summary.throttled is False
+    repo_ids = {repo_id_for_scan_target_url(u) for u in (INCLUDED_A, INCLUDED_B)}
+    assert {job.repo_id for job in store.jobs.values()} == repo_ids
+    for job in store.jobs.values():
+        assert job.job_type == JOB_TYPE_BASELINE
+        assert job.priority == BASELINE_JOB_PRIORITY
+
+
+def test_baseline_priority_is_lower_precedence_than_incremental():
+    # Ascending gsi1sk sort => higher number served later. Baseline must be
+    # numerically GREATER than incremental so it never starves detection.
+    assert BASELINE_JOB_PRIORITY > DEFAULT_JOB_PRIORITY
+
+
+def test_baseline_enqueue_is_idempotent_duplicate_skipped():
+    store = FakeStore(catalog=_catalog())
+    req = BaselineEnqueueRequest(
+        catalog_store=store,
+        queue_store=store,
+        scanner=_baseline_scanner(),
+        rolling_divisor=1,
+        now_factory=lambda: NOW,
+    )
+    run_baseline_enqueue(req)
+    second = run_baseline_enqueue(req)
+    assert second.jobs_enqueued == 0
+    assert second.duplicates_skipped == 2
+
+
+# ---------------------------------------------------------------------------
+# 4b. Backpressure (SC-3)
+# ---------------------------------------------------------------------------
+
+
+def test_baseline_backpressure_throttles_when_backlog_high():
+    store = FakeStore(catalog=_catalog())
+    store.pending_backlog = 5000  # over threshold
+    summary = run_baseline_enqueue(
+        BaselineEnqueueRequest(
+            catalog_store=store,
+            queue_store=store,
+            scanner=_baseline_scanner(),
+            backpressure_threshold=1000,
+            rolling_divisor=1,
+            now_factory=lambda: NOW,
+        )
+    )
+    assert summary.throttled is True
+    assert summary.jobs_enqueued == 0
+    assert store.jobs == {}  # nothing enqueued under backpressure
+
+
+def test_baseline_enqueues_when_backlog_below_threshold():
+    store = FakeStore(catalog=_catalog())
+    store.pending_backlog = 10  # under threshold
+    summary = run_baseline_enqueue(
+        BaselineEnqueueRequest(
+            catalog_store=store,
+            queue_store=store,
+            scanner=_baseline_scanner(),
+            backpressure_threshold=1000,
+            rolling_divisor=1,
+            now_factory=lambda: NOW,
+        )
+    )
+    assert summary.throttled is False
+    assert summary.jobs_enqueued == 2
+
+
+# ---------------------------------------------------------------------------
+# 4c. Rolling baseline (SC-3)
+# ---------------------------------------------------------------------------
+
+
+def test_rolling_slice_is_deterministic_and_partitions_repos():
+    repo_ids = [f"repo_{i:02d}" for i in range(50)]
+    divisor = 7
+    # Union over all offsets == every repo exactly once (a clean partition).
+    seen: list[str] = []
+    for offset in range(divisor):
+        slice_ids = select_rolling_slice(repo_ids, divisor=divisor, offset=offset)
+        seen.extend(slice_ids)
+    assert sorted(seen) == sorted(repo_ids)
+    assert len(seen) == len(set(seen))  # no repo in two slices
+    # Determinism: same offset yields the same slice.
+    assert select_rolling_slice(repo_ids, divisor=divisor, offset=3) == (
+        select_rolling_slice(repo_ids, divisor=divisor, offset=3)
+    )
+
+
+def test_rolling_slice_offset_wraps_modulo_divisor():
+    repo_ids = [f"repo_{i:02d}" for i in range(50)]
+    assert select_rolling_slice(repo_ids, divisor=7, offset=0) == (
+        select_rolling_slice(repo_ids, divisor=7, offset=7)
+    )
+
+
+def test_baseline_rolling_selects_subset_only():
+    # 2 included repos with divisor 7: at most one bucket matches per offset, so
+    # the selected subset is strictly smaller than the full included set for at
+    # least one offset (the rolling fallback never enqueues all repos at once).
+    store = FakeStore(catalog=_catalog())
+    enqueued_total = 0
+    for offset in range(7):
+        store.jobs.clear()
+        summary = run_baseline_enqueue(
+            BaselineEnqueueRequest(
+                catalog_store=store,
+                queue_store=store,
+                scanner=_baseline_scanner(),
+                rolling_divisor=7,
+                rolling_offset=offset,
+                now_factory=lambda: NOW,
+            )
+        )
+        assert summary.selected_repos <= 2
+        enqueued_total += summary.jobs_enqueued
+    # Over a full cycle every included repo is enqueued exactly once.
+    assert enqueued_total == 2
+
+
+# ---------------------------------------------------------------------------
+# 5. Cadence-overrun alert seam (SC-6d)
+# ---------------------------------------------------------------------------
+
+
+def test_poll_cadence_overrun_raises_signal():
+    signal = evaluate_poll_cadence(
+        cycle_seconds=420.0, cadence_seconds=300.0, targets=500
+    )
+    assert isinstance(signal, PollCadenceSignal)
+    assert signal.overrun is True
+    assert signal.overrun_seconds == pytest.approx(120.0)
+
+
+def test_poll_cadence_within_budget_no_overrun():
+    signal = evaluate_poll_cadence(
+        cycle_seconds=120.0, cadence_seconds=300.0, targets=500
+    )
+    assert signal.overrun is False
+    assert signal.overrun_seconds == 0.0
+
+
+def test_poll_cadence_disabled_when_no_budget():
+    signal = evaluate_poll_cadence(
+        cycle_seconds=999.0, cadence_seconds=0.0, targets=10
+    )
+    assert signal.overrun is False
+
+
+def test_overrun_alert_writes_record_to_notification_seam():
+    written: list[tuple[Path, dict]] = []
+    signal = evaluate_poll_cadence(
+        cycle_seconds=420.0, cadence_seconds=300.0, targets=500
+    )
+    pushed = alert_poll_cadence_overrun(
+        signal,
+        notification_log_path=Path("/tmp/notify.jsonl"),
+        event_at="2026-06-20T10:00:00+00:00",
+        notification_writer=lambda path, record: written.append((path, record)),
+    )
+    assert pushed is True
+    assert len(written) == 1
+    record = written[0][1]
+    assert record["type"] == "cadence_overrun"
+    assert record["overrun_seconds"] == pytest.approx(120.0)
+    assert record["targets"] == 500
+
+
+def test_no_overrun_alert_is_silent():
+    written: list[tuple[Path, dict]] = []
+    signal = evaluate_poll_cadence(
+        cycle_seconds=120.0, cadence_seconds=300.0, targets=500
+    )
+    pushed = alert_poll_cadence_overrun(
+        signal,
+        notification_log_path=Path("/tmp/notify.jsonl"),
+        event_at="2026-06-20T10:00:00+00:00",
+        notification_writer=lambda path, record: written.append((path, record)),
+    )
+    assert pushed is False
+    assert written == []  # healthy cycle never alerts
diff --git a/tests/test_quickstart.py b/tests/test_quickstart.py
index f64ef26..a4ede1c 100644
--- a/tests/test_quickstart.py
+++ b/tests/test_quickstart.py
@@ -103,22 +103,32 @@ def record_retryable_failure(
         job_id: str,
         error: str,
         next_attempt_at: dt.datetime,
+        *,
+        worker_id: str | None = None,
+        fence: int | None = None,
     ) -> None:
         raise AssertionError(error)
 
     def return_job_to_pending(self, job_id: str, reason: str) -> None:
         raise AssertionError(reason)
 
-    def acquire_repo_lease(self, repo_id: str, worker_id: str, lease_seconds: int) -> bool:
+    def acquire_repo_lease(
+        self, repo_id: str, worker_id: str, lease_seconds: int
+    ) -> int | None:
+        existing = self.repo_leases.get(repo_id)
+        fence = (existing.fence + 1) if existing is not None else 1
         self.repo_leases[repo_id] = RepoLease(
             repo_id=repo_id,
             worker_id=worker_id,
             lease_until=NOW + dt.timedelta(seconds=lease_seconds),
             updated_at=NOW,
+            fence=fence,
         )
-        return True
+        return fence
 
-    def release_repo_lease(self, repo_id: str, worker_id: str) -> None:
+    def release_repo_lease(
+        self, repo_id: str, worker_id: str, *, fence: int | None = None
+    ) -> None:
         self.repo_leases.pop(repo_id, None)
 
     def get_queue_status(self, now: dt.datetime) -> QueueStatus:
diff --git a/tests/test_read_api.py b/tests/test_read_api.py
new file mode 100644
index 0000000..567db62
--- /dev/null
+++ b/tests/test_read_api.py
@@ -0,0 +1,561 @@
+"""M7 read API (FR-9, F5/F9/SC-7) contract tests — no docker, no real socket.
+
+Covers the four read-API panels and the trust model:
+
+  - findings panel honors the M6 disposition filter AND is public-safe (the DTO
+    carries no raw secret / no raw scanner match);
+  - freshness rollup reads the materialized BREACH_COUNTER (M5) O(1) and does NOT
+    enumerate REPO_HEALTH per request (F5);
+  - coverage is computed from the CATALOG entity (M1);
+  - queue backlog tracks enqueue -> lease -> complete -> dead-letter transitions
+    via per-status Select=COUNT and the read path issues ZERO full-table Scans
+    (SC-7);
+  - trust model: ServerConfig defaults to localhost, a non-loopback bind without
+    auth is refused, and the deploy-gated WSGI wrapper serves the public-safe DTOs.
+
+The store harness is the in-memory fake from the incremental-storage tests (it
+faithfully shards the pending/leased status partitions and models Select=COUNT),
+so the backlog read exercises the real sharded GSI COUNT path.
+"""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from security_scanner.cli import main
+from security_scanner.core.finding.model import (
+    Disposition,
+    Finding,
+    GitleaksFindingPayload,
+    Verdict,
+)
+from security_scanner.runtime.finding_query import FindingQueryRequest
+from security_scanner.runtime.read_api import (
+    READ_API_DEFAULT_HOST,
+    FindingSummaryDto,
+    QueueBacklogDto,
+    ReadApiServerConfig,
+    build_read_api_wsgi_app,
+    finding_to_public_dto,
+    read_coverage,
+    read_dashboard_snapshot,
+    read_findings_panel,
+    read_freshness_rollup,
+    read_queue_backlog_panel,
+    validate_read_api_server_config,
+)
+from security_scanner.storage.base import (
+    BreachCounter,
+    CatalogEntry,
+    QueueBacklog,
+    RepoHealth,
+)
+from tests.test_incremental_scan_storage import (
+    NOW,
+    _make_finding,
+    _make_job,
+    _make_ledger,
+    _make_store,
+)
+
+RAW_SECRET = "AKIAFAKEEXAMPLE000000"
+RAW_MATCH = "api_key = AKIAFAKEEXAMPLE000000"
+
+
+# ---------------------------------------------------------------------------
+# Findings panel: disposition filter + redaction
+# ---------------------------------------------------------------------------
+
+
+def _finding(line_start: int, verdict: str) -> Finding:
+    """A finding carrying a raw gitleaks secret + match, to prove redaction."""
+    return Finding.create(
+        repo_full_name="fake-org/fake-repo",
+        rule_id="aws-access-key-id",
+        file_path="config/settings.py",
+        line_start=line_start,
+        raw_secret=RAW_SECRET,
+        source_tool="gitleaks",
+        scan_run_id="scan_abc12345",
+        rule_pack_version="secret-rules-0.1.0",
+        triage_verdict=verdict,
+        gitleaks=GitleaksFindingPayload(
+            rule_id="aws-access-key-id",
+            file="config/settings.py",
+            start_line=line_start,
+            match=RAW_MATCH,
+            secret=RAW_SECRET,
+        ),
+    )
+
+
+class _ListReader:
+    def __init__(self, findings: list[Finding]) -> None:
+        self._findings = findings
+
+    def read_all(self) -> list[Finding]:
+        return list(self._findings)
+
+    def read_for_scan_run(self, scan_run_id: str) -> list[Finding]:
+        raise AssertionError("scan-run read not expected in this test")
+
+
+def test_findings_panel_honors_disposition_filter():
+    """The read-API findings panel reuses the M6 disposition filter (FR-11)."""
+    unreviewed = _finding(1, Verdict.NEEDS_REVIEW.value)
+    verified = _finding(2, Verdict.TRUE_POSITIVE.value)
+    false_positive = _finding(3, Verdict.FALSE_POSITIVE.value)
+    reader = _ListReader([unreviewed, verified, false_positive])
+
+    dtos = read_findings_panel(
+        FindingQueryRequest(
+            storage_backend="jsonl",
+            jsonl_path="findings.jsonl",
+            dispositions=[Disposition.UNREVIEWED.value, Disposition.VERIFIED.value],
+        ),
+        store=reader,
+    )
+
+    assert {dto.disposition for dto in dtos} == {
+        Disposition.UNREVIEWED.value,
+        Disposition.VERIFIED.value,
+    }
+    assert all(isinstance(dto, FindingSummaryDto) for dto in dtos)
+
+
+def test_findings_panel_dto_is_public_safe_no_raw_secret_leaks():
+    """No raw secret / raw scanner match appears anywhere in a findings DTO (F9)."""
+    finding = _finding(7, Verdict.TRUE_POSITIVE.value)
+    reader = _ListReader([finding])
+
+    dtos = read_findings_panel(
+        FindingQueryRequest(storage_backend="jsonl", jsonl_path="findings.jsonl"),
+        store=reader,
+    )
+
+    assert len(dtos) == 1
+    serialized = json.dumps(dtos[0].to_dict())
+    assert RAW_SECRET not in serialized
+    assert RAW_MATCH not in serialized
+    # The salted hash IS present (operator can correlate without the secret).
+    assert dtos[0].secret_hash is not None
+    assert dtos[0].secret_hash.startswith("salted-sha256:")
+
+
+def test_finding_to_public_dto_drops_gitleaks_payload_entirely():
+    """The public DTO surface has no field that could carry the raw payload (F9)."""
+    finding = _finding(9, Verdict.NEEDS_REVIEW.value)
+    dto = finding_to_public_dto(finding)
+    blob = json.dumps(dto.to_dict())
+    assert RAW_SECRET not in blob and RAW_MATCH not in blob
+    assert dto.repo == "fake-org/fake-repo"
+    assert dto.file_path == "config/settings.py"
+    assert dto.line_start == 9
+
+
+# ---------------------------------------------------------------------------
+# Freshness rollup: read BREACH_COUNTER O(1), no REPO_HEALTH enumeration (F5)
+# ---------------------------------------------------------------------------
+
+
+class _CountingFreshnessStore:
+    """Store exposing BOTH the O(1) counter read and the full enumeration, so a
+    test can assert the read API takes the counter and never enumerates."""
+
+    def __init__(self, counter: BreachCounter | None) -> None:
+        self._counter = counter
+        self.breach_counter_reads = 0
+        self.repo_health_enumerations = 0
+
+    def read_breach_counter(self) -> BreachCounter | None:
+        self.breach_counter_reads += 1
+        return self._counter
+
+    def read_all_repo_health(self) -> list[RepoHealth]:
+        self.repo_health_enumerations += 1
+        return []
+
+
+def test_freshness_rollup_reads_breach_counter_without_enumerating():
+    counter = BreachCounter(
+        incremental_breaches=2,
+        baseline_breaches=1,
+        total_breaches=3,
+        repos_evaluated=10,
+        evaluated_at="2026-06-19T12:00:00+00:00",
+        coverage_gap=4,
+    )
+    store = _CountingFreshnessStore(counter)
+
+    rollup = read_freshness_rollup(store)
+
+    assert rollup.available is True
+    assert rollup.total_breaches == 3
+    assert rollup.coverage_gap == 4
+    assert store.breach_counter_reads == 1
+    # F5: the read API must NOT re-enumerate REPO_HEALTH per request.
+    assert store.repo_health_enumerations == 0
+
+
+def test_freshness_rollup_unavailable_when_no_counter_yet():
+    store = _CountingFreshnessStore(None)
+    rollup = read_freshness_rollup(store)
+    assert rollup.available is False
+    assert rollup.total_breaches == 0
+    assert store.repo_health_enumerations == 0
+
+
+# ---------------------------------------------------------------------------
+# Coverage: org N / covered M from CATALOG (M1)
+# ---------------------------------------------------------------------------
+
+
+class _CoverageStore:
+    def __init__(
+        self, catalog: list[CatalogEntry], covered_ids: set[str]
+    ) -> None:
+        self._catalog = catalog
+        self._covered = covered_ids
+
+    def read_all_catalog_entries(self) -> list[CatalogEntry]:
+        return list(self._catalog)
+
+    def read_all_repo_health(self) -> list[RepoHealth]:
+        return [RepoHealth(repo_id=rid) for rid in self._covered]
+
+
+def _catalog_entry(repo_id: str, included: bool = True) -> CatalogEntry:
+    return CatalogEntry(
+        repo_id=repo_id,
+        repo_url=f"https://example.com/{repo_id}",
+        included=included,
+        first_seen="2026-06-01T00:00:00+00:00",
+        last_reconciled="2026-06-19T00:00:00+00:00",
+        excluded_reason=None if included else "opt-out",
+    )
+
+
+def test_coverage_counts_org_total_included_excluded_covered_and_gap():
+    catalog = [
+        _catalog_entry("repo_a"),
+        _catalog_entry("repo_b"),
+        _catalog_entry("repo_c"),
+        _catalog_entry("repo_optout", included=False),
+    ]
+    # repo_a covered; repo_b/repo_c not yet; opt-out never counts toward gap.
+    store = _CoverageStore(catalog, covered_ids={"repo_a"})
+
+    coverage = read_coverage(store)
+
+    assert coverage.org_total == 4
+    assert coverage.included == 3
+    assert coverage.excluded == 1
+    assert coverage.covered == 1
+    # gap = included-but-not-covered = repo_b + repo_c.
+    assert coverage.coverage_gap == 2
+
+
+# ---------------------------------------------------------------------------
+# Queue backlog (SC-7): COUNT over status partitions, transitions, NO Scan
+# ---------------------------------------------------------------------------
+
+
+def test_backlog_counter_tracks_enqueue_lease_complete_dead_letter_transitions():
+    """Backlog COUNT tracks a job across every status transition (SC-7)."""
+    store, table = _make_store()
+
+    # enqueue: one pending job, backlog reflects it.
+    job = _make_job()
+    assert store.enqueue_commit_scan_job(job) is True
+    backlog = store.read_queue_backlog()
+    assert backlog.job_counts_by_status["pending"] == 1
+    assert backlog.job_counts_by_status["leased"] == 0
+    assert backlog.backlog == 1  # pending + leased
+
+    # lease: pending -> leased; still backlog (work not terminal).
+    leased = store.lease_next_scan_job("worker-a", lease_seconds=60, now=NOW)
+    assert leased is not None
+    backlog = store.read_queue_backlog()
+    assert backlog.job_counts_by_status["pending"] == 0
+    assert backlog.job_counts_by_status["leased"] == 1
+    assert backlog.backlog == 1
+
+    # complete: leased -> completed; backlog drains.
+    store.complete_processed_job(leased, [_make_finding(leased)], _make_ledger(leased))
+    backlog = store.read_queue_backlog()
+    assert backlog.job_counts_by_status["leased"] == 0
+    assert backlog.job_counts_by_status["completed"] == 1
+    assert backlog.backlog == 0
+
+    # dead-letter: a separate job exhausts retries and lands terminal.
+    poison = _make_job(commit_sha="d" * 40, max_attempts=1)
+    assert store.enqueue_commit_scan_job(poison) is True
+    poison_leased = store.lease_next_scan_job("worker-b", lease_seconds=60, now=NOW)
+    assert poison_leased is not None
+    store.record_retryable_failure(
+        poison_leased.job_id,
+        "synthetic failure",
+        NOW,
+        worker_id=poison_leased.worker_id,
+        fence=poison_leased.fence,
+    )
+    backlog = store.read_queue_backlog()
+    assert backlog.job_counts_by_status["dead_letter"] == 1
+    # dead_letter is terminal, not part of actionable backlog.
+    assert backlog.backlog == backlog.job_counts_by_status["pending"] + (
+        backlog.job_counts_by_status["leased"]
+    )
+
+
+def test_backlog_read_does_not_full_table_scan(monkeypatch):
+    """The read-API backlog path issues ZERO table Scans (SC-7 core claim)."""
+    store, table = _make_store()
+    for index in range(3):
+        job = _make_job(commit_sha=f"{index:040d}")
+        assert store.enqueue_commit_scan_job(job) is True
+
+    table.scan_calls.clear()
+    backlog = store.read_queue_backlog()
+
+    assert backlog.job_counts_by_status["pending"] == 3
+    # The whole point of SC-7: no scan() at all on the backlog read path.
+    assert table.scan_calls == []
+    # And it really did issue Select=COUNT GSI queries (not item reads).
+    count_queries = [c for c in table.query_calls if c.get("Select") == "COUNT"]
+    assert count_queries, "backlog read must use Select=COUNT queries"
+    assert all(c.get("IndexName") == "GSI1" for c in count_queries)
+
+
+def test_count_scan_jobs_by_status_matches_actual_pending_count():
+    store, table = _make_store()
+    for index in range(5):
+        store.enqueue_commit_scan_job(_make_job(commit_sha=f"{index:040d}"))
+    assert store.count_scan_jobs_by_status("pending") == 5
+    assert store.count_scan_jobs_by_status("leased") == 0
+
+
+def test_read_queue_backlog_panel_wraps_store_backlog():
+    class _BacklogStore:
+        def read_queue_backlog(self) -> QueueBacklog:
+            return QueueBacklog(
+                job_counts_by_status={
+                    "pending": 4,
+                    "leased": 1,
+                    "completed": 9,
+                    "dead_letter": 2,
+                },
+                backlog=5,
+            )
+
+    dto = read_queue_backlog_panel(_BacklogStore())
+    assert isinstance(dto, QueueBacklogDto)
+    assert dto.backlog == 5
+    assert dto.job_counts_by_status["completed"] == 9
+
+
+# ---------------------------------------------------------------------------
+# Snapshot assembler
+# ---------------------------------------------------------------------------
+
+
+class _SnapshotStore:
+    """Composite store exposing exactly the read-API surface for the snapshot."""
+
+    def __init__(self) -> None:
+        self._counter = BreachCounter(
+            incremental_breaches=1,
+            baseline_breaches=0,
+            total_breaches=1,
+            repos_evaluated=3,
+            evaluated_at="2026-06-19T12:00:00+00:00",
+            coverage_gap=1,
+        )
+        self._catalog = [_catalog_entry("repo_a"), _catalog_entry("repo_b")]
+
+    def read_breach_counter(self) -> BreachCounter | None:
+        return self._counter
+
+    def read_all_catalog_entries(self) -> list[CatalogEntry]:
+        return list(self._catalog)
+
+    def read_all_repo_health(self) -> list[RepoHealth]:
+        return [RepoHealth(repo_id="repo_a")]
+
+    def read_queue_backlog(self) -> QueueBacklog:
+        return QueueBacklog(
+            job_counts_by_status={
+                "pending": 2,
+                "leased": 1,
+                "completed": 0,
+                "dead_letter": 0,
+            },
+            backlog=3,
+        )
+
+
+def test_snapshot_assembles_three_live_panels_without_findings_by_default():
+    snapshot = read_dashboard_snapshot(_SnapshotStore())
+    payload = snapshot.to_dict()
+    assert set(payload) == {"freshness", "coverage", "backlog"}
+    assert payload["freshness"]["available"] is True
+    assert payload["coverage"]["coverageGap"] == 1
+    assert payload["backlog"]["backlog"] == 3
+    assert snapshot.findings is None
+
+
+def test_snapshot_includes_findings_only_when_requested():
+    class _SnapshotStoreWithFindings(_SnapshotStore):
+        def read_all(self) -> list[Finding]:
+            return [_finding(1, Verdict.TRUE_POSITIVE.value)]
+
+        def read_for_scan_run(self, scan_run_id: str) -> list[Finding]:
+            raise AssertionError("not expected")
+
+    store = _SnapshotStoreWithFindings()
+    request = FindingQueryRequest(storage_backend="dynamodb")
+    snapshot = read_dashboard_snapshot(store, findings_request=request)
+    payload = snapshot.to_dict()
+    assert "findings" in payload
+    assert len(payload["findings"]) == 1
+    # still public-safe inside the snapshot.
+    assert RAW_SECRET not in json.dumps(payload)
+
+
+# ---------------------------------------------------------------------------
+# Trust model (F9): localhost default, refuse routable-without-auth, WSGI wrapper
+# ---------------------------------------------------------------------------
+
+
+def test_server_config_defaults_to_localhost():
+    config = ReadApiServerConfig()
+    assert config.host == READ_API_DEFAULT_HOST == "127.0.0.1"
+    assert config.is_loopback is True
+    # localhost default needs no authn (loopback IS the trust boundary).
+    validate_read_api_server_config(config)
+
+
+def test_server_config_refuses_non_loopback_bind_without_auth():
+    config = ReadApiServerConfig(host="0.0.0.0", require_auth=False)
+    assert config.is_loopback is False
+    with pytest.raises(ValueError, match="non-loopback bind without authentication"):
+        validate_read_api_server_config(config)
+
+
+def test_server_config_allows_non_loopback_bind_with_auth():
+    config = ReadApiServerConfig(host="10.0.0.5", require_auth=True)
+    # allowed (real authn is the deploy-gated step; the invariant is satisfied).
+    validate_read_api_server_config(config)
+
+
+def test_wsgi_wrapper_serves_public_safe_json_without_binding_a_socket():
+    store = _SnapshotStore()
+    app = build_read_api_wsgi_app(store)
+
+    captured: dict[str, object] = {}
+
+    def start_response(status, headers):
+        captured["status"] = status
+        captured["headers"] = dict(headers)
+
+    body = b"".join(
+        app({"REQUEST_METHOD": "GET", "PATH_INFO": "/backlog"}, start_response)
+    )
+
+    assert captured["status"] == "200 OK"
+    assert captured["headers"]["Content-Type"] == "application/json"
+    assert json.loads(body)["backlog"] == 3
+
+
+def test_wsgi_wrapper_snapshot_route_is_public_safe():
+    store = _SnapshotStore()
+    app = build_read_api_wsgi_app(store)
+    body = b"".join(
+        app({"REQUEST_METHOD": "GET", "PATH_INFO": "/snapshot"}, lambda *a: None)
+    )
+    payload = json.loads(body)
+    assert set(payload) == {"freshness", "coverage", "backlog"}
+
+
+def test_wsgi_wrapper_rejects_unknown_route_and_non_get():
+    app = build_read_api_wsgi_app(_SnapshotStore())
+    statuses: list[str] = []
+
+    def start_response(status, headers):
+        statuses.append(status)
+
+    b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/nope"}, start_response))
+    b"".join(
+        app({"REQUEST_METHOD": "POST", "PATH_INFO": "/backlog"}, start_response)
+    )
+    assert statuses == ["404 Not Found", "405 Method Not Allowed"]
+
+
+# ---------------------------------------------------------------------------
+# CLI surface: read-api command
+# ---------------------------------------------------------------------------
+
+
+def _patch_cli_store(monkeypatch, store) -> None:
+    monkeypatch.setattr(
+        "security_scanner.cli._store.create_finding_store",
+        lambda backend, **kwargs: store,
+    )
+
+
+def test_read_api_cli_emits_snapshot_json(monkeypatch, capsys):
+    _patch_cli_store(monkeypatch, _SnapshotStore())
+
+    exit_code = main(["read-api", "--storage-backend", "dynamodb"])
+
+    assert exit_code == 0
+    payload = json.loads(capsys.readouterr().out)
+    assert set(payload) == {"freshness", "coverage", "backlog"}
+    assert payload["backlog"]["backlog"] == 3
+    assert payload["freshness"]["available"] is True
+
+
+def test_read_api_cli_includes_findings_when_disposition_requested(
+    monkeypatch, capsys
+):
+    class _StoreWithFindings(_SnapshotStore):
+        def read_all(self) -> list[Finding]:
+            return [
+                _finding(1, Verdict.TRUE_POSITIVE.value),
+                _finding(2, Verdict.FALSE_POSITIVE.value),
+            ]
+
+        def read_for_scan_run(self, scan_run_id: str) -> list[Finding]:
+            raise AssertionError("not expected")
+
+    _patch_cli_store(monkeypatch, _StoreWithFindings())
+
+    exit_code = main(
+        ["read-api", "--storage-backend", "dynamodb", "--disposition", "verified"]
+    )
+
+    assert exit_code == 0
+    out = capsys.readouterr().out
+    payload = json.loads(out)
+    assert [f["disposition"] for f in payload["findings"]] == ["verified"]
+    # public-safe even through the CLI surface.
+    assert RAW_SECRET not in out
+
+
+def test_read_api_cli_rejects_jsonl_backend(capsys):
+    exit_code = main(["read-api", "--storage-backend", "jsonl"])
+    assert exit_code == 2
+    assert "dynamodb only" in capsys.readouterr().err
+
+
+def test_read_api_cli_reports_fatal_error(monkeypatch, capsys):
+    class _BoomStore:
+        def read_breach_counter(self):
+            raise RuntimeError("synthetic read-api failure")
+
+    _patch_cli_store(monkeypatch, _BoomStore())
+
+    exit_code = main(["read-api", "--storage-backend", "dynamodb"])
+    assert exit_code == 1
+    assert "read-api failed" in capsys.readouterr().err
diff --git a/tests/test_repo_health_freshness.py b/tests/test_repo_health_freshness.py
new file mode 100644
index 0000000..54a716c
--- /dev/null
+++ b/tests/test_repo_health_freshness.py
@@ -0,0 +1,342 @@
+"""M5 per-repo freshness: REPO_HEALTH conditional advance, two-threshold eval,
+scheduled evaluator + BREACH_COUNTER, and the per-repo gate that no longer lets
+one fresh repo mask a stale one (the silent-staleness bug).
+
+No docker: uses the existing fake boto3 resource/client/table from
+``test_dynamodb_compatible_store``.
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+
+from security_scanner.runtime.scan_health import (
+    FreshnessThresholds,
+    evaluate_freshness_breaches,
+    evaluate_repo_freshness,
+    run_freshness_evaluator,
+)
+from security_scanner.storage.adapters.nosql_db.items import (
+    REPO_HEALTH_FULL_SCAN_ATTR,
+    REPO_HEALTH_INCREMENTAL_ATTR,
+    breach_counter_from_item,
+    breach_counter_to_item,
+    repo_health_from_item,
+    repo_health_to_item,
+)
+from security_scanner.storage.base import (
+    JOB_TYPE_BASELINE,
+    JOB_TYPE_INCREMENTAL,
+    BreachCounter,
+    RepoFreshnessBreach,
+    RepoHealth,
+    RepoHealthStore,
+)
+from security_scanner.storage.dynamodb_compatible.store import (
+    DynamoDbCompatibleConfig,
+    DynamoDbCompatibleFindingStore,
+)
+from tests.test_dynamodb_compatible_store import (
+    FakeDynamoClient,
+    FakeDynamoResource,
+    FakeDynamoTable,
+)
+
+
+def _store() -> DynamoDbCompatibleFindingStore:
+    table = FakeDynamoTable()
+    return DynamoDbCompatibleFindingStore(
+        DynamoDbCompatibleConfig(table_name="SecurityScannerLocal"),
+        resource=FakeDynamoResource(table),
+        client=FakeDynamoClient(table),
+    )
+
+
+_T0 = "2026-06-19T08:00:00+00:00"
+_T1 = "2026-06-19T09:00:00+00:00"  # newer than _T0
+_NOW = dt.datetime(2026, 6, 19, 12, 0, tzinfo=dt.UTC)
+# Ages relative to _NOW (12:00): _AGE_1H fresh, _AGE_5H mid, _AGE_11H old.
+_AGE_1H = "2026-06-19T11:00:00+00:00" +_AGE_5H = "2026-06-19T07:00:00+00:00" +_AGE_11H = "2026-06-19T01:00:00+00:00" +_STALE = "2020-01-01T00:00:00+00:00" + + +# --------------------------------------------------------------------------- +# Item mapping +# --------------------------------------------------------------------------- + + +def test_repo_health_item_round_trips(): + record = RepoHealth("repo_x", _T0, _T1) + item = repo_health_to_item(record) + assert item["PK"] == "REPO_HEALTH#repo_x" + assert item["SK"] == "META" + assert item["entityType"] == "REPO_HEALTH" + assert item[REPO_HEALTH_INCREMENTAL_ATTR] == _T0 + assert item[REPO_HEALTH_FULL_SCAN_ATTR] == _T1 + assert repo_health_from_item(item) == record + + +def test_repo_health_item_omits_absent_timestamps(): + item = repo_health_to_item(RepoHealth("repo_x", _T0, None)) + assert REPO_HEALTH_FULL_SCAN_ATTR not in item + assert repo_health_from_item(item) == RepoHealth("repo_x", _T0, None) + + +def test_breach_counter_item_round_trips(): + counter = BreachCounter( + incremental_breaches=2, + baseline_breaches=1, + total_breaches=2, + repos_evaluated=5, + evaluated_at=_NOW.isoformat(), + coverage_gap=None, + ) + assert breach_counter_from_item(breach_counter_to_item(counter)) == counter + + +# --------------------------------------------------------------------------- +# SC-5: attribute-scoped conditional advance (no clobber, no regression) +# --------------------------------------------------------------------------- + + +def test_store_implements_repo_health_store_protocol(): + assert isinstance(_store(), RepoHealthStore) + + +def test_advance_creates_repo_health_on_first_write(): + store = _store() + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T0) + health = store.read_repo_health("repo_a") + assert health == RepoHealth("repo_a", _T0, None) + + +def test_concurrent_incremental_and_baseline_preserve_both_fields(): + """An incremental and a baseline completion must 
not clobber each other.""" + store = _store() + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T0) + store.advance_repo_health("repo_a", job_type=JOB_TYPE_BASELINE, completed_at=_T1) + + health = store.read_repo_health("repo_a") + assert health.last_successful_incremental_at == _T0 + assert health.last_successful_full_scan_at == _T1 + + +def test_older_completion_does_not_regress_a_newer_timestamp(): + """An out-of-order (older) completion is a monotonic no-op, not a regress.""" + store = _store() + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T1) + # An older completion arrives late (e.g. a reaped worker / out-of-order job). + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T0) + + health = store.read_repo_health("repo_a") + assert health.last_successful_incremental_at == _T1 # not regressed to _T0 + + +def test_newer_completion_advances_the_timestamp(): + store = _store() + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T0) + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T1) + assert store.read_repo_health("repo_a").last_successful_incremental_at == _T1 + + +def test_advance_accepts_datetime_and_normalizes(): + store = _store() + store.advance_repo_health( + "repo_a", + job_type=JOB_TYPE_BASELINE, + completed_at=dt.datetime(2026, 6, 19, 8, 0, tzinfo=dt.UTC), + ) + assert store.read_repo_health("repo_a").last_successful_full_scan_at == _T0 + + +def test_read_repo_health_batch_returns_only_present_repos(): + store = _store() + store.advance_repo_health("repo_a", job_type=JOB_TYPE_INCREMENTAL, completed_at=_T0) + store.advance_repo_health("repo_b", job_type=JOB_TYPE_BASELINE, completed_at=_T1) + out = store.read_repo_health_batch(["repo_a", "repo_b", "repo_missing"]) + assert set(out) == {"repo_a", "repo_b"} + assert out["repo_a"].last_successful_incremental_at == _T0 + assert 
out["repo_b"].last_successful_full_scan_at == _T1 + + +# --------------------------------------------------------------------------- +# FR-8/F4: per-repo two-threshold evaluation +# --------------------------------------------------------------------------- + + +_THRESHOLDS = FreshnessThresholds( + incremental_max_age_hours=2.0, # explicit small value for the test, not derived from a poll cadence + baseline_max_age_hours=10.0, +) + + +def test_fresh_repo_returns_no_breach(): + # _NOW is 12:00; both at 11:00 (1h old) -> within 2h/10h thresholds. + health = RepoHealth("repo_a", _AGE_1H, _AGE_1H) + assert evaluate_repo_freshness(health, now=_NOW, thresholds=_THRESHOLDS) is None + + +def test_incremental_breach_only(): + # incremental 5h old (>2h breach) but baseline 5h old (<10h fresh). + health = RepoHealth("repo_a", _AGE_5H, _AGE_5H) + breach = evaluate_repo_freshness(health, now=_NOW, thresholds=_THRESHOLDS) + assert breach == RepoFreshnessBreach( + repo_id="repo_a", + incremental=True, + baseline=False, + last_successful_incremental_at=_AGE_5H, + last_successful_full_scan_at=_AGE_5H, + ) + + +def test_baseline_breach_only(): + # incremental 1h old (fresh), baseline 11h old (>10h breach). + health = RepoHealth("repo_a", _AGE_1H, _AGE_11H) + breach = evaluate_repo_freshness(health, now=_NOW, thresholds=_THRESHOLDS) + assert breach is not None + assert breach.incremental is False + assert breach.baseline is True + + +def test_never_recorded_class_is_a_breach_fail_closed(): + health = RepoHealth("repo_a", None, None) + breach = evaluate_repo_freshness(health, now=_NOW, thresholds=_THRESHOLDS) + assert breach is not None + assert breach.incremental is True + assert breach.baseline is True + + +def test_thresholds_from_cadences_uses_margin_idiom(): + thresholds = FreshnessThresholds.from_cadences( + poll_interval_hours=24.0, baseline_cadence_hours=48.0, margin_hours=2.0 + ) + # 24h+2h = the legacy 26h scan_health margin idiom. 
+ assert thresholds.incremental_max_age_hours == 26.0 + assert thresholds.baseline_max_age_hours == 50.0 + + +# --------------------------------------------------------------------------- +# F3/F5/SC-7: scheduled evaluator + materialized BREACH_COUNTER +# --------------------------------------------------------------------------- + + +def test_evaluate_freshness_breaches_counts_per_class(): + records = [ + RepoHealth("repo_fresh", _AGE_1H, _AGE_1H), + RepoHealth("repo_inc", _AGE_5H, _AGE_1H), + RepoHealth("repo_base", _AGE_1H, _AGE_11H), + RepoHealth("repo_both", None, None), + ] + evaluation = evaluate_freshness_breaches(records, now=_NOW, thresholds=_THRESHOLDS) + counter = evaluation.counter + assert counter.repos_evaluated == 4 + assert counter.total_breaches == 3 # repo_fresh excluded + assert counter.incremental_breaches == 2 # repo_inc + repo_both + assert counter.baseline_breaches == 2 # repo_base + repo_both + assert counter.coverage_gap is None # M1 seam unfilled + + +def test_run_freshness_evaluator_materializes_breach_counter_and_calls_hook(): + store = _store() + # 1 fresh repo + 2 stale -> evaluator must detect the 2 stale. + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_INCREMENTAL, + completed_at="2026-06-19T11:00:00+00:00", + ) + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_BASELINE, + completed_at="2026-06-19T11:00:00+00:00", + ) + store.advance_repo_health( + "repo_stale_a", + job_type=JOB_TYPE_INCREMENTAL, + completed_at="2020-01-01T00:00:00+00:00", + ) + store.advance_repo_health( + "repo_stale_b", + job_type=JOB_TYPE_INCREMENTAL, + completed_at="2020-01-01T00:00:00+00:00", + ) + + alerted: list[list[RepoFreshnessBreach]] = [] + evaluation = run_freshness_evaluator( + store, + now=_NOW, + thresholds=_THRESHOLDS, + on_breaches=alerted.append, + ) + + # Materialized rollup is persisted and readable O(1). 
+ persisted = store.read_breach_counter() + assert persisted is not None + assert persisted.total_breaches == 2 + assert persisted.repos_evaluated == 3 + assert evaluation.counter == persisted + # The alert hook (M9 seam) received the breaches; the evaluator itself does + # not implement a sink. + assert len(alerted) == 1 + assert {b.repo_id for b in alerted[0]} == {"repo_stale_a", "repo_stale_b"} + + +def test_run_freshness_evaluator_no_breaches_skips_alert_hook(): + store = _store() + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_INCREMENTAL, + completed_at="2026-06-19T11:00:00+00:00", + ) + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_BASELINE, + completed_at="2026-06-19T11:00:00+00:00", + ) + alerted: list[object] = [] + evaluation = run_freshness_evaluator( + store, now=_NOW, thresholds=_THRESHOLDS, on_breaches=alerted.append + ) + assert evaluation.counter.total_breaches == 0 + assert alerted == [] # no spam when nothing breaches + + +def test_evaluator_passes_coverage_gap_seam_through(): + store = _store() + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_INCREMENTAL, + completed_at="2026-06-19T11:00:00+00:00", + ) + store.advance_repo_health( + "repo_fresh", + job_type=JOB_TYPE_BASELINE, + completed_at="2026-06-19T11:00:00+00:00", + ) + run_freshness_evaluator(store, now=_NOW, thresholds=_THRESHOLDS, coverage_gap=7) + assert store.read_breach_counter().coverage_gap == 7 + + +# --------------------------------------------------------------------------- +# The silent-staleness bug, end-to-end on the real store: one fresh repo must +# NOT mask a stale one (per-repo signal is the source of truth). 
+# --------------------------------------------------------------------------- + + +def test_one_fresh_repo_does_not_mask_a_stale_repo_via_real_store(): + store = _store() + inc = JOB_TYPE_INCREMENTAL + base = JOB_TYPE_BASELINE + store.advance_repo_health("repo_fresh", job_type=inc, completed_at=_AGE_1H) + store.advance_repo_health("repo_fresh", job_type=base, completed_at=_AGE_1H) + store.advance_repo_health("repo_stale", job_type=inc, completed_at=_STALE) + store.advance_repo_health("repo_stale", job_type=base, completed_at=_STALE) + + evaluation = evaluate_freshness_breaches( + store.read_all_repo_health(), now=_NOW, thresholds=_THRESHOLDS + ) + # A global max-timestamp gate would say OK (repo_fresh is fresh). The per-repo + # gate flags repo_stale. + assert evaluation.counter.total_breaches == 1 + assert evaluation.breaches[0].repo_id == "repo_stale" diff --git a/tests/test_scan_run_health.py b/tests/test_scan_run_health.py index f6ed135..73e19fe 100644 --- a/tests/test_scan_run_health.py +++ b/tests/test_scan_run_health.py @@ -21,7 +21,11 @@ scan_run_health_from_item, scan_run_health_to_item, ) -from security_scanner.storage.base import ScanRunHealth, ScanRunHealthStore +from security_scanner.storage.base import ( + RepoHealth, + ScanRunHealth, + ScanRunHealthStore, +) from security_scanner.storage.dynamodb_compatible.store import ( DynamoDbCompatibleConfig, DynamoDbCompatibleFindingStore, @@ -117,10 +121,11 @@ def test_read_latest_scan_run_health_returns_most_recent_record(): class HealthRecordingStore: - """Minimal ScanResultWriter that also captures scan-run health records.""" + """ScanResultWriter that captures per-repo REPO_HEALTH advances (M5).""" def __init__(self) -> None: - self.health_records: list[ScanRunHealth] = [] + # (repo_id, job_type, completed_at) advances, in call order. 
+ self.advances: list[tuple[str, str, str]] = [] def prepare_for_scan(self) -> None: return None @@ -128,11 +133,17 @@ def prepare_for_scan(self) -> None: def write_scan_result(self, result) -> None: return None - def put_scan_run_health(self, record: ScanRunHealth) -> None: - self.health_records.append(record) + def advance_repo_health(self, repo_id, *, job_type, completed_at) -> None: + self.advances.append((repo_id, job_type, completed_at)) + + def read_repo_health(self, repo_id): + return None + + def read_repo_health_batch(self, repo_ids): + return {} - def read_latest_scan_run_health(self) -> ScanRunHealth | None: - return self.health_records[-1] if self.health_records else None + def read_all_repo_health(self): + return [] def _request(manifest_path) -> LocalScanRequest: @@ -144,7 +155,12 @@ def _request(manifest_path) -> LocalScanRequest: ) -def test_successful_run_writes_one_health_record(tmp_path): +def test_successful_run_advances_per_repo_full_scan_health(tmp_path): + from security_scanner.storage.adapters.nosql_db.items import ( + repo_id_for_local_target, + ) + from security_scanner.storage.base import JOB_TYPE_BASELINE + repo = tmp_path / "repo" repo.mkdir() manifest = tmp_path / "targets.yaml" @@ -160,16 +176,15 @@ def test_successful_run_writes_one_health_record(tmp_path): now_factory=lambda: "2026-06-19T08:00:00+00:00", ) - assert len(store.health_records) == 1 - record = store.health_records[0] - assert record.scan_run_id == "scan_health01" - assert record.completed_at_iso == "2026-06-19T08:00:00+00:00" - assert record.targets_total == 1 - assert record.targets_scanned == 1 - assert record.findings_total == 0 + # One scanned target -> one per-repo full-scan advance (no global singleton). 
+ assert len(store.advances) == 1 + repo_id, job_type, completed_at = store.advances[0] + assert repo_id == repo_id_for_local_target("demo-org/demo-repo") + assert job_type == JOB_TYPE_BASELINE + assert completed_at == "2026-06-19T08:00:00+00:00" -def test_manifest_error_writes_no_health_record(tmp_path): +def test_manifest_error_writes_no_repo_health(tmp_path): store = HealthRecordingStore() missing = tmp_path / "nope.yaml" @@ -181,7 +196,7 @@ def test_manifest_error_writes_no_health_record(tmp_path): scanner_factory=lambda _manifest: FakeScanner([]), ) - assert store.health_records == [] + assert store.advances == [] # --------------------------------------------------------------------------- @@ -213,40 +228,64 @@ def test_evaluate_scan_freshness_old_record_is_stale(): assert "STALE" in verdict.message -class _GateStore: - def __init__(self, latest: ScanRunHealth | None) -> None: - self._latest = latest +class _PerRepoGateStore: + """Gate store returning a fixed set of REPO_HEALTH records (per-repo M5).""" + + def __init__(self, records: list[RepoHealth]) -> None: + self._records = records - def read_latest_scan_run_health(self) -> ScanRunHealth | None: - return self._latest + def read_all_repo_health(self) -> list[RepoHealth]: + return list(self._records) -def _fresh_record() -> ScanRunHealth: - completed = dt.datetime.now(dt.UTC).replace(microsecond=0).isoformat() - return ScanRunHealth("scan_now", completed, 14, 14, 0) +def _fresh_iso() -> str: + return dt.datetime.now(dt.UTC).replace(microsecond=0).isoformat() -def test_scan_health_cli_exits_zero_when_fresh(monkeypatch): +def test_scan_health_cli_exits_zero_when_all_repos_fresh(monkeypatch): + fresh = _fresh_iso() monkeypatch.setattr( "security_scanner.cli.commands.scan_health.store_from_args", - lambda args: _GateStore(_fresh_record()), + lambda args: _PerRepoGateStore( + [RepoHealth("repo_a", fresh, fresh), RepoHealth("repo_b", fresh, fresh)] + ), ) assert main(["scan-health", "--storage-backend", 
"dynamodb"]) == 0 -def test_scan_health_cli_exits_one_when_stale(monkeypatch): - stale = ScanRunHealth("scan_old", "2020-01-01T00:00:00+00:00", 14, 14, 0) +def test_scan_health_cli_exits_one_when_a_repo_is_stale(monkeypatch): + fresh = _fresh_iso() + stale = "2020-01-01T00:00:00+00:00" + records = [ + RepoHealth("repo_fresh", fresh, fresh), + RepoHealth("repo_stale", stale, stale), + ] + monkeypatch.setattr( + "security_scanner.cli.commands.scan_health.store_from_args", + lambda args: _PerRepoGateStore(records), + ) + assert main(["scan-health", "--storage-backend", "dynamodb"]) == 1 + + +def test_scan_health_cli_one_fresh_repo_does_not_mask_a_stale_one(monkeypatch): + """The exact silent-staleness bug: per-repo gate, not a global boolean.""" + fresh = _fresh_iso() + stale = "2020-01-01T00:00:00+00:00" + # 1 fresh repo + 499 stale repos: a global max-timestamp gate would report + # OK; the per-repo gate must fail. + records = [RepoHealth("repo_fresh", fresh, fresh)] + records += [RepoHealth(f"repo_stale_{i}", stale, stale) for i in range(499)] monkeypatch.setattr( "security_scanner.cli.commands.scan_health.store_from_args", - lambda args: _GateStore(stale), + lambda args: _PerRepoGateStore(records), ) assert main(["scan-health", "--storage-backend", "dynamodb"]) == 1 -def test_scan_health_cli_exits_one_when_no_record(monkeypatch): +def test_scan_health_cli_exits_one_when_no_records(monkeypatch): monkeypatch.setattr( "security_scanner.cli.commands.scan_health.store_from_args", - lambda args: _GateStore(None), + lambda args: _PerRepoGateStore([]), ) assert main(["scan-health", "--storage-backend", "dynamodb"]) == 1 diff --git a/tests/test_scan_worker.py b/tests/test_scan_worker.py index 91fa497..0677c84 100644 --- a/tests/test_scan_worker.py +++ b/tests/test_scan_worker.py @@ -14,7 +14,6 @@ ) from security_scanner.storage.base import ScanJob, ScanLedgerEntry, ScanLedgerKey - NOW = dt.datetime(2026, 6, 12, 12, 0, tzinfo=dt.UTC) REPO_ID = "repo_synthetic000000000001" 
REPO_URL = "https://github.com/example-org/example-repo" @@ -27,11 +26,14 @@ def __init__(self, jobs: list[ScanJob] | None = None) -> None: self.ledger_keys: set[ScanLedgerKey] = set() self.completed: list[tuple[ScanJob, list[Finding], ScanLedgerEntry]] = [] self.retry_failures: list[tuple[str, str, dt.datetime]] = [] + self.retry_fences: list[tuple[str | None, int | None]] = [] self.pending_returns: list[tuple[str, str]] = [] self.repo_lease_available = True self.repo_lease_calls: list[tuple[str, str, int]] = [] - self.repo_release_calls: list[tuple[str, str]] = [] + self.repo_release_calls: list[tuple[str, str, int | None]] = [] self.lease_calls = 0 + # (repo_id, job_type, completed_at) per-repo freshness advances (M5/SC-5). + self.health_advances: list[tuple[str, str, object]] = [] def lease_next_scan_job( self, @@ -49,6 +51,9 @@ def lease_next_scan_job( "status": "leased", "worker_id": worker_id, "lease_until": now + dt.timedelta(seconds=lease_seconds), + # store bumps the fence on (re)lease; surface a non-trivial value + # so worker fence-threading can be asserted (FR-6/SC-2). + "fence": job.fence + 11, } ) @@ -60,12 +65,16 @@ def acquire_repo_lease( repo_id: str, worker_id: str, lease_seconds: int, - ) -> bool: + ) -> int | None: self.repo_lease_calls.append((repo_id, worker_id, lease_seconds)) - return self.repo_lease_available + # New contract (FR-6/SC-2): return a fence token on success, None on + # failure. Use a non-1 fence so tests prove the worker carries it. 
+ return 42 if self.repo_lease_available else None - def release_repo_lease(self, repo_id: str, worker_id: str) -> None: - self.repo_release_calls.append((repo_id, worker_id)) + def release_repo_lease( + self, repo_id: str, worker_id: str, *, fence: int | None = None + ) -> None: + self.repo_release_calls.append((repo_id, worker_id, fence)) def complete_processed_job( self, @@ -81,12 +90,19 @@ def record_retryable_failure( job_id: str, error: str, next_attempt_at: dt.datetime, + *, + worker_id: str | None = None, + fence: int | None = None, ) -> None: self.retry_failures.append((job_id, error, next_attempt_at)) + self.retry_fences.append((worker_id, fence)) def return_job_to_pending(self, job_id: str, reason: str) -> None: self.pending_returns.append((job_id, reason)) + def advance_repo_health(self, repo_id: str, *, job_type: str, completed_at) -> None: + self.health_advances.append((repo_id, job_type, completed_at)) + class FakeScanner: def __init__( @@ -206,7 +222,7 @@ def test_one_pending_job_is_scanned_and_completed_with_commit_log_opts(): assert findings[0].repo.commit == COMMIT_SHA assert ledger.commit_sha == COMMIT_SHA assert ledger.finding_count == 1 - assert store.repo_release_calls == [(REPO_ID, "worker-a")] + assert store.repo_release_calls == [(REPO_ID, "worker-a", 42)] def test_completed_finding_is_tagged_with_branch_and_commit_from_job(): @@ -259,7 +275,9 @@ def test_repo_lease_failure_returns_job_to_pending_without_scanner_or_attempt(): assert store.repo_release_calls == [] -def test_repo_lease_failure_stops_current_once_loop(): +def test_repo_lease_failure_continues_to_next_job_in_once_loop(): + # FR-6 skip-bug fix: a repo-lease failure on one job must NOT break the + # whole once-loop. Both jobs are attempted; both are returned to pending. 
first = _job_with_id("scan_job_first", "1" * 40) second = _job_with_id("scan_job_second", "2" * 40) store = FakeWorkerStore([first, second]) @@ -277,10 +295,49 @@ def test_repo_lease_failure_stops_current_once_loop(): summary = run_scan_worker_once(request) - assert summary.leased == 1 - assert store.lease_calls == 1 - assert store.pending_returns == [("scan_job_first", "repo lease unavailable")] + # the loop CONTINUED: both jobs leased + both returned to pending, no break. + assert summary.leased == 2 + assert store.lease_calls == 2 + assert store.pending_returns == [ + ("scan_job_first", "repo lease unavailable"), + ("scan_job_second", "repo lease unavailable"), + ] assert scanner.calls == [] + assert store.repo_release_calls == [] + + +def test_repo_lease_failure_on_one_job_does_not_block_a_later_scannable_job(): + # Mixed: first job's repo is locked, second job's repo is free -> the worker + # skips the first and still scans the second (loop continuation, not break). + first = _job_with_id("scan_job_locked", "1" * 40) + second = _job_with_id("scan_job_free", "2" * 40) + + class GatedLeaseStore(FakeWorkerStore): + def acquire_repo_lease(self, repo_id, worker_id, lease_seconds): + self.repo_lease_calls.append((repo_id, worker_id, lease_seconds)) + # first lease attempt fails, every later attempt succeeds. + return None if len(self.repo_lease_calls) == 1 else 7 + + store = GatedLeaseStore([first, second]) + scanner = FakeScanner(findings=[_finding(commit=None)]) + request = ScanWorkerRequest( + store=store, + fetch_repo=lambda url: Path("/synthetic-cache/example-repo"), + scanner=scanner, + max_jobs=2, + lease_seconds=60, + worker_id="worker-a", + now_factory=lambda: NOW, + ) + + summary = run_scan_worker_once(request) + + assert summary.leased == 2 + assert summary.completed == 1 + assert store.pending_returns == [("scan_job_locked", "repo lease unavailable")] + assert len(scanner.calls) == 1 + # the free job's lease was released with its fence. 
+ assert store.repo_release_calls == [(REPO_ID, "worker-a", 7)] def test_scanner_failure_records_retryable_failure_and_releases_repo_lease(): @@ -295,7 +352,7 @@ def test_scanner_failure_records_retryable_failure_and_releases_repo_lease(): assert job_id == "scan_job_synthetic" assert "synthetic scanner failure" in error assert next_attempt_at == NOW + dt.timedelta(seconds=60) - assert store.repo_release_calls == [(REPO_ID, "worker-a")] + assert store.repo_release_calls == [(REPO_ID, "worker-a", 42)] def test_attempts_exhausted_is_reported_as_dead_letter(): @@ -386,3 +443,183 @@ def test_daemon_does_not_sleep_after_final_bounded_poll(): # idle each poll -> sleeps between polls but NOT after the last bounded poll. assert summary.polls == 2 assert sleeps == [3.0] + + +def test_worker_threads_repo_fence_into_release(): + # FR-6/SC-2: the worker carries the fence acquire_repo_lease returned and + # passes it to release_repo_lease so a reaped worker cannot delete a lease + # it no longer owns. The fake returns fence 42 on acquire. + store = FakeWorkerStore([_job()]) + scanner = FakeScanner(findings=[_finding(commit=None)]) + + run_scan_worker_once(_request(store, scanner)) + + assert store.repo_release_calls == [(REPO_ID, "worker-a", 42)] + + +def test_worker_threads_leased_job_fence_into_retryable_failure(): + # On scanner failure the worker fences the failure write with the leased + # job's worker_id+fence (the fake lease bumps fence to job.fence + 11 = 11). + store = FakeWorkerStore([_job(attempts=0, max_attempts=3)]) + scanner = FakeScanner(error=RuntimeError("synthetic scanner failure")) + + run_scan_worker_once(_request(store, scanner)) + + assert store.retry_fences == [("worker-a", 11)] + # and the repo lease is still released with the acquired repo fence. 
+ assert store.repo_release_calls == [(REPO_ID, "worker-a", 42)] + + +def _baseline_job() -> ScanJob: + job = _job() + return ScanJob(**{**job.__dict__, "job_type": "baseline"}) + + +def test_completed_incremental_job_advances_incremental_repo_health(): + # SC-5: a successful incremental completion advances the incremental field, + # keyed by job_type, with the scan completion timestamp. + store = FakeWorkerStore([_job()]) # default job_type == "incremental" + scanner = FakeScanner(findings=[_finding(commit=None)]) + + run_scan_worker_once(_request(store, scanner)) + + assert store.health_advances == [(REPO_ID, "incremental", NOW)] + + +def test_completed_baseline_job_advances_baseline_repo_health(): + store = FakeWorkerStore([_baseline_job()]) + scanner = FakeScanner(findings=[_finding(commit=None)]) + + run_scan_worker_once(_request(store, scanner)) + + assert store.health_advances == [(REPO_ID, "baseline", NOW)] + + +def test_ledger_already_present_completion_still_advances_repo_health(): + # The fast path (ledger already exists) is still a successful completion for + # this repo, so freshness must advance. + store = FakeWorkerStore([_job()]) + store.ledger_keys.add(_job().ledger_key) + scanner = FakeScanner() + + run_scan_worker_once(_request(store, scanner)) + + assert scanner.calls == [] # no re-scan + assert store.health_advances == [(REPO_ID, "incremental", NOW)] + + +def test_failed_scan_does_not_advance_repo_health(): + store = FakeWorkerStore([_job(attempts=0, max_attempts=3)]) + scanner = FakeScanner(error=RuntimeError("synthetic scanner failure")) + + run_scan_worker_once(_request(store, scanner)) + + assert store.health_advances == [] # only successful completions advance + + +# --- M3 / FR-4: N-worker no-duplicate (RepoLease + fence) ------------------ + + +class SharedRepoLeaseStore(FakeWorkerStore): + """FakeWorkerStore variant with a REAL per-repo RepoLease CAS. 
+ + The base fake gates the lease with a single global ``repo_lease_available`` + boolean, which cannot express "two distinct workers contend for the SAME + repo". This subclass keeps a per-repo holder map so ``acquire_repo_lease`` + behaves like the store's CAS (M2): the first worker to hold a repo wins a + fence, a second concurrent worker for that repo gets ``None`` until the first + releases. That is the FR-4 duplicate-prevention invariant at the worker + level — two scan-worker@ processes cannot both scan one repo concurrently. + """ + + def __init__(self, jobs: list[ScanJob] | None = None) -> None: + super().__init__(jobs) + self._held: dict[str, str] = {} # repo_id -> worker_id currently holding + self._fence_seq = 0 + + def acquire_repo_lease(self, repo_id, worker_id, lease_seconds): + self.repo_lease_calls.append((repo_id, worker_id, lease_seconds)) + if repo_id in self._held: + return None # another worker holds this repo: no duplicate scan. + self._held[repo_id] = worker_id + self._fence_seq += 1 + return self._fence_seq + + def release_repo_lease(self, repo_id, worker_id, *, fence=None): + self.repo_release_calls.append((repo_id, worker_id, fence)) + if self._held.get(repo_id) == worker_id: + del self._held[repo_id] + + +def _request_for(store, scanner, *, worker_id, fetch_repo=None): + return ScanWorkerRequest( + store=store, + fetch_repo=fetch_repo or (lambda url: Path("/synthetic-cache/example-repo")), + scanner=scanner, + max_jobs=1, + lease_seconds=60, + worker_id=worker_id, + now_factory=lambda: NOW, + ) + + +def test_two_workers_cannot_both_scan_the_same_repo_concurrently(): + # FR-4: two jobs for the SAME repo are leased by two distinct workers while a + # lease is held. The RepoLease CAS lets only ONE worker scan; the other + # returns its job to pending without scanning. No duplicate scan of one repo. 
+ first = _job_with_id("scan_job_repoX_c1", "1" * 40) + second = _job_with_id("scan_job_repoX_c2", "2" * 40) + store = SharedRepoLeaseStore([first, second]) + + # worker-a holds the repo lease for REPO_ID before worker-b runs (simulating + # an in-flight scan): worker-b must be denied and not scan. + held_fence = store.acquire_repo_lease(REPO_ID, "worker-a", 60) + assert held_fence is not None + + scanner_b = FakeScanner(findings=[_finding(commit=None)]) + summary_b = run_scan_worker_once( + _request_for(store, scanner_b, worker_id="worker-b") + ) + + # worker-b leased a job but could NOT acquire the repo lease -> no scan, job + # returned to pending (the no-duplicate guarantee). + assert summary_b.leased == 1 + assert summary_b.completed == 0 + assert scanner_b.calls == [] + assert store.pending_returns == [("scan_job_repoX_c1", "repo lease unavailable")] + + # worker-a finishes and releases; now a worker can scan the repo (no + # permanent starvation — the lease is a mutex, not a drop). + store.release_repo_lease(REPO_ID, "worker-a", fence=held_fence) + scanner_c = FakeScanner(findings=[_finding(commit=None)]) + summary_c = run_scan_worker_once( + _request_for(store, scanner_c, worker_id="worker-c") + ) + assert summary_c.completed == 1 + assert len(scanner_c.calls) == 1 + + +def test_n_workers_each_take_distinct_repos_without_collision(): + # Three workers, three jobs on three DISTINCT repos: with no repo contention + # every worker scans its own repo. Proves the lease only blocks SAME-repo + # collisions, not independent parallelism (the point of the N-process pool). 
+ jobs = [] + for i in range(1, 4): + job = _job_with_id(f"scan_job_r{i}", f"{i}" * 40) + jobs.append( + ScanJob(**{**job.__dict__, "repo_id": f"repo_distinct_{i:024d}"}) + ) + store = SharedRepoLeaseStore(list(jobs)) + + completed = 0 + for i, worker_id in enumerate(("worker-1", "worker-2", "worker-3")): + scanner = FakeScanner(findings=[_finding(commit=None)]) + summary = run_scan_worker_once( + _request_for(store, scanner, worker_id=worker_id) + ) + completed += summary.completed + + assert completed == 3 + assert store.pending_returns == [] # no contention, nothing bounced + # every held repo lease was released after each scan (no leak). + assert store._held == {} diff --git a/tests/test_systemd_units.py b/tests/test_systemd_units.py new file mode 100644 index 0000000..f7227cf --- /dev/null +++ b/tests/test_systemd_units.py @@ -0,0 +1,234 @@ +"""Structure checks for the M3 scale-redesign systemd unit artifacts. + +The deployment box is OFFLINE, so DEPLOYED runtime behavior (N live processes, +Restart=on-failure recovery, real cadence values) is box-gated and cannot be +proven here. These tests instead validate the unit-file ARTIFACTS in-repo: that +the instanced worker template uses %i as the worker id, that the worker unit has +Restart=on-failure, that every periodic .service has a matching .timer, that the +timers carry the GATE-1 placeholder marker, and that the governance-gated +catalog-reconcile unit carries the GATE-2 "do not enable" note. No live systemd +is required. +""" + +from __future__ import annotations + +import configparser +from pathlib import Path + +import pytest + +SYSTEMD_DIR = Path(__file__).resolve().parents[1] / "deploy" / "systemd" + +WORKER_TEMPLATE = SYSTEMD_DIR / "security-scanner-scan-worker@.service" +WORKER_TARGET = SYSTEMD_DIR / "scan-worker.target" + +# (service, timer) pairs for the periodic jobs M3 wired. 
+PERIODIC_UNITS = { + "lease-reaper": ( + "security-scanner-lease-reaper.service", + "security-scanner-lease-reaper.timer", + ), + "incr-poll": ( + "security-scanner-incr-poll.service", + "security-scanner-incr-poll.timer", + ), + "baseline": ( + "security-scanner-baseline.service", + "security-scanner-baseline.timer", + ), + "catalog-reconcile": ( + "security-scanner-catalog-reconcile.service", + "security-scanner-catalog-reconcile.timer", + ), + "freshness-eval": ( + "security-scanner-freshness-eval.service", + "security-scanner-freshness-eval.timer", + ), +} + + +def _parse_unit(path: Path) -> configparser.ConfigParser: + """Parse a systemd unit file with the same lenient rules systemd uses. + + Directives like ExecStart can repeat and values are not interpolated, so we + disable strict-duplicate checking and ``%``-interpolation (``%i`` must be + read literally, not treated as a configparser format token). + """ + parser = configparser.ConfigParser(strict=False, interpolation=None) + # Preserve case of keys (systemd directives are CamelCase). + parser.optionxform = str # type: ignore[assignment] + parser.read_string(path.read_text(encoding="utf-8")) + return parser + + +def test_worker_template_and_target_exist() -> None: + assert WORKER_TEMPLATE.is_file(), WORKER_TEMPLATE + assert WORKER_TARGET.is_file(), WORKER_TARGET + # An instanced unit MUST have the @ template suffix so scan-worker@1..N + # instantiate from one file. + assert WORKER_TEMPLATE.name.endswith("@.service") + + +def test_worker_template_uses_instance_id_as_worker_id() -> None: + parser = _parse_unit(WORKER_TEMPLATE) + exec_start = parser.get("Service", "ExecStart") + # %i (the systemd instance name) is threaded into --worker-id so each + # scan-worker@i is a distinct fence-token holder (FR-4 duplicate prevention). + assert "%i" in exec_start + assert "--worker-id" in exec_start + assert "scan-worker@%i" in exec_start + # It must run the daemon, not a oneshot scan. 
+ assert "scan-worker" in exec_start + assert "--daemon" in exec_start + + +def test_worker_template_has_crash_restart_and_install() -> None: + parser = _parse_unit(WORKER_TEMPLATE) + # Restart=on-failure is the deployed worker-crash recovery half (box-gated to + # actually exercise, but the directive must be present in the artifact). + assert parser.get("Service", "Restart") == "on-failure" + assert parser.get("Service", "Type") == "simple" + # [Install] must wire instances into the pool target. + assert parser.get("Install", "WantedBy") == "scan-worker.target" + + +def test_worker_template_mirrors_existing_unit_conventions() -> None: + parser = _parse_unit(WORKER_TEMPLATE) + service = dict(parser.items("Service")) + # Same operator conventions as the existing scan-all unit. + assert service["User"] == "scanner" + assert service["Group"] == "scanner" + assert service["WorkingDirectory"] == "/opt/security-scanner" + assert "uv run" in service["ExecStart"] + assert "EnvironmentFile" in service + # Notification-log path is passed (matches scan-all's external-tooling tail). + assert "--notification-log" in service["ExecStart"] + + +def test_worker_target_brings_up_the_pool() -> None: + parser = _parse_unit(WORKER_TARGET) + assert parser.get("Install", "WantedBy") == "multi-user.target" + + +@pytest.mark.parametrize("service_name,timer_name", PERIODIC_UNITS.values(), + ids=list(PERIODIC_UNITS.keys())) +def test_periodic_service_and_timer_exist(service_name: str, timer_name: str) -> None: + assert (SYSTEMD_DIR / service_name).is_file(), service_name + assert (SYSTEMD_DIR / timer_name).is_file(), timer_name + + +@pytest.mark.parametrize("service_name,timer_name", PERIODIC_UNITS.values(), + ids=list(PERIODIC_UNITS.keys())) +def test_periodic_units_have_required_fields( + service_name: str, timer_name: str +) -> None: + service = _parse_unit(SYSTEMD_DIR / service_name) + # Each periodic job is a oneshot invoked by its timer. 
+    assert service.get("Service", "Type") == "oneshot"
+    exec_start = service.get("Service", "ExecStart")
+    assert "uv run security-scanner" in exec_start
+    assert service.get("Install", "WantedBy") == "multi-user.target"
+
+    timer = _parse_unit(SYSTEMD_DIR / timer_name)
+    assert timer.has_section("Timer")
+    assert timer.get("Timer", "OnCalendar")
+    # The timer binds back to the matching .service by basename.
+    assert timer.get("Timer", "Unit") == service_name
+    assert timer.get("Install", "WantedBy") == "timers.target"
+
+
+@pytest.mark.parametrize("timer_name", [t for _, t in PERIODIC_UNITS.values()],
+                         ids=list(PERIODIC_UNITS.keys()))
+def test_timer_cadences_are_marked_placeholders(timer_name: str) -> None:
+    # Cadence values are box-gated; each timer must clearly mark its OnCalendar as
+    # a GATE-1 placeholder so no one mistakes it for a load-validated value.
+    text = (SYSTEMD_DIR / timer_name).read_text(encoding="utf-8")
+    assert "PLACEHOLDER" in text
+    assert "GATE 1" in text
+
+
+def test_exec_start_targets_match_wired_cli_commands() -> None:
+    # Each periodic .service must invoke a CLI subcommand that actually exists.
+    expected_command = {
+        "security-scanner-lease-reaper.service": "reap-expired-leases",
+        "security-scanner-incr-poll.service": "discover-updates",
+        "security-scanner-baseline.service": "baseline",
+        "security-scanner-catalog-reconcile.service": "reconcile",
+        "security-scanner-freshness-eval.service": "freshness-eval",
+    }
+    from security_scanner.cli.app import build_parser
+
+    parser = build_parser()
+    # Pull the registered subcommand names from the argparse subparsers action.
+    import argparse
+
+    sub_action = next(
+        a
+        for a in parser._subparsers._group_actions  # type: ignore[union-attr]
+        if isinstance(a, argparse._SubParsersAction)
+    )
+    registered = set(sub_action.choices)
+
+    for service_name, command in expected_command.items():
+        exec_start = _parse_unit(SYSTEMD_DIR / service_name).get("Service", "ExecStart")
+        assert command in exec_start, (service_name, command)
+        assert command in registered, command
+
+
+ALL_M3_SERVICES = [
+    "security-scanner-baseline.service",
+    "security-scanner-catalog-reconcile.service",
+    "security-scanner-freshness-eval.service",
+    "security-scanner-incr-poll.service",
+    "security-scanner-lease-reaper.service",
+    "security-scanner-scan-worker@.service",
+]
+
+
+# Machine-local absolute-path prefixes ci/public-safety (identifier.private-path)
+# rejects on changed lines. Assembled from parts so this test file itself carries
+# no contiguous literal that would trip the same rule.
+FORBIDDEN_ABS_PREFIXES = tuple(f"/{seg}/" for seg in ("var", "home", "srv", "Users"))
+
+
+@pytest.mark.parametrize("service_name", ALL_M3_SERVICES)
+def test_units_use_systemd_managed_state_dirs(service_name: str) -> None:
+    # The M3 units must NOT hardcode machine-local absolute paths. Logs and cache
+    # are delegated to systemd's StateDirectory machinery: LogsDirectory= and
+    # CacheDirectory= make systemd create and own the per-service log/cache dirs
+    # and auto-grant the unit RW to them (so no explicit ReadWritePaths for them is
+    # needed, even under ProtectSystem=strict).
+    parser = _parse_unit(SYSTEMD_DIR / service_name)
+    service = dict(parser.items("Service"))
+    assert service.get("LogsDirectory") == "security-scanner", service_name
+    assert service.get("CacheDirectory") == "security-scanner", service_name
+    # No literal machine-local absolute path may remain anywhere in the unit file.
+    text = (SYSTEMD_DIR / service_name).read_text(encoding="utf-8")
+    for prefix in FORBIDDEN_ABS_PREFIXES:
+        assert prefix not in text, (service_name, prefix)
+
+
+def test_worker_notification_log_uses_logs_directory() -> None:
+    # The worker writes its notification log under the systemd-managed logs dir, so
+    # ExecStart must reference the systemd-exported ${LOGS_DIRECTORY} rather than a
+    # hardcoded absolute path.
+    exec_start = _parse_unit(WORKER_TEMPLATE).get("Service", "ExecStart")
+    assert "--notification-log" in exec_start
+    assert "${LOGS_DIRECTORY}/scan-worker.log.jsonl" in exec_start
+    # The log filename is present, but only via the exported var, never as a
+    # hardcoded absolute path under a machine-local root.
+    for prefix in FORBIDDEN_ABS_PREFIXES:
+        assert prefix not in exec_start, prefix
+
+
+def test_catalog_reconcile_is_documented_disabled_until_gate_2() -> None:
+    # The catalog-reconcile unit + timer are wired but stay governance-gated: the
+    # default provider refuses live fetch until GATE 2. Both files must carry the
+    # explicit "do not enable until GATE 2" note so an operator does not arm it.
+    for name in (
+        "security-scanner-catalog-reconcile.service",
+        "security-scanner-catalog-reconcile.timer",
+    ):
+        text = (SYSTEMD_DIR / name).read_text(encoding="utf-8")
+        assert "GATE 2" in text
+        assert "DO NOT" in text.upper()
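
The tests above all rest on the `configparser` settings chosen in `_parse_unit`. The following is a minimal standalone sketch of why those settings matter, not part of the change itself. It assumes a made-up unit text (`SAMPLE_UNIT`, with a hypothetical `/usr/bin/example` binary) and a hypothetical helper `parse_unit_text` that mirrors `_parse_unit` but takes a string instead of a path.

```python
import configparser

# Hypothetical unit text for illustration only; not one of the repo's units.
SAMPLE_UNIT = """\
[Unit]
Description=Example worker %i

[Service]
Type=simple
ExecStartPre=/bin/true first
ExecStartPre=/bin/true second
ExecStart=/usr/bin/example --worker-id scan-worker@%i --daemon
Restart=on-failure

[Install]
WantedBy=scan-worker.target
"""


def parse_unit_text(text: str) -> configparser.ConfigParser:
    """Mirror of _parse_unit that reads from a string (assumed helper)."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    # Keep CamelCase directive names instead of lowercasing them.
    parser.optionxform = str  # type: ignore[assignment]
    parser.read_string(text)
    return parser


parser = parse_unit_text(SAMPLE_UNIT)
exec_start = parser.get("Service", "ExecStart")
service_keys = dict(parser.items("Service"))

# With the default interpolation, reading a value containing "%i" fails,
# because "%" must be followed by "%" or "(" in configparser's syntax.
default_parser = configparser.ConfigParser(strict=False)
default_parser.read_string(SAMPLE_UNIT)
try:
    default_parser.get("Service", "ExecStart")
    default_interpolation_failed = False
except configparser.InterpolationSyntaxError:
    default_interpolation_failed = True

print(exec_start)
print(default_interpolation_failed)
```

One limitation worth keeping in mind: with `strict=False`, a repeated directive does not raise, but only its last value survives, so a test that needs every `ExecStartPre=` line would have to read the file text directly.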