From 07c4e82a739d77735136ac605a931cf197b15b57 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 08:59:29 +0900
Subject: [PATCH 1/7] =?UTF-8?q?chore(autopilot):=20GHAS=EA=B8=89=20?=
 =?UTF-8?q?=EC=8B=9C=ED=81=AC=EB=A6=BF=20=ED=92=88=EC=A7=88=20goal=20?=
 =?UTF-8?q?=EC=85=8B=EC=97=85=20=E2=80=94=20spec=20=EC=8A=B9=EA=B2=A9=20+?=
 =?UTF-8?q?=20goal=20=ED=8C=A8=ED=82=B7=20+=20goal.yml=20=EB=A6=AC?=
 =?UTF-8?q?=ED=8F=AC=EC=9D=B8=ED=8A=B8?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- docs/workbench/specs/ghas-quality-secrets/{requirements,design,review}.md (리뷰 반영 v2 승격)
- docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md (실행 패킷)
- governance/autopilot_goal.yml → goal_id ghas-quality-secrets-parity
  (governance/** 광역 금지·parity_slo.py만, acceptance_checks 정렬, stop_conditions 정본 16)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../2026-06-21-ghas-quality-secrets-goal.md   | 186 +++++++++++++++
 .../specs/ghas-quality-secrets/design.md      | 211 ++++++++++++++++++
 .../ghas-quality-secrets/requirements.md      | 154 +++++++++++++
 .../specs/ghas-quality-secrets/review.md      |  60 +++++
 governance/autopilot_goal.yml                 |  12 +-
 5 files changed, 617 insertions(+), 6 deletions(-)
 create mode 100644 docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
 create mode 100644 docs/workbench/specs/ghas-quality-secrets/design.md
 create mode 100644 docs/workbench/specs/ghas-quality-secrets/requirements.md
 create mode 100644 docs/workbench/specs/ghas-quality-secrets/review.md

diff --git a/docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md b/docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
new file mode 100644
index 0000000..84e184f
--- /dev/null
+++ b/docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
@@ -0,0 +1,186 @@
+# Agentic Workflow: GHAS급 시크릿 탐지 품질 (parity SLO)
+
+**Status:** Ready for long single-goal execution
+**Date:** 2026-06-21
+**Goal ID:** `ghas-quality-secrets-parity`
+**Spec:** `docs/workbench/specs/ghas-quality-secrets/{requirements,design,review}.md`
+**Merge flow:** pull request
+
+장시간 단일 goal 실행 패킷. 시크릿 탐지를 **GHAS parity SLO**에 맞추는 품질 머신과 측정 harness를
+구축한다. 실 GHAS live-fetch는 stop-condition이므로 자율층은 **synthetic redacted snapshot fixture**로만
+증명하고, 실 GHAS 취득·baseline·enforce는 human-gated(H1~H3)로 격리한다.
+
+## Goal
+
+시크릿 탐지의 per-repo 1:1 GHAS parity 측정 harness + 티어드 FP-억제 품질 머신 + report-only CI SLO
+게이트를 synthetic fixture로 TDD 완성하고 PR/CI/merge까지 닫는다.
+
+**완료 기준(자율 goal done = M5):**
+
+- **M1**: `core/evaluation/metrics.py` 위 per-repo precision/recall parity harness. `baseline/ghas_api`는
+  GHAS alert→`EvaluationKey` 어댑터로만. `secret_type↔rule_id` 정규화 맵 + type-coverage 메타. state-aware
+  truth(open+resolved-TP만 positive; dismissed 분리). 라인 tolerance(구간겹침/±k). full-history universe.
+  **신규 precision/recall·gate 계산 코드 0줄(metrics 재사용).** 적대적 fixture(type-mismatch/line-drift/
+  dismissed)에서 누락이 red.
+- **M2**: 인라인 싼 티어 = `scanners/gitleaks/filter.py` noise_reason 확장(path-role/context-class +
+  partner-pattern). 결정적·no-network 부분은 default-on, 동작 변경분만 gated. scan-time filter(미생성) vs
+  post-scan disposition(FALSE_POSITIVE) 경계 명시. 11-FP 억제 + canary TP 보존 + 기존 default 불변.
+- **M3**: LLM 티어 disposition 자동배선. `runtime/scan_all.py`(기존) + **`runtime/scan_worker.py`(신규)**:
+  worker는 per-job 핫패스라 인라인 싼 티어만 동기, 애매 건 LLM verdict는 별도 비동기 큐/후속 잡. NEEDS_REVIEW
+  재verify backoff/skip-key. secretHash salt provenance.
+- **M4**: non-GHAS drift monitor. GHAS-calibrated 분포(M1) 기준선 + verifier-직교 분포-shift 교차. 입력
+  `eval/synthetic-corpus` 재사용. parity metric과 분리 필드. default-off·오프라인 비활성. 폴링 스케줄 신설 없음.
+- **M5**: report-only CI SLO 게이트 = 신규 `governance.parity_slo --check`. threshold yml 부재→report-only.
+  snapshot 나이>임계→`stale-degraded`(silent pass 금지).
+- 기존 Gitleaks-first secret default path 불변. GHAS trigger/upload/mutation/live-fetch 없음.
+- Architecture review gate(pre/post-M2/post-M3/final) blocking finding 없음. PR CI + local governance gate 통과.
+
+**human-gated 운영층(자율 루프 밖, stop-condition PR):** H1 실 GHAS snapshot 취득(local 비커밋) → H2
+baseline 측정 + fixture-vs-real divergence 보고 + measure-first 목표 확정 → H3 threshold 커밋 + enforce 전환.
+
+## Execution Contract
+
+- 단일 장기 goal로 M0~M5를 끝까지. 중간 milestone 사용자 승인 없음. 사람 개입은 stop condition 시에만.
+- Subagent 적극 사용. 구현 worker는 `gpt-5.5` reasoning_effort high; 보조 coding/review는 repo policy.
+- PR 만들고 CI 통과 후 merge 가능 상태까지. runtime mutation 허용하되 committed artifact는 synthetic/redacted만.
+- 실 endpoint/host/credential/private path/real GHAS export/real finding 커밋 금지.
+
+## Fixed Decisions
+
+- Scope: 시크릿 서브트랙 M1~M5. vuln/SAST는 별도 goal(`.claude/specs/20260621-ghas-quality-vuln-subtrack/`).
+- 측정: GHAS parity SLO, per-repo 1:1, snapshot=ground-truth(frozen synthetic). 계산은 `metrics.py` 재사용,
+  ghas_api는 어댑터. precision은 Q3상 GHAS를 못 넘음(parity = "GHAS만큼").
+- validity check: no-network(verifier+휴리스틱+partner-pattern). live validity는 evidence-gated 연기.
+- 인라인 싼 티어: 기존 `filter.py` 확장(default-on 결정적 부분), 신규 동작분만 gated.
+- LLM 티어: detector 아님, verifier/explainer. redacted 입력(raw snippet 금지), strict JSON, fail-closed
+  NEEDS_REVIEW(무기록 + 재verify backoff). default-off gated.
+- snapshot: synthetic redacted fixture만 커밋, `source: synthetic` provenance marker 필수, marker 없으면
+  fail-closed. 실 snapshot은 `.gitignore` + allowed_writes 비포함으로 이중 차단.
+- CI 게이트: threshold 부재→report-only, 존재→enforce(measure-first 자동 분기). 자율층은 항상 report-only.
+- **governance 핵심 자율수정 금지**: allowed_writes는 `governance/parity_slo.py`만. `autopilot_goal.yml`·
+  `autopilot_gate.py`·`public_safety.py` 수정 필요 시 stop(scope-expansion) → 사람 PR.
+- GHAS: out of scope(자율). live fetch/upload/mutation 금지 → H1~H3 human-gated.
+
+## Required Architecture Review Gate
+
+Mandatory, blocking. Checkpoints: (1) pre-implementation (2) post-M2 (3) post-M3 (4) final.
+Blocking finding이 SoT change/scope expansion/unsafe data/secret default change를 요구할 때만 정지;
+그 외엔 같은 goal 안에서 수정.
+
+## Multi-agent Execution Model
+
+Subagent를 disjoint 책임으로. Main agent가 통합·최종 판단.
+
+| Role | Responsibility | Write scope |
+| --- | --- | --- |
+| `system_architecture_manager` | Architecture gate, SoT drift, 측정 방법론 건전성 | read-only |
+| `codebase_architecture_manager` | seam/locality(metrics 재사용, scan_worker 2경로, filter seam) | read-only |
+| Worker A | parity harness + 정규화 맵 + 적대적 fixture | `src/security_scanner/baseline/**`, `eval/**`, tests |
+| Worker B | 인라인 싼 티어(filter.py 확장) | `src/security_scanner/scanners/gitleaks/**`, tests |
+| Worker C | LLM 티어 disposition 배선(scan_all + scan_worker) | `src/security_scanner/runtime/**`, `llm/**`, tests |
+| Worker D | drift monitor + CI parity_slo 게이트 | `governance/parity_slo.py`, `src/security_scanner/runtime/scan_health*`, tests |
+| Reviewer | public-safety/security review | read-only |
+| `code_simplifier` | 최종 clarity pass(행동 보존) | touched files only |
+
+## Allowed Write Surface
+
+`governance/autopilot_goal.yml`의 `allowed_writes`가 authoritative. 요약: 승격된 spec, 이 workflow 문서,
+src/tests/eval/examples, `governance/parity_slo.py`(신규 게이트만), ledger, CURRENT.md. **`governance/**`
+광역 아님** — 그 밖 governance 파일 변경은 scope expansion으로 정지.
+
+## Suggested Work Plan
+
+### Readiness (M0)
+1. 계약 읽기: `AGENTS.md`, `governance/autopilot_goal.yml`, 이 문서, spec 3종(requirements/design/review).
+2. pre-implementation architecture review. 3. 워크트리 격리 + write surface 확인.
+
+### M1 parity harness + 정규화 + 적대적 fixture
+1. red-first: 정규화 맵 누락이 `type-unmatched` 버킷으로 분리됨; state-aware truth가 dismissed를 분모에서
+   제외; line-tolerance가 ±k 매칭; precision/recall이 `metrics.py`에서 산출; 적대적 fixture가 누락을 red로.
+2. 구현: ghas_api alert→EvaluationKey 어댑터, 정규화 맵, harness가 metrics 재사용. **신규 계산 코드 0줄.**
+
+### M2 인라인 싼 티어
+1. red-first: path-role/context-class 억제, partner-pattern 고신뢰, canary TP 보존, 기존 default 불변.
+2. 구현: filter.py noise_reason 확장. scan-time vs post-scan 경계 명시. post-M2 architecture review.
+
+### M3 LLM 티어 disposition (scan_all + scan_worker)
+1. red-first: scan_worker 동기 인라인 + 비동기 LLM 큐; NEEDS_REVIEW 무기록+backoff; disposition durable write.
+2. 구현: scan_worker에 2경로 배선. post-M3 architecture review.
+
+### M4 drift monitor
+1. red-first: 기준선이 GHAS-derived임 증명, SLO 미오염, 분리 필드, 폴링 신설 없음.
+2. 구현: eval/synthetic-corpus 재사용 + verifier-직교 분포-shift.
+
+### M5 CI SLO gate(report-only)
+1. red-first: threshold 부재→report-only, 나이>임계→stale-degraded.
+2. 구현: `governance/parity_slo.py`. final architecture review → PR. CURRENT.md에 "SLO enforce 미달성, H-track 대기".
+
+## Required Local Checks
+
+```bash
+uv run pytest
+uv run python -m governance.render --validate
+uv run python -m governance.render --check
+uv run python -m governance.rebuild_ledger_index --check
+uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
+uv run python -m governance.public_safety --diff origin/main...HEAD
+uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-secrets
+uv run python -m governance.parity_slo --check
+uv run python -m governance.autopilot_gate --base origin/main
+```
+
+## Stop Conditions
+
+`governance/autopilot_goal.yml`의 `stop_conditions`(정본 16). 핵심: ghas-live-fetch-or-mutation-required
+(H1 실 snapshot), existing-secret-default-behavior-change, architecture-review-blocking-finding,
+storage-projection-or-schema-migration-required(disposition durable write), public-safety-hit,
+scope-expansion(governance 핵심 파일 수정 포함), same-blocker-three-times, break-glass.
+
+## Resume Prompt
+
+```text
+Goal: complete `ghas-quality-secrets-parity` in the security-scanner repo through a PR.
+
+Read first:
+- AGENTS.md
+- governance/autopilot_goal.yml
+- docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
+- docs/workbench/specs/ghas-quality-secrets/requirements.md
+- docs/workbench/specs/ghas-quality-secrets/design.md
+- docs/workbench/specs/ghas-quality-secrets/review.md
+- src/security_scanner/baseline/ghas_api/__init__.py
+- src/security_scanner/core/evaluation/metrics.py
+- src/security_scanner/scanners/gitleaks/{filter,parser}.py
+- src/security_scanner/runtime/{scan_all,scan_worker,verify_artifact}.py
+- src/security_scanner/core/finding/model.py
+- src/security_scanner/llm/common/verifier.py
+
+Implement M1~M5 (autonomous, synthetic fixtures only, no real GHAS):
+M1 parity harness on core/evaluation/metrics.py + secret_type<->rule_id normalization map +
+   state-aware truth + line tolerance + adversarial fixtures. Zero new precision/recall code.
+M2 inline cheap tier by extending scanners/gitleaks/filter.py noise_reason (default-on deterministic
+   part, gated for new behavior). scan-time vs post-scan boundary.
+M3 LLM tier disposition wiring into scan_all AND scan_worker (worker: sync inline + async LLM queue).
+   NEEDS_REVIEW no-write + re-verify backoff.
+M4 non-GHAS drift monitor (GHAS-calibrated baseline, separated field, no new polling).
+M5 report-only CI SLO gate governance.parity_slo --check (report-only until threshold yml exists).
+
+Use multi-agent execution. Mandatory architecture gates: pre-implementation, post-M2, post-M3, final.
+Do not change existing Gitleaks-first secret defaults. Do not call GHAS, upload, mutate, fetch live,
+commit real snapshots/findings, or modify governance/autopilot_goal.yml | autopilot_gate.py |
+public_safety.py (allowed_writes = governance/parity_slo.py only). Real GHAS snapshot fetch, baseline
+measurement, and enforce flip are human-gated H1~H3, OUT of this run. Finish by opening a PR, waiting
+for CI, and merging when green. Autonomous done = M5 (report-only gate); record "SLO enforce pending
+H-track" in CURRENT.md.
+
+Required checks:
+- uv run pytest
+- uv run python -m governance.render --validate
+- uv run python -m governance.render --check
+- uv run python -m governance.rebuild_ledger_index --check
+- uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
+- uv run python -m governance.public_safety --diff origin/main...HEAD
+- uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-secrets
+- uv run python -m governance.parity_slo --check
+- uv run python -m governance.autopilot_gate --base origin/main
+```
diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
new file mode 100644
index 0000000..903e350
--- /dev/null
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -0,0 +1,211 @@
+# GHAS급 시크릿 탐지 품질 — Design Spec (v2, 리뷰 반영)
+
+> Phase 2 (grill-to-spec). SoT: `requirements.md`(승인됨) + 이 `design.md`.
+> v2: 멀티에이전트 리뷰(29건: blocker 1·major 7·minor/nit) 반영. `review.md` 참조.
+> 대상: **시크릿 서브트랙**(vuln은 별도 — `.claude/specs/20260621-ghas-quality-vuln-subtrack/`).
+> 실행 형태: **autopilot 단일 goal long-single-goal**(`governance/autopilot_goal.yml` 패턴).
+
+## Overview
+
+시크릿 탐지를 GHAS parity SLO에 도달시키는 품질 머신과 측정 harness. 핵심 제약: 실 GHAS
+live-fetch는 stop-condition(`ghas-live-fetch-or-mutation-required`), 커밋은 synthetic-or-redacted-only.
+→ 두 층 분리:
+
+- **자율층(autopilot 단일 goal, M0~M5)**: parity harness + 티어드 품질 머신 + disposition 자동배선 +
+  CI SLO 게이트를 **synthetic redacted snapshot fixture** + 기존 eval 코퍼스로 TDD 구축·증명. 실 GHAS 무접촉.
+- **human-gated 운영층(H1~H3, stop-condition PR)**: 실 GHAS snapshot 취득 → baseline 측정 →
+  measure-first 목표 확정 → CI 게이트 enforce 전환. 자율 루프 밖.
+
+**done 정의 명확화(리뷰 report-only-enforce-unreachable)**: 자율 goal done = **M5**(머신+harness+
+report-only 게이트, synthetic 증명, PR merge). requirements Q10의 v1 done(baseline 측정+목표 도달)은
+**H1~H3 완료 후에만** 성립. PR merge 시 CURRENT.md에 "SLO enforce 미달성, H-track 대기" 명시.
+
+## Requirements Reference
+
+`requirements.md` Q1~Q10 locked. 핵심: GHAS parity SLO · per-repo 1:1 · snapshot=ground-truth ·
+non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자동 · main 위 쌓기 · measure-first done.
+
+## 측정 의미론 (Measurement Semantics) — 리뷰 blocker/major 반영, 신규 락인
+
+리뷰가 측정 차원에서 blocker 1 + major 2를 냈다. 핵심 의미론을 Open Question에서 **설계 결정으로 격상**.
+
+- **match key 정규화(blocker `match-key-type-mismatch`)**: GHAS `secret_type`(`github_personal_access_token`)
+  와 gitleaks `rule_id`(`github-pat`)는 표기가 다르다. 정규화 없이 완전일치 비교하면 동일 시크릿이
+  `local_only`(FP↑)·`ghas_only`(recall miss↑) 양쪽에 들어가 baseline gap이 표기 아티팩트로 오염된다.
+  → **`secret_type ↔ rule_id` 정규화 맵을 M1 1급 산출물로** 승격. 맵 부재 쌍은 matched로 세지 말고
+  별도 `type-unmatched-but-colocated` 버킷으로 노출(silent 오집계 금지) + `type-coverage` 메타지표 보고.
+- **라인 매칭 tolerance(minor `line-exact-match`)**: 현 키는 `line_start` 단일 완전일치(line_end·tolerance
+  없음). full-history 좌표는 멀티라인·diff·재포맷으로 ±몇 줄 어긋난다. → match는 `line_start..line_end`
+  **구간 겹침 또는 ±k줄 tolerance** 허용. universe는 **full-history 정렬 고정**(증거: HEAD-only=0, full=11).
+- **state-aware truth(major `alert-state-not-filtered`)**: GHAS alert raw stream을 truth로 쓰면 owner가
+  이미 dismiss한 FP까지 정답지가 된다(우리가 안 띄우면 recall 처벌, 띄우면 shared-mode error). →
+  **positive truth = `open` + `resolved as true_positive`**; `dismissed`/`resolved-as-false_positive`/
+  `revoked`는 recall 분모에서 제외하고 **"GHAS-confirmed-FP" 신호로 분리 집계**(precision 진단에 활용,
+  parity 점수 비오염). fetch에 state 보존, `GhasComparisonResult`에 state 분해 추가.
+- **precision/recall 공식(minor `precision-recall-mislabeled`)**: 재사용 대상 `GhasComparisonResult`는
+  `ghas_coverage`/`local_extra_rate`만 노출. 명시: `recall = matched/(matched + ghas_only_positive_truth)`,
+  `precision = matched/(matched + local_only_after_truth_filter)`. Q3 규약(GHAS 미탐 우리 finding = FP)상
+  `local_only`는 FP로 들어가므로 **precision은 정의상 GHAS를 못 넘음**(parity = "GHAS만큼").
+- **집계**: per-repo **micro** 산출 후 **macro** 보고. SLO 게이트 판정은 macro.
+- **엔진 재사용(major `parity-harness-third-engine`)**: repo에 precision/recall 엔진이 둘 —
+  `baseline/ghas_api`(compare_ghas_alerts_with_findings, 카운트만)와 `core/evaluation/metrics.py`
+  (`EvaluationResult.precision/recall` + `EvaluationThresholds` gate, 완비). → **계산·게이트 계층은
+  `core/evaluation/metrics.py` 재사용**, `baseline/ghas_api`는 GHAS alert→`EvaluationKey` **어댑터로만**.
+  `GhasAlertComparisonKey`↔`EvaluationKey` 단일 adapter로 수렴. **M1 done 인변: "신규 precision/recall·
+  gate 계산 코드 0줄, 기존 metrics 재사용".**
+
+## Architecture
+
+```
+                 ┌──────────────── 자율층 (autopilot single goal, M0~M5) ─────────────────┐
+  scan / scan-all │  [티어드 품질 머신]                                                      │
+  scan_worker ───►│   ├ 인라인 싼 티어 = scanners/gitleaks/filter.py(noise_reason) 확장      │
+   (둘 다)         │   │     (default-on, 결정적·no-network: path-role/context-class+partner) │
+                  │   │     → scan chokepoint라 scan_all·scan_worker 자동 공유                │
+                  │   └ 비동기 LLM 티어(gated, default-off): ollama verifier(애매 건만)        │
+                  │         → set_finding_disposition (B-domain writer 재사용)               │
+  GHAS synthetic  │  [parity harness] per-repo 1:1 → core/evaluation/metrics.py(precision/    │
+   snapshot       │   recall/gate) 위에, ghas_api는 alert→EvaluationKey 어댑터                │
+   fixture   ────►│  [drift monitor] non-GHAS 샘플 → GHAS-calibrated 분포 대비 이탈(health,    │
+                  │   SLO 아님, passive piggyback)                                           │
+                  │  [CI SLO gate] governance.parity_slo --check: threshold 부재→report-only,  │
+                  │   존재→enforce(measure-first 자동 분기). snapshot 나이>임계→stale-degraded  │
+                  └─────────────────────────────────────────────────────────────────────────┘
+                 ┌──────────────── human-gated 운영층 (H1~H3, stop-condition PR) ────────────┐
+  실 GHAS API ───►│  baseline/ghas_api(GET-only) → 실 redacted snapshot(local, 비커밋) →       │
+  (human-PR)      │  baseline 측정 + fixture-vs-real divergence 보고 → 목표 확정 → enforce 전환 │
+                  └─────────────────────────────────────────────────────────────────────────┘
+```
+
+## Data Flow
+
+1. **측정(자율)**: synthetic snapshot fixture + 우리 스캔 → ghas_api 어댑터로 `EvaluationKey` 정규화 →
+   metrics.py로 per-repo precision/recall → macro 집계 → report-only SLO.
+2. **억제(자율)**: scan 시점 `filter.py`(인라인 싼 티어, default-on)가 path-role/context-class+partner로
+   즉시 억제(finding 미생성 또는 disposition). scan_all·scan_worker 공유. 애매 건만 비동기 LLM 티어
+   (gated)가 verdict → disposition.
+3. **calibration(human-gated)**: 실 snapshot fetch(stop-condition→사람 PR) → 실 baseline + divergence
+   보고 → 목표 확정 → threshold 커밋 → enforce.
+4. **drift(자율)**: non-GHAS 샘플을 GHAS-calibrated 분포(M1 집계) 대비 이탈로 측정 + verifier와 직교한
+   분포-shift 무라벨 신호 교차(common-cause bias 완화) → scan-health 분리 필드 노출(SLO 비오염).
+
+## Component Details
+
+| 컴포넌트 | 입력 | 출력 | 의존(코드 seam) |
+| --- | --- | --- | --- |
+| parity harness | findings_R, snapshot_R | per-repo·macro precision/recall | `core/evaluation/metrics.py`(계산·gate), `baseline/ghas_api`(alert→EvaluationKey 어댑터) |
+| 정규화 맵 | secret_type, rule_id | 정규화 type | M1 신규 산출물 + type-coverage 메타 |
+| snapshot store | (synthetic 커밋 / 실 local 비커밋) | frozen+state | `baseline/ghas_api` GET-only, provenance marker `source` 필수 |
+| 인라인 싼 티어 | finding + path/context | 억제/disposition | **`scanners/gitleaks/{filter,parser}.py`(noise_reason, enable_noise_filter)**, `llm/common/prompt.py` DEFAULT_PATH_ROLE_ANCHORS 어휘 통일 |
+| 비동기 LLM 티어 | 애매 finding | verdict→disposition | `llm/common/verifier.py`, `llm/ollama/client.py`, `runtime/verify_artifact.py` |
+| disposition 배선 | terminal verdict | B-domain write | `runtime/scan_all.py`(기존) + **`runtime/scan_worker.py`(신규 2경로)** |
+| drift monitor | non-GHAS 샘플 | health 신호(분리 필드) | LLM 티어, `runtime/scan_health.py` 또는 notification_log(M4서 택1 명시) |
+| CI SLO gate | frozen snapshot, threshold | report-only/enforce/stale-degraded | **신규 `governance.parity_slo --check`** + metrics gate |
+
+**Fixed decisions(리뷰 반영):**
+- 인라인 싼 티어는 **기존 `filter.py` noise_reason 확장**(이미 배선됨: `parser.py`에서 import·호출,
+  `enable_noise_filter` default True). 결정적·no-network·secret-egress 없음이라 **default-on 유지**가
+  `existing-secret-default-behavior-change` stop-condition에 안 걸린다(억제율 회귀 테스트로 보장).
+  신규 partner-pattern 고신뢰 매칭 등 동작 바꾸는 부분만 gated. **scan-time filter(finding 미생성)
+  vs post-scan disposition(생성 후 FALSE_POSITIVE) 경계**를 한 문장으로 못박아 이중 처리 차단:
+  placeholder/dummy/path-role은 scan-time, LLM verdict는 post-scan.
+- **주기 경로는 둘(major `periodic-path-is-scan-worker`)**: `scan_all.py`(주간 배치, verifier 이미 배선)
+  + **`scan_worker.py`(incr-poll→큐 드레인, #2 500+ 실경로, 현재 verifier/disposition 참조 0건)**.
+  worker는 per-job 핫패스라 **인라인 싼 티어만 동기 적용**, 애매 건 LLM verdict는 **별도 비동기 큐/후속
+  잡으로 분리**(인라인 LLM 금지). "500+ 비용 제어·주기 자동 혜택" 문구는 이 2경로 배선으로만 성립.
+- 커밋 snapshot은 **synthetic redacted fixture만**, **`source: synthetic` provenance marker 필수,
+  marker 없으면 harness/gate fail-closed**(security `real-snapshot-no-commit`). 실 snapshot은
+  `.gitignore` + allowed_writes 비포함 경로로 **이중 차단**(honor-system 아님).
+- CI 게이트는 threshold yml 부재/빈값이면 **report-only**, 존재하면 **enforce**(measure-first 자동 분기).
+  snapshot 나이>임계면 `pass` 아닌 **`stale-degraded`**로 떨궈 enforce 모드에서 차단/재취득 트리거
+  (staleness 가시성, `pass`로 silent 통과 금지).
+- drift monitor도 **LLM 티어와 동일 gated·default-off, 오프라인 박스 비활성**. parity metric과 **물리적
+  분리 필드**. 기준선은 GHAS-calibrated 분포, 입력 fixture는 `eval/synthetic-corpus` 재사용, **별도 폴링
+  스케줄 신설 금지**(능동 drift 비채택 준수).
+
+## Error Handling
+
+- verifier/store 실패: public-safe error 노출, scan summary 반영. `NEEDS_REVIEW`는 disposition write
+  안 함이되 **재verify 폭주 방지**(minor `needs-review-no-write`): 동일 finding_id 최근 verify
+  타임스탬프 기록→backoff, 또는 `disposition_lookup` line-stable gate가 unreviewed도 skip-key로
+  쓰는지 M3에서 택1 명시(비용 NFR 정합).
+- snapshot 부재/stale: 나이·타임스탬프 노출 + `stale-degraded` 상태. 목표 미설정이면 report-only.
+- 실 GHAS fetch 필요: autopilot 정지 → `ghas-live-fetch-or-mutation-required` stop-condition → 사람 PR.
+- `secretHash` egress(minor `secrethash-entropy-leak`): LLM 티어로 나가는 유일한 secret-파생 값.
+  per-deployment salt(`SECURITY_SCANNER_HASH_SALT`) 전제 명시 + M3 done에 salt provenance/강도 테스트
+  (현 `_DEFAULT_SALT` 하드코딩 약점 인지, 원격 ollama 시 위험).
+
+## Testing Strategy
+
+- TDD red-first. synthetic fixture + fake store/verifier로 CLI·runtime·storage·parity 경계 증명.
+- **적대적 fixture(major `synthetic-fixture-self-fulfilling`)**: 핸드오프 실관측 11건(discord×4
+  manifest-hash, github-pat×3 test-fixture, doc-example×4)의 redacted 구조 아날로그를 1:1 반영하되
+  **실제 GHAS `secret_type` 토큰을 그대로** 써서 (a) type 표기 불일치 쌍, (b) 정규화 후만 매칭, (c) 라인
+  ±1~2 오프셋, (d) dismissed-state 케이스를 포함 → **정규화/필터/tolerance 누락이 red가 되게**. 우리가
+  키를 맞춰 만든 fixture가 항상 green이 되는 self-fulfilling 차단.
+- 인라인 티어: 11-FP 억제 + canary TP 보존(`FALSE_NEGATIVE_PATTERN`). LLM 티어: redacted 입력, strict
+  JSON, fail-closed NEEDS_REVIEW, 애매 건만 호출. scan_worker 2경로(동기 인라인 + 비동기 LLM 큐) 증명.
+- 회귀: 기존 secret scan/report/gate/evaluate default 불변. governance: `pytest` + `public_safety` + `autopilot_gate`.
+
+## Autopilot Execution Shape — goal-setup 시 `governance/autopilot_goal.yml` 반영(리뷰 major 3건 반영)
+
+> **지시: 아래를 그대로 복사하지 말고, 현행 `phase-2a` goal.yml을 base 템플릿으로 두고 diff만 얹어라**
+> (major `acceptance-checks-drift`). 누락 게이트 방지.
+
+- `goal_id`: `ghas-quality-secrets-parity`
+- `execution_mode`: `long-single-goal` / human_gate: `stop-conditions-only` / merge_flow: `pull-request`
+- **SoT 위치 결정(major `allowed-writes-sot-path-mismatch`)**: 리뷰된 spec을 **`docs/workbench/specs/
+  ghas-quality-secrets/`로 승격(migrate)** 하고 git 추적(현 `.claude/specs/`는 gitignore라 게이트가
+  `outside allowed_writes`로 차단·public_safety 누락). grill 원본은 `.claude/specs/`에 두고 커밋본만 승격.
+- `allowed_writes`: `docs/workbench/specs/ghas-quality-secrets/**`,
+  `docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md`, `src/security_scanner/**`,
+  `tests/**`, `eval/**`, `ledger/**`, `CURRENT.md`, **`governance/parity_slo.py`(신규 게이트만)**.
+  **`governance/**` 광역 금지(major `allowed-writes-governance-self-modify`)** — `autopilot_goal.yml`·
+  `autopilot_gate.py`·`public_safety.py` 자율 수정 금지(Fixed decision), 필요 시 사람 PR.
+- `acceptance_checks`(phase-2a와 1:1 정렬): architecture-review **pre/post-M2/post-M3/final(4지점,
+  minor `milestone-arch-review-count`)** + `pytest` + `render --validate/--check` +
+  **`render_github_ruleset --check`** + `rebuild_ledger_index --check` + `public_safety --diff` +
+  **`public_safety --path docs/workbench/specs/ghas-quality-secrets`** + `autopilot_gate --base origin/main`
+  + **신규 `governance.parity_slo --check`**(report-only→enforce).
+- `stop_conditions`: **현행 정본 16개 집합을 base로** + 본 트랙 유효분 명시(`ghas-live-fetch-or-mutation-
+  required`, `existing-secret-default-behavior-change`, `architecture-review-blocking-finding`,
+  `storage-projection-or-schema-migration-required`(disposition durable write 경로), `public-safety-hit`,
+  `scope-expansion`, `same-blocker-three-times`, `break-glass` 등). 임의 부분집합 금지.
+
+## Milestones
+
+자율층 M0~M5(synthetic만, 실 GHAS 무접촉) / human-gated H1~H3.
+
+- **M0 Readiness** — 계약 읽기 + pre-implementation architecture review. _done: 게이트 통과, write surface·SoT 승격 확인._
+- **M1 parity harness + 정규화 맵 + 적대적 fixture** — `metrics.py` 위 per-repo precision/recall,
+  `ghas_api`는 어댑터, `secret_type↔rule_id` 정규화 맵 + type-coverage, state-aware truth, line-tolerance.
+  _done: 적대적 fixture(type-mismatch/line-drift/dismissed)에서 정규화·필터·tolerance 누락이 red,
+  정상 케이스 green. **신규 precision/recall·gate 계산 코드 0줄(metrics 재사용)**._
+- **M2 인라인 싼 티어** — `filter.py` noise_reason 확장(path-role/context-class+partner), default-on
+  결정적 부분 + gated 신규 부분 분리, scan-time vs post-scan 경계 명시. _done: 11-FP 억제 + canary TP
+  보존 + 기존 default 불변, 억제율 회귀 테스트. post-M2 아키텍처 리뷰._
+- **M3 LLM 티어 disposition 배선(scan_all + scan_worker)** — scan_worker에 동기 인라인 + 비동기 LLM 큐
+  2경로, NEEDS_REVIEW backoff/skip-key, salt provenance. _done: scan_worker disposition 반영 증명,
+  재verify 폭주 없음, NEEDS_REVIEW 무기록. post-M3 아키텍처 리뷰._
+- **M4 non-GHAS drift monitor** — GHAS-calibrated 분포 기준선 + verifier-직교 분포-shift 교차, 입력
+  `eval/synthetic-corpus` 재사용, 분리 필드, passive(폴링 신설 없음). _done: 기준선이 GHAS-derived임을
+  테스트로 증명, SLO 미오염, 전이 한계 design 문서화._
+- **M5 CI SLO gate(report-only) + stale-degraded** — `governance.parity_slo --check` 배선, threshold
+  부재→report-only, snapshot 나이>임계→stale-degraded. _done: CI 측정·리포트, silent staleness 없음.
+  final 아키텍처 리뷰 → PR merge. (자율 goal done; v1 done은 H3 후.)_
+- **H1 실 GHAS snapshot 취득(human-gated)** — `ghas-live-fetch` stop → 사람 PR, 실 redacted snapshot(local 비커밋).
+- **H2 baseline + 목표 + divergence 보고(human-gated)** — 실 snapshot 대비 gap 측정, **fixture-vs-real
+  분포 divergence 1회 보고**, measure-first 목표 확정.
+- **H3 enforce 전환(human-gated)** — threshold 커밋, report-only→enforce, snapshot 재취득 SLA(N일/룰셋
+  변경 시) governance 명시.
+
+## Open Questions (잔여, 구현 중)
+
+- 정규화 맵 초기 커버리지(어느 발급처부터) + partner-pattern 확보 범위.
+- drift 샘플링 레이트/판정 임계(별도 스케줄 신설=비채택 위반 상한).
+- drift 노출 표면 최종(scan_health 레코드 vs notification_log) — M4서 택1.
+- line-tolerance k값·구간겹침 vs ±k 택1.
+
+## YAGNI
+
+- live validity check, push protection, 능동 drift 폴링, vuln 서브트랙 — 본 goal 범위 밖(연기/별도).
diff --git a/docs/workbench/specs/ghas-quality-secrets/requirements.md b/docs/workbench/specs/ghas-quality-secrets/requirements.md
new file mode 100644
index 0000000..172f6cc
--- /dev/null
+++ b/docs/workbench/specs/ghas-quality-secrets/requirements.md
@@ -0,0 +1,154 @@
+# GHAS급 탐지 품질 트랙 Requirements
+
+> Phase 1 (grill-to-spec) **완료 — 승인 대기**. SoT: 이 파일(`requirements.md`).
+> 핸드오프 근거: `HANDOFF.md`. 작성 2026-06-21.
+
+## 승인 대상
+
+- Source of truth: `requirements.md`
+- Preview companion: `requirements.html` (generated, 검토용 — source 대체 아님)
+
+## 한 줄 목표
+
+security-scanner의 **탐지 품질**(precision/recall)을 측정 가능한 GHAS급 기준에 맞춘다.
+#2(스케일)와 직교 — 이 트랙은 스캔 한 건의 정확도. **시크릿 먼저, vuln은 후속 사이클.**
+
+## 결정 요약 (locked)
+
+| # | 결정 | 내용 |
+| --- | --- | --- |
+| Q1 | 범위 | 시크릿 + vuln 둘 다 품질 대상 |
+| Q2 | 실행 구조 | **순차, 시크릿 먼저**. 공유 substrate, 각 서브-트랙 자체 측정·SLO |
+| Q3 | "GHAS급" 정의 | **GHAS parity SLO** — GHAS alert을 oracle 삼아 precision/recall 일치 |
+| Q4 | 측정 메커니즘 | **snapshot = ground truth** — GHAS fetch 1회(게이트)→redacted frozen→CI 반복 측정 |
+| Q5 | parity 단위 | **per-repo 1:1** (풀링 아님). repo별 산출 후 집계 |
+| Q6 | non-GHAS repo | **B-floor + C-monitor** — SLO는 GHAS repo만, 품질 머신은 전 repo 적용, 샘플 drift 감시 |
+| Q7 | validity check | **no-network, measure-first** — verifier+휴리스틱+partner-pattern. live validity는 evidence-gated 연기 |
+| Q8 | 기존 자산 | **main 위 쌓기** — PR #45 substrate 머지됨, verifier·vuln verifier 존재 |
+| Q9 | 품질 머신 타이밍 | **티어드 자동** — 싼 규칙 인라인, LLM 배치→disposition, 주기 scan 혜택 |
+| Q10 | SLO done | **measure-first** — baseline 측정 → 목표 확정 → gap 닫힘 |
+
+## 질문-답변 흐름 (provenance)
+
+### Q1. 1차 범위: 시크릿만 vs +vuln/SAST?
+
+**답변: 시크릿 + vuln 동시(품질 대상).** 측정 harness·SLO·라벨 데이터셋이 시크릿/SAST 각각 필요.
+범위가 크므로 실행 구조(decomposition)를 Q2에서 합의.
+
+### Q2. 두 서브시스템의 실행 구조
+
+**답변: 순차 — 시크릿 먼저.** 공유 substrate(metric harness·disposition 후크·SLO 프레임)를 깔고,
+증거 있는 시크릿을 풀 사이클(측정→갭클로저→SLO)로 먼저 완료한 뒤 vuln. 시크릿 학습을 vuln에 이식.
+
+### Q3. "GHAS급"의 운영적 정의 — 성공 기준의 형태 (시크릿 기준)
+
+**답변: GHAS parity SLO.** 실 GHAS-enabled repo에서 GHAS alert을 oracle 삼아 precision/recall
+일치율 목표. 함의: (a) 실 GHAS alert 존재 repo 필요, (b) 실 fetch는
+`ghas-live-fetch-or-mutation-required` → human-PR 게이트, (c) GHAS 미탐 우리 finding은 정의상
+FP("GHAS만큼"이 목표, "GHAS보다 recall↑"는 비목표).
+
+### Q4. parity를 게이트 마찰 없이 어떻게 측정하나
+
+**답변: 스냅샷 = ground truth.** GHAS alert을 human-PR 게이트로 fetch → redacted snapshot으로 고정
+→ 그 snapshot(=GHAS 정답지) 대비 우리 스캐너를 CI에서 반복 측정. parity가 정의이면서 inner loop는
+재현 가능. snapshot 갱신만 게이트. (재확인: 이 비용 보고도 parity 유지 선택.)
+
+### Q5. parity는 per-repo 1:1 (코퍼스 풀링 아님)
+
+**답변(사용자 정정): per-repo 1:1.** GHAS-enabled repo는 그 repo의 GHAS alert snapshot이 정답지 →
+같은 repo 우리 스캔과 1:1 비교. SLO는 repo별 산출 후 집계. **두 역할:** GHAS-enabled repo =
+calibration/validation(GHAS에 얼마나 가까운지 측정), GHAS-없는 repo(GitLab·GHAS-off, #2 대다수) =
+production target(보정된 품질 적용하되 per-repo truth 부재 → hard SLO 불가).
+
+### Q6. GHAS-없는 repo에서 "aggregated GHAS"를 어떻게 쓰나
+
+**답변: B-floor + C-monitor.** SLO는 GHAS-enabled repo 1:1 parity로만 정의·CI 게이트. 증류한 GHAS
+품질 머신(verifier disposition + partner-pattern boost + context-class 억제)은 전 repo 적용 —
+non-GHAS도 혜택은 받되 측정 대상 아님. non-GHAS는 LLM verifier 샘플로 drift 모니터(SLO 아님,
+조기경보). 기각: 순수 A(proxy를 SLO에 포함) — truth 없는 곳 측정은 자기 모델 순응도 =
+silent-staleness 재현.
+
+### Q7. validity check: live 크리덴셜 검증 vs no-network FP 억제
+
+**답변: no-network, measure-first.** v1은 live 검증 없이 LLM verifier + 휴리스틱(path/placeholder/
+context-class) + partner-pattern으로 FP 억제. 관측 11-FP는 전부 context FP라 이걸로 닫힘. 오프라인
+박스 호환·secret egress 없음. live validity는 baseline gap이 "폐기된 real-looking 토큰" 클래스로
+입증될 때만 후속 추가(deferred, evidence-gated).
+
+### Q8. 기존 `claude/verifier-quality` 브랜치 관계 — 증거로 해소
+
+**해소: main 위에서 쌓는다.** 그 브랜치는 PR #45로 main에 이미 머지됨("infra-free verdict-quality
+measurement substrate"). dangling 브랜치 없음. main 재사용 자산: 측정 substrate
+(`eval/verifier-corpus`·`eval/synthetic-corpus`·harness), verifier(`llm/common/verifier.py`·
+`llm/ollama/client.py`), `llm/vulnerability/verifier.py`(vuln verifier). (메모리
+[[verifier-quality-substrate]] stale → 갱신 필요.)
+
+### Q9. FP-억제 품질 머신이 언제 도나
+
+**답변: 티어드 자동.** 싼 규칙(path/placeholder/context-class 휴리스틱 + partner-pattern)은 모든
+스캔에 인라인 즉시 적용(공짜). 비싼 LLM verifier는 자동이되 배치·애매한 건에만 돌고 결과를
+`Finding.disposition`으로 반영. 주기 scan/systemd 경로도 자동 혜택([[ollama-verify-periodic-todo]]
+해소), 500+ repo 비용 제어.
+
+### Q10. SLO done-definition
+
+**답변: measure-first.** 먼저 baseline 측정(현재 GHAS 대비 precision/recall gap) → 그 수치 보고
+현실적 목표 확정(예: precision ≥ X, recall ≥ GHAS의 Y%) → gap 닫음. v1 done = baseline 측정 + 목표
+설정 + 목표 도달.
+
+## 기능 요구사항 (시크릿 서브트랙)
+
+- **FR1 parity 측정 harness.** GHAS-enabled repo별로 GHAS alert snapshot과 우리 스캔 결과를 1:1
+  비교해 per-repo precision/recall 산출 후 집계.
+- **FR2 snapshot 취득.** `baseline/ghas_api`(GET-only)·`cmd_compare_ghas`로 GHAS alert fetch →
+  redacted snapshot으로 고정. 실 fetch는 `ghas-live-fetch-or-mutation-required` human-PR 게이트 준수.
+- **FR3 baseline 측정(measure-first).** 현재 스캐너의 GHAS 대비 precision/recall gap을 frozen
+  snapshot 대비 측정 → SLO 목표치 확정.
+- **FR4 티어드 품질 머신.**
+  - 인라인 싼 티어: path/placeholder/context-class 휴리스틱 + partner-pattern → 즉시 FP 억제, 모든
+    스캔(주기 포함).
+  - 비동기 LLM 티어: ollama verifier가 애매한 finding에 verdict → `Finding.disposition` 자동 반영
+    (verified↔TRUE_POSITIVE / false_positive↔FALSE_POSITIVE / unreviewed↔NEEDS_REVIEW).
+- **FR5 disposition 자동 배선.** verifier verdict가 disposition으로 흐르고 주기 scan/systemd 경로에도
+  적용([[ollama-verify-periodic-todo]] 해소).
+- **FR6 non-GHAS 전이 + drift 모니터.** 증류한 품질 머신을 전 repo 적용. non-GHAS repo는 LLM
+  verifier 샘플 drift 모니터(SLO 아님, 전이 건전성 조기경보).
+- **FR7 SLO CI 게이트.** frozen snapshot 대비 재현 측정을 CI 게이트화(측정 시 human-PR fetch 불요).
+  baseline 후 확정된 목표 후퇴 시 차단.
+
+## 비기능 요구사항
+
+| 항목 | 요구값 |
+| --- | --- |
+| 오프라인 박스 호환 | 측정·억제 경로에 네트워크/secret egress 없음(snapshot fetch는 게이트된 1회 예외) |
+| 재현성 | frozen snapshot + 라벨 코퍼스로 CI 결정적 측정 |
+| 비용 | LLM 티어는 배치·애매 건 한정, 인라인 티어는 공짜 → 500+ repo 수용 |
+| staleness 가시성 | snapshot 나이/타임스탬프를 출력에 노출(scan-health 선례), silent staleness 금지. 능동 drift 감지는 비채택 |
+| 공개안전 | snapshot·findings redacted([[vuln-redaction-design]] 정합) |
+| governance | 실 GHAS fetch/mutation은 human-PR 게이트 유지 |
+
+## 사용자 시나리오
+
+- **S1 baseline.** 운영자가 GHAS-enabled repo에서 baseline 측정 → "현재 precision/recall이 GHAS
+  대비 얼마"를 확인 → measure-first로 목표 설정.
+- **S2 회귀 게이트.** 룰/코드 변경 후 CI가 frozen snapshot 대비 parity 재측정 → SLO 후퇴 시 PR 차단.
+- **S3 production 전이.** 주기 scan이 non-GHAS repo(GitLab) 돌 때 티어드 품질 머신이 자동 FP 억제,
+  샘플 drift 모니터가 전이 건전성 보고.
+
+## 범위 밖 / 연기
+
+- **vuln/SAST 서브트랙**: 순차라 자체 requirements 사이클로 후속(차례 올 때). 자산 재사용:
+  `llm/vulnerability/verifier.py`, `import-sarif`/`scan-vuln`/`codeql.yml`.
+- **live validity check**: evidence-gated 연기(Q7). baseline gap이 폐기-토큰 클래스로 입증되면 재개.
+- **push protection**: Q1(스케일 트랙)에서 비차단 권고형 선택 — 정책상 비대상.
+- **능동 drift 감지(라이브 parity 폴링)**: 비채택(Q4) — 수동 staleness 노출로 갈음.
+
+## 미결정 항목 (Phase 2 design open questions)
+
+- 비교 universe: HEAD-only vs full-history 정렬(GHAS는 history scan; 증거는 full-history 11건).
+- match 정의: 우리 finding ↔ GHAS alert 동일성 기준(secret value / file+line / rule id).
+- GHAS alert state 처리: open / resolved / dismissed(FP-marked) 중 무엇을 truth로.
+- 집계 방식: per-repo micro vs macro 평균.
+- snapshot 갱신 트리거/주기(passive staleness 노출은 확정, 갱신 정책은 설계 단계).
+- partner-pattern 확보 범위(어느 발급처부터) + context-class 억제 규칙 목록.
+- drift 모니터 샘플링 레이트/판정 임계.
diff --git a/docs/workbench/specs/ghas-quality-secrets/review.md b/docs/workbench/specs/ghas-quality-secrets/review.md
new file mode 100644
index 0000000..c7c5c36
--- /dev/null
+++ b/docs/workbench/specs/ghas-quality-secrets/review.md
@@ -0,0 +1,60 @@
+# GHAS급 시크릿 품질 design.md — 멀티에이전트 리뷰 + 반영 기록
+
+> 대상: `design.md`(v1) → 반영 후 `design.md`(v2). 리뷰: 5차원 병렬(opus) → 적대적 검증(sonnet) → 종합.
+> Workflow `wb9e29j7s`, agent 46, subagent ~1.95M tok. **synthesize 단계는 세션 한도로 실패
+> (`You've hit your session limit · resets 5:10am`) → 메인 루프에서 수동 종합.**
+> 확정 지적 **29건**(차원별 리뷰 → 적대적 검증 통과분만). overall: **ready-with-fixes → v2에 반영 완료.**
+
+## 심각도 집계
+
+| 심각도 | 건수 | 비고 |
+| --- | --- | --- |
+| blocker | 1 | 측정 정규화(match-key) |
+| major | 7 | autopilot 2 · codebase 2 · security 1 · measurement 2 |
+| minor | ~13 | 명세 보강 |
+| nit | ~8 | 표기 일관성 |
+
+적대적 검증이 잡아낸 오탐도 기록: `inline-tier` 지적의 "orphan filter.py" 주장은 **틀림**(filter.py는
+`parser.py:11`에서 import·`:60` 호출, `enable_noise_filter` default True). `match-key-vs-comparison-key`,
+`stop-conditions`의 일부 근거도 부분 오독으로 severity 하향. 코드 근거가 탄탄한 리뷰.
+
+## blocker (1) — v2 반영
+
+| id | 문제 | v2 해소 |
+| --- | --- | --- |
+| `match-key-type-mismatch` | GHAS `secret_type`(github_personal_access_token) vs gitleaks `rule_id`(github-pat) 정규화 없는 완전일치 → precision·recall 양방향 오차 위조, baseline 오염, synthetic fixture가 영구 은폐 | 측정 의미론 섹션: 정규화 맵 M1 1급 산출물 승격, type-unmatched 버킷 분리, type-coverage 메타, 적대적 fixture가 누락을 red로 |
+
+## major (7) — v2 반영
+
+| id | 차원 | 문제 | v2 해소 |
+| --- | --- | --- | --- |
+| `allowed-writes-sot-path-mismatch` | autopilot | allowed_writes가 실 SoT(.claude/specs)를 미포함 → gate가 SoT 갱신 차단 | SoT를 `docs/workbench/specs/ghas-quality-secrets/`로 승격·git추적, allowed_writes 정렬 |
+| `acceptance-checks-drift` | autopilot | `render_github_ruleset --check`·`public_safety --path` 누락 | phase-2a를 base 템플릿으로 diff만, 두 체크 추가 |
+| `periodic-path-is-scan-worker` | codebase | FR5/M3가 실 500+ 경로 `scan_worker`(verifier 0건) 빗나가고 scan_all만 확장 | 주기 2경로 명시, scan_worker에 동기 인라인+비동기 LLM 큐 2경로 배선(M3) |
+| `parity-harness-third-engine` | codebase | precision/recall 엔진 2개 중 하나만 참조 → 제3 엔진 신설 위험 | `core/evaluation/metrics.py` 재사용, ghas_api는 어댑터, M1 done "계산 코드 0줄" 인변 |
+| `allowed-writes-governance-self-modify` | security | governance/** 광역 → autopilot이 자기 stop-conditions/gate/public_safety 자율 수정 | governance/** 광역 금지, `parity_slo.py`만 화이트리스트, 핵심 3파일 자율수정 금지 Fixed decision |
+| `alert-state-not-filtered` | measurement | dismissed/resolved alert을 truth로 셈해 oracle 오염 | state-aware truth: open+resolved-TP만 positive, dismissed는 분리 집계 |
+| `synthetic-fixture-self-fulfilling` | measurement | 자율 SLO 게이트가 우리가 만든 합성 fixture로 항상 green → 실분포(H2) 괴리 | 적대적 fixture(11-FP 아날로그+실 secret_type+type-mismatch+line-drift+dismissed), M5 done 적대 케이스, H2 divergence 보고 |
+
+## minor/nit — v2 반영 요지
+
+- `inline-tier-default-off` / `inline-tier-ignores-filter-seam`: 인라인 티어 = 기존 `filter.py` noise_reason
+  확장(default-on 결정적 부분 + gated 신규), scan-time vs post-scan 경계 명시, filter/parser seam 의존 추가.
+- `match-def-aggregation-open` / `line-exact-match` / `precision-recall-mislabeled`: 측정 의미론에 match
+  정의·집계(micro→macro)·universe(full-history)·공식·라인 tolerance 락인.
+- `needs-review-no-write`: NEEDS_REVIEW 재verify backoff/skip-key M3 명시.
+- `m4-drift-fixture` / `drift-scan-health-seam` / `non-ghas-floor-bias` / `drift-active-boundary` /
+  `drift-egress`: M4 drift 기준선=GHAS-calibrated 분포, eval/synthetic-corpus 재사용, verifier-직교
+  분포-shift 교차, 분리 필드, default-off·오프라인 비활성, passive(폴링 신설 없음), 전이 한계 문서화.
+- `real-snapshot-no-commit` / `secrethash-entropy-leak`: provenance marker fail-closed + 경로 이중 차단,
+  salt 전제 명시 + M3 salt 강도 테스트.
+- `staleness-passive-only`: snapshot 나이>임계 → stale-degraded(silent pass 금지), H3 재취득 SLA.
+- `stop-conditions-drift` / `ci-gate-vehicle` / `milestone-arch-review-count` / `report-only-enforce-
+  unreachable` / `spec-path-mismatch`(dup): Autopilot 섹션에 정본 stop_conditions base, parity_slo 게이트
+  진입점·토글, 아키텍처 리뷰 4지점, 자율 done=M5/v1 done=H3 명확화.
+
+## 판정
+
+design.md v2는 blocker·major 전부 반영. 잔여는 구현 중 해소할 Open Questions(정규화 맵 커버리지,
+drift 임계, 노출 표면, tolerance k). **goal-setup 진행 가능**, 단 goal-setup이 SoT 승격 + allowed_writes/
+acceptance_checks/stop_conditions를 phase-2a 템플릿 기준으로 작성해야 함(위 major 반영).
diff --git a/governance/autopilot_goal.yml b/governance/autopilot_goal.yml
index a99ff7d..02467f6 100644
--- a/governance/autopilot_goal.yml
+++ b/governance/autopilot_goal.yml
@@ -1,5 +1,5 @@
 schema_version: 1
-goal_id: phase-2a-sarif-product-complete
+goal_id: ghas-quality-secrets-parity
 execution_mode:
   style: long-single-goal
   human_gate: stop-conditions-only
@@ -15,15 +15,14 @@ policy_decisions:
   fork_prs: blocked-or-skipped-before-secrets
   public_artifacts: synthetic-or-redacted-only
 allowed_writes:
-  - docs/workbench/specs/phase-2a-sarif-native-sast/**
-  - docs/workbench/agentic-workflows/2026-06-20-phase-2a-sarif-import-first-goal.md
+  - docs/workbench/specs/ghas-quality-secrets/**
+  - docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
   - docs/views/research-and-technical-decisions.md
   - src/security_scanner/**
   - tests/**
   - examples/**
   - eval/**
-  - docs/workbench/**
-  - governance/**
+  - governance/parity_slo.py
   - ledger/**
   - CURRENT.md
 acceptance_checks:
@@ -37,7 +36,8 @@ acceptance_checks:
   - uv run python -m governance.rebuild_ledger_index --check
   - uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
   - uv run python -m governance.public_safety --diff origin/main...HEAD
-  - uv run python -m governance.public_safety --path docs/workbench/specs/phase-2a-sarif-native-sast --path docs/views/research-and-technical-decisions.md
+  - uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-secrets
+  - uv run python -m governance.parity_slo --check
   - uv run python -m governance.autopilot_gate --base origin/main
 stop_conditions:
   - public-safety-hit

From 81d59d0a7ddd7d49204f5107ec46a77f9bbafa01 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 09:27:31 +0900
Subject: [PATCH 2/7] =?UTF-8?q?fix(autopilot):=20current.yml=20active=5Fgo?=
 =?UTF-8?q?al=20=EB=8F=99=EA=B8=B0=ED=99=94=20=E2=86=92=20ghas-quality-sec?=
 =?UTF-8?q?rets-parity?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

goal-setup 커밋 07c4e82가 autopilot_goal.yml의 goal_id만 새 goal로
리포인트하고 governance/current.yml의 active_goal 동기화를 빠뜨려
autopilot_gate(render.py active_goal-must-match)가 영구 fail이었음.
직전 goal 전환(5fdc16a)처럼 goal_id·active_goal·CURRENT.md를 함께
맞추는 goal-activation 완성. orchestrator-authorized 정정(범위 한정:
current.yml + render 파생 CURRENT.md만; allowed_writes/gate/public_safety
불변).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 CURRENT.md             | 2 +-
 governance/current.yml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/CURRENT.md b/CURRENT.md
index 66ded2f..4f781d9 100644
--- a/CURRENT.md
+++ b/CURRENT.md
@@ -4,7 +4,7 @@
 
 - Project: `security-scanner`
 - Merge mode: `guarded-auto-merge`
-- Active goal: `phase-2a-sarif-product-complete`
+- Active goal: `ghas-quality-secrets-parity`
 - Last auto merge: `ledger:20260617T003405Z-autopilot-3236f4`
 - Ledger entries: `4`
 - Ledger index hash: `sha256:e1893a649a1101b74a087b5eaaa275813a85708c5bb46c4ae70c24e10a111050`
diff --git a/governance/current.yml b/governance/current.yml
index b06ca03..ff48fa9 100644
--- a/governance/current.yml
+++ b/governance/current.yml
@@ -37,7 +37,7 @@ gates:
   proof_ref: ''
   proof_hash: ''
 autopilot:
-  active_goal: phase-2a-sarif-product-complete
+  active_goal: ghas-quality-secrets-parity
   merge_mode: guarded-auto-merge
   last_auto_merge: ledger:20260617T003405Z-autopilot-3236f4
 open_decisions: []

From 0bdf93996ed2b4c332451bc3341460fbdef8d2e6 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 09:28:42 +0900
Subject: [PATCH 3/7] =?UTF-8?q?feat(parity):=20M1=20GHAS=20parity=20?=
 =?UTF-8?q?=EC=B8=A1=EC=A0=95=20harness=20+=20secret=5Ftype=E2=86=94rule?=
 =?UTF-8?q?=5Fid=20=EC=A0=95=EA=B7=9C=ED=99=94=20=EB=A7=B5=20+=20=EC=A0=81?=
 =?UTF-8?q?=EB=8C=80=EC=A0=81=20fixture?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

자율층 M1. 시크릿 탐지의 per-repo 1:1 GHAS parity 측정 harness를
synthetic fixture로 TDD 완성. 실 GHAS 무접촉.

- baseline/ghas_api/normalize.py: secret_type↔rule_id 정규화 맵(1급 산출물,
  양방향 lookup + 미등록 식별 + type-coverage 메타). 초기 커버리지
  github-pat/discord/aws.
- baseline/ghas_api/parity.py: GHAS alert→EvaluationKey 어댑터.
  - state-aware truth: open + resolved-as-true_positive만 positive,
    dismissed/resolved-FP/revoked는 recall 분모 제외 + GHAS-confirmed-FP 분리집계.
  - line tolerance: ±k(기본 2) 매칭, full-history universe.
  - 정규화 미등록 쌍은 type-unmatched-but-colocated 버킷으로 분리(precision/recall
    미오염).
  - per-repo micro → macro 집계.
  - load_parity_snapshot: source=synthetic provenance fail-closed.
- 신규 precision/recall 공식·gate-threshold 판정 코드 0줄 —
  core/evaluation/metrics.py(EvaluationResult/evaluate_evaluation_gate) 재사용,
  어댑터는 정규화·truth필터·tolerance 매칭만. GhasComparisonResult/
  compare_ghas_alerts_with_findings do-not-modify(제3 엔진 신설 방지).
- eval/ghas-parity-corpus/synthetic-snapshot.json: 적대적 fixture.
  실 GHAS secret_type 토큰 + type-mismatch/line-drift/dismissed + tolerance
  경계 음성대조(±k 안 must-match / 밖 must-NOT-match). 정규화/tolerance/state
  필터를 끄면 특정 지표가 red로 떨어짐을 테스트로 증명(self-fulfilling 차단).
  fake 토큰·repo·path만(public-safe), source=synthetic marker.
- design.md: pre-impl arch gate 반영 — "0 lines" 인변을 공식·gate 코드로 한정,
  어댑터 EvaluationKey 수렴, tolerance 경계 음성대조 요구.

검증: uv run pytest 1058 passed, public_safety green, autopilot_gate green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../specs/ghas-quality-secrets/design.md      |  15 +-
 eval/ghas-parity-corpus/README.md             |  31 +
 .../synthetic-snapshot.json                   |  97 +++
 .../baseline/ghas_api/__init__.py             |  29 +
 .../baseline/ghas_api/normalize.py            | 153 +++++
 .../baseline/ghas_api/parity.py               | 522 ++++++++++++++++
 tests/test_ghas_normalize.py                  |  88 +++
 tests/test_ghas_parity.py                     | 566 ++++++++++++++++++
 8 files changed, 1499 insertions(+), 2 deletions(-)
 create mode 100644 eval/ghas-parity-corpus/README.md
 create mode 100644 eval/ghas-parity-corpus/synthetic-snapshot.json
 create mode 100644 src/security_scanner/baseline/ghas_api/normalize.py
 create mode 100644 src/security_scanner/baseline/ghas_api/parity.py
 create mode 100644 tests/test_ghas_normalize.py
 create mode 100644 tests/test_ghas_parity.py

diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
index 903e350..6e7a1e5 100644
--- a/docs/workbench/specs/ghas-quality-secrets/design.md
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -51,8 +51,15 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
   `baseline/ghas_api`(compare_ghas_alerts_with_findings, 카운트만)와 `core/evaluation/metrics.py`
   (`EvaluationResult.precision/recall` + `EvaluationThresholds` gate, 완비). → **계산·게이트 계층은
   `core/evaluation/metrics.py` 재사용**, `baseline/ghas_api`는 GHAS alert→`EvaluationKey` **어댑터로만**.
-  `GhasAlertComparisonKey`↔`EvaluationKey` 단일 adapter로 수렴. **M1 done 인변: "신규 precision/recall·
-  gate 계산 코드 0줄, 기존 metrics 재사용".**
+  어댑터는 **`EvaluationKey`(metrics.py)로 수렴**(`GhasAlertComparisonKey`는 내부 alert-shape일 뿐;
+  `secret_type↔rule_id` 정규화는 `EvaluationKey` 생성 *전* 어댑터 책임). 기존
+  `compare_ghas_alerts_with_findings`/`GhasComparisonResult`는 **precision/recall로 확장 금지(do-not-modify)**,
+  parity 경로에서 미사용(어댑터 뒤로 후퇴, 실 snapshot 카운트 리포트는 H-track 격리).
+- **M1 "신규 계산 코드 0줄" 인변 정밀화(pre-impl arch gate 권고)**: line tolerance는 순수 키 완전일치로
+  표현 불가능한 fuzzy join이라 매칭 계층은 신규 코드가 맞다. 인변을 **"신규 precision/recall *공식*·
+  *gate-threshold 판정* 코드 0줄(metrics.py `EvaluationResult`/`evaluate_evaluation_gate` 그대로 재사용)"**
+  로 한정한다. alert→`EvaluationKey` 어댑터(정규화 맵·state 필터·라인 tolerance 매칭)는 신규 어댑터 코드로
+  명시 — 인변과 모순 아님(어댑터=키 정규화·truth 필터·매칭, metrics=산식·게이트).
 
 ## Architecture
 
@@ -143,6 +150,10 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
   **실제 GHAS `secret_type` 토큰을 그대로** 써서 (a) type 표기 불일치 쌍, (b) 정규화 후만 매칭, (c) 라인
   ±1~2 오프셋, (d) dismissed-state 케이스를 포함 → **정규화/필터/tolerance 누락이 red가 되게**. 우리가
   키를 맞춰 만든 fixture가 항상 green이 되는 self-fulfilling 차단.
+- **tolerance 경계 음성대조(pre-impl arch gate 권고)**: (c)는 양성 케이스만으론 too-greedy tolerance가
+  우연히 green이 될 수 있다. **±k 안(must-match) / ±k 바로 밖(must-NOT-match) 쌍**을 둬서 매칭이 tolerance
+  로직(운 아님)으로 났음을 강제한다. 또 headline precision/recall만이 아니라 **type-coverage·
+  type-unmatched-but-colocated 메타지표도 assert** → type/state뿐 아니라 tolerance까지 누락이 red.
 - 인라인 티어: 11-FP 억제 + canary TP 보존(`FALSE_NEGATIVE_PATTERN`). LLM 티어: redacted 입력, strict
   JSON, fail-closed NEEDS_REVIEW, 애매 건만 호출. scan_worker 2경로(동기 인라인 + 비동기 LLM 큐) 증명.
 - 회귀: 기존 secret scan/report/gate/evaluate default 불변. governance: `pytest` + `public_safety` + `autopilot_gate`.
diff --git a/eval/ghas-parity-corpus/README.md b/eval/ghas-parity-corpus/README.md
new file mode 100644
index 0000000..b5c5abd
--- /dev/null
+++ b/eval/ghas-parity-corpus/README.md
@@ -0,0 +1,31 @@
+# GHAS Parity Corpus (synthetic, adversarial)
+
+Public-safe synthetic GHAS Secret Scanning snapshot used by the M1 parity
+harness (`security_scanner.baseline.ghas_api.parity`). It must NEVER contain a
+real repository, real GHAS export, real secret value, internal hostname, or
+credential.
+
+`synthetic-snapshot.json` is an **adversarial** fixture: it is the redacted
+structural analog of the 11 handoff-observed false positives (discord ×4
+manifest-hash, github-pat ×3 test-fixture, doc-example ×4). It uses the **real
+GHAS `secret_type` tokens** (e.g. `github_personal_access_token`,
+`discord_bot_token`) paired against synthetic gitleaks `rule_id` tokens
+(`github-pat`, `discord-api-token`) so that turning OFF any one parity
+responsibility makes a specific metric go red:
+
+- normalization map OFF → type-mismatch pair fails to match (lands in the
+  `type-unmatched-but-colocated` bucket, never a silent TP),
+- state filter OFF → the dismissed alert (#6) pollutes the recall denominator,
+- line tolerance OFF → the +1/+2 line-drift findings (#2, #3) stop matching.
+
+## Provenance (fail-closed)
+
+The loader (`load_parity_snapshot`) refuses any snapshot whose top-level
+`source` is not exactly `synthetic`. A real snapshot is never committed here:
+real GHAS snapshots are local-only and human-PR gated (H-track).
+
+## Public-safety self-check
+
+```bash
+uv run python -m governance.public_safety --path eval/ghas-parity-corpus
+```
diff --git a/eval/ghas-parity-corpus/synthetic-snapshot.json b/eval/ghas-parity-corpus/synthetic-snapshot.json
new file mode 100644
index 0000000..f1cd61b
--- /dev/null
+++ b/eval/ghas-parity-corpus/synthetic-snapshot.json
@@ -0,0 +1,97 @@
+{
+  "schemaVersion": 1,
+  "source": "synthetic",
+  "description": "Adversarial synthetic GHAS parity snapshot. Redacted structural analog of the 11 handoff-observed false positives (discord x4 manifest-hash, github-pat x3 test-fixture, doc-example x4). Uses REAL GHAS secret_type tokens against synthetic gitleaks rule_ids so that missing normalization, state filtering, or line tolerance each turn a specific metric red. All values are fake (SCANNER_FAKE_SECRET_TOKEN markers); no real secrets, endpoints, hosts, or repo names.",
+  "repoFullName": "synthetic-org/synthetic-parity-repo",
+  "fetchedAt": "2026-06-16T12:00:00+00:00",
+  "alerts": [
+    {
+      "alertNumber": 1,
+      "secretType": "github_personal_access_token",
+      "state": "open",
+      "filePath": "src/config/settings.py",
+      "lineStart": 10,
+      "lineEnd": 10,
+      "note": "Type-mismatch case: GHAS secret_type vs gitleaks rule_id github-pat. Only matches after normalization (exact line)."
+    },
+    {
+      "alertNumber": 2,
+      "secretType": "github_personal_access_token",
+      "state": "open",
+      "filePath": "tests/fixtures/sample_token.py",
+      "lineStart": 20,
+      "lineEnd": 20,
+      "note": "Line-drift case: our finding sits at line 21 (+1). Matches only with tolerance."
+    },
+    {
+      "alertNumber": 3,
+      "secretType": "discord_bot_token",
+      "state": "open",
+      "filePath": "manifests/service.yaml",
+      "lineStart": 30,
+      "lineEnd": 30,
+      "note": "Type-mismatch + line-drift boundary: our finding at line 32 (+2). Matches only with normalization AND tolerance k>=2."
+    },
+    {
+      "alertNumber": 4,
+      "secretType": "aws_access_key_id",
+      "state": "resolved",
+      "resolution": "true_positive",
+      "filePath": "deploy/credentials.env",
+      "lineStart": 5,
+      "lineEnd": 5,
+      "note": "resolved-as-true_positive counts as positive truth."
+    },
+    {
+      "alertNumber": 5,
+      "secretType": "slack_api_token",
+      "state": "open",
+      "filePath": "docs/example.md",
+      "lineStart": 40,
+      "lineEnd": 40,
+      "note": "Colocated-but-unmapped: slack_api_token is intentionally NOT in the normalization map. A finding sits at the same location, so this exercises the type-unmatched-but-colocated bucket without polluting precision/recall."
+    },
+    {
+      "alertNumber": 6,
+      "secretType": "discord_bot_token",
+      "state": "dismissed",
+      "resolution": "false_positive",
+      "filePath": "docs/manifest-hash-example.md",
+      "lineStart": 50,
+      "lineEnd": 50,
+      "note": "GHAS-confirmed FP: owner dismissed as false_positive. Excluded from the recall denominator and counted as a GHAS-confirmed-FP signal. We do NOT detect it, so disabling the state filter drops recall below 1.0 (red-proof)."
+    }
+  ],
+  "findings": [
+    {
+      "ruleId": "github-pat",
+      "filePath": "src/config/settings.py",
+      "lineStart": 10,
+      "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000001"
+    },
+    {
+      "ruleId": "github-pat",
+      "filePath": "tests/fixtures/sample_token.py",
+      "lineStart": 21,
+      "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000002"
+    },
+    {
+      "ruleId": "discord-api-token",
+      "filePath": "manifests/service.yaml",
+      "lineStart": 32,
+      "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000003"
+    },
+    {
+      "ruleId": "aws-access-token",
+      "filePath": "deploy/credentials.env",
+      "lineStart": 5,
+      "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000004"
+    },
+    {
+      "ruleId": "doc-example-marker",
+      "filePath": "docs/example.md",
+      "lineStart": 40,
+      "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000005"
+    }
+  ]
+}
diff --git a/src/security_scanner/baseline/ghas_api/__init__.py b/src/security_scanner/baseline/ghas_api/__init__.py
index d82589f..a1448b6 100644
--- a/src/security_scanner/baseline/ghas_api/__init__.py
+++ b/src/security_scanner/baseline/ghas_api/__init__.py
@@ -20,6 +20,22 @@
 from typing import Any
 from urllib.parse import urlsplit
 
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+    SecretTypePair,
+    TypeCoverage,
+    canonical_type_coverage,
+)
+from security_scanner.baseline.ghas_api.parity import (
+    MacroParityResult,
+    ParityConfig,
+    ParitySnapshot,
+    RepoParityResult,
+    aggregate_repo_parity,
+    evaluate_repo_parity,
+    load_parity_snapshot,
+)
 from security_scanner.catalog.scan_target import ScanTarget
 from security_scanner.core.finding.model import Finding
 from security_scanner.storage.base import GhasAlertComparisonKey, GhasAlertRecord
@@ -341,4 +357,17 @@ def _looks_like_repo_full_name(value: str) -> bool:
     "normalize_alert_records",
     "repo_full_name_from_target",
     "render_ghas_comparison_report",
+    # M1 parity harness (alert -> EvaluationKey adapter + normalization map).
+    "DEFAULT_SECRET_TYPE_MAP",
+    "SecretTypeNormalizer",
+    "SecretTypePair",
+    "TypeCoverage",
+    "canonical_type_coverage",
+    "MacroParityResult",
+    "ParityConfig",
+    "ParitySnapshot",
+    "RepoParityResult",
+    "aggregate_repo_parity",
+    "evaluate_repo_parity",
+    "load_parity_snapshot",
 ]
diff --git a/src/security_scanner/baseline/ghas_api/normalize.py b/src/security_scanner/baseline/ghas_api/normalize.py
new file mode 100644
index 0000000..7712aaa
--- /dev/null
+++ b/src/security_scanner/baseline/ghas_api/normalize.py
@@ -0,0 +1,153 @@
+"""GHAS ``secret_type`` <-> gitleaks ``rule_id`` normalization map (M1, 1급 산출물).
+
+The parity harness compares GHAS Secret Scanning alerts against our own
+gitleaks findings. GHAS labels a secret with a ``secret_type`` token
+(``github_personal_access_token``) while gitleaks labels the same secret with a
+``rule_id`` token (``github-pat``). Comparing those tokens literally splits one
+secret across ``local_only`` (precision penalty) and ``ghas_only`` (recall
+penalty), so the baseline gap becomes a labelling artifact.
+
+This module is the first-class normalization artifact that collapses both
+surface tokens onto a single *canonical type*. It provides:
+
+* bidirectional lookup (secret_type -> canonical, rule_id -> canonical),
+* unregistered-pair identification (no silent miscount), and
+* a ``type-coverage`` meta-metric over an observed corpus.
+
+The adapter in :mod:`security_scanner.baseline.ghas_api.parity` performs the
+fuzzy (tolerance) matching on top of this; the precision/recall *formula* and
+gate *threshold* judgement stay in ``core.evaluation.metrics`` (no new metric
+code here).
+
+Initial coverage starts from the handoff's actually-observed issuer classes
+(github-pat, discord, aws) and is deliberately small but extensible: add a row
+to :data:`DEFAULT_SECRET_TYPE_MAP` to register a new issuer.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Iterable, Mapping
+
+
+@dataclass(frozen=True)
+class SecretTypePair:
+    """One canonical secret class with its GHAS and gitleaks surface tokens.
+
+    ``secret_types`` are GHAS ``secret_type`` tokens; ``rule_ids`` are gitleaks
+    ``rule_id`` tokens. Either side may carry several aliases (GHAS validators
+    and custom-pattern variants both exist), all mapping to one ``canonical``.
+    """
+
+    canonical: str
+    secret_types: tuple[str, ...]
+    rule_ids: tuple[str, ...]
+
+
+# Initial coverage: handoff-observed issuer classes only (github-pat x3
+# test-fixture, discord x4 manifest-hash) plus aws as a representative minority
+# issuer. Extend by appending a SecretTypePair row.
+DEFAULT_SECRET_TYPE_MAP: tuple[SecretTypePair, ...] = (
+    SecretTypePair(
+        canonical="github-personal-access-token",
+        secret_types=(
+            "github_personal_access_token",
+            "github_personal_access_token_v2",
+        ),
+        rule_ids=("github-pat", "github-fine-grained-pat"),
+    ),
+    SecretTypePair(
+        canonical="discord-bot-token",
+        secret_types=("discord_bot_token",),
+        rule_ids=("discord-api-token", "discord-bot-token"),
+    ),
+    SecretTypePair(
+        canonical="aws-access-key",
+        secret_types=("aws_access_key_id", "aws_secret_access_key"),
+        rule_ids=("aws-access-token", "aws-access-key-id"),
+    ),
+)
+
+
+@dataclass(frozen=True)
+class TypeCoverage:
+    """``type-coverage`` meta-metric over a set of observed ``secret_type`` tokens."""
+
+    registered_count: int
+    total_count: int
+
+    @property
+    def coverage(self) -> float:
+        if self.total_count == 0:
+            return 1.0
+        return self.registered_count / self.total_count
+
+
+class SecretTypeNormalizer:
+    """Bidirectional normalizer built from a sequence of :class:`SecretTypePair`.
+
+    An EMPTY map normalizes nothing (every lookup misses). That is intentional:
+    it is what makes the adversarial type-mismatch fixtures go red when
+    normalization is disabled.
+    """
+
+    def __init__(self, pairs: Iterable[SecretTypePair]) -> None:
+        secret_type_index: dict[str, str] = {}
+        rule_id_index: dict[str, str] = {}
+        for pair in pairs:
+            for secret_type in pair.secret_types:
+                secret_type_index[_norm_token(secret_type)] = pair.canonical
+            for rule_id in pair.rule_ids:
+                rule_id_index[_norm_token(rule_id)] = pair.canonical
+        self._secret_type_index = secret_type_index
+        self._rule_id_index = rule_id_index
+
+    def canonical_for_secret_type(self, secret_type: str) -> str | None:
+        """Return the canonical type for a GHAS ``secret_type`` or ``None``."""
+        return self._secret_type_index.get(_norm_token(secret_type))
+
+    def canonical_for_rule_id(self, rule_id: str) -> str | None:
+        """Return the canonical type for a gitleaks ``rule_id`` or ``None``."""
+        return self._rule_id_index.get(_norm_token(rule_id))
+
+    def is_registered_secret_type(self, secret_type: str) -> bool:
+        return _norm_token(secret_type) in self._secret_type_index
+
+    def is_registered_rule_id(self, rule_id: str) -> bool:
+        return _norm_token(rule_id) in self._rule_id_index
+
+
+def canonical_type_coverage(
+    normalizer: SecretTypeNormalizer,
+    observed_secret_types: Iterable[str],
+) -> TypeCoverage:
+    """Fraction of *distinct* observed GHAS ``secret_type`` tokens registered."""
+    distinct = {_norm_token(token) for token in observed_secret_types}
+    registered = sum(
+        1 for token in distinct if normalizer.is_registered_secret_type(token)
+    )
+    return TypeCoverage(registered_count=registered, total_count=len(distinct))
+
+
+def _norm_token(token: str) -> str:
+    """Case/separator-insensitive token key (``GitHub-PAT`` == ``github_pat``)."""
+    return token.strip().lower().replace("_", "-")
+
+
+# Backwards-friendly alias for callers that prefer a Mapping-style construction.
+def normalizer_from_pairs(
+    pairs: Mapping[str, SecretTypePair] | Iterable[SecretTypePair],
+) -> SecretTypeNormalizer:
+    if isinstance(pairs, Mapping):
+        return SecretTypeNormalizer(pairs.values())
+    return SecretTypeNormalizer(pairs)
+
+
+__all__ = [
+    "DEFAULT_SECRET_TYPE_MAP",
+    "SecretTypePair",
+    "SecretTypeNormalizer",
+    "TypeCoverage",
+    "canonical_type_coverage",
+    "normalizer_from_pairs",
+]
diff --git a/src/security_scanner/baseline/ghas_api/parity.py b/src/security_scanner/baseline/ghas_api/parity.py
new file mode 100644
index 0000000..84fcf67
--- /dev/null
+++ b/src/security_scanner/baseline/ghas_api/parity.py
@@ -0,0 +1,522 @@
+"""GHAS alert -> EvaluationKey parity adapter (M1).
+
+This adapter turns GHAS Secret Scanning alerts (:class:`GhasAlertRecord`) and
+our own gitleaks :class:`Finding` objects into the ``ExpectedFinding`` /
+``EvaluationKey`` shape that ``core.evaluation.metrics`` already understands, so
+the precision/recall *formula* and gate *threshold* judgement are reused
+verbatim — no new metric code.
+
+The adapter owns exactly three responsibilities the metrics layer cannot:
+
+(a) **secret_type -> canonical type** via
+    :class:`~security_scanner.baseline.ghas_api.normalize.SecretTypeNormalizer`,
+    mapping the canonical type into the ``EvaluationKey.rule_id`` slot so a
+    GHAS/gitleaks token-mismatch no longer splits one secret in two.
+(b) **state-aware truth filter** — positive truth is ``open`` plus
+    ``resolved``-as-``true_positive``; ``dismissed`` /
+    ``resolved``-as-``false_positive`` / ``revoked`` are excluded from the recall
+    denominator and counted separately as a ``GHAS-confirmed-FP`` signal.
+(c) **line-tolerance matching** — a finding matches an alert when their line
+    intervals overlap or are within ``+/-k`` lines. Because this is a fuzzy join
+    it cannot be expressed as exact-key equality, so the adapter resolves the
+    TP/FP/FN pairing itself and then hands canonical keys to
+    :func:`evaluate_detection` for the headline numbers.
+
+Unregistered (type-unmatched but colocated) pairs are bucketed separately so a
+missing normalization row is visible, never a silent miscount.
+
+This module is a pure function over its inputs: it performs no network calls.
+``GhasComparisonResult`` / ``compare_ghas_alerts_with_findings`` are NOT touched
+— the parity path converges on ``core.evaluation.metrics``.
+"""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Iterable, Sequence
+
+from security_scanner.baseline.ghas_api.normalize import (
+    SecretTypeNormalizer,
+    TypeCoverage,
+    canonical_type_coverage,
+)
+from security_scanner.core.evaluation.metrics import (
+    EvaluationResult,
+    ExpectedFinding,
+    evaluate_detection,
+)
+from security_scanner.core.finding.model import (
+    Finding,
+    GitleaksFindingPayload,
+    RepoRef,
+    Location,
+)
+from security_scanner.storage.base import GhasAlertRecord
+
+
+# Default positive-truth definition (state-aware truth filter).
+GHAS_POSITIVE_TRUTH_STATES: tuple[str, ...] = ("open", "resolved")
+# Resolutions that still count as positive truth (only meaningful for resolved).
+GHAS_POSITIVE_TRUTH_RESOLUTIONS: tuple[str, ...] = ("true_positive",)
+# States/resolutions that are explicit GHAS-confirmed false positives.
+GHAS_CONFIRMED_FP_STATES: tuple[str, ...] = ("dismissed", "revoked")
+GHAS_CONFIRMED_FP_RESOLUTIONS: tuple[str, ...] = (
+    "false_positive",
+    "revoked",
+    "wont_fix",
+    "used_in_tests",
+)
+
+
+@dataclass(frozen=True)
+class ParityConfig:
+    """Tunable parity-matching policy.
+
+    ``line_tolerance`` is the ``+/-k`` window; interval overlap always matches
+    regardless of ``k``. ``positive_truth_states`` /
+    ``positive_truth_resolutions`` parameterize the truth filter so a test can
+    disable it (and prove the resulting recall regression). When
+    ``positive_truth_resolutions`` is ``None`` every resolution is accepted for
+    an otherwise-positive state.
+    """
+
+    line_tolerance: int = 2
+    positive_truth_states: tuple[str, ...] = GHAS_POSITIVE_TRUTH_STATES
+    positive_truth_resolutions: tuple[str, ...] | None = (
+        GHAS_POSITIVE_TRUTH_RESOLUTIONS
+    )
+
+
+@dataclass(frozen=True)
+class RepoParityResult:
+    """Per-repo parity outcome: metrics result plus parity-specific buckets."""
+
+    repo_full_name: str
+    detection: EvaluationResult
+    type_coverage: TypeCoverage
+    type_unmatched_but_colocated: int
+    ghas_confirmed_fp: int
+
+    @property
+    def precision(self) -> float:
+        return self.detection.precision
+
+    @property
+    def recall(self) -> float:
+        return self.detection.recall
+
+
+@dataclass(frozen=True)
+class MacroParityResult:
+    """Macro (per-repo averaged) parity summary."""
+
+    repo_count: int
+    macro_precision: float
+    macro_recall: float
+    total_type_unmatched_but_colocated: int
+    total_ghas_confirmed_fp: int
+
+
+@dataclass(frozen=True)
+class ParitySnapshot:
+    """Loaded synthetic parity snapshot fixture (provenance-guarded)."""
+
+    repo_full_name: str
+    source: str
+    alerts: list[GhasAlertRecord]
+    findings: list[Finding]
+    fetched_at: str | None = None
+
+
+# ---------------------------------------------------------------------------
+# Truth classification
+# ---------------------------------------------------------------------------
+
+def _is_positive_truth(alert: GhasAlertRecord, config: ParityConfig) -> bool:
+    state = (alert.state or "").strip().lower()
+    if state not in config.positive_truth_states:
+        return False
+    if config.positive_truth_resolutions is None:
+        return True
+    resolution = (alert.resolution or "").strip().lower()
+    if not resolution:
+        # An open alert has no resolution and is positive truth.
+        return True
+    return resolution in config.positive_truth_resolutions
+
+
+def _is_confirmed_fp(alert: GhasAlertRecord) -> bool:
+    state = (alert.state or "").strip().lower()
+    resolution = (alert.resolution or "").strip().lower()
+    if state in GHAS_CONFIRMED_FP_STATES:
+        return True
+    return resolution in GHAS_CONFIRMED_FP_RESOLUTIONS
+
+
+# ---------------------------------------------------------------------------
+# Core fuzzy join
+# ---------------------------------------------------------------------------
+
+@dataclass
+class _AlertSlot:
+    record: GhasAlertRecord
+    canonical: str | None
+    consumed: bool = False
+
+
+def _alert_lines(alert: GhasAlertRecord) -> tuple[int, int]:
+    start = alert.location_start_line
+    end = alert.location_end_line if alert.location_end_line is not None else start
+    if start is None:
+        return (0, 0)
+    lo, hi = (start, end) if end is not None else (start, start)
+    if hi < lo:
+        lo, hi = hi, lo
+    return (lo, hi)
+
+
+def _lines_match(
+    finding_line: int,
+    alert_interval: tuple[int, int],
+    tolerance: int,
+) -> bool:
+    lo, hi = alert_interval
+    # Interval overlap (finding line inside the alert span).
+    if lo <= finding_line <= hi:
+        return True
+    # +/-k tolerance around the nearest interval endpoint.
+    nearest = lo if finding_line < lo else hi
+    return abs(finding_line - nearest) <= tolerance
+
+
+def evaluate_repo_parity(
+    *,
+    repo_full_name: str,
+    alerts: Sequence[GhasAlertRecord],
+    findings: Sequence[Finding],
+    normalizer: SecretTypeNormalizer,
+    config: ParityConfig | None = None,
+) -> RepoParityResult:
+    """Compute per-repo parity for one GHAS-enabled repo.
+
+    Returns the metrics-layer ``EvaluationResult`` (so precision/recall come
+    straight from ``core.evaluation.metrics``) plus the parity-specific buckets.
+    """
+    config = config or ParityConfig()
+
+    # 1. State-aware truth filter: keep only locatable positive-truth alerts.
+    confirmed_fp = sum(1 for a in alerts if _is_confirmed_fp(a))
+    truth_alerts = [
+        a
+        for a in alerts
+        if a.location_path is not None
+        and a.location_start_line is not None
+        and _is_positive_truth(a, config)
+    ]
+
+    # type-coverage meta-metric is computed over ALL observed truth secret_types.
+    type_coverage = canonical_type_coverage(
+        normalizer, (a.secret_type for a in truth_alerts)
+    )
+
+    alert_slots = [
+        _AlertSlot(record=a, canonical=normalizer.canonical_for_secret_type(a.secret_type))
+        for a in truth_alerts
+    ]
+
+    # 2. Fuzzy join: each finding tries to claim one unconsumed alert in the
+    #    same file with a matching canonical type and a tolerated line.
+    expected: list[ExpectedFinding] = []
+    actual: list[Finding] = []
+    type_unmatched_but_colocated = 0
+    match_index = 0
+
+    for finding in findings:
+        canonical = normalizer.canonical_for_rule_id(finding.rule_id)
+        slot = _find_matching_slot(finding, canonical, alert_slots, config)
+        if slot is not None:
+            slot.consumed = True
+            match_index += 1
+            shared_key = _matched_key(repo_full_name, match_index, slot.canonical)
+            expected.append(shared_key)
+            actual.append(_canonical_finding(finding, shared_key))
+        else:
+            colocated = _colocated_unmapped_slot(
+                finding, canonical, alert_slots, config
+            )
+            if colocated is not None:
+                # Same file + tolerated line, but the type pair is not registered.
+                # This pair is measurement-uncertain: we cannot confirm the type
+                # matches, so it is counted ONLY in the type-unmatched bucket and
+                # EXCLUDED from precision/recall — never a silent TP, and never a
+                # spurious FP/FN that an unrelated normalization gap would create.
+                colocated.consumed = True
+                type_unmatched_but_colocated += 1
+            else:
+                # Pure local-only finding -> false positive (Q3 semantics).
+                actual.append(_local_only_finding(repo_full_name, finding))
+
+    # 3. Unconsumed positive-truth alerts -> false negatives (ghas_only truth).
+    for slot in alert_slots:
+        if not slot.consumed:
+            expected.append(
+                _ghas_only_key(repo_full_name, slot.record, slot.canonical)
+            )
+
+    detection = evaluate_detection(expected, actual)
+
+    return RepoParityResult(
+        repo_full_name=repo_full_name,
+        detection=detection,
+        type_coverage=type_coverage,
+        type_unmatched_but_colocated=type_unmatched_but_colocated,
+        ghas_confirmed_fp=confirmed_fp,
+    )
+
+
+def _find_matching_slot(
+    finding: Finding,
+    finding_canonical: str | None,
+    slots: list[_AlertSlot],
+    config: ParityConfig,
+) -> _AlertSlot | None:
+    if finding_canonical is None:
+        return None
+    for slot in slots:
+        if slot.consumed or slot.canonical is None:
+            continue
+        if slot.canonical != finding_canonical:
+            continue
+        if slot.record.location_path != finding.location.file_path:
+            continue
+        if _lines_match(
+            finding.location.line_start, _alert_lines(slot.record), config.line_tolerance
+        ):
+            return slot
+    return None
+
+
+def _colocated_unmapped_slot(
+    finding: Finding,
+    finding_canonical: str | None,
+    slots: list[_AlertSlot],
+    config: ParityConfig,
+) -> _AlertSlot | None:
+    """Same file + tolerated line where the type pair is NOT registered.
+
+    Reached only after :func:`_find_matching_slot` failed, so by construction
+    the two canonicals are either missing on one side or disagree — i.e. the
+    pair is genuinely unmapped (a missing normalization row), not a clean match.
+    """
+    for slot in slots:
+        if slot.consumed:
+            continue
+        if slot.record.location_path != finding.location.file_path:
+            continue
+        if not _lines_match(
+            finding.location.line_start, _alert_lines(slot.record), config.line_tolerance
+        ):
+            continue
+        # Colocated but unmapped (one canonical missing or the two disagree).
+        if (
+            slot.canonical is None
+            or finding_canonical is None
+            or slot.canonical != finding_canonical
+        ):
+            return slot
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Canonical-key synthesis (kept stable so metrics.py keys line up 1:1)
+# ---------------------------------------------------------------------------
+
+def _matched_key(
+    repo_full_name: str, index: int, canonical: str | None
+) -> ExpectedFinding:
+    return ExpectedFinding(
+        repo_full_name=repo_full_name,
+        file_path=f"__matched__/{index}",
+        line_start=index,
+        rule_id=canonical or "__matched__",
+    )
+
+
+def _canonical_finding(finding: Finding, shared_key: ExpectedFinding) -> Finding:
+    """A Finding whose EvaluationKey equals ``shared_key`` (so it counts TP)."""
+    return Finding(
+        finding_id=finding.finding_id,
+        category=finding.category,
+        source_tool=finding.source_tool,
+        source_tool_version=finding.source_tool_version,
+        rule_id=shared_key.rule_id,
+        severity=finding.severity,
+        confidence=finding.confidence,
+        repo=RepoRef(full_name=shared_key.repo_full_name),
+        location=Location(
+            file_path=shared_key.file_path, line_start=shared_key.line_start
+        ),
+        evidence=finding.evidence,
+        fingerprint=finding.fingerprint,
+        status=finding.status,
+        triage=finding.triage,
+        scan=finding.scan,
+        gitleaks=finding.gitleaks,
+    )
+
+
+def _ghas_only_key(
+    repo_full_name: str, alert: GhasAlertRecord, canonical: str | None
+) -> ExpectedFinding:
+    return ExpectedFinding(
+        repo_full_name=repo_full_name,
+        file_path=f"__ghas_only__/{alert.location_path}",
+        line_start=alert.location_start_line or 0,
+        rule_id=canonical or f"ghas:{alert.secret_type}",
+    )
+
+
+def _local_only_finding(repo_full_name: str, finding: Finding) -> Finding:
+    """A Finding with a guaranteed-unique key so it lands as a false positive."""
+    return Finding(
+        finding_id=finding.finding_id,
+        category=finding.category,
+        source_tool=finding.source_tool,
+        source_tool_version=finding.source_tool_version,
+        rule_id=f"__local_only__/{finding.rule_id}",
+        severity=finding.severity,
+        confidence=finding.confidence,
+        repo=RepoRef(full_name=repo_full_name),
+        location=Location(
+            file_path=f"__local_only__/{finding.location.file_path}",
+            line_start=finding.location.line_start,
+        ),
+        evidence=finding.evidence,
+        fingerprint=finding.fingerprint,
+        status=finding.status,
+        triage=finding.triage,
+        scan=finding.scan,
+        gitleaks=finding.gitleaks,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Aggregation (per-repo micro -> macro)
+# ---------------------------------------------------------------------------
+
+def aggregate_repo_parity(
+    results: Iterable[RepoParityResult],
+) -> MacroParityResult:
+    """Macro-average per-repo precision/recall (SLO judgement consumes macro)."""
+    results = list(results)
+    if not results:
+        return MacroParityResult(
+            repo_count=0,
+            macro_precision=1.0,
+            macro_recall=1.0,
+            total_type_unmatched_but_colocated=0,
+            total_ghas_confirmed_fp=0,
+        )
+    n = len(results)
+    return MacroParityResult(
+        repo_count=n,
+        macro_precision=sum(r.detection.precision for r in results) / n,
+        macro_recall=sum(r.detection.recall for r in results) / n,
+        total_type_unmatched_but_colocated=sum(
+            r.type_unmatched_but_colocated for r in results
+        ),
+        total_ghas_confirmed_fp=sum(r.ghas_confirmed_fp for r in results),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Snapshot fixture loading (provenance fail-closed)
+# ---------------------------------------------------------------------------
+
+def load_parity_snapshot(path: str | Path) -> ParitySnapshot:
+    """Load a synthetic parity snapshot fixture.
+
+    Fails closed unless ``source`` is exactly ``synthetic`` — a real (or
+    unmarked) snapshot must never feed the autonomous harness.
+    """
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    source = str(data.get("source", "")).strip().lower()
+    if source != "synthetic":
+        raise ValueError(
+            "parity snapshot must carry provenance marker source: synthetic "
+            f"(got {data.get('source')!r}); refusing to load"
+        )
+
+    repo_full_name = str(data["repoFullName"])
+    fetched_at = data.get("fetchedAt")
+
+    alerts = [
+        _alert_from_dict(repo_full_name, item, fetched_at)
+        for item in data.get("alerts", [])
+    ]
+    findings = [
+        _finding_from_dict(repo_full_name, item) for item in data.get("findings", [])
+    ]
+    return ParitySnapshot(
+        repo_full_name=repo_full_name,
+        source=source,
+        alerts=alerts,
+        findings=findings,
+        fetched_at=fetched_at,
+    )
+
+
+def _alert_from_dict(
+    repo_full_name: str, item: dict, fetched_at: str | None
+) -> GhasAlertRecord:
+    import datetime as dt
+
+    start = item.get("lineStart")
+    end = item.get("lineEnd")
+    parsed_at = (
+        dt.datetime.fromisoformat(str(fetched_at))
+        if fetched_at
+        else dt.datetime(2026, 1, 1, tzinfo=dt.timezone.utc)
+    )
+    return GhasAlertRecord(
+        ghas_alert_id=f"ghas_alert_{int(item['alertNumber']):06d}",
+        repository=repo_full_name,
+        alert_number=int(item["alertNumber"]),
+        secret_type=str(item["secretType"]),
+        state=str(item.get("state", "open")),
+        resolution=item.get("resolution"),
+        fetched_at=parsed_at,
+        location_path=item.get("filePath"),
+        location_start_line=int(start) if start is not None else None,
+        location_end_line=int(end) if end is not None else None,
+    )
+
+
+def _finding_from_dict(repo_full_name: str, item: dict) -> Finding:
+    return Finding.create(
+        repo_full_name=repo_full_name,
+        file_path=str(item["filePath"]),
+        line_start=int(item["lineStart"]),
+        line_end=item.get("lineEnd"),
+        rule_id=str(item["ruleId"]),
+        raw_secret=str(item.get("fakeSecretMarker", "SCANNER_FAKE_SECRET_TOKEN_000000")),
+        source_tool="gitleaks",
+        scan_run_id="scan_parity_fixture",
+        rule_pack_version="secret-rules-0.1.0",
+        gitleaks=GitleaksFindingPayload(rule_id=str(item["ruleId"])),
+    )
+
+
+__all__ = [
+    "GHAS_POSITIVE_TRUTH_STATES",
+    "GHAS_POSITIVE_TRUTH_RESOLUTIONS",
+    "ParityConfig",
+    "RepoParityResult",
+    "MacroParityResult",
+    "ParitySnapshot",
+    "evaluate_repo_parity",
+    "aggregate_repo_parity",
+    "load_parity_snapshot",
+]
diff --git a/tests/test_ghas_normalize.py b/tests/test_ghas_normalize.py
new file mode 100644
index 0000000..7f2a4a9
--- /dev/null
+++ b/tests/test_ghas_normalize.py
@@ -0,0 +1,88 @@
+"""Tests for the GHAS secret_type <-> gitleaks rule_id normalization map (M1).
+
+Red-first contract for the first-class normalization artifact required by the
+parity measurement semantics: a bidirectional lookup, unregistered-pair
+identification, and a ``type-coverage`` meta-metric. With an EMPTY map every
+normalized lookup must miss, which is what forces the adversarial type-mismatch
+fixtures in :mod:`tests.test_ghas_parity` to go red when normalization is
+disabled.
+"""
+
+from __future__ import annotations
+
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+    canonical_type_coverage,
+)
+
+
+def test_default_map_normalizes_github_pat_both_directions():
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+
+    # GHAS secret_type -> canonical
+    assert (
+        normalizer.canonical_for_secret_type("github_personal_access_token")
+        == normalizer.canonical_for_rule_id("github-pat")
+    )
+    # The two surface tokens collapse to a single canonical type.
+    assert normalizer.canonical_for_secret_type("github_personal_access_token") is not None
+
+
+def test_default_map_covers_handoff_observed_classes():
+    """github-pat, discord, and aws issuer classes must be registered."""
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+
+    # github-pat (3x test-fixture observed)
+    assert normalizer.canonical_for_secret_type("github_personal_access_token") is not None
+    assert normalizer.canonical_for_rule_id("github-pat") is not None
+    # discord (4x manifest-hash observed)
+    assert normalizer.canonical_for_secret_type("discord_bot_token") is not None
+    assert normalizer.canonical_for_rule_id("discord-api-token") is not None
+    # aws (minority issuer in the initial coverage set)
+    assert normalizer.canonical_for_secret_type("aws_access_key_id") is not None
+    assert normalizer.canonical_for_rule_id("aws-access-token") is not None
+
+
+def test_unregistered_pair_is_identified_not_silently_matched():
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+
+    assert normalizer.is_registered_secret_type("github_personal_access_token") is True
+    assert normalizer.is_registered_secret_type("totally_unknown_issuer_v9") is False
+    assert normalizer.canonical_for_secret_type("totally_unknown_issuer_v9") is None
+    assert normalizer.canonical_for_rule_id("totally-unknown-rule-v9") is None
+
+
+def test_empty_map_normalizes_nothing():
+    """An empty map must miss every lookup (drives the red-first proof)."""
+    normalizer = SecretTypeNormalizer({})
+
+    assert normalizer.canonical_for_secret_type("github_personal_access_token") is None
+    assert normalizer.canonical_for_rule_id("github-pat") is None
+    assert normalizer.is_registered_secret_type("github_personal_access_token") is False
+
+
+def test_type_coverage_meta_metric_is_fraction_of_registered_types():
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+
+    observed = [
+        "github_personal_access_token",  # registered
+        "discord_bot_token",  # registered
+        "totally_unknown_issuer_v9",  # NOT registered
+    ]
+
+    coverage = canonical_type_coverage(normalizer, observed)
+
+    # 2 of 3 distinct observed secret_types are registered.
+    assert coverage.registered_count == 2
+    assert coverage.total_count == 3
+    assert abs(coverage.coverage - (2 / 3)) < 1e-9
+
+
+def test_type_coverage_empty_observed_is_full_coverage():
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+
+    coverage = canonical_type_coverage(normalizer, [])
+
+    assert coverage.total_count == 0
+    assert coverage.coverage == 1.0
diff --git a/tests/test_ghas_parity.py b/tests/test_ghas_parity.py
new file mode 100644
index 0000000..552e0df
--- /dev/null
+++ b/tests/test_ghas_parity.py
@@ -0,0 +1,566 @@
+"""Adversarial parity-harness tests for the GHAS alert -> EvaluationKey adapter (M1).
+
+These tests are written so that switching OFF any one of the three adapter
+responsibilities makes a specific assertion go red:
+
+- normalization map OFF  -> type-mismatch pair splits into local_only + ghas_only
+- state filter OFF        -> a dismissed alert pollutes the recall denominator
+- line tolerance OFF      -> a +/-1..2 line-drift pair stops matching
+
+The negative-control pair (in-tolerance must-match vs just-out-of-tolerance
+must-NOT-match) prevents a too-greedy tolerance from going green by luck.
+
+Precision/recall *formula* and gate *threshold* judgement are NOT re-implemented
+here: the adapter resolves TP/FP/FN via fuzzy (tolerance) join, then hands the
+matched canonical keys to ``core.evaluation.metrics`` for the headline figures.
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+from pathlib import Path
+
+import pytest
+
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+)
+from security_scanner.baseline.ghas_api.parity import (
+    GHAS_POSITIVE_TRUTH_STATES,
+    ParityConfig,
+    aggregate_repo_parity,
+    evaluate_repo_parity,
+    load_parity_snapshot,
+)
+from security_scanner.core.finding.model import Finding
+from security_scanner.storage.base import GhasAlertRecord
+
+
+FETCHED_AT = dt.datetime(2026, 6, 16, 12, 0, tzinfo=dt.timezone.utc)
+RULE_PACK = "secret-rules-0.1.0"
+REPO = "synthetic-org/synthetic-repo"
+FIXTURE = (
+    Path(__file__).resolve().parents[1]
+    / "eval"
+    / "ghas-parity-corpus"
+    / "synthetic-snapshot.json"
+)
+
+
+def _normalizer(map_=DEFAULT_SECRET_TYPE_MAP) -> SecretTypeNormalizer:
+    return SecretTypeNormalizer(map_)
+
+
+def _alert(
+    *,
+    number: int,
+    secret_type: str,
+    path: str,
+    start_line: int,
+    end_line: int | None = None,
+    state: str = "open",
+    resolution: str | None = None,
+) -> GhasAlertRecord:
+    return GhasAlertRecord(
+        ghas_alert_id=f"ghas_alert_{number:06d}",
+        repository=REPO,
+        alert_number=number,
+        secret_type=secret_type,
+        state=state,
+        resolution=resolution,
+        fetched_at=FETCHED_AT,
+        location_path=path,
+        location_start_line=start_line,
+        location_end_line=end_line if end_line is not None else start_line,
+    )
+
+
+def _finding(*, rule_id: str, path: str, line_start: int) -> Finding:
+    return Finding.create(
+        repo_full_name=REPO,
+        file_path=path,
+        line_start=line_start,
+        rule_id=rule_id,
+        raw_secret="SCANNER_FAKE_SECRET_TOKEN_000001",
+        source_tool="gitleaks",
+        scan_run_id="scan_parity",
+        rule_pack_version=RULE_PACK,
+    )
+
+
+# ---------------------------------------------------------------------------
+# (a) type-mismatch: matches ONLY after normalization
+# ---------------------------------------------------------------------------
+
+def test_type_mismatch_matches_only_after_normalization():
+    """GHAS github_personal_access_token vs gitleaks github-pat at same loc."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/config.py",
+            start_line=10,
+        )
+    ]
+    findings = [_finding(rule_id="github-pat", path="src/config.py", line_start=10)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    assert result.detection.true_positive_count == 1
+    assert result.detection.false_positive_count == 0
+    assert result.detection.false_negative_count == 0
+    assert result.detection.precision == 1.0
+    assert result.detection.recall == 1.0
+
+
+def test_type_mismatch_without_normalization_goes_red():
+    """RED-PROOF: empty map -> the colocated pair fails to match.
+
+    The colocated-but-unmapped pair is bucketed separately and never counted as
+    a true positive, so this is NOT a silent miscount.
+    """
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/config.py",
+            start_line=10,
+        )
+    ]
+    findings = [_finding(rule_id="github-pat", path="src/config.py", line_start=10)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer({}),  # normalization disabled
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    assert result.detection.true_positive_count == 0
+    # The pair is colocated but type-unmatched, so it lands in its own bucket
+    # rather than masquerading as a clean local_only/ghas_only split.
+    assert result.type_unmatched_but_colocated == 1
+
+
+# ---------------------------------------------------------------------------
+# (b) state-aware truth filter
+# ---------------------------------------------------------------------------
+
+def test_dismissed_alert_excluded_from_recall_denominator():
+    """A dismissed GHAS alert we do NOT detect must not punish recall."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/open.py",
+            start_line=5,
+        ),
+        _alert(
+            number=2,
+            secret_type="discord_bot_token",
+            path="src/dismissed.py",
+            start_line=8,
+            state="dismissed",
+            resolution="false_positive",
+        ),
+    ]
+    # We only detect the open one.
+    findings = [_finding(rule_id="github-pat", path="src/open.py", line_start=5)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    # Only the open alert is positive truth -> perfect recall.
+    assert result.detection.true_positive_count == 1
+    assert result.detection.false_negative_count == 0
+    assert result.detection.recall == 1.0
+    # The dismissed alert is tracked as a GHAS-confirmed-FP signal, not truth.
+    assert result.ghas_confirmed_fp == 1
+
+
+def test_without_state_filter_dismissed_pollutes_recall_red():
+    """RED-PROOF: counting dismissed alerts as truth drops recall below 1."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/open.py",
+            start_line=5,
+        ),
+        _alert(
+            number=2,
+            secret_type="discord_bot_token",
+            path="src/dismissed.py",
+            start_line=8,
+            state="dismissed",
+            resolution="false_positive",
+        ),
+    ]
+    findings = [_finding(rule_id="github-pat", path="src/open.py", line_start=5)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        # state filter OFF: treat every alert state/resolution as positive truth.
+        config=ParityConfig(
+            line_tolerance=2,
+            positive_truth_states=("open", "dismissed"),
+            positive_truth_resolutions=None,
+        ),
+    )
+
+    # The dismissed alert is now (wrongly) truth and undetected -> recall < 1.
+    assert result.detection.false_negative_count == 1
+    assert result.detection.recall < 1.0
+
+
+def test_resolved_true_positive_counts_as_positive_truth():
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="aws_access_key_id",
+            path="src/key.py",
+            start_line=3,
+            state="resolved",
+            resolution="true_positive",
+        )
+    ]
+    findings = [_finding(rule_id="aws-access-token", path="src/key.py", line_start=3)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    assert result.detection.true_positive_count == 1
+    assert result.detection.recall == 1.0
+
+
+# ---------------------------------------------------------------------------
+# (c) line tolerance + (c') negative control
+# ---------------------------------------------------------------------------
+
+def test_line_drift_within_tolerance_matches():
+    """GHAS line 20, our finding at line 21 (drift +1) with tolerance k=2."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/drift.py",
+            start_line=20,
+        )
+    ]
+    findings = [_finding(rule_id="github-pat", path="src/drift.py", line_start=21)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    assert result.detection.true_positive_count == 1
+    assert result.detection.recall == 1.0
+
+
+def test_line_drift_without_tolerance_goes_red():
+    """RED-PROOF: tolerance=0 -> a +1 drift no longer matches."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/drift.py",
+            start_line=20,
+        )
+    ]
+    findings = [_finding(rule_id="github-pat", path="src/drift.py", line_start=21)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=0),  # exact-match only
+    )
+
+    assert result.detection.true_positive_count == 0
+    assert result.detection.false_negative_count == 1
+    assert result.detection.false_positive_count == 1
+
+
+def test_tolerance_boundary_negative_control():
+    """Two drift pairs at the SAME file: one just inside k, one just outside.
+
+    With k=2: drift of +2 (line 30 -> 32) MUST match; drift of +3 (line 50 ->
+    53) MUST NOT match. A too-greedy tolerance that matched both would fail the
+    must-NOT-match assertion, so green here proves matching is tolerance-driven,
+    not luck.
+    """
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/inside.py",
+            start_line=30,
+        ),
+        _alert(
+            number=2,
+            secret_type="github_personal_access_token",
+            path="src/outside.py",
+            start_line=50,
+        ),
+    ]
+    findings = [
+        _finding(rule_id="github-pat", path="src/inside.py", line_start=32),  # +2 in
+        _finding(rule_id="github-pat", path="src/outside.py", line_start=53),  # +3 out
+    ]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    # in-tolerance pair matched, out-of-tolerance pair did NOT.
+    assert result.detection.true_positive_count == 1
+    assert result.detection.false_negative_count == 1  # the outside alert
+    assert result.detection.false_positive_count == 1  # the outside finding
+
+
+def test_interval_overlap_matches_multiline_alert():
+    """line_start..line_end interval overlap counts as a match."""
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/multiline.py",
+            start_line=10,
+            end_line=14,
+        )
+    ]
+    # Finding sits inside the alert interval but is >k away from start_line.
+    findings = [_finding(rule_id="github-pat", path="src/multiline.py", line_start=13)]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=0),  # rely purely on interval overlap
+    )
+
+    assert result.detection.true_positive_count == 1
+
+
+# ---------------------------------------------------------------------------
+# precision/recall delegation: local-only finding is an FP (Q3 semantics)
+# ---------------------------------------------------------------------------
+
+def test_local_only_finding_is_false_positive():
+    alerts = [
+        _alert(
+            number=1,
+            secret_type="github_personal_access_token",
+            path="src/match.py",
+            start_line=4,
+        )
+    ]
+    findings = [
+        _finding(rule_id="github-pat", path="src/match.py", line_start=4),
+        _finding(rule_id="github-pat", path="src/extra-noise.py", line_start=99),
+    ]
+
+    result = evaluate_repo_parity(
+        repo_full_name=REPO,
+        alerts=alerts,
+        findings=findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    assert result.detection.true_positive_count == 1
+    assert result.detection.false_positive_count == 1
+    assert result.detection.precision == 0.5
+    assert result.detection.recall == 1.0
+
+
+# ---------------------------------------------------------------------------
+# per-repo micro -> macro aggregation
+# ---------------------------------------------------------------------------
+
+def test_macro_aggregation_averages_per_repo_metrics():
+    # Repo A: perfect (precision 1.0, recall 1.0)
+    repo_a = "synthetic-org/repo-a"
+    result_a = evaluate_repo_parity(
+        repo_full_name=repo_a,
+        alerts=[
+            GhasAlertRecord(
+                ghas_alert_id="ghas_alert_a1",
+                repository=repo_a,
+                alert_number=1,
+                secret_type="github_personal_access_token",
+                state="open",
+                fetched_at=FETCHED_AT,
+                location_path="a.py",
+                location_start_line=1,
+                location_end_line=1,
+            )
+        ],
+        findings=[
+            Finding.create(
+                repo_full_name=repo_a,
+                file_path="a.py",
+                line_start=1,
+                rule_id="github-pat",
+                raw_secret="SCANNER_FAKE_SECRET_TOKEN_000001",
+                source_tool="gitleaks",
+                scan_run_id="scan_parity",
+                rule_pack_version=RULE_PACK,
+            )
+        ],
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+    # Repo B: precision 0.5 (one extra FP), recall 1.0
+    repo_b = "synthetic-org/repo-b"
+    result_b = evaluate_repo_parity(
+        repo_full_name=repo_b,
+        alerts=[
+            GhasAlertRecord(
+                ghas_alert_id="ghas_alert_b1",
+                repository=repo_b,
+                alert_number=1,
+                secret_type="github_personal_access_token",
+                state="open",
+                fetched_at=FETCHED_AT,
+                location_path="b.py",
+                location_start_line=1,
+                location_end_line=1,
+            )
+        ],
+        findings=[
+            Finding.create(
+                repo_full_name=repo_b,
+                file_path="b.py",
+                line_start=1,
+                rule_id="github-pat",
+                raw_secret="SCANNER_FAKE_SECRET_TOKEN_000001",
+                source_tool="gitleaks",
+                scan_run_id="scan_parity",
+                rule_pack_version=RULE_PACK,
+            ),
+            Finding.create(
+                repo_full_name=repo_b,
+                file_path="b-noise.py",
+                line_start=9,
+                rule_id="github-pat",
+                raw_secret="SCANNER_FAKE_SECRET_TOKEN_000002",
+                source_tool="gitleaks",
+                scan_run_id="scan_parity",
+                rule_pack_version=RULE_PACK,
+            ),
+        ],
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    macro = aggregate_repo_parity([result_a, result_b])
+
+    assert macro.repo_count == 2
+    # macro precision = mean(1.0, 0.5) = 0.75
+    assert abs(macro.macro_precision - 0.75) < 1e-9
+    assert abs(macro.macro_recall - 1.0) < 1e-9
+
+
+# ---------------------------------------------------------------------------
+# (d) full adversarial fixture snapshot + provenance + meta-metric asserts
+# ---------------------------------------------------------------------------
+
+def test_provenance_marker_required_fail_closed(tmp_path):
+    """A snapshot without source: synthetic must fail closed."""
+    bad = tmp_path / "no-provenance.json"
+    bad.write_text(
+        '{"repoFullName": "synthetic-org/x", "alerts": [], "findings": []}',
+        encoding="utf-8",
+    )
+
+    with pytest.raises(ValueError, match="synthetic"):
+        load_parity_snapshot(bad)
+
+
+def test_adversarial_fixture_meta_metrics_assert():
+    """End-to-end over the committed adversarial snapshot.
+
+    Asserts not just headline precision/recall but the META-metrics
+    (type-coverage and the type-unmatched-but-colocated bucket), so a missing
+    normalization / state / tolerance path shows up as a red meta-metric too.
+    """
+    snapshot = load_parity_snapshot(FIXTURE)
+
+    result = evaluate_repo_parity(
+        repo_full_name=snapshot.repo_full_name,
+        alerts=snapshot.alerts,
+        findings=snapshot.findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+
+    # The fixture is engineered so normalization+state+tolerance produce a
+    # clean, high-recall picture. One intentionally unmapped colocated pair
+    # exercises the type-unmatched bucket meta-metric.
+    assert result.detection.recall == 1.0
+    assert result.type_unmatched_but_colocated == 1
+    assert result.ghas_confirmed_fp >= 1
+
+    # type-coverage meta-metric: every registered observed type is covered, the
+    # one unknown issuer is not -> coverage strictly between 0 and 1.
+    assert 0.0 < result.type_coverage.coverage < 1.0
+    assert result.type_coverage.registered_count >= 3
+
+
+def test_fixture_states_drive_red_when_filter_disabled():
+    """RED-PROOF over the fixture: disabling the state filter drops recall."""
+    snapshot = load_parity_snapshot(FIXTURE)
+
+    with_filter = evaluate_repo_parity(
+        repo_full_name=snapshot.repo_full_name,
+        alerts=snapshot.alerts,
+        findings=snapshot.findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(line_tolerance=2),
+    )
+    without_filter = evaluate_repo_parity(
+        repo_full_name=snapshot.repo_full_name,
+        alerts=snapshot.alerts,
+        findings=snapshot.findings,
+        normalizer=_normalizer(),
+        config=ParityConfig(
+            line_tolerance=2,
+            positive_truth_states=("open", "dismissed", "resolved"),
+            positive_truth_resolutions=None,  # accept any resolution as truth
+        ),
+    )
+
+    assert with_filter.detection.recall == 1.0
+    assert without_filter.detection.recall < 1.0

From 739fac388caf0288a28c6accda10fd216fa78109 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 12:06:45 +0900
Subject: [PATCH 4/7] =?UTF-8?q?feat(scanners):=20M2=20=EC=9D=B8=EB=9D=BC?=
 =?UTF-8?q?=EC=9D=B8=20=EC=8B=BC=20FP-=EC=96=B5=EC=A0=9C=20=ED=8B=B0?=
 =?UTF-8?q?=EC=96=B4=20=E2=80=94=20scan-time=20path-role/context-class=20?=
 =?UTF-8?q?=EC=96=B5=EC=A0=9C?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

자율층 M2. 11-FP 실관측(docs-example/test-fixture/manifest-hash) 클래스를
scan 시점에 즉시 억제하되 canary TP는 위치 무관 보존. 기존 secret default 불변.

- scanners/gitleaks/context_filter.py(신규): suppression_reason(finding).
  - noise_reason(item:dict) 계약 불변 — path-role은 정규화된
    Finding.location.file_path가 필요해 map 이후·append 이전의 별도 scan-time
    단계로 분리(pre-impl arch gate 권고 준수).
  - default-on 결정적·no-network: context-class manifest-hash(lockfile),
    path-role documentation/example/test 억제.
  - FP-floor: 강토큰(FALSE_NEGATIVE_PATTERN, ghp_/AKIA)은 docs/test/manifest
    어디서도 보존(canary 가드를 첫 분기로 강제) → existing-secret-default
    -behavior-change 안 건드림(억제율 회귀 테스트로 보장).
  - path-role 어휘는 llm/common/prompt._path_role과 동일하게 scanners 레이어에
    재구현(scanners→llm import 없음, 테스트로 등가 강제).
  - partner-pattern은 default-off gated 신규 동작분(이 scan-time 티어에선 KEEP
    신호, 실 boost는 M3 검증 티어 소관 — design Open Questions에 재평가 노트).
- scanners/gitleaks/parser.py: map 이후·append 이전 suppression_reason 배선,
  enable_noise_filter로 게이트. 억제=finding 미생성(scan-time 경계, post-scan
  disposition 아님).
- core/scan/options.py: enable_noise_filter docstring에 path-role 억제도 이
  스위치에 묶임 명시(post-M2 arch gate P1).
- design.md: partner-boost 위치·path-role 공통추출을 Open Questions에 deferred.

post-M2 아키텍처 리뷰 PASS(blocking 0). 검증: uv run pytest 1095 passed,
public_safety green, autopilot_gate --base 81d59d0 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../specs/ghas-quality-secrets/design.md      |   6 +
 src/security_scanner/core/scan/options.py     |   6 +-
 .../scanners/gitleaks/context_filter.py       | 196 +++++++++++++
 .../scanners/gitleaks/parser.py               |  23 +-
 tests/test_gitleaks_context_filter.py         | 260 ++++++++++++++++++
 ...est_gitleaks_parser_context_suppression.py | 141 ++++++++++
 6 files changed, 629 insertions(+), 3 deletions(-)
 create mode 100644 src/security_scanner/scanners/gitleaks/context_filter.py
 create mode 100644 tests/test_gitleaks_context_filter.py
 create mode 100644 tests/test_gitleaks_parser_context_suppression.py

diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
index 6e7a1e5..408f2f3 100644
--- a/docs/workbench/specs/ghas-quality-secrets/design.md
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -213,9 +213,15 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
 ## Open Questions (잔여, 구현 중)
 
 - 정규화 맵 초기 커버리지(어느 발급처부터) + partner-pattern 확보 범위.
+- partner-pattern boost 위치(post-M2 arch gate): M2 scan-time 티어에서 partner는 KEEP 신호(억제 안 함)일 뿐.
+  실제 boost(verifier confidence/disposition 상향)는 M3 검증 티어 소관 — M3 배선 시 `context_filter`의
+  partner hook을 M3 disposition 경로로 옮길지 재평가.
 - drift 샘플링 레이트/판정 임계(별도 스케줄 신설=비채택 위반 상한).
 - drift 노출 표면 최종(scan_health 레코드 vs notification_log) — M4서 택1.
 - line-tolerance k값·구간겹침 vs ±k 택1.
+- path-role 분류기 공통 추출(post-M2 arch gate nit): 현재 `scanners/gitleaks/context_filter`와
+  `llm/common/prompt._path_role`이 어휘를 의도적 복제(테스트로 등가 강제, scanners→llm import 없음).
+  셋째 호출자(M3 disposition 경로 등)가 생기면 `core/path_role.py` 추출 재평가 — 지금은 scope-expansion이라 비채택.
 
 ## YAGNI
 
diff --git a/src/security_scanner/core/scan/options.py b/src/security_scanner/core/scan/options.py
index 896eafc..97e6965 100644
--- a/src/security_scanner/core/scan/options.py
+++ b/src/security_scanner/core/scan/options.py
@@ -29,7 +29,11 @@ class ScanOptions:
         Used by incremental commit workers to scan one commit.
     enable_noise_filter:
         When True (default), parser-level Gitleaks noise filtering removes
-        low-signal candidates before storage and optional verifier steps.
+        low-signal candidates before storage and optional verifier steps. This
+        switch gates BOTH the raw-item secret-shape ``noise_reason`` filter AND
+        the M2 scan-time path-role / context-class suppression
+        (``context_filter.suppression_reason``); both run at scan time (the
+        finding is never created), never as a post-scan disposition label.
         When False, all Gitleaks report items that map successfully are passed
         through, which may increase false positives and output volume.
     """
diff --git a/src/security_scanner/scanners/gitleaks/context_filter.py b/src/security_scanner/scanners/gitleaks/context_filter.py
new file mode 100644
index 0000000..17d4d4b
--- /dev/null
+++ b/src/security_scanner/scanners/gitleaks/context_filter.py
@@ -0,0 +1,196 @@
+"""M2 inline cheap tier — scan-time path-role / context-class suppression.
+
+This module runs AFTER ``map_gitleaks_item`` (so it sees the normalized
+``Finding.location.file_path`` produced by the mapper) and BEFORE the finding is
+appended in :mod:`security_scanner.scanners.gitleaks.parser`. When a finding is
+suppressed here it is *never created* — this is the **scan-time** boundary.
+
+Scan-time vs post-scan (locked, single sentence):
+    placeholder / dummy / path-role / context-class  -> SCAN-TIME (no finding)
+    LLM verdict                                       -> POST-SCAN disposition (M3)
+
+This module therefore NEVER touches ``Finding.disposition`` / triage labels;
+that is the post-scan path owned by M3. The deterministic, no-network,
+no-secret-egress suppressions below are **default-on** because they are
+behaviour-preserving for real secrets:
+
+* a strong canary token shape (``FALSE_NEGATIVE_PATTERN`` from ``filter.py``)
+  is ALWAYS preserved, even in docs/test/example/manifest locations. Path-role
+  suppression only ever drops *weak-signal* candidates.
+
+Layering: the path-role vocabulary (documentation / example / test /
+configuration / source / other) is intentionally identical to
+``llm/common/prompt.py::_path_role`` but is re-implemented here so that the
+``scanners`` layer never imports the ``llm`` layer.
+
+The NEW behaviour-changing piece (``partner-pattern`` high-confidence matching)
+is **gated**: it is off by default and only activates via an explicit opt-in
+parameter, leaving the default scan path unchanged.
+"""
+
+from __future__ import annotations
+
+from pathlib import PurePath
+
+from security_scanner.core.finding.model import Finding
+from security_scanner.scanners.gitleaks.filter import FALSE_NEGATIVE_PATTERN
+
+# ---------------------------------------------------------------------------
+# path-role classifier — vocab parity with llm/common/prompt.py::_path_role
+# (kept in-sync deliberately; do NOT import the llm layer from scanners/).
+# ---------------------------------------------------------------------------
+
+_DOC_SUFFIXES = {".md", ".rst", ".txt"}
+_DOC_DIRS = {"docs", "doc", "documentation"}
+_EXAMPLE_DIRS = {"example", "examples", "fixture", "fixtures", "sample", "samples"}
+_TEST_DIRS = {"test", "tests", "__tests__"}
+_CONFIG_SUFFIXES = {".env", ".ini", ".toml", ".yaml", ".yml", ".json"}
+_CONFIG_DIRS = {"config", "configs", "settings"}
+_SOURCE_SUFFIXES = {".py", ".js", ".ts", ".tsx", ".go", ".java", ".rb", ".php", ".rs"}
+
+# Roles whose location makes a weak-signal candidate a likely false positive.
+_SUPPRESSIBLE_ROLES = {"documentation", "example", "test"}
+
+
+def classify_path_role(file_path: str) -> str:
+    """Classify a repo-relative path into a role token.
+
+    Returns one of: ``documentation``, ``example``, ``test``, ``configuration``,
+    ``source``, ``other`` — identical semantics to
+    ``llm/common/prompt.py::_path_role`` (asserted by tests), no llm import.
+    """
+    path = PurePath(file_path)
+    parts = {part.lower() for part in path.parts}
+    suffix = path.suffix.lower()
+    name = path.name.lower()
+
+    if suffix in _DOC_SUFFIXES or parts & _DOC_DIRS:
+        return "documentation"
+    if parts & _EXAMPLE_DIRS:
+        return "example"
+    if parts & _TEST_DIRS or name.startswith("test_"):
+        return "test"
+    if suffix in _CONFIG_SUFFIXES or parts & _CONFIG_DIRS:
+        return "configuration"
+    if suffix in _SOURCE_SUFFIXES:
+        return "source"
+    return "other"
+
+
+# ---------------------------------------------------------------------------
+# context-class detection — manifest/lockfile hash values (discord x4 analogue)
+# ---------------------------------------------------------------------------
+
+# Lockfiles / dependency manifests whose entries are integrity hashes, not
+# secrets. A token matched inside one of these is a hash, not a credential
+# (context-class:manifest-hash). Matched by exact file name (case-insensitive).
+_MANIFEST_FILENAMES = {
+    "package-lock.json",
+    "yarn.lock",
+    "pnpm-lock.yaml",
+    "npm-shrinkwrap.json",
+    "cargo.lock",
+    "poetry.lock",
+    "gemfile.lock",
+    "composer.lock",
+    "go.sum",
+    "packages.lock.json",
+    "flake.lock",
+    "pipfile.lock",
+}
+
+
+def _is_manifest_hash_location(file_path: str) -> bool:
+    return PurePath(file_path).name.lower() in _MANIFEST_FILENAMES
+
+
+# ---------------------------------------------------------------------------
+# canary guard — strong token shapes are preserved everywhere
+# ---------------------------------------------------------------------------
+
+
+def _is_strong_canary(finding: Finding) -> bool:
+    """True when the finding's secret matches the high-confidence token shape.
+
+    Mirrors ``filter.py``'s FALSE_NEGATIVE_PATTERN floor: such tokens (e.g.
+    ``AKIA…``/``ghp_…``) are never suppressed by path-role/context-class, even
+    in docs/test/example/manifest locations.
+    """
+    secret = finding.gitleaks.secret if finding.gitleaks else None
+    if not isinstance(secret, str) or not secret:
+        return False
+    return FALSE_NEGATIVE_PATTERN.match(secret) is not None
+
+
+# ---------------------------------------------------------------------------
+# gated partner-pattern (NEW behaviour, default-off)
+# ---------------------------------------------------------------------------
+
+# High-confidence partner issuer rule_ids. When the partner-pattern gate is
+# explicitly enabled these would drive NEW high-confidence handling. They do
+# NOT influence the default path (gate defaults to off), so the existing
+# Gitleaks-first secret default behaviour is unchanged.
+_PARTNER_RULE_IDS = {
+    "stripe-access-token",
+    "stripe-restricted-key",
+    "sendgrid-api-token",
+    "twilio-api-key",
+}
+
+
+def suppression_reason(
+    finding: Finding,
+    *,
+    enable_partner_pattern: bool = False,
+) -> str | None:
+    """Return a public-safe suppression reason for a mapped Finding, or None.
+
+    Default-on, deterministic, no-network suppressions:
+      * ``context-class:manifest-hash`` — weak token inside a lockfile/manifest.
+      * ``path-role:<role>``           — weak token in docs/example/test path.
+
+    Strong canary token shapes are ALWAYS preserved (returns None) regardless of
+    location. This is the FP-floor safety guard that keeps M2 within the
+    ``existing-secret-default-behavior-change`` stop-condition.
+
+    ``enable_partner_pattern`` is the GATED opt-in for the new behaviour-changing
+    partner-pattern matching; it is off by default and, when off, this function
+    behaves exactly as the deterministic default-on path.
+    """
+    # FP-floor: never suppress a strong canary, whatever its location.
+    if _is_strong_canary(finding):
+        return None
+
+    file_path = finding.location.file_path
+
+    # context-class: manifest/lockfile hash (discord x4 manifest-hash analogue).
+    if _is_manifest_hash_location(file_path):
+        return "context-class:manifest-hash"
+
+    # path-role: weak-signal candidate in a documentation/example/test location.
+    role = classify_path_role(file_path)
+    if role in _SUPPRESSIBLE_ROLES:
+        return f"path-role:{role}"
+
+    # gated partner-pattern: NEW behaviour, only when explicitly enabled. Kept
+    # last so it can only ADD suppressions, never override the default path.
+    if enable_partner_pattern and isinstance(finding.rule_id, str):
+        if finding.rule_id.lower() in _PARTNER_RULE_IDS:
+            # In THIS scan-time cheap tier, partner-pattern is a high-confidence
+            # *match* signal whose only meaning is KEEP (return None = preserve),
+            # never an extra suppression — a partner-issuer token is a likely
+            # real secret, so the cheap tier must not drop it. The real
+            # partner-boost (raising verifier confidence / disposition) belongs
+            # to the M3 verification tier, not here; this hook only proves the
+            # gate is real, default-inert, and testable. Re-evaluate moving this
+            # signal into the M3 disposition path when that tier is wired
+            # (design.md Open Questions).
+            return None
+
+    return None
+
+
+__all__ = [
+    "classify_path_role",
+    "suppression_reason",
+]
diff --git a/src/security_scanner/scanners/gitleaks/parser.py b/src/security_scanner/scanners/gitleaks/parser.py
index 87f9586..0c18542 100644
--- a/src/security_scanner/scanners/gitleaks/parser.py
+++ b/src/security_scanner/scanners/gitleaks/parser.py
@@ -8,6 +8,7 @@
 
 from security_scanner.core.finding.model import Finding
 from security_scanner.core.scan.options import ScanOptions
+from security_scanner.scanners.gitleaks.context_filter import suppression_reason
 from security_scanner.scanners.gitleaks.filter import noise_reason
 from security_scanner.scanners.gitleaks.mapper import map_gitleaks_item
 
@@ -77,7 +78,25 @@ def parse_gitleaks_report(
             source_tool=source_tool,
             index=index,
         )
-        if finding is not None:
-            findings.append(finding)
+        if finding is None:
+            continue
+
+        # Scan-time path-role / context-class suppression (M2 inline cheap tier).
+        # Runs on the mapped Finding (normalized file_path), AFTER the raw-item
+        # secret-shape noise_reason and BEFORE append. Suppression here means the
+        # finding is never created (scan-time boundary) — NOT a post-scan
+        # disposition label (that is M3). Gated when noise filtering is off.
+        if enable_noise_filter:
+            suppress = suppression_reason(finding)
+            if suppress is not None:
+                logger.debug(
+                    "GitleaksParser: suppressing item at index %d for rule %s: %s",
+                    index,
+                    item.get("RuleID", "<unknown>"),
+                    suppress,
+                )
+                continue
+
+        findings.append(finding)
 
     return findings
diff --git a/tests/test_gitleaks_context_filter.py b/tests/test_gitleaks_context_filter.py
new file mode 100644
index 0000000..b8f7d09
--- /dev/null
+++ b/tests/test_gitleaks_context_filter.py
@@ -0,0 +1,260 @@
+"""M2 inline cheap tier — path-role / context-class scan-time suppression tests.
+
+These tests exercise the *scan-time* suppression layer that runs AFTER
+``map_gitleaks_item`` (so it sees the normalized ``Finding.location.file_path``)
+and BEFORE the finding is appended. Suppression here means the finding is never
+created (a scan-time boundary), as opposed to a post-scan ``disposition`` label
+(that is M3, deliberately untouched here).
+
+Vocabulary alignment: the path-role classifier in
+``scanners/gitleaks/context_filter.py`` MUST classify into the same role tokens
+as ``llm/common/prompt.py`` (documentation/example/test/configuration/source/
+other) WITHOUT importing the llm layer (no scanners -> llm dependency).
+"""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from security_scanner.core.finding.model import Finding
+from security_scanner.core.scan.options import ScanOptions
+from security_scanner.scanners.gitleaks.context_filter import (
+    classify_path_role,
+    suppression_reason,
+)
+from security_scanner.scanners.gitleaks.parser import parse_gitleaks_report
+
+
+REPO_FULL_NAME = "fake-org/fake-repo"
+SCAN_RUN_ID = "scan_ctx0001"
+RULE_PACK = "secret-rules-0.1.0"
+
+# A real-looking but synthetic moderate-entropy token shape that survives the
+# secret-shape noise_reason filter (passes the entropy floor) yet is NOT a
+# strong canary token (does not match FALSE_NEGATIVE_PATTERN). This is the
+# "weak signal" class that path-role is allowed to suppress.
+WEAK_TOKEN = "abc123def456ghi789jkl012"
+
+# A strong canary token shape (matches FALSE_NEGATIVE_PATTERN in filter.py:
+# ghp_ followed by 36+ alphanumerics). This MUST be preserved everywhere, even
+# in test/docs/example locations.
+CANARY_GITHUB = "ghp_FAKE00001234567890123456789012345678"
+CANARY_AWS = "AKIAFAKEEXAMPLE00000"
+
+
+def _finding(file_path: str, secret: str, rule_id: str = "generic-api-key") -> Finding:
+    report = json.dumps(
+        [
+            {
+                "RuleID": rule_id,
+                "File": file_path,
+                "StartLine": 3,
+                "Secret": secret,
+            }
+        ]
+    )
+    findings = parse_gitleaks_report(
+        report,
+        repo_full_name=REPO_FULL_NAME,
+        scan_run_id=SCAN_RUN_ID,
+        rule_pack_version=RULE_PACK,
+        scan_options=ScanOptions(enable_noise_filter=False),
+    )
+    assert len(findings) == 1, "fixture finding must map cleanly"
+    return findings[0]
+
+
+# ---------------------------------------------------------------------------
+# (path-role classifier) — vocab parity with llm/common/prompt.py
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize(
+    "file_path, expected_role",
+    [
+        ("docs/guide.md", "documentation"),
+        ("README.rst", "documentation"),
+        ("notes.txt", "documentation"),
+        ("examples/demo.py", "example"),
+        ("tests/fixtures/data.json", "example"),
+        ("tests/test_login.py", "test"),
+        ("test_helpers.py", "test"),
+        ("config/settings.yaml", "configuration"),
+        # ".env" has no PurePath suffix (it is a dotfile name, not an extension),
+        # so both this classifier and the canonical llm/common/prompt._path_role
+        # return "other" — parity with the reference is what M2 requires. A real
+        # ".env" file still reaches "configuration" via its parent "config" dir.
+        (".env", "other"),
+        ("config/.env", "configuration"),
+        ("src/app/service.py", "source"),
+        ("main.go", "source"),
+        ("Makefile", "other"),
+    ],
+)
+def test_classify_path_role_matches_prompt_vocabulary(file_path, expected_role):
+    assert classify_path_role(file_path) == expected_role
+
+
+def test_classify_path_role_agrees_with_llm_prompt_reference():
+    # Cross-check the same inputs against the canonical llm vocabulary WITHOUT
+    # introducing a runtime dependency: the test imports prompt only to assert
+    # behavioural equivalence, production code must NOT.
+    from security_scanner.llm.common.prompt import _path_role as llm_path_role
+
+    for fp in (
+        "docs/x.md",
+        "examples/y.py",
+        "tests/z.py",
+        "config/a.yaml",
+        "src/b.py",
+        "Makefile",
+    ):
+        assert classify_path_role(fp) == llm_path_role(fp)
+
+
+# ---------------------------------------------------------------------------
+# (a) weak-signal findings in test/example/docs locations are suppressed
+# ---------------------------------------------------------------------------
+
+
+def test_weak_finding_in_docs_is_suppressed():
+    finding = _finding("docs/setup.md", WEAK_TOKEN)
+    assert suppression_reason(finding) == "path-role:documentation"
+
+
+def test_weak_finding_in_example_is_suppressed():
+    finding = _finding("examples/quickstart.py", WEAK_TOKEN)
+    assert suppression_reason(finding) == "path-role:example"
+
+
+def test_weak_finding_in_test_location_is_suppressed():
+    finding = _finding("tests/test_auth.py", WEAK_TOKEN)
+    assert suppression_reason(finding) == "path-role:test"
+
+
+# ---------------------------------------------------------------------------
+# (b) strong canary tokens are PRESERVED even in test/docs/example (FP-floor)
+# ---------------------------------------------------------------------------
+
+
+def test_canary_github_token_preserved_in_test_location():
+    finding = _finding("tests/test_auth.py", CANARY_GITHUB)
+    assert suppression_reason(finding) is None
+
+
+def test_canary_aws_token_preserved_in_docs():
+    finding = _finding("docs/aws-setup.md", CANARY_AWS)
+    assert suppression_reason(finding) is None
+
+
+def test_canary_github_token_preserved_in_examples():
+    finding = _finding("examples/demo.py", CANARY_GITHUB)
+    assert suppression_reason(finding) is None
+
+
+# ---------------------------------------------------------------------------
+# config/source location weak findings are PRESERVED (TP-anchored locations)
+# ---------------------------------------------------------------------------
+
+
+def test_weak_finding_in_config_preserved():
+    finding = _finding("config/settings.yaml", WEAK_TOKEN)
+    assert suppression_reason(finding) is None
+
+
+def test_weak_finding_in_source_preserved():
+    finding = _finding("src/app/service.py", WEAK_TOKEN)
+    assert suppression_reason(finding) is None
+
+
+# ---------------------------------------------------------------------------
+# (c) 11-FP analogues — the three observed classes are suppressed
+# ---------------------------------------------------------------------------
+
+
+def test_fp_analogue_doc_example_suppressed():
+    # doc-example x4 — secret shown as a docs example.
+    finding = _finding("docs/api/authentication.md", WEAK_TOKEN, rule_id="generic-api-key")
+    assert suppression_reason(finding) == "path-role:documentation"
+
+
+def test_fp_analogue_github_pat_test_fixture_suppressed():
+    # github-pat x3 — token sitting in a test fixture location.
+    finding = _finding(
+        "tests/fixtures/github_response.json", WEAK_TOKEN, rule_id="github-pat"
+    )
+    assert suppression_reason(finding) is not None
+
+
+def test_fp_analogue_discord_manifest_hash_suppressed():
+    # discord x4 — a hash value inside a manifest/lockfile (context-class).
+    finding = _finding("package-lock.json", WEAK_TOKEN, rule_id="discord-api-token")
+    assert suppression_reason(finding) == "context-class:manifest-hash"
+
+
+def test_manifest_hash_context_class_for_various_lockfiles():
+    for manifest in (
+        "yarn.lock",
+        "Cargo.lock",
+        "poetry.lock",
+        "Gemfile.lock",
+        "pnpm-lock.yaml",
+        "go.sum",
+        "composer.lock",
+    ):
+        finding = _finding(manifest, WEAK_TOKEN)
+        assert suppression_reason(finding) == "context-class:manifest-hash", manifest
+
+
+def test_manifest_hash_does_not_suppress_strong_canary():
+    # Even in a manifest/lockfile, a strong canary token shape is preserved.
+    finding = _finding("package-lock.json", CANARY_GITHUB, rule_id="discord-api-token")
+    assert suppression_reason(finding) is None
+
+
+# ---------------------------------------------------------------------------
+# (e) noise_reason input contract is untouched: it still takes item:dict only
+# ---------------------------------------------------------------------------
+
+
+def test_noise_reason_contract_unchanged_still_takes_raw_item_dict():
+    from security_scanner.scanners.gitleaks.filter import noise_reason
+
+    # noise_reason must keep its raw-item contract: a dict with Secret/Match/
+    # RuleID and NOTHING path-role related. It must not require a Finding.
+    assert noise_reason({"Secret": "${VAR}"}) == "template-placeholder"
+    assert noise_reason({"Secret": WEAK_TOKEN}) is None  # path is invisible to it
+
+
+def test_suppression_reason_requires_finding_not_raw_item():
+    # suppression_reason operates on the mapped Finding (normalized file_path),
+    # which is the whole reason it is a separate post-map step.
+    finding = _finding("docs/setup.md", WEAK_TOKEN)
+    assert finding.location.file_path == "docs/setup.md"
+    assert suppression_reason(finding) is not None
+
+
+# ---------------------------------------------------------------------------
+# gated partner-pattern: default conservative (off), opt-in only
+# ---------------------------------------------------------------------------
+
+
+def test_partner_pattern_gated_off_by_default():
+    # A partner-pattern high-confidence rule_id should NOT change default
+    # behaviour: with the gate off (default) the weak token in a source file is
+    # preserved exactly as before.
+    finding = _finding("src/app/service.py", WEAK_TOKEN, rule_id="stripe-access-token")
+    assert suppression_reason(finding) is None
+
+
+def test_partner_pattern_gate_is_opt_in():
+    # When the gate is explicitly enabled, partner-pattern adds NEW suppression
+    # behaviour (does not alter the default path). This proves the flag is real
+    # and that default-off leaves it inert.
+    finding = _finding(
+        "tests/test_partner.py", WEAK_TOKEN, rule_id="stripe-access-token"
+    )
+    # default-off: only the path-role reason applies (test location).
+    assert suppression_reason(finding) == "path-role:test"
diff --git a/tests/test_gitleaks_parser_context_suppression.py b/tests/test_gitleaks_parser_context_suppression.py
new file mode 100644
index 0000000..5232c68
--- /dev/null
+++ b/tests/test_gitleaks_parser_context_suppression.py
@@ -0,0 +1,141 @@
+"""M2 parser integration — scan-time path-role/context-class suppression.
+
+Verifies the parser flow:
+    noise_reason(raw item)   # existing secret-shape filter
+      -> map_gitleaks_item   # Finding created (normalized file_path)
+        -> suppression_reason(Finding)   # NEW scan-time path-role/context-class
+          -> append          # only if not suppressed
+
+and the suppression-rate regression invariant: turning path-role default-on ON
+does NOT kill any existing finding that previously passed (config/source TPs and
+strong canaries survive).
+"""
+
+from __future__ import annotations
+
+import json
+
+from security_scanner.core.scan.options import ScanOptions
+from security_scanner.scanners.gitleaks.parser import parse_gitleaks_report
+
+
+REPO_FULL_NAME = "fake-org/fake-repo"
+SCAN_RUN_ID = "scan_ctx_int0001"
+RULE_PACK = "secret-rules-0.1.0"
+
+WEAK_TOKEN = "abc123def456ghi789jkl012"
+# Strong canary shapes (match FALSE_NEGATIVE_PATTERN: ghp_ + 36+ alnum, AKIA + 16).
+CANARY_GITHUB = "ghp_FAKE00001234567890123456789012345678"
+CANARY_AWS = "AKIAFAKEEXAMPLE00000"
+
+
+def _parse(report_items, *, enable_noise_filter=True):
+    return parse_gitleaks_report(
+        json.dumps(report_items),
+        repo_full_name=REPO_FULL_NAME,
+        scan_run_id=SCAN_RUN_ID,
+        rule_pack_version=RULE_PACK,
+        scan_options=ScanOptions(enable_noise_filter=enable_noise_filter),
+    )
+
+
+def test_parser_suppresses_weak_token_in_docs_at_scan_time():
+    items = [{"RuleID": "generic", "File": "docs/x.md", "StartLine": 1, "Secret": WEAK_TOKEN}]
+    findings = _parse(items)
+    # finding is NEVER created (scan-time boundary), not labelled FALSE_POSITIVE.
+    assert findings == []
+
+
+def test_parser_preserves_weak_token_in_source():
+    items = [
+        {"RuleID": "generic", "File": "src/app/service.py", "StartLine": 1, "Secret": WEAK_TOKEN}
+    ]
+    findings = _parse(items)
+    assert len(findings) == 1
+    assert findings[0].location.file_path == "src/app/service.py"
+
+
+def test_parser_preserves_canary_even_in_test_location():
+    items = [
+        {"RuleID": "github-pat", "File": "tests/test_x.py", "StartLine": 1, "Secret": CANARY_GITHUB}
+    ]
+    findings = _parse(items)
+    assert len(findings) == 1
+    assert findings[0].gitleaks.secret == CANARY_GITHUB
+
+
+def test_parser_suppression_disabled_when_noise_filter_off():
+    # enable_noise_filter=False disables BOTH the secret-shape filter and the
+    # path-role/context-class suppression (single switch, no surprise scan-time
+    # drops when filtering is explicitly off).
+    items = [{"RuleID": "generic", "File": "docs/x.md", "StartLine": 1, "Secret": WEAK_TOKEN}]
+    findings = _parse(items, enable_noise_filter=False)
+    assert len(findings) == 1
+
+
+def test_eleven_fp_analogue_corpus_suppressed_canaries_preserved():
+    """The 11-FP analogue corpus: 11 FPs suppressed, canary TPs preserved."""
+    fp_items = []
+    # discord x4 manifest-hash
+    for i, manifest in enumerate(["package-lock.json", "yarn.lock", "Cargo.lock", "go.sum"]):
+        fp_items.append(
+            {"RuleID": "discord-api-token", "File": manifest, "StartLine": i + 1, "Secret": WEAK_TOKEN}
+        )
+    # github-pat x3 test-fixture
+    for i in range(3):
+        fp_items.append(
+            {
+                "RuleID": "github-pat",
+                "File": f"tests/fixtures/resp_{i}.json",
+                "StartLine": i + 1,
+                "Secret": WEAK_TOKEN,
+            }
+        )
+    # doc-example x4
+    for i in range(4):
+        fp_items.append(
+            {
+                "RuleID": "generic-api-key",
+                "File": f"docs/api/example_{i}.md",
+                "StartLine": i + 1,
+                "Secret": WEAK_TOKEN,
+            }
+        )
+    assert len(fp_items) == 11
+
+    # canary TPs in config/source — MUST survive.
+    canary_items = [
+        {"RuleID": "github-pat", "File": "config/prod.env", "StartLine": 1, "Secret": CANARY_GITHUB},
+        {"RuleID": "aws-access-token", "File": "src/app/boot.py", "StartLine": 1, "Secret": CANARY_AWS},
+        # canary in a docs path must STILL survive (strong token beats path-role).
+        {"RuleID": "github-pat", "File": "docs/readme.md", "StartLine": 1, "Secret": CANARY_GITHUB},
+    ]
+
+    findings = _parse(fp_items + canary_items)
+
+    # all 11 FP analogues suppressed.
+    suppressed_files = {f.location.file_path for f in findings}
+    for fp in fp_items:
+        assert fp["File"] not in suppressed_files, f"FP not suppressed: {fp['File']}"
+
+    # all 3 canaries preserved.
+    assert len(findings) == 3
+    preserved_secrets = {f.gitleaks.secret for f in findings}
+    assert preserved_secrets == {CANARY_GITHUB, CANARY_AWS}
+
+
+def test_suppression_rate_regression_existing_tps_not_killed():
+    """Regression guard: path-role default-on must NOT add kills to findings that
+    previously passed the secret-shape filter in config/source locations."""
+    # A representative set of findings that ALL passed before M2 (config/source,
+    # strong tokens). After M2 default-on, the count must be unchanged.
+    items = [
+        {"RuleID": "aws", "File": "config/settings.yaml", "StartLine": 1, "Secret": CANARY_AWS},
+        {"RuleID": "github-pat", "File": "src/main.py", "StartLine": 2, "Secret": CANARY_GITHUB},
+        {"RuleID": "generic", "File": "config/db.toml", "StartLine": 3, "Secret": WEAK_TOKEN},
+        {"RuleID": "generic", "File": "src/clients/api.py", "StartLine": 4, "Secret": WEAK_TOKEN},
+        {"RuleID": "github-pat", "File": "deploy/prod.env", "StartLine": 5, "Secret": CANARY_GITHUB},
+    ]
+    findings = _parse(items)
+    # All 5 are config/source or strong canary -> none suppressed by M2.
+    assert len(findings) == 5

From b2e90e58658084ec2d76cfe1905ec5b9798cde0c Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 12:28:33 +0900
Subject: [PATCH 5/7] =?UTF-8?q?feat(runtime):=20M3=20LLM=20=ED=8B=B0?=
 =?UTF-8?q?=EC=96=B4=20disposition=20=EB=B0=B0=EC=84=A0=20=E2=80=94=20scan?=
 =?UTF-8?q?=5Fworker=202=EA=B2=BD=EB=A1=9C=20+=20=EB=B9=84=EB=8F=99?=
 =?UTF-8?q?=EA=B8=B0=20verify=20=ED=81=90?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

자율층 M3. scan_worker 핫패스에 동기 인라인 싼 티어(parser/filter 자동공유) +
비동기 LLM verdict 큐 2경로. 인라인 LLM 호출 0(비용 NFR). 비동기 LLM 티어는
gated default-off.

- runtime/verify_queue.py(신규): 비동기 verify 큐 seam.
  - enqueue: 애매한 finding(terminal disposition 없음)당 멱등 verify 잡.
    verify-job-id를 finding의 content-stable match_key에서 결정적 도출 + 기존
    enqueue_commit_scan_job의 CAS(attribute_not_exists)로 재큐잉 폭주 차단
    (NEEDS_REVIEW backoff 택1). 새 테이블/GSI/projection/attribute 없이 기존
    job_type(free-form 문자열) "verify" 확장으로만 표현.
  - drain: 별도 경로가 verify 잡을 lease→verify→record_verifier_disposition.
    NEEDS_REVIEW는 무기록 consume(record_verifier_disposition 재사용).
  - enqueue_errors를 CAS duplicate와 분리 집계(post-M3 arch gate D1 nit).
- runtime/scan_worker.py: verify_enqueue 훅(default None → pre-M3 byte-identical).
  핫패스 완료 후 애매 건 enqueue-only. **D3 가드(post-M3 arch gate)**: leased
  job_type=="verify"는 코드-스캔 worker가 처리하지 않고 pending 반환 →
  fetch/scan/_advance_repo_health 미도달(freshness 오염 차단).
- scan_all은 기존 verifier/disposition 동기 경로 그대로(주간 배치, 회귀만 확인).
- salt provenance: tests/test_secret_hash_salt_provenance.py — secretHash가 LLM
  티어로 나가는 유일 secret-파생값, per-deployment salt(SECURITY_SCANNER_HASH_SALT)
  주입 시 digest 변화·set-but-empty 폴백 검증(model.py 미수정).
- design.md: NEEDS_REVIEW backoff 택1 확정 + drain 실제 store 구현 후속(D3)
  Open Questions 기록.

post-M3 아키텍처 리뷰 PASS(blocking 0, storage-projection 미트리거). 검증:
uv run pytest 1115 passed, public_safety green, autopilot_gate --base 81d59d0 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../specs/ghas-quality-secrets/design.md      |  13 +
 src/security_scanner/runtime/scan_worker.py   |  49 +++
 src/security_scanner/runtime/verify_queue.py  | 310 ++++++++++++++++++
 tests/test_scan_worker.py                     | 182 ++++++++++
 tests/test_secret_hash_salt_provenance.py     |  81 +++++
 tests/test_verify_queue.py                    | 301 +++++++++++++++++
 6 files changed, 936 insertions(+)
 create mode 100644 src/security_scanner/runtime/verify_queue.py
 create mode 100644 tests/test_secret_hash_salt_provenance.py
 create mode 100644 tests/test_verify_queue.py

diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
index 408f2f3..d6ec5cb 100644
--- a/docs/workbench/specs/ghas-quality-secrets/design.md
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -136,6 +136,12 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
   안 함이되 **재verify 폭주 방지**(minor `needs-review-no-write`): 동일 finding_id 최근 verify
   타임스탬프 기록→backoff, 또는 `disposition_lookup` line-stable gate가 unreviewed도 skip-key로
   쓰는지 M3에서 택1 명시(비용 NFR 정합).
+  - **M3 택1(확정)**: 비동기 verify 잡 enqueue 시 **finding의 content-stable `match_key`에서 결정적
+    verify-job-id 도출 + 기존 `enqueue_commit_scan_job`의 멱등 CAS(`attribute_not_exists(PK)`)**로
+    재큐잉을 막는다(`runtime/verify_queue.py`). 같은 애매 finding은 항상 같은 verify-job-id로 매핑되어
+    in-flight 잡이 있으면 재enqueue가 clean no-op(False)이다. 드레인은 NEEDS_REVIEW 잡을 **무기록으로
+    완료(consume)** 하므로 한 finding당 사이클당 최대 1개의 in-flight verify 잡만 존재 → 폭주 없음.
+    **새 GSI/projection/attribute 없이** 기존 `job_type`(free-form 문자열) 확장(`"verify"`)으로만 표현.
 - snapshot 부재/stale: 나이·타임스탬프 노출 + `stale-degraded` 상태. 목표 미설정이면 report-only.
 - 실 GHAS fetch 필요: autopilot 정지 → `ghas-live-fetch-or-mutation-required` stop-condition → 사람 PR.
 - `secretHash` egress(minor `secrethash-entropy-leak`): LLM 티어로 나가는 유일한 secret-파생 값.
@@ -222,6 +228,13 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
 - path-role 분류기 공통 추출(post-M2 arch gate nit): 현재 `scanners/gitleaks/context_filter`와
   `llm/common/prompt._path_role`이 어휘를 의도적 복제(테스트로 등가 강제, scanners→llm import 없음).
   셋째 호출자(M3 disposition 경로 등)가 생기면 `core/path_role.py` 추출 재평가 — 지금은 scope-expansion이라 비채택.
+- **비동기 verify drain 실제 store 구현(post-M3 arch gate D3, 후속 storage-scoped)**: M3는 enqueue
+  경로(기존 `enqueue_commit_scan_job` + `job_type="verify"`, 새 스키마 없음)와 drain seam(`verify_queue.
+  drain_verify_jobs`, fake store로 증명)을 default-off로 배선했다. 비동기 LLM 티어를 실제로 켜려면 store에
+  `lease_next_verify_job`/`finding_for_verify_job`/`complete_verify_job`을 **기존 SCAN_JOB layout+status
+  axis 위에 새 GSI/projection 없이** 구현하고 drain 진입점(CLI/daemon)을 배선해야 한다. 코드-스캔 worker가
+  verify job을 오인 처리하지 않도록 하는 가드는 M3에서 **이미 구현**됨(`run_scan_worker_once`가 leased
+  `job_type=="verify"`를 pending 반환). 원격 ollama 배선 시 salt 강도/secretHash egress 재검토 포함.
 
 ## YAGNI
 
diff --git a/src/security_scanner/runtime/scan_worker.py b/src/security_scanner/runtime/scan_worker.py
index 979bcd4..84e219e 100644
--- a/src/security_scanner/runtime/scan_worker.py
+++ b/src/security_scanner/runtime/scan_worker.py
@@ -16,6 +16,7 @@
     branch_from_ref,
     finding_with_context,
 )
+from security_scanner.runtime.verify_queue import JOB_TYPE_VERIFY
 from security_scanner.scanners.gitleaks.scanner import GitleaksScanner
 from security_scanner.storage.base import (
     IncrementalScanStore,
@@ -26,6 +27,14 @@
 DEFAULT_LEASE_SECONDS = 300
 DEFAULT_RETRY_DELAY_SECONDS = 60
 
+# Optional async-verify enqueue hook (M3 path 2). Called best-effort in the hot
+# path after a successful completion with
+# ``(store, findings, origin_job=..., now=...)``; it enqueues a
+# ``job_type="verify"`` job per ambiguous finding (no LLM call here). Defaults to
+# None so the worker's pre-M3 behavior is byte-identical when the async tier is
+# not wired (offline box / verifier disabled).
+VerifyEnqueue = Callable[..., object]
+
 
 class CommitScanner(Protocol):
     """Scanner capability needed by scan-worker."""
@@ -56,6 +65,8 @@ class ScanWorkerRequest:
     now_factory: Callable[[], dt.datetime] = lambda: dt.datetime.now(dt.UTC).replace(
         microsecond=0
     )
+    # Async LLM-verify enqueue hook (M3 path 2). None keeps pre-M3 behavior.
+    verify_enqueue: VerifyEnqueue | None = None
 
 
 @dataclass(frozen=True)
@@ -93,6 +104,18 @@ def run_scan_worker_once(request: ScanWorkerRequest) -> ScanWorkerSummary:
 
         leased_count += 1
 
+        # M3 guard: a job_type="verify" job belongs to the async-verify drain
+        # path, NOT this code-scan worker. If the queue ever hands one here
+        # (e.g. real scan work is drained and only verify jobs remain), it must
+        # never reach fetch_repo / scanner.scan / _advance_repo_health — its
+        # "commit" is a synthetic per-finding marker and it advances no repo
+        # freshness. Return it to pending so the dedicated drain path leases it.
+        if job.job_type == JOB_TYPE_VERIFY:
+            request.store.return_job_to_pending(
+                job.job_id, "verify job is not handled by the code-scan worker"
+            )
+            continue
+
         if request.store.has_scan_ledger(job.ledger_key):
             request.store.complete_processed_job(
                 job,
@@ -144,6 +167,11 @@ def run_scan_worker_once(request: ScanWorkerRequest) -> ScanWorkerSummary:
                 ),
             )
             _advance_repo_health(request, job, completed_at=scanned_at)
+            # M3 path 2 (async LLM tier): hand the completed findings to the
+            # verify-enqueue seam so ambiguous findings become separate
+            # job_type="verify" jobs drained off the hot path. NO LLM call here.
+            # Best-effort: an enqueue failure must not roll back a completed scan.
+            _enqueue_verify_jobs(request, job, findings, now=scanned_at)
             completed += 1
         except Exception as exc:  # noqa: BLE001 - scanner/runtime failure is retryable until exhausted.
             if job.attempts + 1 >= job.max_attempts:
@@ -251,6 +279,27 @@ def _advance_repo_health(
     advance(job.repo_id, job_type=job.job_type, completed_at=completed_at)
 
 
+def _enqueue_verify_jobs(
+    request: ScanWorkerRequest,
+    job: ScanJob,
+    findings: list[Finding],
+    *,
+    now: dt.datetime,
+) -> None:
+    """Hand completed findings to the async verify-enqueue seam (M3 path 2).
+
+    Best-effort: the scan already completed, so an enqueue failure must never
+    fail the job or trigger a retry. No-op when no hook is wired (pre-M3
+    behavior) so the worker default path is unchanged.
+    """
+    if request.verify_enqueue is None:
+        return
+    try:
+        request.verify_enqueue(request.store, findings, origin_job=job, now=now)
+    except Exception:  # noqa: BLE001 - async enqueue is downstream of a done scan.
+        return
+
+
 def _scan_run_id_for_job(job: ScanJob) -> str:
     return f"scan_run_{job.job_id}"
 
diff --git a/src/security_scanner/runtime/verify_queue.py b/src/security_scanner/runtime/verify_queue.py
new file mode 100644
index 0000000..8731bc0
--- /dev/null
+++ b/src/security_scanner/runtime/verify_queue.py
@@ -0,0 +1,310 @@
+"""Async LLM-verify queue seam (M3, the second of scan_worker's two paths).
+
+The scan-worker per-job hot path must stay cheap and network-free: it scans a
+commit, then for any *ambiguous* finding it ENQUEUES a verify job instead of
+calling the LLM inline. A SEPARATE drain path leases those verify jobs and runs
+the (gated, possibly off-box) verifier, writing the terminal disposition.
+
+Why this rides the existing queue with NO new schema
+----------------------------------------------------
+``ScanJob.job_type`` is a free-form string persisted verbatim in the ``jobType``
+attribute and decoded with a default of ``incremental`` (see
+``items.scan_job_from_item``). A new ``job_type="verify"`` therefore round-trips
+through the *unchanged* item shape — same table, same partitions, same
+``enqueue_commit_scan_job`` CAS — without a new table, GSI, projection, or
+attribute. The verify queue is logically distinct (a different ``job_type``
+value), not a physically distinct store.
+
+NEEDS_REVIEW re-verify-flood backoff (design Error Handling, 택1)
+----------------------------------------------------------------
+A finding that the verifier returns ``NEEDS_REVIEW`` for is NOT written
+(``record_verifier_disposition`` returns False), so its FINDING_STATE row stays
+OPEN and would be re-picked on the next scan. To stop an unbounded re-verify
+flood we make the verify-job id **deterministic from the finding's content-stable
+``match_key``** and rely on the store's idempotent ``enqueue_commit_scan_job``
+CAS (``attribute_not_exists(PK)``): re-enqueuing the same ambiguous finding while
+a prior verify job for it still exists is a clean no-op (returns False). The
+drain COMPLETES a NEEDS_REVIEW job (consumes the work item) rather than looping
+it, so one ambiguous finding triggers at most one in-flight verify job per cycle.
+This is the chosen option of design's two; no new GSI/attribute is introduced.
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+import hashlib
+from collections.abc import Callable, Sequence
+from dataclasses import dataclass
+from typing import Any, Protocol
+
+from security_scanner.core.finding.model import Finding
+from security_scanner.llm.common.verifier import VerifierConfig
+from security_scanner.runtime import verify_artifact as verifier_runtime
+from security_scanner.runtime.disposition_lookup import resolve_existing_disposition
+from security_scanner.storage.base import ScanJob
+
+# Verify jobs ride the same queue but as their own free-form job_type value. The
+# string is intentionally NOT added to storage.base (incremental/baseline are the
+# only freshness-bearing classes); a verify completion advances NO freshness
+# field, so it must never reach the repo-health advance path. The code-scan
+# worker enforces this by returning any leased job_type=="verify" job to pending
+# before fetch/scan/_advance_repo_health (scan_worker.run_scan_worker_once); the
+# dedicated drain path (drain_verify_jobs) is the only consumer of verify jobs.
+JOB_TYPE_VERIFY = "verify"
+
+# Verify jobs are the LOWEST queue precedence: the queue sorts ascending on
+# ``priority`` (lower served first), incremental uses 100 and baseline 900, so a
+# value above both keeps every code-scan job served before any verify job — the
+# async tier never starves change detection.
+VERIFY_JOB_PRIORITY = 950
+
+DEFAULT_MAX_ATTEMPTS = 3
+# A verify job carries no real commit; its "commit" slot is a stable per-finding
+# marker so the derived job_id is deterministic and re-enqueue is idempotent.
+_VERIFY_COMMIT_PREFIX = "verify"
+
+
+class _EnqueueStore(Protocol):
+    def enqueue_commit_scan_job(self, job: ScanJob) -> bool:
+        """Create a job, returning False for clean idempotent skips."""
+
+    def read_finding_state(self, finding_id: str) -> dict[str, Any] | None:
+        """Return the global FINDING_STATE row for a finding, or None."""
+
+    def find_disposition_by_match_key(
+        self, match_key: str
+    ) -> dict[str, Any] | None:
+        """Return the match_key -> disposition pointer, or None."""
+
+
+class _DrainStore(Protocol):
+    def lease_next_verify_job(
+        self, worker_id: str, lease_seconds: int, now: dt.datetime
+    ) -> str | None:
+        """Lease the next pending verify job, returning its job_id or None."""
+
+    def finding_for_verify_job(self, job_id: str) -> Finding:
+        """Return the finding a verify job should verify."""
+
+    def set_finding_disposition(self, finding_id: str, **kwargs: Any) -> None:
+        """Persist a terminal disposition transition."""
+
+    def complete_verify_job(self, job_id: str) -> None:
+        """Mark a verify job consumed so it is not re-leased."""
+
+
+class _Verifier(Protocol):
+    def verify(self, finding: Finding):  # -> VerifierResult
+        """Return a verifier result for one finding."""
+
+
+@dataclass(frozen=True)
+class VerifyEnqueueSummary:
+    """Outcome of one verify-enqueue pass over a batch of findings."""
+
+    enqueued: int = 0
+    duplicates_skipped: int = 0
+    suppressed: int = 0
+    no_match_key: int = 0
+    enqueue_errors: int = 0
+
+
+@dataclass(frozen=True)
+class VerifyDrainSummary:
+    """Outcome of one verify-queue drain pass."""
+
+    attempted: int = 0
+    dispositions_written: int = 0
+    needs_review: int = 0
+    failed: int = 0
+
+
+def verify_job_id_for_finding(finding: Finding) -> str:
+    """Return the deterministic, content-stable verify job id for a finding.
+
+    Derived from the finding's ``match_key`` (repo/file/rule + salted secret
+    hash), so the same ambiguous secret always maps to the same verify job id and
+    the enqueue CAS dedups re-enqueues (the NEEDS_REVIEW backoff). Returns an
+    empty string when the finding has no secret_hash (no stable key available).
+    """
+    secret_hash = finding.evidence.secret_hash
+    if not secret_hash:
+        return ""
+    material = "\0".join(
+        [
+            finding.repo.full_name,
+            finding.location.file_path,
+            finding.rule_id,
+            secret_hash,
+        ]
+    )
+    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:24]
+    return f"verify_job_{digest}"
+
+
+def _verify_job_for_finding(
+    finding: Finding, *, origin_job: ScanJob, now: dt.datetime
+) -> ScanJob | None:
+    job_id = verify_job_id_for_finding(finding)
+    if not job_id:
+        return None
+    commit_marker = f"{_VERIFY_COMMIT_PREFIX}:{finding.finding_id}"
+    return ScanJob(
+        job_id=job_id,
+        repo_id=origin_job.repo_id,
+        repo_url=origin_job.repo_url,
+        ref_name=origin_job.ref_name,
+        old_sha=None,
+        new_sha=commit_marker,
+        commit_sha=commit_marker,
+        commit_range=None,
+        scanner_name=origin_job.scanner_name,
+        scanner_version=origin_job.scanner_version,
+        rule_pack_version=origin_job.rule_pack_version,
+        scanner_config_hash=origin_job.scanner_config_hash,
+        priority=VERIFY_JOB_PRIORITY,
+        status="pending",
+        job_type=JOB_TYPE_VERIFY,
+        attempts=0,
+        max_attempts=DEFAULT_MAX_ATTEMPTS,
+        worker_id=None,
+        lease_until=None,
+        next_attempt_at=now,
+        created_at=now,
+        updated_at=now,
+    )
+
+
+def enqueue_verify_jobs_for_findings(
+    store: _EnqueueStore,
+    findings: Sequence[Finding],
+    *,
+    origin_job: ScanJob,
+    now: dt.datetime,
+) -> VerifyEnqueueSummary:
+    """Enqueue one idempotent verify job per *ambiguous* finding (no LLM call).
+
+    A finding is ambiguous when it has no existing terminal non-blocking
+    disposition (the line-stable suppression gate is checked first, fail-safe:
+    a lookup error falls through to enqueue rather than silently dropping). Each
+    enqueue is idempotent on the content-stable verify job id, so re-running this
+    for a still-NEEDS_REVIEW finding does not flood the queue.
+    """
+    enqueued = 0
+    duplicates = 0
+    suppressed = 0
+    no_match_key = 0
+    enqueue_errors = 0
+
+    for finding in findings:
+        try:
+            existing = resolve_existing_disposition(store, finding)
+        except Exception:  # noqa: BLE001 - fail-safe: enqueue on lookup error.
+            existing = None
+        if existing is not None:
+            suppressed += 1
+            continue
+
+        job = _verify_job_for_finding(finding, origin_job=origin_job, now=now)
+        if job is None:
+            no_match_key += 1
+            continue
+
+        try:
+            created = store.enqueue_commit_scan_job(job)
+        except Exception:  # noqa: BLE001 - best-effort async enqueue.
+            # A real enqueue failure (serialization/transport) is NOT a CAS
+            # duplicate; keep the two distinct so the summary stays honest.
+            enqueue_errors += 1
+            continue
+        if created:
+            enqueued += 1
+        else:
+            # CAS no-op: a verify job for this content-stable id already exists
+            # (idempotent re-enqueue) — the flood-guard working as intended.
+            duplicates += 1
+
+    return VerifyEnqueueSummary(
+        enqueued=enqueued,
+        duplicates_skipped=duplicates,
+        suppressed=suppressed,
+        no_match_key=no_match_key,
+        enqueue_errors=enqueue_errors,
+    )
+
+
+def drain_verify_jobs(
+    store: _DrainStore,
+    *,
+    verifier: _Verifier,
+    config: VerifierConfig,
+    max_jobs: int,
+    now: dt.datetime,
+    worker_id: str = "verify-worker",
+    lease_seconds: int = 300,
+) -> VerifyDrainSummary:
+    """Drain up to ``max_jobs`` verify jobs, writing terminal dispositions.
+
+    This is the SEPARATE path (off the commit-scan hot loop) where the gated LLM
+    verifier actually runs. A terminal verdict is written via
+    ``record_verifier_disposition``; a NEEDS_REVIEW verdict writes NOTHING (the
+    row stays OPEN) but the job is still COMPLETED so it is not re-leased forever
+    (the backoff: the work item is consumed, not looped).
+    """
+    attempted = 0
+    written = 0
+    needs_review = 0
+    failed = 0
+
+    for _ in range(max(max_jobs, 0)):
+        job_id = store.lease_next_verify_job(
+            worker_id=worker_id, lease_seconds=lease_seconds, now=now
+        )
+        if job_id is None:
+            break
+
+        attempted += 1
+        try:
+            finding = store.finding_for_verify_job(job_id)
+            result = verifier.verify(finding)
+            verified = verifier_runtime.apply_verifier_result(
+                finding, result, verifier_name=config.model
+            )
+            wrote = verifier_runtime.record_verifier_disposition(
+                store,
+                original=finding,
+                verified=verified,
+                actor=config.model,
+            )
+        except Exception:  # noqa: BLE001 - keep the drain resilient per-job.
+            failed += 1
+            _safe_complete(store, job_id)
+            continue
+
+        if wrote:
+            written += 1
+        else:
+            # NEEDS_REVIEW (or non-terminal): no disposition written, but the
+            # work item is consumed so the same finding is not re-verified in a
+            # tight loop.
+            needs_review += 1
+        _safe_complete(store, job_id)
+
+    return VerifyDrainSummary(
+        attempted=attempted,
+        dispositions_written=written,
+        needs_review=needs_review,
+        failed=failed,
+    )
+
+
+def _safe_complete(store: _DrainStore, job_id: str) -> None:
+    complete: Callable[[str], None] | None = getattr(
+        store, "complete_verify_job", None
+    )
+    if complete is None:
+        return
+    try:
+        complete(job_id)
+    except Exception:  # noqa: BLE001 - completion is best-effort.
+        pass
diff --git a/tests/test_scan_worker.py b/tests/test_scan_worker.py
index 0677c84..4196311 100644
--- a/tests/test_scan_worker.py
+++ b/tests/test_scan_worker.py
@@ -623,3 +623,185 @@ def test_n_workers_each_take_distinct_repos_without_collision():
     assert store.pending_returns == []  # no contention, nothing bounced
     # every held repo lease was released after each scan (no leak).
     assert store._held == {}
+
+
+# --- M3 / two-path disposition wiring -------------------------------------- #
+#
+# Path 1 (SYNCHRONOUS inline cheap tier): the gitleaks parser applies the M2
+# path-role / context-class suppression at scan time, so the worker's scanner
+# already returns the SUPPRESSED finding set — the worker shares the inline tier
+# for free, without a second filter call. We prove the worker neither re-filters
+# nor re-expands what the scanner handed it.
+#
+# Path 2 (ASYNC LLM tier): the worker must NOT call the LLM verifier in the
+# per-job hot path. Instead it hands the completed findings to an injected
+# verify-enqueue hook (the async queue seam) so an ambiguous finding becomes a
+# separate ``job_type="verify"`` job drained off the hot path.
+
+
+def test_worker_does_not_re_filter_scanner_findings_inline_tier_is_upstream():
+    # The synchronous inline cheap tier lives in the scanner (M2 parser), so the
+    # worker passes the scanner's already-suppressed findings through unchanged:
+    # whatever the scanner returns is exactly what is completed. The worker adds
+    # no second filter pass and drops nothing the scanner kept.
+    kept = _finding(commit=None)
+    store = FakeWorkerStore([_job()])
+    scanner = FakeScanner(findings=[kept])
+
+    run_scan_worker_once(_request(store, scanner))
+
+    _, findings, _ = store.completed[0]
+    assert [f.finding_id for f in findings] == [kept.finding_id]
+
+
+def test_worker_enqueues_verify_jobs_for_findings_without_calling_an_llm():
+    # The async LLM tier seam: the worker hands the completed findings to the
+    # injected verify-enqueue hook (no synchronous LLM call in the hot path).
+    finding = _finding(commit=None)
+    store = FakeWorkerStore([_job()])
+    scanner = FakeScanner(findings=[finding])
+    enqueued_batches: list[list[str]] = []
+
+    def verify_enqueue(store_arg, findings, *, origin_job, now):
+        enqueued_batches.append([f.finding_id for f in findings])
+
+    request = ScanWorkerRequest(
+        store=store,
+        fetch_repo=lambda url: Path("/synthetic-cache/example-repo"),
+        scanner=scanner,
+        max_jobs=1,
+        lease_seconds=60,
+        worker_id="worker-a",
+        now_factory=lambda: NOW,
+        verify_enqueue=verify_enqueue,
+    )
+
+    summary = run_scan_worker_once(request)
+
+    assert summary.completed == 1
+    # the completed findings were handed to the async verify-enqueue seam.
+    assert enqueued_batches == [[finding.finding_id]]
+
+
+def test_worker_without_verify_enqueue_hook_behaves_exactly_as_before():
+    # Default behavior is unchanged: with no verify_enqueue hook the worker scans
+    # and completes exactly as it did pre-M3 (no new required dependency).
+    finding = _finding(commit=None)
+    store = FakeWorkerStore([_job()])
+    scanner = FakeScanner(findings=[finding])
+
+    summary = run_scan_worker_once(_request(store, scanner))
+
+    assert summary.completed == 1
+    _, findings, _ = store.completed[0]
+    assert [f.finding_id for f in findings] == [finding.finding_id]
+
+
+def test_verify_enqueue_failure_does_not_fail_the_scan_completion():
+    # The async enqueue is best-effort: a verify-enqueue error must not roll back
+    # an already-completed scan (the scan succeeded; verification is downstream).
+    finding = _finding(commit=None)
+    store = FakeWorkerStore([_job()])
+    scanner = FakeScanner(findings=[finding])
+
+    def boom(store_arg, findings, *, origin_job, now):
+        raise RuntimeError("synthetic enqueue failure")
+
+    request = ScanWorkerRequest(
+        store=store,
+        fetch_repo=lambda url: Path("/synthetic-cache/example-repo"),
+        scanner=scanner,
+        max_jobs=1,
+        lease_seconds=60,
+        worker_id="worker-a",
+        now_factory=lambda: NOW,
+        verify_enqueue=boom,
+    )
+
+    summary = run_scan_worker_once(request)
+
+    # the scan still completed; the enqueue failure is swallowed.
+    assert summary.completed == 1
+    assert summary.retryable == 0
+    assert summary.dead_lettered == 0
+
+
+def test_worker_does_not_enqueue_verify_when_no_findings():
+    # No findings -> nothing ambiguous -> the verify-enqueue hook is still called
+    # with an empty batch (the seam decides), but with zero findings it must not
+    # invent work. We assert the hook saw an empty list.
+    store = FakeWorkerStore([_job()])
+    scanner = FakeScanner(findings=[])
+    batches: list[list[str]] = []
+
+    def verify_enqueue(store_arg, findings, *, origin_job, now):
+        batches.append([f.finding_id for f in findings])
+
+    request = ScanWorkerRequest(
+        store=store,
+        fetch_repo=lambda url: Path("/synthetic-cache/example-repo"),
+        scanner=scanner,
+        max_jobs=1,
+        lease_seconds=60,
+        worker_id="worker-a",
+        now_factory=lambda: NOW,
+        verify_enqueue=verify_enqueue,
+    )
+
+    run_scan_worker_once(request)
+
+    assert batches == [[]]
+
+
+def _verify_job() -> ScanJob:
+    # A job_type="verify" job that should NEVER be processed by the code-scan
+    # worker (it belongs to the async-verify drain path).
+    job = _job()
+    return ScanJob(**{**job.__dict__, "job_id": "verify_job_synthetic", "job_type": "verify"})
+
+
+def test_worker_returns_verify_job_to_pending_without_scanning():
+    # D3 guard (post-M3 arch gate): if the shared queue ever hands a verify job
+    # to this code-scan worker, it must be returned to pending — never fetched,
+    # scanned, or counted as a freshness-advancing completion.
+    store = FakeWorkerStore([_verify_job()])
+    scanner = FakeScanner(findings=[_finding()])
+    fetched: list[str] = []
+
+    summary = run_scan_worker_once(
+        _request(store, scanner, fetch_repo=lambda url: fetched.append(url) or Path("/x"))
+    )
+
+    assert summary.leased == 1
+    assert summary.completed == 0
+    assert scanner.calls == []  # never scanned
+    assert fetched == []  # never fetched the synthetic verify marker
+    assert store.health_advances == []  # no freshness pollution
+    assert store.pending_returns == [
+        ("verify_job_synthetic", "verify job is not handled by the code-scan worker")
+    ]
+
+
+def test_worker_still_processes_normal_job_after_skipping_verify_job():
+    # The verify-job guard must not stall the pool: a normal job leased next is
+    # processed as usual (skip-and-continue, like the repo-lease skip-bug fix).
+    store = FakeWorkerStore([_verify_job(), _job()])
+    scanner = FakeScanner(findings=[_finding()])
+
+    request = ScanWorkerRequest(
+        store=store,
+        fetch_repo=lambda url: Path("/synthetic-cache/example-repo"),
+        scanner=scanner,
+        max_jobs=2,
+        lease_seconds=60,
+        worker_id="worker-a",
+        now_factory=lambda: NOW,
+    )
+    summary = run_scan_worker_once(request)
+
+    assert summary.leased == 2
+    assert summary.completed == 1  # the normal job completed
+    assert len(scanner.calls) == 1
+    assert store.pending_returns == [
+        ("verify_job_synthetic", "verify job is not handled by the code-scan worker")
+    ]
diff --git a/tests/test_secret_hash_salt_provenance.py b/tests/test_secret_hash_salt_provenance.py
new file mode 100644
index 0000000..1e78ccf
--- /dev/null
+++ b/tests/test_secret_hash_salt_provenance.py
@@ -0,0 +1,81 @@
+"""Salt provenance tests for the secretHash that egresses to the LLM tier (M3).
+
+``secretHash`` is the ONLY secret-derived value the async LLM verify tier sends
+off-box (design Error Handling ``secrethash-entropy-leak``). Its anti-correlation
+strength rests entirely on a per-deployment salt
+(``SECURITY_SCANNER_HASH_SALT``). These tests pin that contract:
+
+  * the ``_DEFAULT_SALT`` fallback is a DEV-ONLY placeholder, not a per-deploy
+    secret — its presence is detectable so a deployment can fail closed;
+  * injecting a real per-deployment salt changes the digest, so two deployments
+    with distinct salts cannot rainbow-correlate the same secret's hash;
+  * an empty/unset env var must NEVER silently weaken hashing to no-salt.
+
+We do NOT modify ``model.py`` here; we prove the provenance strength the M3
+egress depends on, so a regression that weakens the salt is caught.
+"""
+
+from __future__ import annotations
+
+from security_scanner.core.finding.model import _DEFAULT_SALT, hash_secret
+
+RAW = "synthetic-secret-value-for-salt-provenance"
+
+
+def test_default_salt_is_a_dev_only_placeholder():
+    # The fallback salt is documented dev-only; a real deployment overrides it.
+    # It must be a recognizable constant (not random), so a deployment can detect
+    # "still on the dev salt" and fail closed before egressing hashes off-box.
+    assert _DEFAULT_SALT == "security-scanner-dev-salt-v1"
+    assert "dev" in _DEFAULT_SALT
+
+
+def test_explicit_salt_changes_the_digest():
+    # A per-deployment salt yields a different digest than the dev default for
+    # the same secret -> two deployments cannot correlate the same secret's hash.
+    default_hash = hash_secret(RAW)
+    deploy_a = hash_secret(RAW, salt="deployment-A-strong-random-salt")
+    deploy_b = hash_secret(RAW, salt="deployment-B-strong-random-salt")
+
+    assert deploy_a != default_hash
+    assert deploy_b != default_hash
+    assert deploy_a != deploy_b
+
+
+def test_env_salt_is_honored(monkeypatch):
+    # The documented env transport (SECURITY_SCANNER_HASH_SALT) changes the hash,
+    # so a per-deployment salt set via env actually reaches the digest.
+    monkeypatch.setenv("SECURITY_SCANNER_HASH_SALT", "env-injected-deploy-salt")
+    env_hash = hash_secret(RAW)
+
+    monkeypatch.delenv("SECURITY_SCANNER_HASH_SALT", raising=False)
+    default_hash = hash_secret(RAW)
+
+    assert env_hash != default_hash
+
+
+def test_empty_env_salt_does_not_silently_drop_the_salt(monkeypatch):
+    # A set-but-empty env var must NOT bypass the salt (silent weakening). With an
+    # empty env var the digest falls back to the dev default, never to no salt.
+    monkeypatch.setenv("SECURITY_SCANNER_HASH_SALT", "")
+    empty_env_hash = hash_secret(RAW)
+
+    monkeypatch.delenv("SECURITY_SCANNER_HASH_SALT", raising=False)
+    default_hash = hash_secret(RAW)
+
+    # Empty env -> same as the dev-default fallback (salt still applied), and
+    # NOT equal to an unsalted digest.
+    assert empty_env_hash == default_hash
+    import hashlib
+
+    unsalted = "salted-sha256:" + hashlib.sha256(RAW.encode("utf-8")).hexdigest()
+    assert empty_env_hash != unsalted
+
+
+def test_hash_format_is_stable_and_prefixed():
+    digest = hash_secret(RAW, salt="any-salt")
+    assert digest.startswith("salted-sha256:")
+    # 64 lowercase hex chars after the prefix (SHA-256).
+    hexpart = digest.split(":", 1)[1]
+    assert len(hexpart) == 64
+    assert all(c in "0123456789abcdef" for c in hexpart)
diff --git a/tests/test_verify_queue.py b/tests/test_verify_queue.py
new file mode 100644
index 0000000..015edd1
--- /dev/null
+++ b/tests/test_verify_queue.py
@@ -0,0 +1,301 @@
+"""Tests for the async LLM-verify queue seam (M3).
+
+These cover the second of scan_worker's two M3 paths: the ASYNC LLM tier. The
+worker's per-job hot path must NOT call the LLM synchronously; instead it
+enqueues a ``ScanJob(job_type="verify")`` per ambiguous finding (cheap, no
+network), and a SEPARATE drain path leases those verify jobs and writes the
+terminal disposition. The verify-job id is derived from the finding's
+content-stable ``match_key`` so re-enqueuing the same ambiguous finding is an
+idempotent no-op (NEEDS_REVIEW re-verify-flood backoff).
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+
+from security_scanner.core.finding.model import Finding, Verdict
+from security_scanner.llm.common.verifier import VerifierConfig, VerifierResult
+from security_scanner.runtime.verify_queue import (
+    JOB_TYPE_VERIFY,
+    VERIFY_JOB_PRIORITY,
+    drain_verify_jobs,
+    enqueue_verify_jobs_for_findings,
+    verify_job_id_for_finding,
+)
+from security_scanner.storage.base import ScanJob
+
+NOW = dt.datetime(2026, 6, 21, 12, 0, tzinfo=dt.UTC)
+REPO_ID = "repo_synthetic000000000001"
+REPO_URL = "https://github.com/example-org/example-repo"
+FAKE_SECRET = "synthetic-value-for-hash"
+
+
+def _finding(line_start: int = 10, raw_secret: str = FAKE_SECRET) -> Finding:
+    return Finding.create(
+        repo_full_name=REPO_ID,
+        rule_id="generic-api-key",
+        file_path="src/config.py",
+        line_start=line_start,
+        raw_secret=raw_secret,
+        source_tool="gitleaks",
+        scan_run_id="scan_run_synthetic",
+        rule_pack_version="secret-rules-0.1.0",
+    )
+
+
+def _job_template() -> ScanJob:
+    return ScanJob(
+        job_id="scan_job_origin",
+        repo_id=REPO_ID,
+        repo_url=REPO_URL,
+        ref_name="refs/remotes/origin/main",
+        old_sha="0" * 40,
+        new_sha="a" * 40,
+        commit_sha="a" * 40,
+        commit_range=None,
+        scanner_name="gitleaks",
+        scanner_version="unknown",
+        rule_pack_version="secret-rules-0.1.0",
+        scanner_config_hash="default",
+        priority=100,
+        status="pending",
+        attempts=0,
+        max_attempts=3,
+        worker_id=None,
+        lease_until=None,
+        next_attempt_at=NOW,
+        created_at=NOW,
+        updated_at=NOW,
+    )
+
+
+class FakeEnqueueStore:
+    """Records enqueued jobs and enforces idempotent job_id dedup (the CAS)."""
+
+    def __init__(self) -> None:
+        self.enqueued: list[ScanJob] = []
+        self._ids: set[str] = set()
+        # finding_id -> existing disposition row, defaulting to a scan-created
+        # OPEN row so resolve_existing_disposition does NOT suppress.
+        self.states: dict[str, dict] = {}
+        self.match_pointers: dict[str, dict] = {}
+
+    def enqueue_commit_scan_job(self, job: ScanJob) -> bool:
+        # Mirror the store: deterministic job_id + attribute_not_exists CAS, so a
+        # duplicate enqueue is a clean idempotent skip (returns False).
+        if job.job_id in self._ids:
+            return False
+        self._ids.add(job.job_id)
+        self.enqueued.append(job)
+        return True
+
+    def read_finding_state(self, finding_id: str):
+        return self.states.get(finding_id)
+
+    def find_disposition_by_match_key(self, match_key: str):
+        return self.match_pointers.get(match_key)
+
+
+# --------------------------------------------------------------------------- #
+# enqueue side (worker hot path, no LLM)                                       #
+# --------------------------------------------------------------------------- #
+
+
+def test_ambiguous_finding_enqueues_a_verify_job_without_calling_llm():
+    store = FakeEnqueueStore()
+    finding = _finding()
+
+    summary = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+
+    assert summary.enqueued == 1
+    assert len(store.enqueued) == 1
+    job = store.enqueued[0]
+    assert job.job_type == JOB_TYPE_VERIFY
+    assert job.priority == VERIFY_JOB_PRIORITY
+    assert job.repo_id == REPO_ID
+    # The verify job id is derived from the finding's content-stable match key.
+    assert job.job_id == verify_job_id_for_finding(finding)
+
+
+def test_reenqueueing_same_finding_is_idempotent_no_flood():
+    # NEEDS_REVIEW backoff: the deterministic, match_key-derived verify job id +
+    # the store's enqueue CAS make re-enqueuing the same ambiguous finding a
+    # clean no-op, so a finding that stays NEEDS_REVIEW is not re-queued forever.
+    store = FakeEnqueueStore()
+    finding = _finding()
+
+    first = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+    second = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+
+    assert first.enqueued == 1
+    assert second.enqueued == 0
+    assert second.duplicates_skipped == 1
+    assert len(store.enqueued) == 1  # only one verify job ever created
+
+
+def test_finding_with_existing_terminal_disposition_is_not_enqueued():
+    # A finding already dispositioned FALSE_POSITIVE (non-blocking) must be
+    # skipped: the line-stable suppression gate runs before enqueue so we never
+    # re-verify a settled finding (cost NFR).
+    store = FakeEnqueueStore()
+    finding = _finding()
+    store.states[finding.finding_id] = {"status": "FALSE_POSITIVE"}
+
+    summary = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+
+    assert summary.enqueued == 0
+    assert summary.suppressed == 1
+    assert store.enqueued == []
+
+
+def test_finding_without_secret_hash_is_skipped_not_enqueued():
+    # No secret_hash -> no stable match key -> cannot form an idempotent verify
+    # job id. Skip rather than enqueue an unstable/duplicating job.
+    store = FakeEnqueueStore()
+    finding = _finding()
+    finding.evidence.secret_hash = None
+
+    summary = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+
+    assert summary.enqueued == 0
+    assert store.enqueued == []
+
+
+def test_real_enqueue_failure_counts_as_error_not_duplicate():
+    # post-M3 arch gate D1 nit: a genuine enqueue failure (serialization /
+    # transport) is NOT a CAS idempotency no-op. It must be counted separately
+    # so the summary's duplicates_skipped stays an honest flood-guard signal.
+    class RaisingEnqueueStore(FakeEnqueueStore):
+        def enqueue_commit_scan_job(self, job: ScanJob) -> bool:
+            raise RuntimeError("synthetic transport failure")
+
+    store = RaisingEnqueueStore()
+    finding = _finding()
+
+    summary = enqueue_verify_jobs_for_findings(
+        store, [finding], origin_job=_job_template(), now=NOW
+    )
+
+    assert summary.enqueued == 0
+    assert summary.enqueue_errors == 1
+    assert summary.duplicates_skipped == 0
+
+
+# --------------------------------------------------------------------------- #
+# drain side (separate path, LLM here, writes disposition)                     #
+# --------------------------------------------------------------------------- #
+
+
+class FakeVerifier:
+    def __init__(self, config, verdicts: dict[str, str]) -> None:
+        self.config = config
+        self._verdicts = verdicts
+        self.calls: list[str] = []
+
+    def verify(self, finding: Finding) -> VerifierResult:
+        self.calls.append(finding.finding_id)
+        verdict = self._verdicts[finding.finding_id]
+        return VerifierResult(
+            verdict=verdict,
+            confidence=0.95,
+            reason=f"Synthetic; do not echo {FAKE_SECRET}.",
+            raw_label=verdict.lower(),
+        )
+
+
+class FakeDrainStore:
+    """A queue+disposition store: leases verify jobs and records dispositions."""
+
+    def __init__(self, findings_by_job: dict[str, Finding]) -> None:
+        self._findings_by_job = findings_by_job
+        self._pending = list(findings_by_job.keys())
+        self.dispositions: list[dict] = []
+        self.completed: list[str] = []
+        self.lease_calls = 0
+
+    def lease_next_verify_job(self, worker_id, lease_seconds, now):
+        self.lease_calls += 1
+        if not self._pending:
+            return None
+        return self._pending.pop(0)
+
+    def finding_for_verify_job(self, job_id: str) -> Finding:
+        return self._findings_by_job[job_id]
+
+    def set_finding_disposition(self, finding_id, **kwargs):
+        self.dispositions.append({"finding_id": finding_id, **kwargs})
+
+    def complete_verify_job(self, job_id: str) -> None:
+        self.completed.append(job_id)
+
+
+def test_drain_writes_disposition_for_terminal_verdict():
+    finding = _finding()
+    job_id = verify_job_id_for_finding(finding)
+    store = FakeDrainStore({job_id: finding})
+    verifier = FakeVerifier(None, {finding.finding_id: Verdict.FALSE_POSITIVE.value})
+
+    summary = drain_verify_jobs(
+        store,
+        verifier=verifier,
+        config=VerifierConfig(host="http://127.0.0.1:11434", model="synthetic-model"),
+        max_jobs=5,
+        now=NOW,
+    )
+
+    assert verifier.calls == [finding.finding_id]
+    assert summary.dispositions_written == 1
+    assert store.dispositions[0]["finding_id"] == finding.finding_id
+    assert store.dispositions[0]["status"] == "FALSE_POSITIVE"
+    assert store.completed == [job_id]
+
+
+def test_drain_does_not_write_disposition_for_needs_review():
+    # fail-closed: a NEEDS_REVIEW verdict writes NO disposition (the row stays
+    # OPEN/unreviewed), but the verify job is still COMPLETED so it is not
+    # re-leased forever (backoff: the work item is consumed, not looped).
+    finding = _finding()
+    job_id = verify_job_id_for_finding(finding)
+    store = FakeDrainStore({job_id: finding})
+    verifier = FakeVerifier(None, {finding.finding_id: Verdict.NEEDS_REVIEW.value})
+
+    summary = drain_verify_jobs(
+        store,
+        verifier=verifier,
+        config=VerifierConfig(host="http://127.0.0.1:11434", model="synthetic-model"),
+        max_jobs=5,
+        now=NOW,
+    )
+
+    assert verifier.calls == [finding.finding_id]
+    assert summary.dispositions_written == 0
+    assert summary.needs_review == 1
+    assert store.dispositions == []  # NEEDS_REVIEW is never written
+    assert store.completed == [job_id]  # but the job is consumed (no flood)
+
+
+def test_drain_empty_queue_is_a_noop():
+    store = FakeDrainStore({})
+    verifier = FakeVerifier(None, {})
+
+    summary = drain_verify_jobs(
+        store,
+        verifier=verifier,
+        config=VerifierConfig(host="http://127.0.0.1:11434", model="synthetic-model"),
+        max_jobs=5,
+        now=NOW,
+    )
+
+    assert summary.dispositions_written == 0
+    assert summary.attempted == 0
+    assert verifier.calls == []

From d905e95d8eb47deaf040d660e079c04af2aa5b29 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 12:40:08 +0900
Subject: [PATCH 6/7] =?UTF-8?q?feat(runtime):=20M4=20non-GHAS=20drift=20mo?=
 =?UTF-8?q?nitor=20=E2=80=94=20GHAS-calibrated=20=EA=B8=B0=EC=A4=80?=
 =?UTF-8?q?=EC=84=A0=20+=20passive=20=EB=85=B8=EC=B6=9C?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

자율층 M4. non-GHAS repo는 per-repo truth가 없어 SLO 측정 불가(B-floor+C-monitor,
requirements Q6) — 증류한 품질 머신 전이의 건전성을 분포-shift 조기경보로 감시.

- runtime/drift_monitor.py(신규): DriftBaseline.from_macro_parity가 M1 parity
  집계(aggregate_repo_parity → EvaluationResult.true_positives = GHAS-매칭 TP)의
  canonical-type 분포를 GHAS-calibrated 기준선으로 도출. non-GHAS는 GHAS-매칭 TP가
  0이라 self-baseline 구조적 불가(테스트 증명). evaluate_distribution_drift는
  finding rule_id 분포만으로 total variation distance 산출(stdlib Counter, 신규
  의존성 0) — verifier verdict 절대 미참조(verifier-직교, common-cause bias 완화).
  verifier disposition 비율은 별도 필드로 cross-reference만(distance에 미혼입).
  전이 한계(early-warning 전용, SLO 아님) 모듈 docstring에 명시.
- runtime/notification_log.py: drift_record(type:"drift") 빌더 추가(cadence_overrun
  선례 패턴, append-only JSONL, 기존 consumer 영향 0).
- runtime/scan_all.py: DriftConfig(default-off) + ScanAllRequest.drift_config +
  _maybe_write_drift_record passive 훅. config None/disabled/baseline None이면 즉시
  return → 기존 notification 스트림 byte-identical. drift는 별도 record로만, parity/
  verification summary 슬롯과 분리(SLO 미오염). drift_monitor import는 함수-로컬
  (default-off 경로 미탑재, circular import 회피). 폴링/타이머/스케줄 신설 없음 —
  scan-all 완료 시점에 1회 piggyback(능동 drift 비채택 준수).
- design.md: drift 노출 표면 택1=notification_log 확정(근거 명시).

검증: uv run pytest 1130 passed(+15), default-off byte-identical 증명, parity.py/
metrics.py 무수정, public_safety green, autopilot_gate --base 81d59d0 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../specs/ghas-quality-secrets/design.md      |   6 +-
 src/security_scanner/runtime/drift_monitor.py | 314 ++++++++++
 .../runtime/notification_log.py               |  22 +
 src/security_scanner/runtime/scan_all.py      |  85 +++
 tests/test_drift_monitor.py                   | 534 ++++++++++++++++++
 5 files changed, 960 insertions(+), 1 deletion(-)
 create mode 100644 src/security_scanner/runtime/drift_monitor.py
 create mode 100644 tests/test_drift_monitor.py

diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
index d6ec5cb..2167723 100644
--- a/docs/workbench/specs/ghas-quality-secrets/design.md
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -223,7 +223,11 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
   실제 boost(verifier confidence/disposition 상향)는 M3 검증 티어 소관 — M3 배선 시 `context_filter`의
   partner hook을 M3 disposition 경로로 옮길지 재평가.
 - drift 샘플링 레이트/판정 임계(별도 스케줄 신설=비채택 위반 상한).
-- drift 노출 표면 최종(scan_health 레코드 vs notification_log) — M4서 택1.
+- **drift 노출 표면 택1(확정): notification_log**. 근거: append-only JSONL free-dict라 신규 `drift` 레코드를
+  기존 consumer 영향 0으로 추가 가능(`cadence_overrun` 선례). scan_health는 DynamoDB 전용 + freshness
+  breach 의미론에 결합돼 drift를 얹으면 단일책임 위반 + `BreachCounter` 스키마 변경이 M5 SLO 게이트와 충돌
+  위험. scan_all 완료 시점(`_write_finding_and_summary_records`)에 이미 notification write 체인이 있어
+  passive piggyback hook이 명확(폴링 신설 없음).
 - line-tolerance k값·구간겹침 vs ±k 택1.
 - path-role 분류기 공통 추출(post-M2 arch gate nit): 현재 `scanners/gitleaks/context_filter`와
   `llm/common/prompt._path_role`이 어휘를 의도적 복제(테스트로 등가 강제, scanners→llm import 없음).
diff --git a/src/security_scanner/runtime/drift_monitor.py b/src/security_scanner/runtime/drift_monitor.py
new file mode 100644
index 0000000..a673761
--- /dev/null
+++ b/src/security_scanner/runtime/drift_monitor.py
@@ -0,0 +1,314 @@
+"""M4 non-GHAS drift monitor — passive, default-off, parity-separated.
+
+This module computes an *early-warning* distribution-drift signal for repos that
+have no GHAS coverage (and therefore no per-repo ground truth). It is NOT an SLO:
+without per-repo truth we cannot score precision/recall on these samples, so the
+strongest claim available is "the unlabeled distribution of what we flag has
+moved away from the GHAS-calibrated reference distribution". This is a B-floor +
+C-monitor signal (requirements Q6), deliberately weaker than the parity SLO.
+
+Transfer limit (M4 done, requirements Q6)
+-----------------------------------------
+A non-GHAS repo has no GHAS alert stream, so there is no labelled positive truth
+to compare against. The drift signal here is therefore an EARLY WARNING ONLY, not
+a measurement of correctness:
+
+* it can say "what we flag on non-GHAS repos looks distributionally different from
+  the GHAS-derived reference", which is a useful tripwire for silent regressions;
+* it CANNOT say "precision/recall on non-GHAS repos is X" — that requires truth
+  this layer does not have.
+
+Consequently the drift signal is kept on a physically separate field/record and
+the SLO gate (M5) never consumes it. Treating drift as an SLO would manufacture a
+correctness number out of an unlabeled sample, which this module refuses to do.
+
+Design contract (design.md "Fixed decisions", M4)
+-------------------------------------------------
+1. **GHAS-derived baseline.** The reference distribution comes from the M1 parity
+   aggregate (``aggregate_repo_parity`` over ``RepoParityResult``), specifically
+   the canonical-type distribution of ``EvaluationResult.true_positives`` — the
+   GHAS-vs-local *matched* set. It is never self-fit to the non-GHAS sample.
+2. **Verifier-orthogonal distribution shift.** The drift signal is the unlabeled
+   rule_id distribution of the non-GHAS sample measured as total-variation
+   distance from the baseline. It reads frequencies only — never a verifier
+   verdict — so it is independent of the verifier disposition ratios it is
+   *cross-referenced* with (common-cause-bias mitigation). The verifier ratios
+   ride a separate field.
+3. **Input reuse.** Callers feed in the same ``Finding`` objects scan-all already
+   produced (and may reuse ``eval/synthetic-corpus`` for offline exercise). No new
+   fixture is mandatory.
+4. **Physical separation from parity/SLO.** The output is :class:`ScanAllDriftSummary`
+   with no precision/recall/SLO/pass-fail field; it is exposed only via a separate
+   ``drift`` notification record, never the parity/verification summary slot.
+5. **Exposure surface = notification_log** (design Open Questions, confirmed):
+   ``runtime.notification_log.drift_record`` + ``write_record``.
+6. **Default-off, passive.** Enabled only when ``SECURITY_SCANNER_DRIFT_MONITOR``
+   is truthy. Computed passively at scan-all completion (piggyback) — this module
+   introduces no timer, loop, poll, or schedule.
+
+Distribution comparison uses only the standard library (``collections.Counter``
+ratios + total-variation distance); no KL/chi-squared dependency is added.
+"""
+
+from __future__ import annotations
+
+from collections import Counter
+from dataclasses import dataclass
+from typing import Iterable, Mapping, Sequence
+
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+)
+from security_scanner.baseline.ghas_api.parity import (
+    MacroParityResult,
+    RepoParityResult,
+)
+from security_scanner.core.finding.model import Finding
+
+DRIFT_MONITOR_ENV_VAR = "SECURITY_SCANNER_DRIFT_MONITOR"
+
+# Baseline provenance marker. The drift baseline is GHAS-calibrated: it is derived
+# from the M1 parity aggregate, never self-fit to the non-GHAS sample.
+GHAS_CALIBRATED_SOURCE = "ghas-calibrated"
+
+
+# ---------------------------------------------------------------------------
+# Distribution utilities (stdlib only: Counter ratios + total variation)
+# ---------------------------------------------------------------------------
+
+
+def _normalize_counts(counts: Mapping[str, int | float]) -> dict[str, float]:
+    """Turn raw counts into a probability distribution (ratios summing to 1)."""
+    total = float(sum(counts.values()))
+    if total <= 0:
+        return {}
+    return {key: value / total for key, value in counts.items()}
+
+
+def total_variation_distance(
+    left: Mapping[str, float],
+    right: Mapping[str, float],
+) -> float:
+    """Total variation distance between two probability distributions.
+
+    ``TVD = 0.5 * sum_k |p_k - q_k|`` over the union of keys. Pure stdlib; no new
+    dependency. Result is in ``[0, 1]``.
+    """
+    keys = set(left) | set(right)
+    return 0.5 * sum(abs(left.get(k, 0.0) - right.get(k, 0.0)) for k in keys)
+
+
+def rule_id_distribution(
+    findings: Iterable[Finding],
+    *,
+    normalizer: SecretTypeNormalizer | None = None,
+) -> dict[str, float]:
+    """Unlabeled canonical-rule distribution over a finding sample.
+
+    Each finding's ``rule_id`` is mapped to its canonical type (so the sample and
+    the GHAS-derived baseline live in the same canonical space); unmapped rule_ids
+    fall back to their raw token. This reads ``rule_id`` frequencies ONLY — never a
+    verifier verdict — which is what makes the drift signal verifier-orthogonal.
+    """
+    norm = normalizer or SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+    counter: Counter[str] = Counter()
+    for finding in findings:
+        canonical = norm.canonical_for_rule_id(finding.rule_id) or finding.rule_id
+        counter[canonical] += 1
+    return _normalize_counts(counter)
+
+
+# ---------------------------------------------------------------------------
+# GHAS-derived baseline (contract point 1)
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class DriftBaseline:
+    """GHAS-calibrated reference distribution for the drift monitor.
+
+    Built from the M1 parity aggregate, NOT from the non-GHAS sample. ``source``
+    is pinned to :data:`GHAS_CALIBRATED_SOURCE` so a reader can confirm the
+    baseline's provenance, and the GHAS-derived parity context (macro
+    precision/recall, repo count, type coverage) is carried alongside the
+    distribution as additional evidence of calibration.
+    """
+
+    source: str
+    distribution: dict[str, float]
+    macro_precision: float
+    macro_recall: float
+    repo_count: int
+    type_coverage: float
+
+    @classmethod
+    def from_macro_parity(
+        cls,
+        repo_results: Sequence[RepoParityResult],
+        macro: MacroParityResult,
+        *,
+        normalizer: SecretTypeNormalizer | None = None,
+    ) -> "DriftBaseline":
+        """Derive the baseline from M1 GHAS-calibrated parity output.
+
+        The reference distribution is the canonical-type distribution of every
+        ``EvaluationResult.true_positives`` key across the per-repo parity
+        results — i.e. the GHAS-vs-local matched secrets. This is what makes the
+        baseline provably GHAS-derived rather than self-fit: it cannot be produced
+        from a non-GHAS sample because non-GHAS repos yield no GHAS-matched TPs.
+        """
+        repo_results = list(repo_results)
+        if not repo_results or macro.repo_count == 0:
+            raise ValueError(
+                "drift baseline requires a non-empty GHAS-calibrated parity "
+                "aggregate; refusing to anchor drift on an empty baseline"
+            )
+
+        norm = normalizer or SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+        tp_counter: Counter[str] = Counter()
+        coverage_registered = 0
+        coverage_total = 0
+        for repo in repo_results:
+            for key in repo.detection.true_positives:
+                # The matched key's rule_id is already the canonical type the M1
+                # adapter assigned; re-canonicalize defensively for safety.
+                canonical = norm.canonical_for_rule_id(key.rule_id) or key.rule_id
+                tp_counter[canonical] += 1
+            coverage_registered += repo.type_coverage.registered_count
+            coverage_total += repo.type_coverage.total_count
+
+        if not tp_counter:
+            raise ValueError(
+                "drift baseline requires at least one GHAS-matched true positive "
+                "to anchor the reference distribution"
+            )
+
+        type_coverage = (
+            coverage_registered / coverage_total if coverage_total else 1.0
+        )
+        return cls(
+            source=GHAS_CALIBRATED_SOURCE,
+            distribution=_normalize_counts(tp_counter),
+            macro_precision=macro.macro_precision,
+            macro_recall=macro.macro_recall,
+            repo_count=macro.repo_count,
+            type_coverage=type_coverage,
+        )
+
+
+# ---------------------------------------------------------------------------
+# Drift summary (contract point 4: no parity/SLO fields)
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class ScanAllDriftSummary:
+    """Drift signal for one scan-all pass (early warning, never an SLO).
+
+    Carries the verifier-orthogonal distribution-shift distance plus the SEPARATE
+    verifier disposition ratios it is cross-referenced with. It exposes no
+    precision/recall/SLO/pass-fail field — drift is physically separated from the
+    parity score and the M5 gate never reads it.
+    """
+
+    distribution_distance: float
+    sample_size: int
+    sample_distribution: dict[str, float]
+    baseline_distribution: dict[str, float]
+    baseline_source: str
+    verifier_needs_review_ratio: float | None = None
+    verifier_terminal_ratio: float | None = None
+
+    def to_notification_dict(self) -> dict:
+        """Public-safe, parity-free dict for the ``drift`` notification record.
+
+        Intentionally contains NO precision/recall/slo/pass/gate/threshold key so
+        the drift channel can never be confused with the parity SLO channel.
+        """
+        return {
+            # Self-labelling: this is an early-warning monitor, not an SLO.
+            "signal": "early-warning",
+            "is_slo": False,
+            "distribution_distance": self.distribution_distance,
+            "sample_size": self.sample_size,
+            "sample_distribution": dict(self.sample_distribution),
+            "baseline_distribution": dict(self.baseline_distribution),
+            "baseline_source": self.baseline_source,
+            "verifier_needs_review_ratio": self.verifier_needs_review_ratio,
+            "verifier_terminal_ratio": self.verifier_terminal_ratio,
+        }
+
+
+# ---------------------------------------------------------------------------
+# Drift evaluation (contract points 2 + 3)
+# ---------------------------------------------------------------------------
+
+
+def evaluate_distribution_drift(
+    baseline: DriftBaseline,
+    findings: Sequence[Finding],
+    *,
+    normalizer: SecretTypeNormalizer | None = None,
+    verifier_needs_review_ratio: float | None = None,
+    verifier_terminal_ratio: float | None = None,
+) -> ScanAllDriftSummary:
+    """Measure non-GHAS sample distribution shift against the GHAS baseline.
+
+    The distance is the total-variation distance between the sample's unlabeled
+    canonical-rule distribution and the GHAS-derived baseline distribution. The
+    verifier ratios are stored on a SEPARATE field for cross-referencing — they
+    never feed the distance, so the distance is invariant to the verifier
+    disposition mix (verifier-orthogonal, common-cause-bias mitigation).
+    """
+    sample_distribution = rule_id_distribution(findings, normalizer=normalizer)
+    distance = total_variation_distance(sample_distribution, baseline.distribution)
+    return ScanAllDriftSummary(
+        distribution_distance=distance,
+        sample_size=len(findings),
+        sample_distribution=sample_distribution,
+        baseline_distribution=dict(baseline.distribution),
+        baseline_source=baseline.source,
+        verifier_needs_review_ratio=verifier_needs_review_ratio,
+        verifier_terminal_ratio=verifier_terminal_ratio,
+    )
+
+
+# ---------------------------------------------------------------------------
+# default-off env gate (contract point 6)
+# ---------------------------------------------------------------------------
+
+
+def _env_truthy(value: str | None) -> bool:
+    if not value:
+        return False
+    return value.strip().lower() in ("1", "true", "yes", "on")
+
+
+def drift_config_from_env(env: Mapping[str, str] | None = None):
+    """Return a :class:`DriftConfig` when the env gate is truthy, else ``None``.
+
+    Default-off: unset / empty / ``0`` / ``false`` / ``no`` / ``off`` all return
+    ``None`` so the scan-all path stays byte-identical to pre-M4 behaviour unless
+    an operator explicitly opts in (same gating idiom as the LLM verifier tier).
+    """
+    import os
+
+    from security_scanner.runtime.scan_all import DriftConfig
+
+    source = env if env is not None else os.environ
+    if not _env_truthy(source.get(DRIFT_MONITOR_ENV_VAR)):
+        return None
+    return DriftConfig(enabled=True)
+
+
+__all__ = [
+    "DRIFT_MONITOR_ENV_VAR",
+    "GHAS_CALIBRATED_SOURCE",
+    "DriftBaseline",
+    "ScanAllDriftSummary",
+    "drift_config_from_env",
+    "evaluate_distribution_drift",
+    "rule_id_distribution",
+    "total_variation_distance",
+]
diff --git a/src/security_scanner/runtime/notification_log.py b/src/security_scanner/runtime/notification_log.py
index f7d1cdc..cc7bca0 100644
--- a/src/security_scanner/runtime/notification_log.py
+++ b/src/security_scanner/runtime/notification_log.py
@@ -143,6 +143,28 @@ def fatal_error_record(
     }
 
 
+def drift_record(
+    *,
+    event_at: str,
+    drift: Any,
+) -> dict[str, Any]:
+    """Build a `drift` record for the M4 non-GHAS drift monitor.
+
+    Exposes the drift signal on the existing append-only notification seam (the
+    `cadence_overrun` precedent), as a brand-new `type: "drift"` record so no
+    existing consumer is affected. `drift` is a `ScanAllDriftSummary`; its
+    `to_notification_dict()` carries NO precision/recall/SLO field, keeping the
+    drift channel physically separate from the parity SLO channel. This is an
+    early-warning signal, never an SLO (non-GHAS repos have no per-repo truth).
+    """
+    record: dict[str, Any] = {
+        "type": "drift",
+        "event_at": event_at,
+    }
+    record.update(drift.to_notification_dict())
+    return record
+
+
 def cadence_overrun_record(
     *,
     event_at: str,
diff --git a/src/security_scanner/runtime/scan_all.py b/src/security_scanner/runtime/scan_all.py
index 01beffa..d9cebdb 100644
--- a/src/security_scanner/runtime/scan_all.py
+++ b/src/security_scanner/runtime/scan_all.py
@@ -22,6 +22,7 @@
     run_local_scan,
 )
 from security_scanner.runtime.notification_log import (
+    drift_record,
     fatal_error_record,
     finding_record,
     lock_contention_record,
@@ -63,6 +64,23 @@ def default_notification_writer() -> NotificationWriter:
     return write_record
 
 
+@dataclass(frozen=True)
+class DriftConfig:
+    """M4 non-GHAS drift-monitor toggle + GHAS-derived baseline (default-off).
+
+    ``enabled`` mirrors the LLM-tier gating idiom: when it is ``False`` (or the
+    whole config is ``None``), scan-all writes NO drift record and its existing
+    notification stream is byte-identical to pre-M4 behaviour. ``baseline`` is the
+    GHAS-calibrated reference distribution (``DriftBaseline`` from
+    ``runtime.drift_monitor``); drift is only computed when both ``enabled`` is
+    truthy and a baseline is present. This carries no schedule/timer — drift rides
+    passively on the scan-all completion path (no new polling).
+    """
+
+    enabled: bool = False
+    baseline: object | None = None
+
+
 @dataclass(frozen=True)
 class ScanAllFetchFailure:
     """Per-target fetch failure captured without aborting the batch."""
@@ -92,6 +110,7 @@ class ScanAllRequest:
     verifier_config_factory: VerifierConfigFactory | None = None
     verifier_factory: verifier_runtime.VerifierFactory | None = None
     disposition_store_factory: DispositionStoreFactory | None = None
+    drift_config: DriftConfig | None = None
 
 
 @dataclass(frozen=True)
@@ -724,3 +743,69 @@ def _write_finding_and_summary_records(
             ),
         ),
     )
+
+    # M4 passive drift piggyback (default-off). When the drift monitor is not
+    # enabled this branch is never taken, so the record stream above is
+    # byte-identical to pre-M4 behaviour. No timer/loop/poll is introduced — drift
+    # rides exactly once on this completion path. Drift is written as a SEPARATE
+    # `drift` record and never mixed into the parity/verification summary slot.
+    _maybe_write_drift_record(
+        request=request,
+        log_path=log_path,
+        scan_result=scan_result,
+        verifier_summary=verifier_summary,
+    )
+
+
+def _maybe_write_drift_record(
+    *,
+    request: ScanAllRequest,
+    log_path: Path,
+    scan_result: LocalScanResult | None,
+    verifier_summary: ScanAllVerifierSummary | None,
+) -> None:
+    """Compute and append the non-GHAS drift record when the monitor is enabled.
+
+    Default-off: returns immediately unless ``drift_config`` is enabled WITH a
+    GHAS-derived baseline. The distribution-shift signal reads finding rule_ids
+    only (verifier-orthogonal); verifier disposition ratios are passed through on a
+    separate field purely for cross-referencing, never folded into the distance.
+    """
+    config = request.drift_config
+    if config is None or not config.enabled or config.baseline is None:
+        return
+    if scan_result is None:
+        return
+
+    findings = [
+        finding
+        for target_result in scan_result.target_results
+        if target_result.status == "scanned"
+        for finding in target_result.findings
+    ]
+    if not findings:
+        return
+
+    # Verifier-orthogonal cross-reference: pass disposition ratios on a separate
+    # field. They never feed the distribution distance.
+    needs_review_ratio: float | None = None
+    terminal_ratio: float | None = None
+    if verifier_summary is not None and verifier_summary.attempted > 0:
+        needs_review_ratio = verifier_summary.needs_review / verifier_summary.attempted
+        terminal_ratio = (
+            verifier_summary.terminal_verdicts / verifier_summary.attempted
+        )
+
+    # Import locally to keep the default-off path free of drift-monitor imports.
+    from security_scanner.runtime.drift_monitor import evaluate_distribution_drift
+
+    drift = evaluate_distribution_drift(
+        config.baseline,
+        findings,
+        verifier_needs_review_ratio=needs_review_ratio,
+        verifier_terminal_ratio=terminal_ratio,
+    )
+    request.notification_writer(
+        log_path,
+        drift_record(event_at=request.now_factory(), drift=drift),
+    )
diff --git a/tests/test_drift_monitor.py b/tests/test_drift_monitor.py
new file mode 100644
index 0000000..036f0bb
--- /dev/null
+++ b/tests/test_drift_monitor.py
@@ -0,0 +1,534 @@
+"""M4 non-GHAS drift-monitor tests (TDD red-first).
+
+These tests prove the seven M4 contract points, each written so that removing the
+corresponding guarantee makes a specific assertion go red:
+
+(a) the drift BASELINE is derived from the M1 GHAS-calibrated parity aggregate
+    (``aggregate_repo_parity`` over ``RepoParityResult``), NOT self-fitted to the
+    non-GHAS sample;
+(b) a non-GHAS sample's unlabeled rule_id distribution shift is measured against
+    that GHAS-derived baseline (total-variation distance, stdlib only);
+(c) the distribution-shift signal is verifier-ORTHOGONAL — it is invariant to the
+    verifier disposition ratios it is cross-referenced with;
+(d) the drift summary never touches parity precision/recall/SLO fields (physical
+    separation);
+(e) drift is exposed on the notification_log as a separate ``drift`` record;
+(f) default-off: with the env unset, drift is neither computed nor recorded and
+    the existing scan_all notification stream is byte-identical;
+(g) passive: drift is computed only at scan_all completion (piggyback), with no
+    new timer/loop/poll surface introduced by the module.
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+import json
+from pathlib import Path
+
+import pytest
+
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+)
+from security_scanner.baseline.ghas_api.parity import (
+    aggregate_repo_parity,
+    evaluate_repo_parity,
+)
+from security_scanner.core.finding.model import Finding
+from security_scanner.storage.base import GhasAlertRecord
+
+from security_scanner.runtime.drift_monitor import (
+    DRIFT_MONITOR_ENV_VAR,
+    DriftBaseline,
+    ScanAllDriftSummary,
+    drift_config_from_env,
+    evaluate_distribution_drift,
+    rule_id_distribution,
+)
+
+
+REPO = "synthetic-org/synthetic-repo"
+RULE_PACK = "secret-rules-0.1.0"
+FETCHED_AT = dt.datetime(2026, 6, 16, 12, 0, tzinfo=dt.timezone.utc)
+
+
+def _alert(
+    *,
+    number: int,
+    secret_type: str,
+    path: str,
+    start_line: int,
+    state: str = "open",
+    resolution: str | None = None,
+) -> GhasAlertRecord:
+    return GhasAlertRecord(
+        ghas_alert_id=f"ghas_alert_{number:06d}",
+        repository=REPO,
+        alert_number=number,
+        secret_type=secret_type,
+        state=state,
+        resolution=resolution,
+        fetched_at=FETCHED_AT,
+        location_path=path,
+        location_start_line=start_line,
+        location_end_line=start_line,
+    )
+
+
+def _finding(*, rule_id: str, path: str, line_start: int) -> Finding:
+    return Finding.create(
+        repo_full_name=REPO,
+        file_path=path,
+        line_start=line_start,
+        rule_id=rule_id,
+        raw_secret="SCANNER_FAKE_SECRET_TOKEN_000001",
+        source_tool="gitleaks",
+        scan_run_id="scan_drift",
+        rule_pack_version=RULE_PACK,
+    )
+
+
+def _ghas_calibrated_macro():
+    """A GHAS-calibrated macro aggregate over two repos via the M1 parity path.
+
+    Both repos pair a GHAS alert with a colocated gitleaks finding of the SAME
+    canonical type, so the M1 fuzzy join produces true positives whose canonical
+    rule_id lands in ``RepoParityResult.detection.true_positives`` — this is the
+    GHAS-derived distribution the baseline must come from.
+    """
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+    repos = []
+    for idx in range(2):
+        alerts = [
+            _alert(
+                number=1,
+                secret_type="github_personal_access_token",
+                path="src/config.py",
+                start_line=10,
+            ),
+            _alert(
+                number=2,
+                secret_type="discord_bot_token",
+                path="manifests/svc.yaml",
+                start_line=20,
+            ),
+        ]
+        findings = [
+            _finding(rule_id="github-pat", path="src/config.py", line_start=10),
+            _finding(rule_id="discord-api-token", path="manifests/svc.yaml", line_start=20),
+        ]
+        repos.append(
+            evaluate_repo_parity(
+                repo_full_name=f"{REPO}-{idx}",
+                alerts=alerts,
+                findings=findings,
+                normalizer=normalizer,
+            )
+        )
+    return repos, aggregate_repo_parity(repos)
+
+
+# ---------------------------------------------------------------------------
+# (a) baseline is GHAS-derived (not self-baseline)
+# ---------------------------------------------------------------------------
+
+
+def test_baseline_is_derived_from_ghas_calibrated_parity_aggregate():
+    """The baseline distribution must come from M1 ``RepoParityResult`` TPs.
+
+    Concretely: the baseline rule_id distribution must equal the canonical-type
+    distribution of ``RepoParityResult.detection.true_positives`` (the GHAS-vs-
+    local matched set), and must NOT equal a distribution fitted to an unrelated
+    non-GHAS sample.
+    """
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+
+    # Provenance is explicitly GHAS-calibrated, carried on the baseline.
+    assert baseline.source == "ghas-calibrated"
+
+    # The baseline distribution is the canonical-type distribution over the
+    # GHAS-matched true positives (two repos, two canonical types each).
+    assert baseline.distribution == {
+        "github-personal-access-token": 0.5,
+        "discord-bot-token": 0.5,
+    }
+
+    # The baseline carries GHAS-derived parity context (macro precision/recall,
+    # type coverage) so a reader can confirm it is GHAS-calibrated, not self-fit.
+    assert baseline.macro_precision == pytest.approx(macro.macro_precision)
+    assert baseline.macro_recall == pytest.approx(macro.macro_recall)
+    assert baseline.repo_count == macro.repo_count == 2
+
+    # A self-baseline built from a skewed non-GHAS sample would be all-one-rule;
+    # the GHAS-derived baseline is provably NOT that.
+    self_baseline_sample = [_finding(rule_id="aws-access-token", path="a.env", line_start=1)]
+    assert baseline.distribution != rule_id_distribution(self_baseline_sample)
+
+
+def test_baseline_rejects_empty_aggregate():
+    """An empty GHAS aggregate cannot anchor a drift baseline (fail loud)."""
+    with pytest.raises(ValueError):
+        DriftBaseline.from_macro_parity([], aggregate_repo_parity([]))
+
+
+# ---------------------------------------------------------------------------
+# (b) non-GHAS sample distribution shift vs the GHAS-derived baseline
+# ---------------------------------------------------------------------------
+
+
+def test_distribution_shift_measured_against_ghas_baseline():
+    """A skewed non-GHAS sample drifts away from the balanced GHAS baseline."""
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+
+    # Non-GHAS sample heavily skewed toward one rule (GHAS baseline is 50/50).
+    sample = [
+        _finding(rule_id="github-pat", path="a.py", line_start=1),
+        _finding(rule_id="github-pat", path="b.py", line_start=1),
+        _finding(rule_id="github-pat", path="c.py", line_start=1),
+        _finding(rule_id="discord-api-token", path="d.yaml", line_start=1),
+    ]
+    drift = evaluate_distribution_drift(baseline, sample)
+
+    # Sample canonical distribution is 0.75 / 0.25 vs baseline 0.5 / 0.5;
+    # total variation distance = 0.25.
+    assert drift.distribution_distance == pytest.approx(0.25)
+    assert drift.sample_size == 4
+
+
+def test_distribution_shift_zero_when_sample_matches_baseline():
+    """A sample whose distribution equals the baseline has zero drift."""
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+
+    sample = [
+        _finding(rule_id="github-pat", path="a.py", line_start=1),
+        _finding(rule_id="discord-api-token", path="b.yaml", line_start=1),
+    ]
+    drift = evaluate_distribution_drift(baseline, sample)
+    assert drift.distribution_distance == pytest.approx(0.0)
+
+
+# ---------------------------------------------------------------------------
+# (c) verifier-orthogonal: the distribution-shift signal is invariant to the
+#     verifier disposition ratios cross-referenced with it.
+# ---------------------------------------------------------------------------
+
+
+def test_distribution_shift_is_verifier_orthogonal():
+    """Varying verifier disposition ratios must NOT move distribution_distance.
+
+    The two signals are crossed (both land on the summary) but the unlabeled
+    distribution shift is computed from rule_id frequencies alone — it never
+    reads a verifier verdict. So holding the sample fixed while changing the
+    verifier disposition ratios leaves ``distribution_distance`` identical.
+    """
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+
+    sample = [
+        _finding(rule_id="github-pat", path="a.py", line_start=1),
+        _finding(rule_id="github-pat", path="b.py", line_start=1),
+        _finding(rule_id="discord-api-token", path="c.yaml", line_start=1),
+    ]
+
+    low_review = evaluate_distribution_drift(
+        baseline,
+        sample,
+        verifier_needs_review_ratio=0.0,
+        verifier_terminal_ratio=1.0,
+    )
+    high_review = evaluate_distribution_drift(
+        baseline,
+        sample,
+        verifier_needs_review_ratio=0.9,
+        verifier_terminal_ratio=0.1,
+    )
+
+    # The orthogonal distribution-shift signal is identical regardless of the
+    # verifier disposition mix.
+    assert low_review.distribution_distance == high_review.distribution_distance
+
+    # The verifier ratios ARE carried (the cross-reference) but on a SEPARATE
+    # field, never folded into the distribution distance.
+    assert low_review.verifier_needs_review_ratio == pytest.approx(0.0)
+    assert high_review.verifier_needs_review_ratio == pytest.approx(0.9)
+
+
+# ---------------------------------------------------------------------------
+# (d) physical separation: drift summary never carries parity/SLO fields
+# ---------------------------------------------------------------------------
+
+
+def test_drift_summary_has_no_parity_or_slo_fields():
+    """The drift summary must not expose precision/recall/SLO/pass-fail fields."""
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+    sample = [_finding(rule_id="github-pat", path="a.py", line_start=1)]
+    drift = evaluate_distribution_drift(baseline, sample)
+
+    record = drift.to_notification_dict()
+    forbidden = {
+        "precision",
+        "recall",
+        "macro_precision",
+        "macro_recall",
+        "slo",
+        "passed",
+        "pass",
+        "gate",
+        "threshold",
+    }
+    assert forbidden.isdisjoint(record.keys())
+
+    # And the summary dataclass itself exposes no precision/recall attribute.
+    assert not hasattr(drift, "precision")
+    assert not hasattr(drift, "recall")
+
+
+def test_drift_summary_is_early_warning_not_slo():
+    """The drift summary self-labels as a monitor (early warning), never an SLO."""
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+    sample = [_finding(rule_id="github-pat", path="a.py", line_start=1)]
+    drift = evaluate_distribution_drift(baseline, sample)
+    record = drift.to_notification_dict()
+    assert record["signal"] == "early-warning"
+    assert record["is_slo"] is False
+
+
+# ---------------------------------------------------------------------------
+# (e) notification_log exposure: a separate ``drift`` record type
+# ---------------------------------------------------------------------------
+
+
+def test_drift_record_builder_emits_separate_type(tmp_path: Path):
+    from security_scanner.runtime.notification_log import drift_record, write_record
+
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+    sample = [
+        _finding(rule_id="github-pat", path="a.py", line_start=1),
+        _finding(rule_id="github-pat", path="b.py", line_start=1),
+        _finding(rule_id="discord-api-token", path="c.yaml", line_start=1),
+    ]
+    drift = evaluate_distribution_drift(baseline, sample)
+
+    record = drift_record(event_at="2026-06-21T00:00:00+00:00", drift=drift)
+    assert record["type"] == "drift"
+    assert record["event_at"] == "2026-06-21T00:00:00+00:00"
+    assert record["distribution_distance"] == pytest.approx(drift.distribution_distance)
+
+    target = tmp_path / "log.jsonl"
+    write_record(target, record)
+    payload = json.loads(target.read_text(encoding="utf-8").splitlines()[0])
+    assert payload["type"] == "drift"
+
+
+# ---------------------------------------------------------------------------
+# (f) default-off via env gate
+# ---------------------------------------------------------------------------
+
+
+def test_drift_config_default_off_when_env_unset(monkeypatch):
+    monkeypatch.delenv(DRIFT_MONITOR_ENV_VAR, raising=False)
+    assert drift_config_from_env() is None
+
+
+def test_drift_config_off_for_falsey_env(monkeypatch):
+    for value in ("", "0", "false", "no", "off"):
+        monkeypatch.setenv(DRIFT_MONITOR_ENV_VAR, value)
+        assert drift_config_from_env() is None
+
+
+def test_drift_config_on_for_truthy_env(monkeypatch):
+    monkeypatch.setenv(DRIFT_MONITOR_ENV_VAR, "1")
+    config = drift_config_from_env()
+    assert config is not None
+    assert config.enabled is True
+
+
+# ---------------------------------------------------------------------------
+# (f)+(g) scan_all integration: default-off byte-identical, passive piggyback
+# ---------------------------------------------------------------------------
+
+
+from security_scanner.runtime.local_scan import (  # noqa: E402
+    LocalScanRequest,
+    LocalScanResult,
+    LocalScanTargetResult,
+)
+from security_scanner.runtime.scan_all import (  # noqa: E402
+    DriftConfig,
+    ScanAllRequest,
+    run_scan_all,
+)
+
+
+class _FakeCatalogStore:
+    def __init__(self, targets):
+        self._targets = targets
+
+    def list_scan_targets(self):
+        return list(self._targets)
+
+
+def _scan_target(url: str, name: str):
+    from security_scanner.catalog.scan_target import ScanTarget
+
+    return ScanTarget(url=url, name=name)
+
+
+def _scan_runner_with_findings(findings):
+    def runner(request: LocalScanRequest) -> LocalScanResult:
+        names = [t.name for t in request.manifest.targets]
+        target_results = [
+            LocalScanTargetResult(
+                target_name=names[0],
+                status="scanned",
+                finding_count=len(findings),
+                findings=list(findings),
+            ),
+            *[
+                LocalScanTargetResult(
+                    target_name=n, status="scanned", finding_count=0, findings=[]
+                )
+                for n in names[1:]
+            ],
+        ]
+        return LocalScanResult(
+            manifest_path="<in-memory>",
+            scan_run_id="scan_run_drift",
+            rule_pack_version="secret-rules-0.1.0",
+            destination="jsonl",
+            total_targets=len(names),
+            scanned=len(names),
+            total_findings=len(findings),
+            target_results=target_results,
+            scan_at_iso="2026-06-21T00:00:00+00:00",
+        )
+
+    return runner
+
+
+def _base_request(tmp_path, *, findings, drift_config=None) -> ScanAllRequest:
+    store = _FakeCatalogStore(
+        [_scan_target("https://example.test/org/repo-a", "org/repo-a")]
+    )
+    return ScanAllRequest(
+        store_factory=lambda: store,
+        storage_backend="jsonl",
+        output_destination=str(tmp_path / "out"),
+        notification_log_path=str(tmp_path / "scan-all.log.jsonl"),
+        lock_path=str(tmp_path / ".lock"),
+        fetch_repo=lambda url: tmp_path / "checkout",
+        scan_runner=_scan_runner_with_findings(findings),
+        drift_config=drift_config,
+    )
+
+
+def _read_records(path: Path) -> list[dict]:
+    if not path.exists():
+        return []
+    return [
+        json.loads(line)
+        for line in path.read_text(encoding="utf-8").splitlines()
+        if line.strip()
+    ]
+
+
+def test_scan_all_default_off_writes_no_drift_record(tmp_path, monkeypatch):
+    """Env unset => no drift_config => no drift record, stream unchanged."""
+    monkeypatch.delenv(DRIFT_MONITOR_ENV_VAR, raising=False)
+    (tmp_path / "checkout").mkdir()
+    findings = [_finding(rule_id="github-pat", path="a.py", line_start=1)]
+
+    result = run_scan_all(_base_request(tmp_path, findings=findings, drift_config=None))
+    assert result.exit_code == 0
+
+    records = _read_records(tmp_path / "scan-all.log.jsonl")
+    assert [r["type"] for r in records if r["type"] == "drift"] == []
+    # Existing record stream is exactly summary + finding (byte-identical shape).
+    assert sorted(r["type"] for r in records) == ["finding", "summary"]
+
+
+def test_scan_all_default_off_is_byte_identical_to_no_drift_param(tmp_path):
+    """Passing drift_config=None reproduces the pre-M4 record bytes exactly."""
+    (tmp_path / "checkout").mkdir()
+    findings = [_finding(rule_id="github-pat", path="a.py", line_start=1)]
+
+    run_scan_all(_base_request(tmp_path, findings=findings, drift_config=None))
+    drift_off_bytes = (tmp_path / "scan-all.log.jsonl").read_bytes()
+
+    # A second run into a fresh log with drift explicitly disabled.
+    tmp2 = tmp_path / "second"
+    tmp2.mkdir()
+    (tmp2 / "checkout").mkdir()
+    req2 = _base_request(tmp2, findings=findings, drift_config=DriftConfig(enabled=False))
+    run_scan_all(req2)
+    drift_disabled_bytes = (tmp2 / "scan-all.log.jsonl").read_bytes()
+
+    assert drift_off_bytes == drift_disabled_bytes
+
+
+def test_scan_all_enabled_writes_one_drift_record_at_completion(tmp_path):
+    """Enabled drift_config => exactly one passive drift record at completion."""
+    (tmp_path / "checkout").mkdir()
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+    findings = [
+        _finding(rule_id="github-pat", path="a.py", line_start=1),
+        _finding(rule_id="github-pat", path="b.py", line_start=1),
+        _finding(rule_id="discord-api-token", path="c.yaml", line_start=1),
+    ]
+
+    req = _base_request(
+        tmp_path,
+        findings=findings,
+        drift_config=DriftConfig(enabled=True, baseline=baseline),
+    )
+    result = run_scan_all(req)
+    assert result.exit_code == 0
+
+    records = _read_records(tmp_path / "scan-all.log.jsonl")
+    drift_records = [r for r in records if r["type"] == "drift"]
+    assert len(drift_records) == 1
+    drift_rec = drift_records[0]
+    assert drift_rec["sample_size"] == 3
+    # sample canonical dist = 2/3 github, 1/3 discord vs baseline 0.5/0.5;
+    # TVD = 0.5 * (|2/3-1/2| + |1/3-1/2|) = 1/6.
+    assert drift_rec["distribution_distance"] == pytest.approx(1 / 6)
+    # Physical separation holds on the wire record too.
+    assert "precision" not in drift_rec
+    assert "recall" not in drift_rec
+    assert drift_rec["is_slo"] is False
+
+    # Passive: the drift record is emitted exactly once, after the summary chain
+    # (no separate poll/timer produced extra records).
+    assert [r["type"] for r in records].count("drift") == 1
+
+
+def test_scan_all_drift_does_not_touch_summary_verification_slot(tmp_path):
+    """Drift must not leak into the parity/verification summary slot."""
+    (tmp_path / "checkout").mkdir()
+    repos, macro = _ghas_calibrated_macro()
+    baseline = DriftBaseline.from_macro_parity(repos, macro)
+    findings = [_finding(rule_id="github-pat", path="a.py", line_start=1)]
+
+    req = _base_request(
+        tmp_path,
+        findings=findings,
+        drift_config=DriftConfig(enabled=True, baseline=baseline),
+    )
+    run_scan_all(req)
+
+    records = _read_records(tmp_path / "scan-all.log.jsonl")
+    summary = next(r for r in records if r["type"] == "summary")
+    # The summary's verification slot is the parity/verifier channel; drift never
+    # rides in it.
+    verification = summary.get("verification")
+    if verification is not None:
+        assert "drift" not in verification
+        assert "distribution_distance" not in verification

From 75df35487feb092ed9e5b776b9c7d3c5b0878a52 Mon Sep 17 00:00:00 2001
From: pureliture <tkdgur1756@naver.com>
Date: Sun, 21 Jun 2026 12:51:24 +0900
Subject: [PATCH 7/7] =?UTF-8?q?feat(governance):=20M5=20report-only=20pari?=
 =?UTF-8?q?ty=20SLO=20=EA=B2=8C=EC=9D=B4=ED=8A=B8=20=E2=80=94=20governance?=
 =?UTF-8?q?.parity=5Fslo=20--check?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

자율층 M5(자율 goal done). 시크릿 GHAS parity를 frozen synthetic snapshot 대비
재현 측정하는 CI SLO 게이트. threshold 부재 → 영구 report-only(자율층은 실
baseline 없이 목표 못 만들므로). enforce·threshold 커밋은 H3 human-gated.

- governance/parity_slo.py(신규, allowed_writes 유일 governance 파일): three-mode.
  - threshold yml 부재/빈값 → report-only(항상 exit 0, 차단 안 함).
  - threshold 존재 → enforce(macro precision/recall vs precision_min/recall_min).
  - snapshot 나이>임계 또는 fetched_at 부재 → stale-degraded. report-only는
    warn+exit 0, enforce는 hard fail(silent pass 금지, design staleness-passive-only).
  - 측정은 M1(load_parity_snapshot provenance fail-closed → evaluate_repo_parity →
    aggregate_repo_parity, metrics.py 재사용) 소비. 신규 precision/recall 산식 0줄.
  - non-synthetic snapshot은 provenance fail-closed로 게이트 입력 거부(실 GHAS
    export 구동 불가).
- tests/test_governance_parity_slo.py: report-only/enforce(pass·fail)/stale-degraded
  (report-only warn·enforce block)/provenance fail-closed/CLI exit/committed corpus 증명.
- design.md: 현재 상태(SLO enforce 미달성·H-track 대기) + CI 배선 한계(.github
  scope 밖, report-only라 미배선이어도 차단 동일) + Component 표 governance→src
  의존·drift 노출 표면 명문화.

final 아키텍처 리뷰(system + codebase, opus) PASS: blocking 0, 자율 goal done=M5
달성. 검증: uv run pytest 1142 passed, required local checks 8종 전부 green(이제
parity_slo --check 포함), autopilot_gate --base 81d59d0 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TwGs78e6Rb7P5BDe2ezQEh
---
 .../specs/ghas-quality-secrets/design.md      |  17 +-
 governance/parity_slo.py                      | 347 ++++++++++++++++++
 tests/test_governance_parity_slo.py           | 263 +++++++++++++
 3 files changed, 624 insertions(+), 3 deletions(-)
 create mode 100644 governance/parity_slo.py
 create mode 100644 tests/test_governance_parity_slo.py

diff --git a/docs/workbench/specs/ghas-quality-secrets/design.md b/docs/workbench/specs/ghas-quality-secrets/design.md
index 2167723..2da07b4 100644
--- a/docs/workbench/specs/ghas-quality-secrets/design.md
+++ b/docs/workbench/specs/ghas-quality-secrets/design.md
@@ -18,7 +18,14 @@ live-fetch는 stop-condition(`ghas-live-fetch-or-mutation-required`), 커밋은
 
 **done 정의 명확화(리뷰 report-only-enforce-unreachable)**: 자율 goal done = **M5**(머신+harness+
 report-only 게이트, synthetic 증명, PR merge). requirements Q10의 v1 done(baseline 측정+목표 도달)은
-**H1~H3 완료 후에만** 성립. PR merge 시 CURRENT.md에 "SLO enforce 미달성, H-track 대기" 명시.
+**H1~H3 완료 후에만** 성립.
+
+**현재 상태(M5 완료 시점) — SLO enforce 미달성, H-track 대기**: 자율층 M0~M5가 synthetic fixture로
+완성됐고 `governance.parity_slo --check`는 **report-only**(threshold yml 부재 → 항상 exit 0, 차단 안 함).
+실 GHAS snapshot 취득(H1) → baseline 측정·목표 확정(H2) → threshold 커밋·enforce 전환(H3)은 human-gated라
+이 자율 run 범위 밖. enforce는 H3에서 threshold yml(`governance/parity_slo_thresholds.yml`)이 커밋될 때
+자동 활성. (CURRENT.md는 `governance/current.yml`의 render 생성물이라 자유 텍스트 비편집 — 이 상태 표기는
+SoT인 본 design.md와 PR body에 둔다.)
 
 ## Requirements Reference
 
@@ -106,8 +113,8 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
 | 인라인 싼 티어 | finding + path/context | 억제/disposition | **`scanners/gitleaks/{filter,parser}.py`(noise_reason, enable_noise_filter)**, `llm/common/prompt.py` DEFAULT_PATH_ROLE_ANCHORS 어휘 통일 |
 | 비동기 LLM 티어 | 애매 finding | verdict→disposition | `llm/common/verifier.py`, `llm/ollama/client.py`, `runtime/verify_artifact.py` |
 | disposition 배선 | terminal verdict | B-domain write | `runtime/scan_all.py`(기존) + **`runtime/scan_worker.py`(신규 2경로)** |
-| drift monitor | non-GHAS 샘플 | health 신호(분리 필드) | LLM 티어, `runtime/scan_health.py` 또는 notification_log(M4서 택1 명시) |
-| CI SLO gate | frozen snapshot, threshold | report-only/enforce/stale-degraded | **신규 `governance.parity_slo --check`** + metrics gate |
+| drift monitor | non-GHAS 샘플 | health 신호(분리 필드) | `runtime/drift_monitor.py`(M1 `aggregate_repo_parity` 기준선 소비) → `notification_log`(택1 확정). `scan_all`이 default-off 함수-로컬 import로 passive piggyback(circular 회피) |
+| CI SLO gate | frozen snapshot, threshold | report-only/enforce/stale-degraded | **신규 `governance.parity_slo.py`**(M1 `load_parity_snapshot`/`evaluate_repo_parity`/`aggregate_repo_parity` 소비 — governance가 측정 게이트용으로 `security_scanner` 라이브러리에 의존하는 첫 선례, `uv run` 루트 기준 해석). 신규 산식 0줄 |
 
 **Fixed decisions(리뷰 반영):**
 - 인라인 싼 티어는 **기존 `filter.py` noise_reason 확장**(이미 배선됨: `parser.py`에서 import·호출,
@@ -210,6 +217,10 @@ non-GHAS B-floor+C-monitor · no-network measure-first validity · 티어드 자
 - **M5 CI SLO gate(report-only) + stale-degraded** — `governance.parity_slo --check` 배선, threshold
   부재→report-only, snapshot 나이>임계→stale-degraded. _done: CI 측정·리포트, silent staleness 없음.
   final 아키텍처 리뷰 → PR merge. (자율 goal done; v1 done은 H3 후.)_
+  - **게이트는 `governance/parity_slo.py` 신규 + `acceptance_checks`에 `parity_slo --check` 등록(goal.yml)**.
+    `.github/workflows/ci.yml`은 allowed_writes 밖이라 자율 수정 불가 — ci.yml에 한 줄(`uv run python -m
+    governance.parity_slo --check`) 추가는 H-track 또는 사람 PR 후속. report-only라 미배선이어도 차단 효과는
+    동일(항상 exit 0); CI 가시성만 후속에서 확보. PR body에 명시.
 - **H1 실 GHAS snapshot 취득(human-gated)** — `ghas-live-fetch` stop → 사람 PR, 실 redacted snapshot(local 비커밋).
 - **H2 baseline + 목표 + divergence 보고(human-gated)** — 실 snapshot 대비 gap 측정, **fixture-vs-real
   분포 divergence 1회 보고**, measure-first 목표 확정.
diff --git a/governance/parity_slo.py b/governance/parity_slo.py
new file mode 100644
index 0000000..2f004f6
--- /dev/null
+++ b/governance/parity_slo.py
@@ -0,0 +1,347 @@
+"""GHAS secret-parity SLO gate (M5) — report-only until a threshold exists.
+
+This gate measures the secret detector's per-repo GHAS *parity* against frozen
+**synthetic** snapshot fixtures and reports macro precision/recall. It is the
+autonomous-layer CI vehicle for the ``ghas-quality-secrets-parity`` goal.
+
+Two-mode by design (requirements Q10 measure-first; design.md M5):
+
+* **report-only** — the default and the ONLY mode reachable autonomously: when no
+  threshold file exists (or it is empty), the gate prints the measured numbers and
+  ALWAYS exits 0. It never blocks. The real, calibrated thresholds are set only
+  after the human-gated H1~H3 track measures a real baseline, so until then there
+  is nothing legitimate to enforce.
+* **enforce** — reachable only once a human commits a threshold file: macro
+  precision/recall below the committed minimums fail the gate (exit 1). This is the
+  measure-first auto-branch (threshold present ⇒ enforce).
+
+Staleness is surfaced, never silently passed (design ``staleness-passive-only``):
+a snapshot older than the max age is reported as ``stale-degraded``. In
+report-only that is a visible warning (exit 0); in enforce it fails (exit 1) so a
+stale snapshot cannot silently satisfy the gate.
+
+Inputs are SYNTHETIC fixtures only. ``baseline.ghas_api.load_parity_snapshot``
+fails closed unless the snapshot carries ``source: synthetic`` provenance, so a
+real GHAS export can never drive this gate (it would be rejected, and real
+snapshots are gitignored + outside allowed_writes as a second block).
+
+Computation/gate reuse: per-repo precision/recall come straight from
+``core.evaluation.metrics`` via the ``baseline.ghas_api`` adapter; this module
+adds NO new precision/recall formula — it only loads snapshots, aggregates, reads
+an optional threshold, and judges report-only vs enforce vs stale.
+"""
+
+from __future__ import annotations
+
+import argparse
+import datetime as dt
+import json
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from security_scanner.baseline.ghas_api.normalize import (
+    DEFAULT_SECRET_TYPE_MAP,
+    SecretTypeNormalizer,
+)
+from security_scanner.baseline.ghas_api.parity import (
+    MacroParityResult,
+    ParitySnapshot,
+    aggregate_repo_parity,
+    evaluate_repo_parity,
+    load_parity_snapshot,
+)
+
+DEFAULT_SNAPSHOT_DIR = Path("eval/ghas-parity-corpus")
+DEFAULT_THRESHOLD_PATH = Path("governance/parity_slo_thresholds.yml")
+
+# A snapshot older than this is reported as stale-degraded. Synthetic fixtures
+# have no real freshness obligation, so the default is generous; the real cadence
+# SLA is set by the human-gated H3 step.
+DEFAULT_MAX_SNAPSHOT_AGE_DAYS = 90
+
+
+@dataclass(frozen=True)
+class ParitySloThresholds:
+    """Calibrated minimums. Absent until the human-gated H-track commits them."""
+
+    precision_min: float
+    recall_min: float
+
+
+@dataclass(frozen=True)
+class ParitySloResult:
+    """Outcome of one parity-SLO evaluation pass."""
+
+    mode: str  # "report-only" | "enforce"
+    macro: MacroParityResult
+    snapshot_count: int
+    stale: bool
+    stale_snapshots: tuple[str, ...]
+    thresholds: ParitySloThresholds | None
+    failures: tuple[str, ...]
+
+    @property
+    def passed(self) -> bool:
+        """Whether the gate should exit 0.
+
+        report-only never blocks (exit 0 even when stale or below target — there
+        is no committed target to enforce yet). enforce blocks on any failure,
+        including a stale snapshot (staleness must not silently pass).
+        """
+        if self.mode == "report-only":
+            return True
+        return not self.failures
+
+
+def load_thresholds(path: Path) -> ParitySloThresholds | None:
+    """Load calibrated thresholds, or None when absent/empty (report-only)."""
+    if not path.exists():
+        return None
+    raw = path.read_text(encoding="utf-8").strip()
+    if not raw:
+        return None
+    data = yaml.safe_load(raw)
+    if not isinstance(data, dict) or not data:
+        return None
+    try:
+        precision_min = float(data["precision_min"])
+        recall_min = float(data["recall_min"])
+    except (KeyError, TypeError, ValueError) as exc:
+        raise ValueError(
+            "parity_slo thresholds must define numeric precision_min and recall_min"
+        ) from exc
+    return ParitySloThresholds(precision_min=precision_min, recall_min=recall_min)
+
+
+def discover_snapshots(snapshot_dir: Path) -> list[Path]:
+    """Return committed synthetic snapshot fixture files (sorted, deterministic)."""
+    if not snapshot_dir.exists():
+        return []
+    return sorted(snapshot_dir.glob("*snapshot*.json"))
+
+
+def _snapshot_is_stale(
+    snapshot: ParitySnapshot, *, now: dt.datetime, max_age_days: int
+) -> bool:
+    """True when the snapshot's fetched_at is older than the max age.
+
+    A snapshot with no parseable fetched_at is treated as stale (unknown age must
+    not silently pass — design staleness-passive-only).
+    """
+    if not snapshot.fetched_at:
+        return True
+    parsed = _parse_timestamp(snapshot.fetched_at)
+    if parsed is None:
+        return True
+    age = now - parsed
+    return age > dt.timedelta(days=max_age_days)
+
+
+def _parse_timestamp(value: str) -> dt.datetime | None:
+    text = value.strip()
+    if text.endswith("Z"):
+        text = text[:-1] + "+00:00"
+    try:
+        parsed = dt.datetime.fromisoformat(text)
+    except ValueError:
+        return None
+    if parsed.tzinfo is None:
+        parsed = parsed.replace(tzinfo=dt.timezone.utc)
+    return parsed
+
+
+def evaluate_parity_slo(
+    *,
+    snapshot_dir: Path = DEFAULT_SNAPSHOT_DIR,
+    threshold_path: Path = DEFAULT_THRESHOLD_PATH,
+    now: dt.datetime | None = None,
+    max_age_days: int = DEFAULT_MAX_SNAPSHOT_AGE_DAYS,
+) -> ParitySloResult:
+    """Measure macro parity over synthetic snapshots and judge the SLO mode."""
+    now = now or dt.datetime.now(dt.timezone.utc)
+    thresholds = load_thresholds(threshold_path)
+    mode = "enforce" if thresholds is not None else "report-only"
+
+    normalizer = SecretTypeNormalizer(DEFAULT_SECRET_TYPE_MAP)
+    snapshot_paths = discover_snapshots(snapshot_dir)
+
+    repo_results = []
+    stale_snapshots: list[str] = []
+    for path in snapshot_paths:
+        # load_parity_snapshot fails closed on non-synthetic provenance.
+        snapshot = load_parity_snapshot(path)
+        if _snapshot_is_stale(snapshot, now=now, max_age_days=max_age_days):
+            stale_snapshots.append(path.name)
+        repo_results.append(
+            evaluate_repo_parity(
+                repo_full_name=snapshot.repo_full_name,
+                alerts=snapshot.alerts,
+                findings=snapshot.findings,
+                normalizer=normalizer,
+            )
+        )
+
+    macro = aggregate_repo_parity(repo_results)
+    stale = bool(stale_snapshots)
+
+    failures: list[str] = []
+    if thresholds is not None:
+        if macro.macro_precision < thresholds.precision_min:
+            failures.append(
+                f"macro precision {macro.macro_precision:.4f} < minimum "
+                f"{thresholds.precision_min:.4f}"
+            )
+        if macro.macro_recall < thresholds.recall_min:
+            failures.append(
+                f"macro recall {macro.macro_recall:.4f} < minimum "
+                f"{thresholds.recall_min:.4f}"
+            )
+        if stale:
+            # In enforce mode a stale snapshot is a hard failure: it must not
+            # silently satisfy the gate.
+            failures.append(
+                "stale-degraded: snapshot(s) older than "
+                f"{max_age_days}d: {', '.join(stale_snapshots)}"
+            )
+
+    return ParitySloResult(
+        mode=mode,
+        macro=macro,
+        snapshot_count=len(snapshot_paths),
+        stale=stale,
+        stale_snapshots=tuple(stale_snapshots),
+        thresholds=thresholds,
+        failures=tuple(failures),
+    )
+
+
+def render_report(result: ParitySloResult) -> str:
+    """Render a public-safe, aggregate-only parity-SLO report."""
+    lines = [
+        "GHAS Secret Parity SLO",
+        "======================",
+        f"Mode: {result.mode}",
+        f"Snapshots measured: {result.snapshot_count}",
+        f"Repos: {result.macro.repo_count}",
+        f"Macro precision: {result.macro.macro_precision:.4f}",
+        f"Macro recall: {result.macro.macro_recall:.4f}",
+        f"Type-unmatched-but-colocated: {result.macro.total_type_unmatched_but_colocated}",
+        f"GHAS-confirmed FP: {result.macro.total_ghas_confirmed_fp}",
+    ]
+    if result.thresholds is not None:
+        lines.append(
+            f"Thresholds: precision_min {result.thresholds.precision_min:.4f}, "
+            f"recall_min {result.thresholds.recall_min:.4f}"
+        )
+    else:
+        lines.append(
+            "Thresholds: none committed (report-only; enforce pending H-track)"
+        )
+    if result.stale:
+        lines.append(f"Stale-degraded: {', '.join(result.stale_snapshots)}")
+    if result.mode == "report-only":
+        lines.append("Result: REPORT-ONLY (never blocks; measure-first)")
+    elif result.failures:
+        lines.append("Result: FAIL")
+        for failure in result.failures:
+            lines.append(f"  - {failure}")
+    else:
+        lines.append("Result: PASS")
+    return "\n".join(lines) + "\n"
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--root", type=Path, default=Path.cwd())
+    parser.add_argument(
+        "--snapshot-dir",
+        type=Path,
+        default=DEFAULT_SNAPSHOT_DIR,
+        help="directory of committed synthetic snapshot fixtures",
+    )
+    parser.add_argument(
+        "--threshold-path",
+        type=Path,
+        default=DEFAULT_THRESHOLD_PATH,
+        help="optional calibrated threshold yml (absent => report-only)",
+    )
+    parser.add_argument(
+        "--max-age-days",
+        type=int,
+        default=DEFAULT_MAX_SNAPSHOT_AGE_DAYS,
+        help="snapshot age beyond which it is stale-degraded",
+    )
+    parser.add_argument(
+        "--check",
+        action="store_true",
+        help="evaluate and report; exit non-zero only in enforce mode failure",
+    )
+    parser.add_argument(
+        "--json", action="store_true", help="emit a machine-readable JSON summary"
+    )
+    args = parser.parse_args(argv)
+
+    root = args.root.resolve()
+    snapshot_dir = (
+        args.snapshot_dir
+        if args.snapshot_dir.is_absolute()
+        else root / args.snapshot_dir
+    )
+    threshold_path = (
+        args.threshold_path
+        if args.threshold_path.is_absolute()
+        else root / args.threshold_path
+    )
+
+    try:
+        result = evaluate_parity_slo(
+            snapshot_dir=snapshot_dir,
+            threshold_path=threshold_path,
+            max_age_days=args.max_age_days,
+        )
+    except Exception as exc:  # noqa: BLE001 - present any setup/provenance error.
+        print(f"parity_slo gate setup failed: {exc}", file=sys.stderr)
+        return 1
+
+    if args.json:
+        print(json.dumps(_result_to_dict(result), indent=2, sort_keys=True))
+    else:
+        print(render_report(result))
+
+    if result.passed:
+        return 0
+    for failure in result.failures:
+        print(f"parity_slo: {failure}", file=sys.stderr)
+    return 1
+
+
+def _result_to_dict(result: ParitySloResult) -> dict[str, Any]:
+    return {
+        "mode": result.mode,
+        "snapshotCount": result.snapshot_count,
+        "repoCount": result.macro.repo_count,
+        "macroPrecision": result.macro.macro_precision,
+        "macroRecall": result.macro.macro_recall,
+        "typeUnmatchedButColocated": result.macro.total_type_unmatched_but_colocated,
+        "ghasConfirmedFp": result.macro.total_ghas_confirmed_fp,
+        "stale": result.stale,
+        "staleSnapshots": list(result.stale_snapshots),
+        "thresholds": (
+            None
+            if result.thresholds is None
+            else {
+                "precisionMin": result.thresholds.precision_min,
+                "recallMin": result.thresholds.recall_min,
+            }
+        ),
+        "failures": list(result.failures),
+        "passed": result.passed,
+    }
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/test_governance_parity_slo.py b/tests/test_governance_parity_slo.py
new file mode 100644
index 0000000..6597b2f
--- /dev/null
+++ b/tests/test_governance_parity_slo.py
@@ -0,0 +1,263 @@
+"""Tests for the M5 GHAS secret-parity SLO gate (report-only until threshold).
+
+These exercise the three documented modes — report-only (no threshold),
+enforce (threshold committed), and stale-degraded (snapshot too old) — plus the
+provenance fail-closed guard on the snapshot input. All fixtures are synthetic
+and written into a tmp dir so the committed corpus is never the subject.
+"""
+
+from __future__ import annotations
+
+import datetime as dt
+import json
+from pathlib import Path
+
+import pytest
+
+from governance.parity_slo import (
+    discover_snapshots,
+    evaluate_parity_slo,
+    load_thresholds,
+    main,
+)
+
+NOW = dt.datetime(2026, 6, 21, 12, 0, tzinfo=dt.timezone.utc)
+
+
+def _snapshot_dict(
+    *,
+    repo: str = "synthetic-org/repo",
+    fetched_at: str = "2026-06-20T12:00:00+00:00",
+    matched: bool = True,
+) -> dict:
+    # One github-pat alert and (when matched) one local finding at the same
+    # location/normalized type, so macro precision/recall = 1.0; when not matched
+    # the local finding is omitted so recall drops (used for the enforce-fail case).
+    findings = []
+    if matched:
+        findings = [
+            {
+                "ruleId": "github-pat",
+                "filePath": "src/config/settings.py",
+                "lineStart": 10,
+                "fakeSecretMarker": "SCANNER_FAKE_SECRET_TOKEN_000001",
+            }
+        ]
+    return {
+        "schemaVersion": 1,
+        "source": "synthetic",
+        "repoFullName": repo,
+        "fetchedAt": fetched_at,
+        "alerts": [
+            {
+                "alertNumber": 1,
+                "secretType": "github_personal_access_token",
+                "state": "open",
+                "filePath": "src/config/settings.py",
+                "lineStart": 10,
+                "lineEnd": 10,
+            }
+        ],
+        "findings": findings,
+    }
+
+
+def _write_snapshot(directory: Path, data: dict, name: str = "synthetic-snapshot.json") -> Path:
+    directory.mkdir(parents=True, exist_ok=True)
+    path = directory / name
+    path.write_text(json.dumps(data), encoding="utf-8")
+    return path
+
+
+# --------------------------------------------------------------------------- #
+# report-only (no threshold)                                                  #
+# --------------------------------------------------------------------------- #
+
+
+def test_report_only_when_no_threshold_file(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict())
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir,
+        threshold_path=tmp_path / "absent.yml",
+        now=NOW,
+    )
+
+    assert result.mode == "report-only"
+    assert result.passed is True  # report-only NEVER blocks
+    assert result.macro.macro_precision == 1.0
+    assert result.macro.macro_recall == 1.0
+
+
+def test_report_only_passes_even_when_below_would_be_target(tmp_path):
+    # A recall miss (unmatched) in report-only still exits 0: there is no committed
+    # target to enforce yet (measure-first).
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict(matched=False))
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=tmp_path / "absent.yml", now=NOW
+    )
+
+    assert result.mode == "report-only"
+    assert result.macro.macro_recall < 1.0
+    assert result.passed is True
+
+
+def test_empty_threshold_file_is_report_only(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict())
+    threshold = tmp_path / "thresholds.yml"
+    threshold.write_text("", encoding="utf-8")
+
+    assert load_thresholds(threshold) is None
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=threshold, now=NOW
+    )
+    assert result.mode == "report-only"
+
+
+# --------------------------------------------------------------------------- #
+# enforce (threshold committed)                                               #
+# --------------------------------------------------------------------------- #
+
+
+def test_enforce_passes_when_macro_meets_threshold(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict())
+    threshold = tmp_path / "thresholds.yml"
+    threshold.write_text("precision_min: 0.9\nrecall_min: 0.9\n", encoding="utf-8")
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=threshold, now=NOW
+    )
+
+    assert result.mode == "enforce"
+    assert result.passed is True
+    assert result.failures == ()
+
+
+def test_enforce_fails_when_macro_below_threshold(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict(matched=False))  # recall < 1.0
+    threshold = tmp_path / "thresholds.yml"
+    threshold.write_text("precision_min: 0.9\nrecall_min: 0.99\n", encoding="utf-8")
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=threshold, now=NOW
+    )
+
+    assert result.mode == "enforce"
+    assert result.passed is False
+    assert any("recall" in f for f in result.failures)
+
+
+# --------------------------------------------------------------------------- #
+# stale-degraded (snapshot too old)                                           #
+# --------------------------------------------------------------------------- #
+
+
+def test_stale_in_report_only_warns_but_passes(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict(fetched_at="2025-01-01T00:00:00+00:00"))
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir,
+        threshold_path=tmp_path / "absent.yml",
+        now=NOW,
+        max_age_days=90,
+    )
+
+    assert result.stale is True
+    assert result.mode == "report-only"
+    assert result.passed is True  # surfaced, not silently passed, but not blocking
+
+
+def test_stale_in_enforce_fails_not_silent_pass(tmp_path):
+    # design staleness-passive-only: a stale snapshot must NOT silently satisfy an
+    # enforcing gate even when the numbers look fine.
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict(fetched_at="2025-01-01T00:00:00+00:00"))
+    threshold = tmp_path / "thresholds.yml"
+    threshold.write_text("precision_min: 0.9\nrecall_min: 0.9\n", encoding="utf-8")
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=threshold, now=NOW, max_age_days=90
+    )
+
+    assert result.stale is True
+    assert result.mode == "enforce"
+    assert result.passed is False
+    assert any("stale-degraded" in f for f in result.failures)
+
+
+def test_missing_fetched_at_is_treated_as_stale(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    data = _snapshot_dict()
+    del data["fetchedAt"]
+    _write_snapshot(snap_dir, data)
+
+    result = evaluate_parity_slo(
+        snapshot_dir=snap_dir, threshold_path=tmp_path / "absent.yml", now=NOW
+    )
+    assert result.stale is True
+
+
+# --------------------------------------------------------------------------- #
+# provenance fail-closed                                                       #
+# --------------------------------------------------------------------------- #
+
+
+def test_non_synthetic_snapshot_fails_closed(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    data = _snapshot_dict()
+    data["source"] = "real"  # not synthetic -> load must fail closed
+    _write_snapshot(snap_dir, data)
+
+    with pytest.raises(Exception):
+        evaluate_parity_slo(
+            snapshot_dir=snap_dir, threshold_path=tmp_path / "absent.yml", now=NOW
+        )
+
+
+# --------------------------------------------------------------------------- #
+# CLI exit codes + committed corpus                                           #
+# --------------------------------------------------------------------------- #
+
+
+def test_cli_check_report_only_exits_zero(tmp_path, capsys):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict())
+
+    code = main(
+        [
+            "--check",
+            "--snapshot-dir",
+            str(snap_dir),
+            "--threshold-path",
+            str(tmp_path / "absent.yml"),
+        ]
+    )
+    out = capsys.readouterr().out
+    assert code == 0
+    assert "report-only" in out
+
+
+def test_committed_corpus_runs_report_only(tmp_path):
+    # The committed eval/ghas-parity-corpus snapshot must drive the gate in
+    # report-only with no committed thresholds (autonomous layer is always
+    # report-only).
+    result = evaluate_parity_slo(threshold_path=tmp_path / "absent.yml", now=NOW)
+    assert result.mode == "report-only"
+    assert result.snapshot_count >= 1
+    assert result.passed is True
+
+
+def test_discover_snapshots_is_deterministic(tmp_path):
+    snap_dir = tmp_path / "corpus"
+    _write_snapshot(snap_dir, _snapshot_dict(), name="b-snapshot.json")
+    _write_snapshot(snap_dir, _snapshot_dict(), name="a-snapshot.json")
+
+    found = discover_snapshots(snap_dir)
+    assert [p.name for p in found] == ["a-snapshot.json", "b-snapshot.json"]