@@ -0,0 +1,186 @@
# Agentic Workflow: GHAS-grade secret detection quality (parity SLO)

**Status:** Ready for long single-goal execution
**Date:** 2026-06-21
**Goal ID:** `ghas-quality-secrets-parity`
**Spec:** `docs/workbench/specs/ghas-quality-secrets/{requirements,design,review}.md`
**Merge flow:** pull request

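Execution packet for a long-running, single-goal run. It builds the quality machinery and measurement
harness that bring secret detection up to the **GHAS parity SLO**. Because a real GHAS live fetch is a
stop condition, the autonomous layer proves its work only against **synthetic redacted snapshot fixtures**;
real GHAS acquisition, baseline measurement, and enforcement are isolated as human-gated steps (H1~H3).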

## Goal

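Build, test-first against synthetic fixtures, a per-repo 1:1 GHAS parity measurement harness for secret
detection, a tiered false-positive (FP) suppression quality machine, and a report-only CI SLO gate, then
close the work out through PR, CI, and merge.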

**Completion criteria (autonomous goal done = M5):**

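- **M1**: a per-repo precision/recall parity harness on top of `core/evaluation/metrics.py`.
`baseline/ghas_api` serves only as the GHAS alert→`EvaluationKey` adapter. Includes a
`secret_type↔rule_id` normalization map plus type-coverage metadata; state-aware truth (only open and
resolved-TP alerts count as positives; dismissed alerts are kept separate); line tolerance (range overlap
or ±k); and a full-history universe. **Zero new lines of precision/recall or gate computation code (reuse
metrics).** On the adversarial fixtures (type-mismatch, line-drift, dismissed), any omission must turn the
tests red.
- **M2**: inline cheap tier = extend `noise_reason` in `scanners/gitleaks/filter.py` (path-role/context-class
plus partner-pattern). The deterministic, no-network part is default-on; only behavior-changing parts are
gated. State the boundary between the scan-time filter (finding never created) and post-scan disposition
(FALSE_POSITIVE) explicitly. 11-FP suppression, canary TP preservation, existing defaults unchanged.
- **M3**: automatic wiring of LLM-tier dispositions into `runtime/scan_all.py` (existing) and
**`runtime/scan_worker.py` (new)**. The worker is the per-job hot path, so it runs only the inline cheap
tier synchronously; LLM verdicts for ambiguous findings go to a separate async queue or follow-up job.
NEEDS_REVIEW gets re-verify backoff and a skip-key. Covers secretHash salt provenance.
- **M4**: non-GHAS drift monitor. Uses the GHAS-calibrated distribution (M1) as the baseline and crosses it
with a verifier-orthogonal distribution-shift signal. Reuses `eval/synthetic-corpus` as input. Reported in a
field separate from the parity metric. Default-off and inactive offline. No new polling schedule.
- **M5**: report-only CI SLO gate = new `governance.parity_slo --check`. Threshold yml absent → report-only.
Snapshot age > threshold → `stale-degraded` (no silent pass).
- The existing Gitleaks-first secret default path is unchanged. No GHAS trigger, upload, mutation, or live fetch.
- No blocking findings at the architecture review gates (pre, post-M2, post-M3, final). PR CI and the local
governance gates pass.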
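
The state-aware truth rule in M1 can be sketched as a small partition function. This is a minimal illustration, not the repo's implementation: the `Alert` dataclass, its `state` and `resolved_as_tp` fields, and `partition_truth` are hypothetical names, and the real GHAS alert schema and `EvaluationKey` shape may differ. It deliberately computes no precision or recall, since that stays in `metrics.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified stand-in for one alert in a synthetic snapshot."""
    alert_id: str
    state: str                    # assumed values: "open", "resolved", "dismissed"
    resolved_as_tp: bool = False  # assumed flag: resolved as a true positive


def partition_truth(alerts):
    """Split alerts into ground-truth positives and a separately kept bucket.

    Only open alerts and resolved true positives count as positives.
    Everything else (including dismissed alerts) goes to `excluded`,
    so it never enters the positive set.
    """
    positives, excluded = [], []
    for alert in alerts:
        is_positive = alert.state == "open" or (
            alert.state == "resolved" and alert.resolved_as_tp
        )
        (positives if is_positive else excluded).append(alert)
    return positives, excluded


snapshot = [
    Alert("a1", "open"),
    Alert("a2", "resolved", resolved_as_tp=True),
    Alert("a3", "dismissed"),
    Alert("a4", "resolved"),  # resolved, but not as a true positive
]
positives, excluded = partition_truth(snapshot)
```

Here `positives` holds `a1` and `a2`, while `a3` and `a4` land in `excluded`. How the real harness treats alerts that were resolved but not as true positives is not stated in this document, so putting them in `excluded` is an assumption.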

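**Human-gated operations layer (outside the autonomous loop, stop-condition PRs):** H1 acquire a real GHAS
snapshot (local, never committed) → H2 measure the baseline, report fixture-vs-real divergence, and fix the
measure-first target → H3 commit the threshold and switch to enforce.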

## Execution Contract

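- Run M0~M5 to completion as one long-running goal. No user approval at intermediate milestones; humans
intervene only on a stop condition.
- Use subagents actively. Implementation workers run `gpt-5.5` with reasoning_effort high; auxiliary
coding and review follow repo policy.
- Open a PR and carry it until CI passes and it is mergeable. Runtime mutation is allowed, but committed
artifacts must be synthetic or redacted.
- Never commit real endpoints, hosts, credentials, private paths, real GHAS exports, or real findings.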

## Fixed Decisions

- Scope: secrets subtrack M1~M5. vuln/SAST is a separate goal (`.claude/specs/20260621-ghas-quality-vuln-subtrack/`).
- Measurement: GHAS parity SLO, per-repo 1:1, snapshot = ground truth (frozen synthetic). Computation reuses
`metrics.py`; ghas_api is the adapter. Per Q3, precision cannot exceed GHAS (parity = "as good as GHAS").
- Validity check: no-network (verifier + heuristics + partner-pattern). Live validity is deferred, evidence-gated.
- Inline cheap tier: extend the existing `filter.py` (deterministic part default-on); only new behavior is gated.
- LLM tier: a verifier/explainer, not a detector. Redacted input (no raw snippets), strict JSON, fail-closed
to NEEDS_REVIEW (no write + re-verify backoff). Default-off, gated.
- Snapshot: commit only synthetic redacted fixtures. The `source: synthetic` provenance marker is required;
without it, fail closed. Real snapshots are blocked twice: by `.gitignore` and by exclusion from allowed_writes.
- CI gate: threshold absent → report-only, present → enforce (automatic measure-first branching). The
autonomous layer is always report-only.
- **No autonomous edits to core governance**: allowed_writes covers only `governance/parity_slo.py`. If
`autopilot_goal.yml`, `autopilot_gate.py`, or `public_safety.py` needs a change, stop (scope-expansion) → human PR.
- GHAS: out of scope (autonomous). Live fetch, upload, and mutation are forbidden → human-gated H1~H3.
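
Two of the fail-closed decisions above (the snapshot provenance marker and the LLM tier's strict JSON verdict) can be illustrated together. This is a rough sketch under assumptions: the function names, the `verdict` key, and the `TRUE_POSITIVE` label are hypothetical; only the `source: synthetic` marker, the `FALSE_POSITIVE` and `NEEDS_REVIEW` labels, and the fail-closed behavior come from this document.

```python
import json

# Assumed verdict vocabulary; TRUE_POSITIVE is a hypothetical label.
ALLOWED_VERDICTS = {"TRUE_POSITIVE", "FALSE_POSITIVE", "NEEDS_REVIEW"}


def require_synthetic(snapshot: dict) -> dict:
    """Refuse any snapshot that lacks the `source: synthetic` marker."""
    if snapshot.get("source") != "synthetic":
        raise ValueError("snapshot lacks 'source: synthetic' marker; refusing to load")
    return snapshot


def parse_verdict(raw: str) -> str:
    """Parse an LLM reply as strict JSON; anything unexpected becomes NEEDS_REVIEW."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "NEEDS_REVIEW"
    if not isinstance(payload, dict):
        return "NEEDS_REVIEW"
    verdict = payload.get("verdict")
    if isinstance(verdict, str) and verdict in ALLOWED_VERDICTS:
        return verdict
    return "NEEDS_REVIEW"
```

Both functions default to the safe outcome: an unmarked snapshot raises instead of loading, and a malformed or unknown verdict never resolves to a disposition that would be written.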

## Required Architecture Review Gate

Mandatory, blocking. Checkpoints: (1) pre-implementation (2) post-M2 (3) post-M3 (4) final.
Stop only when a blocking finding requires an SoT change, scope expansion, unsafe data, or a secret default
change; otherwise fix it within the same goal.

## Multi-agent Execution Model

Assign subagents disjoint responsibilities. The main agent integrates and makes the final call.

| Role | Responsibility | Write scope |
| --- | --- | --- |
| `system_architecture_manager` | Architecture gate, SoT drift, soundness of the measurement methodology | read-only |
| `codebase_architecture_manager` | seam/locality (metrics reuse, scan_worker two paths, filter seam) | read-only |
| Worker A | parity harness + normalization map + adversarial fixtures | `src/security_scanner/baseline/**`, `eval/**`, tests |
| Worker B | inline cheap tier (filter.py extension) | `src/security_scanner/scanners/gitleaks/**`, tests |
| Worker C | LLM-tier disposition wiring (scan_all + scan_worker) | `src/security_scanner/runtime/**`, `llm/**`, tests |
| Worker D | drift monitor + CI parity_slo gate | `governance/parity_slo.py`, `src/security_scanner/runtime/scan_health*`, tests |
| Reviewer | public-safety/security review | read-only |
| `code_simplifier` | final clarity pass (behavior-preserving) | touched files only |

## Allowed Write Surface

`allowed_writes` in `governance/autopilot_goal.yml` is authoritative. Summary: the promoted specs, this
workflow document, src/tests/eval/examples, `governance/parity_slo.py` (the new gate only), the ledger, and
CURRENT.md. **This is not a blanket `governance/**` grant**: changing any other governance file is a scope
expansion and stops the run.
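
As an illustration of how a narrow write surface differs from a blanket `governance/**` grant, the check below flags changed paths that match no allowed pattern. The pattern list and the `out_of_scope` helper are hypothetical and are not the contents of `governance/autopilot_goal.yml`, which remains authoritative.

```python
from fnmatch import fnmatchcase

# Illustrative patterns only; the authoritative list lives in
# governance/autopilot_goal.yml and may look different.
ALLOWED_WRITES = [
    "src/*",
    "tests/*",
    "eval/*",
    "governance/parity_slo.py",  # the single governance file in scope
]


def out_of_scope(changed_paths, patterns=ALLOWED_WRITES):
    """Return the changed paths that match no allowed pattern.

    fnmatchcase lets `*` match across `/`, so "src/*" also covers nested files.
    """
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


violations = out_of_scope([
    "src/security_scanner/scanners/gitleaks/filter.py",
    "governance/parity_slo.py",
    "governance/autopilot_gate.py",
])
```

With these example patterns, only `governance/autopilot_gate.py` is reported, which is the case this document treats as a scope expansion.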

## Suggested Work Plan

### Readiness (M0)
1. Read the contracts: `AGENTS.md`, `governance/autopilot_goal.yml`, this document, and the three specs
(requirements/design/review).
2. Run the pre-implementation architecture review.
3. Isolate the worktree and confirm the write surface.

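### M1 parity harness + normalization + adversarial fixtures
1. red-first: entries missing from the normalization map are split into a `type-unmatched` bucket;
state-aware truth excludes dismissed alerts from the denominator; line tolerance matches within ±k;
precision/recall come from `metrics.py`; the adversarial fixtures turn omissions red.
2. Implementation: ghas_api alert→EvaluationKey adapter, normalization map, harness reusing metrics.
**Zero new lines of computation code.**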
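
The two matching rules in the M1 red-first list (the `type-unmatched` bucket and ±k line tolerance) can be sketched as follows. The mapping entries, function names, and the `comparable` bucket label are illustrative assumptions rather than the repo's actual normalization map or adapter API.

```python
# Illustrative mapping from a GHAS secret_type to a scanner rule_id.
# The real normalization map in the repo may use different entries.
TYPE_TO_RULE = {
    "aws_access_key_id": "aws-access-token",
    "github_personal_access_token": "github-pat",
}


def bucket_alert(secret_type, mapping=TYPE_TO_RULE):
    """Return (rule_id, bucket); types missing from the map go to `type-unmatched`."""
    rule_id = mapping.get(secret_type)
    if rule_id is None:
        return None, "type-unmatched"
    return rule_id, "comparable"


def ranges_match(a_start, a_end, b_start, b_end, k=0):
    """True if two inclusive line ranges overlap or sit within k lines of each other."""
    return a_start <= b_end + k and b_start <= a_end + k
```

For example, `ranges_match(10, 12, 14, 15, k=2)` holds because the gap between line 12 and line 14 is within the tolerance, while the same ranges with `k=1` do not match. Keeping unmapped types in their own bucket means a gap in the map shows up as a visible count instead of a silent recall loss.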

### M2 inline cheap tier
1. red-first: path-role/context-class suppression, partner-pattern high confidence, canary TP preservation,
existing defaults unchanged.
2. Implementation: extend filter.py noise_reason. State the scan-time vs post-scan boundary. Post-M2
architecture review.
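
A deterministic, no-network path-role check of the kind M2 describes might look like the sketch below. Everything here is hypothetical: `classify_noise`, the directory-to-role table, and the partner rule set are not taken from `filter.py`, and treating partner-pattern matches as never suppressed is one assumed way to keep high-confidence findings intact.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical directory names mapped to a path role.
NOISY_DIRS = {
    "tests": "test",
    "test": "test",
    "fixtures": "test",
    "docs": "docs",
    "examples": "docs",
}

# Hypothetical set of high-confidence partner-pattern rules.
PARTNER_RULES = {"github-pat"}


def classify_noise(path: str, rule_id: str) -> Optional[str]:
    """Return a noise reason such as "path-role:test", or None to keep the finding."""
    if rule_id in PARTNER_RULES:
        return None  # assumed policy: partner patterns are never path-suppressed
    # Inspect directory components only, not the file name itself.
    for part in PurePosixPath(path).parts[:-1]:
        role = NOISY_DIRS.get(part)
        if role is not None:
            return f"path-role:{role}"
    return None
```

A generic rule firing under `tests/` yields `"path-role:test"`, while the same path with a partner-pattern rule yields `None`, which is the shape a canary-preservation test would assert on.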

### M3 LLM-tier disposition (scan_all + scan_worker)
1. red-first: scan_worker sync inline + async LLM queue; NEEDS_REVIEW no-write + backoff; disposition
durable write.
2. Implementation: wire both paths into scan_worker. Post-M3 architecture review.
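
The two-path wiring for the worker can be pictured as below: the cheap tier runs inline, and only ambiguous findings are handed off. This is a sketch under assumptions. `handle_finding`, the `"keep"`/`"suppress"`/`"ambiguous"` contract, the in-process `queue.Queue` standing in for the async queue, and the exponential backoff constants are all hypothetical.

```python
import queue

# Stand-in for the separate async queue or follow-up job.
llm_queue = queue.Queue()


def handle_finding(finding: dict, cheap_tier) -> str:
    """Run the cheap tier synchronously; defer ambiguous findings.

    `cheap_tier` is assumed to return "keep", "suppress", or "ambiguous".
    No LLM call happens on this hot path.
    """
    decision = cheap_tier(finding)
    if decision == "ambiguous":
        llm_queue.put(finding)
        return "deferred"
    return decision


def retry_delay_s(attempt: int, base_s: int = 60, cap_s: int = 3600) -> int:
    """Capped exponential backoff for re-verifying a NEEDS_REVIEW finding."""
    return min(cap_s, base_s * 2 ** attempt)


def example_tier(finding: dict) -> str:
    """Toy cheap tier used only for this illustration."""
    return finding.get("hint", "ambiguous")


results = [
    handle_finding({"id": "f1", "hint": "keep"}, example_tier),
    handle_finding({"id": "f2"}, example_tier),
]
```

After this runs, `results` is `["keep", "deferred"]` and the queue holds only `f2`. The backoff grows from 60 seconds and stops at the one-hour cap, so a finding that keeps returning NEEDS_REVIEW is not re-verified on every scan.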

### M4 drift monitor
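1. red-first: prove the baseline is GHAS-derived, the SLO is not contaminated, the field is separate, and no
new polling is introduced.
2. Implementation: reuse eval/synthetic-corpus + verifier-orthogonal distribution shift.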
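
This document does not name the distribution-shift measure, so the sketch below substitutes total variation distance between two rule-id distributions purely as an example. The function names, the sample rule ids, and the `drift` report key are assumptions.

```python
from collections import Counter


def distribution(rule_ids):
    """Turn a list of rule ids into a relative-frequency distribution."""
    counts = Counter(rule_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: n / total for rule, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Baseline stands in for the GHAS-calibrated distribution from M1.
baseline = distribution(["aws", "aws", "github", "generic"])
current = distribution(["aws", "generic", "generic", "generic"])

# Drift is reported under its own key, separate from any parity metric.
report = {"drift": {"tv_distance": total_variation(baseline, current)}}
```

For these toy inputs the distance is 0.5. Reporting it under a dedicated `drift` key mirrors the requirement that drift never leaks into the parity SLO figures.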

### M5 CI SLO gate (report-only)
1. red-first: threshold absent → report-only; age > threshold → stale-degraded.
2. Implementation: `governance/parity_slo.py`. Final architecture review → PR. Record "SLO enforce not
achieved, awaiting H-track" in CURRENT.md.
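
The gate's mode selection can be summarized in a few lines. This is a sketch, not the contents of `governance/parity_slo.py`: `gate_mode` and its parameters are hypothetical, and checking staleness before the threshold file is an assumed precedence that this document does not spell out.

```python
from datetime import datetime, timedelta, timezone


def gate_mode(
    threshold_exists: bool,
    snapshot_taken_at: datetime,
    max_age: timedelta,
    now: datetime,
) -> str:
    """Pick the gate mode from threshold presence and snapshot age.

    A stale snapshot is reported as `stale-degraded` rather than passing
    silently. Without a threshold file the gate only reports.
    """
    if now - snapshot_taken_at > max_age:
        return "stale-degraded"
    if not threshold_exists:
        return "report-only"
    return "enforce"


now = datetime(2026, 6, 21, tzinfo=timezone.utc)
fresh = now - timedelta(days=1)
old = now - timedelta(days=90)
limit = timedelta(days=30)

modes = [
    gate_mode(False, fresh, limit, now),
    gate_mode(True, fresh, limit, now),
    gate_mode(False, old, limit, now),
]
```

The three calls yield `report-only`, `enforce`, and `stale-degraded` in that order. In the autonomous layer the threshold file never exists, so only the first and third outcomes are reachable.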

## Required Local Checks

```bash
uv run pytest
uv run python -m governance.render --validate
uv run python -m governance.render --check
uv run python -m governance.rebuild_ledger_index --check
uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
uv run python -m governance.public_safety --diff origin/main...HEAD
uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-secrets
uv run python -m governance.parity_slo --check
uv run python -m governance.autopilot_gate --base origin/main
```

## Stop Conditions

The `stop_conditions` in `governance/autopilot_goal.yml` are authoritative (16 canonical entries). Key ones:
ghas-live-fetch-or-mutation-required (H1 real snapshot), existing-secret-default-behavior-change,
architecture-review-blocking-finding, storage-projection-or-schema-migration-required (disposition durable
write), public-safety-hit, scope-expansion (includes edits to core governance files),
same-blocker-three-times, break-glass.

## Resume Prompt

```text
Goal: complete `ghas-quality-secrets-parity` in the security-scanner repo through a PR.

Read first:
- AGENTS.md
- governance/autopilot_goal.yml
- docs/workbench/agentic-workflows/2026-06-21-ghas-quality-secrets-goal.md
- docs/workbench/specs/ghas-quality-secrets/requirements.md
- docs/workbench/specs/ghas-quality-secrets/design.md
- docs/workbench/specs/ghas-quality-secrets/review.md
- src/security_scanner/baseline/ghas_api/__init__.py
- src/security_scanner/core/evaluation/metrics.py
- src/security_scanner/scanners/gitleaks/{filter,parser}.py
- src/security_scanner/runtime/{scan_all,scan_worker,verify_artifact}.py
- src/security_scanner/core/finding/model.py
- src/security_scanner/llm/common/verifier.py

Implement M1~M5 (autonomous, synthetic fixtures only, no real GHAS):
M1 parity harness on core/evaluation/metrics.py + secret_type<->rule_id normalization map +
state-aware truth + line tolerance + adversarial fixtures. Zero new precision/recall code.
M2 inline cheap tier by extending scanners/gitleaks/filter.py noise_reason (default-on deterministic
part, gated for new behavior). scan-time vs post-scan boundary.
M3 LLM tier disposition wiring into scan_all AND scan_worker (worker: sync inline + async LLM queue).
NEEDS_REVIEW no-write + re-verify backoff.
M4 non-GHAS drift monitor (GHAS-calibrated baseline, separated field, no new polling).
M5 report-only CI SLO gate governance.parity_slo --check (report-only until threshold yml exists).

Use multi-agent execution. Mandatory architecture gates: pre-implementation, post-M2, post-M3, final.
Do not change existing Gitleaks-first secret defaults. Do not call GHAS, upload, mutate, fetch live,
commit real snapshots/findings, or modify governance/autopilot_goal.yml | autopilot_gate.py |
public_safety.py (allowed_writes = governance/parity_slo.py only). Real GHAS snapshot fetch, baseline
measurement, and enforce flip are human-gated H1~H3, OUT of this run. Finish by opening a PR, waiting
for CI, and merging when green. Autonomous done = M5 (report-only gate); record "SLO enforce pending
H-track" in CURRENT.md.

Required checks:
- uv run pytest
- uv run python -m governance.render --validate
- uv run python -m governance.render --check
- uv run python -m governance.rebuild_ledger_index --check
- uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
- uv run python -m governance.public_safety --diff origin/main...HEAD
- uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-secrets
- uv run python -m governance.parity_slo --check
- uv run python -m governance.autopilot_gate --base origin/main
```