2 changes: 1 addition & 1 deletion CURRENT.md
@@ -4,7 +4,7 @@

- Project: `security-scanner`
- Merge mode: `guarded-auto-merge`
- Active goal: `personal-prod-deploy`
- Active goal: `ghas-quality-vuln-parity`
- Last auto merge: `ledger:20260617T003405Z-autopilot-3236f4`
- Ledger entries: `4`
- Ledger index hash: `sha256:e1893a649a1101b74a087b5eaaa275813a85708c5bb46c4ae70c24e10a111050`
@@ -0,0 +1,157 @@
# Agentic Workflow: GHAS-grade vuln/SAST detection quality (CodeQL parity SLO)

**Status:** Ready for long single-goal execution
**Date:** 2026-06-21
**Goal ID:** `ghas-quality-vuln-parity`
**Spec:** `docs/workbench/specs/ghas-quality-vuln-subtrack/{requirements,design,review}.md`
**Merge flow:** pull request

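Execution packet for a long-running single goal. It builds a measurement harness and an FP-suppression
quality machine that align vuln/SAST detection with the **GHAS code-scanning (CodeQL) parity SLO**. The
proven two-layer structure of the secret subtrack (PR #58) is carried over 1:1, with one vuln-specific
change: **durable disposition is removed from the autonomous layer and moved to the H-track**
(`VulnerabilityFinding` is not loaded into the durable store, and `set_finding_disposition` raises
`ValueError` when FINDING_STATE is absent, which triggers the storage-projection stop-condition).
Live-fetching real code-scanning data is a stop-condition; commits are synthetic-or-redacted-only.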

## Goal

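Using synthetic fixtures and TDD, complete the per-repo 1:1 CodeQL parity measurement harness for
vuln/SAST, the inline FP-suppression tier, enforcement of the synthetic regression gate, and the wiring of
the report-only parity gate, then close the loop through PR, CI, and merge.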

**Completion criteria (autonomous goal done = M3):**

- **M1**: code-scanning domain model `CodeScanAlertRecord` (redacted) + matcher
`compare_codescan_alerts_with_findings` (CWE intersection, three grades:
matched-by-cwe/by-rule-token/unmatched) + adversarial fixtures. **Precision/recall reuse
`core/vulnerability/evaluation.py` (zero lines of new calculation code)**, reached only through a
`CodeScanAlertRecord→VulnerabilityEvaluationKey` adapter. The line-window is a true `|alert−finding|≤N`;
the recall denominator counts only open+fixed alerts, and the precision penalty for dismissed alerts is
tracked separately. Zero network access.
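- **M2**: inline cheap tier (scan-vuln post-processing: code_flow_count, severity floor, low-confidence
rule suppression). Only the parts that are deterministic, metadata-only, and guaranteed by suppression-rate
regression are default-on; new suppression that changes behavior is gated. Expand the synthetic corpus to
five classes (SQLi/XSS/path-traversal/command-injection/SSRF) and apply rule-class normalization. The
existing scan-vuln default output stays unchanged (canary TPs preserved).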
- **M3 (autonomous done)**: enforce the synthetic regression gate (evaluate precision≥0.90/recall≥0.99)
and wire the report-only parity gate `governance.vuln_parity_slo --check` (threshold yml absent →
report-only; compared against the frozen synthetic snapshot; age > threshold → stale-degraded). Prove
deterministic reproduction without a real snapshot.
- The existing Gitleaks-first secret path and the existing vuln scan/import/report/gate default paths stay
unchanged.
- No GHAS trigger, upload, alert mutation, or **live-fetch**. Zero blocking findings in architecture review
(pre/post-M2/post-M3/final). PR CI and the local governance gates pass.
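
The three-grade matching and the line-window from M1 can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the `Located` dataclass, `rule_tokens`, `grade_match`, the `GENERIC_TOKENS` list, and the window value `N = 3` are all hypothetical, and CWE ids are assumed to be normalized already. The real `CodeScanAlertRecord` and `compare_codescan_alerts_with_findings` shapes are defined by the spec.

```python
import re
from dataclasses import dataclass

# Tokens too generic to show that two rules belong to the same class (assumed list).
GENERIC_TOKENS = frozenset({"py", "python", "js", "java", "lang", "security", "audit"})


@dataclass(frozen=True)
class Located:
    """Hypothetical minimal shape shared by an alert and a finding."""
    path: str
    line: int
    cwes: frozenset
    rule_id: str


def rule_tokens(rule_id: str) -> set:
    """Lowercase a rule id, split on non-alphanumerics, drop generic tokens."""
    tokens = re.split(r"[^a-z0-9]+", rule_id.lower())
    return {t for t in tokens if t and t not in GENERIC_TOKENS}


def grade_match(alert: Located, finding: Located, window: int = 3) -> str:
    """Return 'matched-by-cwe', 'matched-by-rule-token', or 'unmatched'."""
    # True line window: same file and |alert - finding| <= N.
    if alert.path != finding.path or abs(alert.line - finding.line) > window:
        return "unmatched"
    if alert.cwes & finding.cwes:
        return "matched-by-cwe"
    if rule_tokens(alert.rule_id) & rule_tokens(finding.rule_id):
        return "matched-by-rule-token"
    return "unmatched"
```

Checking the CWE intersection before the rule-token overlap keeps the stronger signal as the reported grade when both apply.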

**H-track (outside the autonomous loop, stop-condition PRs):** H1 acquire a real code-scanning snapshot →
H2 baseline + fixture-vs-real divergence → H3 finalize targets + enforce parity → **H4 wire durable
disposition for vuln verdicts (storage projection)**.

## Execution Contract

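- Run M1~M3 to completion as a single long-running goal. No intermediate approvals. Humans intervene only
on a stop-condition.
- Use subagents actively (implementation worker: gpt-5.5/high; auxiliary roles follow repo policy). Open a
PR and carry it to the point where it can merge after CI passes.
- Never commit real endpoints, hosts, credentials, private paths, real SARIF, real code-scanning exports,
or real findings.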

## Fixed Decisions

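- Scope: autonomous vuln M1~M3 (synthetic-only). Real fetch, baseline, enforce, and durable disposition
belong to the H-track.
- Measurement: CodeQL code-scanning alerts as the oracle, per-repo 1:1, snapshot = ground truth (frozen
synthetic). Calculation reuses `core/vulnerability/evaluation.py` (do not create a fourth engine). The
synthetic evaluate path and the parity matcher share the same calculation core.
- Matching: rule-class normalization and the line-window apply with identical semantics to both the
synthetic gate and parity (consistent with VFR8).
- Inline tier: the deterministic, metadata-only parts are default-on; behavior-changing parts are gated.
No validity-check analog.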
- **No durable disposition (autonomous)**: in the v1 autonomous scope, vuln verdicts stay in the existing
throwaway JSONL. Durable persistence requires a storage projection →
`storage-projection-or-schema-migration-required` stop → H4.
- Snapshot: commit only synthetic redacted fixtures (the `source: synthetic` marker is required; fail
closed if it is missing). Real snapshots are blocked twice: by `.gitignore` and by exclusion from
allowed_writes.
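- **No autonomous edits to core governance**: for governance code, allowed_writes covers only
`governance/vuln_parity_slo.py` (a separate file from the secret track's `governance/parity_slo.py`). If
`autopilot_goal.yml`, `autopilot_gate.py`, or `public_safety.py` needs a change, stop (scope-expansion) →
human PR.
- Slot: autonomous code merges without taking the active_goal slot (on merge, the three governance files
adopt main (theirs)). The actual slot switch is the user's decision.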
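
The fail-closed rule for the snapshot marker can be illustrated with a small check. This is a sketch that assumes the snapshot has already been parsed into a dict with a top-level `source` key; `require_synthetic` and `UnsafeSnapshotError` are hypothetical names, not part of the repo.

```python
class UnsafeSnapshotError(ValueError):
    """Raised when a snapshot cannot be shown to be synthetic."""


def require_synthetic(snapshot):
    """Return the snapshot only if it carries the `source: synthetic` marker."""
    # Fail closed: a non-dict, a missing marker, or any other value is rejected.
    if not isinstance(snapshot, dict) or snapshot.get("source") != "synthetic":
        raise UnsafeSnapshotError("snapshot lacks the `source: synthetic` marker")
    return snapshot
```

Rejecting everything that is not explicitly marked means a real export dropped into the fixture directory by mistake stops the run instead of being measured.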

## Required Architecture Review Gate

Mandatory and blocking, at pre-implementation / post-M2 / post-M3 / final. Stop only when a finding
requires an SoT change, scope expansion, unsafe data, or a change to existing defaults; fix everything else
in-goal.

## Multi-agent Execution Model

Assign subagents disjoint responsibilities (Worker A: matcher/model; Worker B: inline tier; Worker C:
synthetic gate + parity_slo; architecture/security reviewer, read-only; code_simplifier). The main agent
integrates and makes the final call.

## Allowed Write Surface

`allowed_writes` in `governance/autopilot_goal.yml` is authoritative. Summary: the promoted spec, this
workflow document, src/tests/eval/examples, `governance/vuln_parity_slo.py` (the new gate only), the
ledger, and CURRENT.md. **This is not a blanket `governance/**` grant**: any other governance change is a
scope-expansion stop.
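
A write-surface check of this kind can be sketched with glob matching. The patterns below are illustrative placeholders, since the authoritative list lives in `governance/autopilot_goal.yml`, and `out_of_scope` is a hypothetical helper rather than the repo's `autopilot_gate` logic.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only.
ALLOWED_WRITES = (
    "src/*",
    "tests/*",
    "governance/vuln_parity_slo.py",  # the single governance file, not governance/*
    "CURRENT.md",
)


def out_of_scope(changed_paths, allowed=ALLOWED_WRITES):
    """Return the changed paths that no allowed pattern covers."""
    # fnmatch's `*` also matches `/`, so "src/*" covers nested files.
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]
```

A non-empty result would correspond to the scope-expansion stop described above.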

## Suggested Work Plan

### Readiness (M0 = goal-setup, already performed by the orchestrator)
The orchestrator has completed goal-setup (spec promotion + autopilot_goal.yml goal_id + current.yml
active_goal + CURRENT.md in one atomic commit). You start from the pre-implementation architecture review.

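### M1 measurement substrate
1. red-first: the matcher scores CWE/rule-token/line-window/dismissed; on adversarial fixtures (missing
CWE/line drift/differing CodeQL↔Semgrep rule.id/dismissed), a missing normalization, window, or filter
shows red; precision/recall come from `core/vulnerability/evaluation.py`; denominators are state-aware.
2. Implementation: CodeScanAlertRecord, the adapter, and the matcher (zero lines of new precision/recall
calculation). Settle the line-window N against fixtures.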
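
The state-aware denominators can be illustrated by partitioning alerts before any scoring happens. The sketch below only counts; it deliberately computes no precision or recall, since those come from `core/vulnerability/evaluation.py`. The dict-shaped alerts with `id` and `state` keys and the helper names are assumptions for the example.

```python
RECALL_STATES = frozenset({"open", "fixed"})  # alerts that enter the recall denominator


def partition_by_state(alerts):
    """Split alerts into (recall pool, dismissed); other states are ignored here."""
    recall_pool = [a for a in alerts if a["state"] in RECALL_STATES]
    dismissed = [a for a in alerts if a["state"] == "dismissed"]
    return recall_pool, dismissed


def summarize(alerts, matched_ids):
    """Count matcher hits per pool; ratios are left to the shared evaluator."""
    recall_pool, dismissed = partition_by_state(alerts)
    return {
        "recall_denominator": len(recall_pool),
        "recall_hits": sum(a["id"] in matched_ids for a in recall_pool),
        # A finding that matches a dismissed alert is tracked as a separate penalty.
        "dismissed_hits": sum(a["id"] in matched_ids for a in dismissed),
    }
```

Keeping dismissed alerts out of the recall pool means a scanner is not penalized for missing something a reviewer already rejected, while matches against dismissed alerts stay visible.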

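### M2 inline tier + synthetic hardening
1. red-first: FPs on safe code are suppressed while recall on vulnerable code holds (evaluate gate);
default-on does not break recall≥0.99; existing default output is unchanged; independent adversarial pairs
are covered by regression.
2. Implementation: inline gating (the default-on/gated boundary), the five-class synthetic corpus, and
rule-class normalization. Then the post-M2 review.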
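
One way to picture the gated boundary is a post-processing filter that is a no-op under its default configuration. This is a sketch under assumptions: the severity scale, the `TierConfig` fields, the dict-shaped findings, and placing all three checks behind a single `enable_gated` flag are illustrative choices; which checks actually ship default-on is decided by the spec and the suppression-rate regression.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}  # assumed scale


@dataclass(frozen=True)
class TierConfig:
    severity_floor: str = "low"
    min_code_flows: int = 0
    low_confidence_rules: frozenset = frozenset()
    enable_gated: bool = False  # behavior-changing suppression is opt-in


def suppression_reason(finding: dict, cfg: TierConfig) -> Optional[str]:
    """Return why a finding is suppressed, or None to keep it. Metadata only."""
    if not cfg.enable_gated:
        return None  # default path: output identical to the input
    if SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[cfg.severity_floor]:
        return "below-severity-floor"
    if finding["code_flow_count"] < cfg.min_code_flows:
        return "too-few-code-flows"
    if finding["rule_id"] in cfg.low_confidence_rules:
        return "low-confidence-rule"
    return None


def apply_inline_tier(findings, cfg=TierConfig()):
    """Split findings into (kept, suppressed-with-reason), preserving order."""
    kept, suppressed = [], []
    for finding in findings:
        reason = suppression_reason(finding, cfg)
        if reason is None:
            kept.append(finding)
        else:
            suppressed.append((finding, reason))
    return kept, suppressed
```

Returning the suppressed findings with a reason, rather than dropping them, keeps the suppression rate measurable in regression tests.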

### M3 synthetic gate + parity_slo (autonomous done)
1. red-first: synthetic regression gate enforcement; `governance/vuln_parity_slo.py` report-only
(threshold absent), comparison against the frozen synthetic snapshot, and stale-degraded.
2. Implementation: vuln_parity_slo.py. Final review → PR. Record "parity SLO enforce not yet achieved,
awaiting H-track" in CURRENT.md.
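
The two gates in M3 can be sketched as two small decisions. The thresholds 0.90 and 0.99 come from the completion criteria; everything else is an assumption for illustration, including the function names, the boolean `thresholds_present` input, the `"enforce"` mode name, and the choice to let staleness take precedence over report-only.

```python
from datetime import datetime, timedelta, timezone


def synthetic_gate_passes(precision, recall, min_precision=0.90, min_recall=0.99):
    """Enforced gate on the synthetic corpus; inputs come from the shared evaluator."""
    return precision >= min_precision and recall >= min_recall


def parity_gate_mode(thresholds_present, snapshot_taken_at, max_age, now=None):
    """Pick the parity gate mode: 'stale-degraded', 'report-only', or 'enforce'."""
    now = now or datetime.now(timezone.utc)
    # Assumed precedence: an over-age snapshot degrades the gate before anything else.
    if now - snapshot_taken_at > max_age:
        return "stale-degraded"
    # Without a threshold file there is nothing to enforce against.
    if not thresholds_present:
        return "report-only"
    return "enforce"
```

Passing `now` explicitly keeps the mode decision deterministic in tests, which matches the requirement to prove reproducible behavior without a real snapshot.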

## Required Local Checks

```bash
uv run pytest
uv run python -m governance.render --validate
uv run python -m governance.render --check
uv run python -m governance.rebuild_ledger_index --check
uv run python -m governance.render_github_ruleset --output governance/main_ruleset.json --check
uv run python -m governance.public_safety --diff origin/main...HEAD
uv run python -m governance.public_safety --path docs/workbench/specs/ghas-quality-vuln-subtrack
uv run python -m governance.vuln_parity_slo --check
uv run python -m governance.autopilot_gate --base origin/main
```

## Stop Conditions

The `stop_conditions` in `governance/autopilot_goal.yml` are canonical (16 entries). Key ones:
ghas-live-fetch-or-mutation-required (H1 real fetch), **storage-projection-or-schema-migration-required**
(durable disposition or durable snapshot → H4), existing-secret-default-behavior-change,
architecture-review-blocking-finding, public-safety-hit, scope-expansion (edits to core governance files),
same-blocker-three-times, break-glass.

## Resume Prompt

```text
Goal: complete `ghas-quality-vuln-parity` in the security-scanner repo through a PR.

Read first:
- AGENTS.md
- governance/autopilot_goal.yml
- docs/workbench/agentic-workflows/2026-06-21-ghas-quality-vuln-parity-goal.md
- docs/workbench/specs/ghas-quality-vuln-subtrack/{requirements,design,review}.md
- src/security_scanner/core/vulnerability/{evaluation,model}.py
- src/security_scanner/baseline/ghas_api/__init__.py
- src/security_scanner/runtime/vulnerability_verify_artifact.py
- src/security_scanner/cli/commands (import-sarif/scan-vuln/report/gate/evaluate)

Implement M1~M3 (autonomous, synthetic fixtures only, no real GHAS/code-scanning):
M1 CodeScanAlertRecord + compare_codescan_alerts_with_findings matcher (CWE 3-tier) + adversarial
fixtures. Reuse core/vulnerability/evaluation.py (zero new precision/recall code). True |line|<=N
window, state-aware denominators.
M2 inline cheap tier (metadata-only default-on / gated for behavior change), synthetic corpus 5 CWE
classes + rule-class normalization. Existing scan-vuln default output unchanged.
M3 synthetic regression gate enforce + report-only parity gate governance.vuln_parity_slo --check.

Do NOT: durable-persist vuln verdict (storage projection -> H4 human-gated), call/fetch GHAS code-
scanning, commit real SARIF/findings, modify governance/autopilot_goal.yml | autopilot_gate.py |
public_safety.py (allowed_writes = governance/vuln_parity_slo.py only), change existing secret/vuln
scan defaults. Real snapshot fetch, baseline, enforce, durable disposition are human-gated H1~H4.
Use multi-agent. Mandatory architecture gates: pre-implementation, post-M2, post-M3, final. Finish
by opening a PR, waiting for CI, merge when green. Autonomous done = M3; record "parity SLO enforce
pending H-track" in CURRENT.md.

Required checks: pytest; render --validate/--check; rebuild_ledger_index --check;
render_github_ruleset --check; public_safety --diff and --path docs/workbench/specs/ghas-quality-vuln-
subtrack; vuln_parity_slo --check; autopilot_gate --base origin/main.
```