From 4b8a529f1af9276b02250befbca5882a94fa8e8f Mon Sep 17 00:00:00 2001 From: lsh1756 Date: Mon, 25 May 2026 20:46:46 +0900 Subject: [PATCH 1/9] docs: add getting-started guide for local checkout scanning Introduce docs/views/04-getting-started.md covering the end-to-end flow for scanning already-cloned local repositories without the LLM verifier path. README quick-start is trimmed to the minimal command list and now links to the new guide. docs/README.md index and doc-map.yml are updated to register the new view. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[MASKED_EMAIL]> --- README.md | 33 ++----- docs/README.md | 4 +- docs/_harness/doc-map.yml | 10 ++ docs/views/04-getting-started.md | 160 +++++++++++++++++++++++++++++++ 4 files changed, 180 insertions(+), 27 deletions(-) create mode 100644 docs/views/04-getting-started.md diff --git a/README.md b/README.md index 8109aad..daaaf16 100644 --- a/README.md +++ b/README.md @@ -51,41 +51,22 @@ targets.local.yaml -> workspace -> Gitleaks -> Finding -> local store -> report ## 빠른 시작 -필요한 도구: - -- `uv` -- `gitleaks` -- 스캔할 로컬 checkout - -의존성을 설치합니다. +전제: `uv`, `gitleaks` v8, 스캔할 로컬 checkout이 준비되어 있습니다. ```bash uv sync -``` - -예제 설정을 복사합니다. - -```bash cp examples/targets.local.example.yaml targets.local.yaml -``` +# targets.local.yaml에 로컬 checkout 경로를 적습니다. -기본 스캔을 실행합니다. - -```bash -uv run security-scanner scan \ - --manifest targets.local.yaml \ - --output private/findings.jsonl - -uv run security-scanner report \ - --findings private/findings.jsonl - -uv run security-scanner gate \ - --findings private/findings.jsonl \ - --max 0 +uv run security-scanner scan --manifest targets.local.yaml --output private/findings.jsonl +uv run security-scanner report --findings private/findings.jsonl +uv run security-scanner gate --findings private/findings.jsonl --max 0 ``` `private/`는 gitignore 대상입니다. 실제 스캔 결과와 로컬 설정은 이 경계 안에 두세요. +단계별 설명, manifest 필드, 트러블슈팅은 [시작하기 가이드](docs/views/04-getting-started.md)를 참고하세요. 
+ ## 로컬 NoSQL 저장소 DynamoDB-compatible backend는 로컬에서 조회 패턴을 검증하기 위한 저장소입니다. 관리형 저장소 연동은 현재 지원 범위가 아닙니다. diff --git a/docs/README.md b/docs/README.md index c528c77..4a0a012 100644 --- a/docs/README.md +++ b/docs/README.md @@ -25,7 +25,8 @@ 2. [시스템 구조와 실행 환경](views/01-system-architecture-and-runtime.md) 3. [소스 스캔 결과 NoSQL Schema](views/02-source-scan-results-nosql-schema.md) 4. [탐지 결과와 지표](views/03-secret-detection-results-and-metrics.md) -5. [progress dashboard](dashboards/progress.html) +5. [시작하기](views/04-getting-started.md) +6. [progress dashboard](dashboards/progress.html) ## 공개 후보 문서 @@ -35,6 +36,7 @@ | [01-system-architecture-and-runtime.md](views/01-system-architecture-and-runtime.md) | 전체 구조, 실행 흐름, codebase dependency boundary | | [02-source-scan-results-nosql-schema.md](views/02-source-scan-results-nosql-schema.md) | 스캔 결과를 어떻게 저장하고 어떤 질문에 답하려는지 | | [03-secret-detection-results-and-metrics.md](views/03-secret-detection-results-and-metrics.md) | 어떤 결과를 공개할 수 있고, 지표를 어떻게 해석하는지 | +| [04-getting-started.md](views/04-getting-started.md) | 이미 클론된 로컬 저장소로 첫 스캔을 끝까지 돌리는 절차 | | [05-operations-transition-architecture.md](views/05-operations-transition-architecture.md) | 로컬 실행 경로에서 반복 실행과 확장 adapter로 넘어가는 기준 | | [06-research-and-technical-decisions.md](views/06-research-and-technical-decisions.md) | 왜 Gitleaks-first인지, 다른 도구는 어떤 위치인지 | | [09-public-repo-safety-policy.md](views/09-public-repo-safety-policy.md) | 공개 저장소에 쓰면 안 되는 것과 커밋 전 확인 기준 | diff --git a/docs/_harness/doc-map.yml b/docs/_harness/doc-map.yml index 7692446..5d2a893 100644 --- a/docs/_harness/doc-map.yml +++ b/docs/_harness/doc-map.yml @@ -93,6 +93,16 @@ human_views: docs/workbench/context/legacy-source/09-public-repo-security-policy.md: 9e1cfe3e4dbb772174b05160885852977a7d130810005bc0863700d74aa444c2 assets: - docs/assets/metrics-flow.svg + - view_id: getting-started + title: 시작하기 + path: docs/views/04-getting-started.md + output_path: confluence://프로젝트-개요-및-추진-전략/시작하기 + generated_by: human_doc_curator 
+ generated_at: 2026-05-25 + safety_class: public + source_docs: [] + source_hashes: {} + assets: [] - view_id: operations-transition-architecture title: 확장 경로와 운영 기준 path: docs/views/05-operations-transition-architecture.md diff --git a/docs/views/04-getting-started.md b/docs/views/04-getting-started.md new file mode 100644 index 0000000..c3c1875 --- /dev/null +++ b/docs/views/04-getting-started.md @@ -0,0 +1,160 @@ +# 시작하기 + +이 문서는 **이미 로컬에 클론된 저장소들**을 대상으로 첫 스캔을 끝까지 돌리는 절차를 정리합니다. + +LLM verifier를 사용하지 않는 최소 경로만 다룹니다. Verifier와 synthetic 평가는 마지막 절에서 다음 단계로 안내합니다. + +대상 독자는 처음 이 도구를 실행해 보는 사람입니다. + +## 사전 확인 + +다음이 충족된 상태에서 시작합니다. + +| 항목 | 확인 명령 | 비고 | +| --- | --- | --- | +| `uv` 설치됨 | `uv --version` | Python runtime 관리 도구 | +| `gitleaks` v8 이상 | `gitleaks version` | 외부 secret 탐지 바이너리 | +| 스캔 대상 working tree | `ls /.git` | 각 저장소가 로컬 디렉터리로 존재 | + +조건이 빠져 있으면 각 도구의 공식 설치 안내를 따른 뒤 다시 진행합니다. 이 문서는 OS별 설치 절차를 다루지 않습니다. + +## 1. 프로젝트 의존성 설치 + +저장소 root에서 한 번 실행합니다. + +```bash +uv sync +``` + +성공하면 `.venv/`가 생성되고 이후 `uv run` 명령이 동작합니다. + +## 2. manifest 작성 + +스캔 대상은 `targets.local.yaml`에 나열합니다. 이 파일은 gitignore 대상이라 공개 저장소로 올라가지 않습니다. + +예시 파일을 복사한 뒤 편집합니다. + +```bash +cp examples/targets.local.example.yaml targets.local.yaml +``` + +이미 클론되어 있는 working tree의 절대 경로를 `path`로 적습니다. + +```yaml +version: 1 + +targets: + - name: org/repo-a + path: /home/user/repos/repo-a + enabled: true + - name: org/repo-b + path: /home/user/repos/repo-b + enabled: true + +scan: + include_history: true + exclude: + - "**/node_modules/**" + - "**/.venv/**" + +gitleaks_config: "" +``` + +| 필드 | 의미 | +| --- | --- | +| `name` | 보고서·저장소에서 사용할 논리 식별자. `org/repo` 형식 권장 | +| `path` | 로컬 working tree 절대 경로. `~` 사용 가능 | +| `enabled` | `false`로 두면 엔트리를 지우지 않고도 스캔에서 제외 | +| `include_history` | `true`면 git 커밋 히스토리까지 스캔, `false`면 working tree만 | +| `exclude` | 매니페스트의 glob 제외 패턴. 현재 구현에서는 아직 미적용 | +| `gitleaks_config` | gitleaks ruleset TOML 경로. 비우면 gitleaks 기본 룰 사용 | + +> `include_history`는 기본 `true`입니다. 
첫 실행에서 결과량이 크면 `false`로 바꿔 working tree만 먼저 확인하는 편이 낫습니다. + +## 3. 결과 저장 경로 준비 + +스캔 결과는 `private/` 하위로 출력합니다. 이 디렉터리는 gitignore 대상입니다. + +```bash +mkdir -p private +``` + +## 4. 스캔 실행 + +```bash +uv run security-scanner scan \ + --manifest targets.local.yaml \ + --output private/findings.jsonl +``` + +실행하면 다음 형태의 로그가 출력됩니다. + +``` +Scanning 2 enabled target(s) from targets.local.yaml +Scan run ID: scan_, rule pack: secret-rules-0.1.0 + scanned org/repo-a: 12 finding(s) + scanned org/repo-b: 0 finding(s) +Done: scanned 2/2 target(s), 12 finding(s) -> private/findings.jsonl +``` + +`Scan run ID`는 이후 단계에서 특정 실행만 다시 볼 때 사용합니다. + +## 5. 보고서 확인 + +저장된 finding을 사람이 읽는 형태로 출력합니다. + +```bash +uv run security-scanner report \ + --findings private/findings.jsonl +``` + +전체 finding 수, target별 분포, rule별 분포가 표시됩니다. + +특정 실행만 보고 싶으면 `--scan-run-id`를 함께 넘깁니다. + +```bash +uv run security-scanner report \ + --findings private/findings.jsonl \ + --scan-run-id scan_ +``` + +## 6. Gate 판단 + +threshold를 넘는 blocking finding이 있으면 비정상 종료합니다. CI 통합용 명령입니다. + +```bash +uv run security-scanner gate \ + --findings private/findings.jsonl \ + --max 0 +``` + +`--max 0`은 blocking finding이 1건이라도 있으면 fail입니다. 운영 중인 저장소에 처음 적용할 때는 큰 값으로 시작해 baseline을 잡고 점진적으로 줄입니다. + +종료 코드 0은 pass, 1은 fail입니다. + +## 자주 막히는 지점 + +| 증상 | 원인 | 대응 | +| --- | --- | --- | +| `gitleaks binary not found` | gitleaks가 PATH에 없음 | gitleaks v8 설치 후 `which gitleaks` 확인 | +| `Target ...: local path does not exist` | manifest의 `path`가 잘못됨 | 절대 경로 확인, `~` 확장 여부 확인 | +| 모든 finding의 severity가 HIGH | rule 메타 매핑 미구현 | 현재 알려진 한계. 보고서 해석 시 rule id를 함께 본다 | +| 스캔이 너무 오래 걸림 | `include_history: true` + 커밋 많은 저장소 | manifest에서 `include_history: false`로 우선 확인 | +| finding이 0인데 분명 secret이 있을 것 같음 | 워킹트리에는 없고 과거 커밋에만 존재 | `include_history: true`로 다시 실행 | +| 결과 파일이 매번 덮어써짐 | `scan`은 JSONL 백엔드에서 매 실행 시 store를 초기화 | 이력을 남기려면 DynamoDB-compatible 백엔드 사용 | + +## 다음 단계 + +기본 스캔 흐름이 끝났습니다. 더 진행할 수 있는 경로는 다음과 같습니다. 
+ +- **Verifier 적용** — finding을 redacted metadata 기준으로 triage 상태를 보조합니다. README의 평가와 verifier 절을 참고합니다. +- **Synthetic corpus 평가** — `eval/synthetic-corpus/`로 precision, recall, false negative를 측정합니다. +- **DynamoDB-compatible 저장소** — 여러 스캔 실행을 한 store에 누적합니다. README의 로컬 NoSQL 저장소 절을 참고합니다. +- **공개 저장소 안전 기준** — finding을 다른 도구로 옮기거나 공유하기 전에 [공개 저장소 안전 정책](09-public-repo-safety-policy.md)을 확인합니다. + +## 관련 문서 + +- [프로젝트 개요와 추진 전략](00-project-overview-and-strategy.md) +- [시스템 구조와 실행 환경](01-system-architecture-and-runtime.md) +- [탐지 결과와 지표](03-secret-detection-results-and-metrics.md) +- [공개 저장소 안전 정책](09-public-repo-safety-policy.md) From ed827c05038ce87961ac46f67f0a345375ba3b20 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 21:10:57 +0900 Subject: [PATCH 2/9] docs: remove numbered view filenames Co-Authored-By: Codex GPT-5 --- README.md | 26 ++++++++-------- docs/README.md | 30 +++++++++---------- docs/_harness/doc-map.yml | 18 +++++------ ...-getting-started.md => getting-started.md} | 10 +++---- ...scan-orchestration-target-architecture.md} | 0 ... 
=> operations-transition-architecture.md} | 0 ...gy.md => project-overview-and-strategy.md} | 0 ...policy.md => public-repo-safety-policy.md} | 0 ...md => research-and-technical-decisions.md} | 0 ...> secret-detection-results-and-metrics.md} | 0 ...md => source-scan-results-nosql-schema.md} | 0 ....md => system-architecture-and-runtime.md} | 0 12 files changed, 42 insertions(+), 42 deletions(-) rename docs/views/{07-getting-started.md => getting-started.md} (94%) rename docs/views/{04-local-scan-orchestration-target-architecture.md => local-scan-orchestration-target-architecture.md} (100%) rename docs/views/{05-operations-transition-architecture.md => operations-transition-architecture.md} (100%) rename docs/views/{00-project-overview-and-strategy.md => project-overview-and-strategy.md} (100%) rename docs/views/{09-public-repo-safety-policy.md => public-repo-safety-policy.md} (100%) rename docs/views/{06-research-and-technical-decisions.md => research-and-technical-decisions.md} (100%) rename docs/views/{03-secret-detection-results-and-metrics.md => secret-detection-results-and-metrics.md} (100%) rename docs/views/{02-source-scan-results-nosql-schema.md => source-scan-results-nosql-schema.md} (100%) rename docs/views/{01-system-architecture-and-runtime.md => system-architecture-and-runtime.md} (100%) diff --git a/README.md b/README.md index 8170be9..3bf9927 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,7 @@ targets.local.yaml -> workspace -> Gitleaks -> Finding -> local store -> report - synthetic corpus로 precision, recall, false negative를 계산합니다. - Ollama-compatible verifier로 finding의 triage 상태를 보조합니다. -자세한 진행 상황은 [progress dashboard](docs/dashboards/progress.html)와 [project overview](docs/views/00-project-overview-and-strategy.md)에 정리되어 있습니다. +자세한 진행 상황은 [progress dashboard](docs/dashboards/progress.html)와 [project overview](docs/views/project-overview-and-strategy.md)에 정리되어 있습니다. 
## 현재 지원 범위 @@ -65,7 +65,7 @@ uv run security-scanner gate --findings private/findings.jsonl --max 0 `private/`는 gitignore 대상입니다. 실제 스캔 결과와 로컬 설정은 이 경계 안에 두세요. -단계별 설명, manifest 필드, 트러블슈팅은 [시작하기 가이드](docs/views/07-getting-started.md)를 참고하세요. +단계별 설명, manifest 필드, 트러블슈팅은 [시작하기 가이드](docs/views/getting-started.md)를 참고하세요. ## 로컬 NoSQL 저장소 @@ -93,7 +93,7 @@ uv run security-scanner gate \ 스캔을 실행하면 `Scan run ID`가 출력됩니다. 특정 실행 결과만 보고 싶으면 그 값을 `--scan-run-id`로 넘깁니다. 저장소 전체를 대상으로 판단할 때만 생략합니다. -Schema와 조회 기준은 [소스 스캔 결과 NoSQL Schema](docs/views/02-source-scan-results-nosql-schema.md)에 정리되어 있습니다. +Schema와 조회 기준은 [소스 스캔 결과 NoSQL Schema](docs/views/source-scan-results-nosql-schema.md)에 정리되어 있습니다. ## 평가와 verifier @@ -144,20 +144,20 @@ core <- scanners/storage/llm/adapters <- cli/runtime - `llm/ollama/`는 redacted metadata 기반 verifier 호출을 담당합니다. - `adapters/aws_future/`는 현재 실행 경로가 아닙니다. -더 자세한 설명은 [시스템 구조와 실행 환경](docs/views/01-system-architecture-and-runtime.md)을 보세요. +더 자세한 설명은 [시스템 구조와 실행 환경](docs/views/system-architecture-and-runtime.md)을 보세요. ## 문서 문서의 시작점은 [docs/README.md](docs/README.md)입니다. 
-- [프로젝트 개요와 추진 전략](docs/views/00-project-overview-and-strategy.md) -- [시스템 구조와 실행 환경](docs/views/01-system-architecture-and-runtime.md) -- [소스 스캔 결과 NoSQL Schema](docs/views/02-source-scan-results-nosql-schema.md) -- [탐지 결과와 지표](docs/views/03-secret-detection-results-and-metrics.md) -- [Local Scan Orchestration 목표 구조](docs/views/04-local-scan-orchestration-target-architecture.md) -- [확장 경로와 운영 기준](docs/views/05-operations-transition-architecture.md) -- [리서치 요약과 기술 결정](docs/views/06-research-and-technical-decisions.md) -- [공개 저장소 안전 정책](docs/views/09-public-repo-safety-policy.md) +- [프로젝트 개요와 추진 전략](docs/views/project-overview-and-strategy.md) +- [시스템 구조와 실행 환경](docs/views/system-architecture-and-runtime.md) +- [소스 스캔 결과 NoSQL Schema](docs/views/source-scan-results-nosql-schema.md) +- [탐지 결과와 지표](docs/views/secret-detection-results-and-metrics.md) +- [Local Scan Orchestration 목표 구조](docs/views/local-scan-orchestration-target-architecture.md) +- [확장 경로와 운영 기준](docs/views/operations-transition-architecture.md) +- [리서치 요약과 기술 결정](docs/views/research-and-technical-decisions.md) +- [공개 저장소 안전 정책](docs/views/public-repo-safety-policy.md) `docs/views/`, `docs/assets/`, `docs/dashboards/`는 publish 후보입니다. @@ -174,7 +174,7 @@ core <- scanners/storage/llm/adapters <- cli/runtime 3. 파일명을 지정해서 stage합니다. `git add -A`와 `git add .`는 피합니다. 4. 실제 증거 자료는 `private/` 또는 저장소 밖에 둡니다. -세부 기준은 [공개 저장소 안전 정책](docs/views/09-public-repo-safety-policy.md)에 있습니다. +세부 기준은 [공개 저장소 안전 정책](docs/views/public-repo-safety-policy.md)에 있습니다. ## 다음 작업 diff --git a/docs/README.md b/docs/README.md index 3784245..dc395d8 100644 --- a/docs/README.md +++ b/docs/README.md @@ -21,27 +21,27 @@ ## 먼저 읽을 문서 -1. [프로젝트 개요와 추진 전략](views/00-project-overview-and-strategy.md) -2. [시스템 구조와 실행 환경](views/01-system-architecture-and-runtime.md) -3. [소스 스캔 결과 NoSQL Schema](views/02-source-scan-results-nosql-schema.md) -4. [탐지 결과와 지표](views/03-secret-detection-results-and-metrics.md) -5. 
[Local Scan Orchestration 구조](views/04-local-scan-orchestration-target-architecture.md) -6. [시작하기](views/07-getting-started.md) +1. [프로젝트 개요와 추진 전략](views/project-overview-and-strategy.md) +2. [시스템 구조와 실행 환경](views/system-architecture-and-runtime.md) +3. [소스 스캔 결과 NoSQL Schema](views/source-scan-results-nosql-schema.md) +4. [탐지 결과와 지표](views/secret-detection-results-and-metrics.md) +5. [Local Scan Orchestration 구조](views/local-scan-orchestration-target-architecture.md) +6. [시작하기](views/getting-started.md) 7. [progress dashboard](dashboards/progress.html) ## 공개 후보 문서 | 문서 | 읽으면 알 수 있는 것 | | --- | --- | -| [00-project-overview-and-strategy.md](views/00-project-overview-and-strategy.md) | 이 프로젝트가 해결하려는 문제, 현재 지원 범위, 성공 기준 | -| [01-system-architecture-and-runtime.md](views/01-system-architecture-and-runtime.md) | 전체 구조, 실행 흐름, codebase dependency boundary | -| [02-source-scan-results-nosql-schema.md](views/02-source-scan-results-nosql-schema.md) | 스캔 결과를 어떻게 저장하고 어떤 질문에 답하려는지 | -| [03-secret-detection-results-and-metrics.md](views/03-secret-detection-results-and-metrics.md) | 어떤 결과를 공개할 수 있고, 지표를 어떻게 해석하는지 | -| [04-local-scan-orchestration-target-architecture.md](views/04-local-scan-orchestration-target-architecture.md) | Local Scan Orchestration의 Python file 참조 관계와 runtime call chain | -| [05-operations-transition-architecture.md](views/05-operations-transition-architecture.md) | 로컬 실행 경로에서 반복 실행과 확장 adapter로 넘어가는 기준 | -| [06-research-and-technical-decisions.md](views/06-research-and-technical-decisions.md) | 왜 Gitleaks-first인지, 다른 도구는 어떤 위치인지 | -| [07-getting-started.md](views/07-getting-started.md) | 이미 클론된 로컬 저장소로 첫 스캔을 끝까지 돌리는 절차 | -| [09-public-repo-safety-policy.md](views/09-public-repo-safety-policy.md) | 공개 저장소에 쓰면 안 되는 것과 커밋 전 확인 기준 | +| [project-overview-and-strategy.md](views/project-overview-and-strategy.md) | 이 프로젝트가 해결하려는 문제, 현재 지원 범위, 성공 기준 | +| [system-architecture-and-runtime.md](views/system-architecture-and-runtime.md) | 전체 구조, 실행 흐름, codebase 
dependency boundary | +| [source-scan-results-nosql-schema.md](views/source-scan-results-nosql-schema.md) | 스캔 결과를 어떻게 저장하고 어떤 질문에 답하려는지 | +| [secret-detection-results-and-metrics.md](views/secret-detection-results-and-metrics.md) | 어떤 결과를 공개할 수 있고, 지표를 어떻게 해석하는지 | +| [local-scan-orchestration-target-architecture.md](views/local-scan-orchestration-target-architecture.md) | Local Scan Orchestration의 Python file 참조 관계와 runtime call chain | +| [operations-transition-architecture.md](views/operations-transition-architecture.md) | 로컬 실행 경로에서 반복 실행과 확장 adapter로 넘어가는 기준 | +| [research-and-technical-decisions.md](views/research-and-technical-decisions.md) | 왜 Gitleaks-first인지, 다른 도구는 어떤 위치인지 | +| [getting-started.md](views/getting-started.md) | 이미 클론된 로컬 저장소로 첫 스캔을 끝까지 돌리는 절차 | +| [public-repo-safety-policy.md](views/public-repo-safety-policy.md) | 공개 저장소에 쓰면 안 되는 것과 커밋 전 확인 기준 | ## 시각 자료 diff --git a/docs/_harness/doc-map.yml b/docs/_harness/doc-map.yml index 1f08f9a..75caeb5 100644 --- a/docs/_harness/doc-map.yml +++ b/docs/_harness/doc-map.yml @@ -32,7 +32,7 @@ source_sets: human_views: - view_id: project-overview-and-strategy title: 프로젝트 개요와 추진 전략 - path: docs/views/00-project-overview-and-strategy.md + path: docs/views/project-overview-and-strategy.md output_path: confluence://프로젝트-개요-및-추진-전략 generated_by: human_doc_curator generated_at: 2026-05-23 @@ -49,7 +49,7 @@ human_views: - docs/assets/supported-workflow-map.svg - view_id: system-architecture-and-runtime title: 시스템 구조와 실행 환경 - path: docs/views/01-system-architecture-and-runtime.md + path: docs/views/system-architecture-and-runtime.md output_path: confluence://프로젝트-개요-및-추진-전략/시스템-구조-및-실행-환경 generated_by: human_doc_curator generated_at: 2026-05-23 @@ -68,7 +68,7 @@ human_views: - docs/assets/codebase-architecture.svg - view_id: source-scan-results-nosql-schema title: 소스 스캔 결과 NoSQL Schema - path: docs/views/02-source-scan-results-nosql-schema.md + path: docs/views/source-scan-results-nosql-schema.md output_path: 
confluence://프로젝트-개요-및-추진-전략/소스코드-스캔-결과-NoSQL-Schema generated_by: human_doc_curator generated_at: 2026-05-23 @@ -81,7 +81,7 @@ human_views: - docs/assets/schema-access-patterns.svg - view_id: secret-detection-results-and-metrics title: 탐지 결과와 지표 - path: docs/views/03-secret-detection-results-and-metrics.md + path: docs/views/secret-detection-results-and-metrics.md output_path: confluence://프로젝트-개요-및-추진-전략/Secret-Detection-결과 generated_by: human_doc_curator generated_at: 2026-05-23 @@ -98,7 +98,7 @@ human_views: - docs/assets/metrics-flow.svg - view_id: local-scan-orchestration-target-architecture title: Local Scan Orchestration 구조 - path: docs/views/04-local-scan-orchestration-target-architecture.md + path: docs/views/local-scan-orchestration-target-architecture.md output_path: confluence://프로젝트-개요-및-추진-전략/Local-Scan-Orchestration-구조 generated_by: human_doc_curator generated_at: 2026-05-24 @@ -118,7 +118,7 @@ human_views: - docs/assets/local-scan-target-scenario-seams.drawio - view_id: operations-transition-architecture title: 확장 경로와 운영 기준 - path: docs/views/05-operations-transition-architecture.md + path: docs/views/operations-transition-architecture.md output_path: confluence://프로젝트-개요-및-추진-전략/운영-전환-및-배포-아키텍처 generated_by: human_doc_curator generated_at: 2026-05-23 @@ -135,7 +135,7 @@ human_views: - docs/assets/operations-transition.svg - view_id: research-and-technical-decisions title: 리서치 요약과 기술 결정 - path: docs/views/06-research-and-technical-decisions.md + path: docs/views/research-and-technical-decisions.md output_path: confluence://프로젝트-개요-및-추진-전략/리서치-요약-및-기술-결정 generated_by: human_doc_curator generated_at: 2026-05-23 @@ -152,7 +152,7 @@ human_views: - docs/assets/decision-matrix.svg - view_id: getting-started title: 시작하기 - path: docs/views/07-getting-started.md + path: docs/views/getting-started.md output_path: confluence://프로젝트-개요-및-추진-전략/시작하기 generated_by: human_doc_curator generated_at: 2026-05-25 @@ -162,7 +162,7 @@ human_views: assets: [] - view_id: 
public-repo-safety-policy title: 공개 저장소 안전 정책 - path: docs/views/09-public-repo-safety-policy.md + path: docs/views/public-repo-safety-policy.md output_path: confluence://프로젝트-개요-및-추진-전략/데이터-취급-및-공개-범위-관리 generated_by: human_doc_curator generated_at: 2026-05-23 diff --git a/docs/views/07-getting-started.md b/docs/views/getting-started.md similarity index 94% rename from docs/views/07-getting-started.md rename to docs/views/getting-started.md index c3c1875..6c2ef59 100644 --- a/docs/views/07-getting-started.md +++ b/docs/views/getting-started.md @@ -150,11 +150,11 @@ uv run security-scanner gate \ - **Verifier 적용** — finding을 redacted metadata 기준으로 triage 상태를 보조합니다. README의 평가와 verifier 절을 참고합니다. - **Synthetic corpus 평가** — `eval/synthetic-corpus/`로 precision, recall, false negative를 측정합니다. - **DynamoDB-compatible 저장소** — 여러 스캔 실행을 한 store에 누적합니다. README의 로컬 NoSQL 저장소 절을 참고합니다. -- **공개 저장소 안전 기준** — finding을 다른 도구로 옮기거나 공유하기 전에 [공개 저장소 안전 정책](09-public-repo-safety-policy.md)을 확인합니다. +- **공개 저장소 안전 기준** — finding을 다른 도구로 옮기거나 공유하기 전에 [공개 저장소 안전 정책](public-repo-safety-policy.md)을 확인합니다. 
## 관련 문서 -- [프로젝트 개요와 추진 전략](00-project-overview-and-strategy.md) -- [시스템 구조와 실행 환경](01-system-architecture-and-runtime.md) -- [탐지 결과와 지표](03-secret-detection-results-and-metrics.md) -- [공개 저장소 안전 정책](09-public-repo-safety-policy.md) +- [프로젝트 개요와 추진 전략](project-overview-and-strategy.md) +- [시스템 구조와 실행 환경](system-architecture-and-runtime.md) +- [탐지 결과와 지표](secret-detection-results-and-metrics.md) +- [공개 저장소 안전 정책](public-repo-safety-policy.md) diff --git a/docs/views/04-local-scan-orchestration-target-architecture.md b/docs/views/local-scan-orchestration-target-architecture.md similarity index 100% rename from docs/views/04-local-scan-orchestration-target-architecture.md rename to docs/views/local-scan-orchestration-target-architecture.md diff --git a/docs/views/05-operations-transition-architecture.md b/docs/views/operations-transition-architecture.md similarity index 100% rename from docs/views/05-operations-transition-architecture.md rename to docs/views/operations-transition-architecture.md diff --git a/docs/views/00-project-overview-and-strategy.md b/docs/views/project-overview-and-strategy.md similarity index 100% rename from docs/views/00-project-overview-and-strategy.md rename to docs/views/project-overview-and-strategy.md diff --git a/docs/views/09-public-repo-safety-policy.md b/docs/views/public-repo-safety-policy.md similarity index 100% rename from docs/views/09-public-repo-safety-policy.md rename to docs/views/public-repo-safety-policy.md diff --git a/docs/views/06-research-and-technical-decisions.md b/docs/views/research-and-technical-decisions.md similarity index 100% rename from docs/views/06-research-and-technical-decisions.md rename to docs/views/research-and-technical-decisions.md diff --git a/docs/views/03-secret-detection-results-and-metrics.md b/docs/views/secret-detection-results-and-metrics.md similarity index 100% rename from docs/views/03-secret-detection-results-and-metrics.md rename to docs/views/secret-detection-results-and-metrics.md 
diff --git a/docs/views/02-source-scan-results-nosql-schema.md b/docs/views/source-scan-results-nosql-schema.md similarity index 100% rename from docs/views/02-source-scan-results-nosql-schema.md rename to docs/views/source-scan-results-nosql-schema.md diff --git a/docs/views/01-system-architecture-and-runtime.md b/docs/views/system-architecture-and-runtime.md similarity index 100% rename from docs/views/01-system-architecture-and-runtime.md rename to docs/views/system-architecture-and-runtime.md From ce67a0adc58931961c90ed63bff5f34f2a041bc3 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 21:17:50 +0900 Subject: [PATCH 3/9] docs: clarify source checkout quick start Co-Authored-By: Codex GPT-5 --- README.md | 4 +++- docs/views/getting-started.md | 12 +++++++++++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 3bf9927..420f6e7 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,9 @@ targets.local.yaml -> workspace -> Gitleaks -> Finding -> local store -> report ## 빠른 시작 -전제: `uv`, `gitleaks` v8, 스캔할 로컬 checkout이 준비되어 있습니다. +전제: `uv`, `gitleaks` v8, `security-scanner` source checkout, 스캔할 별도 로컬 checkout이 준비되어 있습니다. + +아래 명령은 `security-scanner` 저장소 root에서 실행합니다. `targets.local.yaml`에는 스캔 대상 저장소 경로를 적습니다. ```bash uv sync diff --git a/docs/views/getting-started.md b/docs/views/getting-started.md index 6c2ef59..ca02583 100644 --- a/docs/views/getting-started.md +++ b/docs/views/getting-started.md @@ -1,11 +1,20 @@ # 시작하기 -이 문서는 **이미 로컬에 클론된 저장소들**을 대상으로 첫 스캔을 끝까지 돌리는 절차를 정리합니다. +이 문서는 `security-scanner` source checkout에서 CLI를 실행해, **이미 로컬에 클론된 다른 저장소들**을 대상으로 첫 스캔을 끝까지 돌리는 절차를 정리합니다. LLM verifier를 사용하지 않는 최소 경로만 다룹니다. Verifier와 synthetic 평가는 마지막 절에서 다음 단계로 안내합니다. 대상 독자는 처음 이 도구를 실행해 보는 사람입니다. +## 실행 위치 + +현재는 배포 패키지가 아니라 source checkout 기준으로 실행합니다. + +- `security-scanner` checkout: 이 문서의 명령을 실행하는 도구 저장소입니다. +- 스캔 대상 checkout: `targets.local.yaml`에 적는 별도 로컬 저장소입니다. 
+ +아래 명령은 모두 `security-scanner` 저장소 root에서 실행합니다. 스캔 대상 저장소는 같은 디렉터리일 필요가 없습니다. + ## 사전 확인 다음이 충족된 상태에서 시작합니다. @@ -14,6 +23,7 @@ LLM verifier를 사용하지 않는 최소 경로만 다룹니다. Verifier와 s | --- | --- | --- | | `uv` 설치됨 | `uv --version` | Python runtime 관리 도구 | | `gitleaks` v8 이상 | `gitleaks version` | 외부 secret 탐지 바이너리 | +| `security-scanner` source checkout | `ls pyproject.toml src/security_scanner` | 이 문서의 명령을 실행하는 저장소 | | 스캔 대상 working tree | `ls /.git` | 각 저장소가 로컬 디렉터리로 존재 | 조건이 빠져 있으면 각 도구의 공식 설치 안내를 따른 뒤 다시 진행합니다. 이 문서는 OS별 설치 절차를 다루지 않습니다. From 4e2d21cd47b1ec20ac2cd425d2d47ac9c70038d4 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 21:28:30 +0900 Subject: [PATCH 4/9] docs: add ollama verifier quick start Co-Authored-By: Codex GPT-5 --- README.md | 9 ++++--- docs/views/getting-started.md | 45 ++++++++++++++++++++++++++++++++--- 2 files changed, 48 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 420f6e7..cfb872b 100644 --- a/README.md +++ b/README.md @@ -113,12 +113,15 @@ uv run security-scanner evaluate \ Verifier 적용 전후도 같은 방식으로 비교합니다. +Ollama가 scanner를 실행하는 Ubuntu host에 설치되어 있으면 host는 localhost로 둡니다. + ```bash +export SECURITY_SCANNER_OLLAMA_HOST=http://127.0.0.1:11434 +export SECURITY_SCANNER_OLLAMA_MODEL=lfm2.5-thinking + uv run security-scanner verify \ --findings private/eval-findings.jsonl \ - --output private/eval-verified-findings.jsonl \ - --ollama-host "$SECURITY_SCANNER_OLLAMA_HOST" \ - --ollama-model "$SECURITY_SCANNER_OLLAMA_MODEL" + --output private/eval-verified-findings.jsonl uv run security-scanner evaluate \ --expected eval/synthetic-corpus/expected-findings.example.json \ diff --git a/docs/views/getting-started.md b/docs/views/getting-started.md index ca02583..3b04ae0 100644 --- a/docs/views/getting-started.md +++ b/docs/views/getting-started.md @@ -2,7 +2,7 @@ 이 문서는 `security-scanner` source checkout에서 CLI를 실행해, **이미 로컬에 클론된 다른 저장소들**을 대상으로 첫 스캔을 끝까지 돌리는 절차를 정리합니다. -LLM verifier를 사용하지 않는 최소 경로만 다룹니다. 
Verifier와 synthetic 평가는 마지막 절에서 다음 단계로 안내합니다. +기본 경로는 LLM verifier 없이 `scan`, `report`, `gate`까지 끝내는 흐름입니다. Ollama verifier는 선택 단계로 분리합니다. 대상 독자는 처음 이 도구를 실행해 보는 사람입니다. @@ -142,11 +142,51 @@ uv run security-scanner gate \ 종료 코드 0은 pass, 1은 fail입니다. +## 7. Ollama verifier 적용 + +Ollama verifier는 secret detector가 아닙니다. Gitleaks가 만든 finding을 redacted metadata 기준으로 재검토해 triage 상태만 보조합니다. Finding은 삭제하지 않습니다. + +이 단계는 Ubuntu에 Ollama가 설치되어 있고, 사용할 model이 준비된 경우에만 실행합니다. Scanner와 Ollama가 같은 Ubuntu host에 있으면 host는 localhost입니다. + +```bash +ollama list +``` + +Verifier 설정은 환경 변수로 넘길 수 있습니다. + +```bash +export SECURITY_SCANNER_OLLAMA_HOST=http://127.0.0.1:11434 +export SECURITY_SCANNER_OLLAMA_MODEL=lfm2.5-thinking +``` + +스캔 결과를 별도 verified artifact로 씁니다. 입력 파일을 덮어쓰지 않습니다. + +```bash +uv run security-scanner verify \ + --findings private/findings.jsonl \ + --output private/verified-findings.jsonl +``` + +검증된 결과를 보고서와 gate에 사용할 수 있습니다. + +```bash +uv run security-scanner report \ + --findings private/verified-findings.jsonl + +uv run security-scanner gate \ + --findings private/verified-findings.jsonl \ + --max 0 +``` + +Ollama 응답 실패, timeout, 낮은 confidence는 모두 review-needed로 남습니다. 실제 secret 값, raw match, code snippet은 verifier prompt에 보내지 않습니다. + ## 자주 막히는 지점 | 증상 | 원인 | 대응 | | --- | --- | --- | | `gitleaks binary not found` | gitleaks가 PATH에 없음 | gitleaks v8 설치 후 `which gitleaks` 확인 | +| `Ollama host is required` | verifier host 미설정 | `SECURITY_SCANNER_OLLAMA_HOST` 또는 `--ollama-host` 설정 | +| `Ollama model is required` | verifier model 미설정 | `SECURITY_SCANNER_OLLAMA_MODEL` 또는 `--ollama-model` 설정 | | `Target ...: local path does not exist` | manifest의 `path`가 잘못됨 | 절대 경로 확인, `~` 확장 여부 확인 | | 모든 finding의 severity가 HIGH | rule 메타 매핑 미구현 | 현재 알려진 한계. 보고서 해석 시 rule id를 함께 본다 | | 스캔이 너무 오래 걸림 | `include_history: true` + 커밋 많은 저장소 | manifest에서 `include_history: false`로 우선 확인 | @@ -155,9 +195,8 @@ uv run security-scanner gate \ ## 다음 단계 -기본 스캔 흐름이 끝났습니다. 더 진행할 수 있는 경로는 다음과 같습니다. 
+기본 스캔과 선택 verifier 흐름이 끝났습니다. 더 진행할 수 있는 경로는 다음과 같습니다. -- **Verifier 적용** — finding을 redacted metadata 기준으로 triage 상태를 보조합니다. README의 평가와 verifier 절을 참고합니다. - **Synthetic corpus 평가** — `eval/synthetic-corpus/`로 precision, recall, false negative를 측정합니다. - **DynamoDB-compatible 저장소** — 여러 스캔 실행을 한 store에 누적합니다. README의 로컬 NoSQL 저장소 절을 참고합니다. - **공개 저장소 안전 기준** — finding을 다른 도구로 옮기거나 공유하기 전에 [공개 저장소 안전 정책](public-repo-safety-policy.md)을 확인합니다. From b4a79dcee725cce79f775029148037a5daf5cae9 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 21:51:07 +0900 Subject: [PATCH 5/9] verifier: constrain ollama response schema Co-Authored-By: Codex GPT-5 --- README.md | 18 ++++++++++++ eval/verifier-corpus/README.md | 28 +++++++++++++++++++ .../checkout/config/positive.env | 4 +++ eval/verifier-corpus/checkout/docs/example.md | 7 +++++ .../expected-findings.example.json | 22 +++++++++++++++ eval/verifier-corpus/gitleaks.synthetic.toml | 10 +++++++ .../targets.local.example.yaml | 13 +++++++++ src/security_scanner/llm/common/prompt.py | 28 +++++++++++++++++++ src/security_scanner/llm/ollama/client.py | 24 +++++++++++++++- tests/test_llm_verifier.py | 19 +++++++++++++ 10 files changed, 172 insertions(+), 1 deletion(-) create mode 100644 eval/verifier-corpus/README.md create mode 100644 eval/verifier-corpus/checkout/config/positive.env create mode 100644 eval/verifier-corpus/checkout/docs/example.md create mode 100644 eval/verifier-corpus/expected-findings.example.json create mode 100644 eval/verifier-corpus/gitleaks.synthetic.toml create mode 100644 eval/verifier-corpus/targets.local.example.yaml diff --git a/README.md b/README.md index cfb872b..82387bd 100644 --- a/README.md +++ b/README.md @@ -129,6 +129,24 @@ uv run security-scanner evaluate \ --after-findings private/eval-verified-findings.jsonl ``` +오탐 감소 흐름은 detector-visible documentation candidate를 포함한 별도 corpus로 확인합니다. 
+ +```bash +uv run security-scanner scan \ + --manifest eval/verifier-corpus/targets.local.example.yaml \ + --output private/verifier-findings.jsonl + +uv run security-scanner verify \ + --findings private/verifier-findings.jsonl \ + --output private/verifier-verified-findings.jsonl + +uv run security-scanner evaluate \ + --expected eval/verifier-corpus/expected-findings.example.json \ + --findings private/verifier-findings.jsonl \ + --after-findings private/verifier-verified-findings.jsonl \ + --precision-min 0.5 +``` + Verifier는 detector가 아닙니다. Finding을 삭제하지 않고, 사람이 검토할 때 참고할 triage 상태만 붙입니다. 응답 실패, timeout, 낮은 confidence는 모두 “검토 필요” 상태로 남깁니다. diff --git a/eval/verifier-corpus/README.md b/eval/verifier-corpus/README.md new file mode 100644 index 0000000..5a7a836 --- /dev/null +++ b/eval/verifier-corpus/README.md @@ -0,0 +1,28 @@ +# Synthetic Verifier Corpus + +This corpus is for Ollama verifier smoke tests. It intentionally contains one +expected positive candidate and one detector-visible documentation candidate +that should be safe for a verifier to clear as a false positive. + +All values are fake and public-safe. + +Run from the repository root: + +```bash +uv run security-scanner scan \ + --manifest eval/verifier-corpus/targets.local.example.yaml \ + --output private/verifier-findings.jsonl + +uv run security-scanner verify \ + --findings private/verifier-findings.jsonl \ + --output private/verifier-verified-findings.jsonl + +uv run security-scanner evaluate \ + --expected eval/verifier-corpus/expected-findings.example.json \ + --findings private/verifier-findings.jsonl \ + --after-findings private/verifier-verified-findings.jsonl \ + --precision-min 0.5 +``` + +The `private/` output path is gitignored. Do not commit generated verifier +artifacts. 
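Reviewer note on the verifier corpus README above: the corpus holds one expected positive and one known negative, so the before and after metrics can be worked out by hand. The sketch below is a minimal illustration, not the project's `evaluate` implementation. The `FindingKey` type and `precision_recall` helper are hypothetical names, and it assumes that a finding the verifier clears as a false positive is no longer counted as a reported positive in the after-verifier metric.

```python
from typing import NamedTuple


class FindingKey(NamedTuple):
    """Hypothetical finding identity, mirroring the fields in expected-findings.example.json."""

    repo_full_name: str
    file_path: str
    line_start: int
    rule_id: str


def precision_recall(
    expected: set[FindingKey], reported: set[FindingKey]
) -> tuple[float, float]:
    """Return (precision, recall) for reported findings against the expected set."""
    true_positives = len(expected & reported)
    false_positives = len(reported - expected)
    false_negatives = len(expected - reported)
    # Empty sets yield 0.0 here; that convention is an assumption of this sketch.
    precision = true_positives / (true_positives + false_positives) if reported else 0.0
    recall = true_positives / (true_positives + false_negatives) if expected else 0.0
    return precision, recall


REPO = "synthetic-org/verifier-repo"
RULE = "synthetic-fake-token"
positive = FindingKey(REPO, "config/positive.env", 4, RULE)
doc_example = FindingKey(REPO, "docs/example.md", 5, RULE)

expected = {positive}
# Before the verifier, the detector reports both token-shaped strings.
before_verifier = {positive, doc_example}
# Assumption: the verifier clears the documentation example as a false positive,
# so only the real candidate still counts as a reported positive afterwards.
after_verifier = {positive}

print(precision_recall(expected, before_verifier))  # (0.5, 1.0)
print(precision_recall(expected, after_verifier))  # (1.0, 1.0)
```

Under these assumptions the pre-verifier precision is exactly 0.5, which would be consistent with the `--precision-min 0.5` threshold used in the commands above. Recall stays at 1.0 in both cases because the verifier never deletes the expected finding.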
diff --git a/eval/verifier-corpus/checkout/config/positive.env b/eval/verifier-corpus/checkout/config/positive.env new file mode 100644 index 0000000..0e16b2e --- /dev/null +++ b/eval/verifier-corpus/checkout/config/positive.env @@ -0,0 +1,4 @@ +# Synthetic verifier fixture. +# Values in this file are fake and public-safe. +SERVICE_NAME=synthetic-verifier +SYNTHETIC_TOKEN=SCANNER_FAKE_SECRET_TOKEN_100001 diff --git a/eval/verifier-corpus/checkout/docs/example.md b/eval/verifier-corpus/checkout/docs/example.md new file mode 100644 index 0000000..a82683b --- /dev/null +++ b/eval/verifier-corpus/checkout/docs/example.md @@ -0,0 +1,7 @@ +# Synthetic Documentation Example + +This documentation shows a fake token shape for parser examples: + +`SCANNER_FAKE_SECRET_TOKEN_900001` + +The value above is intentionally synthetic and public-safe. diff --git a/eval/verifier-corpus/expected-findings.example.json b/eval/verifier-corpus/expected-findings.example.json new file mode 100644 index 0000000..934d238 --- /dev/null +++ b/eval/verifier-corpus/expected-findings.example.json @@ -0,0 +1,22 @@ +{ + "schemaVersion": 1, + "name": "synthetic-verifier-v1", + "description": "Public-safe verifier corpus with one expected positive and one documentation false-positive candidate.", + "expectedFindings": [ + { + "repoFullName": "synthetic-org/verifier-repo", + "filePath": "config/positive.env", + "lineStart": 4, + "ruleId": "synthetic-fake-token" + } + ], + "knownNegatives": [ + { + "repoFullName": "synthetic-org/verifier-repo", + "filePath": "docs/example.md", + "lineStart": 5, + "ruleId": "synthetic-fake-token", + "reason": "Documentation shows a synthetic token-shaped example, not a real credential." 
+ } + ] +} diff --git a/eval/verifier-corpus/gitleaks.synthetic.toml b/eval/verifier-corpus/gitleaks.synthetic.toml new file mode 100644 index 0000000..7b40148 --- /dev/null +++ b/eval/verifier-corpus/gitleaks.synthetic.toml @@ -0,0 +1,10 @@ +title = "security-scanner synthetic verifier rules" + +[[rules]] +id = "synthetic-fake-token" +description = "Synthetic fake token marker for public verifier fixtures" +regex = '''SCANNER_FAKE_SECRET_TOKEN_[0-9]{6}''' +secretGroup = 0 +keywords = [ + "SCANNER_FAKE_SECRET_TOKEN", +] diff --git a/eval/verifier-corpus/targets.local.example.yaml b/eval/verifier-corpus/targets.local.example.yaml new file mode 100644 index 0000000..2e176d8 --- /dev/null +++ b/eval/verifier-corpus/targets.local.example.yaml @@ -0,0 +1,13 @@ +version: 1 + +targets: + - name: synthetic-org/verifier-repo + path: eval/verifier-corpus/checkout + enabled: true + +scan: + include_history: false + exclude: [] + +# Gitleaks runs with the checkout as cwd, so this path is relative to checkout/. 
+gitleaks_config: ../gitleaks.synthetic.toml diff --git a/src/security_scanner/llm/common/prompt.py b/src/security_scanner/llm/common/prompt.py index f53b75c..7809925 100644 --- a/src/security_scanner/llm/common/prompt.py +++ b/src/security_scanner/llm/common/prompt.py @@ -21,6 +21,7 @@ def build_redacted_prompt(finding: Finding) -> str: "repoFingerprint": _fingerprint(finding.repo.full_name), "location": { "pathKind": _path_kind(finding.location.file_path), + "pathRole": _path_role(finding.location.file_path), "fileExtension": _file_extension(finding.location.file_path), "lineStart": finding.location.line_start, "lineEnd": finding.location.line_end, @@ -38,6 +39,10 @@ def build_redacted_prompt(finding: Finding) -> str: "Do not request or reveal raw secrets, snippets, repository names, hosts, or paths.", "Return strict JSON only with keys: label, confidence, reason.", "Allowed label values: true_positive, false_positive, needs_review.", + "confidence must be a JSON number between 0.0 and 1.0.", + 'Do not return confidence as a string such as "LOW", "MEDIUM", or "HIGH".', + "Use pathRole as weak context: documentation/example/test paths are more likely false_positive; configuration/source paths are more likely true_positive.", + 'Example valid response: {"label":"needs_review","confidence":0.61,"reason":"metadata is insufficient for a safer decision"}', "Finding metadata:", json.dumps(metadata, sort_keys=True, separators=(",", ":")), ] @@ -58,3 +63,26 @@ def _file_extension(file_path: str) -> str: if not suffix or len(suffix) > 16: return "" return suffix + + +def _path_role(file_path: str) -> str: + path = PurePath(file_path) + parts = {part.lower() for part in path.parts} + suffix = path.suffix.lower() + name = path.name.lower() + + if suffix in {".md", ".rst", ".txt"} or parts & {"docs", "doc", "documentation"}: + return "documentation" + if parts & {"example", "examples", "fixture", "fixtures", "sample", "samples"}: + return "example" + if parts & {"test", 
"tests", "__tests__"} or name.startswith("test_"): + return "test" + if suffix in {".env", ".ini", ".toml", ".yaml", ".yml", ".json"} or parts & { + "config", + "configs", + "settings", + }: + return "configuration" + if suffix in {".py", ".js", ".ts", ".tsx", ".go", ".java", ".rb", ".php", ".rs"}: + return "source" + return "other" diff --git a/src/security_scanner/llm/ollama/client.py b/src/security_scanner/llm/ollama/client.py index ab4e9ad..d2f0f36 100644 --- a/src/security_scanner/llm/ollama/client.py +++ b/src/security_scanner/llm/ollama/client.py @@ -20,6 +20,27 @@ Transport = Callable[[dict, float], str] +_VERIFIER_RESPONSE_SCHEMA = { + "type": "object", + "properties": { + "label": { + "type": "string", + "enum": ["true_positive", "false_positive", "needs_review"], + }, + "confidence": { + "type": "number", + "minimum": 0.0, + "maximum": 1.0, + }, + "reason": { + "type": "string", + }, + }, + "required": ["label", "confidence", "reason"], + "additionalProperties": False, +} + + class OllamaChatVerifier: """Verifier that sends redacted prompts to an Ollama-compatible chat API.""" @@ -31,7 +52,8 @@ def verify(self, finding: Finding) -> VerifierResult: payload = { "model": self.config.model, "stream": False, - "format": "json", + "format": _VERIFIER_RESPONSE_SCHEMA, + "options": {"temperature": 0}, "messages": [ { "role": "system", diff --git a/tests/test_llm_verifier.py b/tests/test_llm_verifier.py index dabacc5..dfe761b 100644 --- a/tests/test_llm_verifier.py +++ b/tests/test_llm_verifier.py @@ -65,6 +65,17 @@ def test_prompt_excludes_raw_secret_match_repo_and_private_absolute_path(): assert "salted-sha256:" in prompt +def test_prompt_requires_numeric_confidence_and_includes_safe_path_role(): + prompt = build_redacted_prompt( + _finding(file_path="docs/sample.md", line_start=7) + ) + + assert "confidence must be a JSON number between 0.0 and 1.0" in prompt + assert "Do not return confidence as a string" in prompt + assert '"pathRole":"documentation"' in 
prompt + assert "docs/sample.md" not in prompt + + def test_valid_json_maps_to_verifier_verdict(): result = parse_verifier_response( json.dumps( @@ -177,6 +188,14 @@ def log_message(self, format: str, *args: object) -> None: body = captured["body"] assert body["model"] == "lfm2.5-thinking" + assert body["format"]["type"] == "object" + assert body["format"]["properties"]["confidence"]["type"] == "number" + assert body["format"]["properties"]["label"]["enum"] == [ + "true_positive", + "false_positive", + "needs_review", + ] + assert body["options"]["temperature"] == 0 rendered = json.dumps(body) assert FAKE_SECRET not in rendered assert FAKE_MATCH not in rendered From 6be747017f2a05bc28ce245950833cb1592b7c51 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 22:05:18 +0900 Subject: [PATCH 6/9] verifier: tighten ollama triage prompt Co-Authored-By: Codex GPT-5 --- src/security_scanner/llm/common/prompt.py | 5 ++++- tests/test_llm_verifier.py | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/security_scanner/llm/common/prompt.py b/src/security_scanner/llm/common/prompt.py index 7809925..ea23ac0 100644 --- a/src/security_scanner/llm/common/prompt.py +++ b/src/security_scanner/llm/common/prompt.py @@ -41,7 +41,10 @@ def build_redacted_prompt(finding: Finding) -> str: "Allowed label values: true_positive, false_positive, needs_review.", "confidence must be a JSON number between 0.0 and 1.0.", 'Do not return confidence as a string such as "LOW", "MEDIUM", or "HIGH".', - "Use pathRole as weak context: documentation/example/test paths are more likely false_positive; configuration/source paths are more likely true_positive.", + "Do not choose needs_review only because raw secret text is redacted.", + "Use needs_review only when the metadata conflicts or has no usable location signal.", + "Decision rule: configuration/source pathRole with a secret-scanner hit is usually true_positive.", + "Decision rule: documentation/example/test pathRole 
with a token-shaped example is usually false_positive.", 'Example valid response: {"label":"needs_review","confidence":0.61,"reason":"metadata is insufficient for a safer decision"}', "Finding metadata:", json.dumps(metadata, sort_keys=True, separators=(",", ":")), diff --git a/tests/test_llm_verifier.py b/tests/test_llm_verifier.py index dfe761b..ef1d536 100644 --- a/tests/test_llm_verifier.py +++ b/tests/test_llm_verifier.py @@ -72,6 +72,9 @@ def test_prompt_requires_numeric_confidence_and_includes_safe_path_role(): assert "confidence must be a JSON number between 0.0 and 1.0" in prompt assert "Do not return confidence as a string" in prompt + assert "Do not choose needs_review only because raw secret text is redacted." in prompt + assert "configuration/source pathRole" in prompt + assert "documentation/example/test pathRole" in prompt assert '"pathRole":"documentation"' in prompt assert "docs/sample.md" not in prompt From 21cc42b872f0af30c56e380dfe74c6f68aee4634 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 22:10:39 +0900 Subject: [PATCH 7/9] verifier: make ollama prompt deterministic Co-Authored-By: Codex GPT-5 --- src/security_scanner/llm/common/prompt.py | 11 +++++++---- src/security_scanner/llm/ollama/client.py | 1 + tests/test_llm_verifier.py | 4 ++-- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/src/security_scanner/llm/common/prompt.py b/src/security_scanner/llm/common/prompt.py index ea23ac0..1536576 100644 --- a/src/security_scanner/llm/common/prompt.py +++ b/src/security_scanner/llm/common/prompt.py @@ -42,10 +42,13 @@ def build_redacted_prompt(finding: Finding) -> str: "confidence must be a JSON number between 0.0 and 1.0.", 'Do not return confidence as a string such as "LOW", "MEDIUM", or "HIGH".', "Do not choose needs_review only because raw secret text is redacted.", - "Use needs_review only when the metadata conflicts or has no usable location signal.", - "Decision rule: configuration/source pathRole with a 
secret-scanner hit is usually true_positive.", - "Decision rule: documentation/example/test pathRole with a token-shaped example is usually false_positive.", - 'Example valid response: {"label":"needs_review","confidence":0.61,"reason":"metadata is insufficient for a safer decision"}', + "Apply these decision rules exactly, in order.", + "Rule 1: documentation/example/test pathRole must be false_positive with confidence 0.80.", + "Rule 2: configuration/source pathRole must be true_positive with confidence 0.80.", + "Rule 3: use needs_review only when pathRole is other or metadata is internally inconsistent.", + "Do not say metadata is insufficient when pathRole matches Rule 1 or Rule 2.", + 'Example for documentation: {"label":"false_positive","confidence":0.8,"reason":"documentation example"}', + 'Example for configuration: {"label":"true_positive","confidence":0.8,"reason":"configuration secret candidate"}', "Finding metadata:", json.dumps(metadata, sort_keys=True, separators=(",", ":")), ] diff --git a/src/security_scanner/llm/ollama/client.py b/src/security_scanner/llm/ollama/client.py index d2f0f36..158f03d 100644 --- a/src/security_scanner/llm/ollama/client.py +++ b/src/security_scanner/llm/ollama/client.py @@ -59,6 +59,7 @@ def verify(self, finding: Finding) -> VerifierResult: "role": "system", "content": ( "You classify redacted secret-scanner findings. " + "Apply the user decision rules exactly. " "Return only strict JSON." ), }, diff --git a/tests/test_llm_verifier.py b/tests/test_llm_verifier.py index ef1d536..073f795 100644 --- a/tests/test_llm_verifier.py +++ b/tests/test_llm_verifier.py @@ -73,8 +73,8 @@ def test_prompt_requires_numeric_confidence_and_includes_safe_path_role(): assert "confidence must be a JSON number between 0.0 and 1.0" in prompt assert "Do not return confidence as a string" in prompt assert "Do not choose needs_review only because raw secret text is redacted." 
in prompt - assert "configuration/source pathRole" in prompt - assert "documentation/example/test pathRole" in prompt + assert "configuration/source pathRole must be true_positive" in prompt + assert "documentation/example/test pathRole must be false_positive" in prompt assert '"pathRole":"documentation"' in prompt assert "docs/sample.md" not in prompt From 0fab55db50f05886ac54dade6b630902e24507c6 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 22:13:17 +0900 Subject: [PATCH 8/9] verifier: surface current path role in prompt Co-Authored-By: Codex GPT-5 --- src/security_scanner/llm/common/prompt.py | 10 +++++++--- tests/test_llm_verifier.py | 2 ++ 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/src/security_scanner/llm/common/prompt.py b/src/security_scanner/llm/common/prompt.py index 1536576..85f3401 100644 --- a/src/security_scanner/llm/common/prompt.py +++ b/src/security_scanner/llm/common/prompt.py @@ -11,6 +11,8 @@ def build_redacted_prompt(finding: Finding) -> str: """Build a strict verifier prompt from redacted finding metadata only.""" + path_role = _path_role(finding.location.file_path) + file_extension = _file_extension(finding.location.file_path) metadata = { "findingId": finding.finding_id, "category": finding.category, @@ -21,8 +23,8 @@ def build_redacted_prompt(finding: Finding) -> str: "repoFingerprint": _fingerprint(finding.repo.full_name), "location": { "pathKind": _path_kind(finding.location.file_path), - "pathRole": _path_role(finding.location.file_path), - "fileExtension": _file_extension(finding.location.file_path), + "pathRole": path_role, + "fileExtension": file_extension, "lineStart": finding.location.line_start, "lineEnd": finding.location.line_end, }, @@ -41,8 +43,10 @@ def build_redacted_prompt(finding: Finding) -> str: "Allowed label values: true_positive, false_positive, needs_review.", "confidence must be a JSON number between 0.0 and 1.0.", 'Do not return confidence as a string such as "LOW", "MEDIUM", or 
"HIGH".', + f"Current finding pathRole: {path_role}.", + f"Current finding fileExtension: {file_extension or 'none'}.", "Do not choose needs_review only because raw secret text is redacted.", - "Apply these decision rules exactly, in order.", + "Apply only the decision rule that matches the current finding pathRole.", "Rule 1: documentation/example/test pathRole must be false_positive with confidence 0.80.", "Rule 2: configuration/source pathRole must be true_positive with confidence 0.80.", "Rule 3: use needs_review only when pathRole is other or metadata is internally inconsistent.", diff --git a/tests/test_llm_verifier.py b/tests/test_llm_verifier.py index 073f795..66738b1 100644 --- a/tests/test_llm_verifier.py +++ b/tests/test_llm_verifier.py @@ -72,6 +72,8 @@ def test_prompt_requires_numeric_confidence_and_includes_safe_path_role(): assert "confidence must be a JSON number between 0.0 and 1.0" in prompt assert "Do not return confidence as a string" in prompt + assert "Current finding pathRole: documentation." in prompt + assert "Apply only the decision rule that matches the current finding pathRole." in prompt assert "Do not choose needs_review only because raw secret text is redacted." 
in prompt assert "configuration/source pathRole must be true_positive" in prompt assert "documentation/example/test pathRole must be false_positive" in prompt From 896ea617b14052602552fef00a077f4fe5b2eab1 Mon Sep 17 00:00:00 2001 From: pureliture Date: Mon, 25 May 2026 22:16:01 +0900 Subject: [PATCH 9/9] verifier: include matched path-role decision Co-Authored-By: Codex GPT-5 --- src/security_scanner/llm/common/prompt.py | 26 ++++++++++++++++++----- tests/test_llm_verifier.py | 13 +++++++++--- 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/src/security_scanner/llm/common/prompt.py b/src/security_scanner/llm/common/prompt.py index 85f3401..92efe68 100644 --- a/src/security_scanner/llm/common/prompt.py +++ b/src/security_scanner/llm/common/prompt.py @@ -13,6 +13,7 @@ def build_redacted_prompt(finding: Finding) -> str: """Build a strict verifier prompt from redacted finding metadata only.""" path_role = _path_role(finding.location.file_path) file_extension = _file_extension(finding.location.file_path) + matched_label, matched_confidence, matched_reason = _path_role_decision(path_role) metadata = { "findingId": finding.finding_id, "category": finding.category, @@ -45,12 +46,11 @@ def build_redacted_prompt(finding: Finding) -> str: 'Do not return confidence as a string such as "LOW", "MEDIUM", or "HIGH".', f"Current finding pathRole: {path_role}.", f"Current finding fileExtension: {file_extension or 'none'}.", + f"Current finding matched label: {matched_label}.", + f"Current finding matched confidence: {matched_confidence:.2f}.", + f"Current finding matched reason: {matched_reason}.", "Do not choose needs_review only because raw secret text is redacted.", - "Apply only the decision rule that matches the current finding pathRole.", - "Rule 1: documentation/example/test pathRole must be false_positive with confidence 0.80.", - "Rule 2: configuration/source pathRole must be true_positive with confidence 0.80.", - "Rule 3: use needs_review only when pathRole 
is other or metadata is internally inconsistent.", - "Do not say metadata is insufficient when pathRole matches Rule 1 or Rule 2.", + "Return the matched label and matched confidence unless metadata is internally inconsistent.", 'Example for documentation: {"label":"false_positive","confidence":0.8,"reason":"documentation example"}', 'Example for configuration: {"label":"true_positive","confidence":0.8,"reason":"configuration secret candidate"}', "Finding metadata:", @@ -96,3 +96,19 @@ def _path_role(file_path: str) -> str: if suffix in {".py", ".js", ".ts", ".tsx", ".go", ".java", ".rb", ".php", ".rs"}: return "source" return "other" + + +def _path_role_decision(path_role: str) -> tuple[str, float, str]: + if path_role in {"documentation", "example", "test"}: + return ( + "false_positive", + 0.80, + "documentation/example/test location is a likely non-production example", + ) + if path_role in {"configuration", "source"}: + return ( + "true_positive", + 0.80, + "configuration/source location is a likely real secret candidate", + ) + return ("needs_review", 0.61, "path role is not specific enough") diff --git a/tests/test_llm_verifier.py b/tests/test_llm_verifier.py index 66738b1..8081441 100644 --- a/tests/test_llm_verifier.py +++ b/tests/test_llm_verifier.py @@ -73,14 +73,21 @@ def test_prompt_requires_numeric_confidence_and_includes_safe_path_role(): assert "confidence must be a JSON number between 0.0 and 1.0" in prompt assert "Do not return confidence as a string" in prompt assert "Current finding pathRole: documentation." in prompt - assert "Apply only the decision rule that matches the current finding pathRole." in prompt + assert "Current finding matched label: false_positive." in prompt assert "Do not choose needs_review only because raw secret text is redacted." 
in prompt - assert "configuration/source pathRole must be true_positive" in prompt - assert "documentation/example/test pathRole must be false_positive" in prompt assert '"pathRole":"documentation"' in prompt assert "docs/sample.md" not in prompt +def test_prompt_surfaces_configuration_match_as_true_positive(): + prompt = build_redacted_prompt( + _finding(file_path="config/positive.env", line_start=4) + ) + + assert "Current finding pathRole: configuration." in prompt + assert "Current finding matched label: true_positive." in prompt + + def test_valid_json_maps_to_verifier_verdict(): result = parse_verifier_response( json.dumps(