fix(vuln): semgrep config auto→p/default (resolve metrics-off incompatibility) #66
The runner forces `--metrics=off` for privacy, but semgrep's `auto` config requires metrics, so it fails with "Cannot create auto config when metrics are off". As a result, the live catalog scan failed on every tick. Confirmed with a single-repo smoke test on the host.

- ScanVulnerabilityRequest/CatalogScanRequest: semgrep_config default auto→p/default.
- scan-vuln --semgrep-config: default and help text auto→p/default (with the reason stated).
- vuln-scan.service: ExecStart uses --semgrep-config p/default.
- Regression tests: the runner forces --metrics=off, the default config is not auto, and the unit does not use auto.

proof: running p/default with --metrics=off on the host exits 0 and produces SARIF. uv run pytest: 1360 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review
This pull request changes the default Semgrep configuration from "auto" to "p/default" in the systemd service, the CLI arguments, and the scan request classes, because "auto" is incompatible with the forced "--metrics=off" flag and caused scans to fail. It also adds regression tests that guard against the use of "auto". The reviewer recommends adding defensive validation in the `__post_init__` methods of ScanVulnerabilityRequest and CatalogScanRequest that explicitly rejects "auto" by raising a ValueError, along with tests that verify this validation.
    # NOT "auto": the runner forces --metrics=off for privacy, and semgrep
    # rejects `auto` config when metrics are off ("Cannot create auto config
    # when metrics are off"). A pinned registry pack runs without telemetry.
    semgrep_config: str = "p/default"
    timeout_seconds: int = 300
    path_policy: str = "redacted"
SemgrepCompatibleRunner always forces --metrics=off, so the scan is guaranteed to fail whenever semgrep_config is set to "auto". If a user explicitly passes "auto" through the CLI or API, it is better to add defensive validation in __post_init__ so that the error is raised immediately, without launching a needless subprocess.
    # NOT "auto": the runner forces --metrics=off for privacy, and semgrep
    # rejects `auto` config when metrics are off ("Cannot create auto config
    # when metrics are off"). A pinned registry pack runs without telemetry.
    semgrep_config: str = "p/default"
    timeout_seconds: int = 300
    path_policy: str = "redacted"

    def __post_init__(self) -> None:
        if self.semgrep_config == "auto":
            raise ValueError(
                "semgrep_config='auto' is incompatible with --metrics=off"
            )
    # "p/default", not "auto": auto is incompatible with the runner's forced
    # --metrics=off (see ScanVulnerabilityRequest).
    semgrep_config: str = "p/default"
    timeout_seconds: int = 1800
    path_policy: str = "redacted"
Likewise, it is safer to add __post_init__ validation to CatalogScanRequest so that semgrep_config cannot be set to "auto".
    # "p/default", not "auto": auto is incompatible with the runner's forced
    # --metrics=off (see ScanVulnerabilityRequest).
    semgrep_config: str = "p/default"
    timeout_seconds: int = 1800
    path_policy: str = "redacted"

    def __post_init__(self) -> None:
        if self.semgrep_config == "auto":
            raise ValueError(
                "semgrep_config='auto' is incompatible with --metrics=off"
            )

def test_scan_request_default_config_is_not_auto():
    assert ScanVulnerabilityRequest(root=".", output_path="o").semgrep_config != "auto"


def test_catalog_request_default_config_is_not_auto():
    req = CatalogScanRequest(artifact_dir="a", scan_run_id="r")
    assert req.semgrep_config != "auto"
It would be good to add regression tests that verify the new validation in ScanVulnerabilityRequest and CatalogScanRequest correctly rejects the "auto" setting.
def test_scan_request_default_config_is_not_auto():
    assert ScanVulnerabilityRequest(root=".", output_path="o").semgrep_config != "auto"


def test_scan_request_rejects_auto_config():
    import pytest
    with pytest.raises(ValueError, match="semgrep_config='auto' is incompatible"):
        ScanVulnerabilityRequest(root=".", output_path="o", semgrep_config="auto")


def test_catalog_request_default_config_is_not_auto():
    req = CatalogScanRequest(artifact_dir="a", scan_run_id="r")
    assert req.semgrep_config != "auto"


def test_catalog_request_rejects_auto_config():
    import pytest
    with pytest.raises(ValueError, match="semgrep_config='auto' is incompatible"):
        CatalogScanRequest(artifact_dir="a", scan_run_id="r", semgrep_config="auto")
Problem (found while enabling on the host)
The vuln-scan runner forces `--metrics=off` for privacy, but semgrep `--config auto` requires metrics and therefore fails:
`semgrep-compatible scan failed with exit code 2: Cannot create auto config when metrics are off.`
As a result, every live catalog scan failed on every tick. Reproduced with a single-repo smoke test on ragflow-ubuntu.
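The failure mode described above can be illustrated with a small sketch. This is not the project's SemgrepCompatibleRunner: `build_semgrep_argv` and its parameters are hypothetical names introduced only for illustration. It assumes the runner assembles an argument list and always adds `--metrics=off`, which means an `auto` config can be rejected before any process is spawned.

```python
from __future__ import annotations

from typing import Sequence


def build_semgrep_argv(
    config: str, target: str, extra_args: Sequence[str] = ()
) -> list[str]:
    """Hypothetical helper: build a semgrep command line with metrics forced off."""
    if config == "auto":
        # semgrep refuses `--config auto` when metrics are off, so fail here
        # rather than spawning a process that is known to exit with an error.
        raise ValueError("semgrep_config='auto' is incompatible with --metrics=off")
    # --metrics=off is unconditional; callers cannot opt back in.
    return ["semgrep", "scan", "--config", config, "--metrics=off", *extra_args, target]


if __name__ == "__main__":
    print(build_semgrep_argv("p/default", "."))
```

Keeping the check next to the place where `--metrics=off` is added puts the two conflicting settings side by side, which is the same reasoning behind the `__post_init__` validation suggested in the review.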
Fix
- `ScanVulnerabilityRequest`/`CatalogScanRequest`: `semgrep_config` default `auto`→`p/default`.
- `scan-vuln --semgrep-config`: default and help text `auto`→`p/default` (reason stated).
- `security-scanner-personal-vuln-scan.service`: ExecStart uses `--semgrep-config p/default`.
- Regression tests (`test_vuln_semgrep_config_metrics.py`): the runner forces `--metrics=off` / the default config is not `auto` / the unit does not use `auto`.

proof
- `semgrep scan --config p/default --metrics=off ...` exits 0 and generates SARIF.
- `uv run pytest`: 1360 passed / 4 skipped; ruff clean.

Refs #64 #65.
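The "unit does not use `auto`" check could look roughly like the sketch below. It is not the content of `test_vuln_semgrep_config_metrics.py`, which is not shown here. The helper name `semgrep_config_from_unit` and the sample unit text, including its ExecStart line, are hypothetical. The sketch assumes the flag appears on an `ExecStart=` line, either as two tokens or in `--semgrep-config=VALUE` form.

```python
from __future__ import annotations

import shlex

# Hypothetical unit text for illustration; the real service file is not shown.
SAMPLE_UNIT = """\
[Service]
Type=oneshot
ExecStart=/usr/bin/env scan-vuln --semgrep-config p/default
"""


def semgrep_config_from_unit(unit_text: str) -> str | None:
    """Return the --semgrep-config value from the first ExecStart that sets it."""
    # Join backslash-continued lines so a wrapped ExecStart is parsed as one line.
    joined = unit_text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = line.strip()
        if not line.startswith("ExecStart="):
            continue
        tokens = shlex.split(line[len("ExecStart="):])
        for i, token in enumerate(tokens):
            if token == "--semgrep-config" and i + 1 < len(tokens):
                return tokens[i + 1]
            if token.startswith("--semgrep-config="):
                return token.split("=", 1)[1]
    return None


def test_unit_does_not_use_auto():
    assert semgrep_config_from_unit(SAMPLE_UNIT) != "auto"
```

In a real test the unit text would be read from the repository's service file rather than from an inline sample.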