[FEATURE] CI/CD integration suite — baseline scan, diff scan, strict mode, manifest tests, golden snapshots

## Summary

CodeLens has a basic `check` command for CI but lacks baseline and diff scanning, strict-mode thresholds, a manifest-driven test suite, and golden snapshot regression tests for rules. This issue proposes a full CI/CD integration suite so that CodeLens can serve as a quality gate in GitHub Actions, GitLab CI, and Jenkins.

## Worker consensus (7 reports)

| Worker | Source | Contribution |
|---|---|---|
| CodeGraph | `update!/CodeLens_CodeGraph_Upgrade_Analysis.md` #11 | Git sync hooks (`post-commit`, `post-merge`, `post-checkout`) running `codelens scan --incremental` in background. Marker-fenced install. |
| CodeGraph | same file #20 | Agent benchmark harness — 7 real-world codebase fixtures (VS Code, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire). `claude -p` headless with `--strict-mcp-config`. CI regression fails if CodeLens+CodeLens result worse than baseline. |
| Opengrep | `update!/CodeLens_Opengrep_Upgrade_Analysis.md` #47 | `codelens ci` command — auto-detect CI env, `--baseline-commit SHA`, `--diff-depth N` (transitive dependents), `--error-on-severity <level>`, auto-upload SARIF to GitHub. `codelens install-ci` generates workflow file. |
| Semgrep | `update!/CodeLens_Upgrade_Issues_from_Semgrep.md` CL-010 | `--baseline-commit <SHA>` + `--diff-scan` flags to scan/secrets/dataflow/vuln-scan/smell/complexity/dead-code/taint. Output: `{new_findings, preexisting_findings, total}`. GitHub Actions `codelens-pr-check.yml` auto-sets baseline to PR base SHA. |
| Semgrep | same file CL-012 | Strict mode: `--strict` (exit non-zero on warning), `--error` (exit non-zero if severity ≥ high), `--severity-threshold <level>`, `--max-findings N`. |
| UBS | `update!/CodeLens_UBS_Upgrade_Analysis.md` #7 | Comparison/baseline delta scan — `--comparison=<baseline.json>`, compute delta per severity/command/language, `--new-only` / `--show-resolved` flags. SARIF `automationDetails.guid`. |
| UBS | same file #8 | `--staged` (scan only `git diff --cached --name-only --diff-filter=ACMR` files), `--diff` / `--git-diff` (working tree vs HEAD), `--diff-vs=<ref>`. Target <1s for <50 changed files. |
| UBS | same file #9 | Manifest-driven test suite — JSON schema with `expect` block (exit_code, totals, require_substrings, forbid_substrings). Port UBS `run_manifest.py` runner (~300 LOC). |
| UBS | same file #10 | Rule quality harness with golden snapshot regression — track 3 scopes × 4 metrics, compare vs golden, fail with diff. `--update-goldens`. |
| OpenTaint | `update!/CodeLens_vs_OpenTaint_Upgrade_Analysis.md` B1 | Rule test harness with `@PositiveRuleSample` / `@NegativeRuleSample` annotations. `test-result.json` per rule. CI workflow `.github/workflows/codelens-rule-tests.yml`. |
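
The file-selection modes from UBS #8 could be sketched roughly as follows. This is only an illustration: `build_diff_argv` and `changed_files` are hypothetical helper names, and applying `--diff-filter=ACMR` to the non-staged modes is an assumption (the source only states it for `--staged`).

```python
import subprocess
from typing import List, Optional


def build_diff_argv(staged: bool = False, diff_vs: Optional[str] = None) -> List[str]:
    """Build the git command line for one of the three selection modes.

    staged=True    -> files in the index (maps to --staged)
    diff_vs="ref"  -> working tree vs. the given ref (maps to --diff-vs=<ref>)
    neither        -> working tree vs. HEAD (maps to --diff / --git-diff)
    """
    argv = ["git", "diff", "--name-only", "--diff-filter=ACMR"]
    if staged:
        argv.append("--cached")
    else:
        argv.append(diff_vs or "HEAD")
    return argv


def changed_files(staged: bool = False, diff_vs: Optional[str] = None,
                  cwd: Optional[str] = None) -> List[str]:
    """Run git and return the changed paths, one per output line."""
    result = subprocess.run(
        build_diff_argv(staged, diff_vs),
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Keeping the argv construction separate from the subprocess call makes the mode mapping testable without a git repository.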

## Proposed phased scope

**Phase 1 — Baseline + diff scan (P1, 1-2 weeks)**
- `--baseline-commit <SHA>` flag (merge UBS #7 + Semgrep CL-010)
- `--diff-scan` / `--staged` / `--diff-vs=<ref>` flags (UBS #8)
- Output: `{new_findings, preexisting_findings, total_findings, delta_per_severity}`
- SARIF `automationDetails.guid` for grouping CI runs
- Finding identity = hash of `(rule_id, file, line, severity)`
- Cache findings in `.codelens/baseline_<SHA>.json`
- New `scripts/git_integration.py` wrapping `git diff --name-only`
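
A minimal sketch of the finding identity and the baseline partition described above. The function names and the shape of a finding (a dict with `rule_id`, `file`, `line`, `severity` keys) are assumptions for illustration, not the existing CodeLens data model.

```python
from __future__ import annotations

import hashlib
from collections import Counter

IDENTITY_FIELDS = ("rule_id", "file", "line", "severity")


def fingerprint(finding: dict) -> str:
    """Hash of (rule_id, file, line, severity), used as the finding identity."""
    # A unit-separator character keeps ("a", "bc") distinct from ("ab", "c").
    key = "\x1f".join(str(finding[field]) for field in IDENTITY_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def diff_against_baseline(baseline: list[dict], current: list[dict]) -> dict:
    """Split current findings into new vs. preexisting relative to a baseline."""
    baseline_ids = {fingerprint(f) for f in baseline}
    new = [f for f in current if fingerprint(f) not in baseline_ids]
    preexisting = [f for f in current if fingerprint(f) in baseline_ids]

    # Per-severity delta: count in current minus count in baseline.
    delta = Counter(f["severity"] for f in current)
    delta.subtract(Counter(f["severity"] for f in baseline))

    return {
        "new_findings": new,
        "preexisting_findings": preexisting,
        "total_findings": len(current),
        "delta_per_severity": {sev: n for sev, n in delta.items() if n != 0},
    }
```

Because `line` is part of the hash, a finding that merely moves when code is inserted above it gets a new identity and is reported as new. That follows from the identity tuple as proposed and may be worth revisiting during implementation.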

**Phase 2 — Strict mode + thresholds (P1, 3 days)**
- `--strict` (exit non-zero on warning)
- `--error` (exit non-zero if severity ≥ high)
- `--severity-threshold <level>`
- `--max-findings N` (CI gate)
- Exit code evaluator at end of command execution
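
One possible shape for the exit code evaluator, as a sketch only. The severity scale (`info < warning < high < critical`), the function name, and the choice of exit code `1` are assumptions; the issue does not define them.

```python
from typing import Iterable, Mapping, Optional

# Assumed severity scale, lowest to highest. Not taken from CodeLens itself.
SEVERITY_ORDER = ("info", "warning", "high", "critical")
_RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def evaluate_exit_code(
    findings: Iterable[Mapping[str, str]],
    strict: bool = False,
    error: bool = False,
    severity_threshold: Optional[str] = None,
    max_findings: Optional[int] = None,
) -> int:
    """Return 0 if the run passes the gate, 1 otherwise."""
    findings = list(findings)
    worst = max((_RANK[f["severity"]] for f in findings), default=-1)

    # Each flag contributes a severity floor; the lowest floor wins.
    floors = []
    if strict:
        floors.append(_RANK["warning"])  # --strict: fail on warning or above
    if error:
        floors.append(_RANK["high"])     # --error: fail on high or above
    if severity_threshold is not None:
        floors.append(_RANK[severity_threshold])
    if floors and worst >= min(floors):
        return 1

    # --max-findings is a count gate, independent of severity.
    if max_findings is not None and len(findings) > max_findings:
        return 1
    return 0
```

Running the evaluator once, after all findings are collected, keeps the gating logic in a single place regardless of which subcommand produced the findings.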

**Phase 3 — `codelens ci` orchestration command (P1, 1-2 weeks)**
- Auto-detect CI env from env vars (`GITHUB_ACTIONS`, `GITLAB_CI`, `JENKINS_URL`, `BITBUCKET_BUILD_NUMBER`)
- `--baseline-commit SHA` (default: PR base SHA)
- `--diff-depth N` (include transitive dependents — reuse `dependents_engine.py`)
- `--error-on-severity <level>`
- Auto-upload SARIF to GitHub code scanning (if in GitHub Actions)
- `codelens install-ci` generates workflow file (`.github/workflows/codelens.yml`, `.gitlab-ci.yml`, `Jenkinsfile`, `bitbucket-pipelines.yml`)
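
The environment auto-detection could look roughly like this. The environment variable names come from the list above; the function name, the short provider labels, and the precedence order when several variables are set are assumptions.

```python
import os
from typing import Mapping, Optional

# (provider label, environment variable that signals it), checked in order.
CI_ENV_MARKERS = (
    ("github", "GITHUB_ACTIONS"),
    ("gitlab", "GITLAB_CI"),
    ("jenkins", "JENKINS_URL"),
    ("bitbucket", "BITBUCKET_BUILD_NUMBER"),
)


def detect_ci(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a provider label if a known CI variable is set, else None."""
    env = os.environ if env is None else env
    for provider, variable in CI_ENV_MARKERS:
        if env.get(variable):  # treat unset and empty string the same
            return provider
    return None
```

Accepting an explicit `env` mapping lets tests pass a plain dict instead of mutating `os.environ`.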

**Phase 4 — Manifest-driven test suite (P1, 1-2 weeks)**
- New `benchmarks/manifest.json` schema with per-test `expect` block
- Port UBS `run_manifest.py` runner (~300 LOC, MIT license compatible)
- `--case`, `--list`, `--fail-fast`, `--tag`, `--verbose` flags
- Capture artifacts per case (`benchmarks/artifacts/<case_id>/`)
- Migrate existing fixtures; target 200+ cases within 3 months
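
To make the `expect` block concrete, here is a sketch of how one case's result might be checked. The field names (`exit_code`, `totals`, `require_substrings`, `forbid_substrings`) come from UBS #9; treating `totals` as exact-match counts and the function name `check_expectations` are assumptions, and the actual UBS `run_manifest.py` may behave differently.

```python
from typing import Dict, List


def check_expectations(expect: dict, exit_code: int,
                       totals: Dict[str, int], output: str) -> List[str]:
    """Compare one case's observed result to its expect block.

    Returns a list of human-readable failures; an empty list means pass.
    """
    failures = []

    if "exit_code" in expect and exit_code != expect["exit_code"]:
        failures.append(f"exit_code: expected {expect['exit_code']}, got {exit_code}")

    # Assumed semantics: every listed total must match exactly.
    for key, want in expect.get("totals", {}).items():
        got = totals.get(key)
        if got != want:
            failures.append(f"totals.{key}: expected {want}, got {got}")

    for text in expect.get("require_substrings", []):
        if text not in output:
            failures.append(f"missing required substring: {text!r}")

    for text in expect.get("forbid_substrings", []):
        if text in output:
            failures.append(f"found forbidden substring: {text!r}")

    return failures
```

Collecting every failure instead of stopping at the first gives a complete report per case; a `--fail-fast` flag would then only decide whether to continue to the next case.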

**Phase 5 — Rule quality harness + golden snapshots (P1, 1 week)**
- New `benchmarks/rule_quality_harness.py` + `benchmarks/goldens/rule_coverage.json`
- Track 3 scopes (all, campaign, smoke) × 4 metrics per scope
- `--update-goldens` for intentional rule changes
- CI workflow `codelens-quality-gate.yml`
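
A minimal sketch of the golden comparison, assuming the snapshot is a JSON object keyed by scope and then by metric. The function name and the metric name used in the usage note (`rules_fired`) are placeholders; the issue does not name the four metrics.

```python
import json
from pathlib import Path
from typing import Dict, List


def compare_to_golden(current: Dict[str, dict], golden_path: Path,
                      update: bool = False) -> List[str]:
    """Diff current metrics against the golden file, or rewrite the golden.

    With update=True (the --update-goldens case) the file is overwritten and
    no differences are reported. Otherwise each differing scope/metric pair
    yields one line of output.
    """
    if update:
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n")
        return []

    golden = json.loads(golden_path.read_text())
    diffs = []
    for scope in sorted(set(golden) | set(current)):
        want, got = golden.get(scope, {}), current.get(scope, {})
        for metric in sorted(set(want) | set(got)):
            if want.get(metric) != got.get(metric):
                diffs.append(
                    f"{scope}.{metric}: golden={want.get(metric)!r} "
                    f"current={got.get(metric)!r}"
                )
    return diffs
```

For example, if the golden records `{"smoke": {"rules_fired": 12}}` and a weakened rule drops the count to 9, the function returns one diff line for `smoke.rules_fired`, and the CI job can fail on any non-empty result.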

**Phase 6 — Git sync hooks (P2, 1 week, optional)**
- 3 hook types: `post-commit`, `post-merge`, `post-checkout`
- Run `codelens scan --incremental` in background (via `nohup ... & disown`, never blocking git)
- Marker-fenced install: `codelens install --git-hooks` / `codelens uninstall --git-hooks`
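
Marker-fenced installation can be sketched as pure text manipulation on the hook file's contents. The marker strings, the default shebang, and the function names are assumptions; the backgrounded command mirrors the `nohup ... & disown` form above.

```python
BEGIN = "# >>> codelens hook >>>"  # assumed marker text
END = "# <<< codelens hook <<<"    # assumed marker text
# disown is a bash builtin, so this line assumes the hook runs under bash.
SNIPPET = "nohup codelens scan --incremental >/dev/null 2>&1 & disown"


def strip_block(hook_text: str) -> str:
    """Remove the fenced CodeLens block, leaving any user content intact."""
    kept, skipping = [], False
    for line in hook_text.splitlines():
        if line.strip() == BEGIN:
            skipping = True
        elif line.strip() == END:
            skipping = False
        elif not skipping:
            kept.append(line)
    body = "\n".join(kept).rstrip("\n")
    return body + "\n" if body else ""


def install_block(hook_text: str) -> str:
    """Append the fenced block, replacing any previously installed copy."""
    base = strip_block(hook_text) or "#!/usr/bin/env bash\n"
    return f"{base}{BEGIN}\n{SNIPPET}\n{END}\n"
```

Because `install_block` strips any existing block first, running the install twice yields the same file, and uninstalling restores the user's original hook content.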

**Phase 7 — Agent benchmark harness (P2, 3-4 weeks, optional)**
- 7 real-world codebase fixtures (VS Code TS, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire)
- Each codebase has 1 canonical architecture question
- `claude -p` headless with `--strict-mcp-config`, 4 runs per arm, median reported
- CI integration via `codelens-benchmark.yml` on release tag

## Acceptance criteria

- [ ] `codelens scan --baseline-commit <SHA>` reports only new findings
- [ ] `codelens check --strict --error` exits non-zero on any high-severity finding
- [ ] `codelens ci` auto-detects GitHub Actions and uploads SARIF
- [ ] `codelens install-ci` generates valid workflow file for GitHub/GitLab/Jenkins/Bitbucket
- [ ] Manifest test suite runs 50+ cases with correct pass/fail
- [ ] Rule quality harness catches regression when a rule is weakened

## Files

- New: `scripts/git_integration.py`, `scripts/commands/ci.py`, `scripts/commands/install_ci.py`, `scripts/commands/rule_test.py` (already proposed in rule validation issue), `benchmarks/manifest.json`, `benchmarks/run_manifest.py`, `benchmarks/rule_quality_harness.py`, `benchmarks/goldens/rule_coverage.json`, `scripts/templates/codelens-ci-{github,gitlab,jenkins}.yml.tmpl`, `.github/workflows/codelens-pr-check.yml`, `.github/workflows/codelens-quality-gate.yml`
- Update: `scripts/codelens.py` (new flags), `scripts/commands/check.py`, `scripts/pre_commit_hook.py`

