
[FEATURE] CI/CD integration suite — baseline scan, diff scan, strict mode, manifest tests, golden snapshots #57

Description

@Wolfvin

Summary

CodeLens has a basic check command for CI but lacks baseline diff scanning, strict-mode thresholds, a manifest-driven test suite, and golden snapshot regression for rules. Add a full CI/CD integration suite so that CodeLens can serve as a quality gate in GitHub Actions, GitLab CI, and Jenkins.

Worker consensus (7 reports)

| Worker | Source | Contribution |
|---|---|---|
| CodeGraph | update!/CodeLens_CodeGraph_Upgrade_Analysis.md #11 | Git sync hooks (post-commit, post-merge, post-checkout) running codelens scan --incremental in background. Marker-fenced install. |
| CodeGraph | same file #20 | Agent benchmark harness: 7 real-world codebase fixtures (VS Code, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire). claude -p headless with --strict-mcp-config. CI regression fails if CodeLens+CodeLens result worse than baseline. |
| Opengrep | update!/CodeLens_Opengrep_Upgrade_Analysis.md #47 | codelens ci command: auto-detect CI env, --baseline-commit SHA, --diff-depth N (transitive dependents), --error-on-severity <level>, auto-upload SARIF to GitHub. codelens install-ci generates workflow file. |
| Semgrep | update!/CodeLens_Upgrade_Issues_from_Semgrep.md CL-010 | --baseline-commit <SHA> + --diff-scan flags to scan/secrets/dataflow/vuln-scan/smell/complexity/dead-code/taint. Output: {new_findings, preexisting_findings, total}. GitHub Actions codelens-pr-check.yml auto-sets baseline to PR base SHA. |
| Semgrep | same file CL-012 | Strict mode: --strict (exit non-zero on warning), --error (exit non-zero if severity ≥ high), --severity-threshold <level>, --max-findings N. |
| UBS | update!/CodeLens_UBS_Upgrade_Analysis.md #7 | Comparison/baseline delta scan: --comparison=<baseline.json>, compute delta per severity/command/language, --new-only / --show-resolved flags. SARIF automationDetails.guid. |
| UBS | same file #8 | --staged (scan only git diff --cached --name-only --diff-filter=ACMR files), --diff / --git-diff (working tree vs HEAD), --diff-vs=<ref>. Target <1s for <50 changed files. |
| UBS | same file #9 | Manifest-driven test suite: JSON schema with expect block (exit_code, totals, require_substrings, forbid_substrings). Port UBS run_manifest.py runner (~300 LOC). |
| UBS | same file #10 | Rule quality harness with golden snapshot regression: track 3 scopes × 4 metrics, compare vs golden, fail with diff. --update-goldens. |
| OpenTaint | update!/CodeLens_vs_OpenTaint_Upgrade_Analysis.md B1 | Rule test harness with @PositiveRuleSample / @NegativeRuleSample annotations. test-result.json per rule. CI workflow .github/workflows/codelens-rule-tests.yml. |

Proposed phased scope

Phase 1 — Baseline + diff scan (P1, 1-2 weeks)
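
As a rough illustration of the baseline comparison, the sketch below splits the current scan's findings into new and preexisting ones and returns the {new_findings, preexisting_findings, total} shape mentioned in the Semgrep CL-010 report. The finding fields (rule_id, path, snippet), the fingerprint scheme, and the reading of total as a count are all assumptions made for this example, not the existing CodeLens data model.

```python
import hashlib
import json


def fingerprint(finding: dict) -> str:
    """Stable identity for a finding (assumed fields: rule_id, path, snippet).

    Line numbers are deliberately left out, because they shift whenever
    unrelated code is added above the finding.
    """
    key = json.dumps(
        [finding["rule_id"], finding["path"], finding["snippet"].strip()]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def partition_findings(baseline: list[dict], current: list[dict]) -> dict:
    """Split current findings into new vs. preexisting relative to a baseline."""
    known = {fingerprint(f) for f in baseline}
    new, preexisting = [], []
    for finding in current:
        bucket = preexisting if fingerprint(finding) in known else new
        bucket.append(finding)
    return {
        "new_findings": new,
        "preexisting_findings": preexisting,
        "total": len(current),  # assumption: total is a count of current findings
    }
```

With this shape, a --new-only style report would print only the new_findings list, and the strict-mode gate could be evaluated against that list alone.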

Phase 2 — Strict mode + thresholds (P1, 3 days)

  • --strict (exit non-zero on warning)
  • --error (exit non-zero if severity ≥ high)
  • --severity-threshold <level>
  • --max-findings N (CI gate)
  • Exit code evaluator at end of command execution
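
One possible shape for the exit code evaluator is sketched below. The severity scale, the function name, and the rule that --max-findings fails only when the count exceeds N are assumptions for illustration; the flags themselves are the ones listed above.

```python
from typing import Optional

# Assumed severity scale, lowest first; the real CodeLens scale may differ.
SEVERITIES = ("info", "warning", "high", "critical")


def exit_code(
    severities: list[str],
    strict: bool = False,
    error: bool = False,
    severity_threshold: Optional[str] = None,
    max_findings: Optional[int] = None,
) -> int:
    """Return 0 when the gate passes and 1 when any enabled rule trips."""
    rank = SEVERITIES.index
    thresholds = []
    if strict:  # --strict: fail on warning or above
        thresholds.append(rank("warning"))
    if error:  # --error: fail on high or above
        thresholds.append(rank("high"))
    if severity_threshold is not None:  # --severity-threshold <level>
        thresholds.append(rank(severity_threshold))
    if thresholds:
        floor = min(thresholds)  # the lowest enabled threshold decides
        if any(rank(s) >= floor for s in severities):
            return 1
    if max_findings is not None and len(severities) > max_findings:
        return 1  # --max-findings N: assumed to fail only above N
    return 0
```

Keeping the evaluator a pure function of the findings and flags means every command can call it at the end of execution, and it can be unit-tested without running a scan.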

Phase 3 — codelens ci orchestration command (P1, 1-2 weeks)

  • Auto-detect CI env from env vars (GITHUB_ACTIONS, GITLAB_CI, JENKINS_URL, BITBUCKET_BUILD_NUMBER)
  • --baseline-commit SHA (default: PR base SHA)
  • --diff-depth N (include transitive dependents — reuse dependents_engine.py)
  • --error-on-severity <level>
  • Auto-upload SARIF to GitHub code scanning (if in GitHub Actions)
  • codelens install-ci generates workflow file (.github/workflows/codelens.yml, .gitlab-ci.yml, Jenkinsfile, bitbucket-pipelines.yml)
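
The environment auto-detection could look roughly like this. The environment variable names are the ones listed above; the provider labels and the function name are hypothetical.

```python
import os
from typing import Mapping, Optional

# Checked in order. The short provider labels are illustrative only.
CI_ENV_VARS = (
    ("GITHUB_ACTIONS", "github"),
    ("GITLAB_CI", "gitlab"),
    ("JENKINS_URL", "jenkins"),
    ("BITBUCKET_BUILD_NUMBER", "bitbucket"),
)


def detect_ci(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a provider label for the first CI variable that is set and non-empty."""
    for var, provider in CI_ENV_VARS:
        if env.get(var):
            return provider
    return None  # not running under a recognized CI system
```

Passing the environment in as a mapping keeps the function testable without mutating os.environ. The same label could then select which template codelens install-ci writes.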

Phase 4 — Manifest-driven test suite (P1, 1-2 weeks)

  • New benchmarks/manifest.json schema with per-test expect block
  • Port UBS run_manifest.py runner (~300 LOC, MIT license compatible)
  • --case, --list, --fail-fast, --tag, --verbose flags
  • Capture artifacts per case (benchmarks/artifacts/<case_id>/)
  • Migrate existing fixtures, target 200+ cases in 3 months
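
To make the expect block concrete, here is a minimal sketch of how a runner might evaluate one case. The four keys (exit_code, totals, require_substrings, forbid_substrings) come from the UBS #9 report; the function name and the failure message format are assumptions, and this is not a port of run_manifest.py.

```python
def check_expectations(
    expect: dict, exit_code: int, totals: dict, output: str
) -> list[str]:
    """Compare one case's actual results with its expect block.

    Returns a list of human-readable failures; an empty list means the case passed.
    """
    failures = []
    if "exit_code" in expect and exit_code != expect["exit_code"]:
        failures.append(
            f"exit_code: expected {expect['exit_code']}, got {exit_code}"
        )
    for key, want in expect.get("totals", {}).items():
        got = totals.get(key)
        if got != want:
            failures.append(f"totals.{key}: expected {want}, got {got}")
    for text in expect.get("require_substrings", []):
        if text not in output:
            failures.append(f"missing required substring: {text!r}")
    for text in expect.get("forbid_substrings", []):
        if text in output:
            failures.append(f"found forbidden substring: {text!r}")
    return failures
```

Collecting all failures instead of stopping at the first gives a complete report per case; a --fail-fast flag would then only decide whether the runner moves on to the next case.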

Phase 5 — Rule quality harness + golden snapshots (P1, 1 week)

  • New benchmarks/rule_quality_harness.py + benchmarks/goldens/rule_coverage.json
  • Track 3 scopes (all, campaign, smoke) × 4 metrics per scope
  • --update-goldens for intentional rule changes
  • CI workflow codelens-quality-gate.yml
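
The golden comparison might be sketched as follows. The snapshot layout (scope name mapped to metric name mapped to value) and the rule that any difference fails the gate are assumptions; the issue does not name the four metrics, so none are hard-coded here.

```python
import json
from pathlib import Path


def diff_against_golden(golden: dict, current: dict) -> list[str]:
    """List every scope/metric whose value differs from the golden snapshot."""
    diffs = []
    for scope in sorted(set(golden) | set(current)):
        g, c = golden.get(scope, {}), current.get(scope, {})
        for metric in sorted(set(g) | set(c)):
            if g.get(metric) != c.get(metric):
                diffs.append(
                    f"{scope}.{metric}: golden={g.get(metric)} current={c.get(metric)}"
                )
    return diffs


def run_gate(golden_path: Path, current: dict, update_goldens: bool = False) -> int:
    """Return 0 if current matches the golden file, 1 otherwise."""
    if update_goldens:  # --update-goldens: accept an intentional rule change
        golden_path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n")
        return 0
    golden = json.loads(golden_path.read_text())
    diffs = diff_against_golden(golden, current)
    for line in diffs:
        print(line)  # fail with a readable diff
    return 1 if diffs else 0
```

Treating improvements as differences too is a deliberate choice in this sketch: it forces the golden file to be refreshed whenever a rule changes, so the snapshot never drifts from actual behavior.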

Phase 6 — Git sync hooks (P2, 1 week, optional)

  • 3 hook types: post-commit, post-merge, post-checkout
  • Run codelens scan --incremental in background (via nohup ... & disown, never blocking git)
  • Marker-fenced install: codelens install --git-hooks / codelens uninstall --git-hooks
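
Marker-fenced install means the CodeLens lines are wrapped between two marker comments, so they can be added or removed without touching hook content the user already has. A minimal sketch, assuming hypothetical marker strings and a bash hook (disown is a bash builtin):

```python
from pathlib import Path

# Hypothetical marker text; any unique pair of comment lines would work.
BEGIN = "# >>> codelens hook >>>"
END = "# <<< codelens hook <<<"
# Background scan as described above; output is discarded so git is never blocked.
SNIPPET = "nohup codelens scan --incremental >/dev/null 2>&1 & disown"


def strip_block(text: str) -> str:
    """Remove a previously installed fenced block, leaving other lines intact."""
    out, skipping = [], False
    for line in text.splitlines():
        if line.strip() == BEGIN:
            skipping = True
        elif line.strip() == END:
            skipping = False
        elif not skipping:
            out.append(line)
    return "\n".join(out) + "\n" if out else ""


def install_hook(hook_path: Path) -> None:
    """Append the fenced block, replacing any earlier copy (idempotent)."""
    existing = hook_path.read_text() if hook_path.exists() else "#!/usr/bin/env bash\n"
    body = strip_block(existing)
    hook_path.write_text(f"{body}{BEGIN}\n{SNIPPET}\n{END}\n")
    hook_path.chmod(0o755)  # git only runs executable hooks


def uninstall_hook(hook_path: Path) -> None:
    """Remove only the fenced block; user-written hook lines are kept."""
    if hook_path.exists():
        hook_path.write_text(strip_block(hook_path.read_text()))
```

The same pair of functions would be applied to each of the three hook files under .git/hooks.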

Phase 7 — Agent benchmark harness (P2, 3-4 weeks, optional)

  • 7 real-world codebase fixtures (VS Code TS, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire)
  • Each codebase has 1 canonical architecture question
  • claude -p headless with --strict-mcp-config, 4 runs per arm, median reported
  • CI integration via codelens-benchmark.yml on release tag
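
The per-arm comparison could reduce to a median check like the one below. The issue does not define the benchmark metric, so this sketch assumes a hypothetical numeric score per run where higher is better, and adds a hypothetical tolerance parameter.

```python
from statistics import median


def regressed(
    baseline_runs: list[float], candidate_runs: list[float], tolerance: float = 0.0
) -> bool:
    """True when the candidate arm's median score falls below the baseline's.

    Assumes higher scores are better. With 4 runs per arm, the median is the
    mean of the two middle values, which damps a single outlier run.
    """
    return median(candidate_runs) < median(baseline_runs) - tolerance
```

A CI job would exit non-zero when regressed(...) returns True for any of the codebase fixtures.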

Acceptance criteria

  • codelens scan --baseline-commit <SHA> reports only new findings
  • codelens check --strict --error exits non-zero on any high-severity finding
  • codelens ci auto-detects GitHub Actions and uploads SARIF
  • codelens install-ci generates valid workflow file for GitHub/GitLab/Jenkins/Bitbucket
  • Manifest test suite runs 50+ cases with correct pass/fail
  • Rule quality harness catches regression when a rule is weakened

Files

  • New: scripts/git_integration.py, scripts/commands/ci.py, scripts/commands/install_ci.py, scripts/commands/rule_test.py (already proposed in rule validation issue), benchmarks/manifest.json, benchmarks/run_manifest.py, benchmarks/rule_quality_harness.py, benchmarks/goldens/rule_coverage.json, scripts/templates/codelens-ci-{github,gitlab,jenkins}.yml.tmpl, .github/workflows/codelens-pr-check.yml, .github/workflows/codelens-quality-gate.yml
  • Update: scripts/codelens.py (new flags), scripts/commands/check.py, scripts/pre_commit_hook.py
