
[FEATURE] SCA lockfile parser expansion + dependency-aware rules (14 new parsers, project-depends-on) #53

Description

@Wolfvin

Summary

CodeLens vuln-scan currently supports 4 lockfile formats (package-lock.json, Cargo.lock, poetry.lock, go.sum). Expand to 14 additional formats and add dependency-aware rule support so rules like Log4Shell only fire if the vulnerable dependency is actually present.

Worker consensus (4 reports)

| Worker | Source | Contribution |
|---|---|---|
| Opengrep | update!/CodeLens_Opengrep_Upgrade_Analysis.md #37 | 14 lockfile parsers: pnpm-lock.yaml (v6/v9/workspace), yarn.lock (v1/v2/v3), Pipfile.lock, Gemfile.lock, composer.lock, packages.lock.json (NuGet), pubspec.lock (Dart), Package.resolved (SwiftPM v1/v2/v3), gradle.lockfile + build.gradle, pom.xml + maven_dep_tree.txt, mix.lock (Elixir), requirements.txt, Pipfile, pyproject.toml (Poetry deps). Each returns List[Dependency] with name, version, ecosystem, source_file, transitivity. |
| Opengrep | same file, #38 | Dependency-aware rules (project-depends-on: [{ namespace, package, version }]). A rule matches only if BOTH the pattern matches in code AND the dependency tree contains the package. Supported namespaces: pypi, npm, maven, cargo, gem, go, nuget, pub, hex, composer, mix, swiftpm, gradle. |
| Semgrep | update!/CodeLens_Upgrade_Issues_from_Semgrep.md CL-018 | 8 new ecosystems: NuGet, Pub, SwiftPM, Mix, Gradle, Maven, Composer, Gem (overlaps with Opengrep #37; the two should be merged). |
| OpenTaint | update!/CodeLens_vs_OpenTaint_Upgrade_Analysis.md B3 | Attack surface discovery and dependency triage. codelens triage-deps classifies each dependency as flagged / dismissed / unsure; codelens discover-surface <package-name> extracts the members the project uses, classifies each as source / sink / propagator, and checks rule coverage. |

Proposed scope (P1, 4-6 weeks total)

Phase 1 — 14 lockfile parsers (P1, 2-3 weeks)

  • New scripts/sca_parsers/ directory with one file per format
  • Each parser returns List[Dependency] with name, version, ecosystem, source_file, transitivity
  • Integrate with vulnscan_engine.py (auto-scan the newly supported lockfiles) and osv_client.py (batch queries)
  • Update framework_detect.py to auto-detect project types from the new lockfiles
  • Test fixtures at tests/fixtures/sca/
  • Use Semgrep's cli/src/semdep/parsers/ as a reference only: it is LGPL, so reimplement each parser from the format spec and do not copy code
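A minimal sketch of the shared Dependency record and one of the simpler Phase 1 parsers (requirements.txt), written from the file format's documented syntax rather than from Semgrep's code. The field types, the transitivity values, and the name parse_requirements_txt are assumptions. Only exact == pins are treated as resolved versions; other specifiers are recorded with version=None, and URL/VCS requirements are skipped.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dependency:
    """Fields named in the issue; the concrete types are assumptions."""
    name: str
    version: Optional[str]  # None when the line is a range, not an exact pin
    ecosystem: str          # namespace such as "pypi", "npm", "maven"
    source_file: str
    transitivity: str       # assumed values: "direct", "transitive", "unknown"


# "name==1.2.3" or "name[extra]==1.2.3" (but not "==="); stops at whitespace, ";" or "#".
_PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==(?!=)\s*([^\s;#]+)")
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements_txt(text: str, source_file: str = "requirements.txt") -> List[Dependency]:
    deps: List[Dependency] = []
    for raw in text.splitlines():
        # Drop inline comments (" # ...") and surrounding whitespace.
        line = re.split(r"\s+#", raw.strip(), maxsplit=1)[0]
        if not line or line.startswith(("#", "-")):
            continue  # blank line, full-line comment, or pip option (-r, -e, --hash)
        if "://" in line:
            continue  # URL / VCS requirements are out of scope for this sketch
        pinned = _PINNED.match(line)
        if pinned:
            name, version = pinned.group(1), pinned.group(2)
        else:
            named = _NAME.match(line)
            if named is None:
                continue
            name, version = named.group(1), None
        # requirements.txt does not say which entries are transitive.
        deps.append(Dependency(_normalize(name), version, "pypi", source_file, "unknown"))
    return deps
```

If every other parser returned the same record type, vulnscan_engine.py and osv_client.py would only need to handle one shape regardless of ecosystem.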

Phase 2 — Dependency-aware rules (P1, 1-2 weeks, depends on Phase 1)

  • New project-depends-on: [{ namespace, package, version }] field in rule YAML
  • Rule matches only if BOTH the pattern matches AND the dependency tree contains the package within the specified version range
  • Version formats: PEP 440 (pip), semver (npm/cargo/nuget), Maven version ranges, RubyGems pessimistic constraints (~>)
  • Subproject resolution: find the lockfile nearest to the source file (walk up the parent directories)
  • Cache dependency tree per subproject (invalidate on lockfile mtime change)
  • Finding output adds matched_dependencies: [{ name, version, ecosystem }]
  • Example rule: Log4Shell matches Jndi.lookup($X) only if org.apache.logging.log4j:log4j-core < 2.17.0 is in the dependency tree
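The Phase 2 gating logic could look roughly like the sketch below. It rests on several assumptions: a simplified constraint syntax ("< 2.17.0", with comma-separated clauses ANDed), numeric-only version comparison standing in for real PEP 440 / semver / Maven / RubyGems semantics, OR semantics across multiple project-depends-on entries, a hypothetical LOCKFILES list, and hypothetical names (DependsOn, satisfies, nearest_lockfile, load_deps, rule_fires).

```python
from __future__ import annotations

import operator
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dependency:
    name: str
    version: str
    ecosystem: str


@dataclass(frozen=True)
class DependsOn:
    """One hypothetical project-depends-on entry."""
    namespace: str
    package: str
    version: str  # simplified constraint, e.g. "< 2.17.0"


_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


def _key(version: str, width: int = 4) -> Tuple[int, ...]:
    # Numeric dotted versions only; a suffix such as "-beta9" is dropped (simplification).
    nums = [int(part) for part in re.findall(r"\d+", version.split("-")[0])][:width]
    return tuple(nums + [0] * (width - len(nums)))  # pad so "2.17" == "2.17.0"


def satisfies(version: str, constraint: str) -> bool:
    # Comma-separated clauses are ANDed, e.g. ">= 2.0.0, < 2.17.0".
    for clause in constraint.split(","):
        m = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*", clause)
        if m is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        if not _OPS[m.group(1)](_key(version), _key(m.group(2))):
            return False
    return True


# Hypothetical subset of lockfile names; the real list would come from Phase 1.
LOCKFILES = ("pom.xml", "gradle.lockfile", "package-lock.json", "poetry.lock")


def nearest_lockfile(source: Path, root: Path) -> Optional[Path]:
    # Walk up from the source file's directory, stopping at the repository root.
    # Assumes both paths are absolute/resolved so the root comparison works.
    for directory in (source.parent, *source.parent.parents):
        for name in LOCKFILES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        if directory == root:
            break
    return None


_cache: Dict[Path, Tuple[float, List[Dependency]]] = {}


def load_deps(lockfile: Path, parse: Callable[[Path], List[Dependency]]) -> List[Dependency]:
    # Re-parse only when the lockfile's mtime has changed since the last call.
    mtime = lockfile.stat().st_mtime
    cached = _cache.get(lockfile)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    deps = parse(lockfile)
    _cache[lockfile] = (mtime, deps)
    return deps


def matched_dependencies(deps: List[Dependency], requires: List[DependsOn]) -> List[Dependency]:
    # A dependency matches if it satisfies ANY entry (OR semantics is an assumption).
    return [d for d in deps
            if any(d.ecosystem == r.namespace and d.name == r.package
                   and satisfies(d.version, r.version) for r in requires)]


def rule_fires(pattern_matched: bool, deps: List[Dependency], requires: List[DependsOn]) -> bool:
    # Both conditions are required: code pattern AND a matching dependency.
    return pattern_matched and bool(matched_dependencies(deps, requires))
```

With the Log4Shell entry DependsOn("maven", "org.apache.logging.log4j:log4j-core", "< 2.17.0"), a project pinned to log4j-core 2.14.1 would fire while one on 2.17.1 would not, even when Jndi.lookup($X) matches in both. The list returned by matched_dependencies is what would populate matched_dependencies in the finding output.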

Phase 3 — Attack surface discovery (P2, 3 weeks, optional)

  • codelens triage-deps [workspace] — classify each dep as flagged / dismissed / unsure
  • codelens discover-surface <package-name> — extract project-used members, classify source/sink/propagator, check rule coverage
  • Auto-suggest a rule_pack plugin for packages with partial or no rule coverage
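For Phase 3, the coverage check behind the rule_pack suggestion might be as simple as comparing the members a project actually uses against the members existing rules already cover. This is a sketch under assumptions: Role, Member, coverage, uncovered_by_role and suggest_rule_packs are hypothetical names, and the "full" / "partial" / "none" labels are inferred from the "partial/none coverage" wording above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Set


class Role(Enum):
    SOURCE = "source"
    SINK = "sink"
    PROPAGATOR = "propagator"


@dataclass(frozen=True)
class Member:
    qualified_name: str  # e.g. "yaml.load"
    role: Role


def coverage(used: Iterable[Member], covered: Set[str]) -> str:
    """Return "full", "partial" or "none" for one package's project-used members."""
    members = list(used)
    hits = sum(1 for m in members if m.qualified_name in covered)
    if hits == len(members):
        return "full"  # also the vacuous case: no used members, nothing to cover
    return "partial" if hits else "none"


def uncovered_by_role(used: Iterable[Member], covered: Set[str]) -> Dict[Role, List[str]]:
    """Group members that no rule covers by role, to seed a rule_pack suggestion."""
    gaps: Dict[Role, List[str]] = {}
    for m in used:
        if m.qualified_name not in covered:
            gaps.setdefault(m.role, []).append(m.qualified_name)
    return gaps


def suggest_rule_packs(usage: Dict[str, List[Member]], covered: Set[str]) -> List[str]:
    """Packages with partial or no coverage are candidates for a rule_pack plugin."""
    return sorted(pkg for pkg, members in usage.items() if coverage(members, covered) != "full")
```

Grouping the gaps by role keeps the suggestion actionable: an uncovered sink usually matters more than an uncovered propagator. Any package or member names used with this sketch are illustrative only.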

Acceptance criteria

  • All 14 new lockfile parsers correctly parse their respective formats (verified against the fixtures in tests/fixtures/sca/)
  • vuln-scan detects CVEs in dependencies from any of the 14 new formats
  • Log4Shell dependency-aware rule fires correctly (matches only when a vulnerable log4j-core version is present)
  • Subproject resolution works in monorepos (each source file resolves to its nearest lockfile)
  • Performance: parsing 14 lockfiles takes under 5 s in total
