Summary
CodeLens currently ships only the `json`, `markdown`, `sarif`, `ai`, and `compact` formatters. Add 8 new formatters covering GitLab CI, Jenkins, Emacs, Vim, JSONL streaming, and token-optimized TOON, plus SARIF dataflow traces, for editor and CI consumers.
Worker consensus (5 reports)
| Worker | Source | Contribution |
| --- | --- | --- |
| Opengrep | `update!/CodeLens_Opengrep_Upgrade_Analysis.md` #42 | 6 new formatters: `text` (table), `gitlab-sast`, `gitlab-secrets`, `junit-xml`, `emacs` (`file:line:col: msg`), `vim` (quickfix). `--output <file>` (multiple allowed). `--incremental-output` streams findings as they are found. |
| Semgrep | `update!/CodeLens_Upgrade_Issues_from_Semgrep.md` CL-007 | 5 new formatters: `junit_xml`, `emacs`, `vim`, `gitlab_sast`, `gitlab_secrets`. Unified `Finding` dataclass from `scripts/formatters/base.py`. |
| Opengrep | same file #50 | `--dataflow-traces` flag enabling the SARIF `codeFlows` field per finding (one `threadFlow` per taint path). The GitHub code scanning UI shows the full taint path, clickable step by step. |
| UBS | `update!/CodeLens_UBS_Upgrade_Analysis.md` #19 | TOON (Token-Oriented Object Notation): ~50% smaller than JSON, ~34% fewer tokens for LLM consumption. Schema inference: a `findings[65]{severity,count,title}:` header followed by CSV-like rows. |
| UBS | same file #20 | JSONL streaming output plus Beads/issue-tracker integration. `--format=jsonl` (1 object per line). Stream mode emits findings as discovered. Issue-tracker import scripts (GitHub Issues, JIRA). |
Proposed scope (P1, 2-3 weeks)
Phase 1 — Unified Finding dataclass (P1, 3 days)
- New `scripts/formatters/base.py` with `Finding` dataclass
- All engines emit `Finding` objects; formatters consume them
- Backward compat: existing JSON output unchanged
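
As a rough illustration of the Phase 1 items above, a unified `Finding` could look like the sketch below. The field names (`rule_id`, `severity`, `file`, `line`, `col`, `message`, `taint_path`) and the severity vocabulary are assumptions for illustration, not the agreed schema.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class Finding:
    """One result emitted by an engine and consumed by every formatter."""

    rule_id: str
    severity: str  # assumed vocabulary, e.g. "error", "warning", "info"
    file: str
    line: int
    message: str
    col: int = 1
    # Optional taint steps; each step is assumed to be a dict with
    # "file", "line", "col" and "message" keys.
    taint_path: tuple = ()

    def to_dict(self) -> dict[str, Any]:
        """Plain-dict view that a JSON formatter can serialize directly."""
        return asdict(self)


finding = Finding(
    rule_id="py.sql-injection",
    severity="error",
    file="app/db.py",
    line=42,
    message="Unsanitized input reaches execute()",
)
```

Keeping the dataclass frozen means a formatter cannot mutate a finding that another formatter still has to render, which matters once several outputs are written in one run.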
Phase 2 — 5 standard formatters (P1, 1 week)
- `text`: human-readable table, `rule_id | severity | file:line | message`
- `junit-xml`: universal test result format for Jenkins/GitLab
- `emacs`: `file:line:col: severity: message` for `compile-mode`
- `vim`: `file:line:col: message` for quickfix
- `gitlab-sast`: GitLab CI native security scan format
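
The line-oriented formats listed above are simple enough to sketch directly. The example below assumes findings are plain dicts with hypothetical keys, and it assumes each finding maps to one failed `testcase` in the JUnit output; neither choice is settled by this proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding shape: plain dicts with these keys.
FINDINGS = [
    {"rule_id": "py.eval", "severity": "error", "file": "app/run.py",
     "line": 12, "col": 5, "message": "eval() on user input"},
    {"rule_id": "py.md5", "severity": "warning", "file": "app/hash.py",
     "line": 3, "col": 1, "message": "weak hash: md5"},
]


def format_emacs(f: dict) -> str:
    """file:line:col: severity: message, as listed for compile-mode."""
    return f"{f['file']}:{f['line']}:{f['col']}: {f['severity']}: {f['message']}"


def format_vim(f: dict) -> str:
    """file:line:col: message, as listed for quickfix."""
    return f"{f['file']}:{f['line']}:{f['col']}: {f['message']}"


def format_text_row(f: dict) -> str:
    """One row of the text table: rule_id | severity | file:line | message."""
    return f"{f['rule_id']} | {f['severity']} | {f['file']}:{f['line']} | {f['message']}"


def format_junit_xml(findings: list) -> str:
    """Assumed mapping: every finding becomes one failed testcase."""
    total = str(len(findings))
    suite = ET.Element("testsuite", name="codelens", tests=total, failures=total)
    for f in findings:
        case = ET.SubElement(suite, "testcase", classname=f["file"], name=f["rule_id"])
        failure = ET.SubElement(case, "failure", message=f["message"], type=f["severity"])
        failure.text = f"{f['file']}:{f['line']}:{f['col']}"
    return ET.tostring(suite, encoding="unicode")


for finding in FINDINGS:
    print(format_emacs(finding))
```

Building the XML through `xml.etree.ElementTree` rather than string concatenation keeps messages containing `<`, `&`, or quotes correctly escaped.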
Phase 3 — SARIF dataflow traces (P1, 3 days)
- `--dataflow-traces` flag
- Reuse `taint_path` from JSON output, convert to SARIF `codeFlows` (one `threadFlow` per taint path)
- Validate with `sarif-validator`
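
The conversion described above might be sketched as follows. In SARIF 2.1.0, a result's `codeFlows` is an array of `codeFlow` objects, each holding a `threadFlows` array whose entries carry an ordered `locations` list. The shape of `taint_path` here (a list of step dicts with `file`, `line`, `col`, `message`) is an assumption, as is the choice to emit one `codeFlow` per taint path.

```python
def taint_paths_to_code_flows(taint_paths: list) -> list:
    """Build a SARIF 2.1.0 `codeFlows` value from zero or more taint paths.

    Each taint path is assumed to be a list of step dicts with "file" and
    "line" keys and optional "col" and "message" keys.
    """
    code_flows = []
    for path in taint_paths:
        locations = []
        for step in path:
            locations.append({
                "location": {
                    "physicalLocation": {
                        "artifactLocation": {"uri": step["file"]},
                        "region": {
                            "startLine": step["line"],
                            "startColumn": step.get("col", 1),
                        },
                    },
                    "message": {"text": step.get("message", "")},
                }
            })
        if locations:
            # One codeFlow holding a single threadFlow per taint path.
            code_flows.append({"threadFlows": [{"locations": locations}]})
    return code_flows


example_path = [
    {"file": "app/views.py", "line": 10, "col": 12, "message": "source: request.args"},
    {"file": "app/db.py", "line": 30, "col": 5, "message": "sink: cursor.execute"},
]
flows = taint_paths_to_code_flows([example_path])
```

The returned list would be attached to a result as `result["codeFlows"]` only when `--dataflow-traces` is set, so default SARIF output stays unchanged.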
Phase 4 — JSONL streaming (P2, 1 week)
- `--format=jsonl`: line-delimited JSON
- Stream mode: emit findings as discovered (refactor engines away from collect-all-then-print)
- `--jsonl-output=<file>` for file output
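
A minimal sketch of the streaming shape described above: the engine becomes a generator, and the writer emits one JSON object per line as each finding arrives. The `scan` function is a stand-in, not a real CodeLens engine, and the finding keys are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def scan(paths: Iterable[str]) -> Iterator[dict]:
    """Stand-in engine: yields findings one at a time instead of returning a list."""
    for n, path in enumerate(paths, start=1):
        yield {"rule_id": "demo.rule", "severity": "warning",
               "file": path, "line": n, "message": "demo finding"}


def write_jsonl(findings: Iterable[dict], out: TextIO) -> int:
    """Write one compact JSON object per line; return how many were written."""
    count = 0
    for finding in findings:
        out.write(json.dumps(finding, ensure_ascii=False, separators=(",", ":")) + "\n")
        out.flush()  # make each finding visible to consumers immediately
        count += 1
    return count


# Usage: an in-memory buffer stands in for stdout or the --jsonl-output file.
buffer = io.StringIO()
written = write_jsonl(scan(["a.py", "b.py"]), buffer)
```

Because `write_jsonl` never materializes the full list, memory use stays flat and a downstream consumer can start processing before the scan finishes.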
Phase 5 — TOON (P3, 1 week, optional)
- `--format=toon`: token-optimized output for LLM consumers
- Python-native encoder (no external `tru` binary)
- Fallback to `--format ai` (JSON) with stderr warning on encoder error
Phase 6 — Issue-tracker import scripts (P3, 1 week, optional)
- `scripts/import-to-github-issues.py` (uses `gh issue create`)
- `scripts/import-to-jira.py`
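
One possible shape for the GitHub import script is sketched below. It builds an argument list for `gh issue create` using its `--title`, `--body`, and `--label` flags; the finding keys, the title layout, and the default `security` label are assumptions for illustration.

```python
import shlex
import subprocess


def build_gh_command(finding: dict, labels=("security",)) -> list:
    """Return the argv for creating one GitHub issue from one finding."""
    title = f"[{finding['severity']}] {finding['rule_id']}: {finding['message']}"
    body = (
        f"Rule: {finding['rule_id']}\n"
        f"Location: {finding['file']}:{finding['line']}\n\n"
        f"{finding['message']}"
    )
    cmd = ["gh", "issue", "create", "--title", title, "--body", body]
    for label in labels:
        cmd += ["--label", label]
    return cmd


def create_issue(finding: dict, dry_run: bool = True) -> None:
    """Print the command in dry-run mode; otherwise invoke the gh CLI."""
    cmd = build_gh_command(finding)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


sample = {"rule_id": "py.eval", "severity": "error", "file": "app/run.py",
          "line": 12, "message": "eval() on user input"}
create_issue(sample)  # dry run: prints the command without calling gh
```

Passing an argument list rather than a shell string avoids quoting problems when a finding message contains quotes or backticks.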
Acceptance criteria
- `codeFlows` renders correctly in GitHub code scanning UI
- `--output <file>` supports multiple formats simultaneously
License note
Semgrep formatters are LGPL-2.1; use them as reference only and reimplement from spec.
Files
- New `scripts/formatters/{base,text,junit_xml,emacs,vim,gitlab_sast,gitlab_secrets,jsonl,toon}.py`
- Update `scripts/formatters/sarif.py` for `codeFlows`
- Update `scripts/formatters/__init__.py` for new format dispatch
- Update `scripts/codelens.py` for `--output`, `--dataflow-traces`, `--incremental-output` flags
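
The format dispatch and multi-output handling listed under Files could be organized around a small registry, sketched below. The decorator, the registry name, and the pairing of each output path with a format name are assumptions; the proposal does not yet say how `--output <file>` selects a format.

```python
import json
from typing import Callable

# Registry mapping a format name to a function that renders a list of findings.
FORMATTERS: dict[str, Callable[[list], str]] = {}


def register(name: str):
    """Decorator that adds a formatter to the registry under `name`."""
    def decorator(fn):
        FORMATTERS[name] = fn
        return fn
    return decorator


@register("json")
def format_json(findings: list) -> str:
    return json.dumps(findings, indent=2)


@register("emacs")
def format_emacs(findings: list) -> str:
    return "\n".join(
        f"{f['file']}:{f['line']}:{f['col']}: {f['severity']}: {f['message']}"
        for f in findings
    )


def render(fmt: str, findings: list) -> str:
    """Look up a formatter by name, with a readable error for unknown names."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        known = ", ".join(sorted(FORMATTERS))
        raise ValueError(f"unknown format {fmt!r}; expected one of: {known}") from None
    return formatter(findings)


def render_many(targets: dict, findings: list) -> dict:
    """Map each output path to its rendered text, one entry per requested format."""
    return {path: render(fmt, findings) for path, fmt in targets.items()}


findings = [{"file": "a.py", "line": 1, "col": 1, "severity": "error", "message": "x"}]
outputs = render_many({"out.json": "json", "out.txt": "emacs"}, findings)
```

With a registry, adding a formatter means adding one module and one `@register` line, and the CLI's list of valid `--format` values can be derived from `FORMATTERS` instead of being maintained by hand.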