
Add Neo4j property-graph output (--emit neo4j) with a lossless, namespaced schema #154

Description

@rahlk
Summary
-------
Add a Neo4j backend to codeanalyzer-java so that the analysis results can be emitted
as a property graph instead of (or in addition to) analysis.json, reaching parity
with the existing Neo4j backends in codeanalyzer-python and codeanalyzer-typescript.

Motivation
----------
The Python (Py*/PY_*) and TypeScript (TS*/TS_*) analyzers already emit a Neo4j
property graph. Java should reach parity so all three can populate a single shared
graph database for cross-language tooling without label/relationship collisions.

Scope
-----
New --emit targets:
  json   (default, analysis.json)
  neo4j  (graph.cypher snapshot, or live Bolt push with --neo4j-uri)
  schema (the schema.neo4j.json contract)
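A minimal sketch of how the three --emit targets could map to the artifact each one writes. The enum, its method names, and the case-insensitive parsing are illustrative assumptions, not the project's actual classes; only the target names and file names come from this issue.

```java
import java.util.Locale;

/** Hypothetical model of the --emit targets and their default output files. */
enum EmitTarget {
    JSON("analysis.json"),        // default: the existing JSON analysis
    NEO4J("graph.cypher"),        // snapshot file, or a live Bolt push when --neo4j-uri is set
    SCHEMA("schema.neo4j.json");  // the schema contract

    private final String defaultFile;

    EmitTarget(String defaultFile) {
        this.defaultFile = defaultFile;
    }

    String defaultFile() {
        return defaultFile;
    }

    /** Parses an --emit value case-insensitively; a missing or blank value selects JSON. */
    static EmitTarget parse(String value) {
        if (value == null || value.isBlank()) {
            return JSON;
        }
        // valueOf throws IllegalArgumentException for an unknown target name.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```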

CLI options:
  --app-name, --neo4j-uri, --neo4j-user, --neo4j-password, --neo4j-database
  with NEO4J_URI / NEO4J_USERNAME / NEO4J_PASSWORD / NEO4J_DATABASE environment
  fallback (precedence: CLI flag > env var > default).
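The precedence rule can be sketched as a single resolver. This is an illustration under assumptions: the class and method names are hypothetical, the environment is passed in as a map so the logic is testable, and empty strings are treated as "not set".

```java
import java.util.Map;

/** Hypothetical resolver for: CLI flag > environment variable > default. */
final class Neo4jOptions {
    private Neo4jOptions() {}

    /**
     * @param cliValue value of the CLI flag (e.g. --neo4j-uri), or null if the flag was not given
     * @param env      environment to consult (pass System.getenv() in real use)
     * @param envVar   environment variable name, e.g. "NEO4J_URI"
     * @param fallback default used when neither source supplies a value
     */
    static String resolve(String cliValue, Map<String, String> env, String envVar, String fallback) {
        if (cliValue != null && !cliValue.isEmpty()) {
            return cliValue;                      // 1. an explicit CLI flag wins
        }
        String fromEnv = env.get(envVar);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;                       // 2. then the environment variable
        }
        return fallback;                          // 3. then the built-in default
    }
}
```

For example, `Neo4jOptions.resolve(cliUri, System.getenv(), "NEO4J_URI", "bolt://localhost:7687")`, where the default URI is only a placeholder.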

Two writers over one projection:
  - CypherWriter: deterministic, self-contained graph.cypher snapshot.
  - BoltWriter: incremental live push (per-compilation-unit content_hash diff,
    targeted replace of changed units, orphan prune on full runs).
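The BoltWriter's incremental strategy can be pictured as a diff over per-unit hashes. This is a sketch under stated assumptions: each compilation unit is keyed by its path, its content_hash is available both from the fresh analysis and from the database, and the `PushPlan` record is a hypothetical name rather than the writer's real API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical plan for a live push: which units to replace and which orphans to prune. */
record PushPlan(Set<String> toReplace, Set<String> toPrune) {

    /**
     * @param current path -> content_hash from the fresh analysis
     * @param stored  path -> content_hash already stored in Neo4j
     * @param fullRun orphans are pruned only when the whole project was analyzed
     */
    static PushPlan diff(Map<String, String> current, Map<String, String> stored, boolean fullRun) {
        Set<String> replace = new TreeSet<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            // New units (absent from the DB) and changed units (hash differs) are both replaced.
            if (!e.getValue().equals(stored.get(e.getKey()))) {
                replace.add(e.getKey());
            }
        }
        Set<String> prune = new TreeSet<>();
        if (fullRun) {
            for (String path : stored.keySet()) {
                if (!current.containsKey(path)) {
                    prune.add(path); // the unit no longer exists in the analyzed sources
                }
            }
        }
        return new PushPlan(replace, prune);
    }
}
```

Restricting pruning to full runs matters: a partial run only sees a subset of units, so anything it did not analyze would otherwise look orphaned.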

A schema catalog serves as the single source of truth. A conformance test that needs
no container asserts that the projector never emits an undeclared label, relationship,
or property, and an opt-in Testcontainers integration test exercises the live Bolt path.
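The idea behind the no-container conformance test can be sketched as a pure check of projected elements against a catalog. The catalog shape used here (label or relationship type mapped to its declared property keys) and all class, method, and property names are assumptions for illustration; the real contract is schema.neo4j.json.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical conformance check: every emitted label/type and property must be declared. */
final class SchemaConformance {
    private SchemaConformance() {}

    /** A projected node or relationship: its label (or type) and the property keys it carries. */
    record Element(String labelOrType, Set<String> propertyKeys) {}

    /** Returns violations in sorted order; an empty list means the projection conforms. */
    static List<String> violations(Map<String, Set<String>> catalog, List<Element> emitted) {
        Set<String> problems = new TreeSet<>(); // sorted and de-duplicated for stable reports
        for (Element el : emitted) {
            Set<String> declared = catalog.get(el.labelOrType());
            if (declared == null) {
                problems.add("undeclared label/type: " + el.labelOrType());
                continue;
            }
            for (String key : el.propertyKeys()) {
                if (!declared.contains(key)) {
                    problems.add("undeclared property: " + el.labelOrType() + "." + key);
                }
            }
        }
        return List.copyOf(problems);
    }
}
```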

Parity / design decisions
-------------------------
- All node labels are J-prefixed and relationship types J_-prefixed (e.g. :JType,
  :JCallable, J_CALLS); constraint and index names are j_-prefixed. This lets a
  Java graph share a database with the Py*/TS* graphs without colliding.
- Provenance property is _module (matches the Python/TypeScript backends).
- The --emit schema output and the checked-in contract are both schema.neo4j.json.
- Lossless projection of the Lombok entity model: initialization blocks, CRUD
  operations/queries, and comments are first-class nodes (:JInitializationBlock,
  :JCrudOperation, :JCrudQuery, :JComment). Maps such as a field's per-variable
  initializers are stored as a *_json property since Neo4j has no map type.
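Two of these conventions are mechanical enough to sketch: the J/J_/j_ prefixes, and flattening a map into a *_json string property. In this sketch the helper names, the `id` key, and the constraint-name suffix are assumptions; the DDL uses Neo4j 5 `CREATE CONSTRAINT ... FOR ... REQUIRE` syntax, and keys are sorted so repeated runs produce identical text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers for the J-prefixed naming scheme and *_json map properties. */
final class JavaGraphNaming {
    private JavaGraphNaming() {}

    static String label(String base)   { return "J" + base; }   // Type -> JType
    static String relType(String base) { return "J_" + base; }  // CALLS -> J_CALLS

    /** Neo4j 5 uniqueness constraint whose name carries the j_ prefix. */
    static String uniqueConstraint(String base, String key) {
        String name = "j_" + base.toLowerCase(Locale.ROOT) + "_" + key + "_unique";
        return "CREATE CONSTRAINT " + name + " IF NOT EXISTS FOR (n:" + label(base)
                + ") REQUIRE n." + key + " IS UNIQUE";
    }

    /** Serializes a string-to-string map (non-null values) as a JSON object with sorted keys. */
    static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : new TreeMap<>(map).entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

A field's per-variable initializers could then be stored as, say, an `initializers_json` property (a hypothetical name) holding `toJson(initializers)`.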

Packaging
---------
- Fat jar bundles the Neo4j driver; live Bolt push works with java -jar.
- The driver is reached through a driver-free BoltSink seam (loaded reflectively),
  with a graceful fallback to writing graph.cypher when the driver is unavailable.
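The driver-free seam can be sketched as an interface with no driver types, an implementation looked up by class name, and a file fallback when that lookup fails. The implementation class name is a hypothetical placeholder; in a real build it would name a class that wraps the Neo4j Java driver. Only a missing driver triggers the fallback here, so genuine push errors still surface.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Driver-free seam: nothing here references org.neo4j.driver types. */
interface BoltSink {
    void push(String cypher) throws Exception;
}

/** Hypothetical loader for a reflectively instantiated sink with a graph.cypher fallback. */
final class GraphOutput {
    private GraphOutput() {}

    /** Returns a live sink if the implementation (and thus the driver) loads, else empty. */
    static Optional<BoltSink> tryLoad(String implClassName) {
        try {
            Class<?> cls = Class.forName(implClassName);
            return Optional.of((BoltSink) cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // ClassNotFoundException / NoClassDefFoundError: implementation or driver missing.
            return Optional.empty();
        }
    }

    /** Pushes live when a sink loads; otherwise writes graph.cypher and returns its path. */
    static Optional<Path> emit(String cypher, String implClassName, Path outDir) throws Exception {
        Optional<BoltSink> sink = tryLoad(implClassName);
        if (sink.isPresent()) {
            sink.get().push(cypher); // push failures propagate; they are not silently downgraded
            return Optional.empty();
        }
        Path file = outDir.resolve("graph.cypher");
        Files.writeString(file, cypher);
        return Optional.of(file);
    }
}
```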

Known limitation
----------------
The GraalVM native image currently cannot run analysis at all (it dies in the
JavaParser symbol-table extraction before reaching any emit code), so --emit neo4j
is not yet usable from the native binary. This is a pre-existing, neo4j-independent
native reflection-metadata gap tracked separately in issue #153. The native build
does bundle the Neo4j driver and compiles cleanly; once #153 is fixed, --emit neo4j
should work from the native binary too.

Live population of a Neo4j database is also a problem for the native binary, because
the driver runs on Netty, which relies heavily on reflection. In the native binary,
--neo4j-uri therefore falls back gracefully to writing graph.cypher (verified: the
file is written), and users are asked to use the java -jar invocation for live
database updates.
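One way to make the native-binary fallback explicit is to detect the native image at run time. This sketch assumes detection through the `org.graalvm.nativeimage.imagecode` system property, which GraalVM sets to `runtime` inside a running native executable (the same property `org.graalvm.nativeimage.ImageInfo` reads); the class, enum, and method names are hypothetical.

```java
/** Hypothetical choice between a live Bolt push and the graph.cypher fallback. */
final class OutputMode {
    private OutputMode() {}

    enum Mode { LIVE_BOLT, CYPHER_FILE }

    /** True only inside a GraalVM native executable; false on a regular JVM. */
    static boolean inNativeImage() {
        return "runtime".equals(System.getProperty("org.graalvm.nativeimage.imagecode"));
    }

    static Mode choose(String neo4jUri, boolean nativeImage) {
        if (neo4jUri == null || neo4jUri.isBlank()) {
            return Mode.CYPHER_FILE; // no URI given: --emit neo4j writes the snapshot file
        }
        if (nativeImage) {
            // The driver's Netty transport is not usable here; a real implementation would
            // also print a notice recommending the java -jar invocation for live updates.
            return Mode.CYPHER_FILE;
        }
        return Mode.LIVE_BOLT;
    }
}
```

Typical use would be `OutputMode.choose(resolvedUri, OutputMode.inNativeImage())`.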


Status
------
Implemented on branch feature/neo4j (being renamed to
feature/issue-<this>-neo4j-and-fix-153). Schema conformance tests pass; fat-jar
live Bolt push verified end to end.
