Is your feature request related to a problem? Please describe.
The Neo4j-backed Python facade is slow for whole-application enumeration. PythonAnalysis.get_methods() → get_all_methods_in_application() walks get_symbol_table(), which does one query for modules (good) but then reconstructs each module faithfully via an N+1 fan-out:
- _module_full(module) → per-module queries for classes, functions, module-vars, imports
- _class_full(class) → queries for methods, attributes, inner classes (recurses)
- _callable_full(callable) → queries for call-sites, declared callables, declared classes, declared vars (recurses)
On a large app (odoo: ~1028 modules, ~1100 classes, ~7102 callables) this is tens of thousands of serialized Bolt round-trips → ~110s for a single get_methods(). It's a classic N+1 reconstruction — deliberately faithful (rebuilds identically to the in-memory PyCodeanalyzer), with fidelity bought in round-trips.
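As a rough check on "tens of thousands", the fan-out can be tallied from the counts above. This is a back-of-the-envelope sketch: it assumes exactly one query per child kind listed for each helper (four per module, three per class, four per callable), which is an assumption rather than a measured figure; the real helpers may issue more or fewer.

```python
# Rough query tally for one get_symbol_table() walk on the odoo-sized app.
# Assumption: one query per child kind named in the helper descriptions above.
MODULES, CLASSES, CALLABLES = 1028, 1100, 7102  # approximate counts from this issue

QUERIES_PER_MODULE = 4    # classes, functions, module-vars, imports
QUERIES_PER_CLASS = 3     # methods, attributes, inner classes
QUERIES_PER_CALLABLE = 4  # call-sites, declared callables, declared classes, declared vars


def estimated_round_trips(modules: int, classes: int, callables: int) -> int:
    """Estimate serialized queries, including the single initial module query."""
    return (
        1
        + modules * QUERIES_PER_MODULE
        + classes * QUERIES_PER_CLASS
        + callables * QUERIES_PER_CALLABLE
    )


total = estimated_round_trips(MODULES, CLASSES, CALLABLES)
print(total)  # 35821
```

Under that assumption the estimate lands in the same range as the observed query volume, and ~110s spread over ~35.8k queries works out to roughly 3 ms per query.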
Two aggravating factors:
- PyNeo4jBackend._run opens a fresh session() per call (neo4j_backend.py:147-150), so every one of those ~30k queries also pays session-acquisition overhead, not just round-trip latency.
- get_methods_with_decorators() raises NotImplementedError (python_analysis.py:944) and its docstring points callers at "manually filter get_methods()", i.e. the slow path.
The root mismatch: agent workloads (catalog/extract/heap/reach) need set-at-a-time, field-projected reads ("give me {signature, decorators} for all callables"; "give me code for these 600 signatures"), but the SDK offers only one-at-a-time, fully-reconstructed reads. Neo4j excels at the former; the N+1 reconstruction defeats it. Consumers work around it by hand-writing Cypher, which leaks graph schema into agent prompts.
Describe the solution you'd like
Add a small set of bulk, projected, single-round-trip accessors to the PythonAnalysisBackend ABC, implemented on both the Neo4j and in-process backends (parity, so the facade stays backend-agnostic) and surfaced on the PythonAnalysis facade. Return typed Pydantic models (matching cldk.models.python conventions), not dicts. Ranked by impact:
1. get_callables_overview() -> List[CallableOverview] (the big one). One round-trip, returning a lightweight projection per callable instead of full reconstruction:
   { signature, class_signature | None, kind, file, start_line, end_line, decorators: list[str], is_entrypoint_hint? }.
   Replaces get_methods() for enumeration; callers body-inspect only the few that need it via the existing get_method(...). Turns ~110s into one query.
   Cypher shape: MATCH (c:PyCallable) WHERE c._module IN $mods RETURN c.signature, c.decorators, ....
2. get_method_bodies(signatures: list[str]) -> Dict[str, str]. Batch body fetch for a known frontier:
   MATCH (c:PyCallable) WHERE c.signature IN $sigs RETURN c.signature, c.code. One round-trip for N bodies (serves body-embedding at scale).
3. get_callsites_for(signatures: list[str]) -> Dict[str, List[PyCallSite]]. Batch call-sites keyed by owner signature, off the existing PY_HAS_CALLSITE edges, avoiding the per-callable _callable_full fan-out.
4. get_decorated_callables(markers: list[str]) -> List[CallableOverview]. Fills the get_methods_with_decorators stub:
   MATCH (c:PyCallable) WHERE any(d IN c.decorators WHERE d IN $markers) RETURN .... Makes framework-entrypoint detection one query instead of a full scan.
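To make the proposed shapes concrete, here is a minimal sketch of the projection model and of how an in-process backend might serve three of the accessors for parity. Everything in it is illustrative: a stdlib dataclass stands in for the Pydantic model so the sketch runs without dependencies, InMemoryBackendSketch and its flat row layout are hypothetical, the optional is_entrypoint_hint field is omitted, and the decorator strings are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class CallableOverview:
    """Lightweight projection of one callable (stand-in for the proposed Pydantic model)."""
    signature: str
    class_signature: Optional[str]
    kind: str
    file: str
    start_line: int
    end_line: int
    decorators: List[str] = field(default_factory=list)


class InMemoryBackendSketch:
    """Hypothetical in-process backend over already-analyzed callables."""

    def __init__(self, rows: Iterable[dict]) -> None:
        # Hypothetical flat records: the overview fields plus a "code" entry.
        self._rows = list(rows)

    def get_callables_overview(self) -> List[CallableOverview]:
        # Project only the overview fields; bodies are never copied.
        return [
            CallableOverview(**{k: v for k, v in row.items() if k != "code"})
            for row in self._rows
        ]

    def get_method_bodies(self, signatures: List[str]) -> Dict[str, str]:
        wanted = set(signatures)
        return {r["signature"]: r["code"] for r in self._rows if r["signature"] in wanted}

    def get_decorated_callables(self, markers: List[str]) -> List[CallableOverview]:
        wanted = set(markers)
        # Same predicate as any(d IN c.decorators WHERE d IN $markers).
        return [o for o in self.get_callables_overview() if wanted.intersection(o.decorators)]


rows = [
    {"signature": "app.views.index", "class_signature": None, "kind": "function",
     "file": "app/views.py", "start_line": 10, "end_line": 14,
     "decorators": ["route"], "code": "def index(): ..."},
    {"signature": "app.models.User.save", "class_signature": "app.models.User",
     "kind": "method", "file": "app/models.py", "start_line": 30, "end_line": 42,
     "decorators": [], "code": "def save(self): ..."},
]
backend = InMemoryBackendSketch(rows)
entrypoints = backend.get_decorated_callables(["route"])
bodies = backend.get_method_bodies(["app.models.User.save"])
print([o.signature for o in entrypoints])  # ['app.views.index']
```

The Neo4j implementation would return the same model from a single query, which is what keeps the facade backend-agnostic.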
Plus an orthogonal quick win (separate commit): make _run (or the reconstruction helpers) reuse a single session/transaction instead of one session per query — speeds up the existing get_methods()/get_symbol_table() path without changing fetch shape.
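One possible shape for the session-reuse change is sketched below. It is not the current neo4j_backend.py code: BackendSketch and shared_session are hypothetical names, and FakeDriver/FakeSession are stand-ins that only mimic the session()/run() calls so the sketch runs without a database. The sketch is also not thread-safe; a real implementation would need to decide how a pinned session interacts with concurrent callers.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


class FakeSession:
    """Stand-in for a driver session; records queries instead of executing them."""

    def __init__(self, log: List[str]) -> None:
        self._log = log

    def run(self, query: str, params: Optional[Dict[str, Any]] = None) -> List[dict]:
        self._log.append(query)
        return []

    def __enter__(self) -> "FakeSession":
        return self

    def __exit__(self, *exc: Any) -> bool:
        return False


class FakeDriver:
    """Stand-in driver that counts how many sessions were opened."""

    def __init__(self) -> None:
        self.sessions_opened = 0
        self.queries: List[str] = []

    def session(self) -> FakeSession:
        self.sessions_opened += 1
        return FakeSession(self.queries)


class BackendSketch:
    """Hypothetical backend whose _run reuses a pinned session when one is active."""

    def __init__(self, driver: FakeDriver) -> None:
        self._driver = driver
        self._active: Optional[FakeSession] = None

    @contextmanager
    def shared_session(self) -> Iterator[None]:
        if self._active is not None:  # nested use keeps the outer session
            yield
            return
        with self._driver.session() as session:
            self._active = session
            try:
                yield
            finally:
                self._active = None

    def _run(self, query: str, params: Optional[Dict[str, Any]] = None) -> List[dict]:
        if self._active is not None:
            return list(self._active.run(query, params or {}))
        # Fallback: the existing one-session-per-call behaviour.
        with self._driver.session() as session:
            return list(session.run(query, params or {}))


driver = FakeDriver()
backend = BackendSketch(driver)

for i in range(3):
    backend._run(f"QUERY {i}")
per_call = driver.sessions_opened  # one session per query

with backend.shared_session():
    for i in range(3):
        backend._run(f"QUERY {i}")
shared = driver.sessions_opened - per_call  # one session for the whole batch

print(per_call, shared)  # 3 1
```

Wrapping the reconstruction helpers in a single shared_session() block would leave their query shape untouched while removing the per-query session acquisition.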
Describe alternatives you've considered
- Hand-written Cypher in consumers — what's happening today; leaks graph schema into agent prompts and isn't backend-agnostic. The point of these accessors is that the SDK owns the query and return shape.
- Speeding up the existing reconstruction only (session reuse, batching the fan-out) — helps, but doesn't address over-fetch: enumeration still rebuilds call-sites, inner callables, and locals that catalog throws away. Projection is the real fix; session reuse is complementary.
Additional context
Priority: items 1 and 4 first (they share the CallableOverview model; together they unblock catalog's enumeration and entrypoint scan, the slow path), then item 2 (heap-phase bodies). Item 3 optimizes an already-workable path. Implementation to land on feat/issue-XXX-granular-accessors, separate from the TypeScript work in #179.