Bulk/projection accessors for the Python facade to avoid N+1 reconstruction on the Neo4j backend #180

**Is your feature request related to a problem? Please describe.**

The Neo4j-backed Python facade is slow for whole-application enumeration. `PythonAnalysis.get_methods()` → `get_all_methods_in_application()` walks `get_symbol_table()`, which does **one** query for modules (good) but then reconstructs each module faithfully via an N+1 fan-out:

- `_module_full(module)` → per-module queries for classes, functions, module-vars, imports
- `_class_full(class)` → queries for methods, attributes, inner classes (recurses)
- `_callable_full(callable)` → queries for call-sites, declared callables, declared classes, declared vars (recurses)

On a large app (odoo: ~1028 modules, ~1100 classes, ~7102 callables), this adds up to **tens of thousands of serialized Bolt round-trips, ~110s** for a single `get_methods()` call. It is a classic N+1 reconstruction: deliberately faithful (it rebuilds exactly what the in-memory `PyCodeanalyzer` produces), with that fidelity paid for in round-trips.
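
To make the round-trip arithmetic concrete, here is a toy model of the fan-out, not the real backend. `CountingRunner`, its query keys (`MODULES`, `CALLABLES_IN`, ...) and both enumeration functions are hypothetical. Only the module → callable → call-site levels are modeled; the real `_module_full`/`_class_full`/`_callable_full` path also fetches classes, attributes, variables and imports.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CountingRunner:
    """Fake backend `_run`: answers hypothetical query keys and counts round-trips."""
    modules: List[str]
    callables_per_module: int
    round_trips: int = 0

    def run(self, query: str, **params: Any) -> List[Dict[str, Any]]:
        self.round_trips += 1  # each call stands for one Bolt round-trip
        if query == "MODULES":
            return [{"module": m} for m in self.modules]
        if query == "CALLABLES_IN":
            return [{"signature": f"{params['module']}.f{i}"}
                    for i in range(self.callables_per_module)]
        if query == "CALLSITES_OF":
            return []  # contents irrelevant here; only the round-trip counts
        if query == "ALL_CALLABLES":
            return [{"signature": f"{m}.f{i}"}
                    for m in self.modules for i in range(self.callables_per_module)]
        raise ValueError(f"unknown query key: {query}")


def enumerate_n_plus_1(db: CountingRunner) -> List[str]:
    # 1 query for modules, then 1 per module, then 1 per callable (call-site fan-out).
    sigs = []
    for mod in db.run("MODULES"):
        for c in db.run("CALLABLES_IN", module=mod["module"]):
            db.run("CALLSITES_OF", signature=c["signature"])
            sigs.append(c["signature"])
    return sigs


def enumerate_projected(db: CountingRunner) -> List[str]:
    # One set-at-a-time query that returns only the projected field.
    return [r["signature"] for r in db.run("ALL_CALLABLES")]
```

With 3 modules of 4 callables each, the N+1 walk issues 1 + 3 + 12 = 16 round-trips against 1 for the projection. At odoo's scale the same truncated model already gives 1 + 1028 + 7102 = 8131; the extra per-class and per-callable queries in the real path push that into the tens of thousands.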

Two aggravating factors:
- `PyNeo4jBackend._run` opens a **fresh `session()` per call** (`neo4j_backend.py:147-150`), so every one of those ~30k queries also pays session-acquisition overhead, not just round-trip latency.
- `get_methods_with_decorators()` raises `NotImplementedError` (`python_analysis.py:944`) and its docstring points callers at "manually filter `get_methods()`" — i.e. the slow path.

The root mismatch: agent workloads (catalog/extract/heap/reach) need **set-at-a-time, field-projected reads** ("give me `{signature, decorators}` for all callables"; "give me `code` for these 600 signatures"), but the SDK offers only **one-at-a-time, fully-reconstructed reads**. Neo4j excels at the former; the N+1 reconstruction defeats it. Consumers work around it by hand-writing Cypher, which leaks graph schema into agent prompts.

**Describe the solution you'd like**

Add a small set of **bulk, projected, single-round-trip** accessors to the `PythonAnalysisBackend` ABC, implemented on **both** the Neo4j and in-process backends (parity, so the facade stays backend-agnostic) and surfaced on the `PythonAnalysis` facade. Return typed Pydantic models (matching `cldk.models.python` conventions), not dicts. Ranked by impact:

1. **`get_callables_overview() -> List[CallableOverview]`** *(the big one)* — one round-trip, a lightweight projection per callable instead of full reconstruction:
   `{ signature, class_signature | None, kind, file, start_line, end_line, decorators: list[str], is_entrypoint_hint? }`.
   Replaces `get_methods()` for enumeration; callers body-inspect only the few that need it via the existing `get_method(...)`. Turns the ~110s fan-out into a single round-trip.
   Cypher shape: `MATCH (c:PyCallable) WHERE c._module IN $mods RETURN c.signature, c.decorators, ...`.

2. **`get_method_bodies(signatures: list[str]) -> Dict[str, str]`** — batch body fetch for a known frontier:
   `MATCH (c:PyCallable) WHERE c.signature IN $sigs RETURN c.signature, c.code`. One round-trip for N bodies (serves body-embedding at scale).

3. **`get_callsites_for(signatures: list[str]) -> Dict[str, List[PyCallSite]]`** — batch call-sites keyed by owner signature, off the existing `PY_HAS_CALLSITE` edges, avoiding the per-callable `_callable_full` fan-out.

4. **`get_decorated_callables(markers: list[str]) -> List[CallableOverview]`** — fills the `get_methods_with_decorators` stub:
   `MATCH (c:PyCallable) WHERE any(d IN c.decorators WHERE d IN $markers) RETURN ...`. Makes framework-entrypoint detection one query instead of a full scan.
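
A minimal sketch of how items 1, 2 and 4 could sit on the backend ABC, assuming nothing beyond what the issue states. `CallableOverview` is a stdlib `dataclass` standing in for the proposed Pydantic model (same fields, so it could become a `BaseModel` subclass). `BulkAccessors`, `InMemoryBulkBackend` and `neo4j_get_method_bodies` are hypothetical names, and `get_callsites_for` is left out because it returns the existing `PyCallSite` model. The Neo4j function assumes only the driver's `session.run(query, **params)` call shape.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CallableOverview:
    """Projection of one callable (fields from item 1); stdlib stand-in for a Pydantic model."""
    signature: str
    class_signature: Optional[str]
    kind: str
    file: str
    start_line: int
    end_line: int
    decorators: List[str] = field(default_factory=list)
    is_entrypoint_hint: Optional[bool] = None


class BulkAccessors(ABC):
    """Hypothetical slice of PythonAnalysisBackend holding three of the proposed methods."""

    @abstractmethod
    def get_callables_overview(self) -> List[CallableOverview]: ...

    @abstractmethod
    def get_method_bodies(self, signatures: List[str]) -> Dict[str, str]: ...

    @abstractmethod
    def get_decorated_callables(self, markers: List[str]) -> List[CallableOverview]: ...


class InMemoryBulkBackend(BulkAccessors):
    """Toy in-process implementation over a flat {signature: (overview, code)} index."""

    def __init__(self, index: Dict[str, Tuple[CallableOverview, str]]) -> None:
        self._index = index

    def get_callables_overview(self) -> List[CallableOverview]:
        return [ov for ov, _ in self._index.values()]

    def get_method_bodies(self, signatures: List[str]) -> Dict[str, str]:
        # Unknown signatures are skipped rather than raising.
        return {s: self._index[s][1] for s in signatures if s in self._index}

    def get_decorated_callables(self, markers: List[str]) -> List[CallableOverview]:
        # Exact string match, mirroring `d IN $markers` in the item 4 Cypher.
        wanted = set(markers)
        return [ov for ov, _ in self._index.values() if wanted.intersection(ov.decorators)]


# Adapted from item 2; aliases added so rows are keyed by plain field names.
BODIES_CYPHER = (
    "MATCH (c:PyCallable) WHERE c.signature IN $sigs "
    "RETURN c.signature AS signature, c.code AS code"
)


def neo4j_get_method_bodies(session: Any, signatures: List[str]) -> Dict[str, str]:
    # One query no matter how many signatures are requested.
    return {rec["signature"]: rec["code"] for rec in session.run(BODIES_CYPHER, sigs=signatures)}
```

Both backends return the same shapes, which is what keeps the facade backend-agnostic; the Neo4j version issues a single query however many signatures are passed.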

Plus an orthogonal quick win (separate commit): make `_run` (or the reconstruction helpers) **reuse a single session/transaction** instead of one session per query — speeds up the existing `get_methods()`/`get_symbol_table()` path without changing fetch shape.
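
One way the session-reuse quick win could look, as a sketch: a hypothetical `SessionReusingRunner` whose `run` keeps today's one-session-per-query behaviour by default but reuses a single session inside a `batch()` block. It assumes only the neo4j driver's `driver.session()` context manager and `session.run(query, **params)`; results are materialized with `list(...)` because records must be consumed before the session closes.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List, Optional


class SessionReusingRunner:
    """Hypothetical replacement for `_run` that can share one session across many queries.

    `driver` is assumed to expose the neo4j driver's `session()` context manager,
    whose `run(query, **params)` returns an iterable of records.
    """

    def __init__(self, driver: Any) -> None:
        self._driver = driver
        self._session: Optional[Any] = None

    @contextmanager
    def batch(self) -> Iterator[None]:
        if self._session is not None:
            # Already inside a batch (e.g. a recursive helper): keep the outer session.
            yield
            return
        with self._driver.session() as session:
            self._session = session
            try:
                yield
            finally:
                self._session = None

    def run(self, query: str, **params: Any) -> List[Any]:
        if self._session is not None:
            # Inside batch(): reuse the open session; list() consumes the result now.
            return list(self._session.run(query, **params))
        # Outside batch(): current behaviour, one fresh session per query.
        with self._driver.session() as session:
            return list(session.run(query, **params))
```

Wrapping the `get_symbol_table()` walk in `with runner.batch():` would then pay session acquisition once instead of per query. Nested `batch()` calls reuse the outer session, which matters because the reconstruction helpers recurse. Sessions are not thread-safe, so this assumes the fan-out stays on one thread.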

**Describe alternatives you've considered**

- *Hand-written Cypher in consumers* — what's happening today; leaks graph schema into agent prompts and isn't backend-agnostic. The point of these accessors is that the SDK owns the query and return shape.
- *Speeding up the existing reconstruction only* (session reuse, batching the fan-out) — helps, but doesn't address over-fetch: enumeration still rebuilds call-sites, inner callables, and locals that catalog throws away. Projection is the real fix; session reuse is complementary.

**Additional context**

Priority: **#1 + #4 first** (share the `CallableOverview` model; together they unblock catalog's enumeration + entrypoint scan — the slow path), then **#2** (heap-phase bodies). #3 optimizes an already-workable path. Implementation to land on `feat/issue-XXX-granular-accessors`, separate from the TypeScript work in #179.