Problem
src/codebugs/milestones.py is the leakiest module in the codebase: 1718 lines, 18 MCP tools, 7 CLI commands, 4 tables — a wide, shallow interface whose complexity rivals its implementation. Inside it sit four bounded contexts that barely reference each other except through a thin shared spine:
- Foundation — milestone/item/audit CRUD (
create_milestone, add_milestone_item, query_audit, _audit, …)
- Triage —
triage_inbox/dismiss/promote + the _auto_route_finding post-add hook
- Capacity/Pull —
_capacity_for/_eligibility_failure/_candidates/pull_next/release_item/get_wip_status
- Close-gate/branch —
mark_branch_only/mark_integrated/milestone_defer/milestone_close
The concrete testability friction (the thing this RFC fixes): two eligibility branches can't be tested in isolation today.
_eligibility_failure's linked_frs loop does raw SELECT 1 FROM requirements, forcing a requirements fixture for every matrix case.
_has_active_blocker wraps blockers.query_blockers in try/except: return False, which conflates "no active blocker" with "no blockers schema" — the positive-blocker-ineligible case is unreachable without standing up the blockers schema.
So the highest-risk rule in the module (what's eligible to be pulled by a parallel agent) is buried behind I/O into sibling domains.
Proposed Interface
Convert milestones.py → a milestones/ package, star topology, public surface unchanged:
src/codebugs/milestones/
__init__.py # facade: imports contexts, re-export manifest, then the 4 register_* calls VERBATIM
_schema.py # MILESTONES_SCHEMA, SEED_MILESTONES, constants, ensure_schema (ONE schema owner)
_spine.py # _audit, _validate_item_ref, _get_item_by_ref, _milestone_exists,
# _get_milestone, _row_to_milestone/_item/_audit (stdlib + codebugs.types only)
foundation.py # milestone + item + audit CRUD
triage.py # triage_*, _auto_route_finding (post-add hook)
capacity.py # THE DEEP ENGINE: capacity math, eligibility, _candidates, pull_next, release_item, wip
closegate.py # mark_branch_only, mark_integrated, milestone_defer, milestone_close
Plus a surgical 2-accessor test seam in the eligibility engine (the only injection — NOT full ports & adapters):
# capacity.py — real 5-positional signature preserved; only the two foreign reads injectable.
def _eligibility_failure(conn, item, milestone, capacity, held, *,
has_active_blocker=_real_has_active_blocker,
requirement_exists=_real_requirement_exists) -> str | None:
...
- Production:
pull_next(...) calls _eligibility_failure(conn, item, milestone, capacity, held) — defaults bind to the real (late-imported) reads. Capacity-floor math stays inline.
- Tests: inject
has_active_blocker=lambda ref: True/False, requirement_exists=lambda id: True/False — drive the full eligibility matrix (incl. the positive-blocker case) with zero findings/reqs/blockers schemas.
Hidden internally: the 4-context structure, the BEGIN IMMEDIATE atomic-claim dance, capacity ledger arithmetic, the _candidates priority cascade, close-gate aggregation, audit-string formats.
Dependency Strategy
In-process (SQLite, shared conn) for the package internals — merged directly; star topology with _spine as the one shared leaf, no context imports another, no new top-level codebugs.db/sibling imports (preserves today's late-import discipline). The two cross-domain eligibility reads (blockers, requirements) get a minimal injected-accessor seam (the 20% of ports-&-adapters that yields 80% of the win) so the engine can be tested with in-memory fakes; everything else stays plain in-process.
Testing Strategy
- New boundary tests: eligibility matrix via
_eligibility_failure with fake accessors (no sibling schemas), incl. positive-blocker-ineligible; capacity math via pull_next/release_item/get_wip_status on the 4 milestones tables only; close-gate matrix via milestone_close.
- Frozen-surface regression test (load-bearing, lands FIRST): asserts the exact 18 MCP tool names (10 explicit
name= + 8 derived from inner def names — milestone_create/update/list/status/add_item/move_item/set_status/audit_query) and 7 argparse subcommand strings. This is the only guard that catches a silent rename of a bare-@mcp.tool() foundation tool.
- Re-export guard test: imports every name (incl.
_get_item_by_ref, AUTO_ROUTER_ACTOR) from codebugs.milestones so a missing __init__ re-export fails at collection.
- Old tests to replace:
tests/test_milestones.py cases that stand up findings+reqs+blockers solely to exercise eligibility → fake-driven engine tests.
Implementation Recommendations
Durable guidance (not coupled to current line numbers):
- Owns: the milestones domain — one
register_schema("milestones", …, depends_on=("findings","reqs","blockers")), one MILESTONES_SCHEMA with its FKs + seed rows. Four files, one schema owner — sub-modules are one domain, not four.
- Hides: the four-context split, capacity/eligibility/atomic-claim mechanics, audit formats. External callers (
server.py, cli.py, db._ensure_modules_loaded, the post-add hook, autosorter) see exactly today's surface.
- Exposes (FROZEN): byte-identical MCP tool names + CLI subcommand strings;
ensure_schema, register_tools, register_cli, _auto_route_finding; and a complete __init__ re-export manifest of every symbol the test suite touches.
- Migration — value/risk order: regression-gate → seam → file moves (strangler-fig, blame-preserving):
0. Land the frozen-surface regression test (hard gate) before any file move.
- Introduce the 2-accessor seam in the still-single-file module + its fake-driven tests (most of the win, no move risk).
git mv milestones.py milestones/__init__.py (pure move, preserves blame).
3–5. Carve _schema / _spine / context files one commit each.
- Invariant: every move commit re-imports the moved symbol into
__init__ in the same commit; no intermediate red commits; uv run python -m pytest tests/ -v green at every step.
Provenance
Full design + the rejected alternatives (A pure-facade / B max-flexibility / C WorkDispatcher-class / D full ports-&-adapters) and the 3-agent adversarial review (7/10, mandatory fixes applied — Adversarial Review Corrections appendix) live in:
docs/superpowers/specs/2026-06-18-milestones-split-recommendation.md
Candidate #3 from an improve-codebase-architecture pass over the codebase.
Problem
src/codebugs/milestones.pyis the leakiest module in the codebase: 1718 lines, 18 MCP tools, 7 CLI commands, 4 tables — a wide, shallow interface whose complexity rivals its implementation. Inside it sit four bounded contexts that barely reference each other except through a thin shared spine:create_milestone,add_milestone_item,query_audit,_audit, …)triage_inbox/dismiss/promote+ the_auto_route_findingpost-add hook_capacity_for/_eligibility_failure/_candidates/pull_next/release_item/get_wip_statusmark_branch_only/mark_integrated/milestone_defer/milestone_closeThe concrete testability friction (the thing this RFC fixes): two eligibility branches can't be tested in isolation today.
_eligibility_failure'slinked_frsloop does rawSELECT 1 FROM requirements, forcing arequirementsfixture for every matrix case._has_active_blockerwrapsblockers.query_blockersintry/except: return False, which conflates "no active blocker" with "no blockers schema" — the positive-blocker-ineligible case is unreachable without standing up the blockers schema.So the highest-risk rule in the module (what's eligible to be pulled by a parallel agent) is buried behind I/O into sibling domains.
Proposed Interface
Convert
milestones.py→ amilestones/package, star topology, public surface unchanged:Plus a surgical 2-accessor test seam in the eligibility engine (the only injection — NOT full ports & adapters):
pull_next(...)calls_eligibility_failure(conn, item, milestone, capacity, held)— defaults bind to the real (late-imported) reads. Capacity-floor math stays inline.has_active_blocker=lambda ref: True/False,requirement_exists=lambda id: True/False— drive the full eligibility matrix (incl. the positive-blocker case) with zero findings/reqs/blockers schemas.Hidden internally: the 4-context structure, the
BEGIN IMMEDIATEatomic-claim dance, capacity ledger arithmetic, the_candidatespriority cascade, close-gate aggregation, audit-string formats.Dependency Strategy
In-process (SQLite, shared
conn) for the package internals — merged directly; star topology with_spineas the one shared leaf, no context imports another, no new top-levelcodebugs.db/sibling imports (preserves today's late-import discipline). The two cross-domain eligibility reads (blockers, requirements) get a minimal injected-accessor seam (the 20% of ports-&-adapters that yields 80% of the win) so the engine can be tested with in-memory fakes; everything else stays plain in-process.Testing Strategy
_eligibility_failurewith fake accessors (no sibling schemas), incl. positive-blocker-ineligible; capacity math viapull_next/release_item/get_wip_statuson the 4 milestones tables only; close-gate matrix viamilestone_close.name=+ 8 derived from innerdefnames —milestone_create/update/list/status/add_item/move_item/set_status/audit_query) and 7 argparse subcommand strings. This is the only guard that catches a silent rename of a bare-@mcp.tool()foundation tool._get_item_by_ref,AUTO_ROUTER_ACTOR) fromcodebugs.milestonesso a missing__init__re-export fails at collection.tests/test_milestones.pycases that stand up findings+reqs+blockers solely to exercise eligibility → fake-driven engine tests.Implementation Recommendations
Durable guidance (not coupled to current line numbers):
register_schema("milestones", …, depends_on=("findings","reqs","blockers")), oneMILESTONES_SCHEMAwith its FKs + seed rows. Four files, one schema owner — sub-modules are one domain, not four.server.py,cli.py,db._ensure_modules_loaded, the post-add hook, autosorter) see exactly today's surface.ensure_schema,register_tools,register_cli,_auto_route_finding; and a complete__init__re-export manifest of every symbol the test suite touches.0. Land the frozen-surface regression test (hard gate) before any file move.
git mv milestones.py milestones/__init__.py(pure move, preserves blame).3–5. Carve
_schema/_spine/ context files one commit each.__init__in the same commit; no intermediate red commits;uv run python -m pytest tests/ -vgreen at every step.Provenance
Full design + the rejected alternatives (A pure-facade / B max-flexibility / C WorkDispatcher-class / D full ports-&-adapters) and the 3-agent adversarial review (7/10, mandatory fixes applied — Adversarial Review Corrections appendix) live in:
docs/superpowers/specs/2026-06-18-milestones-split-recommendation.mdCandidate #3 from an
improve-codebase-architecturepass over the codebase.