feat(storage): shard list/index GSI hot partitions (#23 follow-on) #39
Generalize the #23 RepoAxisKey pattern to the remaining static single-value list/index GSI partitions, and remove never-read dead-write GSI keys.
- axis_core.py: shared shard primitives (axis_shard/bucket_width/axis_material); repo_axis.py re-exports them as back-compat aliases (#23 behavior unchanged).
- list_axis.py: ListAxisSpec (carries index_name + real gsi attr names) + 4 axis specs: TARGET_LIST/REPO_LIST/SCAN_DATE (8), SCAN_JOB pending/leased (4); list_axis_inputs is the single source of truth for the per-axis key formula.
- list_axis_reader.py: read_list_axis (flat, parallel fan-out for the SCAN_JOB lease loop) + read_list_axis_ordered (k-way merge preserving newest-first + limit). Fail-closed; dedupe by (PK,SK) preferring the higher version.
- store.py: list_scan_targets / read_recent_repo_metadata / read_scan_runs_for_date / _read_scan_jobs_by_status route through the fan-out readers (include_legacy migration flag, default off). Direct primary-key reads unchanged.
- list_axis_migration.py: per-axis in-place conditional backfill + remaining==0 removal gate (SCAN_JOB status filter excludes completed/dead_letter).
- dead-write removal (D3/D4): drop never-read gsi2 #ALL keys from ghas_alert / secret_evidence mappers (GHAS GSI1 repo-axis + secret-evidence gsi1 link fallback preserved); drop gsi1 #ALL keys from ref_state / repo_lease mappers.
- CLI: `security-scanner backfill-list-axis [--dry-run]` to run the migration.
Implements docs/workbench/specs/scale-redesign-list-axis-sharding/design.md (M1-M7) with the locked self-Q&A decisions (D1-D5) and the spec multi-agent review fixes. Full suite green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
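A minimal sketch of the `read_list_axis_ordered` idea (k-way merge of per-shard pages, newest-first, dedupe by `(PK,SK)` keeping the higher version, then apply the limit), assuming each shard's page is already sorted newest-first on a string GSI sort key. The function name and the `gsi1sk`/`version` attribute names are hypothetical, not the PR's actual code.

```python
import heapq
from typing import Any, Iterable

Item = dict[str, Any]


def _gsi_sort_key(item: Item) -> str:
    # Coerce a missing/None sort key to "" so comparisons never raise TypeError.
    return item.get("gsi1sk") or ""


def merge_shards_newest_first(shard_results: Iterable[list[Item]], limit: int) -> list[Item]:
    """Merge per-shard pages (each sorted newest-first) into one newest-first page."""
    merged = heapq.merge(*shard_results, key=_gsi_sort_key, reverse=True)
    deduped: dict[tuple[str, str], Item] = {}
    for item in merged:
        ident = (item.get("PK") or "", item.get("SK") or "")
        current = deduped.get(ident)
        # The first sighting fixes the item's position in the output; a later copy
        # replaces it only if it carries a strictly higher version.
        if current is None or (item.get("version") or 0) > (current.get("version") or 0):
            deduped[ident] = item
    return list(deduped.values())[:limit]
```

`heapq.merge(..., reverse=True)` only requires each input to be sorted descending, so no global re-sort is needed. Applying the limit after deduplication keeps a superseded lower-version copy from occupying a slot.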
Code Review
This pull request implements a cloud-scale redesign for list/index GSI partitions to eliminate hot partitions by sharding GSI1 list axes and removing unused dead-write GSI keys. The reviewer provided valuable feedback, pointing out a thread-safety issue when sharing a boto3 Table resource across threads in ThreadPoolExecutor, potential TypeError crashes during sorting and merging if key fields are None, a performance bottleneck causing a redundant full table scan during backfill, and a potential job starvation issue where legacy pending or leased jobs are ignored due to a missing include_legacy parameter in _read_scan_jobs_by_status.
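To illustrate the starvation point together with the serial fan-out the follow-up commit settled on, here is a hedged sketch. The `SCAN_JOB_STATUS#<status>#<n>` shard suffix, the 4-shard default, and the injected `query` callable (standing in for a GSI query against one partition) are assumptions made for illustration, not the store's real API.

```python
from typing import Any, Callable

Item = dict[str, Any]
QueryFn = Callable[[str], list[Item]]  # hypothetical: one GSI partition value -> its items


def scan_job_partitions(status: str, shard_count: int = 4, include_legacy: bool = False) -> list[str]:
    """Hypothetical partition values to query for one SCAN_JOB status."""
    base = f"SCAN_JOB_STATUS#{status}"
    partitions = [f"{base}#{shard}" for shard in range(shard_count)]
    if include_legacy:
        # Items not yet backfilled still sit on the old single-value partition;
        # leaving it out during the migration window starves those jobs.
        partitions.append(base)
    return partitions


def read_jobs_by_status(query: QueryFn, status: str, include_legacy: bool = False) -> list[Item]:
    items: list[Item] = []
    for partition in scan_job_partitions(status, include_legacy=include_legacy):
        # Serial on purpose: no Table object is shared across worker threads.
        # Fail-closed: a query error propagates rather than yielding a partial list.
        items.extend(query(partition))
    return items
```

With small shard counts the serial loop costs only a few extra round trips, which is the trade-off the follow-up commit cites for dropping the thread pool.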
…bustness)
- public-safety: relativize machine-local absolute paths in design.md.
- list_axis_reader: drop the unsafe parallel fan-out (boto3 Table is not thread-safe); fan out serially, since shard counts are small. None-coerce gsi_sk/PK/SK sort and merge keys to avoid TypeError on null attributes.
- list_axis_migration: remaining = failed (backfilled/skipped items are no longer legacy), which avoids a second full-table scan per axis.
- store: _read_scan_jobs_by_status / lease_next_scan_job take include_legacy so legacy pending/leased jobs are not starved during the migration window.
- repo_axis: remove the now-unused _bucket_width back-compat alias (CodeQL).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
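The `remaining = failed` accounting and the removal gate can be sketched as below, under stated assumptions: `write` is a hypothetical callable that performs the in-place conditional update and returns `"skipped"` when the condition fails (for example, because a live writer already rewrote the item with sharded keys), and any exception counts as a failure. All names are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

Item = dict[str, Any]
WriteFn = Callable[[Item], str]  # hypothetical: returns "backfilled" or "skipped"


@dataclass
class BackfillResult:
    backfilled: int = 0
    skipped: int = 0
    failed: int = 0

    @property
    def remaining(self) -> int:
        # Backfilled and skipped items no longer sit on the legacy partition, so the
        # only items left behind are the failures: no re-scan is needed to count them.
        return self.failed


def backfill_axis(legacy_items: Iterable[Item], write: WriteFn, dry_run: bool = False) -> BackfillResult:
    result = BackfillResult()
    for item in legacy_items:
        if dry_run:
            result.backfilled += 1  # counted as "would backfill"; nothing is written
            continue
        try:
            outcome = write(item)
        except Exception:  # sketch: any write error counts as a failure
            result.failed += 1
            continue
        if outcome == "skipped":
            result.skipped += 1
        else:
            result.backfilled += 1
    return result


def may_remove_legacy(result: BackfillResult, dry_run: bool) -> bool:
    """Removal gate: only after a real (non-dry) run that left nothing behind."""
    return not dry_run and result.remaining == 0
```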
What
Generalize the #23 repo-axis sharding to the remaining static single-value
list/index GSI partitions, and remove never-read dead-write GSI keys. Cloud-scale
hot-partition cleanup; design-only follow-on to #23 (sharded in #34).
Why
After #23, `REPO#<repo>` is sharded, but other single-value GSI partitions remain hot at scale:
`TARGET_LIST#ALL`, `REPO_LIST#ALL`, `SCAN_DATE#<date>`, `SCAN_JOB_STATUS#<status>`. Separately, several `#ALL` GSI projections are written but never read via that GSI (write amplification and a hot partition for nothing).
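A minimal sketch of how a shard suffix cools a single-value partition, assuming the shard is derived from a stable hash of the item's identity material. The PR's `axis_shard`/`axis_material` primitives are not shown here, so `shard_index`, `sharded_partition`, and the `#<n>` suffix format are hypothetical.

```python
import hashlib


def shard_index(material: str, shard_count: int) -> int:
    """Hypothetical: stable shard number in [0, shard_count) for an item's identity."""
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


def sharded_partition(base: str, material: str, shard_count: int) -> str:
    """e.g. "TARGET_LIST#ALL" -> "TARGET_LIST#ALL#<n>" (suffix format assumed)."""
    return f"{base}#{shard_index(material, shard_count)}"


# Writes for 1,000 targets now spread over 8 partition values instead of one.
spread = {sharded_partition("TARGET_LIST#ALL", f"org/repo-{i}", 8) for i in range(1000)}
```

A digest is used instead of Python's built-in `hash()`, which is salted per process for `str` and would place the same item on different shards across runs.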
Changes
- `axis_core.py`: shared shard primitives extracted from `repo_axis.py`, which now re-exports them as back-compat aliases (#23 behavior unchanged).
- `list_axis.py`: `ListAxisSpec` carrying `index_name` + the real `gsi*` attribute names, plus 4 axis specs (TARGET_LIST / REPO_LIST / SCAN_DATE = 8, SCAN_JOB pending/leased = 4). `list_axis_inputs` is the single per-axis key formula.
- `list_axis_reader.py`: `read_list_axis` (flat; serial fan-out across the shards, including the hot SCAN_JOB lease loop, since a shared boto3 Table is not thread-safe) + `read_list_axis_ordered` (k-way merge preserving newest-first + limit). Fail-closed; dedupe by `(PK,SK)` preferring the higher version.
- `store.py`: `list_scan_targets` / `read_recent_repo_metadata` / `read_scan_runs_for_date` / `_read_scan_jobs_by_status` fan out via the readers (`include_legacy` migration flag, default off). Direct primary-key reads unchanged.
- `list_axis_migration.py`: per-axis in-place conditional backfill + `remaining==0` removal gate (SCAN_JOB status filter excludes completed/dead_letter).
- Dead-write removal (D3/D4): drop never-read GSI2 `#ALL` keys from `ghas_alert_to_item` / `secret_evidence_to_item` (GHAS GSI1 repo-axis + secret-evidence gsi1 link fallback preserved); drop GSI1 `#ALL` keys from `ref_state_to_item` / `repo_lease_to_item`.
- CLI: `security-scanner backfill-list-axis [--dry-run]`.
Implements `docs/workbench/specs/scale-redesign-list-axis-sharding/design.md` (M1–M7) with the locked self-Q&A decisions (D1–D5) and the spec multi-agent review fixes.
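The single-source-of-truth point behind `ListAxisSpec` and `list_axis_inputs` can be sketched as one object that drives both the writer's GSI keys and the reader's fan-out partitions, so the two cannot drift apart. This is a hypothetical shape: the field and method names, `GSI1`/`gsi1pk`/`gsi1sk`, and the `#<n>` suffix are placeholders; only the shard counts (8 for the list axes, 4 for SCAN_JOB status) come from this PR.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ListAxisSpec:
    """Hypothetical spec: GSI name, its real attribute names, base partition, shard count."""
    index_name: str
    pk_attr: str
    sk_attr: str
    base_partition: str
    shard_count: int

    def shard_for(self, material: str) -> int:
        # Stable hash of the item's identity material, reduced to this axis's shard count.
        digest = hashlib.sha256(material.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.shard_count

    def write_keys(self, material: str, sort_value: str) -> dict[str, str]:
        """Writer side: the GSI attributes to put on an item."""
        return {
            self.pk_attr: f"{self.base_partition}#{self.shard_for(material)}",
            self.sk_attr: sort_value,
        }

    def read_partitions(self) -> list[str]:
        """Reader side: every shard partition a fan-out query must cover."""
        return [f"{self.base_partition}#{n}" for n in range(self.shard_count)]


# Placeholder index/attribute names; shard counts as stated in the PR.
TARGET_LIST = ListAxisSpec("GSI1", "gsi1pk", "gsi1sk", "TARGET_LIST#ALL", 8)
SCAN_JOB_PENDING = ListAxisSpec("GSI1", "gsi1pk", "gsi1sk", "SCAN_JOB_STATUS#pending", 4)
```

Any key a writer produces is, by construction, one of the partitions the reader enumerates.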
Test
- `uv run pytest`: full suite green (672 passed)
- … (E,F,I,UP); `advisory/ruff` is non-blocking
Deferred (environment-only)
- `backfill-list-axis` run against the runtime host's DynamoDB-local: no reachable endpoint from CI/this environment. Mechanics, CLI, and tests are verified; the operator runs `--dry-run`, then applies on the host.
Related to #23.