source-security-dev · pureliture · Jun 18, 2026 · Jun 17, 2026
diff --git a/docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md b/docs/workbench/specs/issue-23-repo-gsi1-sharding/design.md
@@ -0,0 +1,314 @@
+# Issue 23 REPO GSI1 Sharding Design Spec
+
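+## Overview
+
+This design removes the hot-partition risk caused by per-repo entities piling
+into the single `REPO#<repo>` GSI1 partition. It preserves repo-local read
+semantics: new rows are written to the sharded repo axis, and legacy
+`REPO#<repo>` items are read only during the migration period.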
+
+## Requirements reference
+
+- Phase 1 source: `requirements.md`
+- Preview companion: `requirements.html`
+- Approved core requirements:
+  - Fix the design first; implementation proceeds in a later execution phase.
+  - Target the entire `REPO#<repo>` GSI1 axis.
+  - Legacy read is a compatibility path for the migration period only.
+  - Preserve the existing repo-local ordering semantics and read behavior.
+  - Cloud rollout package, canary, alarm, and rollback runbook are excluded.
+
+## Candidate approaches
+
+### Candidate A: Fixed shard fan-out with scatter-gather reads
+
+Distribute the repo axis key of new rows as `REPO#<repo>#SHARD#<bucket>`.
+Repo-local readers query the entire known shard set. Only in migration mode do
+they also read legacy `REPO#<repo>` and then merge in canonical order.
+
+This approach is selected. It leaves the current single-table/GSI structure
+largely unchanged, directly reduces the hot-partition cause in #23, and fits the
+migration-only back-compat requirement.
+
+### Candidate B: Separate repo timeline index
+
+Keep a timeline index separate from the sharded write index. The read path could
+become simpler, but the scope of new GSI design and table schema migration
+grows. Also, if the timeline index is a single partition per repo, the same
+hot-partition problem returns.
+
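+### Candidate C: Entity-specific partial sharding
+
+Shard only high-volume entities such as `FINDING_OBSERVATION` first. The
+implementation is small, but repo-axis rows such as `FINDING_STATE`,
+`STATE_EVENT`, `SCAN_LEDGER`, and the `GHAS_ALERT` in the current code remain
+unsharded, which blurs the completion criteria for #23.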
+
+## Architecture
+
+```mermaid
+flowchart TD
+  W["Write mapper"] --> RAK["RepoAxisKey helper"]
+  RAK --> S1["GSI1: REPO#repo#SHARD#00"]
+  RAK --> S2["GSI1: REPO#repo#SHARD#01"]
+  RAK --> SN["GSI1: REPO#repo#SHARD#N"]
+  Legacy["Legacy GSI1: REPO#repo"] --> Reader["RepoAxisReader"]
+  S1 --> Reader
+  S2 --> Reader
+  SN --> Reader
+  Reader --> Merge["dedupe and sort"]
+  Merge --> Residual["residual_for_repo"]
+  Merge --> RepoReads["repo-local state/event/ledger reads"]
+```
+
+### Components
+
+`RepoAxisKey`
+
+- Role: owns repo-axis shard key generation in one place.
+- Input: `repoAxisId`, entity type, stable shard material, optional timestamp.
+- Output: `gsi1pk`, `gsi1sk`, shard metadata fields.
+- Dependencies: deterministic hash function.
+
+`RepoAxisReader`
+
+- Role: hides fan-out reads from storage callers.
+- Input: `repoAxisId`, entity prefix, sort mode, `include_legacy` flag.
+- Output: deduplicated item list.
+- Dependencies: `query_all_pages`, configured shard count.
+
+`RepoAxisMigrationCompat`
+
+- Role: keeps legacy `REPO#<repo>` rows readable during the backfill period.
+- Input: the same read request as `RepoAxisReader`.
+- Output: legacy rows, included only while compatibility is enabled.
+- Dependencies: no write ownership. The legacy key must not become a permanent
+  API.
+
+`repoAxisId`
+
+- Role: the single repo-axis identifier used by the sharding helper and reader.
+- The input source uses each entity's existing domain field as-is.
+- Goal: prevent the local scan target name, incremental `repo_id`, and GHAS
+  repository value from being interpreted differently at each helper call site.
+
+## Data model
+
+### Sharded GSI1 key format
+
+New repo-axis rows use the following form.
+
+```text
+gsi1pk = REPO#<repo>#SHARD#<bucket>
+gsi1sk = <existing entity prefix and sort material>
+repoAxisVersion = 2
+repoAxisShardCount = 16
+repoAxisShard = <bucket>
+```
+
+`<bucket>` is a fixed-width deterministic value from `00` to `15`.
+The shard count for `repoAxisVersion = 2` is fixed at 16. The shard count is a
+durable schema contract, not an operator runtime knob. Changing the count later
+requires a new version such as `repoAxisVersion = 3` and either a separate
+rehash migration or an active-version fan-out design.
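
The key format above can be illustrated with a minimal sketch. It assumes SHA-256 as the deterministic hash and a unit-separator join of the shard material; the spec only requires determinism, so both choices, along with the function names `shard_bucket` and `repo_axis_projection`, are hypothetical.

```python
import hashlib

REPO_AXIS_VERSION = 2
REPO_AXIS_SHARD_COUNT = 16  # fixed by the spec for repoAxisVersion = 2


def shard_bucket(shard_material: tuple[str, ...]) -> str:
    """Map stable shard material to a fixed-width bucket between 00 and 15."""
    # Assumption: SHA-256 over the parts joined with a unit separator.
    # Python's built-in hash() is unsuitable because it is salted per process.
    joined = "\x1f".join(shard_material).encode("utf-8")
    digest = hashlib.sha256(joined).digest()
    bucket = int.from_bytes(digest[:8], "big") % REPO_AXIS_SHARD_COUNT
    return f"{bucket:02d}"


def repo_axis_projection(
    repo_axis_id: str, gsi1sk: str, shard_material: tuple[str, ...]
) -> dict:
    """Build the sharded GSI1 fields; repo_axis_id is used as given."""
    bucket = shard_bucket(shard_material)
    return {
        "gsi1pk": f"REPO#{repo_axis_id}#SHARD#{bucket}",
        "gsi1sk": gsi1sk,  # existing entity prefix and sort material, unchanged
        "repoAxisVersion": REPO_AXIS_VERSION,
        "repoAxisShardCount": REPO_AXIS_SHARD_COUNT,
        "repoAxisShard": bucket,
    }
```

For a `FINDING` row the shard material would be `(findingId,)`, so the same finding always lands in the same bucket regardless of which process writes it.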
+
+### repoAxisId mapping
+
+| Entity | repoAxisId source | Notes |
+| --- | --- | --- |
+| `FINDING` | `finding.repo.full_name` | Local scans keep the target-name family. |
+| `FINDING_OBSERVATION` | `finding.repo.full_name` | The scan-worker path aligns the finding context with `repo_id`. |
+| `FINDING_STATE` | `finding.repo.full_name` | Direct primary reads stay keyed on `findingId`. |
+| `STATE_EVENT` | `event.repo` | The repo-axis GSI1 is for listing/index use. |
+| `SCAN_LEDGER` | `entry.repo_id` | Point reads stay on the `ScanLedgerKey` primary key. |
+| `GHAS_ALERT` | `alert.repository` | The current reader is a table scan, but the mapper is a regression target. |
+
+The helper does not canonicalize the `repoAxisId` string. It receives the
+repository identity already owned by the input domain object, and identity
+normalization remains the responsibility of that domain flow.
+
+Shard material must be stable for each logical row.
+
+| Entity | Shard material |
+| --- | --- |
+| `FINDING` | `findingId` |
+| `FINDING_OBSERVATION` | `scanRunId`, `findingId`, `occurrenceKey` |
+| `FINDING_STATE` | `findingId` |
+| `STATE_EVENT` | `findingId`, `decidedAt`, `eventSeq` |
+| `SCAN_LEDGER` | `repoId`, `commitSha`, scanner tuple |
+| `GHAS_ALERT` | `ghasAlertId` |
+
+`GHAS_ALERT` is included because it currently maps to
+`gsi1pk = REPO#<repository>` on `origin/main`. Any entity added later that writes
+to the same repo-axis GSI1 partition must either use `RepoAxisKey` or be
+explicitly declared out of scope.
+
+### Sort key semantics
+
+Preserve the existing repo-local sort semantics. If a reader depends on
+chronological order, `gsi1sk` must contain sortable time material. If an
+existing read is identity/prefix based, sharding does not introduce stronger
+time semantics.
+
+Examples:
+
+- `STATE_EVENT#<decidedAt>#<findingId>#<eventSeq>` provides both
+  time-sortability and a stable tie-breaker.
+- `RUN#...#OBS#...` keeps the observation prefix used for residual derivation.
+- `LEDGER#...` keeps the repo ledger lookup/listing material. Chronological
+  ledger listing is not current behavior, so it is not made a new requirement.
+
+When the scatter-gather reader returns an ordered result, the canonical order is
+`(gsi1sk, PK, SK)`. If an entity needs stronger ordering, a separate tuple must
+be added to the spec.
+
+### Scope separation
+
+| Surface | Current storage path | Path after #23 | Preservation criterion |
+| --- | --- | --- | --- |
+| `read_observations_for_repo` | GSI1 `REPO#<repo>` + `RUN#` prefix | `RepoAxisReader` fan-out + optional legacy mode | residual input parity |
+| `FINDING` repo-axis index | GSI1 `REPO#<repo>` + `FINDING#` prefix | sharded GSI1 | mapper/index parity |
+| `FINDING_STATE` repo-axis index | GSI1 `REPO#<repo>` + `FINDING#` prefix | sharded GSI1 | direct `read_finding_state` unchanged |
+| `STATE_EVENT` repo-axis index | GSI1 `REPO#<repo>` + `STATE_EVENT#` prefix | sharded GSI1 with stable order | direct `read_finding_state_events` unchanged |
+| `SCAN_LEDGER` repo-axis index | GSI1 `REPO#<repo>` + `LEDGER#` prefix | sharded GSI1 | `has_scan_ledger` point read unchanged |
+| `GHAS_ALERT` repo-axis index | GSI1 `REPO#<repository>` | sharded GSI1 | current `read_ghas_alerts` scan behavior unchanged |
+| `REF_STATE` listing | primary `PK = REPO#<repo_id>` + `REF#` prefix | retained primary path | unchanged |
+| `SCAN_RUN` history | primary `PK = REPO#<repo_key>` + `SCAN_RUN#` prefix | retained primary path | unchanged |
+| `SCAN_JOB` repo status index | GSI2 `REPO#<repo_id>` | out of GSI1 #23 scope | unchanged |
+
+## Data flow
+
+### New write path
+
+1. The item mapper builds the base item.
+2. For repo-axis GSI1 entities, the mapper calls `RepoAxisKey`.
+3. The mapper writes only the sharded `gsi1pk`.
+4. The primary key is kept until a separate primary-key hotspot is proven.
+   `read_finding_state`, `has_scan_ledger`, and per-finding event reads keep
+   their direct primary-key behavior.
+
+### Repo-local read path
+
+1. The caller requests repo-axis items by entity prefix.
+2. `RepoAxisReader` queries every configured shard partition for that repo.
+3. Only when `include_legacy=True` in migration mode does it also query legacy
+   `gsi1pk = REPO#<repo>`.
+4. The reader defensively dedupes on `(PK, SK)`. In normal steady state the same
+   primary item does not exist in both the legacy and a sharded partition.
+5. If the existing behavior exposed an ordered result, it sorts by
+   `(gsi1sk, PK, SK)`.
+6. The caller receives the same logical result set as before sharding.
+
+`include_legacy` defaults to `False`. Only migration/backfill verification paths
+and legacy compatibility tests explicitly pass `True`. This flag must not be
+hidden away as if it were a permanent runtime feature.
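
The read steps above can be sketched as follows. This is not the project's reader: the `query` callable, the name `read_repo_axis_sketch`, and treating a missing `repoAxisVersion` as 0 are assumptions made for illustration.

```python
from typing import Callable, Iterable

SHARD_COUNT = 16
Item = dict
# Hypothetical query callable: (gsi1pk, gsi1sk_prefix) -> all matching items,
# with pagination already handled by the caller-supplied function.
QueryFn = Callable[[str, str], Iterable[Item]]


def read_repo_axis_sketch(
    query: QueryFn, repo_axis_id: str, prefix: str, include_legacy: bool = False
) -> list[Item]:
    partitions = [f"REPO#{repo_axis_id}#SHARD#{b:02d}" for b in range(SHARD_COUNT)]
    if include_legacy:
        # Migration-only compatibility path; never queried by default.
        partitions.append(f"REPO#{repo_axis_id}")

    merged: dict[tuple[str, str], Item] = {}
    for gsi1pk in partitions:
        # Any exception from query() propagates, so the read fails closed
        # instead of returning a partial result.
        for item in query(gsi1pk, prefix):
            key = (item["PK"], item["SK"])
            current = merged.get(key)
            # Defensive dedupe: prefer the higher repoAxisVersion.
            # Assumption: an item without the field counts as version 0.
            if current is None or item.get("repoAxisVersion", 0) > current.get(
                "repoAxisVersion", 0
            ):
                merged[key] = item

    # Canonical order from the spec: (gsi1sk, PK, SK).
    return sorted(merged.values(), key=lambda i: (i["gsi1sk"], i["PK"], i["SK"]))
```

The sketch queries shards sequentially for clarity. A concurrent fan-out would keep the same contract as long as one failed shard still aborts the whole read.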
+
+### Residual derivation path
+
+`residual_for_repo` keeps the existing ownership split.
+
+- `list_ref_states(repo_id)` reads `REF_STATE` rows from the primary
+  `PK = REPO#<repo_id>` partition.
+- `read_observations_for_repo(repo_id)` uses `RepoAxisReader` with the `RUN#`
+  prefix. This method is the existing repo-local read that must switch to the
+  fan-out reader in #23.
+- `residual_by_branch` stays a pure function unless tests reveal an ordering
+  dependency.
+
+`REF_STATE` is not currently a GSI1 repo-axis row. This design preserves the
+existing listing behavior and does not move it until a separate primary-key
+hot-partition requirement is confirmed.
+
+## Component details
+
+### Item mapping
+
+Every item mapper that currently emits `gsi1pk = REPO#...` must go through
+`RepoAxisKey`. The currently known entities are:
+
+- `GHAS_ALERT`
+- `FINDING`
+- `FINDING_OBSERVATION`
+- `FINDING_STATE`
+- `STATE_EVENT`
+- `SCAN_LEDGER`
+
+If a new mapper adds a raw `gsi1pk = f"REPO#..."` without the helper, a test must
+fail.
+
+### Reader compatibility
+
+Compatibility mode is read-only.
+
+- New writes do not dual-write to the legacy `REPO#<repo>` GSI1 key.
+- Backfill is not a separate row copy; it conditionally updates `gsi1pk`,
+  `gsi1sk`, `repoAxisVersion`, `repoAxisShardCount`, and `repoAxisShard` on the
+  existing primary item.
+- The reader includes legacy rows only in migration mode.
+- A legacy item and a sharded item cannot coexist under the same `(PK, SK)`.
+  Dedupe is a defense against GSI eventual consistency or test doubles.
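
As a rough illustration of the in-place backfill, the sketch below builds the arguments for a DynamoDB `UpdateItem` call. The helper name `backfill_update_args` and the exact condition are assumptions; the spec only states that the update is conditional and rewrites the projection fields on the existing primary item.

```python
def backfill_update_args(item: dict, projection: dict) -> dict:
    """Build UpdateItem keyword arguments that re-point one row's GSI1 fields."""
    return {
        # Same primary key: the row is updated in place, not copied.
        "Key": {"PK": item["PK"], "SK": item["SK"]},
        "UpdateExpression": (
            "SET gsi1pk = :pk, gsi1sk = :sk, repoAxisVersion = :v, "
            "repoAxisShardCount = :n, repoAxisShard = :b"
        ),
        # Assumed condition: only touch rows that are still on the legacy key
        # and carry no shard metadata, so a re-run leaves migrated rows alone.
        "ConditionExpression": (
            "gsi1pk = :legacy_pk AND attribute_not_exists(repoAxisVersion)"
        ),
        "ExpressionAttributeValues": {
            ":pk": projection["gsi1pk"],
            ":sk": projection["gsi1sk"],
            ":v": projection["repoAxisVersion"],
            ":n": projection["repoAxisShardCount"],
            ":b": projection["repoAxisShard"],
            ":legacy_pk": item["gsi1pk"],
        },
    }
```

With boto3 these arguments could be passed as `table.update_item(**args)`. A failed condition would then be counted as skipped rather than retried, which is one way to keep the backfill idempotent.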
+
+### Migration completion criteria
+
+This design does not define a production rollout. Instead it defines the gate
+for removing the compatibility code.
+
+- For every legacy `REPO#<repo>` row type in scope, per-entity inventory count,
+  updated/backfilled count, skipped count, and failed count are reported.
+- The remaining legacy `gsi1pk = REPO#<repo>` count is 0 for each of `FINDING`,
+  `FINDING_OBSERVATION`, `FINDING_STATE`, `STATE_EVENT`, `SCAN_LEDGER`, and
+  `GHAS_ALERT`.
+- Sharded-only reads return the same logical repo-local result not only for a
+  representative repo but also for repos sampled per entity.
+- Parity tests pass with `include_legacy=False`.
+- Tests confirm that the legacy query call count is 0 in a normal sharded read.
+- Tests cover legacy-only, sharded-only, and mixed legacy plus sharded state.
+- A named code path or flag exists so that removing legacy reads does not change
+  normal sharded reads.
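
The count-based part of this gate could be expressed as a small report type. The sketch below is hypothetical (`EntityBackfillReport` and `removal_gate_clear` are illustrative names) and covers only the per-entity counts, not the parity or call-count tests.

```python
from dataclasses import dataclass

IN_SCOPE_ENTITIES = frozenset({
    "FINDING", "FINDING_OBSERVATION", "FINDING_STATE",
    "STATE_EVENT", "SCAN_LEDGER", "GHAS_ALERT",
})


@dataclass(frozen=True)
class EntityBackfillReport:
    entity: str
    inventory: int         # legacy rows found for this entity
    backfilled: int        # rows updated in place to the sharded key
    skipped: int
    failed: int
    remaining_legacy: int  # rows still on gsi1pk = REPO#<repo>


def removal_gate_clear(reports: list[EntityBackfillReport]) -> bool:
    """True only if every in-scope entity is reported with no legacy rows left."""
    reported = {r.entity for r in reports}
    if not IN_SCOPE_ENTITIES <= reported:
        return False  # a missing entity report cannot count as clear
    return all(r.remaining_legacy == 0 for r in reports)
```

Requiring a report for every in-scope entity means an entity that was never inventoried blocks the gate instead of passing silently.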
+
+## Error handling
+
+- If even one shard query fails, the repo-axis read fails closed. It does not
+  return a partial repo result as if it were complete.
+- If the legacy query and a sharded query return the same `(PK, SK)`, the reader
+  prefers the item with the higher `repoAxisVersion`. This is defensive
+  behavior, not the expected steady state.
+- An item without `repoAxisVersion` or `repoAxisShard` is treated as legacy only
+  when it was read from the legacy partition.
+- Changing the shard count is a separate migration design. `repoAxisVersion = 2`
+  holds shard count 16 as a durable schema contract.
+
+## Test strategy
+
+- Unit test for deterministic bucket assignment in `RepoAxisKey`.
+- Unit test verifying that `RepoAxisKey` writes `repoAxisVersion = 2`,
+  `repoAxisShardCount = 16`, and buckets `00` through `15`.
+- Per-entity `repoAxisId` source mapping test.
+- Mapper test that every repo-axis entity writes a sharded `gsi1pk` and keeps
+  the expected `gsi1sk` prefix.
+- Mapper test that the sharded `STATE_EVENT` `gsi1sk` includes the `eventSeq`
+  suffix.
+- Reader test verifying fan-out query, defensive dedupe, `(gsi1sk, PK, SK)`
+  ordering, and legacy fallback.
+- Fail-closed reader test: no partial result is returned when one shard query
+  fails.
+- Reader test that the legacy query call count is 0 under the
+  `include_legacy=False` default.
+- Residual test that `residual_for_repo` returns the same logical result for
+  legacy-only, sharded-only, and mixed rows.
+- Storage adapter test showing that direct primary-key reads such as
+  `read_finding_state` and `has_scan_ledger` are preserved.
+- Migration test that backfill conditionally updates the GSI projection fields
+  of the existing primary item rather than copying rows.
+- Per-entity legacy inventory/backfilled/skipped/failed/remaining count report
+  test.
+- Regression scan that catches raw `gsi1pk = REPO#` construction outside the
+  helper within `src/security_scanner/storage/adapters/nosql_db`. The only
+  allowed exceptions are the `RepoAxisKey` helper and the legacy compatibility
+  reader.
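
The regression scan in the last item might look roughly like the sketch below. The regular expression and the allow-listed file names are assumptions for illustration; a line-based text scan like this only catches direct string construction, not keys assembled through intermediate variables.

```python
import re
from pathlib import Path

# Matches e.g. `gsi1pk = f"REPO#..."`, `"gsi1pk": "REPO#..."`,
# and `item["gsi1pk"] = f"REPO#..."`.
RAW_REPO_GSI1 = re.compile(r"""gsi1pk["']?\]?\s*[:=]\s*f?["']REPO#""")

# Hypothetical allow-list: the helper and the legacy compatibility reader.
ALLOWED_FILES = {"repo_axis.py", "repo_axis_reader.py"}


def find_raw_repo_axis_keys(root: Path) -> list[tuple[str, int]]:
    """Return (file name, line number) for each raw REPO# gsi1pk construction."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if path.name in ALLOWED_FILES:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if RAW_REPO_GSI1.search(line):
                hits.append((path.name, lineno))
    return hits
```

A test would assert that `find_raw_repo_axis_keys(Path("src/security_scanner/storage/adapters/nosql_db"))` returns an empty list.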
+
+## Milestones
+
+- M1: Add the repo-axis helper and tests. Done when `repoAxisVersion = 2`, shard
+  count 16, the bucket range, and the per-entity `repoAxisId` mapping are pinned
+  by tests.
+- M2: Route the repo-axis mappers through the helper. Done when the currently
+  known entities no longer emit the legacy `REPO#<repo>` GSI1 key on new writes
+  and the `STATE_EVENT` ordering suffix is stable.
+- M3: Add the scatter-gather reader and legacy fallback. Done when repo-local
+  reads pass with the `include_legacy=False` default, fail-closed behavior,
+  canonical ordering, and legacy-only, sharded-only, and mixed states.
+- M4: Wire up residual and the repo-local readers. Done when `residual_for_repo`
+  and the related storage tests prove parity.
+- M5: Document the migration removal gate near the implementation. Done when
+  the per-entity inventory, updated/backfilled, skipped, failed, and remaining
+  legacy counts and the legacy-query-call-count-0 condition are recorded.
+
+## Open questions
+
+- None at the current design gate. Cloud rollout details are intentionally kept
+  outside this spec.
diff --git a/docs/workbench/specs/issue-23-repo-gsi1-sharding/milestones.md b/docs/workbench/specs/issue-23-repo-gsi1-sharding/milestones.md
@@ -0,0 +1,70 @@
+# Milestones — issue-23-repo-gsi1-sharding
+
+Source design: `design.md` (approved). Execution via agentic-execution loop.
+
+## M1 RepoAxisKey helper + tests
+- status: done
+- evidence: `tests/test_repo_axis_sharding.py` M1 cases pass — version=2,
+  shardCount=16, bucket 00..15 fixed-width, deterministic, distributes across all
+  16 buckets, projection shape, no id canonicalization, legacy pk unsharded.
+
+## M2 Route mappers through RepoAxisKey
+- status: done
+- evidence: 20 passed (M1+M2). All repo-axis mappers (FINDING/OBSERVATION/STATE,
+  STATE_EVENT, SCAN_LEDGER, GHAS_ALERT) emit sharded gsi1pk + metadata, preserve
+  gsi1sk prefixes; STATE_EVENT gsi1sk gains eventSeq suffix (GSI2 unchanged);
+  regression scan confirms no raw gsi1pk=REPO#/#SHARD# outside repo_axis modules.
+- note: end-to-end read tests (residual/read_observations) are transiently red
+  here because writes are sharded but reads still hit the unsharded partition;
+  restored by M4 fan-out wiring. Verified jointly in the M4 full-suite run.
+
+## M3 Scatter-gather reader + legacy fallback
+- status: done
+- evidence: `repo_axis_reader.read_repo_axis` + tests — fans out across all 16
+  shards, zero legacy queries when include_legacy=False, merges legacy when True,
+  dedupes (PK,SK) preferring higher repoAxisVersion, canonical (gsi1sk,PK,SK)
+  order, fails closed (no partial) when a shard query raises.
+
+## M4 Wire residual + repo-local readers
+- status: done
+- evidence: 69 passed. store.read_observations_for_repo now fans out via
+  read_repo_axis (include_legacy param, default False). residual parity proven
+  identical across legacy-only / sharded-only / mixed layouts; default read
+  issues no legacy partition query. Direct primary-key reads (read_finding_state,
+  has_scan_ledger, read_finding_state_events) unchanged — full suite green.
+
+## Multi-agent review (post-M5)
+- 5-dimension review (correctness/security/design-spec/tests/migration-ops),
+  each finding adversarially verified by an independent opus agent.
+- 16 findings raised, 9 survived verification (0 critical/high; 3 medium, 5 low,
+  1 nit). All 9 addressed:
+  - regression scan hardened: single source-of-truth for REPO#/#SHARD# literals
+    in repo_axis.py; scan now bans any `#SHARD#` literal + raw gsi1pk=REPO# + a
+    GSI1 read on raw REPO# partition (construction-method-agnostic).
+  - residual parity test now drives the production residual_for_repo path with
+    include_legacy threaded through residual_for_repo + _ResidualStore.
+  - backfill projection moved inside try/except (one bad row → failed, not abort).
+  - STATE_EVENT partial-migration ordering caveat documented in the reader.
+  - migration no longer reaches into private items._shard_material; join +
+    key construction centralized in repo_axis.py (drift guard retained).
+  - added tests: backfill idempotent re-run; fake update_item enforces every
+    ConditionExpression clause; canonical (gsi1sk,PK,SK) tie-break; multi-page
+    pagination within a single shard.
+- evidence: full suite 616 passed, ruff clean; changeset scoped to 6 modified +
+  4 new files + spec docs.
+- follow-up (no defer): #5 fully resolved — the per-entity repo-axis key formula
+  is now a single source of truth (`repo_axis.repo_axis_inputs` /
+  `repo_axis_projection_for_item`). Write mappers build the base item then merge
+  the projection derived from the item's own fields, the exact function the
+  backfill uses, so the two formulas can no longer drift (the parallel
+  derivation in repo_axis_migration was deleted). Still 616 passed, ruff clean.
+
+## M5 Migration removal gate
+- status: done
+- evidence: `repo_axis_migration.py` — backfill conditionally updates each
+  existing primary item's GSI projection in place (update_item, not row copy),
+  proven by the in-place test (row count unchanged, same PK/SK, now sharded);
+  per-entity inventory/backfilled/skipped/failed/remaining report with gate_clear;
+  drift guard pins backfilled projection == write-mapper projection for all 6
+  entities; removal-gate checklist documented in the module docstring near the
+  implementation. Full suite 612 passed, ruff clean.