[codex] Implement Issue #6 CORE NoSQL schema split#11

Merged: pureliture merged 3 commits into main from codex/issue-6-core-schema on Jun 12, 2026

@pureliture pureliture commented Jun 12, 2026

Summary

Scoped the DynamoDB-compatible storage layout from Issue #6 down to CORE only.

The core change splits the existing run-scoped FINDING row into three roles:

  • FINDING: stable finding identity
  • FINDING_OBSERVATION: snapshot of a finding as observed in a scan run
  • FINDING_STATE: lifecycle / triage state

The REPO_META → SCAN_RUN flow is unchanged, and read_for_scan_run() / read_all() still restore runtime Finding objects with the same semantics.

Structural changes

flowchart LR
    R["REPO_META<br/>PK=REPO#repoKey<br/>SK=META"]
    S["SCAN_RUN<br/>PK=REPO#repoKey<br/>SK=SCAN_RUN#scanAtIso#scanRunId"]
    I["FINDING<br/>PK=FINDING#findingId<br/>SK=META"]
    O["FINDING_OBSERVATION<br/>PK=RUN#scanRunId<br/>SK=OBS#findingId#occurrenceKey"]
    T["FINDING_STATE<br/>PK=FINDING#findingId<br/>SK=STATE#GLOBAL"]

    R --> S
    S --> O
    O --> I
    I --> T
    O -. "state overlay on the read path" .-> T
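The key layout in the diagram above can be sketched in Python. This is a minimal illustration, not the PR's finding_to_items(): the function name `sketch_finding_items` and the attribute names `entityType` and `severity` are hypothetical, and only the PK/SK patterns come from the diagram.

```python
from typing import Any

STATE_SCOPE = "GLOBAL"  # assumed value behind SK=STATE#GLOBAL


def sketch_finding_items(
    finding_id: str,
    scan_run_id: str,
    occurrence_key: str,
    snapshot: dict[str, Any],
) -> list[dict[str, Any]]:
    """Build the three rows for one observed finding (illustrative only)."""
    identity = {
        "PK": f"FINDING#{finding_id}",
        "SK": "META",
        "entityType": "FINDING",  # hypothetical attribute name
    }
    observation = {
        **snapshot,  # run-scoped snapshot fields; keys below take precedence
        "PK": f"RUN#{scan_run_id}",
        "SK": f"OBS#{finding_id}#{occurrence_key}",
        "entityType": "FINDING_OBSERVATION",
    }
    state = {
        "PK": f"FINDING#{finding_id}",
        "SK": f"STATE#{STATE_SCOPE}",
        "entityType": "FINDING_STATE",
    }
    return [identity, observation, state]


items = sketch_finding_items("f1", "r1", "occ1", {"severity": "high"})
for item in items:
    print(item["entityType"], item["PK"], item["SK"])
```

Identity and state share the partition `FINDING#findingId`, while observations live under the run partition, so a single Query on `RUN#scanRunId` returns every observation of that run.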

Read path

sequenceDiagram
    participant Caller
    participant Store as DynamoDbCompatibleFindingStore
    participant DB as DynamoDB-compatible table

    Caller->>Store: read_for_scan_run(scanRunId)
    Store->>DB: Query PK=RUN#scanRunId, SK begins_with OBS#
    DB-->>Store: FINDING_OBSERVATION rows
    Store->>DB: BatchGetItem PK=FINDING#findingId, SK=STATE#GLOBAL
    DB-->>Store: FINDING_STATE rows
    Store-->>Caller: Finding snapshot + lifecycle state overlay
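The read path above can be approximated with an in-memory list standing in for the table. This is a sketch under assumptions, not the actual DynamoDbCompatibleFindingStore: the function name `read_for_scan_run_sketch` and the attributes `findingId` and `verdict` are hypothetical, and the list comprehensions stand in for the Query and BatchGetItem calls.

```python
from typing import Any

Item = dict[str, Any]


def read_for_scan_run_sketch(table: list[Item], scan_run_id: str) -> list[Item]:
    # Step 1: stand-in for Query PK=RUN#scanRunId, SK begins_with OBS#
    observations = [
        item for item in table
        if item["PK"] == f"RUN#{scan_run_id}" and item["SK"].startswith("OBS#")
    ]
    # Step 2: stand-in for BatchGetItem on (FINDING#findingId, STATE#GLOBAL)
    wanted = {f"FINDING#{obs['findingId']}" for obs in observations}
    states = {
        item["PK"]: item for item in table
        if item["PK"] in wanted and item["SK"] == "STATE#GLOBAL"
    }
    # Step 3: overlay lifecycle state onto each observation snapshot
    merged = []
    for obs in observations:
        state = states.get(f"FINDING#{obs['findingId']}", {})
        overlay = {k: v for k, v in state.items() if k not in ("PK", "SK")}
        merged.append({**obs, **overlay})
    return merged


demo_table = [
    {"PK": "RUN#r1", "SK": "OBS#f1#occ1", "findingId": "f1", "severity": "high"},
    {"PK": "FINDING#f1", "SK": "STATE#GLOBAL", "verdict": "false_positive"},
]
print(read_for_scan_run_sketch(demo_table, "r1"))
```

The state lookup is keyed by the finding's partition key, so several observations of the same finding in one run share a single state fetch.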

Manual triage protection

Scan writes only create FINDING_STATE conditionally.
If a state row already carries manual triage, the scan write does not overwrite its verdict / verifier / reason.

flowchart TD
    A["scan write"] --> B["FINDING identity upsert"]
    A --> C["FINDING_OBSERVATION write"]
    A --> D{"FINDING_STATE exists?"}
    D -- "no" --> E["default STATE#GLOBAL create"]
    D -- "yes" --> F["preserve manual triage"]
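The conditional-create branch above can be sketched with a dictionary in place of the table. On DynamoDB this behavior is commonly obtained with a PutItem call carrying a `ConditionExpression` such as `attribute_not_exists(PK)`; whether the PR uses exactly that expression is not stated here. The helper name `put_if_absent` and the `verdict` / `verifier` values are hypothetical.

```python
from typing import Any

Key = tuple[str, str]


def put_if_absent(table: dict[Key, dict[str, Any]], item: dict[str, Any]) -> bool:
    """Write the item only when no row exists for its (PK, SK).

    Returns True when the row was created and False when an existing
    row was left untouched.
    """
    key = (item["PK"], item["SK"])
    if key in table:
        return False  # existing state (possibly manual triage) is preserved
    table[key] = item
    return True


table: dict[Key, dict[str, Any]] = {}
default_state = {"PK": "FINDING#f1", "SK": "STATE#GLOBAL", "verdict": "open"}

created_first = put_if_absent(table, dict(default_state))  # first scan creates the row
# A human triages the finding between scans.
table[("FINDING#f1", "STATE#GLOBAL")].update(verdict="false_positive", verifier="alice")
created_second = put_if_absent(table, dict(default_state))  # later scan is a no-op
print(created_first, created_second, table[("FINDING#f1", "STATE#GLOBAL")]["verdict"])
```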

Review feedback addressed

  • Replaced the per-finding state query in read_for_scan_run() with batch_get_item reads in batches of 100.
  • Applied without_none() to the items returned by finding_to_items() so top-level None attributes are never written.
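Both review fixes reduce to small helpers, sketched below. DynamoDB's BatchGetItem accepts at most 100 keys per request, which is where the batch size comes from. The names `chunked` and `drop_none_sketch` are hypothetical stand-ins; the PR's own without_none() may differ in detail.

```python
from collections.abc import Iterator, Sequence
from typing import Any

BATCH_GET_LIMIT = 100  # BatchGetItem accepts at most 100 keys per request


def chunked(keys: Sequence[Any], size: int = BATCH_GET_LIMIT) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def drop_none_sketch(item: dict[str, Any]) -> dict[str, Any]:
    """Drop top-level attributes whose value is None (nested values untouched)."""
    return {name: value for name, value in item.items() if value is not None}


state_keys = [{"PK": f"FINDING#f{i}", "SK": "STATE#GLOBAL"} for i in range(250)]
batch_sizes = [len(batch) for batch in chunked(state_keys)]
print(batch_sizes)
print(drop_none_sketch({"PK": "FINDING#f1", "SK": "META", "reason": None}))
```

A real batch read would also need to retry any keys the service returns under `UnprocessedKeys`, which this sketch leaves out.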

Explicit non-goals

This PR implements only the CORE-only boundary from the Issue #6 comment.

  • No FindingFingerprintMap
  • No ScanRunQueryRows
  • No PatternQueryRows
  • No standalone Artifacts table/item
  • No TTL / streams / Lambda / production DynamoDB behavior

Conflict resolution

The scan target catalog helper from origin/main and this PR's Issue #6 occurrence/state helper landed at the same spot in items.py and conflicted.
Both changes are needed, so they were merged as follows:

  • Kept scan_target_to_item() / scan_target_from_item()
  • Kept STATE_SCOPE_GLOBAL / occurrence_key_for_finding()
  • Kept the finding_to_items() split into FINDING, FINDING_OBSERVATION, and FINDING_STATE

Verification

Re-verified after applying the review changes.

  • PYTHONDONTWRITEBYTECODE=1 uv run pytest -p no:cacheprovider tests/test_dynamodb_compatible_store.py tests/test_nosql_db_adapter.py tests/test_scan_target_storage.py → 43 passed
  • PYTHONDONTWRITEBYTECODE=1 uv run pytest -p no:cacheprovider → 350 passed
  • git diff --check origin/main...HEAD → passed
  • temporary Dynalite live CRUD smoke → passed
    • table bootstrap
    • write
    • query
    • scan
    • batch state overlay
    • conditional state protection
    • cleanup
    • manual_triage_protected=true

Closes #6

Co-Authored-By: Codex GPT-5 <noreply@openai.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the NoSQL database schema for security scan results by splitting findings into three distinct entities: FINDING (identity), FINDING_OBSERVATION (run-scoped snapshot), and FINDING_STATE (lifecycle state). It introduces deterministic occurrence keys for observations and ensures that manual triage states are not overwritten during subsequent scans. Feedback on these changes suggests optimizing the read_for_scan_run method to avoid an N+1 query bottleneck by batching state retrieval with DynamoDB's batch_get_item API, and wrapping the returned items in without_none to prevent writing None values as NULL attributes.

Comment thread src/security_scanner/storage/adapters/nosql_db/store.py
Comment thread src/security_scanner/storage/adapters/nosql_db/items.py Outdated
@pureliture pureliture marked this pull request as ready for review June 12, 2026 01:32
@pureliture pureliture merged commit b57f135 into main Jun 12, 2026
2 checks passed
@pureliture pureliture deleted the codex/issue-6-core-schema branch June 12, 2026 05:45