Define import log and source commit policy #12

Description

@alexeygrigorev

Status: pending
Tags: docs, migration, P0
Blocks: #4, #7

Current Evidence

_docs/import-log.md already exists and records the initial 2026-06-27 import state for DTC Operations, DataTasks, and Podcast Assistant. It includes source paths, source commits/source state, imported locations, exclusions, and purpose notes.

The issue is not fully satisfied yet because the file does not define the reusable import/source-state policy requested in the intake, does not record validation commands, and does not make the expected fields explicit enough for dependent imports (#4 and #7) to follow without inventing their own convention.

Scope

Update _docs/import-log.md into the canonical policy and register for copied source systems in DataTalksClub/dataops.

The implementation should preserve the existing import evidence and add a clear policy that every import or re-import records enough information for reviewers to identify what was copied, from where, at what source revision/state, what was deliberately excluded, and how the import was validated.

For each copied source system, the log should record:

  • Source system name.
  • Source path and, when applicable, source repository URL or local source identity.
  • Source commit hash for Git-backed sources, or explicit non-Git/local source state when no commit exists.
  • Destination path(s) in this repo.
  • Copied path groups or major copied contents.
  • Deliberate exclusions, especially generated, dependency, cache, secret, local runtime, and private output paths.
  • Validation commands run after import, including whether each command passed, failed, or was skipped with a reason.
  • Follow-up issue references when an import is transitional or will be moved/normalized later.
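The validation field above asks for an explicit passed, failed, or skipped status per command. A minimal sketch of how such lines could be produced is shown here; the helper names `record_check` and `record_skip` and the output wording are illustrative assumptions, not an existing convention in the repo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repo) that print one log line per
# validation command with an explicit status.

record_check() {
    # Usage: record_check <command> [args...]
    # Runs the command, discards its output, and reports passed or failed.
    if "$@" >/dev/null 2>&1; then
        printf '%s: passed\n' "$*"
    else
        status=$?
        printf '%s: failed (exit %s)\n' "$*" "$status"
    fi
}

record_skip() {
    # Usage: record_skip "<reason>" <command> [args...]
    # Does NOT run the command; it only records that it was skipped and why.
    reason=$1
    shift
    printf '%s: skipped (%s)\n' "$*" "$reason"
}
```

For example, `record_check test -f _docs/import-log.md` would print a passed or failed line, and `record_skip "source checkout not available" git -C ../datatasks rev-parse HEAD` would record a skip with its reason instead of claiming a result.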

Keep this issue documentation-only. Do not copy, move, delete, refactor, or re-import source code as part of #12.

Acceptance Criteria

  • _docs/import-log.md contains a named policy section that defines the required fields for future source imports and re-imports.
  • The policy states that source repos/directories such as ../dtc-operations, ../datatasks, and ../podcast-assistant are read-only unless a specific issue explicitly scopes changes there.
  • The policy explains how to record Git-backed sources with git -C <source> rev-parse HEAD and how to record non-Git local directories such as Podcast Assistant.
  • The existing DTC Operations entry records source path, source commit, destination path(s), copied path groups, exclusions, validation commands/statuses, and any known follow-up context.
  • The existing DataTasks entry records source path, source commit, destination path(s), copied path groups, exclusions, validation commands/statuses, and dependency/follow-up context for #4 (Import DataTasks into work-engine).
  • The existing Podcast Assistant entry records source path, non-Git source state, current destination path, copied path groups, exclusions, validation commands/statuses, and the follow-up context that #7 (Import Podcast Assistant into assistants/podcast) should move/canonicalize it under assistants/podcast/.
  • Validation command entries are explicit enough that a reviewer can distinguish commands that passed from commands that were not run or were blocked by local environment constraints.
  • The final document does not claim unverified commands passed.
  • No files outside _docs/import-log.md are changed for this issue.
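The criteria above distinguish Git-backed sources from non-Git local directories. One way this could be captured when writing a log entry is sketched below. The function name `source_state` and the output strings are assumptions for illustration; the `.git` directory check mirrors the `test ! -d ../podcast-assistant/.git` command used elsewhere in this issue, so it does not cover worktrees or submodules where `.git` is a file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): print the source-state line
# for one source directory. The source directory is only read.

source_state() {
    # Usage: source_state <source-dir>
    src=$1
    if [ ! -d "$src" ]; then
        printf 'unavailable: %s does not exist locally\n' "$src"
    elif [ -d "$src/.git" ]; then
        # Git-backed source: record the commit currently checked out.
        if commit=$(git -C "$src" rev-parse HEAD 2>/dev/null); then
            printf 'git commit: %s\n' "$commit"
        else
            printf 'git repository without a resolvable HEAD\n'
        fi
    else
        # No .git directory: record an explicit non-Git state.
        printf 'non-Git local directory (no commit available)\n'
    fi
}
```

Called as `source_state ../dtc-operations` or `source_state ../podcast-assistant` from the repo root, it prints a single line that can be pasted into the corresponding entry.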

Test Scenarios

  • From the repo root, verify the import log exists and contains the policy and all three current source systems:

    test -f _docs/import-log.md
    rg -n "policy|DTC Operations|DataTasks|Podcast Assistant|Validation" _docs/import-log.md

  • For Git-backed local sources that are available, verify the recorded commit values can be reproduced:

    git -C ../dtc-operations rev-parse HEAD
    git -C ../datatasks rev-parse HEAD

  • For the local non-Git Podcast Assistant source, verify the source-state statement is accurate:

    test -d ../podcast-assistant
    test ! -d ../podcast-assistant/.git

  • Review _docs/import-log.md manually and confirm every source entry has source state, destination path(s), copied paths, exclusions, validation commands/statuses, and follow-up references where relevant.
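The acceptance criterion that no files outside _docs/import-log.md are changed has no command in the scenarios above. A possible check is sketched below; the function name `outside_changes` is hypothetical, and the base branch name `main` in the usage comment is an assumption, since the issue does not state it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): read changed paths on stdin,
# one per line, and print every path other than the import log.
# Empty output means only _docs/import-log.md was changed.

outside_changes() {
    # -v invert match, -x whole-line match, -F fixed string (no regex).
    # grep exits non-zero when it prints nothing, so "|| true" keeps the
    # function usable under "set -e".
    grep -v -x -F -- '_docs/import-log.md' || true
}

# Example usage (assumes the base branch is named "main"):
#   git diff --name-only main...HEAD | outside_changes
```

Any path printed by the pipeline would indicate a change that falls outside the documentation-only scope of this issue.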

Dependencies

Out of Scope

  • Importing or re-importing DataTasks.
  • Moving Podcast Assistant from podcast-assistant/ to assistants/podcast/; that remains in #7 (Import Podcast Assistant into assistants/podcast).
  • Renaming lambda-functions/, restructuring backend code, or changing runtime paths.
  • Adding automated tests beyond lightweight documentation/source-state verification commands.
  • Modifying any source repository or directory outside dataops.
