Decide and document the repository boundary for DataOps operational knowledge before we move process documents, templates, prompts, or validation workflows out of the application repo.
This is an internal architecture/process planning issue. The output should be an ADR or equivalent decision document plus a concrete migration plan. It should not create repositories, move files, change runtime configuration, or implement sync code.
The decision must use the current V1 direction and repository boundaries as context:
DataTalksClub/dataops is the product/runtime repo for ops.dtcdev.click, including the portal, Lambda APIs, work-engine, assistant modules, tests, app deployment, and small development seeds.
content/ is the current transitional home for imported SOPs, references, images, prompts, indexes, and task templates.
content/tasks/templates/ contains DataTasks workflow templates that encode operational process and must be preserved independently from runtime state.
assistants/podcast/ is the canonical in-repo Podcast Assistant module. Its code and tests belong with the product repo, while its prompts/process knowledge/template inputs may need a documented knowledge-repo boundary.
Runtime workflow state, task instances, reminders, artifact metadata, assistant job metadata, and audit events belong in DynamoDB and portable exports, not in the process-doc repository.
Private or bulky artifacts such as raw uploads, recordings, transcripts, invoices, receipts, generated guest-specific podcast documents, and DynamoDB exports must remain outside public process docs.
../dtc-operations, ../datatasks, and ../podcast-assistant remain read-only source systems unless a separate issue explicitly scopes source-repo changes.
The recommended ADR should evaluate whether the long-term knowledge repo is DataTalksClub/dataops-knowledge, DataTalksClub/dataops-docs, or another name, and whether it should be public or private.
Acceptance Criteria
An internal ADR or decision document is added under docs/ and records the chosen repository boundary, repository name, visibility, ownership, and rationale.
The ADR explicitly distinguishes app/runtime code in DataTalksClub/dataops, canonical process knowledge in the future knowledge/docs repo, shared/account infrastructure in DataTalksClub/aws-infra, runtime state in DynamoDB, and private/bulky artifacts in external storage such as S3/Dropbox/Google Drive.
The ADR decides whether the knowledge repo contains only content/ or also task templates, workflow definitions, screenshots/images, process metadata, prompts, assistant process instructions, and lightweight validation/index files.
The ADR explicitly decides the canonical Git location and preferred format for DataTasks templates currently in content/tasks/templates/, including how those templates stay preserved independently of DynamoDB/work-engine runtime state.
The ADR documents the assistant boundary for assistants/podcast/: what remains product code in dataops, what knowledge/prompts/templates may move to the process repo, and what generated/private assistant outputs must stay outside Git.
A migration plan inventories current directories to keep in dataops, move to the knowledge repo, leave as runtime/external artifacts, or defer. The plan must include content/, content/tasks/templates/, content/images/, content/prompts/, content/indexes/, assistants/podcast/process/, assistants/podcast/templates/, and assistants/podcast/knowledge_base/.
The plan identifies runtime configuration changes needed for Lambda/portal content reads, portal edits, content cache/index refresh, and any GitHub token or repository permission changes.
The plan describes how portal edits should commit back to the knowledge repo, including branch strategy, review expectations, commit authoring, and rollback/revert behavior.
The plan describes CI expectations for the knowledge repo, including process-doc validation, template schema validation, internal link/image checks, search-index generation, and how a docs push refreshes the deployed portal cache without requiring an app redeploy.
The decision includes an issue-tracking model for document work: whether docs issues live in dataops, in the future knowledge repo, or in both with cross-links.
The decision preserves data-safety/export requirements: canonical templates and process docs are Git-backed, runtime task/workflow state remains exportable separately, and private/generated operational data is not moved into a public docs repo.
The issue report lists follow-up implementation issues to create after the ADR is accepted, including repository creation, content migration, template sync/loading, portal config changes, docs CI, and cache refresh wiring.
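One way to keep the migration inventory honest is to record it as data and check it against the directory list named in the acceptance criteria. The sketch below is illustrative only: the `Disposition` names and the `missing_paths` helper are hypothetical, and the draft dispositions are placeholders, not decisions the ADR has made.

```python
from enum import Enum


class Disposition(Enum):
    """The four outcomes the migration plan allows for a directory."""
    KEEP_IN_DATAOPS = "keep"
    MOVE_TO_KNOWLEDGE_REPO = "move"
    RUNTIME_OR_EXTERNAL = "external"
    DEFER = "defer"


# Directories the plan must classify, copied from the acceptance criteria.
REQUIRED_PATHS = (
    "content/",
    "content/tasks/templates/",
    "content/images/",
    "content/prompts/",
    "content/indexes/",
    "assistants/podcast/process/",
    "assistants/podcast/templates/",
    "assistants/podcast/knowledge_base/",
)


def missing_paths(inventory: dict) -> list:
    """Return required paths that the inventory has not classified yet."""
    return [path for path in REQUIRED_PATHS if path not in inventory]


# Placeholder draft: DEFER is used only to show the shape of the data.
draft = {
    "content/": Disposition.DEFER,
    "content/tasks/templates/": Disposition.DEFER,
}

for path in missing_paths(draft):
    print(f"unclassified: {path}")
```

A check like this could run during review of the plan, so that a directory named in the criteria cannot be silently left out.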
Test Scenarios
Scenario: ADR separates knowledge from runtime state
Given the current V1 plan, imported content, task templates, and assistant module boundaries
When the ADR is reviewed
Then it clearly states which repository owns process knowledge, which system owns runtime task/workflow state, and which private artifacts stay outside Git.
Scenario: DataTasks templates stay recoverable
Given task templates currently live in content/tasks/templates/ and describe repeatable operations
When the boundary decision is applied
Then the templates have a canonical Git-backed home and a documented sync/load path into the work-engine without making DynamoDB the only source of truth.
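A minimal sketch of what "Git-backed home plus a sync/load path" could mean in practice: hash every template file in the Git checkout, then compare against whatever the runtime last loaded. It assumes templates are stored as `*.json` files, which the ADR has not yet decided. `build_manifest` and `templates_to_load` are hypothetical names, and how the work-engine records its last-loaded manifest is left open.

```python
import hashlib
from pathlib import Path


def build_manifest(template_dir: Path, pattern: str = "*.json") -> dict:
    """Map each template file (path relative to template_dir) to its SHA-256.

    The "*.json" default is an assumption; the preferred template format
    is one of the things the ADR is expected to decide.
    """
    manifest = {}
    for path in sorted(template_dir.rglob(pattern)):
        if path.is_file():
            rel = path.relative_to(template_dir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def templates_to_load(git_manifest: dict, loaded_manifest: dict) -> list:
    """Templates whose Git content differs from what the runtime last loaded."""
    return sorted(
        name
        for name, digest in git_manifest.items()
        if loaded_manifest.get(name) != digest
    )
```

Because the manifest is derived from Git, losing the runtime copy only means reloading every template. Git remains the source of truth.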
Scenario: Portal edits have an ownership path
Given an operator edits an SOP or workflow template through the portal
When the future implementation follows the ADR
Then the edit is written to the knowledge repo with the documented branch/review/permission model, not silently mixed into app-code deployment changes.
Scenario: Assistant knowledge is not mixed with private outputs
Given Podcast Assistant has process instructions, templates, knowledge-base files, generated documents, inbox files, and logs
When the migration plan classifies assistant directories
Then reusable process knowledge is separated from assistant product code and private/generated outputs.
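The classification in this scenario could be backed by a simple guard that flags paths which look like private or generated output before anything is copied to the knowledge repo. This is a rough sketch: the segment names in `PRIVATE_SEGMENTS` are guesses based on the artifact types listed in this issue, not the actual directory layout, and `private_paths` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Hypothetical directory names; the real list would come from the
# migration inventory once assistant directories are classified.
PRIVATE_SEGMENTS = {
    "inbox",
    "logs",
    "generated",
    "recordings",
    "transcripts",
    "invoices",
    "receipts",
    "exports",
}


def private_paths(paths):
    """Return the paths that contain a segment marked as private or generated."""
    flagged = []
    for raw in paths:
        segments = {part.lower() for part in PurePosixPath(raw).parts}
        if segments & PRIVATE_SEGMENTS:
            flagged.append(raw)
    return flagged


candidates = [
    "assistants/podcast/process/intake.md",
    "assistants/podcast/inbox/guest.pdf",
    "assistants/podcast/logs/run.txt",
]
for path in private_paths(candidates):
    print(f"keep outside Git: {path}")
```

Matching on directory names is deliberately coarse. It can produce false positives, such as an SOP stored under a folder named `invoices`, so flagged paths would still need a human decision.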
Scenario: Docs changes can refresh the live portal safely
Given a process-doc or template change lands in the future knowledge repo
When CI validation passes
Then the plan explains how the deployed portal refreshes content/search/index data without requiring a new Lambda app deployment.
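The refresh rule this scenario implies can be stated as a small decision function: refresh only when CI has passed and the knowledge-repo commit differs from the one the portal cache was built from. The sketch assumes the cache records the commit SHA it was built from. `ContentVersion` and `should_refresh` are hypothetical names, and the mechanism that delivers the new SHA to the portal (webhook, polling, or something else) is left to the plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentVersion:
    """Latest known state of the knowledge repo's default branch."""
    commit_sha: str
    ci_passed: bool


def should_refresh(cached_sha: Optional[str], latest: ContentVersion) -> bool:
    """Decide whether the portal's content/search/index cache needs rebuilding."""
    if not latest.ci_passed:
        # Never serve content that failed validation.
        return False
    # Refresh on first load (no cached SHA) or when the commit has changed.
    return cached_sha != latest.commit_sha
```

Keeping the decision keyed on the content commit, not on the app build, is what lets a docs push update the portal without a Lambda redeploy.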
Out of Scope
Creating DataTalksClub/dataops-knowledge, DataTalksClub/dataops-docs, or any other repository.
Changing ../dtc-operations, ../datatasks, or ../podcast-assistant.
Migrating runtime state, DynamoDB data, generated assistant outputs, recordings, transcripts, invoices, receipts, or other private/bulky artifacts into Git.
Using user-facing prose tooling for this internal architecture/process document.
Dependencies
Use .goal-v1.md as the product goal: the operator experiences one unified workflow-first workspace even if repositories remain separated by ownership.
Use docs/repository-structure-recommendation.md as the current recommendation baseline, but update the decision if the ADR finds a stronger DataOps-specific boundary.
Use _docs/import-log.md as the source-state and read-only source-repo policy.
Use current content/, content/tasks/templates/, work-engine/docs/templates.md, and assistants/podcast/ paths as the inventory baseline.
Keep the decision compatible with the AI Shipping Labs-derived process in _docs/PROCESS.md: PM grooming, specialist review if needed, Tester verification, PM acceptance, local merge/push, and On-Call monitoring for any later implementation work.
Decide separate repository strategy for operations documents
Status: pending
Tags: docs, process-docs, migration, data, research, P0
Depends on: None
Blocks: #25, #33, #34