Scope
Build the first safe implementation slice for the future DataTalksClub/dataops-knowledge repository without creating that repository, moving production content, or changing the operator-facing read path.
This issue should add a migration scaffold and validation contract inside DataTalksClub/dataops so later agents have an exact, testable target for the knowledge repository layout and workflow-template YAML migration. The deployed portal, work-engine, and operator UX must continue to use the existing in-repo content/ fallback during this slice.
Expected implementation shape:
Add a repo-template/scaffold directory for the future dataops-knowledge repository, using a clear name such as templates/dataops-knowledge/, scaffolds/dataops-knowledge/, or another project-consistent path.
The scaffold must represent the target layout from docs/decisions/dataops-knowledge-repository.md: content/, workflow-templates/, assistant-prompts/, assistant-process/, examples/, images/, indexes/, schemas/, scripts/, and tests/.
Include concise README/guidance in the scaffold stating that it is a migration target only and is not yet the live source for the portal or work-engine.
Add strict validation assets for the future workflow-template YAML format, preferably a JSON Schema under the scaffold's schemas/ directory.
Add a machine-readable migration inventory/manifest that maps every current content/tasks/templates/*.md file to its future workflow-templates/*.yaml target and stable template ID. The manifest must cover exactly the current 11 templates: book-of-the-week, course, maven-ll, newsletter, office-hours, oss, podcast, social-media, tax-report, webinar, and workshop.
Add a validation command in the existing DataOps validation/tooling stack, preferably under lambda-functions/src/lambda_functions/, that checks the scaffold, schema, and migration manifest from the current repo. It should fail on missing scaffold directories, invalid schema JSON, missing or duplicate template mappings, missing current Markdown template files, missing stable IDs, non-task-template doc types for current templates, and target filenames outside workflow-templates/*.yaml.
Add focused automated tests for the new validator, including at least one passing fixture and failure coverage for missing template mappings, duplicate target IDs or paths, and invalid target path/schema shape.
Wire the new validation into the appropriate content/planning CI path so future edits to the scaffold, manifest, schemas, task templates, or validator are checked automatically.
Preserve the current operator experience: no deployed portal route, docs search, work-engine seed, runtime task creation, or portal edit behavior should start reading from the scaffold or future repo in this slice.
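The scaffold-directory part of the validation command could be sketched roughly as follows. The function names `check_scaffold_dirs` and `main`, the default path `scaffolds/dataops-knowledge`, and the message wording are assumptions for illustration; only the ten directory names come from this issue.

```python
from pathlib import Path

# Target top-level directories listed in this issue (from the ADR).
REQUIRED_DIRS = (
    "content", "workflow-templates", "assistant-prompts", "assistant-process",
    "examples", "images", "indexes", "schemas", "scripts", "tests",
)


def check_scaffold_dirs(scaffold_root: Path) -> list[str]:
    """Return one actionable error message per missing scaffold directory."""
    if not scaffold_root.is_dir():
        return [f"scaffold root not found: {scaffold_root}"]
    errors = []
    for name in REQUIRED_DIRS:
        if not (scaffold_root / name).is_dir():
            errors.append(
                f"missing scaffold directory: {scaffold_root / name} "
                "(create it with a placeholder README or .gitkeep)"
            )
    return errors


def main(argv: list[str]) -> int:
    """Print errors and return a process exit code (0 = ok, 1 = failures)."""
    # Hypothetical default path; the issue leaves the exact location open.
    root = Path(argv[1]) if len(argv) > 1 else Path("scaffolds/dataops-knowledge")
    errors = check_scaffold_dirs(root)
    for message in errors:
        print(f"ERROR: {message}")
    return 1 if errors else 0
```

A CLI entry point would pass the result of `main(sys.argv)` to `sys.exit`, so that CI sees a non-zero exit code when any directory is missing.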
Process Curator requirements:
Keep stable document IDs as the migration boundary. File paths may change later; current task-template IDs and future template IDs must remain explicit and testable.
Keep content/tasks/templates/*.md as the transitional canonical source for this issue.
Do not copy SOPs, images, prompts, assistant process files, examples, or generated indexes into the scaffold except for placeholder README files, .gitkeep-style placeholders, schemas, tests, scripts, or minimal synthetic fixtures needed to validate the scaffold.
If the implementation includes an example workflow-template YAML file, it must be clearly synthetic or fixture-only unless the issue explicitly explains why copying production template content is safe. Production template conversion belongs to a later issue.
Acceptance Criteria
A future dataops-knowledge scaffold exists in dataops with the target top-level directories from the accepted ADR and guidance that the scaffold is not yet live production content.
The scaffold includes a strict workflow-template schema covering the core contract from the ADR: stable id, runtime type, name, schema_version, trigger model, bundle links, phases/stage mapping, tasks with stable task IDs, scheduling offsets or rules, required proof/link declarations, default assignee references by stable role/user ID, instruction_doc_id/source document references, and optional migration source metadata.
A machine-readable migration manifest maps exactly all current content/tasks/templates/*.md files to future workflow-templates/*.yaml targets with stable IDs, with no duplicates or missing current files.
A repository validation command checks the scaffold, schema, and manifest and exits non-zero with actionable messages for missing directories, invalid JSON schema, missing/duplicate mappings, missing Markdown template files, invalid template frontmatter, and invalid target YAML paths.
Automated tests cover the validator success path and the required failure cases without depending on production secrets, GitHub writes, or the future external repository.
CI validates the new scaffold/schema/manifest/validator on relevant path changes.
Current portal/content behavior remains unchanged: Lambda/docs search still reads from content/, work-engine runtime templates are not loaded from the scaffold, and portal edit commits are not redirected to a new repository.
No files are moved out of content/, assistants/, work-engine/, or source repos outside dataops.
No DataTalksClub/dataops-knowledge repository is created and no GitHub write/token/branch-protection automation is added in this slice.
Source repos outside dataops, including ../dtc-operations, ../datatasks, and ../podcast-assistant, remain read-only.
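As a rough illustration of the schema criterion, the sketch below builds a strict JSON Schema as a Python dict and serializes it. Only `id`, `type`, `name`, `schema_version`, and `instruction_doc_id` are named in this issue; every other property name (`trigger`, `bundle_links`, `phases`, `tasks`, `offset_days`, `required_links`, `default_assignee`, `migration_source`) is a hypothetical placeholder for the ADR's contract, and the helper `top_level_problems` is an assumption, not a full schema validator.

```python
import json

# Hypothetical per-task shape; property names other than id, name, and
# instruction_doc_id are placeholders, not taken from the ADR.
TASK_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["id", "name", "instruction_doc_id"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "phase": {"type": "string"},
        "offset_days": {"type": "integer"},
        "required_links": {"type": "array", "items": {"type": "string"}},
        "default_assignee": {"type": "string"},
        "instruction_doc_id": {"type": "string"},
    },
}

WORKFLOW_TEMPLATE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "additionalProperties": False,  # "strict": unknown keys are rejected
    "required": ["id", "type", "name", "schema_version", "trigger", "tasks"],
    "properties": {
        "id": {"type": "string", "pattern": "^[a-z0-9]+(-[a-z0-9]+)*$"},
        "type": {"type": "string"},
        "name": {"type": "string"},
        "schema_version": {"type": "integer", "minimum": 1},
        "trigger": {"type": "object"},
        "bundle_links": {"type": "array", "items": {"type": "object"}},
        "phases": {"type": "array", "items": {"type": "string"}},
        "tasks": {"type": "array", "minItems": 1, "items": TASK_SCHEMA},
        "migration_source": {"type": "object"},
    },
}


def top_level_problems(template: dict, schema: dict) -> list[str]:
    """Stdlib-only check of required and unknown top-level keys.

    This is not a JSON Schema validator; a real validator library would be
    needed to enforce types, patterns, and the nested task rules.
    """
    problems = [f"missing required field: {key}"
                for key in schema["required"] if key not in template]
    problems += [f"unknown field: {key}"
                 for key in template if key not in schema["properties"]]
    return problems


# Serialized form, e.g. for a file under the scaffold's schemas/ directory.
schema_json = json.dumps(WORKFLOW_TEMPLATE_SCHEMA, indent=2)
```

Setting `additionalProperties` to `False` at both levels is one way to make the schema strict, so a misspelled key in a future YAML template fails validation instead of being silently ignored.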
Test Scenarios
Scenario: Scaffold defines the future repository without changing runtime behavior
Given the accepted ADR and the current content/ fallback
When the validation command runs against the repository
Then it confirms the future layout, schemas, and migration manifest while the portal and work-engine still use the current in-repo content paths.
Scenario: Current Markdown templates are fully inventoried
Given the 11 current files under content/tasks/templates/
When the migration manifest is validated
Then every current file is mapped to one unique workflow-templates/*.yaml target and one stable template ID, with no missing or duplicate mappings.
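The inventory check in this scenario might look like the sketch below. The manifest entry shape (`source`, `target`, `template_id`), the function name `validate_manifest`, and the assumption that each Markdown filename equals the template name are all hypothetical; a real validator would collect the current sources by listing content/tasks/templates/ rather than deriving them from a constant.

```python
import re

# The 11 current template names listed in this issue.
CURRENT_TEMPLATES = (
    "book-of-the-week", "course", "maven-ll", "newsletter", "office-hours",
    "oss", "podcast", "social-media", "tax-report", "webinar", "workshop",
)
# Assumption: each template lives at content/tasks/templates/<name>.md.
EXPECTED_SOURCES = {f"content/tasks/templates/{name}.md" for name in CURRENT_TEMPLATES}
TARGET_PATTERN = re.compile(r"^workflow-templates/[^/]+\.yaml$")


def validate_manifest(entries: list[dict], current_sources: set[str]) -> list[str]:
    """Return error messages for a manifest; an empty list means it is valid."""
    errors = []
    seen_sources, seen_targets, seen_ids = set(), set(), set()
    for entry in entries:
        source = entry.get("source", "")
        target = entry.get("target", "")
        template_id = entry.get("template_id", "")
        if not template_id:
            errors.append(f"{source}: missing stable template ID")
        if not TARGET_PATTERN.match(target):
            errors.append(
                f"{source}: target {target!r} is outside workflow-templates/*.yaml")
        if source not in current_sources:
            errors.append(f"{source}: source Markdown file does not exist")
        # Each source, target path, and template ID may appear only once.
        for value, seen, label in (
            (source, seen_sources, "source"),
            (target, seen_targets, "target path"),
            (template_id, seen_ids, "template ID"),
        ):
            if value and value in seen:
                errors.append(f"duplicate {label}: {value}")
            seen.add(value)
    # Every current Markdown template must have a mapping.
    for missing in sorted(current_sources - seen_sources):
        errors.append(f"{missing}: no manifest mapping")
    return errors
```

Returning a list of messages instead of raising on the first problem lets the command report every broken mapping in one run before exiting non-zero.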
Scenario: Bad migration mappings fail clearly
Given a fixture manifest with a missing template, duplicate target path or ID, invalid target path, or missing source Markdown file
When the validator runs in tests
Then it exits with a useful error that tells the next agent what to fix.
Scenario: Workflow-template schema is enforceable before conversion
Given a minimal valid fixture workflow template and invalid fixtures missing required fields
When schema validation runs in tests
Then the valid fixture passes and invalid fixtures fail before any production template conversion begins.
Scenario: Operator UX stays unified during the transition
Given the scaffold exists in the repo
When an operator opens the current portal or a workflow task references process docs
Then the app continues resolving docs from content/ and existing stable document IDs; the scaffold is not exposed as a disconnected second operator tool.
Required Verification
Software Engineer should run and report exact commands and exit codes for:
git diff --check
Focused tests for the new validator, for example uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app/test_validate_knowledge_repo.py
The new validation command against the real repository, using its documented CLI invocation.
Existing docs/content link validation: uv run --project lambda-functions --extra search python -m lambda_functions.validate_docs_links --repo-root . --content-root content
Search-index build to prove current content remains readable: cd lambda-functions && uv run --extra search python -m lambda_functions.build_search_index --docs-dir ../content --output ../.tmp/dataops-content-search.index
Full docs-app test workflow unless the implementation only adds isolated schema files and a narrowly scoped validator; if narrowed, explain why the focused validator tests plus content validation prove the acceptance criteria.
Tester should rerun the relevant verification independently. Screenshots are not required unless the implementation changes portal UI, routes, or rendered content.
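One possible way to collect the exact commands and exit codes for the report is a small runner like the sketch below. The helper name `run_and_report` and the output format are assumptions; the command lists are copied from the verification items in this issue and are meant to be run from the repository root.

```python
import subprocess
import sys


def run_and_report(commands: list[list[str]]) -> list[tuple[str, int]]:
    """Run each command in order and return (command line, exit code) pairs."""
    results = []
    for argv in commands:
        # check=False: a failing command is recorded, not raised.
        completed = subprocess.run(argv, check=False)
        line = " ".join(argv)
        print(f"exit={completed.returncode}  {line}")
        results.append((line, completed.returncode))
    return results


# Commands quoted from this issue's verification list.
VERIFICATION_COMMANDS = [
    ["git", "diff", "--check"],
    ["uv", "run", "--project", "lambda-functions", "--extra", "search",
     "--with", "pytest", "python", "-m", "pytest",
     "tests/docs_app/test_validate_knowledge_repo.py"],
    ["uv", "run", "--project", "lambda-functions", "--extra", "search",
     "python", "-m", "lambda_functions.validate_docs_links",
     "--repo-root", ".", "--content-root", "content"],
]
```

Calling `run_and_report(VERIFICATION_COMMANDS)` prints one line per command. `subprocess.run` raises `FileNotFoundError` if `git` or `uv` is not installed, and the search-index build is left out here because it runs from inside lambda-functions/.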
Out of Scope
Creating DataTalksClub/dataops-knowledge or changing GitHub repository settings, branch protection, repository visibility, secrets, or tokens.
Moving, copying, or deleting production SOPs, images, prompts, assistant process files, examples, generated indexes, or task templates into a new canonical location.
Converting the 11 Markdown task templates into production canonical YAML files.
Implementing work-engine template loading/sync from Git-backed YAML, template version tracking, or source commit tracking.
Changing Lambda/portal configuration to read from dataops-knowledge.
Implementing portal edit commits to the knowledge repository.
Refreshing deployed portal cache/search index from an external knowledge repository.
Performing the data-safety review for content, images, prompts, assistant knowledge, examples, or private artifacts.
Migrating runtime state, DynamoDB data, assistant outputs, recordings, transcripts, invoices, receipts, sponsor/client records, or other private/bulky files.
Modifying ../dtc-operations, ../datatasks, or ../podcast-assistant.
The next implementation owner is Software Engineer.
Later issues must handle repository creation, data-safety review, production YAML conversion, work-engine sync/load, portal config, portal edit commits, external-repo CI, deployed cache refresh, assistant knowledge migration, and private artifact handling.
Add dataops-knowledge migration scaffold and template validation
Status: pending
Tags: enhancement, docs, migration, process-docs, testing, data, P1
Depends on: #47 (closed)
Blocks: Future repository creation, content migration, template conversion, template sync/loading, portal edit commits, and deployed cache refresh issues.
Dependencies
docs/decisions/dataops-knowledge-repository.md, especially the rule to keep content/ in dataops until read/sync/edit/refresh support exists.
Use docs/repository-structure-recommendation.md, docs/STRUCTURE.md, and docs/README.md for current content structure, frontmatter, stable ID, and repo-meta conventions.