Add a stable document identity policy and the first implementation slice for process documents that are used by DataOps workflows.
The current docs conventions already describe frontmatter IDs, and the docs registry can generate fallback IDs from paths. That fallback is not enough for V1 workflow execution because task templates and runtime tasks need process-document references that survive file renames, repo moves, and the future dataops-knowledge split described in docs/decisions/dataops-knowledge-repository.md.
This issue should make stable IDs an explicit contract for workflow-critical process docs and migrate the first priority set. The implementer should inspect and update the relevant documentation and code paths, especially:
.goal-v1.md
_docs/PROCESS.md
_docs/MERGE_PLAN.md
PROJECT_PLAN.md
PORTAL_ANALYSIS.md
docs/STRUCTURE.md
docs/sop-format.md
docs/local-development.md
docs/decisions/dataops-knowledge-repository.md
docs/v1-workflow-data-model.md
content/tasks/templates/*.md
Podcast workflow process docs referenced by content/tasks/templates/podcast.md and work-engine/scripts/seed-templates.ts
lambda-functions/src/lambda_functions/doc_registry.py
lambda-functions/src/lambda_functions/docs_index.py
tests/docs_app/
work-engine template/task metadata code and tests when sourceDocIds or instructionDocId mappings are changed
The first migration target is:
Every Markdown task template in content/tasks/templates/*.md has explicit stable frontmatter id, aliases, doc_type: task-template, schema_version: 1, source, systems, tags, and related_docs where applicable.
Every Podcast process doc referenced by the Podcast task template or Podcast seed-template instructionDocId has an explicit stable frontmatter id, or the reference is corrected to the stable ID already present on the target document.
Workflow template records that already expose sourceDocIds and task definitions that expose instructionDocId use these stable document IDs rather than path-derived or Google Docs-only references.
The policy docs explain how IDs, aliases, source metadata, task/workflow references, and repository boundaries work together.
Metadata Format
Use the existing frontmatter style and make it normative for workflow-critical process docs:
id is the canonical stable identity. It must use lowercase letters, numbers, dots, dashes, and underscores only.
Prefer namespaces by type and domain: sop.<domain>.<area>.<slug>, template.<domain>.<area>.<slug>, reference.<domain>.<area>.<slug>, playbook.<domain>.<area>.<slug>, prompt.<domain>.<area>.<slug>, and task-template.tasks.<workflow>.
aliases contains old IDs, old generated IDs, or old repo-relative paths that should still resolve after a rename or migration.
related_docs, task-template sourceDocIds, and task/task-definition instructionDocId should prefer stable IDs over paths and external Google Docs URLs.
source remains provenance for imported docs. It is not the runtime identity.
instructionsUrl may remain for external Google Docs links during migration, but it must not be the only process-doc identity when an in-repo process doc exists.
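The ID rules above can be checked mechanically. The following is a minimal Python sketch, not the registry's actual code: the helper names `is_valid_doc_id` and `has_known_namespace` and the exact regular expression are assumptions derived from the rules as written (lowercase letters, numbers, dots, dashes, and underscores only).

```python
import re

# Assumption: a single character-class regex is enough to express the rule.
ID_PATTERN = re.compile(r"[a-z0-9._-]+")

# Namespace prefixes listed in the policy. "task-template.tasks." is the only
# one with a fixed second segment.
KNOWN_PREFIXES = (
    "sop.",
    "template.",
    "reference.",
    "playbook.",
    "prompt.",
    "task-template.tasks.",
)


def is_valid_doc_id(doc_id: str) -> bool:
    """Return True if the whole ID uses only the allowed characters."""
    return ID_PATTERN.fullmatch(doc_id) is not None


def has_known_namespace(doc_id: str) -> bool:
    """Return True if the ID starts with one of the preferred namespaces."""
    return doc_id.startswith(KNOWN_PREFIXES)
```

Keeping the syntax check separate from the namespace check reflects the wording of the policy: the character set is a hard requirement, while namespaces are a stated preference.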
Migration Strategy
Do this as a narrow first batch, not a repo-wide blind rewrite.
Update the documentation contract first so future agents know when a stable ID is required and how to choose it.
Audit the current workflow-critical docs and list any generated fallback IDs that are being relied on today.
Add explicit IDs to all content/tasks/templates/*.md, preserving existing generated IDs as aliases when the explicit ID differs from the current generated ID.
Add explicit IDs to the Podcast docs referenced by content/tasks/templates/podcast.md and work-engine/scripts/seed-templates.ts where missing.
Update related_docs, sourceDocIds, and instructionDocId values only when needed to point at the canonical stable ID.
Keep source repositories read-only. Do not edit ../dtc-operations, ../datatasks, or ../podcast-assistant.
Do not split content/ into DataTalksClub/dataops-knowledge in this issue. The ADR remains a future boundary decision.
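The alias-preservation step in this strategy could be sketched as a small migration helper. This is an illustration only: `plan_identity` is a hypothetical function name, and the format of generated fallback IDs is not specified in this issue, so any generated ID passed in is whatever the registry currently reports.

```python
def plan_identity(explicit_id: str, generated_id: str, existing_aliases: list[str]) -> dict:
    """Return the id/aliases frontmatter fields for a migrated doc.

    The old generated ID is kept as an alias only when it differs from the
    new explicit ID, so existing references keep resolving.
    """
    aliases = list(existing_aliases)  # copy, so the caller's list is untouched
    if generated_id != explicit_id and generated_id not in aliases:
        aliases.append(generated_id)
    return {"id": explicit_id, "aliases": aliases}
```

When the explicit ID already equals the generated one, the helper adds nothing, which keeps the migration diff limited to documents that actually change identity.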
Validation And Search Impact
The implementation must keep the docs registry and search behavior deterministic:
Registry validation must fail loudly for duplicate id values, duplicate aliases, aliases that conflict with another document ID/path, invalid ID syntax, and broken related_docs references.
Workflow-critical docs migrated in this issue must report stable_id: true and id_source: frontmatter from the docs API/registry.
Search index generation must still succeed after the migration and must index the stable id field for migrated docs.
Existing path and alias resolution must keep working for migrated docs.
Generated fallback IDs may remain for legacy docs outside this issue's migration batch, but tests should make clear that workflow-critical docs are not allowed to rely on fallback IDs.
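The validation rules above can be sketched as a single pass over registry records. This is a hedged illustration, not the code in `doc_registry.py`: the `DocRecord` shape and the `validate_registry` name are assumptions, and the sketch assumes that `related_docs` entries may resolve by ID, alias, or path.

```python
from dataclasses import dataclass, field


@dataclass
class DocRecord:
    """Hypothetical in-memory shape of one registry entry."""

    path: str
    id: str
    aliases: list[str] = field(default_factory=list)
    related_docs: list[str] = field(default_factory=list)


def validate_registry(docs: list[DocRecord]) -> list[str]:
    """Collect every identity problem instead of stopping at the first one."""
    errors: list[str] = []

    # Duplicate IDs: the message names both conflicting paths.
    id_owner: dict[str, str] = {}
    for doc in docs:
        if doc.id in id_owner:
            errors.append(f"duplicate id {doc.id!r}: {id_owner[doc.id]} and {doc.path}")
        else:
            id_owner[doc.id] = doc.path

    # Aliases must not collide with another doc's alias, ID, or path.
    paths = {doc.path for doc in docs}
    alias_owner: dict[str, str] = {}
    for doc in docs:
        for alias in doc.aliases:
            if alias in alias_owner and alias_owner[alias] != doc.path:
                errors.append(f"duplicate alias {alias!r}: {alias_owner[alias]} and {doc.path}")
            elif alias in id_owner and id_owner[alias] != doc.path:
                errors.append(f"alias {alias!r} in {doc.path} conflicts with id of {id_owner[alias]}")
            elif alias in paths and alias != doc.path:
                errors.append(f"alias {alias!r} in {doc.path} conflicts with an existing path")
            else:
                alias_owner[alias] = doc.path

    # related_docs must point at something the registry can resolve.
    known = set(id_owner) | set(alias_owner) | paths
    for doc in docs:
        for ref in doc.related_docs:
            if ref not in known:
                errors.append(f"broken related_docs reference {ref!r} in {doc.path}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller fail loudly while still reporting every conflict in one run. ID syntax checking is left out here to keep the sketch short.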
Acceptance Criteria
docs/STRUCTURE.md, docs/sop-format.md, and relevant process/development docs explain the stable-ID requirement for workflow-critical process docs, aliases, source provenance, related_docs, sourceDocIds, instructionDocId, and the future knowledge-repo boundary.
All content/tasks/templates/*.md files have explicit stable frontmatter IDs using the task-template.tasks.<workflow> namespace and are covered by tests.
Podcast workflow docs referenced by content/tasks/templates/podcast.md and Podcast seed-template instructionDocId values have explicit stable IDs or references are corrected to existing explicit IDs.
related_docs, sourceDocIds, and instructionDocId values for the migrated batch resolve through the document registry by stable ID.
Registry/search behavior preserves aliases and path resolution for migrated docs, and duplicate/conflicting IDs fail with actionable validation errors.
Search index build succeeds and includes stable IDs for the migrated task templates and Podcast process docs.
Work-engine template/task tests still prove that sourceDocIds and instructionDocId persist, export, and instantiate correctly.
No source repositories outside DataTalksClub/dataops are modified.
Test Scenarios
Scenario: Task template has stable identity
Given: each Markdown file under content/tasks/templates/
When: the docs registry indexes the content tree
Then: each template record has id_source: frontmatter, stable_id: true, doc_type: task-template, and a unique task-template.tasks.<workflow> ID.
Scenario: Podcast task opens process doc by ID
Given: a Podcast task definition with instructionDocId: sop.media.podcast.create-podcast-document
When: the frontend or docs API resolves the instruction document
Then: the stable ID resolves to the expected Markdown document even if the file path has an alias or generated fallback ID.
Scenario: Duplicate ID fails loudly
Given: two Markdown docs define the same id
When: registry validation or search-index build runs
Then: validation fails with a message naming both conflicting paths.
Scenario: Alias preserves old references
Given: a migrated doc has an alias for its previous generated/path reference
When: /docs/resolve?ref=<alias> or registry resolution is used
Then: it resolves to the canonical document record and reports the canonical id.
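The alias scenario above can be illustrated with a small lookup table. This is a sketch under stated assumptions: records are plain dictionaries with `id`, `aliases`, and `path` keys, and `build_lookup` and `resolve_ref` are hypothetical names rather than the registry's real API. The precedence (IDs first, then aliases, then paths) is also an assumption.

```python
from typing import Optional


def build_lookup(docs: list[dict]) -> dict[str, dict]:
    """Map every accepted reference string to its canonical record."""
    lookup: dict[str, dict] = {}
    # setdefault never overwrites, so canonical IDs take precedence over
    # aliases, and aliases take precedence over paths.
    for doc in docs:
        lookup.setdefault(doc["id"], doc)
    for doc in docs:
        for alias in doc.get("aliases", []):
            lookup.setdefault(alias, doc)
    for doc in docs:
        lookup.setdefault(doc["path"], doc)
    return lookup


def resolve_ref(ref: str, lookup: dict[str, dict]) -> Optional[dict]:
    """Return the canonical record for an ID, alias, or path, else None."""
    return lookup.get(ref)
```

Whichever form of reference is used, the caller reads the canonical `id` from the returned record, which matches the "reports the canonical id" expectation in the scenario.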
Scenario: Search indexes migrated IDs
Given: migrated task-template and Podcast process docs
When: the search index is built
Then: the index build succeeds, migrated docs include their stable id keyword field, and existing search tests still pass.
Scenario: Work-engine keeps process-doc identity
Given: a template with sourceDocIds and task definitions with instructionDocId
When: templates are created, updated, instantiated into tasks, and exported
Then: the stable process-doc IDs are preserved and no code falls back to Google Docs URLs as the only identity for migrated in-repo docs.
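A check along the lines of the work-engine scenario could list every process-doc reference that does not resolve. This is a Python sketch for illustration only (the work-engine itself is TypeScript): the `taskDefinitions` key and the `unresolved_doc_refs` name are assumptions, while `sourceDocIds` and `instructionDocId` are the field names used in this issue.

```python
def unresolved_doc_refs(template: dict, known_ids: set[str]) -> list[str]:
    """Return a description of each doc reference missing from known_ids."""
    problems: list[str] = []
    for doc_id in template.get("sourceDocIds", []):
        if doc_id not in known_ids:
            problems.append(f"sourceDocIds: {doc_id}")
    # Assumption: task definitions are stored under a "taskDefinitions" key.
    for task in template.get("taskDefinitions", []):
        doc_id = task.get("instructionDocId")
        # A task without instructionDocId is allowed here; it may still
        # carry only an external instructionsUrl during migration.
        if doc_id is not None and doc_id not in known_ids:
            problems.append(f"instructionDocId: {doc_id}")
    return problems
```

An empty result means every reference in the template resolves by stable ID.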
Out of Scope
Moving content/ or process knowledge into a new DataTalksClub/dataops-knowledge repository.
Adding stable IDs to every imported SOP/template/reference in content/** beyond the task-template files and Podcast workflow-critical docs named by this issue.
Changing production data, DynamoDB records, source repositories, or external Google Docs.
Dependencies
This issue is foundational for "Add internal link and related-doc validation" (#34) and the workflow mapping issues, because they should rely on stable document identities rather than generated path IDs.
No external credentials or human-only checks are expected.
If a referenced Podcast process doc does not exist in content/**, document the missing mapping in the implementation notes and keep the current external instructionsUrl; do not invent a document or edit a source repository.
Labels
Use labels: enhancement, docs, process-docs, backend, work-engine, testing, data, P0.
Remove the needs grooming label after this body is applied.
Verification Commands
Run the relevant focused checks during implementation, and the full relevant workflow before handoff:
cd lambda-functions
uv run --extra search python -m lambda_functions.build_search_index \
  --docs-dir ../content \
  --output ../.tmp/dataops-content-search.index

npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build

Extend process docs with stable IDs
Status: pending
Tags: enhancement, docs, process-docs, backend, work-engine, testing, data, P0
Depends on: None
Blocks: #34, #36, #37, #38, #39