Scope
Implement the V1 artifact contract that lets workflow/task proof, assistant outputs, files, and external links live in one coherent model without storing binaries or secrets in DynamoDB or Git.
This issue should make artifacts usable by the work-engine and visible enough in the V1 operator flow for later Podcast and assistant slices. It is broader than the existing FileRecord upload metadata and narrower than the full assistant job lifecycle from #30.
The implementation should cover:
A first-class artifact metadata entity for generated or operational outputs such as podcast prep documents, transcripts, reports, event pages, public URLs, draft assistant outputs, and reviewed files.
The boundary between task proof fields, bundle links, file metadata, artifact records, assistant output records, and canonical Git-backed process docs.
Storage policy for s3://, Dropbox/Google Drive/external URLs, Git/GitHub content, and local-dev-only file paths.
Metadata-only DynamoDB storage for artifacts and files: stable IDs, storage URI, provider, checksum/size where available, review status, relationship IDs, and timestamps. DynamoDB must not store binary payloads, large text outputs, signed URLs, secrets, or assistant raw logs.
Work-engine API support to create, update, list, attach, approve/reject/archive, and export artifact metadata through the existing authenticated /work/api/* surface.
Task completion/proof behavior for proofRequirement.type = artifact, requiresFile, requiredLinkName, and assistant-generated outputs.
Portable export and validation updates so artifacts.jsonl becomes an exported entity once artifact records exist, while existing files.jsonl stays migration-safe.
V1 UI implications: the operator can see task/bundle artifacts, link or register an external artifact, understand review status, and see why proof-gated completion is blocked.
Use the current V1 runtime boundary from docs/v1-runtime-architecture.md: the Python portal remains the public entry point, work-engine stays private behind /work/api/*, runtime metadata lives in DynamoDB, Git stores reviewed process knowledge, and private/bulky binaries live in S3 or existing private external systems.
Required Model
Add or document the runtime artifact shape in TypeScript and export form. Runtime fields may use camelCase; portable exports must use snake_case.
Minimum artifact fields:
artifactId / artifact_id
type: documented string such as podcast-doc, transcript, recording, report, invoice, event-page, assistant-output, external-link, or other
title
description optional
status: draft, needs-review, approved, rejected, archived, or superseded
storageProvider: s3, dropbox, google-drive, github, external-url, local-dev, or unknown
storageUri / storage_uri
filename optional
contentType / content_type optional
checksum optional, required when DataOps owns the binary and can compute it
sizeBytes / size_bytes optional
visibility or dataClass: at least distinguish public, internal, private, and sensitive
taskId, bundleId, assistantJobId, and fileId optional relationship IDs
sourceType: manual-link, manual-upload, assistant-output, import, migration, or system
createdBy, reviewedBy, createdAt, updatedAt, reviewedAt where available
tags optional
small structured metadata optional, with redaction rules applied
The existing task and bundle artifactRefs remain lightweight references for fast context, but the artifact table/API is the durable source of artifact metadata. A ref alone is not enough to satisfy an artifact proof requirement unless the referenced artifact record exists and has an accepted proof status.
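The field list above could be expressed roughly as follows. This is a minimal sketch, not the work-engine's actual types: the names `Artifact`, `toSnakeCase`, and `toExportRecord` are hypothetical, `dataClass` is chosen over `visibility` arbitrarily, and the shape of `metadata` is an assumption.

```typescript
type ArtifactStatus =
  | "draft" | "needs-review" | "approved" | "rejected" | "archived" | "superseded";
type StorageProvider =
  | "s3" | "dropbox" | "google-drive" | "github" | "external-url" | "local-dev" | "unknown";
type DataClass = "public" | "internal" | "private" | "sensitive";
type SourceType =
  | "manual-link" | "manual-upload" | "assistant-output" | "import" | "migration" | "system";

// Hypothetical runtime shape (camelCase) built from the minimum field list.
interface Artifact {
  artifactId: string;
  type: string; // documented string such as "podcast-doc" or "transcript"
  title: string;
  description?: string;
  status: ArtifactStatus;
  storageProvider: StorageProvider;
  storageUri: string;
  filename?: string;
  contentType?: string;
  checksum?: string;
  sizeBytes?: number;
  dataClass: DataClass;
  taskId?: string;
  bundleId?: string;
  assistantJobId?: string;
  fileId?: string;
  sourceType: SourceType;
  createdBy?: string;
  reviewedBy?: string;
  createdAt?: string;
  updatedAt?: string;
  reviewedAt?: string;
  tags?: string[];
  metadata?: Record<string, string | number | boolean>; // assumed shape
}

// Convert one camelCase key to snake_case, e.g. "storageUri" -> "storage_uri".
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
}

// Build the portable export form: top-level keys become snake_case and
// undefined fields are dropped. Keys nested inside metadata are left as-is.
function toExportRecord(artifact: Artifact): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(artifact)) {
    if (value !== undefined) out[toSnakeCase(key)] = value;
  }
  return out;
}
```

Deriving the export keys mechanically from the runtime keys keeps the two forms from drifting apart, at the cost of requiring runtime field names that convert cleanly.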
Storage Policy
Implement or document these boundaries in code comments/docs and enforce the parts that affect runtime behavior:
Git/GitHub: canonical SOPs, workflow definitions, reviewed templates, assistant prompts, and small reviewed process assets only. Do not commit private runtime artifacts, bulky generated files, raw assistant logs, podcast recordings, invoices, receipts, statements, or temporary assistant outputs as durable V1 storage.
DynamoDB: metadata only for files, artifacts, assistant job links, task proof state, bundle links, and audit references. No binaries, large generated documents, secrets, signed URLs, OAuth tokens, cookies, or raw assistant logs.
S3: preferred DataOps-owned storage for private/bulky uploaded binaries and generated artifacts when DataOps owns the file. If S3 upload/storage is introduced in this issue, the bucket must be SAM-managed, versioned, private, and referenced by s3://bucket/key plus checksum/size metadata. If S3 binary upload is not introduced, production binary upload must remain clearly disabled or guarded rather than silently writing to Lambda local disk.
Dropbox/Google Drive: acceptable external private systems for existing podcast/audio/document workflows. Store stable private-system URLs or provider URIs as metadata. Do not treat temporary signed download URLs as durable storageUri values.
Public URLs: Luma, Meetup, YouTube, website pages, Spotify, Apple Podcasts, and similar deliverables may be stored as task link, bundle link, and/or artifact metadata depending on whether they are proof for one task or reusable workflow output.
Local filesystem paths: allowed only for local development/test fixtures. Production Lambda local filesystem must not be the durable artifact store.
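One way to enforce the runtime-relevant parts of this policy is a check that runs before an artifact record is written. The sketch below is an assumption-laden illustration: `checkStorageUri` and `SIGNED_URL_PARAMS` are hypothetical names, and the query-parameter list is a heuristic that only covers AWS SigV4 and Google Cloud Storage style signed URLs, not every provider's temporary link format.

```typescript
type StorageProvider =
  | "s3" | "dropbox" | "google-drive" | "github" | "external-url" | "local-dev" | "unknown";

interface StorageCheck {
  ok: boolean;
  reason?: string;
}

// Heuristic markers of signed/temporary URLs (lowercased query parameter names).
const SIGNED_URL_PARAMS: string[] = [
  "x-amz-signature",
  "x-amz-credential",
  "x-goog-signature",
];

function checkStorageUri(
  provider: StorageProvider,
  uri: string,
  isProduction: boolean,
): StorageCheck {
  // Local filesystem paths are allowed only for local development and tests.
  if (provider === "local-dev") {
    return isProduction
      ? { ok: false, reason: "local-dev storage is not allowed in production" }
      : { ok: true };
  }
  // DataOps-owned objects are referenced as s3://bucket/key.
  if (provider === "s3" && !/^s3:\/\/[^/]+\/.+/.test(uri)) {
    return { ok: false, reason: "S3 artifacts must use the s3://bucket/key form" };
  }
  // Reject URIs whose query string carries a known signature parameter.
  const query = uri.includes("?") ? uri.slice(uri.indexOf("?") + 1).toLowerCase() : "";
  const paramNames = query.split("&").map((pair) => pair.split("=")[0]);
  if (paramNames.some((name) => SIGNED_URL_PARAMS.includes(name))) {
    return { ok: false, reason: "signed or temporary URLs are not durable storageUri values" };
  }
  return { ok: true };
}
```

Returning a reason string rather than throwing lets the API layer and the operator UI show the same explanation.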
Acceptance Criteria
Work-engine has a typed artifact model and persistence path for artifact metadata with stable application IDs and explicit relationships to task, bundle, assistant job, and file records where present.
Production table/resource handling is explicit: either a SAM-managed DynamoDB artifacts table plus DATAOPS_ARTIFACTS_TABLE is added, or the implementation documents why artifacts are intentionally stored in an existing table without weakening export/migration safety.
Artifact API routes exist under /api/artifacts and work through the existing /work/api/* broker. They support create/register, list/filter by task/bundle/assistant job/status/type, get by ID, update metadata/status, attach to task/bundle, and archive. They do not expose a second public endpoint.
Artifact records can represent public links, private external-system links, S3-owned objects, assistant outputs, and local-dev-only files without changing the schema.
Artifact status/review semantics are enforced: draft and needs-review outputs are visible but do not satisfy required artifact proof; approved artifacts can satisfy proof; rejected, archived, and superseded artifacts do not satisfy proof.
Task completion blocks when proofRequirement.type = artifact and no approved artifact is attached to the task or its bundle according to the documented lookup rule.
Existing requiredLinkName, requiresFile, and proofRequirement.type = file/url/comment/external-status behavior continues to work and has regression coverage.
Existing file metadata is migration-safe: exports use storage_uri, provider/checksum/size fields when present, and remain backward compatible with legacy storagePath in local data.
Production file/artifact storage does not silently use Lambda local disk. Local filesystem storage is either guarded to local/test mode or replaced by a storage adapter that clearly separates local-dev from production storage.
Portable export writes artifacts.jsonl when artifacts are implemented, removes artifacts from omitted_entities, includes checksums/counts in manifest.json, and excludes binaries and secrets.
Export validation checks artifact required fields, duplicate IDs, parseable timestamps, enum/status values, redaction compliance, task/bundle/file relationships, and assistant-job relationships when assistant jobs are exported. Until #30 (Implement assistant job model and lifecycle) implements assistant jobs, assistant_job_id may be nullable or treated as an opaque optional field with a documented validator rule.
Docs are updated where needed: docs/v1-workflow-data-model.md, docs/v1-execution-state-schema.md, docs/v1-execution-data-safety.md, docs/v1-runtime-architecture.md, and docs/local-development.md if commands or local storage behavior change.
The V1 frontend shows task/bundle artifacts in the workflow context, distinguishes link/file/artifact proof, shows review status, and provides an operator path to register or attach an external artifact URL without uploading a binary.
Assistant outputs from assistants/podcast/ are represented as artifact metadata when attached to workflow/task context; the local assistants/podcast/documents/, inbox/, and heru_runs/ folders remain local runtime/dev storage, not durable production artifact storage.
Automated tests cover model validation, API routes, proof blocking, export/validate behavior, legacy file compatibility, and frontend artifact visibility/blocked-completion states.
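The status and proof-blocking criteria above can be sketched as a single check that also returns a human-readable reason for the UI. Everything here is illustrative: `checkArtifactProof`, `TaskLike`, and `ArtifactSummary` are hypothetical names, and the lookup rule (artifacts attached to the task, or to its bundle) is an assumed reading of "the documented lookup rule".

```typescript
type ArtifactStatus =
  | "draft" | "needs-review" | "approved" | "rejected" | "archived" | "superseded";

interface ArtifactSummary {
  artifactId: string;
  status: ArtifactStatus;
  taskId?: string;
  bundleId?: string;
}

interface TaskLike {
  taskId: string;
  bundleId?: string;
  proofRequirement?: { type: string };
}

interface ProofCheck {
  canComplete: boolean;
  blockedReason?: string;
}

// Only the artifact gate is evaluated here; requiresFile, requiredLinkName,
// and other proofRequirement types are assumed to be checked elsewhere.
function checkArtifactProof(task: TaskLike, artifacts: ArtifactSummary[]): ProofCheck {
  if (task.proofRequirement?.type !== "artifact") return { canComplete: true };

  // Assumed lookup rule: artifacts attached to the task itself or to its bundle.
  const attached = artifacts.filter(
    (a) =>
      a.taskId === task.taskId ||
      (task.bundleId !== undefined && a.bundleId === task.bundleId),
  );
  if (attached.length === 0) {
    return { canComplete: false, blockedReason: "No artifact is attached to this task or its bundle." };
  }
  // Only approved artifacts satisfy proof; every other status is visible but not accepted.
  if (attached.some((a) => a.status === "approved")) return { canComplete: true };

  const statuses = Array.from(new Set(attached.map((a) => a.status))).join(", ");
  return {
    canComplete: false,
    blockedReason: `Attached artifacts are not approved (current status: ${statuses}).`,
  };
}
```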
Test Scenarios
Scenario: Register a public link artifact as workflow proof
Given: a workflow bundle has a task that requires an approved artifact proof
When: the operator registers a public URL artifact, attaches it to the task or bundle, and marks it approved
Then: the artifact appears in the task/bundle context and the task can be completed.
Scenario: Draft assistant output is visible but not accepted proof
Given: Podcast Assistant produced a draft document artifact with status needs-review
When: the operator views the related workflow task
Then: the output is visible with review status but cannot satisfy required artifact proof.
When: the operator approves the artifact
Then: it can satisfy the proof requirement.
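The approval step in this scenario might look like the following sketch. The issue does not define which status transitions are legal, so restricting review to `draft` and `needs-review` artifacts is an assumption, and `reviewArtifact` is a hypothetical helper name.

```typescript
type ArtifactStatus =
  | "draft" | "needs-review" | "approved" | "rejected" | "archived" | "superseded";
type ReviewDecision = "approved" | "rejected";

interface ReviewableArtifact {
  artifactId: string;
  status: ArtifactStatus;
  reviewedBy?: string;
  reviewedAt?: string;
  updatedAt?: string;
}

// Returns an updated copy; the input record is not mutated.
function reviewArtifact(
  artifact: ReviewableArtifact,
  decision: ReviewDecision,
  reviewer: string,
  now: Date,
): ReviewableArtifact {
  // Assumed rule: only unreviewed outputs can be approved or rejected.
  if (artifact.status !== "draft" && artifact.status !== "needs-review") {
    throw new Error(`Cannot review artifact ${artifact.artifactId} in status ${artifact.status}`);
  }
  const timestamp = now.toISOString();
  return {
    ...artifact,
    status: decision,
    reviewedBy: reviewer,
    reviewedAt: timestamp,
    updatedAt: timestamp,
  };
}
```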
Scenario: Private or bulky artifact is metadata-only
Given: a podcast recording, transcript, invoice, or report is stored in S3, Dropbox, or Google Drive
When: an artifact record is created
Then: DynamoDB stores provider, storageUri, relationship IDs, size/checksum when available, and review metadata, but not the binary payload or signed temporary URL.
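A metadata-only guard for this scenario could be sketched as below. The forbidden key names and the string length limit are hypothetical placeholders, not values taken from the issue, and `findMetadataViolations` is an illustrative name.

```typescript
// Hypothetical limit for any single string value stored as metadata.
const MAX_STRING_LENGTH = 4096;

// Hypothetical field names that should never appear in a metadata record.
const FORBIDDEN_KEYS: string[] = [
  "signedUrl", "accessToken", "refreshToken", "cookie", "secret", "rawLog",
];

// Walk a candidate item and collect reasons it is not metadata-only.
function findMetadataViolations(item: Record<string, unknown>, path = ""): string[] {
  const violations: string[] = [];
  for (const [key, value] of Object.entries(item)) {
    const where = path ? `${path}.${key}` : key;
    if (FORBIDDEN_KEYS.includes(key)) {
      violations.push(`${where}: field is not allowed in metadata records`);
    } else if (value instanceof Uint8Array) {
      violations.push(`${where}: binary payloads must live in object storage`);
    } else if (typeof value === "string" && value.length > MAX_STRING_LENGTH) {
      violations.push(`${where}: text is too large to store as metadata`);
    } else if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects such as the optional metadata map.
      violations.push(...findMetadataViolations(value as Record<string, unknown>, where));
    }
  }
  return violations;
}
```

An empty result means the item passed these particular checks; it does not prove the item is free of secrets.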
Scenario: Required file and required artifact stay distinct
Given: one task has requiresFile = true and another has proofRequirement.type = artifact
When: the operator attaches a file metadata record only
Then: the file-gated task can complete but the artifact-gated task remains blocked until an approved artifact record exists.
Scenario: Export preserves artifact relationships
Given: exported data includes users, tasks, bundles, files, artifacts, and notifications
When: export:data and validate:export run
Then: artifacts.jsonl is present, manifest counts/checksums match, every artifact relationship points to an exported entity or follows the documented #30 compatibility rule, and no binary content or secret appears in the export.
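The relationship and field checks in this scenario could be sketched as a pure function over parsed `artifacts.jsonl` records. The function name, the `ExportContext` shape, and the exact required-field list are assumptions; `assistant_job_id` is treated as an opaque optional string, following the #30 compatibility note.

```typescript
interface ExportContext {
  taskIds: Set<string>;
  bundleIds: Set<string>;
  fileIds: Set<string>;
}

const REQUIRED_FIELDS: string[] = [
  "artifact_id", "type", "title", "status", "storage_provider", "storage_uri",
];
const STATUSES: string[] = [
  "draft", "needs-review", "approved", "rejected", "archived", "superseded",
];
const TIMESTAMP_FIELDS: string[] = ["created_at", "updated_at", "reviewed_at"];

function validateArtifactRecords(
  records: Array<Record<string, unknown>>,
  ctx: ExportContext,
): string[] {
  const errors: string[] = [];
  const seenIds = new Set<string>();
  const relations: Array<[string, Set<string>]> = [
    ["task_id", ctx.taskIds],
    ["bundle_id", ctx.bundleIds],
    ["file_id", ctx.fileIds],
  ];

  records.forEach((rec, index) => {
    const label = `artifacts.jsonl line ${index + 1}`;
    for (const field of REQUIRED_FIELDS) {
      if (typeof rec[field] !== "string" || rec[field] === "") {
        errors.push(`${label}: missing required field ${field}`);
      }
    }
    const id = rec["artifact_id"];
    if (typeof id === "string") {
      if (seenIds.has(id)) errors.push(`${label}: duplicate artifact_id ${id}`);
      seenIds.add(id);
    }
    const status = rec["status"];
    if (typeof status === "string" && !STATUSES.includes(status)) {
      errors.push(`${label}: unknown status ${status}`);
    }
    for (const field of TIMESTAMP_FIELDS) {
      const value = rec[field];
      if (value === undefined || value === null) continue;
      if (typeof value !== "string" || Number.isNaN(Date.parse(value))) {
        errors.push(`${label}: unparseable timestamp in ${field}`);
      }
    }
    // Every relationship that is present must point at an exported entity.
    for (const [field, ids] of relations) {
      const value = rec[field];
      if (value === undefined || value === null) continue;
      if (typeof value !== "string" || !ids.has(value)) {
        errors.push(`${label}: ${field} does not reference an exported entity`);
      }
    }
    // Assumed #30 compatibility rule: opaque, optional, but must be a string if set.
    const jobId = rec["assistant_job_id"];
    if (jobId !== undefined && jobId !== null && typeof jobId !== "string") {
      errors.push(`${label}: assistant_job_id must be a string or null`);
    }
  });
  return errors;
}
```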
Scenario: Legacy local file metadata remains portable
Given: existing local/test file records use storagePath
When: the portable export runs
Then: files.jsonl emits a migration-safe storage_uri value and validation still passes.
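A migration-safe mapping for legacy records might look like this sketch. The `local-dev://` URI scheme and the `toFileExport` name are hypothetical choices for illustration; the issue only requires that `storage_uri` be emitted and that legacy `storagePath` data keep working.

```typescript
interface LegacyFileRecord {
  fileId: string;
  storageUri?: string;
  storagePath?: string; // legacy field in local/test data
  storageProvider?: string;
  checksum?: string;
  sizeBytes?: number;
}

function toFileExport(file: LegacyFileRecord): Record<string, unknown> {
  let storageUri = file.storageUri;
  let provider = file.storageProvider;

  if (storageUri === undefined && file.storagePath !== undefined) {
    // Normalize separators and drop leading slashes before building the URI.
    const normalized = file.storagePath.replace(/\\/g, "/").replace(/^\/+/, "");
    storageUri = "local-dev://" + normalized; // hypothetical scheme
    provider = provider ?? "local-dev";
  }
  if (storageUri === undefined) {
    throw new Error(`File ${file.fileId} has neither storageUri nor storagePath`);
  }

  const out: Record<string, unknown> = {
    file_id: file.fileId,
    storage_uri: storageUri,
    storage_provider: provider ?? "unknown",
  };
  // Checksum and size are emitted only when present.
  if (file.checksum !== undefined) out["checksum"] = file.checksum;
  if (file.sizeBytes !== undefined) out["size_bytes"] = file.sizeBytes;
  return out;
}
```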
Scenario: Production local filesystem storage is guarded
Given: the work-engine runs in a production-like environment
When: an upload or artifact operation would write to Lambda local filesystem as durable storage
Then: the operation is rejected with a clear error or routed through the configured production storage adapter.
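The guard in this scenario can be sketched as an adapter selection step that fails loudly instead of falling back to local disk. `RuntimeMode`, `StorageAdapterConfig`, and `resolveStorageAdapter` are hypothetical names, and how the work-engine detects a production-like environment is left as an assumption.

```typescript
type RuntimeMode = "local" | "test" | "production";

interface StorageAdapterConfig {
  provider: "s3" | "local-dev";
  // e.g. bucket name for s3, directory for local-dev (hypothetical field)
  location: string;
}

function resolveStorageAdapter(
  mode: RuntimeMode,
  configured?: StorageAdapterConfig,
): StorageAdapterConfig {
  if (mode === "production") {
    // Never fall back to Lambda local disk as durable storage.
    if (configured === undefined || configured.provider === "local-dev") {
      throw new Error(
        "Binary storage is disabled: no production storage adapter is configured",
      );
    }
    return configured;
  }
  // Local development and tests may default to a local directory.
  return configured ?? { provider: "local-dev", location: ".tmp/artifacts" };
}
```

Throwing at resolution time surfaces a misconfiguration on the first upload attempt rather than after files have been written somewhere ephemeral.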
docs/v1-runtime-architecture.md, docs/v1-workflow-data-model.md, docs/v1-execution-state-schema.md, and docs/v1-execution-data-safety.md are the baseline architecture/data-safety contracts.
Verification Commands
Run the relevant full workflow for the changed surface. Expected minimum for this issue:
npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
npm --prefix work-engine run test:e2e
uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app
cd lambda-functions
sam validate --template-file template.full.yaml
If docs/content metadata or search-facing docs are changed, also run:
cd lambda-functions
uv run --extra search python -m lambda_functions.build_search_index \
--docs-dir ../content \
--output ../.tmp/dataops-content-search.index
If assistants/podcast/** behavior is changed, also run:
Implement V1 artifact model and storage policy
Status: pending
Tags: enhancement, portal, work-engine, assistant, frontend, backend, infra, data, testing, P0
Depends on: None
Blocks: #9, #30, future artifact search/export/migration work
Out of Scope
Dependencies
No issue must close before this starts, but implementation must coordinate with:
output_artifact_ids and reuse this artifact model when that issue is groomed/implemented.