Scope
Implement the DataOps V1 raw intake inbox as the shared entry point for operational inputs before they become tasks, workflow bundle context, assistant jobs, files, or artifacts.
DataTalksClub work arrives through Telegram, email, manual notes, links, files, forwarded messages, and source-system imports. Today the work-engine Telegram/email webhooks create tasks directly, while the Podcast Assistant stages Telegram material in local assistants/podcast/inbox/. V1 needs one durable inbox model that preserves the raw operator context, supports triage, and cleanly hands selected inputs to workflow tasks, bundles, assistant jobs, file records, and artifact records.
Implement in the DataOps repo only. Do not modify source repos such as ../podcast-assistant, ../dtc-operations, or ../datatasks.
Product Behavior
The operator should be able to open an Inbox surface in the portal and see untriaged operational inputs from supported sources. Each intake item should show source, sender/from metadata when available, received time, short summary/title, attached links/files metadata, related task or bundle if already linked, assistant readiness, safety/data classification, and current triage status.
The V1 inbox must support these intake sources:
telegram: Telegram messages, notes, voice/file/image metadata, and links received through the existing work-engine webhook or a later assistant bridge.
email: forwarded or webhook-created emails with from/subject/body excerpt/links/attachment metadata.
manual: operator-created notes, pasted links, and checklist-style raw requests entered from the portal.
file: uploaded or externally referenced file metadata that arrives before a task/bundle is known.
link: standalone URLs that should be triaged into a task, bundle link, assistant input, or artifact.
import: source-system migration/import records from Trello, spreadsheets, or future scripted imports.
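A minimal sketch of how webhook and import adapters might map an upstream label onto the V1 source vocabulary, written in TypeScript to match work-engine. The union mirrors the source field in the Data Model section; the alias table and the normalizeSource helper are hypothetical, not existing work-engine code.

```typescript
// Source vocabulary from the Data Model section.
type IntakeSource =
  | "telegram" | "email" | "manual" | "file"
  | "link" | "import" | "assistant" | "unknown";

const KNOWN_SOURCES: readonly string[] = [
  "telegram", "email", "manual", "file", "link", "import", "assistant",
];

// Hypothetical aliases an adapter might receive from upstream systems.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const SOURCE_ALIASES = new Map<string, IntakeSource>([
  ["tg", "telegram"],
  ["mail", "email"],
  ["trello", "import"],
  ["spreadsheet", "import"],
  ["url", "link"],
]);

function normalizeSource(raw: string | undefined): IntakeSource {
  const key = (raw ?? "").trim().toLowerCase();
  if (KNOWN_SOURCES.includes(key)) {
    return key as IntakeSource;
  }
  // In this sketch unrecognized labels become "unknown" instead of being
  // rejected, so the raw input still reaches triage.
  return SOURCE_ALIASES.get(key) ?? "unknown";
}
```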
The V1 triage flow should let the operator:
mark an item as new, triaged, attached, converted, ignored, duplicate, blocked, or archived.
assign owner/assignee and priority/tags for triage.
attach one or more intake items to an existing task and/or bundle.
convert an intake item into a new ad-hoc task without losing the intake relationship.
record duplicate relationships and resolution notes.
record triage actions and conversions as append-only audit events when audit support exists, or in a migration-safe event history field if audit events are not yet implemented.
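The triage actions above can be modeled as status changes that append to an event list and never rewrite earlier entries. This sketch covers the fallback case where no audit service exists yet; the eventHistory field, the TriageEvent shape, and the changeStatus function are hypothetical names.

```typescript
type IntakeStatus =
  | "new" | "triaged" | "attached" | "converted"
  | "ignored" | "duplicate" | "blocked" | "archived";

interface TriageEvent {
  at: string;    // ISO-8601 timestamp
  actor: string; // operator user ID
  fromStatus: IntakeStatus;
  toStatus: IntakeStatus;
  note?: string;
}

interface TriageState {
  id: string;
  status: IntakeStatus;
  triagedAt?: string;
  triagedBy?: string;
  archivedAt?: string;
  eventHistory: readonly TriageEvent[]; // hypothetical fallback field
}

// Returns a new state; the input item is never mutated.
function changeStatus(
  item: TriageState,
  toStatus: IntakeStatus,
  actor: string,
  at: string,
  note?: string,
): TriageState {
  const event: TriageEvent = { at, actor, fromStatus: item.status, toStatus, note };
  const leavesNew = toStatus !== "new";
  return {
    ...item,
    status: toStatus,
    // The first move out of "new" records who triaged the item and when.
    triagedAt: item.triagedAt ?? (leavesNew ? at : undefined),
    triagedBy: item.triagedBy ?? (leavesNew ? actor : undefined),
    archivedAt: toStatus === "archived" ? at : item.archivedAt,
    // Append-only: earlier events are copied, never edited or removed.
    eventHistory: [...item.eventHistory, event],
  };
}
```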
Data Model
Add a typed IntakeItem model in work-engine and the portable export contract. Runtime fields may use camelCase; portable export fields must use snake_case.
Minimum fields:
id / export intake_item_id
source: telegram, email, manual, file, link, import, assistant, or unknown
sourceMessageId / source_message_id optional stable upstream ID
sourceThreadId / source_thread_id optional
sourceReceivedAt / source_received_at
createdAt, updatedAt, triagedAt, archivedAt where applicable
createdBy, triagedBy, ownerId, assigneeId where available
status: new, triaged, attached, converted, ignored, duplicate, blocked, or archived
title or short subject
summary: bounded operator-facing excerpt, no large raw body dump
bodyRef optional reference to raw body/log storage when needed; do not put unbounded raw content in DynamoDB
sourceActor: bounded object for sender/from/chat metadata with redaction rules
receivedChannels: array or source-specific metadata when a message has both text and attachments
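linkRefs: array of { url, title?, normalizedUrl?, type?, safetyStatus? }
fileRefs: array of file metadata refs compatible with the file/artifact policy from Define artifact model and storage policy #29
artifactRefs: optional artifact IDs/refs once Define artifact model and storage policy #29 exists
taskIds, bundleIds, assistantJobIds: relationship IDs
assistantReadiness: optional { assistantType, status, inputRefs, missingFields }, where status is not-applicable, candidate, ready, submitted, or blocked
duplicateOfIntakeItemId and/or relatedIntakeItemIds
tags
priority: low, normal, high, or urgent
dataClass: public, internal, private, or sensitive
metadata: small bounded JSON only, redacted and size-limited
Relationship rules:
Intake items may expose assistant inputRefs and later link to assistantJobIds, but the assistant lifecycle, retries, approvals, and output artifacts remain owned by Implement assistant job model and lifecycle #30.
Tasks and bundles may keep lightweight intakeRefs, but the inbox table/API is the durable source of intake metadata.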
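One way to express the data model in TypeScript, using a subset of the minimum fields, together with the camelCase-to-snake_case mapping the portable export requires. Exporting absent optional fields as null rather than omitting them is an assumption of this sketch, and toExportRecord is a hypothetical helper.

```typescript
type IntakeSource =
  | "telegram" | "email" | "manual" | "file" | "link" | "import" | "assistant" | "unknown";
type IntakeStatus =
  | "new" | "triaged" | "attached" | "converted" | "ignored" | "duplicate" | "blocked" | "archived";
type Priority = "low" | "normal" | "high" | "urgent";
type DataClass = "public" | "internal" | "private" | "sensitive";

interface LinkRef {
  url: string;
  title?: string;
  normalizedUrl?: string;
  type?: string;
  safetyStatus?: string;
}

// Runtime shape (camelCase), a subset of the minimum fields.
interface IntakeItem {
  id: string;
  source: IntakeSource;
  sourceMessageId?: string;
  sourceReceivedAt: string;
  createdAt: string;
  updatedAt: string;
  status: IntakeStatus;
  title: string;
  summary: string;  // bounded excerpt, never the raw body
  bodyRef?: string; // pointer to raw storage under the #29 policy
  linkRefs: LinkRef[];
  taskIds: string[];
  bundleIds: string[];
  assistantJobIds: string[];
  duplicateOfIntakeItemId?: string;
  tags: string[];
  priority: Priority;
  dataClass: DataClass;
}

// Portable export shape: snake_case keys, with id renamed to intake_item_id.
function toExportRecord(item: IntakeItem): Record<string, unknown> {
  return {
    intake_item_id: item.id,
    source: item.source,
    source_message_id: item.sourceMessageId ?? null,
    source_received_at: item.sourceReceivedAt,
    created_at: item.createdAt,
    updated_at: item.updatedAt,
    status: item.status,
    title: item.title,
    summary: item.summary,
    body_ref: item.bodyRef ?? null,
    link_refs: item.linkRefs.map((l) => ({
      url: l.url,
      title: l.title ?? null,
      normalized_url: l.normalizedUrl ?? null,
      type: l.type ?? null,
      safety_status: l.safetyStatus ?? null,
    })),
    task_ids: item.taskIds,
    bundle_ids: item.bundleIds,
    assistant_job_ids: item.assistantJobIds,
    duplicate_of_intake_item_id: item.duplicateOfIntakeItemId ?? null,
    tags: item.tags,
    priority: item.priority,
    data_class: item.dataClass,
  };
}
```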
Data Safety And Boundaries
Do not store Telegram bot tokens, email webhook secrets, OAuth tokens, cookies, signed temporary URLs, raw credentials, or full unbounded message/email bodies in DynamoDB.
Store only bounded summaries/excerpts in the intake item. Store large raw payloads, file binaries, audio, voice notes, images, raw assistant logs, and generated documents through the storage/artifact boundaries from Define artifact model and storage policy #29.
Production tables/resources must be SAM/CloudFormation-owned. Production code must not create unmanaged DynamoDB tables on cold start.
Add a production DATAOPS_INTAKE_TABLE or document why intake safely uses an existing table without weakening export/migration safety.
Local/test mode may auto-create local tables through existing work-engine local setup.
Existing assistants/podcast/inbox/, documents/, and heru_runs/ remain local runtime/dev storage, not durable V1 inbox or artifact storage.
Existing source repos are read-only for this issue.
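A sketch of the bounding and redaction step that could run before anything is written to DynamoDB. The 500-character limit and the secret patterns are illustrative assumptions, not values from this issue; the first rule targets the numeric-ID, colon, secret shape of Telegram bot tokens, and the last strips common signed-URL query parameters. Redaction runs before truncation so that a token cut in half by the length limit cannot slip past the patterns.

```typescript
// Hypothetical limit; the issue requires "bounded" but does not fix a size.
const MAX_SUMMARY_CHARS = 500;

// Illustrative patterns only; production detection needs a reviewed list.
const SECRET_RULES: Array<[RegExp, string]> = [
  // Telegram bot token shape: numeric bot ID, colon, long secret.
  [/\b\d{6,12}:[A-Za-z0-9_-]{30,}/g, "[REDACTED_BOT_TOKEN]"],
  // Bearer tokens copied from headers or logs.
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "Bearer [REDACTED]"],
  // Signature-bearing query parameters in signed URLs; keeps the key, drops the value.
  [/([?&](?:X-Amz-Signature|X-Amz-Credential|X-Amz-Security-Token|Signature|sig|token)=)[^&\s]+/gi, "$1[REDACTED]"],
];

function redactSecrets(text: string): string {
  return SECRET_RULES.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}

function boundSummary(raw: string, maxChars = MAX_SUMMARY_CHARS): string {
  // Redact first, then collapse whitespace, then truncate.
  const clean = redactSecrets(raw).replace(/\s+/g, " ").trim();
  if (clean.length <= maxChars) return clean;
  return clean.slice(0, maxChars - 1).trimEnd() + "…";
}
```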
API
Add authenticated work-engine routes under /api/intake and ensure they work through the existing /work/api/* portal broker. Do not expose a second public endpoint.
Required capabilities:
Create manual intake item with note text, links, tags, data class, and optional task/bundle context.
Create or normalize intake items from Telegram/email webhook payloads without directly creating tasks by default.
List/filter intake items by status, source, owner/assignee, priority, tag, related task, related bundle, assistant readiness, date range, and duplicate state.
Fetch one intake item with bounded detail and relationship refs.
Update triage metadata, owner/assignee, tags, priority, data class, and summary.
Attach/detach intake items to existing tasks and bundles.
Convert an intake item into a task and preserve intakeRefs/relationship history.
Return consistent 400/401/404 responses matching existing work-engine API style.
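The list/filter capability might look like the following pure function, shown for a subset of the required filters (assistant readiness is omitted for brevity). Query parameter names such as owner, tag, from, and duplicate are hypothetical. Authentication, and therefore the 401 path, is assumed to be handled before this function runs, so only the 400 validation path appears here.

```typescript
const STATUSES = new Set([
  "new", "triaged", "attached", "converted", "ignored", "duplicate", "blocked", "archived",
]);

interface IntakeListItem {
  id: string;
  status: string;
  source: string;
  ownerId?: string;
  assigneeId?: string;
  priority: string;
  tags: string[];
  taskIds: string[];
  bundleIds: string[];
  sourceReceivedAt: string; // ISO-8601
  duplicateOfIntakeItemId?: string;
}

type Query = Record<string, string | undefined>;
type ListResponse =
  | { statusCode: 200; body: { items: IntakeListItem[] } }
  | { statusCode: 400; body: { error: string } };

function listIntake(items: IntakeListItem[], q: Query): ListResponse {
  if (q.status !== undefined && !STATUSES.has(q.status)) {
    return { statusCode: 400, body: { error: `invalid status: ${q.status}` } };
  }
  const from = q.from !== undefined ? Date.parse(q.from) : undefined;
  const to = q.to !== undefined ? Date.parse(q.to) : undefined;
  if (Number.isNaN(from) || Number.isNaN(to)) {
    return { statusCode: 400, body: { error: "from/to must be ISO-8601 dates" } };
  }
  // Every provided filter must match; absent filters are ignored.
  const filtered = items.filter((it) => {
    const received = Date.parse(it.sourceReceivedAt);
    return (
      (q.status === undefined || it.status === q.status) &&
      (q.source === undefined || it.source === q.source) &&
      (q.owner === undefined || it.ownerId === q.owner) &&
      (q.assignee === undefined || it.assigneeId === q.assignee) &&
      (q.priority === undefined || it.priority === q.priority) &&
      (q.tag === undefined || it.tags.includes(q.tag)) &&
      (q.task === undefined || it.taskIds.includes(q.task)) &&
      (q.bundle === undefined || it.bundleIds.includes(q.bundle)) &&
      (from === undefined || received >= from) &&
      (to === undefined || received <= to) &&
      (q.duplicate === undefined ||
        (it.duplicateOfIntakeItemId !== undefined) === (q.duplicate === "true"))
    );
  });
  return { statusCode: 200, body: { items: filtered } };
}
```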
The existing /api/telegram and /api/email behavior should be changed or wrapped so incoming messages create inbox items first. Creating a task directly from Telegram/email may remain as an explicit compatibility mode only if documented and covered by tests.
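A sketch of the inbox-first webhook path for Telegram. The TelegramUpdate and TelegramMessage interfaces cover only a few fields of the Telegram Bot API Update and Message objects. The directTaskCompat parameter stands in for the documented compatibility flag, whose real name and source (for example an environment variable) this issue does not specify. Because Telegram message IDs are unique only within a chat, the sketch combines chat ID and message ID into sourceMessageId; that key format is an assumption.

```typescript
// Subset of Telegram Bot API Message fields used here.
interface TelegramMessage {
  message_id: number;
  date: number; // Unix seconds
  chat: { id: number; type: string };
  from?: { id: number; username?: string };
  text?: string;
  caption?: string;
}

interface TelegramUpdate {
  update_id: number;
  message?: TelegramMessage;
}

interface IntakeDraft {
  source: "telegram";
  sourceMessageId: string;
  sourceThreadId: string;
  sourceReceivedAt: string;
  status: "new";
  title: string;
  summary: string;
  sourceActor: { userId?: string; username?: string }; // bounded sender info only
}

interface WebhookOutcome {
  intake: IntakeDraft | null;
  createTaskDirectly: boolean;
}

function handleTelegramUpdate(update: TelegramUpdate, directTaskCompat: boolean): WebhookOutcome {
  const msg = update.message;
  if (!msg) return { intake: null, createTaskDirectly: false };
  const text = (msg.text ?? msg.caption ?? "").trim();
  const intake: IntakeDraft = {
    source: "telegram",
    sourceMessageId: `${msg.chat.id}:${msg.message_id}`,
    sourceThreadId: String(msg.chat.id),
    sourceReceivedAt: new Date(msg.date * 1000).toISOString(),
    status: "new",
    title: (text.split("\n")[0] ?? "").slice(0, 120) || "(no text)",
    // A real implementation would reuse the shared redaction/bounding helper.
    summary: text.slice(0, 500),
    sourceActor: { userId: msg.from ? String(msg.from.id) : undefined, username: msg.from?.username },
  };
  // Inbox first; a task is created only in the documented compatibility mode.
  return { intake, createTaskDirectly: directTaskCompat };
}
```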
Portal UI
Add an operator Inbox placement in the V1 workspace, visible alongside the workflow-first surfaces. It should support:
A dashboard/inbox queue showing new, blocked, duplicate, and assistant-ready items.
A compact count or section on the operator dashboard for untriaged inbox work and blocked intake.
A detail view/panel for one intake item with source metadata, safe excerpt, links/files, relationship refs, and triage actions.
Manual intake creation from the portal.
Attach-to-task, attach-to-bundle, convert-to-task, mark-duplicate, ignore, archive, and blocked-state actions.
Clear distinction between raw intake, task proof, file metadata, artifact metadata, and assistant outputs so operators do not treat unreviewed intake as completed work.
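The dashboard counts could be derived from the same intake records the queue uses. This sketch treats status new as untriaged and counts assistant-ready items by assistantReadiness.status; both interpretations and the inboxCounts helper are assumptions.

```typescript
interface QueueItem {
  id: string;
  status: string;
  assistantReadiness?: { status: string };
}

interface InboxCounts {
  untriaged: number;
  blocked: number;
  duplicate: number;
  assistantReady: number;
}

function inboxCounts(items: QueueItem[]): InboxCounts {
  const counts: InboxCounts = { untriaged: 0, blocked: 0, duplicate: 0, assistantReady: 0 };
  for (const it of items) {
    if (it.status === "new") counts.untriaged++;
    if (it.status === "blocked") counts.blocked++;
    if (it.status === "duplicate") counts.duplicate++;
    // Readiness is independent of triage status, so it is counted separately.
    if (it.assistantReadiness?.status === "ready") counts.assistantReady++;
  }
  return counts;
}
```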
Exports And Migration Safety
Portable export must include intake data once implemented:
Add intake_items.jsonl to normal exports.
Update manifest.json entity files, counts, checksums, redactions, and omitted_entities accurately.
Export relationship IDs for tasks, bundles, files/artifacts, assistant jobs, duplicate links, and users where available.
Dry-run import must report intake inserts/updates and invalid relationships without writing production data.
Normal exports must not include secrets, session tokens, raw credentials, signed URLs, or unbounded raw message/email/file content.
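A sketch of the export side: serialize snake_case records as JSON Lines, compute the record count and SHA-256 checksum a manifest entry needs, and scan for denylisted keys so validation can fail when a forbidden field reaches an export. The ManifestEntry shape and the FORBIDDEN_KEYS list are hypothetical; the real manifest.json format is whatever the existing export tooling already writes. The checksum uses Node's built-in node:crypto module.

```typescript
import { createHash } from "node:crypto";

interface ManifestEntry {
  file: string;
  count: number;
  sha256: string;
}

// One JSON object per line, newline-terminated when non-empty.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + (records.length > 0 ? "\n" : "");
}

function manifestEntry(file: string, records: object[]): { content: string; entry: ManifestEntry } {
  const content = toJsonl(records);
  const sha256 = createHash("sha256").update(content, "utf8").digest("hex");
  return { content, entry: { file, count: records.length, sha256 } };
}

// Hypothetical denylist of keys that must never appear in a normal export.
const FORBIDDEN_KEYS = ["bot_token", "webhook_secret", "session_token", "signed_url", "raw_body", "password"];

// Returns dotted paths of forbidden keys found anywhere in the value.
function findForbiddenKeys(value: unknown, path = ""): string[] {
  if (Array.isArray(value)) {
    return value.flatMap((v, i) => findForbiddenKeys(v, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([k, v]) => {
      const here = path ? `${path}.${k}` : k;
      return FORBIDDEN_KEYS.includes(k) ? [here] : findForbiddenKeys(v, here);
    });
  }
  return [];
}
```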
Acceptance Criteria
work-engine defines typed intake source/status/priority/data-class contracts and a durable IntakeItem persistence path with stable application IDs.
Production DynamoDB/SAM ownership is explicit for intake storage, including table env vars and least-privilege IAM if a new table is added.
Authenticated /api/intake routes support create, list/filter, detail, update, attach/detach, convert-to-task, duplicate/ignore/block/archive, link/file/artifact reference registration, and assistant input preparation.
Existing Telegram and email webhook paths create inbox items first, or retain direct task creation only behind a documented compatibility flag with tests.
Manual portal intake can create note/link intake items without external credentials.
Operators can see untriaged and blocked inbox items from the dashboard/workspace and can triage an item through the expected actions.
Intake items can attach to existing tasks and bundles without copying raw unbounded bodies into task comments or bundle descriptions.
Converting intake to an ad-hoc task preserves the intake relationship and marks the intake item converted or attached according to the implemented contract.
Assistant-ready input refs can be built from selected intake items; if Implement assistant job model and lifecycle #30 is implemented in the branch, a draft/queued assistant job can be created from those refs.
File/link/artifact relationships follow Define artifact model and storage policy #29: inbox stores metadata/refs only, never binaries or temporary signed URLs as durable content.
Data safety checks reject or redact detectable secrets in summaries, metadata, and exported records where practical.
Portable export writes and validates intake_items.jsonl; manifest counts/checksums and omitted entities are accurate; dry-run import covers intake records.
Existing task, bundle, file, notification, recurring, auth, export, Telegram, and email tests still pass.
UI tests and screenshots cover the Inbox queue, intake detail/triage actions, manual intake creation, and dashboard placement.
[HUMAN] Real Telegram delivery, email provider delivery, or production external webhook setup is manually verified only if this issue connects live external accounts or secrets.
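Converting intake to an ad-hoc task can be a single function that returns both the new task and the updated intake item, so the relationship is written on both sides. The TaskDraft shape, the caller-supplied taskId, and convertToTask are hypothetical; only the bounded summary is copied into the task description, never a raw body.

```typescript
interface IntakeRecord {
  id: string;
  status: string;
  title: string;
  summary: string; // already bounded and redacted
  taskIds: string[];
}

interface NewTaskInput {
  dueDate?: string;
  assigneeId?: string;
  tags: string[];
}

interface TaskDraft {
  id: string;
  title: string;
  description: string;
  source: "intake";
  intakeRefs: string[];
  dueDate?: string;
  assigneeId?: string;
  tags: string[];
}

function convertToTask(
  item: IntakeRecord,
  input: NewTaskInput,
  taskId: string,
): { task: TaskDraft; item: IntakeRecord } {
  if (item.status === "converted") {
    throw new Error(`intake ${item.id} is already converted`);
  }
  const task: TaskDraft = {
    id: taskId,
    title: item.title,
    description: item.summary,
    source: "intake",
    intakeRefs: [item.id], // task -> intake link
    dueDate: input.dueDate,
    assigneeId: input.assigneeId,
    tags: [...input.tags],
  };
  // intake -> task link; the original record is not mutated.
  const updated: IntakeRecord = { ...item, status: "converted", taskIds: [...item.taskIds, taskId] };
  return { task, item: updated };
}
```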
Test Scenarios
Scenario: Manual note becomes an inbox item
Given: an authenticated operator is in the portal
When: they create a manual intake note with a pasted URL and private data class
Then: the item appears in the Inbox as new, includes bounded summary/link metadata, and stores no binary or secret payload.
Scenario: Telegram webhook creates raw intake instead of direct task
Given: the Telegram webhook receives a text message with a task-like note and source message metadata
When: the webhook is processed
Then: an inbox item with source=telegram is created and no task is created unless explicit compatibility mode is enabled.
Scenario: Email webhook captures forwarded work
Given: the email webhook receives from/subject/body/link metadata
When: the webhook is processed
Then: an inbox item with bounded summary and redacted source metadata is created, and the full raw body is not stored unbounded in DynamoDB.
Scenario: Intake attaches to existing workflow context
Given: an active bundle and task exist
When: the operator attaches a selected inbox item to them
Then: the intake item records the task/bundle relationship, the task/bundle can show an intake reference, and the intake item status becomes attached or remains triaged according to the documented rule.
Scenario: Intake converts to ad-hoc task
Given: a new inbox item that represents standalone work
When: the operator converts it to a task with due date, assignee, and tags
Then: a task is created with source=intake or equivalent documented source, the intake relationship is preserved, and the intake item is marked converted.
Scenario: Duplicate handling preserves audit context
Given: two inbox items represent the same email or Telegram request
When: the operator marks one as duplicate of the other with a reason
Then: both records keep the duplicate relationship, the duplicate item no longer appears in the default untriaged queue, and export validation preserves the relationship.
Scenario: Assistant-ready handoff
Given: multiple podcast-related inbox items contain notes, links, and file/artifact refs
When: the operator marks them as ready for podcast assistant input
Then: the inbox item(s) expose #30-compatible input refs and, when #30 APIs are available, can create a draft/queued assistant job without copying raw payload bodies into the job.
Scenario: Export and dry-run import include intake
Given: local test data includes manual, Telegram, email, attached, converted, duplicate, and assistant-ready intake items
When: export, validate, and dry-run import run
Then: intake_items.jsonl is present, manifest counts/checksums match, relationships validate, and secrets/raw unbounded content are absent.
Scenario: UI inbox is not disconnected from workflow work
Given: the operator has untriaged intake, a task, and a bundle
When: they use the Inbox queue and detail UI
Then: they can triage into task/bundle context from the same workspace without opening a separate app or losing dashboard visibility.
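The duplicate scenario can be exercised with a small helper that records the relationship on both items and keeps duplicates out of the untriaged queue, plus a check an export validator could run for dangling duplicate references. relatedIntakeItemIds comes from the field list; resolutionNote, markDuplicate, untriagedQueue, and danglingDuplicateRefs are hypothetical names, and treating only status new as untriaged is an assumption.

```typescript
interface DupItem {
  id: string;
  status: string;
  duplicateOfIntakeItemId?: string;
  relatedIntakeItemIds: string[];
  resolutionNote?: string;
}

// Returns [updated duplicate, updated original]; inputs are not mutated.
function markDuplicate(dup: DupItem, original: DupItem, reason: string): [DupItem, DupItem] {
  if (dup.id === original.id) throw new Error("an item cannot duplicate itself");
  const related = original.relatedIntakeItemIds.includes(dup.id)
    ? original.relatedIntakeItemIds
    : [...original.relatedIntakeItemIds, dup.id];
  return [
    { ...dup, status: "duplicate", duplicateOfIntakeItemId: original.id, resolutionNote: reason },
    { ...original, relatedIntakeItemIds: related },
  ];
}

function untriagedQueue(items: DupItem[]): DupItem[] {
  return items.filter((i) => i.status === "new");
}

// IDs of items whose duplicate target is missing from the exported set.
function danglingDuplicateRefs(items: DupItem[]): string[] {
  const ids = new Set(items.map((i) => i.id));
  return items
    .filter((i) => i.duplicateOfIntakeItemId !== undefined && !ids.has(i.duplicateOfIntakeItemId))
    .map((i) => i.id);
}
```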
Out of Scope
Full-text search across inbox, artifacts, and assistant outputs.
Sophisticated duplicate detection beyond exact/upstream-ID/manual duplicate marking unless it falls out cheaply from normalized source IDs.
Production integration with real Telegram, email providers, OAuth, Groq, Heru, Codex, Claude, Dropbox, Google Drive, or S3 credentials beyond safe metadata boundaries.
Migrating all historical Trello cards, spreadsheet rows, emails, or podcast assistant local inbox files.
Storing raw binary attachments, raw audio/image/video payloads, or generated documents in DynamoDB or Git.
Modifying ../podcast-assistant, ../dtc-operations, ../datatasks, or any other source repo.
Replacing the current V1 frontend framework or public/private Lambda architecture.
Dependencies
Define artifact model and storage policy #29 defines the artifact/file storage policy used by intake file refs, artifact refs, storage URIs, redaction expectations, and export relationship validation.
Implement assistant job model and lifecycle #30 defines assistant job input refs, job IDs, lifecycle, and output/log relationships. This issue should hand off assistant-ready input, not duplicate assistant execution.
docs/v1-runtime-architecture.md, docs/v1-execution-state-schema.md, docs/v1-execution-data-safety.md, and docs/v1-workflow-data-model.md are the baseline architecture/data-safety contracts.
Production DynamoDB tables and Lambda permissions must remain SAM/CloudFormation-owned.
Existing assistants/podcast local storage remains a reference for source behavior, not the durable production inbox.
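Since #30 owns assistant execution, the inbox only needs to hand over references. This sketch collects deduplicated refs from items whose assistantReadiness is ready for a given assistant type; the AssistantInputRef shape and buildAssistantInputRefs are assumptions, and the real input-ref contract is whatever #30 defines.

```typescript
interface ReadyItem {
  id: string;
  linkRefs: { url: string }[];
  fileRefs: string[];
  artifactRefs: string[];
  assistantReadiness?: { assistantType: string; status: string };
}

interface AssistantInputRef {
  kind: "intake" | "link" | "file" | "artifact";
  ref: string; // an ID or URL, never payload content
}

function buildAssistantInputRefs(items: ReadyItem[], assistantType: string): AssistantInputRef[] {
  const refs: AssistantInputRef[] = [];
  const seen = new Set<string>();
  const push = (kind: AssistantInputRef["kind"], ref: string): void => {
    const key = `${kind}:${ref}`;
    if (!seen.has(key)) {
      seen.add(key);
      refs.push({ kind, ref });
    }
  };
  for (const item of items) {
    const r = item.assistantReadiness;
    // Only items explicitly marked ready for this assistant contribute refs.
    if (!r || r.assistantType !== assistantType || r.status !== "ready") continue;
    push("intake", item.id);
    item.linkRefs.forEach((l) => push("link", l.url));
    item.fileRefs.forEach((f) => push("file", f));
    item.artifactRefs.forEach((a) => push("artifact", a));
  }
  return refs;
}
```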
Verification Commands
Run from repo root unless noted otherwise:
npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
npm --prefix work-engine run test:e2e
npm --prefix work-engine run export:data -- .tmp/exports/intake-inbox
npm --prefix work-engine run validate:export -- .tmp/exports/intake-inbox
npm --prefix work-engine run dry-run:import -- .tmp/exports/intake-inbox
If lambda-functions/template.full.yaml or deployment workflow files change, also run:
sam validate --template-file lambda-functions/template.full.yaml
If portal/docs-app routing or Python broker behavior changes, also run:
If assistants/podcast/** behavior changes, also run:
uv run --project assistants/podcast pytest
Tester must capture screenshots for the Inbox queue, intake detail/triage panel, manual intake creation, dashboard placement, and any changed task/bundle attachment views.
Implement raw intake inbox for operational inputs
Status: pending
Tags: enhancement, portal, work-engine, assistant, backend, frontend, data, infra, testing, P1
Depends on: #29, #30
Blocks: #9, future Telegram/email/manual intake integrations, future intake search/deduplication work