Implement raw intake inbox for operational inputs #31

@alexeygrigorev

Status: pending
Tags: enhancement, portal, work-engine, assistant, backend, frontend, data, infra, testing, P1
Depends on: #29, #30
Blocks: #9, future Telegram/email/manual intake integrations, future intake search/deduplication work

Scope

Implement the DataOps V1 raw intake inbox as the shared entry point for operational inputs before they become tasks, workflow bundle context, assistant jobs, files, or artifacts.

DataTalksClub work arrives through Telegram, email, manual notes, links, files, forwarded messages, and source-system imports. Today the work-engine Telegram/email webhooks create tasks directly, while the Podcast Assistant stages Telegram material in local assistants/podcast/inbox/. V1 needs one durable inbox model that preserves the raw operator context, supports triage, and cleanly hands selected inputs to workflow tasks, bundles, assistant jobs, file records, and artifact records.

Implement in the DataOps repo only. Do not modify source repos such as ../podcast-assistant, ../dtc-operations, or ../datatasks.

Product Behavior

The operator should be able to open an Inbox surface in the portal and see untriaged operational inputs from supported sources. Each intake item should show source, sender/from metadata when available, received time, short summary/title, attached links/files metadata, related task or bundle if already linked, assistant readiness, safety/data classification, and current triage status.

The V1 inbox must support these intake sources:

  • telegram: Telegram messages, notes, voice/file/image metadata, and links received through the existing work-engine webhook or later assistant bridge.
  • email: forwarded or webhook-created emails with from/subject/body excerpt/links/attachment metadata.
  • manual: operator-created notes, pasted links, and checklist-style raw requests entered from the portal.
  • file: uploaded or externally referenced file metadata that arrives before a task/bundle is known.
  • link: standalone URLs that should be triaged into a task, bundle link, assistant input, or artifact.
  • import: source-system migration/import records from Trello, spreadsheet, or future scripted imports.

The V1 triage flow should let the operator:

  • mark an item as new, triaged, attached, converted, ignored, duplicate, blocked, or archived.
  • assign owner/assignee and priority/tags for triage.
  • attach one or more intake items to an existing task and/or bundle.
  • convert an intake item into a new ad-hoc task without losing the intake relationship.
  • attach link/file metadata to the task or bundle using the storage/artifact policy from #29.
  • prepare assistant-ready input references for #30 assistant jobs without running the assistant directly from this issue unless the #30 APIs already exist.
  • record duplicate relationships and resolution notes.
  • keep an append-only audit event for triage actions and conversions when audit support exists, or a migration-safe event history field if audit events are not yet implemented.
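
A minimal sketch of the "migration-safe event history field" fallback, assuming audit events are not yet implemented. The names TriageRecord, TriageEvent, applyTriage, and the REASON_REQUIRED set are hypothetical; which statuses need a reason is an assumption drawn from the API section, not a settled rule.

```typescript
type IntakeStatus =
  | "new" | "triaged" | "attached" | "converted"
  | "ignored" | "duplicate" | "blocked" | "archived";

interface TriageEvent {
  at: string; // ISO 8601 timestamp
  actorId: string;
  fromStatus: IntakeStatus;
  toStatus: IntakeStatus;
  reason?: string;
}

interface TriageRecord {
  id: string;
  status: IntakeStatus;
  events: readonly TriageEvent[];
}

// Assumption: these statuses need an operator-supplied reason.
const REASON_REQUIRED: ReadonlySet<IntakeStatus> = new Set<IntakeStatus>([
  "ignored", "duplicate", "blocked", "archived",
]);

// Returns a new record; the input record and its event list are not mutated.
function applyTriage(
  record: TriageRecord,
  toStatus: IntakeStatus,
  actorId: string,
  reason?: string,
  now: Date = new Date(),
): TriageRecord {
  if (REASON_REQUIRED.has(toStatus) && !reason?.trim()) {
    throw new Error(`status "${toStatus}" requires a reason`);
  }
  const event: TriageEvent = {
    at: now.toISOString(),
    actorId,
    fromStatus: record.status,
    toStatus,
    ...(reason ? { reason } : {}),
  };
  // Append-only: earlier events are copied forward, never edited or removed.
  return { ...record, status: toStatus, events: [...record.events, event] };
}
```

Returning a new record keeps the history easy to test and leaves room to swap the embedded list for real audit events later.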

Data Model

Add a typed IntakeItem model in work-engine and the portable export contract. Runtime fields may use camelCase; portable export fields must use snake_case.

Minimum fields:

  • id / export intake_item_id
  • source: telegram, email, manual, file, link, import, assistant, or unknown
  • sourceMessageId / source_message_id optional stable upstream ID
  • sourceThreadId / source_thread_id optional
  • sourceReceivedAt / source_received_at
  • createdAt, updatedAt, triagedAt, archivedAt where applicable
  • createdBy, triagedBy, ownerId, assigneeId where available
  • status: new, triaged, attached, converted, ignored, duplicate, blocked, or archived
  • title or short subject
  • summary: bounded operator-facing excerpt, no large raw body dump
  • bodyRef optional reference to raw body/log storage when needed; do not put unbounded raw content in DynamoDB
  • sourceActor: bounded object for sender/from/chat metadata with redaction rules
  • receivedChannels: array or source-specific metadata when a message has both text and attachments
  • linkRefs: array of { url, title?, normalizedUrl?, type?, safetyStatus? }
  • fileRefs: array of file metadata refs compatible with the file/artifact policy from #29
  • artifactRefs: optional artifact IDs/refs once the #29 artifact model exists
  • taskIds, bundleIds, assistantJobIds: relationship IDs
  • assistantReadiness: optional { assistantType, status, inputRefs, missingFields }, where status is not-applicable, candidate, ready, submitted, or blocked
  • duplicateOfIntakeItemId and/or relatedIntakeItemIds
  • tags
  • priority: low, normal, high, or urgent
  • dataClass: public, internal, private, or sensitive
  • metadata: small bounded JSON only, redacted and size-limited
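
A hedged sketch of how the runtime contract and the snake_case export mapping could look. It covers only a subset of the fields above, and the helper names toSnakeKey, toSnakeCase, and toExportRecord are hypothetical. The id to intake_item_id rename follows the first bullet; every other export name is assumed to be the mechanical snake_case form of the runtime name.

```typescript
type IntakeSource =
  | "telegram" | "email" | "manual" | "file" | "link" | "import" | "assistant" | "unknown";
type IntakeStatus =
  | "new" | "triaged" | "attached" | "converted"
  | "ignored" | "duplicate" | "blocked" | "archived";
type IntakePriority = "low" | "normal" | "high" | "urgent";
type DataClass = "public" | "internal" | "private" | "sensitive";

interface LinkRef {
  url: string;
  title?: string;
  normalizedUrl?: string;
  type?: string;
  safetyStatus?: string;
}

// Subset of the minimum fields, enough to show the shape.
interface IntakeItem {
  id: string;
  source: IntakeSource;
  sourceMessageId?: string;
  sourceThreadId?: string;
  sourceReceivedAt: string; // ISO 8601
  createdAt: string;
  updatedAt: string;
  status: IntakeStatus;
  title: string;
  summary: string;
  linkRefs: LinkRef[];
  taskIds: string[];
  bundleIds: string[];
  assistantJobIds: string[];
  duplicateOfIntakeItemId?: string;
  tags: string[];
  priority: IntakePriority;
  dataClass: DataClass;
}

// "sourceReceivedAt" -> "source_received_at"
function toSnakeKey(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Recursively renames object keys; arrays and primitives pass through.
function toSnakeCase(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(toSnakeCase);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (v !== undefined) out[toSnakeKey(k)] = toSnakeCase(v);
    }
    return out;
  }
  return value;
}

// The runtime "id" is exported as "intake_item_id".
function toExportRecord(item: IntakeItem): Record<string, unknown> {
  const { id, ...rest } = item;
  return { intake_item_id: id, ...(toSnakeCase(rest) as Record<string, unknown>) };
}
```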

Relationship rules:

Data Safety And Boundaries

  • Do not store Telegram bot tokens, email webhook secrets, OAuth tokens, cookies, signed temporary URLs, raw credentials, or full unbounded message/email bodies in DynamoDB.
  • Store only bounded summaries/excerpts in the intake item. Store large raw payloads, file binaries, audio, voice notes, images, raw assistant logs, and generated documents through the storage/artifact boundaries from #29.
  • Production tables/resources must be SAM/CloudFormation-owned. Production code must not create unmanaged DynamoDB tables on cold start.
  • Add a production DATAOPS_INTAKE_TABLE or document why intake safely uses an existing table without weakening export/migration safety.
  • Local/test mode may auto-create local tables through existing work-engine local setup.
  • Existing assistants/podcast/inbox/, documents/, and heru_runs/ remain local runtime/dev storage, not durable V1 inbox or artifact storage.
  • Existing source repos are read-only for this issue.
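
One way the bounded-summary and redaction rules above could be enforced before anything is written to DynamoDB. This is a sketch under assumptions: the 500-character limit, the helper names, and the pattern list are all hypothetical, and the patterns only catch a few recognizable secret shapes (a Telegram-style bot token, a Bearer header value, common signed-URL query parameters). They are not a complete detector.

```typescript
const MAX_SUMMARY_CHARS = 500; // hypothetical limit

// [pattern, replacement] pairs. Approximate shapes only.
const REDACTIONS: Array<[RegExp, string]> = [
  // Telegram-style bot token: numeric bot id, colon, long opaque suffix.
  [/\b\d{8,10}:[A-Za-z0-9_-]{30,}/g, "[REDACTED]"],
  // "Bearer <token>" as it would appear in a pasted Authorization header.
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "[REDACTED]"],
  // Signature-like query parameters on signed temporary URLs.
  [
    /([?&](?:X-Amz-Signature|X-Amz-Credential|X-Amz-Security-Token|token|signature)=)[^&\s]+/gi,
    "$1[REDACTED]",
  ],
];

function redactSecrets(text: string): string {
  return REDACTIONS.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text,
  );
}

// Redact first, then truncate, so a cut-off secret fragment cannot slip through.
function boundedSummary(raw: string, maxChars: number = MAX_SUMMARY_CHARS): string {
  const clean = redactSecrets(raw).replace(/\s+/g, " ").trim();
  return clean.length <= maxChars ? clean : clean.slice(0, maxChars - 1) + "…";
}
```

The full raw body, when it must be kept, would go behind bodyRef rather than into the item itself.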

API

Add authenticated work-engine routes under /api/intake and ensure they work through the existing /work/api/* portal broker. Do not expose a second public endpoint.

Required capabilities:

  • Create manual intake item with note text, links, tags, data class, and optional task/bundle context.
  • Create or normalize intake items from Telegram/email webhook payloads without directly creating tasks by default.
  • List/filter intake items by status, source, owner/assignee, priority, tag, related task, related bundle, assistant readiness, date range, and duplicate state.
  • Fetch one intake item with bounded detail and relationship refs.
  • Update triage metadata, owner/assignee, tags, priority, data class, and summary.
  • Attach/detach intake items to existing tasks and bundles.
  • Convert an intake item into a task and preserve intakeRefs/relationship history.
  • Register link/file/artifact references against an intake item using #29-compatible metadata.
  • Mark duplicate/ignored/blocked/archived with a required reason where appropriate.
  • Build assistant input refs from selected intake items for #30; if the #30 APIs are present, support creating a draft or queued assistant job from selected ready items.
  • Return consistent 400/401/404 responses matching existing work-engine API style.
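
To make the list/filter capability concrete, here is an in-memory sketch of the filter semantics. The IntakeFilter shape and the function names are hypothetical, only a few of the listed filters are shown, and a real implementation would presumably push these conditions into the DynamoDB query rather than scan in application code.

```typescript
interface IntakeListItem {
  id: string;
  status: string;
  source: string;
  tags: string[];
  taskIds: string[];
  sourceReceivedAt: string; // ISO 8601
  duplicateOfIntakeItemId?: string;
}

// Every field is optional; an omitted field does not constrain the result.
interface IntakeFilter {
  status?: string[];
  source?: string[];
  tag?: string;
  taskId?: string;
  receivedFrom?: string; // inclusive lower bound
  receivedTo?: string; // exclusive upper bound
  isDuplicate?: boolean;
}

function matchesFilter(item: IntakeListItem, f: IntakeFilter): boolean {
  if (f.status && !f.status.includes(item.status)) return false;
  if (f.source && !f.source.includes(item.source)) return false;
  if (f.tag !== undefined && !item.tags.includes(f.tag)) return false;
  if (f.taskId !== undefined && !item.taskIds.includes(f.taskId)) return false;
  const received = Date.parse(item.sourceReceivedAt);
  if (f.receivedFrom && received < Date.parse(f.receivedFrom)) return false;
  if (f.receivedTo && received >= Date.parse(f.receivedTo)) return false;
  const isDuplicate = item.duplicateOfIntakeItemId !== undefined;
  if (f.isDuplicate !== undefined && isDuplicate !== f.isDuplicate) return false;
  return true;
}

// Newest first, which is an assumed default ordering for the queue.
function listIntake(items: IntakeListItem[], f: IntakeFilter): IntakeListItem[] {
  return items
    .filter((item) => matchesFilter(item, f))
    .sort((a, b) => Date.parse(b.sourceReceivedAt) - Date.parse(a.sourceReceivedAt));
}
```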

The existing /api/telegram and /api/email behavior should be changed or wrapped so incoming messages create inbox items first. Creating a task directly from Telegram/email may remain as an explicit compatibility mode only if documented and covered by tests.
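
The inbox-first webhook behavior could be sketched as a pure normalization step plus an explicit compatibility flag. The input types mirror a small subset of the Telegram Bot API Update and Message objects. Everything else is an assumption made for illustration: IntakeDraft, the directTaskCompat option, the size limits, and the choice of chat ID plus message ID as the stable upstream ID.

```typescript
// Subset of the Telegram Bot API Update/Message fields used here.
interface TelegramMessage {
  message_id: number;
  date: number; // Unix time in seconds
  text?: string;
  caption?: string;
  chat: { id: number };
  from?: { id: number; username?: string };
}
interface TelegramUpdate {
  update_id: number;
  message?: TelegramMessage;
}

interface IntakeDraft {
  source: "telegram";
  sourceMessageId: string;
  sourceThreadId: string;
  sourceReceivedAt: string;
  status: "new";
  title: string;
  summary: string;
  linkRefs: { url: string }[];
  sourceActor: { id?: string; username?: string };
}

const MAX_TITLE = 80; // hypothetical limits
const MAX_SUMMARY = 500;

function clip(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max - 1) + "…";
}

function normalizeTelegramUpdate(update: TelegramUpdate): IntakeDraft | null {
  const msg = update.message;
  if (!msg) return null; // other update kinds are ignored in this sketch
  const body = (msg.text ?? msg.caption ?? "").trim();
  const urls: string[] = body.match(/https?:\/\/[^\s<>"']+/g) ?? [];
  const firstLine = body.split("\n")[0] ?? "";
  const sourceActor: IntakeDraft["sourceActor"] = {};
  if (msg.from) {
    sourceActor.id = String(msg.from.id);
    if (msg.from.username) sourceActor.username = msg.from.username;
  }
  return {
    source: "telegram",
    // Message IDs are scoped to a chat, so the chat ID is part of the key.
    sourceMessageId: `${msg.chat.id}:${msg.message_id}`,
    sourceThreadId: String(msg.chat.id),
    sourceReceivedAt: new Date(msg.date * 1000).toISOString(),
    status: "new",
    title: clip(firstLine || "(no text)", MAX_TITLE),
    summary: clip(body, MAX_SUMMARY),
    linkRefs: Array.from(new Set(urls)).map((url) => ({ url })),
    sourceActor,
  };
}

// A task is only requested when the compatibility flag is explicitly on.
function handleTelegramWebhook(
  update: TelegramUpdate,
  opts: { directTaskCompat?: boolean } = {},
): { intake: IntakeDraft | null; createTask: boolean } {
  const intake = normalizeTelegramUpdate(update);
  return { intake, createTask: intake !== null && opts.directTaskCompat === true };
}
```

Keeping normalization free of I/O makes the "no task unless compatibility mode is enabled" rule straightforward to cover in unit tests.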

Portal UI

Add an operator Inbox placement in the V1 workspace, visible alongside the workflow-first surfaces. It should support:

  • A dashboard/inbox queue showing new, blocked, duplicate, and assistant-ready items.
  • A compact count or section on the operator dashboard for untriaged inbox work and blocked intake.
  • A detail view/panel for one intake item with source metadata, safe excerpt, links/files, relationship refs, and triage actions.
  • Manual intake creation from the portal.
  • Attach-to-task, attach-to-bundle, convert-to-task, mark-duplicate, ignore, archive, and blocked-state actions.
  • Assistant-ready state display and a handoff path to #30 assistant jobs when available.
  • Clear distinction between raw intake, task proof, file metadata, artifact metadata, and assistant outputs so operators do not treat unreviewed intake as completed work.

Exports And Migration Safety

Portable export must include intake data once implemented:

  • Add intake_items.jsonl to normal exports.
  • Update manifest.json entity files, counts, checksums, redactions, and omitted_entities accurately.
  • Export relationship IDs for tasks, bundles, files/artifacts, assistant jobs, duplicate links, and users where available.
  • Validate required fields, duplicate IDs, enum values, parseable timestamps, bounded metadata size, redaction compliance, and relationship integrity.
  • Dry-run import must report intake inserts/updates and invalid relationships without writing production data.
  • Normal exports must not include secrets, session tokens, raw credentials, signed URLs, or unbounded raw message/email/file content.
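
A sketch of what the intake_items.jsonl validation pass might check, assuming one JSON object per line. The function names, the metadata size limit, and the export field names other than intake_item_id and source_received_at are assumptions; the real validator would follow the existing validate:export conventions.

```typescript
type ExportRecord = Record<string, unknown>;

const SOURCES = new Set([
  "telegram", "email", "manual", "file", "link", "import", "assistant", "unknown",
]);
const STATUSES = new Set([
  "new", "triaged", "attached", "converted", "ignored", "duplicate", "blocked", "archived",
]);
const MAX_METADATA_CHARS = 4096; // hypothetical bound on serialized metadata

function parseJsonl(text: string): ExportRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExportRecord);
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Returns a list of human-readable problems; an empty list means valid.
function validateIntakeRecords(
  records: ExportRecord[],
  knownTaskIds: ReadonlySet<string>,
): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();

  // Pass 1: required IDs and duplicate detection.
  records.forEach((rec, i) => {
    const id = rec["intake_item_id"];
    if (typeof id !== "string" || id === "") {
      errors.push(`record ${i}: missing intake_item_id`);
      return;
    }
    if (ids.has(id)) errors.push(`${id}: duplicate intake_item_id`);
    ids.add(id);
  });

  // Pass 2: enums, timestamps, metadata size, relationship integrity.
  records.forEach((rec, i) => {
    const rawId = rec["intake_item_id"];
    const label = typeof rawId === "string" && rawId !== "" ? rawId : `record ${i}`;
    const source = rec["source"];
    if (typeof source !== "string" || !SOURCES.has(source)) errors.push(`${label}: invalid source`);
    const status = rec["status"];
    if (typeof status !== "string" || !STATUSES.has(status)) errors.push(`${label}: invalid status`);
    const received = rec["source_received_at"];
    if (typeof received !== "string" || Number.isNaN(Date.parse(received))) {
      errors.push(`${label}: unparseable source_received_at`);
    }
    if (JSON.stringify(rec["metadata"] ?? {}).length > MAX_METADATA_CHARS) {
      errors.push(`${label}: metadata too large`);
    }
    const taskIds = rec["task_ids"] ?? [];
    if (!isStringArray(taskIds)) {
      errors.push(`${label}: task_ids must be an array of strings`);
    } else {
      taskIds
        .filter((t) => !knownTaskIds.has(t))
        .forEach((t) => errors.push(`${label}: unknown task ${t}`));
    }
    const dup = rec["duplicate_of_intake_item_id"];
    if (dup != null && (typeof dup !== "string" || !ids.has(dup))) {
      errors.push(`${label}: duplicate_of_intake_item_id does not resolve`);
    }
  });
  return errors;
}
```

A dry-run import could reuse the same function and report the returned list without writing anything.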

Acceptance Criteria

  • work-engine defines typed intake source/status/priority/data-class contracts and a durable IntakeItem persistence path with stable application IDs.
  • Production DynamoDB/SAM ownership is explicit for intake storage, including table env vars and least-privilege IAM if a new table is added.
  • Authenticated /api/intake routes support create, list/filter, detail, update, attach/detach, convert-to-task, duplicate/ignore/block/archive, link/file/artifact reference registration, and assistant input preparation.
  • Existing Telegram and email webhook paths create inbox items first, or retain direct task creation only behind a documented compatibility flag with tests.
  • Manual portal intake can create note/link intake items without external credentials.
  • Operators can see untriaged and blocked inbox items from the dashboard/workspace and can triage an item through the expected actions.
  • Intake items can attach to existing tasks and bundles without copying raw unbounded bodies into task comments or bundle descriptions.
  • Converting intake to an ad-hoc task preserves the intake relationship and marks the intake item converted or attached according to the implemented contract.
  • Assistant-ready input refs can be built from selected intake items; if #30 is implemented in the branch, a draft/queued assistant job can be created from those refs.
  • File/link/artifact relationships follow #29: inbox stores metadata/refs only, never binaries or temporary signed URLs as durable content.
  • Data safety checks reject or redact detectable secrets in summaries, metadata, and exported records where practical.
  • Portable export writes and validates intake_items.jsonl; manifest counts/checksums and omitted entities are accurate; dry-run import covers intake records.
  • Existing task, bundle, file, notification, recurring, auth, export, Telegram, and email tests still pass.
  • UI tests and screenshots cover the Inbox queue, intake detail/triage actions, manual intake creation, and dashboard placement.
  • [HUMAN] Real Telegram delivery, email provider delivery, or production external webhook setup is manually verified only if this issue connects live external accounts or secrets.

Test Scenarios

Scenario: Manual note becomes an inbox item

Given: an authenticated operator is in the portal
When: they create a manual intake note with a pasted URL and private data class
Then: the item appears in the Inbox as new, includes bounded summary/link metadata, and stores no binary or secret payload.

Scenario: Telegram webhook creates raw intake instead of direct task

Given: the Telegram webhook receives a text message with a task-like note and source message metadata
When: the webhook is processed
Then: an inbox item with source=telegram is created and no task is created unless explicit compatibility mode is enabled.

Scenario: Email webhook captures forwarded work

Given: the email webhook receives from/subject/body/link metadata
When: the webhook is processed
Then: an inbox item with bounded summary and redacted source metadata is created, and the full raw body is not stored unbounded in DynamoDB.

Scenario: Intake attaches to existing workflow context

Given: an active bundle and task exist
When: the operator attaches a selected inbox item to them
Then: the intake item records the task/bundle relationship, the task/bundle can show an intake reference, and the intake item status becomes attached or remains triaged according to the documented rule.

Scenario: Intake converts to ad-hoc task

Given: a new inbox item that represents standalone work
When: the operator converts it to a task with due date, assignee, and tags
Then: a task is created with source=intake or equivalent documented source, the intake relationship is preserved, and the intake item is marked converted.
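
The conversion scenario above might reduce to a function like the following. The type and function names are hypothetical; source: "intake" and intakeRefs come from this issue's wording, while the rest of the task shape is assumed and would need to match the existing work-engine task model.

```typescript
interface ConvertibleIntake {
  id: string;
  status: string;
  title: string;
  summary: string; // bounded excerpt, never the raw body
  taskIds: string[];
}

interface AdHocTask {
  id: string;
  title: string;
  description: string;
  source: "intake";
  intakeRefs: string[];
  dueDate?: string;
  assigneeId?: string;
  tags: string[];
}

interface ConvertOptions {
  taskId: string; // assumed to be generated by the caller
  dueDate?: string;
  assigneeId?: string;
  tags?: string[];
}

function convertToTask(
  intake: ConvertibleIntake,
  opts: ConvertOptions,
): { task: AdHocTask; intake: ConvertibleIntake } {
  if (intake.status === "converted") {
    throw new Error(`${intake.id} is already converted`);
  }
  const task: AdHocTask = {
    id: opts.taskId,
    title: intake.title,
    description: intake.summary, // only the bounded summary is copied
    source: "intake",
    intakeRefs: [intake.id],
    tags: opts.tags ?? [],
  };
  if (opts.dueDate) task.dueDate = opts.dueDate;
  if (opts.assigneeId) task.assigneeId = opts.assigneeId;
  // Both sides keep the relationship: task.intakeRefs and intake.taskIds.
  return {
    task,
    intake: { ...intake, status: "converted", taskIds: [...intake.taskIds, task.id] },
  };
}
```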

Scenario: Duplicate handling preserves audit context

Given: two inbox items represent the same email or Telegram request
When: the operator marks one as duplicate of the other with a reason
Then: both records keep the duplicate relationship, the duplicate item no longer appears in the default untriaged queue, and export validation preserves the relationship.
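
A possible shape for manual duplicate marking, where the duplicate points at the kept item and the kept item lists the duplicate as related. The names DupItem, markDuplicate, untriagedQueue, and resolutionNote are hypothetical, and rejecting chains (marking something as a duplicate of an item that is itself a duplicate) is an assumed rule rather than a stated requirement.

```typescript
interface DupItem {
  id: string;
  status: string;
  duplicateOfIntakeItemId?: string;
  relatedIntakeItemIds: string[];
  resolutionNote?: string;
}

// Returns a new map; the input map and its items are left untouched.
function markDuplicate(
  items: ReadonlyMap<string, DupItem>,
  duplicateId: string,
  canonicalId: string,
  reason: string,
): Map<string, DupItem> {
  const dup = items.get(duplicateId);
  const canonical = items.get(canonicalId);
  if (!dup || !canonical) throw new Error("unknown intake item");
  if (duplicateId === canonicalId) throw new Error("an item cannot duplicate itself");
  if (canonical.duplicateOfIntakeItemId) throw new Error("target is itself a duplicate");
  if (!reason.trim()) throw new Error("a reason is required");

  const next = new Map(items);
  next.set(duplicateId, {
    ...dup,
    status: "duplicate",
    duplicateOfIntakeItemId: canonicalId,
    resolutionNote: reason,
  });
  next.set(canonicalId, {
    ...canonical,
    // A Set keeps the back-reference unique if the action is repeated.
    relatedIntakeItemIds: Array.from(
      new Set([...canonical.relatedIntakeItemIds, duplicateId]),
    ),
  });
  return next;
}

// Default untriaged queue: only items still in "new".
function untriagedQueue(items: ReadonlyMap<string, DupItem>): DupItem[] {
  return Array.from(items.values()).filter((item) => item.status === "new");
}
```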

Scenario: Assistant-ready handoff

Given: multiple podcast-related inbox items contain notes, links, and file/artifact refs
When: the operator marks them as ready for podcast assistant input
Then: the inbox item(s) expose #30-compatible input refs and, when #30 APIs are available, can create a draft/queued assistant job without copying raw payload bodies into the job.
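
Since #30 owns the actual input-ref format, the following is only an illustration of the handoff idea: refs are built from IDs and URLs, never from payload bodies. The ref string format ("intake:", "link:", "file:"), the fileId field, and the readiness rule are all hypothetical placeholders.

```typescript
type ReadinessStatus = "not-applicable" | "candidate" | "ready" | "submitted" | "blocked";

interface AssistantReadiness {
  assistantType: string;
  status: ReadinessStatus;
  inputRefs: string[];
  missingFields: string[];
}

interface HandoffItem {
  id: string;
  summary: string;
  linkRefs: { url: string }[];
  fileRefs: { fileId: string }[]; // placeholder for the #29 file ref shape
}

function buildReadiness(items: HandoffItem[], assistantType: string): AssistantReadiness {
  if (items.length === 0) {
    return { assistantType, status: "not-applicable", inputRefs: [], missingFields: [] };
  }
  const inputRefs: string[] = [];
  const missingFields: string[] = [];
  for (const item of items) {
    // References only: the job resolves content later, nothing is copied here.
    inputRefs.push(`intake:${item.id}`);
    item.linkRefs.forEach((link) => inputRefs.push(`link:${link.url}`));
    item.fileRefs.forEach((file) => inputRefs.push(`file:${file.fileId}`));
    const hasContent =
      item.summary.trim() !== "" || item.linkRefs.length > 0 || item.fileRefs.length > 0;
    if (!hasContent) missingFields.push(`${item.id}: summary, linkRefs, or fileRefs`);
  }
  // Assumed rule: any item with nothing usable keeps the set at "candidate".
  return {
    assistantType,
    status: missingFields.length === 0 ? "ready" : "candidate",
    inputRefs,
    missingFields,
  };
}
```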

Scenario: Export and dry-run import include intake

Given: local test data includes manual, Telegram, email, attached, converted, duplicate, and assistant-ready intake items
When: export, validate, and dry-run import run
Then: intake_items.jsonl is present, manifest counts/checksums match, relationships validate, and secrets/raw unbounded content are absent.

Scenario: UI inbox is not disconnected from workflow work

Given: the operator has untriaged intake, a task, and a bundle
When: they use the Inbox queue and detail UI
Then: they can triage into task/bundle context from the same workspace without opening a separate app or losing dashboard visibility.

Out of Scope

  • Implementing the artifact storage policy itself; #29 owns artifact model, proof, and storage details.
  • Implementing the assistant job lifecycle, runner, retries, approvals, or output artifacts; #30 owns that.
  • Full-text search across inbox, artifacts, and assistant outputs.
  • Sophisticated duplicate detection beyond exact/upstream-ID/manual duplicate marking unless it falls out cheaply from normalized source IDs.
  • Production integration with real Telegram, email providers, OAuth, Groq, Heru, Codex, Claude, Dropbox, Google Drive, or S3 credentials beyond safe metadata boundaries.
  • Migrating all historical Trello cards, spreadsheet rows, emails, or podcast assistant local inbox files.
  • Storing raw binary attachments, raw audio/image/video payloads, or generated documents in DynamoDB or Git.
  • Modifying ../podcast-assistant, ../dtc-operations, ../datatasks, or any other source repo.
  • Replacing the current V1 frontend framework or public/private Lambda architecture.

Dependencies

  • #29 (Define artifact model and storage policy) defines the artifact/file storage policy used by intake file refs, artifact refs, storage URIs, redaction expectations, and export relationship validation.
  • #30 (Implement assistant job model and lifecycle) defines assistant job input refs, job IDs, lifecycle, and output/log relationships. This issue should hand off assistant-ready input, not duplicate assistant execution.
  • docs/v1-runtime-architecture.md, docs/v1-execution-state-schema.md, docs/v1-execution-data-safety.md, and docs/v1-workflow-data-model.md are the baseline architecture/data-safety contracts.
  • Production DynamoDB tables and Lambda permissions must remain SAM/CloudFormation-owned.
  • Existing assistants/podcast local storage remains a reference for source behavior, not the durable production inbox.

Verification Commands

Run from repo root unless noted otherwise:

npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
npm --prefix work-engine run test:e2e
npm --prefix work-engine run export:data -- .tmp/exports/intake-inbox
npm --prefix work-engine run validate:export -- .tmp/exports/intake-inbox
npm --prefix work-engine run dry-run:import -- .tmp/exports/intake-inbox

If lambda-functions/template.full.yaml or deployment workflow files change, also run:

sam validate --template-file lambda-functions/template.full.yaml

If portal/docs-app routing or Python broker behavior changes, also run:

uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app

If assistants/podcast/** behavior changes, also run:

uv run --project assistants/podcast pytest

Tester must capture screenshots for the Inbox queue, intake detail/triage panel, manual intake creation, dashboard placement, and any changed task/bundle attachment views.
