# Migrate spreadsheet TODOs into integrated operations-manager work (#42)

Status: pending
Tags: `enhancement`, `migration`, `portal`, `process-docs`, `work-engine`, `frontend`, `backend`, `testing`, `data`, `P1`
Depends on: #15, #29, #48, #50
Blocks: None

## Scope

Implement a safe migration path for the Google Spreadsheet TODO data described in `work-engine/docs/data.md` so pending spreadsheet work becomes normal DataOps operations-manager work, not a separate spreadsheet clone. The operator should see imported work in the same Work/Home/task/bundle surfaces as native tasks, recurring work, workflow bundles, proof-gated tasks, waiting follow-ups, process-doc links, artifacts, notifications, and portable exports.

Use these current references:

- `.goal-v1.md`
- `docs/operations-manager-platform-jtbd.md`
- `docs/v1-workflow-data-model.md`
- `docs/v1-execution-state-schema.md`
- `docs/v1-execution-data-safety.md`
- `docs/restore-drill.md`
- `work-engine/docs/data.md`
- `work-engine/docs/specs.md`
- `work-engine/docs/templates.md`
- `work-engine/scripts/migrate-data.ts`
- `work-engine/tests/migrate-data.test.ts`
- `work-engine/tests/export-portable.test.ts`
- `work-engine/tests/dry-run-import.test.ts`
- `work-engine/src/types.ts`
- `content/tasks/templates/` and relevant process docs in `content/**`
- source-system references in `../datatasks` and `../dtc-operations` for comparison only; do not modify source repositories

The current script maps CSV rows into standalone manual tasks. This issue should extend or replace that CSV path so spreadsheet rows can become the right DataOps entities:

- Open pending rows from `TODO list - todo.csv` become ordinary active tasks, workflow tasks, or recurring configs according to their content.
- Repetitive rows from `TODO list - done.csv` may be analyzed to infer recurring patterns, process-doc references, and proof semantics, but completed history must not flood the active task list by default.
- Rows that describe one-off work become tasks with due date, assignee when safely known, source/provenance, comments, process-doc context, proof requirements, waiting/follow-up metadata, and export-safe audit/provenance.
- Rows that match existing workflow patterns should attach to existing workflow bundles/tasks when a safe deterministic match exists, or be reported for human review instead of creating disconnected duplicate work.
- Rows that are truly periodic should become recurring configs or automatic workflow trigger suggestions using the current recurring/template primitives, not many copied historical task rows.
- Process-document creation/update rows should link to in-repo process docs by stable `instructionDocId` where resolvable and preserve external `instructionsUrl`/spreadsheet notes as fallback provenance.
- Notes, process document links, comments, date-finished values, and status values should feed proof/comment/completion metadata only when the data is safe and unambiguous.

The migration must be file-based and local/test safe. It may consume local CSV fixtures or a human-provided export file, but it must not connect to live Google Sheets, mutate Google Sheets, write external systems, or access production spreadsheets during agent verification.
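Because the migration reads local CSV exports instead of calling the Sheets API, the reader has to tolerate multiline quoted task text, blank separator rows, and trailing empty columns. A minimal sketch, assuming the export uses RFC 4180 quoting (double quotes with `""` escapes); `parseCsv` and `trimTrailingEmpty` are hypothetical helpers, not existing `migrate-data.ts` code, and a maintained CSV library may be preferable in the real script.

```typescript
// Hypothetical helper, not the existing migrate-data.ts code: a minimal
// RFC 4180-style parser. Quoted fields may contain commas, newlines, and ""
// escapes; trailing empty columns are dropped, so blank separator rows become [].
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"' && text.charAt(i + 1) === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines inside quotes stay in the field
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") i++; // CRLF is one break
      row.push(field);
      rows.push(trimTrailingEmpty(row));
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the file does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(trimTrailingEmpty(row));
  }
  return rows;
}

function trimTrailingEmpty(row: string[]): string[] {
  let end = row.length;
  while (end > 0 && (row[end - 1] ?? "").trim() === "") end--;
  return row.slice(0, end);
}
```

Keeping the parser pure (string in, rows out) lets fixture tests feed it strings directly, while the script itself only reads the explicit local file paths.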

## Acceptance Criteria

- [ ] The CSV migration supports explicit local input files, for example `--source-todo <csv>` and optional `--source-done <csv>`, so tests and operators do not rely on hard-coded production spreadsheet paths.
- [ ] Dry-run/preview mode is the default or clearly available and writes no records. It reports counts for imported tasks, recurring configs/suggestions, workflow attachment candidates, completed rows skipped, blank rows skipped, unsafe rows, unresolved process docs, unresolved workflow matches, proof requirements, waiting/follow-up tasks, and validation errors.
- [ ] By default, only open/pending TODO rows become active DataOps work. Completed historical rows are skipped unless an explicit history/analysis flag is used, and recurring pattern inference from history does not create duplicate historical tasks.
- [ ] Imported spreadsheet work appears in the normal Operations Home, task list, waiting/follow-up, overdue, recurring, and workflow-context surfaces. No separate spreadsheet-import queue is required for daily operation.
- [ ] Each imported task preserves due date from the row `Date` field after robust parsing of known formats; invalid or missing dates are reported and either assigned a safe review date or skipped according to documented rules.
- [ ] Status normalization handles `NEW`, `DONE`, `DONEDONE`, mixed case, blank separator rows, multiline task text, extra empty columns, and `done.csv` rows whose status does not match the file name.
- [ ] Imported tasks use `source: import` or an equivalently documented source value, plus source provenance that includes source file, row number, source date/status, and source task text without exporting spreadsheet credentials or private links that are unsafe to store.
- [ ] Rows matching known recurring duties, such as Slack invites, Trello/card review, newsletter preparation, Mailchimp backup, Slack dump, sponsor performance follow-up, invoice/receipt checks, or bookkeeping TODO checks, become recurring configs, automatic workflow trigger candidates, or explicit migration suggestions rather than repeated standalone tasks.
- [ ] A row can attach to an existing workflow bundle/task only when a deterministic match is available, such as a stable source ID, normalized title/date/template match, or explicit operator-approved mapping; ambiguous matches remain standalone tasks or migration warnings.
- [ ] Process-doc links in notes or done-history metadata resolve to `instructionDocId` when an in-repo document is known; unresolved links preserve `instructionsUrl` and appear in the unresolved-doc report for future #33/workflow mapping work.
- [ ] Imported process-document work records process-doc title/link/comment context as task comments, proof requirements, or artifact metadata when appropriate; it must not copy whole external docs into DynamoDB.
- [ ] Completion proof is explicit. Rows that require a Google Doc, spreadsheet update, report, invoice, backup file, public link, comment, or external status set `requiredLinkName`, `requiresFile`, `proofRequirement`, `externalStatus`, or artifact metadata as appropriate.
- [ ] Tasks with required proof cannot be imported or updated into `done` unless the required proof is present and export validation accepts it; ambiguous completed rows are reported instead of silently marked done.
- [ ] Waiting/follow-up semantics are preserved or inferred conservatively for tasks blocked on a guest, sponsor, author, speaker, publisher, freelancer, accountant, Alexey, Valeria/Valeriia, Grace, or another external/internal reviewer. Waiting tasks must have `waitingFor`, `followUpAt`, and a short note/comment.
- [ ] Ambiguous waiting cases remain `todo` with a migration note and unresolved waiting warning instead of becoming `waiting` without a safe follow-up date.
- [ ] Import write mode is idempotent. Re-running the same CSV inputs updates or skips previously imported records using stable source keys such as source file + row hash/source row ID, and does not create duplicate tasks, recurring configs, artifacts, notifications, or audit/provenance records.
- [ ] Non-sensitive useful URLs in notes become task links, bundle links, or artifact metadata when they are proof/output links. Temporary signed URLs, OAuth URLs, credentials, cookies, API keys, session values, and binary payloads are rejected or redacted.
- [ ] Imported tasks that are due, overdue, waiting, missing proof, or ready for follow-up drive existing notification/dashboard behavior, including `follow-up-due` and missing-evidence context where implemented.
- [ ] Import/audit provenance is durable and export-safe. Use audit events where supported; otherwise use documented bounded provenance metadata/comment fields that are included in portable export without DynamoDB `PK`/`SK` internals.
- [ ] Fixture-based tests cover open rows, completed/history rows, recurring-pattern rows, process-doc rows, multiline rows, malformed dates/statuses, unsafe URLs/secrets, waiting/follow-up inference, proof requirements, idempotency, and workflow attachment ambiguity.
- [ ] Portable export after a fixture import includes imported tasks, recurring configs, artifacts/files/notifications/audit/provenance as applicable; `validate:export` passes; `dry-run:import` reports valid counts without writing data.
- [ ] UI/API behavior remains unified: imported spreadsheet work is visible and actionable through existing task, workflow, proof, waiting/follow-up, recurring, and Home surfaces.
- [ ] [HUMAN] Before any real production Google Spreadsheet export is used, Alexey or Valeria confirms the export source, export date, included tabs/columns, row count, and whether any sensitive rows/links/comments must be redacted.
- [ ] [HUMAN] Before any production DynamoDB import write, a human confirms target environment, on-demand backup/export location, dry-run summary, skipped/error report, unresolved mappings, and rollback/restore plan.
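The status-normalization criterion can be sketched as a small normalizer that reports rather than guesses. This assumes, as a hypothetical rule not stated in the source, that `NEW` and blank statuses mean open work and `DONE`/`DONEDONE` mean completed; `normalizeStatus` and `isBlankSeparatorRow` are illustrative names.

```typescript
// Hypothetical normalizer. Treating NEW/blank as open and DONE/DONEDONE as
// completed is an assumption to confirm against work-engine/docs/data.md.
type NormalizedStatus = "open" | "completed" | "unknown";

interface StatusResult {
  status: NormalizedStatus;
  warning?: string;
}

function isBlankSeparatorRow(cells: string[]): boolean {
  return cells.every((cell) => cell.trim() === "");
}

function normalizeStatus(raw: string | undefined, sourceFile: "todo" | "done"): StatusResult {
  const value = (raw ?? "").trim().toUpperCase(); // tolerate mixed case and padding
  let status: NormalizedStatus = "unknown";
  if (value === "" || value === "NEW") status = "open";
  else if (value === "DONE" || value === "DONEDONE") status = "completed";

  if (status === "unknown") {
    return { status, warning: `unrecognized status "${raw ?? ""}"` };
  }
  // A done.csv row that still looks open is reported instead of trusted.
  if (sourceFile === "done" && status !== "completed") {
    return { status, warning: `done.csv row has non-completed status "${raw ?? ""}"` };
  }
  return { status };
}
```

Warnings feed the dry-run report, so a `done.csv` row with a `NEW` status surfaces for review instead of being silently skipped or activated.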

## Test Scenarios

### Scenario: Dry-run previews spreadsheet rows without writes

Given: local CSV fixtures for `todo.csv` and `done.csv` with pending rows, done rows, blank separators, multiline task text, mixed statuses, and extra columns
When: the migration runs in dry-run mode
Then: it reports planned active tasks, recurring suggestions/configs, skipped historical rows, skipped blanks, unresolved mappings, proof requirements, waiting/follow-up candidates, and writes no DynamoDB records.
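The no-writes guarantee is easiest to keep when flag parsing and writing are separate steps. A minimal sketch using Node's built-in `util.parseArgs`, assuming the CSV path keeps the `--dry-run`, `--csv-only`, `--source-todo`, and `--source-done` flags shown under Required Verification Commands; `MigrationOptions`, `parseMigrationArgs`, and `executePlan` are hypothetical names.

```typescript
import { parseArgs } from "node:util";

// Hypothetical option handling for the CSV path. Flag names mirror the
// verification commands in this issue; how they combine is an assumption.
interface MigrationOptions {
  dryRun: boolean;
  csvOnly: boolean;
  sourceTodo: string;
  sourceDone?: string;
}

function parseMigrationArgs(argv: string[]): MigrationOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      "dry-run": { type: "boolean" },
      "csv-only": { type: "boolean" },
      "source-todo": { type: "string" },
      "source-done": { type: "string" },
    },
    strict: true, // unknown flags throw instead of being silently ignored
  });
  const sourceTodo = values["source-todo"];
  if (typeof sourceTodo !== "string" || sourceTodo === "") {
    // No fallback to a hard-coded production spreadsheet path.
    throw new Error("--source-todo <csv> is required");
  }
  const sourceDone = values["source-done"];
  return {
    dryRun: values["dry-run"] === true,
    csvOnly: values["csv-only"] === true,
    sourceTodo,
    sourceDone: typeof sourceDone === "string" ? sourceDone : undefined,
  };
}

// All writes go through this gate; a dry run performs none and reports 0 written.
function executePlan<T>(options: MigrationOptions, planned: T[], write: (item: T) => void): number {
  if (options.dryRun) return 0;
  for (const item of planned) write(item);
  return planned.length;
}
```

Because the planner runs identically in both modes, the dry-run counts describe exactly what a write run would do.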

### Scenario: Pending rows become normal tasks

Given: a local `todo.csv` fixture with one ad-hoc pending task, one task with notes, and one task with a process-doc link
When: the migration runs in local write mode
Then: the rows create normal task records with due dates, comments, source provenance, process-doc context where resolvable, and visibility through existing task APIs and Operations Home.
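Preserving due dates depends on parsing the row `Date` field strictly. A minimal sketch, assuming ISO `YYYY-MM-DD` and dotted `D.M.YYYY` are the known formats (both are assumptions to confirm against `work-engine/docs/data.md`); `parseDueDate` is a hypothetical name, and slash dates are reported rather than guessed because day/month order is ambiguous.

```typescript
// Hypothetical due-date parser. The accepted formats are assumptions to confirm
// against work-engine/docs/data.md and real exports; anything else is reported.
type DateResult = { ok: true; isoDate: string } | { ok: false; reason: string };

const FORMATS: Array<{ pattern: RegExp; order: "ymd" | "dmy" }> = [
  { pattern: /^(\d{4})-(\d{1,2})-(\d{1,2})$/, order: "ymd" }, // 2024-03-07
  { pattern: /^(\d{1,2})\.(\d{1,2})\.(\d{4})$/, order: "dmy" }, // 7.3.2024
];

function parseDueDate(raw: string | undefined): DateResult {
  const value = (raw ?? "").trim();
  if (value === "") return { ok: false, reason: "missing date" };

  for (const { pattern, order } of FORMATS) {
    const match = pattern.exec(value);
    if (!match) continue;
    const a = Number(match[1]);
    const b = Number(match[2]);
    const c = Number(match[3]);
    const [year, month, day]: [number, number, number] =
      order === "ymd" ? [a, b, c] : [c, b, a];
    // Round-trip through UTC to reject impossible dates such as 2024-02-30.
    const date = new Date(Date.UTC(year, month - 1, day));
    if (
      date.getUTCFullYear() !== year ||
      date.getUTCMonth() !== month - 1 ||
      date.getUTCDate() !== day
    ) {
      return { ok: false, reason: `impossible date "${value}"` };
    }
    return { ok: true, isoDate: date.toISOString().slice(0, 10) };
  }
  // Slash dates like 03/07/2024 are ambiguous (US vs EU order), so they are
  // reported instead of guessed until a format is confirmed.
  return { ok: false, reason: `unrecognized date format "${value}"` };
}
```

Failed results carry a reason string so the documented rule (assign a review date or skip) can be applied and reported per row.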

### Scenario: Repetitive history becomes recurring work

Given: `done.csv` contains repeated historical rows for Slack invites, Mailchimp backup, Slack dump, and newsletter preparation
When: the migration analyzes history
Then: it creates or suggests recurring configs/template triggers instead of importing every historical occurrence as an active task, and the summary explains created/skipped/suggested counts.
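History analysis can count matches per duty and emit suggestions without ever creating task rows. A minimal sketch; the keyword patterns, the duty names, and the default threshold of three occurrences are all hypothetical and would need tuning against real `done.csv` data.

```typescript
// Hypothetical recurring-duty detector for done.csv history. Keyword patterns
// and the minimum-occurrence threshold are assumptions, not confirmed rules.
interface HistoryRow {
  rowNumber: number;
  text: string;
  isoDate?: string;
}

interface RecurringSuggestion {
  duty: string;
  occurrences: number;
  sampleRows: number[]; // a few row numbers for the review report
  lastSeen?: string;
}

const DUTY_PATTERNS: Record<string, RegExp> = {
  "slack-invites": /\bslack\b.*\binvite/i,
  "slack-dump": /\bslack\b.*\bdump\b/i,
  "mailchimp-backup": /\bmailchimp\b.*\bbackup\b/i,
  "newsletter-prep": /\bnewsletter\b/i,
};

function suggestRecurring(history: HistoryRow[], minOccurrences = 3): RecurringSuggestion[] {
  const byDuty = new Map<string, HistoryRow[]>();
  for (const row of history) {
    for (const [duty, pattern] of Object.entries(DUTY_PATTERNS)) {
      if (pattern.test(row.text)) {
        const rows = byDuty.get(duty) ?? [];
        rows.push(row);
        byDuty.set(duty, rows);
        break; // first matching duty wins, so each row is counted once
      }
    }
  }

  const suggestions: RecurringSuggestion[] = [];
  for (const [duty, rows] of byDuty) {
    if (rows.length < minOccurrences) continue;
    const dates = rows
      .map((r) => r.isoDate)
      .filter((d): d is string => d !== undefined)
      .sort();
    suggestions.push({
      duty,
      occurrences: rows.length,
      sampleRows: rows.slice(0, 3).map((r) => r.rowNumber),
      lastSeen: dates[dates.length - 1],
    });
  }
  // Suggestions feed the migration report; no historical task rows are created.
  return suggestions;
}
```

Whether a suggestion becomes a recurring config or stays a report line is a separate, reviewable step, which keeps this compatible with an open #40.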

### Scenario: Process-document TODO keeps doc context

Given: a row asks to create or update a process document and includes a process document title/link/comment
When: the row is imported
Then: the task stores bounded context, resolves `instructionDocId` when possible, preserves unresolved external `instructionsUrl` as fallback, and requires URL/comment/artifact proof before completion when the row outcome is a document.
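Doc resolution can prefer a stable `instructionDocId` and fall back to a cleaned `instructionsUrl`. A minimal sketch, assuming a hypothetical `docIndex` keyed by Google Docs document ID (the `/document/d/<id>/` path segment); how that index is built from `content/**` is left open, and `resolveProcessDoc` is an illustrative name.

```typescript
// Hypothetical resolver: `docIndex` maps a Google Docs document ID to an
// in-repo instructionDocId. Building that index is left to the implementation.
interface DocLinkResult {
  instructionDocId?: string;
  instructionsUrl?: string;
  unresolved?: string;
}

function googleDocKey(url: URL): string | undefined {
  // Google Docs URLs usually look like /document/d/<id>/edit
  const match = /^\/document\/d\/([A-Za-z0-9_-]+)/.exec(url.pathname);
  return url.hostname === "docs.google.com" && match ? match[1] : undefined;
}

function resolveProcessDoc(link: string, docIndex: Map<string, string>): DocLinkResult {
  let url: URL;
  try {
    url = new URL(link.trim());
  } catch {
    return { unresolved: `not a URL: ${link.slice(0, 80)}` };
  }
  const key = googleDocKey(url);
  const docId = key ? docIndex.get(key) : undefined;
  if (docId) return { instructionDocId: docId };

  // Keep the external link as fallback provenance, minus query and fragment,
  // which can carry sharing or session parameters.
  url.search = "";
  url.hash = "";
  return {
    instructionsUrl: url.toString(),
    unresolved: `no in-repo doc for ${url.hostname}${url.pathname}`,
  };
}
```

Every `unresolved` value lands in the unresolved-doc report, which is the hand-off point for #33 and later workflow mapping work.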

### Scenario: Waiting work drives follow-ups

Given: a row clearly represents waiting for a guest/sponsor/reviewer and includes a safe follow-up date or notes from which one can be deterministically parsed
When: the row is imported
Then: the task has `status=waiting`, `waitingFor`, `followUpAt`, a short comment, and appears in follow-up views when the date is due.

### Scenario: Ambiguous waiting is not guessed

Given: a row mentions a person but does not clearly indicate blocked work or a safe follow-up date
When: the row is imported
Then: it remains `todo`, keeps source context in comments/provenance, and the migration report records an unresolved waiting inference.
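The two waiting scenarios above reduce to one conservative rule: set `waiting` only when an explicit blocker phrase, a known party, and a valid follow-up date are all present. A minimal sketch; the trigger phrases and the `follow up YYYY-MM-DD` convention are hypothetical, and `inferWaiting`/`isValidIsoDate` are illustrative names.

```typescript
// Hypothetical conservative waiting inference. Trigger phrases and the
// "follow up YYYY-MM-DD" convention are assumptions, not confirmed row formats.
interface WaitingDecision {
  status: "waiting" | "todo";
  waitingFor?: string;
  followUpAt?: string;
  comment?: string;
  warning?: string;
}

// Parties named in the acceptance criteria, lower-cased for matching.
const PARTIES = ["guest", "sponsor", "author", "speaker", "publisher", "freelancer",
  "accountant", "reviewer", "alexey", "valeria", "valeriia", "grace"];

function isValidIsoDate(value: string): boolean {
  const date = new Date(`${value}T00:00:00Z`);
  // Rejects both unparseable dates and rolled-over ones such as 2024-02-30.
  return !Number.isNaN(date.getTime()) && date.toISOString().startsWith(value);
}

function inferWaiting(text: string): WaitingDecision {
  const lower = text.toLowerCase();
  const blocker = /\b(?:waiting for|waiting on|blocked on)\s+(?:the\s+)?([a-z]+)/.exec(lower);
  const party = blocker?.[1];
  const followUp = /\bfollow[- ]?up\s+(\d{4}-\d{2}-\d{2})\b/.exec(lower)?.[1];

  if (party && PARTIES.includes(party) && followUp && isValidIsoDate(followUp)) {
    return {
      status: "waiting",
      waitingFor: party,
      followUpAt: followUp,
      comment: `Imported as waiting on ${party}; follow up on ${followUp}.`,
    };
  }
  const mentionsParty = PARTIES.some((p) => new RegExp(`\\b${p}\\b`).test(lower));
  if (blocker || mentionsParty) {
    // A blocker or person is mentioned, but not safely enough to set waiting.
    return { status: "todo", warning: "unresolved waiting inference" };
  }
  return { status: "todo" };
}
```

Erring toward `todo` plus a warning means a missed inference costs one manual status change, while a wrong `waiting` could hide work from the due list.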

### Scenario: Proof-gated completion remains safe

Given: a row marked completed requires a Google Doc link, spreadsheet update, report file, invoice, backup, or public URL as proof
When: the import cannot find the required proof in row data
Then: the task is not silently marked `done`; it is reported as missing proof or imported as active review work according to documented rules.
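The proof gate can be a pure decision over the row's requirement and whatever evidence the row actually contains. A minimal sketch; `requiredLinkName` and `requiresFile` come from the acceptance criteria, while `ProofEvidence`, `decideCompletion`, and the choice to fall back to active `todo` review work are assumptions.

```typescript
// Hypothetical proof gate. Field names requiredLinkName and requiresFile come
// from the acceptance criteria; the ProofEvidence shape is an assumption.
interface ProofRequirement {
  requiredLinkName?: string;
  requiresFile?: boolean;
}

interface ProofEvidence {
  links: Record<string, string>; // link name -> URL found in row data
  fileRefs: string[]; // stable storage references, never binaries
}

type CompletionDecision =
  | { status: "done" }
  | { status: "todo"; report: string[] };

function decideCompletion(
  rowCompleted: boolean,
  requirement: ProofRequirement,
  evidence: ProofEvidence,
): CompletionDecision {
  if (!rowCompleted) return { status: "todo", report: [] };
  const missing: string[] = [];
  if (requirement.requiredLinkName && !evidence.links[requirement.requiredLinkName]) {
    missing.push(`missing link "${requirement.requiredLinkName}"`);
  }
  if (requirement.requiresFile && evidence.fileRefs.length === 0) {
    missing.push("missing proof file");
  }
  // Completed-looking rows without proof become active review work, not done.
  return missing.length === 0 ? { status: "done" } : { status: "todo", report: missing };
}
```

The `report` entries double as the missing-proof lines in the dry-run summary.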

### Scenario: Workflow attachment avoids duplicates

Given: a spreadsheet row appears related to an existing newsletter/podcast/tax-report workflow but the match is ambiguous
When: the import runs
Then: the row is not attached to an arbitrary bundle; it is imported as standalone review work or reported as an unresolved workflow match.
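Deterministic attachment means an operator-approved mapping wins, otherwise exactly one candidate must match; anything else stays standalone. A minimal sketch, assuming a normalized title plus date is the deterministic key; `WorkflowCandidate`, `matchWorkflow`, and `approvedMappings` are hypothetical names.

```typescript
// Hypothetical matcher: attach only on an explicit mapping or exactly one
// candidate with the same normalized title and date. Shapes are assumptions.
interface WorkflowCandidate {
  bundleId: string;
  title: string;
  isoDate: string;
}

type MatchResult =
  | { kind: "attach"; bundleId: string }
  | { kind: "standalone"; warning?: string };

function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function matchWorkflow(
  sourceKey: string,
  title: string,
  isoDate: string | undefined,
  candidates: WorkflowCandidate[],
  approvedMappings: Map<string, string>, // sourceKey -> bundleId, operator-approved
): MatchResult {
  const approved = approvedMappings.get(sourceKey);
  if (approved) return { kind: "attach", bundleId: approved };
  if (!isoDate) return { kind: "standalone" };

  const wanted = normalizeTitle(title);
  const hits = candidates.filter(
    (c) => c.isoDate === isoDate && normalizeTitle(c.title) === wanted,
  );
  const [first] = hits;
  if (hits.length === 1 && first) return { kind: "attach", bundleId: first.bundleId };
  if (hits.length > 1) {
    return { kind: "standalone", warning: `ambiguous workflow match (${hits.length} candidates)` };
  }
  return { kind: "standalone" };
}
```

Ambiguity is a report line, not a tie-break, so no row is ever attached to an arbitrary bundle.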

### Scenario: Import is idempotent

Given: the same CSV fixtures were already imported once
When: the migration runs again in local write mode
Then: no duplicate tasks, recurring configs, artifacts, notifications, or provenance records are created, and the summary reports created/updated/skipped counts.
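Idempotency needs two hashes: one for row identity (so re-runs find the same record) and one for row content (so changes become updates, not new records). A minimal sketch using Node's `crypto.createHash`, assuming date plus task text identify a row; `digest`, `sourceKey`, and `planUpsert` are hypothetical names, and `existing` stands in for whatever store lookup the implementation uses.

```typescript
import { createHash } from "node:crypto";

// Hash normalized cells; whitespace is collapsed so cosmetic re-export changes match.
function digest(parts: string[]): string {
  const normalized = parts.map((p) => p.trim().replace(/\s+/g, " ")).join("\u001f");
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}

// Identity: source file + date + task text (assumed identity fields). Row
// numbers stay in provenance but are left out so inserted rows do not shift keys.
function sourceKey(sourceFile: string, date: string, taskText: string): string {
  return `${sourceFile}:${digest([date, taskText])}`;
}

type UpsertOutcome = "created" | "updated" | "skipped";

// `existing` maps source key -> content digest of the previously imported row.
function planUpsert(key: string, allCells: string[], existing: Map<string, string>): UpsertOutcome {
  const contentDigest = digest(allCells);
  const previous = existing.get(key);
  if (previous === undefined) return "created";
  return previous === contentDigest ? "skipped" : "updated";
}
```

One trade-off to note: two identical rows in the same file share a key, so the planner should report such collisions instead of merging them silently.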

### Scenario: Export and restore safety holds after migration

Given: fixture spreadsheet rows have been imported into local work-engine data
When: portable export, export validation, and dry-run import are run
Then: relationships validate, waiting tasks have required metadata, proof-gated tasks are valid, redactions are enforced, no secrets or binaries are exported, and dry-run import reports insert/update counts without writing production data.
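Export safety starts with refusing unsafe URLs at import time. A minimal sketch built on the WHATWG `URL` class; the sensitive parameter names and OAuth hosts are a hypothetical starting list, not a complete secret detector, and `checkUrl` is an illustrative name.

```typescript
// Hypothetical URL safety check. The parameter and host lists are heuristics
// to extend during implementation, not a complete secret detector.
const SENSITIVE_PARAMS =
  /^(?:access_token|refresh_token|id_token|token|code|key|api_key|apikey|sig|signature|x-amz-signature|x-amz-credential|x-goog-signature|session|sessionid)$/i;
const OAUTH_HOSTS = new Set(["accounts.google.com", "oauth2.googleapis.com"]);

type UrlVerdict =
  | { kind: "keep"; url: string }
  | { kind: "reject"; reason: string };

function checkUrl(raw: string): UrlVerdict {
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return { kind: "reject", reason: "unparseable URL" };
  }
  // data:, file:, and similar schemes can smuggle binaries or local paths.
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { kind: "reject", reason: `unsupported scheme ${url.protocol}` };
  }
  if (url.username || url.password) {
    return { kind: "reject", reason: "embedded credentials" };
  }
  if (OAUTH_HOSTS.has(url.hostname)) {
    return { kind: "reject", reason: "OAuth URL" };
  }
  for (const name of url.searchParams.keys()) {
    if (SENSITIVE_PARAMS.test(name)) {
      // Signed or token-bearing links expire or leak access; never store them.
      return { kind: "reject", reason: `sensitive query parameter "${name}"` };
    }
  }
  return { kind: "keep", url: url.toString() };
}
```

Rejections go to the unsafe-row report, which gives the human redaction review in the acceptance criteria something concrete to check.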

### Scenario: Imported work is visible in the operator flow

Given: imported rows include overdue work, waiting follow-ups, recurring duties, missing proof, and process-doc links
When: the operator opens Operations Home, the task list, and relevant bundle/task detail views
Then: imported work appears in normal due/overdue/waiting/follow-up/recurring/workflow sections, proof blockers are visible, and process docs open from the task context.

## Out of Scope

- Connecting to live Google Sheets APIs, mutating production spreadsheets, deleting rows, changing spreadsheet statuses, or syncing bidirectionally with Google Sheets.
- Importing all historical completed spreadsheet rows as active runtime tasks by default.
- Rebuilding the app as a spreadsheet clone, grid editor, or separate spreadsheet-import dashboard.
- Migrating active Trello cards; that belongs to #41.
- Implementing full raw intake inbox behavior; #31 owns inbox modeling and triage for future manual/Telegram/email/import sources.
- Implementing the full V1 recurring-work strategy owned by #40 if it is still open; this issue covers only the migration-specific recurring mapping.
- Mapping every workflow type to fully curated process docs; #33, #36, #37, #38, #39, and future workflow mapping issues improve doc coverage.
- Creating new integrations with Slack, Airtable, Mailchimp, Dropbox, Google Drive, Google Calendar, Luma, Meetup, YouTube, Spotify, Apple Podcasts, Finom, Wise, Revolut, LinkedIn, X, email, or Telegram.
- Performing production DynamoDB writes, destructive restore drills, production backup creation, or external account checks during agent verification.
- Modifying `../dtc-operations`, `../datatasks`, `../podcast-assistant`, or any other source repository.

## Dependencies

- #15, #29, #48, and #50 are already closed and provide the workflow data model, artifact/proof model, portable export/data-safety contract, and waiting follow-up behavior this issue should use.
- #40 is useful for richer recurring-work UX. If it is not closed when this issue starts, keep this issue to migration-safe recurring config creation/suggestions using current primitives and avoid duplicating #40's broader dashboard/admin scope.
- #31 can later represent spreadsheet imports as raw intake items, but this issue should not require a separate inbox to make imported pending TODOs actionable in Work/Home.
- #33 and workflow mapping issues are useful for stable process-doc IDs. If a process doc cannot be resolved to a stable ID, preserve `instructionsUrl` and report the unresolved mapping.
- Any real production spreadsheet export/import requires human confirmation and must follow `docs/v1-execution-data-safety.md` and `docs/restore-drill.md` backup/export/restore guidance.

## Affected Areas

- `work-engine/scripts/migrate-data.ts` and any helper modules created for CSV/TODO migration.
- `work-engine/tests/migrate-data.test.ts` and new fixture-based migration tests.
- `work-engine/src/db/tasks.ts`, `work-engine/src/db/recurring.ts`, `work-engine/src/db/artifacts.ts`, `work-engine/src/db/notifications.ts`, and route validation only if existing fields cannot store migration-safe metadata.
- `work-engine/src/export/portable.ts`, `work-engine/scripts/export-execution-data.ts`, `work-engine/scripts/validate-execution-export.ts`, and `work-engine/scripts/dry-run-import.ts` if exported fields or validation rules need updates.
- `work-engine/src/public/app.js`, `work-engine/src/pages/index.html`, and Playwright specs only if imported work is not visible/actionable through existing operator surfaces.
- Process-doc resolution/search code only if implementation adds a resolver from spreadsheet/process-doc URLs to `instructionDocId`.
- Data safety/export docs only if implementation discovers a new durable entity, source-provenance field, or export rule not already covered by the V1 data-safety docs.

## Data Safety, Export, And Restore Implications

- Run a dry-run first for every real spreadsheet export. Production write mode must not run until a human reviews the dry-run summary, unresolved mappings, unsafe URLs, skipped rows, and planned entity counts.
- Create an on-demand DynamoDB backup and portable export before any production import write. Local agent scratch exports must use project-local `.tmp/exports/`.
- Store metadata and stable external/storage references only. Do not store Google Sheets credentials, OAuth tokens, cookies, API keys, signed URLs, session values, private credentials, spreadsheet binary exports, or large raw external document bodies in DynamoDB.
- Exported records must use stable application IDs and explicit relationships, not DynamoDB `PK`/`SK` internals.
- Source provenance must be bounded and redacted. It may include source filename/tab, row number, normalized source date/status, row hash/source ID, and short source text, but not raw secrets or entire private spreadsheet dumps.
- After local fixture import, run portable export validation and dry-run import validation to prove relationship integrity, redaction, date parseability, waiting-task requirements, proof requirements, artifacts/files, notifications, recurring configs, and audit/omitted entities.
- Production restore drills, destructive restore/import checks, and live spreadsheet access are `[HUMAN]` and must not be performed by agents unless explicitly authorized in a later issue.
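Bounded, redacted provenance can be produced by one constructor so every import path applies the same limits. A minimal sketch; the field names, the 280-character limit, and masking every URL in free text are assumptions for the schema review described under Blockers, and `buildProvenance` is a hypothetical name.

```typescript
// Hypothetical bounded provenance record; field names and limits are
// assumptions for the Architect's schema review.
interface SourceProvenance {
  sourceFile: string; // file name only, never an absolute local path
  rowNumber: number;
  sourceDate?: string;
  sourceStatus?: string;
  rowKey: string; // stable source key used for idempotency
  sourceText: string; // truncated, with URLs masked
}

const MAX_SOURCE_TEXT = 280;

function buildProvenance(
  filePath: string,
  rowNumber: number,
  rowKey: string,
  text: string,
  sourceDate?: string,
  sourceStatus?: string,
): SourceProvenance {
  // Strip directories so exports do not leak local paths.
  const sourceFile = filePath.split(/[\\/]/).pop() ?? filePath;
  // Mask URLs in free text; safe links are stored separately after checks.
  const masked = text.replace(/https?:\/\/\S+/g, "[link]").replace(/\s+/g, " ").trim();
  const sourceText =
    masked.length > MAX_SOURCE_TEXT ? `${masked.slice(0, MAX_SOURCE_TEXT - 1)}…` : masked;
  return { sourceFile, rowNumber, sourceDate, sourceStatus, rowKey, sourceText };
}
```

The record carries no DynamoDB keys, so it can be exported as-is next to the task's stable application ID.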

## Blockers

- Real production spreadsheet migration is blocked on `[HUMAN]` confirmation of the spreadsheet export source, tabs, export date, row count, redaction requirements, and target environment.
- Production DynamoDB writes are blocked on `[HUMAN]` approval of backup/export evidence, dry-run summary, unresolved mappings, skipped/error report, and rollback/restore plan.
- Full `instructionDocId` coverage may be blocked by unresolved stable process-doc IDs until #33 or workflow-specific mapping issues cover the remaining docs; this should not block a safe import when fallback `instructionsUrl` and unresolved mapping reports are present.
- If existing work-engine storage lacks a durable place for source provenance/audit events needed for idempotency and export safety, the Architect should review the minimal schema extension before SWE implementation.
- If #40 is still open and recurring behavior conflicts with the import plan, import should create a reviewed recurring suggestion/report or minimal current-model config rather than implementing a competing recurring-work product.

## Required Verification Commands

Run the work-engine checks because this issue changes migration logic, runtime entities, export safety, and possibly operator behavior:

```bash
npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
```

Run focused migration/export commands with local fixtures and project-local export directories. The exact fixture paths may differ after implementation, but verification must include dry-run, local write/import, export, export validation, and dry-run restore/import validation:

```bash
IS_LOCAL=true npm --prefix work-engine exec -- tsx scripts/migrate-data.ts --dry-run --csv-only --source-todo <fixture-todo.csv> --source-done <fixture-done.csv>
IS_LOCAL=true npm --prefix work-engine exec -- tsx scripts/migrate-data.ts --csv-only --source-todo <fixture-todo.csv> --source-done <fixture-done.csv>
npm --prefix work-engine run export:data -- .tmp/exports/spreadsheet-todos-fixture
npm --prefix work-engine run validate:export -- .tmp/exports/spreadsheet-todos-fixture
npm --prefix work-engine run dry-run:import -- .tmp/exports/spreadsheet-todos-fixture
```

If UI/operator surfaces change, run E2E and capture screenshots for Home/task/workflow states with imported tasks:

```bash
npm --prefix work-engine run test:e2e
```

If process-doc resolution/search changes, also run:

```bash
uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app
cd lambda-functions
uv run --extra search python -m lambda_functions.build_search_index \
  --docs-dir ../content \
  --output ../.tmp/dataops-content-search.index
```

Before handoff, include:

```bash
git diff --check
```

