Add offsite export archives and restore evidence for production data safety
Status: in progress
Tags: enhancement, backend, work-engine, infra, data, testing, docs, P0
Depends on: #48 (closed)
Blocks: None
Scope
Implement the production-ready offsite archive lane for DataOps V1 execution
data.
#48 already delivered the local portable export, export validation, dry-run
import, scheduled local export route, and restore-drill documentation. This
issue extends that foundation so production execution data can be exported to
durable offsite storage and later proven restorable without relying on Lambda
local files.
Affected areas:
work-engine/ TypeScript export and cron/admin export path.
lambda-functions/template.full.yaml and related deployment configuration
for SAM-owned export storage, environment variables, and least-privilege IAM.
Restore/export documentation in docs/v1-execution-data-safety.md and docs/restore-drill.md.
Tests for archive writing, S3/offsite storage behavior, restore evidence, and
production safety gates.
The implementation must stay aligned with the current V1 runtime architecture:
the public Python portal remains the only public entry point, WorkEngineFunction
stays private, runtime execution state stays in SAM-owned DynamoDB tables, and
portable exports remain application-level JSONL snapshots independent of
DynamoDB PK/SK internals.
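As a rough illustration of what "independent of DynamoDB PK/SK internals" could look like, the sketch below strips storage-only key attributes from an item before writing it as a JSONL line. The `StoredItem` shape, the `STORAGE_ONLY_KEYS` list, and the function names are hypothetical and are not taken from the work-engine export code.

```typescript
// Hypothetical stored item: application fields plus DynamoDB key attributes.
type StoredItem = Record<string, unknown>;

// Assumed storage-only attribute names; the real tables may use others.
const STORAGE_ONLY_KEYS = ["PK", "SK"];

// Copy only application-level fields so the record does not depend on table keys.
function toPortableRecord(item: StoredItem): Record<string, unknown> {
  const record: Record<string, unknown> = {};
  for (const key of Object.keys(item)) {
    if (STORAGE_ONLY_KEYS.indexOf(key) === -1) {
      record[key] = item[key];
    }
  }
  return record;
}

// Serialize as JSONL: one JSON object per line, each line newline-terminated.
function toPortableJsonl(items: StoredItem[]): string {
  return items.map((item) => JSON.stringify(toPortableRecord(item)) + "\n").join("");
}

const jsonl = toPortableJsonl([
  { PK: "TASK#t-1", SK: "META", id: "t-1", title: "Publish episode" },
  { PK: "TASK#t-2", SK: "META", id: "t-2", title: "Send newsletter" },
]);
```

Relationships between entities would then be expressed through application IDs such as `id`, so a target other than DynamoDB could import the same files.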
Acceptance Criteria
SAM/CloudFormation declares the offsite export storage needed for V1, or
wires to an explicitly named existing bucket through parameters, with
server-side encryption, public access blocked, versioning enabled,
production retain/deletion safety, lifecycle or retention rules, and tags
suitable for backup selection.
WorkEngineFunction receives export archive configuration through stack
parameters/environment variables and has least-privilege IAM for only the
required export archive prefix/actions. No production bucket, credential,
account ID, or secret is hardcoded.
The scheduled/admin export path can write a timestamped portable export
archive to offsite storage. The archive contains manifest.json and the
current portable JSONL entity files, including artifacts.jsonl, assistant_jobs.jsonl, and audit_events.jsonl when those entities are
emitted by the current export implementation. Entity omission must remain
explicit in the manifest when an entity is not implemented.
Archive object keys include environment and generation time, are stable
enough for retention/audit review, and avoid leaking private data in the
key name. The route/command response includes archive URI/key, generated_at, schema/export format version, entity counts, and checksum
summary, but does not return secrets, signed URLs, session tokens, or
private credentials.
Export archives preserve the portable export safety rules from docs/v1-execution-data-safety.md: no password hashes, live sessions,
API keys, OAuth tokens, cookies, signed temporary URLs, private
credentials, raw binary payloads, or DynamoDB-only key dependency in
normal exports.
The implementation writes restore evidence for a non-production drill:
source archive URI/key, app git SHA, export generated_at, manifest
checksum summary, validation result, dry-run import counts, skipped/invalid
record counts, target environment, timestamp, and smoke-check checklist
result. Test and local drill artifacts must live under project-local .tmp/exports/ paths.
A restore/drill command or documented workflow can fetch or read an
archive, extract the portable export, run validate:export, run dry-run:import, and produce the restore evidence
report without writing production data.
Production restore/import/write behavior is human-gated: automated cron,
admin export, validation, and dry-run paths must not mutate production
DynamoDB tables; any production restore/write action requires an explicit
human-run command or documented manual approval step before it can write.
The existing local export path remains usable: npm --prefix work-engine run export:data -- <export-dir>, validate:export, and dry-run:import still work for local/test archives.
Documentation explains the offsite archive location/prefix, retention
expectations, restore evidence format, safe local .tmp/exports/ drill
path, and the human gate for production restore/write operations.
[HUMAN] A production operator verifies in AWS that the deployed export
archive bucket/prefix is encrypted, versioned, private, retained/lifecycled
as specified, and receives at least one scheduled/admin export archive.
[HUMAN] Before production execution data is treated as critical, a human
runs the documented restore drill against a staging or isolated target and
attaches the restore evidence report to the issue.
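The object-key criterion above could be met in several ways. One possible sketch follows, assuming a key layout of `<prefix>/<environment>/<YYYY>/<MM>/<DD>/export-<UTC timestamp>.tar.gz`. The layout, the `.tar.gz` suffix, and the `buildArchiveKey` name are illustrative assumptions, not decisions recorded in this issue.

```typescript
// Build an archive key from environment and generation time only.
// No user, task, or other private data is placed in the key.
function buildArchiveKey(prefix: string, environment: string, generatedAt: Date): string {
  // Restrict the environment segment to a safe character set.
  if (!/^[a-z0-9-]+$/.test(environment)) {
    throw new Error(`unexpected environment name: ${environment}`);
  }
  const iso = generatedAt.toISOString(); // e.g. 2025-01-02T03:04:05.000Z
  const [year, month, day] = iso.slice(0, 10).split("-");
  // Compact UTC timestamp without "-", ":" or milliseconds, so keys sort by time.
  const stamp = iso.replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
  // Trim leading and trailing slashes from the configured prefix.
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, "");
  return `${cleanPrefix}/${environment}/${year}/${month}/${day}/export-${stamp}.tar.gz`;
}

const exampleKey = buildArchiveKey("/exports/", "staging", new Date("2025-01-02T03:04:05Z"));
// exampleKey === "exports/staging/2025/01/02/export-20250102T030405Z.tar.gz"
```

Date-partitioned keys keep lifecycle and audit review simple because a retention window maps onto a key prefix, though a flat layout would also satisfy the criterion.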
Test Scenarios
Scenario: scheduled export uploads an offsite archive
Given: the work-engine has export archive configuration and a mocked or local
S3-compatible storage client.
When: the scheduled/admin export path runs.
Then: it writes one timestamped archive under the configured environment/prefix,
returns archive metadata and manifest summary, and does not expose secrets or
signed URLs.
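A test for this scenario might look roughly like the sketch below, which uses an in-memory stand-in for the storage client. `ArchiveStore`, `InMemoryArchiveStore`, `uploadArchive`, and the response field names are hypothetical; a real implementation would presumably wrap an S3 client behind a similar interface.

```typescript
// Minimal storage abstraction so tests do not need AWS credentials or network access.
interface ArchiveStore {
  put(key: string, body: Uint8Array): Promise<void>;
}

// In-memory stand-in that records every object written to it.
class InMemoryArchiveStore implements ArchiveStore {
  readonly objects = new Map<string, Uint8Array>();
  async put(key: string, body: Uint8Array): Promise<void> {
    this.objects.set(key, body);
  }
}

interface ManifestSummary {
  generated_at: string;
  format_version: string;
  entity_counts: Record<string, number>;
}

interface ExportResult extends ManifestSummary {
  archive_key: string;
  size_bytes: number;
}

// Upload one archive and return metadata for the route/command response.
async function uploadArchive(
  store: ArchiveStore,
  key: string,
  archive: Uint8Array,
  manifest: ManifestSummary,
): Promise<ExportResult> {
  await store.put(key, archive);
  // Build the response from an explicit allowlist of fields instead of spreading
  // config or client objects, so credentials and signed URLs cannot leak into it.
  return {
    archive_key: key,
    size_bytes: archive.byteLength,
    generated_at: manifest.generated_at,
    format_version: manifest.format_version,
    entity_counts: manifest.entity_counts,
  };
}
```

A test can then assert that exactly one object exists under the expected prefix and that the serialized response contains only the allowlisted fields.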
Scenario: archive contents stay portable and redacted
Given: exported users, tasks, bundles, templates, recurring configs, files,
notifications, artifacts, assistant jobs, and audit events with realistic
relationships and sensitive fields.
When: the archive is extracted and validated.
Then: all emitted entity files pass validate:export, relationship checks pass,
normal exports exclude sensitive fields, and any omitted entity type is listed
explicitly in the manifest.
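The redaction and omission checks in this scenario could be approximated with helpers like the ones below. The deny-list of field names and the `entity_counts` manifest field are assumptions made for illustration; the authoritative safety rules are the ones in docs/v1-execution-data-safety.md.

```typescript
// Assumed deny-list of field names that must not appear in normal exports.
const FORBIDDEN_FIELDS = [
  "password_hash", "session_token", "api_key", "oauth_token", "cookie", "signed_url", "PK", "SK",
];

// Recursively collect dotted paths of forbidden field names in one parsed record.
function findForbiddenFields(value: unknown, path: string = ""): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findForbiddenFields(item, `${path}[${i}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      const childPath = path ? `${path}.${key}` : key;
      if (FORBIDDEN_FIELDS.indexOf(key) !== -1) hits.push(childPath);
      hits.push(...findForbiddenFields(obj[key], childPath));
    }
  }
  return hits;
}

// Scan the text of one JSONL file; blank lines are skipped but still counted,
// so reported line numbers match the file.
function scanJsonl(text: string): string[] {
  const violations: string[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return;
    for (const hit of findForbiddenFields(JSON.parse(line))) {
      violations.push(`line ${index + 1}: ${hit}`);
    }
  });
  return violations;
}

// Every expected entity must be either exported or listed in omitted_entities.
function unaccountedEntities(
  expected: string[],
  manifest: { entity_counts: Record<string, number>; omitted_entities: string[] },
): string[] {
  return expected.filter(
    (name) => !(name in manifest.entity_counts) && manifest.omitted_entities.indexOf(name) === -1,
  );
}
```

A name-based deny-list only catches fields that keep their sensitive names, so it complements rather than replaces the export code's own field selection.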
Scenario: restore evidence is generated without production writes
Given: a previously written archive.
When: the restore drill workflow validates the archive and runs dry-run import.
Then: it writes a restore evidence report with validation status, dry-run
would-write counts, skipped/invalid counts, checksum summary, target environment,
and smoke-check checklist, while making no writes to production DynamoDB tables.
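One possible shape for the restore evidence report is sketched below, using the fields listed in the acceptance criteria. The exact field names, the pass rule in `drillPassed`, and the file naming in `evidenceReportPath` are assumptions, not a format defined by this issue.

```typescript
interface SmokeCheck {
  name: string;
  passed: boolean;
}

// Hypothetical report shape covering the evidence fields named in this issue.
interface RestoreEvidence {
  source_archive: string; // archive URI or key
  app_git_sha: string;
  export_generated_at: string;
  manifest_checksum_summary: Record<string, string>;
  validation_result: "passed" | "failed";
  dry_run_import_counts: Record<string, number>; // would-write counts per entity
  skipped_records: number;
  invalid_records: number;
  target_environment: string;
  timestamp: string; // when the drill ran, ISO 8601 UTC
  smoke_checks: SmokeCheck[];
}

// Assumed pass rule: validation passed, no invalid records, all smoke checks green.
function drillPassed(evidence: RestoreEvidence): boolean {
  return (
    evidence.validation_result === "passed" &&
    evidence.invalid_records === 0 &&
    evidence.smoke_checks.every((check) => check.passed)
  );
}

// Local drill artifacts stay under the project-local .tmp/exports/ directory.
function evidenceReportPath(evidence: RestoreEvidence): string {
  const stamp = evidence.timestamp.replace(/[^0-9TZ]/g, "");
  return `.tmp/exports/restore-evidence-${evidence.target_environment}-${stamp}.json`;
}
```

Serializing the object with `JSON.stringify(evidence, null, 2)` would give a report that can be attached to the issue or summarized in the Tester handoff.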
Scenario: production restore remains human-gated
Given: production archive configuration is present.
When: cron/admin export, validation, or dry-run restore code runs automatically.
Then: the code cannot restore, import, overwrite, or delete production execution
records unless a separate explicit human-approved restore/write path is invoked.
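The human gate could be enforced with a small guard that every write path calls before touching a table. The sketch below is only one way to express it: the `Operation` and `RunContext` types, the `invokedBy` values, and the typed confirmation phrase are all hypothetical.

```typescript
type Operation = "export" | "validate" | "dry-run-import" | "restore-write";

interface RunContext {
  environment: string; // e.g. "production" or "staging"
  invokedBy: "cron" | "admin-route" | "human-cli";
  confirmation?: string; // phrase typed by the operator, if any
}

// Only operations in this list are treated as mutating.
const WRITE_OPERATIONS: Operation[] = ["restore-write"];

// Throws unless the operation is read-only or was explicitly approved by a human.
function assertAllowed(op: Operation, ctx: RunContext): void {
  if (WRITE_OPERATIONS.indexOf(op) === -1) {
    return; // export, validation, and dry-run never write, so they are always allowed
  }
  if (ctx.invokedBy !== "human-cli") {
    throw new Error(`${op} is not permitted from ${ctx.invokedBy}`);
  }
  // For production, additionally require an exact typed confirmation phrase.
  if (ctx.environment === "production" && ctx.confirmation !== "restore production") {
    throw new Error("production restore requires explicit typed confirmation");
  }
}
```

A guard like this is only a second line of defense. Keeping write permissions on production tables out of the cron and admin export roles would enforce the same rule at the IAM level.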
Required Verification
npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
Focused tests for archive upload, archive extraction/validation, restore
evidence generation, and production no-write safety.
sam validate --template-file lambda-functions/template.full.yaml --lint
or the repository's equivalent SAM validation command for the touched
template.
Docs link validation if restore/data-safety docs are edited:
uv run --project lambda-functions --extra search python -m lambda_functions.validate_docs_links \
  --repo-root . \
  --content-root content
uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app
if Python portal broker, deployment template assumptions, or docs-app behavior
are touched.
Attach or summarize the local restore evidence report path under .tmp/exports/ in the Tester handoff.
Out of Scope
Full Postgres or other target-database import tooling.
Destructive production restore, table replacement, table rename, or production
data repair automation.
Binary backup/storage for uploaded files or generated artifacts beyond the
portable export metadata archive. S3/object-storage backup for artifact
binaries remains a separate production storage issue.
UI dashboards for browsing export history, unless a minimal status endpoint is
needed for verification.
Changes to source repositories outside DataTalksClub/dataops, including ../dtc-operations, ../datatasks, and ../podcast-assistant.
Dependencies
[…] dry-run import, scheduled local export route, and restore-drill document.
[…] entities they add or migrate before implementation must be covered by the
archive or listed explicitly in manifest.json omitted_entities.
Production AWS verification requires a human with access to the deployed dataops-v1 account/stack. Agent work must stop at code, tests, local or
mocked archive evidence, and documented human checks.