Build the first unified search implementation for the DataOps operator portal. Search should help an operator execute daily work by finding the right task, workflow, SOP, template, artifact, or assistant output in context. It must not become a separate docs-first tool that leaves the operator to manually connect results back to the work they are doing.
Use the current architecture as the starting point:
lambda-functions/src/lambda_functions/docs_index.py, build_search_index.py, and /search currently index/search Markdown content with minsearch.
lambda-functions/src/lambda_functions/doc_registry.py, /docs/registry, and /docs/resolve expose document metadata and stable-ID resolution.
The authenticated portal brokers private work-engine APIs through /work/api/* in lambda-functions/src/lambda_functions/full_app_handler.py.
work-engine/ already stores runtime work entities: tasks, templates, bundles/workflows, artifacts, files, assistant jobs, notifications, and export metadata.
work-engine task/template records already support instructionDocId, instructionStepId, sourceDocIds, phase, systems, and validation; instantiated tasks preserve those fields.
The current frontend/ search UI searches documents only and renders document rows; the Operations Home/task panels already consume live work snapshots and can open tasks/workflows in context.
The implementation should add one operator-facing search experience that can return typed results from both process knowledge and live work state. Search results should make the result type and next action obvious:
SOP/template/reference/playbook/prompt/task-template docs open in the portal document view.
Task results open the task detail/action panel, including process-doc context when instructionDocId is present.
Workflow/bundle results open the workflow detail panel with active tasks, required proof, and linked docs.
Runtime template/workflow-definition results should make it possible to start or inspect the workflow, using existing quick-start flows where available.
Artifact/file results should show the owning task/workflow/assistant job and open the relevant context before exposing the raw URL/file.
Assistant job/output results should show status and owning task/workflow where those records exist; live assistant execution or new assistant lifecycle work is out of scope.
Start with an API/UI slice that covers docs, task templates, live tasks, active bundles/workflows, artifacts/files, and assistant-job/output metadata available from existing APIs. If a data source is not configured locally or the work engine is unavailable, the UI must show partial results and a clear unavailable-state for that source without fabricating matches.
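The typed-result idea above can be sketched as a small payload shape. This is a minimal illustration, not the portal's actual schema: the `SearchResult` class, the `PRIMARY_ACTION` names, and the task fields `id` and `title` are assumptions made for the example; only `instructionDocId` comes from the existing work-engine records.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from result type to the portal action it should trigger.
PRIMARY_ACTION = {
    "doc": "open_document",
    "task": "open_task_panel",
    "bundle": "open_workflow_panel",
    "template": "inspect_or_start_workflow",
    "artifact": "open_owner_context",
    "assistant_job": "open_owner_context",
}


@dataclass
class SearchResult:
    type: str
    id: str
    title: str
    source: str  # assumed labels such as "docs" or "work-engine"
    summary: str = ""
    routing: dict = field(default_factory=dict)

    @property
    def primary_action(self) -> str:
        # The result type alone decides which portal context opens.
        return PRIMARY_ACTION[self.type]


def task_result(task: dict) -> SearchResult:
    """Build a task result; carry process-doc context only when it exists."""
    routing = {"taskId": task["id"]}
    if task.get("instructionDocId"):
        routing["instructionDocId"] = task["instructionDocId"]
    return SearchResult(
        type="task",
        id=task["id"],
        title=task["title"],
        source="work-engine",
        routing=routing,
    )
```

Keeping the action lookup keyed by type means the frontend would not need per-source branching to decide what a click does.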
Jobs To Be Done Implications
When an operator starts the day, they can search for a topic such as "Mailchimp newsletter", "podcast document", or "Luma" and see both executable work and the SOPs/templates needed to finish it.
When an operator is inside a task or workflow, search helps answer "what doc/template/evidence do I need now?" rather than sending them to an unrelated documentation library.
When a workflow is at risk, search exposes nearby context: active bundle, overdue task, required artifact, linked SOP, and assistant output status.
When content quality is incomplete, search should surface missing/unresolved process-doc links as useful states, not hide the task or show a broken result.
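One way to treat missing or unresolved process-doc links as useful states is to return an explicit state alongside the task instead of dropping it. The sketch below assumes the registry can be represented as a plain dict keyed by stable doc ID; the real `/docs/resolve` response shape is not specified here, and the state names are illustrative.

```python
def doc_context(task: dict, registry: dict) -> dict:
    """Describe the task's process-doc link without hiding the task.

    `registry` is an assumed in-memory stand-in for stable-ID resolution.
    """
    doc_id = task.get("instructionDocId")
    if not doc_id:
        # The task has no process-doc link at all.
        return {"state": "missing"}
    doc = registry.get(doc_id)
    if doc is None:
        # The link exists but does not resolve; keep the ID for follow-up.
        return {"state": "unresolved", "docId": doc_id}
    return {"state": "resolved", "docId": doc_id, "title": doc["title"]}
```

The task result would still render in all three cases; only the process-doc badge changes.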
Affected Areas
Python Lambda docs portal/search backend: lambda-functions/src/lambda_functions/docs_index.py, search_handler.py, api_handler.py, full_app_handler.py, and related tests under tests/docs_app/.
Frontend portal shell: frontend/index.html, frontend/src/app.js, frontend/src/styles.css, and screenshots for search/task/workflow flows.
Work-engine APIs and models as needed for search payloads: work-engine/src/routes/*, work-engine/src/db/*, work-engine/src/types.ts, work-engine/src/export/portable.ts, and related unit/E2E tests.
Content/search index metadata under content/**, especially content/tasks/templates/*.md, but do not do broad content cleanup in this issue.
Acceptance Criteria
The portal exposes one authenticated search entry point that returns typed results for process docs/task-template docs plus available runtime work records: tasks, templates/workflows, bundles, artifacts/files, and assistant jobs/outputs.
Each result includes a stable type, id, display title, short summary/context, source label, relevance-friendly fields, and enough routing metadata for the frontend to open the correct portal context.
Search supports at least these filters where data exists: result type/source, doc type, domain, tag, system, task status, assignee, due-date bucket, bundle/workflow, and template/workflow type. Unsupported filters for a source should not break other sources.
Document results continue to use the registry/search index and preserve existing /search?q=...&doc_type=...&domain=... behavior for docs consumers.
Task results include due date/status, bundle/template relationship, assignee when available, required proof state, and instructionDocId/instructionStepId context when present.
Workflow/bundle results include stage/status, progress or active-task counts when available, next due/overdue context, required bundle links/artifacts summary, and related process-doc IDs when present.
Runtime template/task-template results make the distinction between Git-backed task-template docs and live work-engine templates clear, while linking both back to source docs through stable IDs when possible.
Artifact/file and assistant-job/output results show the owning task/workflow/assistant context and do not expose private raw storage paths as the primary action.
The frontend renders grouped or clearly typed results without making docs visually dominate work results; primary actions open the task/workflow/document context inside the portal.
Searching from the current sidebar/keyboard shortcut continues to work on desktop and mobile, and empty/error/partial-source states are clear without layout overlap.
If /work/api/* or the work-engine Lambda is unavailable, document search still works and the UI reports work-search unavailability without fake work results.
Existing task-to-process-doc links are reused; this issue must not introduce a second link model or a disconnected search-only document resolver.
Result payloads and UI states are covered by backend/unit tests and frontend/E2E tests for mixed docs/work results, filters, missing sources, and routing actions.
Search index generation still succeeds for content metadata changes, and work-engine tests still prove task/template process-doc metadata persists through creation, update, instantiation, export, and search result formatting where touched.
No source repositories outside DataTalksClub/dataops are modified.
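The partial-source criterion above could be met by querying each source independently and recording failures rather than raising. This is a sketch under assumptions: `unified_search`, the source names, and the response keys `results` and `unavailable` are hypothetical, and each source is modelled as a simple callable.

```python
from typing import Callable


def unified_search(query: str, sources: dict[str, Callable[[str], list[dict]]]) -> dict:
    """Query every source; a failing source is reported, never faked."""
    results: list[dict] = []
    unavailable: list[dict] = []
    for name, search in sources.items():
        try:
            results.extend(search(query))
        except Exception as exc:  # broad on purpose: any source failure is non-fatal
            unavailable.append({"source": name, "reason": str(exc)})
    return {"results": results, "unavailable": unavailable}
```

With this shape, document search keeps working when the work engine is down, and the UI can name the unavailable sources from the `unavailable` list.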
Test Scenarios
Scenario: Search returns executable work and knowledge together
Given: docs include a Mailchimp newsletter SOP/template and work-engine has an active Newsletter task and workflow
When: the operator searches for Mailchimp newsletter
Then: results include the SOP/template doc, the live task, and the active workflow with distinct result types and actions that open the right portal context.
Scenario: Task result opens with process-doc context
Given: a task has instructionDocId and instructionStepId that resolve through the document registry
When: the operator selects the task search result
Then: the task panel opens, shows the process-doc title/context, and offers an action to open the doc in the portal.
Scenario: Workflow result keeps execution context
Given: an active Podcast or Newsletter bundle has overdue tasks, missing required links, and related docs
When: the operator searches for the workflow name or a required link such as Luma
Then: the workflow result shows status/progress context and opens the workflow detail rather than a raw docs page.
Scenario: Artifact result routes through owner context
Given: an artifact or file is attached to a task or bundle
When: the operator searches for the artifact title or type
Then: the result shows the owner task/workflow and opens that context before exposing the artifact URL/file action.
Scenario: Assistant output search is metadata-only
Given: an assistant job/output record exists with task or bundle ownership
When: the operator searches for the assistant job title or output title
Then: the result shows assistant status and owner context without requiring live Telegram, Groq, Heru, or external assistant credentials.
Scenario: Partial source failure is visible
Given: the docs index is available but /work/api/tasks or another work source is unavailable
When: the operator searches
Then: document results still render, unavailable work sources are named in a non-blocking status, and no fake work results appear.
Scenario: Filters narrow mixed results
Given: a query matches docs, tasks, and artifacts
When: the operator applies filters such as type: task, status: waiting, system: mailchimp, or doc_type: sop
Then: matching sources narrow correctly and unsupported filters do not discard unrelated source errors silently.
Scenario: Mobile and keyboard search remain usable
Given: the operator is on mobile or uses the / or Cmd/Ctrl+K keyboard shortcut
When: they search and open a task/workflow/document result
Then: focus, routing, panel state, and result text fit without overlap or broken navigation.
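For the mixed-filter scenario above, one possible approach is to apply only the filters a source understands and report the rest, so an unsupported filter never fails the whole search. The `SUPPORTED_FILTERS` table and the exact-match comparison are assumptions for illustration. Whether a source should keep or drop its results when a filter does not apply to it is a product decision this sketch does not settle; here the filter is ignored for that source and surfaced to the caller.

```python
# Hypothetical per-source filter support.
SUPPORTED_FILTERS = {
    "docs": {"doc_type", "domain", "tag"},
    "tasks": {"status", "assignee"},
}


def apply_filters(source: str, records: list[dict], filters: dict) -> tuple[list[dict], list[str]]:
    """Return (matching records, filters this source ignored)."""
    supported = SUPPORTED_FILTERS.get(source, set())
    active = {k: v for k, v in filters.items() if k in supported}
    ignored = sorted(set(filters) - supported)
    kept = [r for r in records if all(r.get(k) == v for k, v in active.items())]
    return kept, ignored
```

Returning the ignored filter names lets the UI explain why a source was not narrowed instead of silently showing unfiltered rows.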
Out of Scope
Replacing minsearch with a hosted search service or vector database.
Building natural-language/RAG answers or assistant-generated search summaries.
Rebuilding the portal navigation, introducing a new frontend framework, or creating a standalone search app.
Full content cleanup, stable IDs for every imported doc, or workflow-specific doc mapping beyond what is needed to render search context.
External link availability checks for Google Docs, Loom, GitHub, Airtable, Luma, Meetup, YouTube, Spotify, Slack, Mailchimp, Dropbox, or email systems.
Mutating production task/workflow/artifact/assistant data as part of indexing. Runtime search should read existing state and metadata only.
New assistant lifecycle, transcription, Telegram intake, or generated artifact processing.
Consolidating the TypeScript work-engine into the Python backend.
Dependencies
Extend process docs with stable IDs #33 is required so workflow-critical docs and task-template docs have stable IDs that search can display and route through.
Existing /work/api/* broker behavior should be reused. If the work-engine Lambda is not configured locally, the implementation must still support tests with mocked/local work-engine data and a documented partial-source state.
Assistant output search depends only on metadata available from current assistant job/artifact/file APIs. Anything requiring external credentials is not part of this issue.
Data Safety And Export Implications
Search must treat Git-backed Markdown as process knowledge and DynamoDB/work-engine records as runtime execution state; do not copy runtime work state into Markdown or make content files the source of truth for live tasks.
Search indexing should be read-only for runtime work data. It must not update task status, assistant job status, artifact records, file records, or bundle links.
Private or sensitive artifact storage paths must not be exposed as the primary search result text. Prefer title/type/owner/context and existing authenticated open/download routes.
If a persisted runtime search index/cache is introduced, it must be documented as rebuildable derived data and excluded from portable execution exports unless there is a clear reason to include it.
Portable export behavior must continue to include canonical task/template/bundle/artifact/assistant metadata, including instructionDocId, instructionStepId, sourceDocIds, and owner links where already supported.
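The rule about not exposing private storage paths can be enforced by building artifact results from an allowlist of fields rather than copying the record. The field names below (`taskId`, `bundleId`, `kind`, `storagePath`) and the action names are assumptions for this sketch, not the work-engine schema.

```python
def artifact_result(artifact: dict) -> dict:
    """Build a search result from allowlisted fields only.

    Anything not copied explicitly (for example a raw storage path)
    cannot leak into the result payload.
    """
    if artifact.get("taskId"):
        owner = {"type": "task", "id": artifact["taskId"]}
    elif artifact.get("bundleId"):
        owner = {"type": "bundle", "id": artifact["bundleId"]}
    else:
        owner = None
    return {
        "type": "artifact",
        "id": artifact["id"],
        "title": artifact.get("title") or "Untitled artifact",
        "kind": artifact.get("kind"),
        "owner": owner,
        # Open the owning context first; the file action lives there.
        "primaryAction": "open_owner_context" if owner else "open_artifact_detail",
    }
```

An allowlist fails safe: a new sensitive field added to the record later stays out of search results unless someone deliberately includes it.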
Blockers
If there is no stable local way to query representative work-engine records through /work/api/*, add a narrow mocked/local test fixture rather than querying production data.
If assistant output records are not available by implementation time, keep the result type and UI state metadata-ready but mark live assistant-output search as blocked by the assistant job/artifact issue that owns those records.
Implement unified operator search across docs and work context
Status: pending
Tags: enhancement, portal, process-docs, work-engine, frontend, backend, testing, data, design, P1
Depends on: #33, #34
Blocks: None
Out of Scope (continued)
… instructionDocId; use existing fields and depend on Extend process docs with stable IDs #33 / Add internal link and related-doc validation #34 for the stable-ID and validation contract.
Dependencies (continued)
… related_docs, sourceDocIds, or instructionDocId references into trusted results.
Required Verification Commands And Screenshots
Run the docs/search workflow:
Build the search index when content metadata, registry, search, or routing is touched:
    cd lambda-functions
    uv run --extra search python -m lambda_functions.build_search_index \
      --docs-dir ../content \
      --output ../.tmp/dataops-content-search.index
Run work-engine checks when runtime work search payloads, APIs, models, or export behavior are touched:
    npm --prefix work-engine test
    npm --prefix work-engine run typecheck
    npm --prefix work-engine run build
Run E2E coverage for changed operator flows:
Capture and attach screenshots for:
Before handoff, include: