Implement unified operator search across docs and work context #32

Description

@alexeygrigorev

Status: pending
Tags: enhancement, portal, process-docs, work-engine, frontend, backend, testing, data, design, P1
Depends on: #33, #34
Blocks: None

Scope

Build the first unified search implementation for the DataOps operator portal. Search should help an operator execute daily work by finding the right task, workflow, SOP, template, artifact, or assistant output in context. It must not become a separate docs-first tool that leaves the operator to manually connect results back to the work they are doing.

Use the current architecture as the starting point:

  • lambda-functions/src/lambda_functions/docs_index.py, build_search_index.py, and /search currently index/search Markdown content with minsearch.
  • lambda-functions/src/lambda_functions/doc_registry.py, /docs/registry, and /docs/resolve expose document metadata and stable-ID resolution.
  • The authenticated portal brokers private work-engine APIs through /work/api/* in lambda-functions/src/lambda_functions/full_app_handler.py.
  • work-engine/ already stores runtime work entities: tasks, templates, bundles/workflows, artifacts, files, assistant jobs, notifications, and export metadata.
  • work-engine task/template records already support instructionDocId, instructionStepId, sourceDocIds, phase, systems, and validation; instantiated tasks preserve those fields.
  • The current frontend/ search UI searches documents only and renders document rows; the Operations Home/task panels already consume live work snapshots and can open tasks/workflows in context.

The implementation should add one operator-facing search experience that can return typed results from both process knowledge and live work state. Search results should make the result type and next action obvious:

  • SOP/template/reference/playbook/prompt/task-template docs open in the portal document view.
  • Task results open the task detail/action panel, including process-doc context when instructionDocId is present.
  • Workflow/bundle results open the workflow detail panel with active tasks, required proof, and linked docs.
  • Runtime template/workflow-definition results should make it possible to start or inspect the workflow, using existing quick-start flows where available.
  • Artifact/file results should show the owning task/workflow/assistant job and open the relevant context before exposing the raw URL/file.
  • Assistant job/output results should show status and owning task/workflow where those records exist; live assistant execution or new assistant lifecycle work is out of scope.
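
The type-to-action mapping above could live in one lookup table shared by the API and the UI. This is a minimal sketch, not the implementation: the result type names, the action labels, and the primary_action helper are all hypothetical and do not exist in the repository. Only instructionDocId comes from the existing work-engine records.

```python
# Hypothetical result types and portal actions; names are illustrative only.
PRIMARY_ACTION = {
    "doc": "open_document_view",
    "task": "open_task_panel",
    "workflow": "open_workflow_panel",
    "runtime_template": "inspect_or_start_workflow",
    "artifact": "open_owner_context",
    "assistant_job": "open_owner_context",
}


def primary_action(result: dict) -> dict:
    """Return the action the UI should offer first for a search result."""
    result_type = result.get("type")
    action = PRIMARY_ACTION.get(result_type)
    if action is None:
        raise ValueError(f"unknown result type: {result_type!r}")
    target = {"action": action, "id": result["id"]}
    # Tasks carry process-doc context only when instructionDocId is set.
    if result_type == "task" and result.get("instructionDocId"):
        target["instructionDocId"] = result["instructionDocId"]
    return target
```

Keeping the mapping in one place makes it harder for a new result type to ship without a defined next action.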

Start with an API/UI slice that covers docs, task templates, live tasks, active bundles/workflows, artifacts/files, and assistant-job/output metadata available from existing APIs. If a data source is not configured locally or the work engine is unavailable, the UI must show partial results and a clear unavailable state for that source without fabricating matches.
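
One way to get partial results is to query each source independently and collect failures next to the matches. The sketch below assumes a hypothetical search_all helper and plain callables standing in for the docs index and the /work/api/* broker; the two sample sources are fakes for illustration.

```python
from typing import Callable


def search_all(query: str, sources: dict[str, Callable[[str], list[dict]]]) -> dict:
    """Query each source independently and report failures instead of hiding them."""
    results: list[dict] = []
    unavailable: list[dict] = []
    for name, fetch in sources.items():
        try:
            results.extend(fetch(query))
        except Exception as exc:  # a failing source must not block the others
            unavailable.append({"source": name, "reason": str(exc)})
    return {"results": results, "unavailable": unavailable}


# Fake sources for illustration only.
def fake_docs(query: str) -> list[dict]:
    return [{"type": "doc", "id": "sop-1", "title": "Newsletter SOP"}]


def fake_work(query: str) -> list[dict]:
    raise ConnectionError("work engine unreachable")
```

With these fakes, search_all("newsletter", {"docs": fake_docs, "work": fake_work}) returns the document match plus one entry in unavailable naming the work source, which the UI can render as a non-blocking status.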

Jobs To Be Done Implications

  • When an operator starts the day, they can search for a topic such as Mailchimp newsletter, podcast document, or Luma and see both executable work and the SOPs/templates needed to finish it.
  • When an operator is inside a task or workflow, search helps answer "What doc/template/evidence do I need now?" rather than sending them to an unrelated documentation library.
  • When a workflow is at risk, search exposes nearby context: active bundle, overdue task, required artifact, linked SOP, and assistant output status.
  • When content quality is incomplete, search should surface missing/unresolved process-doc links as useful states, not hide the task or show a broken result.

Affected Areas

  • Python Lambda docs portal/search backend: lambda-functions/src/lambda_functions/docs_index.py, search_handler.py, api_handler.py, full_app_handler.py, and related tests under tests/docs_app/.
  • Frontend portal shell: frontend/index.html, frontend/src/app.js, frontend/src/styles.css, and screenshots for search/task/workflow flows.
  • Work-engine APIs and models as needed for search payloads: work-engine/src/routes/*, work-engine/src/db/*, work-engine/src/types.ts, work-engine/src/export/portable.ts, and related unit/E2E tests.
  • Content/search index metadata under content/**, especially content/tasks/templates/*.md, but do not do broad content cleanup in this issue.

Acceptance Criteria

  • The portal exposes one authenticated search entry point that returns typed results for process docs/task-template docs plus available runtime work records: tasks, templates/workflows, bundles, artifacts/files, and assistant jobs/outputs.
  • Each result includes a stable type, id, display title, short summary/context, source label, relevance-friendly fields, and enough routing metadata for the frontend to open the correct portal context.
  • Search supports at least these filters where data exists: result type/source, doc type, domain, tag, system, task status, assignee, due-date bucket, bundle/workflow, and template/workflow type. Unsupported filters for a source should not break other sources.
  • Document results continue to use the registry/search index and preserve existing /search?q=...&doc_type=...&domain=... behavior for docs consumers.
  • Task results include due date/status, bundle/template relationship, assignee when available, required proof state, and instructionDocId/instructionStepId context when present.
  • Workflow/bundle results include stage/status, progress or active-task counts when available, next due/overdue context, required bundle links/artifacts summary, and related process-doc IDs when present.
  • Runtime template/task-template results make the distinction between Git-backed task-template docs and live work-engine templates clear, while linking both back to source docs through stable IDs when possible.
  • Artifact/file and assistant-job/output results show the owning task/workflow/assistant context and do not expose private raw storage paths as the primary action.
  • The frontend renders grouped or clearly typed results without making docs visually dominate work results; primary actions open the task/workflow/document context inside the portal.
  • Searching from the current sidebar/keyboard shortcut continues to work on desktop and mobile, and empty/error/partial-source states are clear without layout overlap.
  • If /work/api/* or the work-engine Lambda is unavailable, document search still works and the UI reports work-search unavailability without fake work results.
  • Existing task-to-process-doc links are reused; this issue must not introduce a second link model or a disconnected search-only document resolver.
  • Result payloads and UI states are covered by backend/unit tests and frontend/E2E tests for mixed docs/work results, filters, missing sources, and routing actions.
  • Search index generation still succeeds for content metadata changes, and work-engine tests still prove task/template process-doc metadata persists through creation, update, instantiation, export, and search result formatting where touched.
  • No source repositories outside DataTalksClub/dataops are modified.
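
The shared result shape in the criteria above could look roughly like this. It is a sketch under stated assumptions: SearchResult, task_to_result, the source label, and the route keys are hypothetical. The task keys title, description, status, dueDate, assignee, and bundleId are guesses at the work-engine record shape; only instructionDocId and instructionStepId are confirmed by this issue.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SearchResult:
    type: str      # stable result type, e.g. "doc", "task", "workflow"
    id: str
    title: str
    summary: str
    source: str    # source label shown in the UI
    fields: dict = field(default_factory=dict)  # filterable, relevance-friendly fields
    route: dict = field(default_factory=dict)   # routing metadata for the frontend


def task_to_result(task: dict) -> SearchResult:
    """Map a work-engine task record to the shared result shape."""
    optional = ("status", "dueDate", "assignee", "bundleId",
                "instructionDocId", "instructionStepId")
    # Copy only the fields that are actually present on the record.
    present = {key: task[key] for key in optional if task.get(key) is not None}
    return SearchResult(
        type="task",
        id=task["id"],
        title=task["title"],
        summary=task.get("description", ""),
        source="work-engine",
        fields=present,
        route={"panel": "task", "taskId": task["id"]},
    )
```

asdict(task_to_result(record)) gives a JSON-ready payload. Copying only the fields that are present keeps "assignee when available" and similar criteria honest, because absent data is left out instead of being defaulted.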

Test Scenarios

Scenario: Search returns executable work and knowledge together

Given: docs include a Mailchimp newsletter SOP/template and work-engine has an active Newsletter task and workflow
When: the operator searches for Mailchimp newsletter
Then: results include the SOP/template doc, the live task, and the active workflow with distinct result types and actions that open the right portal context.

Scenario: Task result opens with process-doc context

Given: a task has instructionDocId and instructionStepId that resolve through the document registry
When: the operator selects the task search result
Then: the task panel opens, shows the process-doc title/context, and offers an action to open the doc in the portal.
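
A sketch of how the task panel might derive its process-doc context, including the unresolved state that keeps the task visible when a link is broken. The doc_context helper and its state names are hypothetical, and the registry argument is a plain dict standing in for whatever /docs/resolve returns.

```python
def doc_context(task: dict, registry: dict[str, dict]) -> dict:
    """Describe a task's process-doc link as linked, unresolved, or absent."""
    doc_id = task.get("instructionDocId")
    if not doc_id:
        return {"state": "none"}
    doc = registry.get(doc_id)
    if doc is None:
        # Keep the task visible; surface the broken link as a state.
        return {"state": "unresolved", "docId": doc_id}
    return {
        "state": "linked",
        "docId": doc_id,
        "title": doc["title"],
        "stepId": task.get("instructionStepId"),
    }
```

The panel can offer the open-doc action for the linked state and a warning for the unresolved state, reusing the existing link fields rather than a second link model.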

Scenario: Workflow result keeps execution context

Given: an active Podcast or Newsletter bundle has overdue tasks, missing required links, and related docs
When: the operator searches for the workflow name or a required link such as Luma
Then: the workflow result shows status/progress context and opens the workflow detail rather than a raw docs page.

Scenario: Artifact result routes through owner context

Given: an artifact or file is attached to a task or bundle
When: the operator searches for the artifact title or type
Then: the result shows the owner task/workflow and opens that context before exposing the artifact URL/file action.
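
Routing through the owner could be enforced where the result is built, by never copying the storage location into the payload. In this sketch artifact_to_result, the route keys, and the artifact keys taskId, bundleId, and storagePath are assumptions about the record shape, not confirmed work-engine fields.

```python
def artifact_to_result(artifact: dict) -> dict:
    """Build a search result that points at the owner, not the stored file."""
    if artifact.get("taskId"):
        owner = {"type": "task", "id": artifact["taskId"]}
    elif artifact.get("bundleId"):
        owner = {"type": "workflow", "id": artifact["bundleId"]}
    else:
        owner = None  # orphaned artifact: show it, but with no owner context
    return {
        "type": "artifact",
        "id": artifact["id"],
        "title": artifact["title"],
        "owner": owner,
        # Primary action opens the owner panel; storagePath is never copied.
        "route": {"open": owner, "focusArtifactId": artifact["id"]},
    }
```

The raw file action then stays behind the existing authenticated open/download routes inside the owner panel.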

Scenario: Assistant output search is metadata-only

Given: an assistant job/output record exists with task or bundle ownership
When: the operator searches for the assistant job title or output title
Then: the result shows assistant status and owner context without requiring live Telegram, Groq, Heru, or external assistant credentials.

Scenario: Partial source failure is visible

Given: the docs index is available but /work/api/tasks or another work source is unavailable
When: the operator searches
Then: document results still render, unavailable work sources are named in a non-blocking status, and no fake work results appear.

Scenario: Filters narrow mixed results

Given: a query matches docs, tasks, and artifacts
When: the operator applies filters such as type: task, status: waiting, system: mailchimp, or doc_type: sop
Then: matching sources narrow correctly, filters a source does not support do not break that source, and errors from other sources are still reported rather than silently dropped.
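
Per-source filter handling might be sketched as a split into applied and ignored filters, so each source only sees keys it understands. SUPPORTED_FILTERS, its contents, and split_filters are hypothetical. Whether a source with ignored filters should still return results is a product decision this sketch leaves open.

```python
# Hypothetical map of which filter keys each source understands.
SUPPORTED_FILTERS = {
    "docs": {"type", "doc_type", "domain", "tag", "system"},
    "tasks": {"type", "status", "assignee", "system", "due_bucket"},
    "artifacts": {"type"},
}


def split_filters(source: str, filters: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return the filters a source can apply and the names of those it cannot."""
    supported = SUPPORTED_FILTERS.get(source, set())
    applied = {key: value for key, value in filters.items() if key in supported}
    ignored = sorted(key for key in filters if key not in supported)
    return applied, ignored
```

Returning the ignored keys lets the API report them per source instead of failing the whole request.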

Scenario: Mobile and keyboard search remain usable

Given: the operator is on mobile or uses the / or Cmd/Ctrl+K keyboard shortcut
When: they search and open a task/workflow/document result
Then: focus, routing, panel state, and result text fit without overlap or broken navigation.

Out of Scope

  • Replacing minsearch with a hosted search service or vector database.
  • Building natural-language/RAG answers or assistant-generated search summaries.
  • Rebuilding the portal navigation, introducing a new frontend framework, or creating a standalone search app.
  • Migrating all legacy Google Docs links to stable instructionDocId; use existing fields and depend on #33 (Extend process docs with stable IDs) and #34 (Add internal link and related-doc validation) for the stable-ID and validation contract.
  • Full content cleanup, stable IDs for every imported doc, or workflow-specific doc mapping beyond what is needed to render search context.
  • External link availability checks for Google Docs, Loom, GitHub, Airtable, Luma, Meetup, YouTube, Spotify, Slack, Mailchimp, Dropbox, or email systems.
  • Mutating production task/workflow/artifact/assistant data as part of indexing. Runtime search should read existing state and metadata only.
  • New assistant lifecycle, transcription, Telegram intake, or generated artifact processing.
  • Consolidating the TypeScript work-engine into the Python backend.

Dependencies

  • #33 Extend process docs with stable IDs
  • #34 Add internal link and related-doc validation

Data Safety And Export Implications

  • Search must treat Git-backed Markdown as process knowledge and DynamoDB/work-engine records as runtime execution state; do not copy runtime work state into Markdown or make content files the source of truth for live tasks.
  • Search indexing should be read-only for runtime work data. It must not update task status, assistant job status, artifact records, file records, or bundle links.
  • Private or sensitive artifact storage paths must not be exposed as the primary search result text. Prefer title/type/owner/context and existing authenticated open/download routes.
  • If a persisted runtime search index/cache is introduced, it must be documented as rebuildable derived data and excluded from portable execution exports unless there is a clear reason to include it.
  • Portable export behavior must continue to include canonical task/template/bundle/artifact/assistant metadata, including instructionDocId, instructionStepId, sourceDocIds, and owner links where already supported.

Blockers

  • Engineering should not start until #33 (Extend process docs with stable IDs) and #34 (Add internal link and related-doc validation) are closed, or the implementer explicitly downgrades unresolved doc IDs/link validation to warning states with PM approval.
  • If there is no stable local way to query representative work-engine records through /work/api/*, add a narrow mocked/local test fixture rather than querying production data.
  • If assistant output records are not available by implementation time, keep the result type and UI state metadata-ready but mark live assistant-output search as blocked by the assistant job/artifact issue that owns those records.

Required Verification Commands And Screenshots

Run the docs/search workflow:

uv run --project lambda-functions --extra search --with pytest python -m pytest tests/docs_app

Build the search index when content metadata, registry, search, or routing is touched:

cd lambda-functions
uv run --extra search python -m lambda_functions.build_search_index \
  --docs-dir ../content \
  --output ../.tmp/dataops-content-search.index

Run work-engine checks when runtime work search payloads, APIs, models, or export behavior are touched:

npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build

Run E2E coverage for changed operator flows:

npm --prefix work-engine run test:e2e

Capture and attach screenshots for:

  • Mixed search results showing docs plus tasks/workflows.
  • A task result opened from search with process-doc context visible.
  • A workflow/bundle result opened from search with active-task/proof context visible.
  • Partial-source unavailable state when work search cannot load.
  • Mobile search results and opened context panel.

Before handoff, include:

git diff --check
