PRD: Run workspace usability rewrite #14

@quinCode

Problem Statement

The React rewrite has achieved the intended backend/frontend architecture, but the current UI is not usable as an operator tool. The first screen exposes internal feature modules as equal cards, shows English/internal labels, allows long model names to overflow, blocks the whole screen while capability discovery loads, and gives weak feedback for validation, run start, task progress, and result review.

From the operator's perspective, the new UI feels like an implementation scaffold rather than a safety-test workspace. It is slower to understand than the previous native UI, visually uncomfortable, and too easy to lose the relationship between configuration, validation, running tasks, results, reports, and reusable configuration.

Solution

Rebuild the React frontend around a Chinese-first run workspace. The first screen should compose configuration, validation, progress, run records, results, reports, and reuse into one coherent operator console.

The visible UI will no longer expose Configuration, Runs, Tasks, and Results as top-level feature cards. Feature modules remain an internal implementation boundary, but the operator sees workflow regions:

  • 运行配置 (Run configuration): compact run setup with searchable controls and grouped advanced parameters.
  • 运行状态 (Run status): validation, start/cancel, sample-level progress, recent events, and debug drawer.
  • 运行记录 (Run records): task history, reusable configurations, results, report links, and record detail.

The workspace shell must render immediately. Core capability discovery, catalog loading, schema loading, result listing, and run record refreshes are local loading states; none may blank the whole page.
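As a minimal sketch of what "local loading states" could look like, each request can carry its own small state value, and the top bar can derive its label from the contract request alone. The `Loadable` type, the function names, the "Core 已连接" label, and the guidance wording are assumptions for illustration; only the pending and failure labels come from this PRD.

```typescript
// Hypothetical per-request state; every panel owns one of these.
type Loadable<T> =
  | { status: "loading" }
  | { status: "ready"; data: T }
  | { status: "error"; message: string };

interface TopBarStatus {
  label: string;
  tone: "pending" | "ok" | "error";
  guidance?: string;
}

// Derives the top-bar Core status from the contract request only.
// The shell itself does not wait on this value before rendering.
function coreTopBarStatus(contract: Loadable<unknown>): TopBarStatus {
  switch (contract.status) {
    case "loading":
      return { label: "正在连接 Core...", tone: "pending" };
    case "ready":
      // Assumed label ("Core connected"); the PRD does not specify it.
      return { label: "Core 已连接", tone: "ok" };
    case "error":
      return {
        label: "Core 暂不可用",
        tone: "error",
        // Assumed wording: "Please check LLMSTP_ROOT, LLMSTP_PYTHON, or the core CLI."
        guidance: "请检查 LLMSTP_ROOT、LLMSTP_PYTHON 或 core CLI。",
      };
  }
}

// Core-dependent controls stay mounted but disabled until the contract is ready.
function coreControlsEnabled(contract: Loadable<unknown>): boolean {
  return contract.status === "ready";
}
```

With this shape, a slow or failed contract call changes only the top-bar label and the disabled state of Core-dependent controls, while the surrounding panels stay mounted.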

The UI will use React capabilities deliberately: a central RunDraft state model, Context plus reducer, local non-sensitive draft persistence, generated API client boundaries, local loading/error states, lightweight toasts, and reusable UI primitives under a project-owned frontend/src/ui layer.
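The central RunDraft model could be sketched as a pure reducer, which React's `useReducer` can consume and a Context provider can expose. Field names, action names, and the two-value `validation` flag below are hypothetical; the sketch only illustrates the rule that every draft change makes validation stale and that start is gated on fresh validation.

```typescript
type CoreChoiceKey = "endpoint" | "attack" | "input" | "output";

interface RunDraft {
  endpoint: string | null;
  attack: string | null;
  judges: string[];
  input: string | null;
  output: string | null;
  overrides: Record<string, unknown>;
  validation: "stale" | "valid";
}

// Actions describe operator intent rather than raw field writes.
type RunDraftAction =
  | { type: "selectCoreChoice"; key: CoreChoiceKey; value: string | null }
  | { type: "selectJudges"; judges: string[] }
  | { type: "editOverride"; name: string; value: unknown }
  | { type: "loadReusableConfig"; draft: Omit<RunDraft, "validation"> }
  | { type: "validationSucceeded" }
  | { type: "invalidateValidation" }
  | { type: "resetAdvanced" }
  | { type: "clearDraft" };

const emptyDraft: RunDraft = {
  endpoint: null,
  attack: null,
  judges: [],
  input: null,
  output: null,
  overrides: {},
  validation: "stale",
};

function runDraftReducer(state: RunDraft, action: RunDraftAction): RunDraft {
  switch (action.type) {
    case "selectCoreChoice": {
      const next: RunDraft = { ...state, validation: "stale" };
      next[action.key] = action.value;
      return next;
    }
    case "selectJudges":
      return { ...state, judges: [...action.judges], validation: "stale" };
    case "editOverride":
      return {
        ...state,
        overrides: { ...state.overrides, [action.name]: action.value },
        validation: "stale",
      };
    case "loadReusableConfig":
      // Reused configurations must be validated again before start.
      return { ...action.draft, validation: "stale" };
    case "validationSucceeded":
      return { ...state, validation: "valid" };
    case "invalidateValidation":
      return { ...state, validation: "stale" };
    case "resetAdvanced":
      // Clears overrides but keeps the core choices.
      return { ...state, overrides: {}, validation: "stale" };
    case "clearDraft":
      return emptyDraft;
  }
}

// The start button stays visible; this only controls its disabled state.
function canStart(draft: RunDraft): boolean {
  return draft.validation === "valid";
}
```

Keeping the invalidation inside the reducer means no component has to remember to reset validation after an edit.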

User Stories

  1. As a safety-test operator, I want the page to show the workspace immediately, so that I am not blocked by a blank "Loading capabilities..." screen while Core connects.
  2. As a safety-test operator, I want Core connection state in the top bar, so that I can tell whether the environment is ready.
  3. As a safety-test operator, I want Core connection failures to show a Chinese recovery message, so that I know to check LLMSTP_ROOT, LLMSTP_PYTHON, or the core CLI.
  4. As a safety-test operator, I want the main screen organized around a run workflow, so that I can configure, validate, start, monitor, and inspect a run without jumping between module cards.
  5. As a safety-test operator, I want the UI labels, states, buttons, empty messages, and errors in Chinese, so that the tool reads like an operations platform.
  6. As a safety-test operator, I want technical terms to include short direct descriptions where useful, so that I understand what an endpoint, attack, judge, input, or report action means.
  7. As a safety-test operator, I want searchable endpoint selection, so that I can quickly find a long model name without scanning an overflowing list.
  8. As a safety-test operator, I want searchable attack selection, so that I can quickly choose the attack strategy.
  9. As a safety-test operator, I want searchable judge selection with multi-select behavior, so that I can choose one or more judges without awkward list controls.
  10. As a safety-test operator, I want searchable input selection, so that I can find the correct dataset or CSV file quickly.
  11. As a safety-test operator, I want selection options to show a main label, short description, and badges, so that I can distinguish similar endpoints, attacks, judges, and inputs.
  12. As a safety-test operator, I want long labels and paths to truncate or wrap intentionally, so that text never covers adjacent panels.
  13. As a safety-test operator, I want core run choices to remain visible, so that I always know which endpoint, attack, judges, input, and output are active.
  14. As a safety-test operator, I want advanced schema fields collapsed by default, so that the first screen stays scannable.
  15. As a safety-test operator, I want advanced fields grouped by purpose, so that model parameters, run parameters, environment-related fields, and overrides are easier to navigate.
  16. As a safety-test operator, I want collapsed advanced sections to show 已修改 N 项 ("N items modified"), so that I know hidden settings differ from defaults.
  17. As a safety-test operator, I want a 重置高级参数 ("Reset advanced parameters") action, so that I can clear overrides without losing core choices.
  18. As a safety-test operator, I want a 清空草稿 ("Clear draft") action, so that I can return the workspace to defaults and remove restored draft state.
  19. As a safety-test operator, I want the current run draft restored after refresh when it is safe, so that accidental refreshes do not force me to reselect everything.
  20. As a safety-test operator, I want sensitive fields excluded from restored drafts, so that API keys, tokens, secrets, passwords, and credentials are not stored in browser storage.
  21. As a safety-test operator, I want restored drafts to require validation again, so that stale validation cannot enable an unsafe start.
  22. As a safety-test operator, I want validation feedback beside the run draft, so that I can see whether the current configuration is ready.
  23. As a safety-test operator, I want 开始运行 ("Start run") always visible but disabled until validation succeeds, so that the next action is clear even before the run is ready.
  24. As a safety-test operator, I want any configuration change to invalidate the previous validation, so that I cannot start with stale validation results.
  25. As a safety-test operator, I want button clicks to show feedback quickly, so that I know validation, start, cancel, reset, clear, reuse, and report actions were received.
  26. As a safety-test operator, I want lightweight toast notifications for short successful actions, so that feedback does not disturb the layout.
  27. As a safety-test operator, I want errors shown both as toast and inline panel messages, so that I can recover without losing context.
  28. As a safety-test operator, I want long-running states shown in panels or the top bar, so that transient toasts do not hide important status.
  29. As a safety-test operator, I want sample-level progress by default, so that I can see completed, failed, and total counts during a run.
  30. As a safety-test operator, I want the current sample prompt preview visible, so that I know what the run is processing.
  31. As a safety-test operator, I want recent events visible, so that I can understand progress without opening a full log.
  32. As a safety-test operator, I want a debug drawer for sample tables and event logs, so that I can diagnose failures without cluttering normal operation.
  33. As a safety-test operator, I want task history, results, reports, and reuse configuration grouped as run records, so that I do not manually correlate separate lists.
  34. As a safety-test operator, I want each run record to show status, configuration summary, progress, result availability, report links, and reuse actions, so that previous runs are understandable.
  35. As a safety-test operator, I want result tables reachable from the relevant run record, so that result inspection stays connected to the run that produced it.
  36. As a safety-test operator, I want report generation feedback and links on the run record, so that generated reports are easy to find.
  37. As a safety-test operator, I want the UI to feel like a light experiment console, so that parameters, status, progress, and results are easy to scan.
  38. As a safety-test operator, I want the UI to avoid black terminal styling, marketing hero sections, decorative gradients, and large card piles, so that it remains calm and task-focused.
  39. As a keyboard user, I want comboboxes and overlays to support keyboard navigation and focus management, so that common workflows are fast and accessible.
  40. As a maintainer, I want feature modules to consume shared UI primitives, so that controls are consistent across the workspace.
  41. As a maintainer, I want feature modules prevented from importing headless primitive packages directly, so that the project-owned UI layer remains the only visual/interaction surface.
  42. As a maintainer, I want a central RunDraft reducer, so that validation freshness, draft persistence, reset, clear, reuse, and start gating are not scattered across components.
  43. As a maintainer, I want catalog display mapping isolated in one adapter, so that backend catalog contract changes only affect the adapter and not every combobox.
  44. As a maintainer, I want components to consume OptionViewModel, so that combobox rendering is stable even if catalog metadata evolves.
  45. As a maintainer, I want tests proving the shell renders before contract discovery completes, so that the blank loading screen cannot return.
  46. As a maintainer, I want tests proving local loading states do not block the whole workspace, so that slow catalog/schema/results calls are isolated.
  47. As a maintainer, I want tests proving long labels do not overflow adjacent panels, so that visual regressions are caught.
  48. As a maintainer, I want tests proving primary UI text is Chinese and internal feature module names are not top-level navigation, so that implementation vocabulary does not leak to operators.
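Stories 19 to 21 (safe draft restore, sensitive-field exclusion, and re-validation after restore) could be sketched as below. The storage key, function names, and the substring-matching rule are assumptions; `StorageLike` is a hypothetical interface that mirrors the subset of the browser `Storage` API used here, so `window.localStorage` can be passed in directly.

```typescript
// Name fragments listed in this PRD as indicating a sensitive field.
const SENSITIVE_MARKERS = ["api_key", "token", "secret", "password", "credential"];

function isSensitiveFieldName(name: string): boolean {
  const lowered = name.toLowerCase();
  return SENSITIVE_MARKERS.some((marker) => lowered.includes(marker));
}

// Recursively drops sensitive keys from plain objects and arrays.
function stripSensitive(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripSensitive);
  if (value !== null && typeof value === "object") {
    const cleaned: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (!isSensitiveFieldName(key)) cleaned[key] = stripSensitive(inner);
    }
    return cleaned;
  }
  return value;
}

// Subset of the browser Storage interface.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const DRAFT_STORAGE_KEY = "run-draft"; // hypothetical key

function saveDraft(storage: StorageLike, draft: Record<string, unknown>): void {
  storage.setItem(DRAFT_STORAGE_KEY, JSON.stringify(stripSensitive(draft)));
}

// A restored draft is always reported as needing validation again.
function restoreDraft(
  storage: StorageLike,
): { draft: Record<string, unknown>; validation: "stale" } | null {
  const raw = storage.getItem(DRAFT_STORAGE_KEY);
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
    return { draft: parsed as Record<string, unknown>, validation: "stale" };
  } catch {
    return null; // Corrupt storage content is treated as "no draft".
  }
}

function clearStoredDraft(storage: StorageLike): void {
  storage.removeItem(DRAFT_STORAGE_KEY);
}
```

Plain substring matching is deliberately conservative: it would also strip a non-sensitive field such as `max_tokens`, so a real implementation may need an explicit allow-list for known-safe names.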

Implementation Decisions

  • The first screen is a run workspace, not a grid of feature module cards.
  • The visible workspace is organized into 运行配置, 运行状态, and 运行记录.
  • Feature modules remain implementation boundaries but are not user-facing top-level navigation.
  • The UI is Chinese-first. Operator-facing labels, helper text, empty states, validation messages, errors, and action names are Chinese.
  • Technical terms can remain when they are domain names, but visible terms should include short direct descriptions where they help the operator act correctly.
  • The visual style is a light experiment console: light background, restrained status colors, compact controls, readable contrast, small radii, and a dense but calm three-column layout.
  • The UI must not use black terminal aesthetics, marketing-style hero sections, decorative gradients, or large piles of unrelated cards.
  • The workspace shell renders before /api/v1/contract completes.
  • Core discovery, catalog loading, schema loading, result listing, run record refresh, validation, and start each have their own local state; none is a global blocking state.
  • Core discovery pending shows a top-bar state such as 正在连接 Core... ("Connecting to Core..."); Core discovery failure shows Core 暂不可用 ("Core temporarily unavailable") with recovery guidance.
  • Core-dependent controls show local placeholders while disabled, rather than replacing the whole app.
  • Core run choices are always visible: endpoint, attack, judges, input, and output.
  • Core selection controls use searchable comboboxes with keyboard support.
  • Comboboxes consume a stable OptionViewModel with id, label, optional description, badges, and searchText.
  • OptionViewModel is derived on the frontend in a single catalog adapter module.
  • Workspace components and comboboxes do not read raw catalog display fields such as backend, class names, or file formats directly.
  • Generated TypeScript catalog types are the input to the catalog adapter, so backend contract changes fail at the adapter boundary first.
  • Schema-driven advanced fields are grouped and collapsed by default.
  • Collapsed advanced sections show 已修改 N 项.
  • The first version includes 重置高级参数 and 清空草稿.
  • A central RunDraft state model holds core choices, advanced overrides, validation result, validation freshness, and reuse-loaded values.
  • RunDraft state uses React Context plus reducer. No external state library is introduced for the first usability rewrite.
  • Reducer actions represent operator intent, including selecting core choices, editing overrides, loading reusable config, marking validation success, invalidating validation, resetting advanced fields, and clearing the draft.
  • Non-sensitive run draft fields may be persisted to browser storage.
  • Sensitive fields whose names indicate api_key, token, secret, password, or credential are stripped before persistence.
  • Restored drafts always have stale validation and cannot enable start until validation succeeds again.
  • 开始运行 remains visible but disabled until the current draft validates successfully.
  • Any change to core choices or advanced overrides invalidates the previous validation.
  • Short successful operations use lightweight toast feedback.
  • Errors use toast plus inline panel messages.
  • Long-running states belong in the top status bar or workflow panels, not transient toasts.
  • Shared UI primitives live under a project-owned UI layer, including Button, Field, Combobox, Panel, Badge, Drawer, Toast, and StatusDot.
  • Feature modules consume the project-owned UI layer and must not import the underlying headless primitive packages directly.
  • The selected UI primitive approach is headless primitives plus custom project styles, not a full visual component framework.
  • Run progress defaults to sample-level visibility with status, progress, completed/failed/total counts, current sample preview, and recent events.
  • Detailed sample tables and event logs belong in an expandable debug drawer.
  • Task history, results, reports, and configuration reuse are presented as run records.
  • A run record is the operator-facing aggregate for one run: status, configuration summary, progress events, result table, report links, and reusable configuration.
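The OptionViewModel and catalog adapter decisions above could be sketched as follows. `CatalogEndpoint` is a hypothetical stand-in for a generated catalog type, and its field names are assumptions; the `OptionViewModel` fields follow the list in this PRD.

```typescript
// Stable view model consumed by comboboxes.
interface OptionViewModel {
  id: string;
  label: string;
  description?: string;
  badges: string[];
  searchText: string;
}

// Hypothetical generated catalog type; real field names may differ.
interface CatalogEndpoint {
  id: string;
  display_name?: string;
  backend: string;
  description?: string;
}

// The adapter is the only place that reads raw catalog display fields.
function toEndpointOption(endpoint: CatalogEndpoint): OptionViewModel {
  const label = endpoint.display_name ?? endpoint.id;
  const badges = [endpoint.backend];
  const option: OptionViewModel = {
    id: endpoint.id,
    label,
    badges,
    // Lowercased once here so filtering stays a cheap substring check.
    searchText: [label, endpoint.id, endpoint.description ?? "", ...badges]
      .join(" ")
      .toLowerCase(),
  };
  if (endpoint.description !== undefined) option.description = endpoint.description;
  return option;
}

// Combobox filtering works on the view model only.
function filterOptions(options: OptionViewModel[], query: string): OptionViewModel[] {
  const normalized = query.trim().toLowerCase();
  if (normalized === "") return options;
  return options.filter((option) => option.searchText.includes(normalized));
}
```

If the generated type changes, the compile error surfaces in `toEndpointOption` rather than in every combobox, which is the boundary behavior the decisions above ask for.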

Testing Decisions

  • Tests should verify observable user behavior and architecture boundaries, not component implementation details.
  • Existing React feature tests that mock the generated API client are the main seam for workspace behavior.
  • Existing architecture tests are the main seam for boundary rules, including feature import boundaries and generated client usage.
  • New tests should prove the workspace shell renders while contract discovery is pending.
  • New tests should prove a Core discovery failure leaves the workspace mounted and shows a Chinese recoverable status.
  • New tests should prove catalog, schema, result, and run record loading states are local to their panels.
  • New tests should prove internal feature module names do not appear as top-level navigation.
  • New tests should prove primary operator-facing labels and key states are Chinese.
  • New tests should prove comboboxes render OptionViewModel labels, descriptions, and badges without consuming raw catalog display fields in components.
  • New tests should prove long option labels and paths do not overflow adjacent panels. DOM layout assertions or Playwright screenshots are acceptable.
  • RunDraft reducer tests should cover selection changes, advanced override changes, validation invalidation, validation success, reset advanced, clear draft, reuse load, sensitive persistence stripping, and restored stale validation.
  • UI interaction tests should cover selecting configuration, validating, seeing start enabled, changing config, seeing start disabled again, starting a run, monitoring sample-level progress, opening the debug drawer, and viewing a run record result.
  • Toast behavior should be tested through visible feedback, while durable errors must also be asserted inline in the relevant panel.
  • Architecture tests should prevent feature modules from importing headless primitive packages directly.
  • Architecture tests should prevent raw catalog display fields from spreading outside the catalog adapter and generated API client.
  • The PRD accepts visual review as useful but insufficient. The key usability decisions must be locked through tests.
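The import-boundary architecture test could rest on a small pure helper like the one below, which a test runner would apply to each feature module's source text. This is a sketch under stated assumptions: the package name `@example/headless` is a placeholder because this PRD does not name the headless primitive library, and the regular expression is a simple scan, not a full parser (it would also match import statements inside comments).

```typescript
// Collects module specifiers from static imports, re-exports, and dynamic import() calls.
function findImportSpecifiers(source: string): string[] {
  const pattern =
    /(?:import|export)\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]|import\(\s*['"]([^'"]+)['"]\s*\)/g;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1] ?? match[2];
    if (specifier !== undefined) specifiers.push(specifier);
  }
  return specifiers;
}

// Returns the specifiers that hit a forbidden package or one of its subpaths.
function findForbiddenImports(source: string, forbiddenPackages: string[]): string[] {
  return findImportSpecifiers(source).filter((specifier) =>
    forbiddenPackages.some(
      (pkg) => specifier === pkg || specifier.startsWith(pkg + "/"),
    ),
  );
}
```

An architecture test would read every file under the feature modules directory, call `findForbiddenImports` with the real headless package names, and fail with the offending file and specifier when the result is non-empty.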

Out of Scope

  • Replacing the existing FastAPI backend architecture.
  • Changing the LLMSTP-core UI contract for this usability rewrite unless a needed frontend behavior cannot be supported otherwise.
  • Adding multi-user accounts, authentication, or authorization.
  • Adding remote queues or distributed execution.
  • Adding a plugin marketplace or dynamic third-party module loading.
  • Adding a full undo stack for run draft edits.
  • Adding a full visual component framework such as Ant Design or MUI.
  • Building complex analytics dashboards or charts for results.
  • Building a report editor.
  • Persisting sensitive fields in browser storage.
  • Implementing a black terminal-style UI.
  • Reintroducing the old native frontend.

Further Notes

The previous architectural rewrite succeeded at separating backend, frontend, CoreGateway, generated clients, and feature modules. This PRD does not reverse that architecture. It changes the visible composition so that the modular implementation serves an operator workflow instead of exposing the module layout directly.

The most important usability failure to prevent is a blank first screen caused by slow Core capability discovery. The workspace must render immediately and degrade locally.

The most important maintainability constraint is that user-facing improvements should not re-couple React to raw core details. RunDraft, OptionViewModel, run records, generated API clients, and shared UI primitives are the intended boundaries.

The implementation should proceed in vertical slices. The first slice should make the shell usable and non-blocking before deeper controls are rebuilt. Subsequent slices can add UI primitives, RunDraft state, comboboxes, advanced parameters, validation gating, the progress console, run records, polish, and tests.

Labels: ready-for-agent (fully specified and ready for an agent to implement)