
🔬 Tutorial DX Eval (alpha.17, ARMED): lock criteria + track 4 tutorial runs + derived backlog #153

Description

@rickylabs

Canonical tracker for the 4-tutorial developer-experience evaluation against the published JSR CLI (currently 0.0.1-alpha.10, prod-verified green via e2e-cli-prod). Each tutorial is walked end-to-end by a closely supervised Claude Opus 4.8 evaluator sub-agent acting as a real developer, following the documented path from an empty directory to a running app under .NET Aspire. This issue locks the evaluation criteria and tracks every run, plus everything derived from it.

These runs are evaluations, not smoke tests. The bar is PERFECT DX — findings are graded, not pass/fail.


Locked evaluation criteria (every run covers all 8 axes)

  1. Tutorial validity — does the documented path actually work, start to finish, with no step skipped?
  2. Framework stability — can a real developer complete the tutorial (and therefore a legitimate project) on the framework as published?
  3. Doc-drift detection — every place the documentation diverges from observed reality.
  4. Drift root-cause classification — for each drift, is the fix in the code or in the docs (incorrect/outdated)?
  5. DX perfection bar — is the developer experience perfect? Every rough edge, confusing message, or unnecessary step is a finding.
  6. Process rethink — concrete ideas to make the scaffold/build/dev loop itself better.
  7. New-capability opportunities — plugins, package updates, tools that push DX + enterprise-framework quality further.
  8. Documentation polish — findings fold back into all docs (not only the tutorial under test).

Finding classification (every finding is tagged)

  • Type: doc-fix · code-fix · new-feature · dx-polish · process
  • Severity: blocker (cannot complete) · major (completes but wrong/painful) · minor (cosmetic/wording)
  • Evidence: exact command + documented behavior vs observed behavior + file:line of the doc claim.
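The tagging scheme above could be captured as a small typed record so that every finding carries the same fields. This is only a sketch: the `Finding` interface, its field names, and the `validateFinding` and `sortBySeverity` helpers are hypothetical and not part of the project's tooling.

```typescript
// Hypothetical record for one graded finding; names are illustrative only.
type FindingType = "doc-fix" | "code-fix" | "new-feature" | "dx-polish" | "process";
type Severity = "blocker" | "major" | "minor";

interface Finding {
  type: FindingType;
  severity: Severity;
  command: string;    // exact command that was run
  documented: string; // behavior the docs claim
  observed: string;   // behavior actually seen
  docRef: string;     // "file:line" of the doc claim
}

// Returns a list of problems; an empty list means the evidence is complete.
function validateFinding(f: Finding): string[] {
  const problems: string[] = [];
  if (f.command.trim() === "") problems.push("missing command");
  if (f.documented.trim() === "") problems.push("missing documented behavior");
  if (f.observed.trim() === "") problems.push("missing observed behavior");
  if (!/^.+:\d+$/.test(f.docRef)) problems.push("docRef must be file:line");
  return problems;
}

// Lower rank sorts first, so blockers lead the report.
const SEVERITY_RANK: Record<Severity, number> = { blocker: 0, major: 1, minor: 2 };

// Returns a new array ordered blocker, major, minor; the input is not mutated.
function sortBySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}
```

Rejecting a finding whose `docRef` lacks a line number keeps the "file:line" evidence rule enforceable rather than advisory.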

The 4 tutorial runs

| # | Track | Focus | Chapters | Status | Run report |
|---|-------|-------|----------|--------|------------|
| 1 | Storefront | services + durable workflows (catalog, cart contracts, checkout saga, shipping webhook) | 6 | ⬜ not started | |
| 2 | Team Workspace | auth + access control (auth backend, session, 2nd DB, provision job, .withAuthz()) | 6 | ⬜ not started | |
| 3 | ERP Sync | jobs, queues & polyglot (triggers, durable jobs, queue+cron, polyglot tasks) | 5 | ⬜ not started | |
| 4 | Live Dashboard | Fresh + SDK stack (contract→SDK, cache-first query, definePage/QueryIsland, durable stream) | 6 | ⬜ not started | |

Runs are executed sequentially (the local Aspire AppHost uses a fixed port — see #138 — so parallel runtime boots would collide), each closely supervised.
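The sequential constraint can be sketched as a loop that awaits each run before starting the next, so no two runtime boots ever compete for the fixed port. The `TrackRun` type and the `runSequentially` helper are hypothetical names for illustration; the actual runs are driven by supervised evaluator agents, not by this code.

```typescript
// Hypothetical descriptor for one tutorial run.
interface TrackRun {
  id: number;
  track: string;
}

// Runs each track strictly one after another and collects the reports.
// Using `await` inside the loop (rather than Promise.all) guarantees that
// the next run starts only after the previous one has fully finished.
async function runSequentially<T>(
  tracks: TrackRun[],
  runOne: (t: TrackRun) => Promise<T>,
): Promise<T[]> {
  const reports: T[] = [];
  for (const t of tracks) {
    reports.push(await runOne(t));
  }
  return reports;
}
```

`Promise.all` would start all four runs at once, which is exactly the parallel boot the fixed port rules out.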

Method / environment

  • CLI under test: published jsr:@netscript/cli (alpha.10), installed exactly as the docs instruct: deno install --global --allow-all --name netscript jsr:@netscript/cli.
  • Canonical docs: the tutorial markdown in docs/site/tutorials/<track>/ on main (source of the live site) — agents cite file:line for every doc-drift finding.
  • Runtime: local Deno 2.9 + .NET Aspire + Docker/Postgres; full path including the Aspire boot is exercised where the tutorial calls for it.
  • Evaluator surface: Claude Opus 4.8 sub-agents (not CI smoke) so each step is observed and collaborated on.

Derived backlog (populated as runs land — each substantial item becomes its own linked issue)

🐛 Code fixes

none yet

✨ Enhancements / process improvements

none yet

🧩 Planned features (plugins / packages / tools)

none yet

📝 Documentation polish

none yet


Tracking task: #135. Updated after each run with the run report link and newly-filed derived issues.
