Canonical tracker for the 4-tutorial developer-experience evaluation against the published JSR CLI (currently 0.0.1-alpha.10, prod-verified green via e2e-cli-prod). Each tutorial is walked end-to-end by a closely supervised Claude Opus 4.8 evaluator sub-agent acting as a real developer following the documented path from an empty directory to a running app under .NET Aspire. This issue locks the evaluation criteria and tracks every run plus everything derived from it.
These runs are evaluations, not smoke tests. The bar is PERFECT DX — findings are graded, not pass/fail.
Locked evaluation criteria (every run covers all 8 axes)
- Tutorial validity — does the documented path actually work, start to finish, with no step skipped?
- Framework stability — can a real developer complete the tutorial (and therefore a legitimate project) on the framework as published?
- Doc-drift detection — every place the documentation diverges from observed reality.
- Drift root-cause classification — for each drift, is the fix in the code or in the docs (incorrect/outdated)?
- DX perfection bar — is the developer experience perfect? Every rough edge, confusing message, or unnecessary step is a finding.
- Process rethink — concrete ideas to make the scaffold/build/dev loop itself better.
- New-capability opportunities — plugins, package updates, tools that push DX + enterprise-framework quality further.
- Documentation polish — findings fold back into all docs (not only the tutorial under test).
Finding classification (every finding is tagged)
- Type: doc-fix · code-fix · new-feature · dx-polish · process
- Severity: blocker (cannot complete) · major (completes but wrong/painful) · minor (cosmetic/wording)
- Evidence: exact command + documented behavior vs observed behavior + file:line of the doc claim.
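The tagging scheme above could be captured as a small record type, so that every finding carries its type, severity, and evidence in one place. This is only a sketch: the `Finding` interface, its field names, and the two helpers are hypothetical and not part of the tracker or the CLI.

```typescript
// Hypothetical shape for one graded finding. Field names are illustrative
// assumptions; only the tag values come from the classification list.
type FindingType = "doc-fix" | "code-fix" | "new-feature" | "dx-polish" | "process";
type Severity = "blocker" | "major" | "minor";

interface Finding {
  type: FindingType;
  severity: Severity;
  command: string; // exact command that was run
  documented: string; // behavior the doc claims
  observed: string; // behavior actually seen
  docRef: string; // file:line of the doc claim
}

// Lower number means more severe.
const SEVERITY_ORDER: Record<Severity, number> = { blocker: 0, major: 1, minor: 2 };

// Returns a new array with blockers first, then majors, then minors.
function sortBySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity],
  );
}

// Checks that a doc reference ends in ":<line number>", as the evidence rule asks.
function isValidDocRef(ref: string): boolean {
  return /^.+:\d+$/.test(ref);
}
```

Using string-literal unions means a mistyped tag fails at compile time rather than ending up in a run report.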
The 4 tutorial runs
| # | Track | Focus | Chapters | Status | Run report |
| --- | --- | --- | --- | --- | --- |
| 1 | Storefront | services + durable workflows (catalog, cart contracts, checkout saga, shipping webhook) | 6 | ⬜ not started | — |
| 2 | Team Workspace | auth + access control (auth backend, session, 2nd DB, provision job, .withAuthz()) | 6 | ⬜ not started | — |
| 3 | ERP Sync | jobs, queues & polyglot (triggers, durable jobs, queue+cron, polyglot tasks) | 5 | ⬜ not started | — |
| 4 | Live Dashboard | Fresh + SDK stack (contract→SDK, cache-first query, definePage/QueryIsland, durable stream) | 6 | ⬜ not started | — |
Runs are executed sequentially (the local Aspire AppHost uses a fixed port — see #138 — so parallel runtime boots would collide), each closely supervised.
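The sequential constraint can be illustrated with a minimal sketch in which each run finishes before the next one starts, so only one process ever binds the fixed AppHost port. The `runSequentially` helper and the `evaluateTrack` placeholder are hypothetical names for illustration; they are not part of the CLI or the evaluation harness.

```typescript
// Hypothetical helper: awaits each task before starting the next one,
// so no two tasks are ever in flight at the same time.
async function runSequentially<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    results.push(await task());
  }
  return results;
}

// Track names from the table of runs; the run body is a placeholder.
const tracks = ["Storefront", "Team Workspace", "ERP Sync", "Live Dashboard"];

async function evaluateTrack(track: string): Promise<string> {
  // A real run would boot the AppHost and walk the tutorial here (assumption).
  return `${track}: done`;
}
```

A call such as `runSequentially(tracks.map((t) => () => evaluateTrack(t)))` keeps the runs in table order. By contrast, `Promise.all` would start every run at once, which is the port collision the paragraph above describes.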
Method / environment
- CLI under test: published jsr:@netscript/cli (alpha.10), installed exactly as the docs instruct: deno install --global --allow-all --name netscript jsr:@netscript/cli.
- Canonical docs: the tutorial markdown in docs/site/tutorials/<track>/ on main (source of the live site); agents cite file:line for every doc-drift finding.
- Runtime: local Deno 2.9 + .NET Aspire + Docker/Postgres; full path including the Aspire boot is exercised where the tutorial calls for it.
- Evaluator surface: Claude Opus 4.8 sub-agents (not CI smoke) so each step is observed and collaborated on.
Derived backlog (populated as runs land — each substantial item becomes its own linked issue)
🐛 Code fixes
none yet
✨ Enhancements / process improvements
none yet
🧩 Planned features (plugins / packages / tools)
none yet
📝 Documentation polish
none yet
Tracking task: #135. Updated after each run with the run report link and newly-filed derived issues.