🔬 Tutorial DX Eval (alpha.17, ARMED): lock criteria + track 4 tutorial runs + derived backlog #153

**Canonical tracker** for the 4-tutorial developer-experience evaluation against the **published JSR CLI** (currently `0.0.1-alpha.10`, prod-verified green via `e2e-cli-prod`). Each tutorial is walked end-to-end by a **closely supervised Claude Opus 4.8 evaluator sub-agent** acting as a real developer who follows the documented path from an empty directory to a running app under .NET Aspire. This issue **locks the evaluation criteria** and tracks every run plus everything derived from it.

These runs are evaluations, not smoke tests. The bar is **PERFECT DX** — findings are graded, not pass/fail.

---

## Locked evaluation criteria (every run covers all 8 axes)

1. **Tutorial validity** — does the documented path actually work, start to finish, with no step skipped?
2. **Framework stability** — can a real developer complete the tutorial (and therefore a legitimate project) on the framework as published?
3. **Doc-drift detection** — every place the documentation diverges from observed reality.
4. **Drift root-cause classification** — for each drift, is the fix in the **code** or in the **docs** (incorrect/outdated)?
5. **DX perfection bar** — is the developer experience *perfect*? Every rough edge, confusing message, or unnecessary step is a finding.
6. **Process rethink** — concrete ideas to make the scaffold/build/dev loop itself better.
7. **New-capability opportunities** — plugins, package updates, tools that push DX + enterprise-framework quality further.
8. **Documentation polish** — findings fold back into **all** docs (not only the tutorial under test).

## Finding classification (every finding is tagged)

- **Type:** `doc-fix` · `code-fix` · `new-feature` · `dx-polish` · `process`
- **Severity:** `blocker` (cannot complete) · `major` (completes but wrong/painful) · `minor` (cosmetic/wording)
- **Evidence:** exact command + **documented behavior** vs **observed behavior** + file:line of the doc claim.
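
As an illustration only, the tags above could be captured in a small TypeScript shape so that every finding in a run report carries the same fields. The tag values come from the list above; the interface names, field names, and both helper functions are hypothetical and are not part of the CLI or of any existing tooling.

```typescript
// Tag values copied from the classification list; everything else is hypothetical.
type FindingType = "doc-fix" | "code-fix" | "new-feature" | "dx-polish" | "process";
type Severity = "blocker" | "major" | "minor";

interface Evidence {
  command: string;    // exact command that was run
  documented: string; // behavior the docs claim
  observed: string;   // behavior actually seen
  docRef: string;     // location of the doc claim, written as "file:line"
}

interface Finding {
  type: FindingType;
  severity: Severity;
  evidence: Evidence;
}

// Lower rank sorts first, so blockers surface at the top of a report.
const SEVERITY_RANK: Record<Severity, number> = { blocker: 0, major: 1, minor: 2 };

// Returns the names of evidence fields that are empty or malformed.
function missingEvidence(finding: Finding): string[] {
  const { command, documented, observed, docRef } = finding.evidence;
  const missing: string[] = [];
  if (command.trim() === "") missing.push("command");
  if (documented.trim() === "") missing.push("documented");
  if (observed.trim() === "") missing.push("observed");
  // docRef must end in ":<line number>" to count as a file:line citation.
  if (!/^.+:\d+$/.test(docRef)) missing.push("docRef");
  return missing;
}

// Returns a new array ordered blocker, major, minor; the input is not mutated.
function bySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}
```

A check like `missingEvidence` would let a supervisor reject a finding that lacks a `file:line` citation before it reaches the derived backlog.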

---

## The 4 tutorial runs

| # | Track | Focus | Chapters | Status | Run report |
|---|-------|-------|----------|--------|------------|
| 1 | **Storefront** | services + durable workflows (catalog, cart contracts, checkout saga, shipping webhook) | 6 | ⬜ not started | — |
| 2 | **Team Workspace** | auth + access control (auth backend, session, 2nd DB, provision job, `.withAuthz()`) | 6 | ⬜ not started | — |
| 3 | **ERP Sync** | jobs, queues & polyglot (triggers, durable jobs, queue+cron, polyglot tasks) | 5 | ⬜ not started | — |
| 4 | **Live Dashboard** | Fresh + SDK stack (contract→SDK, cache-first query, `definePage`/`QueryIsland`, durable stream) | 6 | ⬜ not started | — |

Runs are executed **sequentially** (the local Aspire AppHost uses a fixed port — see #138 — so parallel runtime boots would collide), each closely supervised.
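
The sequencing constraint can be sketched as follows, assuming a hypothetical `runOne` callback that stands in for one supervised tutorial run and resolves with its report. Neither `runSequentially` nor `runOne` exists in the CLI; the sketch only shows that awaiting inside the loop keeps a single runtime boot alive at a time.

```typescript
// Track names taken from the table above.
type Track = "Storefront" | "Team Workspace" | "ERP Sync" | "Live Dashboard";

const TRACKS: readonly Track[] = [
  "Storefront",
  "Team Workspace",
  "ERP Sync",
  "Live Dashboard",
];

// Hypothetical driver: runs each track to completion before starting the next.
async function runSequentially<T>(
  tracks: readonly Track[],
  runOne: (track: Track) => Promise<T>,
): Promise<Map<Track, T>> {
  const reports = new Map<Track, T>();
  for (const track of tracks) {
    // Awaiting here means the next run starts only after this one has settled,
    // so two AppHost boots never compete for the fixed port.
    reports.set(track, await runOne(track));
  }
  return reports;
}
```

Mapping the tracks through `Promise.all` instead would start all four boots at once, which is exactly the collision the fixed port rules out.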

## Method / environment

- **CLI under test:** published `jsr:@netscript/cli` (alpha.10), installed exactly as the docs instruct: `deno install --global --allow-all --name netscript jsr:@netscript/cli`.
- **Canonical docs:** the tutorial markdown in `docs/site/tutorials/<track>/` on `main` (source of the live site) — agents cite `file:line` for every doc-drift finding.
- **Runtime:** local Deno 2.9 + .NET Aspire + Docker/Postgres; full path including the Aspire boot is exercised where the tutorial calls for it.
- **Evaluator surface:** Claude Opus 4.8 sub-agents rather than a CI smoke test, so each step is observed and worked through under supervision.

---

## Derived backlog (populated as runs land — each substantial item becomes its own linked issue)

### 🐛 Code fixes
_none yet_

### ✨ Enhancements / process improvements
_none yet_

### 🧩 Planned features (plugins / packages / tools)
_none yet_

### 📝 Documentation polish
_none yet_

---

_Tracking task: #135. Updated after each run with the run report link and newly-filed derived issues._

