Follow-up to #134: a role-inference catalog (~26 archetypes) — detect what a class IS from shape + usage, not just its base class

## Follow-up to #134: a **role-inference** catalog — detect what a class *is* from shape + usage, not just its base class

#134 proposes flagging a class doing work *out of its role*. That needs a way to **infer the role** even
when there's no marker (no `*Registry` name, no base class, no attribute). The base class is the *sure*
signal — but most classes don't announce themselves, and the interesting bugs hide in the unmarked ones.

The idea: infer role from **three stacked evidence tiers**, and treat agreement across tiers as confidence.

### The confidence model

- **Tier A — declared (certain):** base class, implemented interface, attribute, name suffix.
- **Tier B — structural fingerprint (probable, AST-local):** property shapes + method shapes. *Example
  (yours): one private array, public `add`/`get` methods whose writes all funnel through a private writer,
  and method bodies that are plain array ops → a store/bag, no marker needed.*
- **Tier C — usage shape (confirming, needs the call graph / `NeedsCodebaseIndex`):** fan-in (referenced in
  N files), read/write ratio, **who writes it** (only a provider/boot path vs. runtime), and **mutation
  provenance** (below).

A role fires when **Tier A** is present, or when **Tier B and Tier C agree**. Agreement between two independent tiers ⇒ high confidence, few false positives.
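
A minimal sketch of that firing rule, written in Python purely for illustration. `TierEvidence` and `fired_role` are hypothetical names, and each tier is assumed to report at most one candidate role as a string:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierEvidence:
    """Role suggested by each tier; None means the tier has no opinion."""
    declared: Optional[str] = None    # Tier A: base class, interface, attribute, suffix
    structural: Optional[str] = None  # Tier B: property + method shapes
    usage: Optional[str] = None       # Tier C: fan-in, read/write ratio, provenance


def fired_role(ev: TierEvidence) -> Optional[str]:
    # Tier A is treated as certain, so it fires on its own.
    if ev.declared is not None:
        return ev.declared
    # Otherwise two independent tiers must name the same role.
    if ev.structural is not None and ev.structural == ev.usage:
        return ev.structural
    # Tier B or Tier C alone is only suggestive.
    return None
```

With this shape, a class that only *looks* like a registry (Tier B) stays silent until the call graph (Tier C) says the same thing.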

### Mutation provenance (your core idea, generalised)

Trace where a class's own state is written — it's the single most discriminating signal:

| Provenance | Implies |
|---|---|
| never written after construction | **immutable** — Value Object / DTO |
| written only in the constructor | **immutable** |
| written by a public setter directly | **mutable bag / config** |
| written only via a public method → **private** writer | **encapsulated mutable store** (registry / cache / aggregator) |
| a `public array` assigned by *other* classes | **leaky / anemic** — flag |
| `$this->x[$k] ??= compute()` only | **memo / cache** |

Combine with read/write ratio + fan-in: *encapsulated store + populated once (boot) + read across many
files* ⇒ **registry**; *encapsulated store + written every call* ⇒ **accumulator**; etc.
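
The table and the combination rule could be sketched roughly as below. Everything here is an assumption for illustration: the `WriteSite` flags stand in for whatever the AST pass and call-graph index actually record, the check order is one possible priority, "written every call" is approximated as "not boot-only", and `min_fan_in=3` is an arbitrary placeholder threshold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    IMMUTABLE = "immutable"
    MUTABLE_BAG = "mutable bag / config"
    ENCAPSULATED_STORE = "encapsulated mutable store"
    LEAKY = "leaky / anemic"
    MEMO = "memo / cache"


@dataclass(frozen=True)
class WriteSite:
    """One place where the class's own state is assigned (hypothetical shape)."""
    in_constructor: bool = False
    external: bool = False                # assigned by another class
    via_private_writer: bool = False      # public method -> private writer
    null_coalescing_assign: bool = False  # `$this->x[$k] ??= compute()`


def classify(writes: list[WriteSite]) -> Provenance:
    post_ctor = [w for w in writes if not w.in_constructor]
    if not post_ctor:
        return Provenance.IMMUTABLE
    if any(w.external for w in post_ctor):
        return Provenance.LEAKY
    if all(w.null_coalescing_assign for w in post_ctor):
        return Provenance.MEMO
    if all(w.via_private_writer for w in post_ctor):
        return Provenance.ENCAPSULATED_STORE
    # Anything else is treated as a direct public write.
    return Provenance.MUTABLE_BAG


def refine_store(provenance: Provenance, boot_only_writes: bool,
                 fan_in_files: int, min_fan_in: int = 3) -> Optional[str]:
    """Split an encapsulated store into registry vs. accumulator using Tier C."""
    if provenance is not Provenance.ENCAPSULATED_STORE:
        return None
    if not boot_only_writes:
        return "accumulator"
    return "registry" if fan_in_files >= min_fan_in else None
```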

---

### The catalog (~26 archetypes + fingerprints)

#### Storage / data
1. **Registry** — keyed array + `register/add/put(key,item)` → private writer; `get/has/find/all` lookups;
   write-once (boot) / read-many fan-in. (Feeds RegistryNamingHonesty/ReturnContract/BaseBypass.)
2. **Bag / Collection wrapper** — single array prop; methods are array ops (`add/all/map/filter/count/first`);
   often `IteratorAggregate`/`Countable`/`ArrayAccess`.
3. **Cache / Memo** — array touched only as `[$k] ??= …` then read; no external writers.
4. **Value Object** — all-readonly props, named ctors (`from*`/`make`), equality (`equals`/CompareSelf),
   pure methods, **no injected deps, no I/O**, never mutates.
5. **DTO / Data payload** — readonly + `from`/`toArray`/serialization; SHOULD have no logic → flag if it
   has loops / `ReflectionClass` / injected services (this is the `TestRunOutcomeData` smell from #134).
6. **Manual enum / closed set** — private ctor + N `public static` self-returning consts/factories →
   *should be a real `enum`*.
7. **Config object** — getters that read `config(...)`/env; no domain behaviour.
8. **Null Object** — implements an interface; every method is a no-op/identity (returns a literal, the arg,
   `$this`, `[]`, or empty body); name `Null*`/`No*`/`Empty*`.
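
To make one of these fingerprints concrete, here is a rough Tier B check for archetype 6 (manual enum). `MethodShape` and `looks_like_manual_enum` are hypothetical names, and `min_cases=2` is an assumed default chosen so that a manual singleton (private ctor plus a single `getInstance()`) does not match:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MethodShape:
    name: str
    is_public: bool
    is_static: bool
    return_type: Optional[str]  # declared return type as written, e.g. "self"


def looks_like_manual_enum(class_name: str, ctor_is_private: bool,
                           methods: list[MethodShape], min_cases: int = 2) -> bool:
    # A "case" is a public static method that returns the class itself.
    self_types = {"self", "static", class_name}
    cases = [m for m in methods
             if m.is_public and m.is_static and m.return_type in self_types]
    return ctor_is_private and len(cases) >= min_cases
```

This is structural evidence only; under the confidence model it would still need Tier C agreement before firing.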

#### Construction
9. **Factory** — `make/create/build/from*` returning `new X` (a family); no state, no store.
10. **Builder / fluent DSL** — methods `return $this`/`static`; accumulate; terminal `build()/get()`.
11. **Mapper / Converter** — takes type A, returns type B; pure; `to*`/`from*`/`map`; name `*Mapper`/`*Transformer`.

#### Behaviour
12. **Service / orchestrator** — ctor injects ≥2 collaborators; public verb methods (`handle/run/process`)
    coordinating them; no array store. (Watch: too many injected deps ⇒ god-service, ties to #134.)
13. **Pipe / Step / Handler** — one real public method (`handle($ctx)`/`__invoke`), rest private helpers.
14. **Resolver / Strategy selector** — `resolve($x)` iterating candidates / predicates; first-match wins;
    no store. (Flag inverse: a "Resolver" that owns a `register()` store → it's a registry.)
15. **Validator** — `validate/assert*/check*`; returns violations or throws; no state.
16. **Visitor** — many `visit*`/`enter*`/`leave*` methods dispatching on node type.
17. **Middleware** — `handle($x, $next)` with a `$next` continuation it calls.
18. **State machine** — closed state set + `transitionTo`/guards; mutates a single `state` prop under rules.
19. **Specification / Query object** — `isSatisfiedBy($x): bool` / `matches()` / builds a query; composable.

#### Structural / wrapping
20. **Decorator** — ctor takes the **same interface it implements**; most methods delegate to `$this->inner->m()`,
    a few add behaviour. (Detectable: implements I + holds I + delegating method bodies.)
21. **Adapter** — implements interface X by wrapping an *unrelated* type Y and translating.
22. **Facade / static proxy** — all-static methods delegating to a container-resolved instance.
23. **Manual singleton** — `private static self $instance` + `getInstance()` + private ctor.
24. **Proxy / lazy loader** — holds a closure/id, resolves the real object on first use, then delegates.
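
The decorator fingerprint (implements I + holds I + delegating bodies) is simple enough to sketch. `ClassShape` is a hypothetical record; the boolean per method is assumed to mean "the body calls the wrapped instance somewhere", so a method that adds behaviour and still forwards counts as delegating. The `0.5` ratio is a placeholder for "most".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ClassShape:
    implements: frozenset[str]
    ctor_param_types: frozenset[str]
    # method name -> True when the body calls the wrapped instance
    methods: dict[str, bool] = field(default_factory=dict)


def decorated_interface(shape: ClassShape,
                        min_delegating_ratio: float = 0.5) -> Optional[str]:
    """Return the interface being decorated, or None if the shape does not fit."""
    shared = shape.implements & shape.ctor_param_types
    if not shared or not shape.methods:
        return None
    ratio = sum(shape.methods.values()) / len(shape.methods)
    return sorted(shared)[0] if ratio > min_delegating_ratio else None


def undelegated(shape: ClassShape, interface_methods: set[str]) -> set[str]:
    """Interface methods that are absent from the class or never reach the inner object."""
    delegating = {name for name, forwards in shape.methods.items() if forwards}
    return interface_methods - delegating
```

The second function is the seed of a "Decorator-forgot-a-method" advisory: anything it returns is an interface method whose call never reaches the wrapped object.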

#### Framework roles (sure via base, but pattern-detectable too)
25. **Repository** — persistence imports (Eloquent/DB/query) + `find/save/delete/all` over one entity.
26. **Event / Listener / Exception / Provider / Trait** — base/interface gives Tier A; shape confirms
    (`handle(Event)`; static `for*` factories on a Throwable; `bind()` calls in a provider; etc.).

---

### What this unlocks (prophets it feeds)

- **#134 OutOfPurpose**: *the declared role (Tier A) and the inferred role (B+C) disagree* → "a `*Registry` that
  fingerprints as a reflection engine," "a `*Data` that fingerprints as an assembler."
- **Manual-enum → real enum** (archetype 6), **Value-Object-should-be-readonly** (4), **mutable shared bag**
  (2 + leaky provenance), **Null-Object opportunity** (an interface with one impl everyone null-checks),
  **Decorator-forgot-a-method** (20: implements I but doesn't delegate every I method), **Singleton →
  inject it** (23) — each is a targeted advisory once the role is inferred.
- Lets the existing registry family stop relying on the marker: an *unmarked* class that fingerprints as a
  registry (B+C) can be nudged to adopt the base — closing the exact gap that let `ResourceRegistry`
  (#119) slip through.
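
Both the OutOfPurpose check and the "adopt the base" nudge reduce to comparing two optional roles. A sketch, assuming `declared` comes from Tier A and `inferred` from B+C agreement; the function name and message wording are made up:

```python
from typing import Optional


def advisory(declared: Optional[str], inferred: Optional[str]) -> Optional[str]:
    if inferred is None:
        # No B+C agreement: nothing to say, whatever the marker claims.
        return None
    if declared is None:
        return f"unmarked class fingerprints as {inferred}: consider adopting the marker"
    if declared != inferred:
        return f"declared {declared} but fingerprints as {inferred}: possibly out of purpose"
    return None
```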

### Feasibility / FP discipline
- Tier B alone is suggestive, never conclusive — **require Tier A, or B+C agreement** to fire; size/length
  is only ever a tie-breaker (mirror the FP discipline that keeps `RegistryReturnContract` marker-driven).
- Ship a default archetype catalog; let consumers add/disable archetypes and tune thresholds in config.
- All advisory, none auto-fixable (role re-shaping is a design call).

### Open questions
- Express each archetype as a composable predicate set (property-shape + method-shape + usage-shape) over a
  shared `ClassFingerprint` the call-graph index builds once — agree that's the right substrate?
- Which archetypes are worth their own prophet vs. just feeding #134's role-inference?
- Confidence scoring: boolean (A | B&C) vs. a weighted score with a configurable threshold?
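
To make the first and third questions concrete, here is one way the predicate substrate and both scoring options might look. The `ClassFingerprint` fields, the predicate names, the fan-in threshold of 3, and the equal weights are all assumptions for discussion, not a proposed schema:

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ClassFingerprint:
    private_array_props: int
    public_methods: frozenset[str]
    fan_in_files: int
    written_only_at_boot: bool


Predicate = Callable[[ClassFingerprint], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Compose predicates; the result holds only when every part holds."""
    return lambda fp: all(p(fp) for p in preds)


def single_private_array(fp: ClassFingerprint) -> bool:
    return fp.private_array_props == 1


def has_keyed_writer(fp: ClassFingerprint) -> bool:
    return bool(fp.public_methods & {"register", "add", "put"})


def has_lookup(fp: ClassFingerprint) -> bool:
    return bool(fp.public_methods & {"get", "has", "find", "all"})


def boot_only(fp: ClassFingerprint) -> bool:
    return fp.written_only_at_boot


def read_many(fp: ClassFingerprint) -> bool:
    return fp.fan_in_files >= 3  # assumed threshold


# Registry expressed as property-shape + method-shape (Tier B) and usage-shape (Tier C).
registry_structural = all_of(single_private_array, has_keyed_writer, has_lookup)
registry_usage = all_of(boot_only, read_many)


def fires_boolean(fp: ClassFingerprint) -> bool:
    """Option 1: B and C must both hold."""
    return registry_structural(fp) and registry_usage(fp)


def weighted_score(fp: ClassFingerprint,
                   weighted: Sequence[tuple[Predicate, float]]) -> float:
    """Option 2: fraction of total weight satisfied, compared to a threshold by the caller."""
    total = sum(w for _, w in weighted)
    return sum(w for p, w in weighted if p(fp)) / total
```

The boolean form is all-or-nothing; the weighted form lets a near-miss (say, low fan-in) still surface at a lower threshold, at the cost of one more knob to tune.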

### Proving ground — mine two real codebases, don't theorise the catalog

The ~26 archetypes above are a **starting hypothesis, not the spec**. The team should treat two real,
independent consumer codebases — **`workflows`** and **`smart-farmers`** — as both the *discovery corpus*
and the *proof-of-concept test suite*:

1. **Mine them for patterns.** Scan each codebase, build a `ClassFingerprint` per class, and **cluster** —
   let the archetypes that actually recur drive which inferers to write. Bottom-up from real code, not
   top-down from this list; the codebases are the ground truth and will surface idioms (and counter-examples)
   the catalog missed.
2. **Write a comprehensive inferer suite** from what the corpora reveal.
3. **Use the two codebases as the labeled regression suite.** Hand-label a sample of classes per archetype
   in each, then assert each inferer (a) classifies the positives correctly and (b) does **not** misclassify
   the negatives (precision matters more than recall here). Report precision/recall per inferer per codebase.
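
The per-inferer report in step 3 is plain precision/recall over the hand-labeled sample. A sketch, assuming labels and predictions are both keyed by class name; returning `1.0` when a denominator is zero is a convention chosen here, not a requirement:

```python
from typing import Optional


def precision_recall(labels: dict[str, str],
                     predictions: dict[str, Optional[str]],
                     archetype: str) -> tuple[float, float]:
    """Precision and recall of one inferer against hand-labeled classes."""
    tp = fp = fn = 0
    for cls, truth in labels.items():
        predicted = predictions.get(cls) == archetype
        actual = truth == archetype
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # misclassified negative: the costly case here
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Running it once per codebase makes the overfitting check mechanical: an inferer with high precision on one corpus and low precision on the other has learned one project's idioms.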

Using two **independent** codebases is the point: an inferer that nails `workflows` but misfires on
`smart-farmers` is **overfit to one project's idioms** — exactly the failure mode to catch before shipping.
`workflows` already has clean known instances of registry / bag / value-object / manual-enum / DTO-with-logic
(see #119, #134); `smart-farmers` is the independent cross-check.

I'm happy to prototype the `ClassFingerprint` + the first 3–4 inferers (registry, bag, value-object,
manual-enum) against the `workflows` snapshot I already have, as the first half of that proving ground.

