Value-flow prophet family (11 prophets): config-flow, taint/security, dead-producer, data-clump — shared tracers, phased build, proven on workflows + smart-farmers #163

@jessegall

A value-flow prophet family — trace where values come from / go, across artifacts (full spec)

🎯 Goal

A coherent family of prophets that reason about value flow (where a value originates, where it
travels, and whether a set the code hardcodes is duplicated as data elsewhere) rather than judging
one node in isolation. PreferConfigDrivenRegistryProphet (2.11.0) proved the technique with
ConfigMapIndex (config-set ↔ enum-set congruence); this issue specs all 11 prophets of the family,
the shared tracing infra they need, a phased build, and a two-codebase proving ground. Companion to
the role-inference family (#134/#135): role-inference asks what a class IS; value-flow asks where
its values go. All advisory, none auto-fixable, with the same FP discipline.

Substrate that already exists

| Primitive | Answers | Direction |
|---|---|---|
| ConfigMapIndex | "is this hardcoded set also declared, as data, in config?" | cross-artifact congruence |
| CallConsumptionCensus / NullDistinguishedCallCensus | "how does every caller use this value?" | forward (consumption) |
| OriginTracer | "where did this value come from?" | backward (origin) |
| CodebaseIndex (call graph) | the graph everything walks | substrate |

New shared infra to build (named, so the prophets are unambiguous)

Every tracer inherits the existing discipline: fire only when the trace is fully resolvable and
unambiguous; any unresolved/invisible site, cycle, or cap-exceed → no finding.


The 11 prophets (full specs)

Each: technique · substrate · FIRES when · does NOT fire when (FP guard) · example · tier.

A. Cross-artifact congruence (generalize ConfigMapIndex)

1. ConfigKeyContractProphet

  • Technique: config reads ↔ declared config key tree. Substrate: ConfigReadIndex.
  • FIRES: (a) a config('a.b.c') read whose path is absent from the declared tree (typo/missing key);
    (b) a declared leaf key with zero reads anywhere in the scanned tree (dead config).
  • DOESN'T FIRE: dynamic key (config($var) / computed path); env()-backed leaf where the path still
    exists; keys under a configurable owned-by-framework/package prefix allowlist; a key written via
    config([...]) at runtime.
  • Example: read config('servces.stripe.key') (typo) → undeclared-read; config/foo.php declares
    foo.legacy read nowhere → dead-config.
  • Tier: Convention (undeclared-read stronger than dead-config).
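
The two FIRES cases above reduce to a set comparison between declared leaf paths and statically resolved reads. The following is a minimal Python sketch of that comparison, not the prophet's implementation: `leaf_paths` and `key_contract` are hypothetical names, a dynamic read is modeled as `None`, and suppressing every dead-config finding when any dynamic read exists is an assumption in the spirit of "bail on ambiguity".

```python
def leaf_paths(tree, prefix=""):
    """Flatten a nested config dict into dotted leaf paths."""
    paths = set()
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            paths |= leaf_paths(value, path)
        else:
            paths.add(path)
    return paths


def covers(read, leaf):
    """A read of 'a.b' also reads every leaf below it, e.g. 'a.b.c'."""
    return leaf == read or leaf.startswith(read + ".")


def key_contract(declared_tree, reads):
    """Return (undeclared_reads, dead_leaves).

    `reads` holds dotted paths; None marks a dynamic read (hypothetical encoding).
    """
    leaves = leaf_paths(declared_tree)
    static_reads = {r for r in reads if r is not None}
    # (a) a read that matches no declared leaf or branch
    undeclared = {r for r in static_reads
                  if not any(covers(r, leaf) for leaf in leaves)}
    # (b) a leaf nobody reads; assumption: any dynamic read makes this unprovable
    if None in reads:
        dead = set()
    else:
        dead = {leaf for leaf in leaves
                if not any(covers(r, leaf) for r in static_reads)}
    return undeclared, dead


declared = {"services": {"stripe": {"key": "sk"}}, "foo": {"legacy": 1, "used": 2}}
reads = ["servces.stripe.key", "services.stripe.key", "foo.used"]
undeclared, dead = key_contract(declared, reads)
```

With the sample data, `undeclared` holds the typo path and `dead` holds `foo.legacy`, matching the example in the spec. The allowlist and runtime-write guards are left out to keep the sketch short.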

2. TranslationKeyCongruenceProphet

  • Technique: __()/trans()/@lang keys ↔ lang-file key tree. Substrate: a lang-file variant of ConfigReadIndex.
  • FIRES: a translation key used in code but missing from lang files; a lang key declared but
    never used.
  • DOESN'T FIRE: dynamic keys; pluralization/trans_choice variants of a present base key; package/vendor
    lang namespaces (package::...) not published.
  • Tier: Convention.

3. MigrationModelDriftProphet

  • Technique: a model's $fillable/casts/$guarded column-set ↔ the columns its migrations declare for
    its table. Substrate: a migration-schema index (Schema::create/table columns) + the model's attribute sets.
  • FIRES: $fillable/casts names a column no migration declares; a declared non-trivial typed
    column (json/bool/datetime/decimal) on the model's table with no cast.
  • DOESN'T FIRE: columns added by traits/packages/macros; models with no statically-resolvable table;
    dynamic/$guarded = [] models; columns intentionally hidden.
  • Tier: Convention.

4. StringMatchMirrorsEnumProphet

  • Technique: a match/switch whose arm key-set (string literals) exactly equals an enum's backed
    values, with a string (not the enum) subject. Substrate: AST + enum index.
  • FIRES: that exact congruence (stringly dispatch mirroring an existing enum).
  • DOESN'T FIRE: subject is already the enum; only a partial overlap; default-only arm; non-literal arms;
    no enum with the matching value-set exists.
  • Tier: Convention.
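
The congruence test here is exact set equality between the arm keys and one enum's backed values. A small Python sketch under stated assumptions: `mirrored_enum` is a hypothetical helper, `arm_keys=None` stands for "at least one non-literal arm", and bailing when more than one enum matches is an assumption that follows the family's unambiguity rule.

```python
def mirrored_enum(arm_keys, subject_is_enum, enums):
    """Return the name of the single enum whose backed values equal the arm keys.

    arm_keys: list of string literals from the match/switch arms, or None if
              any arm is non-literal (hypothetical encoding).
    enums:    dict mapping enum name -> list of backed string values.
    """
    if subject_is_enum or arm_keys is None:
        return None
    keys = set(arm_keys)
    if not keys:  # default-only match has no literal arms
        return None
    matches = [name for name, values in enums.items() if set(values) == keys]
    # zero matches: nothing to mirror; several matches: ambiguous, so no finding
    return matches[0] if len(matches) == 1 else None


enums = {"Status": ["draft", "published"], "Role": ["admin", "user"]}
hit = mirrored_enum(["published", "draft"], False, enums)
partial = mirrored_enum(["draft"], False, enums)
```

Here `hit` names `Status` because the key sets are equal regardless of arm order, while `partial` is `None` because a subset is only a partial overlap.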

B. Forward flow / consumption (generalize CallConsumptionCensus / ValueFlowTracer)

5. MixedConfigValueUsedTypedProphet

  • Technique: a config('x') (mixed leaf) flows into a typed sink without a cast/coalesce.
    Substrate: ConfigReadIndex + ValueFlowTracer.
  • FIRES: config read → (assignment/return chain) → typed sink (typed param, typed return, arithmetic,
    strict === against a typed value, typed property assignment) with no (int)/(bool)/intval/
    T_*::coalesce/validation on the path.
  • DOESN'T FIRE: a cast/coalesce/validation is on the path; the sink is mixed/untyped; dynamic key; the
    read is a config('x', $default) whose default already types it; flow is ambiguous.
  • Tier: Convention. (Direct generalization of the config-flow trick.)

6. TaintedInputToSinkProphet (security — new category)

  • Technique: request-input source flows to a dangerous sink without a sanitizer boundary.
    Substrate: ValueFlowTracer + TaintCatalog.
  • Sources: request()->input/get/all/query, $request->x, route params, Input::*.
  • Sinks: raw SQL (DB::raw, ->whereRaw/orderByRaw/selectRaw, DB::statement), unserialize,
    filesystem (fopen/file_get_contents/include/require with a path arg), process (exec/shell_exec/
    system/proc_open), eval, redirect()->away().
  • Boundaries (stop taint): FormRequest::validated() result, an explicit cast/intval, a whitelist
    in_array/match mapping, Eloquent parameter binding (only raw sinks are dangerous).
  • FIRES: source → sink with no boundary on the resolvable path.
  • DOESN'T FIRE: any boundary on the path; sink arg is a literal/const/enum value; bound (non-raw) query;
    ambiguous/unresolvable flow.
  • Tier: Correctness (still advisory; flag clearly as security).
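
The FIRES rule can be read as a predicate over one resolved path: it starts at a source, ends at a sink, and crosses no boundary in between. The Python sketch below assumes a hypothetical encoding in which the tracer hands back an ordered list of step labels, or `None` when the flow is ambiguous or unresolvable; the label sets are small illustrative subsets of the catalog above, not the TaintCatalog itself.

```python
# Illustrative subsets only; the labels are hypothetical step names.
SOURCES = {"request.input", "request.query", "route.param"}
SINKS = {"DB::raw", "whereRaw", "exec", "unserialize", "eval"}
BOUNDARIES = {"validated", "intval", "cast", "in_array"}


def tainted_flow(path):
    """Return True only for a fully resolved source -> sink path with no boundary.

    path: ordered list of step labels, or None when the tracer bailed.
    """
    if not path or len(path) < 2:
        return False  # unresolved or degenerate trace: no finding
    if path[0] not in SOURCES or path[-1] not in SINKS:
        return False
    # any sanitizer boundary between source and sink stops the taint
    return not any(step in BOUNDARIES for step in path[1:-1])


raw = tainted_flow(["request.input", "assign", "whereRaw"])
guarded = tainted_flow(["request.input", "intval", "whereRaw"])
```

`raw` is `True` and `guarded` is `False`. Returning `False` for `None` keeps the "ambiguous flow means no finding" guarantee inside the predicate rather than at every call site.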

7. DeadProducerProphet

  • Technique: a non-void-returning method whose result is never consumed at any resolvable call site.
    Substrate: CallConsumptionCensus (add a "consumed-at-all?" classification).
  • FIRES: ≥1 caller resolved, every resolved caller discards the return (calls it as a statement),
    and the method declares a non-void / non-self return.
  • DOESN'T FIRE: any unresolved/invisible caller (conservative, like allCallersDeNull); fluent
    return $this builders; interface/abstract/override methods (contract-bound); a documented side-effect
    method already typed void.
  • Tier: Convention.
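
The rule above combines three conditions: at least one resolved caller, no unresolved caller, and every caller discarding the result. A minimal Python sketch of that decision, assuming a hypothetical `CallSite` record produced by the census; treating `static` like `self` is an assumption added here for fluent builders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    resolved: bool             # could the census resolve this caller?
    result_used: bool = False  # False means the call is a bare statement


def is_dead_producer(return_type, call_sites, contract_bound=False):
    """Fire only when every caller is resolved and every caller discards the result."""
    # void needs no finding; self/static are fluent builders (static is an assumption)
    if return_type in {"void", "self", "static"} or contract_bound:
        return False
    if not call_sites:
        return False  # the spec requires at least one resolved caller
    if any(not site.resolved for site in call_sites):
        return False  # one invisible caller might consume the value
    return all(not site.result_used for site in call_sites)


discarded = [CallSite(resolved=True), CallSite(resolved=True)]
dead = is_dead_producer("bool", discarded)
unsure = is_dead_producer("bool", discarded + [CallSite(resolved=False)])
```

`dead` is `True`, while a single unresolved caller flips `unsure` to `False`. `contract_bound` stands in for the interface/abstract/override guard.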

8. SecretToLogOrResponseProphet (security / leak)

  • Technique: a secret source flows into a log/response/dump sink without redaction.
    Substrate: ValueFlowTracer + TaintCatalog.
  • Sources: config('...key/secret/token/password...') (configurable regex), model attributes in
    $hidden, *->password/*->apiToken.
  • Sinks: Log::*/logger(), an HTTP response / response()->json / API resource, dd/dump/var_dump.
  • FIRES: secret source → sink with no redaction/mask/hash on the path.
  • DOESN'T FIRE: value redacted/masked/hashed on the path; sink is a secure store; ambiguous flow.
  • Tier: Correctness (advisory; security).

C. Backward origin (generalize OriginTracer)

9. DataClumpToValueObjectProphet

  • Technique: ≥3 values that repeatedly travel together as the same argument group and often
    originate together → extract a value object. Substrate: ArgumentGroupCensus + OriginTracer.
  • FIRES: the same value-tuple (≥3) passed together at ≥ N (configurable, default 3) call sites.
  • DOESN'T FIRE: <3 values; the tuple appears once; values are unrelated primitives; already a VO/DTO is
    passed; the group is a framework signature (e.g. ($request, $next)).
  • Tier: Convention.
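
The counting half of this prophet can be sketched as a census over argument groups. The Python below is a simplified illustration, not the ArgumentGroupCensus design: `data_clumps` is a hypothetical name, values are identified by name, order is ignored, and only whole argument lists are compared (a fuller version would also look at sub-groups and at origins via OriginTracer).

```python
from collections import Counter


def data_clumps(call_sites, min_size=3, min_sites=3):
    """Return argument groups of >= min_size values seen at >= min_sites call sites.

    call_sites: iterable of tuples of value names, one tuple per call site.
    """
    counts = Counter()
    for args in call_sites:
        group = frozenset(args)  # assumption: argument order does not matter
        if len(group) >= min_size:
            counts[group] += 1
    return {group for group, seen in counts.items() if seen >= min_sites}


sites = [
    ("street", "city", "zip"),
    ("city", "street", "zip"),
    ("zip", "street", "city"),
    ("request", "next"),
]
clumps = data_clumps(sites)
```

`clumps` contains the single group `{street, city, zip}`; the two-value `(request, next)` group is ignored by the size threshold before any framework-signature guard is needed.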

10. PassThroughDependencyProphet

  • Technique: a constructor-injected dependency only ever forwarded unchanged to one collaborator,
    never used for its own methods → inject it at the collaborator. Substrate: in-class usage census of the property.
  • FIRES: an injected private dep whose every use is $collab->method(..., $this->dep, ...) forwarding
    to a single collaborator, with no $this->dep->anything() call.
  • DOESN'T FIRE: dep used directly (any $this->dep->m()); forwarded to multiple distinct
    collaborators; held for a lifecycle/identity reason; exposed via a getter.
  • Tier: Convention.

11. HardcodedLiteralShouldBeConfigProphet

  • Technique: a magic literal repeated across ≥N sites whose value also appears as a config leaf
    read from config. Substrate: LiteralCensus + ConfigReadIndex (value match).
  • FIRES: a non-trivial literal (string/number) repeated at ≥ N (default 3) sites and equal to a
    declared config leaf value.
  • DOESN'T FIRE: trivial literals (0,1,'',true,false,-1); test files; the literal is the
    config default declaration itself; enum/const values.
  • Tier: Convention.

Proving ground — mine both corpora, don't theorise (same method as #135)

Treat workflows and smart-farmers as both the discovery corpus and the labeled
proof-of-concept test suite:

  1. Mine for flows. Run the tracers (ConfigReadIndex, CallConsumptionCensus, OriginTracer,
    ValueFlowTracer) across both codebases and surface the recurring flow shapes — let the real flows
    drive which prophets to prioritise, bottom-up. The 11 specs above are the hypothesis; the corpora are
    ground truth.
  2. Build the suite from what the two reveal.
  3. Label and measure. Hand-label a sample per prophet in each codebase; assert positives fire and
    negatives don't; report precision/recall per prophet per codebase.
  4. Two independent codebases is the point. A flow prophet that nails workflows but misfires on
    smart-farmers is overfit to one project's idioms — catch it before shipping.
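
Step 3 amounts to comparing hand labels with the sites where a prophet fired. A short Python sketch of the per-prophet, per-codebase measurement; the `labels` and `fired` shapes are assumptions about how the labeled sample might be stored.

```python
def precision_recall(labels, fired):
    """Compute (precision, recall) for one prophet on one codebase.

    labels: dict mapping site id -> True if the prophet should fire there.
    fired:  set of site ids where the prophet actually fired.
    Returns None for a ratio whose denominator is zero.
    """
    tp = sum(1 for site, positive in labels.items() if positive and site in fired)
    fp = sum(1 for site, positive in labels.items() if not positive and site in fired)
    fn = sum(1 for site, positive in labels.items() if positive and site not in fired)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


labels = {"a.php:10": True, "b.php:22": True, "c.php:5": False, "d.php:9": False}
precision, recall = precision_recall(labels, fired={"a.php:10", "c.php:5"})
```

In this sample there is one true positive, one false positive, and one false negative, so both values are 0.5. Running the same function once per codebase makes the overfitting check in step 4 a direct comparison of two pairs of numbers.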

Phased build (each phase validated against BOTH corpora before merge)

  1. Config-flow trio: ConfigReadIndex + prophets #1, #5, #11. (Reuses the proven ConfigMapIndex approach.)
  2. Security/taint: ValueFlowTracer + TaintCatalog + prophets #6, #8. (New capability class.)
  3. Consumption + congruence: CallConsumptionCensus extension + prophet #7; prophets #2, #3, #4.
  4. Origin/grouping: ArgumentGroupCensus + prophets #9, #10.

Acceptance criteria

  • ConfigReadIndex, ValueFlowTracer, ArgumentGroupCensus, LiteralCensus, TaintCatalog exist, each with the "bail on ambiguity" guarantee + unit tests.
  • All 11 prophets implemented per the FIRES / DOESN'T-FIRE specs above, advisory tier, none auto-fixable.
  • Each prophet links to a skill (a value-flow skill, or folded into existing ones).
  • Precision/recall reported per prophet on both workflows and smart-farmers; no prophet ships with a known FP on either corpus.
  • A self-test asserting the catalog stays in sync (every prophet has a corpus fixture).

Happy to prototype Phase 1 (ConfigReadIndex + #1/#5/#11) against the workflows snapshot I already have,
as the first half of the proving ground.
