fp-10 [first-principles]: run the pre-registered idiom-separability probe: is the W-code ledger text detectably Qwen-marked? #73

Minted by fp-6 (#57, PR #72) per the constancy rule.

**Question:** fp-6 pre-registered the probe (audit §8.15c); this issue runs it. If a cheap classifier separates the 956 ledger W-code episodes (3B-emitted) from human-written MBPP reference solutions well above chance, the idiom signature is real and quantified, and a from-scratch core pretrained on those episodes would inherit that distribution as ground truth.
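
"Well above chance" is measured as ROC AUC, where 0.5 means the classifier cannot tell the two sources apart. The sketch below shows the rank-based (Mann-Whitney U) form of AUC plus the frozen 0.75 bar from §8.15c. It assumes label 1 = ledger episode and 0 = human MBPP reference; the names `auc`, `verdict` and `AUC_BAR` are hypothetical, not taken from the probe script.

```python
from typing import Sequence

AUC_BAR = 0.75  # frozen verdict bar from §8.15c


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as Mann-Whitney U / (n_pos * n_neg); tied scores share an average rank."""
    if len(scores) != len(labels) or any(y not in (0, 1) for y in labels):
        raise ValueError("need equal-length scores and 0/1 labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with scores[order[i]].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def verdict(cv_auc: float) -> str:
    """Map a cross-validated AUC onto the pre-registered verdict."""
    return "signature real" if cv_auc >= AUC_BAR else "below bar"
```

For example, `auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` is 0.75: three of the four (ledger, human) pairs are ranked correctly, which lands exactly on the bar.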

**Pre-registered spec (frozen in §8.15c; do not reinterpret):**
- Data: 956 MBPP ledger episodes vs MBPP sanitized reference solutions; 5-fold CV split BY TASK (no task leakage).
- Features: stylometry per program (type-hint rate, docstring/comment rate, f-string usage, quote style, identifier casing, mean line length, comprehension/lambda rate) + hashed char-3gram bag.
- Classifier: logistic regression.
- Verdict bar: **CV-AUC ≥ 0.75 = signature real**; report AUC + top features either way. Below bar = contamination concern demoted at this granularity.
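
A minimal, stdlib-only sketch of the feature side of the spec, assuming each program arrives as a source string tagged with its integer MBPP `task_id`. The concrete feature definitions (hints per parameter/return slot, a snake_case regex for identifier casing, per-line rates), the 4096-bucket crc32 hashing, and the names `stylometry`, `char3_hash` and `task_folds` are illustrative assumptions, not the §8.15c definitions. The logistic regression is left out; scikit-learn's `LogisticRegression` with `GroupKFold(n_splits=5)` over `task_id` would be one way to fit it.

```python
import ast
import io
import random
import re
import tokenize
import zlib
from typing import Dict, Sequence

# Hypothetical casing rule: lower snake_case, optionally with leading underscores.
SNAKE = re.compile(r"^_*[a-z][a-z0-9_]*$")
COMP_LAMBDA = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp, ast.Lambda)


def stylometry(src: str) -> Dict[str, float]:
    """Per-program style features; AST-based features stay 0.0 if src does not parse."""
    lines = [ln for ln in src.splitlines() if ln.strip()]
    n_lines = max(len(lines), 1)
    feats = dict.fromkeys(
        ["type_hint_rate", "docstring_rate", "snake_case_share", "fstrings_per_line",
         "comp_lambda_per_line", "comments_per_line", "double_quote_share"], 0.0)
    feats["mean_line_len"] = sum(len(ln) for ln in lines) / n_lines
    try:
        tree = ast.parse(src)
    except SyntaxError:
        tree = None
    if tree is not None:
        nodes = list(ast.walk(tree))
        funcs = [n for n in nodes if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
        slots = hinted = 0
        for f in funcs:
            params = f.args.posonlyargs + f.args.args + f.args.kwonlyargs
            slots += len(params) + 1  # +1: the return-annotation slot
            hinted += sum(p.annotation is not None for p in params) + (f.returns is not None)
        # Identifiers: function names plus every name bound by assignment or loop target.
        names = [f.name for f in funcs] + [
            n.id for n in nodes if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)]
        if slots:
            feats["type_hint_rate"] = hinted / slots
        if funcs:
            feats["docstring_rate"] = sum(ast.get_docstring(f) is not None for f in funcs) / len(funcs)
        if names:
            feats["snake_case_share"] = sum(bool(SNAKE.match(n)) for n in names) / len(names)
        # Counts ast.JoinedStr nodes, so a nested format spec like {x:.2f} adds one more.
        feats["fstrings_per_line"] = sum(isinstance(n, ast.JoinedStr) for n in nodes) / n_lines
        feats["comp_lambda_per_line"] = sum(isinstance(n, COMP_LAMBDA) for n in nodes) / n_lines
    try:
        toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
    except (tokenize.TokenError, SyntaxError):
        toks = []
    # STRING tokens with prefixes stripped, so the first char is the opening quote.
    # On Python 3.12+ f-strings tokenize separately and are not counted here.
    quotes = [t.string.lstrip("rRbBuUfF")[0] for t in toks if t.type == tokenize.STRING]
    feats["comments_per_line"] = sum(t.type == tokenize.COMMENT for t in toks) / n_lines
    if quotes:
        feats["double_quote_share"] = quotes.count('"') / len(quotes)
    return feats


def char3_hash(src: str, n_buckets: int = 4096) -> Dict[int, int]:
    """Hashed char-3gram bag; crc32 is used because str hash() is salted per process."""
    bag: Dict[int, int] = {}
    for i in range(len(src) - 2):
        bucket = zlib.crc32(src[i:i + 3].encode("utf-8")) % n_buckets
        bag[bucket] = bag.get(bucket, 0) + 1
    return bag


def task_folds(task_ids: Sequence[int], k: int = 5, seed: int = 0) -> Dict[int, int]:
    """Assign each task (not each program) to one fold, so no task spans train and test."""
    tasks = sorted(set(task_ids))
    random.Random(seed).shuffle(tasks)
    return {t: i % k for i, t in enumerate(tasks)}
```

Assigning folds per task rather than per program keeps a task's ledger episode and its reference solution on the same side of every split, which is what "no task leakage" requires; a per-program split would let the classifier score partly on task content rather than on style.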

**Consequence wiring:** quantifies §8.15(d)'s dilution-mix design input for NC2-own; pairs with eng #70 (license filter). The license class says what MAY enter the owned corpus; this probe says what the Qwen-emitted share DOES to it.

CPU-only; runs in minutes from existing files. AC: script + selftest + receipt + audit § + STATE + PR that closes this issue + fp-11 minted.
