[Observability] Persist typed failure and filter-drop manifests for Etherscan DatasetBuilder runs #110

## Finding
The Etherscan `DatasetBuilder` path used by `train.py --collect-data` still relies on logs and aggregate counts; it does not persist typed per-stage failures, filter-drop counts, or a dataset-generation manifest.

## Evidence
- `train.py:107-194` uses `src.dataset_pipeline.DatasetBuilder`, then logs only total pairs, post-filter count, export path, and aggregate dataset statistics.
- `src/dataset_pipeline.py:793-931` logs skips and compile/TAC-analysis failures (`not verified`, empty source, no compiler, no functions, compile failure, TAC failure) but returns only `total_pairs`.
- `src/dataset_pipeline.py:1606-1651` deletes rows for length, TAC length, duplicates, test names, and simple patterns without recording per-rule row counts.
- `src/dataset_pipeline.py:1663-1744` exports JSONL/CSV/Parquet but does not write a manifest with inputs, code/env metadata, output hashes, drop counts, or compile diagnostics.
- The Hugging Face generator has stronger observability (export selection stats at `download_hf_contracts.py:760-807` and compile diagnostics at `download_hf_contracts.py:810-888`), so the repository's two dataset paths differ in how debuggable they are.

## Impact
If a model trained from the Etherscan path performs poorly, maintainers cannot distinguish upstream fetch failures, parser failures, compiler-selection issues, solc errors, TAC-analysis failures, selector-match misses, or filtering/dedup drops without re-running and scraping logs. The exported dataset is also hard to trace back to exact inputs and environment.

## Recommended fix
Add a first-class manifest and diagnostics store for `DatasetBuilder` runs. Persist per-contract/per-compiler status rows with typed failure categories, bounded error samples, per-filter deletion counts, input address file hash/count, output artifact hashes/row counts, git/env metadata, and timings.
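A minimal sketch of what the in-memory side of such a diagnostics store could look like. The names `FailureCategory`, `RunDiagnostics`, `record_failure`, and `record_filter` are hypothetical and do not exist in the repository; the category values simply mirror the skip/failure reasons that `src/dataset_pipeline.py` logs today, and the 500-character truncation is an arbitrary assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from enum import Enum


class FailureCategory(str, Enum):
    # Hypothetical names mirroring the reasons currently only logged.
    NOT_VERIFIED = "not_verified"
    EMPTY_SOURCE = "empty_source"
    NO_COMPILER = "no_compiler"
    NO_FUNCTIONS = "no_functions"
    COMPILE_FAILURE = "compile_failure"
    TAC_FAILURE = "tac_failure"


@dataclass
class RunDiagnostics:
    max_samples: int = 5  # cap on stored error samples per category
    failures: Counter = field(default_factory=Counter)
    samples: dict = field(default_factory=lambda: defaultdict(list))
    filter_drops: Counter = field(default_factory=Counter)

    def record_failure(self, category, address, error=""):
        """Count every failure, but keep only a bounded number of samples."""
        self.failures[category.value] += 1
        bucket = self.samples[category.value]
        if len(bucket) < self.max_samples:
            # Truncate long compiler output so the store stays small.
            bucket.append({"address": address, "error": error[:500]})

    def record_filter(self, rule, rows_before, rows_after):
        """Record how many rows a single filter rule removed."""
        self.filter_drops[rule] += rows_before - rows_after

    def to_dict(self):
        return {
            "failures": dict(self.failures),
            "samples": {k: list(v) for k, v in self.samples.items()},
            "filter_drops": dict(self.filter_drops),
        }


# Usage: three compile failures, only two samples retained.
diag = RunDiagnostics(max_samples=2)
for i in range(3):
    diag.record_failure(
        FailureCategory.COMPILE_FAILURE, f"0x{i:040x}", "solc exited with status 1"
    )
diag.record_filter("duplicates", rows_before=100, rows_after=92)
summary = diag.to_dict()
```

Keeping counts unbounded and samples bounded means the totals stay exact even when a run hits thousands of failures in one category, while the persisted output remains small enough to read during triage.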

## Acceptance criteria
- `train.py --collect-data` writes a dataset manifest next to the exported dataset.
- The manifest or diagnostics store (for example, a SQLite DB) reports counts for each skip/failure/filter/drop category.
- Compile/TAC/parser failures include typed categories and bounded sample errors/contracts.
- Documentation explains how to use the manifest during data-quality/model-quality triage.
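As a rough illustration of the first two criteria, the sketch below writes a JSON manifest next to an exported JSONL file. The function `write_manifest`, the `.manifest.json` suffix, and the manifest field names are all assumptions made for illustration, not an existing convention in the repository; the git commit is passed in by the caller rather than discovered here.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_nonblank_lines(path):
    """Count non-blank lines (one address or one JSONL row per line)."""
    with open(path, "rb") as fh:
        return sum(1 for line in fh if line.strip())


def write_manifest(dataset_path, address_file, diagnostics, git_commit=None):
    """Write <dataset>.manifest.json beside the dataset and return its path.

    `diagnostics` is any JSON-serializable dict of failure and filter-drop
    counts; its shape is an assumption of this sketch.
    """
    dataset_path = Path(dataset_path)
    manifest = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "input": {
            "path": str(address_file),
            "sha256": sha256_file(address_file),
            "count": count_nonblank_lines(address_file),
        },
        "output": {
            "path": str(dataset_path),
            "sha256": sha256_file(dataset_path),
            "rows": count_nonblank_lines(dataset_path),
        },
        "diagnostics": diagnostics,
    }
    manifest_path = dataset_path.with_name(dataset_path.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path
```

During triage, a maintainer could compare `input.count` against `output.rows` and then read the `diagnostics` counts to see which stage accounts for the difference, without re-running the collection or scraping logs.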
