[Training] Fix collect-and-compile max_workers mismatch

## Evidence
- `train.py:133-137` calls `builder.collect_and_compile_contracts(..., max_workers=3, max_compiler_configs=...)`.
- `src/dataset_pipeline.py:793-797` defines `collect_and_compile_contracts(self, contract_addresses, max_compiler_configs=2)` with no `max_workers` parameter.
- `src/dataset_pipeline.py:816-918` then processes contracts, compiler configs, and compiled contracts in nested serial loops.
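
The mismatch can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's code: `DatasetBuilderStub` and `call_like_train_py` are hypothetical names, and only the method signature and keyword arguments are copied from the lines cited above.

```python
class DatasetBuilderStub:
    """Hypothetical stand-in that copies only the reported method signature."""

    def collect_and_compile_contracts(self, contract_addresses, max_compiler_configs=2):
        # The real method compiles contracts; the body is irrelevant to the mismatch.
        return []


def call_like_train_py(builder):
    """Pass the same keyword arguments as the caller; return the error text, if any."""
    try:
        builder.collect_and_compile_contracts(
            ["0x0"],  # placeholder address, for illustration only
            max_workers=3,
            max_compiler_configs=2,
        )
    except TypeError as exc:
        return str(exc)
    return None


error = call_like_train_py(DatasetBuilderStub())
print(error)
```

Python rejects the unknown keyword while binding arguments, before any of the method body runs, so the failure does not depend on the contract list or on network access.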

## Impact
The full training pipeline can fail before training starts with `TypeError: collect_and_compile_contracts() got an unexpected keyword argument 'max_workers'`. Even after the signature mismatch is fixed, dataset compilation/TAC preprocessing remains serial, increasing wall-clock time and compute cost before every fresh training run.

## Recommended fix
Accept and honor a bounded `max_workers` argument in `collect_and_compile_contracts`, or remove the argument from the `train.py` call if serial execution is intentional. If parallelizing, give each worker thread its own SQLite connection, rate-limit Etherscan requests across all workers, and serialize solc installation and cache access (for example, with a lock).
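
One possible shape for the parallel path is sketched below, under stated assumptions. Every name here (`MAX_WORKERS_CAP`, `get_connection`, `RateLimiter`, `ensure_solc`, `process_contracts`) is hypothetical, the cap of 8 is arbitrary, and the real compile step is represented by a caller-supplied `process_one` callable. It only illustrates the three safeguards named above with the standard library.

```python
import sqlite3
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS_CAP = 8  # assumed upper bound; the issue does not specify one

_local = threading.local()
_solc_lock = threading.Lock()
_installed_versions = set()


def get_connection(db_path):
    """Return this thread's own SQLite connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)  # sketch assumes a single db_path per process
        _local.conn = conn
    return conn


class RateLimiter:
    """Space calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, min_interval):
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next slot before releasing the lock.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval
        if delay > 0:
            time.sleep(delay)


def ensure_solc(version, install):
    """Call install(version) at most once per version, one thread at a time."""
    with _solc_lock:
        if version not in _installed_versions:
            install(version)
            _installed_versions.add(version)


def clamp_workers(max_workers):
    """Bound the requested worker count to the range 1..MAX_WORKERS_CAP."""
    return max(1, min(int(max_workers), MAX_WORKERS_CAP))


def process_contracts(addresses, process_one, max_workers=1):
    """Apply process_one to each address; results keep the input order."""
    workers = clamp_workers(max_workers)
    if workers == 1:
        return [process_one(address) for address in addresses]  # serial path
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, addresses))
```

Keeping a dedicated serial path for `max_workers=1` preserves today's behavior as the default, so the parallel path can be enabled and compared without changing existing runs.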

## Acceptance criteria
- A unit/integration test covers the `train.py` collection call path and proves `max_workers` is accepted.
- Dataset collection can be configured with 1 worker or with more than 1 worker, and runs without SQLite or solc races in either mode.
- Logs include contract/config throughput so preprocessing speedups are measurable.
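
The first and third criteria could be checked with helpers along these lines. This is a sketch only: `accepts_keyword`, `log_throughput`, and the logger name are hypothetical and not part of the project. A test could call `accepts_keyword` on the real `collect_and_compile_contracts` method with `"max_workers"`.

```python
import inspect
import logging
import time

logger = logging.getLogger("dataset_pipeline")  # hypothetical logger name


def accepts_keyword(func, name):
    """Return True if func can be called with keyword `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter also accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def log_throughput(label, count, started_at):
    """Log and return items per second since started_at (a time.monotonic() value)."""
    elapsed = max(time.monotonic() - started_at, 1e-9)  # avoid division by zero
    rate = count / elapsed
    logger.info("%s: %d items in %.2fs (%.2f/s)", label, count, elapsed, rate)
    return rate
```

A signature check catches the regression cheaply, but it does not replace an integration test that actually exercises the `train.py` call path.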


[Training] Fix collect-and-compile max_workers mismatch #56
