[Observability] Add persistent inference traces with prompt truncation and token diagnostics

## Finding
Inference has transient progress events and logs, but no durable request trace that explains why a decompilation was good or bad.

## Evidence
- The web endpoint streams progress and a final result in `web/app.py:558-925`, but it does not assign a request/run ID or persist a JSON trace for later debugging.
- The final `analysis` payload at `web/app.py:900-912` includes counts, timings, lookup hits, function sources, and model config, but not prompt token counts, truncation status, generation settings, per-function latency, generated token counts, or bytecode/TAC hashes.
- `src/model_setup.py:1167-1216` may strip or hard-truncate TAC to fit the context window, but `_build_prompt` at `src/model_setup.py:1268-1272` returns only prompt text and does not expose whether truncation happened.
- `src/model_setup.py:1291-1310` returns only generated Solidity for single-function inference, so callers cannot inspect generation diagnostics.
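
To make the truncation gap concrete, here is a minimal sketch of a prompt builder that returns diagnostics alongside the prompt text. `PromptBuildResult`, `build_prompt_with_diagnostics`, and the whitespace tokenizer are hypothetical names chosen for illustration; the real `_build_prompt` and its tokenizer may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptBuildResult:
    """Hypothetical return type: prompt text plus truncation diagnostics."""
    prompt: str
    tac_tokens_before: int
    tac_tokens_after: int
    token_budget: int

    @property
    def truncated(self) -> bool:
        # Truncation happened if fewer tokens survived than were supplied.
        return self.tac_tokens_after < self.tac_tokens_before


def build_prompt_with_diagnostics(
    tac: str,
    token_budget: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> PromptBuildResult:
    """Hard-truncate TAC to the budget and report what was dropped.

    Assumption: whitespace splitting stands in for the model tokenizer.
    """
    tokens = tokenize(tac)
    kept = tokens[: max(token_budget, 0)]
    return PromptBuildResult(
        prompt=" ".join(kept),
        tac_tokens_before=len(tokens),
        tac_tokens_after=len(kept),
        token_budget=token_budget,
    )
```

A caller could then copy `truncated`, `tac_tokens_before`, and `tac_tokens_after` into the `analysis` payload or a trace entry instead of discarding them.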

## Impact
When API or UI output is poor, maintainers cannot tell after the fact whether the problem came from TAC analysis, an exact-match lookup miss, prompt truncation, model generation, missing compiler metadata, or a function-level exception. This slows production triage and model quality improvement.

## Recommended fix
Add an optional persistent inference trace (for example JSONL under `results/inference_traces/`) keyed by request ID. Record bytecode hash, model artifact path/config, compiler metadata, analysis counts, selector map, lookup hit/miss per function, prompt token budget, TAC tokens before/after truncation, generated token count, decoding parameters, per-function latency, errors, and final artifact paths. Include the request ID in logs, SSE progress, and final API responses. Avoid storing raw bytecode by default; store hashes and bounded samples unless explicitly enabled.
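
One possible shape for the trace writer, as a rough sketch. The function names, the `traces.jsonl` file name, and the record fields are assumptions for illustration only; the hash is taken over the normalized hex string so that raw bytecode never needs to be stored.

```python
import hashlib
import json
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional


def new_request_id() -> str:
    """Return a random request ID to attach to logs, SSE events, and responses."""
    return uuid.uuid4().hex


def bytecode_digest(bytecode_hex: str) -> str:
    """SHA-256 of the lowercased hex string with any 0x prefix removed."""
    normalized = bytecode_hex.strip().lower()
    if normalized.startswith("0x"):
        normalized = normalized[2:]
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def write_trace(
    trace_dir: Path,
    request_id: str,
    bytecode_hex: str,
    functions: List[Dict[str, Any]],
    error: Optional[str] = None,
) -> Path:
    """Append one JSON line per request; stores a hash, not the bytecode."""
    trace_dir = Path(trace_dir)
    trace_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "bytecode_sha256": bytecode_digest(bytecode_hex),
        "functions": functions,  # per-function diagnostics and errors
        "error": error,          # request-level failure, if any
    }
    path = trace_dir / "traces.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Appending to a single JSONL file keeps the sketch short; one file per request ID would work equally well and avoids concurrent appends from multiple workers.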

## Acceptance criteria
- Every web/API decompile response includes a request ID.
- With tracing enabled, a durable JSON trace is written for successful and failed requests.
- Trace entries expose per-function prompt/truncation/generation diagnostics and errors.
- Unit or integration tests verify tracing without loading a real model.
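
The last criterion could be met by injecting the generator as a callable, so tests can pass a stub instead of a real model. The sketch below assumes a hypothetical `run_functions_with_trace` helper and entry fields; it is not the project's existing API.

```python
import time
from typing import Callable, Dict, List


def run_functions_with_trace(
    functions: Dict[str, str],
    generate: Callable[[str], str],
) -> List[dict]:
    """Run `generate` per function, recording latency and errors for each."""
    entries = []
    for selector, tac in functions.items():
        entry = {"selector": selector, "error": None, "generated_chars": 0}
        start = time.perf_counter()
        try:
            output = generate(tac)
            entry["generated_chars"] = len(output)
        except Exception as exc:  # keep going so one failure is still traced
            entry["error"] = f"{type(exc).__name__}: {exc}"
        entry["latency_s"] = time.perf_counter() - start
        entries.append(entry)
    return entries


def fake_generate(tac: str) -> str:
    """Stub model: fails on a marker string, otherwise returns fixed Solidity."""
    if "BAD" in tac:
        raise ValueError("simulated generation failure")
    return "function f() public {}"


def test_trace_records_success_and_failure():
    entries = run_functions_with_trace(
        {"0xa9059cbb": "x = 1", "0xdeadbeef": "BAD"}, fake_generate
    )
    assert [e["selector"] for e in entries] == ["0xa9059cbb", "0xdeadbeef"]
    assert entries[0]["error"] is None
    assert entries[0]["generated_chars"] == len("function f() public {}")
    assert entries[1]["error"] == "ValueError: simulated generation failure"
    assert all(e["latency_s"] >= 0 for e in entries)
```

Because the stub is a plain function, the test runs without a GPU or model weights and still exercises both the success and the failure path.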

[Observability] Add persistent inference traces with prompt truncation and token diagnostics #70

Finding

Evidence

Impact

Recommended fix

Acceptance criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Observability] Add persistent inference traces with prompt truncation and token diagnostics #70

Description

Finding

Evidence

Impact

Recommended fix

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions