Finding
Inference has transient progress events and logs, but no durable request trace that explains why a decompilation was good or bad.
Evidence
- The web endpoint streams progress and a final result in web/app.py:558-925, but it does not assign a request/run ID or persist a JSON trace for later debugging.
- The final analysis payload at web/app.py:900-912 includes counts, timings, lookup hits, function sources, and model config, but not prompt token counts, truncation status, generation settings, per-function latency, generated token counts, or bytecode/TAC hashes.
- src/model_setup.py:1167-1216 may strip or hard-truncate TAC to fit the context window, but _build_prompt at src/model_setup.py:1268-1272 returns only the prompt text and does not expose whether truncation happened.
- src/model_setup.py:1291-1310 returns only the generated Solidity for single-function inference, so callers cannot inspect generation diagnostics.
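One way to surface the missing diagnostics is to have prompt construction and single-function inference return small result objects instead of bare strings. The sketch below is hypothetical: PromptBuild, GenerationResult, build_prompt_with_diagnostics and infer_function are illustrative names, not the existing src/model_setup.py API. It assumes encode/decode callables stand in for the real tokenizer, that the prompt template contains a literal {tac} placeholder, and it models only hard truncation (the existing stripping step is not reproduced).

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical tokenizer hooks: encode text to tokens, decode tokens back to text.
Encode = Callable[[str], List[Any]]
Decode = Callable[[List[Any]], str]


@dataclass
class PromptBuild:
    """Prompt text plus the truncation facts a trace needs (hypothetical type)."""
    text: str
    tac_tokens_before: int
    tac_tokens_after: int
    truncated: bool


@dataclass
class GenerationResult:
    """Generated Solidity plus per-function diagnostics (hypothetical type)."""
    solidity: str
    prompt: PromptBuild
    generated_tokens: int
    latency_s: float
    decoding: Dict[str, Any] = field(default_factory=dict)


def build_prompt_with_diagnostics(
    tac: str, template: str, encode: Encode, decode: Decode, tac_token_budget: int
) -> PromptBuild:
    """Hard-truncate TAC to a token budget and report whether that happened."""
    tokens = encode(tac)
    before = len(tokens)
    truncated = before > tac_token_budget
    if truncated:
        tokens = tokens[:tac_token_budget]
        tac = decode(tokens)
    # str.replace (not str.format) so braces elsewhere in the template are harmless.
    text = template.replace("{tac}", tac)
    return PromptBuild(text, before, len(tokens), truncated)


def infer_function(
    tac: str,
    template: str,
    encode: Encode,
    decode: Decode,
    tac_token_budget: int,
    generate: Callable[..., str],
    decoding: Dict[str, Any],
) -> GenerationResult:
    """Build the prompt, call the model, and keep the diagnostics with the output."""
    prompt = build_prompt_with_diagnostics(tac, template, encode, decode, tac_token_budget)
    start = time.perf_counter()
    solidity = generate(prompt.text, **decoding)
    latency = time.perf_counter() - start
    return GenerationResult(
        solidity=solidity,
        prompt=prompt,
        # Re-encoding the output approximates the count; a real model may report it directly.
        generated_tokens=len(encode(solidity)),
        latency_s=latency,
        decoding=dict(decoding),
    )
```

Callers that only need the text can keep reading result.solidity, while the remaining fields can be copied into a trace entry.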
Impact
When an API/UI output is poor, maintainers cannot tell after the fact whether the issue came from TAC analysis, exact-match lookup miss, prompt truncation, model generation, compiler metadata absence, or a specific function-level exception. This slows production triage and model quality improvement.
Recommended fix
Add an optional persistent inference trace (for example JSONL under results/inference_traces/) keyed by request ID. Record bytecode hash, model artifact path/config, compiler metadata, analysis counts, selector map, lookup hit/miss per function, prompt token budget, TAC tokens before/after truncation, generated token count, decoding parameters, per-function latency, errors, and final artifact paths. Include the request ID in logs, SSE progress, and final API responses. Avoid storing raw bytecode by default; store hashes and bounded samples unless explicitly enabled.
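A minimal sketch of the trace record and a JSONL appender, assuming one JSON object per request is appended to a single file under results/inference_traces/. The file name traces.jsonl, the SAMPLE_CHARS bound, and every class and function name here are hypothetical. Raw bytecode is omitted unless store_raw is set; by default only a SHA-256 of the normalized hex string, its length, and a bounded prefix are kept.

```python
import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional

SAMPLE_CHARS = 64  # hypothetical bound on the stored bytecode prefix


def new_request_id() -> str:
    """Random request ID to share between logs, SSE events, and the trace."""
    return uuid.uuid4().hex


def bytecode_digest(bytecode: str, store_raw: bool = False) -> Dict[str, Any]:
    """Hash and sample the bytecode instead of storing it verbatim."""
    hex_str = bytecode.strip().lower()
    if hex_str.startswith("0x"):
        hex_str = hex_str[2:]
    info: Dict[str, Any] = {
        "sha256": hashlib.sha256(hex_str.encode("utf-8")).hexdigest(),
        "hex_chars": len(hex_str),
        "sample": hex_str[:SAMPLE_CHARS],
    }
    if store_raw:  # opt-in only
        info["raw"] = hex_str
    return info


@dataclass
class FunctionTrace:
    selector: str
    lookup_hit: bool
    tac_tokens_before: Optional[int] = None
    tac_tokens_after: Optional[int] = None
    generated_tokens: Optional[int] = None
    latency_s: Optional[float] = None
    error: Optional[str] = None


@dataclass
class InferenceTrace:
    request_id: str
    bytecode: Dict[str, Any]
    model: Dict[str, Any]     # artifact path and config
    decoding: Dict[str, Any]  # decoding parameters (keys are deployment-specific)
    compiler: Dict[str, Any] = field(default_factory=dict)
    functions: List[FunctionTrace] = field(default_factory=list)
    status: str = "running"   # set to "ok" or "failed" when the request finishes
    error: Optional[str] = None
    started_at: float = field(default_factory=time.time)


def append_trace(trace: InferenceTrace, trace_dir: Path) -> Path:
    """Append one JSON line per request; the request_id field is the lookup key."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / "traces.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        # asdict recurses into the nested FunctionTrace entries.
        fh.write(json.dumps(asdict(trace), sort_keys=True) + "\n")
    return path
```

Calling append_trace from a finally block around the request handler is one way to make sure failed requests are recorded too. If several worker processes append to the same file, a lock or per-process files may be needed.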
Acceptance criteria
- Every web/API decompile response includes a request ID.
- With tracing enabled, a durable JSON trace is written for successful and failed requests.
- Trace entries expose per-function prompt/truncation/generation diagnostics and errors.
- Unit or integration tests verify tracing without loading a real model.
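These criteria can be exercised without a real model by injecting both the model call and the trace sink. The sketch below is hypothetical (decompile_events, sse, and the selector-to-TAC input shape are illustrative, not the web/app.py endpoint). It stamps every SSE message with the request ID, records per-function exceptions instead of aborting the request, and writes the trace in a finally block so failed requests are persisted as well.

```python
import json
import uuid
from typing import Callable, Dict, Iterator


def sse(event: str, payload: Dict) -> str:
    """Format one Server-Sent Events message: an event name plus a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def decompile_events(
    functions: Dict[str, str],            # selector -> TAC (hypothetical input shape)
    generate: Callable[[str], str],       # stand-in for the model call
    write_trace: Callable[[Dict], None],  # persistence hook, e.g. a JSONL appender
    request_id: str = "",
) -> Iterator[str]:
    request_id = request_id or uuid.uuid4().hex
    trace: Dict = {"request_id": request_id, "functions": [], "status": "running"}
    try:
        for selector, tac in functions.items():
            entry: Dict = {"selector": selector}
            try:
                entry["solidity"] = generate(tac)
            except Exception as exc:  # keep going; the error lands in the trace
                entry["error"] = f"{type(exc).__name__}: {exc}"
            trace["functions"].append(entry)
            yield sse("progress", {"request_id": request_id, "selector": selector,
                                   "ok": "error" not in entry})
        trace["status"] = "ok"
        yield sse("result", {"request_id": request_id, "functions": trace["functions"]})
    except Exception as exc:  # request-level failure
        trace["status"] = "failed"
        trace["error"] = repr(exc)
        yield sse("error", {"request_id": request_id, "error": trace["error"]})
    finally:
        # Runs on success, failure, or generator close, so every request is traced.
        write_trace(trace)
```

In a fuller test, write_trace would append to a temporary directory and the assertions would read the JSONL back; a plain list sink keeps the example short.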