[Observability] Add persistent inference traces with prompt truncation and token diagnostics #70

Description

@agorevski

Finding

Inference has transient progress events and logs, but no durable request trace that explains why a decompilation was good or bad.

Evidence

  • The web endpoint streams progress and a final result in web/app.py:558-925, but it does not assign a request/run ID or persist a JSON trace for later debugging.
  • The final analysis payload at web/app.py:900-912 includes counts, timings, lookup hits, function sources, and model config, but not prompt token counts, truncation status, generation settings, per-function latency, generated token counts, or bytecode/TAC hashes.
  • src/model_setup.py:1167-1216 may strip or hard-truncate TAC to fit the context window, but _build_prompt at src/model_setup.py:1268-1272 returns only prompt text and does not expose whether truncation happened.
  • src/model_setup.py:1291-1310 returns only generated Solidity for single-function inference, so callers cannot inspect generation diagnostics.
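One way to close the gap described in the last two bullets is for the prompt builder to return a small result object instead of a bare string. The following is a minimal sketch, not the code in src/model_setup.py: `PromptBuildResult`, `build_prompt_with_diagnostics`, the prompt template, and the whitespace tokenizer are all hypothetical stand-ins, and the real strip/hard-truncate logic is assumed to be more involved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PromptBuildResult:
    """Prompt text plus the diagnostics a trace would need (hypothetical type)."""

    prompt: str
    tac_tokens_before: int
    tac_tokens_after: int
    token_budget: int

    @property
    def truncated(self) -> bool:
        # True when some TAC tokens were dropped to fit the budget.
        return self.tac_tokens_after < self.tac_tokens_before


def build_prompt_with_diagnostics(
    tac: str,
    token_budget: int,
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in for the model tokenizer
    template: str = "Decompile the following TAC:\n{tac}\n",  # assumed template
) -> PromptBuildResult:
    """Hard-truncate TAC to the budget and report what happened."""
    tokens = tokenize(tac)
    kept = tokens[: max(token_budget, 0)]
    return PromptBuildResult(
        prompt=template.format(tac=" ".join(kept)),
        tac_tokens_before=len(tokens),
        tac_tokens_after=len(kept),
        token_budget=token_budget,
    )
```

Callers that only need the text can keep reading `.prompt`, while a trace writer can record `truncated`, `tac_tokens_before`, and `tac_tokens_after` per function.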

Impact

When an API/UI output is poor, maintainers cannot tell after the fact whether the problem came from TAC analysis, an exact-match lookup miss, prompt truncation, model generation, missing compiler metadata, or a function-level exception. This slows production triage and model quality improvement.

Recommended fix

Add an optional persistent inference trace (for example JSONL under results/inference_traces/) keyed by request ID. Record bytecode hash, model artifact path/config, compiler metadata, analysis counts, selector map, lookup hit/miss per function, prompt token budget, TAC tokens before/after truncation, generated token count, decoding parameters, per-function latency, errors, and final artifact paths. Include the request ID in logs, SSE progress, and final API responses. Avoid storing raw bytecode by default; store hashes and bounded samples unless explicitly enabled.
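The trace described above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed final design: `new_request_id`, `bytecode_hash`, `write_trace`, the `traces.jsonl` file name, and the record fields are hypothetical, and only a subset of the listed fields is shown. The bytecode is hashed rather than stored, in line with the default suggested above.

```python
import hashlib
import json
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional


def new_request_id() -> str:
    """Return an opaque ID to attach to logs, SSE events, and responses."""
    return uuid.uuid4().hex


def bytecode_hash(bytecode_hex: str) -> str:
    """SHA-256 of the normalized hex string, so raw bytecode is never stored."""
    normalized = bytecode_hex.strip().lower()
    if normalized.startswith("0x"):
        normalized = normalized[2:]
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def write_trace(
    trace_dir: Path,
    request_id: str,
    bytecode_hex: str,
    functions: List[Dict[str, Any]],
    error: Optional[str] = None,
) -> Path:
    """Append one JSON record per request to a JSONL file and return its path."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "bytecode_sha256": bytecode_hash(bytecode_hex),
        "functions": functions,  # per-function diagnostics supplied by the caller
        "error": error,  # set for failed requests so they are traced too
    }
    path = trace_dir / "traces.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Appending one line per request keeps writes cheap and lets a failed request be recorded with the same function by passing `error`. If several worker processes share the file, some form of locking or a per-request file would probably be needed.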

Acceptance criteria

  • Every web/API decompile response includes a request ID.
  • With tracing enabled, a durable JSON trace is written for successful and failed requests.
  • Trace entries expose per-function prompt/truncation/generation diagnostics and errors.
  • Unit or integration tests verify tracing without loading a real model.
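The last criterion is easiest to meet if the generation call is injected rather than hard-wired to the model. Below is a hedged sketch of that pattern: `run_traced`, `fake_generate`, `failing_generate`, and the entry fields are hypothetical names, and the real per-function entry would carry more diagnostics (prompt tokens, truncation, decoding parameters).

```python
import time
from typing import Any, Callable, Dict


def run_traced(selector: str, prompt: str, generate: Callable[[str], str]) -> Dict[str, Any]:
    """Run one function-level generation and capture output, error, and latency."""
    entry: Dict[str, Any] = {
        "selector": selector,
        "prompt_chars": len(prompt),
        "output": None,
        "error": None,
    }
    start = time.perf_counter()
    try:
        entry["output"] = generate(prompt)
    except Exception as exc:  # record the failure instead of losing it
        entry["error"] = f"{type(exc).__name__}: {exc}"
    entry["latency_s"] = time.perf_counter() - start
    return entry


def fake_generate(prompt: str) -> str:
    """Stub that stands in for the real model in tests."""
    return "function stub() public {}"


def failing_generate(prompt: str) -> str:
    """Stub that simulates a function-level exception."""
    raise RuntimeError("simulated generation failure")


def test_success_is_traced() -> None:
    entry = run_traced("0xa9059cbb", "some prompt", fake_generate)
    assert entry["error"] is None
    assert entry["output"] == "function stub() public {}"
    assert entry["latency_s"] >= 0.0


def test_failure_is_traced() -> None:
    entry = run_traced("0xa9059cbb", "some prompt", failing_generate)
    assert entry["output"] is None
    assert entry["error"] == "RuntimeError: simulated generation failure"
```

Both tests run under pytest without any model weights, and the same stubs can drive an endpoint-level test that checks a trace record is written for the failing case.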
