Problem
TAC generation resets the simulated EVM stack at every basic block, so values left on the stack by a predecessor are lost at the target block.
Evidence
src/bytecode_analyzer.py:1516-1523 initializes stack: List[str] = [] inside the loop over self.basic_blocks.items(), then converts each block independently.
- Reproduction with bytecode
0x60056005565b60020300 (PUSH1 5; PUSH1 5; JUMP; JUMPDEST; PUSH1 2; SUB; STOP) shows the value 5 left on the stack before the jump is not available in the target block:
block_0000:
temp_1 = 05
temp_2 = 05
goto temp_2
block_0005:
temp_3 = 02
temp_4 = stack_underflow - stack_underflow
stop()
Why it matters
EVM stack values commonly flow across basic-block boundaries. Resetting stack state introduces artificial stack_underflow expressions and removes data dependencies, which makes TAC less faithful and lowers the quality of both training pairs and inference prompts.
Suggested fix
Run a CFG-aware stack data-flow pass that computes incoming and outgoing stack states per block. For merges, introduce explicit phi/unknown stack parameters rather than silently underflowing. Then emit TAC using the computed entry stack for each block.
Validation/tests to add
- A unit test for the reproduction bytecode above that expects the target block to consume the predecessor's carried stack value.
- Branch/merge fixtures that verify deterministic phi/unknown handling when two predecessors provide different stack values.
- A corpus metric/test that tracks and fails on unexpected increases in
stack_underflow count for compiled Solidity fixtures.
Problem
TAC generation resets the simulated EVM stack at every basic block, so values left on the stack by a predecessor are lost at the target block.
Evidence
src/bytecode_analyzer.py:1516-1523initializesstack: List[str] = []inside the loop overself.basic_blocks.items(), then converts each block independently.0x60056005565b60020300(PUSH1 5; PUSH1 5; JUMP; JUMPDEST; PUSH1 2; SUB; STOP) shows the value5left on the stack before the jump is not available in the target block:Why it matters
EVM stack values commonly flow across basic-block boundaries. Resetting stack state introduces artificial
stack_underflowexpressions and removes data dependencies, which makes TAC less faithful and lowers the quality of both training pairs and inference prompts.Suggested fix
Run a CFG-aware stack data-flow pass that computes incoming and outgoing stack states per block. For merges, introduce explicit phi/unknown stack parameters rather than silently underflowing. Then emit TAC using the computed entry stack for each block.
Validation/tests to add
stack_underflowcount for compiled Solidity fixtures.