Problem
Modern EVM opcodes are not decoded with correct stack effects before TAC generation. In particular, PUSH0 (0x5f, introduced in Shanghai) currently becomes an unknown assignment instead of the constant zero, and Cancun opcodes such as BLOBHASH, BLOBBASEFEE, TLOAD, TSTORE, and MCOPY fall through as unknown one-push operations.
Evidence
src/local_compiler.py:116-123 includes solc 0.8.26 and 0.8.20; the comments note that 0.8.20 is the first version with PUSH0 support, so the local compilation pipeline will routinely produce this opcode.
src/bytecode_analyzer.py:97-129 does not include stack effects for PUSH0 or the newer opcodes.
src/bytecode_analyzer.py:936-945 only handles names starting with PUSH, but the installed disassembler reports 0x5f as UNKNOWN_0x5f.
src/bytecode_analyzer.py:1316-1338 treats totally unknown opcodes as an assignment that pushes one temp, corrupting stack state for opcodes that pop or push a different number of values.
Reproduction observed in this repository:
BytecodeAnalyzer('0x5f00') instructions: [('UNKNOWN_0x5f', ''), ('STOP', '')]
TAC: temp_1 = <unknown>
Why it matters
PUSH0 is common in solc 0.8.20+ bytecode. Emitting <unknown> instead of 0 materially degrades model inputs, exact TAC hashes, and downstream decompilation quality. Incorrect fallback stack effects for newer opcodes can also cascade into bogus stack_underflow values and wrong CFG/TAC for contracts compiled for recent forks.
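The cascade can be illustrated with a small, self-contained sketch. It is not the analyzer's code: `KNOWN_EFFECTS` and `depth_trace` are hypothetical names, and the fallback of "pop nothing, push one temp" is modelled on the behavior described under Evidence. The stack effects themselves follow the EIPs that define these opcodes.

```python
# Known (pops, pushes) for some of the opcodes named in this report.
KNOWN_EFFECTS = {
    "PUSH0": (0, 1),
    "TLOAD": (1, 1),
    "TSTORE": (2, 0),
    "MCOPY": (3, 0),
}


def depth_trace(opcodes, effects):
    """Return the stack depth after each opcode.

    Opcodes missing from `effects` use the fallback described above:
    pop nothing, push one temp.
    """
    depth, trace = 0, []
    for name in opcodes:
        pops, pushes = effects.get(name, (0, 1))
        depth = depth - pops + pushes
        trace.append(depth)
    return trace


program = ["PUSH0", "PUSH0", "TSTORE", "PUSH0", "TLOAD"]
fallback_trace = depth_trace(program, {})            # every opcode treated as unknown
correct_trace = depth_trace(program, KNOWN_EFFECTS)  # explicit stack effects

print(fallback_trace)  # [1, 2, 3, 4, 5]
print(correct_trace)   # [1, 2, 0, 1, 1]
```

After TSTORE the two traces disagree by three slots, so every later instruction in the block would read the wrong operands.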
Suggested fix
Normalize disassembler names for known raw opcode bytes before analysis (for example, map UNKNOWN_0x5f to PUSH0) or update/replace the disassembler layer. Add explicit stack effects and TAC formatting for at least PUSH0, BLOBHASH, BLOBBASEFEE, TLOAD, TSTORE, and MCOPY; unknown opcodes should not blindly push one temp unless their stack effect is known.
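One possible shape for the normalization step is sketched below. The names `MODERN_OPCODES`, `normalize_opcode_name`, and `stack_effect` are hypothetical, and the sketch assumes the disassembler labels every unrecognized byte as `UNKNOWN_0xNN`, which has only been observed for 0x5f. Opcode bytes and stack effects are taken from EIP-3855 (PUSH0), EIP-4844 (BLOBHASH), EIP-7516 (BLOBBASEFEE), EIP-1153 (TLOAD, TSTORE), and EIP-5656 (MCOPY).

```python
import re

# Opcode byte -> (mnemonic, pops, pushes).
MODERN_OPCODES = {
    0x49: ("BLOBHASH", 1, 1),
    0x4A: ("BLOBBASEFEE", 0, 1),
    0x5C: ("TLOAD", 1, 1),
    0x5D: ("TSTORE", 2, 0),
    0x5E: ("MCOPY", 3, 0),
    0x5F: ("PUSH0", 0, 1),
}

# Assumed naming scheme for unrecognized bytes, e.g. "UNKNOWN_0x5f".
_UNKNOWN_RE = re.compile(r"^UNKNOWN_0x([0-9a-fA-F]{2})$")


def normalize_opcode_name(name):
    """Map UNKNOWN_0xNN to its real mnemonic when the byte is known."""
    match = _UNKNOWN_RE.match(name)
    if match is None:
        return name
    entry = MODERN_OPCODES.get(int(match.group(1), 16))
    return entry[0] if entry else name


def stack_effect(name):
    """Return (pops, pushes) for a modern opcode, or None if not listed."""
    for mnemonic, pops, pushes in MODERN_OPCODES.values():
        if mnemonic == name:
            return (pops, pushes)
    return None
```

Returning `None` for unlisted opcodes lets the caller decide how to handle a genuinely unknown instruction (for example, by ending the block or flagging it) instead of silently pushing one temp.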
Validation/tests to add
- Unit test that 0x5f00 emits a TAC constant 0, not <unknown>.
- Stack-effect tests for TLOAD, TSTORE, and MCOPY byte sequences.
- Compile a minimal contract with solc 0.8.20+ through local_compiler.py and assert the generated TAC contains no UNKNOWN_0x5f or <unknown> for PUSH0.
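The byte sequences for the stack-effect tests could look like the sketch below. Because the analyzer's API beyond the constructor is not shown in this report, the sketch tests a toy stand-in, `final_depth`, which is hypothetical; real tests would feed the same byte sequences to BytecodeAnalyzer and assert on its stack state and TAC output.

```python
import unittest

# Opcode byte -> (pops, pushes). PUSH1..PUSH32 (0x60..0x7f) are handled separately.
EFFECTS = {
    0x00: (0, 0),  # STOP
    0x5C: (1, 1),  # TLOAD
    0x5D: (2, 0),  # TSTORE
    0x5E: (3, 0),  # MCOPY
    0x5F: (0, 1),  # PUSH0
}


def final_depth(bytecode_hex):
    """Toy stand-in for the analyzer: walk the bytes and track stack depth."""
    hex_body = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    code = bytes.fromhex(hex_body)
    depth, pc = 0, 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            pops, pushes = 0, 1
            pc += op - 0x5F  # skip the 1..32 immediate bytes
        else:
            pops, pushes = EFFECTS[op]  # KeyError for opcodes not modelled here
        if depth < pops:
            raise ValueError(f"stack underflow at pc={pc}")
        depth += pushes - pops
        pc += 1
    return depth


class ModernOpcodeStackTests(unittest.TestCase):
    def test_push0_pushes_one_value(self):
        self.assertEqual(final_depth("0x5f00"), 1)  # PUSH0 STOP

    def test_tload_replaces_key_with_value(self):
        self.assertEqual(final_depth("0x5f5c00"), 1)  # PUSH0 TLOAD STOP

    def test_tstore_pops_two(self):
        self.assertEqual(final_depth("0x5f5f5d00"), 0)  # PUSH0 PUSH0 TSTORE STOP

    def test_mcopy_pops_three(self):
        self.assertEqual(final_depth("0x5f5f5f5e00"), 0)  # 3x PUSH0, MCOPY, STOP

    def test_tstore_underflow_is_reported(self):
        with self.assertRaises(ValueError):
            final_depth("0x5f5d")  # only one operand on the stack
```

Saved as a test module, this runs with `python -m unittest`.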