[Pipeline Review] Handle PUSH0 and post-Shanghai opcodes before TAC generation #48

Description

@agorevski

Problem

Modern EVM opcodes are not decoded with correct stack effects before TAC generation. In particular, PUSH0 (0x5f, introduced in Shanghai) currently becomes an unknown assignment instead of the constant zero, and Cancun opcodes such as BLOBHASH, BLOBBASEFEE, TLOAD, TSTORE, and MCOPY fall through as unknown one-push operations.

Evidence

  • src/local_compiler.py:116-123 includes solc 0.8.26 and 0.8.20; the comments note that 0.8.20 is the first version with PUSH0 support, so the local compilation pipeline will routinely produce this opcode.
  • src/bytecode_analyzer.py:97-129 does not include stack effects for PUSH0 or the newer opcodes.
  • src/bytecode_analyzer.py:936-945 only handles names starting with PUSH, but the installed disassembler reports 0x5f as UNKNOWN_0x5f, so PUSH0 never reaches that branch.
  • src/bytecode_analyzer.py:1316-1338 treats any unrecognized opcode as an assignment that pushes one temp, which corrupts stack state for opcodes that pop values or push a different number of values.

Reproduction observed in this repository:

BytecodeAnalyzer('0x5f00') instructions: [('UNKNOWN_0x5f', ''), ('STOP', '')]
TAC: temp_1 = <unknown>

Why it matters

PUSH0 is common in solc 0.8.20+ bytecode. Emitting <unknown> instead of 0 materially degrades model inputs, exact TAC hashes, and downstream decompilation quality. Incorrect fallback stack effects for newer opcodes can also cascade into bogus stack_underflow values and wrong CFG/TAC for contracts compiled for recent forks.
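
To make the stack-state divergence concrete, here is a minimal sketch that tracks stack depth over a straight-line opcode sequence. It is not the analyzer's code: `simulate_depth`, `KNOWN_EFFECTS`, and `FALLBACK_EFFECT` are hypothetical names, and the fallback models only the "push one temp" behavior described above. The (pops, pushes) pairs follow the EVM definitions of these opcodes.

```python
# Stack effects as (pops, pushes) for the opcodes named in this issue.
KNOWN_EFFECTS = {
    "PUSH0": (0, 1),
    "BLOBHASH": (1, 1),
    "BLOBBASEFEE": (0, 1),
    "TLOAD": (1, 1),
    "TSTORE": (2, 0),
    "MCOPY": (3, 0),
}

# Hypothetical model of the current fallback: pop nothing, push one temp.
FALLBACK_EFFECT = (0, 1)


def simulate_depth(ops, effects):
    """Return the final stack depth after applying each opcode's effect.

    Opcodes missing from `effects` use FALLBACK_EFFECT. Depth is clamped
    at zero so an underflow does not produce a negative depth.
    """
    depth = 0
    for op in ops:
        pops, pushes = effects.get(op, FALLBACK_EFFECT)
        depth = max(depth - pops, 0) + pushes
    return depth


# Three zero operands for MCOPY, then two zero operands for TSTORE.
ops = ["PUSH0", "PUSH0", "PUSH0", "MCOPY", "PUSH0", "PUSH0", "TSTORE"]

correct_depth = simulate_depth(ops, KNOWN_EFFECTS)  # every operand is consumed
fallback_depth = simulate_depth(ops, {})            # every opcode pushes a temp

print(correct_depth, fallback_depth)
```

With correct effects the stack ends empty; under the one-push fallback every opcode leaves a value behind, so any later instruction in the block would read operands that do not exist in the real execution.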

Suggested fix

Normalize disassembler names for known raw opcode bytes before analysis (for example, map UNKNOWN_0x5f to PUSH0) or update/replace the disassembler layer. Add explicit stack effects and TAC formatting for at least PUSH0, BLOBHASH, BLOBBASEFEE, TLOAD, TSTORE, and MCOPY; unknown opcodes should not blindly push one temp unless their stack effect is known.
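
One possible shape for the normalization step is sketched below. All names here (`RAW_BYTE_TO_NAME`, `STACK_EFFECTS`, `normalize_name`, `stack_effect`, `tac_for_push0`) are hypothetical and do not come from src/bytecode_analyzer.py. The sketch also assumes the disassembler reports every unrecognized byte in the same `UNKNOWN_0x..` form observed for 0x5f, which has only been confirmed for PUSH0.

```python
import re

# Raw opcode byte -> mnemonic for opcodes an older disassembler may not know.
RAW_BYTE_TO_NAME = {
    0x49: "BLOBHASH",
    0x4A: "BLOBBASEFEE",
    0x5C: "TLOAD",
    0x5D: "TSTORE",
    0x5E: "MCOPY",
    0x5F: "PUSH0",
}

# Explicit stack effects as (pops, pushes).
STACK_EFFECTS = {
    "BLOBHASH": (1, 1),
    "BLOBBASEFEE": (0, 1),
    "TLOAD": (1, 1),
    "TSTORE": (2, 0),
    "MCOPY": (3, 0),
    "PUSH0": (0, 1),
}

_UNKNOWN_RE = re.compile(r"^UNKNOWN_0x([0-9a-fA-F]{2})$")


def normalize_name(name):
    """Map UNKNOWN_0x.. names to real mnemonics when the byte is known."""
    match = _UNKNOWN_RE.match(name)
    if match is None:
        return name
    return RAW_BYTE_TO_NAME.get(int(match.group(1), 16), name)


def stack_effect(name):
    """Return (pops, pushes), or None when the effect is genuinely unknown.

    Returning None lets the caller decide how to handle the opcode
    instead of silently pushing one temp.
    """
    return STACK_EFFECTS.get(normalize_name(name))


def tac_for_push0(dest):
    """Format PUSH0 as a constant-zero assignment."""
    return f"{dest} = 0"


print(normalize_name("UNKNOWN_0x5f"))  # PUSH0
print(stack_effect("UNKNOWN_0x5e"))    # (3, 0)
print(tac_for_push0("temp_1"))         # temp_1 = 0
```

Returning `None` for an unmapped byte, rather than a default effect, is the design choice that matters: it forces the caller to treat the block as unanalyzable or to flag it, instead of guessing a stack effect.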

Validation/tests to add

  • Unit test that 0x5f00 emits a TAC constant 0, not <unknown>.
  • Stack-effect tests for TLOAD, TSTORE, and MCOPY byte sequences.
  • Compile a minimal contract with solc 0.8.20+ through local_compiler.py and assert the generated TAC contains no UNKNOWN_0x5f or <unknown> for PUSH0.
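
The final check in the list above could be built on a small text scan like the following sketch. `find_unresolved` is a hypothetical helper, and it assumes the TAC is available as plain text with one statement per line, as in the reproduction output. Note that it flags every `<unknown>`, not only those produced by PUSH0, so a real test may need to narrow the match.

```python
def find_unresolved(tac_text):
    """Return (line_number, line) pairs that still contain unresolved markers."""
    markers = ("UNKNOWN_0x5f", "<unknown>")
    return [
        (number, line)
        for number, line in enumerate(tac_text.splitlines(), start=1)
        if any(marker in line for marker in markers)
    ]


# The current output from the reproduction, and the desired output.
before = find_unresolved("temp_1 = <unknown>")
after = find_unresolved("temp_1 = 0")

print(before)  # [(1, 'temp_1 = <unknown>')]
print(after)   # []
```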
