
[Pipeline Review] Preserve EVM operand order for non-commutative TAC operations #49

Description

@agorevski

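Problem

The TAC converter renders every binary EVM operation with one fixed operand order: the stack top becomes the left operand and the next stack slot the right operand. That order is correct for SUB, DIV, MOD, EXP, the comparisons, and their signed variants, which all compute (stack top) op (next slot). It is wrong for the shift opcodes SHL, SHR, and SAR, which take the shift amount from the stack top and the value from the next slot (EIP-145), so their TAC comes out reversed. Commutative operations such as ADD are unaffected either way.

Evidence

  • src/bytecode_analyzer.py:970-982 pops the stack top into a, the next value into b, then emits operand1=a, operand2=b for all _BINARY_OPS.
  • For bytecode 0x600560020300 (PUSH1 5; PUSH1 2; SUB; STOP), the current TAC is:
temp_1 = 05
temp_2 = 02
temp_3 = temp_2 - temp_1
stop()

This output is correct. After the two pushes the stack top is 2 and the next slot is 5, and SUB subtracts the next slot from the top, so the EVM result is 2 - 5 (wrapping to 2^256 - 3). Computing 5 - 2 would require PUSH1 2; PUSH1 5; SUB.

  • For shifts, the same rule reverses the meaning. In PUSH1 1; PUSH1 4; SHL, the stack top (4) is the shift amount and the next slot (1) is the value, so the EVM result is 1 << 4 = 16, while operand1=a, operand2=b puts temp_2 (the shift amount) on the left, which reads as 4 << 1 = 8.
  • Existing tests such as tests/test_bytecode_analyzer.py:359-381 verify only the operator token for SUB, DIV, and EXP; they do not assert operand order, so neither the correct order for arithmetic nor the reversed order for shifts is locked down.

Why it matters

Operand order directly determines program meaning in the model input. A decompilation model trained or prompted with reversed shift expressions will learn incorrect Solidity semantics and may generate unsafe or misleading code. Subtraction, division, and comparisons would suffer the same damage if their currently correct order were swapped, so the fix has to be per opcode rather than a blanket reversal.

Suggested fix

Keep the current order (left = stack top, right = next slot) for the arithmetic and comparison opcodes (SUB, DIV, SDIV, MOD, SMOD, EXP, LT, GT, SLT, SGT). For SHL, SHR, and SAR, emit value operator shift, with the next stack slot on the left and the stack top on the right. Audit the other multi-operand handlers (ADDMOD, MULMOD, BYTE, SIGNEXTEND, memory/storage/call operations) against the EVM stack specification while making this change.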
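A per-opcode operand order can be expressed as a small lookup table instead of one rule for all binary operations. The following is a minimal, hypothetical Python sketch, not the converter's actual code: BINARY_OPS, render_binary, and every operator spelling other than - are assumptions, and the stack is modelled as a Python list whose last element is the stack top.

```python
# Hypothetical sketch: names and operator spellings are illustrative only.
# For each opcode: (TAC operator, stack slot that supplies the LEFT operand).
# Slot 0 is the stack top, slot 1 is the next stack slot.
BINARY_OPS: dict[str, tuple[str, int]] = {
    # Arithmetic and comparisons compute (top) op (next).
    "SUB": ("-", 0),
    "DIV": ("/", 0),
    "EXP": ("**", 0),
    "LT": ("<", 0),
    "GT": (">", 0),
    # Shifts (EIP-145) take the shift amount from the top: (next) op (top).
    # SAR would follow the same rule with a signed-shift spelling.
    "SHL": ("<<", 1),
    "SHR": (">>", 1),
}


def render_binary(opcode: str, stack: list[str], dest: str) -> str:
    """Pop two operands (list end = stack top), push dest, return one TAC line."""
    symbol, left_slot = BINARY_OPS[opcode]
    top = stack.pop()   # slot 0
    nxt = stack.pop()   # slot 1
    left, right = (top, nxt) if left_slot == 0 else (nxt, top)
    stack.append(dest)  # the result replaces both operands on the stack
    return f"{dest} = {left} {symbol} {right}"


if __name__ == "__main__":
    # PUSH1 5; PUSH1 2; SUB -> EVM computes 2 - 5
    print(render_binary("SUB", ["temp_1", "temp_2"], "temp_3"))
    # prints "temp_3 = temp_2 - temp_1"

    # PUSH1 1; PUSH1 4; SHL -> EVM computes 1 << 4
    print(render_binary("SHL", ["temp_1", "temp_2"], "temp_3"))
    # prints "temp_3 = temp_1 << temp_2"
```

Keeping the order in data rather than in the pop logic makes each opcode's convention visible in one place, so the audit of the remaining handlers can be reviewed entry by entry.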

Validation/tests to add

  • Golden TAC tests that assert the full formatted TAC, not just the operator, for two-push sequences such as PUSH1 5; PUSH1 2 followed by SUB, DIV, LT, GT, SHL, or SHR.
  • A small end-to-end bytecode fixture where the expected expression order is known and appears in the generated per-function TAC.
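Golden expectations are only as good as the person writing them, so one option is to check each expected TAC line against reference EVM semantics before asserting it. The sketch below is hypothetical: check_golden, EVM_BINARY, INFIX, and the TAC operator spellings are assumptions, and it only covers a single binary opcode applied to two pushed constants, with temp_1 holding the first push and temp_2 the second.

```python
MASK = (1 << 256) - 1  # EVM words are 256-bit unsigned integers


def _shl(value: int, shift: int) -> int:
    return (value << shift) & MASK if shift < 256 else 0


def _shr(value: int, shift: int) -> int:
    return value >> shift if shift < 256 else 0


# Reference semantics: a = stack top, b = next stack slot.
EVM_BINARY = {
    "SUB": lambda a, b: (a - b) & MASK,
    "DIV": lambda a, b: 0 if b == 0 else a // b,  # EVM division by zero yields 0
    "LT": lambda a, b: int(a < b),
    "GT": lambda a, b: int(a > b),
    "SHL": lambda a, b: _shl(b, a),  # a is the shift amount (EIP-145)
    "SHR": lambda a, b: _shr(b, a),
}

# Hypothetical TAC operator spellings, evaluated as written: left op right.
INFIX = {
    "-": lambda x, y: (x - y) & MASK,
    "/": lambda x, y: 0 if y == 0 else x // y,
    "<": lambda x, y: int(x < y),
    ">": lambda x, y: int(x > y),
    "<<": _shl,
    ">>": _shr,
}


def eval_tac(line: str, env: dict[str, int]) -> int:
    """Evaluate a single 'dest = left op right' TAC line."""
    _dest, expr = line.split(" = ")
    left, op, right = expr.split(" ")
    return INFIX[op](env[left], env[right])


def check_golden(first: int, second: int, opcode: str, golden: str) -> bool:
    """PUSH first; PUSH second; opcode. Does the golden line match the EVM?"""
    env = {"temp_1": first, "temp_2": second}
    evm_result = EVM_BINARY[opcode](second, first)  # the last push is the top
    return eval_tac(golden, env) == evm_result


if __name__ == "__main__":
    assert check_golden(5, 2, "SUB", "temp_3 = temp_2 - temp_1")
    assert not check_golden(5, 2, "SUB", "temp_3 = temp_1 - temp_2")
    assert check_golden(1, 4, "SHL", "temp_3 = temp_1 << temp_2")
    assert not check_golden(1, 4, "SHL", "temp_3 = temp_2 << temp_1")
```

A golden test could call check_golden on its expected string once, then compare the converter's real output against that same string, so a wrong expectation fails before it can mask a wrong converter.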

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
