Introduce extended validation of EOF code sections to guarantee that neither stack underflow nor overflow can happen during execution of validated contracts.
The current EVM performs a number of validity checks for each executed instruction, such as checking for instruction being defined, stack overflow and underflow, and enough amount of gas remaining.
This EIP minimizes the number of such checks required at run-time by verifying that no exceptional conditions can happen and preventing the execution and deployment of any invalid code.
The operand stack validation provides several benefits:
CALLF
and JUMPF
(JUMPF
introduced in a separate EIP) ,It also has some disadvantages:
The guarantees created by these validation rules also improve the feasibility of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validations. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.
Remark: We rely on the notions of operand stack and type section as defined by EIP-4750.
Each code section is validated independently.
In the first validation phase defined in EIP-3670 (and extended by EIP-4200 and EIP-4750) instructions are inspected independently to check if their opcodes and immediate values are valid.
In the second validation phase control-flow analysis is performed on the code.
Operand stack height here refers to the number of stack values accessible by this function, i.e. it does not take into account values of caller functions' frames (but does include this function's inputs). Note that validation procedure does not require actual operand stack implementation, but only to keep track of its height.
Terminating instructions refers to the instructions either:
RETF
, JUMPF
, orSTOP
, RETURN
, RETURNCONTRACT
, REVERT
, INVALID
.note: JUMPF
and RETURNCONTRACT
are introduced in separate EIPs.
Forward jump refers to any of RJUMP
/RJUMPI
/RJUMPV
instruction with relative offset greater than or equal to 0. Backwards jump refers to any of RJUMP
/RJUMPI
/RJUMPV
instruction with relative offset less than 0, including jumps to the same jump instruction.
Instructions in the code are scanned in a single linear pass over the code. For each instruction the operand stack height bounds are recorded as stack_height_min
and stack_height_max
.
The first instruction's recorded stack height bounds are initialized to be equal to the number of inputs to the function type matching the code (stack_height_min = stack_height_max = type[code_section_index].inputs
).
For each instruction:
CALLF
instruction the recorded stack height lower bound must be at least the number of inputs of the called function according to its type defined in the type section,RETF
instruction both the recorded lower and upper bound must be equal and must be exactly the number of outputs of the function matching the code,JUMPF
into returning function both the recorded lower and upper bound must equal exactly type[current_section_index].outputs + type[target_section_index].inputs - type[target_section_index].outputs
,JUMPF
into non-returning function the recorded stack height lower bound must be at least the number of inputs of the target function according to its type defined in the type section,RETF
and JUMPF
, this implies that extra items left on stack at instruction ending EVM execution are allowed.CALLF
and JUMPF
check for possible stack overflow: if recorded stack height upper bound is greater than 1024 - types[target_section_index].max_stack_height + types[target_section_index].inputs
, validation fails.CALLF
stack height bounds are adjusted by adding types[target_section_index].outputs - types[target_section_index].inputs
,target_stack_min = min(target_stack_min, current_stack_min)
and target_stack_max = max(target_stack_max, current_stack_max)
, where (target_stack_min, target_stack_max)
are successor bounds and (current_stack_min, current_stack_max)
are bounds computed in 2.3.After all instructions are visited, determine the function maximum operand stack height:
0x3FF
).max_stack_height
within the type section.The computational and space complexity of this pass is O(len(code)). Each instruction is visited at most once.
Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the CALLF
and JUMPF
performing operand stack overflow check for the entire called function.
Any code section validated according to operand stack validation has the following properties:
CALLF
or JUMPF
instruction.RJUMP
.In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in CALLF
and JUMPF
instructions using the called function's max_stack_height
information. This decreases flexibility of an EVM program because max_stack_height
corresponds to the worst-case control-flow path in the function.
The operand stack validation algorithm rejects any code having any unreachable instructions. This check can be performed very cheaply. It prevents deploying degenerated code. Moreover, it enables combining instruction validation and operand stack validation into single pass.
It is currently required that the operand stack is empty (in the current function context) after the RETF
instruction.
Otherwise, the RETF
semantic would be more complicated. For n
function outputs and s
the stack height at RETF
the EVM must erase s-n
non-top stack items and move the n
stack items to the place of erased ones. Cost of such operation may be relatively cheap but is not constant.
However, lifting the requirement and modifying the RETF
semantic as described above is backward
compatible and can be easily introduced in the future.
Originally another variant of stack validation was proposed, where instead of linear scan of the code section, all code paths were examined by following the target(s) of every jump instruction in a breadth-first-search manner, tracking stack height for each visited instruction and checking that for every possible code path to a particular instruction its stack height remains constant.
The advantage of this variant would be somewhat simpler algorithm (we would not need to track stack height bounds, but only a single stack height value for each instruction) and no extra requirement for ordering of code basic blocks (see below).
However compiler teams opposed to such restrictive stack height requirements. One prominent pattern used by compilers which wouldn't be possible is jumping to terminating helpers (code blocks ending with RETURN
or REVERT
) from different stack heights. This is common for example for a series of assert
statements, each one compiled to a RJUMPI
into a shared terminating helper. Enforcing constant stack requirement would mean that before jumping to such helper, extra items on the stack have to be popped, and this noticeably increases code size and consumed gas, and would defeat the purpose of extracting these common terminating sequences into a helper.
The prerequisite to stack validation algorithm is ordering of code basic blocks in a way that no block is referenced only by backwards jump.
This is required to make it possible to examine each instruction in one linear pass over the code section. Forward pass over the code section allows for the algorithm to "expand" each forward jump target's stack height bounds and still keep the complexity linear. Trying to do jump target stack bounds expansion while scanning the code in the breadth-first-search manner would require to re-examine entire code path after its stack height bounds are expanded, which would result in quadratic complexity.
This change requires a "network upgrade," since it modifies consensus rules.
It poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).
Needs discussion.
Copyright and related rights waived via CC0.