This proposal provides a complete, efficient, safe and static control-flow facility.
It introduces two new opcodes to support calling and returning from subroutines:
RJUMPSUB relative_offset
-- relative jump to subroutineRETURNSUB
-- return to PC
after most recent RJUMPSUB
.It depends on the two new opcodes proposed by EIP-4200 to support static jumps:
RJUMP relative_offset
— relative jump to PC + relative_offset
RJUMPI relative_offset
— conditional relative jumpIt deprecates JUMP
and JUMPI
, allowing valid code to support streaming, one-pass, and other near-linear compilers.
In concert with EIP-3540 and EIP-3670 it ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion.
This is among the simplest possible proposals that meets these requirements.
Jumps, conditional jumps and subroutines were proposed by Alan Turing in 1945 as a means of organizing the logic of the code and the design of the memory crystals for his Automatic Computing Engine:
We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date... We also wish to be able to arrange for the splitting up of operations into subsidiary operations... To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation.
— Alan Turing — in B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977.
In more contemporary terms, we have sequences of instructions, jumps and conditional jumps that divide sequences into blocks, subroutine calls, and a stack of addresses to return to. The details vary, but similar facilities have proven their value across a long line of important machines over the last 75 years, including most all of the machines we have programmed or implemented -- physical machines including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP-11 and VAX, Motorola 68000, Sun SPARC, and Intel x86s, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others.
Unlike these machines, the Ethereum Virtual Machine does not provide subroutine operations. Instead, they must be synthesized using the dynamic JUMP
instruction, which takes its destination on the stack. Further, the EVM provides only dynamic jumps, impeding the static analysis we need.
Efficient to write by hand, compile from high level labguages, validate at deploy time, interpret by VMs, and compile to machine code.
Static jumps, conditional jumps, and subroutines are sufficient and efficient in space and time, as shown by historical experience and as we will show for the EVM below.
The EVM has unusually high requirements for safety. Not only do many smart contracts control inordinately large amounts of valuable Ether, but once placed on the blockchain any defects are visible to attackers and cannot be repaired. We propose to statically validate important safety constraints on code at initialization time.
The EVM's dynamic jumps cause two major problems. First, the need to synthesize static jumps and subroutines with dynamic jumps wastes space and gas with needlessly complex code, as we will show below.
Worse, jumps that can dynamically branch to any destination in the code can cause quadratic "path explosions" when traversing the flow of control. For Ethereum, this is a denial-of-service vulnerability that prevents us, at initialization time, from validating the safe use of EVM code and from compiling EVM code to machine code.
We need static control-flow to validate program safety and to compile EVM bytecode to machine code -- in time and space linear in the size of the code.
RJUMPSUB (0x5f) relative_offset
Transfers control to a subroutine.
relative_offset
from the immediate data at PC
.PC + 3
to the return stack
.PC
to PC + relative_offset
.The relative_offset
is relative to the current PC
. The offset is encoded as a two-byte, twos-complement signed integer, stored MSB-first.
The gas cost is low.
RETURNSUB (0x5e)
Returns control to the caller of a subroutine.
return stack
to PC
.The gas cost is verylow.
Notes:
return stack
do not need to be validated, since they are alterable only by RJUMPSUB
and RETURNSUB
.return stack
. But the actual state of the return stack
is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB
to unobservably push PC
on the return stack
rather than PC + 1
, which is allowed so long as RETURNSUB
observably returns control to the PC + 3
location.)Execution is defined in the Yellow Paper as a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines six such states.
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for the first three conditions. We don’t know how much gas there will be, we don’t know how deep a recursion may go, analysis of stack depth even for non-recursive programs is nontrivial, and we don't know whether a call will be static. All of the remaining conditions MUST be validated statically, in time and space quasi-linear in the size of the code.
JUMP
and JUMPI
instructions ARE NOT valid.RJUMP
, RJUMPI
, or RJUMPSUB
instructions MUST NOT address immediate data or addresses outside of their code section.data stack
MUST always be positive.return stack
MUST always be positive.stack pointer
and the stack pointer
on entry to the current subroutine.PC
andThis is a purely semantic specification, placing no constraints on the syntax of code sections beyond being a sequence of opcodes and immediate data – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the bytecode's control-flow graph. The EVM is a simple state machine. We only promise that valid code will not, as it were, jam up the gears of the machine.
By avoiding syntactic constraints we allow for optimizations like tail call elimination, multiple-entry subroutines, moving "cold" code out of line, and other ways of reducing redundancy and keeping "hot" code in cache. Since we wish to support one-pass compilation of EVM code to machine code it is crucial that the EVM code be as well optimized as possible up front.
Rather than enforce constraints via syntax, we enforce them via validation.
The constraints on valid code cover all of the exceptional halting states that we can validate and allow code to be validated and compiled in time and space quasi-linear in the size of the code.
The RJUMP
, RJUMPI
and RJUMPSUB
instructions take their relative_offset as immediate arguments, which cannot change at runtime. Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime. Dynamic jumps can branch to any destination in the code, so exploitable quadratic "path explosions" are possible when traversing the control flow graph. Deprecating JUMP
and JUMPI
prevents this.
Requiring a consistently aligned data stack
Requiring a consistently aligned data stack
also allows some algorithms that traverse the control-flow graph -- including code validation and compilation -- to break cycles at joins, again preventing quadratic path explosion. When a traversal gets to a PC
it has visited before it is either at the beginning of a loop or the entry to a function. Since the stack height at that PC
is constant we know that loops will not grow stack, and that the number of arguments to a subroutine will always be the same -- there may be no need to traverse that path again.
Note: The JVM and Wasm enforce similar constraints for similar reasons.
There are a few major designs for a subroutine facility, two of which are considered here. The others are mostly not appropriate for the EVM, such as the Wheeler Jump -- self-modifying code that writes return addresses into called subroutines.
1. Keep return addresses on a dedicated return stack. Turing's design is often used by stack machines, including those for Forth, Java, Wasm, and others. The data stack is used for computation, with a dedicated stack for return addresses. A single instruction suffices to call, and another to return from a routine.
2. Keep return addresses on the data stack. This design is often used by register machines, including those from CDC, IBM, DEC, Intel, and ARM. The registers are used primarily for computation, and the stack maintains call frames for return addresses, arguments, and local variables. On the EVM there are no registers for computation, so using the stack for both purposes can cause the sort of inefficiencies we see below. Pascal p-code does use this design, but as part of a complex calling convention with dedicated registers.
data stack
slot for the return address andWe illustrate here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP
.
Consider these examples of a fairly minimal subroutine, including the code to call it.
Subroutine call, using RJUMPSUB
:
SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
CALL_SQUARE:
push 0x02 ; 3 gas
rjumpsub SQUARE ; 5 gas
Total gas: 19
Subroutine call, using JUMP
:
SQUARE:
jumpdest ; 1 gas
swap1 ; 3 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
CALL_SQUARE:
jumpdest ; 1 gas
push 0x02 ; 3 gas
push RTN_CALL: ; 3 gas
push SQUARE ; 3 gas
jump ; 8 gas
RTN_CALL:
jumpdest ; 1 gas
Total: 41 gas.
Using RJUMPSUB
versus JUMP
saves 41 - 19 = 22 gas — a 54% improvement.
Of course in cases like this one we can optimize the tail call, so that the return from SQUARE
actually returns from TEST_SQUARE
.
Tail call optimization, using RJUMPSUB
and RETURNSUB
:
```SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
CALL_SQUARE: push 0x02 ; 3 gas rjump SQUARE ; 3 gas
_Total: 17 gas_
Tail call optimization, using `JUMP`:
SQUARE: jumpdest ; 1 gas swap1 ; 3 gas dup1 ; 3 gas mul ; 5 gas swap2 ; 3 gas jump ; 8 gas
CALL_SQUARE: jumpdest ; 1 gas push 0x02 ; 3 gas push SQUARE ; 3 gas jump ; 8 gas
_Total: 38 gas_
Using `RJUMPSUB` versus `JUMP` saves _38 - 17 = 21 gas_ — a _55%_ improvement.
#### Efficiency Caveats
We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using `JUMP` — in our examples they cut gas use by about half.
Clearly, the benefits of this efficiency are greater for programs that have been factored into smaller subroutines. How small? Wrapping code in a subroutine costs only _8 gas_ using `RJUMPSUB` and `RETURNSUB` versus _30 gas_ using `JUMP`, `PUSH` and `SWAP` as above.
### Costs
The _low_ cost of `RJUMPSUB` versus the _mid_ cost of `JUMP` is justified by needing only to decode the immediate two byte destination to the `PC` and push the return address on the `return stack`, all using native arithmetic, versus using the data stack with emulated 256-bit instructions.
The _verylow_ cost of `RETURNSUB` is justified by needing only to pop the `return stack` into the `PC`. Benchmarking will be needed to tell if the costs are well-balanced.
## Backwards Compatibility
These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.
## Test Cases
### Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: `0x5f0003005e` (`RJUMPSUB 3, RETURNSUB, STOP`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RJUMPSUB | 5 | [] | [] |
| 2 | STOP | 0 | [] | [] |
| 3 | RETURNSUB | 3 | [] | [] |
Output: 0x
Consumed gas: `10`
### Two levels of subroutines
This should execute fine, going into one two depths of subroutines
Bytecode: `0x5f00045F00025200` (`RJUMPSUB 4, RJUMPSUB 2, RETURNSUB, RETURNSUB, STOP`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RJUMPSUB | 5 | [] | [] |
| 3 | RJUMPSUB | 5 | [] | [] |
| 4 | RETURNSUB | 5 | [] | [] |
| 5 | RETURNSUB | 5 | [] | [] |
| 6 | STOP | 0 | [] | [] |
Consumed gas: `20`
### Failure 1: invalid jump
This should fail, since the given location is outside of the code-range.
Bytecode: `0X5fff`(`RJUMPSUB -1`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RJUMPSUB | 10 | [] | [] |
Error: at pc=0, op=RJUMPSUB: invalid jump destination
### Failure 2: shallow `return stack`
This should fail at first opcode, due to shallow `return_stack`
Bytecode: `0x5e` (`RETURNSUB`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RETURNSUB | 5 | [] | [] |
Error: at pc=0, op=RETURNSUB: invalid retsub
### Subroutine at end of code
In this example the RJUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with error
Bytecode: `0x5c00045e5fffff` (`RJUMP 4, RETURNSUB, RJUMPSUB -1`)
| Pc | Op | Cost | Stack | RStack |
|-------|-------------|------|-----------|-----------|
| 0 | RJUMP | 5 | [] | [] |
| 3 | RETURNSUB | 5 | [] | [] |
| 4 | RJUMPSUB | 5 | [] | [] |
| 7 | STOP | 0 | [] | [] |
Consumed gas: `15`
## Reference Implementation
The following is a pseudo-Python implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time.
This algorithm performs a symbolic execution of the program that recursively traverses the _code_, emulating its control flow and stack use and checking for violations of the rules above.
It runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ — thus the algorithm takes time proportional to the size of the _code_. It maintains a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the _code_.
### Validation Function
** Note: this function is known to be incomplete and incorrect. **
For simplicity's sake we assume that all jumpdest analysis and prior validation has been done, including EIP-3540, EIP-3670, and EIP-4200, so EOF headers and sections are well-formed, and there are no invalid instructions or jumps. In practice, all passes of validation can be folded into a single loop no recursion.
We also assume some helper functions.
* `is_valid(opcode)` returns true iff opcode is valid.
* `is_terminator(opcode)` returns true iff opcode is terminator.
* `is_valid_jumpdest(pc)` returns true iff `pc` is a valid jump destination.
* `immediate_data(pc)` returns the immediate data for the instruction at `pc`.
* `immediate_size(opcode)` returns the size of the immediate data for an opcode.
* `removed_items(opcode)` returns the number of items removed from the `data_stack` by the `opcode`.
* `added_items(opcode)` returns the number of items added to the `data_stack` by the `opcode`.
def validate_code(code: bytes, pc: int, sp: int, bp: int) -> boolean: continuations = [] do while pc < len(code): opcode = code[pc] if !is_valid(opcode): return false if is_terminator(opcode): return true
# check stack height and return if we have been here before
stack_height = sp - bp
if stack_height > 1024
return false
if pos in stack_heights:
if stack_height != stack_heights[pos]:
return false
return true
else:
stack_heights[pos] = stack_height
if opcode == RJUMP:
# reset pc to destination of jump
jumpdest = immediate_data(pc)
pc += jumpdest
if !is_valid_jumpdest(pc)
return false
elif opcode == RJUMPI:
jumpdest = pc + immediate_data(pc)
if !is_valid_jumpdest(pc)
return false
# continue true side of conditional later
continations.push((jumpdest, sp, bp))
# continue false side of conditional now
elif opcode == RJUMPSUB:
# will enter subroutine at destination
bp = sp
# push return address and reset pc to destination
jumpdest = pc + immediate_data(pc)
if !is_valid_jumpdest(pc)
return false
push(return_stack, pc + 3)
pc = jumpdest
continue
elif opcode == RETURNSUB:
# will return to subroutine at destination
bp = sp
# pop return address and check for preceding call
pc = pop(return_stack)
if code[pc - 3] != RJUMPSUB:
return false
# apply instructions to stack
sp -= removed_items(opcode)
if sp < 0
return false
sp += added_items(opcode)
# Skip opcode and immediate data
pc += 1 + immediate_size(opcode)
while (pc, sp, bp) = continuations.pop()
return true
```
These changes introduce new flow control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must be quasi-linear in time and space to not be a denial of service vulnerability. The algorithm here makes one linear-time pass of the bytecode, and uses a stack of continuations that cannot exceed the number of RJUMPI
instructions in the code.
Copyright and related rights waived via CC0.