Created	2019-10-17
Status	Withdrawn
Category	Core
Type	Standards Track
Authors
Requires	EIP-3540 EIP-3670 EIP-4200

Abstract

This proposal provides a complete, efficient, safe and static control-flow facility.

It introduces two new opcodes to support calling and returning from subroutines:

RJUMPSUB relative_offset -- relative jump to subroutine
RETURNSUB -- return to PC after most recent RJUMPSUB.

It depends on the two new opcodes proposed by EIP-4200 to support static jumps:

RJUMP relative_offset — relative jump to PC + relative_offset
RJUMPI relative_offset — conditional relative jump

It deprecates JUMP and JUMPI, allowing valid code to support streaming, one-pass, and other near-linear compilers.

In concert with EIP-3540 and EIP-3670 it ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion.

This is among the simplest possible proposals that meets these requirements.

Motivation

A complete control-flow facility.

Jumps, conditional jumps and subroutines were proposed by Alan Turing in 1945 as a means of organizing the logic of the code and the design of the memory crystals for his Automatic Computing Engine:

We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date... We also wish to be able to arrange for the splitting up of operations into subsidiary operations... To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation.

— Alan Turing — in B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977.

In more contemporary terms, we have sequences of instructions, jumps and conditional jumps that divide sequences into blocks, subroutine calls, and a stack of addresses to return to. The details vary, but similar facilities have proven their value across a long line of important machines over the last 75 years, including most all of the machines we have programmed or implemented -- physical machines including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP-11 and VAX, Motorola 68000, Sun SPARC, and Intel x86s, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others.

Unlike these machines, the Ethereum Virtual Machine does not provide subroutine operations. Instead, they must be synthesized using the dynamic JUMP instruction, which takes its destination on the stack. Further, the EVM provides only dynamic jumps, impeding the static analysis we need.

Efficient control-flow.

Efficient to write by hand, compile from high level labguages, validate at deploy time, interpret by VMs, and compile to machine code.

Static jumps, conditional jumps, and subroutines are sufficient and efficient in space and time, as shown by historical experience and as we will show for the EVM below.

Safe control-flow.

The EVM has unusually high requirements for safety. Not only do many smart contracts control inordinately large amounts of valuable Ether, but once placed on the blockchain any defects are visible to attackers and cannot be repaired. We propose to statically validate important safety constraints on code at initialization time.

Static control-flow.

The EVM's dynamic jumps cause two major problems. First, the need to synthesize static jumps and subroutines with dynamic jumps wastes space and gas with needlessly complex code, as we will show below.

Worse, jumps that can dynamically branch to any destination in the code can cause quadratic "path explosions" when traversing the flow of control. For Ethereum, this is a denial-of-service vulnerability that prevents us, at initialization time, from validating the safe use of EVM code and from compiling EVM code to machine code.

We need static control-flow to validate program safety and to compile EVM bytecode to machine code -- in time and space linear in the size of the code.

Specification

Opcodes

`RJUMPSUB (0x5f) relative_offset`

Transfers control to a subroutine.

Decode the relative_offset from the immediate data at PC.
Push the current PC + 3 to the return stack.
Set PC to PC + relative_offset.

The relative_offset is relative to the current PC. The offset is encoded as a two-byte, twos-complement signed integer, stored MSB-first.

The gas cost is low.

`RETURNSUB (0x5e)`

Returns control to the caller of a subroutine.

Pop the return stack to PC.

The gas cost is verylow.

Notes:

Values popped off the return stack do not need to be validated, since they are alterable only by RJUMPSUB and RETURNSUB.
The description above lays out the semantics of these instructions in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 3 location.)

Validity

Execution is defined in the Yellow Paper as a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines six such states.

Insufficient gas
More than 1024 stack items
State modification during a static call
Insufficient stack items
Invalid jump destination
Invalid instruction

We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for the first three conditions. We don’t know how much gas there will be, we don’t know how deep a recursion may go, analysis of stack depth even for non-recursive programs is nontrivial, and we don't know whether a call will be static. All of the remaining conditions MUST be validated statically, in time and space quasi-linear in the size of the code.

Static Constraints on Valid Code

Every instruction MUST be valid:
The JUMP and JUMPI instructions ARE NOT valid.
Every jump MUST be valid:
The RJUMP, RJUMPI, or RJUMPSUB instructions MUST NOT address immediate data or addresses outside of their code section.
The stacks MUST be valid:
The number of items on the data stack MUST always be positive.
The number of items on the return stackMUST always be positive.
The data stack MUST be consistently aligned:
The data stack height is
- the absolute difference between the current stack pointer and the stack pointer on entry to the current subroutine.
It MUST be the same for every reachable path through a given PC and
MUST NOT exceed 1024.

Rationale

This is a purely semantic specification, placing no constraints on the syntax of code sections beyond being a sequence of opcodes and immediate data – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the bytecode's control-flow graph. The EVM is a simple state machine. We only promise that valid code will not, as it were, jam up the gears of the machine.

By avoiding syntactic constraints we allow for optimizations like tail call elimination, multiple-entry subroutines, moving "cold" code out of line, and other ways of reducing redundancy and keeping "hot" code in cache. Since we wish to support one-pass compilation of EVM code to machine code it is crucial that the EVM code be as well optimized as possible up front.

Validation

Rather than enforce constraints via syntax, we enforce them via validation.

The constraints on valid code cover all of the exceptional halting states that we can validate and allow code to be validated and compiled in time and space quasi-linear in the size of the code.

The RJUMP, RJUMPI and RJUMPSUB instructions take their relative_offset as immediate arguments, which cannot change at runtime. Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime. Dynamic jumps can branch to any destination in the code, so exploitable quadratic "path explosions" are possible when traversing the control flow graph. Deprecating JUMP and JUMPI prevents this.

Requiring a consistently aligned data stack

prevents stack underflow
ensures that all calls to a subroutine have the same number of inputs and the same number of outputs and
ensures that stack height is bounded in the absence of recursion.

Requiring a consistently aligned data stack also allows some algorithms that traverse the control-flow graph -- including code validation and compilation -- to break cycles at joins, again preventing quadratic path explosion. When a traversal gets to a PC it has visited before it is either at the beginning of a loop or the entry to a function. Since the stack height at that PC is constant we know that loops will not grow stack, and that the number of arguments to a subroutine will always be the same -- there may be no need to traverse that path again.

Note: The JVM and Wasm enforce similar constraints for similar reasons.

Alternative Designs

There are a few major designs for a subroutine facility, two of which are considered here. The others are mostly not appropriate for the EVM, such as the Wheeler Jump -- self-modifying code that writes return addresses into called subroutines.

1. Keep return addresses on a dedicated return stack. Turing's design is often used by stack machines, including those for Forth, Java, Wasm, and others. The data stack is used for computation, with a dedicated stack for return addresses. A single instruction suffices to call, and another to return from a routine.

2. Keep return addresses on the data stack. This design is often used by register machines, including those from CDC, IBM, DEC, Intel, and ARM. The registers are used primarily for computation, and the stack maintains call frames for return addresses, arguments, and local variables. On the EVM there are no registers for computation, so using the stack for both purposes can cause the sort of inefficiencies we see below. Pascal p-code does use this design, but as part of a complex calling convention with dedicated registers.

We prefer the dedicated return stack.

It maintains a clear separation between calculation and flow of control:
the data stack is free of vulnerable return addresses and
it's impossible to overwrite the return stack.
It improves efficiency:
it uses native arithmetic rather than 256-bit EVM instructions for the return address,
doesn't use up a data stack slot for the return address and
needs less motion of 256-bit data on the stack.

Efficiency

We illustrate here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP.

Simple Subroutine Call

Consider these examples of a fairly minimal subroutine, including the code to call it.

Subroutine call, using RJUMPSUB:

SQUARE:
    dup1            ; 3 gas
    mul             ; 5 gas
    returnsub       ; 3 gas

CALL_SQUARE:
    push 0x02       ; 3 gas
    rjumpsub SQUARE ; 5 gas

Total gas: 19

Subroutine call, using JUMP:

SQUARE:
    jumpdest        ; 1 gas
    swap1           ; 3 gas
    dup1            ; 3 gas
    mul             ; 5 gas
    swap1           ; 3 gas
    jump            ; 8 gas

CALL_SQUARE:
    jumpdest        ; 1 gas
    push 0x02       ; 3 gas
    push RTN_CALL:  ; 3 gas
    push SQUARE     ; 3 gas
    jump            ; 8 gas
RTN_CALL:
    jumpdest        ; 1 gas

Total: 41 gas.

Using RJUMPSUB versus JUMP saves 41 - 19 = 22 gas — a 54% improvement.

Tail Call Optimization

Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.

Tail call optimization, using RJUMPSUB and RETURNSUB: ```SQUARE: dup1 ; 3 gas mul ; 5 gas returnsub ; 3 gas

CALL_SQUARE: push 0x02 ; 3 gas rjump SQUARE ; 3 gas

_Total: 17 gas_


Tail call optimization, using `JUMP`:

SQUARE: jumpdest ; 1 gas swap1 ; 3 gas dup1 ; 3 gas mul ; 5 gas swap2 ; 3 gas jump ; 8 gas

CALL_SQUARE: jumpdest ; 1 gas push 0x02 ; 3 gas push SQUARE ; 3 gas jump ; 8 gas

_Total: 38 gas_

Using `RJUMPSUB` versus `JUMP` saves _38 - 17 = 21 gas_ — a _55%_ improvement.

#### Efficiency Caveats

We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using `JUMP` — in our examples they cut gas use by about half.

Clearly, the benefits of this efficiency are greater for programs that have been factored into smaller subroutines.  How small?  Wrapping code in a subroutine costs only _8 gas_ using `RJUMPSUB` and `RETURNSUB` versus _30 gas_ using `JUMP`, `PUSH` and `SWAP` as above.

### Costs

The _low_ cost of `RJUMPSUB` versus the _mid_ cost of `JUMP` is justified by needing only to decode the immediate two byte destination to the `PC` and push the return address on the `return stack`, all using native arithmetic, versus using the data stack with emulated 256-bit instructions.

The _verylow_ cost of `RETURNSUB` is justified by needing only to pop the `return stack` into the `PC`.  Benchmarking will be needed to tell if the costs are well-balanced.

## Backwards Compatibility

These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data.  Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.

## Test Cases

### Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: `0x5f0003005e` (`RJUMPSUB 3, RETURNSUB, STOP`)

|  Pc   |      Op     | Cost |   Stack   |   RStack  |
|-------|-------------|------|-----------|-----------|
|    0  |   RJUMPSUB  |    5 |        [] |        [] |
|    2  |       STOP  |    0 |        [] |        [] |
|    3  |  RETURNSUB  |    3 |        [] |        [] |

Output: 0x
Consumed gas: `10`

### Two levels of subroutines

This should execute fine, going into one two depths of subroutines

Bytecode: `0x5f00045F00025200` (`RJUMPSUB 4, RJUMPSUB 2, RETURNSUB, RETURNSUB, STOP`)

|  Pc   |      Op     | Cost |   Stack   |   RStack  |
|-------|-------------|------|-----------|-----------|
|    0  |   RJUMPSUB  |    5 |        [] |        [] |
|    3  |   RJUMPSUB  |    5 |        [] |        [] |
|    4  |  RETURNSUB  |    5 |        [] |        [] |
|    5  |  RETURNSUB  |    5 |        [] |        [] |
|    6  |       STOP  |    0 |        [] |        [] |

Consumed gas: `20`

### Failure 1: invalid jump

This should fail, since the given location is outside of the code-range.

Bytecode: `0X5fff`(`RJUMPSUB -1`)

|  Pc   |      Op     | Cost |   Stack   |   RStack  |
|-------|-------------|------|-----------|-----------|
|    0  |   RJUMPSUB  |   10 |        [] |        [] |

Error: at pc=0, op=RJUMPSUB: invalid jump destination

### Failure 2: shallow `return stack`

This should fail at first opcode, due to shallow `return_stack`

Bytecode: `0x5e` (`RETURNSUB`)

|  Pc   |      Op     | Cost |   Stack   |   RStack  |
|-------|-------------|------|-----------|-----------|
|    0  |  RETURNSUB  |    5 |        [] |        [] |

Error: at pc=0, op=RETURNSUB: invalid retsub

### Subroutine at end of code

In this example the RJUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with error

Bytecode: `0x5c00045e5fffff` (`RJUMP 4, RETURNSUB, RJUMPSUB -1`)

|  Pc   |      Op     | Cost |   Stack   |   RStack  |
|-------|-------------|------|-----------|-----------|
|    0  |      RJUMP  |    5 |        [] |        [] |
|    3  |  RETURNSUB  |    5 |        [] |        [] |
|    4  |   RJUMPSUB  |    5 |        [] |        [] |
|    7  |       STOP  |    0 |        [] |        [] |

Consumed gas: `15`

## Reference Implementation

The following is a pseudo-Python implementation of an algorithm for predicating code validity.  An equivalent algorithm must be run at initialization time.

This algorithm performs a symbolic execution of the program that recursively traverses the _code_, emulating its control flow and stack use and checking for violations of the rules above.

It runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ — thus the algorithm takes time proportional to the size of the _code_. It maintains a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the _code_.

### Validation Function

** Note: this function is known to be incomplete and incorrect. **

For simplicity's sake we assume that all jumpdest analysis and prior validation has been done, including EIP-3540, EIP-3670, and EIP-4200, so EOF headers and sections are well-formed, and there are no invalid instructions or jumps.  In practice, all passes of validation can be folded into a single loop no recursion.

We also assume some helper functions.
* `is_valid(opcode)` returns true iff opcode is valid.
* `is_terminator(opcode)` returns true iff opcode is terminator.
* `is_valid_jumpdest(pc)` returns true iff `pc` is a valid jump destination.
* `immediate_data(pc)` returns the immediate data for the instruction at `pc`.
* `immediate_size(opcode)` returns the size of the immediate data for an opcode.
* `removed_items(opcode)` returns the number of items removed from the `data_stack` by the `opcode`.
* `added_items(opcode)` returns the number of items added to the `data_stack` by the `opcode`.

returns true iff code is valid

def validate_code(code: bytes, pc: int, sp: int, bp: int) -> boolean: continuations = [] do while pc < len(code): opcode = code[pc] if !is_valid(opcode): return false if is_terminator(opcode): return true

        # check stack height and return if we have been here before
        stack_height = sp - bp
        if stack_height > 1024
            return false
        if pos in stack_heights:
            if stack_height != stack_heights[pos]:
                return false
            return true
        else:
            stack_heights[pos] = stack_height

        if opcode == RJUMP:

            # reset pc to destination of jump
            jumpdest = immediate_data(pc)
            pc += jumpdest
            if !is_valid_jumpdest(pc)
                return false

        elif opcode == RJUMPI:

            jumpdest = pc + immediate_data(pc)
            if !is_valid_jumpdest(pc)
                return false

            # continue true side of conditional later
            continations.push((jumpdest, sp, bp))

            # continue false side of conditional now

        elif opcode == RJUMPSUB:

            # will enter subroutine at destination
            bp = sp

            # push return address and reset pc to destination
            jumpdest = pc + immediate_data(pc)
            if !is_valid_jumpdest(pc)
                return false
            push(return_stack, pc + 3)
            pc = jumpdest
            continue

        elif opcode == RETURNSUB:

            # will return to subroutine at destination
            bp = sp

            # pop return address and check for preceding call
            pc = pop(return_stack)
            if code[pc - 3] != RJUMPSUB:
               return false

        # apply instructions to stack
        sp -= removed_items(opcode)
        if sp < 0
            return false
        sp += added_items(opcode)

        # Skip opcode and immediate data
        pc += 1 + immediate_size(opcode)

    while (pc, sp, bp) = continuations.pop()

return true

```

Security Considerations

These changes introduce new flow control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must be quasi-linear in time and space to not be a denial of service vulnerability. The algorithm here makes one linear-time pass of the bytecode, and uses a stack of continuations that cannot exceed the number of RJUMPI instructions in the code.

EIP-2315 - Simple Subroutines for the EVM