EIP-4573 - Procedures for the EVM

Created	2021-12-16
Status	Stagnant
Category	Core
Type	Standards Track
Authors
Requires	EIP-2315 EIP-3540 EIP-3670 EIP-3779 EIP-4200

Abstract

Five EVM instructions are introduced to define, call, and return from named EVM procedures and to access their call frames in memory: ENTERPROC, LEAVEPROC, CALLPROC, RETURNPROC, and FRAMEADDRESS.

Motivation

Currently, Ethereum bytecode has no syntactic structure, and subroutines have no defined interfaces.

We propose to add procedures -- delimited blocks of code that can be entered only by calling into them via defined interfaces.

Also, the EVM currently has no automatic memory management for procedures, so we also propose to reserve call frames automatically on an in-memory stack.

Constraints on the use of procedures must be validated at contract initialization time to maintain the safety properties of EIP-3779: Valid programs will not halt with an exception unless they run out of gas or recursively overflow the stack.

Prior Art

The terminology is not well-defined, but we will follow Intel in calling the low-level concept subroutines and the higher level concept procedures. The distinction is that subroutines are little more than a jump that knows where it came from, whereas procedures have a defined interface and manage memory as a stack. EIP-2315 introduces subroutines, and this EIP introduces procedures.

Specification

Instructions

ENTERPROC (0x??) dest_section: uint8, dest_offset: uint8, n_inputs: uint16, n_outputs: uint16, n_locals: uint16

   frame_stack.push(FP)
   FP -= n_locals * 32
   PC += <length of immediates>

Marks the entry point to a procedure

* at offset dest_offset from the beginning of the dest_section,
* taking n_inputs arguments from the data stack,
* returning n_outputs values on the data stack, and
* reserving n_locals words of data in memory on the frame stack.
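The immediates listed for ENTERPROC take 1 + 1 + 2 + 2 + 2 = 8 bytes, which is the length of immediates that PC skips over. A minimal decoding sketch follows, assuming big-endian byte order (the order PUSHn uses for its immediates) and a one-byte opcode; because the opcode value is still unassigned (0x??), it is not checked. The names EnterProc and decode_enterproc are hypothetical.

```python
import struct
from typing import NamedTuple

# Immediate layout of ENTERPROC as listed in the specification:
# dest_section: uint8, dest_offset: uint8,
# n_inputs: uint16, n_outputs: uint16, n_locals: uint16.
# Assumption: big-endian byte order, as for PUSHn immediates.
ENTERPROC_IMMEDIATES = struct.Struct(">BBHHH")


class EnterProc(NamedTuple):
    dest_section: int
    dest_offset: int
    n_inputs: int
    n_outputs: int
    n_locals: int


def decode_enterproc(code: bytes, pc: int) -> tuple[EnterProc, int]:
    """Decode the immediates of the ENTERPROC opcode at code[pc].

    Returns the decoded fields and the length of the immediates,
    i.e. how far PC must additionally advance to skip them.
    """
    start = pc + 1  # immediates follow the one-byte opcode
    end = start + ENTERPROC_IMMEDIATES.size
    if end > len(code):
        raise ValueError("truncated ENTERPROC immediates")
    fields = EnterProc(*ENTERPROC_IMMEDIATES.unpack(code[start:end]))
    return fields, ENTERPROC_IMMEDIATES.size
```

For example, the bytes 0x00 0x04 0x00 0x02 0x00 0x01 0x00 0x03 following the opcode decode to section 0, offset 4, two inputs, one output and three locals.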

Procedures can only be entered via a CALLPROC to their entry point.

LEAVEPROC (0x??)

   FP = frame_stack.pop()
   asm RETURNSUB

Pop the frame stack and return to the calling procedure using RETURNSUB.

Marks the end of a procedure. Each ENTERPROC requires a closing LEAVEPROC.
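The ENTERPROC and LEAVEPROC pseudocode only moves FP and the frame stack. The minimal Python model of that bookkeeping below assumes FP is a plain signed integer, omits JUMPSUB/RETURNSUB and the data stack, and allocates each frame exactly once per call, as in the Call Frame Stack example. The class FrameStack is hypothetical.

```python
class FrameStack:
    """Minimal model of FP bookkeeping for ENTERPROC and LEAVEPROC.

    FP is a signed byte address; frames grow downward from 0 in
    32-byte words. Control transfer (JUMPSUB/RETURNSUB) is not modelled.
    """

    WORD = 32

    def __init__(self) -> None:
        self.fp = 0
        self.frames: list[int] = []  # saved FP values, innermost last

    def enterproc(self, n_locals: int) -> None:
        # frame_stack.push(FP); FP -= n_locals * 32
        self.frames.append(self.fp)
        self.fp -= n_locals * self.WORD

    def leaveproc(self) -> None:
        # FP = frame_stack.pop(); the following RETURNSUB is omitted here.
        # A validated program should never reach this with an empty stack.
        if not self.frames:
            raise RuntimeError("LEAVEPROC with an empty frame stack")
        self.fp = self.frames.pop()
```

Entering a two-word procedure and then a three-word one moves FP to -64 and then -160; each LEAVEPROC restores the saved value, returning FP to -64 and finally to 0.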

Note: Attempts to jump into a procedure (including its LEAVEPROC) from outside the procedure, and any jump or step to ENTERPROC at all, must be prevented at validation time. CALLPROC is the only valid way to enter a procedure.

CALLPROC (0x??) dest_section: uint16, dest_proc: uint16

   FP -= n_locals * 32
   asm JUMPSUB <offset of section> + <offset of procedure>

Allocate a stack frame and transfer control, using JUMPSUB, to the Nth (N = dest_proc) procedure in the Mth (M = dest_section) section of the code. Section 0 is the current code section; any other code sections are indexed starting at 1.

Note: That the procedure is defined and that the required n_inputs words are available on the data stack must be shown at validation time.

RETURNPROC (0x??)

   FP = frame_stack.pop()      (equivalently, FP += n_locals * 32)
   asm RETURNSUB

Pop the frame stack and return control to the calling procedure using RETURNSUB.

Note: That the promised n_outputs words are available on the data stack must be shown at validation time.

FRAMEADDRESS (0x??) offset: int16

   asm PUSH2 FP + offset

Push the address FP + offset onto the data stack.

Call frame data is addressed at an immediate offset relative to FP.

Typical usage includes storing data on a call frame

   PUSH 0xdada
   FRAMEADDRESS 32
   MSTORE

and loading data from a call frame

   FRAMEADDRESS 32
   MLOAD

Memory Costs

Presently, MSTORE is defined as

   memory[stack[0]...stack[0]+31] = stack[1]
   memory_size = max(memory_size, ceil((stack[0]+32)÷32))

* where memory_size is the number of active words of memory at or above 0.

We propose to treat memory addresses as signed (two's-complement) values, so the formula needs to be

   memory[stack[0]...stack[0]+31] = stack[1]
   if stack[0] < 0
      negative_memory_size = max(negative_memory_size, ceil(-stack[0]÷32))
   if stack[0]+32 > 0
      positive_memory_size = max(positive_memory_size, ceil((stack[0]+32)÷32))
   memory_size = positive_memory_size + negative_memory_size

* where negative_memory_size is the number of active words of memory below 0 and
* where positive_memory_size is the number of active words of memory at or above 0.

Call Frame Stack

These instructions make use of a frame stack to allocate and free frames of local data for procedures in memory. Frame memory begins at address 0 in memory and grows downwards, towards more negative addresses. A frame is allocated for each procedure when it is called, and freed when it returns.

Memory can be addressed relative to the frame pointer FP or by absolute address. FP starts at 0 and moves downward, towards more negative addresses, to point to the frame for each CALLPROC, and moves upward, towards less negative addresses, to point to the previous frame for the corresponding RETURNPROC.

Equivalently, in the EVM's two's-complement arithmetic, FP moves down from the highest address, as is common in many calling conventions.

For example, after an initial CALLPROC to a procedure needing two words of data, the frame stack might look like this:

 0-> ........
     ........
FP->

Then, after a further CALLPROC to a procedure needing three words of data, the frame stack would look like this:

  0-> ........
      ........
-64-> ........
      ........
      ........
 FP->

After a RETURNPROC from that procedure, the frame stack would look like this:

 0-> ........
     ........
FP-> ........
     ........
     ........

and after a final RETURNPROC, like this:

FP-> ........
     ........
     ........
     ........
     ........
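The signed memory sizing can be modelled by counting words below 0 and words at or above 0 separately, matching the definitions of negative_memory_size and positive_memory_size in Memory Costs. The sketch below assumes each access touches 32 bytes (as MSTORE and MLOAD do), rounds up to whole words as the existing memory-expansion rule does, and counts an access that straddles 0 toward both sizes. The helpers as_signed, SignedMemory and frameaddress are hypothetical names for illustration, and gas pricing is not modelled.

```python
def as_signed(word: int) -> int:
    """Interpret a 256-bit EVM word as a two's-complement signed integer."""
    return word - 2**256 if word >= 2**255 else word


def ceil_div(a: int, b: int) -> int:
    """Ceiling division for non-negative a and positive b."""
    return -(-a // b)


class SignedMemory:
    """Tracks active memory words when addresses are read as signed."""

    def __init__(self) -> None:
        self.negative_words = 0  # words below address 0
        self.positive_words = 0  # words at or above address 0

    def touch32(self, addr: int) -> None:
        """Account for a 32-byte access at signed address addr."""
        if addr < 0:
            # Bytes addr .. -1 lie below 0.
            self.negative_words = max(self.negative_words, ceil_div(-addr, 32))
        if addr + 32 > 0:
            # Bytes max(addr, 0) .. addr + 31 lie at or above 0.
            self.positive_words = max(self.positive_words, ceil_div(addr + 32, 32))

    @property
    def size(self) -> int:
        return self.negative_words + self.positive_words


def frameaddress(fp: int, offset: int) -> int:
    """Model of FRAMEADDRESS: the address FP + offset, offset being an int16."""
    if not -(2**15) <= offset < 2**15:
        raise ValueError("offset must fit in an int16")
    return fp + offset
```

Replaying the example frames (FP = -64 with two words, then FP = -160 with three) leaves five active words below 0. Because every update takes a max, the sizes only grow, so words freed by RETURNPROC remain counted.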

Rationale

There is actually not much new here. It amounts to EIP-615, refined and refactored into bite-sized pieces, along lines common to other machines.

This proposal uses the EIP-2315 return stack to manage calls and returns, and steals ideas from EIP-615, EIP-3336, and EIP-4200. ENTERPROC corresponds to BEGINSUB from EIP-615. Like EIP-615, it uses a frame stack to track call-frame addresses with FP as procedures are entered and left, but like EIP-3336 and EIP-3337, it moves call frames from the data stack to memory.

Aliasing call frames with ordinary memory supports addressing call-frame data with ordinary stores and loads. This is generally useful, especially for languages like C that provide pointers to variables on the stack.

The design model here is the subroutines and procedures of the Intel x86 architecture.

* JUMPSUB and RETURNSUB (from EIP-2315) -- like CALL and RET -- jump to and return from subroutines.
* ENTERPROC -- like ENTER -- sets up the stack frame for a procedure.
* CALLPROC amounts to a JUMPSUB to an ENTERPROC.
* RETURNPROC amounts to an early LEAVEPROC.
* LEAVEPROC -- like LEAVE -- takes down the stack frame for a procedure. It then executes a RETURNSUB.

Backwards Compatibility

This proposal adds new EVM opcodes. It doesn't remove or change the semantics of any existing opcodes, so there should be no backwards compatibility issues.

Security

Safe use of these constructs must be checked completely at validation time -- per EIP-3779 -- so there should be no security issues at runtime.

ENTERPROC and LEAVEPROC must follow the same safety rules as for JUMPSUB and RETURNSUB in EIP-2315. In addition, the following constraints must be validated:

* Every ENTERPROC must be followed by a LEAVEPROC to delimit the bodies of procedures.
* There can be no nested procedures.
* There can be no jump into the body of a procedure (including its LEAVEPROC) from outside of that body.
* There can be no jump or step to ENTERPROC at all -- only CALLPROC.
* The specified n_inputs words must be on the data stack when a procedure is called, and the specified n_outputs words when it returns.
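Apart from the stack-height rule, the constraints above are structural and can be checked in one pass over the code. The sketch below is a minimal illustration under stated assumptions: instructions are hypothetical Insn records whose jump targets are instruction indices rather than byte offsets, NO_FALLTHROUGH is an assumed set of instructions that never continue to the next one, and execution is assumed to begin at index 0 (so an ENTERPROC there would be entered without CALLPROC). Checking n_inputs and n_outputs would need the stack-height analysis of EIP-3779 and is left out.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    op: str
    target: Optional[int] = None  # jump target as an instruction index


# Assumption: instructions after which execution never falls through.
NO_FALLTHROUGH = {"STOP", "RETURN", "REVERT", "INVALID", "JUMP",
                  "RETURNSUB", "RETURNPROC", "LEAVEPROC"}
JUMPS = {"JUMP", "JUMPI"}


def procedure_spans(code: list[Insn]) -> list[tuple[int, int]]:
    """Pair each ENTERPROC with its LEAVEPROC; reject nesting and unclosed bodies."""
    spans: list[tuple[int, int]] = []
    open_at: Optional[int] = None
    for i, insn in enumerate(code):
        if insn.op == "ENTERPROC":
            if open_at is not None:
                raise ValueError(f"nested ENTERPROC at {i}")
            open_at = i
        elif insn.op == "LEAVEPROC":
            if open_at is None:
                raise ValueError(f"LEAVEPROC without ENTERPROC at {i}")
            spans.append((open_at, i))
            open_at = None
    if open_at is not None:
        raise ValueError(f"ENTERPROC at {open_at} has no LEAVEPROC")
    return spans


def validate(code: list[Insn]) -> None:
    """Raise ValueError if a structural procedure constraint is violated."""
    spans = procedure_spans(code)

    def owner(i: int) -> Optional[int]:
        # Index of the procedure whose ENTERPROC..LEAVEPROC range contains i.
        for k, (lo, hi) in enumerate(spans):
            if lo <= i <= hi:
                return k
        return None

    for i, insn in enumerate(code):
        if insn.op in JUMPS:
            t = insn.target
            if t is None or not 0 <= t < len(code):
                raise ValueError(f"bad jump target at {i}")
            if code[t].op == "ENTERPROC":
                raise ValueError(f"jump to ENTERPROC at {i}")
            if owner(t) is not None and owner(t) != owner(i):
                raise ValueError(f"jump into a procedure body at {i}")
        if insn.op == "ENTERPROC":
            # No step into ENTERPROC, from the previous instruction or at program start.
            if i == 0 or code[i - 1].op not in NO_FALLTHROUGH:
                raise ValueError(f"execution can step into ENTERPROC at {i}")
```

Jumps out of a procedure body are not rejected here, since the list of constraints does not forbid them.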