This Informational EIP provides foundational context for understanding control flow in the EVM. It covers the historical development of control flow mechanisms in computing, the technical foundations of control flow analysis, and the impact of static control flow on Ethereum's scaling roadmap.
This document serves as background material for proposals for static control flow, including EIP-615: Subroutines and Static Jumps for the EVM, EIP-4750: EOF Functions, EIP-7979: Call and Return Opcodes, EIP-8013: Static Relative Jumps, discussions around RISC-V migration and ZK verification infrastructure, and discussions still to come.
In 1833 Charles Babbage began the design of the Analytical Engine: a steam-powered, mechanical, Turing-complete computer. Programs were to be encoded on punched cards which controlled a system of rods, gears and other machinery to implement arithmetic, storage (for 1,000 40-digits decimal numbers) and conditional jumps. Jumps were supported by cards that shuffled the card deck forwards or backwards a fixed number of cards.
The first published description of the Analytical Engine was in French, by L. F. Menabre, 1842^1. The English translator, Ada Augusta, Countess of Lovelace, made extensive notes on the science of computer programming. The notes include her famous program for recursively computing Bernoulli numbers — arguably the world's first complete computer program — which used conditional jumps to implement the required nested loops.
In 1945 Alan Turing proposed his Automatic Computing Engine^2 (ACE), where he introduced the concept of calls and returns: "To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation."
The ACE supported calls directly with a stack of mercury-filled memory crystals holding return addresses. Turing's design was for a 32-bit RISC machine with a vacuum tube integer ALU, floating point microcode, 32 registers, a 1024-entry return stack, and 32K of RAM on a 1 MHz bus. The smaller Pilot ACE was for a time the world's fastest computer.
Call and return facilities of various names and levels of complexity have proven their worth across a long line of important machines over the last 80 years, including most of the machines I have programmed or implemented: Physical machines including the Burroughs 5000, CDC 7600, IBM 360, PDP-11, VAX, Motorola 68000 and Intelx86, and virtual machines including those for Scheme, Forth, Pascal, Java, and WebAssembly.
Especially relevant to the EVM's design are the Java, WebAssembly (Wasm), and CLI (.NET) VMs. They share crucial common properties:
The static control flow that supports linear time compilers also supports any other code that needs to traverse the control flow of a program of a program, traversing each path only once.
Among the most imporant reasons to traverse the control flow of a program is to extract the control flow graph. The EVM makes this quadratically difficult due to its dynamic control flow.
A control flow graph (CFG) is a directed graph representation of a program where:
A complete and sound CFG represents all and only the possible paths of program execution and is a fundamental starting point for many downstream tasks, including many static analyses:
Dynamic jumps make dynamic control flow possible, make quadraticaly complex CFGs possible, and can make static analysis of control difficult to impossible. Dynamic jumps are not a problem for machines which run on physical hardware -- their instructions are designed for speed. But they are uncommon in virtual machines whose code is the source for downstream tools. This is because, as we will illustrate below, building and traversing a dynamic control flow graph can take quadratic space and time. For this reason Java and Wasm do not support dynamic jumps and CLI carefully restricts them.
For an example, consider these EVM programs and the easily generated series of longer programs like them ... the really long ones make for nice exploits. It's not important that at runtime gas isn't random or that the jump will most often fail, what matters is that because the jump destination is taken from the stack it is impossible to know a priori where the jumps go, so every path must be explored.
jumpdest jumpdest jumpdest ...
gas gas gas ...
jump jump jump ...
jumpdest jumpdest jumpdest ...
gas gas gas ...
jump jump jump ...
jumpdest jumpdest jumpdest ...
stop gas gas ...
jump jump ...
jumpdest jumpdest ...
stop gas ...
jump ...
jumpdest ...
stop ...
...
...
...
The control flow graphs for these programs make the problem clear. Each block of instructions in a graph is a sequence from the above programs with at most one entry (a JUMPDEST) and one exit (a JUMP), and each arc is a transfer of control. Arcs on the left are backwards branches, arcs on the right are forwards branches. See how the tangle of arcs goes up fast, faster than the programs get longer.
To be precise the number of arcs in these graphs is equal to the number of blocks minus one, squared. That is, the number of possible paths of control flow, and thus the time needed to traverse them all, is quadratic in the size of the program. At each block's exiting instruction, (except the stop) the analyzer cannot know a priori where control will transfer. Every block might jump to any other block, creating a fully-connected CFG with O(N²) possible paths. Traversing all paths to verify them requires O(N²) time, and this is not just a theoretical worst-case, as shown above.
In Java, Wasm, and CLI it is simply impossible to have programs like these.
The Ethereum Virtual Machine does not provide explicit facilities for calls and returns. Instead, they must be synthesized using the dynamic JUMP instruction, which takes its argument from the stack and stores intermediate links on the stack. So control flow must be dynamic, which creates the quadratic CFG problems explained above.
For Ethereum, this quadratic behavior is a denial-of-service vulnerability for any online static analysis, including validating bytecode and AOT compilation at contract creation time and JIT compilation at runtime.
Even offline, dynamic jumps (and the lack of calls and returns) can cause static analyses of many contracts to become quadratically impractical, exponentially intractable or even mathematically impossible. For a few examples, consider these abstracts from just a few recent papers on the problem. The last paper resorts to neural nets to disassemble (most) Solidity programs. There is an entire academic literature of complex, incomplete solutions to problems that static analysis renders trivial.
"Ethereum smart contracts are distributed programs running on top of the Ethereum blockchain. Since program flaws can cause significant monetary losses and can hardly be fixed due to the immutable nature of the blockchain, there is a strong need of automated analysis tools which provide formal security guarantees. Designing such analyzers, however, proved to be challenging and error-prone."^3
"The EVM language is a simple stack-based language ... with one significant difference between the EVM and other virtual machine languages (like Java bytecode or CLI for .Net programs): the use of the stack for saving the jump addresses instead of having it explicit in the code of the jumping instructions. Static analyzers need the complete control flow graph (CFG) of the EVM program in order to be able to represent all its execution paths."^4
"Static analysis approaches mostly face the challenge of analyzing compiled Ethereum bytecode... However, due to the intrinsic complexity of Ethereum bytecode (especially in jump resolution), static analysis encounters significant obstacles."^5
"Analyzing contract binaries is vital ... comprising function entry identification and detecting its boundaries... Unfortunately, it is challenging to identify functions ... due to the lack of internal function call statements."^6
In our experience, to avoid the problems of dynamic control flow VMs use static jumps, calls and returns.
As laid out above, Static control flow means that the destinations of every jump or call is determinable a priori, before execution. This has concrete implications for Ethereum's scaling roadmap, particularly around ZK verification, rollups, and future execution layer changes.
To understand why static control flow matters for ZK-Rollups, we need to briefly understand how ZK systems verify computation:
A _ZK-Rollup sequencer or prover batches many transactions and constructs a ZK proof that all transactions executed correctly. The proof is then submi_tted to L1, where it is quickly verified. Key implications:
Static control flow enables:
Optimistic Rollups assume transactions are valid but require fraud proofs to dispute invalid state roots. A fraud proof is a proof that a specific transaction executed incorrectly. Key implications of static analysis include:
As already discussed, static control flow enables contracts to be compiled to machine code before execution, just-in-time or ahead-of-time. This is an obvious win for non-ZK Clients, whether on L1, L2, or EVM-compatible chains.
There are ongoing discussions within the Ethereum research community about potentially replacing the EVM with a RISC-V execution environment. RISC-V has a standard instruction set architecture which is seeing increasing use in the ZK community. One currrent strategy for creating a ZK-EVM is to compile an EVM interpreter like evmone or reth to RISC-V for use in a ZK-VM. Supporting RISC-V directly eliminates the overhead of the EVM interpreter. An EVM with static control flow opens up another strategy — compile the EVM code to RISC-V code. That gives good RISC-V code in one linear-time pass, and better code in multiple passes, altogether linear time.
A missing piece in this puzzle is that RISC-V is a 32-bit or 64-bit architecture, but the current EVM is a 256-bit architecture. For that purpose we have EIP-7937: EVM64 — 64-bit mode EVM opcodes. It's also the case that the prover has the actual trace, and can tell how many bits are actually in use at each step of the computation, so the 256-bit registers might not be that big of a problem.
Four EIPs currently specify new EVM opcodes and semantics for static control flow: EIP-615: Subroutines and Static Jumps for the EVM, EIP-4750: EOF - Functions, EIP-7979: Call and Return Opcodes for the EVM, and EIP-8013: Static relative jumps and calls for the EVM. They are all implemented with the standard Turing return stack architecture, and are for the most part compatible with each other other. In particular, EIP-7979 can be used to implement every other EIP here.
Static control flow has been a cornerstone of efficient computation since Babbage and Turing. The EVM's reliance on dynamic jumps is an anomaly among virtual machines and a significant barrier to analysis, compilation, and scaling. Proposals to introduce explicit call/return opcodes and enforce static control flow bring the EVM in line with industry best practices and unlock a range of optimizations critical to Ethereum's scaling roadmap.
Static control flow is not a silver bullet. But it is a foundational piece that enables:
By making control flow explicit and enforceable, the EVM becomes compatible with the full ecosystem of optimization and analysis techniques that other VMs and processor designs have leveraged for decades.
This Informational proposal itself specificies no changes to the protocol. Therefore it has no direct security implications. It does not affect the security considerations of the EIPs it references, rather, it helps to motivate and specify them.
{
"type": "article",
"id": 3,
"author": [
{
"family": "Menabre",
"given": "L.F."
}
],
"DOI": "10.1145/2809523.2809528",
"title": "Sketch of The Analytical Engine Invented by Charles Babbage.",
"original-date": {
"date-parts": [
[1842, 10, 1]
]
},
"URL": "https://doi.org/10.1145/2809523.2809528"
}
```
{
"type": "article",
"id": 3,
"author": [
{
"family": "Carpenter",
"given": "B.E"
}
],
"DOI": "comjnl/20.3.269",
"title": "The other Turing machine.",
"original-date": {
"date-parts": [
[1977, 1, 1]
]
},
"URL": "https://doi.org/10.1093/comjnl/20.3.269"
}
```
{
"type": "article",
"id": 4,
"author": [
{
"family": "Schneidewind",
"given": "Clara"
}
],
"DOI": "arXiv:2101.05735",
"title": "The Good, the Bad and the Ugly: Pitfalls and Best Practices in Automated Sound Static Analysis of Ethereum Smart Contracts.",
"original-date": {
"date-parts": [
[2021, 1, 14]
]
},
"URL": "https://arxiv.org/abs/2101.05735"
}
```
{
"type": "article",
"id": 5,
"author": [
{
"family": "Albert",
"given": "Elvira"
}
],
"DOI": "arXiv:2004.14437",
"title": "Analyzing Smart Contracts: From EVM to a sound Graph.",
"original-date": {
"date-parts": [
[2020, 4, 29]
]
},
"URL": "https://arxiv.org/abs/2004.14437"
}
```
{
"type": "article",
"id": 6,
"author": [
{
"family": "Contro",
"given": "Filippo"
}
],
"DOI": "arXiv:2103.09113",
"title": "EtherSolve: Computing an Accurate Graph from Ethereum Bytecode.",
"original-date": {
"date-parts": [
[2021, 3, 16]
]
},
"URL": "https://arxiv.org/abs/2103.09113"
}
```
{
"type": "article",
"id": 7,
"author": [
{
"family": "He",
"given": "Jiahao"
}
],
"DOI": "arXiv:2301.12695",
"title": "Neural-FEBI: Accurate Function Identification in Ethereum Virtual Machine Bytecode.",
"original-date": {
"date-parts": [
[2023, 1, 30]
]
},
"URL": "https://arxiv.org/abs/2301.12695"
}
```
{
"type": "article",
"id": 8,
"author": [
{
"family": "Jain, et al. Doi: ",
"given": "Akshita"
}
],
"DOI": "10.48550/Arxiv.2301.12695",
"title": "Exploring The Efficacy Of Rollups’ A Comparative Study Of Optimistic And ZK-Rollups And Their Popular Implementations.",
"original-date": {
"date-parts": [
[2024, 1, 1]
]
},
"URL": "https://arxiv.org/10.48550/Arxiv.2301.12695"
}
```
Copyright and related rights waived via CC0.