Code merkleization, along with binarification of the trie and gas cost bump of state accessing opcodes, are considered as the main levers for decreasing block witness sizes in stateless or partial-stateless Eth1x roadmaps. Here we specify a fixed-sized chunk approach to code merkleization and outline how the transition of existing contracts to this model would look like.
Bytecode is currently the second contributor to block witness size, after the proof hashes. Transitioning the trie from hexary to binary reduces the hash section of the witness by 3x, thereby making code the first contributor. By breaking contract code into chunks and committing to those chunks in a merkle tree, stateless clients would only need the chunks that were touched during a given transaction to execute it.
What follows is structured to have two sections:
CHUNK_SIZE
: 32 (bytes)VERSION_KEY
: max(u256)
VERSION
: 0EMPTY_CODE_ROOT
: 0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
(==keccak256('')
)HF_TIMESTAMP
: to be definedITERATOR_STRIDE
: TBC
but based on verkle numbers, first devnets should use 50_000
BE(x, N)
: casts x
to an unsigned integer of N
bytes and returns its big-endian representationFor an account record A
with code C
, two extra optional fields are added: A.codeSize
and A.codeRoot
. A.codeHash
is retained to serve EXTCODEHASH
.
If C
is empty, i.e. in the case of an EoA, A.codeRoot
and A.codeSize
are omitted from the account's RLP. This is intended to limit the size overhead of this change.
If C
is not empty:
A.codeSize = len(code)
A.codeRoot
contains the root of codeTrie
, a trie with the following leaves:VERSION_KEY
, value: BE(VERSION, 1)
chunks = [(FIO_0, code_0), ..., (FIO_n, code_n)]
which are derived from C
as follows:code_0 || ... || code_n == C
.length(code_i) == CHUNK_SIZE
where 0 <= i < n
.length(code_n) <= CHUNK_SIZE
.FIO_i
is the offset of the first instruction within the chunk. It should only be greater than zero if the last instruction in code_i-1
is a multi-byte instruction (like PUSHN
) crossing the chunk boundary. It is set to CHUNK_SIZE
in the case where all bytes of a chunk are data.The i
th element of chunks
is stored in codeTrie
with:
BE(i, 4)
BE(FIO_i, 1) || code_i
, where ||
stands for byte-wise concatenationAs of now there is a charge of 200 gas per byte of the code stored in state by contract creation operations, be it initiated via CREATE
, CREATE2
, or an external transaction. This per byte cost is to be increased from 200
to 500
to account for the chunking and merkleization costs. This number is inherited from EIP-4762 and is picked for forward-compatibility.
The transition process involves reading all contracts in the state and applying the above procedure to them. A process simliar to EIP-7612 is to be used, as the total code size at the time of this EIP edit is >10GB and can not be processed in a single block.
Note that:
At HF_TIMESTAMP
the EIP gets activated:
23
, and CODESIZE
/CODECOPY
are forwarded to the delegated account. EXTCODESIZE
/EXTCODEHASH
/EXTCODECOPY
behave the same as they did before the activation of this EIP.ITERATOR_STRIDE
accounts in the tree, and for each account:The value for ITERATOR_STRIDE
has been chosen to be safe while ensuring the transition process does not last too long. At the current state size and block time, this represents about 10000 blocks, which is about one and a half day.
The trie format is chosen to be the same as that of the account trie. If a tree conversion happens at a later stage, the chunk tree will have to be converted as well, e.g. the way it is in EIP-6800 or EIP-7864.
The current recommended chunk size of 32 bytes has been selected based on a few observations. Smaller chunks are more efficient (i.e. have higher chunk utilization, but incur a larger hash overhead (i.e. number of hashes as part of the proof) due to a higher trie depth. Larger chunks are less efficient, but incur less hash overhead. We plan to run a larger experiment comparing various chunk sizes to arrive at a final recommendation.
The firstInstructionOffset
fields allows safe jumpdest analysis when a client doesn't have all the chunks, e.g. a stateless clients receiving block witnesses.
Note: there could be an edge case when computing FIO for the chunks where the data bytes at the end of a bytecode (last chunk) resemble a multi-byte instruction. This case can be safely ignored.
Details of how the code is stored, is left to the client implementers. However, to reflect the removal of the max code limit and the fact that larger codes will be more expensive to load, reading a non-accessed chunk will incur a 200 warming cost, and chunks written beyond the 24kb limit will cost 500 gas instead of 200.
We have considered an alternative option to package chunks, where each chunk is prepended with its chunkLength
and would only contain complete opcodes (i.e. any multi-byte opcode not fitting the CHUNK_SIZE
would be deferred to the next chunk).
This approach has downsides compared to the one specified:
1) Requires a larger CHUNK_SIZE
-- at least 33 bytes to accommodate the PUSH32
instruction.
2) It is more wasteful. For example, DUP1 PUSH32 <32-byte payload>
would be encoded as two chunks, the first chunk contains only DUP1
, and the second contains only the PUSH32
instruction with its payload.
3) Calculating the number of chunks is not trivial and would have to be stored explicitly in the metadata.
Additionally we have reviewed many other options (basic block based, Solidity subroutines (requires determining the control flow), EIP-2315 subroutines). This EIP however only focuses on the chunk-based option.
From the perspective of contracts, the design aims to be transparent, with the exception of changes in gas costs.
Outside of the interface presented for contracts, this proposal introduces major changes to the way contract code is stored, and needs a substantial modification of the Ethereum state. Therefore it can only be introduced via a hard fork.
TBD
Show the codeRoot
for the following cases:
code=''
code='PUSH1(0) DUP1 REVERT'
code='PUSH32(-1)'
(data passing through a chunk boundary)TBA
Copyright and related rights waived via CC0.