EIP-6690 - EVM Modular Arithmetic Extensions

Created	2023-03-15
Status	Stagnant
Category	Core
Type	Standards Track
Authors

Abstract

This EIP proposes new EVM modular arithmetic opcodes which support operations on odd or power-of-two moduli between 3 and 2**768-1

Motivation

Current opcodes for modular arithmetic only support values up to 256 bits wide. In addition, they are permissive and accept any representable value for the inputs.

Many cryptographic operations are heavily-bottlenecked by modular arithmetic. To expand the range of cryptographic primitives that can be implemented efficiently as EVM contracts, we propose new modular arithmetic opcodes designed for efficiency.

Specification

Constants

Name	Value	Description
`COST_SETMODX_BASE`	1	static cost component for the `SETMODX` opcode
`COST_STOREX_BASE`	1	static cost for the `STOREX` opcode
`COST_LOADX_BASE`	1	static cost for the `LOADX` opcode

Conventions

The use of assert implies that if the assertion fails, the current call frame will consume all call gas and terminate call execution in an exceptional state.
Unless otherwise stated, the implied unit for size is bytes.

Overview

The execution state of an EVM call frame is modified to include a mapping of id (a number 0-256) to a field context. A field context comprises a modulus and an allocated space of virtual registers to perform modular arithmetic operations on.

An executing contract uses a new instruction SETMODX to set the active field context, allocating a new one in the mapping if it does not already exist for id.

New arithmetic opcodes perform modular addition, subtraction and multiplication with inputs/outputs from virtual registers of the active field context.

New load/store opcodes copy values to and from EVM memory and the virtual registers allocated by the active field context.

New Opcodes

`SETMODX(0xc0)`

Input: <top of stack> id modulus_offset modulus_size alloc_count.

Output: none

Execution

Charge COST_SETMODX_BASE.

Assert 0 <= id <= 256.

If a field context for id exists in this call scope:

Set it as the active one.

Otherwise:

Assert modulus_size <= 96.
Assert that the byte range[modulus_offset, modulus_offset+modulus_size] falls within EVM memory.
Load the byte range from EVM memory, interpreting it as a big endian modulus.
Assert modulus is odd or is a power of two.
Assert 3 <= modulus <= 2**768 - 1.
Assert 0 < alloc_count <= 256.
Define the size of elements stored in virtual registers: element_size as the size needed to represent the modulus padded to be a multiple of 64 bits. Implementations will align virtual register values along system-word lines.
assert that the new size of all virtual registers allocated in the current call-context does not exceed 24576 bytes.
Charge EVM memory expansion cost to expand memory by alloc_count * element_size bytes:.
if the modulus is odd, charge the dynamic cost component (defined below)
Allocate the new field context with alloc_count initially-zeroed registers. Associate it with id in the mapping.
The new field context is now set as the active one.

Dynamic Cost Component for Odd Moduli

Modulus size (padded to nearest 64 bits)	cost
64 bits	23
128 bits	26
192 bits	29
256 bits	32
320 bits	36
384 bits	39
448 bits	42
512 bits	45
576 bits	48
640 bits	51
704 bits	54
768 bits	58

The values in this table are based on the following linear cost model:

def cost_setmodx_dynamic(modulus_size: int) -> int:
    limb_count = (modulus_size + 7) / 8
    return round(limb_count * 3.13 + 19.94)

Arithmetic Opcodes

Opcodes ADDMODX(0xc3), SUBMODX(0xc4), MULMODX(0xc5) take a 7 byte immediate interpreted as byte values out_idx, out_stride, x_idx, x_stride, y_idx, y_stride, count.

Define cost tables for each arithmetic operation based on the modulus:

Modulus size (padded to nearest 64 bits)	cost per add/sub	cost per mul
64	1	1
128	1	1
192	1	1
256	1	2
320	1	2
384	1	3
448	1	4
512	2	5
576	2	7
640	2	8
704	2	10
768	2	12

The values in this table correspond to the following cost models:

def cost_add_or_sub(modulus_size: int) -> int:
    limb_count = (modulus_size + 7) // 8
    return round(limb_count * 0.12 + 0.58)

def cost_mul(modulus_size: int) -> int:
    limb_count = (modulus_size + 7) // 8
    return round((0.08 * limb_count**2) + (-0.06 * limb_count) + 0.75)

Execution asserts:

all accessed values fall within bounds: max([out_idx + (out_stride * count), x_idx + (x_stride * count), y_idx + (y_stride * count)]) < len(field_context.registers)
out_stride != 0
count != 0
an active field context active_ctx is set.

Then, charge count * operation_cost where operation_cost is drawn from the cost table. Compute:

for i in range(count):
    active_ctx.registers[out_idx+i*out_stride] = operation(active_ctx.registers[x_idx+i*x_stride], active_ctx.registers[y_idx+i*y_stride])

Where operation computes modular addition, subtraction or multiplication depending on the opcode.

Note: Inputs/outputs can overlap. active_ctx.registers is not modified until all operations in the loop have completed.

Data Transfer Opcodes

Note: serialization format for values in EVM memory: big-endian padded to active_ctx.element_size bytes.

`LOADX(0xc1)`

Stack in: (top of stack) dest source count

Stack out: none

Description: copies values from registers in the currently-active field context to EVM memory.

Execution

Assert that a field context is set as active in the current call frame.
Assert that the range [source, source + count] falls within the active field context's zero-indexed value space.
Assert that the range [dest, dest + count * active_ctx.element_size] falls entirely within EVM memory.
If modulus is a power of two:
- charge (active_ctxt.element_size + 31) // 32 * 3 gas (the same as the memory copying component for the MCOPY opcode)
If modulus is odd:
- charge count * operation_cost gas, where operation_cost is looked up from the multiplication cost table, given active_ctxt.element_size.
copy virtual register values [source, source + count] to memory [dest, dest + count * active_ctx.element_size].

`STOREX(0xc2)`

Stack in: dest source count

Stack out: none

Description: copies values from EVM memory into registers of the currently-active field context.

Execution

Assert that a field context is set as active in the current call frame.
Assert dest + count is less than or equal to the number of the active field context's virtual registers.
If modulus of the active context is a power of two:
- charge cost_mem_copy(count * active_ctx.element_size) (TBD: deliberately didn't choose mcopy gas model to see if this can consider a cheaper one).
Else if modulus of the active context is odd:
- charge count * operation_cost gas, where operation_cost is looked up from the multiplication cost table, given active_ctxt.element_size.
Assert that [source, source + count * active_context.element_size] falls entirely within EVM memory. Interpret it as count number of values, asserting that each is less than the modulus and storing them in the registers [dest, dest + count].

EVM Memory Expansion Cost Modification

When expanding EVM memory or allocating a new field context via SETMODX, expansion cost will now consider the size of all allocated virtual registers in the current call frame.

Rationale

Separation of EVM Memory and EVMMAX Virtual Register Space

It is assumed that optimized implementations will not store values in EVM-compatible big-endian serialization format, but instead convert them to an internal working representation. The costs in the spec explicitly reflect the choice of Montgomery form as an optimal internal representation.

Values represented in Montgomery form can make use of optimized modular reduction for multiplication. See the dedicated section on Montgomery multiplication in the rationale for more details.

Total Virtual Register Allocation Cap

24576 bytes is chosen as the per-call-context allocation limit with the goal of ensuring that the virtual register space can be contained in the L1 CPU data cache of most machines (with room to spare), in order that arithmetic operation costs do not have to account for memory access latency.

SETMODX cost

For odd moduli, SETMODX precomputes two values used for optimized modular multiplication via Montgomery reduction. This is dominated by a modular reduction by the modulus, so the cost model is assumed to be linear.

Power-of-two moduli require no precomputation for optimized modular arithmetic.

LOADX/STOREX costs

For power-of-two moduli, LOADX/STOREX are assumed to copy values directly between EVM/EVMMAX memories without measurable conversion cost to convert between EVM big-endian serialization and whatever internal representation is used.

For odd moduli, the spec assumes that the internal representation is Montgomery form. Conversion between canonical/Montgomery form requires one modular multiplication per value per direction. The cost of memory copying is baked in to the multiplication price model.

Arithmetic Costs

Addition/subtraction/multiplication are assumed to have constant-time implementations. Addition/subtraction can be implemented with linear complexity, while multiplication is quadratic in the size of the modulus.

The costs for power-of-two modular multiplication are provisional, and likely be brought down when an optimized arithmetic implementation is rolled out and benchmarked.

Montgomery Modular Multiplication

For a value A, an odd modulus M and a value R (must be coprime and greater than M, chosen as a power of two for efficient performance), the Montgomery representation is A * R % M.

Define the Montgomery modular multiplication of two values A and B: mulmont(A, B, M): A * B * R**-1 % M where R**-1 % M is the modular inverse of R with respect to M. There is various literature and algorithms for computing mulmont which have been published since 1985, and the operation is used ubiquitously where execution is bottlenecked by operations over odd moduli.

Note that normal modular addition and subtraction algorithms work for Montgomery form values.

Conversion of Canonical to Montgomery Representation

mulmont(canon_val, R**2 % M, M) = mont_val

Conversion from Montgomery to Canonical representation

mulmont(mont_val, 1, M) = canon_val

Implementation

# Basic Montgomery Multiplication
#   x, y, mod (int) - x < mod and y < mod
#   mod (int) - an odd modulus
#   R (int) -  a power of two, and greater than mod
#   mont_const (int) - pow(-mod, -1, R), precomputed
# output:
#  (x * y * pow(R, -1, mod)) % mod
#
def mulmont(x: int, y: int, mod: int, monst_const: int, R: int) -> int:
    t = x * y
    m = ((t % R) * mont_const) % R
    t = t + m * mod
    t /= R
    if t >= mod:
        t -= mod
    return t

There are various optimized algorithms for computing Montgomery Multiplication of bigint numbers expressed as arrays of system words.

These all use a different precomputed constant mont_const = pow(-mod, -1, 2**SYSTEM_WORD_SIZE_BITS).

Test Cases

There are tests contained in the Geth implementation. However, no cross-client tests exist yet.

Reference Implementation

There is an implementation of this EIP in an open PR to Go Ethereum.

EIP-6690 - EVM Modular Arithmetic Extensions

Abstract

Motivation

Specification

Constants

Conventions

Overview

New Opcodes

`SETMODX(0xc0)`

Execution

Dynamic Cost Component for Odd Moduli

Arithmetic Opcodes

Data Transfer Opcodes

`LOADX(0xc1)`

Execution

`STOREX(0xc2)`

Execution

EVM Memory Expansion Cost Modification

Rationale

Separation of EVM Memory and EVMMAX Virtual Register Space

Total Virtual Register Allocation Cap

SETMODX cost

LOADX/STOREX costs

Arithmetic Costs

Montgomery Modular Multiplication

Conversion of Canonical to Montgomery Representation

Conversion from Montgomery to Canonical representation

Implementation

Test Cases

Reference Implementation

Security Considerations

Copyright

Modulus size (padded to nearest 64 bits)	cost per add/sub	cost per mul
64	1	1
128	1	1
192	1	1
256	1	2
320	1	2
384	1	3
448	1	4
512	2	5
576	2	7
640	2	8
704	2	10
768	2	12

Modulus size (padded to nearest 64 bits)	cost per add/sub	cost per mul
64	1	1
128	1	1
192	1	1
256	1	2
320	1	2
384	1	3
448	1	4
512	2	5
576	2	7
640	2	8
704	2	10
768	2	12

Modulus size (padded to nearest 64 bits)	cost per add/sub	cost per mul
64	1	1
128	1	1
192	1	1
256	1	2
320	1	2
384	1	3
448	1	4
512	2	5
576	2	7
640	2	8
704	2	10
768	2	12