This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to efficiently perform fast polynomial multiplication for post-quantum and STARK cryptography.
With the recent advances in quantum computing, there are increased concerns for the quantum threat against Ethereum. Today ECDSA is the EOA signature algorithms, which is vulnerable to attacks by quantum computers. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and post-quantum threat mitigation.
| Name | Value | Comment |
|---|---|---|
| NTT_FW | TBD | precompile address |
| NTT_INV | TBD | precompile address |
| NTT_VECMULMOD | TBD | precompile address |
| NTT_VECADDMOD | TBD | precompile address |
We introduce four separate precompiles to perform the following operations:
600 gas,600 gas,The NTT_FW and NTT_INV are fully defined by the following set of parameters: Let $R$ be a cyclotomic ring of the form $R=\mathbb F_q[X]/(X^n+1)$. In these notations,
Any element $a \in R$ is a polynomial of degree at most $n-1$ with integer coefficients, written as $a=\sum_{i=0}^{n-1} a_iX^i$
The NTT transformation is described by the following algorithm.
Input: A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in standard order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi_\text{rev} \in \mathbb{Z}_q^n$ storing powers of $\psi$ in bit-reversed order.
Output: $a \leftarrow \text{NTT_FW}(a)$ in bit-reversed order.
t ← n
for m = 1 to n-1 by 2m do
t ← t / 2
for i = 0 to m-1 do
j1 ← 2 ⋅ i ⋅ t
j2 ← j1 + t - 1
S ← Ψrev[m + i]
for j = j1 to j2 do
U ← a[j]
V ← a[j + t] ⋅ S
a[j] ← (U + V) mod q
a[j + t] ← (U - V) mod q
end for
end for
end for
return a
The Inverse NTT is described by the following algorithm.
Input: A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in bit-reversed order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi^{-1}_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi^{-1}$ in bit-reversed order.
Output: $a \leftarrow \text{NTT_INV}(a)$ in standard order.
t ← 1
for m = n to 1 by m/2 do
j1 ← 0
h ← m / 2
for i = 0 to h-1 do
j2 ← j1 + t - 1
S ← Ψ⁻¹rev[h + i]
for j = j1 to j2 do
U ← a[j]
V ← a[j + t]
a[j] ← (U + V) mod q
a[j + t] ← (U - V) ⋅ S mod q
end for
j1 ← j1 + 2t
end for
t ← 2t
end for
for j = 0 to n-1 do
a[j] ← a[j] ⋅ n⁻¹ mod q
end for
return a
The NTT_VECMULMOD is functions similarly to SIMD, but operates with larger input and output sizes.
Input: Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.
Output: The element-wise product $(a[0]\cdot b[0] \mod q, a[1]\cdot b[1]\mod q, \dots, a[n-1]\cdot b[n-1] \mod q)$.
Gas cost: Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.
The NTT_VECMULMOD is similar to SIMD in the functioning, but operates with larger sizes in input and output.
Input: Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.
Output: The element-wise addition $(a[0]+ b[0] \mod q, a[1]+ b[1]\mod q, \dots, a[n-1]+ b[n-1] \mod q)$.
Gas cost: Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) /32$.
If $f$ and $g$ are two polynomials of $R$, then $f\times g= \text{NTT_INV}(\text{NTT_VECMULMOD}( \text{NTT_FW}(a), \text{NTT_FW}(b)))$ is equal to the product of $f$ and $g$ in $R$. The algorithm has a complexity of $n \log_2n$ rather than $n^2$ with the classical schoolbook multiplication algorithm.
The gas cost model for EIP-7885 precompiles is designed to target approximately 50-55 mgas/s performance, consistent with existing precompiled contracts like ECRECOVER. Two implementations with different gas costs are available:
Gas Costs by Implementation:
| Scheme | Ring Degree | nocgo NTT_FW | nocgo NTT_INV | cgo NTT_FW | cgo NTT_INV |
|---|---|---|---|---|---|
| Falcon-512 | 512 | 790 gas | 790 gas | 500 gas | 500 gas |
| Falcon-1024 | 1024 | 1,750 gas | 1,750 gas | 1,080 gas | 1,080 gas |
| ML-DSA (Dilithium) | 256 | 220 gas | 270 gas | 256 gas | 340 gas |
Pure Go (nocgo): Zero-dependency implementation.
liboqs (cgo): Offers 36-38% lower gas costs for NTT operations.
Rationale: Gas costs are calibrated per cryptographic scheme based on actual execution performance. NTT computation complexity varies by ring degree and modulus size, with costs optimized to maintain consistent throughput across different post-quantum schemes.
VECMULMOD Gas Cost: ceil(0.32 × N)
VECADDMOD Gas Cost: ceil(0.3 × N)
Cost Components:
Benchmark Validation: The dynamic cost model achieves target performance:
The gas cost formulas accurately reflect actual execution costs while maintaining ~50-55 mgas/s performance target across all tested cryptographic standards.
The implementation applies for many fields of interest for cryptography. In particular, the design applies for:
To illustrate the interest of the precompile, the assets provide the measured gas const for a single NTT and extrapolates the minimal gas cost taking into account the required number of NTT_FW and NTT_INV. The provided assets use pure Yul optimizations, with memory access hacks. It is unlikely that more than one order of magnitude could be spared on such a minimal code.
| Use case | Parameters | single NTT gas cost | Required NTT(FW/INV) | Estimated NTT/Full cost |
|---|---|---|---|---|
| Falcon | $q=12289, n=512$ | 1.8 M | 1 NTTFW+1 NTTINV | 3.6 M |
| Dilithium | $q=2^{23}-2^{13}+1, n=256$ | 460K | 4 NTTFW +1 NTTINV | 2.3M |
Falcon cost has been measured over a full implementation and is compliant to the estimation. Dilithium cost is evaluated assuming
This demonstrates that using pure solidity enables L2s with low gas fees to experiment with FALCON in the short term, whereas it is too expensive to do so on L1. Adopting this EIP, the signature verification of Falcon can be reduced to 1500 gas, and a similar result is expected for Dilithium. Adopting the hash function as a separate EIP would enable a gas verification cost of 2000 gas. This is in line with the ratio looking at SUPERCOP implementations.
Two implementations have been developed for op-geth with four precompiled contracts at addresses 0x12, 0x13, 0x14, and 0x15:
Pure Go Implementation (nocgo) - Default:
ceil(0.32 × N) gas, VECADDMOD: ceil(0.3 × N) gasliboqs Implementation (cgo) - High Performance:
Both implementations support:
Benchmark results on Intel(R) Xeon(R) CPU @ 2.20GHz (liboqs implementation):
BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-512-Gas=500-4
377011 9411 ns/op 500 gas/op 53.12 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-Falcon-1024-Gas=1080-4
176936 20419 ns/op 1080 gas/op 52.89 mgas/s
BenchmarkPrecompiledNTT_FW/NTT_FW-ML-DSA-256-Gas=256-4
740838 4785 ns/op 256 gas/op 53.49 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-512-Gas=500-4
372236 9374 ns/op 500 gas/op 53.33 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-Falcon-1024-Gas=1080-4
176300 20316 ns/op 1080 gas/op 53.15 mgas/s
BenchmarkPrecompiledNTT_INV/NTT_INV-ML-DSA-256-Gas=340-4
558224 6396 ns/op 340 gas/op 53.15 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-512-Gas=164-4
1237585 2919 ns/op 164 gas/op 56.17 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-Falcon-1024-Gas=328-4
592540 5999 ns/op 328 gas/op 54.67 mgas/s
BenchmarkPrecompiledNTT_VECMULMOD/VECMULMOD-ML-DSA-256-Gas=82-4
1726333 1957 ns/op 82 gas/op 41.89 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-512-Gas=154-4
1273167 2828 ns/op 154 gas/op 54.45 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-Falcon-1024-Gas=308-4
606345 5829 ns/op 308 gas/op 52.84 mgas/s
BenchmarkPrecompiledNTT_VECADDMOD/VECADDMOD-ML-DSA-256-Gas=77-4
2178388 1645 ns/op 77 gas/op 46.81 mgas/s
BenchmarkPrecompiledEcrecover/-Gas=3000-4
65388 57182 ns/op 3000 gas/op 52.46 mgas/s
The benchmarks demonstrate that all NTT precompiles maintain competitive or better throughput compared to existing precompiled contracts like ECRECOVER, processing approximately 52-56 million gas per second. Both implementations (nocgo and cgo) provide efficient support for multiple post-quantum cryptographic schemes including Falcon-512/1024 and ML-DSA/Dilithium, with the liboqs variant offering ~36-38% lower gas costs for NTT operations.
Integration testing demonstrates the practical impact of NTT precompiles on full signature verification. Tests were conducted using modified ZKNOX implementations (ETHFALCON and ETHDILITHIUM) against an op-geth client with NTT precompile support.
Test Environment: OP-Stack testnet with NTT precompiles enabled.
| Algorithm | Verification Method | Pure Solidity | With Precompile | Gas Saved |
|---|---|---|---|---|
| ETHFALCON (Falcon-512) | Direct | 1,800,000 | 479,341 | 1,320,659 (73.4%) |
| ETHDILITHIUM (ML-DSA) | PKContract-based | 9,746,410 | 7,618,412 | 2,127,998 |
| ETHDILITHIUM (ML-DSA) | Direct (struct) | 7,874,354 | 5,732,354 | 2,142,000 |
Key Findings:
These results validate that NTT precompiles significantly reduce gas costs for post-quantum signature verification, making lattice-based cryptography more practical for Ethereum applications.
There are no backward compatibility questions.
The reference implementation includes comprehensive test coverage across multiple layers:
Malformed Input Tests - 8 test cases covering:
Functional Tests:
INTT(NTT(x)) = xCrypto Standards Benchmarks:
Unified Malformed Input Tests - 7 test cases covering:
Functional Tests:
result[i] = (a[i] * b[i]) mod qresult[i] = (a[i] + b[i]) mod qCrypto Standards Benchmarks - Performance validation with:
Benchmark data are available in the EIP assets:
Pure Go Implementation (nocgo):
liboqs Implementation (cgo):
All implementations have been validated over a large base of reference vectors, and implementing both FALCON and DILITHIUM algorithms as demonstration of the usefulness of the precompile.
Needs discussion.
Copyright and related rights waived via CC0.