Introduce a new opcode, CLZ(x)
, which pops x
from the stack and pushes the number of leading zero bits in x
to the stack. If x
is zero, pushes 256.
Count leading zeros (CLZ) is a native opcode in many processor architectures (even in RISC architectures like ARM).
It is a basic building block used in math operations, byte operations, compression algorithms, data structures:
Adding a CLZ
opcode will:
shr
, which are very expensive to prove. In SP1 rv32im, a 32-bit right shift costs 1.6x more than a 32-bit mul.A new opcode is introduced: CLZ
(0x1e
).
/// @dev Count leading zeros.
/// Returns the number of zeros preceding the most significant one bit.
/// If `x` is zero, returns 256.
/// This is the fastest known `CLZ` implementation in Solidity and uses about 184 gas.
function clz(uint256 x) internal pure returns (uint256 r) {
/// @solidity memory-safe-assembly
assembly {
r := shl(7, lt(0xffffffffffffffffffffffffffffffff, x))
r := or(r, shl(6, lt(0xffffffffffffffff, shr(r, x))))
r := or(r, shl(5, lt(0xffffffff, shr(r, x))))
r := or(r, shl(4, lt(0xffff, shr(r, x))))
r := or(r, shl(3, lt(0xff, shr(r, x))))
// forgefmt: disable-next-item
r := add(xor(r, byte(and(0x1f, shr(shr(r, x), 0x8421084210842108cc6318c6db6d54be)),
0xf8f9f9faf9fdfafbf9fdfcfdfafbfcfef9fafdfafcfcfbfefafafcfbffffffff)), iszero(x))
}
}
Or in Python,
def clz(x):
"""Returns the number of zeros preceding the most significant one bit."""
if x < 0:
raise ValueError("clz is undefined for negative numbers")
if x > 2**256 - 1:
raise ValueError("clz is undefined for numbers larger than 2**256 - 1")
if x == 0:
return 256
# Convert to binary string and remove any '0b' prefix.
bin_str = bin(x).replace('0b', '')
return 256 - len(bin_str)
Or in C++,
inline uint32_t clz(uint32_t x) {
uint32_t r = 0;
if (!(x & 0xFFFF0000)) { r += 16; x <<= 16; }
if (!(x & 0xFF000000)) { r += 8; x <<= 8; }
if (!(x & 0xF0000000)) { r += 4; x <<= 4; }
if (!(x & 0xC0000000)) { r += 2; x <<= 2; }
if (!(x & 0x80000000)) { r += 1; }
return r;
}
// `x` is a uint256 bit number represented with 8 uint32 limbs.
// This implementation is optimized for SP1 proving via rv32im.
// For regular compute, it performs similarly to `ADD`.
inline uint32_t clz(uint32_t x[8]) {
if (x[7] != 0) return clz(x[7]);
if (x[6] != 0) return 32 + clz(x[6]);
if (x[5] != 0) return 64 + clz(x[5]);
if (x[4] != 0) return 96 + clz(x[4]);
if (x[3] != 0) return 128 + clz(x[3]);
if (x[2] != 0) return 160 + clz(x[2]);
if (x[1] != 0) return 192 + clz(x[1]);
if (x[0] != 0) return 224 + clz(x[0]);
return 256;
}
The cost of the opcode is 3, the same as ADD
.
256 is the smallest number after 255. Returning a small number allows the result to be compared with minimal additional bytecode.
For byte scanning operations, one can get the number of bytes to be skipped for a zero word by simply computing 256 >> 3
, which gives 32.
CTZ
(count trailing zeros)Computing the least significant bit can be easily implemented with CLZ
by isolating the smallest bit via x & -x
.
However, it is not possible to implement CLZ
with CTZ
.
We have benchmarked the CLZ
implementation against the ADD
implementation in the intx library. CLZ
uses approximately the same amount of compute cycles as ADD
.
The SP1 rv32im optimized variant uses less compute cycles than ADD
, in the average and worst cases.
In SP1 rv32im, a 256-bit CLZ
is cheaper to prove than ADD
.
This is a new opcode not present prior.
PUSH32 0x000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000100
PUSH32 0x8000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000000
PUSH32 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000000
PUSH32 0x4000000000000000000000000000000000000000000000000000000000000000
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000001
PUSH32 0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
CLZ
---
0x0000000000000000000000000000000000000000000000000000000000000001
PUSH32 0x0000000000000000000000000000000000000000000000000000000000000001
CLZ
---
0x00000000000000000000000000000000000000000000000000000000000000ff
CLZ
is a stateless opcode that has a low worst-case constant cost in memory usage, compute and proving costs. It is therefore safe from being exploited for denial of service attacks.
Copyright and related rights waived via CC0.