In this zkStudyClub session, Ivo presents techniques for applying log-derivative lookup tables in a circuit using a LegoSNARK-style commitment. As an application, we show how this lookup table can be used to implement range checks, specifically for non-native arithmetic. Using these optimisations, we were able to reduce the proving time for a BN254 pairing in Groth16 to approx. 5s (MBP M1). The technique also works for PLONKish arithmetisation.
zkStudyClub - Improving performance of non-native arithmetic in SNARKs (Ivo Kubjas, Consensys Gnark)
1. Log-derivative lookups for improving performance of non-native arithmetic in SNARKs
Ivo Kubjas
gnark
August 3, 2023
2. Motivation
- In pairing-based SNARKs we work in a pairing-friendly elliptic curve group.
- The arithmetic is defined on the scalars of the EC group.
- The computation (circuit) is defined as a relation between polynomials.
- Succinct verification: the verifier only receives commitments to some polynomials, asks for openings and checks the relation on the evaluations.
- Heavy prover: has to compute the relation → needs FFT/NTT for any reasonably sized circuit.
3. Motivation
- But curves which are good for SNARKs are not compatible with practical applications:
  - ECDSA over BN254, P-256/P-384
  - RSA signature scheme
  - BLS signatures
- We need non-native (to the scalar field) arithmetic!
[Figure: diagram with regions labelled "Useful fields" and "Fast fields for SNARKs", and the item "BLS sigs over 2-chains".]
4. Non-native arithmetic
- Chinese remainder theorem [1]: schoolbook multi-precision integer multiplication
- Casting out primes (nines) [2]: check against many small prime moduli
- Goblin Plonk: ZKSG a few weeks ago
- xjSNARK-style polynomial identity testing [3]

[1] https://hackmd.io/@arielg/B13JoihA8
[2] https://eprint.iacr.org/2022/1470
[3] https://akosba.github.io/papers/xjsnark.pdf
5. Representation
- Moduli of the native field $r$ and the non-native field $q$.
- Decompose a non-native element $a$ in base $2^B$:
  $$a = \sum_{i=0}^{N-1} a_i 2^{iB}, \quad \forall a_i \in [0, 2^B)$$
- If $2^B < r$, then the limbs $a_i$ fit into the native field.
[Figure: a native element holding one non-native element limb; limbs $a_0$, $a_1$, $a_2$.]
- Have to track whether possibly $a_i \geq 2^B$. Introduce overflow such that $a_i \in [0, 2^{B + \text{overflow}})$.
6. Arithmetic 101
- Arithmetic on integers; do not bother about modular reduction for now.
- Addition limb-wise: $a + b = \sum_{i=0}^{N-1} (a_i + b_i) 2^{iB}$. Set $\text{overflow} = \max(\text{overflow}_a, \text{overflow}_b) + 1$.
- It is going to be easy...
- Subtraction limb-wise: $a - b = \sum_{i=0}^{N-1} (a_i - b_i) 2^{iB}$. But what if $b_i > a_i$? 🤯
- Being in a field, we can add multiples of $q$: padding $s$ such that $s_i > b_i$ and $s = \alpha q$.
- Subtraction: $a + s - b$, which then never underflows.
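A small worked example of the padded subtraction may help. The parameters are toy values chosen here for illustration ($B = 4$, $N = 2$, $q = 251$, $\alpha = 1$) and do not come from the talk.

```latex
\begin{align*}
a &= 53 = 5 + 3 \cdot 2^4, & (a_0, a_1) &= (5, 3) \\
b &= 26 = 10 + 1 \cdot 2^4, & (b_0, b_1) &= (10, 1) \\
s &= 1 \cdot q = 251 = 11 + 15 \cdot 2^4, & (s_0, s_1) &= (11, 15) \\
a + s - b &: & (5 + 11 - 10,\; 3 + 15 - 1) &= (6, 17) \\
6 + 17 \cdot 2^4 &= 278 = 27 + 251 \equiv a - b \pmod{q}
\end{align*}
```

Without the padding, the first limb would be $5 - 10 < 0$. With it, every limb stays non-negative, at the price of a limb ($17 \geq 2^4$) that now overflows $B$ bits.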
7. Multiplication
- Naive integer multiplication (the product $c$ has $2N-1$ limbs):
  $$c = a \cdot b \iff c_\ell = \sum_{\substack{0 \le i,j \le N-1 \\ i+j=\ell}} a_i b_j, \quad \ell = 0, \dots, 2N-2$$
- Observe: this costs $O(N^2)$ native multiplications.
- xjSNARK observations:
  - for the integer $a = \sum_i a_i 2^{iB}$, associate the polynomial $a(X) = \sum_i a_i X^i$
  - can compute $c$ out-circuit (using advice/hint) and have to assert $a(X) \cdot b(X) = c(X)$
  - cannot do Schwartz-Zippel, but the degree of $c(X)$ is small enough to brute-force
    - constants!
- Got $O(N)$ multiplication complexity (T&C apply)
- Overflow of the result limbs is bounded by $B + \text{overflow}_a + \text{overflow}_b + \log_2(2N-1)$.
- I glossed over the fact that we need to range-check $c_\ell$ from the hint.
8. Modular reduction
- Can amortize multiplications before we have to mod-reduce
- But in practice not useful, as the limb count grows exponentially and the overflows become large ⇒ range checks become very difficult
- $a \equiv b \pmod q \iff \exists \alpha : a - b = \alpha q$ (NB! integer assertion)
- Could try comparing limb-wise, but $a - b$ and $\alpha q$ may have different overflows
- To carry the excess, need to partition the limbs at a common split ⇒ need to range check the carries to ensure partition correctness.
[Figure: limbs $a_0$, $a_1$, $a_2$ above limbs $b_0$, $b_1$, $b_2$, with the excess $e_0$, $e_1$ marked by − and + signs between neighbouring limbs.]
- For the equality check of $a$ and $b$, consider them as polynomials $a(X)$, $b(X)$ and a polynomial $e(X)$ made from the excess:
  $$a(X) = b(X) + (2^B - X) e(X)$$
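As a toy instance of this equality check (values chosen here for illustration, with $B = 4$ and two limbs), take two different limb vectors representing the same integer 278:

```latex
\begin{align*}
a(X) &= 22 + 16X, \qquad b(X) = 6 + 17X, \qquad a(2^4) = b(2^4) = 278 \\
a(X) - b(X) &= 16 - X = (2^4 - X) \cdot 1 \quad\Rightarrow\quad e(X) = e_0 = 1
\end{align*}
```

The single carry $e_0 = 1$ moves the excess $2^4$ out of the lowest limb of $a$ and into the next one. Since $(2^B - X)$ vanishes at $X = 2^B$, the polynomial identity implies the integer equality.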
9. Mulmod
- Combining multiplication with modular reduction, we get:
  $$a(X) b(X) \equiv c(X) + \alpha(X) q(X) + (2^B - X) e(X) \pmod r$$
- Good in R1CS (polynomial evaluation at a constant)
- Less good in PLONK
- Some badness can be averted using caching
10. Done?
- Multiplication complexity small-ish ($O(N)$ with small constants)
- But have to range check: modular residual $c$, coefficient $\alpha$ and carries $e$
- Naive range check adds 1/2 constraint per bit ($O(B)$ with the same small constants):
  $$(1 - x_i) \cdot x_i = 0 \quad \text{and} \quad \sum_i x_i 2^i = x$$
- $B$ is up to 64 times larger than $N$
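The naive bit-decomposition check can be mimicked in plain Go to see why its cost is linear in the bit width. The function name and the "assertions" counter are illustrative assumptions; the counter tallies one booleanity assertion per bit plus one recomposition assertion, and is not an exact constraint count for any proof system.

```go
package main

import "fmt"

// naiveRangeCheck mimics a bit-decomposition range check for bits <= 64:
// it takes the low `bits` bits of x as the hinted decomposition, asserts
// each bit is boolean and asserts that the bits recompose to x.
// It returns whether the check passes and how many assertions were made.
func naiveRangeCheck(x uint64, bits int) (ok bool, assertions int) {
	ok = true
	var recomposed uint64
	for i := 0; i < bits; i++ {
		xi := (x >> uint(i)) & 1 // hinted bit
		if (1-xi)*xi != 0 {      // booleanity: (1 - x_i) * x_i = 0
			ok = false
		}
		assertions++
		recomposed += xi << uint(i)
	}
	if recomposed != x { // recomposition: sum_i x_i 2^i = x
		ok = false
	}
	assertions++
	return ok, assertions
}

func main() {
	ok, n := naiveRangeCheck(200, 8)
	fmt.Println(ok, n)
	ok, n = naiveRangeCheck(256, 8)
	fmt.Println(ok, n)
}
```

A value that does not fit fails at the recomposition step, since the hinted bits can only represent numbers below $2^{\text{bits}}$.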
11. Range checks
- UltraPLONK (custom gates + plookup): couldn't figure out how to do it nicely, also in Groth16.
- Waksman permutation network: too small a saving.
- Multiset equality using the logarithmic derivative argument? [4]
  $$\sum_{i} \frac{k_i}{X - f_i} = \sum_{j} \frac{1}{X - s_j}$$

[4] https://ia.cr/2022/1530
12. Fiat-Shamir challenge in-circuit
- We would need a succinct verifier challenge depending on $f_i$, $k_i$ and $s_j$.
- In-circuit hashing doesn't work: too expensive for the prover.
- Out-circuit challenge computation doesn't work: too expensive for the verifier, and privacy loss.
- LegoSNARK commitment?
13. Trick to efficiency: use part of the proof as a commitment.
14. Commitment as in-circuit challenge
- Pedersen vector commitment with the proving key as a basis
- For binding, the basis has to be linearly independent ⇒ a basis with relations known to the prover would lead to multiple valid witnesses.
- If the prover can predict the commitment value for a random basis, then it can break discrete log.
- Hash the commitment with domain separation to the native field and use it as a public witness.
- For PLONK, we use a custom gate to mark committed variables and use its polynomial commitment as a public witness. [5]

[5] https://ia.cr/2022/1072
15. Using randomness in circuit
- Unified circuits for PLONK and R1CS.
- Multiple commitments: $\tau_i = H(i, \tau)$
- Tables by compressing entries and lookups: $f(\tau) = \sum_i f_i \tau^i$
- Boolean function pre-computation: $\text{Lookup}(x \| y \| \text{XOR}(x, y))$
- Non-native mulmod check:
  $$a(\tau) b(\tau) \equiv c(\tau) + \alpha(\tau) q(\tau) + (2^B - \tau) e(\tau)$$
18. Technical considerations - range check table size
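import (
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/std/rangecheck"
)

func (c *Circuit) Define(api frontend.API) error {
	rchecker := rangecheck.New(api)
	rchecker.Check(c.Witness, 16)  // assert c.Witness fits into 16 bits
	rchecker.Check(c.Witness2, 16) // assert c.Witness2 fits into 16 bits
	// builds a lookup table of size 2^16
	return nil
}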
- Estimate the optimal table size for the number of inputs and bits checked
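One way such an estimate could work is sketched below. The cost model is an assumption made for illustration and is not taken from gnark: each checked value of `bits` bits is split into `ceil(bits/w)` limbs of `w` bits with one lookup per limb, and the table contributes `2^w` entries. The function names are hypothetical as well.

```go
package main

import "fmt"

// cost estimates the work of range checking values with the given bit
// widths using a lookup table of 2^w entries: one lookup per w-bit limb
// plus one unit per table entry. Illustrative cost model only.
func cost(w int, widths []int) int {
	total := 1 << uint(w)
	for _, bits := range widths {
		total += (bits + w - 1) / w // ceil(bits / w) lookups
	}
	return total
}

// optimalWidth returns the table width in [1, maxW] that minimises cost,
// preferring the smaller width on ties.
func optimalWidth(widths []int, maxW int) int {
	best, bestCost := 1, cost(1, widths)
	for w := 2; w <= maxW; w++ {
		if c := cost(w, widths); c < bestCost {
			best, bestCost = w, c
		}
	}
	return best
}

func main() {
	// Two 16-bit checks, as in the snippet above.
	fmt.Println(optimalWidth([]int{16, 16}, 16))

	// Many 64-bit checks favour a much larger table.
	many := make([]int, 100000)
	for i := range many {
		many[i] = 64
	}
	fmt.Println(optimalWidth(many, 20))
}
```

Under this model a full $2^{16}$ table is wasteful for only two checks, while for many checks the table cost is amortised and wider limbs win.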
19. Benchmarks
- Counting constraints is not very descriptive (proof systems, precomputation)
- Time, CPU usage and memory are better
- https://www.zk-bench.org/circuit
- https://zka.lc/
- https://zprize.io
- Benchmarks on MBP M1 over BN254 (solve + prove)
Operation               Groth16 time (constraints)   PLONK time (constraints)
ECDSA secp256k1/P256    1.29s (284767)               18.9s (1136131)
ECDSA P384              2.75s (598706)               127.9s (2334733)
BN254 pairing           7.07s (1895732)              (7458801)
BLS12-381 pairing       10.90s (2546974)             (10077257)