zkStudyClub - Improving performance of non-native arithmetic in SNARKs (Ivo Kubjas, Consensys Gnark)

Log-derivative lookups for improving performance
of non-native arithmetic in SNARKs
Ivo Kubjas
gnark
August 3, 2023

Motivation
- In pairing-based SNARKs we work in a pairing-friendly elliptic curve group.
- The arithmetic is defined on the scalars of the EC group.
- The computation (circuit) is defined as a relation between polynomials.
- Succinct verification: the verifier only receives commitments to some polynomials, asks for openings and checks the relation on the evaluations.
- Heavy prover: has to compute the relation → needs FFT/NTT for any reasonably sized circuit.

Motivation
- But curves that are good for SNARKs are not compatible with practical applications:
  - ECDSA over BN254, P-256/P-384
  - RSA signature scheme
  - BLS signatures
- We need non-native (to the scalar field) arithmetic!
[Figure: diagram with the regions "Useful fields" and "Fast fields for SNARKs" and the label "BLS sigs over 2-chains"]

Non-native arithmetic
- Chinese remainder theorem [1] - schoolbook multi-precision integer multiplication
- Casting out primes (nines) [2] - check against many small prime moduli
- Goblin Plonk - ZKSG a few weeks ago
- xjSNARK-style polynomial identity testing [3]
[1] https://hackmd.io/@arielg/B13JoihA8
[2] https://eprint.iacr.org/2022/1470
[3] https://akosba.github.io/papers/xjsnark.pdf

Representation
- Moduli: r for the native field and q for the non-native field.
- Decompose a non-native element a in base $2^B$:
  $$a = \sum_{i=0}^{N-1} a_i 2^{iB}, \quad \forall a_i \in [0, 2^B)$$
- If $2^B < r$, then the limbs $a_i$ fit into the native field.
[Figure: a non-native element split into limbs a0, a1, a2, one limb per native element]
- Have to track whether possibly $a_i \geq 2^B$. Introduce an overflow such that $a_i \in [0, 2^{B+\text{overflow}})$.

Arithmetic 101
- Arithmetic on integers; do not bother about modular reduction for now.
- Addition limbwise: $a + b = \sum_{i=0}^{N-1} (a_i + b_i) 2^{iB}$. Set $\text{overflow} = \max(\text{overflow}_a, \text{overflow}_b) + 1$.
- It is going to be easy...
- Subtraction limbwise: $a - b = \sum_{i=0}^{N-1} (a_i - b_i) 2^{iB}$. But what if $b_i > a_i$? 🤯
- Being in a field, we can add multiples of q: a padding s such that $s_i > b_i$ and $s = \alpha q$.
- Subtraction: $a + s - b$, which then never underflows.

Multiplication
- Naive integer multiplication:
  $$c = a \cdot b \Leftrightarrow c_\ell = \sum_{\substack{i,j=0 \\ i+j=\ell}}^{N-1} a_i b_j, \quad \ell = 0, \dots, 2N-2$$
- Observe: native multiplication complexity is $O(N^2)$.
- xjSNARK observations:
  - for an integer $a = \sum a_i 2^{iB}$, associate the polynomial $a(X) = \sum a_i X^i$
  - can compute c out-of-circuit (using advice/hint) and then have to assert $a(X) \cdot b(X) = c(X)$
  - cannot do Schwartz-Zippel, but the degree of c(X) is small enough to brute-force
  - constants!
- Got O(N) multiplication complexity (T&C apply).
- Overflow of the result limbs is bounded by $B + \text{overflow}_a + \text{overflow}_b + b + \log_2(2N - 1)$.
- I skipped over the fact that we need to range check the $c_\ell$ coming from the hint.

Modular reduction
- Can amortize the reduction over several multiplications before we have to mod-reduce.
- But in practice not useful, as the limb count grows exponentially and the overflows get large ⇒ range checks become very difficult.
- $a \equiv b \pmod q \Leftrightarrow \exists \alpha : a - b = \alpha q$ (NB! integer assertion)
- Could try comparing limb-wise, but $a - b$ and $\alpha q$ may have different overflows.
- To carry the excess, need to partition the limbs at a common split ⇒ need to range check the carries to ensure partition correctness.
[Figure: limbs a0, a1, a2 above limbs b0, b1, b2, with the excess values e0, e1 carried between neighbouring limbs]
- For the equality check of a and b, consider them as polynomials a(X), b(X), together with the polynomial e(X) made from the excess:
  $$a(X) = b(X) + (2^B - X) e(X)$$

Mulmod
- Combining multiplication and modular reduction, we get:
  $$a(X) b(X) \equiv c(X) + \alpha(X) q(X) + (2^B - X) e(X) \pmod r$$
- Good in R1CS (polynomial evaluation at a constant)
- Less good in PLONK
- Some badness can be averted using caching

Done?
- Multiplication complexity small-ish (O(N) with small constants)
- But have to range check the modular residue c, the coefficient α and the carries e
- Naive range check adds 1/2 constraint per bit (O(B) with same small constants):
  $$(1 - x_i) \cdot x_i = 0 \quad \text{and} \quad \sum_i x_i 2^i = x$$
- B is ≤ 64 times larger than N
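The two naive constraints can be mirrored directly in ordinary Go. This is only a sketch on machine integers with hypothetical helper names; in a circuit the bits would be supplied by a hint and both conditions would be constraints over the native field.

```go
package main

// Bits returns the n low bits of x, least significant first.
// It assumes 0 <= n <= 62.
func Bits(x int64, n int) []int64 {
	bits := make([]int64, n)
	for i := range bits {
		bits[i] = (x >> uint(i)) & 1
	}
	return bits
}

// RangeCheck reports whether the given bits certify 0 <= x < 2^len(bits):
// every x_i must satisfy (1 - x_i) * x_i = 0 and sum(x_i * 2^i) must equal x.
// It assumes len(bits) <= 62 so the sum fits into an int64.
func RangeCheck(x int64, bits []int64) bool {
	var sum int64
	for i, b := range bits {
		if (1-b)*b != 0 { // booleanity of a single bit
			return false
		}
		sum += b << uint(i)
	}
	return sum == x // recomposition
}
```

For a value that is too wide, the low bits recompose to a different number and the check fails: the low 8 bits of 300 give 44.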

Range checks
- UltraPLONK (custom gates + plookup) - couldn't figure out how to do it nicely, also in Groth16.
- Waksman permutation network - too small a saving.
- Multiset equality using the logarithmic derivative argument? [4]
  $$\sum_{f_i} \frac{k_i}{X - f_i} = \sum_{s_j} \frac{1}{X - s_j}$$
[4] https://ia.cr/2022/1530

Fiat-Shamir challenge in-circuit
- We would need a succinct verifier challenge depending on $f_i$, $k_i$ and $s_j$.
- In-circuit hashing doesn't work: too expensive for the prover.
- Out-of-circuit challenge computation doesn't work: too expensive for the verifier, and privacy is lost.
- LegoSNARK commitment?
- The trick for efficiency - use part of the proof as a commitment.

Commitment as in-circuit challenge
- Pedersen vector commitment with the proving key as a basis
- For binding, the basis has to be linearly independent ⇒ a basis with relations known to the prover would lead to multiple valid witnesses.
- If the prover can predict the commitment value for a random basis, then it can break discrete log.
- Hash the commitment, with domain separation, to the native field and use it as a public witness.
- For PLONK, we use a custom gate to mark committed variables and use its polynomial commitment as a public witness. [5]
[5] https://ia.cr/2022/1072

Using randomness in circuit
- Unified circuits for PLONK and R1CS.
- Multiple commitment: $\tau_i = H(i, \tau)$
- Tables by compressing entries and lookups: $f(\tau) = \sum_i f_i \tau^i$
- Boolean function pre-computation: Lookup(x||y||XOR(x, y))
- Non-native mulmod check:
  $$a(\tau) b(\tau) \equiv c(\tau) + \alpha(\tau) q(\tau) + (2^B - \tau) e(\tau)$$

Technical consideration - non-native soundness
func (c *Circuit) Define(api frontend.API) error {
    nna := emulated.New[emulated.Secp256k1](api)
    // every input and every result is range checked by hand
    nna.Rangecheck(c.Witness)
    nna.Rangecheck(c.Input)
    res := nna.Mul(c.Witness, c.Input)
    nna.Rangecheck(res)
    // ...
}
Better
func (c *Circuit) Define(api frontend.API) error {
    nna := emulated.New[emulated.Secp256k1](api)
    // no explicit range checks in user code
    res := nna.Mul(c.Witness, c.Input)
    // ...
}

Technical considerations - lazy finalization
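func (c *Circuit) Define(api frontend.API) error {
    rchecker := rangecheck.New(api)
    rchecker.Check(c.Witness, 16)
    // ..
    rchecker.Finalize() // explicit finalization
    return nil
}
Better
func (c *Circuit) Define(api frontend.API) error {
    rchecker := rangecheck.New(api)
    rchecker.Check(c.Witness, 16)
    // ..
    return nil // automatically finalized
}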

Technical considerations - range check table size
    rchecker.Check(c.Witness2, 16)
    // built table of size 2^16
}
- Estimate the optimal table size for the number of inputs and bits checked
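How such an estimate could work is sketched below with a toy cost model. The model is an assumption for illustration and is not taken from gnark: a table of k-bit entries costs $2^k$, and a w-bit check is split into ⌈w/k⌉ lookups that cost one unit each. `Cost` and `BestTableBits` are hypothetical names.

```go
package main

// Cost is a toy cost model (an assumption, not gnark's actual estimator):
// 2^k table entries plus ceil(w/k) lookups for every checked width w.
// It assumes k >= 1.
func Cost(widths []int, k int) int {
	cost := 1 << uint(k)
	for _, w := range widths {
		cost += (w + k - 1) / k
	}
	return cost
}

// BestTableBits returns the k in [1, maxBits] with the lowest Cost,
// preferring the smaller k on ties.
func BestTableBits(widths []int, maxBits int) int {
	best := 1
	for k := 2; k <= maxBits; k++ {
		if Cost(widths, k) < Cost(widths, best) {
			best = k
		}
	}
	return best
}
```

Under this model a single 16-bit check is far cheaper with a tiny table than with a $2^{16}$ one, while a thousand 64-bit checks favour a table of about $2^{10}$ entries.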

Benchmarks
- Counting constraints is not very descriptive (proof systems, precomputation)
- Time, CPU usage and memory are better
- https://www.zk-bench.org/circuit
- https://zka.lc/
- https://zprize.io
- Benchmarks on MBP M1 over BN254 (solve + prove), constraint counts in parentheses:

Operation              Groth16            PLONK
ECDSA secp256k1/P256   1.29s (284767)     18.9s (1136131)
ECDSA P384             2.75s (598706)     127.9s (2334733)
BN254 pairing          7.07s (1895732)    - (7458801)
BLS12-381 pairing      10.90s (2546974)   - (10077257)
