A 90-minute online presentation for zkStudyClub, delivered 2020-06-01. I present a new idea with a demonstrated 5% speed-up for multi-scalar multiplication. When combined with precomputation, this method could yield upwards of 20% speed-up.
1. Multi-scalar multiplication: state of the art and new ideas
presented at zkStudyClub
Gus Gutoski
zkTeam, ConsenSys R&D
June 1, 2020
Gus Gutoski (zkTeam, ConsenSys R&D) MSM: SotA and new ideas June 1, 2020 1 / 57
2. Introduction
The multi-scalar multiplication problem (MSM)
Also known as. Multi-exponentiation, multi-exp.
Parameters. A cyclic group G whose order |G| has bit length b.
(Example. BLS or BN elliptic curves have |G| ≈ 2^256, so b = 256.)
Input. Group elements G1, . . . , Gn in G called inputs.
Integers a1, . . . , an between 0 and |G| called scalars.
Output. The group element a1G1 + · · · + anGn called the output.
Goal. Minimize the number of group (+) operations as a function of n.
Naive solution. Use double-and-add to compute each ai Gi , then add them all up.
Expected group ops: 1.5bn ≈ 384n.
Can we do better? (<sarcasm> No. Let’s all go home. </sarcasm>)
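To make the baseline concrete, here is a minimal Python model of the naive solution (an illustrative sketch only: plain integers under addition stand in for the group G, and the names `double_and_add`, `naive_msm`, and the op-counting convention are mine, not gnark's):

```python
def double_and_add(a, G, counter):
    """Compute a*G by left-to-right double-and-add, counting group (+) ops.
    A doubling and an addition each count as one op."""
    acc = 0                      # group identity
    for bit in bin(a)[2:]:
        acc = acc + acc          # double
        counter[0] += 1
        if bit == "1":
            acc = acc + G        # add
            counter[0] += 1
    return acc

def naive_msm(scalars, points):
    """Naive MSM: compute each a_i * G_i separately, then add them all up."""
    counter = [0]
    total = 0
    for a, G in zip(scalars, points):
        total += double_and_add(a, G, counter)
        counter[0] += 1          # add into the running total
    return total, counter[0]
```

For b-bit scalars of average Hamming weight b/2 this counts roughly 1.5b ops per scalar, matching the 384n figure for b = 256.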
4. Motivation: zero-knowledge provers
Example: Groth16 protocol (gross simplification)
Let n denote the size of the secret inputs x accepted by program P.
The proving key for program P contains (among other things) n group elements
G1, . . . , Gn.
Given a size-n secret input x for program P, the prover deduces integers a1, . . . , an and
computes G := a1G1 + · · · + anGn. The proof contains G (among other things).
5. Motivation: zero-knowledge provers
Are you motivated yet?
Program P takes size-n secret input =⇒ a ZKP prover must do a MSM of n points.
Example. Zcash spend program: n ≈ 4 × 10^4.
Example. Rollup (a scalability solution): the bigger the n, the better.
Goal. Programs with n = 10^7, 10^8, or more.
MSM accounts for 80% of prover work.
Justin Drake: “Focus on multi-exponentiation, forget about FFTs.” From the Zero Knowledge podcast, 2020-03-11.
Takeaway
Multi-scalar multiplication (MSM) dominates prover costs. Prover costs dominate ZKP costs.
Improvements for MSM immediately yield improvements in ZKP efficiency.
6. Overview
State of the art: the bucket method
zkTeam’s implementation in gnark: [GitHub]
BLS or BN curves (b ≈ 256).
Number of group (+) ops scales like
16n + (constant)
Compare: naive method scales like 384n. That’s a 24× improvement!
7. Overview
Overview of the bucket method
Succinct description in the paper 2012/549. (Section 4, “Overlap in the Pippenger
approach”.)
High-level strategy:
1. Reduce one b-bit MSM to several c-bit MSMs for manageable c ≤ b.
2. [Interesting part.] Use tricks to solve the c-bit MSMs. (See next section.)
3. Combine c-bit MSMs into the final b-bit MSM.
8. Overview
Step 1. Reduce b-bit MSM to c-bit MSM (1)
Choose c ≤ b. Write each scalar a1, . . . , an in binary. Partition binary scalars into c-bit parts.
Example. Given b = 12, choose c = 3. Each 12-bit scalar a is partitioned into 3-bit parts.
Given the scalar a = 1368 we write a = (2, 5, 3, 0):
1368 in binary : 010 | 101 | 011 | 000
part values    :  2  |  5  |  3  |  0
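This partition step is just bit slicing; a toy Python sketch (the function name is mine):

```python
def partition(a, b, c):
    """Split a b-bit scalar into b/c parts of c bits each,
    most significant part first (as displayed above)."""
    assert b % c == 0
    mask = (1 << c) - 1
    parts = [(a >> (c * i)) & mask for i in range(b // c)]  # LSB part first
    return parts[::-1]                                      # MSB part first
```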
9. Overview
Step 1. Reduce b-bit MSM to c-bit MSM (2)
Deduce b/c instances of c-bit MSM from the partitioned scalars.
Example, continued. (b, c, b/c) = (12, 3, 4). Partition each scalar ai = (ai,1, ai,2, ai,3, ai,4).
The 4 c-bit MSM instances T1, . . . , T4 are given by
T1 := a1,1G1 + · · · + an,1Gn
...
T4 := a1,4G1 + · · · + an,4Gn
10. Overview
Step 3. Combine c-bit MSMs into the final b-bit MSM
The usual way: double c times then add.
Example, continued. (b, c, b/c) = (12, 3, 4). Combine T1, . . . , T4 into the final answer T:
1. T ← T1
2. For j = 2, . . . , 4:
   2.1 T ← 2^c T (double c times)
   2.2 T ← T + Tj
Final answer: T = a1G1 + · · · + anGn.
Computation cost in group (+) ops:
(b/c − 1) × (c + 1) = b − c + b/c − 1    (1)
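In code the combine step is a short loop; a toy sketch over integers, where doubling is just adding an element to itself (the name `combine` is mine):

```python
def combine(windows, c):
    """Combine window results T_1, ..., T_{b/c} (most significant first)
    into the final answer: double c times, then add the next window."""
    T = windows[0]
    for Tj in windows[1:]:
        for _ in range(c):
            T = T + T            # double c times: c group (+) ops
        T = T + Tj               # one more group (+) op
    return T
```

With b/c windows this loop performs (b/c − 1)(c + 1) group ops, as in Eq. (1).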
11. Overview
Step 2. Use tricks to solve the c-bit MSMs
Ready for tricks?
12. Core
Each input goes into a bucket
6G1 + 2G2 + 6G3 + 1G4 + 7G5 + 2G6 + 3G7 + · · ·
2^c − 1 buckets: 1 2 3 4 5 6 7
bucket sums: S1 S2 S3 S4 S5 S6 S7
Each input is added into the bucket labelled by its scalar: G1 (scalar 6) goes into bucket 6, G2 (scalar 2) into bucket 2, G3 into bucket 6, G4 into bucket 1, G5 into bucket 7, and so on.
17. Core
Sum the contents of each bucket
2^c − 1 buckets: 1 2 3 4 5 6 7
bucket sums: S1 S2 S3 S4 S5 S6 S7
S1 ← G4 + G9 + G14    S3 ← G7 + G13    S5 ← G8    S7 ← G5 + G12
S2 ← G2 + G6          S4 ← G10         S6 ← G1 + G3 + G11
Expected cost to compute S1, . . . , S7 in group (+) ops:
n − (2^c − 1) = n − 2^c + 1    (2)
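A sketch of this bucket-sum step in Python (integers stand in for points; `bucket_sums` is my name for it):

```python
def bucket_sums(digits, points, c):
    """Add each point into the bucket labelled by its c-bit digit and
    return the bucket sums S[1..2^c - 1]; digit 0 contributes nothing."""
    S = [0] * (1 << c)           # S[m] is the sum for bucket m; S[0] unused
    for a, G in zip(digits, points):
        if a:
            S[a] = S[a] + G      # one group (+) op per nonzero digit
    return S
```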
18. Core
Combine the bucket sums to get the answer
Desired output a1G1 + · · · + anGn equals
S1 + 2S2 + 3S3 + · · · + 7S7
This is not obvious, but easy to check: an input with scalar m lands in bucket m, so it contributes m copies of itself to the term mSm.
This is another instance of MSM with inputs S1, . . . , S7 and scalars 1, . . . , 7.
The number of inputs, 2^c − 1, is fixed.
The scalars 1, . . . , 2^c − 1 are known in advance.
19. Core
A fast way to combine the bucket sums
The desired sum
S1 + 2S2 + 3S3 + · · · + 7S7
is computed via
S7
+ (S7 + S6)
+ (S7 + S6 + S5)
...
+ (S7 + S6 + S5 + S4 + S3 + S2 + S1)
Computation cost in group (+) ops:
2 × (2^c − 2) + 1 = 2^{c+1} − 3    (3)
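The same running-sum trick in Python (a sketch; `accumulate` is my name, and each `+` on the right-hand side below is one group (+) op):

```python
def accumulate(S):
    """Compute S[1] + 2*S[2] + ... + m*S[m] using the running-sum trick:
    keep a suffix sum S[m] + ... + S[j] and add it into a grand total.
    Each bucket label j contributes to exactly j of the suffix sums."""
    running = 0                  # S[m] + S[m-1] + ... + S[j]
    total = 0                    # sum of all suffix sums so far
    for Sj in reversed(S[1:]):   # j = m, m-1, ..., 1
        running = running + Sj
        total = total + running
    return total
```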
20. Efficiency
Total cost for the bucket method: theory
Expected cost in group (+) ops (from Eqs. (1), (2), (3)):
(b/c)(n + 2^c − 2) + b − c + b/c − 1 ≈ (b/c)(n + 2^c)    (4)
Minimum occurs at c ≈ log n. At first glance, asymptotic scaling looks like O(bn/log n).
Beware! We must have c ≤ b, so we cannot choose c ≈ log n when n > 2^b.
For n > 2^b scaling reverts to O(n).
Example (b = 1). n/log n scaling is impossible; O(n) is the best we can do.
Example (b = 256). n can never reach 2^256, so n/log n scaling is achievable.
21. Efficiency
Total cost for the bucket method: practice
Eq. (4): (b/c)(n + 2^c)
A large instance is n = 10^7 (so log n ≈ 23).
For gnark with b = 256 we observed peak performance at c = 16. This yields the cost claimed on slide 6: 16n + 2^20.
Puzzle: Why c = 16 instead of c = log n? Other concerns:
Memory use scales with 2^c. Eventually, memory is a bottleneck.
Fewer edge cases if c divides b.
Example. 256-bit scalars are stored in four 64-bit limbs. It’s annoying if a c-bit part straddles two limbs.
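Putting the three steps together gives a compact model of the whole bucket method (a toy over integers, for checking the logic rather than measuring speed; all names are mine):

```python
def bucket_msm(scalars, points, b, c):
    """Bucket-method MSM over a toy additive group (plain integers):
    for each c-bit window (most significant first), fill buckets,
    accumulate them with the running-sum trick, then shift and add."""
    assert b % c == 0
    mask = (1 << c) - 1
    T = 0
    for w in reversed(range(b // c)):        # most significant window first
        S = [0] * (1 << c)                   # buckets for this window
        for a, G in zip(scalars, points):
            digit = (a >> (c * w)) & mask
            if digit:
                S[digit] += G
        running = total = 0                  # running-sum accumulation
        for Sj in reversed(S[1:]):
            running += Sj
            total += running
        T = (T << c) + total                 # double c times, then add
    return T
```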
22. Improvements
Ideas to improve upon the bucket method
1. Parallelism: yes
2. Precomputation: not really, unless combined with item 4
3. Low Hamming-weight representations: no
4. [New!] [Elliptic curves only] Signed digits, and generalization: yes
23. Parallelism
Parallelism for the bucket method
Is the bucket method faster on multiple cores? Yes!
Natural boundary for parallel computation: each c-bit MSM is independent of the rest.
scalars : b-bit decimal : c-bit binary parts
a1 : 1368 : 010 | 101 | 011 | 000
a2 : 819  : 001 | 100 | 110 | 011
...
an : 2709 : 101 | 010 | 010 | 101
b/c cores :   1     2     3     4   (one core per column)
Easily make full use of up to b/c cores.
Example: (b, c) = (256, 16) =⇒ easy to use up to 256/16 = 16 cores.
Increased memory use: each core needs its own 2^c buckets.
24. Parallelism
Even more parallelism for the bucket method?
Sometimes we have more than b/c cores available. Can we use all of them? Yes, but. . .
Another natural boundary: partition the inputs
Inefficiency. Two MSM instances of size n/2 cost more group (+) ops than one MSM instance of size n.
gnark does not do this; parallelism is limited to b/c cores.
25. Precomputation
Precomputation for the bucket method
Inputs G1, . . . , Gn are known in advance. Can we use this to our advantage? Sort of.
Idea. Precompute a bunch of points and store them. Examples:
For each input G: 2G, 3G, . . . , (2^c − 1)G.
For each input G: 2^k G, 2^{2k} G, . . . , 2^{mk} G for some k, m.
Various subsets of inputs: (G1 + G2 + G3), (G1 + G2), (G1 + G3), (G2 + G3).
Goal. A smooth trade-off between precomputed storage and run time.
Problem. Large MSM instances already use most available memory.
Example. For n = 10^8, gnark needs 58GB to store enough BLS12-377 curve points to produce a ZKP for a program with size-n secret input.
Perhaps we could store extra points on disk. But disk reads might be too slow.
Experimentation needed.
26. Precomputation
Naive precomputation for the bucket method
For each input G precompute 2G, 3G, . . . , (2^c − 1)G.
Recall. The goal of c-bit MSM is to compute a1G1 + · · · + anGn for c-bit scalars
a1, . . . , an.
If the points aiGi are already in storage then there’s nothing left to do!
No need to compute bucket sums S1, . . . , S_{2^c−1}.
No need to accumulate bucket sums S1 + 2S2 + · · · + (2^c − 1)S_{2^c−1}.
Number of group (+) ops reduces to
(b/c) n + b − c + b/c − 1 ≈ (b/c) n
Extra storage space required is (2^c − 2)n points.
Extreme example b = c = 256. If we store 2^256 n points then we need only n group (+) ops.
Realistic example (n, b, c) = (2^23, 256, 16). Extra storage is 550 billion elliptic curve points!
27. Precomputation
A trade-off for naive precomputation
For each input G precompute k points: (2^c − k)G, . . . , (2^c − 1)G.
Run the bucket method only for the first 2^c − 1 − k buckets instead of 2^c − 1:
Compute only S1, . . . , S_{2^c−1−k}.
Accumulate only S1 + 2S2 + · · · + (2^c − 1 − k)S_{2^c−1−k}.
Total cost in group (+) ops:
(b/c)(n + 2^c − k − 2) + b − c + b/c − 1 ≈ (b/c)(n + 2^c − k)
Extra storage: kn points. Choose k as big as you can store.
Takeaway
Small storage capacity =⇒ negligible improvement.
Non-negligible improvements can only be achieved with very large storage capacity.
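Plugging numbers into the trade-off makes the takeaway concrete. A quick model of the cost formula above (the function name is mine; parameter values are the ones used elsewhere in the talk):

```python
def bucket_cost(n, b, c, k=0):
    """Approximate group (+) op count (b/c) * (n + 2^c - k),
    with k precomputed multiples stored per input."""
    return (b // c) * (n + (1 << c) - k)

# Storing k = 1000 extra points per input (1000n points in total!)
# saves well under 0.1% of the work at n = 10^6, b = 256, c = 16.
base = bucket_cost(10**6, 256, 16)
with_precomp = bucket_cost(10**6, 256, 16, k=1000)
saving = (base - with_precomp) / base
```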
28. Low Hamming-weight representations
Lessons from speed-ups for single-scalar multiplication
Problem. (Single-) scalar multiplication.
Input. Scalar a, group element G.
Output. Group element aG.
Standard method: double-and-add.
Cost increases with the Hamming weight of a.
Examples. 8-bit scalars:
128 binary: 10000000 7 group (+) ops
170 binary: 10101010 10 group (+) ops
240 binary: 11110000 10 group (+) ops
255 binary: 11111111 14 group (+) ops
Idea. Use a different (non-binary) encoding for scalars with lower average Hamming
weight.
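The op counts in the table follow a simple formula; a one-line Python check (the function name is mine):

```python
def ops_double_and_add(a):
    """Group (+) ops for left-to-right double-and-add on scalar a:
    (bit length - 1) doublings plus (Hamming weight - 1) additions."""
    return (a.bit_length() - 1) + (bin(a).count("1") - 1)
```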
29. Low Hamming-weight representations
Example: non-adjacent form (NAF)
Like binary except digits can be {−1, 0, 1}.
Requires group (−) ops to cost the same as group (+) ops.
Elliptic curve groups have this property.
Non-zero digits are never adjacent.
Average Hamming density is 1/3. (Compare: 1/2 for binary.)
Examples.
128 NAF: 0 1 0 0 0 0 0 0 0 7 group (+) ops
170 NAF: 0 1 0 1 0 1 0 1 0 10 group (+) ops
240 NAF: 1 0 0 0 −1 0 0 0 0 8 group (+) ops, 1 group (−) op
255 NAF: 1 0 0 0 0 0 0 0 −1 8 group (+) ops, 1 group (−) op
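NAF is easy to compute digit by digit; a short Python sketch (least significant digit first; `naf` is my name for it):

```python
def naf(a):
    """Non-adjacent form of a: digits in {-1, 0, 1}, least significant first.
    An odd value takes digit +1 or -1 so that the remainder is divisible by 4,
    which forces the next digit to be 0 (hence 'non-adjacent')."""
    digits = []
    while a > 0:
        if a & 1:
            d = 2 - (a % 4)      # +1 if a = 1 (mod 4), else -1
            a -= d
        else:
            d = 0
        digits.append(d)
        a >>= 1
    return digits
```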
30. Low Hamming-weight representations
Example: double-base number system (DBNS)
Scalars written as a linear combo of terms 2^i 3^j. (Compare: only powers 2^i for binary.)
Digit set can be binary {0, 1} or larger.
Highly redundant—each scalar has many representations.
Example. 127 has 783 representations with digit set {0, 1}. Minimum Hamming weight is 3. (Compare: 7 for binary.) There are 3 such representations:
127 = 2^2·3^3 + 2^1·3^2 + 2^0·3^0
127 = 2^2·3^3 + 2^4·3^0 + 2^0·3^1
127 = 2^5·3^1 + 2^0·3^3 + 2^2·3^0
Hamming density for a b-bit scalar is O(1/log b). (Compare: 1/2 for binary.)
31. Low Hamming-weight representations
Can low Hamming-weight representations improve the bucket method?
No!
The cost of the bucket method increases with the number of possible scalars.
There are 2^c possible c-bit scalars =⇒ we always need 2^c buckets, regardless of how those scalars are encoded.
The cost of the bucket method could be reduced if we have a guarantee that some scalars never (or rarely) occur.
Scalar encodings cannot provide such a guarantee.
More advanced techniques can give such a guarantee. (Example: Pippenger’s algorithm.)
The big question is whether the cost of establishing the guarantee outweighs its benefits.
That’s a discussion for another talk...
32. Improvement: exploit cheap group inversion
New idea: exploit cheap group inversion
Inspiration: elliptic curve group inversion is (almost) free.
Given G ∈ G, it’s cheap to compute −G via (x, y) → (x, −y).
Currently: c-bit scalars written with digit set {0, . . . , 2^c − 1}.
Instead, allow negative digits, e.g. {−2^{c−1}, . . . , 2^{c−1} − 1}.
If scalar a > 0 for point G then add G to bucket Sa as usual.
If scalar a < 0 for point G then add −G to bucket S|a|.
No need for buckets Sa for a > 2^{c−1}.
We have eliminated half the buckets!
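A sketch of one c-bit window with signed digits (a toy over integers, where negating a "point" is free, mirroring cheap curve negation; all names are mine):

```python
def signed_window_msm(digits, points, c):
    """One window of the bucket method with digits in
    {-2^(c-1), ..., 2^(c-1) - 1}: a point with digit -a is negated
    (cheap for elliptic curves) and added into bucket a, so only
    2^(c-1) buckets are needed instead of 2^c - 1."""
    S = [0] * ((1 << (c - 1)) + 1)   # buckets 1 .. 2^(c-1)
    for a, G in zip(digits, points):
        if a > 0:
            S[a] += G
        elif a < 0:
            S[-a] += -G              # cheap group inversion
    running = total = 0              # running-sum accumulation as before
    for Sj in reversed(S[1:]):
        running += Sj
        total += running
    return total
```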
33. Improvement: exploit cheap group inversion Buckets for c-bit MSM, revisited
Example: 3-bit MSM with negative digits
(−2)G1 + 2G2 + (−2)G3 + 1G4 + (−1)G5 + 2G6 + 3G7 + · · ·
2^{c−1} buckets: 1 2 3 4
bucket sums: S1 S2 S3 S4
A digit −a sends the negated point into bucket a: −G1 goes into bucket 2, G2 into bucket 2, −G3 into bucket 2, G4 into bucket 1, −G5 into bucket 1, and so on.
38. Improvement: exploit cheap group inversion Buckets for c-bit MSM, revisited
Sum the contents of each bucket
2^{c−1} buckets: 1 2 3 4
bucket sums: S1 S2 S3 S4
S1 ← G4 − G5 + G9 − G12 + G14    S3 ← G7 + G13
S2 ← −G1 + G2 − G3 + G6 − G11    S4 ← G10
39. Improvement: exploit cheap group inversion How much improvement?
Combine the bucket sums to get the answer
Like before, the desired output a1G1 + · · · + anGn equals
S1 + 2S2 + 3S3 + 4S4
Instead of 2^c − 1 buckets, we now have only 2^{c−1} buckets.
Bucket accumulation works exactly as before, except with half the buckets.
Bucket accumulation cost drops from ∼ 2^c to ∼ 2^{c−1}.
40. Improvement: exploit cheap group inversion How much improvement?
Total cost of the improved bucket method
New approximate cost in group (+) ops:
(b/c)(n + 2^{c−1})
for your choice of c.
Option 1: Use the same c as before and enjoy a 50% saving in bucket accumulation costs.
Option 2: Set c ← c + 1, which reduces the multiple of n:
(b/(c + 1))(n + 2^c)
41. Improvement: exploit cheap group inversion How much improvement?
How much improvement?
Under option 2 the multiple of n improves by the factor c/(c + 1).
Example. c = 19 =⇒ 5% improvement (ignoring bucket accumulation cost).
As discussed on slide 21, there might be other reasons not to change c.
Concrete improvement
I implemented a stupid PoC in gnark keeping c = 16 (option 1) and observed a 5.7% speed improvement for n = 10^6 inputs. [GitHub]
42. Improvement: exploit cheap group inversion Scalars with negative digits
How to express scalars with negative digits?
In the basic bucket method it’s easy to partition b-bit scalars into c-bit parts.
Example. (b, c) = (12, 3). Given a = 1368 we write a = (2, 5, 3, 0), most significant part first:
1368 in binary : 010 | 101 | 011 | 000
part values    :  2  |  5  |  3  |  0
In general: we are given for free a0, . . . , a_{b/c−1} from {0, . . . , 2^c − 1} with
a = Σ_{i=0}^{b/c−1} ai 2^{ci}
We need to find a0, . . . , a_{b/c−1} from {−2^{c−1}, . . . , 2^{c−1} − 1} with
a = Σ_{i=0}^{b/c−1} ai 2^{ci}
43. Improvement: exploit cheap group inversion Scalars with negative digits
Lending: like borrowing, except the opposite
function Signed-Digits(a0, . . . , a_{b/c−1})
    for i ← 0, . . . , b/c − 1 do
        if ai ≥ 2^{c−1} then
            assert: i ≠ b/c − 1       ▷ No overflow for the final digit!
            ai ← ai − 2^c             ▷ Force this digit into {−2^{c−1}, . . . , 2^{c−1} − 1}
            ai+1 ← ai+1 + 1           ▷ Lend 2^c to the next digit
        end if
    end for
    return a0, . . . , a_{b/c−1}
end function
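A direct Python translation of Signed-Digits (digits least significant first; `signed_digits` is my name for it):

```python
def signed_digits(parts, c):
    """Convert base-2^c digits in {0, ..., 2^c - 1} (least significant first)
    into digits in {-2^(c-1), ..., 2^(c-1) - 1} by lending 2^c upward.
    Requires headroom in the final digit, i.e. |G| fits comfortably in b bits."""
    out = list(parts)
    for i in range(len(out)):
        if out[i] >= 1 << (c - 1):
            assert i != len(out) - 1, "overflow in final digit"
            out[i] -= 1 << c         # force digit into the signed range
            out[i + 1] += 1          # lend 2^c to the next digit
    return out
```

For a = 1368 with c = 3, the unsigned digits (0, 3, 5, 2) become (0, 3, −3, 3), since 5 = −3 + 8.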
44. Improvement: exploit cheap group inversion Scalars with negative digits
On the efficiency of conversion to signed digits
Signed-Digits works only if |G| fits comfortably into b bits.
Example. For BLS12-377, |G| is 253 bits, typically stored in 256 bits =⇒ 3 unused bits in the final digit =⇒ overflow cannot occur.
Conversion to signed digits has a cost. Fortunately, that cost seems to be negligible.
What’s the most efficient way to compute signed digits?
In my stupid PoC I allocated separate memory for signed digits: n × b bits of additional memory use. For n = 10^6 that’s 32MB.
This memory can be saved if you compute ai on the fly. But it seems you need to compute
a0, . . . , ai−1 first, so there’s lots of repeated computation.
Careful! This problem becomes much worse when we generalize this idea.
45. Generalization: exploit cheap scalar multiplication
Can we do more?
Recall: The ability to cheaply compute (−1)G allowed us to reduce bucket accumulation
cost by a factor of 1/2.
Question. Suppose there are scalars µ1, . . . , µk (including 1) for which we can cheaply
compute µ1G, . . . , µkG. Can we reduce bucket accumulation cost by a factor of 1/k?
Dare to dream: the approximate cost in group (+) ops becomes
(b/c)(n + 2^c/k)    or    (b/(c + log k))(n + 2^c)
The multiple of n improves by the factor c/(c + log k).
Example. If (c, k) = (16, 16) then that’s a ∼20% improvement for MSM!
46. Generalization: exploit cheap scalar multiplication
Example: one extra scalar, and its combos with −1
Suppose we can cheaply compute (−1)G, λG.
We can also cheaply compute −λG.
Think: (µ1, µ2, µ3, µ4) = (1, −1, λ, −λ) so k = 4 instead of 2
Suppose we can write scalars using digit set
{0, 1, . . . , 2^c,  −1, . . . , −2^c,  λ, . . . , λ2^c,  −λ, . . . , −λ2^c}
This digit set has size ∼ 4 × 2^c and requires only ∼ 2^c buckets.
In this case, the new λ doubled k. (Hooray!) But that won’t always happen.
47. Generalization: exploit cheap scalar multiplication
Full generalization
Suppose we can cheaply multiply by µ1, . . . , µk.
Suppose we can write scalars using digit set
∪_{i=0,...,2^c; j=1,...,k} {i µj}
This digit set has size ∼ k 2^c and requires only ∼ 2^c buckets.
48. Generalization: exploit cheap scalar multiplication Endomorphism multiplication for elliptic curves
Cheap multiplication for elliptic curves
Consider an elliptic curve of the form
E : y^2 = x^3 + b
for some b in a prime field Fp. (e.g. BLS curves, . . . ).
Let G ⊆ E(Fp) be a prime order subgroup. Let β be a cube root of 1 mod p. For any G ∈ G
the map φ : (x, y) → (βx, y) acts as
φ : G → λG
where λ is a cube root of 1 mod |G|.
Takeaway
Computing λG can be implemented with a single multiplication modulo p—only slightly more
costly than computing −G and much cheaper than a group (+) op.
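This endomorphism is easy to sanity-check numerically. The sketch below uses secp256k1 (y^2 = x^3 + 7) only because its parameters are widely published; the BLS curves from the talk behave the same way. All helper names are mine, and the cube roots are computed on the fly rather than quoted:

```python
# Check that phi(x, y) = (beta*x, y) acts as multiplication by a cube
# root of unity lambda on a curve y^2 = x^3 + b. Toy affine arithmetic,
# not production code.
p = 2**256 - 2**32 - 977                 # secp256k1 base field prime
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(P, Q):
    """Affine point addition; None is the identity."""
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None                      # P + (-P)
    if P == Q:
        m = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        m = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (m * m - P[0] - Q[0]) % p
    return (x, (m * (P[0] - x) - P[1]) % p)

def ec_mul(k, P):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def cube_root_of_unity(m):
    """A nontrivial cube root of 1 mod m (requires m = 1 mod 3)."""
    g = 2
    while pow(g, (m - 1) // 3, m) == 1:  # skip cubic residues
        g += 1
    return pow(g, (m - 1) // 3, m)

beta = cube_root_of_unity(p)             # beta^3 = 1 mod p
lam = cube_root_of_unity(n)              # lam^3 = 1 mod |G|
phiG = (beta * G[0] % p, G[1])           # one multiplication mod p
# phi acts as multiplication by lam or lam^2, depending on which root we found
assert phiG in (ec_mul(lam, G), ec_mul(lam * lam % n, G))
```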
49. Generalization: exploit cheap scalar multiplication Endomorphism multiplication for elliptic curves
More cheap scalars?
φ gives us 2 new scalars λ, λ2.
Combos with −1 yield a total of k = 6 cheap scalars:
(µ1, . . . , µ6) = (1, −1, λ, −λ, λ2
, −λ2
)
Observe: new scalars do not always double k
Example. λ · λ^2 = 1, which is not another new scalar.
More endomorphisms like φ?
Galbraith-Lin-Scott find more in 2008/194. But you need to work in an extension field F_{p^k} instead of Fp.
We would need to convince everyone to switch to a different base field.
It might be worth it: Hu-Longa-Xu demonstrate a speed-up for single-scalar multiplication. [link]
50. Generalization: exploit cheap scalar multiplication Strange digit sets
Complication: large multiples
−1 is a well-behaved scalar:
Cheap to convert scalars to signed digits.
Overflow is easy to quantify and easy to avoid.
Other scalars λ might be very large:
Example. For BLS12-377, λ is 129 bits.
How do we convert scalars to digit sets containing λ?
Overflow might kill us.
51. Generalization: exploit cheap scalar multiplication Strange digit sets
First attempt to generalize Signed-Digits
Suppose:
We have a scalar a whose ith digit ai is large: 2^{c−1} ≤ ai < 2^c.
We want to map it to a small bucket label a′i with 0 ≤ a′i < 2^{c−1}.
We can write
λ a′i = ai + d 2^c
for some integers a′i, d with a′i in the desired range.
Example. We saw (λ, d) = (−1, −1).
Then we can do the following:
Set ai ← ai + d 2^c and ai+1 ← ai+1 − d.
(Pray that we do not overflow!)
During bucket accumulation for G ∈ G, add the point λG into bucket a′i.
Problem. If λ is large then d will be very large =⇒ digits become very large =⇒ badness
52. Generalization: exploit cheap scalar multiplication Strange digit sets
Mitigation: borrow from higher digits
Write
λ a′i = ai + d1 2^c + d2 2^{2c} + · · · + dℓ 2^{ℓc}
for d1, . . . , dℓ in some reasonable digit set.
Need to set several digits instead of just one:
ai+1 ← ai+1 − d1
...
ai+ℓ ← ai+ℓ − dℓ
53. Generalization: exploit cheap scalar multiplication Strange digit sets
Problem: that’s fine until we run out of digits
We cannot completely avoid overflow:
Example. If |G| is 256 bits and λ is 128 bits (like BLS12-377) then this optimization can
be used for only the lower half of digits =⇒ 50% performance penalty.
Example. If |G|, λ have equal bit lengths then this optimization cannot be used at all.
Open problem. Can we do better?
54. A boost for precomputation
Precomputation, revisited
On slide 27 we observed a trade-off for precomputed storage: the cost of the bucket method can be reduced to
(b/c)(n + 2^c − k)
at a cost of storing kn extra group elements. That’s not a very good trade-off.
Precomputation gives a better trade-off when combined with the new method of signed digits:
(b/c)(n + 2^c/k)
for (k − 1)n extra storage. (Thanks to Alexandre Belling.)
55. A boost for precomputation
Precomputation + signed digits = significant improvement
Advantage. We can choose λ to play nice with signed digits (e.g. λ = 2, 3, . . . )
Faster, simpler conversion to the new digit set; less worry of overflow.
Each new precomputed multiple λG significantly increases the number k of cheap scalars
µ1, . . . , µk at our disposal.
Example.
Suppose we start with 6 cheap scalars µ1, . . . , µ6. (Perhaps obtained from endomorphisms.)
Each additional precomputed multiple λ adds 6 new cheap scalars: λµ1, . . . , λµ6.
On slide 45 we estimated a ∼20% improvement for k ≥ 16. (Ignoring the cost to convert digit sets!)
We could exceed this target k = 16 with only 2n extra storage.
Example. Even with only 2 scalars (1, −1) we can reach k = 16 with 7n extra storage.
I did not implement this improvement.
56. Get off my lawn
Summary of open problems
Find an efficient way to write scalars using the digit set from slide 47 without overflow.
Prior art for single-scalar multiplication ([GLV], [HPX]) uses an SVP lattice solution: can it be adapted to MSM?
Find more scalars µ that admit cheap scalar multiplication.
The best lead I know is 2008/194 and follow-ups.
57. Get off my lawn
Fin
Thank you!