Multi-scalar multiplication: state of the art and new ideas
presented at zkStudyClub
Gus Gutoski
zkTeam, ConsenSys R&D
June 1, 2020
Gus Gutoski (zkTeam, ConsenSys R&D) MSM: SotA and new ideas June 1, 2020 1 / 57
Introduction
The multi-scalar multiplication problem (MSM)
Also known as. Multi-exponentiation, multi-exp.
Parameters. A cyclic group G whose order |G| has bit length b.
(Example. BLS or BN elliptic curves have |G| ≈ 2^256, so b = 256.)
Input. Group elements G1, . . . , Gn in G called inputs.
Integers a1, . . . , an between 0 and |G| called scalars.
Output. The group element a1G1 + · · · + anGn called the output.
Goal. Minimize the number of group (+) operations as a function of n.
Naive solution. Use double-and-add to compute each ai Gi , then add them all up.
Expected group ops: 1.5bn ≈ 384n.
Can we do better? (<sarcasm> No. Let’s all go home. </sarcasm>)
Motivation: zero-knowledge provers
Motivation: zero-knowledge proofs (ZKPs)
Example: Groth16 protocol (gross simplification)
Let n denote the size of the secret inputs x accepted by program P.
The proving key for program P contains (among other things) n group elements
G1, . . . , Gn.
Given a size-n secret input x for program P, the prover deduces integers a1, . . . , an and
computes G := a1G1 + · · · + anGn. The proof contains G (among other things).
Are you motivated yet?
Program P takes size-n secret input =⇒ a ZKP prover must do a MSM of n points.
Example. Zcash spend program: n ≈ 4 × 10^4.
Example. Rollup (a scalability solution): the bigger the n, the better.
Goal. Programs with n = 10^7, 10^8, or more.
MSM accounts for 80% of prover work.
Justin Drake: "Focus on multi-exponentiation, forget about FFTs." From the Zero
Knowledge podcast, 2020-03-11.
Takeaway
Multi-scalar multiplication (MSM) dominates prover costs. Prover costs dominate ZKP costs.
Improvements for MSM immediately yield improvements in ZKP efficiency.
Overview
State of the art: the bucket method
zkTeam’s implementation in gnark: [GitHub]
BLS or BN curves (b ≈ 256).
Number of group (+) ops scales like
16n + (constant)
Compare: naive method scales like 384n. That’s a 24× improvement!
Overview of the bucket method
Succinct description in the paper 2012/549. (Section 4, “Overlap in the Pippenger
approach”.)
High-level strategy:
1. Reduce one b-bit MSM to several c-bit MSMs for manageable c ≤ b.
2. [Interesting part.] Use tricks to solve the c-bit MSMs. (See next section.)
3. Combine c-bit MSMs into the final b-bit MSM.
Step 1. Reduce b-bit MSM to c-bit MSM (1)
Choose c ≤ b. Write each scalar a1, . . . , an in binary. Partition binary scalars into c-bit parts.
Example. Given b = 12, choose c = 3. Each 12-bit scalar a is partitioned into 3-bit parts.
Given the scalar a = 1368 we write a = (2, 5, 3, 0):
1368 in binary:  010  101  011  000
3-bit parts:       2    5    3    0
Step 1. Reduce b-bit MSM to c-bit MSM (2)
Deduce b/c instances of c-bit MSM from the partitioned scalars.
Example, continued. (b, c, b/c) = (12, 3, 4). Partition each scalar a_i = (a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}).
The 4 c-bit MSM instances T_1, . . . , T_4 are given by
T_1 := a_{1,1} G_1 + · · · + a_{n,1} G_n
...
T_4 := a_{1,4} G_1 + · · · + a_{n,4} G_n
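The partitioning step can be sketched in a few lines of Python. This is only an illustration: it assumes scalars are plain non-negative integers and that c divides b, and the helper name `windows` is ours rather than anything from gnark.

```python
def windows(a: int, b: int, c: int) -> list[int]:
    """Split a b-bit scalar into b/c digits of c bits each, most significant first."""
    if b % c != 0:
        raise ValueError("this sketch assumes c divides b")
    if not 0 <= a < (1 << b):
        raise ValueError("scalar does not fit in b bits")
    mask = (1 << c) - 1
    count = b // c
    # Digit j (1-based, most significant first) sits c * (count - j) bits up.
    return [(a >> (c * (count - j))) & mask for j in range(1, count + 1)]
```

With (b, c) = (12, 3), `windows(1368, 12, 3)` gives the digits (2, 5, 3, 0) of the running example; digit j of every scalar feeds the c-bit instance T_j.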
Step 3. Combine c-bit MSMs into the final b-bit MSM
The usual way: double c times then add.
Example, continued. (b, c, b/c) = (12, 3, 4). Combine T1, . . . , T4 into the final answer T:
1. T ← T1
2. For j = 2, . . . , 4:
2.1 T ← 2^c T (double c times)
2.2 T ← T + T_j
Final answer: T = a1G1 + · · · + anGn.
Computation cost in group (+) ops:
(b/c − 1) × (c + 1) = b − c + b/c − 1    (1)
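The combining loop, with its operation count, might look as follows in Python. As a stand-in for an elliptic curve group this sketch uses the integers mod a prime Q under addition, which is an assumption made purely so the example is self-contained; `combine` is a hypothetical helper name.

```python
Q = 1_000_003  # toy group: integers mod Q under addition (assumption, not a curve)


def add(x: int, y: int) -> int:
    """The group (+) operation of the toy group."""
    return (x + y) % Q


def combine(window_sums: list[int], c: int) -> tuple[int, int]:
    """Combine T_1 .. T_{b/c} (most significant first); return (T, group-op count)."""
    total = window_sums[0]
    ops = 0
    for t_j in window_sums[1:]:
        for _ in range(c):           # T <- 2^c T, by doubling c times
            total = add(total, total)
            ops += 1
        total = add(total, t_j)      # T <- T + T_j
        ops += 1
    return total, ops
```

For (b, c) = (12, 3) the loop performs (4 − 1) × (3 + 1) = 12 group operations, in line with Eq. (1).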
Step 2. Use tricks to solve the c-bit MSMs
Ready for tricks?
Core
Each input goes into a bucket
6G1 + 2G2 + 6G3 + 1G4 + 7G5 + 2G6 + 3G7 + · · ·
Each input G_i is dropped into the bucket labelled by its scalar. After the first five inputs:
2^c − 1 buckets:   1    2    3    4    5    6        7
contents:          G4   G2   -    -    -    G1, G3   G5
bucket sums:       S1   S2   S3   S4   S5   S6       S7
Sum the contents of each bucket
2^c − 1 buckets:   1             2        3         4     5    6             7
contents:          G4, G9, G14   G2, G6   G7, G13   G10   G8   G1, G3, G11   G5, G12
bucket sums:       S1            S2       S3        S4    S5   S6            S7
S1 ← G4 + G9 + G14    S3 ← G7 + G13    S5 ← G8    S7 ← G5 + G12
S2 ← G2 + G6    S4 ← G10    S6 ← G1 + G3 + G11
Expected cost to compute S1, . . . , S7 in group (+) ops:
n − (2^c − 1) = n − 2^c + 1    (2)
Combine the bucket sums to get the answer
Desired output a1G1 + · · · + anGn equals
S1 + 2S2 + 3S3 + · · · + 7S7
This is not obvious, but easy to check.
This is another instance of MSM with inputs S1, . . . , S7, scalars 1, . . . , 7.
Number of inputs 2^c − 1 is fixed.
Scalars 1, . . . , 2^c − 1 are known in advance.
A fast way to combine the bucket sums
The desired sum
S1 + 2S2 + 3S3 + · · · + 7S7
is computed via
S7
+ (S7 + S6)
+ (S7 + S6 + S5)
...
+ (S7 + S6 + S5 + S4 + S3 + S2 + S1)
Computation cost in group (+) ops:
2 × (2^c − 2) + 1 = 2^{c+1} − 3    (3)
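A small Python sketch can confirm that the running-sum trick really produces S1 + 2S2 + · · · + 7S7. It again assumes a toy group (integers mod a prime Q under addition) in place of an elliptic curve, and the function names are illustrative only.

```python
Q = 1_000_003  # toy group: integers mod Q under addition (assumption)


def bucket_sums(digits: list[int], points: list[int], c: int) -> list[int]:
    """S[j] = sum of the points whose c-bit digit equals j, for j = 1 .. 2^c - 1."""
    sums = [0] * (1 << c)            # index 0 is unused: digit 0 contributes nothing
    for d, g in zip(digits, points):
        if d:
            sums[d] = (sums[d] + g) % Q
    return sums


def accumulate(sums: list[int]) -> int:
    """Return S_1 + 2 S_2 + ... + m S_m using running sums from the top bucket down."""
    running = 0
    total = 0
    for s in reversed(sums[1:]):
        running = (running + s) % Q   # S_m + ... + S_j
        total = (total + running) % Q
    return total
```

Bucket S_j is added into `running` once and then stays there for the remaining j − 1 iterations, so it ends up counted j times in `total`, which is exactly the weight it needs.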
Efficiency
Total cost for the bucket method: theory
Expected cost in group (+) ops (from Eqs. (1), (2), (3)):
(b/c)(n + 2^c − 2) + b − c + b/c − 1 ≈ (b/c)(n + 2^c)    (4)
Minimum occurs at c ≈ log n. At first glance, asymptotic scaling looks like
O(b n / log n)
Beware! We must have c ≤ b, so we cannot choose c ≈ log n when n > 2^b.
For n > 2^b scaling reverts to O(n).
Example (b = 1). n / log n scaling is impossible; O(n) is the best we can do.
Example (b = 256). n can never reach 2^256, so n / log n scaling is achievable.
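Putting the three steps together, an end-to-end version of the bucket method could be sketched as below. This is not gnark's implementation: the group is a toy one (integers mod a prime Q under addition), `None` stands for the identity so that additions with the identity are not counted, and c is assumed to divide b.

```python
import random

Q = 2**61 - 1  # toy group: integers mod Q under addition (assumption, not a curve)


def bucket_msm(scalars, points, b, c):
    """Compute sum(a_i * G_i) with the bucket method; return (result, group-op count)."""
    assert b % c == 0, "this sketch assumes c divides b"
    ops = 0

    def add(x, y):
        # Group (+); None is the identity and costs nothing.
        nonlocal ops
        if x is None:
            return y
        if y is None:
            return x
        ops += 1
        return (x + y) % Q

    mask = (1 << c) - 1
    result = None
    for shift in range(b - c, -1, -c):        # most significant window first
        for _ in range(c):                    # Step 3: T <- 2^c T
            result = add(result, result)
        buckets = [None] * (1 << c)           # Step 2a: fill the buckets
        for a, g in zip(scalars, points):
            d = (a >> shift) & mask
            if d:
                buckets[d] = add(buckets[d], g)
        running = window = None               # Step 2b: running-sum accumulation
        for s in reversed(buckets[1:]):
            running = add(running, s)
            window = add(window, running)
        result = add(result, window)          # Step 3: T <- T + T_j
    return (0 if result is None else result), ops
```

On random inputs the result agrees with the direct sum, and the measured operation count stays close to the (b/c)(n + 2^c) estimate of Eq. (4).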
Total cost for the bucket method: practice
Eq. (4): (b/c)(n + 2^c)
A large instance is n = 10^7 (so log n ≈ 23).
For gnark with b = 256 we observed peak performance at c = 16. Substituting into Eq. (4) gives
(256/16)(n + 2^16), which is the cost claimed on slide 6:
16n + 2^20
Puzzle: Why c = 16 instead of c = log n? Other concerns:
Memory use scales with 2^c. Eventually, memory is a bottleneck.
Fewer edge cases if c divides b.
Example. 256-bit scalars stored in four 64-bit limbs. It’s annoying if c-bit MSM straddles
two limbs.
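To see how sensitive the estimate is to the choice of c, Eq. (4) can be evaluated directly. The sketch below models only the group-operation count; it ignores memory, limb alignment and every other practical concern listed above, and the function names are ours.

```python
def bucket_cost(n: int, b: int, c: int) -> float:
    """Approximate group-op count (b/c) * (n + 2^c) from Eq. (4)."""
    return (b / c) * (n + 2**c)


def best_window(n: int, b: int) -> int:
    """Window size c in 1..b minimising the Eq. (4) estimate (ignores memory limits)."""
    return min(range(1, b + 1), key=lambda c: bucket_cost(n, b, c))
```

For n = 10^7 and b = 256 this pure operation-count model is minimised at c = 20, where the estimate is roughly 12% below its value at c = 16. That gap is what the practical concerns have to outweigh for c = 16 to win in measurements.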
Improvements
Ideas to improve upon the bucket method
1. Parallelism: yes
2. Precomputation: not really, unless combined with item 4
3. Low Hamming-weight representations: no
4. [New!] [Elliptic curves only] Signed digits, and generalization: yes
Parallelism
Parallelism for the bucket method
Is the bucket method faster on multiple cores? Yes!
Natural boundary for parallel computation: each c-bit MSM is independent of the rest.
scalars   b-bit decimal   c-bit binary parts
a_1       1368            010  101  011  000
a_2        819            001  100  110  011
...        ...            ...  ...  ...  ...
a_n       2709            101  010  010  101
b/c cores:                  1    2    3    4
Easily make full use of up to b/c cores.
Example: (b, c) = (256, 16) =⇒ easy use of up to 256/16 = 16 cores.
Increased memory use: each core uses 2^c memory.
Even more parallelism for the bucket method?
Sometimes we have more than b/c cores available. Can we use all of them? Yes, but. . .
Another natural boundary: partition the inputs
Inefficiency. 2 MSM instances of size n/2 cost more group (+) ops than 1 MSM
instance of size n.
gnark does not do this; parallelism is limited to b/c cores.
Precomputation
Precomputation for the bucket method
Inputs G1, . . . , Gn are known in advance. Can we use this to our advantage? Sort of.
Idea. Precompute a bunch of points and store them. Examples:
For each input G: 2G, 3G, . . . , (2^c − 1)G.
For each input G: 2^k G, 2^{2k} G, . . . , 2^{mk} G for some k, m.
Various subsets of inputs: (G1 + G2 + G3), (G1 + G2), (G1 + G3), (G2 + G3).
Goal. A smooth trade-off between precomputed storage vs. run time.
Problem. Large MSM instances already use most available memory.
Example. For n = 10^8 gnark needs 58GB to store enough BLS12-377 curve points to
produce a ZKP for a program with size-n secret input.
Perhaps we could store extra points on disk. But disk reads might be too slow.
Experimentation needed.
Naive precomputation for the bucket method
For each input G precompute 2G, 3G, . . . , (2^c − 1)G.
Recall. The goal of c-bit MSM is to compute a1G1 + · · · + anGn for c-bit scalars
a1, . . . , an.
If ai Gi are already in storage then there’s nothing left to do!
No need to compute bucket sums S_1, . . . , S_{2^c − 1}.
No need to accumulate bucket sums S_1 + 2S_2 + · · · + (2^c − 1)S_{2^c − 1}.
Number of group (+) ops reduces to
(b/c) n + b − c + b/c − 1 ≈ (b/c) n
Extra storage space required is (2^c − 2)n points.
Extreme example b = c = 256. If we store 2^256 n points then we need only n group (+) ops.
Realistic example (n, b, c) = (2^23, 256, 16). Extra storage is 550 billion elliptic curve points!
A trade-off for naive precomputation
For each input G precompute k points: (2^c − k)G, . . . , (2^c − 1)G.
Bucket method only for the first 2^c − 1 − k buckets instead of 2^c − 1:
Compute only S_1, . . . , S_{2^c − 1 − k}.
Accumulate only S_1 + 2S_2 + · · · + (2^c − 1 − k)S_{2^c − 1 − k}.
Total cost in group (+) ops:
(b/c)(n + 2^c − k − 2) + b − c + b/c − 1 ≈ (b/c)(n + 2^c − k)
Extra storage: kn points. Choose k as big as you can store.
Takeaway
Small storage capacity =⇒ negligible improvement.
Non-negligible improvements can only be achieved with very large storage capacity.
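The takeaway can be made concrete by plugging numbers into the approximate cost (b/c)(n + 2^c − k). The following sketch is a back-of-the-envelope model only, with hypothetical function names; storage is counted in group elements.

```python
def cost_with_precomputation(n: int, b: int, c: int, k: int) -> float:
    """Approximate cost (b/c) * (n + 2^c - k) when the top k multiples of each input are stored."""
    if not 0 <= k <= 2**c - 1:
        raise ValueError("k must lie between 0 and 2^c - 1")
    return (b / c) * (n + 2**c - k)


def extra_points(n: int, k: int) -> int:
    """Extra storage, in group elements, for k precomputed multiples per input."""
    return k * n
```

With (n, b, c) = (2^23, 256, 16), storing k = 2^15 multiples per input (2^38 extra points) lowers the estimate by less than half a percent, because the n term dominates the 2^c term. Storing all 2^16 − 2 multiples costs about 550 billion points, matching the realistic example on the naive precomputation slide.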
Low Hamming-weight representations
Lessons from speed-ups for single-scalar multiplication
Problem. (Single-) scalar multiplication.
Input. Scalar a, group element G.
Output. Group element aG.
Standard method: double-and-add.
Cost increases with the Hamming weight of a.
Examples. 8-bit scalars:
128 binary: 10000000 7 group (+) ops
170 binary: 10101010 10 group (+) ops
240 binary: 11110000 10 group (+) ops
255 binary: 11111111 14 group (+) ops
Idea. Use a different (non-binary) encoding for scalars with lower average Hamming
weight.
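The operation counts in the examples above can be reproduced with a short left-to-right double-and-add routine. The group here is a toy one (integers mod a prime Q under addition), chosen only so the sketch runs on its own; the counting convention is that the leading 1 bit is free.

```python
Q = 1_000_003  # toy group: integers mod Q under addition (assumption)


def double_and_add(a: int, g: int) -> tuple[int, int]:
    """Return (a*g, number of group ops) for a scalar a >= 1, scanning bits left to right."""
    if a < 1:
        raise ValueError("this sketch assumes a >= 1")
    acc, ops = g, 0                    # the leading 1 bit costs nothing
    for bit in bin(a)[3:]:             # remaining bits, most significant first
        acc = (acc + acc) % Q          # double
        ops += 1
        if bit == "1":
            acc = (acc + g) % Q        # add
            ops += 1
    return acc, ops
```

Every scalar with the same bit length pays the same number of doublings; only the number of additions, one per extra 1 bit, varies with the Hamming weight.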
Example: non-adjacent form (NAF)
Like binary except digits can be {−1, 0, 1}.
Requires group (−) ops to cost the same as group (+) ops.
Elliptic curve groups have this property.
Non-zero digits are never adjacent.
Average Hamming density is 1/3. (Compare: 1/2 for binary.)
Examples.
128 NAF: 0 1 0 0 0 0 0 0 0 7 group (+) ops
170 NAF: 0 1 0 1 0 1 0 1 0 10 group (+) ops
240 NAF: 1 0 0 0 −1 0 0 0 0 8 group (+) ops, 1 group (−) op
255 NAF: 1 0 0 0 0 0 0 0 −1 8 group (+) ops, 1 group (−) op
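One standard way to compute the NAF is to look at the scalar mod 4 whenever it is odd. The sketch below follows that textbook recipe; it returns the digits without the leading zero shown in some of the examples above.

```python
def naf(a: int) -> list[int]:
    """Non-adjacent form of a >= 0 with digits in {-1, 0, 1}, most significant first."""
    digits = []
    while a > 0:
        if a & 1:
            d = 2 - (a % 4)    # 1 if a = 1 mod 4, -1 if a = 3 mod 4
            a -= d             # a is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        a //= 2
    return digits[::-1]
```

For instance, `naf(255)` is 1 0 0 0 0 0 0 0 −1, that is 256 − 1, and `naf(240)` is 1 0 0 0 −1 0 0 0 0, that is 256 − 16.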
Example: double-base number system (DBNS)
Scalars written as a linear combo of 2^i 3^j. (Compare: 2^i only for binary.)
Digit set can be binary {0, 1} or larger.
Highly redundant—each scalar has many representations.
Example. 127 has 783 representations with digit set {0, 1}. Minimum Hamming weight
is 3. (Compare: 7 for binary.) There are 3 such representations:
127 = 2^2 3^3 + 2^1 3^2 + 2^0 3^0
127 = 2^2 3^3 + 2^4 3^0 + 2^0 3^1
127 = 2^5 3^1 + 2^0 3^3 + 2^2 3^0
Hamming density for a b-bit scalar is O(1/ log b). (Compare: 1/2 for binary.)
Can low Hamming-weight representations improve the bucket method?
No!
Cost of bucket method increases with the number of possible scalars.
There are 2^c possible c-bit scalars =⇒ always need 2^c buckets, regardless of how those
scalars are encoded.
Cost of bucket method could be reduced if we have a guarantee that some scalars never
(or rarely) occur.
Scalar encodings cannot provide such a guarantee.
More advanced techniques can give such a guarantee. (Example: Pippenger’s algorithm.)
The big question is whether the cost of establishing the guarantee outweighs its benefits.
That’s a discussion for another talk...
Improvement: exploit cheap group inversion
New idea: exploit cheap group inversion
Inspiration: elliptic curve group inversion is (almost) free.
Given G ∈ G, it’s cheap to compute −G via (x, y) → (x, −y).
Currently: c-bit scalars written with digit set {0, . . . , 2^c − 1}.
Instead, allow negative digits, e.g. {−2^{c−1}, . . . , 2^{c−1} − 1}.
If scalar a > 0 for point G then add G to bucket S_a as usual.
If scalar a < 0 for point G then add −G to bucket S_{|a|}.
No need for buckets S_a for a > 2^{c−1}.
We have eliminated half the buckets!
Improvement: exploit cheap group inversion Buckets for c-bit MSM, revisited
Example: 3-bit MSM with negative digits
(−2)G1 + 2G2 + (−2)G3 + 1G4 + (−1)G5 + 2G6 + 3G7 + · · ·
Each input goes into the bucket labelled by the absolute value of its digit, negated if the digit is negative. After the first five inputs:
2^{c−1} buckets:   1         2              3    4
contents:          G4, −G5   −G1, G2, −G3   -    -
bucket sums:       S1        S2             S3   S4
Sum the contents of each bucket
2^{c−1} buckets:   1                        2                        3              4
contents:          G4, −G5, G9, −G12, G14   −G1, G2, −G3, G6, −G11   G7, −G8, G13   G10
bucket sums:       S1                       S2                       S3             S4
S1 ← G4 − G5 + G9 − G12 + G14    S3 ← G7 − G8 + G13
S2 ← −G1 + G2 − G3 + G6 − G11    S4 ← G10
Improvement: exploit cheap group inversion How much improvement?
Combine the bucket sums to get the answer
Like before, desired output a1G1 + · · · + anGn equals
S1 + 2S2 + 3S3 + 4S4
Instead of 2^c − 1 buckets, we now have only 2^{c−1} buckets.
Bucket accumulation works exactly as before, except with half the buckets.
Bucket accumulation costs drop from ∼ 2^c to ∼ 2^{c−1}.
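A c-bit MSM with signed digits can be sketched as follows. The toy group is the integers mod a prime Q under addition, where negation is as cheap as on an elliptic curve; that choice, the function name, and the decision to accept the digit +2^{c−1} (as the 3-bit example does for G10) are all assumptions of this sketch.

```python
Q = 1_000_003  # toy group: integers mod Q under addition; -g is (Q - g) % Q (assumption)


def signed_bucket_msm(digits: list[int], points: list[int], c: int) -> int:
    """c-bit MSM with signed digits in [-2^(c-1), 2^(c-1)], using 2^(c-1) buckets."""
    half = 1 << (c - 1)
    sums = [0] * (half + 1)                     # buckets 1 .. 2^(c-1); index 0 unused
    for d, g in zip(digits, points):
        if not -half <= d <= half:
            raise ValueError("digit out of range")
        if d > 0:
            sums[d] = (sums[d] + g) % Q          # add G to bucket d
        elif d < 0:
            sums[-d] = (sums[-d] - g) % Q        # add -G to bucket |d|
    running = total = 0
    for s in reversed(sums[1:]):                 # running-sum accumulation, half as long
        running = (running + s) % Q
        total = (total + running) % Q
    return total
```

The accumulation loop is unchanged from the unsigned method; it simply runs over 2^{c−1} buckets instead of 2^c − 1.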
Total cost of the improved bucket method
New approximate cost in group (+) ops:
(b/c)(n + 2^{c−1})
for your choice of c.
Option 1: Use the same c as before and enjoy 50% saving in bucket accumulation costs.
Option 2: Set c ← c + 1, which reduces the multiple of n:
(b/(c+1))(n + 2^c)
How much improvement?
Under option 2 the multiple of n improves by the factor c/(c+1).
Example. c = 19 =⇒ 5% improvement (ignoring bucket accumulation cost).
As discussed last time, there might be other reasons not to change c.
Concrete improvement
I implemented a stupid PoC in gnark keeping c = 16 (option 1) and observed a 5.7% speed
improvement for n = 10^6 inputs. [GitHub]
Improvement: exploit cheap group inversion Scalars with negative digits
How to express scalars with negative digits?
In the basic bucket method it’s easy to partition b-bit scalars into c-bit parts.
Example. (b, c) = (12, 3). Given a = 1368 we write a = (2, 5, 3, 0):
1368 in binary:  010  101  011  000
3-bit parts:       2    5    3    0
In general: we are given for free a_0, . . . , a_{b/c−1} from {0, . . . , 2^c − 1} with
a = Σ_{i=0}^{b/c−1} a_i 2^{ci}
We need to find a'_0, . . . , a'_{b/c−1} from {−2^{c−1}, . . . , 2^{c−1} − 1} with
a = Σ_{i=0}^{b/c−1} a'_i 2^{ci}
Lending: like borrowing, except the opposite
function Signed-Digits(a_0, . . . , a_{b/c−1})
    for i ← 0, . . . , b/c − 1 do
        if a_i ≥ 2^{c−1} then
            assert: i ≠ b/c − 1        // No overflow for final digit!
            a'_i ← a_i − 2^c            // Force this digit into {−2^{c−1}, . . . , 2^{c−1} − 1}
            a_{i+1} ← a_{i+1} + 1       // Lend 2^c to the next digit
        else
            a'_i ← a_i
        end if
    end for
    return a'_0, . . . , a'_{b/c−1}
end function
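A direct Python transcription of the Signed-Digits pseudocode might read as follows. It assumes the digits arrive least significant first, matching the indexing a = Σ a_i 2^{ci}, and it raises an error where the pseudocode asserts; the function name is ours.

```python
def signed_digits(digits: list[int], c: int) -> list[int]:
    """Recode base-2^c digits (least significant first) into [-2^(c-1), 2^(c-1) - 1]."""
    out = list(digits)
    half, base = 1 << (c - 1), 1 << c
    for i in range(len(out)):
        if out[i] >= half:
            if i == len(out) - 1:
                raise OverflowError("final digit would overflow")
            out[i] -= base        # force this digit into the signed range
            out[i + 1] += 1       # lend 2^c to the next digit
    return out
```

For a = 1368 with c = 3 the unsigned digits, least significant first, are (0, 3, 5, 2). The recoding turns them into (0, 3, −3, 3), and indeed 3·8 − 3·64 + 3·512 = 1368.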
On the efficiency of conversion to signed digits
Signed-Digits works only if |G| fits comfortably into b bits.
Example. For BLS12-377, |G| is 253 bits, typically stored in 256 bits =⇒ 3 unused bits in
the final digit =⇒ overflow cannot occur.
Conversion to signed digits has a cost. Fortunately, that cost seems to be negligible.
What’s the most efficient way to compute signed digits?
In my stupid PoC I allocated separate memory for signed digits: n × b bits of additional
memory use. For n = 10^6 that's 32MB.
This memory can be saved if you compute ai on the fly. But it seems you need to compute
a0, . . . , ai−1 first, so there’s lots of repeated computation.
Careful! This problem becomes much worse when we generalize this idea.
Generalization: exploit cheap scalar multiplication
Can we do more?
Recall: The ability to cheaply compute (−1)G allowed us to reduce bucket accumulation
cost by a factor of 1/2.
Question. Suppose there are scalars µ1, . . . , µk (including 1) for which we can cheaply
compute µ1G, . . . , µkG. Can we reduce bucket accumulation cost by a factor of 1/k?
Dare to dream: approximate cost in group (+) ops becomes
(b/c)(n + 2^c/k)   or   (b/(c + log k))(n + 2^c)
The multiple of n improves by the factor c/(c + log k).
Example. If (c, k) = (16, 16) then that’s a ∼20% improvement for MSM!
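The improvement factor is a one-line calculation; the sketch below takes the logarithm to base 2 (an assumption consistent with the c-bit windows) and uses a function name of our choosing.

```python
import math


def n_multiple_factor(c: int, k: int) -> float:
    """Ratio c / (c + log2 k) by which the multiple of n shrinks with k cheap scalars."""
    return c / (c + math.log2(k))
```

With (c, k) = (16, 16) the ratio is 16/20 = 0.8, the 20% figure above; with (c, k) = (19, 2) it is 19/20 = 0.95, the 5% quoted earlier for signed digits alone.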
Example: one extra scalar, and its combo with −1
Suppose we can cheaply compute (−1)G, λG.
We can also cheaply compute −λG.
Think: (µ1, µ2, µ3, µ4) = (1, −1, λ, −λ) so k = 4 instead of 2
Suppose we can write scalars using digit set
{ 0, 1, . . . , 2^c,
−1, . . . , −2^c,
λ, . . . , λ2^c,
−λ, . . . , −λ2^c }
This digit set has size ∼ 4 × 2^c and requires only ∼ 2^c buckets.
In this case, the new λ doubled k. (Hooray!) But that won’t always happen.
Full generalization
Suppose we can cheaply multiply by µ1, . . . , µk.
Suppose we can write scalars using digit set
{ iµ_j : i = 0, . . . , 2^c, j = 1, . . . , k }
This digit set has size ∼ k2^c and requires only ∼ 2^c buckets.
Generalization: exploit cheap scalar multiplication Endomorphism multiplication for elliptic curves
Cheap multiplication for elliptic curves
Consider an elliptic curve of the form
E : y^2 = x^3 + b
for some b in a prime field F_p. (e.g. BLS curves, . . . ).
Let G ⊆ E(F_p) be a prime order subgroup. Let β be a cube root of 1 mod p. For any G ∈ G
the map φ : (x, y) → (βx, y) acts as
φ : G → λG
where λ is a cube root of 1 mod |G|.
Takeaway
Computing λG can be implemented with a single multiplication modulo p—only slightly more
costly than computing −G and much cheaper than a group (+) op.
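The endomorphism can be checked by hand on a tiny curve. The sketch below uses y^2 = x^3 + 3 over F_7, which has 13 points in total; this toy curve, the point G = (1, 2) and β = 2 are illustrative choices of ours, far too small for cryptography. The matching λ is found by brute force.

```python
P, B = 7, 3      # toy curve y^2 = x^3 + 3 over F_7 (assumption; 13 points in total)
BETA = 2         # a cube root of 1 mod 7, since 2^3 = 8 = 1 mod 7
G = (1, 2)       # a point on the curve: 2^2 = 4 = 1^3 + 3 mod 7
R = 13           # the group order, which is prime


def add(p1, p2):
    """Affine addition on y^2 = x^3 + B; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                        # p2 = -p1
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent (coefficient a = 0)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, pt):
    """Scalar multiplication by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc


def phi(pt):
    """The endomorphism (x, y) -> (beta * x, y): one multiplication mod P."""
    x, y = pt
    return (BETA * x % P, y)


def find_lambda():
    """Brute-force the scalar lambda with phi(G) = lambda * G."""
    target = phi(G)
    return next(lam for lam in range(1, R) if mul(lam, G) == target)
```

Here φ(G) = (2, 2) = 3G, so λ = 3, and 3^3 = 27 ≡ 1 mod 13 as expected. On a real curve λ is fixed once per curve, and only the single field multiplication in `phi` is paid per point.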
More cheap scalars?
φ gives us 2 new scalars λ, λ^2.
Combos with −1 yield a total of k = 6 cheap scalars:
(µ_1, . . . , µ_6) = (1, −1, λ, −λ, λ^2, −λ^2)
Observe: new scalars do not always double k
Example. λ · λ^2 = 1 is not another new scalar.
More endomorphisms like φ?
Galbraith-Lin-Scott find more in 2008/194. But you need to work in an extension field F_{p^k}
instead of F_p.
We need to convince everyone to switch to a different base field.
It might be worth it: Hu-Longa-Xu demonstrate a speed-up for single-scalar multiplication.
[link]
Generalization: exploit cheap scalar multiplication Strange digit sets
Complication: large multiples
−1 is a well-behaved scalar
Cheap to convert scalars to signed digits
Overflow is easy to quantify, easy to avoid.
Other scalars λ might be very large
Example. For BLS12-377, λ is 129 bits
How to convert scalars to digit sets containing λ?
Overflow might kill us
First attempt to generalize Signed-Digits
Suppose:
We have a scalar a whose ith digit a_i is large: 2^{c−1} ≤ a_i < 2^c.
Want to map it to a small bucket label a'_i with 0 ≤ a'_i < 2^{c−1}.
We can write
λa'_i = a_i + d2^c
for some integers a'_i, d with a'_i in the desired range.
Example. We saw (λ, d) = (−1, −1).
Then we can do the following:
Set a_i ← a_i + d2^c and a_{i+1} ← a_{i+1} − d.
(Pray that we do not overflow!)
During bucket accumulation for G ∈ G, add point λG into bucket a'_i.
Problem. If λ is large then d will be very large =⇒ digits a_i become very large =⇒ badness
Mitigation: borrow from higher digits
Write
λa'_i = a_i + d_1 2^c + · · · + d_ℓ 2^{ℓc}
for d_1, . . . , d_ℓ in some reasonable digit set.
Need to set several digits instead of just one:
a_{i+1} ← a_{i+1} − d_1
...
a_{i+ℓ} ← a_{i+ℓ} − d_ℓ
Problem: that’s fine until we run out of digits
We cannot completely avoid overflow:
Example. If |G| is 256 bits and λ is 128 bits (like BLS12-377) then this optimization can
be used for only the lower half of digits =⇒ 50% performance penalty.
Example. If |G|, λ have equal bit lengths then this optimization cannot be used at all.
Open problem. Can we do better?
A boost for precomputation
Precomputation, revisited
On slide 27 we observed a trade-off for precomputed storage: cost of the bucket method
can be reduced to
(b/c)(n + 2^c − k)
at a cost of storing kn extra group elements. That’s not a very good trade-off.
Precomputation gives a better trade-off when combined with the new method of signed
digits:
(b/c)(n + 2^c/k)
for (k − 1)n extra storage. (Thanks to Alexandre Belling.)
Precomputation + signed digits = significant improvement
Advantage. We can choose λ to play nice with signed digits (e.g. λ = 2, 3, . . . )
Faster, simpler conversion to the new digit set; less worry of overflow.
Each new precomputed multiple λG significantly increases the number k of cheap scalars
µ1, . . . , µk at our disposal.
Example.
Suppose we start with 6 scalars µ1, . . . , µ6. (Perhaps obtained from endomorphisms.)
Each additional precomputed multiple λ adds 6 new cheap scalars:
λµ1, . . . , λµ6
On slide 45 we estimated a ∼20% improvement for k ≥ 16. (Ignoring the cost to convert
digit sets!)
We could exceed this target k = 16 with only 2n extra storage.
Example. Even with only 2 scalars (1, −1) we can reach k = 16 with 7n extra storage.
I did not implement this improvement.
Get off my lawn
Summary of open problems
Find an efficient way to write scalars using the digit set from slide 47 without overflow
Prior art for SSM ([GLV], [HPX]) uses an SVP lattice solution. Can it be adapted to MSM?
Find more scalars µ that admit cheap scalar multiplication.
The best lead I know is 2008/194 and follow-ups.
Fin
Thank you!
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...diannepatricia
 
Nucleon TMD Contractions in Lattice QCD using QUDA
Nucleon TMD Contractions in Lattice QCD using QUDANucleon TMD Contractions in Lattice QCD using QUDA
Nucleon TMD Contractions in Lattice QCD using QUDAChristos Kallidonis
 
AskMeMetallurgy- Gate MT 2017 Question paper with solution
AskMeMetallurgy- Gate MT 2017 Question paper with solutionAskMeMetallurgy- Gate MT 2017 Question paper with solution
AskMeMetallurgy- Gate MT 2017 Question paper with solutionAskmemetallurgy.com
 

Similar to Multi-scalar multiplication: state of the art and new ideas (20)

sheet6.pdf
sheet6.pdfsheet6.pdf
sheet6.pdf
 
doc6.pdf
doc6.pdfdoc6.pdf
doc6.pdf
 
paper6.pdf
paper6.pdfpaper6.pdf
paper6.pdf
 
lecture5.pdf
lecture5.pdflecture5.pdf
lecture5.pdf
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
 
LalitBDA2015V3
LalitBDA2015V3LalitBDA2015V3
LalitBDA2015V3
 
Learning to Grow Structured Visual Summaries for Document Collections
Learning to Grow Structured Visual Summaries for Document CollectionsLearning to Grow Structured Visual Summaries for Document Collections
Learning to Grow Structured Visual Summaries for Document Collections
 
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
 
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
 
Lego like spheres and tori, enumeration and drawings
Lego like spheres and tori, enumeration and drawingsLego like spheres and tori, enumeration and drawings
Lego like spheres and tori, enumeration and drawings
 
2 funda.ppt
2 funda.ppt2 funda.ppt
2 funda.ppt
 
Massively distributed environments and closed itemset mining
Massively distributed environments and closed itemset miningMassively distributed environments and closed itemset mining
Massively distributed environments and closed itemset mining
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
 
Project seminar ppt_steelcasting
Project seminar ppt_steelcastingProject seminar ppt_steelcasting
Project seminar ppt_steelcasting
 
module3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdfmodule3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdf
 
Unit 2
Unit 2Unit 2
Unit 2
 
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...
 
Nucleon TMD Contractions in Lattice QCD using QUDA
Nucleon TMD Contractions in Lattice QCD using QUDANucleon TMD Contractions in Lattice QCD using QUDA
Nucleon TMD Contractions in Lattice QCD using QUDA
 
AskMeMetallurgy- Gate MT 2017 Question paper with solution
AskMeMetallurgy- Gate MT 2017 Question paper with solutionAskMeMetallurgy- Gate MT 2017 Question paper with solution
AskMeMetallurgy- Gate MT 2017 Question paper with solution
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Multi-scalar multiplication: state of the art and new ideas

  • 1. Multi-scalar multiplication: state of the art and new ideas presented at zkStudyClub Gus Gutoski zkTeam, ConsenSys R&D June 1, 2020 Gus Gutoski (zkTeam, ConsenSys R&D) MSM: SotA and new ideas June 1, 2020 1 / 57
  • 2. Introduction: The multi-scalar multiplication problem (MSM). Also known as: multi-exponentiation, multi-exp. Parameters: a cyclic group $\mathbb{G}$ whose order $|\mathbb{G}|$ has bit length $b$. (Example: BLS or BN elliptic curves have $|\mathbb{G}| \approx 2^{256}$, so $b = 256$.) Input: group elements $G_1, \dots, G_n$ in $\mathbb{G}$, called inputs, and integers $a_1, \dots, a_n$ between 0 and $|\mathbb{G}|$, called scalars. Output: the group element $a_1 G_1 + \cdots + a_n G_n$, called the output. Goal: minimize the number of group (+) operations as a function of $n$. Naive solution: use double-and-add to compute each $a_i G_i$, then add them all up. Expected group ops: $1.5bn \approx 384n$. Can we do better? (<sarcasm> No. Let's all go home. </sarcasm>)
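The naive solution on slide 2 can be sketched in a few lines. This is only an illustration under a loud assumption: the group is a toy stand-in (integers modulo a prime under addition) rather than an elliptic curve, and the names `P`, `add`, `double_and_add` and `naive_msm` are hypothetical.

```python
# Toy stand-in for an elliptic-curve group (an assumption for illustration):
# integers modulo a prime under addition, so "aG" is simply a*G mod P.
P = 2**61 - 1

ops = 0  # counts group (+) operations, doublings included


def add(x, y):
    global ops
    ops += 1
    return (x + y) % P


def double_and_add(a, G):
    """Compute aG by scanning the bits of a from most to least significant."""
    if a == 0:
        return 0  # identity element of the toy group
    acc = G  # accounts for the leading 1 bit
    for bit in bin(a)[3:]:  # the bits after the leading 1
        acc = add(acc, acc)  # double
        if bit == "1":
            acc = add(acc, G)  # add
    return acc


def naive_msm(scalars, points):
    """Compute each a_i G_i separately, then add them all up."""
    total = 0
    for a, G in zip(scalars, points):
        total = add(total, double_and_add(a, G))
    return total
```

For a random $b$-bit scalar this does about $b$ doublings and $b/2$ additions, which is where the $1.5bn$ estimate comes from.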
  • 3. Motivation: zero-knowledge provers. Motivation: zero-knowledge proofs (ZKPs).
  • 4. Motivation: zero-knowledge provers. Example: Groth16 protocol (gross simplification). Let $n$ denote the size of the secret inputs $x$ accepted by program $P$. The proving key for program $P$ contains (among other things) $n$ group elements $G_1, \dots, G_n$. Given a size-$n$ secret input $x$ for program $P$, the prover deduces integers $a_1, \dots, a_n$ and computes $G := a_1 G_1 + \cdots + a_n G_n$. The proof contains $G$ (among other things).
  • 5. Motivation: zero-knowledge provers. Are you motivated yet? Program $P$ takes size-$n$ secret input $\Rightarrow$ a ZKP prover must do an MSM of $n$ points. Example: Zcash spend program, $n \approx 4 \times 10^4$. Example: Rollup (a scalability solution), the bigger the $n$, the better. Goal: programs with $n = 10^7$, $10^8$, or more. MSM accounts for 80% of prover work. Justin Drake: "Focus on multi-exponentiation, forget about FFTs." (From the Zero Knowledge podcast, 2020-03-11.) Takeaway: multi-scalar multiplication (MSM) dominates prover costs. Prover costs dominate ZKP costs. Improvements for MSM immediately yield improvements in ZKP efficiency.
  • 6. Overview: State of the art: the bucket method. zkTeam's implementation in gnark: [GitHub]. BLS or BN curves ($b \approx 256$). Number of group (+) ops scales like $16n + (\text{constant})$. Compare: the naive method scales like $384n$. That's a 24× improvement!
  • 7. Overview: Overview of the bucket method. Succinct description in the paper 2012/549 (Section 4, "Overlap in the Pippenger approach"). High-level strategy: (1) Reduce one $b$-bit MSM to several $c$-bit MSMs for manageable $c \le b$. (2) [Interesting part.] Use tricks to solve the $c$-bit MSMs. (See next section.) (3) Combine the $c$-bit MSMs into the final $b$-bit MSM.
  • 8. Overview: Step 1. Reduce $b$-bit MSM to $c$-bit MSM (1). Choose $c \le b$. Write each scalar $a_1, \dots, a_n$ in binary. Partition the binary scalars into $c$-bit parts. Example: given $b = 12$, choose $c = 3$. Each 12-bit scalar $a$ is partitioned into 3-bit parts. Given the scalar $a = 1368$ we write $a = (2, 5, 3, 0)$, because 1368 in binary is 010 101 011 000 and the four parts are 010 = 2, 101 = 5, 011 = 3, 000 = 0.
  • 9. Overview: Step 1. Reduce $b$-bit MSM to $c$-bit MSM (2). Deduce $b/c$ instances of $c$-bit MSM from the partitioned scalars. Example, continued: $(b, c, b/c) = (12, 3, 4)$. Partition each scalar $a_i = (a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4})$. The 4 $c$-bit MSM instances $T_1, \dots, T_4$ are given by $T_1 := a_{1,1} G_1 + \cdots + a_{n,1} G_n$, and so on up to $T_4 := a_{1,4} G_1 + \cdots + a_{n,4} G_n$.
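Step 1 amounts to shifting and masking. A minimal sketch, assuming $c$ divides $b$ and again using integers modulo a prime as a toy group; `split_windows` and `window_msms` are hypothetical names, and `window_msms` evaluates each $T_j$ directly from its definition rather than with the bucket tricks that come later.

```python
def split_windows(a, b, c):
    """Split a b-bit scalar into b//c digits of c bits each, most significant first."""
    assert b % c == 0 and 0 <= a < 2**b
    mask = (1 << c) - 1
    return [(a >> shift) & mask for shift in range(b - c, -1, -c)]


def window_msms(scalars, points, b, c, modulus):
    """T_j = sum_i a_{i,j} G_i for each window j, in the toy group of integers mod `modulus`."""
    digits = [split_windows(a, b, c) for a in scalars]
    return [
        sum(d[j] * G for d, G in zip(digits, points)) % modulus
        for j in range(b // c)
    ]
```

With $(b, c) = (12, 3)$, `split_windows(1368, 12, 3)` gives the four 3-bit parts of the running example.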
  • 10. Overview: Step 3. Combine $c$-bit MSMs into the final $b$-bit MSM. The usual way: double $c$ times, then add. Example, continued: $(b, c, b/c) = (12, 3, 4)$. Combine $T_1, \dots, T_4$ into the final answer $T$: (1) $T \leftarrow T_1$. (2) For $j = 2, \dots, 4$: (2.1) $T \leftarrow 2^c T$ (double $c$ times); (2.2) $T \leftarrow T + T_j$. Final answer: $T = a_1 G_1 + \cdots + a_n G_n$. Computation cost in group (+) ops: $(b/c - 1)(c + 1) = b - c + \frac{b}{c} - 1$. (1)
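Step 3 is a Horner-style loop. The sketch below assumes the same toy group (integers modulo a prime) and a hypothetical helper name `combine_windows`; it also counts operations so the result can be compared with Eq. (1).

```python
def combine_windows(T, c, modulus):
    """Recombine window results, most significant first: T <- 2^c * T + T_j."""
    ops = 0
    acc = T[0]
    for Tj in T[1:]:
        for _ in range(c):  # double c times
            acc = (acc + acc) % modulus
            ops += 1
        acc = (acc + Tj) % modulus  # then add the next window
        ops += 1
    return acc, ops


# Windows for scalars (1368, 819) and toy points (5, 7) with (b, c) = (12, 3).
result, ops = combine_windows([17, 53, 57, 21], 3, 2**61 - 1)
```

Here `ops` comes out as $(b/c - 1)(c + 1) = 3 \times 4 = 12$, matching Eq. (1).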
  • 11. Overview: Step 2. Use tricks to solve the $c$-bit MSMs. Ready for tricks?
  • 12. Core: Each input goes into a bucket. $6G_1 + 2G_2 + 6G_3 + 1G_4 + 7G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$. [Figure: $G_1$ is placed in bucket 6.]
  • 13. Core: Each input goes into a bucket. $6G_1 + 2G_2 + 6G_3 + 1G_4 + 7G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$. [Figure: $G_1$ in bucket 6; $G_2$ is placed in bucket 2.]
  • 14. Core: Each input goes into a bucket. $6G_1 + 2G_2 + 6G_3 + 1G_4 + 7G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$. [Figure: $G_1$ in bucket 6, $G_2$ in bucket 2; $G_3$ is placed in bucket 6.]
  • 15. Core: Each input goes into a bucket. $6G_1 + 2G_2 + 6G_3 + 1G_4 + 7G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$. [Figure: $G_1$ and $G_3$ in bucket 6, $G_2$ in bucket 2; $G_4$ is placed in bucket 1.]
  • 16. Core: Each input goes into a bucket. $6G_1 + 2G_2 + 6G_3 + 1G_4 + 7G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$. [Figure: $G_1$ and $G_3$ in bucket 6, $G_2$ in bucket 2, $G_4$ in bucket 1; $G_5$ is placed in bucket 7.]
  • 17. Core: Sum the contents of each bucket. There are $2^c - 1$ buckets, numbered 1 to 7, with bucket sums $S_1, \dots, S_7$: $S_1 \leftarrow G_4 + G_9 + G_{14}$; $S_2 \leftarrow G_2 + G_6$; $S_3 \leftarrow G_7 + G_{13}$; $S_4 \leftarrow G_{10}$; $S_5 \leftarrow G_8$; $S_6 \leftarrow G_1 + G_3 + G_{11}$; $S_7 \leftarrow G_5 + G_{12}$. Expected cost to compute $S_1, \dots, S_7$ in group (+) ops: $n - (2^c - 1) = n - 2^c + 1$. (2)
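The bucket-filling step can be sketched as follows, using the 14 digits implied by the bucket sums on slide 17. The points (`101..114`), the toy group (integers modulo a prime) and the name `bucket_sums` are assumptions made only for this illustration.

```python
def bucket_sums(digits, points, c, modulus):
    """Return ([S_1, ..., S_{2^c - 1}], ops); S_k sums every point whose digit is k."""
    buckets = [None] * (2**c)  # index 0 unused: digit 0 contributes nothing
    ops = 0
    for d, G in zip(digits, points):
        if d == 0:
            continue
        if buckets[d] is None:
            buckets[d] = G % modulus  # first point in a bucket needs no addition
        else:
            buckets[d] = (buckets[d] + G) % modulus
            ops += 1
    return [0 if s is None else s for s in buckets[1:]], ops


# Digits of G_1..G_14 read off slide 17 (e.g. G_1, G_3, G_11 all land in bucket 6).
digits = [6, 2, 6, 1, 7, 2, 3, 5, 1, 4, 6, 7, 3, 1]
points = list(range(101, 115))  # hypothetical toy "points" G_i = 100 + i
S, ops = bucket_sums(digits, points, 3, 2**61 - 1)
```

With $n = 14$ inputs and all $2^c - 1 = 7$ buckets non-empty, the count is $14 - 7 = 7$ additions, as in Eq. (2).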
  • 18. Core: Combine the bucket sums to get the answer. The desired output $a_1 G_1 + \cdots + a_n G_n$ equals $S_1 + 2S_2 + 3S_3 + \cdots + 7S_7$. This is not obvious, but easy to check. This is another instance of MSM with inputs $S_1, \dots, S_7$ and scalars $1, \dots, 7$. The number of inputs $2^c - 1$ is fixed. The scalars $1, \dots, 2^c - 1$ are known in advance.
  • 19. Core: A fast way to combine the bucket sums. The desired sum $S_1 + 2S_2 + 3S_3 + \cdots + 7S_7$ is computed via $S_7 + (S_7 + S_6) + (S_7 + S_6 + S_5) + \cdots + (S_7 + S_6 + S_5 + S_4 + S_3 + S_2 + S_1)$. Computation cost in group (+) ops: $2 \times (2^c - 2) + 1 = 2^{c+1} - 3$. (3)
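The nested partial sums on slide 19 correspond to a running-sum loop: keep a running total $S_7 + S_6 + \cdots + S_k$ and add it into the answer at every step. A minimal sketch in the toy group of integers modulo a prime, with a hypothetical function name:

```python
def combine_buckets(S, modulus):
    """Compute S_1 + 2*S_2 + ... + m*S_m with additions only (no scalar multiplications)."""
    running = 0  # S_m + S_{m-1} + ... + S_k
    total = 0    # sum of all running values seen so far
    for s in reversed(S):
        running = (running + s) % modulus
        total = (total + running) % modulus
    return total


# Bucket sums for a toy instance with 7 buckets (hypothetical values).
S = [327, 208, 220, 110, 108, 315, 217]
answer = combine_buckets(S, 2**61 - 1)
```

Each $S_k$ ends up in exactly $k$ of the running values, which is why the result equals $\sum_k k S_k$ at a cost of roughly two additions per bucket.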
  • 20. Efficiency: Total cost for the bucket method: theory. Expected cost in group (+) ops (from Eqs. (1), (2), (3)): $\frac{b}{c}(n + 2^c - 2) + b - c + \frac{b}{c} - 1 \approx \frac{b}{c}(n + 2^c)$. (4) The minimum occurs at $c \approx \log n$. At first glance, asymptotic scaling looks like $O\!\left(b \frac{n}{\log n}\right)$. Beware! We must have $c \le b$, so we cannot choose $c \approx \log n$ when $n > 2^b$. For $n > 2^b$ scaling reverts to $O(n)$. Example ($b = 1$): $\frac{n}{\log n}$ scaling is impossible; $O(n)$ is the best we can do. Example ($b = 256$): $n$ can never reach $2^{256}$, so $\frac{n}{\log n}$ scaling is achievable.
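Eq. (4) is easy to evaluate numerically. The sketch below treats $b/c$ as a real number (an assumption; a real implementation needs a whole number of windows) and uses hypothetical names `bucket_cost` and `best_c`.

```python
def bucket_cost(n, b, c):
    """Eq. (4) before approximation: (b/c)(n + 2^c - 2) + b - c + b/c - 1."""
    windows = b / c
    return windows * (n + 2**c - 2) + b - c + windows - 1


def best_c(n, b):
    """Window size c in 1..b that minimises the modelled operation count."""
    return min(range(1, b + 1), key=lambda c: bucket_cost(n, b, c))


n, b = 10**7, 256
model_optimum = best_c(n, b)
cost_at_16 = bucket_cost(n, b, 16)
```

Under this model the optimum for $n = 10^7$ sits a few bits below $\log_2 n \approx 23$, and $c = 16$ is within about 15% of it. The model counts group operations only; it says nothing about memory.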
  • 21. Efficiency: Total cost for the bucket method: practice. Eq. (4): $\frac{b}{c}(n + 2^c)$. A large instance is $n = 10^7$ (so $\log n \approx 23$). For gnark with $b = 256$ we observed peak performance at $c = 16$. This yields the cost claimed on slide 6: $\frac{256}{16}(n + 2^{16}) = 16n + 2^{20}$. Puzzle: why $c = 16$ instead of $c = \log n$? Other concerns: memory use scales with $2^c$, so eventually memory is a bottleneck. There are fewer edge cases if $c$ divides $b$. Example: 256-bit scalars are stored in four 64-bit limbs, and it's annoying if a $c$-bit MSM straddles two limbs.
  • 22. Improvements: Ideas to improve upon the bucket method. (1) Parallelism: yes. (2) Precomputation: not really, unless combined with item 4. (3) Low Hamming-weight representations: no. (4) [New!] [Elliptic curves only] Signed digits, and generalization: yes.
  • 23. Parallelism: Parallelism for the bucket method. Is the bucket method faster on multiple cores? Yes! Natural boundary for parallel computation: each $c$-bit MSM is independent of the rest.
      scalar   b-bit decimal   c-bit binary parts
      a_1      1368            010 101 011 000
      a_2      819             001 100 110 011
      ...      ...             ...
      a_n      2709            101 010 010 101
      b/c cores:               1   2   3   4
    Easily make full use of up to $b/c$ cores. Example: $(b, c) = (256, 16) \Rightarrow$ easy use of up to $256/16 = 16$ cores. Increased memory use: each core uses $2^c$ memory.
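The per-window independence can be illustrated by submitting one task per window to a worker pool. This is a sketch of the task decomposition only, under several assumptions: a toy group (integers modulo a prime), $c$ dividing $b$, hypothetical names throughout, and Python threads, which do not give a real speed-up for CPU-bound work. It is not how gnark schedules its workers.

```python
from concurrent.futures import ThreadPoolExecutor

P = 2**61 - 1  # toy group: integers mod P under addition (assumption)


def window_msm(job):
    """Bucket method for one c-bit window; job = (digits, points, c)."""
    digits, points, c = job
    buckets = [0] * (2**c)  # each worker allocates its own 2^c buckets
    for d, G in zip(digits, points):
        if d:
            buckets[d] = (buckets[d] + G) % P
    running = total = 0
    for s in reversed(buckets[1:]):  # running-sum combination of the buckets
        running = (running + s) % P
        total = (total + running) % P
    return total


def parallel_msm(scalars, points, b, c):
    mask = (1 << c) - 1
    shifts = range(b - c, -1, -c)  # most significant window first
    jobs = [([(a >> s) & mask for a in scalars], points, c) for s in shifts]
    with ThreadPoolExecutor(max_workers=b // c) as pool:
        T = list(pool.map(window_msm, jobs))  # one independent task per window
    acc = 0
    for Tj in T:
        acc = (acc * 2**c + Tj) % P  # stands in for "double c times, then add"
    return acc
```

For example, `parallel_msm([1368, 819, 2709], [5, 7, 11], 12, 3)` runs four window tasks, one per column of the table on slide 23.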
  • 24. Parallelism: Even more parallelism for the bucket method? Sometimes we have more than $b/c$ cores available. Can we use all of them? Yes, but... Another natural boundary: partition the inputs. Inefficiency: 2 MSM instances of size $n/2$ cost more group (+) ops than 1 MSM instance of size $n$. gnark does not do this; parallelism is limited to $b/c$ cores.
  • 25. Precomputation: Precomputation for the bucket method. Inputs $G_1, \dots, G_n$ are known in advance. Can we use this to our advantage? Sort of. Idea: precompute a bunch of points and store them. Examples: for each input $G$: $2G, 3G, \dots, (2^c - 1)G$. For each input $G$: $2^k G, 2^{2k} G, \dots, 2^{mk} G$ for some $k, m$. Various subsets of inputs: $(G_1 + G_2 + G_3)$, $(G_1 + G_2)$, $(G_1 + G_3)$, $(G_2 + G_3)$. Goal: a smooth trade-off between precomputed storage and run time. Problem: large MSM instances already use most available memory. Example: for $n = 10^8$ gnark needs 58GB to store enough BLS12-377 curve points to produce a ZKP for a program with size-$n$ secret input. Perhaps we could store extra points on disk. But disk reads might be too slow. Experimentation needed.
  • 26. Precomputation: Naive precomputation for the bucket method. For each input $G$ precompute $2G, 3G, \dots, (2^c - 1)G$. Recall: the goal of $c$-bit MSM is to compute $a_1 G_1 + \cdots + a_n G_n$ for $c$-bit scalars $a_1, \dots, a_n$. If the $a_i G_i$ are already in storage then there's nothing left to do! No need to compute bucket sums $S_1, \dots, S_{2^c - 1}$. No need to accumulate bucket sums $S_1 + 2S_2 + \cdots + (2^c - 1)S_{2^c - 1}$. The number of group (+) ops reduces to $\frac{b}{c} n + b - c + \frac{b}{c} - 1 \approx \frac{b}{c} n$. Extra storage space required is $(2^c - 2)n$. Extreme example, $b = c = 256$: if we store $2^{256} n$ points then we need only $n$ group (+) ops. Realistic example, $(n, b, c) = (2^{23}, 256, 16)$: extra storage is 550 billion elliptic curve points!
  • 27. Precomputation: A trade-off for naive precomputation. For each input $G$ precompute $k$ points: $(2^c - k)G, \dots, (2^c - 1)G$. Use the bucket method only for the first $2^c - 1 - k$ buckets instead of $2^c - 1$: compute only $S_1, \dots, S_{2^c - 1 - k}$, and accumulate only $S_1 + 2S_2 + \cdots + (2^c - 1 - k)S_{2^c - 1 - k}$. Total cost in group (+) ops: $\frac{b}{c}(n + 2^c - k - 2) + b - c + \frac{b}{c} - 1 \approx \frac{b}{c}(n + 2^c - k)$. Extra storage: $kn$ points. Choose $k$ as big as you can store. Takeaway: small storage capacity $\Rightarrow$ negligible improvement. Non-negligible improvements can only be achieved with very large storage capacity.
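The takeaway becomes concrete when the cost and storage formulas from slides 26 and 27 are evaluated side by side. The function names below are hypothetical, and the numbers are model estimates, not measurements.

```python
def cost_with_precomputation(n, b, c, k):
    """Slide 27 cost model: group ops when the top k multiples of each input are stored."""
    assert 0 <= k <= 2**c - 2
    return (b / c) * (n + 2**c - k - 2) + b - c + b / c - 1


def extra_storage_points(n, k):
    """Number of extra points stored: k per input."""
    return k * n


n, b, c = 2**23, 256, 16
baseline = cost_with_precomputation(n, b, c, 0)

# Storing 1000 extra points per input (about 8.4 billion points in total)...
saved = baseline - cost_with_precomputation(n, b, c, 1000)
fraction_saved = saved / baseline  # ...saves only (b/c) * 1000 = 16000 operations

# Storing every multiple (k = 2^c - 2) needs roughly 550 billion points.
full_storage = extra_storage_points(n, 2**c - 2)
```

In this model 1000 extra points per input save about 0.01% of the group operations, which is the "negligible improvement" regime from the slide.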
  • 28. Low Hamming-weight representations: Lessons from speed-ups for single-scalar multiplication. Problem: (single-) scalar multiplication. Input: scalar $a$, group element $G$. Output: group element $aG$. Standard method: double-and-add. Cost increases with the Hamming weight of $a$. Examples with 8-bit scalars:
      scalar   binary     cost
      128      10000000   7 group (+) ops
      170      10101010   10 group (+) ops
      240      11110000   10 group (+) ops
      255      11111111   14 group (+) ops
    Idea: use a different (non-binary) encoding for scalars with lower average Hamming weight.
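The four costs in the table follow from a simple count: one doubling per bit after the leading bit, plus one addition per additional 1 bit. A small sketch (the function name is hypothetical):

```python
def double_and_add_cost(a):
    """Group ops used by left-to-right double-and-add to compute aG, for a > 0."""
    assert a > 0
    doublings = a.bit_length() - 1      # one per bit after the leading 1
    additions = bin(a).count("1") - 1   # one per further 1 bit (Hamming weight - 1)
    return doublings + additions


costs = {a: double_and_add_cost(a) for a in (128, 170, 240, 255)}
```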
  • 29. Low Hamming-weight representations: Example: non-adjacent form (NAF). Like binary, except digits can be $\{-1, 0, 1\}$. Requires group (−) ops to cost the same as group (+) ops. Elliptic curve groups have this property. Non-zero digits are never adjacent. Average Hamming density is 1/3. (Compare: 1/2 for binary.) Examples:
      scalar   NAF                     cost
      128      0 1 0 0 0 0 0 0 0       7 group (+) ops
      170      0 1 0 1 0 1 0 1 0       10 group (+) ops
      240      1 0 0 0 −1 0 0 0 0      8 group (+) ops, 1 group (−) op
      255      1 0 0 0 0 0 0 0 −1      8 group (+) ops, 1 group (−) op
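One standard way to compute the NAF is to look at the scalar modulo 4 whenever it is odd. A minimal sketch (the function name `naf` and the least-significant-first output order are choices made for this illustration):

```python
def naf(a):
    """Non-adjacent form of a >= 0, least significant digit first, digits in {-1, 0, 1}."""
    digits = []
    while a > 0:
        if a & 1:
            d = 2 - (a % 4)  # +1 if a = 1 (mod 4), -1 if a = 3 (mod 4)
            a -= d           # a is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        a >>= 1
    return digits


naf_255 = naf(255)  # least significant first: -1, then seven 0s, then 1
naf_240 = naf(240)
```

Reading the output in reverse gives the most-significant-first rows of the table, without the leading padding zero.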
  • 30. Low Hamming-weight representations: Example: double-base number system (DBNS). Scalars are written as a linear combination of $2^i 3^j$. (Compare: $2^i$ only for binary.) The digit set can be binary $\{0, 1\}$ or larger. Highly redundant: each scalar has many representations. Example: 127 has 783 representations with digit set $\{0, 1\}$. The minimum Hamming weight is 3. (Compare: 7 for binary.) There are 3 such representations: $127 = 2^2 3^3 + 2^1 3^2 + 2^0 3^0$; $127 = 2^2 3^3 + 2^4 3^0 + 2^0 3^1$; $127 = 2^5 3^1 + 2^0 3^3 + 2^2 3^0$. Hamming density for a $b$-bit scalar is $O(1/\log b)$. (Compare: 1/2 for binary.)
  • 31. Low Hamming-weight representations: Can low Hamming-weight representations improve the bucket method? No! The cost of the bucket method increases with the number of possible scalars. There are $2^c$ possible $c$-bit scalars $\Rightarrow$ we always need $2^c$ buckets, regardless of how those scalars are encoded. The cost of the bucket method could be reduced if we had a guarantee that some scalars never (or rarely) occur. Scalar encodings cannot provide such a guarantee. More advanced techniques can give such a guarantee. (Example: Pippenger's algorithm.) The big question is whether the cost of establishing the guarantee outweighs its benefits. That's a discussion for another talk...
  • 32. Improvement: exploit cheap group inversion. New idea: exploit cheap group inversion. Inspiration: elliptic curve group inversion is (almost) free. Given $G \in \mathbb{G}$, it's cheap to compute $-G$ via $(x, y) \mapsto (x, -y)$. Currently: $c$-bit scalars are written with digit set $\{0, \dots, 2^c - 1\}$. Instead, allow negative digits, e.g. $\{-2^{c-1}, \dots, 2^{c-1} - 1\}$. If scalar $a > 0$ for point $G$ then add $G$ to bucket $S_a$ as usual. If scalar $a < 0$ for point $G$ then add $-G$ to bucket $S_{|a|}$. No need for buckets $S_a$ for $a > 2^{c-1}$. We have eliminated half the buckets!
  • 33. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Example: 3-bit MSM with negative digits. $(-2)G_1 + 2G_2 + (-2)G_3 + 1G_4 + (-1)G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: $-G_1$ is placed in bucket 2.]
  • 34. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Example: 3-bit MSM with negative digits. $(-2)G_1 + 2G_2 + (-2)G_3 + 1G_4 + (-1)G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: $-G_1$ in bucket 2; $G_2$ is placed in bucket 2.]
  • 35. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Example: 3-bit MSM with negative digits. $(-2)G_1 + 2G_2 + (-2)G_3 + 1G_4 + (-1)G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: $-G_1$ and $G_2$ in bucket 2; $-G_3$ is placed in bucket 2.]
  • 36. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Example: 3-bit MSM with negative digits. $(-2)G_1 + 2G_2 + (-2)G_3 + 1G_4 + (-1)G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: $-G_1$, $G_2$ and $-G_3$ in bucket 2; $G_4$ is placed in bucket 1.]
  • 37. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Example: 3-bit MSM with negative digits. $(-2)G_1 + 2G_2 + (-2)G_3 + 1G_4 + (-1)G_5 + 2G_6 + 3G_7 + \cdots$. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: $-G_1$, $G_2$ and $-G_3$ in bucket 2, $G_4$ in bucket 1; $-G_5$ is placed in bucket 1.]
  • 38. Improvement: exploit cheap group inversion. Buckets for $c$-bit MSM, revisited. Sum the contents of each bucket. There are $2^{c-1}$ buckets, numbered 1 to 4, with bucket sums $S_1, \dots, S_4$. [Figure: bucket 1 holds $G_4, -G_5, G_9, -G_{12}, G_{14}$; bucket 2 holds $-G_1, G_2, -G_3, G_6, -G_{11}$; bucket 3 holds $G_7, -G_8, G_{13}$; bucket 4 holds $G_{10}$.] $S_1 \leftarrow G_4 - G_5 + G_9 - G_{12} + G_{14}$; $S_2 \leftarrow -G_1 + G_2 - G_3 + G_6 - G_{11}$; $S_3 \leftarrow G_7 - G_8 + G_{13}$; $S_4 \leftarrow G_{10}$.
  • 39. Improvement: exploit cheap group inversion. How much improvement? Combine the bucket sums to get the answer. Like before, the desired output $a_1 G_1 + \cdots + a_n G_n$ equals $S_1 + 2S_2 + 3S_3 + 4S_4$. Instead of $2^c - 1$ buckets, we now have only $2^{c-1}$ buckets. Bucket accumulation works exactly as before, except with half the buckets. Bucket accumulation costs drop from $\sim 2^c$ to $\sim 2^{c-1}$.
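Putting slides 32 to 39 together, one signed $c$-bit MSM can be sketched as below. The toy group (integers modulo a prime, where negation is just a sign flip), the points, and the name `signed_bucket_msm` are assumptions for illustration; the digits are the first seven from the example on slide 33.

```python
def signed_bucket_msm(digits, points, c, modulus):
    """One c-bit MSM with digits in [-2^(c-1), 2^(c-1) - 1], using 2^(c-1) buckets."""
    half = 2**(c - 1)
    buckets = [0] * (half + 1)  # buckets 1..2^(c-1); index 0 unused
    for d, G in zip(digits, points):
        assert -half <= d < half
        if d > 0:
            buckets[d] = (buckets[d] + G) % modulus
        elif d < 0:
            # Add -G to bucket |d|; on an elliptic curve -G is (x, -y).
            buckets[-d] = (buckets[-d] - G) % modulus
    running = total = 0
    for s in reversed(buckets[1:]):  # same running-sum trick, half as many buckets
        running = (running + s) % modulus
        total = (total + running) % modulus
    return total


digits = [-2, 2, -2, 1, -1, 2, 3]      # scalars of G_1..G_7 on slide 33
points = [101, 102, 103, 104, 105, 106, 107]  # hypothetical toy points
modulus = 2**61 - 1
result = signed_bucket_msm(digits, points, 3, modulus)
```

The bucket array has $2^{c-1} = 4$ live entries here instead of $2^c - 1 = 7$, which is the whole saving.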
  • 40. Improvement: exploit cheap group inversion. How much improvement? Total cost of the improved bucket method. New approximate cost in group (+) ops: $\frac{b}{c}(n + 2^{c-1})$ for your choice of c. Option 1: Use the same c as before and enjoy a 50% saving in bucket accumulation costs. Option 2: Set c ← c + 1, which reduces the multiple of n: $\frac{b}{c+1}(n + 2^{c})$.
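To get a feel for the two options, the approximate cost formulas can be evaluated directly. This is a rough sketch: the function names are hypothetical, the basic cost is taken as (b/c)(n + 2^c), and the sample values b = 256, c = 16, n = 10^6 are chosen here for illustration.

```python
def basic_cost(n, b, c):
    """Approximate group ops for the basic bucket method: (b/c)(n + 2^c)."""
    return (b / c) * (n + 2 ** c)


def signed_cost(n, b, c):
    """Approximate group ops with signed digits: (b/c)(n + 2^(c-1))."""
    return (b / c) * (n + 2 ** (c - 1))


n, b, c = 10 ** 6, 256, 16
baseline = basic_cost(n, b, c)
option1 = signed_cost(n, b, c)       # same c, half the buckets
option2 = signed_cost(n, b, c + 1)   # c + 1, same number of buckets as baseline

for name, cost in [("baseline", baseline), ("option 1", option1), ("option 2", option2)]:
    print(f"{name}: {cost:,.0f} ops ({100 * (1 - cost / baseline):.1f}% saved)")
```

These numbers count group operations only; they say nothing about memory effects or the other reasons one might have for keeping c fixed.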
  • 41. Improvement: exploit cheap group inversion. How much improvement? Under option 2 the multiple of n improves by the factor $\frac{c}{c+1}$. Example. c = 19 =⇒ 5% improvement (ignoring bucket accumulation cost). As discussed last time, there might be other reasons not to change c. Concrete improvement: I implemented a stupid PoC in gnark keeping c = 16 (option 1) and observed a 5.7% speed improvement for n = $10^6$ inputs. [GitHub]
  • 42. Improvement: exploit cheap group inversion. Scalars with negative digits. How to express scalars with negative digits? In the basic bucket method it's easy to partition b-bit scalars into c-bit parts. Example. (b, c) = (12, 3). Given a = 1368 we write a = (2, 5, 3, 0), most significant digit first: 1368 in binary is 010 101 011 000, i.e. digits 2, 5, 3, 0. In general: we are given for free $a_0, \dots, a_{b/c-1}$ from $\{0, \dots, 2^c - 1\}$ with $a = \sum_{i=0}^{b/c-1} a_i 2^{ci}$. We need to find $a'_0, \dots, a'_{b/c-1}$ from $\{-2^{c-1}, \dots, 2^{c-1} - 1\}$ with $a = \sum_{i=0}^{b/c-1} a'_i 2^{ci}$.
  • 43. Improvement: exploit cheap group inversion. Scalars with negative digits. Lending: like borrowing, except the opposite.
      function Signed-Digits(a_0, …, a_{b/c−1})
        for i ← 0, …, b/c − 1 do
          if a_i ≥ 2^(c−1) then
            assert: i ≠ b/c − 1        ▷ No overflow for final digit!
            a'_i ← a_i − 2^c           ▷ Force this digit into {−2^(c−1), …, 2^(c−1) − 1}
            a_{i+1} ← a_{i+1} + 1      ▷ Lend 2^c to the next digit
          else
            a'_i ← a_i
          end if
        end for
        return a'_0, …, a'_{b/c−1}
      end function
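A runnable version of the Signed-Digits idea might look as follows. It is a minimal sketch, not the PoC code: the helper names are invented here, digits are stored least significant first, and an overflow in the final digit raises an exception instead of being asserted away.

```python
def unsigned_digits(a, b, c):
    """Split a b-bit scalar into b/c unsigned c-bit digits, least significant first."""
    assert b % c == 0, "this sketch assumes c divides b"
    mask = (1 << c) - 1
    return [(a >> (c * i)) & mask for i in range(b // c)]


def signed_digits(digits, c):
    """Map digits from {0, ..., 2^c - 1} into {-2^(c-1), ..., 2^(c-1) - 1}."""
    half, full = 1 << (c - 1), 1 << c
    out = list(digits)
    for i in range(len(out)):
        if out[i] >= half:
            if i == len(out) - 1:
                raise OverflowError("final digit overflows; scalar needs spare top bits")
            out[i] -= full      # force this digit into the signed range
            out[i + 1] += 1     # lend 2^c to the next digit
    return out


def recombine(digits, c):
    """Evaluate sum(d_i * 2^(c*i)) to check a digit expansion."""
    return sum(d << (c * i) for i, d in enumerate(digits))


# The (b, c) = (12, 3) example: 1368 has unsigned digits 0, 3, 5, 2 (LSB first).
u = unsigned_digits(1368, 12, 3)
s = signed_digits(u, 3)
print(u, s, recombine(s, 3))
```

Note that a carry can ripple: a digit equal to 2^c − 1 that receives a carry becomes 2^c, which is then reduced to 0 with a further carry.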
  • 44. Improvement: exploit cheap group inversion. Scalars with negative digits. On the efficiency of conversion to signed digits. Signed-Digits works only if |G| fits comfortably into b bits. Example. BLS12-377 has a 253-bit |G|, typically stored in 256 bits =⇒ 3 unused bits in the final digit =⇒ overflow cannot occur. Conversion to signed digits has a cost. Fortunately, that cost seems to be negligible. What's the most efficient way to compute signed digits? In my stupid PoC I allocated separate memory for signed digits: n × b bits of additional memory use. For n ≥ $10^6$ (with b = 256) that's at least 32 MB. This memory can be saved if you compute $a'_i$ on the fly. But it seems you need to compute $a'_0, \dots, a'_{i-1}$ first, so there's lots of repeated computation. Careful! This problem becomes much worse when we generalize this idea.
  • 45. Generalization: exploit cheap scalar multiplication. Can we do more? Recall: The ability to cheaply compute (−1)G allowed us to reduce bucket accumulation cost by a factor of 1/2. Question. Suppose there are scalars $\mu_1, \dots, \mu_k$ (including 1) for which we can cheaply compute $\mu_1 G, \dots, \mu_k G$. Can we reduce bucket accumulation cost by a factor of 1/k? Dare to dream: approximate cost in group (+) ops becomes $\frac{b}{c}(n + 2^c/k)$ or $\frac{b}{c + \log k}(n + 2^c)$. The multiple of n improves by the factor $\frac{c}{c + \log k}$. Example. If (c, k) = (16, 16) then that's a ∼20% improvement for MSM!
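The two cost expressions on slide 45 are the same formula at different window sizes. The short derivation below spells this out, assuming the logarithm is base 2 (which is what makes the 20% figure come out) and writing c' for the enlarged window size, a symbol introduced here for illustration.

```latex
% Start from the k-scalar cost at window size c' and set c' = c + \log_2 k:
\frac{b}{c'}\left(n + \frac{2^{c'}}{k}\right)
  = \frac{b}{c + \log_2 k}\left(n + \frac{2^{c} \cdot k}{k}\right)
  = \frac{b}{c + \log_2 k}\left(n + 2^{c}\right).

% Ratio of the multiple of n to the baseline b/c:
\frac{b/(c + \log_2 k)}{b/c} = \frac{c}{c + \log_2 k},
\qquad
(c, k) = (16, 16) \;\Rightarrow\; \frac{16}{16 + 4} = 0.8 .
```

So enlarging the window by log k bits keeps the bucket count at 2^c while cutting the number of windows, and for (c, k) = (16, 16) the multiple of n shrinks by about 20%.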
  • 46. Generalization: exploit cheap scalar multiplication. Example: one extra scalar, and its combo with −1. Suppose we can cheaply compute (−1)G and λG. Then we can also cheaply compute −λG. Think: $(\mu_1, \mu_2, \mu_3, \mu_4) = (1, -1, \lambda, -\lambda)$, so k = 4 instead of 2. Suppose we can write scalars using the digit set $\{0, 1, \dots, 2^c, -1, \dots, -2^c, \lambda, \dots, \lambda 2^c, -\lambda, \dots, -\lambda 2^c\}$. This digit set has size ∼ $4 \times 2^c$ and requires only ∼ $2^c$ buckets. In this case, the new λ doubled k. (Hooray!) But that won't always happen.
  • 47. Generalization: exploit cheap scalar multiplication. Full generalization. Suppose we can cheaply multiply by $\mu_1, \dots, \mu_k$. Suppose we can write scalars using the digit set $\bigcup_{i = 0, \dots, 2^c;\ j = 1, \dots, k} \{ i\mu_j \}$. This digit set has size ∼ $k 2^c$ and requires only ∼ $2^c$ buckets.
  • 48. Generalization: exploit cheap scalar multiplication. Endomorphism multiplication for elliptic curves. Cheap multiplication for elliptic curves: consider an elliptic curve of the form $E : y^2 = x^3 + b$ for some b in a prime field $\mathbb{F}_p$ (e.g. BLS curves, . . . ). Let $\mathbb{G} \subseteq E(\mathbb{F}_p)$ be a prime order subgroup. Let β be a nontrivial cube root of 1 mod p. For any $G \in \mathbb{G}$ the map $\varphi : (x, y) \mapsto (\beta x, y)$ acts as $\varphi : G \mapsto \lambda G$, where λ is a cube root of 1 mod $|\mathbb{G}|$. Takeaway: computing λG can be implemented with a single multiplication modulo p, which is only slightly more costly than computing −G and much cheaper than a group (+) op.
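The endomorphism claim can be checked by brute force on a toy curve. The sketch below assumes a curve chosen here purely for illustration, y^2 = x^3 + 3 over F_7, which has 13 points (a prime), with β = 2; none of these parameters come from the talk, and real curves such as BLS12-377 are far too large for this kind of search.

```python
P_MOD = 7    # toy field prime, p = 1 mod 3 so cube roots of unity exist
B = 3        # toy curve y^2 = x^3 + 3 (illustrative assumption)
BETA = 2     # 2^3 = 8 = 1 mod 7
ORDER = 13   # number of points on this toy curve, including infinity


def add(P, Q):
    """Affine point addition; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if P == Q:
        s = 3 * x1 * x1 * pow((2 * y1) % P_MOD, P_MOD - 2, P_MOD)
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, P_MOD - 2, P_MOD)
    s %= P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def mul(k, P):
    """Double-and-add scalar multiplication."""
    result, addend = None, P
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def phi(P):
    """The cheap map (x, y) -> (beta * x, y): one multiplication mod p."""
    return None if P is None else ((BETA * P[0]) % P_MOD, P[1])


G = (1, 2)  # on the curve: 2^2 = 4 = 1 + 3 mod 7
lam = next(l for l in range(1, ORDER) if mul(l, G) == phi(G))
points = [mul(k, G) for k in range(1, ORDER)]
agrees_everywhere = all(mul(lam, Q) == phi(Q) for Q in points)
```

The search finds a single λ that matches φ on every point of the group, and that λ is a cube root of 1 modulo the group order.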
  • 49. Generalization: exploit cheap scalar multiplication. Endomorphism multiplication for elliptic curves. More cheap scalars? φ gives us 2 new scalars λ, $\lambda^2$. Combos with −1 yield a total of k = 6 cheap scalars: $(\mu_1, \dots, \mu_6) = (1, -1, \lambda, -\lambda, \lambda^2, -\lambda^2)$. Observe: new scalars do not always double k. Example. $\lambda \cdot \lambda^2 = 1$ is not another new scalar. More endomorphisms like φ? Galbraith-Lin-Scott find more in 2008/194. But you need to work in an extension field $\mathbb{F}_{p^k}$ instead of $\mathbb{F}_p$. We would need to convince everyone to switch to a different base field. It might be worth it: Hu-Longa-Xu demonstrate a speed-up for single-scalar multiplication. [link]
  • 50. Generalization: exploit cheap scalar multiplication. Strange digit sets. Complication: large multiples. −1 is a well-behaved scalar: it is cheap to convert scalars to signed digits, and overflow is easy to quantify and easy to avoid. Other scalars λ might be very large. Example. For BLS12-377, λ is 129 bits. How to convert scalars to digit sets containing λ? Overflow might kill us.
  • 51. Generalization: exploit cheap scalar multiplication. Strange digit sets. First attempt to generalize Signed-Digits. Suppose: we have a scalar a whose ith digit $a_i$ is large: $2^{c-1} \le a_i < 2^c$. We want to map it to a small bucket label $a'_i$ with $0 \le a'_i < 2^{c-1}$. We can write $\lambda a_i = a'_i + d 2^c$ for some integers $a'_i$, d with $a'_i$ in the desired range. Example. We saw (λ, d) = (−1, −1). Then we can do the following: Set $a_i \leftarrow a_i + d 2^c$ and $a_{i+1} \leftarrow a_{i+1} - d$. (Pray that we do not overflow!) During bucket accumulation for $G \in \mathbb{G}$, add point λG into bucket $a'_i$. Problem. If λ is large then d will be very large =⇒ digits $a_i$ become very large =⇒ badness.
  • 52. Generalization: exploit cheap scalar multiplication. Strange digit sets. Mitigation: borrow from higher digits. Write $\lambda a_i = a'_i + d_1 2^c + \cdots + d_\ell 2^{\ell c}$ for $d_1, \dots, d_\ell$ in some reasonable digit set. We need to set several digits instead of just one: $a_{i+1} \leftarrow a_{i+1} - d_1$, . . . , $a_{i+\ell} \leftarrow a_{i+\ell} - d_\ell$.
  • 53. Generalization: exploit cheap scalar multiplication. Strange digit sets. Problem: that's fine until we run out of digits. We cannot completely avoid overflow. Example. If |G| is 256 bits and λ is 128 bits (like BLS12-377) then this optimization can be used for only the lower half of digits =⇒ 50% performance penalty. Example. If |G|, λ have equal bit lengths then this optimization cannot be used at all. Open problem. Can we do better?
  • 54. A boost for precomputation. Precomputation, revisited. On slide 27 we observed a trade-off for precomputed storage: the cost of the bucket method can be reduced to $\frac{b}{c}(n + 2^c - k)$ at a cost of storing kn extra group elements. That's not a very good trade-off. Precomputation gives a better trade-off when combined with the new method of signed digits: $\frac{b}{c}(n + 2^c/k)$ for (k − 1)n extra storage. (Thanks to Alexandre Belling.)
  • 55. A boost for precomputation. Precomputation + signed digits = significant improvement. Advantage. We can choose λ to play nice with signed digits (e.g. λ = 2, 3, . . . ): faster, simpler conversion to the new digit set, and less worry about overflow. Each new precomputed multiple λG significantly increases the number k of cheap scalars $\mu_1, \dots, \mu_k$ at our disposal. Example. Suppose we start with 6 scalars $\mu_1, \dots, \mu_6$. (Perhaps obtained from endomorphisms.) Each additional precomputed multiple λ adds 6 new cheap scalars: $\lambda\mu_1, \dots, \lambda\mu_6$. On slide 45 we estimated a ∼20% improvement for k ≥ 16. (Ignoring the cost to convert digit sets!) We could exceed this target k = 16 with only 2n extra storage. Example. Even with only 2 scalars (1, −1) we can reach k = 16 with 7n extra storage. I did not implement this improvement.
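The storage arithmetic on slide 55 can be reproduced with a tiny helper. This is a sketch under the slide's own counting assumption, namely that each precomputed multiple contributes as many new cheap scalars as we started with and costs n stored group elements; the function name is hypothetical.

```python
def extra_multiples_needed(base_k, target_k):
    """Smallest m such that base_k * (m + 1) >= target_k.

    base_k: cheap scalars available for free (e.g. 6 from endomorphisms and -1)
    m: number of precomputed multiples per input, i.e. m * n extra stored elements
    """
    m = 0
    while base_k * (m + 1) < target_k:
        m += 1
    return m


with_endomorphisms = extra_multiples_needed(6, 16)   # start from 6 cheap scalars
negation_only = extra_multiples_needed(2, 16)        # start from (1, -1) only
print(with_endomorphisms, negation_only)
```

With 6 starting scalars, two precomputed multiples give k = 18, which exceeds the target; with only (1, −1), seven multiples give exactly k = 16.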
  • 56. Get off my lawn. Summary of open problems. (1) Find an efficient way to write scalars using the digit set from slide 47 without overflow. Prior art for SSM ([GLV], [HPX]) uses an SVP lattice solution; can it be adapted to MSM? (2) Find more scalars µ that admit cheap scalar multiplication. The best lead I know is 2008/194 and follow-ups.
  • 57. Get off my lawn. Fin. Thank you!