This week's session covers new work from Justin Thaler (Georgetown) and collaborators on Lasso/Jolt.
Lasso is a new lookup argument (more on this below) with a dramatically faster prover. Our initial implementation provides roughly a 10x speedup over the lookup argument in the popular, well-engineered halo2 toolchain; we expect improvements of around 40x when optimizations are complete. To demonstrate, we’re releasing the open source implementation, written in Rust. We invite the community to help us make Lasso as fast and robust as possible.
The second, accompanying innovation to Lasso is Jolt, a new approach to zkVM (zero knowledge virtual machine) design that builds on Lasso. Jolt realizes the “lookup singularity” – a vision initially laid out by Barry Whitehat of the Ethereum Foundation for simpler tooling and lightweight, lookup-centric circuits (more on why this matters below). Relative to existing zkVMs, we expect Jolt to achieve similar or better performance – and importantly, a more streamlined and accessible developer experience. With Jolt, it will be easier for developers to write fast SNARKs in their high-level language of choice.
Lasso: https://people.cs.georgetown.edu/jthaler/Lasso-paper.pdf
Jolt: https://people.cs.georgetown.edu/jthaler/Jolt-paper.pdf
1. Lasso + Jolt: A Deep Dive
Justin Thaler (Georgetown University and a16z crypto research)
Joint work with: Srinath Setty (Microsoft Research), Riad Wahby (CMU), Arasu Arun (NYU), Sam Ragsdale (a16z), Michael Zhu (a16z)
2. Presentation Outline
• What are lookup arguments?
• What are Lasso/Jolt?
• Lasso in detail.
• Jolt in detail.
• How to think about Lasso as a tool.
• Where else will lookup arguments be useful, outside of zkVMs?
4. Lookup arguments: what are they?
• Unindexed lookup argument:
  • Lets P commit to a vector a ∈ F^m, and prove that every entry of a resides in a pre-determined table t ∈ F^N.
  • For every entry a_i there is an index b_i such that a_i = t[b_i].
• Indexed lookup argument:
  • Lets P commit to vectors a, b ∈ F^m, and prove that a_i = t[b_i] for all i.
  • We call a the vector of lookup values and b the indices.
• Unindexed lookups are proofs of a subset relationship (i.e., batch set-membership proofs).
  • a specifies a subset of t.
• Indexed lookups are reads into a read-only memory.
  • t is the memory, and a_i = t[b_i] is a read of memory cell b_i.
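The two relations can be sanity-checked with a few lines of Python (an illustrative toy with made-up values; a real lookup argument proves these relations about committed vectors, it does not just evaluate them in the clear):

```python
# Toy illustration of the two lookup relations (values are made up).
t = [7, 11, 13, 17]   # pre-determined public table
a = [13, 7, 13]       # committed vector of lookup values
b = [2, 0, 2]         # committed vector of indices (indexed variant)

# Unindexed lookup: every entry of a resides somewhere in t.
assert all(v in t for v in a)

# Indexed lookup: a_i = t[b_i] for all i (reads into read-only memory t).
assert all(a[i] == t[b[i]] for i in range(len(a)))
```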
5. Lasso+Jolt: what are they?
• Lasso: new family of (indexed) lookup arguments.
  • P is an order of magnitude faster than in prior works.
  • Addresses key bottleneck for P: commitment costs.
    • P commits to fewer field elements, and all of them are small.
  • No commitment to t needed for many tables.
    • Support for gigantic tables (decomposable, or LDE-structured).
  • P commitment costs: O(c(m + N^{1/c})) field elements.
• Jolt: new zkVM technique.
  • Much lower commitment costs for P than prior works.
  • Primitive instructions are implemented via one lookup into the entire evaluation table of the instruction.
10. Lasso costs in detail
• For m indexed lookups into a table of size N, using parameter c:
  • P commits to 3cm + cN^{1/c} field elements.
  • All of them are small, say, in the set {0, 1, …, m}.
• With MSM-based polynomial commitment schemes, P does (roughly) just one group operation per (small) committed field element.
  • Examples: KZG-based, IPA/Bulletproofs, Hyrax, Dory, etc.
• c = 1 is a special case. I call it “Basic-Lasso”.
  • P commits to only m + N field elements.
  • Even amongst these m + N, many are 0.
    • Hence “free” to commit to with MSM-based schemes.
  • Specifically, at most 2m are non-zero.
    • If every read is of a different table cell, m of the field elements are equal to 1, and the rest are 0s.
• V costs:
  • O(log m) field ops and hash evaluations (from Fiat-Shamir).
  • Plus one evaluation proof for a committed polynomial of size N^{1/c}.
  • Low enough V costs to reduce further via composition/recursion.
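To get a feel for the 3cm + cN^{1/c} trade-off, here is a back-of-the-envelope calculation in Python (the parameter values m = 2^20 and N = 2^128 are hypothetical, chosen only for intuition):

```python
# Illustrative evaluation of the prover commitment-cost formula
# 3*c*m + c*N^(1/c) from the slide. Parameter values are hypothetical.
def committed_field_elements(m: int, N: int, c: int) -> int:
    return 3 * c * m + c * round(N ** (1 / c))

m, N = 2**20, 2**128
costs = {c: committed_field_elements(m, N, c) for c in (2, 4, 8, 16)}

# Larger c shrinks the c*N^(1/c) term but grows the 3*c*m term.
assert costs[4] < costs[2]   # N^(1/2) = 2^64 dominates at c = 2
assert costs[8] < costs[4]
```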
14. Lasso applied to huge tables: c > 1
• Most big lookup tables arising in practice are decomposable.
  • Can answer an (indexed) lookup into the big table of size N by performing roughly c lookups into tables of size N^{1/c} and “collating” the results.
• Lasso handles the collation with the sum-check protocol.
  • No extra commitment costs for P.
• Can view Lasso with c > 1 as a generic reduction from lookups into big, decomposable tables to lookups into small tables.
  • Can use any lookup argument for the small tables.
  • Lasso uses Basic-Lasso on the small tables.
• Major caveat: the small-table lookup argument must be indexed.
  • There are known transformations from unindexed lookup arguments to indexed ones.
  • But they either do not preserve “smallness” of table entries or do not preserve decomposability of the big table.
    • Because they “pack” indices and values together into a single field element.
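A minimal sketch of decomposability, using a hypothetical toy table: bitwise-AND on two 8-bit inputs (big table of size N = 2^16), answered via c = 2 lookups into an AND table on 4-bit halves (size N^{1/2} = 2^8), with the results collated by a weighted sum:

```python
# Subtable: AND of two 4-bit chunks (a, b), indexed by (a << 4) | b.
sub = [(i >> 4) & (i & 0xF) for i in range(256)]   # 2^8 entries

def big_lookup(x: int, y: int) -> int:
    """One lookup into the 2^16 AND table, answered by two small lookups."""
    hi = sub[((x >> 4) << 4) | (y >> 4)]     # AND of high 4-bit chunks
    lo = sub[((x & 0xF) << 4) | (y & 0xF)]   # AND of low 4-bit chunks
    return 16 * hi + lo                      # collation: 16*hi + lo

# Matches the big table everywhere.
assert all(big_lookup(x, y) == (x & y) for x in range(256) for y in range(256))
```

In Lasso, the collation step (here the weighted sum) is proven with the sum-check protocol, at no extra commitment cost for P.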
15. Background: Grand Product Arguments
• All known lookup arguments use something called a grand product argument.
  • A SNARK for proving the product of n committed values.
• Popular grand product arguments today have P commit to n extra values (partial products).
  • This is unnecessary.
• T13 [Thaler 2013]: gave an optimized variant of the GKR protocol (a sum-check-based interactive proof for circuit evaluation).
  • No commitment costs for P.
  • P does a linear number of field operations.
  • Proof size/V time is O(log² n) field ops (and hash evaluations from Fiat-Shamir).
  • Much less than FRI, concretely and asymptotically.
• [Lee, Setty 2019] reduce V costs to about O(log n) with a slight increase in commitment costs for P.
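For intuition, the circuit underlying GKR-style grand product arguments is a binary tree of multiplications. The sketch below only computes that tree; the point of T13 is that P evaluates this circuit and proves it via sum-check, without ever committing to the internal layers:

```python
# Illustrative product tree (the circuit behind GKR-style grand products).
# Layer 0 is the committed values; each layer halves via pairwise products.
def product_tree(values):
    layers = [list(values)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([prev[i] * prev[i + 1] for i in range(0, len(prev), 2)])
    return layers

layers = product_tree([3, 1, 4, 1, 5, 9, 2, 6])
assert layers[-1][0] == 3 * 1 * 4 * 1 * 5 * 9 * 2 * 6   # the grand product
assert len(layers) == 4   # log2(8) internal layers plus the inputs
```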
18. Key Performance Insight in Basic-Lasso
• For many existing lookup arguments, if you swap out the invoked grand product
argument for T13, P commits only to small field elements.
• See upcoming work on LogUp by Papini and Haböck.
• More involved than just a simple swap of a grand product argument.
• Remember: Lasso/Jolt need an indexed lookup argument that plays nicely with
collating small-table lookup results into big-table results.
• Technical takeaway: The community has still not fully internalized the power of
sum-check to avoid commitment costs for P.
• See my second a16z talk for details on how Basic-Lasso works.
• Last part of this talk: more info about how to think of Lasso as a tool.
22. Jolt: A new front-end paradigm
• Say P claims to have run a computer program for m steps.
• Say the program is written in the assembly language for a VM.
  • Popular VMs targeted: RISC-V, Ethereum Virtual Machine (EVM).
• Today, front-ends produce a circuit that, for each step of the computation:
  1. Figures out what instruction to execute at that step.
  2. Executes that instruction.
• Lasso lets one replace Step 2 with a single lookup.
• For each instruction, the table stores the entire evaluation table of the instruction.
  • If instruction f operates on two 64-bit inputs, the table stores f(x, y) for every pair of 64-bit inputs (x, y).
  • This table has size 2^128.
• Jolt shows that all RISC-V instructions are decomposable.
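A toy version of "executing an instruction = one lookup", with a hypothetical 4-bit XOR instruction so that the evaluation table (size 2^8) actually fits in memory. Real Jolt uses 64-bit operands, whose 2^128-entry tables are never materialized thanks to decomposability:

```python
# Hypothetical 4-bit XOR "instruction": its entire evaluation table is a
# lookup table indexed by the packed operands (x << W) | y.
W = 4
table = [x ^ y for x in range(2**W) for y in range(2**W)]  # size 2^(2W)

def execute_xor(x: int, y: int) -> int:
    # "Executing" the instruction is a single table lookup.
    return table[(x << W) | y]

assert execute_xor(0b1010, 0b0110) == 0b1100
assert execute_xor(7, 7) == 0
```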
23. Jolt in a picture
The lookup query is split into “chunks” which are fed into different subtables. The prover provides these chunks as advice; they are c in number for some small constant c, and hence approximately W/c or 2W/c bits long, depending on the structure of z. The constraint system must verify that the chunks correctly constitute z, but need not perform any range checks, as the Lasso algorithm itself later implicitly enforces these on the chunks.
24. Jolt in context
• Jolt is a realization of Barry Whitehat’s “lookup singularity” vision (?)
• Auditability/Simplicity/Extensibility benefits.
• Performance benefits.
• A qualitatively different way of building zkVMs.
• Yet with many similarities to things people are already doing.
• People are already computing functions like bitwise-AND by
doing several lookups into small tables and combining the
results.
• Differences/keys to Jolt:
• The new small-table lookup argument is much faster for P.
• The new small-table lookup argument is naturally indexed.
• The collation technique is much faster for P.
• “Free” to multiply and add results of small-table lookups.
• These differences let us do almost everything in VM emulation
with lookups.
27. Example 1: Bitwise-AND
• Decomposable: to compute bitwise-AND of two 64-bit inputs x, y:
  • Break each of x, y into, say, c = 8 chunks of 8 bits.
  • Compute the bitwise-AND of each chunk.
  • Concatenate the results.
  • i.e., output is ∑_{i=1}^{8} 2^{8(i−1)} · bitwiseAND(x_i, y_i).
• LDE-structured:
  • bitwiseAND(x, y) = ∑_{i=1}^{64} 2^{i−1} · x_i · y_i.
  • This is a multilinear polynomial that can be evaluated with under 200 field operations.
28. Example 1: Bitwise-AND
• Decomposable: to compute bitwise-AND of two 64-bit inputs x, y:
  • Break each of x, y into, say, c = 8 chunks of 8 bits.
  • Compute the bitwise-AND of each chunk.
  • Concatenate the results.
  • i.e., output is ∑_{i=1}^{8} 2^{8(i−1)} · bitwiseAND(x_i, y_i).
• Avoiding an honest party committing to the sub-table:
  • bitwiseAND(x_i, y_i) = ∑_{j=1}^{8} 2^{j−1} · x_{i,j} · y_{i,j}.
  • This is a multilinear polynomial that can be evaluated with under 25 field operations.
  • The only information the Lasso V needs about the sub-table is one evaluation of this polynomial.
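Both identities on this slide can be checked directly, with Python integers standing in for field elements (a sanity check of the algebra, not of the Lasso protocol itself):

```python
# Chunked collation of bitwise-AND: sum of 2^{8*i} * AND(x_i, y_i) over
# c = 8 chunks of w = 8 bits (the slide indexes chunks from 1, code from 0).
def and_via_chunks(x: int, y: int, c: int = 8, w: int = 8) -> int:
    total = 0
    for i in range(c):
        xi = (x >> (w * i)) & 0xFF
        yi = (y >> (w * i)) & 0xFF
        total += 2 ** (w * i) * (xi & yi)
    return total

# Multilinear formula for one 8-bit sub-table:
# bitwiseAND(x_i, y_i) = sum_j 2^{j-1} * x_{i,j} * y_{i,j} over chunk bits.
def and_multilinear(xi: int, yi: int, w: int = 8) -> int:
    return sum(2 ** j * ((xi >> j) & 1) * ((yi >> j) & 1) for j in range(w))

x, y = 0xDEADBEEFDEADBEEF, 0x0123456789ABCDEF
assert and_via_chunks(x, y) == x & y
assert and_multilinear(0b10110101, 0b11100110) == 0b10100100
```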
30. Example 2: RISC-V Addition
• For adding two 64-bit numbers x, y, RISC-V prescribes that they be added and any “overflow bit” be ignored.
• Jolt computes z = x + y in the finite field (via one constraint added to the ancillary R1CS), and then uses lookups to identify the overflow bit, if any, and adjust the result accordingly.
• P commits to the “limb decomposition” (b_1, …, b_c) of the field element z = x + y.
  • Let M = 2^{64/c} denote the max value any limb should take.
  • A constraint is added to the R1CS to confirm z = ∑_{i=1}^{c} M^{i−1} · b_i, and each b_i is range-checked via a lookup into the subtable that stores {0, …, M − 1}.
  • These checks guarantee that (b_1, …, b_c) is really the prescribed limb decomposition of z.
• To identify the overflow bit, one can do a lookup at index b_c into a table whose i-th entry spits out the relevant high-order bit of i.
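A sketch of the addition trick with hypothetical parameters c = 4 (so M = 2^16). Plain integer addition stands in for the field constraint, the first c − 1 limbs are range-checked against {0, …, M − 1}, the top limb may carry the overflow, and the overflow bit comes from a table lookup:

```python
# Illustrative only: c = 4 limbs of 16 bits, M = 2^16 (real Jolt works over
# a large prime field; Python ints stand in for field elements here).
W, c = 64, 4
M = 2 ** (W // c)

# Table whose i-th entry is the high-order (overflow) bit of i.
overflow_table = [i >> (W // c) for i in range(2 * M)]

def riscv_add(x: int, y: int) -> int:
    z = x + y                                  # one R1CS constraint: z = x + y
    limbs = [(z >> (16 * i)) % M for i in range(c - 1)]
    limbs.append(z >> (16 * (c - 1)))          # top limb keeps any overflow
    assert z == sum(M ** i * b for i, b in enumerate(limbs))   # z = sum M^{i-1} b_i
    assert all(0 <= b < M for b in limbs[:-1])                 # range checks
    overflow = overflow_table[limbs[-1]]       # lookup extracts overflow bit
    return z - overflow * 2 ** W               # RISC-V ignores the overflow

assert riscv_add(2**64 - 1, 5) == 4            # wraps modulo 2^64
assert riscv_add(3, 4) == 7
```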
31. Example 3: LESS THAN UNSIGNED
• Decomposable: to compute LESS-THAN of two 64-bit inputs x, y:
  • Break each of x, y into, say, c = 8 chunks of 8 bits.
  • Compute LESS-THAN (LT) and EQUALITY (EQ) on each chunk.
  • Output is: ∑_{i=1}^{8} LT(x_i, y_i) · ∏_{j=i+1}^{8} EQ(x_j, y_j).
• LDE-structured:
  • EQ(x_i, y_i) = ∏_{j=1}^{8} (x_{i,j} · y_{i,j} + (1 − x_{i,j})(1 − y_{i,j})).
  • LT(x_i, y_i) = ∑_{j=1}^{8} (1 − x_{i,j}) · y_{i,j} · ∏_{k=j+1}^{8} (x_{i,k} · y_{i,k} + (1 − x_{i,k})(1 − y_{i,k})).
  • Plugging the above into the output expression gives a multilinear polynomial that can be evaluated with under 200 field operations.
32. Example 3: LESS THAN UNSIGNED
• Decomposable: to compute LESS-THAN of two 64-bit inputs x, y:
  • Break each of x, y into, say, c = 8 chunks of 8 bits.
  • Compute LESS-THAN (LT) and EQUALITY (EQ) on each chunk.
  • Output is: ∑_{i=1}^{8} LT(x_i, y_i) · ∏_{j=i+1}^{8} EQ(x_j, y_j).
• Avoiding commitments to the two subtables:
  • EQ(x_i, y_i) = ∏_{j=1}^{8} (x_{i,j} · y_{i,j} + (1 − x_{i,j})(1 − y_{i,j})).
  • LT(x_i, y_i) = ∑_{j=1}^{8} (1 − x_{i,j}) · y_{i,j} · ∏_{k=j+1}^{8} (x_{i,k} · y_{i,k} + (1 − x_{i,k})(1 − y_{i,k})).
  • These are multilinear polynomials that can be evaluated with under 50 field operations.
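The chunked LESS-THAN collation can be sanity-checked on 16-bit inputs with c = 2 chunks (smaller than the slide's 64-bit case, same structure). The chunk-level LT and EQ below return the same 0/1 values that the multilinear formulas produce on bit inputs:

```python
# Chunk-level helpers: 0/1-valued, matching the multilinear EQ/LT formulas
# when evaluated on chunk bits.
def eq_chunk(xi: int, yi: int) -> int:
    return 1 if xi == yi else 0

def lt_chunk(xi: int, yi: int) -> int:
    return 1 if xi < yi else 0

def less_than(x: int, y: int, c: int = 2, w: int = 8) -> int:
    # Chunk i = 0 is least significant; higher i is more significant.
    chunks = [((x >> (w * i)) & 0xFF, (y >> (w * i)) & 0xFF) for i in range(c)]
    total = 0
    for i in range(c):
        # LT on chunk i, times EQ on every more-significant chunk j > i.
        term = lt_chunk(*chunks[i])
        for j in range(i + 1, c):
            term *= eq_chunk(*chunks[j])
        total += term
    return total

assert less_than(255, 256) == 1
assert less_than(256, 255) == 0
assert all(less_than(x, y) == (1 if x < y else 0)
           for x in range(0, 2**16, 997) for y in range(0, 2**16, 991))
```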
33. General intuition for Lasso as a tool
• Lasso supports simple operations on the bit-decompositions of field elements, without requiring P to commit to the individual bits.
• The sub-tables have quickly-evaluable multilinear extensions if each corresponds to a simple function of (the bits of) the table indices.
  • This ensures no honest party has to commit to them in pre-processing.
• Can compute, say, bitwiseAND of two field elements in {0, 1, …, 2^64 − 1} with lower P costs than, say, Plonk incurs per addition or multiplication gate.
• Remember: lookup arguments are all about economies of scale. They only make sense to use if doing many lookups into one table (i.e., computing many invocations of the same function).
35. SNARKs for repeated function evaluation
• Many previous works have studied SNARKs for repeated function evaluation.
  • Computing the same function f on many different inputs x_1, …, x_m.
• They consider a “polynomial” amount of data parallelism.
  • If f takes inputs of length n, the number of different inputs is m = poly(n).
• They still force P to evaluate f in a very specific way.
  • Executing a specific circuit to compute f.
37. Zooming out: a new view on lookup arguments
• View a lookup table as storing all evaluations of a function f.
• A lookup argument is then a SNARK for highly repeated evaluation of f.
  • It lets P prove that a committed vector ((a_1, f(a_1)), …, (a_m, f(a_m))) consists of correct evaluations of f at different inputs a_1, …, a_m.
• Due to the O(c(m + N^{1/c})) cost for P, Lasso is effective only if the number of lookups m is not too much smaller than the table size N.
  • i.e., the number of copies of f should be exponential in the input size to f.
38. High-level message of this viewpoint
• Lasso is useful wherever the same function is evaluated many times.
• zkVMs are only one such example.
• By definition, the VM abstraction represents the computation as repeated application
of primitive instructions.
• But implementing a VM abstraction comes with substantial performance costs in
general.
• Interesting direction for future work:
• Other/better ways to isolate repeated structure in computation.
• Example work (with Yinuo Zhang and Sriram Sridhar):
• Bit-slicing.
• To evaluate a hash function or block cipher like SHA/AES, naturally computed by a Boolean circuit C, on, say, 64 different inputs:
• Pack the first bit of each input into a single field element, the second bit of each input
into a single field element, and so on.
• Replace each AND gate in C with bitwiseAND, each OR gate in C with bitwiseOR, etc.
• Now each output gate of C computes (one bit of) all 64 evaluations of SHA/AES.
• Apply Lasso to this circuit.
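A minimal sketch of the bit-slicing idea with 64-bit words and a trivial stand-in "circuit" out = (a AND b) OR c (a real use would bit-slice the full SHA/AES circuit):

```python
# Bit-slicing sketch: evaluate a tiny Boolean circuit on 64 inputs at once
# by packing bit position k with input k, then replacing each Boolean gate
# with its bitwise counterpart on packed words.
import random

random.seed(0)
inputs = [(random.getrandbits(1), random.getrandbits(1), random.getrandbits(1))
          for _ in range(64)]

# Pack: word_a holds wire 'a' of all 64 inputs, one input per bit position.
word_a = sum(a << k for k, (a, _, _) in enumerate(inputs))
word_b = sum(b << k for k, (_, b, _) in enumerate(inputs))
word_c = sum(c << k for k, (_, _, c) in enumerate(inputs))

# One bitwise op per gate now computes that gate for all 64 evaluations.
packed_out = (word_a & word_b) | word_c

for k, (a, b, c) in enumerate(inputs):
    assert (packed_out >> k) & 1 == ((a & b) | c)
```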