Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Accelerators
1. QuantumForce.eu
Search and optimisation algorithms for genomics
on quantum accelerators
04th Apr, 2019
Aritra Sarkar
PhD candidate, Quantum Computer Architecture lab
QuTech (Faculty of Applied Sciences)
Dept. of Q&CE (Faculty of Electrical Engineering, Mathematics and Computer Sciences)
Delft University of Technology
Genomics
Machine
Learning
Quantum
Computing
Application
Platform
Method
3. QuantumForce.eu 3
NISQ acceleration
NISQ
FTQC
QEC
Classical Simulation Limit
number of qubits
error rate
https://arxiv.org/abs/1801.00862 - John Preskill, Quantum Computing in the NISQ era and beyond
NISQ: Noisy Intermediate-Scale Quantum
map problem to quantum:
do:
run Q Algorithm
assess answer
while (result not satisfactory)
save measurement result/statistics
interpret classical answer
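The run-and-assess loop above can be sketched in Python. This is only an illustration: `run_quantum_algorithm` is a hypothetical stand-in that fakes a noisy accelerator with a random number generator (it is assumed to return the right 2-bit answer "11" about 60% of the time), and the names `hybrid_loop`, `confidence` and `max_runs` are made up for this sketch, not taken from the talk.

```python
import random

def run_quantum_algorithm(shots, rng):
    """Hypothetical stand-in for the accelerator: returns measurement counts."""
    counts = {}
    for _ in range(shots):
        # Assumed toy noise model: the right answer "11" shows up ~60% of the time.
        outcome = "11" if rng.random() < 0.6 else rng.choice(["00", "01", "10"])
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts

def hybrid_loop(shots_per_run=50, confidence=0.5, max_runs=20, seed=0):
    """Host-CPU loop: run, assess, repeat until the modal answer is convincing."""
    rng = random.Random(seed)
    stats = {}
    for run in range(1, max_runs + 1):
        # run Q Algorithm and save the measurement statistics
        for outcome, n in run_quantum_algorithm(shots_per_run, rng).items():
            stats[outcome] = stats.get(outcome, 0) + n
        # assess answer: is the most frequent outcome frequent enough?
        best = max(stats, key=stats.get)
        if stats[best] / sum(stats.values()) >= confidence:
            break
    # interpret classical answer: the modal bitstring
    return best, stats, run
```

Calling `hybrid_loop()` returns the modal bitstring, the accumulated statistics and the number of runs that were needed.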
Host CPU
Graphics Processing Unit
Field-Programmable Gate Array
Digital Signal Processor
Neural Processing Unit
Quantum Accelerator
7. QuantumForce.eu 7
• Map-to-reference vs. Variant calling
– Multiple solutions evaluated in superposition, but cannot access results for every state
• Superposition doesn’t have a classical logic equivalent (e.g. AND/OR)
• Generalization of probability theory for complex amplitudes
– Useful when used to explore large solution space but requires only the min/max/mean answer
Superposition vs. Parallelism
Indexed base-pairs
Ref.
Genome
Target
Genome
Differences Variants
embarrassingly parallel
no interaction
need all answers
Not suitable for Q-Acceleration
Ref.
Genome
Splices
Short
Reads
Differences
Index of
min-diff.
Indexed splices
parallel evolution
global/local interactions
statistical answer
11. QuantumForce.eu 11
Ab initio alignment
Naïve Method
• Substring(/subsequence) matching problem
Exact match
Boyer-Moore
+ Improvements
Knuth-Morris-Pratt
+ Improvements
Suffix Trees
+ Improvements
Exact match
(wildcards)
Needleman-Wunsch
Global Alignment
Smith-Waterman
Local Alignment
Simple Edit Transcript using memoization of Levenshtein Distance
+ Improvements(alphabet/operation weights)
Approximate match
BYP/CL/Myers/hybrid-dynamic methods
Alignment with arbitrary (k bounded) gaps
Approximate match
Approximate match
Burrows-Wheeler-Transform + Smith-Waterman (BWT-SW) All local hits
Burrows-Wheeler-Aligner + super-Maximal Exact Match (BWA-MEM) Heuristics
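As a small illustration of the "memoization of Levenshtein Distance" entry above, the sketch below computes the edit distance between two strings with a cached recursion. It assumes unit cost for insertions, deletions and substitutions (the alphabet/operation weights mentioned as improvements are not modelled), and the function name `levenshtein` is chosen here for illustration.

```python
from functools import lru_cache

def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b with unit-cost insert/delete/substitute."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j) is the distance between the prefixes a[:i] and b[:j]
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        substitution = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,                 # deletion
            dist(i, j - 1) + 1,                 # insertion
            dist(i - 1, j - 1) + substitution,  # match or mismatch
        )

    return dist(len(a), len(b))
```

The plain recursion is only suitable for short strings; a table-based version avoids Python's recursion limit for longer inputs.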
12. QuantumForce.eu 12
Dissecting a quantum algorithm
Superpose
Soln. Space
Encode
Function
Clever
Process
Measure
Initialize
|0⟩⊗n
Classical
Output
Classical
Input
1. Prepare all-zero state for n-qubits (not as trivial experimentally as it sounds)
2. Full superposition in computational basis (H-gate on all qubits)
– OR, superposition of classical input space
3. Transform superposition to evaluate the function (using 1 & 2 qubit gates)
– OR, evaluate function based on classical input space
4. Somehow* increase the amplitude of the solution space
5. Measure out the state
6. Repeat Steps 1-5 to access the modal classical output
* the quantum magic of interference
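Steps 1-6 can be mimicked on a classical state vector. The sketch below assumes Grover's unstructured search as the amplitude-increasing step (the slide itself does not fix a particular algorithm), with a single marked index acting as the oracle. The function name `grover_search` and its parameters are illustrative, and the simulation stores all 2^n amplitudes, so it only works for a handful of qubits.

```python
import math
import random
from collections import Counter

def grover_search(n_qubits, marked, shots=100, seed=0):
    """Classically simulate Grover search for one marked index."""
    size = 2 ** n_qubits
    # Steps 1-2: |0...0> followed by H on every qubit gives a uniform superposition
    amps = [1 / math.sqrt(size)] * size
    # Roughly (pi/4) * sqrt(N) iterations for a single solution
    iterations = int(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        # Step 3: the oracle flips the phase of the marked state
        amps[marked] = -amps[marked]
        # Step 4: inversion about the mean (interference) boosts the marked amplitude
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]
    probs = [a * a for a in amps]
    # Steps 5-6: measure repeatedly and keep the modal outcome
    rng = random.Random(seed)
    samples = rng.choices(range(size), weights=probs, k=shots)
    mode = Counter(samples).most_common(1)[0][0]
    return mode, probs
```

For example, `grover_search(4, 11)` searches 16 entries for index 11 and returns the most frequently measured index together with the final probability distribution.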
13. QuantumForce.eu 13
Evolution
Tight bounds on quantum
searching
… arbitrary initial
amplitude distribution
Quantum Pattern
Matching
Grover Search one solution
full, uniform
database
known Oracle for
solution in database
optimal iterations
multiple (un)known
solutions
full, uniform
database
known Oracle for
solution in database
optimal iterations
multiple known
solutions
arbitrary database
known Oracle for
solution in database
optimal iterations
multiple unknown
solutions
sliding index
database
alphabet based
Oracles
optimal iterations
one solution
sub-string
phonebook
0 Hamming Distance
Oracle
optimal iterations
… Quantum
Bioinformatics
Quantum Associative
Memory
multiple known
solutions
arbitrary database
known Oracle for
solution in database
higher Pmax
iteration
… associative memory
with distributed queries
multiple known
solutions
arbitrary database Binomial Oracle optimal iterations
… improved distributed
queries
multiple unknown
solutions
arbitrary database Binomial Oracle
higher Pmax
iteration
Gen 1
(tested)
QUS
Gen 2
(tested)
QPM
Gen 3
(tested)
QNN
Q Walk / Graph Search
Q Unstructured Search
Q Structured Search
HSP (abelian/dihedral)
16. QuantumForce.eu 16
Purebreds
• a.k.a. Coherent Protocols
– e.g. Shor’s factorisation, Shor’s discrete-log, QFT, Quantum Phase Estimation, Harrow-Hassidim-Lloyd,
Matrix inversion ...
– Most studied/popular quantum algorithms so far
• Exponential speedup
– Caveats
• Noise tolerance
– Number of qubits for FT
• Circuit depth
• Quantum I/O
– Classical Input: State preparation
– Classical Output: State tomography
– QRAM
O ( f(experimental) x g(no-cloning) x h(algorithm) )
17. QuantumForce.eu 17
Workhorses
• Peter Shor estimates 2048-RSA requires ~5k qubits (times 10^2-10^3 physical qubits) & ~10^7 gates
• Near-term Quantum Algorithms
– low depth circuits without extensive QEC (small-codes)
– enough qubits to just store the problem (hard to do better)
– still solve useful problems with local constraints
– Adaptable optimization algorithms (easy to map to problem)
• Genetic Algorithm / Evolutionary Programs
• Deep Learning
– Quantum Approximate Optimization Algorithm
• NP-Hard combinatorial optimisation problems in Quantum Machine Learning
• Polynomial-time solution for every instance with guaranteed approximation quality bound
• Interesting because of its potential to exhibit near-term quantum supremacy
• Gate-based implementation inspired by Adiabatic QC and Q Annealing
https://www.bcg.com/en-ca/publications/2018/next-decade-quantum-computing-how-play.aspx
19. QuantumForce.eu 19
De novo assembly
• Eulerian path/cycle [De Bruijn Graph]
– Eulerian path is a trail in a finite graph which visits every edge exactly once.
– Eulerian cycle is an Eulerian trail which starts and ends on the same vertex.
• Hamiltonian path/cycle [Overlap-Layout-Consensus]
– Hamiltonian path is a graph path between two vertices of a graph that visits each vertex exactly once.
– Hamiltonian cycle is a path which starts from one node and ends at the same node covering all the nodes of that graph.
– If a Hamiltonian path exists whose endpoints are adjacent, then the resulting graph cycle is called a Hamiltonian cycle.
• Decision version vs. Function version
• Travelling Salesman Problem
– A cycle that visits all nodes of the graph and such that the sum of the edge weights is minimum.
– Find a Hamiltonian cycle of minimum weight.
• Using Quantum Approximate Optimisation Algorithm
+ “Easy” to solve
- Error-prone
- Bad for super-sampled reads
- Bad for long reads
- NP-Hard to solve
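The "easy" Eulerian route can be sketched classically. The example below builds a De Bruijn graph from k-mers (each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix) and walks an Eulerian path with Hierholzer's algorithm, which is a choice made for this sketch rather than something named on the slide. It assumes error-free k-mers that admit an Eulerian path; `de_bruijn_assemble` is an illustrative name.

```python
from collections import defaultdict

def de_bruijn_assemble(kmers):
    """Rebuild one sequence whose k-mers are exactly the given k-mers."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)  # one edge per k-mer
        outdeg[prefix] += 1
        indeg[suffix] += 1
    nodes = set(indeg) | set(outdeg)
    # An Eulerian path starts at the node with one more outgoing than incoming edge;
    # if there is none (Eulerian cycle), any node with an outgoing edge will do.
    start = next((v for v in nodes if outdeg[v] - indeg[v] == 1), next(iter(graph)))
    # Hierholzer's algorithm: follow unused edges, backtrack when stuck
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Spell the sequence: first node plus the last letter of every following node
    return path[0] + "".join(node[-1] for node in path[1:])
```

When a (k-1)-mer repeats, several Eulerian paths can exist, so the returned sequence uses every k-mer exactly once but is not guaranteed to be the original genome.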
20. QuantumForce.eu 20
QAOA
Genomics optimisation
Bitflip (X) mixers
VQE
Controlled-bit-flip (Λf(X)) mixers
XY mixers
Controlled-XY mixers
Permutation mixers
Maximum cut
Maximum-L-SAT
Minimum-L-SAT
Set Splitting
MaxE3LIN2
Maximum Independent Set
Maximum Clique
Minimum Vertex Cover
Maximum Set Packing
Minimum Set Cover
Maximum-K-Colorable Subgraph
Graph Partitioning (Minimum Bisection)
Maximum Bisection
Maximum Vertex K-Cover
Maximum-K-Colorable Induced Subgraph
Minimum Graph Coloring
Minimum Clique Cover
Traveling Salesperson Problem = minimum cost Hamiltonian Cycle
SMS, minimizing total weighted squared tardiness
SMS, minimizing total weighted tardiness
SMS, with release dates
DNA Sequence Reconstruction
by De novo Assembly
21. QuantumForce.eu 21
QAOA
• Quantum/classical Hybrid algorithm
– Parameterised quantum subroutine is run within a classical optimization loop
– Prepare the quantum state |ψ(θ)⟩, often called the ansatz
– Measure the expectation value ⟨ψ(θ)|ℋ|ψ(θ)⟩
• By the variational theorem, the expectation value ⟨ℋ⟩_ansatz ≥ λ₁ (smallest eigenvalue; lowest energy; ground-state)
– Find an optimal choice of real-valued parameters 𝜃 such that the expectation value is minimised
– Implementation based on Variational Quantum Eigensolver primitive
• Challenges
– Heuristic - no general recipe of Ansatz definition works universally
– Optimiser choice
– Initial Parameter selection is arbitrary
– Convergence not always guaranteed
– High number of iterations
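The hybrid loop can be illustrated with a tiny state-vector simulation. The sketch below assumes a depth-1 QAOA with bit-flip (X) mixers on Maximum Cut for a 4-node ring, takes ℋ = -(cut size) so that minimising the expectation value maximises the cut, and uses a coarse grid search as a stand-in for the classical optimiser. The graph, the function names and the grid resolution are all assumptions made for illustration.

```python
import cmath
import math

EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]  # assumed toy graph: a 4-node ring
N_QUBITS = 4

def cut_size(z):
    """Number of edges cut by the bitstring encoded in the integer z."""
    return sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in EDGES)

def qaoa_energy(gamma, beta):
    """Expectation value of H = -cut_size for the depth-1 ansatz."""
    dim = 1 << N_QUBITS
    # Uniform superposition over all bitstrings
    state = [complex(1 / math.sqrt(dim))] * dim
    # Phase separator exp(-i * gamma * H), diagonal in the computational basis
    state = [a * cmath.exp(1j * gamma * cut_size(z)) for z, a in enumerate(state)]
    # Mixer exp(-i * beta * X) = cos(beta) I - i sin(beta) X on every qubit
    c, s = math.cos(beta), -1j * math.sin(beta)
    for q in range(N_QUBITS):
        bit = 1 << q
        state = [c * state[z] + s * state[z ^ bit] for z in range(dim)]
    # Measure: average H over the outcome probabilities
    return sum(abs(a) ** 2 * -cut_size(z) for z, a in enumerate(state))

def grid_search(steps=16):
    """Classical outer loop, here a plain grid search over (gamma, beta)."""
    grid = [k * math.pi / steps for k in range(steps)]
    return min((qaoa_energy(g, b), g, b) for g in grid for b in grid)
```

`grid_search()` returns the lowest energy found together with its parameters. By the variational theorem that energy can never drop below the ground-state energy, which for this ring is -4 (all four edges cut).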
29. QuantumForce.eu 29
QuantumForce.eu
• Big Data Analytics
• Industry 4.0
• Cyber-Physical Systems
• Artificial Intelligence and Machine Learning
..... and other applications
Contacts for collaboration:
• Koen Bertels, CEO
koen@QuantumForce.eu
• Zaid Al-Ars, CTO
zaid@QuantumForce.eu
31. Search and optimisation algorithms for genomics
on quantum accelerators
Aritra Sarkar
Quantum Computer Architecture Lab
QuTech and Department of Quantum & Computer Engineering
Delft University of Technology
What’s the big challenge?
We all know about Moore’s Law of transistor scaling – more and more transistors are being integrated into chips, but we are no longer making better processors because of the power, memory and frequency walls
On the other hand – the amount of data generated from genomics is also increasing exponentially – enabled by the lower cost of sequencing
Within the next decade, the amount of data generated per year would be 2-40 exabytes – which is huge!
Our computing clusters are not equipped to handle this kind of data volume – a driver for us to turn to the exascale computing promise of quantum systems
sorry if this is not the latest version of the ever-evolving stack diagram
Putting this project in a bigger perspective – let’s have a look at the Whole Genome Sequencing pipeline
Extract DNA from human, crops or microorganisms
Sequence the DNA in a wet lab – i.e. read the base pairs of the DNA – problem: it is too long (3 billion bp) and tangled to be read in one go
Oversample the DNA (make many copies), cut it into smaller pieces (shotgun sequencing) and the sequencing machine gives back a bucket of short reads
Stitch them back together (like solving a jigsaw puzzle)
Use the reconstructed DNA for further analysis like disease diagnosis (personalized medicine) or GM crops
We focus on the Data analysis part of this pipeline
It too consists of a bunch of algorithmic steps as shown in the GATK best practices from Broad Institute – these are for example, reference mapping, indel alignment, variant calling
We are going to focus only on the reference mapping (solving the jigsaw puzzle) as it is one of the most computation-intensive parts
There are 3 basic ways of playing with functions
The first is simple function evaluation. We feed in the input, we get back the output.
Another way is when we have a set of inputs and their corresponding outputs; we want to infer what is the transformation function. This is called inductive logic and is the basis of machine learning where we train a predictor to approximate the function.
The third way of course is when we have the output and the function, and we want to know what input resulted in the particular output. This is function inversion. And we know, if we have an inverse function, we can evaluate it to get back x.
But what if such a function does not exist? Like you cannot ask your sibling the pattern lock of their mobile.
Of course there is this other possibility where you try out every possible combination until you find the solution. Start out with the ninja star if your sibling is in QuTech!
Which method should we choose for our problem? For that we need to have a quick look at computation complexity classes.
For our interest, there are 2 important classes, Polynomial complexity (P) or the easy problems, and Non-deterministic Polynomial time (NP), or simply the intractable ones.
Where do quantum computers lie in this Venn diagram? Well, no one knows for sure, but the quantum equivalent of the feasible class, called BQP, looks somewhat like this. For example, Shor’s famous algorithm lies in the middle star region, whereas we are trying to solve a problem in the upper star – where even efficient quantum algorithms don’t exist.
So we cannot invert. But the good news is that, unlike the stupid idea of trying out every possible pattern lock in a classical system, in quantum we can use the parallelism of superposition states. Which in a way can be reasoned as, trying out all possible paths at the same time.
So let’s define our problem: sub-sequence index search
We have a reference genome, which is 3 billion bp for humans, but for this example, I consider a much shorter one of 32 bp. I have used 4 colours for the 4 letters of the DNA alphabet
Now take one of the reads from the bucket. Again, the typical size is much larger, but here it is a modest 5-character string.
Proceeding in a naïve linear search style – we start by matching the read at position 0 and give it a score, then at position 1, and so on
What we want from our algorithm is the position where it matches best. Note, it doesn’t completely match – approximate yet optimal
Additionally some algorithms also give us back the corrected version of our noisy query
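The naïve linear search described here can be written down in a few lines. The sketch below assumes the score at each position is the Hamming distance between the read and the reference window (the notes do not say which score is used), and it returns the matching reference window as the "corrected" read. The name `best_match` and the example strings are made up for illustration.

```python
def best_match(reference, read):
    """Slide the read along the reference and return the best-scoring position."""
    best_pos, best_dist = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        # Hamming distance: number of mismatching characters at this position
        dist = sum(a != b for a, b in zip(window, read))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    corrected = reference[best_pos:best_pos + len(read)]
    return best_pos, best_dist, corrected

# A noisy 5-character read against a short reference
print(best_match("ACGTACGGTCAAGT", "GGTGA"))  # (6, 1, 'GGTCA')
```

The classical cost grows with the product of reference length and read length, which is the part a quantum search over the position index aims to speed up.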
There are many ways of doing quantum pattern matching
The general approach to pattern matching explored in this thesis can be extended to other domains.
Of course we can do amino-acid sequencing by just extending the alphabet size
DNA fingerprinting is comparing two large sequences and we can use the concept of Hamming distance and amplify the mismatches.
Motif finding might also be trivial as it involves finding the consensus in a stored memory.
There are other related domains where similar pattern matching applications are useful. For example in speech signals, or stock market patterns.
And definitely on images. For example, this is an output from the same algorithm that was used on DNA, tweaked for 2D black and white images.
Finding the template took 17 qubits and some 50k gates.
MEST: not far before Q App Dev becomes a buzzword and these developers start tinkering with Q Algos to find a killer app for QC
Image copyright: Novikov Aleksey
these are the ones people already tried using
How much wood could a woodchuck chuck
If a woodchuck could chuck wood?
As much wood as a woodchuck could chuck,
If a woodchuck could chuck wood.
“Run a shitty circuit shitload times such that there are on average less shit” - Malay