A quantum-inspired optimization heuristic for
the multiple sequence alignment problem in
bio-computing
10th International Conference on Information, Intelligence, Systems and Ap-
plications (IISA 2019)
Patras, Greece
Konstantinos Giannakis, Christos Papalitsas, Georgia Theocharopoulou, Sofia
Fanarioti, and Theodore Andronikos
July 15-17, 2019
Dept. of Informatics, Ionian University, Greece
kgiann@ionio.gr
Acknowledgment
This research is funded in the context of the project “Investigating alternative computational
methods and their use in computational problems related to optimization and game theory,” (MIS
5007500) under the call for proposals “Supporting researchers with an emphasis on young
researchers” (EDULL34). The project is co-financed by Greece and the European Union (European
Social Fund - ESF) by the Operational Programme Human Resources Development, Education and
Lifelong Learning 2014-2020.
1
Project info
∙
∙ Natural and biomolecular computing
∙ Modeling with Membrane automata
∙ Unconventional Computational Methods
∙ Quantum Optimization Algorithms and Strategies
→http://di.ionio.gr/alcomopt/←
2
An overview
◎ Biological Sequences
◎ Alignment
∙ Single and multiple alignment (MSA)
◎ NP-hard for most cases
◎ Bio-inspired techniques
◎ Translate the MSA problem as a TSP instance
∙ Proposal of a quantum-inspired method
∙ To progressively solve this problem
∙ Evaluated through simulations against other aligning methods
∙ Provide a good initial alignment, especially for large sets of
sequences
3
Background
Introduction
∙ Biological sequences play an important role in various fields
✓ The multiple sequence alignment or MSA considers the problem
of aligning a set of sequences
∙ They usually represent proteins or nucleotides
∙ Pairwise alignment → 2 sequences
∙ Related to non-biological fields, e.g., in information-related
fields, natural language processing, finances, etc.
 NP-complete with SP-score and NP-hard for most metrics
∙ Global and local alignments
∙ Usually dynamic programming approaches
- Heuristic and/or probabilistic methods for larger instances
5
Motivaton
∙ Arranging sequences of DNA, RNA, or proteins in order to study
properties and relationships among themselves
∙ Useful in many fields → it can reveal interesting properties
about the sequences (e.g., phylogenetic trees)
∙ Sequences are usually large and multitudinous
∙ Need for good alignments within acceptable time frames, at
least for an initial solution
6
Pairwise and multiple alignment
∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC:
A C T - T A A -
- C A G T A - C
∙ If you add an extra sequence S3 = CGGACA:
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
7
Solutions and methods
∙ Progressive, iterative, motif-based, and hybrid methods
✓ Progressive: the most similar sequences are aligned first and
then the rest of them are aligned in a decreasing order
∙ Various scoring functions (e.g., sum-of-pairs metric, the
weighted sum-of-pairs, etc.)
∙ Quantum- and bio-inspired approaches1,2,3
∙ Trying to take advantage of the superiority of quantum
computing
1
A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by
quantum annealing”. In: Scientific reports 2 (2012), p. 571.
2
D-Wave Initiates Open Quantum Software Environment. D-Wave Systems.
3
L. Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: |
Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006, p. 360.
8
Other approaches
∙ Needleman–Wunsch algorithm for global pairwise alignment
(dynamic programming; O(n2
) complexity)
∙ Smith–Waterman algorithm for local pairwise alignment
∙ Platforms: MUSCLE, Multalin, CLUSTALs (Omega)
∙ Multalin: a progressive approach based on the UPGMA,
∙ CLUSTAL W: a progressive approach based upon the
neighbor-joining (NJ) algorithm
∙ Both UPGMA and (NJ) require the construction of a complete
distance matrix of the sequences
9
Related literature and quantum computation
∙ Quantum computing4
∙ D-WAVE
∙ Neven’s law
∙ Quantum algorithms for pattern-matching5
and sequence
comparison problems6
∙ A quantum-inspired genetic algorithm for MSA7
∙ Combining TSP with MSA8,9
4
M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge
University Press, 2010.
5
A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018).
6
L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons:
Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532.
7
Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding for
multiple sequence alignment”. In: Journal of bioinformatics and computational biology 8.01
(2010), pp. 59–75.
8
J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman
problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012).
9
C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for evolutionary
tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627.
10
Main part
Notation
∙ Σ = {s1, s2, .., sm} a finite alphabet; a sequence is a string over Σ
∙ Two sequences S′
1 and S′
2 are alignments of two sequences S1
and S2 if S′
i can be obtained from Si by removing gaps
∙ This holds for more sequences, too
∙ m: the max length of sequences S′
1 and S′
2
∙ The total cost for this alignment can be expressed as
m∑
i=1
d(S′
1(i), S′
2(i))
∙ d is the chosen scoring function
∙ S′
j (i) is the ith character of sequence S′
j
12
Scoring functions
∙ A standard score scheme:
d(S′
1(i), S′
2(i)) =
{
1, if match
0, if mismatch
∙ An optimal alignment is the one with the maximum score
∙ The matrix of size n × m represents an alignment of the n
sequences
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
∙ Several scoring schemes
∙ The most widely used → sum of pairs
13
Sum of pairs
∙
(mn
2
)
total pairs
n−1∑
i=1
n∑
j=i+1
d(Si, Sj)
∙ Can “punish” mismatches, e.g., by giving -1 to any mismatch
∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC:
A C T - T A A -
- C A G T A - C
∙ The score for this alignment is 3
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
∙ The score for the above alignment is 9
14
The idea
∙ Similarity among sequences is modeled as a TSP instance
∙ The similarity matrix is used as the neighboring matrix
∙ Proper normalization in order to have a minimization problem
∙ The underlying structure is a complete, undirected graph
G = (V, A)
∙ Each edge is associated with a weight wij
15
The proposed method
Our approach
∙ We apply the qGVNS quantum-inspired metaheuristic10
∙ It consists of a VND local search, a diversification procedure,
and a neighborhood change step
∙ VNS and GVNS have bio-inspired origins
∙ Novelty: In qGVNS, a modified diversification phase successfully
resolves local minima traps by exploiting quantum computation
techniques
∙ A simulated quantum register generates a complex
n-dimensional unit vector during each shaking call
∙ The complex n-dimensional vector is fed as input and outputs a
real n-dimensional vector
∙ The i-th component of the real vector is equal to the modulus
squared of the i-th component of the complex vector
10
Christos Papalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search)
Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38.
17
The qGVNS routine
Data: an initial solution
Result: an optimized solution
1 Initialization of the feasibility distance matrix
2 begin
3 X ← Nearest Neighbor heuristic;
4 repeat
5 X′
← Quantum-Perturbation(X)
6 X′′
← pipeVND(X′
)
7 if X′′
is better than X′
then
8 X ← X′′
9 end
10 until optimal solution is found or time limit is met;
11 end
18
The proposed method
Data: a set of sequences
Result: a set of aligned sequences
1 Disti,j: The pairwise distance matrix of the sequences
2 begin
3 i ← the i-th sequence
4 j ← the j-th sequence
5 repeat
6 Disti,j ← Calculate pairwise distance of i-th and j-th
sequences
7 until all pairs are examined;
8 end
9 Sol ← Approximation of the shortest Hamiltonian of Dist ▷ Derived
using qGVNS of Algorithm 1
10 Score_matrix ▷ Choose a score matrix of your choice
11 Gap_penalty ▷ Choose the value of gap penalty
12 Align sequences following the Sol matrix as guide
19
In a nutshell
∙ Initial calculation of the distance matrix among the sequences
∙ This is a trivial part for most MSA algorithms
∙ The choices for a score matrix and a gap penalty do not affect
the main algorithm
✓ They can be separately chosen
∙ It enhances its applicability
20
Analysis
∙ Need for the calculation of pairwise distances of the sequences
∙ Exhaustive comparisons
∙ Calculation of the complete distance matrix
∙ For n sequences, this calculation requires O(m(n2
− n))
comparisons (m is the maximum length)
∙ The calculation of the shortest Hamiltonian path is a known
NP-hard problem
 It seems like the proposed algorithm simply hides the “hard”
computational part
✓ This is not true, though, since the GVNS-based heuristic only
approximates the solution in a specific time frame
∙ After the qGVNS, the actual alignment is straightforward through
a simple (binary) guide tree from the above resulting ordering
21
Guide tree
Seq. 1
Seq. 2
Seq. 3
Seq. 4
Seq. 5
Seq. 6
Seq. 7
Seq. 8
.............
A depiction of the tree produced after calculating the shortest path of the
distance matrix.
22
Results
Evaluation
∙ Benchmarked on 33 real sequences retrieved from NCBI
∙ Samples from different kinds of organisms were chosen
(archaea, invertebrates, plants, and plasmids)
∙ MatLab
∙ For the first set, each match is scored with 1, whereas any
mismatch is scored with -1
∙ For the second set, no negative score for mismatches
∙ Compared against UPGMA and single neighbor algorithm
24
Sum of pairs score with negative mismatches (1/2)
-1.6x108
-1.4x10
8
-1.2x108
-1x10
8
-8x107
-6x107
-4x107
-2x107
0
1
11
17
5
2
14
10
8
13
26
19
SumofPairs
# of Benchmark
Alignment score of actual sequences (negative mismatches)(1/3)
UPGA
Single neighbor
Proposed algorithm
Set (1/3)
-9x106
-8x106
-7x106
-6x106
-5x106
-4x10
6
-3x106
-2x10
6
-1x106
0
29
27
7
28
31
30
23
16
12
6
22
SumofPairs
# of Benchmark
Alignment score of actual sequences (negative mismatches)(2/3)
UPGA
Single neighbor
Proposed algorithm
Set (2/3)
25
Sum of pairs score with negative mismatches (2/2)
-500000
-450000
-400000
-350000
-300000
-250000
-200000
-150000
-100000
-50000
0
21
4
25
15
20
18
3
24
9
33
32
SumofPairs
# of Benchmark
Alignment score of actual sequences (negative mismatches)(3/3)
UPGA
Single neighbor
Proposed algorithm
Set (3/3)
26
Sum of pairs score with only matches (1/2)
0
50000
100000
150000
200000
250000
32
24
3
20
15
22
33
18
9
30
31
SumofPairs
# of Benchmark
Alignment score of actual sequences (only matches)(1/3)
UPGA
Single neighbor
Proposed algorithm
Set (1/3)
100000
200000
300000
400000
500000
600000
700000
800000
900000
25
4
21
6
29
7
13
16
12
10
23
SumofPairs
# of Benchmark
Alignment score of actual sequences (only matches)(2/3)
UPGA
Single neighbor
Proposed algorithm
Set (2/3)
27
Sum of pairs score with only matches (2/2)
0
2x106
4x106
6x10
6
8x106
1x10
7
1.2x107
1.4x10
7
1.6x107
1.8x107
5
17
28
27
1
26
19
11
8
14
2
SumofPairs
# of Benchmark
Alignment score of actual sequences (only matches)(3/3)
UPGA
Single neighbor
Proposed algorithm
Set (3/3)
28
Result overview
∙ Better suited for larger sequences
∙ All algorithms operate under the progressive alignment scheme
∙ They, too, use the full distance matrix
∙ All use the same method for pairwise alignment
✓ The proposed algorithm outperforms the other two
∙ Especially when larger sets of sequences are used
∙ No prior knowledge on the relation among the sequences was
assumed
29
Concluding
Conclusion
 Biology and bioinformatics are related to demanding problems
 MSA is an indicative example
 Many approaches and scoring methods
 We followed the progressive approach
 A quantum-inspired optimization heuristic
 Evaluated on actual sets of sequences from NCBI
 We provide alignments with good scores, especially when the
sets of sequences are large
 Useful for constructing an initial alignment
 If prior knowledge is available?
 Decrease cost upon the distance matrix
31
Key References I
L. Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: |
Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006,
p. 360.
J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman
problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012).
D-Wave Initiates Open Quantum Software Environment. D-Wave Systems.
L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons:
Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532.
Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding
for multiple sequence alignment”. In: Journal of bioinformatics and computational biology
8.01 (2010), pp. 59–75.
C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for
evolutionary tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627.
M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, 2010.
32
Key References II
Christos Papalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search)
Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38.
A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by
quantum annealing”. In: Scientific reports 2 (2012), p. 571.
A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018).
33
  
Thank you for your attention!
34
Any Questions?
35

A quantum-inspired optimization heuristic for the multiple sequence alignment problem in bio-computing

  • 1.
    A quantum-inspired optimizationheuristic for the multiple sequence alignment problem in bio-computing 10th International Conference on Information, Intelligence, Systems and Ap- plications (IISA 2019) Patras, Greece Konstantinos Giannakis, Christos Papalitsas, Georgia Theocharopoulou, Sofia Fanarioti, and Theodore Andronikos July 15-17, 2019 Dept. of Informatics, Ionian University, Greece kgiann@ionio.gr
  • 2.
    Acknowledgment This research isfunded in the context of the project “Investigating alternative computational methods and their use in computational problems related to optimization and game theory,” (MIS 5007500) under the call for proposals “Supporting researchers with an emphasis on young researchers” (EDULL34). The project is co-financed by Greece and the European Union (European Social Fund - ESF) by the Operational Programme Human Resources Development, Education and Lifelong Learning 2014-2020. 1
  • 3.
    Project info ∙ ∙ Naturaland biomolecular computing ∙ Modeling with Membrane automata ∙ Unconventional Computational Methods ∙ Quantum Optimization Algorithms and Strategies →http://di.ionio.gr/alcomopt/← 2
  • 4.
    An overview ◎ BiologicalSequences ◎ Alignment ∙ Single and multiple alignment (MSA) ◎ NP-hard for most cases ◎ Bio-inspired techniques ◎ Translate the MSA problem as a TSP instance ∙ Proposal of a quantum-inspired method ∙ To progressively solve this problem ∙ Evaluated through simulations against other aligning methods ∙ Provide a good initial alignment, especially for large sets of sequences 3
  • 5.
  • 6.
    Introduction ∙ Biological sequencesplay an important role in various fields ✓ The multiple sequence alignment or MSA considers the problem of aligning a set of sequences ∙ They usually represent proteins or nucleotides ∙ Pairwise alignment → 2 sequences ∙ Related to non-biological fields, e.g., in information-related fields, natural language processing, finances, etc.  NP-complete with SP-score and NP-hard for most metrics ∙ Global and local alignments ∙ Usually dynamic programming approaches - Heuristic and/or probabilistic methods for larger instances 5
  • 7.
    Motivaton ∙ Arranging sequencesof DNA, RNA, or proteins in order to study properties and relationships among themselves ∙ Useful in many fields → it can reveal interesting properties about the sequences (e.g., phylogenetic trees) ∙ Sequences are usually large and multitudinous ∙ Need for good alignments within acceptable time frames, at least for an initial solution 6
  • 8.
    Pairwise and multiplealignment ∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC: A C T - T A A - - C A G T A - C ∙ If you add an extra sequence S3 = CGGACA: A C T - T A A - - - C A G T A - C - - C G G - A - C A 7
  • 9.
    Solutions and methods ∙Progressive, iterative, motif-based, and hybrid methods ✓ Progressive: the most similar sequences are aligned first and then the rest of them are aligned in a decreasing order ∙ Various scoring functions (e.g., sum-of-pairs metric, the weighted sum-of-pairs, etc.) ∙ Quantum- and bio-inspired approaches1,2,3 ∙ Trying to take advantage of the superiority of quantum computing 1 A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by quantum annealing”. In: Scientific reports 2 (2012), p. 571. 2 D-Wave Initiates Open Quantum Software Environment. D-Wave Systems. 3 L. Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: | Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006, p. 360. 8
  • 10.
    Other approaches ∙ Needleman–Wunschalgorithm for global pairwise alignment (dynamic programming; O(n2 ) complexity) ∙ Smith–Waterman algorithm for local pairwise alignment ∙ Platforms: MUSCLE, Multalin, CLUSTALs (Omega) ∙ Multalin: a progressive approach based on the UPGMA, ∙ CLUSTAL W: a progressive approach based upon the neighbor-joining (NJ) algorithm ∙ Both UPGMA and (NJ) require the construction of a complete distance matrix of the sequences 9
  • 11.
    Related literature andquantum computation ∙ Quantum computing4 ∙ D-WAVE ∙ Neven’s law ∙ Quantum algorithms for pattern-matching5 and sequence comparison problems6 ∙ A quantum-inspired genetic algorithm for MSA7 ∙ Combining TSP with MSA8,9 4 M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010. 5 A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018). 6 L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons: Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532. 7 Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment”. In: Journal of bioinformatics and computational biology 8.01 (2010), pp. 59–75. 8 J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012). 9 C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for evolutionary tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627. 10
  • 12.
  • 13.
    Notation ∙ Σ ={s1, s2, .., sm} a finite alphabet; a sequence is a string over Σ ∙ Two sequences S′ 1 and S′ 2 are alignments of two sequences S1 and S2 if S′ i can be obtained from Si by removing gaps ∙ This holds for more sequences, too ∙ m: the max length of sequences S′ 1 and S′ 2 ∙ The total cost for this alignment can be expressed as m∑ i=1 d(S′ 1(i), S′ 2(i)) ∙ d is the chosen scoring function ∙ S′ j (i) is the ith character of sequence S′ j 12
  • 14.
    Scoring functions ∙ Astandard score scheme: d(S′ 1(i), S′ 2(i)) = { 1, if match 0, if mismatch ∙ An optimal alignment is the one with the maximum score ∙ The matrix of size n × m represents an alignment of the n sequences A C T - T A A - - - C A G T A - C - - C G G - A - C A ∙ Several scoring schemes ∙ The most widely used → sum of pairs 13
  • 15.
    Sum of pairs ∙ (mn 2 ) totalpairs n−1∑ i=1 n∑ j=i+1 d(Si, Sj) ∙ Can “punish” mismatches, e.g., by giving -1 to any mismatch ∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC: A C T - T A A - - C A G T A - C ∙ The score for this alignment is 3 A C T - T A A - - - C A G T A - C - - C G G - A - C A ∙ The score for the above alignment is 9 14
  • 16.
    The idea ∙ Similarityamong sequences is modeled as a TSP instance ∙ The similarity matrix is used as the neighboring matrix ∙ Proper normalization in order to have a minimization problem ∙ The underlying structure is a complete, undirected graph G = (V, A) ∙ Each edge is associated with a weight wij 15
  • 17.
  • 18.
    Our approach ∙ Weapply the qGVNS quantum-inspired metaheuristic10 ∙ It consists of a VND local search, a diversification procedure, and a neighborhood change step ∙ VNS and GVNS have bio-inspired origins ∙ Novelty: In qGVNS, a modified diversification phase successfully resolves local minima traps by exploiting quantum computation techniques ∙ A simulated quantum register generates a complex n-dimensional unit vector during each shaking call ∙ The complex n-dimensional vector is fed as input and outputs a real n-dimensional vector ∙ The i-th component of the real vector is equal to the modulus squared of the i-th component of the complex vector 10 Christos Papalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search) Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38. 17
  • 19.
    The qGVNS routine Data:an initial solution Result: an optimized solution 1 Initialization of the feasibility distance matrix 2 begin 3 X ← Nearest Neighbor heuristic; 4 repeat 5 X′ ← Quantum-Perturbation(X) 6 X′′ ← pipeVND(X′ ) 7 if X′′ is better than X′ then 8 X ← X′′ 9 end 10 until optimal solution is found or time limit is met; 11 end 18
  • 20.
    The proposed method Data:a set of sequences Result: a set of aligned sequences 1 Disti,j: The pairwise distance matrix of the sequences 2 begin 3 i ← the i-th sequence 4 j ← the j-th sequence 5 repeat 6 Disti,j ← Calculate pairwise distance of i-th and j-th sequences 7 until all pairs are examined; 8 end 9 Sol ← Approximation of the shortest Hamiltonian of Dist ▷ Derived using qGVNS of Algorithm 1 10 Score_matrix ▷ Choose a score matrix of your choice 11 Gap_penalty ▷ Choose the value of gap penalty 12 Align sequences following the Sol matrix as guide 19
  • 21.
    In a nutshell ∙Initial calculation of the distance matrix among the sequences ∙ This is a trivial part for most MSA algorithms ∙ The choices for a score matrix and a gap penalty do not affect the main algorithm ✓ They can be separately chosen ∙ It enhances its applicability 20
  • 22.
    Analysis ∙ Need forthe calculation of pairwise distances of the sequences ∙ Exhaustive comparisons ∙ Calculation of the complete distance matrix ∙ For n sequences, this calculation requires O(m(n2 − n)) comparisons (m is the maximum length) ∙ The calculation of the shortest Hamiltonian path is a known NP-hard problem  It seems like the proposed algorithm simply hides the “hard” computational part ✓ This is not true, though, since the GVNS-based heuristic only approximates the solution in a specific time frame ∙ After the qGVNS, the actual alignment is straightforward through a simple (binary) guide tree from the above resulting ordering 21
  • 23.
    Guide tree Seq. 1 Seq.2 Seq. 3 Seq. 4 Seq. 5 Seq. 6 Seq. 7 Seq. 8 ............. A depiction of the tree produced after calculating the shortest path of the distance matrix. 22
  • 24.
  • 25.
    Evaluation ∙ Benchmarked on33 real sequences retrieved from NCBI ∙ Samples from different kinds of organisms were chosen (archaea, invertebrates, plants, and plasmids) ∙ MatLab ∙ For the first set, each match is scored with 1, whereas any mismatch is scored with -1 ∙ For the second set, no negative score for mismatches ∙ Compared against UPGMA and single neighbor algorithm 24
  • 26.
    Sum of pairsscore with negative mismatches (1/2) -1.6x108 -1.4x10 8 -1.2x108 -1x10 8 -8x107 -6x107 -4x107 -2x107 0 1 11 17 5 2 14 10 8 13 26 19 SumofPairs # of Benchmark Alignment score of actual sequences (negative mismatches)(1/3) UPGA Single neighbor Proposed algorithm Set (1/3) -9x106 -8x106 -7x106 -6x106 -5x106 -4x10 6 -3x106 -2x10 6 -1x106 0 29 27 7 28 31 30 23 16 12 6 22 SumofPairs # of Benchmark Alignment score of actual sequences (negative mismatches)(2/3) UPGA Single neighbor Proposed algorithm Set (2/3) 25
  • 27.
    Sum of pairsscore with negative mismatches (2/2) -500000 -450000 -400000 -350000 -300000 -250000 -200000 -150000 -100000 -50000 0 21 4 25 15 20 18 3 24 9 33 32 SumofPairs # of Benchmark Alignment score of actual sequences (negative mismatches)(3/3) UPGA Single neighbor Proposed algorithm Set (3/3) 26
  • 28.
    Sum of pairsscore with only matches (1/2) 0 50000 100000 150000 200000 250000 32 24 3 20 15 22 33 18 9 30 31 SumofPairs # of Benchmark Alignment score of actual sequences (only matches)(1/3) UPGA Single neighbor Proposed algorithm Set (1/3) 100000 200000 300000 400000 500000 600000 700000 800000 900000 25 4 21 6 29 7 13 16 12 10 23 SumofPairs # of Benchmark Alignment score of actual sequences (only matches)(2/3) UPGA Single neighbor Proposed algorithm Set (2/3) 27
  • 29.
    Sum of pairsscore with only matches (2/2) 0 2x106 4x106 6x10 6 8x106 1x10 7 1.2x107 1.4x10 7 1.6x107 1.8x107 5 17 28 27 1 26 19 11 8 14 2 SumofPairs # of Benchmark Alignment score of actual sequences (only matches)(3/3) UPGA Single neighbor Proposed algorithm Set (3/3) 28
  • 30.
    Result overview ∙ Bettersuited for larger sequences ∙ All algorithms operate under the progressive alignment scheme ∙ They, too, use the full distance matrix ∙ All use the same method for pairwise alignment ✓ The proposed algorithm outperforms the other two ∙ Especially when larger sets of sequences are used ∙ No prior knowledge on the relation among the sequences was assumed 29
  • 31.
  • 32.
    Conclusion  Biology andbioinformatics are related to demanding problems  MSA is an indicative example  Many approaches and scoring methods  We followed the progressive approach  A quantum-inspired optimization heuristic  Evaluated on actual sets of sequences from NCBI  We provide alignments with good scores, especially when the sets of sequences are large  Useful for constructing an initial alignment  If prior knowledge is available?  Decrease cost upon the distance matrix 31
  • 33.
    Key References I L.Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: | Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006, p. 360. J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012). D-Wave Initiates Open Quantum Software Environment. D-Wave Systems. L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons: Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532. Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment”. In: Journal of bioinformatics and computational biology 8.01 (2010), pp. 59–75. C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for evolutionary tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627. M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010. 32
  • 34.
    Key References II ChristosPapalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search) Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38. A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by quantum annealing”. In: Scientific reports 2 (2012), p. 571. A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018). 33
  • 35.
       Thankyou for your attention! 34
  • 36.