
# CARI2020: A CGM-Based Parallel Algorithm Using the Four-Russians Speedup for the 1-D Sequence Alignment Problem

Jerry Lacmou Zeutouo, University of Dschang, Cameroon


1. A CGM-Based Parallel Algorithm Using the Four-Russians Speedup for the 1-D Sequence Alignment Problem. By Jerry Lacmou Zeutouo, Grace Colette Tessa Masse, and Franklin Ingrid Kamga Youmbi, at the African Conference on Research in Computer Science and Applied Mathematics (CARI'2020), Polytech School of Thiès, Senegal, October 2020.
2. Opening thoughts. The processing of sequences, as in DNA sequencing, spans a variety of problems that generate massive computations whose results are difficult to obtain on ordinary PCs. One way to improve solution speed is parallelism.
3. Outline. 1 Introduction; 2 Sequential algorithm; 3 CGM algorithm; 4 Experimental results; 5 Conclusion.
4. Section: Introduction (context, problematic, our main focus).
5. What bioinformatics is. Bioinformatics is the analysis of biological information.
6. Biological information. Definition: biological information can be a sequence (DNA, RNA, proteins); a structure (primary, secondary); a function (molecular function, e.g. enzyme; cell component, e.g. membrane protein; biological process, e.g. oxygen transport); or interactions (proteins, metabolic pathways, gene networks).
7. The privileged axes of bioinformatics. 1 Formalization of genetic information: analysis of sequences (biomolecules) and of their structure (especially 3D structure); 2 biological interpretation of genetic information: data integration (establishment of maps and networks of gene interactions, protein interactions, etc.); 3 functional prediction.
8. Bioinformatics methods. Comparative method: comparison of unknown sequences or structures with databases (of sequences and structures) of known genes and proteins to establish similarities (similarities, homologies, or identities); statistical method: software applies statistical analyses to the data (on sequence syntax) to try to identify rules and constraints that are systematic, regular, or general in nature; probabilistic modelling approach.
9. Sequence alignment. One of the fundamental problems of bioinformatics is sequence alignment, crucial for molecular prediction, molecular interactions, and phylogenetic analysis.
10. Sequence alignment definition. Sequence alignment is the problem of comparing biological sequences by looking for a series of nucleotides or amino acids that appear in the same order in the input sequences, possibly introducing gaps. It is a means of visualizing the similarity between sequences based on notions of similarity or distance. (Figure: sequence alignment.)
11. Types of alignments. Two types of alignments are considered: 1 global alignments, which take the whole of the two sequences being compared into account; 2 local alignments, which detect the segment of the first sequence that is most similar to a segment of the other.
12. Dynamic programming complexity. With dynamic programming, both types of alignments can be computed in O(n²) time and space, where n is the length of the two strings. Based on the Four-Russians method, several sped-up sequential solutions have been proposed, such as Brubach and Ghurye's algorithm running in O(n²/log n).
13. Why parallelize? Despite the acceleration of the sequential algorithm, solving the problem remains time- and space-consuming.
14. The parallelization of the sequential algorithm. A PRAM algorithm running in O(log n log m) time with nm/log nm CREW processors was proposed by Apostolico et al., 1990; Alves et al., 2002 proposed a parallel solution for a variant of the problem under the CGM model using weighted graphs, which requires log p rounds and runs in O((n²/p) log m); recently, Kim et al., 2016 proposed a space-efficient, alphabet-independent Four-Russians lookup table and a multithreaded Four-Russians edit distance algorithm.
15. Our aim. We tackle the parallelization of Brubach and Ghurye's sequential algorithm for the one-dimensional sequence alignment problem on the CGM (Coarse-Grained Multicomputer) model. We chose the CGM model because it is better suited to current machines and has already been used to solve several problems efficiently, such as the optimal binary search tree problem.
16. What is a CGM algorithm? Its design is based on: partitioning of the problem into subproblems; distribution of the subproblems among the different processors; and the CGM algorithm itself, a sequence of rounds of computation and communication.
17. Parallelization criteria for a CGM algorithm. Minimize the idle time and the latency of the processors; minimize the overall communication time; minimize the number of communication rounds; minimize the local computation time of each processor; and balance the load between the processors.
18. Our results. Our solution requires O(n²/(p log n)) execution time with O(p) communication rounds on p processors.
19. Section: Sequential algorithm (global alignment, local alignment, the Four-Russians speedup, Brubach and Ghurye's efficient lookup table).
20. Formal definition. The sequence alignment method seeks to optimize the alignment score, which reflects the similarity rate between the two compared sequences; it measures the number of edit operations needed to convert a sequence X into another sequence Y.
21. Elementary editing operations. They include: 1 insertions; 2 deletions; 3 substitutions of a single character. The weighted variant assigns a cost to each of these operations, through a penalty matrix when the costs are not constant.
22. Global alignment overview. A global alignment of two sequences X and Y can be obtained by computing the distance between X and Y. The elementary editing operations are: substitution of a symbol a by a symbol b; deletion of a symbol a; insertion of a symbol b. Global alignments can also be computed using similarity scores instead of distances.
23. Global alignment score computation.

$$T[i,j] = \max \begin{cases} T[i-1,j-1] + \mathrm{Sub}(X[i], Y[j]), \\ T[i-1,j] + \mathrm{Del}(X[i]), \\ T[i,j-1] + \mathrm{Ins}(Y[j]). \end{cases} \tag{1}$$
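Recurrence (1) translates directly into a quadratic-time dynamic program. The sketch below is illustrative only: the concrete scoring values (match +1, mismatch −1, gap −1) are assumptions for the example, not the paper's choices.

```python
def global_align_score(X, Y, sub, delete, insert):
    """Fill the table T of recurrence (1); T[i][j] is the best score
    for aligning the prefix X[:i] with the prefix Y[:j]."""
    n, m = len(X), len(Y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against the empty string costs gaps only.
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + delete(X[i - 1])
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + insert(Y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + sub(X[i - 1], Y[j - 1]),  # substitution/match
                T[i - 1][j] + delete(X[i - 1]),             # deletion
                T[i][j - 1] + insert(Y[j - 1]),             # insertion
            )
    return T[n][m]

# Illustrative costs (an assumption): +1 match, -1 mismatch, -1 gap.
score = global_align_score("GATTACA", "GCATGCU",
                           lambda a, b: 1 if a == b else -1,
                           lambda a: -1, lambda b: -1)
```

The whole table costs O(nm) time and space, which is the quadratic bound the Four-Russians speedup later attacks.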
24. Local alignment overview. A local alignment of two sequences X and Y consists in finding the segment of X that is most similar to a segment of Y; the local edit score of two sequences X and Y is defined by s(X, Y), the maximum similarity between a segment of X and a segment of Y.
25. Local alignment score computation.

$$T_s[i,j] = \max \begin{cases} T_s[i-1,j-1] + \mathrm{Sub}(X[i], Y[j]), \\ T_s[i-1,j] + \mathrm{Del}(X[i]), \\ T_s[i,j-1] + \mathrm{Ins}(Y[j]), \\ 0. \end{cases} \tag{2}$$
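Recurrence (2) differs from (1) only by the extra 0 branch, which lets a local alignment restart anywhere; consequently the answer s(X, Y) is the table maximum, not the corner cell. Again, the scoring values below are assumed for illustration only.

```python
def local_align_score(X, Y, sub, delete, insert):
    """Fill T_s of recurrence (2); return the table maximum s(X, Y)."""
    n, m = len(X), len(Y)
    Ts = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            Ts[i][j] = max(
                Ts[i - 1][j - 1] + sub(X[i - 1], Y[j - 1]),
                Ts[i - 1][j] + delete(X[i - 1]),
                Ts[i][j - 1] + insert(Y[j - 1]),
                0,  # a local alignment may start fresh at any cell
            )
            best = max(best, Ts[i][j])
    return best

# Illustrative costs (an assumption): +3 match, -3 mismatch, -2 gap.
best = local_align_score("TGTTACGG", "GGTTGACTA",
                         lambda a, b: 3 if a == b else -3,
                         lambda a: -2, lambda b: -2)
```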
26. Task graph. (Figure: task graph of the dynamic programming table.)
27. Speedup techniques. Dynamic programming algorithms can sometimes be made even faster by applying speedups such as the Knuth-Yao quadrangle-inequality speedup or the Four-Russians speedup.
28. Idea behind the Four-Russians speedup. The idea behind the speedup is to tile the dynamic programming table into smaller blocks whose solutions are precomputed and stored in a lookup table; the goal is to spend less time on those blocks by merely looking them up.
29. A single t-block. (Figure: one t × t block of the dynamic programming table.)
30. This sub-quadratic time bound was first achieved using the Four-Russians technique by Masek and Paterson in 1980, reaching O(n²/log n); Crochemore et al., 2003 later achieved the same time bound for unrestricted scoring matrices.
31. Brubach and Ghurye's efficient lookup table. The bottleneck in the aforementioned speedup lies in the space demanded by the lookup table and the time spent computing it.
32. The Four-Russians lookup table can be built in O(t² lg t) time, queried in O(t) time, and stored in O(t²) space.
33. The t-blocks within the dynamic programming table are tiled so that adjacent blocks overlap by one column/row on each side; given a t-block's first row and column, the block function of the lookup process provides its last row and column; running the block function repeatedly in a row-wise (or column-wise) manner eventually yields the global edit distance score in the bottom-right cell.
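The block-function interface described above can be sketched as follows. A real Four-Russians implementation would answer this query from the precomputed lookup table in O(t) time; here the block is simply recomputed with the recurrence, purely to show the inputs and outputs involved (the function name and scoring lambdas are illustrative assumptions).

```python
def block_function(top, left, x, y, sub, delete, insert):
    """Given the first row `top` and first column `left` of a t-block
    (both of length t, sharing the corner cell top[0] == left[0]),
    plus the substrings x and y the block covers (each of length t-1),
    return the block's last row and last column."""
    t = len(top)
    B = [[0] * t for _ in range(t)]
    B[0] = list(top)                 # first row is given
    for i in range(t):
        B[i][0] = left[i]            # first column is given
    for i in range(1, t):
        for j in range(1, t):
            B[i][j] = max(
                B[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                B[i - 1][j] + delete(x[i - 1]),
                B[i][j - 1] + insert(y[j - 1]),
            )
    return B[t - 1], [B[i][t - 1] for i in range(t)]

# A 3x3 block covering "AB" vs "AB" with +1/-1/-1 scoring (assumed costs):
last_row, last_col = block_function([0, -1, -2], [0, -1, -2], "AB", "AB",
                                    lambda a, b: 1 if a == b else -1,
                                    lambda a: -1, lambda b: -1)
```

Because adjacent blocks share a row/column, the last row/column returned here becomes the first row/column of the next block, which is exactly how the wavefront evaluation chains t-blocks together.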
34. Section: CGM algorithm (partitioning strategy, mapping macro-blocks onto processors, overview of the CGM algorithm).
35. Our contribution. Our solution is divided into two parts: 1 first, each processor computes the lookup table through Brubach and Ghurye's sequential algorithm in O(t² lg t); 2 second, we partition the task graph into subgraphs of the same size and distribute them fairly onto the processors.
36. Partitioning strategy. Our technique partitions the task graph in two steps:
37. Partitioning steps. 1 First, we partition the task graph into small blocks of size t, referred to as t-blocks, such that any two adjacent t-blocks overlap by either a row or a column; after this first partitioning, the task graph is divided into k lines and k columns, where k = n/(t−1). 2 Next, we subdivide the n²/t² t-blocks into p lines and p columns of blocks, referred to as macro-blocks and denoted by MB(i, j); MB(i, j) is a matrix of size (k/p) × (k/p) t-blocks and is identified by the node in its lower-right corner.
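The index arithmetic of this two-level partitioning can be sketched as follows (a toy illustration assuming t − 1 divides n and p divides k; the function name is hypothetical, and macro-blocks are indexed here by their top-left t-block rather than the slide's lower-right node, purely for simplicity):

```python
def partition(n, t, p):
    """Two-level partitioning sketch: overlapping t-blocks form a
    k x k grid with k = n // (t - 1); that grid is then cut into a
    p x p grid of macro-blocks, each side // p t-blocks wide."""
    k = n // (t - 1)          # number of t-blocks per dimension
    side = k // p             # t-blocks per macro-block side
    # (i, j) entry: index of the top-left t-block of macro-block MB(i, j).
    macro = [[(i * side, j * side) for j in range(p)] for i in range(p)]
    return k, side, macro

# Example with hypothetical sizes: n = 40, t = 5, p = 2.
k, side, macro = partition(40, 5, 2)
```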
38. Scenario of this partitioning. (Figure: a 4 × 4 grid of macro-blocks; each entry presumably gives the anti-diagonal, i.e. the wavefront step at which that macro-block can be computed.)

| 1 | 2 | 3 | 4 |
|---|---|---|---|
| 2 | 3 | 4 | 5 |
| 3 | 4 | 5 | 6 |
| 4 | 5 | 6 | 7 |
39. Snake-like mapping. This distribution assigns all macro-blocks of a diagonal from the top-left corner to the bottom-right corner; the process is repeated until all processors have been used, starting with processor 0 and traversing the blocks along a "snake-like" path.
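One plausible reading of this snake-like scheme is sketched below; the authoritative order is the slide's figure, and this sketch only guarantees the balance property the deck relies on, namely that each of the p processors receives exactly p of the p² macro-blocks.

```python
def snake_mapping(p):
    """Assumed reading of the snake-like distribution: walk the
    anti-diagonals of the p x p macro-block grid from the top-left
    corner, handing out processors 0..p-1 and reversing direction
    each time all p processors have been used (the "snake")."""
    owner = {}
    order = 0
    for d in range(2 * p - 1):                    # anti-diagonal i + j
        cells = [(i, d - i) for i in range(p) if 0 <= d - i < p]
        for (i, j) in sorted(cells):
            lap, pos = divmod(order, p)           # lap through 0..p-1
            owner[(i, j)] = pos if lap % 2 == 0 else p - 1 - pos
            order += 1
    return owner

owner = snake_mapping(3)
```

The anti-diagonal order respects the data dependencies of the task graph (a macro-block needs its top and left neighbours), while the direction reversal is what balances the load.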
40. Illustration. (Figure: snake-like mapping of macro-blocks onto processors.)
41. Lemma. After the partitioning of the task graph and the snake-like distribution scheme, each processor evaluates exactly p macro-blocks. Proof. The partitioning of the task graph yields p² macro-blocks; since the distribution shares them evenly among the p processors, each processor evaluates exactly p²/p = p macro-blocks.
42. Our CGM algorithm. We present our CGM solution for the sequence alignment problem; Brubach and Ghurye's sequential algorithm is used for the local computations. Based on the previous partitioning strategy and the mapping of blocks onto processors, the corresponding CGM algorithm is presented in the following algorithm.
43. Our CGM algorithm. (Figure: pseudocode of the CGM algorithm.)
44. Local computation time. Lemma. The evaluation of a macro-block of size n/(p(t−1)) × n/(p(t−1)) requires O(n²/(p²t)) computation time. Lemma. Our CGM algorithm requires O(n²/(pt)) time steps per processor.
45. Algorithm complexity. Theorem. By subdividing the task graph into macro-blocks of size n/(p(t−1)) × n/(p(t−1)) and using the snake-like distribution scheme, our CGM algorithm requires O(n²/(p log n)) execution time with O(p) communication rounds when t = log n.
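The theorem's time bound follows by combining the two preceding lemmas: each processor evaluates p macro-blocks (lemma of slide 41), each in O(n²/(p²t)) time (first lemma of slide 44), so per processor

```latex
T_{\mathrm{comp}} \;=\; p \cdot O\!\left(\frac{n^2}{p^2\,t}\right)
\;=\; O\!\left(\frac{n^2}{p\,t}\right)
\;\stackrel{t=\log n}{=}\; O\!\left(\frac{n^2}{p\,\log n}\right).
```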
46. Section: Experimental results.
47. Environment. We measured the performance of our solution on the Matrics platform backed by the University of Picardie Jules Verne: Intel Xeon(R) CPU E5-2680 v4 @ 2.40 GHz; 28 cores; 128 GB of RAM; inter-processor communication implemented with the MPI library (OpenMPI version 1.10.4).
48. Samples. We use real biological DNA sequences with |Σ| = 4. The results presented here come from executions for different values of the triplet (m, n, p), where m and n are the sequence lengths, with values ranging from 10^5 to 10^6, and p is the number of processors, with values in the set {1, 2, 4, 8, 16, 32, 48}.
49. Execution time as a function of m. (Figure.)
50. Execution time as a function of p. (Figure.)
51. Result comments. For m = 10^6: our algorithm runs about 11.43 times faster than Brubach and Ghurye's sequential algorithm on 16 processors; the speedup increases to 20.40 on 32 processors and to 30.05 on 48 processors. From this, we conclude that our algorithm scales with both the size of the sequences and the number of processors.
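As a quick sanity check, the reported speedups can be converted into parallel efficiencies (speedup divided by processor count); the gently declining values are typical of communication overhead growing with p.

```python
# Speedups reported for m = 10^6, relative to Brubach and Ghurye's
# sequential algorithm.
reported_speedup = {16: 11.43, 32: 20.40, 48: 30.05}

# Parallel efficiency = speedup / p (1.0 would be ideal linear scaling).
efficiency = {p: s / p for p, s in reported_speedup.items()}
```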
52. Section: Conclusion.
53. Wrapping things up. In this paper we parallelized Brubach and Ghurye's sequential algorithm as applied to the one-dimensional sequence alignment problem; based on the CGM model, our solution clusters t-blocks into macro-blocks and follows the snake-like distribution pattern onto p processors to achieve, for O(p) rounds of communication, an execution time of O(n²/(p log n)); experimental results show good agreement with the theoretical forecasts.
54. Future directions. Noticing that individual processors uselessly compute large sets of entries in their respective lookup tables, one might be interested in cutting off that waste; another direction is extending our solution to the two-dimensional sequence alignment problem.
55. Thanks for your keen attention :-)