SlideShare a Scribd company logo
1 of 9
Download to read offline
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015
DOI : 10.5121/ijcseit.2015.5401 1
A COMPARATIVE ANALYSIS OF PROGRESSIVE
MULTIPLE SEQUENCE ALIGNMENT APPROACHES
USING UPGMA AND NEIGHBOR JOIN BASED GUIDE
TREES
Dega Ravi Kumar Yadav1
and Gunes Ercal2
Department of Computer Science, SIUE, Edwardsville, Illinois, USA
ABSTRACT
Multiple sequence alignment is increasingly important to bioinformatics, with several applications ranging
from phylogenetic analyses to domain identification. There are several ways to perform multiple sequence
alignment, an important way of which is the progressive alignment approach studied in this work.
Progressive alignment involves three steps: find the distance between each pair of sequences; construct a
guide tree based on the distance matrix; finally based on the guide tree align sequences using the concept
of aligned profiles. Our contribution is in comparing two main methods of guide tree construction in terms
of both efficiency and accuracy of the overall alignment: UPGMA and Neighbor Join methods. Our
experimental results indicate that the Neighbor Join method is both more efficient in terms of performance
and more accurate in terms of overall cost minimization.
KEYWORDS
Progressive MSA, Guide Tree, Profiles, Pair-wise Distance & Dynamic Programming.
1. INTRODUCTION
The traditional pairwise sequence alignment problem in its utmost generality is to find an
arrangement of two given strings, S and T, such that the arrangement yields information on the
relationship between S and T, such as the minimum number of changes to S that would transform
S into T. In the context of DNA sequences, which can be viewed as strings from the 4 letter
alphabet {A, C, G, T}, these changes may represent mutation events, so that the alignment sought
yields important evolutionary information [15]. Similarly, the pairwise sequence alignment
problem can be generalized to the multiple sequence alignment problem to yield information on
the relatedness of multiple sequences. Applications of the multiple sequence alignment (MSA)
problem for DNA sequences include phylogenetic analysis, domain identification, discovery of
DNA regulatory elements, and pattern identification. Additionally, MSA applications for protein
sequences also includes protein family identification and structure prediction. This work is
concerned with approaches to multiple sequence alignment in the context of DNA sequences.
Generally, aligning two sequences is straightforward via dynamic programming. But pairwise
alignment is insufficient for many applications in which the relationship among several sequences
is sought. Moreover, it is infeasible to naturally extend the dynamic programming approach that
works for pairwise sequence alignment directly to multiple sequence alignment when there are
more than three sequences to align. Unfortunately, multiple sequences alignment is NP-hard
based on SP (sum-of-pairs) scores [1]. Therefore, heuristics are crucial to MSA.
International Journal of Computer Science,
Progressive alignments are by far
method [2, 3]. Progressive alignment
calculations for all the input DNA
computed in the previous step. 3.
nearest sequences first. In this
approaches to MSA in terms of
progressive MSA with UPGMA
based guide trees. Our results indicate
preferable in terms of both efficiency
2. BACKGROUND AND RELATED
The main steps of the progressive
1. Compute pairwise distances
2. Build the guide tree based
3. Align first two sequences
(Global Alignment).
4. From the next sequence
profile.
5. Repeat step 4 until the longest
6.
Figure 1:
2.1. Pairwise Distance
The distance between two DNA
distance matrix is computed among
number of evolutionary models
Jukes-Cantor model [8]. The Jukes
matches to the number of non-gaps
distance formula is:
Here `D' is the Jukes-Cantor distance
of matches to the number of non
specified species, a guide tree based
2.2. Guide Tree
In progressive alignment, the guide
determine which sequence is to be
a phylogenetic tree that is constructed
A phylogenetic tree is an evolutionary
species. Phylogenetic trees are
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
far the most widely used heuristic multiple sequence
alignment is done in three major steps; 1. Perform pairwise
DNA sequences. 2. Build the guide tree using the distance
3. Based on the guide tree, perform progressive alignment
this project, we compare two important progressive
of algorithmic efficiency as well as alignment accuracy,
UPGMA based guide trees and progressive MSA with Neighbor
indicate that the Neighbor Join method of guide tree construction
efficiency and accuracy of the overall resulting MSA.
RELATED WORK
progressive alignment methodology are as follows [6, 13]:
distances for all the sequences.
based on the distance matrix.
sequences based on the guide tree leaf nodes using dynamic
sequence alignment, construct a profile and align the new sequence
longest branch leaf node is aligned. Hence, MSA is achieved.
1: Progressive Multiple sequence Alignment
DNA sequences is known as the pairwise distance. In this
among the species using Jukes Cantor distance formula.
proposed to measure pairwise distance, the first of
Jukes Cantor distance formula is based on the ratio
gaps in the DNA of two sequences of the species.
D = (-3/4 * log (1-(4/3*R)))
distance between two DNA sequences and `R' is the ratio
non-gap letters [8]. Once we obtain a distance matrix
based on distance matrix is built.
guide tree plays an important role as the branches
be considered for the next step of alignment [10]. A
constructed dependent on the distance matrix of the DNA
evolutionary tree which shows interrelations among various
dependent on physical or genetic characteristic similarities
5,No.3/4, August 2015
2
sequence alignment
pairwise distance
distance matrix
alignment with
progressive alignment
accuracy, namely
Neighbor Join
construction is
programming
sequence to the
achieved.
this project, a
formula. There are a
of which is the
of number of
Jukes cantor
ratio of number
matrix for the
branches of the tree
A guide tree is
DNA sequences.
various biological
similarities and
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015
3
differences. They show the distance between pairs of sequences when the tree edges are
weighted [10]. There are several types of phylogenetic trees; rooted, unrooted, and bifurcating. In
this project we will be using an unrooted phylogenetic tree as our guide tree.Guide trees may be
built using clustering algorithms or other learning models. In this project, the guide tree is
constructed using both UPGMA and Neighbor-Joining algorithms and the final results are
compared. Now we will have a closer look at these two algorithms.
2.2.1. UPGMA
UPGMA stands for Un-weighted Pair-Group Method with Arithmetic mean. Un-weighted refers
to all pairwise distances contributing equally, pair-group refers to groups being combined in pairs,
and arithmetic mean refers to pairwise distances between groups being mean distances between
all members of the two groups considered [7].
Consider four DNA sequences namely: S1, S2, S3 and S4. First, find the pairwise distances
between all the sequences. Then, find the smallest value in the distance matrix and its
corresponding sequences of the shorter distance. For instance let the two sequences with the
shortest distance between them be S1 and S2. Now, cluster S1 and S2 and name the cluster as C1,
updating the distance matrix by eliminating S1 and S2, but including C1. The C1 value
corresponding to the remaining sequences in the distance matrix is calculated with the values of
S1 and S2, i.e., arithmetic mean of S1 and S2 distances with corresponding to the other sequences.
Now, moving forward by considering the updated distance matrix, find the smallest distance
again and its corresponding sequences or clusters (namely sets of sequences). Say this next
smallest distance corresponds to that between clusters C3and C4. Then, in the next step, C3 and C4
would be merged into a new cluster C5, and all the distances to C5 would be updated in the
distance matrix by the corresponding average distances to the sequences in C5. Repeat the same
and find new clusters and sequences, merging and updating the distance matrix, until we are left
with one cluster.
Algorithm: Let the clusters be C1, C2, C3,....., Cn and si be the size of each cluster Ci, ‘d' be the
pairwise distance defined on the clusters. Clustering is done in the following manner:
1. Find the smallest pairwise distance amongst the clusters, d (Ci, Cj).
2. A new cluster Ck with size si+ sj is formed by joining Ci and Cj.
3. Compute the new distances from all other clusters to Ck by using the existing weighted
distances average.
Where l € {1, 2... n} and l ≠ i, j.
4. Repeat 1, 2, and 3 until only one cluster remains.
The asymptotic time complexity of UPGMA is O (n2
), since there are (n-1) iterations, with O (n)
steps per iteration [11].
2.2.2. Neighbor-Joining
The Neighbor-Joining algorithm is consistent with a parsimonious evolutionary model in which
the total sum of branch lengths is minimized [9]. It has the added benefit of achieving the correct
tree topology when given the correct pairwise distances, while also being flexible enough to
accommodate many distance models. Now let us look at the actual algorithm.
International Journal of Computer Science,
Algorithm: Let the clusters be C
pairwise distance defined on the
1. Compute
2. Find the smallest difference
3. A new cluster Ck is formed
between Ci, Cjand Ck as:
4. Compute distances between
Where l € {1, 2...
5. Repeat 1, 2 and 3 until you
Similarly to UPGMA, the asymptotic
O (n2
). However, as we shall see,
2.3. Dynamic Programming
Needleman and Wunsch’s elegant
well for aligning nucleic acid sequences.
algorithms for finding optimal
programming a global alignment
matrix. Paths in the scoring matrix
scoring matrix is dependent on
Match indicates that the two letters
different, and gap (Insertion or Deletion)
Let us consider an example with
score=3, mis-match score= 1, and
optimal solution of aligned sequences.
scoring with match, mis-match, and
Figure 2: Dynamic
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
C1, C2, C3,....., Cn and si be the size of each cluster
clusters. Clustering is done in the following manner:
difference d (Ci, Cj) – uj – uj and choose i and j accordingly.
formed by joining Ci and Cj. Calculate the intermediate
as:
between new cluster and all the other cluster's:
2... n} and l ≠ i, j.
you are left with two clusters.
asymptotic time complexity of the Neighbor-Joining algorithm
see, their execution time performance is certainly not identical.
elegant algorithm for comparing two protein sequences
sequences. The algorithm actually belongs to a very
optimal solutions called dynamic programming [1].
alignment of two sequences is done based on constructing
matrix decide the optimal solution for two aligned sequences.
three variables; match score, mis-match score, and
letters are the same, mismatch denotes that the two
Deletion) denotes one letter aligning to a gap in the other
with two strings: ATGCG and TGCAT. Consider the scores
and gap penalty= -1. There is no rule that we should have
sequences. Figure 2 shows the complete scoring matrix obtained
and gap scores.
Dynamic Programming pairwise sequence alignment.
5,No.3/4, August 2015
4
Ci, ‘d' be the
manner:
accordingly.
intermediate distances
algorithm is also
identical.
sequences also works
large class of
In dynamic
constructing a scoring
sequences. The
and gap penalty.
two letters are
other string.
scores as match
have only one
obtained after
International Journal of Computer Science,
The best optimal alignment that
2.4. Profiles
A profile is a table that records the
Usually profiles are represented
build a profile we need at least
sequences for two or more sequences
to compute multiple alignment heuristically
used to compute multiple alignments.
detail.
Consider four DNA sequences that
The profile for the above four sequences
Construct a profile sequence: Depending
that position [14] is determined.
chance to get `T'. As the probability
profiled sequence. Moving on the
positions. Then the final profiled
complications in deciding the character,
occurrence? For those situations,
the character at the same position
profile. Then, the rules are as follows:
1. If the character exists at
matched, consider the matched
of the identically highest
Example: Consider that
profile and we have a `
among `A', `C’, `T' and
position 4 in the profiled
it will be selected and copied
2. If the character does not
probable character.
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
that is achieved from above figure are:
A T G C G _
_ T G C A T
the frequency of each letter at each position in a DNA
using a matrix with letters as columns and position
least two DNA sequences. Profiles allows us to identify
sequences that are already aligned. Progressive alignment
heuristically [14]. In this project, the concept of aligned
alignments. Below is an example that explains the concept
that are in a MSA: AGT_C, AGTGC, ATTG_ and TG_GT.
sequences looks like;
Figure 3: Profile Scoring Matrix
Depending on the highest score for each position, the
Suppose we have, at position 0 a 0.75 chance to get
probability of `A' is greater than `T', `A' is chosen at position
the same way `G', `T', `G’, and `C’ are chosen for
profiled sequence achieved is “AGTGC”. This example
character, but what if two alphabets have same probability
situations, we state and implement two simple rules in this work.
position in the sequence or profile that you want to align to
follows:
at that position, compare it with the highest probable
matched character for the profiled sequence; otherwise,
highest probable characters with a random draw.
`A', `C’, `G' and `T' are with 0.25 probability at position
`-' in the sequence at position 4. Then a random draw
and `G'. Suppose that `C' is randomly drawn, then `C'
profiled sequence. If any of `A', `C’, `G' and `T' are at that
copied to position 4 of profiled sequence.
not exists at that position, then randomly choose any of
5,No.3/4, August 2015
5
DNA sequence.
position as rows. To
identify consensus
alignment uses profiles
aligned profiles is
concept of profiles in
TG_GT.
he character at
get `A' and 0.25
position 0 in the
the next four
example has no
probability of
work. Consider
to the existing
characters. If
otherwise, select any
position 4 in the
draw is made
`C' will be at
position then
of the highest
International Journal of Computer Science,
3. EXPERIMENTAL SETUP
3.1. Experimental Setup
We have discussed how to build
section 2. Now we discuss our
experimental purposes, three types
Input:
1. Seven short sequences
resemble any species but
2. Five sequences with lengths
3. The beta-cassein genetic
and Porpoise.
System Configuration: Intel CORE
Operating System: Ubuntu 14.04
Programming Language: C++
Setup 1:
Match Score: 3
Mis-match Score: 0
Gap penalty: -1
Guide-tree: UPGMA
3.2. Experimental Results
The Figures 4 and 5 are the guide
with input as the large sequences
can clearly observe that the guide
construction algorithms are used.
The progressive multiple alignment
guide tree topology of Figure 5
actual phylogenetic tree of these
Figure 4: UPGMA
Figure 6 shows graphs for execution
sequences as inputs for UPGMA
UPGMA and Neighbor-Joining
short length. This means that the
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
AND RESULTS
build an effective MSA through the progressive alignment
our implementation and results with some sample data
types of data sets are used:
each with lengths between 4 and 40. These sequences
but are used for testing.
lengths between 40 and 500, also not derived from actual
genetic sequences of five mammalian species Rat, Camel,
CORE i5 with 4GB RAM and 512GB Hard disk space.
14.04
Setup 2:
Match Score: 3
Mis-match Score: 0
Gap penalty: -1
Guide-tree: Neighbor-Joining
guide trees for both UPGMA and Neighbor-Joining,
sequences of rat, camel, dog, whale, and porpoise. From these
guide tree topologies are not same when different evolutionary
used. This has a great impact on the multiple sequence
alignment is purely dependent on the guide tree topology.
for the Neighbor-Joining algorithm is more consistent
species than that of the UPGMA algorithm.
UPGMA Figure 5: Neighbor-Joining
execution time generated by considering short and medium
UPGMA and Neighbor-joining. Observations from the
take the same amount of time when the input sequences
the algorithm used to build the guide tree has no impact
5,No.3/4, August 2015
6
alignment method in
data sets. For
sequences do not
actual species.
Camel, Dog, Whale,
space.
respectively,
these figures, we
evolutionary tree
sequence alignment.
Notably, the
consistent with the
medium length
graph: Both
sequences are of
impact on the
International Journal of Computer Science,
execution time or time complexity
relatively shorter sequences.
Figure
Figure
Figure 7 graphs execution times
and Neighbor-joining. Recall that
camel, porpoise, and whale. The
for both UPGMA and Neighbor-
terms of efficiency in this experiment
Setup 1 with small input sequences:
ACGTACT
ACTACG
ATGGATACTAACTCGG
ATGGCTA_GT
ATGCTCCGGCAAAGG
ATGCTGG
ATCGACAGTGTCA
Thus Multiple Sequence Alignment
Figure 8 below indicates the graph
medium, and large sequences as
graph are as follows: Neighbor-
input sequences. Note that both
actual optimal cost. Therefore,
Joining is more accurate than that
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
complexity of the program when the progressive alignment
Figure 6: Execution Time for Small sequences
Figure 7: Execution Time for Large sequences
generated by considering large sequences as inputs
that the sample data tested is for genetic sequences
The interesting observation from the graph is that the execution
-Joining differ. Neighbor-Joining clearly outperforms
experiment of a set of long sequences.
sequences: The resulted sequences are:
ACGTAC_T_ _ _ _ _ _ _ _
A_CTAC_G_ _ _ _ _ _ _ _
ATGGATACTAACTCGG
ATGG_ _ _CTA_GT_ _ _
ATGCTCCGGCAAAGG_
ATGCT_ _ _GG_ _ _ _ _ _
ATCGACAGT_ _GTCA_
Alignment is achieved.
graph for total alignment costs generated by considering
inputs for UPGMA and Neighbor-Joining. Observations
-Joining has lower optimal costs than UPGMA for
both algorithms attempt to achieve the lowest total cost,
the results indicate that the final MSA achieved by
that achieved by UPGMA.
5,No.3/4, August 2015
7
alignment is done for
inputs for UPGMA
sequences of rat, dog,
execution time
outperforms UPGMA in
considering small,
Observations from the
for all types of
cost, meaning the
by Neighbor-
International Journal of Computer Science,
Figure 8: Total
4. CONCLUSIONS
Progressive alignment is highly
are arranged by following the leaf
common guide tree algorithms,
accuracy. A simple and greedy
alignment, making the progressive
alignment process previously outlined
profile alignment used in this project
indicate that Neighbor-Joining
alignment, for use in guide trees
ACKNOWLEDGEMENTS
We would like to take this opportunity
McKenney for taking their valuable
REFERENCES
[1] Tatiana A Tatusova and Thomas
nucleotide sequences. FEMS microbiology
[2] Bo Qu and Zhaozhi Wu. An efficient
and Service Science (ICSESS),
2011.
[3] Julie D Thompson, Desmond
progressive multiple sequence
and weight matrix choice. Nucleic
[4] Juan Chen, Yi Pan, Juan Chen,
multiple sequence alignment.
2006. 20th International Conference
[5] Burkhard Morgenstern. Dialign
sequence alignment. Bioinformatics,
[6] Dan Wei and Qingshan Jiang.
construction. In Bio-Inspired
International Conference on, pa
[7] Che-Lun Hung, Chun-Yuan Lin,
three-profile alignment method.
IJCBS’09. International Joint Confer
Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4,
Total Costs for small, medium, and large sequences
dependent on how the guide tree is built and how the
leaf nodes of a guide tree. In this project, we have compared
UPGMA and Neighbor Joining, in terms of both efficiency
eedy approach is used for profile to sequence and profile
progressive alignments relatively faster. Although the five step
outlined is not new, some aspects of the profile evaluation
project are novel. Furthermore, our experimental results
is superior to UPGMA in both time complexity
for progressive MSA.
opportunity to thank Dr. Xudong William Yu and
valuable time in reviewing the project and providing good
Thomas L Madden. Blast 2 sequences, a new tool for comparing
microbiology letters, 174(2): 247–250, 1999.
efficient way of multiple sequence alignment. In Software
(ICSESS), 2011 IEEE 2nd International Conference on, pages 442
G Higgins, and Toby J Gibson. Clustal w: improving the
sequence alignment through sequence weighting, position-specific
Nucleic acids research, 22(22):4673–4680, 1994.
Chen, We Liu, and Ling Chen. Partitioned optimization
In Advanced Information Networking and Applications,
Conference on, volume 2, pages 5–pp. IEEE, 2006.
Dialign 2: improvement of the segment-to-segment approach
Bioinformatics, 15(3):211–218, 1999.
Jiang. A dna sequence distance measure approach for phylogenetic
Computing: Theories and Applications (BIC-TA), 2010
pages 204–212. IEEE, 2010.
Lin, Yeh-Ching Chung, and Chuan Yi Tang. A parallel
method. In Bioinformatics, Systems Biology and Intelligent Computing,
Conference on, pages 153–159. IEEE, 2009.
5,No.3/4, August 2015
8
the sequences
compared two
efficiency and
profile to profile
step progressive
evaluation and
results clearly
complexity and final
and Dr. Mark
good feedback.
comparing protein and
Software Engineering
442–445. IEEE,
the sensitivity of
specific gap penalties
algorithms for
Applications, 2006. AINA
approach to multiple
phylogenetic tree
2010 IEEE Fifth
parallel algorithm for
Computing, 2009.
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015
9
[8] Conrad Shyu, Luke Sheneman, and James A Foster. Multiple sequence alignment with evolutionary
computation. Genetic Programming and Evolvable Machines, 5 (2):121–144, 2004.
[9] Naruya Saitou and Masatoshi Nei. The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Molecular biology and evolution, 4(4):406–425, 1987.
[10] Li Yujian and Xu Liye. Unweighted multiple group method with arithmetic mean. In Bio-Inspired
Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on,
pages 830–834. IEEE, 2010.
[11] IlanGronau and Shlomo Moran. Optimal implementations of upgma and other common clustering
algorithms. Information Processing Letters, 104(6):205–210, 2007.
[12] Kevin Howe, Alex Bateman, and Richard Durbin. Quicktree: building huge neighbour-joining trees of
protein sequences. Bioinformatics, 18(11):1546–1547, 2002.
[13] Da-Fei Feng and Russell F Doolittle. Progressive sequence alignment as a prerequisitetto correct
phylogenetic trees. Journal of molecular evolution, 25(4):351–360, 1987.
[14] Chikhaoui, Belkacem, Shengrui Wang, and HélenePigot. "Causality-Based Model for User Profile
Construction from Behavior Sequences." Advanced Information Networking and Applications
(AINA), 2013 IEEE 27th International Conference on. IEEE, 2013.
[15] Motahari, Abolfazl, Guy Bresler, and David Tse. "Information theory for DNA sequencing: Part I: A
basic model." Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE,
2012.
Authors
Dega Ravi Kumar Yadav born in India, in 1991. Perusing Master’s in Computer
Science at Southern Illinois University, Edwardsville, Illinois, USA. In 2013 published
an International Journal titled “Authenticated Mutual Communication between two
nodes in MANETs”. Areas of interest are Mobile Ad-Hoc networks, Bio-Informatics
and AI.
Dr. GunesErcal, is an Assistant Professor of Computer Science at Southern Illinois
University Edwardsville. Her primary research area is graph theory and its applications
to diverse areas such as network analysis and unsupervised learning. She received her
PhD in Computer Science from the University of California, Los Angeles in 2008. She
received her BS degrees in both Mathematics and Computer Science in 2001 from the
University of Southern California.

More Related Content

What's hot

Quantum-Min-Cut/Max-Flow-Based Vertex Importance Ranking
Quantum-Min-Cut/Max-Flow-Based Vertex Importance RankingQuantum-Min-Cut/Max-Flow-Based Vertex Importance Ranking
Quantum-Min-Cut/Max-Flow-Based Vertex Importance RankingColleen Farrelly
 
Distributed vertex cover
Distributed vertex coverDistributed vertex cover
Distributed vertex coverIJCNCJournal
 
Using spectral radius ratio for node degree
Using spectral radius ratio for node degreeUsing spectral radius ratio for node degree
Using spectral radius ratio for node degreeIJCNCJournal
 
The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...journalBEEI
 
Minimization of Localization Error using Connectivity based Geometrical Metho...
Minimization of Localization Error using Connectivity based Geometrical Metho...Minimization of Localization Error using Connectivity based Geometrical Metho...
Minimization of Localization Error using Connectivity based Geometrical Metho...Dr. Amarjeet Singh
 
Discretization methods for Bayesian networks in the case of the earthquake
Discretization methods for Bayesian networks in the case of the earthquakeDiscretization methods for Bayesian networks in the case of the earthquake
Discretization methods for Bayesian networks in the case of the earthquakejournalBEEI
 
Multiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham KaushikMultiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham KaushikShubham Kaushik
 
Compressing the dependent elements of multiset
Compressing the dependent elements of multisetCompressing the dependent elements of multiset
Compressing the dependent elements of multisetIRJET Journal
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programmingeSAT Publishing House
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...eSAT Journals
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignmentharshita agarwal
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSEVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSAIRCC Publishing Corporation
 
Mobile ad hoc networks – dangling issues of optimal path strategy
Mobile ad hoc networks – dangling issues of optimal path strategyMobile ad hoc networks – dangling issues of optimal path strategy
Mobile ad hoc networks – dangling issues of optimal path strategyAlexander Decker
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK csandit
 

What's hot (18)

Quantum-Min-Cut/Max-Flow-Based Vertex Importance Ranking
Quantum-Min-Cut/Max-Flow-Based Vertex Importance RankingQuantum-Min-Cut/Max-Flow-Based Vertex Importance Ranking
Quantum-Min-Cut/Max-Flow-Based Vertex Importance Ranking
 
Distributed vertex cover
Distributed vertex coverDistributed vertex cover
Distributed vertex cover
 
Using spectral radius ratio for node degree
Using spectral radius ratio for node degreeUsing spectral radius ratio for node degree
Using spectral radius ratio for node degree
 
The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...
 
Minimization of Localization Error using Connectivity based Geometrical Metho...
Minimization of Localization Error using Connectivity based Geometrical Metho...Minimization of Localization Error using Connectivity based Geometrical Metho...
Minimization of Localization Error using Connectivity based Geometrical Metho...
 
Discretization methods for Bayesian networks in the case of the earthquake
Discretization methods for Bayesian networks in the case of the earthquakeDiscretization methods for Bayesian networks in the case of the earthquake
Discretization methods for Bayesian networks in the case of the earthquake
 
Multiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham KaushikMultiple Sequence Alignment by Shubham Kaushik
Multiple Sequence Alignment by Shubham Kaushik
 
Compressing the dependent elements of multiset
Compressing the dependent elements of multisetCompressing the dependent elements of multiset
Compressing the dependent elements of multiset
 
Small world
Small worldSmall world
Small world
 
poster
posterposter
poster
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
 
Sequence alignment belgaum
Sequence alignment belgaumSequence alignment belgaum
Sequence alignment belgaum
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignment
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSEVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
 
Mobile ad hoc networks – dangling issues of optimal path strategy
Mobile ad hoc networks – dangling issues of optimal path strategyMobile ad hoc networks – dangling issues of optimal path strategy
Mobile ad hoc networks – dangling issues of optimal path strategy
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
 

Viewers also liked

THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...
THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...
THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...ijcseit
 
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...ijcseit
 
Survey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chainSurvey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chainijcseit
 
An optimized round robin cpu scheduling
An optimized round robin cpu schedulingAn optimized round robin cpu scheduling
An optimized round robin cpu schedulingijcseit
 
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NET
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NETDECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NET
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NETijcseit
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...ijcseit
 
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...ijcseit
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 

Viewers also liked (9)

THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...
THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...
THE NUMERICAL SIMULATION SUPPORT TOOL FOR THE ASSESSMENT OF THE ACOUSTIC QUAL...
 
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...
Study and Analysis of Multidimensional Hilbert Space Filling Curve and Its Ap...
 
Survey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chainSurvey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chain
 
An optimized round robin cpu scheduling
An optimized round robin cpu schedulingAn optimized round robin cpu scheduling
An optimized round robin cpu scheduling
 
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NET
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NETDECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NET
DECENTRALIZED SUPERVISION OF MOBILE SENSOR NETWORKS USING PETRI NET
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
 
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...
TRAFFIC CONTROL MANAGEMENT AND ROAD SAFETY USING VEHICLE TO VEHICLE DATA TRAN...
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 

Similar to A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES

International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...csandit
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...cscpconf
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)CSCJournals
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...rahulmonikasharma
 
Combination of Similarity Measures for Time Series Classification using Genet...
Combination of Similarity Measures for Time Series Classification using Genet...Combination of Similarity Measures for Time Series Classification using Genet...
Combination of Similarity Measures for Time Series Classification using Genet...Deepti Dohare
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataeSAT Journals
 
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGA COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGIJORCS
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn GraphAshwani kumar
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsIJMER
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapeletsIJDKP
 
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...ijdpsjournal
 
An improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyAn improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyijpla
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesCSCJournals
 
Combining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherCombining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherIAEME Publication
 
Jcb 2005-12-1103
Jcb 2005-12-1103Jcb 2005-12-1103
Jcb 2005-12-1103Farah Diba
 

Similar to A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES (20)

International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
 
International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)International Journal of Computer Science and Security Volume (2) Issue (5)
International Journal of Computer Science and Security Volume (2) Issue (5)
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
 
Combination of Similarity Measures for Time Series Classification using Genet...
Combination of Similarity Measures for Time Series Classification using Genet...Combination of Similarity Measures for Time Series Classification using Genet...
Combination of Similarity Measures for Time Series Classification using Genet...
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGA COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERING
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapelets
 
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...
FINE GRAIN PARALLEL CONSTRUCTION OF NEIGHBOUR-JOINING PHYLOGENETIC TREES WITH...
 
An improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracyAn improvement in k mean clustering algorithm using better time and accuracy
An improvement in k mean clustering algorithm using better time and accuracy
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
 
Combining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherCombining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcher
 
Jcb 2005-12-1103
Jcb 2005-12-1103Jcb 2005-12-1103
Jcb 2005-12-1103
 
336 1170-1-pb tocher anderson
336 1170-1-pb  tocher anderson336 1170-1-pb  tocher anderson
336 1170-1-pb tocher anderson
 
336 1170-1-pb tocher anderson
336 1170-1-pb  tocher anderson336 1170-1-pb  tocher anderson
336 1170-1-pb tocher anderson
 
336 1170-1-pb tocher anderson
336 1170-1-pb  tocher anderson336 1170-1-pb  tocher anderson
336 1170-1-pb tocher anderson
 

Recently uploaded

Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 

Recently uploaded (20)

Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 

A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES

  • 1. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015 DOI : 10.5121/ijcseit.2015.5401 1 A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES USING UPGMA AND NEIGHBOR JOIN BASED GUIDE TREES Dega Ravi Kumar Yadav1 and Gunes Ercal2 Department of Computer Science, SIUE, Edwardsville, Illinois, USA ABSTRACT Multiple sequence alignment is increasingly important to bioinformatics, with several applications ranging from phylogenetic analyses to domain identification. There are several ways to perform multiple sequence alignment, an important way of which is the progressive alignment approach studied in this work. Progressive alignment involves three steps: find the distance between each pair of sequences; construct a guide tree based on the distance matrix; finally based on the guide tree align sequences using the concept of aligned profiles. Our contribution is in comparing two main methods of guide tree construction in terms of both efficiency and accuracy of the overall alignment: UPGMA and Neighbor Join methods. Our experimental results indicate that the Neighbor Join method is both more efficient in terms of performance and more accurate in terms of overall cost minimization. KEYWORDS Progressive MSA, Guide Tree, Profiles, Pair-wise Distance & Dynamic Programming. 1. INTRODUCTION The traditional pairwise sequence alignment problem in its utmost generality is to find an arrangement of two given strings, S and T, such that the arrangement yields information on the relationship between S and T, such as the minimum number of changes to S that would transform S into T. In the context of DNA sequences, which can be viewed as strings from the 4 letter alphabet {A, C, G, T}, these changes may represent mutation events, so that the alignment sought yields important evolutionary information [15]. Similarly, the pairwise sequence alignment problem can be generalized to the multiple sequence alignment problem to yield information on the relatedness of multiple sequences. Applications of the multiple sequence alignment (MSA) problem for DNA sequences include phylogenetic analysis, domain identification, discovery of DNA regulatory elements, and pattern identification. Additionally, MSA applications for protein sequences also includes protein family identification and structure prediction. This work is concerned with approaches to multiple sequence alignment in the context of DNA sequences. Generally, aligning two sequences is straightforward via dynamic programming. But pairwise alignment is insufficient for many applications in which the relationship among several sequences is sought. Moreover, it is infeasible to naturally extend the dynamic programming approach that works for pairwise sequence alignment directly to multiple sequence alignment when there are more than three sequences to align. Unfortunately, multiple sequences alignment is NP-hard based on SP (sum-of-pairs) scores [1]. Therefore, heuristics are crucial to MSA.
  • 2. International Journal of Computer Science, Progressive alignments are by far method [2, 3]. Progressive alignment calculations for all the input DNA computed in the previous step. 3. nearest sequences first. In this approaches to MSA in terms of progressive MSA with UPGMA based guide trees. Our results indicate preferable in terms of both efficiency 2. BACKGROUND AND RELATED The main steps of the progressive 1. Compute pairwise distances 2. Build the guide tree based 3. Align first two sequences (Global Alignment). 4. From the next sequence profile. 5. Repeat step 4 until the longest 6. Figure 1: 2.1. Pairwise Distance The distance between two DNA distance matrix is computed among number of evolutionary models Jukes-Cantor model [8]. The Jukes matches to the number of non-gaps distance formula is: Here `D' is the Jukes-Cantor distance of matches to the number of non specified species, a guide tree based 2.2. Guide Tree In progressive alignment, the guide determine which sequence is to be a phylogenetic tree that is constructed A phylogenetic tree is an evolutionary species. Phylogenetic trees are Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, far the most widely used heuristic multiple sequence alignment is done in three major steps; 1. Perform pairwise DNA sequences. 2. Build the guide tree using the distance 3. Based on the guide tree, perform progressive alignment this project, we compare two important progressive of algorithmic efficiency as well as alignment accuracy, UPGMA based guide trees and progressive MSA with Neighbor indicate that the Neighbor Join method of guide tree construction efficiency and accuracy of the overall resulting MSA. RELATED WORK progressive alignment methodology are as follows [6, 13]: distances for all the sequences. based on the distance matrix. sequences based on the guide tree leaf nodes using dynamic sequence alignment, construct a profile and align the new sequence longest branch leaf node is aligned. Hence, MSA is achieved. 1: Progressive Multiple sequence Alignment DNA sequences is known as the pairwise distance. In this among the species using Jukes Cantor distance formula. proposed to measure pairwise distance, the first of Jukes Cantor distance formula is based on the ratio gaps in the DNA of two sequences of the species. D = (-3/4 * log (1-(4/3*R))) distance between two DNA sequences and `R' is the ratio non-gap letters [8]. Once we obtain a distance matrix based on distance matrix is built. guide tree plays an important role as the branches be considered for the next step of alignment [10]. A constructed dependent on the distance matrix of the DNA evolutionary tree which shows interrelations among various dependent on physical or genetic characteristic similarities 5,No.3/4, August 2015 2 sequence alignment pairwise distance distance matrix alignment with progressive alignment accuracy, namely Neighbor Join construction is programming sequence to the achieved. this project, a formula. There are a of which is the of number of Jukes cantor ratio of number matrix for the branches of the tree A guide tree is DNA sequences. various biological similarities and
  • 3. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015 3 differences. They show the distance between pairs of sequences when the tree edges are weighted [10]. There are several types of phylogenetic trees; rooted, unrooted, and bifurcating. In this project we will be using an unrooted phylogenetic tree as our guide tree.Guide trees may be built using clustering algorithms or other learning models. In this project, the guide tree is constructed using both UPGMA and Neighbor-Joining algorithms and the final results are compared. Now we will have a closer look at these two algorithms. 2.2.1. UPGMA UPGMA stands for Un-weighted Pair-Group Method with Arithmetic mean. Un-weighted refers to all pairwise distances contributing equally, pair-group refers to groups being combined in pairs, and arithmetic mean refers to pairwise distances between groups being mean distances between all members of the two groups considered [7]. Consider four DNA sequences namely: S1, S2, S3 and S4. First, find the pairwise distances between all the sequences. Then, find the smallest value in the distance matrix and its corresponding sequences of the shorter distance. For instance let the two sequences with the shortest distance between them be S1 and S2. Now, cluster S1 and S2 and name the cluster as C1, updating the distance matrix by eliminating S1 and S2, but including C1. The C1 value corresponding to the remaining sequences in the distance matrix is calculated with the values of S1 and S2, i.e., arithmetic mean of S1 and S2 distances with corresponding to the other sequences. Now, moving forward by considering the updated distance matrix, find the smallest distance again and its corresponding sequences or clusters (namely sets of sequences). Say this next smallest distance corresponds to that between clusters C3and C4. Then, in the next step, C3 and C4 would be merged into a new cluster C5, and all the distances to C5 would be updated in the distance matrix by the corresponding average distances to the sequences in C5. Repeat the same and find new clusters and sequences, merging and updating the distance matrix, until we are left with one cluster. Algorithm: Let the clusters be C1, C2, C3,....., Cn and si be the size of each cluster Ci, ‘d' be the pairwise distance defined on the clusters. Clustering is done in the following manner: 1. Find the smallest pairwise distance amongst the clusters, d (Ci, Cj). 2. A new cluster Ck with size si+ sj is formed by joining Ci and Cj. 3. Compute the new distances from all other clusters to Ck by using the existing weighted distances average. Where l € {1, 2... n} and l ≠ i, j. 4. Repeat 1, 2, and 3 until only one cluster remains. The asymptotic time complexity of UPGMA is O (n2 ), since there are (n-1) iterations, with O (n) steps per iteration [11]. 2.2.2. Neighbor-Joining The Neighbor-Joining algorithm is consistent with a parsimonious evolutionary model in which the total sum of branch lengths is minimized [9]. It has the added benefit of achieving the correct tree topology when given the correct pairwise distances, while also being flexible enough to accommodate many distance models. Now let us look at the actual algorithm.
  • 4. International Journal of Computer Science, Algorithm: Let the clusters be C pairwise distance defined on the 1. Compute 2. Find the smallest difference 3. A new cluster Ck is formed between Ci, Cjand Ck as: 4. Compute distances between Where l € {1, 2... 5. Repeat 1, 2 and 3 until you Similarly to UPGMA, the asymptotic O (n2 ). However, as we shall see, 2.3. Dynamic Programming Needleman and Wunsch’s elegant well for aligning nucleic acid sequences. algorithms for finding optimal programming a global alignment matrix. Paths in the scoring matrix scoring matrix is dependent on Match indicates that the two letters different, and gap (Insertion or Deletion) Let us consider an example with score=3, mis-match score= 1, and optimal solution of aligned sequences. scoring with match, mis-match, and Figure 2: Dynamic Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, C1, C2, C3,....., Cn and si be the size of each cluster clusters. Clustering is done in the following manner: difference d (Ci, Cj) – uj – uj and choose i and j accordingly. formed by joining Ci and Cj. Calculate the intermediate as: between new cluster and all the other cluster's: 2... n} and l ≠ i, j. you are left with two clusters. asymptotic time complexity of the Neighbor-Joining algorithm see, their execution time performance is certainly not identical. elegant algorithm for comparing two protein sequences sequences. The algorithm actually belongs to a very optimal solutions called dynamic programming [1]. alignment of two sequences is done based on constructing matrix decide the optimal solution for two aligned sequences. three variables; match score, mis-match score, and letters are the same, mismatch denotes that the two Deletion) denotes one letter aligning to a gap in the other with two strings: ATGCG and TGCAT. Consider the scores and gap penalty= -1. There is no rule that we should have sequences. Figure 2 shows the complete scoring matrix obtained and gap scores. Dynamic Programming pairwise sequence alignment. 5,No.3/4, August 2015 4 Ci, ‘d' be the manner: accordingly. intermediate distances algorithm is also identical. sequences also works large class of In dynamic constructing a scoring sequences. The and gap penalty. two letters are other string. scores as match have only one obtained after
  • 5. International Journal of Computer Science, The best optimal alignment that 2.4. Profiles A profile is a table that records the Usually profiles are represented build a profile we need at least sequences for two or more sequences to compute multiple alignment heuristically used to compute multiple alignments. detail. Consider four DNA sequences that The profile for the above four sequences Construct a profile sequence: Depending that position [14] is determined. chance to get `T'. As the probability profiled sequence. Moving on the positions. Then the final profiled complications in deciding the character, occurrence? For those situations, the character at the same position profile. Then, the rules are as follows: 1. If the character exists at matched, consider the matched of the identically highest Example: Consider that profile and we have a ` among `A', `C’, `T' and position 4 in the profiled it will be selected and copied 2. If the character does not probable character. Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, that is achieved from above figure are: A T G C G _ _ T G C A T the frequency of each letter at each position in a DNA using a matrix with letters as columns and position least two DNA sequences. Profiles allows us to identify sequences that are already aligned. Progressive alignment heuristically [14]. In this project, the concept of aligned alignments. Below is an example that explains the concept that are in a MSA: AGT_C, AGTGC, ATTG_ and TG_GT. sequences looks like; Figure 3: Profile Scoring Matrix Depending on the highest score for each position, the Suppose we have, at position 0 a 0.75 chance to get probability of `A' is greater than `T', `A' is chosen at position the same way `G', `T', `G’, and `C’ are chosen for profiled sequence achieved is “AGTGC”. This example character, but what if two alphabets have same probability situations, we state and implement two simple rules in this work. position in the sequence or profile that you want to align to follows: at that position, compare it with the highest probable matched character for the profiled sequence; otherwise, highest probable characters with a random draw. `A', `C’, `G' and `T' are with 0.25 probability at position `-' in the sequence at position 4. Then a random draw and `G'. Suppose that `C' is randomly drawn, then `C' profiled sequence. If any of `A', `C’, `G' and `T' are at that copied to position 4 of profiled sequence. not exists at that position, then randomly choose any of 5,No.3/4, August 2015 5 DNA sequence. position as rows. To identify consensus alignment uses profiles aligned profiles is concept of profiles in TG_GT. he character at get `A' and 0.25 position 0 in the the next four example has no probability of work. Consider to the existing characters. If otherwise, select any position 4 in the draw is made `C' will be at position then of the highest
  • 6. International Journal of Computer Science, 3. EXPERIMENTAL SETUP 3.1. Experimental Setup We have discussed how to build section 2. Now we discuss our experimental purposes, three types Input: 1. Seven short sequences resemble any species but 2. Five sequences with lengths 3. The beta-cassein genetic and Porpoise. System Configuration: Intel CORE Operating System: Ubuntu 14.04 Programming Language: C++ Setup 1: Match Score: 3 Mis-match Score: 0 Gap penalty: -1 Guide-tree: UPGMA 3.2. Experimental Results The Figures 4 and 5 are the guide with input as the large sequences can clearly observe that the guide construction algorithms are used. The progressive multiple alignment guide tree topology of Figure 5 actual phylogenetic tree of these Figure 4: UPGMA Figure 6 shows graphs for execution sequences as inputs for UPGMA UPGMA and Neighbor-Joining short length. This means that the Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, AND RESULTS build an effective MSA through the progressive alignment our implementation and results with some sample data types of data sets are used: each with lengths between 4 and 40. These sequences but are used for testing. lengths between 40 and 500, also not derived from actual genetic sequences of five mammalian species Rat, Camel, CORE i5 with 4GB RAM and 512GB Hard disk space. 14.04 Setup 2: Match Score: 3 Mis-match Score: 0 Gap penalty: -1 Guide-tree: Neighbor-Joining guide trees for both UPGMA and Neighbor-Joining, sequences of rat, camel, dog, whale, and porpoise. From these guide tree topologies are not same when different evolutionary used. This has a great impact on the multiple sequence alignment is purely dependent on the guide tree topology. for the Neighbor-Joining algorithm is more consistent species than that of the UPGMA algorithm. UPGMA Figure 5: Neighbor-Joining execution time generated by considering short and medium UPGMA and Neighbor-joining. Observations from the take the same amount of time when the input sequences the algorithm used to build the guide tree has no impact 5,No.3/4, August 2015 6 alignment method in data sets. For sequences do not actual species. Camel, Dog, Whale, space. respectively, these figures, we evolutionary tree sequence alignment. Notably, the consistent with the medium length graph: Both sequences are of impact on the
  • 7. International Journal of Computer Science, execution time or time complexity relatively shorter sequences. Figure Figure Figure 7 graphs execution times and Neighbor-joining. Recall that camel, porpoise, and whale. The for both UPGMA and Neighbor- terms of efficiency in this experiment Setup 1 with small input sequences: ACGTACT ACTACG ATGGATACTAACTCGG ATGGCTA_GT ATGCTCCGGCAAAGG ATGCTGG ATCGACAGTGTCA Thus Multiple Sequence Alignment Figure 8 below indicates the graph medium, and large sequences as graph are as follows: Neighbor- input sequences. Note that both actual optimal cost. Therefore, Joining is more accurate than that Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, complexity of the program when the progressive alignment Figure 6: Execution Time for Small sequences Figure 7: Execution Time for Large sequences generated by considering large sequences as inputs that the sample data tested is for genetic sequences The interesting observation from the graph is that the execution -Joining differ. Neighbor-Joining clearly outperforms experiment of a set of long sequences. sequences: The resulted sequences are: ACGTAC_T_ _ _ _ _ _ _ _ A_CTAC_G_ _ _ _ _ _ _ _ ATGGATACTAACTCGG ATGG_ _ _CTA_GT_ _ _ ATGCTCCGGCAAAGG_ ATGCT_ _ _GG_ _ _ _ _ _ ATCGACAGT_ _GTCA_ Alignment is achieved. graph for total alignment costs generated by considering inputs for UPGMA and Neighbor-Joining. Observations -Joining has lower optimal costs than UPGMA for both algorithms attempt to achieve the lowest total cost, the results indicate that the final MSA achieved by that achieved by UPGMA. 5,No.3/4, August 2015 7 alignment is done for inputs for UPGMA sequences of rat, dog, execution time outperforms UPGMA in considering small, Observations from the for all types of cost, meaning the by Neighbor-
  • 8. International Journal of Computer Science, Figure 8: Total 4. CONCLUSIONS Progressive alignment is highly are arranged by following the leaf common guide tree algorithms, accuracy. A simple and greedy alignment, making the progressive alignment process previously outlined profile alignment used in this project indicate that Neighbor-Joining alignment, for use in guide trees ACKNOWLEDGEMENTS We would like to take this opportunity McKenney for taking their valuable REFERENCES [1] Tatiana A Tatusova and Thomas nucleotide sequences. FEMS microbiology [2] Bo Qu and Zhaozhi Wu. An efficient and Service Science (ICSESS), 2011. [3] Julie D Thompson, Desmond progressive multiple sequence and weight matrix choice. Nucleic [4] Juan Chen, Yi Pan, Juan Chen, multiple sequence alignment. 2006. 20th International Conference [5] Burkhard Morgenstern. Dialign sequence alignment. Bioinformatics, [6] Dan Wei and Qingshan Jiang. construction. In Bio-Inspired International Conference on, pa [7] Che-Lun Hung, Chun-Yuan Lin, three-profile alignment method. IJCBS’09. International Joint Confer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, Total Costs for small, medium, and large sequences dependent on how the guide tree is built and how the leaf nodes of a guide tree. In this project, we have compared UPGMA and Neighbor Joining, in terms of both efficiency eedy approach is used for profile to sequence and profile progressive alignments relatively faster. Although the five step outlined is not new, some aspects of the profile evaluation project are novel. Furthermore, our experimental results is superior to UPGMA in both time complexity for progressive MSA. opportunity to thank Dr. Xudong William Yu and valuable time in reviewing the project and providing good Thomas L Madden. Blast 2 sequences, a new tool for comparing microbiology letters, 174(2): 247–250, 1999. efficient way of multiple sequence alignment. In Software (ICSESS), 2011 IEEE 2nd International Conference on, pages 442 G Higgins, and Toby J Gibson. Clustal w: improving the sequence alignment through sequence weighting, position-specific Nucleic acids research, 22(22):4673–4680, 1994. Chen, We Liu, and Ling Chen. Partitioned optimization In Advanced Information Networking and Applications, Conference on, volume 2, pages 5–pp. IEEE, 2006. Dialign 2: improvement of the segment-to-segment approach Bioinformatics, 15(3):211–218, 1999. Jiang. A dna sequence distance measure approach for phylogenetic Computing: Theories and Applications (BIC-TA), 2010 pages 204–212. IEEE, 2010. Lin, Yeh-Ching Chung, and Chuan Yi Tang. A parallel method. In Bioinformatics, Systems Biology and Intelligent Computing, Conference on, pages 153–159. IEEE, 2009. 5,No.3/4, August 2015 8 the sequences compared two efficiency and profile to profile step progressive evaluation and results clearly complexity and final and Dr. Mark good feedback. comparing protein and Software Engineering 442–445. IEEE, the sensitivity of specific gap penalties algorithms for Applications, 2006. AINA approach to multiple phylogenetic tree 2010 IEEE Fifth parallel algorithm for Computing, 2009.
  • 9. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 5,No.3/4, August 2015 9 [8] Conrad Shyu, Luke Sheneman, and James A Foster. Multiple sequence alignment with evolutionary computation. Genetic Programming and Evolvable Machines, 5 (2):121–144, 2004. [9] Naruya Saitou and Masatoshi Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, 4(4):406–425, 1987. [10] Li Yujian and Xu Liye. Unweighted multiple group method with arithmetic mean. In Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on, pages 830–834. IEEE, 2010. [11] IlanGronau and Shlomo Moran. Optimal implementations of upgma and other common clustering algorithms. Information Processing Letters, 104(6):205–210, 2007. [12] Kevin Howe, Alex Bateman, and Richard Durbin. Quicktree: building huge neighbour-joining trees of protein sequences. Bioinformatics, 18(11):1546–1547, 2002. [13] Da-Fei Feng and Russell F Doolittle. Progressive sequence alignment as a prerequisitetto correct phylogenetic trees. Journal of molecular evolution, 25(4):351–360, 1987. [14] Chikhaoui, Belkacem, Shengrui Wang, and HélenePigot. "Causality-Based Model for User Profile Construction from Behavior Sequences." Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on. IEEE, 2013. [15] Motahari, Abolfazl, Guy Bresler, and David Tse. "Information theory for DNA sequencing: Part I: A basic model." Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE, 2012. Authors Dega Ravi Kumar Yadav born in India, in 1991. Perusing Master’s in Computer Science at Southern Illinois University, Edwardsville, Illinois, USA. In 2013 published an International Journal titled “Authenticated Mutual Communication between two nodes in MANETs”. Areas of interest are Mobile Ad-Hoc networks, Bio-Informatics and AI. Dr. GunesErcal, is an Assistant Professor of Computer Science at Southern Illinois University Edwardsville. Her primary research area is graph theory and its applications to diverse areas such as network analysis and unsupervised learning. She received her PhD in Computer Science from the University of California, Los Angeles in 2008. She received her BS degrees in both Mathematics and Computer Science in 2001 from the University of Southern California.