SlideShare a Scribd company logo
1 of 5
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 312
COMPARATIVE ANALYSIS OF DYNAMIC PROGRAMMING
ALGORITHMS TO FIND SIMILARITY IN GENE SEQUENCES
Shankar Biradar1
, Vinod Desai2
, Basavaraj Madagouda3
, Manjunath Patil4
1, 2, 3, 4
Assistant Professor, Department of Computer Science & Engineering, Angadi Institute of Technology and
Management, Belgaum, Karnataka, India.
shankar_pda@yahoo.com, vinod.cd0891@gmail.com, basavarajmadagoda@gmail.com, manjunath.patil03@gmail.com
Abstract
There exist many computational methods for finding similarity in gene sequence, finding suitable methods that gives optimal similarity
is difficult task. Objective of this project is to find an appropriate method to compute similarity in gene/protein sequence, both within
the families and across the families. Many dynamic programming algorithms like Levenshtein edit distance; Longest Common
Subsequence and Smith-waterman have used dynamic programming approach to find similarities between two sequences. But none of
the method mentioned above have used real benchmark data sets. They have only used dynamic programming algorithms for synthetic
data. We proposed a new method to compute similarity. The performance of the proposed algorithm is evaluated using number of data
sets from various families, and similarity value is calculated both within the family and across the families. A comparative analysis
and time complexity of the proposed method reveal that Smith-waterman approach is appropriate method when gene/protein sequence
belongs to same family and Longest Common Subsequence is best suited when sequence belong to two different families.
Keywords - Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity.
-----------------------------------------------------------------------***-----------------------------------------------------------------------
1. INTRODUCTION
Bioinformatics is the application of computer technology to
the management of biological information. The field of
bioinformatics has gained widespread popularity largely due
to efforts such as the genome projects, which have produced
lot of biological sequence data for analysis. This has led to the
development and improvement of many computational
techniques for making inference in biology and medicine. A
gene is a molecular unit of heredity of a living organism. It is
a name given to some stretches of DNA and RNA that code
for a polypeptide or for an RNA chain that has a function in
the organism. Genes hold the information to build and
maintain an organism's cells and pass genetic characteristic to
their child. Gene sequencing can be used to gain important
information on genes, genetic variation and gene function for
biological and medical studies [13]. Edit distance is a method
of finding similarity between gene/protein sequences by
finding dissimilarity between two sequences [5]. Edit distance
between source and target string is represented by how many
fundamental operation are required to transfer source string
into target, these fundamental operations are insertion,
deletion and subtraction. The similarity of two strings is the
minimum number of edit distance. String Similarity is
quantitative term that shows degree of commonality or
difference between two comparative sequences [10], Finding
the gene similarity has massive use in the field of
bioinformatics.
2. MATERIALS AND METHODS
In this section we describe the various materials and methods
which are used in our algorithms
2.1 Dataset Used
For the experiment purpose we took data sets from 5 different
families which are listed below, and the source of information
is [16] [17].
Family: kruppel c2h2-type zinc finger protein.
Family: caution-diffusion facilitator (CDF) transporter family.
Family: E3 ubquitin-protein ligase.
Family: Semaphorin-7A.
Family: SPAG11 family.
2.2 Dataset Format
In this research work we used various data sets from different
families for the implementation of different algorithms, all this
data set is in FASTA format. In bioinformatics, FASTA
format is a text-based format for representing nucleotide
sequences, in which nucleotides or amino acids are
represented using single-letter codes. The format also contain
sequence name before the sequences start. A sequence in
FASTA format begins with a single-line description, followed
by lines of sequence data. The description line is distinguished
from the sequence data by a greater-than (">") symbol in the
first column. The word following the ">" symbol is the
identifier of the sequence, and the rest of the line is the
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 313
description (both are optional). There should be no space
between the ">" and the first letter of the identifier. It is
recommended that all lines of text be shorter than 80
characters. The sequence ends if another line starting with a
">" appears; this indicates the start of another sequence.
2.3 Gap Penalty
In order to get best possible sequence alignment between two
DNA sequences, it important to insert gaps in sequence
alignment and use gap penalties. While aligning DNA
sequences, a positive score is assigned for matches negative
score is assigned for mismatch To find out score for matches
and mismatches in alignments of proteins, it is necessary to
know how often one sequence is substituted for another in
related proteins. In addition, a method is needed to account for
insertions and deletions that sometimes appear in related DNA
or protein sequences. To accommodate such sequence
variations, gaps that appear in sequence alignments are given a
negative penalty score reflecting the fact that they are not
expected to occur very often. It is very difficult to get the best-
possible alignment, either global or local, unless gaps are
included in the alignment.
2.4 Blosum Matrix
A Blosum matrix is necessary for pair wise sequence
alignment. The four DNA bases are of two types, purines (A
and G) and pyrimidines (T and C). The purines are chemically
similar to each other and the pyrimidines are chemically
similar to each other. Therefore, we will penalize substitutions
between a purine and a purine or between a pyrimidine and a
pyrimidine (transitions) less heavily than substitutions
between purines and pyrimidines (transfusions). We will use
the following matrix for substitutions and matching’s. The
score is 2 for a match, 1 for a purine with a purine or a
pyrimidine with a pyrimidine, and -1 for a purine with a
pyrimidine.
3. ALGORITHMS
Dynamic programming algorithms for finding gene sequence
similarity are discussed in detail in this section along with
pseudo codes and algorithms. We used three algorithms for
analysis purpose, all these algorithms uses the concept of
dynamic programming, which is output sequence depends
upon the input of previous sequence. Those three algorithms
are.
a. Levenshtein edit distance algorithm
b. Longest common subsequence algorithm
c. Smith-waterman algorithm
3.1 Levenshtein Edit Distance Algorithms
It is one of the most popular algorithms to find dissimilarity
between two nucleotide sequences, it is an approximate string
matching algorithm mainly used for forensic data set, the basic
principle of this algorithm is to measure the similarity between
two strings [4]. This is done by calculating the number of
basic operations as mentioned in introduction part. Algorithm
for Levenshtein edit distance is as fallows
Int LevenshteinDistance (char S[1..N], char T[1...M] )
{
Declare int D[1....N, 1....M]
For i from 0 to N
D[i,0] := i // the distance of any first string to an empty
second string
For j from 0 to M
D[0, j] := j // the distance of any second string to an empty
first string
For j from 1 to M
{
For i from 1 to N
{
S[i] = T[j] then
D[i, j] := D[i-1, j-1]
Else
D[i,j] := min {
D[i-1 , j] +1 // deletion
D[i, j-1] +1 // insertion
D[i-1, j-1] +1// substitution
}
}
}
Return D[M, N]
}
3.2 Longest Common Subsequence Algorithm
Finding LCS [3] [8] is one way of computing how similar two
sequences are, Longer the LCS more similar they are. The
LCS problem is a special case of the edit distance problem.
LCS is similar to Levenshtein edit distance algorithm except
few steps and it also involves trace back process in order to
find similar sequences.
Algorithm for Longest common subsequence is as fallows
Int LCS (char S[1...N], char T[1....M] )
{
Declare int D[1....N, 1....M]
For i from 0 to N
D[i,0] := 0
For j from 0 to M
D[0, j] := 0
For j from 1 to M
{
For i from 1 to N
{
D[i,j] := max {
V1;
V2;
V3+1 if S=T else V3
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 314
}
}
}
Return D [M, N]
}
Where V1 = the value in the cell to the left of current cell.
V2= the value in the cell above the current cell.
V3= value in the cell above left to the current cell,
S and T are source string and Target string respectively
3.3 Smith-Waterman Algorithm
The Smith–Waterman algorithm is a well-known algorithm for
performing local sequence alignment; that is, for determining
similar regions between two nucleotide or protein sequences.
Instead of looking at the total sequence, the Smith–Waterman
algorithm compares segments of all possible lengths and
optimizes the similarity measure.
Smith-waterman algorithm differ from other Local alignment
algorithm in fallowing factors
a. A negative score/weight must be given to mismatches.
b. Zero must be the minimum score recorded in the
matrix.
c. The beginning and end of an optimal path may be found
anywhere in the matrix not just the last row or column.
Pseudo code core smith-waterman algorithm is as fallows.
Pseudo code for initialization of matrix
For i=0 to length(A)
F(i,0) ← d*i
For j=0 to length(B)
F(0,j) ← d*j
For i=1 to length(A)
For j=1 to length(B)
{
Diag ← F(i-1,j-1) + S(Ai, Bj)
Up ← F(i-1, j) + d
Left ← F(i, j-1) + d
F(i,j) ← max(Match, Insert, Delete)
}
Pseudo code for SW alignment
For (int i=1;i<=n;i++)
For (int j=1;j<=m;j++)
int s=score[seq1.charAt(i-1)][seq2.charAt(j-1)];
int val=max(0,F[i-1][j-1]+s,F[i-1][j]-d,F[i][j-1]-d);
F[i][j]=val;
If (val==0)
B[i][j]=null;
Else if(val==F[i-1][j-1]+s)
B[i][j]=new Traceback2(i-1,j-1);
Else if(val==F[i-1][j]-d)
S[i][j]= new Traceback2(i-1,j);
Else if(val==F]i][j-1]-d)
B[i][j]= new Traceback2(i,j-1);
Where i and j are columns and rows respectively, S (xi; yj) is
value of substitution matrix and g is gap penalty, the
substitution matrix is a matrix which describes the rate at
which one character in a sequence changes to other character
states over time
3.4 Results within the Same Families
Table1. Family: cation-duffusion facilitator
Figure1. Similarity graph for cation-duffusion facilitator
family
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 315
Table2. Family: semaphorin
Figure2. Similarity graph for family semaphorin
In the figure 2, blue line indicates similarity in smith-
waterman algorithm, red line indicate similarity in longest
common subsequence algorithm and finally the green line
indicate similarity in Levenshtein algorithm. As we see from
the graph smith-water man algorithm is more efficient then
other two algorithms while finding the similarity of gene
sequences that belonging to same family.
3.5 Results between Different Families
Figure3. Similarity graph across family
Where red line indicates LCS algorithm, green line indicates
Levenshtein edit distance algorithm and blue line is for smith-
waterman algorithm. From the above graph, we can conclude
that while comparing two gene sequences belonging to
different families, longest common subsequence is better
algorithm because it gives maximum similarity as compare to
other two algorithms.
Table3. Similarities across the families
CONCLUSIONS
We considered finding the gene sequence similarity using
dynamic programming for our project work. In dynamic
programming there exist many different approaches to find
similarity among gene sequences; we took some of these
algorithms for our project and did comparative analysis of
these algorithms using datasets from five different families.
We took different protein sequences from all these dataset as
input to our program and did rigorous experimentation on
these datasets, both within the families and across the families.
Five data sets which are used for our experimental work are
kruppel c2h2-type zinc finger protein, cation-diffusion
facilitator (CDF) transporter, E3 ubquitin –protein ligase,
semaphorin and finally SPAG11B and got the results as
discussed in the previous section. From the results we can
conclude that smith-waterman algorithm is best suited to find
similarity for protein sequences that belonging to the same
family, and longest common subsequence algorithm is best
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 316
suited for protein sequences that are belong to different
families.
REFERENCES
[1] S. Hirschberg; “Algorithms for the longest common
subsequence problem”. J.ACM, 24;{664-675};1977
[2] Levenshtein V.I; “binary code capable of correcting
deletion, insertion and reversal”; soviet physics
doklady; vol 8; 1966
[3] Ristead, R.S Yianilos,P.N; “learning string edit
distance”; IEEE Transaction or pattern analysis and
machine intelligence; 1998
[4] L. Bergroth; “Survey of Longest Common Subsequence
Algorithms”; Department of Computer Science,
University of Turku20520 Turku,Finland; 2000 IEEE
[5] Hekki Hyyro, Ayumi Shinohara; “A new bit-parallel-
distance algorithm”; Nikoltseas.LNCS 3772; 2005
[6] Adrian Horia Dediu,et al; “A fast longest common
subsequence algorithm for similar strings”; Language
and automation theory and application, International
Conference, LATA; 2010
[7] Patsaraporn Somboonsat, Mud-Armeen munlin; “a new
edit distance method for finding similarity in DNA
sequence”; world academy of science engineering and
technology 58; 2011
[8] Dekang Lin; “ An Information-Theoretic Definition of
Similarity”; Department of Computer Science
University of Manitoba,Winnipeg, Manitoba, Canada
R3T 2N2
[9] Xingqin Qi, Qin Wu, Yusen Zhang2, Eddie Fuller and
Cun-Quan Zhang1; “A Novel Model for DNA
Sequence Similarity Analysis Based on Graph Theory”;
Department of Mathematics,
[10] West Virginia University, Morgantown, WV, USA,
26506. School of Mathematics and Statistics,Shandong
University at Weihai, Weihai, China, 264209 Gina M.
Cannarozzi; “String Alignment using Dynamic
Programming”
[11] David R Bentley; Whole-genome re-sequencing.
[12] M. Madan Babu; Biological Databases and Protein
Sequence Analysis; Center for Biotechnology, Anna
University, Chenna
[13] A pattern classification; Richard O.Duda, peter E.Hart,
David G.Stork 2nd editon;
[14] simultaneous solution of the RNA folding alignment
and protosequence problem; David Sankoff Siam
,J.Apple Math; vol 45; 1985
[15] http://www.ncbi.nlm.nih.gov/
[16] http://www.uniprot.org/uniprot/

More Related Content

What's hot

Dynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationDynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationeSAT Publishing House
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesCSCJournals
 
Eyeblink artefact removal from eeg using independent
Eyeblink artefact removal from eeg using independentEyeblink artefact removal from eeg using independent
Eyeblink artefact removal from eeg using independenteSAT Publishing House
 
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...ijfls
 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
 
A survey research summary on neural networks
A survey research summary on neural networksA survey research summary on neural networks
A survey research summary on neural networkseSAT Publishing House
 
Genome structure prediction a review over soft computing techniques
Genome structure prediction a review over soft computing techniquesGenome structure prediction a review over soft computing techniques
Genome structure prediction a review over soft computing techniqueseSAT Journals
 
Efficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesEfficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesIRJET Journal
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ijcseit
 
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKAN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKijsc
 
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYDCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYIAEME Publication
 
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...CSCJournals
 
A over damped person identification system using emg signal
A over damped person identification system using emg signalA over damped person identification system using emg signal
A over damped person identification system using emg signaleSAT Publishing House
 
Intelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIntelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIJERA Editor
 

What's hot (16)

Dynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentationDynamic thresholding on speech segmentation
Dynamic thresholding on speech segmentation
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
 
Eyeblink artefact removal from eeg using independent
Eyeblink artefact removal from eeg using independentEyeblink artefact removal from eeg using independent
Eyeblink artefact removal from eeg using independent
 
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...
INTERVAL TYPE-2 INTUITIONISTIC FUZZY LOGIC SYSTEM FOR TIME SERIES AND IDENTIF...
 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
 
A survey research summary on neural networks
A survey research summary on neural networksA survey research summary on neural networks
A survey research summary on neural networks
 
40120130406014 2
40120130406014 240120130406014 2
40120130406014 2
 
Genome structure prediction a review over soft computing techniques
Genome structure prediction a review over soft computing techniquesGenome structure prediction a review over soft computing techniques
Genome structure prediction a review over soft computing techniques
 
Efficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesEfficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of Trusses
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
 
F017533540
F017533540F017533540
F017533540
 
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKAN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
 
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYDCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
 
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...
A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in...
 
A over damped person identification system using emg signal
A over damped person identification system using emg signalA over damped person identification system using emg signal
A over damped person identification system using emg signal
 
Intelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIntelligent Handwritten Digit Recognition using Artificial Neural Network
Intelligent Handwritten Digit Recognition using Artificial Neural Network
 

Similar to Comparative analysis of dynamic programming algorithms to find similarity in gene sequences

IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataIRJET Journal
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataeSAT Journals
 
A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...eSAT Journals
 
A clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofA clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofeSAT Publishing House
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...rahulmonikasharma
 
Power spectrum sequence analysis of rheumatic
Power spectrum sequence analysis of rheumaticPower spectrum sequence analysis of rheumatic
Power spectrum sequence analysis of rheumaticeSAT Publishing House
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшаваValeriya Simeonova
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
 
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...eSAT Journals
 
The chaotic structure of
The chaotic structure ofThe chaotic structure of
The chaotic structure ofcsandit
 
The Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein SequencesThe Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein Sequencescsandit
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...ijcseit
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...ijcseit
 
27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genapnooriasukmaningtyas
 
Double layered dna based cryptography
Double layered dna based cryptographyDouble layered dna based cryptography
Double layered dna based cryptographyeSAT Journals
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomicsShyam Sarkar
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
 

Similar to Comparative analysis of dynamic programming algorithms to find similarity in gene sequences (20)

IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression Data
 
1207.2600
1207.26001207.2600
1207.2600
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...
 
A clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofA clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction of
 
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
 
Power spectrum sequence analysis of rheumatic
Power spectrum sequence analysis of rheumaticPower spectrum sequence analysis of rheumatic
Power spectrum sequence analysis of rheumatic
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшава
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...
Power spectrum sequence analysis of rheumatic arthritis (ra disease using dsp...
 
The chaotic structure of
The chaotic structure ofThe chaotic structure of
The chaotic structure of
 
The Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein SequencesThe Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein Sequences
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
 
27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap
 
Double layered dna based cryptography
Double layered dna based cryptographyDouble layered dna based cryptography
Double layered dna based cryptography
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
 

More from eSAT Journals

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementseSAT Journals
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case studyeSAT Journals
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case studyeSAT Journals
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreeSAT Journals
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialseSAT Journals
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...eSAT Journals
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...eSAT Journals
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizereSAT Journals
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementeSAT Journals
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...eSAT Journals
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteeSAT Journals
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...eSAT Journals
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...eSAT Journals
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabseSAT Journals
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaeSAT Journals
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...eSAT Journals
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodeSAT Journals
 
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
Estimation of morphometric parameters and runoff using rs &amp; gis techniquesEstimation of morphometric parameters and runoff using rs &amp; gis techniques
Estimation of morphometric parameters and runoff using rs &amp; gis techniqueseSAT Journals
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...eSAT Journals
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...eSAT Journals
 

More from eSAT Journals (20)

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavements
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case study
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case study
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangalore
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizer
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concrete
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabs
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in india
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn method
 
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
Estimation of morphometric parameters and runoff using rs &amp; gis techniquesEstimation of morphometric parameters and runoff using rs &amp; gis techniques
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...
 

Recently uploaded

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 

Recently uploaded (20)

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

Comparative analysis of dynamic programming algorithms to find similarity in gene sequences

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 312 COMPARATIVE ANALYSIS OF DYNAMIC PROGRAMMING ALGORITHMS TO FIND SIMILARITY IN GENE SEQUENCES Shankar Biradar1 , Vinod Desai2 , Basavaraj Madagouda3 , Manjunath Patil4 1, 2, 3, 4 Assistant Professor, Department of Computer Science & Engineering, Angadi Institute of Technology and Management, Belgaum, Karnataka, India. shankar_pda@yahoo.com, vinod.cd0891@gmail.com, basavarajmadagoda@gmail.com, manjunath.patil03@gmail.com Abstract There exist many computational methods for finding similarity in gene sequence, finding suitable methods that gives optimal similarity is difficult task. Objective of this project is to find an appropriate method to compute similarity in gene/protein sequence, both within the families and across the families. Many dynamic programming algorithms like Levenshtein edit distance; Longest Common Subsequence and Smith-waterman have used dynamic programming approach to find similarities between two sequences. But none of the method mentioned above have used real benchmark data sets. They have only used dynamic programming algorithms for synthetic data. We proposed a new method to compute similarity. The performance of the proposed algorithm is evaluated using number of data sets from various families, and similarity value is calculated both within the family and across the families. A comparative analysis and time complexity of the proposed method reveal that Smith-waterman approach is appropriate method when gene/protein sequence belongs to same family and Longest Common Subsequence is best suited when sequence belong to two different families. Keywords - Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity. -----------------------------------------------------------------------***----------------------------------------------------------------------- 1. INTRODUCTION Bioinformatics is the application of computer technology to the management of biological information. The field of bioinformatics has gained widespread popularity largely due to efforts such as the genome projects, which have produced lot of biological sequence data for analysis. This has led to the development and improvement of many computational techniques for making inference in biology and medicine. A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a polypeptide or for an RNA chain that has a function in the organism. Genes hold the information to build and maintain an organism's cells and pass genetic characteristic to their child. Gene sequencing can be used to gain important information on genes, genetic variation and gene function for biological and medical studies [13]. Edit distance is a method of finding similarity between gene/protein sequences by finding dissimilarity between two sequences [5]. Edit distance between source and target string is represented by how many fundamental operation are required to transfer source string into target, these fundamental operations are insertion, deletion and subtraction. The similarity of two strings is the minimum number of edit distance. String Similarity is quantitative term that shows degree of commonality or difference between two comparative sequences [10], Finding the gene similarity has massive use in the field of bioinformatics. 2. MATERIALS AND METHODS In this section we describe the various materials and methods which are used in our algorithms 2.1 Dataset Used For the experiment purpose we took data sets from 5 different families which are listed below, and the source of information is [16] [17]. Family: kruppel c2h2-type zinc finger protein. Family: caution-diffusion facilitator (CDF) transporter family. Family: E3 ubquitin-protein ligase. Family: Semaphorin-7A. Family: SPAG11 family. 2.2 Dataset Format In this research work we used various data sets from different families for the implementation of different algorithms, all this data set is in FASTA format. In bioinformatics, FASTA format is a text-based format for representing nucleotide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also contain sequence name before the sequences start. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 313 description (both are optional). There should be no space between the ">" and the first letter of the identifier. It is recommended that all lines of text be shorter than 80 characters. The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence. 2.3 Gap Penalty In order to get best possible sequence alignment between two DNA sequences, it important to insert gaps in sequence alignment and use gap penalties. While aligning DNA sequences, a positive score is assigned for matches negative score is assigned for mismatch To find out score for matches and mismatches in alignments of proteins, it is necessary to know how often one sequence is substituted for another in related proteins. In addition, a method is needed to account for insertions and deletions that sometimes appear in related DNA or protein sequences. To accommodate such sequence variations, gaps that appear in sequence alignments are given a negative penalty score reflecting the fact that they are not expected to occur very often. It is very difficult to get the best- possible alignment, either global or local, unless gaps are included in the alignment. 2.4 Blosum Matrix A Blosum matrix is necessary for pair wise sequence alignment. The four DNA bases are of two types, purines (A and G) and pyrimidines (T and C). The purines are chemically similar to each other and the pyrimidines are chemically similar to each other. Therefore, we will penalize substitutions between a purine and a purine or between a pyrimidine and a pyrimidine (transitions) less heavily than substitutions between purines and pyrimidines (transfusions). We will use the following matrix for substitutions and matching’s. The score is 2 for a match, 1 for a purine with a purine or a pyrimidine with a pyrimidine, and -1 for a purine with a pyrimidine. 3. ALGORITHMS Dynamic programming algorithms for finding gene sequence similarity are discussed in detail in this section along with pseudo codes and algorithms. We used three algorithms for analysis purpose, all these algorithms uses the concept of dynamic programming, which is output sequence depends upon the input of previous sequence. Those three algorithms are. a. Levenshtein edit distance algorithm b. Longest common subsequence algorithm c. Smith-waterman algorithm 3.1 Levenshtein Edit Distance Algorithms It is one of the most popular algorithms to find dissimilarity between two nucleotide sequences, it is an approximate string matching algorithm mainly used for forensic data set, the basic principle of this algorithm is to measure the similarity between two strings [4]. This is done by calculating the number of basic operations as mentioned in introduction part. Algorithm for Levenshtein edit distance is as fallows Int LevenshteinDistance (char S[1..N], char T[1...M] ) { Declare int D[1....N, 1....M] For i from 0 to N D[i,0] := i // the distance of any first string to an empty second string For j from 0 to M D[0, j] := j // the distance of any second string to an empty first string For j from 1 to M { For i from 1 to N { S[i] = T[j] then D[i, j] := D[i-1, j-1] Else D[i,j] := min { D[i-1 , j] +1 // deletion D[i, j-1] +1 // insertion D[i-1, j-1] +1// substitution } } } Return D[M, N] } 3.2 Longest Common Subsequence Algorithm Finding LCS [3] [8] is one way of computing how similar two sequences are, Longer the LCS more similar they are. The LCS problem is a special case of the edit distance problem. LCS is similar to Levenshtein edit distance algorithm except few steps and it also involves trace back process in order to find similar sequences. Algorithm for Longest common subsequence is as fallows Int LCS (char S[1...N], char T[1....M] ) { Declare int D[1....N, 1....M] For i from 0 to N D[i,0] := 0 For j from 0 to M D[0, j] := 0 For j from 1 to M { For i from 1 to N { D[i,j] := max { V1; V2; V3+1 if S=T else V3
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 314 } } } Return D [M, N] } Where V1 = the value in the cell to the left of current cell. V2= the value in the cell above the current cell. V3= value in the cell above left to the current cell, S and T are source string and Target string respectively 3.3 Smith-Waterman Algorithm The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. Smith-waterman algorithm differ from other Local alignment algorithm in fallowing factors a. A negative score/weight must be given to mismatches. b. Zero must be the minimum score recorded in the matrix. c. The beginning and end of an optimal path may be found anywhere in the matrix not just the last row or column. Pseudo code core smith-waterman algorithm is as fallows. Pseudo code for initialization of matrix For i=0 to length(A) F(i,0) ← d*i For j=0 to length(B) F(0,j) ← d*j For i=1 to length(A) For j=1 to length(B) { Diag ← F(i-1,j-1) + S(Ai, Bj) Up ← F(i-1, j) + d Left ← F(i, j-1) + d F(i,j) ← max(Match, Insert, Delete) } Pseudo code for SW alignment For (int i=1;i<=n;i++) For (int j=1;j<=m;j++) int s=score[seq1.charAt(i-1)][seq2.charAt(j-1)]; int val=max(0,F[i-1][j-1]+s,F[i-1][j]-d,F[i][j-1]-d); F[i][j]=val; If (val==0) B[i][j]=null; Else if(val==F[i-1][j-1]+s) B[i][j]=new Traceback2(i-1,j-1); Else if(val==F[i-1][j]-d) S[i][j]= new Traceback2(i-1,j); Else if(val==F]i][j-1]-d) B[i][j]= new Traceback2(i,j-1); Where i and j are columns and rows respectively, S (xi; yj) is value of substitution matrix and g is gap penalty, the substitution matrix is a matrix which describes the rate at which one character in a sequence changes to other character states over time 3.4 Results within the Same Families Table1. Family: cation-duffusion facilitator Figure1. Similarity graph for cation-duffusion facilitator family
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 315 Table2. Family: semaphorin Figure2. Similarity graph for family semaphorin In the figure 2, blue line indicates similarity in smith- waterman algorithm, red line indicate similarity in longest common subsequence algorithm and finally the green line indicate similarity in Levenshtein algorithm. As we see from the graph smith-water man algorithm is more efficient then other two algorithms while finding the similarity of gene sequences that belonging to same family. 3.5 Results between Different Families Figure3. Similarity graph across family Where red line indicates LCS algorithm, green line indicates Levenshtein edit distance algorithm and blue line is for smith- waterman algorithm. From the above graph, we can conclude that while comparing two gene sequences belonging to different families, longest common subsequence is better algorithm because it gives maximum similarity as compare to other two algorithms. Table3. Similarities across the families CONCLUSIONS We considered finding the gene sequence similarity using dynamic programming for our project work. In dynamic programming there exist many different approaches to find similarity among gene sequences; we took some of these algorithms for our project and did comparative analysis of these algorithms using datasets from five different families. We took different protein sequences from all these dataset as input to our program and did rigorous experimentation on these datasets, both within the families and across the families. Five data sets which are used for our experimental work are kruppel c2h2-type zinc finger protein, cation-diffusion facilitator (CDF) transporter, E3 ubquitin –protein ligase, semaphorin and finally SPAG11B and got the results as discussed in the previous section. From the results we can conclude that smith-waterman algorithm is best suited to find similarity for protein sequences that belonging to the same family, and longest common subsequence algorithm is best
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 08 | Aug-2013, Available @ http://www.ijret.org 316 suited for protein sequences that are belong to different families. REFERENCES [1] S. Hirschberg; “Algorithms for the longest common subsequence problem”. J.ACM, 24;{664-675};1977 [2] Levenshtein V.I; “binary code capable of correcting deletion, insertion and reversal”; soviet physics doklady; vol 8; 1966 [3] Ristead, R.S Yianilos,P.N; “learning string edit distance”; IEEE Transaction or pattern analysis and machine intelligence; 1998 [4] L. Bergroth; “Survey of Longest Common Subsequence Algorithms”; Department of Computer Science, University of Turku20520 Turku,Finland; 2000 IEEE [5] Hekki Hyyro, Ayumi Shinohara; “A new bit-parallel- distance algorithm”; Nikoltseas.LNCS 3772; 2005 [6] Adrian Horia Dediu,et al; “A fast longest common subsequence algorithm for similar strings”; Language and automation theory and application, International Conference, LATA; 2010 [7] Patsaraporn Somboonsat, Mud-Armeen munlin; “a new edit distance method for finding similarity in DNA sequence”; world academy of science engineering and technology 58; 2011 [8] Dekang Lin; “ An Information-Theoretic Definition of Similarity”; Department of Computer Science University of Manitoba,Winnipeg, Manitoba, Canada R3T 2N2 [9] Xingqin Qi, Qin Wu, Yusen Zhang2, Eddie Fuller and Cun-Quan Zhang1; “A Novel Model for DNA Sequence Similarity Analysis Based on Graph Theory”; Department of Mathematics, [10] West Virginia University, Morgantown, WV, USA, 26506. School of Mathematics and Statistics,Shandong University at Weihai, Weihai, China, 264209 Gina M. Cannarozzi; “String Alignment using Dynamic Programming” [11] David R Bentley; Whole-genome re-sequencing. [12] M. Madan Babu; Biological Databases and Protein Sequence Analysis; Center for Biotechnology, Anna University, Chenna [13] A pattern classification; Richard O.Duda, peter E.Hart, David G.Stork 2nd editon; [14] simultaneous solution of the RNA folding alignment and protosequence problem; David Sankoff Siam ,J.Apple Math; vol 45; 1985 [15] http://www.ncbi.nlm.nih.gov/ [16] http://www.uniprot.org/uniprot/