SlideShare a Scribd company logo
D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 49|P a g e
A Novel Framework for Short Tandem Repeats (STRs) Using
Parallel String Matching
D. Bala MuraliKrishna, Ch. Someswara Rao.
Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India.
Associate Professor, Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India.
Abstract
Short tandem repeats (STRs) have become important molecular markers for a broad range of applications, such
as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a
range of molecular ecology and diversity studies. These repeated DNA sequences are found in both Plants and
bacteria. Most of the computer programs that find STRs failed to report its number of occurrences of the
repeated pattern, exact position and it is difficult task to obtain accurate results from the larger datasets. So we
need high performance computing models to extract certain repeats. One of the solution is STRs using parallel
string matching, it gives number of occurrences with corresponding line number and exact location or position
of each STR in the genome of any length. In this, we implemented parallel string matching using JAVA Multi-
threading with multi core processing, for this we implemented a basic algorithm and made a comparison with
previous algorithms like Knuth Morris Pratt, Boyer Moore and Brute force string matching algorithms and from
the results our new basic algorithm gives better results than the previous algorithms. We apply this algorithm in
parallel string matching using multi-threading concept to reduce the time by running on multicore processors.
From the test results it is shown that the multicore processing is a remarkably efficient and powerful compared
to lower versions and finally this proposed STR using parallel string matching algorithm is better than the
sequential approaches.
Keywords: computing model, DNA, STR, parallel string matching, multicore processing.
I. Introduction
DNA molecules are subject to a variety of
mutational events. One of the less well understood is
tandem duplicationin which a stretch of DNA, which
we call the pattern, is converted into two or more
copies, each following the preceding one in a
contiguous fashion. For example we could have
consider the below String TCGGCGGCGGA, and the
pattern CGG.
In which the single occurrence of triplet CGG
has been transformed into three identical, adjacent
copies. The result of a tandem duplication event is
termed a tandem repeat. Over time, individual copies
within a tandem repeat may undergo additional,
uncoordinated mutations so that typically, only
approximate tandem copies are present. Tandem
repeats are presumed to occur frequently in genomic
sequences, comprising perhaps 10% or more of the
human genome [1]. But, accurate characterization of
the properties of tandem repeats has been limited by
the inability to easily detect them. In recent years, the
discovery of the trinucleotide repeat diseases has
piqued interest in tandem repeats.
STRs (Short Tandem Repeats) are the genetic
loci where one or few bases are repeated for varying
numbers of times. Such repetitions occur primarily
due to slipped-strand impairing and subsequent
error(s) during DNA replication, repair, or
recombination [1]. STRs comprising 2–6 base pairs
long, occur frequently and are ubiquitously
interspersed in many genomes [2, 3]. The biological
importance of STR tracts has been clearly delineated.
STR loci show extensive length polymorphism, and
hence they are widely used in DNA fingerprinting
and diversity studies [4, 5]. They are also considered
as ideal genetic markers for the construction of high-
density linkage maps.In spite of its high significance,
a bioinformatics tool for the analysis of these regions
is not available.
Available algorithms directly or indirectly detect
tandem repeats. However, there are many limitations
with these algorithms [6].The drawbacks are high
computational time required by the algorithm and
their inability to predict the positions of STRs in the
genome.Finally, researchers have to understand the
classical methods of pattern matching to develop new
efficient algorithms [7-10]. In this work, we describe
a program called STRs using parallel string matching
that uses Multi-threading to find STRs on multicore
processors. The program Framework for STRs using
parallel string matching locates repeats with
RESEARCH ARTICLE OPEN ACCESS
D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 50|P a g e
patternsexact positon, line number and number of
occurrences of large number of files and directories
taking as input.
A serious comparison between different tools
should, of course, be performed by a third party
against a sufficient number of carefully selected test
problems, carefully tuning many different search
parameters; meanwhile we can say that in the case of
naive runs with default parameters on easy problems,
all the above programs appear to output basically
equivalent results. With the developments of new
string matching techniques, efficiency and speed are
the main factors in deciding among different options
available for each application area [11-13].However,
a major drawback of all such tools is that, when
running against very long sequences, they produce a
large amount of cumbersome results that require a
painstaking interpretation. In order to provide the
user with some capability of grasping the STRs at
first glance and looking over the drawbacks for large
sequencing files, we decided that the only way to do
this is by introducing parallel programing concept
into STRs identification [14-17]. And since, as far as
we know, no tool with Parallel programing feature is
as yet available [18, 19], a novel framework for the
STRs using parallel string matching of the results
compared with sequential approach has been devised.
STRs parallel string approach developed for
scanning genomes to find repeats of any length, their
exact position on chromosome and number of
occurrence. It can accept large sequences and large
number of files and directories up to GBs as input
can be searched simultaneously on multicore
processors. It divides the files and directories for
different cores and run these files parallel then after it
gives the files getting the output. Thus, the running
time of the program is greatly reduced. The
advantages over many other programs developed for
STR identification includes its ability to do parallel
programing using Multi-threading with multi core
processingto search patterns repeated for a number of
times of large input files and to give the exact
position and number of occurrences of the pattern in
the genome.
II. Methodology
Our goal is efficiently search STRs in large
genomes of given pattern. The proposed
methodology has two phases, Sequential and Parallel
approaches. We model a basic algorithm which is
giving better results than the previous algorithms like
Boyer Moore, KMP and Brute force. In the first
phase we apply on Sequential approach and later in
the second phase the basic algorithm is to be applied
on parallel string matching using Multi-threading
with multi core processing concept. Finally we give a
comparison between Boyer Moore, KMP and Brute
force with our basic algorithm in sequential approach
and then comparison between sequential approach
and parallel approach of basic algorithm.
Methodology comprises of two phases
Phase 1:STR using Sequential String matching
approach
The Sequential string matching approach uses
the basic algorithm which is better than the previous
algorithms and it processes the input file line by line
from a document and run sequential manner to
finding the repeats. The algorithm works with
Improved Right Prefix concept, the below example
demonstrate the working of the algorithm.
Figure 1: Example for Sequential pattern searching
Phase 2:STR Using Parallel String matching
approach
In The second Phase “Short Tandem Repeats
(STRs) Using Parallel String Matching” we apply the
above basic algorithm to this concept and it processes
the input file at first we take the input as a string or
text. The required text that is to be searched is further
divided into further small patterns or processes and
all this patterns are passed on the basic algorithm and
produce the result as number of occurrences of
repeated pattern as shown in the Fig.2 below.
Figure 2: Parallel approach using Multi-Threading
concept
G G C C G A G A A G A T A
G A T C
D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 51|P a g e
Because threads can operate independently on
different areas of an array for this algorithm, you will
see a clear performance boost on multicore
architectures compared to a sequential algorithm that
would iterate over each process in the array. It uses
the parallel execution of multi cores at a time which
will results the speedup in the number of cores with
minimal effort, because the Multi-threading takes
care of maximizing parallelism.
The Flowchart of the Improved Right Prefix is
shown in Fig 3.It .scans the genome sequence line by
line for given input file and process the linefor given
pattern length e.g. (ATGCATGCATGC)is a
nucleotide with pattern count =3.If STR found then it
identifies the position of the pattern and then
increment the count value and search for next STR, if
not then again reads the line of sequence. The process
is continues until all the given patterns are found then
it gives the number of occurrences of the pattern with
line number and position.
Figure 3: Flowchart of Improved Right Prefix
algorithm
III. Results and Discussion
The sequential approach gives the better results
by using the basic algorithm compared to previous
algorithms Boyer Moore, KMP and Brute force with
additional features like exact position with line
number and number of occurrences. Then this basic
algorithm reduces the time when we are applying in
parallel string matching using Multi-threading with
multi core processing,the results are illustrated in the
histogram below. The sequential approach took
80ms, the parallel string matching using Multi-
threading approach took 40ms.
The Fig 4 shows Execution time vs File size on
sequential search withBrute Force, Improved Right
Prefix Algorithm. This graph shows the performance
difference between Brute Force,Improved Right
Prefix algorithms. From the graph we clearly observe
that IRP is better compared to Brute Force.
Figure 4: Histogram for Brute Forcevs IRP (Time vs
File size)
The Fig 5 shows Execution time vs File size on
sequential search with KMP, Improved Right Prefix
Algorithm. This graph shows the performance
difference between KMP,Improved Right Prefix
algorithms. From the graph we clearly observe that
IRP is better compared to KMP.The Fig 6 shows
Execution time vs File size using Boyer Moore, IRP
Algorithm on Sequential search. From the graph we
clearly observe that IRP is much better than Boyer
Moore.
0
200
400
600
800
1000
5 10 15 20 25 30
Time(ms)
File Size (MB)
Brute Force vs IRP
Bruteforce IRP
D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 52|P a g e
Figure 5: Histogram for KMPvs IRP (Time vs File
size)
The most crucial stage in achieving a new
successful system and in giving confidence on the
system for the users that will work efficiently and
effectively. The system will be implemented only
after thorough testing and if it is found to work
according to the specification. For testing our
proposed system we will take the gene sequence data
set, consists of the four nucleotides A, C, G and T
(standing for adenine, cytosine, guanine, and
thymine, respectively) used to encode DNA.
Figure 6: Histogram for Boyer Moore vs IRP (Time
vs File size)
Therefore, the alphabet is O= {A, C, G, T}.
The text is consisted of 7, 50,000 records. Our
algorithm tested on different cores like 2, 4, 8, 12
etc., in the Fig 8 (Table). Here we put some
achievements what we develop and observe, and
finally our system shows that parallel approach is
much better than sequential approach with multi core
processor. The Fig 7 shows (Histogram) Execution
time vs File size on sequential search and parallel
search with Improved Right Prefix. From the Fig (4,
5, 6) we clearly observe that IRP is better compared
to other approaches.
Figure 7: Histogram for Sequential vs Parallel
approach
Figure 8: Table of Parallel execution times with
different cores
IV. Conclusion
In this paper we performed a comparisonbetween
Knuth Morris Pratt, Boyer Moore and Brute force
string matching algorithms with our improved right
prefix algorithm in sequential approach based on the
running time and in our tests with multicore
processing, we used strings of varying lengths and
texts of varying lengths. From the test results it is
shown that the improved right prefix algorithm is
extremely efficient in all cases. We apply this
algorithm in parallel string matching using multi-
0
200
400
600
800
1000
5 10 15 20 25 30 35
Time(ms)
File Size (MB)
KMP vs IRP
KMP IRP
0
200
400
600
800
1000
5 10 15 20 25 30
Time(ms)
File Size (MB)
Boyer Moore vs IRP
Boyer Moore IRP
0
200
400
600
800
1000
5 10 15 20 25 30
Time(ms)
File Size (MB)
Sequential Approach vs Parallel
Approach
Sequential Parallel
Number of Cores Parallel Execution Time (MS)
2 11026
4 8329
8 4208
12 2876
D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 53|P a g e
threading concept to reduce the time by running on
multicore processors. We conclude that STR using
parallel string matching is the most efficient
compared to earlier versions. As a future
enhancement, we apply this algorithmto fork and join
concept in JAVA SE7 of parallel string matching
algorithm thereby finding the most efficient
algorithm which can be used in many fields such as
cryptography, molecular biology. Thus the problem
of matching becomes easier.
References
[1] Angelika Merkel and Neil Gemmell (2008)
Detecting short tandem repeats from
genome data: opening the software black
box, VOL 9. NO 5. 355-366.
[2] Gary Benson (1998) Tandem Repeats
Finder: A program to analyze DNA
sequences Nucleic Acids Research, 1999,
Vol_ 27, No. 2 573_580.
[3] Karen Norrgard (Write Science Right)
(2008) Forensics, DNA fingerprinting, and
CODIS. Nature Education 1(1):35.
[4] Kathleen McNamara-Schroeder, Cheryl
Olonan, Simon Chu, Maria C. Montoya,
MahtaAlviri,ShannonGinty, and John J.
Love (2005) DNA Fingerprint Analysis of
Three Short Tandem Repeat (STR) Loci for
Biochemistry and Forensic Science
Laboratory Courses November 21, and in
revised form, April 3, Vol. 34, No. 5, pp.
378–383, 2006.
[5] KoichiroDoi, TakuMonjo Pham H.
Hoang, Jun Yoshimura, Hideaki Yurino, Jun
Mitsui, Hiroyuki Ishiura, Yuji Takahashi,
Yaeko Ichikawa, Jun Goto, Shoji Tsuji,
and Shinichi
MorishitaRapid(2014) detection of expanded
short tandem repeats in personal genomics
using hybrid sequencing
Bioinformatics 30 (6): 815-822.
[6] TamannaAnwarandAsad U Khan (2006)
SSRscanner: a program for reporting
distribution and exact location of simple
sequence repeats.
[7] Jonathan L., “Analysis of Fundamental
Exact and Inexact Pattern Matching
Algorithms,” Technical Document, Stanford
University, 2004.
[8] ChintaSomeswararao, K Butchiraju, S
ViswanadhaRaju, “Recent Advancement is
Parallel Algorithms for String matching on
computing models - A survey and
experimental results”, LNCS, Springer,
pp.270-278, ISBN: 978-3-642-29279-8,
2011.
[9] ChintaSomeswararao, K Butchiraju, S
ViswanadhaRaju, “PDM data classification
from STEP- an object oriented String
matching approach”, IEEE conference on
Application of Information and
Communication Technologies, pp.1-9, ISBN:
978-1-61284-831-0, 2011.
[10] ChintaSomeswararao, K Butchiraju, S
ViswanadhaRaju, “Recent Advancement is
Parallel Algorithms for String matching - A
survey and experimental results”, IJAC, Vol
4 issue 4, pp-91- 97, 2012.
[11] Simon Y. and Inayatullah M., “Improving
Approximate Matching Capabilities for
Meta Map Transfer Applications,”
Proceedings of Symposium on Principles
and Practice of Programming in Java,
pp.143 147, 2004.
[12] ChintaSomeswararao, K Butchiraju, S
ViswanadhaRaju, “Parallel Algorithms for
String Matching Problem based on Butterfly
Model”, pp.41-56, IJCST, Vol. 3, Issue 3,
July – Sept, ISSN 2229-4333, 2012.
[13] ChintaSomeswararao, K Butchiraju, S
ViswanadhaRaju, “Recent Advancement is
String matching algorithms- A survey and
experimental results”, IJCIS, Vol 6 No 3,
pp.56-61, 2013.
[14] Geist, A., Beguelin, A., Dongarra, J., Jiang,
W., Manchek, R. & V., S. “PVM: Parallel
Virtual Machine”, A Users Guide and
Tutorial for Networked Parallel Computing,
MIT Press. 1994.
[15] Grama, A., Karypis, G., Kumar, V. &
Gupta, A. “Introduction to Parallel
Computing”, Addison Wesley, 2003.
[16] Leow, Y., Ng, C. &W.F.,W.. “Generating
hardware from OpenMP programs”,
International Conference on Field
Programmable Technology, pp. 73–80,
2006.
[17] Marr, D. T., Binns, F., Hill, D. L., Hinton,
G., Koufaty, D. A., Miller, J. A. & Upton,
M..“Hyper-Threading Technology
Architecture and Microarchitecture”, Intel
Technology Journal, pp. 4–15, 2002.
[18] Donald Knuth; James H. Morris, Jr,
Vaughan Pratt, "Fast pattern matching in
strings". SIAM Journal on Computing, pp-
323–350, 1977.
[19] Z. Galil, “Optimal parallel algorithms for
string matching,” in Proc. 16th Annu. ACM
symposium on Theory of computing, pp.
240-248, 1984.

More Related Content

What's hot

Scimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujoScimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujo
Robson Araujo
 
A machine learning based protocol for efficient routing in opportunistic netw...
A machine learning based protocol for efficient routing in opportunistic netw...A machine learning based protocol for efficient routing in opportunistic netw...
A machine learning based protocol for efficient routing in opportunistic netw...
Fellowship at Vodafone FutureLab
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
IJET - International Journal of Engineering and Techniques
 
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
International Journal of Engineering Inventions www.ijeijournal.com
 
High throughput reliable multicast in multi-hop wireless mesh networks
High throughput reliable multicast in multi-hop wireless mesh networksHigh throughput reliable multicast in multi-hop wireless mesh networks
High throughput reliable multicast in multi-hop wireless mesh networks
LogicMindtech Nologies
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
Ashwani kumar
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
Innovation Quotient Pvt Ltd
 
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITYNEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
ijcisjournal
 
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
ijsrd.com
 
Bioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekingeBioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekinge
Prof. Wim Van Criekinge
 
Communication systems-theory-for-undergraduate-students-using-matlab
Communication systems-theory-for-undergraduate-students-using-matlabCommunication systems-theory-for-undergraduate-students-using-matlab
Communication systems-theory-for-undergraduate-students-using-matlab
SaifAbdulNabi1
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
ijnlc
 
Implementation on Data Security Approach in Dynamic Multi Hop Communication
 Implementation on Data Security Approach in Dynamic Multi Hop Communication Implementation on Data Security Approach in Dynamic Multi Hop Communication
Implementation on Data Security Approach in Dynamic Multi Hop Communication
IJCSIS Research Publications
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
Vinícius Uchôa
 
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Innovation Quotient Pvt Ltd
 
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
ijscai
 

What's hot (16)

Scimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujoScimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujo
 
A machine learning based protocol for efficient routing in opportunistic netw...
A machine learning based protocol for efficient routing in opportunistic netw...A machine learning based protocol for efficient routing in opportunistic netw...
A machine learning based protocol for efficient routing in opportunistic netw...
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg...
 
High throughput reliable multicast in multi-hop wireless mesh networks
High throughput reliable multicast in multi-hop wireless mesh networksHigh throughput reliable multicast in multi-hop wireless mesh networks
High throughput reliable multicast in multi-hop wireless mesh networks
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITYNEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY
 
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...
 
Bioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekingeBioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekinge
 
Communication systems-theory-for-undergraduate-students-using-matlab
Communication systems-theory-for-undergraduate-students-using-matlabCommunication systems-theory-for-undergraduate-students-using-matlab
Communication systems-theory-for-undergraduate-students-using-matlab
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
Implementation on Data Security Approach in Dynamic Multi Hop Communication
 Implementation on Data Security Approach in Dynamic Multi Hop Communication Implementation on Data Security Approach in Dynamic Multi Hop Communication
Implementation on Data Security Approach in Dynamic Multi Hop Communication
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
 
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
 
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE...
 

Viewers also liked

Learn python
Learn pythonLearn python
Learn python
Kracekumar Ramaraju
 
Algorithms summary (English version)
Algorithms summary (English version)Algorithms summary (English version)
Algorithms summary (English version)
Young-Min kang
 
String searching
String searching String searching
String searching
thinkphp
 
STRING MATCHING
STRING MATCHINGSTRING MATCHING
STRING MATCHING
Hessam Yusaf
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
Kritika Purohit
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
NETAJI SUBHASH ENGINEERING COLLEGE , KOLKATA
 
Pattern matching
Pattern matchingPattern matching
Pattern matching
shravs_188
 
Boyer–Moore string search algorithm
Boyer–Moore string search algorithmBoyer–Moore string search algorithm
Boyer–Moore string search algorithm
Hamid Shekarforoush
 
Cb pattern trees identifying
Cb pattern trees  identifyingCb pattern trees  identifying
Cb pattern trees identifying
IJDKP
 

Viewers also liked (9)

Learn python
Learn pythonLearn python
Learn python
 
Algorithms summary (English version)
Algorithms summary (English version)Algorithms summary (English version)
Algorithms summary (English version)
 
String searching
String searching String searching
String searching
 
STRING MATCHING
STRING MATCHINGSTRING MATCHING
STRING MATCHING
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
Pattern matching
Pattern matchingPattern matching
Pattern matching
 
Boyer–Moore string search algorithm
Boyer–Moore string search algorithmBoyer–Moore string search algorithm
Boyer–Moore string search algorithm
 
Cb pattern trees identifying
Cb pattern trees  identifyingCb pattern trees  identifying
Cb pattern trees identifying
 

Similar to A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching

Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
TELKOMNIKA JOURNAL
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
csandit
 
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining AlgorithmIRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET Journal
 
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
IJORCS
 
50620130101006
5062013010100650620130101006
50620130101006
IAEME Publication
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
CSCJournals
 
Sequence alignment
Sequence alignmentSequence alignment
A Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming AssignmentsA Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming Assignments
IRJET Journal
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
IJCSEIT Journal
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
Meghaj Mallick
 
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmCongestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
Editor IJCATR
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
eSAT Publishing House
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
eSAT Journals
 
Jr3417541760
Jr3417541760Jr3417541760
Jr3417541760
IJERA Editor
 
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
IRJET Journal
 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...
Alexander Decker
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
redpel dot com
 
A Survey on Bioinformatics Tools
A Survey on Bioinformatics ToolsA Survey on Bioinformatics Tools
A Survey on Bioinformatics Tools
idescitation
 
A Survey of String Matching Algorithms
A Survey of String Matching AlgorithmsA Survey of String Matching Algorithms
A Survey of String Matching Algorithms
IJERA Editor
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
journal ijrtem
 

Similar to A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching (20)

Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
 
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining AlgorithmIRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
 
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification?
 
50620130101006
5062013010100650620130101006
50620130101006
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
A Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming AssignmentsA Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming Assignments
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic AlgorithmCongestion Control in Wireless Sensor Networks Using Genetic Algorithm
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
 
Jr3417541760
Jr3417541760Jr3417541760
Jr3417541760
 
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
 
A Survey on Bioinformatics Tools
A Survey on Bioinformatics ToolsA Survey on Bioinformatics Tools
A Survey on Bioinformatics Tools
 
A Survey of String Matching Algorithms
A Survey of String Matching AlgorithmsA Survey of String Matching Algorithms
A Survey of String Matching Algorithms
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 

Recently uploaded

ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
University of Maribor
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
mamamaam477
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 

Recently uploaded (20)

ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 

A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching

  • 1. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53 www.ijera.com 49|P a g e A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching D. Bala MuraliKrishna, Ch. Someswara Rao. Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India. Associate Professor, Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India. Abstract Short tandem repeats (STRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both Plants and bacteria. Most of the computer programs that find STRs failed to report its number of occurrences of the repeated pattern, exact position and it is difficult task to obtain accurate results from the larger datasets. So we need high performance computing models to extract certain repeats. One of the solution is STRs using parallel string matching, it gives number of occurrences with corresponding line number and exact location or position of each STR in the genome of any length. In this, we implemented parallel string matching using JAVA Multi- threading with multi core processing, for this we implemented a basic algorithm and made a comparison with previous algorithms like Knuth Morris Pratt, Boyer Moore and Brute force string matching algorithms and from the results our new basic algorithm gives better results than the previous algorithms. We apply this algorithm in parallel string matching using multi-threading concept to reduce the time by running on multicore processors. From the test results it is shown that the multicore processing is a remarkably efficient and powerful compared to lower versions and finally this proposed STR using parallel string matching algorithm is better than the sequential approaches. Keywords: computing model, DNA, STR, parallel string matching, multicore processing. I. Introduction DNA molecules are subject to a variety of mutational events. One of the less well understood is tandem duplicationin which a stretch of DNA, which we call the pattern, is converted into two or more copies, each following the preceding one in a contiguous fashion. For example we could have consider the below String TCGGCGGCGGA, and the pattern CGG. In which the single occurrence of triplet CGG has been transformed into three identical, adjacent copies. The result of a tandem duplication event is termed a tandem repeat. Over time, individual copies within a tandem repeat may undergo additional, uncoordinated mutations so that typically, only approximate tandem copies are present. Tandem repeats are presumed to occur frequently in genomic sequences, comprising perhaps 10% or more of the human genome [1]. But, accurate characterization of the properties of tandem repeats has been limited by the inability to easily detect them. In recent years, the discovery of the trinucleotide repeat diseases has piqued interest in tandem repeats. STRs (Short Tandem Repeats) are the genetic loci where one or few bases are repeated for varying numbers of times. Such repetitions occur primarily due to slipped-strand impairing and subsequent error(s) during DNA replication, repair, or recombination [1]. STRs comprising 2–6 base pairs long, occur frequently and are ubiquitously interspersed in many genomes [2, 3]. The biological importance of STR tracts has been clearly delineated. STR loci show extensive length polymorphism, and hence they are widely used in DNA fingerprinting and diversity studies [4, 5]. They are also considered as ideal genetic markers for the construction of high- density linkage maps.In spite of its high significance, a bioinformatics tool for the analysis of these regions is not available. Available algorithms directly or indirectly detect tandem repeats. However, there are many limitations with these algorithms [6].The drawbacks are high computational time required by the algorithm and their inability to predict the positions of STRs in the genome.Finally, researchers have to understand the classical methods of pattern matching to develop new efficient algorithms [7-10]. In this work, we describe a program called STRs using parallel string matching that uses Multi-threading to find STRs on multicore processors. The program Framework for STRs using parallel string matching locates repeats with RESEARCH ARTICLE OPEN ACCESS
  • 2. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53 www.ijera.com 50|P a g e patternsexact positon, line number and number of occurrences of large number of files and directories taking as input. A serious comparison between different tools should, of course, be performed by a third party against a sufficient number of carefully selected test problems, carefully tuning many different search parameters; meanwhile we can say that in the case of naive runs with default parameters on easy problems, all the above programs appear to output basically equivalent results. With the developments of new string matching techniques, efficiency and speed are the main factors in deciding among different options available for each application area [11-13].However, a major drawback of all such tools is that, when running against very long sequences, they produce a large amount of cumbersome results that require a painstaking interpretation. In order to provide the user with some capability of grasping the STRs at first glance and looking over the drawbacks for large sequencing files, we decided that the only way to do this is by introducing parallel programing concept into STRs identification [14-17]. And since, as far as we know, no tool with Parallel programing feature is as yet available [18, 19], a novel framework for the STRs using parallel string matching of the results compared with sequential approach has been devised. STRs parallel string approach developed for scanning genomes to find repeats of any length, their exact position on chromosome and number of occurrence. It can accept large sequences and large number of files and directories up to GBs as input can be searched simultaneously on multicore processors. It divides the files and directories for different cores and run these files parallel then after it gives the files getting the output. Thus, the running time of the program is greatly reduced. The advantages over many other programs developed for STR identification includes its ability to do parallel programing using Multi-threading with multi core processingto search patterns repeated for a number of times of large input files and to give the exact position and number of occurrences of the pattern in the genome. II. Methodology Our goal is efficiently search STRs in large genomes of given pattern. The proposed methodology has two phases, Sequential and Parallel approaches. We model a basic algorithm which is giving better results than the previous algorithms like Boyer Moore, KMP and Brute force. In the first phase we apply on Sequential approach and later in the second phase the basic algorithm is to be applied on parallel string matching using Multi-threading with multi core processing concept. Finally we give a comparison between Boyer Moore, KMP and Brute force with our basic algorithm in sequential approach and then comparison between sequential approach and parallel approach of basic algorithm. Methodology comprises of two phases Phase 1:STR using Sequential String matching approach The Sequential string matching approach uses the basic algorithm which is better than the previous algorithms and it processes the input file line by line from a document and run sequential manner to finding the repeats. The algorithm works with Improved Right Prefix concept, the below example demonstrate the working of the algorithm. Figure 1: Example for Sequential pattern searching Phase 2:STR Using Parallel String matching approach In The second Phase “Short Tandem Repeats (STRs) Using Parallel String Matching” we apply the above basic algorithm to this concept and it processes the input file at first we take the input as a string or text. The required text that is to be searched is further divided into further small patterns or processes and all this patterns are passed on the basic algorithm and produce the result as number of occurrences of repeated pattern as shown in the Fig.2 below. Figure 2: Parallel approach using Multi-Threading concept G G C C G A G A A G A T A G A T C
  • 3. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53 www.ijera.com 51|P a g e Because threads can operate independently on different areas of an array for this algorithm, you will see a clear performance boost on multicore architectures compared to a sequential algorithm that would iterate over each process in the array. It uses the parallel execution of multi cores at a time which will results the speedup in the number of cores with minimal effort, because the Multi-threading takes care of maximizing parallelism. The Flowchart of the Improved Right Prefix is shown in Fig 3.It .scans the genome sequence line by line for given input file and process the linefor given pattern length e.g. (ATGCATGCATGC)is a nucleotide with pattern count =3.If STR found then it identifies the position of the pattern and then increment the count value and search for next STR, if not then again reads the line of sequence. The process is continues until all the given patterns are found then it gives the number of occurrences of the pattern with line number and position. Figure 3: Flowchart of Improved Right Prefix algorithm III. Results and Discussion The sequential approach gives the better results by using the basic algorithm compared to previous algorithms Boyer Moore, KMP and Brute force with additional features like exact position with line number and number of occurrences. Then this basic algorithm reduces the time when we are applying in parallel string matching using Multi-threading with multi core processing,the results are illustrated in the histogram below. The sequential approach took 80ms, the parallel string matching using Multi- threading approach took 40ms. The Fig 4 shows Execution time vs File size on sequential search withBrute Force, Improved Right Prefix Algorithm. This graph shows the performance difference between Brute Force,Improved Right Prefix algorithms. From the graph we clearly observe that IRP is better compared to Brute Force. Figure 4: Histogram for Brute Forcevs IRP (Time vs File size) The Fig 5 shows Execution time vs File size on sequential search with KMP, Improved Right Prefix Algorithm. This graph shows the performance difference between KMP,Improved Right Prefix algorithms. From the graph we clearly observe that IRP is better compared to KMP.The Fig 6 shows Execution time vs File size using Boyer Moore, IRP Algorithm on Sequential search. From the graph we clearly observe that IRP is much better than Boyer Moore. 0 200 400 600 800 1000 5 10 15 20 25 30 Time(ms) File Size (MB) Brute Force vs IRP Bruteforce IRP
  • 4. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53 www.ijera.com 52|P a g e Figure 5: Histogram for KMPvs IRP (Time vs File size) The most crucial stage in achieving a new successful system and in giving confidence on the system for the users that will work efficiently and effectively. The system will be implemented only after thorough testing and if it is found to work according to the specification. For testing our proposed system we will take the gene sequence data set, consists of the four nucleotides A, C, G and T (standing for adenine, cytosine, guanine, and thymine, respectively) used to encode DNA. Figure 6: Histogram for Boyer Moore vs IRP (Time vs File size) Therefore, the alphabet is O= {A, C, G, T}. The text is consisted of 7, 50,000 records. Our algorithm tested on different cores like 2, 4, 8, 12 etc., in the Fig 8 (Table). Here we put some achievements what we develop and observe, and finally our system shows that parallel approach is much better than sequential approach with multi core processor. The Fig 7 shows (Histogram) Execution time vs File size on sequential search and parallel search with Improved Right Prefix. From the Fig (4, 5, 6) we clearly observe that IRP is better compared to other approaches. Figure 7: Histogram for Sequential vs Parallel approach Figure 8: Table of Parallel execution times with different cores IV. Conclusion In this paper we performed a comparisonbetween Knuth Morris Pratt, Boyer Moore and Brute force string matching algorithms with our improved right prefix algorithm in sequential approach based on the running time and in our tests with multicore processing, we used strings of varying lengths and texts of varying lengths. From the test results it is shown that the improved right prefix algorithm is extremely efficient in all cases. We apply this algorithm in parallel string matching using multi- 0 200 400 600 800 1000 5 10 15 20 25 30 35 Time(ms) File Size (MB) KMP vs IRP KMP IRP 0 200 400 600 800 1000 5 10 15 20 25 30 Time(ms) File Size (MB) Boyer Moore vs IRP Boyer Moore IRP 0 200 400 600 800 1000 5 10 15 20 25 30 Time(ms) File Size (MB) Sequential Approach vs Parallel Approach Sequential Parallel Number of Cores Parallel Execution Time (MS) 2 11026 4 8329 8 4208 12 2876
  • 5. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53 www.ijera.com 53|P a g e threading concept to reduce the time by running on multicore processors. We conclude that STR using parallel string matching is the most efficient compared to earlier versions. As a future enhancement, we apply this algorithmto fork and join concept in JAVA SE7 of parallel string matching algorithm thereby finding the most efficient algorithm which can be used in many fields such as cryptography, molecular biology. Thus the problem of matching becomes easier. References [1] Angelika Merkel and Neil Gemmell (2008) Detecting short tandem repeats from genome data: opening the software black box, VOL 9. NO 5. 355-366. [2] Gary Benson (1998) Tandem Repeats Finder: A program to analyze DNA sequences Nucleic Acids Research, 1999, Vol_ 27, No. 2 573_580. [3] Karen Norrgard (Write Science Right) (2008) Forensics, DNA fingerprinting, and CODIS. Nature Education 1(1):35. [4] Kathleen McNamara-Schroeder, Cheryl Olonan, Simon Chu, Maria C. Montoya, MahtaAlviri,ShannonGinty, and John J. Love (2005) DNA Fingerprint Analysis of Three Short Tandem Repeat (STR) Loci for Biochemistry and Forensic Science Laboratory Courses November 21, and in revised form, April 3, Vol. 34, No. 5, pp. 378–383, 2006. [5] KoichiroDoi, TakuMonjo Pham H. Hoang, Jun Yoshimura, Hideaki Yurino, Jun Mitsui, Hiroyuki Ishiura, Yuji Takahashi, Yaeko Ichikawa, Jun Goto, Shoji Tsuji, and Shinichi MorishitaRapid(2014) detection of expanded short tandem repeats in personal genomics using hybrid sequencing Bioinformatics 30 (6): 815-822. [6] TamannaAnwarandAsad U Khan (2006) SSRscanner: a program for reporting distribution and exact location of simple sequence repeats. [7] Jonathan L., “Analysis of Fundamental Exact and Inexact Pattern Matching Algorithms,” Technical Document, Stanford University, 2004. [8] ChintaSomeswararao, K Butchiraju, S ViswanadhaRaju, “Recent Advancement is Parallel Algorithms for String matching on computing models - A survey and experimental results”, LNCS, Springer, pp.270-278, ISBN: 978-3-642-29279-8, 2011. [9] ChintaSomeswararao, K Butchiraju, S ViswanadhaRaju, “PDM data classification from STEP- an object oriented String matching approach”, IEEE conference on Application of Information and Communication Technologies, pp.1-9, ISBN: 978-1-61284-831-0, 2011. [10] ChintaSomeswararao, K Butchiraju, S ViswanadhaRaju, “Recent Advancement is Parallel Algorithms for String matching - A survey and experimental results”, IJAC, Vol 4 issue 4, pp-91- 97, 2012. [11] Simon Y. and Inayatullah M., “Improving Approximate Matching Capabilities for Meta Map Transfer Applications,” Proceedings of Symposium on Principles and Practice of Programming in Java, pp.143 147, 2004. [12] ChintaSomeswararao, K Butchiraju, S ViswanadhaRaju, “Parallel Algorithms for String Matching Problem based on Butterfly Model”, pp.41-56, IJCST, Vol. 3, Issue 3, July – Sept, ISSN 2229-4333, 2012. [13] ChintaSomeswararao, K Butchiraju, S ViswanadhaRaju, “Recent Advancement is String matching algorithms- A survey and experimental results”, IJCIS, Vol 6 No 3, pp.56-61, 2013. [14] Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. & V., S. “PVM: Parallel Virtual Machine”, A Users Guide and Tutorial for Networked Parallel Computing, MIT Press. 1994. [15] Grama, A., Karypis, G., Kumar, V. & Gupta, A. “Introduction to Parallel Computing”, Addison Wesley, 2003. [16] Leow, Y., Ng, C. &W.F.,W.. “Generating hardware from OpenMP programs”, International Conference on Field Programmable Technology, pp. 73–80, 2006. [17] Marr, D. T., Binns, F., Hill, D. L., Hinton, G., Koufaty, D. A., Miller, J. A. & Upton, M..“Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, pp. 4–15, 2002. [18] Donald Knuth; James H. Morris, Jr, Vaughan Pratt, "Fast pattern matching in strings". SIAM Journal on Computing, pp- 323–350, 1977. [19] Z. Galil, “Optimal parallel algorithms for string matching,” in Proc. 16th Annu. ACM symposium on Theory of computing, pp. 240-248, 1984.