A Comparison of Serial and Parallel Substring Matching
Algorithms
Gerrett Diamond, Thomas Manzini, Zexin Wan, Paul Zhou
Rensselaer Polytechnic Institute - Parallel Programming and Computing
Abstract
We present a study of serial and parallel
algorithms for solving an advanced string
matching problem between two large texts:
finding the longest common substring of texts A
and B. The study was performed by parallelizing
a naive serial algorithm and a dynamic
programming algorithm on RPI's BlueGene/Q
supercomputer, AMOS. We use a range of test
cases and situations to show where each method
has bottlenecks. We show that the two parallel
algorithms have different peak points, and
therefore the better algorithm depends on the
input.
1 Introduction
String matching has long been a focus of
research. It has many applications in academia
as a tool for maintaining academic integrity, in
the form of plagiarism checkers, as well as in
other fields such as search and information
retrieval. For our purposes, we set out to see
how naive and dynamic programming solutions
differ and how each behaves when parallelized.
Our implementations were run on RPI's
BlueGene/Q machine, AMOS, under many
different configurations of task and node counts.
All versions of the code performed differently
under different circumstances, so a wide range
of tests was applied to ensure that no behavior
was overlooked. From there we compared the
runtimes of the various algorithms and their
overall performance. On the whole we saw a
wide range of performance differences.
2 Related Works
Typically in research, string matching
refers to the task of finding a small pattern string
within a larger text. Many serial algorithms have
been implemented for this task, ranging from
O(nm), where n and m are the lengths of the text
and pattern respectively, to O(n) with an O(m)
preprocessing step. For parallel algorithms, two
were shown by Zvi Galil: first an O(log log n)
algorithm [1], later followed by an O(1)
algorithm [2]. Both of these algorithms were
designed for the parallel random access machine
(PRAM) model of computation and use
preprocessing of the pattern string to achieve
better runtimes. This form of string matching
has been explored all the way down to constant
time, but our problem involves searching
between two large bodies of text and cannot use
the same preprocessing tricks employed by these
algorithms.
A problem similar to ours is explored by
Landau [3], where the smaller pattern may have
up to k differences from a substring in the larger
text. Although our problem involves comparing
two full texts, similar concepts apply to how the
difference checking is handled. While our model
looks only for exact matches, the idea of
scanning throughout the pattern string for these
differences provides key concepts for comparing
the two texts to each other.
3 Implementation
Two simple algorithms for finding the
longest common substring of two texts are a
naive triple-for-loop structure and dynamic
programming. Our goal was to compare these
two methods in both serial and parallel and see
whether their differences carried over to the
parallel versions. To do this we implemented
both serial algorithms and used them as the basis
for the parallel algorithms.
3.1 Serial Algorithms
The naive method loops over every pair
of starting positions in the two texts and then
runs an inner loop, advancing while the
characters remain equal, to find the length of the
common substring at that pair. This method has
a runtime of O(n*m*k), where n and m are the
sizes of the texts and k is the size of the longest
common substring. It is attractive for its
simplicity and the small amount of work per
loop iteration, but as the size of the common
substring increases, more work must be done
and the runtime suffers badly; if the two files
being compared are identical, the runtime
degrades to O(n^3). The dynamic method fills a
two-dimensional array that tracks the length of
the current common substring along diagonals,
giving a runtime of O(n*m). Unlike the naive
method, this approach is unaffected by the
substring length. Its extra overhead is the
two-dimensional array, which takes up O(n*m)
additional space in memory, and the additional
calculations required to maintain that array.
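For concreteness, the following is a minimal C++ sketch of the two serial methods described above. The function and variable names are ours for illustration and are not taken from the project code.

// Sketch of the two serial algorithms (illustrative only). Both return the
// length of the longest common substring of texts a and b.
#include <algorithm>
#include <string>
#include <vector>

// Naive method: O(n*m*k) time, where k is the longest common substring length.
int lcsNaive(const std::string& a, const std::string& b) {
    int best = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        for (size_t j = 0; j < b.size(); ++j) {
            size_t k = 0;  // extend the match while the characters stay equal
            while (i + k < a.size() && j + k < b.size() && a[i + k] == b[j + k])
                ++k;
            best = std::max(best, static_cast<int>(k));
        }
    }
    return best;
}

// Dynamic programming method: O(n*m) time and O(n*m) space.
// table[i][j] holds the length of the common substring ending at a[i-1], b[j-1].
int lcsDynamic(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> table(a.size() + 1,
                                        std::vector<int>(b.size() + 1, 0));
    int best = 0;
    for (size_t i = 1; i <= a.size(); ++i) {
        for (size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                table[i][j] = table[i - 1][j - 1] + 1;  // extend along the diagonal
                best = std::max(best, table[i][j]);
            }
        }
    }
    return best;
}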
3.2 Parallel Naive
The parallel naive algorithm, like the
serial version, finds the longest substring length
through brute force, but uses parallelization to
break up the work. The parallelization is done on
string B, splitting it into multiple chunks and
comparing each chunk individually to string A.
This, however, is not trivial, because processes
must now pass partial match information to each
other in order to assemble the longest common
substring. To accomplish this, each process
stores an associative set of starting index and
length for all matches that include the beginning
of its B chunk, and an associative set of ending
index and length for all matches that include the
end of its B chunk. Using MPI, each process
passes its ending set to the process of rank one
greater, which compares the received ending set
to its own starting set and adds the substring
lengths together where the end and start indices
line up. This is done in order from rank 0 to the
last rank, after which all of the partial matches
are combined and the maximum substring length
is known. Because the sends and receives must
be blocking, this MPI passing at the end is
expected to cause more overhead as the number
of tasks grows.
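The sketch below illustrates the boundary hand-off just described; it is our own simplified rendering, not the project code. Boundary matches are represented as (index in A, length) pairs, with ending matches keyed by the A index just past the match, and a match that spans more than two chunks is not propagated further here.

// Simplified sketch of the blocking boundary hand-off in the parallel naive
// method. Each rank owns one chunk of B and has already recorded:
//   startingMatches: matches beginning at the first character of its chunk,
//                    keyed by the A index where the match starts;
//   endingMatches:   matches running through the last character of its chunk,
//                    keyed by the A index just past where the match ends.
#include <mpi.h>
#include <algorithm>
#include <map>
#include <vector>

long mergeBoundaryMatches(const std::map<long, long>& startingMatches,
                          const std::map<long, long>& endingMatches,
                          long localBest, int rank, int numRanks) {
    long best = localBest;

    if (rank > 0) {  // receive the previous rank's ending matches (blocking)
        int count = 0;
        MPI_Recv(&count, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::vector<long> buf(2 * count);
        if (count > 0)
            MPI_Recv(buf.data(), 2 * count, MPI_LONG, rank - 1, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; ++i) {
            long posAfterEnd = buf[2 * i], len = buf[2 * i + 1];
            auto it = startingMatches.find(posAfterEnd);
            if (it != startingMatches.end())   // the partial matches line up in A
                best = std::max(best, len + it->second);
        }
    }
    if (rank < numRanks - 1) {  // forward our own ending matches to the next rank
        std::vector<long> buf;
        for (const auto& m : endingMatches) { buf.push_back(m.first); buf.push_back(m.second); }
        int count = static_cast<int>(endingMatches.size());
        MPI_Send(&count, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        if (count > 0)
            MPI_Send(buf.data(), 2 * count, MPI_LONG, rank + 1, 1, MPI_COMM_WORLD);
    }

    long globalBest = 0;  // combine every rank's result into the final answer
    MPI_Allreduce(&best, &globalBest, 1, MPI_LONG, MPI_MAX, MPI_COMM_WORLD);
    return globalBest;
}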
3.3 Parallel Dynamic
The parallelization of the dynamic
method keeps the two-dimensional array
structure, but each node allocates only the
portion it needs. Both strings are split evenly
between nodes, with the largest pieces given to
the last node. The algorithm is a two-step
process: first a local computation is done, and
then global corrections are applied to obtain the
correct lengths. The local computation passes
sections of the second text around in a ring
using MPI, so every section of one text is
compared against every section of the other
while each node builds its local two-dimensional
array. This initial computation, however, does
not account for substrings that begin on previous
nodes. Therefore a second step, using blocking
MPI calls in rank order, computes the overall
substring lengths. This second step incurs more
latency as the number of tasks grows. The two
steps have opposing overheads that create a peak
performance point: the first computation gets
faster with more processes, since the work is
split further, while the second slows down, since
more blocking must occur to correct each node's
calculations.
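A skeleton of the first, ring-exchange step is sketched below (our own illustration, not the project code): each rank keeps its slice of text A, the slices of text B rotate around a ring, and a per-piece dynamic programming kernel runs at each step. The second, sequential correction pass for substrings that cross rank boundaries is only noted in a comment.

#include <mpi.h>
#include <algorithm>
#include <string>
#include <vector>

// Per-piece kernel (simplified): longest common substring of the two pieces
// considered in isolation; matches crossing piece boundaries are handled by
// the correction pass, which is not shown here.
static long localDynamicPass(const std::string& aLocal, const std::string& bPiece) {
    std::vector<std::vector<long>> t(aLocal.size() + 1,
                                     std::vector<long>(bPiece.size() + 1, 0));
    long best = 0;
    for (size_t i = 1; i <= aLocal.size(); ++i)
        for (size_t j = 1; j <= bPiece.size(); ++j)
            if (aLocal[i - 1] == bPiece[j - 1])
                best = std::max(best, t[i][j] = t[i - 1][j - 1] + 1);
    return best;
}

long ringDynamic(const std::string& aLocal, std::string bPiece,
                 int rank, int numRanks, int maxPieceLen) {
    long best = 0;
    // Pad to a fixed size so pieces of different lengths can rotate; padding
    // bytes ('\0') never match real text characters.
    bPiece.resize(maxPieceLen, '\0');
    const int next = (rank + 1) % numRanks;  // ring neighbours
    const int prev = (rank + numRanks - 1) % numRanks;

    for (int step = 0; step < numRanks; ++step) {
        best = std::max(best, localDynamicPass(aLocal, bPiece));
        // Pass our current piece of B to the next rank and take the previous rank's.
        MPI_Sendrecv_replace(&bPiece[0], maxPieceLen, MPI_CHAR,
                             next, 0, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    // Second step (not shown): blocking calls in rank order pass boundary match
    // lengths forward so substrings spanning several slices are counted correctly.
    long globalBest = 0;
    MPI_Allreduce(&best, &globalBest, 1, MPI_LONG, MPI_MAX, MPI_COMM_WORLD);
    return globalBest;
}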
4 Contribution
Thomas Manzini wrote serial
implementations of a string matching algorithm,
initially in Python. This code was later adapted
to C++ by Gerrett Diamond for the serial
portions of the project that were run as
benchmarks. The Python code served as a proof
of concept for both the serial and the parallel
implementations. At the same time, additional
code was written to create test cases that could
be tailored to the problems we needed. Our
group was broken up into two teams: the first,
Thomas Manzini and Paul Zhou, worked on the
parallel naive implementation; the second,
Gerrett Diamond and Zexin Wan, implemented
the parallel dynamic algorithm. Thomas Manzini
wrote the file I/O portion of the parallel naive
implementation, which was then used by Paul
Zhou, who wrote the string matching portion of
the C++ parallel naive implementation. Zexin
Wan was responsible for implementing the
parallel I/O for the dynamic parallel program,
and Gerrett Diamond implemented the parallel
dynamic algorithm. As the code was being
finished, all members contributed to running
tests and gathering data.
5 Testing and Expectations
We used both randomly generated texts
with controlled substring lengths and published
books to test our programs. The randomly
generated tests were used to compare the serial
algorithms to the parallel ones and to show the
effects of substring length and text length on
each program. We generated two sets of tests:
one with a constant substring length of 50% of
the file, and the other with a fixed file size of
16384 bytes. The first set varies the file size
from 1KB to 64KB; the second varies the
substring length from 0% to 100% of the file.
Lastly, the real-world cases were selected from
the top-downloads section of the Project
Gutenberg website [4] and trimmed down to a
size the programs could handle. For small data
cases we expected the serial code to be faster
than the parallel algorithms, since when
computation time is short, blocking time makes
up the majority of execution time; as the data
grows larger, the parallel algorithms should be
faster. We also expected the naive algorithm to
have equal or even better performance than the
dynamic algorithm, but when the longest
common substring grows to a large percentage
of the file, the dynamic algorithm should be
much faster.
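As an illustration of how such test pairs can be generated (our own sketch; the project's actual generator was a separate script and may differ in detail), the following writes two random texts of a given size that share one planted common substring covering a chosen fraction of each file.

// Sketch of a test-case generator: two random texts of `size` bytes that share
// one planted common substring covering `fraction` of each file.
// (For fraction 0 the texts still share short incidental matches.)
#include <fstream>
#include <random>
#include <string>

void makeTestPair(const std::string& fileA, const std::string& fileB,
                  std::size_t size, double fraction, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    auto randomText = [&](std::size_t n) {
        std::string s(n, ' ');
        for (char& c : s) c = static_cast<char>(letter(rng));
        return s;
    };

    std::size_t shared = static_cast<std::size_t>(size * fraction);
    std::string common = randomText(shared);

    std::string a = randomText(size);
    std::string b = randomText(size);
    // Plant the shared substring at a random offset in each text.
    std::uniform_int_distribution<std::size_t> offset(0, size - shared);
    a.replace(offset(rng), shared, common);
    b.replace(offset(rng), shared, common);

    std::ofstream(fileA) << a;
    std::ofstream(fileB) << b;
}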
6 Performance Results
The graphs displaying execution times
as a function of the size of the two input files
(Fig. 1, Fig. 2) show that the naive serial code is
always faster than the dynamic serial code. On
the other hand, the parallel naive method is
faster than the parallel dynamic method only for
smaller file sizes, with the parallel dynamic
method outperforming it as the file size
increases.
Fig. 1 This figure shows the graph of the execution time
versus the size of the files that are being searched by the
naïve methods.
For the naive algorithm, the serial code
outperforms the parallel code at 1KB but
becomes much slower above 2KB. The curves
for the different task counts show that runs with
smaller task counts tend to be faster than those
with larger task counts.
Fig. 2 This figure shows the graph of execution time versus
the size of the files that are being searched by the dynamic
methods.
Similarly, for the dynamic
implementations, the serial code outperforms the
parallel code at 1KB and is slower above 2KB.
Runs with smaller task counts are faster than
those with larger task counts except when the
file size is large, at 64KB.
[Fig. 1 chart data: log-scale execution time in seconds versus file size (2^10 to 2^16 bytes) for Naïve Serial and Parallel Naïve with 256, 512, and 1024 tasks.]
[Fig. 2 chart data: log-scale execution time in seconds versus file size (2^10 to 2^16 bytes) for Dynamic Serial and Parallel Dynamic with 256, 512, and 1024 tasks.]
The graphs in Fig. 3 and Fig. 4 show the
time the programs took to execute versus the
size of the substring, and two clear trends
emerge. The first is that, across both graphs, the
serial implementations at this scale take
significantly more time than either of the
parallel implementations.
Fig. 3 This is the graph that shows the execution time
versus the percent of the file that contains the substring that
is being searched for by the naïve method.
For the naive implementation there is an
obvious trend: as the size of the shared substring
in the file increases, the execution time increases
as well. The data shows a rapid rise in execution
time when 20% to 40% of the file is a shared
substring; from that point the execution time
levels off and approaches a limit. This limit
differs for each task count and appears to grow
as the number of tasks grows. One thing worth
noting is that the serial implementation's time
also increases as the size of the substring
increases.
Fig. 4 This is the graph that shows the execution time
versus the percent of the file that contains the substring that
is being searched for by the dynamic methods.
For the dynamic implementation, we see
something strikingly different. Setting the serial
implementation aside, the execution time for
each number of tasks stays around a constant
value. These constants differ between task
counts, and it appears that as the number of
tasks increases, the execution time increases as
well. For all implementations, including the
serial one, the execution time is, for the most
part, constant regardless of the size of the
substring.
Fig. 5 This graph shows the execution time versus the
number of tasks used when performing the string matching;
it compares the naïve and dynamic implementations on
Pride and Prejudice and The Divine Comedy.
Fig. 6 This graph shows the execution time versus the
number of tasks used when performing the string matching;
it compares the naïve and dynamic implementations on The
Adventures of Huckleberry Finn and The Divine Comedy.
[Fig. 3 chart data: log-scale execution time in seconds versus substring size for Naïve Serial and Parallel Naïve with 256, 512, and 1024 tasks.]
[Fig. 4 chart data: log-scale execution time in seconds versus substring size for Dynamic Serial and Parallel Dynamic with 256, 512, and 1024 tasks.]
[Fig. 5 chart data: execution time in seconds (0–100) versus task count (128–2048) for the Naïve and Dynamic methods on Pride and Prejudice vs. The Divine Comedy.]
[Fig. 6 chart data: execution time in seconds (0–100) versus task count (128–2048) for the Naïve and Dynamic methods on Adventures of Huckleberry Finn vs. The Divine Comedy.]
Fig. 7 This graph shows the execution time versus the
number of tasks used when performing the string matching;
it compares the naïve and dynamic implementations on
Pride and Prejudice and The Adventures of Huckleberry
Finn.
The graphs in Fig. 5, Fig. 6, and Fig. 7
show execution time as a function of the number
of tasks for the comparisons of Pride and
Prejudice with The Divine Comedy, The
Adventures of Huckleberry Finn with The
Divine Comedy, and Pride and Prejudice with
The Adventures of Huckleberry Finn,
respectively. The data shows an interesting
trend: the parallel naive implementation is faster
than the parallel dynamic implementation in the
vast majority of the test cases. This changes,
however, at the 1024-task and 2048-task cases.
The results differ between the graphs, but the
naive results appear much more consistent,
whereas the dynamic results become much less
consistent once the number of tasks passes 512.
Though the naive implementation is not always
faster, the inconsistency of the dynamic
implementation means that the naive
implementation performs better on the whole.
Fig. 8 This graph shows the percentage of time that each
program spent using the message passing interface. It is the
percent time spent versus the number of tasks utilized.
For both algorithms, a significant
portion of the execution time is spent in MPI
sends and receives, and that portion depends on
the number of tasks. For every task count, the
dynamic code spends less time in MPI than the
naive code. The dynamic code, however, shows
a steeper increase between the minimum and
maximum task counts, a factor of 6, than the
naive code's factor of 2.2.
On the whole we saw an average
speedup of roughly 25.8x when comparing the
average runtime of the fastest serial
implementation (naïve) against the average
runtime of the naïve and dynamic parallel
implementations at 256, 512, and 1024 tasks.
7 Analysis of Performance Results
The serial execution times grow
exponentially; however, although the parallel
times are lower for our test cases, they appear to
grow at an even greater rate. This is due to the
increasing overhead of blocking MPI calls. We
expect that, given larger test cases, the parallel
code may end up slower than the serial code.
[Fig. 7 chart data: execution time in seconds (0–100) versus task count (128–2048) for the Naïve and Dynamic methods on Pride and Prejudice vs. Adventures of Huckleberry Finn.]
[Fig. 8 chart data: percent of execution time spent in MPI (0–100%) versus task count for the Dynamic and Naïve methods.]
However, the parallel dynamic code had
memory problems for large input files, so we
had to limit the file size of our test cases to
200KB; larger files cause memory allocation
errors. This is because the dynamic code creates
a 2D array with dimensions (size of file A) x
(size of file B), meaning that memory usage
scales steeply with file size. We believe that an
environment with more memory would be
needed to try these larger test cases.
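As a rough back-of-the-envelope illustration of why this limit is reached (the 4-byte entry size here is our assumption, not a figure measured above): a full table for two 200KB inputs would require roughly 200,000 * 200,000 * 4 bytes ≈ 1.6*10^11 bytes, or about 160GB. Even divided evenly across 1024 tasks, that is still on the order of 150MB per task for the table alone, a requirement that grows quadratically with file size.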
We observed that as the data size
increases, the gaps between the execution times
of the parallel methods run with different
numbers of nodes become smaller. The reason is
that the time for data transfer between nodes
remains roughly constant while the time spent
computing grows with the data size. Comparing
Fig. 1 and Fig. 2 shows that the parallel naive is
faster than the parallel dynamic when the data is
very small, while the parallel dynamic
outperforms the parallel naive once the data size
exceeds 16KB. The likely reason is that the
naive method's work grows with the length of
the longest common substring, which in these
tests is 50% of the file, whereas the dynamic
method's work does not depend on the substring
length.
The results in Fig. 3 and Fig. 4 are in
line with what we expected. For the naive
implementation, we see that as the size of the
substring increases, the execution time increases
as well. This is to be expected: as a larger shared
string is found, it must be passed to the relevant
nodes, which increases communication time not
only because data must be passed but also
because the amount of data to pass grows. We
see this same trend for all of the parallel task
counts. For the serial case, we see a slight
increase as the size of the substring increases as
well. This increase comes from the fact that the
serial implementation continues looking through
the contents of the file even after a substring has
been found; after the initial match is located, the
program keeps searching to ensure that it has not
missed anything.
For the dynamic implementation we also
see data in line with our expectations: a very
consistent execution time regardless of the size
of the substring in the file. This makes sense, as
the amount of communication does not depend
on the substring size; the data passed to each
node remains constant throughout the program.
The changes we see as the number of tasks
increases are also consistent, since the number
of blocking calls, and hence the communication,
increases with the number of tasks. From there it
is easy to see how, past a certain point, the time
the program spends communicating outweighs
the advantage of splitting the file across more
tasks, and the execution time increases.
Fig. 5, Fig. 6 and Fig. 7 test the ability to
run very large text files. The three books we
used for testing are Pride and Prejudice, The
Adventures of Huckleberry Finn and The Divine
Comedy, each of which is roughly 600KB in
size. From the performance curves of the naive
method, we can see it reaches its peak
performance at 256 tasks, after which the
execution times start to rise. The reason is that
the longest common substrings between books
are very short compared to the size of a book, so
the naive method, whose work depends strongly
on the length of the longest substring, has very
few computations per task. When the program
gets more tasks, the time for computing the local
maximum substring decreases only slightly,
while the time spent in blocking MPI calls
grows roughly in proportion to the number of
nodes. The dynamic method behaves similarly:
Fig. 6 and Fig. 7 show it reaching its peak
performance at 1024 tasks and outperforming
the peak of the naive method. The dynamic
method takes more tasks to max out because it
does not depend on the length of the longest
substring and therefore has more computation to
distribute.
A general trend displayed in these
graphs is that the more tasks we use, the slower
the execution. The only times using more than
256 tasks is beneficial are with a string match of
0% using the naive algorithm and with file sizes
greater than 16KB using the dynamic algorithm.
We attribute this to the MPI and I/O overhead,
which scales with the number of tasks. For
smaller calculations, the overhead incurred by
larger task counts outweighs the benefit of the
increased parallelism.
It was expected that the naive algorithm
would spend a larger percentage of its running
time in MPI communication, because passing
partial matches requires sending more data. It is
notable, however, that the naive algorithm was
generally faster for our test cases: despite having
a greater percentage of communication
overhead, its running time was significantly
shorter than that of the dynamic algorithm.
8 Future Works
A current bottleneck in the naive
method is that while text B is broken up between
the processes, text A is kept whole on each node.
A method for breaking up A as well was
explored but not implemented, since the amount
of information that must be stored at least
doubles and the model becomes much more
complex. It does, however, offer a potential
speedup, since each node would have less
checking to do at a time.
While researching parallel naive
algorithms, we also found other approaches that
could be faster than our current parallel naive
one. The idea is to split both string A and string
B, and to treat the chunks of B as a ring of small
strings. At each step, every rank compares its
piece of A against its current chunk of B,
recording the length of the longest substring, its
position, and the chunk's position in the ring,
and then passes the chunk to the next rank as the
ring rotates. Ideally this method should be
considerably faster than the current naive
algorithm, since it scales better and has fewer
dependencies. However, we did not implement it
because it was not only more memory intensive
than the current method, but also considerably
more complicated. We list it as a possible
direction for future study.
There are also potentially better serial
algorithms that could be extended to parallel.
We took two of the simplest methods to allow
an easy runtime comparison, but future work
could explore algorithms with better runtimes.
9 Conclusions
From the data gathered for this project,
we conclude that there is no single best method
for finding the longest common substring of two
texts; performance differs as the cases change. If
the user has a powerful machine, massive texts,
and multiple long common substrings, the
dynamic method is the better choice. However,
in the real world most texts are not extremely
similar and most users do not have access to a
supercomputer like the BlueGene/Q, so the naive
method is the more realistic choice. (The
alternative naive method mentioned in the
Future Works section should perform even
better, since it scales better in theory.)
References
[1] Dany Breslauer, Zvi Galil. "An Optimal O(log log n) Time Parallel String Matching Algorithm". SIAM J. Comput., 19(6), pp. 1051–1058. http://epubs.siam.org/doi/abs/10.1137/0219072?journalCode=smjcat
[2] Zvi Galil. "A constant-time optimal parallel string-matching algorithm". Journal of the ACM, 42(4), July 1995, pp. 908–918. http://dl.acm.org/citation.cfm?id=210341
[3] Gad M. Landau. "Fast parallel and serial approximate string matching". Journal of Algorithms, 10(2), June 1989, pp. 157–169. http://www.sciencedirect.com/science/article/pii/0196677489900102
[4] Michael S. Hart. Project Gutenberg. University of North Carolina, 1 Dec. 1996. Web. 7 May 2014. http://www.gutenberg.org/
More Related Content

What's hot

AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
 
Scalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkScalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkLynxAnalytics
 
Modification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialModification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialAlexander Decker
 
Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPIHanif Durad
 
International Journal of Computational Science and Information Technology (...
  International Journal of Computational Science and Information Technology (...  International Journal of Computational Science and Information Technology (...
International Journal of Computational Science and Information Technology (...ijcsity
 
Bliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmBliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmCSCJournals
 
Rapport_Cemracs2012
Rapport_Cemracs2012Rapport_Cemracs2012
Rapport_Cemracs2012Jussara F.M.
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...IRJET Journal
 
Elgamal signature for content distribution with network coding
Elgamal signature for content distribution with network codingElgamal signature for content distribution with network coding
Elgamal signature for content distribution with network codingijwmn
 
GRAMMAR-BASED PRE-PROCESSING FOR PPM
GRAMMAR-BASED PRE-PROCESSING FOR PPMGRAMMAR-BASED PRE-PROCESSING FOR PPM
GRAMMAR-BASED PRE-PROCESSING FOR PPMijcseit
 
Parallel processing -open mp
Parallel processing -open mpParallel processing -open mp
Parallel processing -open mpTanjilla Sarkar
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic MusicIRJET Journal
 
An Enhanced Message Digest Hash Algorithm for Information Security
An Enhanced Message Digest Hash Algorithm for Information SecurityAn Enhanced Message Digest Hash Algorithm for Information Security
An Enhanced Message Digest Hash Algorithm for Information Securitypaperpublications3
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONcsandit
 
White space steganography on text
White space steganography on textWhite space steganography on text
White space steganography on textIJCNCJournal
 

What's hot (18)

Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
 
Scalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache SparkScalable Distributed Graph Algorithms on Apache Spark
Scalable Distributed Graph Algorithms on Apache Spark
 
Modification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialModification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorial
 
Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPI
 
International Journal of Computational Science and Information Technology (...
  International Journal of Computational Science and Information Technology (...  International Journal of Computational Science and Information Technology (...
International Journal of Computational Science and Information Technology (...
 
Algoritmo quântico
Algoritmo quânticoAlgoritmo quântico
Algoritmo quântico
 
Bliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmBliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection Algorithm
 
Rapport_Cemracs2012
Rapport_Cemracs2012Rapport_Cemracs2012
Rapport_Cemracs2012
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Elgamal signature for content distribution with network coding
Elgamal signature for content distribution with network codingElgamal signature for content distribution with network coding
Elgamal signature for content distribution with network coding
 
GRAMMAR-BASED PRE-PROCESSING FOR PPM
GRAMMAR-BASED PRE-PROCESSING FOR PPMGRAMMAR-BASED PRE-PROCESSING FOR PPM
GRAMMAR-BASED PRE-PROCESSING FOR PPM
 
Parallel processing -open mp
Parallel processing -open mpParallel processing -open mp
Parallel processing -open mp
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
 
An Enhanced Message Digest Hash Algorithm for Information Security
An Enhanced Message Digest Hash Algorithm for Information SecurityAn Enhanced Message Digest Hash Algorithm for Information Security
An Enhanced Message Digest Hash Algorithm for Information Security
 
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSIONADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION
 
White space steganography on text
White space steganography on textWhite space steganography on text
White space steganography on text
 

Viewers also liked

Viewers also liked (6)

Parallel port programming
Parallel port programmingParallel port programming
Parallel port programming
 
Parallel Port
Parallel PortParallel Port
Parallel Port
 
Motherboard + ports & connector
Motherboard + ports & connectorMotherboard + ports & connector
Motherboard + ports & connector
 
Computer ports
Computer portsComputer ports
Computer ports
 
Computer Ports
Computer PortsComputer Ports
Computer Ports
 
Ports and connectors
Ports and connectorsPorts and connectors
Ports and connectors
 

Similar to A Comparison of Serial and Parallel Substring Matching Algorithms

A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A Survey of String Matching Algorithms
A Survey of String Matching AlgorithmsA Survey of String Matching Algorithms
A Survey of String Matching AlgorithmsIJERA Editor
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationCSCJournals
 
Scimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujoScimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujoRobson Araujo
 
Complier design
Complier design Complier design
Complier design shreeuva
 
XML Considered Harmful
XML Considered HarmfulXML Considered Harmful
XML Considered HarmfulPrateek Singh
 
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP IJCSEIT Journal
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...IRJET Journal
 
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...Muhammad Nasiri
 
User_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docxUser_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docxdickonsondorris
 
Discrete structure ch 3 short question's
Discrete structure ch 3 short question'sDiscrete structure ch 3 short question's
Discrete structure ch 3 short question'shammad463061
 
cis97003
cis97003cis97003
cis97003perfj
 
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingA Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingIJERA Editor
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryVinícius Uchôa
 
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1AIRCC Publishing Corporation
 
cis97007
cis97007cis97007
cis97007perfj
 

Similar to A Comparison of Serial and Parallel Substring Matching Algorithms (20)

report
reportreport
report
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A Survey of String Matching Algorithms
A Survey of String Matching AlgorithmsA Survey of String Matching Algorithms
A Survey of String Matching Algorithms
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 
I0343047049
I0343047049I0343047049
I0343047049
 
Scimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujoScimakelatex.83323.robson+medeiros+de+araujo
Scimakelatex.83323.robson+medeiros+de+araujo
 
Complier design
Complier design Complier design
Complier design
 
XML Considered Harmful
XML Considered HarmfulXML Considered Harmful
XML Considered Harmful
 
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...
A FPGA-Based Deep Packet Inspection Engine for Network Intrusion Detection Sy...
 
User_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docxUser_42751212015Module1and2pagestocompetework.pdf.docx
User_42751212015Module1and2pagestocompetework.pdf.docx
 
Discrete structure ch 3 short question's
Discrete structure ch 3 short question'sDiscrete structure ch 3 short question's
Discrete structure ch 3 short question's
 
cis97003
cis97003cis97003
cis97003
 
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String MatchingA Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
 
The effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theoryThe effect of distributed archetypes on complexity theory
The effect of distributed archetypes on complexity theory
 
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1
Performance Evaluation of Parallel Bubble Sort Algorithm on Supercomputer IMAN1
 
cis97007
cis97007cis97007
cis97007
 

A Comparison of Serial and Parallel Substring Matching Algorithms

  • 1. 1 A Comparison of Serial and Parallel Substring Matching Algorithms Gerrett Diamond, Thomas Manzini, Zexin Wan, Paul Zhou Rensselaer Polytechnic Institute - Parallel Programming and Computing File Saved to Gerrett Diamond’s Account Abstract We present a study on serial and parallel algorithms for solving an advanced string matching problem between to large texts. This problem involves finding the longest common substring between texts A and B. The study performed was done by parallelizing a naive serial algorithm and a dynamic programming algorithm using RPI’s BlueGene/Q AMOS supercomputer. We use a range of test cases and situations to show where each method has bottlenecks. We show that both algorithms in parallel have different peak points and therefore the better algorithm depends on the input being run on. 1 Introduction String matching has been a big focus. It has many applications in the academic realm as a tool to make sure academic integrity is maintained, in the form of a plagiarism checker, but also in other fields such as search and information retrieval. For our purposes, we decided to see what the differences were between the different implementations of both naive and dynamically programmed solutions and how those different versions would work when parallelized. Our different solutions were run on the RPI BlueGene /Q AMOS machine with many different initial conditions regarding both tasks and the number of nodes. All versions of the code performed differently under different circumstances. As a result a wide range of tests were applied to insure that no issues were lost in the grand scheme. From there we compared both the runtimes of the various algorithms and their overall performance as a metric of the two previous values. On the whole we saw a wide range of performance differences. 2 Related Works Typically in research, string matching refers to the task of finding a small pattern string within a larger text. Many algorithms have been implemented in serial for this ranging from O(nm), where n and m are the lengths of the text and pattern respectively, to O(n) with a preprocessing step of O(m). For parallel algorithms, two were shown by Zvi Galil. The first being a O(loglogn) algorithm [1] and then being followed by a O(1) algorithm [2]. Both of these algorithms by Galil were made for the parallel random access machine computation model and use preprocessing of the pattern string to achieve better runtimes. This form of string matching has been explored to its potential hitting constant time but our problem involves the search between two large bodies of texts and cannot do the same preprocessing tricks done in these algorithms. A similar problem to ours is explained by Landau [3]. Landau explores the problem where the smaller pattern may have up to k differences from a substring in the larger text. Although our problem involves comparing two texts similar concepts can be taken in terms of handling the difference checking. While our model looks for exact matching the idea of having to check throughout the pattern string for these differences provides key concepts for comparing each text to each other. 3 Implementation Two simple algorithms for finding the longest common substring between two texts are a naive 3 for loop structure and dynamic programming. Our goal was to compare these two methods both in serial and parallel and see if their differences carried over to parallel version. 
To do this we implemented both serial algorithms and used these as a basis for the parallel algorithms. 3.1 Serial Algorithms The naive method runs loops over each character of the two texts and then runs a subsequent loop while the two characters are equal to find common substring length. This
  • 2. 2 method has a runtime of O(n*m*k) where n and m are the size of the texts and k is the size of the longest common substring. This method is good for its simplicity and low calculations per loop but as the size of the substring increases, more work will have to be done and the runtime will take a huge hit. If two files that are compared are equal, the runtime is O(n^3). The dynamic method runs a two dimensional array that keeps track of the current longest substring along diagonals. This method results in a runtime of O(n*m). This method unlike the naive way avoids any effects from the substring length. The extra overhead of this method is the two dimensional array that takes up an extra O(n*m) space in memory and more calculations must be done to maintain this array. 3.2 Parallel Naive The parallel naive algorithm, like the serial version, tries to find the longest substring length through brute force, but uses parallelization to break up the tasks. The parallelization is done on string B, splitting it into multiple chunks, and comparing them individually to string A. This, however, was not trivial, because processes must now pass partial match information to each other in order to combine for the longest common substring. To accomplish this, it stores an associative set of starting index and length of all matches including the beginning of the B chunk, and an associative set of ending index and length of all matches including the end of chunk B. Using MPI, each process passes the ending set to the process of rank one greater, which compares the receive ending set to its own starting set, and adds the substring lengths if they have the same start and end index. This is done starting from rank 0 to the last rank, after which all of the partial matchings are combined and the maximum substring length is known. Because the sends and receives must be blocking, the MPI passing at the end is expected to cause more overhead as the number of tasks grows. 3.3 Parallel Dynamic The parallelization of the dynamic method maintained the two dimensional array structure but only as needed per node. For dynamic both strings were split between nodes evenly and the largest pieces were given to the last node. The algorithm was done as a two-step process where first a local computation was done and then global corrections were completed to get the correct lengths. The local computation is done by passing sections of the second text around in a ring using MPI and therefore compares each section of both texts to each other while building a local two dimensional array. This initial computation however does not take into account substrings from previous nodes. Therefore a second step is done using blocking MPI calls in sequence to compute the overall substring lengths. This second step has the bottleneck of more latency as the number of tasks grows. Each of these two steps has opposing overheads that cause a maximum performance point. The first computation will be done faster with more processes as the computation will be split up more while the second computation will slow down as more blocking must occur to correct each node’s calculations. 4 Contribution Thomas Manzini wrote serial implementations of a string matching algorithm that was initially implemented in python. This code was later adapted by Gerrett Diamond in C++ for the serial portions of the project that were run as benchmarks. The python code was used for proof of concept for both the serial and the parallel implementations. 
At the same time, additional code was written to create test cases that could be modified to fit the problems that we need. Our group was broken up into two teams. The first consisting of Thomas Manzini and Paul Zhou who worked on the parallel naive implementation. The second group was Gerrett Diamond and Zexin Wan who implemented the parallel dynamic algorithm. Thomas Manzini wrote the File I/O portion of the parallel naive implementation. This was then used by Paul Zhou who wrote the string matching algorithm portion of the C++ parallel naive implementation. Zexin Wan was responsible for implementing the Parallel I/O for the dynamic parallel program and Gerrett Diamond implemented the algorithm for dynamic parallel. As the code was being finished all members contributed to running tests and gathering data.
  • 3. 3 5 Testing and Expectations We used both randomly generated texts with certain substring lengths and published books for testing our programs. The randomly generated tests were used to compare the serial algorithms to the parallel ones as well as show the effects of substring lengths and text length on each program. We generated two sets of tests one with constant substring length of 50% of the file and the other with a set file size of 16384 bytes. The first set ranges the size of the file from 1KB to 64KB. The second tested substring lengths from 0% to 100%. Lastly the real world cases were selected via the top selections portion of the Gutenburg project website [4]. These selections were then trimmed down to a size that could be handled by the programs. For small data cases, we were expecting that the serial code would be faster than the parallel algorithms since when computation time is short, the blocking time would be the majority of execution time. However as the data grows larger, the parallel algorithms would be faster. Also, we were expecting that the naive algorithms would have equal or even better performance compared to the dynamic algorithm but when the longest common substring grows to large percentages of the file, the dynamic algorithm should be much faster. 6 Performance Results The graphs displaying execution times as a function of the size of the two input files (Fig. 1, Fig. 2) show that the naive serial code is always faster than the dynamic serial code. On the other hand, the parallel naive method is faster than the parallel dynamic method only for smaller file sizes, with the parallel dynamic method outperforming it as the file size increases. Fig. 1 This figure shows the graph of the execution time versus the size of the files that are being searched by the naïve methods For the naive algorithm, the serial code outperforms the parallel code at 1KB, .but becomes much slower at greater than 2KB. The curves for numbers of tasks show that runs with smaller task counts tend to be faster than those with larger task counts Fig. 2 This figure shows the graph of execution time versus the size of the files that are being searched by the parallel methods. Similarly, for the dynamic implementations, the serial code outperforms the parallel code at 1KB and is slower for greater than 2K. Runs with smaller tasks counts are faster than those with larger tasks counts except when the file size is large, at 64KB. The graphs as seen in Fig. 3 and Fig. 4 show us the time that the programs took to execute versus the size of the substring. From this data we can see two obvious trends in the data. The first is that between the two graphs, we 0.01 0.1 1 10 100 1000 2^10 2^12 2^14 2^16 TimeInSeconds Naive Time vs File Size Naïve Serial Parallel Naïve 256 Parallel Naïve 512 Parallel Naïve 1024 0.1 1 10 100 1000 2^10 2^12 2^14 2^16 TimeInSeconds Dynamic Time vs File Size Dynamic Serial Parallel Dynamic 256 Parallel Dynamic 512 Parallel Dynamic 1024
  • 4. 4 see that the serial implementations at this scale take significantly more time than either of the parallel implementations. Fig. 3 This is the graph that shows the execution time versus the percent of the file that contains the substring that is being searched for by the naïve method For the naive implementation we can see an obvious trend that shows that as the size of the substring in the file increases the execution time increases as well. This data shows a rapid uptake in execution time that follows when 20% to 40% of the file is a shared substring. From that point we see the trend in the execution time level off and approach a limit. This limit is different for each different number of tasks but appears to grow as the number of tasks grows. One thing worth noting is that the time of the serial implementation increases as well as the size of the substring increases as well. Fig. 4 This is the graph that shows the execution time versus the percent of the file that contains the substring that is being searched for by the parallel method. For the dynamic implementation, we see something strikingly different. Aside from the serial implementation we see that the execution time for the different numbers of tasks seems to stay around a constant value. These constant values appear to be different for each different number of tasks. It appears that as the number of tasks increases, the execution time increases as well. For all implementations, including the serial one, the execution time appears to be, for the most part, constant regardless of the size of the substring. Fig. 5 This graph shows the execution time versus the number of tasks that the system used when performing the string matching, this graph is a comparison of the naïve and dynamic implementations when comparing Pride and Prejudice and The Divine Comedy. Fig. 6 This graph shows the execution time versus the number of tasks that the system used when performing the string matching, this graph is a comparison of the naïve and dynamic implementations when comparing The Adventures of Huckleberry Finn and The Divine Comedy. 0.1 1 10 100 TimeInSeconds Naive Time vs Substring Size Naïve Serial Parallel Naïve 256 Parallel Naïve 512 Parallel Naïve 1024 0.1 1 10 100 TimeInSeconds Dynamic Time vs Substring Size Dynamic Serial Parallel Dynamic 256 Parallel Dynamic 512 Parallel Dynamic 1024 0 20 40 60 80 100 128 256 512 10242048 TimeInSecnds Time vs Tasks Pride and Prejudice vs The Divine Comedy Naïve Dynamic 0 20 40 60 80 100 128 256 512 1024 2048 TimeInSeconds Time vs Tasks Adventures of Huckleberry Finn vs The Divine Comedy Naïve Dynamic
  • 5. 5 Fig. 7 This graph shows the execution time versus the number of tasks that the system used when performing the string matching, this graph is a comparison of the naïve and dynamic implementations when comparing Pride and Prejudice and The Adventures of Huckleberry Finn. The graphs seen in Fig. 5, Fig. 6, and Fig. 7 show us the performance time as a function of the number of tasks. These graphs refer to the time that it takes for the comparison of the novels Pride and Prejudice and The Divine Comedy, The Adventures of Huckleberry Finn and The Divine Comedy, and Pride and Prejudice and Huckleberry Finn, respectively. The data shows an interesting trend which is that the parallel naive implementation is better in most cases than the parallelized implementation, this holds for the vast majority of the test cases. This changes however when it comes to the 1024 task case and the 2048 case. The results differ between the different graphs, however, the naive results appear much more consistent whereas the parallel results become much less consistent when the number of tasks passes 512. Though the naive implementation is not always faster the inconsistency of the dynamic implementation means that the naive implementation performs better on the whole. Fig. 8 This graph shows the percentage of time that each program spent using the message passing interface. It is the percent time spent versus the number of tasks utilized. For every algorithm, a significant portion of the execution time is spent performing MPI sending and receiving, depending on the number of tasks. For every number of tasks, the dynamic code takes less time in MPI than the naive. The dynamic code, however, shows a steeper difference between the minimum and maximum numbers of tasks, with a factor of 6, than the naive code, with a factor of 2.2. On the whole we saw an average speed up of 25.79683378 times when comparing the average runtime of the fastest serial implementation (naïve) and the average runtime of the all both the naïve and dynamic with 256, 512, and 1024 tasks. 7 Analysis of Performance Results The serial execution times grow exponentially; however, although the times are lower for our test cases, the parallel execution times appear to grow at an even greater exponential rate. This is due to the increasing overhead of blocking MPI calls. We expect that given larger test cases, the parallel code may end up slower than the serial code. But, the parallel dynamic code had memory problems for large input files; thus, we had to limit the file size of our test cases to 200KB. Larger files cause memory allocation errors. This is because the dynamic code creates a 2D array with dimensions (size of file A) x (size of file B), meaning that memory usage scales intensively with file size. We believe that 0 20 40 60 80 100 128 256 512 10242048 TimeIinSeconds Time vs Tasks Pride and Prejudice vs Adventures of Huckleberry Finn Naïve Dynamic 0% 20% 40% 60% 80% 100% PercentTime Communication Overhead Percent Time Spent in MPI vs Task Count % Time in MPI (Dynamic) % Time in MPI (Naïve)
  • 6. 6 an environment with more memory would be needed to try these larger test cases. We observed that as the data size increases, the gaps between the execution times of the parallel method with different numbers of nodes become smaller. The reason for this is because the time of data transfer between nodes remains constant while the time of computing data grows linearly. Comparing Fig. 1 and Fig. 2, it shows that the parallel naive is faster than the parallel dynamic when the data is really small and when the data grows bigger, the parallel dynamic outperforms the parallel naive when the data size gets bigger than 16KB. The reason is because The results in Fig. 3 and Fig. 4 showed us data that is in line with what we were expecting. For the Naive implementation, we see that as the size of the substring increases, the execution time increases as well. This is to be expected with the larger shared string as it being calculated, it must be passed to all the relevant nodes. This increases communication time amongst the nodes not only because the data must be passed but also because the amount of data that must be passed increases as well. We see this same trend for all of the parallel implementations. For the serial case however, we see a slight increase as well as the size of the substring increases as well. This increase is caused by the simple fact that as the serial implementation continues looking through the contents of the file even after a substring has been found. This means that after the initial computation has been found, the program will continue searching to insure that it hasn't missed anything. For the dynamic implementation we also see data that is in line with what we were expecting. We see a very consistent execution time regardless of the size of the substring that is in the file. This makes sense as the amount of communication is not dependent on the size of the substring as the data that is being passed around to each node remains constant throughout the program. The changes that we see in the data as the number of tasks increases is also consistent as the number of blocking calls, and hence communication, increases as the number of tasks increases. From there it is simple to see how the execution time would increase as the time that the program spends communication is outweighed by the performance increase and therefore outweighs the advantage of splitting up the file at a certain point. Fig. 5, Fig. 6 and Fig. 7 are testing the ability of running very large text files. The three books we are used for testing are Pride and Prejudice, Adventures of Huckleberry Finn and the Divine Comedy and each book is about a size of 600KB. From the performance curves of the naive method, we can see it reach its peak performance with 256 tasks and then the execution times start to raise. The reason for that is because the longest substrings between books are very short compare to the size of a book, the naive method which strongly dependent on the length of longest substring would have very few computations for each task. When the program get more tasks, the time for computing the local max substring decreases in a very small scale while the time spending on MPI blocking increases by the number of nodes times a constant. The dynamic method has similar performance as well. Fig. 6 and Fig. 7 show the dynamic method reach its peak performance with 1024 tasks and outperformance the peak of naive method. 
A general trend displayed in these graphs is that the more tasks we use, the longer the execution time. The only cases in which using more than 256 tasks is beneficial are a string match of 0% with the naive algorithm and file sizes greater than 16 KB with the dynamic algorithm. We attribute this to the MPI I/O overhead, which grows with the number of tasks. For smaller calculations, the overhead incurred by larger task counts outweighs the benefit of the increased parallelism. It was expected that the naive algorithm would spend a larger percentage of its running time on MPI communication, because passing the partial matches requires sending more data. It is notable, however, that the naive algorithm was generally faster for our test cases: despite having a greater percentage of communication overhead, its running time was significantly shorter than that of the dynamic algorithm.
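For reference, communication percentages of the kind plotted in Fig. 8 can be gathered by timing each blocking MPI call with MPI_Wtime and accumulating the result separately from the total run time. The wrapper below is a sketch of that measurement idea under these assumptions, not necessarily how our instrumentation was written.

/*
 * Sketch of how a "percent time in MPI" figure like Fig. 8 can be
 * measured: time each blocking MPI call with MPI_Wtime() and keep a
 * running total separate from the overall run time. (Illustrative
 * only; not necessarily how our instrumentation was written.)
 */
#include <mpi.h>
#include <stdio.h>

static double mpi_time = 0.0;   /* accumulated time spent inside MPI calls */

static void timed_allreduce(void *send, void *recv, int count,
                            MPI_Datatype type, MPI_Op op, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    MPI_Allreduce(send, recv, count, type, op, comm);
    mpi_time += MPI_Wtime() - t0;
}

/* At the end of the run, each rank can report its communication fraction. */
static void report_mpi_fraction(double total_time, int rank)
{
    printf("rank %d: %.1f%% of %.3f s spent in MPI\n",
           rank, 100.0 * mpi_time / total_time, total_time);
}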
8 Future Works

A current bottleneck in the naive method is that while text B is broken up between the processes, text A is kept whole on each node. A method for breaking up A as well was explored but not implemented, as the amount of information that must be stored at least doubles and the model becomes much more complex. It does, however, offer a potential speedup, since less checking would happen per node at a time.

While researching parallel naive algorithms, we also found other approaches that could be faster than our current parallel naive one. The idea is to split both string A and string B into pieces and arrange the pieces of B in a circle. At each step, every rank compares the piece of B it currently holds against its part of string A, recording the length of the longest substring, its position, and the order of the piece of B in the circle, and then passes its piece of B on to the next rank. Ideally this method should be much faster than the current naive algorithm since it scales better and has fewer dependencies. However, we did not implement it because it is not only more memory intensive than the current method but also considerably more complicated. We list this method as a possible direction for future study. There are also potentially better serial algorithms that could be extended to parallel. We took two of the simplest methods in order to make comparison by runtime easy, but future work could explore algorithms with better runtimes.

9 Conclusions

From all the data gathered for this project, we find that there is no single best method for finding the longest common substring of two texts; performance differs as the cases change. If the user has a powerful machine, massive texts, and multiple long common substrings, the dynamic method is the better choice. However, in the real world most texts are not extremely similar and most users do not have access to a supercomputer like the Blue Gene/Q, so the naive method is the more realistic choice. (The alternative naive method mentioned in the Future Works section should perform even better, since in theory it scales better.)

References

[1] Dany Breslauer and Zvi Galil. "An Optimal O(log log n) Time Parallel String Matching Algorithm". SIAM Journal on Computing, 19(6), 1051–1058. http://epubs.siam.org/doi/abs/10.1137/0219072?journalCode=smjcat

[2] Zvi Galil. "A constant-time optimal parallel string-matching algorithm". Journal of the ACM, Volume 42, Issue 4, July 1995, Pages 908–918. http://dl.acm.org/citation.cfm?id=210341

[3] Gad M. Landau. "Fast parallel and serial approximate string matching". Journal of Algorithms, Volume 10, Issue 2, June 1989, Pages 157–169. http://www.sciencedirect.com/science/article/pii/0196677489900102

[4] Michael S. Hart. Project Gutenberg. University of North Carolina, 1 Dec. 1996. Web. 7 May 2014. http://www.gutenberg.org/