UNIVERSITY OF UTAH, CS 6230: PARALLEL AND HIGH PERFORMANCE COMPUTING
Fast fgrep using parallel string matching algorithm
Myungho Jung
May 10, 2015
1. INTRODUCTION
Grep is a common Unix tool for searching patterns in files and printing the matching lines. GNU grep
can find patterns quickly using string matching algorithms such as the Boyer-Moore algorithm. However,
GNU grep cannot take full advantage of multi-core systems because it is implemented with
sequential algorithms. The purpose of this project is to implement a parallel algorithm for string matching
and to apply it to the existing grep code. It would be hard to outperform GNU grep,
since it has been developed and optimized for a long time. Nevertheless, this project demonstrates
a new way for grep to utilize multi-core systems.
GNU grep is divided into several parts. First, egrep, equivalent to the option '-E', searches for patterns
given as extended regular expressions. Second, grep can find fixed strings using fgrep or the option '-F'. Lastly,
pcregrep searches files for Perl regular expressions. As a first step toward parallelizing GNU grep,
this project parallelizes fgrep.
Fgrep uses one of two algorithms depending on the number of patterns. If there is a single input string,
the Boyer-Moore algorithm is used; if there is more than one, grep finds the patterns with the
Commentz-Walter algorithm. This project parallelizes the first case.
2. METHODS
There are several ways to parallelize grep. First, file-level parallelism is possible: while one
thread searches for the pattern in the first file, another thread can search a second file at the same
time. This is easy to implement, but a single large file cannot be searched in parallel, and the results
from different files will be interleaved. Second, grep can be parallelized line by line, but this has similar
problems: lines will not be printed in order, and it cannot be applied to binary files, which contain
no newline characters. The goal of this project, therefore, is to produce exactly the same output as the
sequential algorithm.
Many parallel algorithms for string matching have been studied. One of the earliest optimal algorithms
was devised by Zvi Galil [2], but it is limited to fixed alphabets. Uzi Vishkin improved the
algorithm to handle the general case [3]. A constant-time algorithm using randomization was later developed
by Galil et al. [1]. The algorithm in this project is based on Vishkin's idea, with several mechanisms
added for the grep utility.
2.1. DEFINITIONS
The following definitions are needed for this algorithm.
• Given a pattern of length m and a text of length n, the goal of the algorithm is to find the
pattern in the text.
• Period: given a string of length m, X is a period if the string can be written as X^k X′, where X′
is a prefix of X (X is repeated k times, followed by the prefix X′).
• The period: the shortest period, if there is more than one.
• Periodic: a string is periodic if the length of its period is at most m/2
(m is the length of the string).
Ex> 'abaabcab' is not periodic, but 'abcabcab' is periodic (its period 'abc' has length 3 ≤ 8/2).
• Witness: an array recording, for each shift, a position at which two overlapped copies of the pattern differ.
Ex> witness[i] = j means that when the pattern is overlaid on a copy of itself shifted by i, the
two copies differ at index j of the upper string.
• Duel: using the witness array, two nearby candidate positions can be compared against the text:
at least one of the two corresponding pattern characters must differ from the text character at the
witnessed position, so one candidate can be eliminated with a single comparison, as sketched below.
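To make the witness and duel definitions concrete, here is a minimal C sketch of a duel, assuming a precomputed witness array with 0-based indices; the function name and conventions are illustrative, not taken from the project's source.

/* Duel between two candidate start positions a < b with b - a < m/2,
 * assuming witness[i] = j means pattern[j] != pattern[i + j]
 * (such a j exists for every shift i that is not a period). */
static long duel(const char *text, const char *pattern,
                 const int *witness, long a, long b)
{
    int j = witness[b - a];
    /* The two hypothetical matches overlap at text[b + j]; since
     * pattern[j] != pattern[b - a + j], that text character can agree
     * with at most one candidate, so one comparison decides. */
    if (text[b + j] == pattern[j])
        return b;   /* candidate a is eliminated */
    else
        return a;   /* candidate b is eliminated */
}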
2.2. ALGORITHMS
2.2.1. PATTERN ANALYSIS
First, the witness array is constructed from the pattern string. This step could itself be parallelized,
but that is rarely useful in practice: patterns are usually short, so the overhead of
parallelism would exceed the cost of the sequential algorithm.
Algorithm 1 The algorithm for pattern analysis
function PATTERNANALYSIS(pattern[1..m])
    for i = 1; i < log m; ++i do
        Divide the pattern into blocks of size 2^i
        Get j, the surviving candidate of the first block
        Compute witness[j] against pattern[1..2^(i+1)] by brute force    D: O(1), W: O(2^i)
        if witness[j] = 0 then
            Apply the duel function to the remaining blocks
        else
            Find the largest α (α ≥ i+1) such that j is a period of pattern[1..2^α]
                but not of pattern[1..2^(α+1)]    D: O(1), W: O(2^α)
            if α = log m then
                return witness
            else
                Eliminate all candidates in the first α−1 blocks using the duel function
                i = α
                continue
            end if
        end if
    end for
    return witness
end function
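For reference, a brute-force witness construction might look like the following sketch. This is the sequential fallback that the discussion above argues is adequate for short patterns; the helper name and the 0-based convention (with -1 playing the role of the 0 in Algorithm 1's 1-based notation) are assumptions, not the project's actual code.

/* Brute-force witness construction: for each shift i, record a
 * mismatch position j with pat[j] != pat[i + j], or -1 when the
 * shifted copies agree everywhere (i.e. i is a period). O(m^2)
 * in the worst case, which is acceptable for short patterns. */
static void compute_witness(const char *pat, int m, int *witness)
{
    for (int i = 1; i < (m + 1) / 2; ++i) {
        witness[i] = -1;
        for (int j = 0; i + j < m; ++j) {
            if (pat[j] != pat[i + j]) {
                witness[i] = j;
                break;
            }
        }
    }
}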
2.2.2. TEXT ANALYSIS (NONPERIODIC CASE)
This is the common case. If the pattern string is not periodic, the algorithm is as follows:
1. Partition the text into blocks of size m/2.
2. For each block, eliminate all but one candidate using the duel function. D: O(log m), W: O(m)
3. For each remaining candidate, check by brute force whether the pattern matches at that position.
D: O(1), W: O(n)
In total, the depth is O(log m) and the work is O(n). A sketch of these steps follows.
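A compact OpenMP sketch of the three steps, assuming the duel and witness helpers sketched earlier; the identifiers and loop structure are illustrative, not the project's actual implementation.

#include <omp.h>
#include <string.h>

/* Nonperiodic text search: duels leave at most one candidate per
 * block of size m/2, and each survivor is verified by brute force.
 * match[] must be zero-initialized by the caller; match[i] is set
 * to 1 when the pattern occurs at text offset i. */
static void search_nonperiodic(const char *text, long n,
                               const char *pat, int m,
                               const int *witness, char *match)
{
    long block = m / 2;
    if (block < 1 || n < m)
        return;
    long nblocks = (n - m) / block + 1;
    #pragma omp parallel for schedule(static)
    for (long b = 0; b < nblocks; ++b) {
        long lo = b * block;
        long hi = lo + block;
        if (hi > n - m + 1)
            hi = n - m + 1;
        long cand = lo;                 /* duel keeps the survivor */
        for (long i = lo + 1; i < hi; ++i)
            cand = duel(text, pat, witness, cand, i);
        if (memcmp(text + cand, pat, (size_t)m) == 0)
            match[cand] = 1;
    }
}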
2.2.3. TEXT ANALYSIS (PERIODIC CASE)
In practice, it is rare for the pattern to be periodic. However, if it is, the algorithm
above cannot be applied, because each block of size m/2 may contain more than one position where
the pattern can match. The algorithm is as follows:
1. Let p be the period of the pattern. Compute all candidate positions using the nonperiodic prefix
pattern[1..2p−1]. D: O(log m), W: O(n)
2. For each occurrence, check whether p^2 p′ occurs at that position, where p′ is the prefix of p
with which the pattern ends. D: O(log m), W: O(p)
3. The pattern occurs at position i if p^2 p′ occurs at all of the positions i + jp (j = 0..m/p).
D: O(log m), W: O(n)
Therefore, the total depth is O(log m), and the work is O(n).
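To make step 3 concrete: writing the pattern as p^k p′ (k full copies of the period followed by the prefix p′, so m = kp + |p′|), the pattern occurs at i exactly when p^2 p′ occurs at i, i + p, ..., i + (k−2)p, because consecutive copies overlap and the copy starting at i + (k−2)p ends exactly at i + m − 1. Below is a naive sequential sketch of this check; a true O(log m)-depth version would evaluate it with parallel prefix operations, and occ2 is an assumed output of step 2, not a name from the project.

#include <stdbool.h>

/* Step 3 for a periodic pattern p^k p': occ2[i] is assumed to be
 * true when p^2 p' occurs at text offset i (computed in step 2).
 * The full pattern occurs at i iff p^2 p' occurs at every
 * period-spaced offset i + j*p for j = 0 .. k-2. */
static bool periodic_match_at(const bool *occ2, long i, int p, int k)
{
    for (int j = 0; j <= k - 2; ++j)
        if (!occ2[i + (long)j * p])
            return false;
    return true;
}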
3. IMPLEMENTATION DETAILS
Many parts of this algorithm could be parallelized. However, parallelizing every part would add
more overhead than it saves, so only the outermost loop is parallelized. Applying the algorithm to
grep is a little tricky: the sequential algorithm in the original grep returns the first occurrence of
the pattern in the buffer, while the parallel algorithm can only find all occurrences, in no particular
order. The surrounding code therefore had to be modified first.
The sequential grep reads a file into a buffer; if the file is larger than the buffer, it reads as
much data as the buffer holds. It does not search line by line. For performance, it returns the
position of the first occurrence of the string and prints the containing line, searching for additional
occurrences only when the color option is set. This scheme is efficient and fast, but hard to
parallelize.
To apply the parallel algorithm, part of the original grep code was edited, but almost all of the
code for the parallel algorithm was added in separate source files, 'vuset.c' and 'vuset.h'. The
program loads data into a buffer much as the original grep does. The algorithm then finds all
occurrences of the pattern in the buffer and returns them as a linked list, which acts as a cache for
printing lines. A linked list is normally slow for random access, but it works well here because the
results are printed sequentially; a sketch of the structure follows. Although this approach is not
effective for small files, it works well in common cases.
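A sketch of such a match list in C; the structure and helper names are illustrative, and the project's vuset.c may define them differently.

#include <stdlib.h>

/* One occurrence of the pattern. Matches are appended in buffer
 * order, so the printing pass can walk the list front to back
 * exactly once and never needs random access. */
struct match {
    size_t offset;              /* byte offset of the occurrence */
    struct match *next;
};

/* O(1) append at the tail; 'tail' points at the current
 * end-of-list link, and the new end-of-list link is returned. */
static struct match **match_append(struct match **tail, size_t offset)
{
    struct match *m = malloc(sizeof *m);
    if (m == NULL)
        return tail;            /* out of memory: drop the match */
    m->offset = offset;
    m->next = NULL;
    *tail = m;
    return &m->next;
}

A caller would start with struct match *head = NULL; struct match **tail = &head; and call tail = match_append(tail, offset) for each occurrence, in increasing offset order.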
4. EXPERIMENTS & RESULTS
4.1. TEST ENVIRONMENT
The program is implemented with OpenMP and tested on Stampede. To allow comparison with the sequential
GNU grep, the parallel parts were added directly to the GNU grep code, so many functions of the original
GNU grep could be reused. Each case was run 10 times and the results averaged. Stampede's largemem
queue was used to test on 32 cores per node. Execution time was measured with the Linux time command.
A large test file of about 600MB was generated for the tests. It consists of 5 million lines of 12
random words each, created by a Ruby script from the Linux dictionary. The program was also tested
on many small files: the 3,216 text files in the autotools source code (automake,
autoconf, and m4).
4.2. RESULTS
4.2.1. TESTS ON A LARGE FILE
First, the program was tested on the roughly 600MB file. The pattern is 'apple', and it occurs in the
file several times, so this is not the worst case. The result is shown in Figure 4.1.
Figure 4.1: Execution times of the normal case (there are occurrences of the pattern). [Plot: execution time in seconds vs. number of threads, 1 to 32.]
As the figure shows, the execution time decreases as the number of threads increases. The sequential
grep (option -F) takes 1.3374 seconds. A direct comparison with the original grep is not very
meaningful, however, because GNU grep has been developed and optimized for many years, unlike the
parallel grep. Nevertheless, the result shows that the parallel grep could eventually outperform
the sequential one.
The second test is the worst case: the pattern does not occur anywhere in the test file.
The result is shown in Figure 4.2.
Figure 4.2: Execution times in the worst case (there is no occurrence of the pattern). [Plot: execution time vs. number of threads, 1 to 32.]
Although this is the worst case, the execution time is lower than in the first test. The first test
takes longer because it includes the time to print the matching lines, which makes the process I/O
bound.
Finally, the program was tested with a periodic pattern. The result is shown in Figure 4.3.
Figure 4.3: Execution times in the worst case with the periodic pattern 'abcdeabcdeabc'. [Plot: execution time vs. number of threads, 1 to 32.]
The result is similar to the nonperiodic case. In all of the tests, the execution time does not
decrease much as the number of threads increases. Analysis revealed many sequential
parts that cannot be parallelized; for example, the code that reads files into the buffer and prints
lines must run sequentially. This is the main limitation of the project: most of the code would have
to be restructured to parallelize it further.
4.2.2. TESTS ON MULTIPLE FILES
The program was also tested on many small files (the 3,216 files from the autotools source code).
Results for a normal case and the worst case are shown in Figures 4.4 and 4.5.
Figure 4.4: Execution times of a normal case for many small files. [Plot: execution time vs. number of threads, 1 to 32.]
Figure 4.5: Execution times in the worst case for many small files. [Plot: execution time vs. number of threads, 1 to 32.]
As these results show, the program is not efficient for many small files. The overhead of
parallelizing each file separately is high; in other words, repeatedly creating and joining threads
takes a long time.
There is a way to overcome this: instead of loading one file at a time into the buffer, read multiple
files into a larger buffer and record where each file begins and ends. Then find all occurrences of
the pattern in the buffer and, finally, print the lines and file names using the stored offsets. A
sketch of this idea follows.
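In this batching scheme, sketched below with hypothetical names, small files are concatenated into one buffer, each file's byte range is recorded, and a match offset is mapped back to its file before printing. The sketch assumes at least one span and an offset that lies inside the buffer.

#include <stddef.h>

/* Byte range [begin, end) that one input file occupies inside the
 * shared buffer, kept so matches can be attributed to files. */
struct file_span {
    const char *name;           /* file name, for printing */
    size_t begin, end;
};

/* Binary search over spans (sorted by 'begin') for the file
 * containing a given match offset. */
static const struct file_span *span_of(const struct file_span *spans,
                                       size_t nspans, size_t offset)
{
    size_t lo = 0, hi = nspans;
    while (lo + 1 < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (spans[mid].begin <= offset)
            lo = mid;
        else
            hi = mid;
    }
    return &spans[lo];
}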
4.2.3. COMPARISON WITH ANOTHER METHOD TO PARALLELIZE
There is no established good way to parallelize grep, which is why this project was started. One
simple approach is GNU parallel, a command-line utility for Linux and other Unix-like operating
systems that executes shell commands in parallel. For example, the command 'parallel
-j 20 --pipe --block 1024k grep -F --color=always apple < text.txt' parallelizes grep by dividing a
large file into blocks of 1024KB and running grep on them concurrently. However, this command took
8.55 seconds on the same test file and environment, and its output differs from that of the original
grep. Thus, GNU parallel does not parallelize grep effectively.
5. CONCLUSIONS
Using a parallel string matching algorithm, a parallel grep was implemented and tested. Unfortunately,
the results show that it is not yet a suitable replacement for the original grep; more work is needed
to solve the remaining problems.
Although this algorithm may not be the best fit for GNU grep, it can be used in other programs. For
example, gzip, a file compression tool, uses string matching to compress files; applying the parallel
algorithm there could reduce compression time on multi-core systems. The find tool searches for files
by name on Unix-like systems, and many other Unix utilities rely on string matching. If the algorithm
were implemented as a library, it could be used in many such areas, improving the load balance and
efficiency of these tools on multi-core machines.
A. HOW TO BUILD AND EXECUTE THE PROGRAM
A.1. BUILD
GNU grep is packaged with the autotools, so it can be compiled and installed with the usual
commands:
> configure && make && make install
On a system where root permission is not available, the installation directory can be changed:
> configure --prefix=<path to be installed>
If a recent version of the autotools is not installed, the program may not compile. In that case, the
autotools (automake, autoconf, and m4) should be installed before compiling:
automake 1.15: http://ftp.gnu.org/gnu/automake/automake-1.15.tar.gz
autoconf 2.69: http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
m4 1.4.17: http://ftp.gnu.org/gnu/m4/m4-1.4.17.tar.gz
Likewise, their installation paths can be changed with the configure command. To run the program on
Stampede, these tools should be installed under your account.
A.2. EXECUTION
The program runs in parallel with the option '-p'.
Usage: grep -p <number of threads> <pattern> <files>
Ex> grep -p 4 --color=always apple test.txt
Ex> grep -p 4 --color=always -r apple /test/*
Only a single pattern is supported.
REFERENCES
[1] Maxime Crochemore, Zvi Galil, Leszek Gasieniec, Kunsoo Park, and Wojciech Rytter. Constant-time
randomized parallel string matching. SIAM Journal on Computing, 26(4):950–960, 1997.
[2] Zvi Galil. Optimal parallel algorithms for string matching. Information and Control, 67(1):144–157,
1985.
[3] Uzi Vishkin. Optimal parallel pattern matching in strings. Springer, 1985.