Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Liwei Ren
Data Center Research
Trend Micro
Cupertino, USA
e-mail: liwei_ren@trendmicro.com
Abstract— The bad character shift rule of Boyer-Moore string
search algorithm is studied in this paper for the purpose of
extending it to more general string match problems. An abstract
problem of string match is defined in general. An optimized string
match algorithm based on the bad character heuristic is
proposed to solve the abstract match problem efficiently.
Keywords: pattern; string; sequence; search; match; bad
character; Boyer-Moore
I. INTRODUCTION
String searching is a classic problem in many text
processing applications. Among the many string searching
algorithms, the Boyer-Moore algorithm [1] is a particularly
efficient one for single pattern string match. It uses both
the good suffix shift and the bad character heuristic
to accelerate the match. Two shift tables are
built to determine how far to shift the pattern after a
match fails. The algorithm shifts the pattern by the
larger of the two values given by the shift tables.
The Horspool algorithm [2] is the best known variant of
the Boyer-Moore algorithm. It uses only the bad character
heuristic to build the shift table. There are other variants as
well, such as the algorithms given by Raita [3] and Sunday
[4].
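As a point of reference, the bad character heuristic in Horspool's variant can be sketched as follows. This is a minimal illustration rather than the code of [2]; the function names are hypothetical and positions are 0-based.

```python
from typing import Dict, List


def horspool_shift_table(pattern: str) -> Dict[str, int]:
    """Bad character shifts: distance from the rightmost occurrence of each
    character (the final position excluded) to the end of the pattern."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so the smallest shift wins.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return the 0-based start positions of all occurrences of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = horspool_shift_table(pattern)
    hits = []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            hits.append(s)
        # Shift by the entry of the text character aligned with the last
        # pattern position; characters absent from the table shift by m.
        s += shift.get(text[s + m - 1], m)
    return hits
```

A character that never occurs in the pattern produces the full shift of m, which is the behaviour the rest of the paper generalizes.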
In summary, the essence of all the Boyer-Moore style
algorithms is to skip as many unnecessary character
comparisons as possible.
If we introduce the concept of a match window as a
substring of the reference string, the naïve string searching
algorithm is basically a sliding window match algorithm
with N-M+1 match windows, where N and M are the sizes
of the reference string and the pattern respectively. In
practice, the Boyer-Moore algorithm examines only a few
candidate match windows that may contain the target
string. It does so by ruling out many windows that
definitely contain no target substring.
The bad character shift of the Boyer-Moore algorithm can
take a weaker form, character identity verification, which
checks whether a given character in the reference string
belongs to the alphabet of the search pattern.
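For the single pattern case, character identity verification reduces to a set membership test. The sketch below is only illustrative; the helper names are hypothetical and do not come from the paper.

```python
def pattern_alphabet(pattern: str) -> frozenset:
    """Alphabet S of a single pattern: its set of distinct characters."""
    return frozenset(pattern)


def passes_identity_check(ch: str, alphabet: frozenset) -> bool:
    """Character identity verification: is ch a member of the alphabet?"""
    return ch in alphabet


def window_can_be_skipped(window: str, alphabet: frozenset) -> bool:
    """A window holding any character outside the alphabet cannot match."""
    return any(not passes_identity_check(ch, alphabet) for ch in window)
```

Unlike a shift table, this test says nothing about where a character occurs in the pattern, which is why the text calls it a weaker form of the bad character rule.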
We can extend the concepts of both match window and
character identity verification to other string match
problems, for instance, the regular expression based pattern
match problem, which has many applications in practice.
This paper proposes an abstract problem of string match
which includes the two classic string matching problems, i.e.,
single pattern string search and regular expression pattern
match, as special cases.
An efficient algorithm is constructed to solve the abstract
problem based on the concepts of match window and
character identity verification.
II. A GENERAL PROBLEM OF STRING MATCH
In this section, we use an abstract model to present
string match problems in more general terms. With this
model, many practical problems can be covered beyond the
scope of both single pattern string searching and regular
expression based string matching.
Before we define the problem, let us observe the following
about classic string match problems:
1. The target string has a small alphabet S
compared to the whole character space. In the case
of the single pattern string search problem, S consists
of all unique characters of the pattern string. In the
case of regular expression match, most entities
defined by regular expression patterns
in practical applications have small alphabets as
well. Examples of these entities include IP
addresses, dates, credit card numbers, bank account
numbers and ID numbers.
2. The target strings have well-defined minimum and
maximum lengths. This is obvious for the single
pattern search problem. For regular
expression match, these two numbers can often
be pre-defined. For example, to
match a MasterCard credit card number in a text, the
minimum length is 16 while the maximum length
can be defined as 19 if one also includes the format
dddd-dddd-dddd-dddd.
Pattern Match Function: For any given reference
string R and match window R[s,e], a pattern match
function F extracts a target string from the window R[s,e],
based on well-defined matching rules, if one exists;
otherwise it returns NIL. The function is denoted as
F(R,s,e). The match mechanism is defined inside F itself.
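One possible realization of F for the regular expression case is sketched below. It assumes Python's re module as the matching mechanism and uses 0-based, inclusive indices instead of the paper's 1-based ones; the factory name make_regex_match_function is hypothetical.

```python
import re
from typing import Callable, Optional, Tuple

Match = Tuple[int, int]  # (start, end), 0-based and inclusive
MatchFunction = Callable[[str, int, int], Optional[Match]]


def make_regex_match_function(regex: str) -> MatchFunction:
    """Build a pattern match function F(R, s, e) backed by a regex."""
    compiled = re.compile(regex)

    def F(R: str, s: int, e: int) -> Optional[Match]:
        # Search only inside the window R[s..e]; endpos is exclusive.
        found = compiled.search(R, s, e + 1)
        if found is None:
            return None  # plays the role of NIL
        return found.start(), found.end() - 1

    return F
```

For instance, with F = make_regex_match_function(r"\d{3}"), the call F("ab123cd", 0, 6) yields the span (2, 4), while F("ab123cd", 0, 3) yields None because the window "ab12" is too short to hold three digits.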
Abstract Problem of String Match: The string match
problem is to retrieve all target substrings from a given
reference string R[1,…,N] with a pattern match function
F(R,s,e), where F defines what the
target substrings should be under the following conditions:
• All target substrings consist of characters from a
small alphabet S.
• The length of each target substring falls in the
interval [m,M] where m is the minimum length and
M the maximum.
Both single pattern string search and regular expression
pattern search are special cases of this abstract match
problem.
Yet another example is the problem of regular
expression pattern match with checksum validation, which
requires that every target substring be validated by a
checksum. This example is useful in data discovery
systems for minimizing false positives.
III. OPTIMIZED STRING MATCH ALGORITHM
A naïve algorithm to solve the abstract problem of string
match can be easily constructed. It is based on the
mechanism of sliding match windows.
Naïve String Match Algorithm: Start from the first
match window R[1,M] and call the match function F. If a match
exists, save the target substring and move to the
match window that starts immediately after it;
otherwise, slide the match window one step further. Repeat
this until the reference string R is exhausted.
With the naïve string match, one goes through N-M+1
match windows if there is no target string at all. That is
not efficient.
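The naïve procedure can be sketched as below. Two assumptions are made that the prose leaves open: indices are 0-based and inclusive, and windows near the end of R are truncated rather than dropped so that short targets in the tail are still found. The example function three_digits is hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

Match = Tuple[int, int]  # (start, end), 0-based and inclusive


def naive_match(R: str, m: int, M: int,
                F: Callable[[str, int, int], Optional[Match]]) -> List[Match]:
    """Slide a window of at most M characters over R, one step at a time."""
    matches: List[Match] = []
    N = len(R)
    s = 0
    while N - s >= m:                  # stop when no target can still fit
        e = min(s + M - 1, N - 1)      # truncate the window at the end of R
        hit = F(R, s, e)
        if hit is None:
            s += 1                     # no target: slide one step
        else:
            matches.append(hit)
            s = hit[1] + 1             # resume right after the target
    return matches


def three_digits(R: str, s: int, e: int) -> Optional[Match]:
    """Example F: first run of three digits inside the window R[s..e]."""
    found = re.compile(r"\d{3}").search(R, s, e + 1)
    return None if found is None else (found.start(), found.end() - 1)
```

Every window position costs one call to F here, which is the expense the optimized algorithm tries to avoid.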
We can reduce the number of match windows if we
are able to determine quickly that a match window does not
contain a target string at all. That can be done with
character identity verification. Let us construct the optimized
algorithm as follows.
Optimized String Match Algorithm:
Input: Minimum length m, maximum length M, target
string alphabet S, pattern match function F, reference
string R[1,…,N]
Matching Procedure:
Step 1: set s=1
Step 2: Let r= MIN(s+M-1, N)
Step 3: If r-s<m-1, RETURN
Step 4: Set match window as W=R[s,…,r]
Step 5: Set sub-window w=R[s,…,s+m-1]. Scanning w
from right to left, find the rightmost character R[s+p]
(0 ≤ p ≤ m-1) that does not belong to S. If there is one,
set s = s+p+1 and go to step 2
Step 6: Otherwise, all characters of sub-window w pass
identity verification. Match with the function
F(R,s,r):
a. If the result is NIL, let s=s+1
b. If a target substring is matched as R[t,e], save
it, let s=e+1
Step 7: Go to step 2
Output: Matches
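The matching procedure can be sketched in Python as follows. This is one reading of the steps, not a reference implementation: indices are 0-based and inclusive, and after a failed identity check the window restarts just after the offending character (s + p + 1 for a 0-based offset p), which guarantees progress even when p is 0. The helper ssn_F and its regular expression are hypothetical example inputs.

```python
import re
from typing import Callable, FrozenSet, List, Optional, Tuple

Match = Tuple[int, int]  # (start, end), 0-based and inclusive


def optimized_match(R: str, m: int, M: int, S: FrozenSet[str],
                    F: Callable[[str, int, int], Optional[Match]]) -> List[Match]:
    matches: List[Match] = []
    N = len(R)
    s = 0                                        # Step 1
    while True:
        r = min(s + M - 1, N - 1)                # Step 2
        if r - s < m - 1:                        # Step 3: fewer than m left
            return matches
        # Step 5: scan the first m characters from right to left and find
        # the rightmost one that fails identity verification.
        p = next((k for k in range(m - 1, -1, -1) if R[s + k] not in S), None)
        if p is not None:
            s += p + 1                           # jump past the bad character
            continue
        hit = F(R, s, r)                         # Step 6
        if hit is None:
            s += 1                               # Step 6a
        else:
            matches.append(hit)                  # Step 6b
            s = hit[1] + 1


SSN_RE = re.compile(r"\d{9}|\d{3}-\d{2}-\d{4}")


def ssn_F(R: str, s: int, e: int) -> Optional[Match]:
    """Example F: regular expression search restricted to R[s..e]."""
    found = SSN_RE.search(R, s, e + 1)
    return None if found is None else (found.start(), found.end() - 1)
```

The jump is safe because any target starting at or before the offending character would have to contain it, given that every target has at least m characters.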
IV. ANALYSIS OF THE ALGORITHM
The algorithm starts with the first match window defined
by step 1. The key step for optimization is step 5, which
performs identity verification on the characters in the
sub-window w. The verification is done character by character,
starting from the rightmost character of the sub-window. When a
character fails the verification, we slide the match window ahead
by multiple steps instead of one step. This step is somewhat
like Raita's [3] multiple point checking. It may cost
more time when the target substring does exist in the
window. In most cases, however, it reduces the number of
match windows by shifting multiple steps. The best case
is a shift of m steps, which happens whenever the rightmost
character of w does not belong to S. Step 6 does the pattern
match. If the match fails, unlike in the Boyer-Moore or Horspool
algorithms, there is no shift table that advises shifting more
than one step.
The optimized algorithm is not designed to outperform the
Boyer-Moore algorithm or its variants for single pattern
string match. Instead, its purpose is to extend the
bad character shift rule to a more general case. This extension
has immediate applications in two special pattern match
problems:
• Regular expression pattern match.
• Regular expression pattern match with checksum
validation.
Example 1: One needs to search for all social security
numbers (SSN) in a text with the regular expression
pattern defined as \d{9}|\d{3}-\d{2}-\d{4}. The alphabet
S={0,1,2,3,4,5,6,7,8,9,-} has 11 characters. The minimum
and maximum lengths for an SSN are 9 and 11 respectively. In the
best case, we do not need to apply the regular expression
pattern match at all if the text contains no digits
or hyphens.
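The inputs of Example 1 can be written down directly. The sketch below assumes the usual ddd-dd-dddd layout for the hyphenated form, which is what gives the stated maximum length of 11; the constant and function names are hypothetical.

```python
import re

# Inputs of the abstract match problem for SSNs.
SSN_PATTERN = re.compile(r"\d{9}|\d{3}-\d{2}-\d{4}")
SSN_ALPHABET = frozenset("0123456789-")
SSN_MIN_LEN, SSN_MAX_LEN = 9, 11


def is_ssn_shaped(candidate: str) -> bool:
    """True if the whole candidate matches one of the two SSN layouts."""
    return SSN_PATTERN.fullmatch(candidate) is not None


def has_alphabet_char(text: str) -> bool:
    """Best-case test: with no digit or hyphen, no regex call is needed."""
    return any(ch in SSN_ALPHABET for ch in text)
```

Both layouts stay within the declared length interval: "123456789" has 9 characters and "123-45-6789" has 11.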
Example 2: One needs to search for MasterCard or Visa credit
card numbers (CCN) in a text with the regular
expression pattern defined as
\d{16}|\d{4}-\d{4}-\d{4}-\d{4}. The alphabet
S={0,1,2,3,4,5,6,7,8,9,-} has 11
characters. The minimum and maximum lengths for a CCN
are 16 and 19 respectively. The checksum applies the Luhn
algorithm [5] to validate the CCN.
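A pattern match function for Example 2 might combine the regular expression with a Luhn check, as in the sketch below. The names luhn_valid and ccn_match are hypothetical, indices are 0-based and inclusive, and retrying from the next position after a checksum failure is an assumption about how F should behave.

```python
import re
from typing import Optional, Tuple

CCN_PATTERN = re.compile(r"\d{16}|\d{4}-\d{4}-\d{4}-\d{4}")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counted from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9        # same as summing the two digits of d
        total += d
    return total % 10 == 0


def ccn_match(R: str, s: int, e: int) -> Optional[Tuple[int, int]]:
    """F(R, s, e): first regex hit in R[s..e] that also passes Luhn."""
    pos = s
    while True:
        found = CCN_PATTERN.search(R, pos, e + 1)
        if found is None:
            return None
        if luhn_valid(found.group().replace("-", "")):
            return found.start(), found.end() - 1
        pos = found.start() + 1   # checksum failed: retry one step later
```

Because the checksum lives inside F, the sliding-window procedure itself does not change; only the match function becomes stricter.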
V. PROBLEM OF MATCHING SEQUENCE OF OBJECTS
This paper has focused on the problem of string
search. Because we have used general
terms to discuss the problem and the solution, the abstract
problem of string match can be extended to a more general
problem: sequence match, where a sequence is
a sequence of objects and a subsequence is
a consecutive run of those objects. We can achieve this
by extending two basic concepts, character and string:
use object instead of character and sequence instead of
string. The pattern match function, the abstract problem of
sequence match and the optimized algorithm can then be introduced
accordingly. It is not yet clear whether this further
abstraction has any practical implication.
However, it deserves a theoretical perspective.
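To make the generalization concrete, the optimized procedure can be restated over arbitrary sequences, with a membership predicate standing in for the alphabet S. This is a speculative sketch; match_sequence, is_member and the example function even_run are hypothetical names, and indices are 0-based and inclusive.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Match = Tuple[int, int]  # (start, end), 0-based and inclusive


def match_sequence(seq: Sequence[T], m: int, M: int,
                   is_member: Callable[[T], bool],
                   F: Callable[[Sequence[T], int, int], Optional[Match]]
                   ) -> List[Match]:
    """Optimized matching over a sequence of objects."""
    matches: List[Match] = []
    n = len(seq)
    s = 0
    while n - s >= m:
        r = min(s + M - 1, n - 1)
        # Identity verification on the first m objects, right to left.
        bad = next((k for k in range(m - 1, -1, -1)
                    if not is_member(seq[s + k])), None)
        if bad is not None:
            s += bad + 1          # skip past the non-member object
            continue
        hit = F(seq, s, r)
        if hit is None:
            s += 1
        else:
            matches.append(hit)
            s = hit[1] + 1
    return matches


def even_run(seq: Sequence[int], s: int, e: int) -> Optional[Match]:
    """Example F: run of at least two even integers starting at s."""
    end = s
    while end <= e and seq[end] % 2 == 0:
        end += 1
    return (s, end - 1) if end - s >= 2 else None
```

Here the objects are integers and the "alphabet" is the set of even numbers, expressed as a predicate because it cannot be enumerated.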
VI. CONCLUSION
We presented a general problem of string match and its
optimized algorithm inspired by the bad character shift rule
of Boyer-Moore string search algorithm. The abstract
nature of the problem allows us to include both single
pattern string search and regular expression pattern match as
its two special cases.
While the optimized algorithm discussed is not better
than Boyer-Moore type string search algorithms for single
pattern search, it can be
used for match optimization in other pattern problems such as
regular expression pattern match or regular
expression pattern match with checksum validation. One
can even use it for pattern match problems
beyond strings of characters, such as sequences of
objects, where the concept of object can be very general.
ACKNOWLEDGMENT
Special thanks to Joe Lin, the engineering site director at
Trend Micro, for his support. Without his sponsorship, this
research work would not have been possible.
REFERENCES
[1] R. Boyer, J. Moore, "A fast string searching algorithm",
Comm. ACM, vol. 20, pp. 762–772, 1977.
[2] R. Horspool, "Practical fast searching in strings", Software -
Practice & Experience, vol. 10(6), pp. 501–506, 1980.
[3] T. Raita, "Tuning the Boyer–Moore–Horspool String
Searching Algorithm", Software - Practice & Experience,
vol. 22(10), pp. 879–884, 1992.
[4] D. Sunday, "A Very Fast Substring Search Algorithm", Comm.
ACM, vol. 33(8), pp. 132–142, 1990.
[5] http://en.wikipedia.org/wiki/Luhn_algorithm

Corporate and higher education May webinar.pptxRustici Software
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Extending Boyer-Moore Algorithm to an Abstract String Matching Problem

Liwei Ren
Data Center Research
Trend Micro
Cupertino, USA
e-mail: liwei_ren@trendmicro.com

Abstract: The bad character shift rule of the Boyer-Moore string search algorithm is studied in this paper for the purpose of extending it to more general string match problems. An abstract string match problem is defined in general terms. An optimized string match algorithm based on the bad character heuristic is proposed to solve the abstract match problem efficiently.

Keywords: pattern; string; sequence; search; match; bad character; Boyer-Moore

I. INTRODUCTION

String searching is a classic problem in many text processing applications. Among the many string searching algorithms, the Boyer-Moore algorithm [1] is a particularly efficient one for single pattern string match. It uses both the good suffix shift and the bad character heuristic to accelerate the match. Two shift tables are built to determine how far to shift after a match fails, and the algorithm shifts the pattern by the larger of the two values.

The Horspool algorithm [2] is the best known variant of the Boyer-Moore algorithm. It uses only the bad character heuristic to build the shift table. There are other variants as well, such as the algorithms given by Raita [3] and Sunday [4].

In summary, the essence of all Boyer-Moore style algorithms is to skip as many unnecessary character comparisons as possible.

If we introduce the concept of a match window as a substring of the reference string, the naïve string searching algorithm is basically a sliding window match algorithm with N-M+1 match windows, where N and M are the sizes of the reference string and the pattern respectively. In practice, the Boyer-Moore algorithm examines only a few candidate match windows that possibly contain the target string. This is done by ruling out many windows that definitely contain no target substring.
The bad character shift of the Boyer-Moore algorithm can take a weaker form, which we call character identity verification: it verifies whether a given character in the reference string belongs to the alphabet of the search pattern. We can extend the concepts of both match window and character identity verification to other string match problems, for instance the regular expression based pattern match problem, which has many applications in practice.

This paper proposes an abstract string match problem that includes the two classic string matching problems, i.e., single pattern string search and regular expression pattern match, as special cases. An efficient algorithm based on the concepts of match window and character identity verification is constructed to solve the abstract problem.

II. A GENERAL PROBLEM OF STRING MATCH

In this section, we use an abstract model to present string match problems in more general terms. With this model, many practical problems can be covered beyond the scope of both single pattern string searching and regular expression based string matching.

Before we define the problem, let us observe the following about classic string match problems:

1. The target string has a small alphabet S compared with the whole character space. In the case of single pattern string search, S consists of all unique characters of the pattern string. In the case of regular expression match, most entities defined by regular expression patterns in practical applications typically have small alphabets as well. Examples of these entities include IP addresses, dates, credit card numbers, bank account numbers, ID numbers, etc.

2. The target strings have well-defined minimum and maximum lengths. This is obvious for the single pattern search problem. For regular expression match, it is not uncommon that these two numbers can be pre-defined. For example, to match a MasterCard credit card number in a text, the minimum length is 16, while the maximum length can be defined as 19 if one also includes the format dddd-dddd-dddd-dddd.
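The two observations can be made concrete with a small sketch. The helper names below (`single_pattern_parameters`, `belongs_to_alphabet`) and the sample pattern are hypothetical and chosen only for illustration; the credit card figures are the ones quoted above.

```python
def single_pattern_parameters(pattern):
    """Alphabet S and length bounds (m, M) for single pattern search.

    For a fixed pattern, S is the set of its distinct characters and
    the minimum and maximum target lengths coincide.
    """
    return frozenset(pattern), len(pattern), len(pattern)


def belongs_to_alphabet(ch, S):
    """Character identity verification: is ch a member of S?"""
    return ch in S


# Single pattern case (illustrative pattern).
S, m, M = single_pattern_parameters("abracadabra")
print(sorted(S), m, M)

# Credit card case: digits plus the hyphen, 16 to 19 characters.
CARD_S = frozenset("0123456789-")
CARD_m = len("d" * 16)
CARD_M = len("dddd-dddd-dddd-dddd")
print(len(CARD_S), CARD_m, CARD_M)
```

In both cases the alphabet is tiny compared with the full character space, which is what makes a membership test a cheap filter.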
Pattern Match Function: For any given reference string R and match window R[s,e], a pattern match function F extracts a target string from the window R[s,e], based on well-defined matching rules, if there is one; otherwise it returns NIL. The function is denoted as F(R,s,e). The match mechanism is defined inside F itself.

Abstract Problem of String Match: The string match problem is to retrieve all target substrings from a given reference string R[1,…,N] with the pattern match function F(R,s,e), where F defines what the target substrings should be, subject to the following conditions:

• All target substrings consist of characters from a small alphabet S.
• The length of each target substring falls in the interval [m,M], where m is the minimum length and M the maximum.

Both single pattern string search and regular expression pattern search are special cases of this abstract match problem. Yet another example is regular expression pattern match with checksum validation, which requires that every target substring also be validated by a checksum. This example is useful for data discovery systems, where it helps minimize false positives.

III. OPTIMIZED STRING MATCH ALGORITHM

A naïve algorithm to solve the abstract string match problem can be easily constructed. It is based on the mechanism of sliding match windows.

Naïve String Match Algorithm: Start from the first match window R[1,M] and call the match function F. If a match exists, save the target substring and move to the match window that starts immediately after the target substring; otherwise, slide the match window one step further. Repeat until the reference string R is exhausted.

With the naïve string match, one will go through N-M+1 match windows if there is no target string at all. That is not efficient. We can reduce the number of match windows if we are able to determine quickly that a match window does not contain a target string at all.
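The naïve procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: indices are 0-based and inclusive, the names `naive_match` and `ssn_match` are hypothetical, the match function uses an illustrative social-security-number style regular expression, and the loop stops once fewer than m characters remain (the text only says "until R is exhausted").

```python
import re

# Illustrative pattern (assumption): 9 digits, or ddd-dd-dddd.
SSN = re.compile(r"\d{3}-\d{2}-\d{4}|\d{9}")


def ssn_match(R, s, e):
    """Pattern match function F(R, s, e) on the window R[s..e].

    Returns the inclusive (start, end) indices of a target substring
    found inside the window, or None (the paper's NIL).
    """
    hit = SSN.search(R, s, e + 1)
    return None if hit is None else (hit.start(), hit.end() - 1)


def naive_match(R, m, M, F):
    """Slide a window of at most M characters over R, one step at a time.

    Returns the matches and the number of calls made to F.
    """
    matches, calls = [], 0
    s, N = 0, len(R)
    while N - s >= m:                  # assumption: stop when < m chars remain
        e = min(s + M - 1, N - 1)      # right end of the match window
        calls += 1
        hit = F(R, s, e)
        if hit is None:
            s += 1                     # no target: slide one step
        else:
            t, end = hit
            matches.append(R[t:end + 1])
            s = end + 1                # continue right after the target
    return matches, calls


found, calls = naive_match("id 123-45-6789 and 987654321.", 9, 11, ssn_match)
print(found, calls)
```

The call counter makes the cost visible: every window position that does not follow a match triggers one call to F, even when the window obviously contains no digits.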
Windows that contain no target string can be ruled out quickly with character identity verification. Let us construct the optimized algorithm as follows.

Optimized String Match Algorithm:

Input: Minimum length m, maximum length M, target string alphabet S, pattern match function F, reference string R[1,…,N]

Matching Procedure:
Step 1: Set s = 1.
Step 2: Let r = MIN(s+M-1, N).
Step 3: If r-s < m-1, RETURN.
Step 4: Set the match window W = R[s,…,r].
Step 5: Set the sub-window w = R[s,…,s+m-1]. If some character of w does not belong to S, find the rightmost such character R[s+p], set s = s+p+1, and go to step 2.
Step 6: Otherwise, all characters of the sub-window w pass identity verification. Match with the function F(R,s,r):
  a. If the result is NIL, let s = s+1.
  b. If a target substring is matched as R[t,e], save it and let s = e+1.
Step 7: Go to step 2.

Output: Matches

IV. ANALYSIS OF THE ALGORITHM

The algorithm starts with the first match window, defined by step 1. The key step for optimization is step 5, which performs identity verification for the characters in the sub-window w. The verification is done character by character, starting from the rightmost character of the sub-window. When any character fails the verification, we slide the match window ahead by multiple steps instead of one, because no target substring of length at least m can contain that character. This step is somewhat like Raita's [3] multiple point checking. It may cost more time when a target substring does exist in the window; in most cases, however, it reduces the number of match windows by shifting multiple steps. The best case is a shift of m steps, which occurs when no character in w belongs to S.

Step 6 performs the pattern match. If the match fails, unlike in the Boyer-Moore or Horspool algorithms, there is no shift table that advises shifting more than one step.

The optimized algorithm is not designed to outperform the Boyer-Moore algorithm or its variants for single pattern string match. Instead, its purpose is to extend the concept of the bad character shift rule to more general cases.
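Steps 1 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's code: indices are 0-based and inclusive, the names `optimized_match` and `ssn_match` are hypothetical, and the match function uses an illustrative social-security-number style regular expression. When a character fails verification, the sketch moves the window to the position just after it, which is the reading consistent with the best-case shift of m steps.

```python
import re

# Illustrative pattern (assumption): 9 digits, or ddd-dd-dddd.
SSN = re.compile(r"\d{3}-\d{2}-\d{4}|\d{9}")


def ssn_match(R, s, e):
    """Pattern match function F(R, s, e); returns inclusive (start, end) or None."""
    hit = SSN.search(R, s, e + 1)
    return None if hit is None else (hit.start(), hit.end() - 1)


def optimized_match(R, m, M, S, F):
    """Sliding-window match with character identity verification.

    Returns the matches and the number of calls made to F.
    """
    matches, calls = [], 0
    s, N = 0, len(R)                       # Step 1 (0-based)
    while True:
        r = min(s + M - 1, N - 1)          # Step 2
        if r - s < m - 1:                  # Step 3: fewer than m chars left
            return matches, calls
        # Step 5: scan the sub-window R[s..s+m-1] from right to left
        # for the rightmost character that is not in S.
        bad = next((p for p in range(m - 1, -1, -1) if R[s + p] not in S), None)
        if bad is not None:
            s += bad + 1                   # skip past the offending character
            continue
        calls += 1                         # Step 6: run the real matcher
        hit = F(R, s, r)
        if hit is None:
            s += 1
        else:
            t, e = hit
            matches.append(R[t:e + 1])
            s = e + 1


ALPHABET = frozenset("0123456789-")
found, calls = optimized_match("id 123-45-6789 and 987654321.", 9, 11, ALPHABET, ssn_match)
print(found, calls)
```

On text with no characters from S, the matcher F is never invoked; the only work done is one membership test per m characters.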
This extension has immediate applications in two special pattern match problems:

• Regular expression pattern match.
• Regular expression pattern match with checksum validation.

Example 1: One needs to find all social security numbers (SSN) in a text with the regular expression pattern \d{9}|\d{3}-\d{2}-\d{4}. The alphabet S={0,1,2,3,4,5,6,7,8,9,-} has 11 characters. The minimum and maximum lengths for an SSN are 9 and 11 respectively. In the best case, we do not need to apply the regular expression pattern match at all, namely when the text contains no digits or hyphens.

Example 2: One needs to find MasterCard or Visa credit card numbers (CCN) in a text with the regular expression pattern \d{16}|\d{4}-\d{4}-\d{4}-\d{4}. The alphabet S={0,1,2,3,4,5,6,7,8,9,-} has 11 characters. The minimum and maximum lengths for a CCN are 16 and 19 respectively. The checksum applies the Luhn algorithm [5] to validate the CCN.

V. PROBLEM OF MATCHING SEQUENCE OF OBJECTS

This paper has focused on the problem of string search. Because we have used general terms to discuss the problem and the solution, the abstract string match problem can be extended to a more general one. This is the problem of sequence match, where a sequence is a sequence of objects and a subsequence is a run of consecutive objects. We can achieve this by extending two basic concepts, character and string: use object instead of character and sequence instead of string. The pattern match function, the abstract problem of sequence match and the optimized algorithm can then be introduced accordingly.

It is not yet clear whether this further abstraction has any practical implication. However, it deserves a theoretical perspective.

VI. CONCLUSION

We presented a general string match problem and an optimized algorithm for it, inspired by the bad character shift rule of the Boyer-Moore string search algorithm. The abstract nature of the problem allows us to include both single pattern string search and regular expression pattern match as two special cases. While the optimized algorithm is not better than Boyer-Moore type string search algorithms, it can be used for match optimization in other pattern problems, such as regular expression pattern match or regular expression pattern match with checksum validation. One can even use it for pattern match problems beyond strings of characters, such as sequences of objects, where the concept of object can be very general.

ACKNOWLEDGMENT

Special thanks to Joe Lin, the engineering site director at Trend Micro, for his support. Without his sponsorship, this research work would not have been possible.
REFERENCES

[1] R. Boyer, J. Moore, "A fast string searching algorithm", Comm. ACM, vol. 20, pp. 762-772, 1977.
[2] R. Horspool, "Practical fast searching in strings", Software - Practice & Experience, vol. 10(6), pp. 501-506, 1980.
[3] T. Raita, "Tuning the Boyer-Moore-Horspool String Searching Algorithm", Software - Practice & Experience, vol. 22(10), pp. 879-884, 1992.
[4] D. Sunday, "Very Fast Substring Search Algorithm", Comm. ACM, vol. 33, issue 8, pp. 132-142, 1990.
[5] http://en.wikipedia.org/wiki/Luhn_algorithm