SlideShare a Scribd company logo
1
Searching for Patterns | Set 1 (Naive Pattern Searching)
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all
occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
1) Input:
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
Pattern searching is an important problem in computer science. When we do search for a string in
notepad/word file or browser or database, pattern searching algorithms are used to show the search
results.
Naive Pattern Searching: Slide the pattern over text one by one and check for a match. If a match is
found, then slides by 1 again to check for subsequent matches.
2
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
What is the best case?
The best case occurs when the first character of the pattern is not present in text at all.
txt[] = "AABCCAADDEE"
pat[] = "FAA"
Run on IDE
The number of comparisons in best case is O(n).
What is the worst case?
The worst case of Naive Pattern Searching occurs in following scenarios.
1) When all characters of the text and pattern are same.
txt[] = "AAAAAAAAAAAAAAAAAA"
pat[] = "AAAAA".
Run on IDE
2) Worst case also occurs when only the last character is different.
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
Run on IDE
3
Number of comparisons in worst case is O(m*(n-m+1)). Although strings which have repeated characters
are not likely to appear in English text, they may well occur in other applications (for example, in binary
texts). The KMP matching algorithm improves the worst case to O(n). We will be covering KMP in the
next post. Also, we will be writing more posts to cover all pattern searching algorithms and data
structures.
4
Searching for Patterns | Set 2 (KMP Algorithm)
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all
occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
1) Input:
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
Pattern searching is an important problem in computer science. When we do search for a string in
notepad/word file or browser or database, pattern searching algorithms are used to show the search
results.
We have discussed Naive pattern searching algorithm in the previous program. The worst case
complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in worst case.
KMP (Knuth Morris Pratt) Pattern Searching
The Naive pattern searching algorithm doesn’t work well in cases where we see many matching
characters followed by a mismatching character. Following are some examples.
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing
more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic
idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know
some of the characters in the text (since they matched the pattern characters prior to the mismatch). We
take advantage of this information to avoid matching the characters that we know will anyway match.
KMP algorithm does some preprocessing over the pattern pat[] and constructs an auxiliary array lps[] of
size m (same as size of pattern). Here name lps indicates longest proper prefix which is also suffix..
5
For each sub-pattern pat[0…i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper
prefix which is also a suffix of the sub-pattern pat[0..i].
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Examples:
For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4]
For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]
For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
Searching Algorithm:
Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next
sliding position. Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we
know that characters pat[0..j-1] match with txt[i-j+1…i-1], and we also know that lps[j-1] characters of
pat[0…j-1] are both proper prefix and suffix which means we do not need to match these lps[j-1]
characters with txt[i-j…i-1] because we know that these characters will anyway match. See KMPSearch()
in the below code for details.
Preprocessing Algorithm:
In the preprocessing part, we calculate values in lps[]. To do that, we keep track of the length of the
longest prefix suffix value (we use len variable for this purpose) for the previous index. We initialize lps[0]
and len as 0. If pat[len] and pat[i] match, we increment len by 1 and assign the incremented value to lps[i].
If pat[i] and pat[len] do not match and len is not 0, we update len to lps[len-1]. See computeLPSArray () in
the below code for details.
6
7
8
Searching for Patterns | Set 3 (Rabin-Karp Algorithm)
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all
occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
1) Input:
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
The Naive String Matching algorithm slides the pattern one by one. After each slide, it one by one
checks characters at the current shift and if all characters match then prints the match.
Like the Naive Algorithm, Rabin-Karp algorithm also slides the pattern one by one. But unlike the Naive
algorithm, Rabin Karp algorithm matches the hash value of the pattern with the hash value of current
substring of text, and if the hash values match then only it starts matching individual characters. So Rabin
Karp algorithm needs to calculate hash values for following strings.
1) Pattern itself.
2) All the substrings of text of length m.
Since we need to efficiently calculate hash values for all the substrings of size m of text, we must have a
hash function which has following property. Hash at the next shift must be efficiently computable from the
current hash value and next character in text or we can say hash(txt[s+1 .. s+m]) must be efficiently
computable from hash(txt[s .. s+m-1]) and txt[s+m] i.e.,hash(txt[s+1 .. s+m])= rehash(txt[s+m], hash(txt[s
.. s+m-1]) and rehash must be O(1) operation.
The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a
string is numeric value of a string. For example, if all possible characters are from 1 to 10, the numeric
value of “122” will be 122. The number of possible characters is higher than 10 (256 in general) and
pattern length can be large. So the numeric values cannot be practically stored as an integer. Therefore,
the numeric value is calculated using modular arithmetic to make sure that the hash values can be stored
in an integer variable (can fit in memory words). To do rehashing, we need to take off the most significant
digit and add the new least significant digit for in hash value. Rehashing is done using the following
formula.
hash( txt[s+1 .. s+m] ) = d ( hash( txt[s .. s+m-1]) – txt[s]*h ) + txt[s + m] ) mod q
9
hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ) : Hash value at next shift (or shift s+1)
d: Number of characters in the alphabet
q: A prime number
h: d^(m-1)
10
11
Searching for Patterns | Set 4 (Finite Automata)
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all
occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
1) Input:
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
Pattern searching is an important problem in computer science. When we do search for a string in
notepad/word file or browser or database, pattern searching algorithms are used to show the search
results.
We have discussed the following algorithms in the previous posts:
Naive Algorithm
KMP Algorithm
Rabin Karp Algorithm
In this post, we will discuss Finite Automata (FA) based pattern searching algorithm. In FA based
algorithm, we preprocess the pattern and build a 2D array that represents a Finite Automata. Construction
of the FA is the main tricky part of this algorithm. Once the FA is built, the searching is simple. In search,
we simply need to start from the first state of the automata and first character of the text. At every step,
we consider next character of text, look for the next state in the built FA and move to new state. If we
reach final state, then pattern is found in text. Time complexity of the search prcess is O(n).
12
Before we discuss FA construction, let us take a look at the following FA for pattern ACACAGA.
The above diagrams represent graphical and tabular representations of pattern ACACAGA.
Number of states in FA will be M+1 where M is length of the pattern. The main thing to construct FA is to
get the next state from the current state for every possible character. Given a character x and a state k,
we can get the next state by considering the string “pat[0..k-1]x” which is basically concatenation of
pattern characters pat[0], pat[1] … pat[k-1] and the character x. The idea is to get length of the longest
prefix of the given pattern such that the prefix is also suffix of “pat[0..k-1]x”. The value of length gives us
the next state. For example, let us see how to get the next state from current state 5 and character ‘C’ in
the above diagram. We need to consider the string, “pat[0..5]C” which is “ACACAC”. The lenght of the
longest prefix of the pattern such that the prefix is suffix of “ACACAC”is 4 (“ACAC”). So the next state
(from state 5) is 4 for character ‘C’.
In the following code, computeTF() constructs the FA. The time complexity of the computeTF() is
O(m^3*NO_OF_CHARS) where m is length of the pattern and NO_OF_CHARS is size of alphabet (total
number of possible characters in pattern and text). The implementation tries all possible prefixes starting
from the longest possible that can be a suffix of “pat[0..k-1]x”. There are better implementations to
construct FA in O(m*NO_OF_CHARS) (Hint: we can use something like lps array construction in
KMP algorithm).
13
14

More Related Content

What's hot

String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive Algorithm
Adeel Rasheed
 
String matching, naive,
String matching, naive,String matching, naive,
String matching, naive,
Amit Kumar Rathi
 
2. python basic syntax
2. python   basic syntax2. python   basic syntax
2. python basic syntax
Soba Arjun
 
7-NFA to Minimized DFA.pptx
7-NFA to Minimized DFA.pptx7-NFA to Minimized DFA.pptx
7-NFA to Minimized DFA.pptx
SLekshmiNair
 
10-SLR parser practice problems-02-06-2023.pptx
10-SLR parser practice problems-02-06-2023.pptx10-SLR parser practice problems-02-06-2023.pptx
10-SLR parser practice problems-02-06-2023.pptx
venkatapranaykumarGa
 
Pandas csv
Pandas csvPandas csv
Pandas csv
Devashish Kumar
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
eSAT Journals
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
Aditya pratap Singh
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
Kritika Purohit
 
Regular Languages
Regular LanguagesRegular Languages
Regular Languages
parmeet834
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
Satya Narayana
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
Swapan Shakhari
 
Regular expressions and languages pdf
Regular expressions and languages pdfRegular expressions and languages pdf
Regular expressions and languages pdf
Dilouar Hossain
 
Theory of Computation Grammar Concepts and Problems
Theory of Computation Grammar Concepts and ProblemsTheory of Computation Grammar Concepts and Problems
Theory of Computation Grammar Concepts and Problems
Rushabh2428
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
sabiya sabiya
 
Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)
Nitin Mohan Sharma
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Ashikapokiya12345
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
Sadia Akter
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammar
meresie tesfay
 
Mealy and moore machine
Mealy and moore machineMealy and moore machine
Mealy and moore machine
Ehatsham Riaz
 

What's hot (20)

String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive Algorithm
 
String matching, naive,
String matching, naive,String matching, naive,
String matching, naive,
 
2. python basic syntax
2. python   basic syntax2. python   basic syntax
2. python basic syntax
 
7-NFA to Minimized DFA.pptx
7-NFA to Minimized DFA.pptx7-NFA to Minimized DFA.pptx
7-NFA to Minimized DFA.pptx
 
10-SLR parser practice problems-02-06-2023.pptx
10-SLR parser practice problems-02-06-2023.pptx10-SLR parser practice problems-02-06-2023.pptx
10-SLR parser practice problems-02-06-2023.pptx
 
Pandas csv
Pandas csvPandas csv
Pandas csv
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
 
Regular Languages
Regular LanguagesRegular Languages
Regular Languages
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
 
Regular expressions and languages pdf
Regular expressions and languages pdfRegular expressions and languages pdf
Regular expressions and languages pdf
 
Theory of Computation Grammar Concepts and Problems
Theory of Computation Grammar Concepts and ProblemsTheory of Computation Grammar Concepts and Problems
Theory of Computation Grammar Concepts and Problems
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
 
Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammar
 
Mealy and moore machine
Mealy and moore machineMealy and moore machine
Mealy and moore machine
 

Viewers also liked

Jose E Rivera Resume loc16
Jose E Rivera Resume loc16Jose E Rivera Resume loc16
Jose E Rivera Resume loc16
Jose Rivera
 
Attraction Marketing
Attraction MarketingAttraction Marketing
Attraction Marketing
evelyn vereen
 
1.1 Моделювання
1.1 Моделювання1.1 Моделювання
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
Diritto & Information and Communication Technology
 
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
CurtisMediaStudies
 
Evaulation 3
Evaulation 3Evaulation 3
Evaulation 3
CurtisMediaStudies
 
Mitos bzd
Mitos bzdMitos bzd
Mitos bzd
PaezJoseM
 
Hixson office2013
Hixson office2013Hixson office2013
Hixson office2013
jhudson01
 
Teories ètiques 1r de Batxillerat
Teories ètiques 1r de BatxilleratTeories ètiques 1r de Batxillerat
Teories ètiques 1r de Batxillerat
Marta Balsells Príncep
 
FRS urinary System
FRS urinary SystemFRS urinary System
FRS urinary System
RMLIMS
 
Pitching deck
Pitching deckPitching deck
Pitching deck
Alok Dubey
 
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
Ilana Kovach
 

Viewers also liked (18)

Jose E Rivera Resume loc16
Jose E Rivera Resume loc16Jose E Rivera Resume loc16
Jose E Rivera Resume loc16
 
100330
100330100330
100330
 
Attraction Marketing
Attraction MarketingAttraction Marketing
Attraction Marketing
 
1.1 Моделювання
1.1 Моделювання1.1 Моделювання
1.1 Моделювання
 
1
11
1
 
Florida casino parties by dan mar productions 17
Florida casino parties by dan mar productions 17Florida casino parties by dan mar productions 17
Florida casino parties by dan mar productions 17
 
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
La stampa 3D a confronto con la proprietà intellettuale: tutela giuridica, li...
 
100410
100410100410
100410
 
100400
100400100400
100400
 
100324
100324100324
100324
 
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
D:\College System Files\Media\Magazines\Coursework\Powerpoints\Q6
 
Evaulation 3
Evaulation 3Evaulation 3
Evaulation 3
 
Mitos bzd
Mitos bzdMitos bzd
Mitos bzd
 
Hixson office2013
Hixson office2013Hixson office2013
Hixson office2013
 
Teories ètiques 1r de Batxillerat
Teories ètiques 1r de BatxilleratTeories ètiques 1r de Batxillerat
Teories ètiques 1r de Batxillerat
 
FRS urinary System
FRS urinary SystemFRS urinary System
FRS urinary System
 
Pitching deck
Pitching deckPitching deck
Pitching deck
 
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
Microbiology Practical 2!!!! i will miss this class! (Ilana Kovach)
 

Similar to Pattern matching programs

String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Dr Shashikant Athawale
 
Rabin-Karp (2).ppt
Rabin-Karp (2).pptRabin-Karp (2).ppt
Rabin-Karp (2).ppt
UmeshThoriya
 
Matlab: Procedures And Functions
Matlab: Procedures And FunctionsMatlab: Procedures And Functions
Matlab: Procedures And Functions
matlab Content
 
Procedures And Functions in Matlab
Procedures And Functions in MatlabProcedures And Functions in Matlab
Procedures And Functions in Matlab
DataminingTools Inc
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
NETAJI SUBHASH ENGINEERING COLLEGE , KOLKATA
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
IRJET Journal
 
String Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithmString Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithm
praweenkumarsahu9
 
Data Representation of Strings
Data Representation of StringsData Representation of Strings
Data Representation of Strings
Prof Ansari
 
4 report format
4 report format4 report format
4 report format
Ashikapokiya12345
 
4 report format
4 report format4 report format
4 report format
Ashikapokiya12345
 
Python Strings Methods
Python Strings MethodsPython Strings Methods
Python Strings Methods
Mr Examples
 
16 Java Regex
16 Java Regex16 Java Regex
16 Java Regex
wayn
 
Gp 27[string matching].pptx
Gp 27[string matching].pptxGp 27[string matching].pptx
Gp 27[string matching].pptx
SumitYadav641839
 
Adv. python regular expression by Rj
Adv. python regular expression by RjAdv. python regular expression by Rj
Adv. python regular expression by Rj
Shree M.L.Kakadiya MCA mahila college, Amreli
 
Modified Rabin Karp
Modified Rabin KarpModified Rabin Karp
Modified Rabin Karp
Garima Singh
 
String
StringString
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
Harshit Agarwal
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Mahdi Esmailoghli
 
Advance algorithms in master of technology
Advance algorithms in master of technologyAdvance algorithms in master of technology
Advance algorithms in master of technology
ManjunathaOk
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
CSCJournals
 

Similar to Pattern matching programs (20)

String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Rabin-Karp (2).ppt
Rabin-Karp (2).pptRabin-Karp (2).ppt
Rabin-Karp (2).ppt
 
Matlab: Procedures And Functions
Matlab: Procedures And FunctionsMatlab: Procedures And Functions
Matlab: Procedures And Functions
 
Procedures And Functions in Matlab
Procedures And Functions in MatlabProcedures And Functions in Matlab
Procedures And Functions in Matlab
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment ProblemAlgorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
Algorithm of Dynamic Programming for Paper-Reviewer Assignment Problem
 
String Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithmString Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithm
 
Data Representation of Strings
Data Representation of StringsData Representation of Strings
Data Representation of Strings
 
4 report format
4 report format4 report format
4 report format
 
4 report format
4 report format4 report format
4 report format
 
Python Strings Methods
Python Strings MethodsPython Strings Methods
Python Strings Methods
 
16 Java Regex
16 Java Regex16 Java Regex
16 Java Regex
 
Gp 27[string matching].pptx
Gp 27[string matching].pptxGp 27[string matching].pptx
Gp 27[string matching].pptx
 
Adv. python regular expression by Rj
Adv. python regular expression by RjAdv. python regular expression by Rj
Adv. python regular expression by Rj
 
Modified Rabin Karp
Modified Rabin KarpModified Rabin Karp
Modified Rabin Karp
 
String
StringString
String
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Advance algorithms in master of technology
Advance algorithms in master of technologyAdvance algorithms in master of technology
Advance algorithms in master of technology
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 

More from akruthi k

Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introduction
akruthi k
 
Kmp
KmpKmp
Boyer moore
Boyer mooreBoyer moore
Boyer moore
akruthi k
 
Physical layer overview
Physical layer overviewPhysical layer overview
Physical layer overview
akruthi k
 
Fhss
FhssFhss
Fhss
akruthi k
 
Dsss phy
Dsss phyDsss phy
Dsss phy
akruthi k
 
802.11 mgt-opern
802.11 mgt-opern802.11 mgt-opern
802.11 mgt-opern
akruthi k
 
802.11i
802.11i802.11i
802.11i
akruthi k
 
802.1x
802.1x802.1x
802.1x
akruthi k
 
Wired equivalent privacy (wep)
Wired equivalent privacy (wep)Wired equivalent privacy (wep)
Wired equivalent privacy (wep)
akruthi k
 

More from akruthi k (10)

Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introduction
 
Kmp
KmpKmp
Kmp
 
Boyer moore
Boyer mooreBoyer moore
Boyer moore
 
Physical layer overview
Physical layer overviewPhysical layer overview
Physical layer overview
 
Fhss
FhssFhss
Fhss
 
Dsss phy
Dsss phyDsss phy
Dsss phy
 
802.11 mgt-opern
802.11 mgt-opern802.11 mgt-opern
802.11 mgt-opern
 
802.11i
802.11i802.11i
802.11i
 
802.1x
802.1x802.1x
802.1x
 
Wired equivalent privacy (wep)
Wired equivalent privacy (wep)Wired equivalent privacy (wep)
Wired equivalent privacy (wep)
 

Recently uploaded

一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
OKORIE1
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
Pallavi Sharma
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
snaprevwdev
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Massimo Talia
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
ijseajournal
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
q30122000
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Digital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes completeDigital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes complete
shubhamsaraswat8740
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
felixwold
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
FULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back EndFULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back End
PreethaV16
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
Dwarkadas J Sanghvi College of Engineering
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 

Recently uploaded (20)

一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Digital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes completeDigital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes complete
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
FULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back EndFULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back End
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 

Pattern matching programs

  • 1. 1 Searching for Patterns | Set 1 (Naive Pattern Searching) Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m. Examples: 1) Input: txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" Output: Pattern found at index 10 2) Input: txt[] = "AABAACAADAABAAABAA" pat[] = "AABA" Output: Pattern found at index 0 Pattern found at index 9 Pattern found at index 13 Pattern searching is an important problem in computer science. When we do search for a string in notepad/word file or browser or database, pattern searching algorithms are used to show the search results. Naive Pattern Searching: Slide the pattern over text one by one and check for a match. If a match is found, then slides by 1 again to check for subsequent matches.
  • 2. 2 Output: Pattern found at index 0 Pattern found at index 9 Pattern found at index 13 What is the best case? The best case occurs when the first character of the pattern is not present in text at all. txt[] = "AABCCAADDEE" pat[] = "FAA" Run on IDE The number of comparisons in best case is O(n). What is the worst case? The worst case of Naive Pattern Searching occurs in following scenarios. 1) When all characters of the text and pattern are same. txt[] = "AAAAAAAAAAAAAAAAAA" pat[] = "AAAAA". Run on IDE 2) Worst case also occurs when only the last character is different. txt[] = "AAAAAAAAAAAAAAAAAB" pat[] = "AAAAB" Run on IDE
  • 3. 3 Number of comparisons in worst case is O(m*(n-m+1)). Although strings which have repeated characters are not likely to appear in English text, they may well occur in other applications (for example, in binary texts). The KMP matching algorithm improves the worst case to O(n). We will be covering KMP in the next post. Also, we will be writing more posts to cover all pattern searching algorithms and data structures.
  • 4. 4 Searching for Patterns | Set 2 (KMP Algorithm) Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m. Examples: 1) Input: txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" Output: Pattern found at index 10 2) Input: txt[] = "AABAACAADAABAAABAA" pat[] = "AABA" Output: Pattern found at index 0 Pattern found at index 9 Pattern found at index 13 Pattern searching is an important problem in computer science. When we do search for a string in notepad/word file or browser or database, pattern searching algorithms are used to show the search results. We have discussed Naive pattern searching algorithm in the previous program. The worst case complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in worst case. KMP (Knuth Morris Pratt) Pattern Searching The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters followed by a mismatching character. Following are some examples. txt[] = "AAAAAAAAAAAAAAAAAB" pat[] = "AAAAB" txt[] = "ABABABCABABABCABABABC" pat[] = "ABABAC" (not a worst case, but a bad case for Naive) The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text (since they matched the pattern characters prior to the mismatch). We take advantage of this information to avoid matching the characters that we know will anyway match. KMP algorithm does some preprocessing over the pattern pat[] and constructs an auxiliary array lps[] of size m (same as size of pattern). Here name lps indicates longest proper prefix which is also suffix..
  • 5. 5 For each sub-pattern pat[0…i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i]. lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]. Examples: For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5] For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0] For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4] For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3] For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4] Searching Algorithm: Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next sliding position. Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match with txt[i-j+1…i-1], and we also know that lps[j-1] characters of pat[0…j-1] are both proper prefix and suffix which means we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. See KMPSearch() in the below code for details. Preprocessing Algorithm: In the preprocessing part, we calculate values in lps[]. To do that, we keep track of the length of the longest prefix suffix value (we use len variable for this purpose) for the previous index. We initialize lps[0] and len as 0. If pat[len] and pat[i] match, we increment len by 1 and assign the incremented value to lps[i]. If pat[i] and pat[len] do not match and len is not 0, we update len to lps[len-1]. See computeLPSArray () in the below code for details.
  • 6. 6
  • 7. 7
  • 8. 8 Searching for Patterns | Set 3 (Rabin-Karp Algorithm) Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m. Examples: 1) Input: txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" Output: Pattern found at index 10 2) Input: txt[] = "AABAACAADAABAAABAA" pat[] = "AABA" Output: Pattern found at index 0 Pattern found at index 9 Pattern found at index 13 The Naive String Matching algorithm slides the pattern one by one. After each slide, it one by one checks characters at the current shift and if all characters match then prints the match. Like the Naive Algorithm, Rabin-Karp algorithm also slides the pattern one by one. But unlike the Naive algorithm, Rabin Karp algorithm matches the hash value of the pattern with the hash value of current substring of text, and if the hash values match then only it starts matching individual characters. So Rabin Karp algorithm needs to calculate hash values for following strings. 1) Pattern itself. 2) All the substrings of text of length m. Since we need to efficiently calculate hash values for all the substrings of size m of text, we must have a hash function which has following property. Hash at the next shift must be efficiently computable from the current hash value and next character in text or we can say hash(txt[s+1 .. s+m]) must be efficiently computable from hash(txt[s .. s+m-1]) and txt[s+m] i.e.,hash(txt[s+1 .. s+m])= rehash(txt[s+m], hash(txt[s .. s+m-1]) and rehash must be O(1) operation. The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is numeric value of a string. For example, if all possible characters are from 1 to 10, the numeric value of “122” will be 122. The number of possible characters is higher than 10 (256 in general) and pattern length can be large. So the numeric values cannot be practically stored as an integer. Therefore, the numeric value is calculated using modular arithmetic to make sure that the hash values can be stored in an integer variable (can fit in memory words). To do rehashing, we need to take off the most significant digit and add the new least significant digit for in hash value. Rehashing is done using the following formula. hash( txt[s+1 .. s+m] ) = d ( hash( txt[s .. s+m-1]) – txt[s]*h ) + txt[s + m] ) mod q
  • 9. 9 hash( txt[s .. s+m-1] ) : Hash value at shift s. hash( txt[s+1 .. s+m] ) : Hash value at next shift (or shift s+1) d: Number of characters in the alphabet q: A prime number h: d^(m-1)
  • 10. 10
  • 11. 11 Searching for Patterns | Set 4 (Finite Automata) Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m. Examples: 1) Input: txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" Output: Pattern found at index 10 2) Input: txt[] = "AABAACAADAABAAABAA" pat[] = "AABA" Output: Pattern found at index 0 Pattern found at index 9 Pattern found at index 13 Pattern searching is an important problem in computer science. When we do search for a string in notepad/word file or browser or database, pattern searching algorithms are used to show the search results. We have discussed the following algorithms in the previous posts: Naive Algorithm KMP Algorithm Rabin Karp Algorithm In this post, we will discuss Finite Automata (FA) based pattern searching algorithm. In FA based algorithm, we preprocess the pattern and build a 2D array that represents a Finite Automata. Construction of the FA is the main tricky part of this algorithm. Once the FA is built, the searching is simple. In search, we simply need to start from the first state of the automata and first character of the text. At every step, we consider next character of text, look for the next state in the built FA and move to new state. If we reach final state, then pattern is found in text. Time complexity of the search prcess is O(n).
  • 12. 12 Before we discuss FA construction, let us take a look at the following FA for pattern ACACAGA. The above diagrams represent graphical and tabular representations of pattern ACACAGA. Number of states in FA will be M+1 where M is length of the pattern. The main thing to construct FA is to get the next state from the current state for every possible character. Given a character x and a state k, we can get the next state by considering the string “pat[0..k-1]x” which is basically concatenation of pattern characters pat[0], pat[1] … pat[k-1] and the character x. The idea is to get length of the longest prefix of the given pattern such that the prefix is also suffix of “pat[0..k-1]x”. The value of length gives us the next state. For example, let us see how to get the next state from current state 5 and character ‘C’ in the above diagram. We need to consider the string, “pat[0..5]C” which is “ACACAC”. The lenght of the longest prefix of the pattern such that the prefix is suffix of “ACACAC”is 4 (“ACAC”). So the next state (from state 5) is 4 for character ‘C’. In the following code, computeTF() constructs the FA. The time complexity of the computeTF() is O(m^3*NO_OF_CHARS) where m is length of the pattern and NO_OF_CHARS is size of alphabet (total number of possible characters in pattern and text). The implementation tries all possible prefixes starting from the longest possible that can be a suffix of “pat[0..k-1]x”. There are better implementations to construct FA in O(m*NO_OF_CHARS) (Hint: we can use something like lps array construction in KMP algorithm).
  • 13. 13
  • 14. 14