SlideShare a Scribd company logo
1 of 16
Case Study: Searching for Patterns 
Problem: find all occurrences of pattern 
P of length m inside the text T of 
length n. 
Þ Exact matching problem 
1
String Matching - Applications 
2 
 Text editing 
 Term rewriting 
 Lexical analysis 
 Information retrieval 
 And, bioinformatics
Given a string P (pattern) and a longer string T (text). 
Aim is to find all occurrences of pattern P in text T . 
If m is the length of P , and n is the length of T , then 
3 
Exact matching problem 
The naive method: 
T a g c a g a a g a g t a 
P Paagg Paaggagg Paaggaggag Paggaaggagg Pagaaggaggg Pgaggagaagg agaagggagg gaggag agg g 
Time complexity = O(m.n), 
Space complexity = O(m + n)
4 
Can we be more clever ? 
 When a mismatch is detected, say at position 
k in the pattern string, we have already 
successfully matched k-1 characters. 
 We try to take advantage of this to decide 
where to restart matching 
T a g c a g a a g a g t a 
P aagg Pagg Pagaagg aagggagg Paggag Paaggagg aaggaggg aggag agg g
The Knuth-Morris-Pratt Algorithm 
Observation: when a mismatch occurs, we 
may not need to restart the comparison all way 
back (from the next input position). 
What to do: 
Constructing an array h, that determines how 
many characters to shift the pattern to the right 
in case of a mismatch during the pattern-matching 
5 
process.
The key idea is that if we have 
successfully matched the prefix P[1…i- 
1] of the pattern with the substring T[j-i+ 
1,…,j-1] of the input string and P(i) ¹ 
T(j), then we do not need to reprocess 
any of the suffix T[j-i+1,…,j-1] since we 
know this portion of the text string is 
the prefix of the pattern that we have 
just matched. 
6 
KMP (2)
The failure function h 
For each position i in pattern P, define hi to be 
the length of the longest proper suffix of P[1,…,i] 
that matches a prefix of P. 
i = 11 
1 2 3 4 5 6 7 8 9 10 11 12 13 
Pattern: a b a a b a b a a b a a b 
7 
Hence, h(11) = 6. 
If there is no proper suffix of P[1,…,i] with the 
property . mentioned above, then h(i) = 0
The first mismatch in position 
k=12 of T and in pos. i+1=12 of P. 
T: a b a a b a b a a b a c a b a a b a b a a b 
P: 
P: 
a b a a b a b a a b a a b 
8 
The KMP shift rule 
a b a a b a b a a b a a b 
h11 = 6 
Shift P to the right so that P[1,…,h(i)] aligns with the suffix 
T[k- h(i),…,k-1]. 
They must match because of the definition of h. 
In other words, shift P exactly i – h(i) places to the right. 
If there is no mismatch, then shift P by m – h(m) places to 
the right.
The KMP algorithm finds all occurrences of 
P in T. 
P before shift 
missed occurrence of P 
9 
P after shift 
First mismatch in position 
k of T and in pos. i+1 of P. 
Suppose not, … 
T: 
b b
First mismatch in position 
k of T and in pos. i+1 of P. 
10 
Correctness of KMP. 
P before shift 
P after shift 
missed occurrence of P 
b 
a 
a 
b 
|a| > |b| = h(i) 
T: 
It is a contradiction. 
T(k)
Array h: 0 0 1 1 2 3 2 3 4 5 6 4 5 
What is h(i)= h(11) = ? h(11) = 6 Þ i – h(i) = 11 – 6 = 5 
11 
An Example 
1 2 3 4 5 6 7 8 9 10 11 12 13 
Pattern: a b a a b a b a a b a a b 
Next funciton: 0 1 0 2 1 0 4 0 2 1 0 7 1 
abaababaabacabaababaabaab. 
Given: 
Input string: 
a b a a b a b a a b a a b 
a b a a b a b a a b a c a b a a b a b a a b a a b 
Scenario 1: 
i+1 = 12 
k = 12 
Scenario 2: i+1 
k 
i = 6 , h(6)= 3 
a b a a b a b a a b a a b 
a b a a b a b a a b a c a b a a b a b a a b a a b
a b a a b a b a a b a a b 
a b a a b a b a a b a c a b a a b a b a a b a a b 
12 
An Example 
Scenario 3: i+1 
k 
i = 3, h(3) = 1 
Subsequently i = 2, 1, 0 
Finally, a match is found: 
i+1 
a b a a b a b a a b a a b 
a b a a b a b a a b a c a b a a b a b a a b a a b 
k
Complexity of KMP 
In the KMP algorithm, the number of character 
comparisons is at most 2n. 
After shift, no character will be no backtrack 
compared before this position 
P before shift 
Hence #comparisons £ #shifts + |T| £ 2|T| = 2n. 
13 
P after shift 
T: 
In any shift at most one comparison involves a 
character of T that was previously compared.
Computing the failure function 
 We can compute h(i+1) if we know h(1)..h(i) 
 To do this we run the KMP algorithm where the text is 
the pattern with the first character replaced with a $. 
 Suppose we have successfully matched a prefix of the 
pattern with a suffix of T[1..i]; the length of this match is 
h(i). 
 If the next character of the pattern and text match then 
h(i+1)=h(i)+1. 
 If not, then in the KMP algorithm we would shift the 
pattern; the length of the new match is h(h(i)). 
 If the next character of the pattern and text match then 
h(i+1)=h(h(i))+1, else we continue to shift the pattern. 
 Since the no. of comparisons required by KMP is length 
of pattern+text, time taken for computing the failure 
function is O(n). 
14
1 2 3 4 5 6 7 8 9 10 11 12 13 
Failure fPuantctteiornn: h: 0 0 1 1 2 3 2 3 4 5 6 
1 2 3 4 5 6 7 8 9 10 11 12 13 
Text: $ b a a b a b a a b a a b 
15 
Computing h: an example 
 h(12)= 4 = h(6)+1 = h(h(11))+1 
 h(13)= 5 = h(12)+1 
Given: 
Pattern: a b a a b a 
b 
a 
h(11)=6 
4 
Pattern a b a b 
h(6)=3 
5
preprocessing searching 
16 
KMP - Analysis 
 The KMP algorithm never needs to 
backtrack on the text string. 
Time complexity = O(m + n) 
Space complexity = O(m + n), 
where m =|P| and n=|T|.

More Related Content

What's hot (20)

Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
 
Heaps & priority queues
Heaps & priority queuesHeaps & priority queues
Heaps & priority queues
 
CLOSEST PAIR (Final)
CLOSEST PAIR (Final)CLOSEST PAIR (Final)
CLOSEST PAIR (Final)
 
Lec7
Lec7Lec7
Lec7
 
Lec3
Lec3Lec3
Lec3
 
Java presentation on insertion sort
Java presentation on insertion sortJava presentation on insertion sort
Java presentation on insertion sort
 
Binary trees
Binary treesBinary trees
Binary trees
 
Tree and graph
Tree and graphTree and graph
Tree and graph
 
Single linked list
Single linked listSingle linked list
Single linked list
 
Linked list
Linked listLinked list
Linked list
 
Data Structures - Searching & sorting
Data Structures - Searching & sortingData Structures - Searching & sorting
Data Structures - Searching & sorting
 
Lec2
Lec2Lec2
Lec2
 
Quick sort
Quick sortQuick sort
Quick sort
 
Dsc++ unit 3 notes
Dsc++ unit 3 notesDsc++ unit 3 notes
Dsc++ unit 3 notes
 
Lec12
Lec12Lec12
Lec12
 
8 python data structure-1
8 python data structure-18 python data structure-1
8 python data structure-1
 
recursion tree method.pdf
recursion tree method.pdfrecursion tree method.pdf
recursion tree method.pdf
 
Data Structures- Part7 linked lists
Data Structures- Part7 linked listsData Structures- Part7 linked lists
Data Structures- Part7 linked lists
 
Lec1
Lec1Lec1
Lec1
 
Data Structures - Lecture 9 [Stack & Queue using Linked List]
 Data Structures - Lecture 9 [Stack & Queue using Linked List] Data Structures - Lecture 9 [Stack & Queue using Linked List]
Data Structures - Lecture 9 [Stack & Queue using Linked List]
 

Viewers also liked (9)

LinkedIn
LinkedInLinkedIn
LinkedIn
 
Lec15
Lec15Lec15
Lec15
 
Lec16
Lec16Lec16
Lec16
 
Lec11
Lec11Lec11
Lec11
 
Lec14
Lec14Lec14
Lec14
 
Lec35
Lec35Lec35
Lec35
 
Lec20
Lec20Lec20
Lec20
 
Lec22
Lec22Lec22
Lec22
 
Lec23
Lec23Lec23
Lec23
 

Similar to Lec17

Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algosabiya sabiya
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatchingdbhanumahesh
 
Boyer-Moore-algorithm-Vladimir.pptx
Boyer-Moore-algorithm-Vladimir.pptxBoyer-Moore-algorithm-Vladimir.pptx
Boyer-Moore-algorithm-Vladimir.pptxssuserf56658
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)Aditya pratap Singh
 
module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfShiwani Gupta
 
Pattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatchesPattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatchesBenjamin Sach
 
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.pptKnutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.pptsaki931
 
String searching
String searching String searching
String searching thinkphp
 
Modified Rabin Karp
Modified Rabin KarpModified Rabin Karp
Modified Rabin KarpGarima Singh
 
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnPatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnRAtna29
 
Pattern Matching Part Three: Hamming Distance
Pattern Matching Part Three: Hamming DistancePattern Matching Part Three: Hamming Distance
Pattern Matching Part Three: Hamming DistanceBenjamin Sach
 
StringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdfStringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdfbhagabatijenadukura
 
Rabin Carp String Matching algorithm
Rabin Carp String Matching  algorithmRabin Carp String Matching  algorithm
Rabin Carp String Matching algorithmsabiya sabiya
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2S.Shayan Daneshvar
 
String-Matching Algorithms Advance algorithm
String-Matching  Algorithms Advance algorithmString-Matching  Algorithms Advance algorithm
String-Matching Algorithms Advance algorithmssuseraf60311
 
Pattern matching
Pattern matchingPattern matching
Pattern matchingshravs_188
 

Similar to Lec17 (20)

lec17.ppt
lec17.pptlec17.ppt
lec17.ppt
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
 
Daa chapter9
Daa chapter9Daa chapter9
Daa chapter9
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
 
Boyer-Moore-algorithm-Vladimir.pptx
Boyer-Moore-algorithm-Vladimir.pptxBoyer-Moore-algorithm-Vladimir.pptx
Boyer-Moore-algorithm-Vladimir.pptx
 
Programming Exam Help
Programming Exam Help Programming Exam Help
Programming Exam Help
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
 
module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdf
 
Pattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatchesPattern Matching Part Two: k-mismatches
Pattern Matching Part Two: k-mismatches
 
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.pptKnutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
String searching
String searching String searching
String searching
 
Modified Rabin Karp
Modified Rabin KarpModified Rabin Karp
Modified Rabin Karp
 
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnPatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 
Pattern Matching Part Three: Hamming Distance
Pattern Matching Part Three: Hamming DistancePattern Matching Part Three: Hamming Distance
Pattern Matching Part Three: Hamming Distance
 
StringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdfStringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdf
 
Rabin Carp String Matching algorithm
Rabin Carp String Matching  algorithmRabin Carp String Matching  algorithm
Rabin Carp String Matching algorithm
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2
 
String-Matching Algorithms Advance algorithm
String-Matching  Algorithms Advance algorithmString-Matching  Algorithms Advance algorithm
String-Matching Algorithms Advance algorithm
 
Pattern matching
Pattern matchingPattern matching
Pattern matching
 

More from Nikhil Chilwant (20)

The joyless economy
The joyless economyThe joyless economy
The joyless economy
 
I 7
I 7I 7
I 7
 
IIT Delhi training pioneer ct (acemicromatic )
IIT Delhi training pioneer ct (acemicromatic )IIT Delhi training pioneer ct (acemicromatic )
IIT Delhi training pioneer ct (acemicromatic )
 
Lec39
Lec39Lec39
Lec39
 
Lec38
Lec38Lec38
Lec38
 
Lec37
Lec37Lec37
Lec37
 
Lec36
Lec36Lec36
Lec36
 
Lec34
Lec34Lec34
Lec34
 
Lec33
Lec33Lec33
Lec33
 
Lec32
Lec32Lec32
Lec32
 
Lec30
Lec30Lec30
Lec30
 
Lec29
Lec29Lec29
Lec29
 
Lec28
Lec28Lec28
Lec28
 
Lec27
Lec27Lec27
Lec27
 
Lec26
Lec26Lec26
Lec26
 
Lec25
Lec25Lec25
Lec25
 
Lec24
Lec24Lec24
Lec24
 
Lec23
Lec23Lec23
Lec23
 
Lec21
Lec21Lec21
Lec21
 
Lec20
Lec20Lec20
Lec20
 

Lec17

  • 1. Case Study: Searching for Patterns Problem: find all occurrences of pattern P of length m inside the text T of length n. Þ Exact matching problem 1
  • 2. String Matching - Applications 2  Text editing  Term rewriting  Lexical analysis  Information retrieval  And, bioinformatics
  • 3. Given a string P (pattern) and a longer string T (text). Aim is to find all occurrences of pattern P in text T . If m is the length of P , and n is the length of T , then 3 Exact matching problem The naive method: T a g c a g a a g a g t a P Paagg Paaggagg Paaggaggag Paggaaggagg Pagaaggaggg Pgaggagaagg agaagggagg gaggag agg g Time complexity = O(m.n), Space complexity = O(m + n)
  • 4. 4 Can we be more clever ?  When a mismatch is detected, say at position k in the pattern string, we have already successfully matched k-1 characters.  We try to take advantage of this to decide where to restart matching T a g c a g a a g a g t a P aagg Pagg Pagaagg aagggagg Paggag Paaggagg aaggaggg aggag agg g
  • 5. The Knuth-Morris-Pratt Algorithm Observation: when a mismatch occurs, we may not need to restart the comparison all way back (from the next input position). What to do: Constructing an array h, that determines how many characters to shift the pattern to the right in case of a mismatch during the pattern-matching 5 process.
  • 6. The key idea is that if we have successfully matched the prefix P[1…i- 1] of the pattern with the substring T[j-i+ 1,…,j-1] of the input string and P(i) ¹ T(j), then we do not need to reprocess any of the suffix T[j-i+1,…,j-1] since we know this portion of the text string is the prefix of the pattern that we have just matched. 6 KMP (2)
  • 7. The failure function h For each position i in pattern P, define hi to be the length of the longest proper suffix of P[1,…,i] that matches a prefix of P. i = 11 1 2 3 4 5 6 7 8 9 10 11 12 13 Pattern: a b a a b a b a a b a a b 7 Hence, h(11) = 6. If there is no proper suffix of P[1,…,i] with the property . mentioned above, then h(i) = 0
  • 8. The first mismatch in position k=12 of T and in pos. i+1=12 of P. T: a b a a b a b a a b a c a b a a b a b a a b P: P: a b a a b a b a a b a a b 8 The KMP shift rule a b a a b a b a a b a a b h11 = 6 Shift P to the right so that P[1,…,h(i)] aligns with the suffix T[k- h(i),…,k-1]. They must match because of the definition of h. In other words, shift P exactly i – h(i) places to the right. If there is no mismatch, then shift P by m – h(m) places to the right.
  • 9. The KMP algorithm finds all occurrences of P in T. P before shift missed occurrence of P 9 P after shift First mismatch in position k of T and in pos. i+1 of P. Suppose not, … T: b b
  • 10. First mismatch in position k of T and in pos. i+1 of P. 10 Correctness of KMP. P before shift P after shift missed occurrence of P b a a b |a| > |b| = h(i) T: It is a contradiction. T(k)
  • 11. Array h: 0 0 1 1 2 3 2 3 4 5 6 4 5 What is h(i)= h(11) = ? h(11) = 6 Þ i – h(i) = 11 – 6 = 5 11 An Example 1 2 3 4 5 6 7 8 9 10 11 12 13 Pattern: a b a a b a b a a b a a b Next funciton: 0 1 0 2 1 0 4 0 2 1 0 7 1 abaababaabacabaababaabaab. Given: Input string: a b a a b a b a a b a a b a b a a b a b a a b a c a b a a b a b a a b a a b Scenario 1: i+1 = 12 k = 12 Scenario 2: i+1 k i = 6 , h(6)= 3 a b a a b a b a a b a a b a b a a b a b a a b a c a b a a b a b a a b a a b
  • 12. a b a a b a b a a b a a b a b a a b a b a a b a c a b a a b a b a a b a a b 12 An Example Scenario 3: i+1 k i = 3, h(3) = 1 Subsequently i = 2, 1, 0 Finally, a match is found: i+1 a b a a b a b a a b a a b a b a a b a b a a b a c a b a a b a b a a b a a b k
  • 13. Complexity of KMP In the KMP algorithm, the number of character comparisons is at most 2n. After shift, no character will be no backtrack compared before this position P before shift Hence #comparisons £ #shifts + |T| £ 2|T| = 2n. 13 P after shift T: In any shift at most one comparison involves a character of T that was previously compared.
  • 14. Computing the failure function  We can compute h(i+1) if we know h(1)..h(i)  To do this we run the KMP algorithm where the text is the pattern with the first character replaced with a $.  Suppose we have successfully matched a prefix of the pattern with a suffix of T[1..i]; the length of this match is h(i).  If the next character of the pattern and text match then h(i+1)=h(i)+1.  If not, then in the KMP algorithm we would shift the pattern; the length of the new match is h(h(i)).  If the next character of the pattern and text match then h(i+1)=h(h(i))+1, else we continue to shift the pattern.  Since the no. of comparisons required by KMP is length of pattern+text, time taken for computing the failure function is O(n). 14
  • 15. 1 2 3 4 5 6 7 8 9 10 11 12 13 Failure fPuantctteiornn: h: 0 0 1 1 2 3 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 11 12 13 Text: $ b a a b a b a a b a a b 15 Computing h: an example  h(12)= 4 = h(6)+1 = h(h(11))+1  h(13)= 5 = h(12)+1 Given: Pattern: a b a a b a b a h(11)=6 4 Pattern a b a b h(6)=3 5
  • 16. preprocessing searching 16 KMP - Analysis  The KMP algorithm never needs to backtrack on the text string. Time complexity = O(m + n) Space complexity = O(m + n), where m =|P| and n=|T|.

Editor's Notes

  1. <number>
  2. <number>
  3. <number>
  4. <number>