SlideShare a Scribd company logo
Knuth-Morris-Pratt Substring
Search Algorithm
Prepared By:
Sabiya Fatima
Email ID: sabiya1990fatima@gmail.com
1
Outline
 Definition
 History
 Components of KMP
 Algorithm
 Example
 Run-Time Analysis
 Complexity comparison of String Matching Algorithms
 Advantages and Disadvantages
 Real Time Applications
 References
2
What is Pattern Searching ?
 Suppose you are reading a text document.
 You want to search for a word.
 You click CTRL + F and search for that word.
 The word processor scans the document and shows the position of
occurrence.
What exactly happens is that, word i.e. pattern is searched inside the
text document.
3
Definition
 Best known for linear time for exact matching. Compares from left to right.
 Shifts more than one position.
 Preprocessing approach of Pattern to avoid trivial comparisions.
 Avoids recomputing matches.
4
History
 This algorithm was conceived by Donald Knuth and Vaughan Pratt and independently by
James H.Morris in 1977.
 Knuth, Morris and Pratt discovered first linear time string-matching algorithm by
analysis of the naive algorithm.
 It keeps the information that naive approach wasted gathered during the scan of the
text. By avoiding this waste of information, it achieves a running time of O(m + n).
 The implementation of Knuth-Morris-Pratt algorithm is efficient because it minimizes
the total number of comparisons of the pattern against the input string.
5
Naïve Approach
The naïve approach is to check whether the pattern matches the string at
every possible position in the string.
P= Pattern (word) of length m
T= Text (document) of length n
Naive string matching algorithm
takes time O((n-m+1)m) or
O(mn)
6
The KMP Algorithm - Motivation
x
j
. . a b a a b . . . . .
a b a a b a
a b a a b a
No need to
repeat these
comparisons
Resume
comparing
here
 Knuth-Morris-Pratt’s algorithm
compares the pattern to the text
in left-to-right, but shifts the
pattern more intelligently than
the brute-force algorithm.
 When a mismatch occurs, what
is the most we can shift the
pattern so as to avoid redundant
comparisons?
 Answer: the largest prefix of
P[0..j] that is a suffix of P[1..j]
7
Components of KMP algorithm
 The prefix function, Π
The prefix function,Π for a pattern encapsulates knowledge about how the pattern
matches against shifts of itself. This information can be used to avoid useless shifts of
the pattern ‘p’. In other words, this enables avoiding backtracking on the text ‘T’.
 The KMP Matcher
With text ‘T’, pattern ‘p’ and prefix function ‘Π’ as inputs, finds the occurrence of ‘p’ in
‘T’ and returns the number of shifts of ‘p’ after which occurrence is found.
8
The prefix function, Π
Following pseudocode computes the prefix function, Π:
Compute-Prefix-Function (p)
1 m  length[p] //’p’ pattern to be matched
2 Π[1]  0
3 k  0
4 for q  2 to m
5 do while k > 0 and p[k+1] != p[q]
6 do k  Π[k]
7 If p[k+1] = p[q]
8 then k  k +1
9 Π[q]  k
10 return Π
9
Example: compute Π for the pattern ‘p’ below:
a b a b a c a
Initially: m = length[p] = 7
Π[1] = 0
k = 0
Step 1: q = 2, k=0
Π[2] = 0
Step 2: q = 3, k = 0,
Π[3] = 1
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1
10
p
10
Contd…
11
Step 3: q = 4, k = 1
Π[4] = 2 q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2
Step 4: q = 5, k =2
Π[5] = 3
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3
Step 5: q = 6, k = 3
Π[6] = 0
Step 6: q = 7, k = 0
Π[7] = 1
After iterating 6 times, the prefix function
computation is complete: 
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
The running time of the prefix function is O(m).
12Contd…..
The KMP Matcher
Input: The KMP Matcher, with pattern ‘p’, text ‘T’and prefix function ‘Π’, finds a match of p in T.
Following pseudocode computes the matching component of KMP algorithm:
KMP-Matcher(T,p)
1 n  length[T]
2 m  length[p]
3 Π  Compute-Prefix-Function(p)
4 q  0 //number of characters matched
5 for i  1 to n //scan T from left to right
6 do while q > 0 and p[q+1] != T[i]
7 do q  Π[q] //next character does not match
8 if p[q+1] = T[i]
9 then q  q + 1 //next character matches
10 if q = m //is all of p matched?
11 then print “Pattern occurs with shift” i – m
12 q  Π[ q] // look for the next match
Note: KMP finds every occurrence of a ‘p’in ‘T’. That is why KMP does not terminate in step 12, rather it searches
remainder of ‘T’for any more occurrences of ‘p’.
13
Illustration: given a Text ‘T’ and pattern ‘p’ as follows:
T
b a c b a b a b a b a c a c a
p a b a b a c a
Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘T’.
For ‘p’the prefix function, Π was computed previously and is as follows:
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1 14
14
b a c b a b a b a b a c a a b
a b a b a c a
Initially: n = size of T = 15;
m = size of p = 7
Step 1: i = 1, q = 0
comparing p[1] with T[1]
T
p
P[1] does not match with T[1]. ‘p’ will be shifted one position to the right.
15
15
Contd…
Step 2: i = 2, q = 0
comparing p[1] with T[2]
T
p
a b a b a c a
P[1] matches T[2]. Since there is a match, p is not shifted.
b a c b a b a b a b a c a a b
Contd…
16
T b a c b a b a b a b a c a a b
a b a b a c ap
p[2] does not match with T[3]
Backtracking on p, comparing p[1] and T[3]
Step 3: i = 3, q = 1
Comparing p[2] with T[3]
a b a b a c a
T
p
Step 4: i = 4, q = 0 comparing p[1] with T[4] p[1] does not match with T[4]
b a c b a b a b a b a c a a b
b a c b a b a b a b a c a a b
a b a b a c a
T
p
Step 5: i = 5, q = 0
comparing p[1] with T[5] p[1] matches with T[5]
17
17
Step 6: i = 6, q = 1 Comparing p[2] with T[6] p[2] matches with T[6]
T
p
b a c b a b a b a b a c a a b
a b a b a c a
Contd…
b a c b a b a b a b a c a a b
b a c b a b a b a b a c a a b
a b a b a c a
a b a b a c a
T
p
Step 7: i = 7, q = 2 Comparing p[3] with T[7]
p[3] matches with T[7]
Step 8: i = 8, q = 3 Comparing p[4] with T[8]
p[4] matches with T[8]
T
p
18
18
Contd…
Step 9: i = 9, q = 4
Comparing p[5] with T[9]
Comparing p[6] with T[10]Step 10: i = 10, q = 5
T
p
b a c b a b a b a b a c a a b
b a c b a b a b a b a c a a b
a b a b a c a
a b a b a c a
p[6] doesn’t match with T[10]
Backtracking on p, comparing p[4] with T[10] because after mismatch q = Π[5] = 3
p[5] matches with T[9]
19
19
T
p
Contd…
20
Step 11: i = 11, q = 4
Comparing p[5] with T[11] p[5] matches with T[11]
T
p
b a c b a b a b a b a c a a b
a b a b a c a
Contd…
Step 12: i = 12, q = 5 Comparing p[6] with T[12] p[6] matches with T[12]
a b a b a c ap
b a c b a b a b a b a c a a bT
b a c b a b a b a b a c a a b
a b a b a c a
Comparing p[7] with T[13]
T
p
Step 13: i = 13, q = 6 p[7] matches with T[13]
Pattern ‘p’ has been found to completely occur in text ‘T’. The total number of shifts
that took place for the match to be found are: i – m = 13 – 7 = 6 shifts.
The running time of the KMP-Matcher function is O(n).
21
21
Contd…
Complexity
 O(m) - It is to compute the prefix function values.
 O(n) - It is to compare the pattern to the text.
 Total of O(n + m) run time.
22
Complexity comparison of String Matching
Algorithms
23
Advantage and Disadvantage
Advantages:
1.The running time of the KMP algorithm is optimal (O(m + n)), which is very fast.
2.The algorithm never needs to move backwards in the input text T. It makes the
algorithm good for processing very large files.
Disadvantages:
Doesn’t work so well as the size of the alphabets increases. By which more chances
of mismatch occurs.
24
Real time Applications
 Good for plagiarism analysis.
 search engines
 language syntax checker
 database queries
 music content retrieval
25
Real time Applications
26
 DNA sequences analysis :
• It is mainly composed of nucleotides of four types. The four bases in DNA are
Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). A DNA sequence is a
representation of a string of nucleotides contained in a strand of DNA.
• DNA sequences analysis of various diseases which are stored in database for retrieval
and comparison. This system compares similarity values with threshold value and
stores particular result which is diseased or not.
• For example: ATTCGTAACTAGTAAGTTA. The DNA sequencing techniques have
allowed the vast amount of data to be analyzed in a short span of time. So, pattern
matching techniques plays a vital role in computational biology for data analysis
related to biological data such as DNA sequences
References
 Thomas H.Cormen; Charles E.Leiserson., Introduction to algorithms second edition , “The Knuth-Morris-Pratt
Algorithm”, year = 2001.
 https://pdfs.semanticscholar.org/fe41/52465f96d09c94b46b86a3b6408dae5dbe13.pdf
 http://research.ijcaonline.org/volume115/number23/pxc3902734.pdf
27
Thank You
28

More Related Content

What's hot

String matching, naive,
String matching, naive,String matching, naive,
String matching, naive,
Amit Kumar Rathi
 
Rabin karp string matching algorithm
Rabin karp string matching algorithmRabin karp string matching algorithm
Rabin karp string matching algorithm
Gajanand Sharma
 
Boyer moore algorithm
Boyer moore algorithmBoyer moore algorithm
Boyer moore algorithm
AYESHA JAVED
 
Rabin karp string matcher
Rabin karp string matcherRabin karp string matcher
Rabin karp string matcher
Amit Kumar Rathi
 
Multi Head, Multi Tape Turing Machine
Multi Head, Multi Tape Turing MachineMulti Head, Multi Tape Turing Machine
Multi Head, Multi Tape Turing Machine
Radhakrishnan Chinnusamy
 
KMP String Matching Algorithm
KMP String Matching AlgorithmKMP String Matching Algorithm
KMP String Matching Algorithm
kalpanasatishkumar
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
Foysal Mahmud
 
Matrix chain multiplication
Matrix chain multiplicationMatrix chain multiplication
Matrix chain multiplication
Respa Peter
 
Binary Search
Binary SearchBinary Search
Binary Search
kunj desai
 
Rabin Carp String Matching algorithm
Rabin Carp String Matching  algorithmRabin Carp String Matching  algorithm
Rabin Carp String Matching algorithm
sabiya sabiya
 
P, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-HardP, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-Hard
Animesh Chaturvedi
 
String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive Algorithm
Adeel Rasheed
 
Pattern matching
Pattern matchingPattern matching
Pattern matchingshravs_188
 
Merge sort algorithm
Merge sort algorithmMerge sort algorithm
Merge sort algorithm
Shubham Dwivedi
 
NFA to DFA
NFA to DFANFA to DFA
NFA to DFA
Animesh Chaturvedi
 
Bruteforce algorithm
Bruteforce algorithmBruteforce algorithm
Bruteforce algorithm
Rezwan Siam
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
dbhanumahesh
 
Sorting Algorithms
Sorting AlgorithmsSorting Algorithms
Sorting Algorithms
Mohammed Hussein
 
Matrix chain multiplication
Matrix chain multiplicationMatrix chain multiplication
Matrix chain multiplication
Kiran K
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
Malek Sumaiya
 

What's hot (20)

String matching, naive,
String matching, naive,String matching, naive,
String matching, naive,
 
Rabin karp string matching algorithm
Rabin karp string matching algorithmRabin karp string matching algorithm
Rabin karp string matching algorithm
 
Boyer moore algorithm
Boyer moore algorithmBoyer moore algorithm
Boyer moore algorithm
 
Rabin karp string matcher
Rabin karp string matcherRabin karp string matcher
Rabin karp string matcher
 
Multi Head, Multi Tape Turing Machine
Multi Head, Multi Tape Turing MachineMulti Head, Multi Tape Turing Machine
Multi Head, Multi Tape Turing Machine
 
KMP String Matching Algorithm
KMP String Matching AlgorithmKMP String Matching Algorithm
KMP String Matching Algorithm
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
 
Matrix chain multiplication
Matrix chain multiplicationMatrix chain multiplication
Matrix chain multiplication
 
Binary Search
Binary SearchBinary Search
Binary Search
 
Rabin Carp String Matching algorithm
Rabin Carp String Matching  algorithmRabin Carp String Matching  algorithm
Rabin Carp String Matching algorithm
 
P, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-HardP, NP, NP-Complete, and NP-Hard
P, NP, NP-Complete, and NP-Hard
 
String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive Algorithm
 
Pattern matching
Pattern matchingPattern matching
Pattern matching
 
Merge sort algorithm
Merge sort algorithmMerge sort algorithm
Merge sort algorithm
 
NFA to DFA
NFA to DFANFA to DFA
NFA to DFA
 
Bruteforce algorithm
Bruteforce algorithmBruteforce algorithm
Bruteforce algorithm
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
 
Sorting Algorithms
Sorting AlgorithmsSorting Algorithms
Sorting Algorithms
 
Matrix chain multiplication
Matrix chain multiplicationMatrix chain multiplication
Matrix chain multiplication
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
 

Similar to Knuth morris pratt string matching algo

module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdf
Shiwani Gupta
 
lec17.ppt
lec17.pptlec17.ppt
lec17.ppt
shivkr15
 
W9Presentation.ppt
W9Presentation.pptW9Presentation.ppt
W9Presentation.ppt
AlinaMishra7
 
Gp 27[string matching].pptx
Gp 27[string matching].pptxGp 27[string matching].pptx
Gp 27[string matching].pptx
SumitYadav641839
 
String-Matching Algorithms Advance algorithm
String-Matching  Algorithms Advance algorithmString-Matching  Algorithms Advance algorithm
String-Matching Algorithms Advance algorithm
ssuseraf60311
 
Daa chapter9
Daa chapter9Daa chapter9
Daa chapter9
B.Kirron Reddi
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
NETAJI SUBHASH ENGINEERING COLLEGE , KOLKATA
 
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnPatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
RAtna29
 
Daa unit 5
Daa unit 5Daa unit 5
Daa unit 5
Abhimanyu Mishra
 
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.pptKnutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
saki931
 
String searching
String searching String searching
String searching
thinkphp
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
Kritika Purohit
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systems
ahmad bassiouny
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
Kritika Purohit
 
StringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdfStringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdf
bhagabatijenadukura
 
Ning_Mei.ASSIGN03
Ning_Mei.ASSIGN03Ning_Mei.ASSIGN03
Ning_Mei.ASSIGN03宁 梅
 
Algorithm Assignment Help
Algorithm Assignment HelpAlgorithm Assignment Help
Algorithm Assignment Help
Programming Homework Help
 

Similar to Knuth morris pratt string matching algo (20)

Lec17
Lec17Lec17
Lec17
 
module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdf
 
lec17.ppt
lec17.pptlec17.ppt
lec17.ppt
 
W9Presentation.ppt
W9Presentation.pptW9Presentation.ppt
W9Presentation.ppt
 
Gp 27[string matching].pptx
Gp 27[string matching].pptxGp 27[string matching].pptx
Gp 27[string matching].pptx
 
String-Matching Algorithms Advance algorithm
String-Matching  Algorithms Advance algorithmString-Matching  Algorithms Advance algorithm
String-Matching Algorithms Advance algorithm
 
Daa chapter9
Daa chapter9Daa chapter9
Daa chapter9
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnPatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PatternMatching2.pptnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 
Daa unit 5
Daa unit 5Daa unit 5
Daa unit 5
 
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.pptKnutt Morris Pratt Algorithm by Dr. Rose.ppt
Knutt Morris Pratt Algorithm by Dr. Rose.ppt
 
String searching
String searching String searching
String searching
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systems
 
Boyer more algorithm
Boyer more algorithmBoyer more algorithm
Boyer more algorithm
 
Chap09alg
Chap09algChap09alg
Chap09alg
 
Chap09alg
Chap09algChap09alg
Chap09alg
 
StringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdfStringMatching-Rabikarp algorithmddd.pdf
StringMatching-Rabikarp algorithmddd.pdf
 
Ning_Mei.ASSIGN03
Ning_Mei.ASSIGN03Ning_Mei.ASSIGN03
Ning_Mei.ASSIGN03
 
Algorithm Assignment Help
Algorithm Assignment HelpAlgorithm Assignment Help
Algorithm Assignment Help
 

Recently uploaded

Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 

Recently uploaded (20)

Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 

Knuth morris pratt string matching algo

  • 1. Knuth-Morris-Pratt Substring Search Algorithm Prepared By: Sabiya Fatima Email ID: sabiya1990fatima@gmail.com 1
  • 2. Outline  Definition  History  Components of KMP  Algorithm  Example  Run-Time Analysis  Complexity comparison of String Matching Algorithms  Advantages and Disadvantages  Real Time Applications  References 2
  • 3. What is Pattern Searching ?  Suppose you are reading a text document.  You want to search for a word.  You click CTRL + F and search for that word.  The word processor scans the document and shows the position of occurrence. What exactly happens is that, word i.e. pattern is searched inside the text document. 3
  • 4. Definition  Best known for linear time for exact matching. Compares from left to right.  Shifts more than one position.  Preprocessing approach of Pattern to avoid trivial comparisions.  Avoids recomputing matches. 4
  • 5. History  This algorithm was conceived by Donald Knuth and Vaughan Pratt and independently by James H.Morris in 1977.  Knuth, Morris and Pratt discovered first linear time string-matching algorithm by analysis of the naive algorithm.  It keeps the information that naive approach wasted gathered during the scan of the text. By avoiding this waste of information, it achieves a running time of O(m + n).  The implementation of Knuth-Morris-Pratt algorithm is efficient because it minimizes the total number of comparisons of the pattern against the input string. 5
  • 6. Naïve Approach The naïve approach is to check whether the pattern matches the string at every possible position in the string. P= Pattern (word) of length m T= Text (document) of length n Naive string matching algorithm takes time O((n-m+1)m) or O(mn) 6
  • 7. The KMP Algorithm - Motivation x j . . a b a a b . . . . . a b a a b a a b a a b a No need to repeat these comparisons Resume comparing here  Knuth-Morris-Pratt’s algorithm compares the pattern to the text in left-to-right, but shifts the pattern more intelligently than the brute-force algorithm.  When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?  Answer: the largest prefix of P[0..j] that is a suffix of P[1..j] 7
  • 8. Components of KMP algorithm  The prefix function, Π The prefix function,Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’. In other words, this enables avoiding backtracking on the text ‘T’.  The KMP Matcher With text ‘T’, pattern ‘p’ and prefix function ‘Π’ as inputs, finds the occurrence of ‘p’ in ‘T’ and returns the number of shifts of ‘p’ after which occurrence is found. 8
  • 9. The prefix function, Π Following pseudocode computes the prefix function, Π: Compute-Prefix-Function (p) 1 m  length[p] //’p’ pattern to be matched 2 Π[1]  0 3 k  0 4 for q  2 to m 5 do while k > 0 and p[k+1] != p[q] 6 do k  Π[k] 7 If p[k+1] = p[q] 8 then k  k +1 9 Π[q]  k 10 return Π 9
  • 10. Example: compute Π for the pattern ‘p’ below: a b a b a c a Initially: m = length[p] = 7 Π[1] = 0 k = 0 Step 1: q = 2, k=0 Π[2] = 0 Step 2: q = 3, k = 0, Π[3] = 1 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 10 p 10
  • 11. Contd… 11 Step 3: q = 4, k = 1 Π[4] = 2 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 Step 4: q = 5, k =2 Π[5] = 3 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 3
  • 12. Step 5: q = 6, k = 3 Π[6] = 0 Step 6: q = 7, k = 0 Π[7] = 1 After iterating 6 times, the prefix function computation is complete:  q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 3 0 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 3 0 1 q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 3 0 1 The running time of the prefix function is O(m). 12Contd…..
  • 13. The KMP Matcher Input: The KMP Matcher, with pattern ‘p’, text ‘T’and prefix function ‘Π’, finds a match of p in T. Following pseudocode computes the matching component of KMP algorithm: KMP-Matcher(T,p) 1 n  length[T] 2 m  length[p] 3 Π  Compute-Prefix-Function(p) 4 q  0 //number of characters matched 5 for i  1 to n //scan T from left to right 6 do while q > 0 and p[q+1] != T[i] 7 do q  Π[q] //next character does not match 8 if p[q+1] = T[i] 9 then q  q + 1 //next character matches 10 if q = m //is all of p matched? 11 then print “Pattern occurs with shift” i – m 12 q  Π[ q] // look for the next match Note: KMP finds every occurrence of a ‘p’in ‘T’. That is why KMP does not terminate in step 12, rather it searches remainder of ‘T’for any more occurrences of ‘p’. 13
  • 14. Illustration: given a Text ‘T’ and pattern ‘p’ as follows: T b a c b a b a b a b a c a c a p a b a b a c a Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘T’. For ‘p’the prefix function, Π was computed previously and is as follows: q 1 2 3 4 5 6 7 p a b a b a c a Π 0 0 1 2 3 0 1 14 14
  • 15. b a c b a b a b a b a c a a b a b a b a c a Initially: n = size of T = 15; m = size of p = 7 Step 1: i = 1, q = 0 comparing p[1] with T[1] T p P[1] does not match with T[1]. ‘p’ will be shifted one position to the right. 15 15 Contd… Step 2: i = 2, q = 0 comparing p[1] with T[2] T p a b a b a c a P[1] matches T[2]. Since there is a match, p is not shifted. b a c b a b a b a b a c a a b
  • 16. Contd… 16 T b a c b a b a b a b a c a a b a b a b a c ap p[2] does not match with T[3] Backtracking on p, comparing p[1] and T[3] Step 3: i = 3, q = 1 Comparing p[2] with T[3] a b a b a c a T p Step 4: i = 4, q = 0 comparing p[1] with T[4] p[1] does not match with T[4] b a c b a b a b a b a c a a b
  • 17. b a c b a b a b a b a c a a b a b a b a c a T p Step 5: i = 5, q = 0 comparing p[1] with T[5] p[1] matches with T[5] 17 17 Step 6: i = 6, q = 1 Comparing p[2] with T[6] p[2] matches with T[6] T p b a c b a b a b a b a c a a b a b a b a c a Contd…
  • 18. b a c b a b a b a b a c a a b b a c b a b a b a b a c a a b a b a b a c a a b a b a c a T p Step 7: i = 7, q = 2 Comparing p[3] with T[7] p[3] matches with T[7] Step 8: i = 8, q = 3 Comparing p[4] with T[8] p[4] matches with T[8] T p 18 18 Contd…
  • 19. Step 9: i = 9, q = 4 Comparing p[5] with T[9] Comparing p[6] with T[10]Step 10: i = 10, q = 5 T p b a c b a b a b a b a c a a b b a c b a b a b a b a c a a b a b a b a c a a b a b a c a p[6] doesn’t match with T[10] Backtracking on p, comparing p[4] with T[10] because after mismatch q = Π[5] = 3 p[5] matches with T[9] 19 19 T p Contd…
  • 20. 20 Step 11: i = 11, q = 4 Comparing p[5] with T[11] p[5] matches with T[11] T p b a c b a b a b a b a c a a b a b a b a c a Contd… Step 12: i = 12, q = 5 Comparing p[6] with T[12] p[6] matches with T[12] a b a b a c ap b a c b a b a b a b a c a a bT
  • 21. b a c b a b a b a b a c a a b a b a b a c a Comparing p[7] with T[13] T p Step 13: i = 13, q = 6 p[7] matches with T[13] Pattern ‘p’ has been found to completely occur in text ‘T’. The total number of shifts that took place for the match to be found are: i – m = 13 – 7 = 6 shifts. The running time of the KMP-Matcher function is O(n). 21 21 Contd…
  • 22. Complexity  O(m) - It is to compute the prefix function values.  O(n) - It is to compare the pattern to the text.  Total of O(n + m) run time. 22
  • 23. Complexity comparison of String Matching Algorithms 23
  • 24. Advantage and Disadvantage Advantages: 1.The running time of the KMP algorithm is optimal (O(m + n)), which is very fast. 2.The algorithm never needs to move backwards in the input text T. It makes the algorithm good for processing very large files. Disadvantages: Doesn’t work so well as the size of the alphabets increases. By which more chances of mismatch occurs. 24
  • 25. Real time Applications  Good for plagiarism analysis.  search engines  language syntax checker  database queries  music content retrieval 25
  • 26. Real time Applications 26  DNA sequences analysis : • It is mainly composed of nucleotides of four types. The four bases in DNA are Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). A DNA sequence is a representation of a string of nucleotides contained in a strand of DNA. • DNA sequences analysis of various diseases which are stored in database for retrieval and comparison. This system compares similarity values with threshold value and stores particular result which is diseased or not. • For example: ATTCGTAACTAGTAAGTTA. The DNA sequencing techniques have allowed the vast amount of data to be analyzed in a short span of time. So, pattern matching techniques plays a vital role in computational biology for data analysis related to biological data such as DNA sequences
  • 27. References  Thomas H.Cormen; Charles E.Leiserson., Introduction to algorithms second edition , “The Knuth-Morris-Pratt Algorithm”, year = 2001.  https://pdfs.semanticscholar.org/fe41/52465f96d09c94b46b86a3b6408dae5dbe13.pdf  http://research.ijcaonline.org/volume115/number23/pxc3902734.pdf 27