The string-matching problem is a classic problem in algorithms. In this class, we look at the Rabin-Karp algorithm as a classic example of a string-matching algorithm.
Here I discuss three algorithms for string matching.
Those algorithms are:
1. The naive algorithm.
2. The Rabin-Karp algorithm.
3. The Knuth-Morris-Pratt algorithm.
I hope that, by reading these slides, these algorithms are easy to understand.
String-searching algorithms: given two strings P and T over the same alphabet Σ, determine whether P occurs as a substring of T (or find the position(s) at which P occurs as a substring of T). The strings P and T are called the pattern and the target, respectively.
WHAT IS PATTERN RECOGNITION?
Given a text string T[0..n-1] and a pattern P[0..m-1], find all occurrences of the pattern within the text.
Example: T = 000010001010001 and P = 0001, the occurrences are:
first occurrence starts at T[1]
second occurrence starts at T[5]
third occurrence starts at T[11]
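The occurrences above can be confirmed with a short check; a minimal Python sketch, using a zero-width lookahead so that overlapping matches are also reported (variable names are mine):

```python
import re

# Overlapping occurrences of P in T via a zero-width lookahead,
# confirming the example above (0-based positions).
T, P = "000010001010001", "0001"
starts = [m.start() for m in re.finditer(f"(?={re.escape(P)})", T)]
print(starts)  # [1, 5, 11]
```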
APPLICATIONS:
• Image preprocessing
• Computer vision
• Artificial intelligence
• Radar signal classification/analysis
• Speech recognition/understanding
• Fingerprint identification
• Character (letter or number) recognition
• Handwriting analysis
• Electro-cardiographic signal analysis/understanding
• Medical diagnosis
• Data mining/reduction
ALGORITHM DESCRIPTION
The Knuth-Morris-Pratt (KMP) algorithm:
It was published by Donald E. Knuth, James H. Morris and Vaughan R. Pratt in 1977, in “Fast Pattern Matching in Strings.”
To illustrate the ideas of the algorithm, consider the following example:
T = xyxxyxyxyyxyxyxyyxyxyxxy
P = xyxyyxyxyxx
Like the naive algorithm, it considers shifts in order from 0 to n − m, and determines whether the pattern matches at that shift. The difference is that the KMP algorithm uses information gleaned from partial matches of the pattern and text to skip over shifts that are guaranteed not to result in a match.
The text and pattern are included in Figure 1, with numbering, to make it easier to follow.
1. Consider the situation when P[1..3] has been successfully matched with T[1..3]. We then find a mismatch: P[4] ≠ T[4]. Based on our knowledge that P[1..3] = T[1..3], and ignoring symbols of the pattern and text after position 3, what can we deduce about where a potential match might begin? In this case, the algorithm slides the pattern 2 positions to the right so that P[1] is lined up with T[3]. The next comparison is between P[2] and T[4].
P:    x  y  x  y  y  x  y  x  y  x  x
q:    1  2  3  4  5  6  7  8  9  10 11
π(q): 0  0  1  2  0  1  2  3  4  3  1
Table 1: Prefix-function values π(q) for pattern P.
Time Complexity:
The call to COMPUTE-PREFIX-FUNCTION is O(m).
Using q as the value of the potential function, we argue in the same manner as above to show the matching loop is O(n).
Therefore the overall complexity is O(m + n).
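The O(m) prefix-function computation and the O(n) matching loop can be sketched as follows; this is a minimal 0-based Python sketch (the function names are mine, not from the slides):

```python
# Prefix function: pi[q] = length of the longest proper prefix of
# pattern[:q+1] that is also a suffix of it (0-based version of Table 1).
def prefix_function(pattern):
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]          # fall back to a shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

# KMP search: O(m) preprocessing plus an O(n) scan gives O(m + n) overall.
def kmp_search(text, pattern):
    pi = prefix_function(pattern)
    matches, q = [], 0
    for i, c in enumerate(text):
        while q > 0 and pattern[q] != c:
            q = pi[q - 1]          # skip shifts that cannot match
        if pattern[q] == c:
            q += 1
        if q == len(pattern):
            matches.append(i - len(pattern) + 1)  # 0-based match start
            q = pi[q - 1]
    return matches
```

For the example pattern P = xyxyyxyxyxx and text T above, kmp_search reports a single match starting at 0-based position 12.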
Boyer-Moore algorithm:
It was developed by Robert S. Boyer and J Strother Moore in 1977. The algorithm preprocesses the pattern string being searched for in the text string.
String pattern matching - Boyer-Moore:
This algorithm uses fail functions to shift the pattern efficiently. Boyer-Moore, however, starts comparing at the end of the pattern, which can result in larger shifts. Two heuristics are used. 1: if we encounter a mismatch at text character c, we can shift so that the rightmost occurrence of c in P lines up with it:
Q: a b c a b c d g a b c e a b c d a c e d
P: a b c e b c d
         a b c e b c d   (restart here)
Time Complexity:
• performs the comparisons from right to left;
• preprocessing phase in O(m + σ) time and space, where σ is the alphabet size.
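The bad-character heuristic (heuristic 1 above) can be sketched in Python. This is a hedged sketch of that heuristic alone: full Boyer-Moore also applies a good-suffix rule, which is omitted here, and the function names are mine:

```python
# Bad-character table: rightmost index of each character in the pattern.
def bad_character_table(pattern):
    return {ch: i for i, ch in enumerate(pattern)}

# Boyer-Moore using only the bad-character heuristic: compare right to
# left; on a mismatch at text character c, shift the pattern so its
# rightmost occurrence of c lines up with c (or past it if c is absent).
def bm_bad_char_search(text, pattern):
    last = bad_character_table(pattern)
    m, n = len(pattern), len(text)
    matches, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                 # scan right to left
        if j < 0:
            matches.append(s)      # full match at shift s
            s += 1
        else:
            c = text[s + j]
            s += max(1, j - last.get(c, -1))
    return matches
```

On the example above (Q = abcabcdgabceabcdaced, P = abcebcd) the first mismatch against text character `a` shifts the pattern 3 positions at once, and the scan finishes with no match.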
Algorithms Discussed
Knuth–Morris–Pratt algorithm
Boyer–Moore string search algorithm
Bitap algorithm (for exact string searching)
-------------------
Checking whether two or more strings are the same or not.
Finding a string (pattern) in another string (text) --> looking for a substring.
This presentation gives a brief overview of a searching algorithm known as the Rabin-Karp algorithm. It covers the history of the algorithm and its workings, along with an example.
In this approach, the pattern slides over the text one position at a time and is tested for a match. If a match is found, the algorithm reports the starting index at which the pattern occurs in the text, then slides by 1 again to check for subsequent matches of the pattern in the text.
The Rabin-Karp algorithm, with its hash function and hash collisions, its analysis, and code for its implementation. It also covers applications of the Rabin-Karp algorithm.
2. Outline
1 String Matching
Introduction
Some Notation
2 Naive Algorithm
Using Brute Force
3 The Rabin-Karp Algorithm
Efficiency After All
Horner’s Rule
Generating Possible Matches
The Final Algorithm
Other Methods
4 Exercises
3. What is string matching?
Given two sequences of characters drawn from a finite alphabet Σ:
T [1..n] (the text) and P [1..m] (the pattern)
T: a b c a b a a b c a b a c a b
P:       a b a a
With shift s = 3, P lines up with T [4..7], so s = 3 is a valid shift.
4. Where...
A Valid Shift
P occurs with a valid shift s if, for some 0 ≤ s ≤ n − m,
T [s + 1..s + m] == P [1..m].
Otherwise
it is an invalid shift.
Thus
The string-matching problem is the problem of finding all valid shifts
for a given pattern P in a text T.
7. Possible Algorithms
We have the following ones
Algorithm Preprocessing Time Worst Case Matching Time
Naive 0 O ((n − m + 1) m)
Rabin-Karp Θ (m) O ((n − m + 1) m)
Finite Automaton O (m |Σ|) Θ (n)
Knuth-Morris-Pratt Θ (m) Θ (n)
9. Notation and Terminology
Definition
We denote by Σ∗ (read “sigma-star”) the set of all finite length strings
formed using characters from the alphabet Σ.
Constraint
We assume finite-length strings.
Some basic concepts
The empty string, ε, also belongs to Σ∗.
The length of a string x is denoted |x|.
The concatenation of two strings x and y, denoted xy, has length
|x| + |y|.
14. Notation and Terminology
Prefix
A string w is a prefix of x, written w ⊏ x, if x = wy for some string y ∈ Σ∗.
Suffix
A string w is a suffix of x, written w ⊐ x, if x = yw for some string y ∈ Σ∗.
Properties
Prefix: if w ⊏ x, then |w| ≤ |x|.
Suffix: if w ⊐ x, then |w| ≤ |x|.
The empty string ε is both a suffix and a prefix of every string.
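In practice, the prefix and suffix relations correspond directly to built-in string operations; a small Python illustration with hypothetical strings w and x:

```python
# The prefix and suffix relations map onto str.startswith / str.endswith.
w, x = "ab", "abcab"
is_prefix = x.startswith(w)   # w is a prefix of x: x = w + "cab"
is_suffix = x.endswith(w)     # w is a suffix of x: x = "abc" + w
print(is_prefix, is_suffix, len(w) <= len(x))  # True True True
```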
19. Notation and Terminology
In addition
For any strings x and y and any character a, we have w ⊐ x if and
only if aw ⊐ ax.
Also, ⊏ and ⊐ are transitive relations.
22. The naive string algorithm
Algorithm
NAIVE-STRING-MATCHER(T, P)
1 n = T.length
2 m = P.length
3 for s = 0 to n − m
4 if P [1..m] == T [s + 1..s + m]
5 print “Pattern occurs with shift” s
Complexity
O((n − m + 1) m), which is Θ(n²) if m = ⌊n/2⌋
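The pseudocode above translates directly into Python (0-based shifts, so valid shifts run from 0 to n − m; the function name is mine):

```python
# Direct 0-based rendering of NAIVE-STRING-MATCHER: try every shift s
# and compare the m-character window of the text against the pattern.
def naive_string_matcher(text, pattern):
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)      # pattern occurs with shift s
    return shifts
```

For T = abcabaabcabacab and P = abaa from the earlier figure, the only valid shift is s = 3.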
25. A more elaborated algorithm
Rabin-Karp algorithm
Let us assume that Σ = {0, 1, ..., 9}. Then we have the following:
Each string of k consecutive characters corresponds to a k-digit decimal
number:
c1 c2 · · · ck−1 ck = 10^(k−1) c1 + 10^(k−2) c2 + . . . + 10 ck−1 + ck
Thus
p corresponds to the decimal number for pattern P [1..m].
ts denotes the decimal value of the m-length substring T [s + 1..s + m] for
s = 0, 1, ..., n − m.
30. Now, think about this
What if we put everything in a single machine word?
If we can compute p in Θ (m),
and we can compute all the ts in Θ (n − m + 1),
we will get
Θ (m) + Θ (n − m + 1) = Θ (n)
15 / 40
34. Remember Horner’s Rule
Consider
We can compute the decimal representation by Horner’s rule:
p = P[m] + 10(P[m − 1] + 10(P[m − 2] + ... + 10(P[2] + 10P[1])...)) (1)
Thus, we can also compute t0 using this rule in
Θ (m) (2)
17 / 40
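Horner's rule in code, as a minimal Python sketch (the function name is illustrative; `d` generalizes the base beyond decimal):

```python
def horner_value(P, d=10):
    """Evaluate the digit string P as a base-d number via Horner's rule, in Θ(m)."""
    p = 0
    for digit in P:
        # Each step shifts the accumulated value one position and adds a digit.
        p = d * p + int(digit)
    return p
```

For instance, `horner_value("2401")` returns `2401`, using only m multiplications and m additions.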
36. Example
If you have the following set of digits
2 4 0 1
Using Horner’s Rule for m = 4
2401 = 1 + 10 × (0 + 10 × (4 + 10 × 2))
     = 2 × 10^3 + 4 × 10^2 + 0 × 10 + 1
18 / 40
38. Then
To compute the remaining values, we can use the previous value
ts+1 = 10 (ts − 10^(m−1) T [s + 1]) + T [s + m + 1] (3)
Notice the following
Subtracting 10^(m−1) T [s + 1] removes the high-order digit from ts.
Multiplying the result by 10 shifts the number left by one digit
position.
Adding T [s + m + 1] brings in the appropriate low-order digit.
19 / 40
41. Example
Imagine that you have...
index 1 2 3 4 5 6
digit 1 0 2 4 1 0
1 We have t0 = 1024, and we want to calculate t1 = 0241.
2 We subtract 10^3 × T [1] == 1000 from 1024.
3 We get 0024.
4 We multiply by 10 and get 0240.
5 We add T [5] == 1.
6 Finally, we get 0241.
20 / 40
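The steps above can be replayed in Python. Note the slides index from 1, so T[1] and T[5] there become `T[0]` and `T[4]` here:

```python
T = "102410"    # the digits from the table above
m = 4
t0 = 1024       # value of the first window "1024"
# Remove the high-order digit, shift left one position, bring in the new digit.
t1 = 10 * (t0 - 10 ** (m - 1) * int(T[0])) + int(T[m])
# t1 == 241, the value of the next window "0241"
```

Each update costs a constant number of arithmetic operations, which is what makes computing all the remaining ts possible in Θ(n − m + 1).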
48. Remarks
First
We can extend this beyond decimal to any d-digit alphabet!!!
What happens when the numbers are quite large?
We can use modular arithmetic
Meaning
Compute the p and ts values modulo a suitable modulus q.
21 / 40
53. Then, to reduce the possible representations
Use a modulus q
It is possible to compute p mod q in Θ (m) time and all the ts mod q in
Θ (n − m + 1) time.
Something Notable
If we choose the modulus q as a prime such that 10q just fits within one
computer word, then we can perform all the necessary computations with
single-precision arithmetic.
24 / 40
55. Why 10q?
After all
10q bounds the quantities we will be subtracting and multiplying by!!!
We use “truncated division” to implement the modulo operation
For example, given two numbers a and n, we can compute the quotient
k = trunc (a / n)
Then
r = a − nk
25 / 40
58. Why 10q?
Thus
If q is a prime, we can use 10q as the step of the truncated division, so that
r = a − 10q.
Truncated Algorithm
Truncated-Modulo(a, q)
1 r = a − 10q
2 while r > 10q
3 r = r − 10q
4 return r
26 / 40
60. Then
Thus
We can implement this with basic arithmetic CPU operations.
Not only that
In general, for a d-ary alphabet, we choose q such that dq fits within a
computer word.
27 / 40
62. In general, we have
The following
ts+1 = (d (ts − T[s + 1] h) + T[s + m + 1]) mod q (4)
where h ≡ d^(m−1) (mod q) is the value of the digit “1” in the high-order
position of an m-digit text window.
Here
We have a small problem!!!
Question
Can we differentiate between p mod q and ts mod q?
28 / 40
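Recurrence (4) can be sketched as a small Python generator. The function name and the modulus q = 101 (an arbitrary illustrative prime) are my own choices:

```python
def rolling_hashes(T, m, d=10, q=101):
    """Yield t_s mod q for every m-length window of T, via recurrence (4)."""
    h = pow(d, m - 1, q)            # value of a "1" in the high-order position
    t = 0
    for i in range(m):              # t_0 via Horner's rule, reduced mod q
        t = (d * t + int(T[i])) % q
    yield t
    for s in range(len(T) - m):     # each update is O(1) arithmetic
        t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
        yield t
```

Python's `%` always returns a nonnegative result, so the subtraction inside the update cannot leave a negative residue.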
65. What?
Look at this with q = 11
14 mod 11 == 3
25 mod 11 == 3
But 14 ≠ 25
However
11 mod 11 == 0
25 mod 11 == 3
So we can say that 11 ≠ 25!!!
This means
We can use the modulo to tell numbers apart, but not to say with certainty
that they are equal!!!
29 / 40
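The same collision, checked in Python:

```python
q = 11
# 14 and 25 collide modulo 11: equal residues do NOT imply equal numbers.
assert 14 % q == 25 % q and 14 != 25   # a spurious hit
# Different residues DO imply different numbers.
assert 11 % q != 25 % q
```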
72. Thus
We have the following logic
ts ≡ p (mod q) does not imply that ts == p.
However, if ts ≢ p (mod q), then ts ≠ p.
Fixing the problem
To fix this problem, we simply test whether a hit is spurious by comparing
the strings directly.
Note
If q is large enough, then we can hope that spurious hits occur infrequently
enough that the cost of the extra checking is low.
30 / 40
77. The Final Algorithm
Rabin-Karp-Matcher(T, P, d, q)
1 n = T.length
2 m = P.length
3 h = d^(m−1) mod q // Storing the remainder of the highest power
4 p = 0
5 t0 = 0
6 for i = 1 to m // Preprocessing
7 p = (dp + P [i]) mod q
8 t0 = (dt0 + T [i]) mod q
9 for s = 0 to n − m
10 if p == ts
11 if P [1..m] == T [s + 1..s + m] // Actually a loop
12 print “Pattern occurs with shift” s
13 if s < n − m
14 ts+1 = (d(ts − T[s + 1]h) + T[s + m + 1]) mod q
32 / 40
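A runnable Python sketch of the matcher above (0-indexed shifts; d = 10 and the prime q = 101 are illustrative choices, and the function name is my own):

```python
def rabin_karp_matcher(T, P, d=10, q=101):
    """Return every shift s (0-indexed) at which P occurs in T."""
    n, m = len(T), len(P)
    h = pow(d, m - 1, q)            # remainder of the highest power of d
    p = t = 0
    for i in range(m):              # preprocessing: p and t_0 via Horner's rule
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal residues are only candidates; verify to rule out spurious hits.
        if p == t and P == T[s:s + m]:
            shifts.append(s)
        if s < n - m:               # roll the window, as in line 14 above
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return shifts
```

On the introduction's example, `rabin_karp_matcher("000010001010001", "0001")` returns `[1, 5, 11]`, matching the naive algorithm's output.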
81. Complexity
Preprocessing
1 Computing p mod q is done in Θ(m).
2 Computing all the ts mod q is done in Θ(n − m + 1).
Then checking P[1..m] == T[s + 1..s + m]
takes Θ((n − m + 1)m) in the worst case
33 / 40
83. Still we can do better!
First
The expected number of spurious hits is O (n/q).
Because
We can estimate the chance that an arbitrary ts will be equivalent to p
modulo q as 1/q.
Properties
Since there are O (n) positions at which the test of line 10 can succeed
spuriously (thus O (n/q) expected invalid hits), and we spend O (m) time
per hit...
34 / 40
86. Finally, we have that
The expected matching time
The expected matching time of the Rabin-Karp algorithm is
O(n) + O(m(v + n/q))
where v is the number of valid shifts.
In addition
If v = O(1) (the number of valid shifts is small) and we choose q ≥ m such
that n/q = O(1) (q large enough relative to the pattern’s length),
Then
The algorithm takes O(m + n) for finding the matches.
Finally, because m ≤ n, the expected time is O(n).
35 / 40
92. We have other methods
We have the following ones
Algorithm            Preprocessing Time   Worst-Case Matching Time
Rabin-Karp           Θ (m)                O ((n − m + 1) m)
Finite Automaton     O (m |Σ|)            Θ (n)
Knuth-Morris-Pratt   Θ (m)                Θ (n)
37 / 40
93. Remarks about Knuth-Morris-Pratt
The Algorithm
It is quite an elegant algorithm that improves over the state-machine
approach.
How?
It avoids computing the transition function of the state machine by using
the prefix function π,
which encapsulates information about how the pattern matches against
shifts of itself
38 / 40
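A standard Θ(m) computation of the prefix function π, sketched in Python (0-indexed, so π[i] is the length of the longest proper prefix of P[0..i] that is also a suffix of it):

```python
def prefix_function(P):
    """Compute the KMP prefix function of the pattern P in Θ(m)."""
    pi = [0] * len(P)
    k = 0                           # length of the current matched prefix
    for i in range(1, len(P)):
        while k > 0 and P[k] != P[i]:
            k = pi[k - 1]           # fall back to the next-longest border
        if P[k] == P[i]:
            k += 1
        pi[i] = k
    return pi
```

For example, `prefix_function("ababaca")` returns `[0, 0, 1, 2, 3, 0, 1]`: after matching "ababa" the pattern can be shifted so that its prefix "aba" stays aligned, which is exactly the information KMP uses to skip shifts.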
95. However, at the same time (Circa 1977)
Boyer–Moore string search algorithm
It was presented at the same time.
It is used by the grep utility for pattern matching in UNIX.
It is actually the baseline algorithm to beat when doing research in this area!!!
Richard Cole (Circa 1991)
He gave a proof of the algorithm with an upper bound of roughly 3n
comparisons in the worst case!!!
39 / 40