3. WHAT IS STRING MATCHING?
• In computer science, string-searching algorithms, sometimes called string-matching
algorithms, try to find the places where one or several strings (also called patterns)
occur within a larger string or text.
4. EXAMPLE
TEXT:    A B C C A T D E F
PATTERN:       C A T
PATTERN MATCH AT SHIFT = 3
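As a quick check of the shift in this example, here is a minimal sketch using Python's built-in `str.find`, which returns the lowest index at which the pattern occurs (or -1 if it does not occur). The variable names are only illustrative.

```python
text = "ABCCATDEF"
pattern = "CAT"

# str.find returns the first index where pattern occurs, or -1 if absent.
shift = text.find(pattern)
print(shift)  # 3
```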
5. STRING MATCHING ALGORITHMS
• There are many string-matching algorithms, including:
1. The naive string-matching algorithm
2. The Rabin-Karp algorithm
3. String matching with finite automata
4. The Knuth-Morris-Pratt algorithm
6. NAÏVE STRING MATCHING ALGORITHM
1. Initialization: Start at the beginning of the text and the beginning of the pattern.
2. Comparison: Compare each character of the pattern with the corresponding characters in the text, starting
from the current position.
3. Matching: If all characters in the pattern match the characters in the text starting from the current position,
then a match is found.
4. Move to Next Position: Move one character forward in the text, whether or not a match was found, and repeat
steps 2 and 3.
5. Repeat: Keep repeating steps 2-4 until the pattern no longer fits in the remaining text; all occurrences of the
pattern in the text have then been found.
7. EXAMPLE
POSITION: 0 1 2 3 4 5 6 7 8 9 10 11
TEXT:     A K A N N A L A N A C  A
PATTERN:  A N N A
POSITION 0: A K A N  NO MATCH FOUND
POSITION 1: K A N N  NO MATCH FOUND
POSITION 2: A N N A  MATCH FOUND
8. PSEUDO-CODE
NaiveStringMatch(Text, Pattern):
    n = length(Text)
    m = length(Pattern)
    for i = 0 to n - m
        j = 0
        while j < m and Pattern[j] = Text[i + j]
            j = j + 1
        if j = m
            print "Pattern found at position", i
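The pseudo-code above can be sketched as runnable Python. This is one possible translation, not a reference implementation: the function name `naive_string_match` and the choice to return a list of positions instead of printing them are assumptions made for illustration.

```python
def naive_string_match(text: str, pattern: str) -> list[int]:
    """Return every shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):  # candidate shifts 0 .. n - m
        j = 0
        # Advance j while the pattern and the text window still agree.
        while j < m and pattern[j] == text[i + j]:
            j += 1
        if j == m:  # all m characters matched
            matches.append(i)
    return matches


print(naive_string_match("AKANNALANACA", "ANNA"))  # [2]
```

Returning a list makes it easy to see that the loop keeps going after a match, so overlapping occurrences are reported too.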
9. RABIN-KARP ALGORITHM
A string-search algorithm that compares the hash value of the pattern with the hash
values of substrings of the text, rather than comparing the strings themselves. For
efficiency, the hash value of the next position in the text is computed in constant
time from the hash value of the current position.
10. PROBLEM STATEMENT
• Let the text string be T, of length N
• Let the pattern string be P, of length M
• Example
• T = “Hello World”; N = 11
• P = “llo”; M = 3
TEXT:    H E L L O   W O R L D
PATTERN:     L L O
11. EXAMPLE
TEXT: A B D A B C (with A = 1, B = 2, C = 3, D = 4)
Shift 0: tHash = Hash(“ABD”) = 1*3^0 + 2*3^1 + 4*3^2 = 43
pHash = Hash(“ABC”) = 1*3^0 + 2*3^1 + 3*3^2 = 34
tHash == pHash: FALSE
Shift 1: tHash = Hash(“BDA”) = 2*3^0 + 4*3^1 + 1*3^2 = 23
tHash == pHash: FALSE
12. EXAMPLE (CONTINUED)
TEXT: A B D A B C
Shift 2: tHash = Hash(“DAB”) = 4*3^0 + 1*3^1 + 2*3^2 = 25
tHash == pHash: FALSE
Shift 3: tHash = Hash(“ABC”) = 1*3^0 + 2*3^1 + 3*3^2 = 34
tHash == pHash: TRUE
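The window hashes 43, 23, 25 and 34 in this example do not each have to be computed from scratch. The sketch below assumes the encoding A = 1, B = 2, ... and base 3 implied by the arithmetic above; the names `char_value`, `slide_hash` and `roll` are hypothetical helpers introduced only for illustration.

```python
BASE = 3  # base used in the example's arithmetic


def char_value(ch: str) -> int:
    """Map 'A' -> 1, 'B' -> 2, ... (assumed encoding, case-insensitive)."""
    return ord(ch.upper()) - ord("A") + 1


def slide_hash(s: str) -> int:
    """Hash(s) = sum of value(s[k]) * BASE**k, as in the example."""
    return sum(char_value(ch) * BASE ** k for k, ch in enumerate(s))


def roll(old_hash: int, outgoing: str, incoming: str, m: int) -> int:
    """Slide a window of length m one character to the right in O(1)."""
    # Drop the outgoing character (weight BASE**0), shift the remaining
    # weights down by one power, then add the incoming character with
    # weight BASE**(m - 1). The division is exact for this hash.
    return (old_hash - char_value(outgoing)) // BASE + char_value(incoming) * BASE ** (m - 1)


text, m = "ABDABC", 3
h = slide_hash(text[:m])
hashes = [h]
for i in range(1, len(text) - m + 1):
    h = roll(h, text[i - 1], text[i + m - 1], m)
    hashes.append(h)

print(hashes)             # [43, 23, 25, 34]
print(slide_hash("ABC"))  # 34
```

Only the last window's hash equals the pattern hash 34, which agrees with the match at shift 3.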
13. IN CASE OF HIT
TEXT:    A B D A B C
PATTERN:       A B C
HERE THE PATTERN MATCHES THE SUBSTRING, SO INDEX NUMBER 3 IS RETURNED
14. HASH COLLISION
• When the hashes of two strings match, it is called a hit
• Two different strings can have the same hash
• Hash of “abc” is 34
• Hash of “dga” is 34
• This is called a hash collision, so a hit must be confirmed by comparing the characters
• Minimize collisions by taking the hash modulo a prime number
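Putting the pieces together, here is a minimal Rabin-Karp sketch that reduces the hash modulo a prime and confirms every hit by comparing characters. Note the assumptions: it uses the common high-order-first polynomial hash over character codes (`ord`), not the base-3 letter weighting of the example slides, and the default `base` and `prime` values are illustrative choices only.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, prime: int = 1_000_003) -> list[int]:
    """Return every shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, prime)  # weight of the window's first character

    p_hash = t_hash = 0
    for k in range(m):  # O(m) preprocessing of pattern and first window
        p_hash = (p_hash * base + ord(pattern[k])) % prime
        t_hash = (t_hash * base + ord(text[k])) % prime

    matches = []
    for i in range(n - m + 1):
        # A hash hit may be a collision, so confirm it character by character.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:  # roll the window to shift i + 1
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % prime
    return matches


print(rabin_karp("ABDABC", "ABC"))     # [3]
print(rabin_karp("Hello World", "llo"))  # [2]
```

The character comparison runs only when the hashes agree, which is why the worst case arises when many windows produce hits.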
15. ANALYSIS
• Hash of pattern: O(m)
• Best-case running time: O(n-m+1)
• Average running time: O(m+n)
• Worst-case running time: O(mn), with m comparisons in each iteration