Knuth-Morris-Pratt Algorithm by Dr. Rose.ppt
1. UNIVERSITY OF SOUTH CAROLINA
College of Engineering & Information Technology
Bioinformatics Algorithms and
Data Structures
Chapter 2: KMP Algorithm
Lecturer: Dr. Rose
Slides by: Dr. Rose
January 28, 2003
2. KMP Algorithm
• Preliminaries:
– KMP can be easily explained in terms of finite
state machines.
– KMP has an easily proved linear time bound
– KMP is usually not the method of choice
3. KMP Algorithm
• Recall that the naïve approach to string
matching is Θ(mn).
• How can we reduce this complexity?
– Avoid redundant comparisons
– Use larger shifts
• Boyer-Moore good suffix rule
• Boyer-Moore extended bad character rule
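For reference, the Θ(mn) behaviour of the naïve approach can be sketched as follows. This is a minimal illustration, not code from the slides: the function name `naive_search` and the comparison counter are assumptions added here, and reported positions are 1-based to match the slide notation.

```python
def naive_search(pattern: str, text: str) -> tuple[list[int], int]:
    """Try every alignment of pattern in text; return (1-based occurrences, comparisons)."""
    n, m = len(pattern), len(text)
    occurrences: list[int] = []
    comparisons = 0
    for start in range(m - n + 1):          # every possible alignment of P in T
        k = 0
        while k < n:
            comparisons += 1                # one character comparison
            if pattern[k] != text[start + k]:
                break
            k += 1
        if k == n:                          # all n characters matched
            occurrences.append(start + 1)
    return occurrences, comparisons


# Worst-case flavour: every alignment matches n - 1 characters before failing.
print(naive_search("aab", "aaaaaaaa"))
```

Each of the m - n + 1 alignments here costs n comparisons, which is the redundant work that larger shifts aim to avoid.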
4. KMP Algorithm
• KMP finds larger shifts by recognizing
patterns in P.
– Let spi(P) denote the length of the longest
proper suffix of P[1..i] that matches a prefix of
P.
– By definition sp1 = 0 for any string.
– Q: Why does this make sense?
– A: The proper suffix must be the empty string
5. KMP Algorithm
• Example: P = abcaeabcabd
– P[1..2] = ab hence sp2 = ?
– sp2 = 0
– P[1..3] = abc hence sp3 = ?
– sp3 = 0
– P[1..4] = abca hence sp4 = ?
– sp4 = 1
– P[1..5] = abcae hence sp5 = ?
– sp5 = 0
– P[1..6] = abcaea hence sp6 = ?
– sp6 = 1
6. KMP Algorithm
• Example Continued
– P[1..7] = abcaeab hence sp7 = ?
– sp7 = 2
– P[1..8] = abcaeabc hence sp8 = ?
– sp8 = 3
– P[1..9] = abcaeabca hence sp9 = ?
– sp9 = 4
– P[1..10] = abcaeabcab hence sp10 = ?
– sp10 = 2
– P[1..11] = abcaeabcabd hence sp11 = ?
– sp11 = 0
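The sp values in this example can be checked with a brute-force sketch that applies the definition directly. The function name `sp_values` is illustrative (not from the slides), strings are 0-based in Python while the slides are 1-based, and this is far slower than the linear-time preprocessing; it is meant only as a check.

```python
def sp_values(pattern: str) -> list[int]:
    """Return [sp_1, ..., sp_n], computed directly from the definition."""
    n = len(pattern)
    values = []
    for i in range(1, n + 1):
        prefix_i = pattern[:i]                  # P[1..i]
        best = 0                                # the empty suffix always qualifies
        for length in range(i - 1, 0, -1):      # proper suffixes, longest first
            if prefix_i.endswith(pattern[:length]):
                best = length
                break
        values.append(best)
    return values


print(sp_values("abcaeabcabd"))
# [0, 0, 0, 1, 0, 1, 2, 3, 4, 2, 0]
```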
7. KMP Algorithm
• Like the a/a concept for Boyer-Moore, there is
an analogous spi/sp´i concept.
• Let sp´i(P) denote the length of the longest proper
suffix of P[1..i] that matches a prefix of P, with
the added condition that characters P(i + 1) and
P(sp´i + 1) are unequal.
• Example: P = abcdabce sp´7 = 3
Obviously sp´i(P) <= spi(P), since the latter is less
restrictive.
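A brute-force sketch of the sp´ definition, useful for checking small examples. The name `sp_prime_values` is an assumption made here, and for i = n (where P(n + 1) does not exist) the extra condition is taken to hold, which is also an assumption of this sketch.

```python
def sp_prime_values(pattern: str) -> list[int]:
    """Return [sp'_1, ..., sp'_n], computed directly from the definition."""
    n = len(pattern)
    values = []
    for i in range(1, n + 1):
        best = 0
        for length in range(i - 1, 0, -1):      # proper suffixes, longest first
            suffix_matches = pattern[i - length:i] == pattern[:length]
            # P(i + 1) is pattern[i] and P(length + 1) is pattern[length] (0-based).
            # Assumption: when i == n there is no P(i + 1), so the condition holds.
            next_differs = i == n or pattern[i] != pattern[length]
            if suffix_matches and next_differs:
                best = length
                break
        values.append(best)
    return values


print(sp_prime_values("abcdabce"))
# [0, 0, 0, 0, 0, 0, 3, 0]   (sp'_7 = 3, as in the example)
```

Note that sp_5 = 1 and sp_6 = 2 for this pattern, but the corresponding sp´ values are 0 because the characters following the suffix and the prefix are equal.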
8. KMP Algorithm
• KMP Shift Rule:
1. Mismatch case:
• Let position i+1 in P and position k in T be the first mismatch
in a left-to-right scan.
• Shift P to the right, aligning P[1..sp´i] with T[k- sp´i..k-1]
2. Match case:
• If no mismatch is found, an occurrence of P has been found.
• Shift P by n – sp´n spaces to continue searching for other
occurrences.
9. KMP Algorithm
• Observations:
– The prefix P[1..sp´i] of the shifted P is shifted to match
the corresponding substring in T.
– Subsequent character matching proceeds from position
sp´i + 1
– Unlike Boyer-Moore, the matched substring is not
compared again.
– The shift rule based on sp´i guarantees that the exact
same mismatch won’t occur at sp´i + 1 but doesn’t
guarantee that P(sp´i+1) = T(k)
10. KMP Algorithm
• Example: P = abcxabcde
– If a mismatch occurs at position 8, P will be shifted 4
positions to the right.
– Q: Where did the 4 position shift come from?
– A: The number of positions is given by i - sp´i; in this
example i = 7, sp´7 = 3, 7 – 3 = 4
– Notice that we know the amount of shift without
knowing anything about T other than there was a
mismatch at position 8.
11. KMP Algorithm
• Example Continued: P = abcxabcde
– After the shift, P[1..3] lines up with T[k-3..k-1]
– Since it is known that P[1..3] must match T[k-3..k-1], no
comparison is needed.
– The scan continues from P(4) & T(k)
• Advantages of KMP Shift Rule
1. P is often shifted by more than 1 character, (i - sp´i )
2. The left-most sp´i characters in the shifted P are known
to match the corresponding characters in T.
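The shift amount i - sp´i can be sketched as a small function of the pattern alone. The name `kmp_shift` is hypothetical, the sp´ value is found by brute force for clarity, and the handling of a mismatch at position 1 (shift by one place) is an assumption that mirrors the special case in the full algorithm.

```python
def kmp_shift(pattern: str, mismatch_pos: int) -> int:
    """Shift after the first mismatch at 1-based position mismatch_pos = i + 1 of P."""
    i = mismatch_pos - 1                    # characters matched before the mismatch
    if i == 0:
        return 1                            # assumption: mismatch at P(1) moves P one place
    sp_prime_i = 0
    for length in range(i - 1, 0, -1):      # candidate suffix lengths, longest first
        suffix_matches = pattern[i - length:i] == pattern[:length]
        # P(i + 1) must differ from P(length + 1); 0-based: pattern[i] vs pattern[length].
        if suffix_matches and pattern[i] != pattern[length]:
            sp_prime_i = length
            break
    return i - sp_prime_i


print(kmp_shift("abcxabcde", 8))   # 4, computed without looking at T
```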
12. KMP Algorithm
Full Example: T = xyabcxabcxadcdqfeg P = abcxabcde
Assume that we have already shifted past the first two
positions in T.
xyabcxabcxadcdqfeg
  abcxabcde
  ^^^^^^^^
  12345678   comparison 8: d != x, shift 4 places
      abcxabcde
         ^
         1   start again from position 4
13. Preprocessing for KMP
Approach: show how to derive sp´ values from Z values.
Definition: Position j > 1 maps to i if i = j + Zj(P) – 1
– Recall that Zj(P) denotes the length of the Z-box starting at
position j.
– This says that j maps to i if i is the right end of a Z-box starting at
j.
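A sketch of how the Z values and the "maps to" relation might be computed. It uses the standard linear-time Z algorithm; the names `z_values` and `maps_to` are illustrative, and positions with Z_j = 0 are left out of the mapping for readability (an assumption of this sketch, since formally such a j maps to j - 1).

```python
def z_values(pattern: str) -> list[int]:
    """z[j] (0-based j >= 1) = length of the longest substring starting at j
    that matches a prefix of pattern; z[0] is left as 0."""
    n = len(pattern)
    z = [0] * n
    left = right = 0                        # rightmost Z-box found so far: [left, right)
    for j in range(1, n):
        if j < right:                       # reuse information from the enclosing Z-box
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and pattern[z[j]] == pattern[j + z[j]]:
            z[j] += 1                       # extend by explicit comparison
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z


def maps_to(pattern: str) -> dict[int, int]:
    """1-based: position j > 1 maps to i = j + Z_j - 1 (only Z_j > 0 is listed)."""
    z = z_values(pattern)
    return {j: j + z[j - 1] - 1 for j in range(2, len(pattern) + 1) if z[j - 1] > 0}


print(maps_to("abcxabcde"))   # {5: 7}: the Z-box starting at 5 has its right end at 7
```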
14. Preprocessing for KMP
Theorem. For any i > 1, sp´i(P) = Zj = i – j + 1
Where j > 1 is the smallest position that maps to i.
If there is no such j then sp´i(P) = 0
Similarly for sp:
For any i > 1, spi(P) = i – j + 1
Where j, i ≥ j > 1, is the smallest position that maps to i
or beyond.
If there is no such j then spi(P) = 0
15. Preprocessing for KMP
Given the theorem from the preceding slide, the sp´i and spi
values can be computed in linear time using Zi values:
For i = 1 to n { sp´i = 0; }
For j = n downto 2 {
    i = j + Zj(P) – 1;
    sp´i = Zj;
}
spn(P) = sp´n(P);
For i = n – 1 downto 2 {
    spi(P) = max[spi+1(P) – 1, sp´i(P)];
}
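The two loops above translate fairly directly into Python. This is a sketch under a few assumptions: the helper `z_values` is a standard Z algorithm added here, the name `sp_from_z` is illustrative, and the internal arrays are 1-based (index 0 unused) so the indices match the slide.

```python
def z_values(pattern: str) -> list[int]:
    """Standard Z algorithm; z[j] is for 0-based position j, z[0] unused."""
    n = len(pattern)
    z = [0] * n
    left = right = 0
    for j in range(1, n):
        if j < right:
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and pattern[z[j]] == pattern[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z


def sp_from_z(pattern: str) -> tuple[list[int], list[int]]:
    """Return ([sp'_1..sp'_n], [sp_1..sp_n]) computed from the Z values."""
    n = len(pattern)
    z = z_values(pattern)
    sp_prime = [0] * (n + 1)                # 1-based, all zero initially
    for j in range(n, 1, -1):               # j = n downto 2
        zj = z[j - 1]                       # Z_j in 1-based terms
        i = j + zj - 1                      # j maps to i
        sp_prime[i] = zj                    # smaller j (longer match) overwrites later
    sp = [0] * (n + 1)
    sp[n] = sp_prime[n]
    for i in range(n - 1, 1, -1):           # i = n - 1 downto 2
        sp[i] = max(sp[i + 1] - 1, sp_prime[i])
    return sp_prime[1:], sp[1:]


sp_prime, sp = sp_from_z("abcaeabcabd")
print(sp_prime)   # [0, 0, 0, 1, 0, 0, 0, 0, 4, 2, 0]
print(sp)         # [0, 0, 0, 1, 0, 1, 2, 3, 4, 2, 0]
```

The sp list agrees with the values worked out by hand for P = abcaeabcabd earlier in the slides.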
16. Preprocessing for KMP
Defn. Failure function F´(i) = sp´i-1 + 1, 1 ≤ i ≤ n + 1,
sp´0 = 0
(similarly F(i) = spi-1 + 1, 1 ≤ i ≤ n + 1, sp0 = 0)
• Idea:
– We maintain a pointer i in P and c in T.
– After a mismatch at P(i+1) with T(c), shift P to align
P(sp´i + 1) with T(c), i.e., i = sp´i + 1.
– Special case 1: i = 1 set i = F´(1) = 1 & c = c + 1
– Special case 2: we find P in T, shift n - sp´n spaces,
i.e., i = F´(n + 1) = sp´n + 1.
17. Full KMP Algorithm
Preprocess P to find F´(k) = sp´k-1 + 1 for k from 1 to n + 1
c = 1; p = 1;
While c + (n – p) ≤ m {
    While P(p) = T(c) and p ≤ n {
        p = p + 1;
        c = c + 1; }
    If (p = n + 1) then
        report an occurrence of P at position c – n of T.
    If (p = 1) then c = c + 1;
    p = F´(p); }
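The loop above can be sketched in runnable form as follows. The pointers c and p stay 1-based as on the slide; the function names `z_values`, `failure_function` and `kmp_search`, and the choice to derive F´ from Z values, are assumptions of this sketch. A non-empty pattern is assumed.

```python
def z_values(pattern: str) -> list[int]:
    """Standard Z algorithm; z[j] is for 0-based position j, z[0] unused."""
    n = len(pattern)
    z = [0] * n
    left = right = 0
    for j in range(1, n):
        if j < right:
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and pattern[z[j]] == pattern[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z


def failure_function(pattern: str) -> list[int]:
    """f[k] = F'(k) = sp'_(k-1) + 1 for k = 1..n+1 (f[0] is unused)."""
    n = len(pattern)
    z = z_values(pattern)
    sp_prime = [0] * (n + 1)                # sp'_0 = 0 by definition
    for j in range(n, 1, -1):
        sp_prime[j + z[j - 1] - 1] = z[j - 1]
    return [0] + [sp_prime[k - 1] + 1 for k in range(1, n + 2)]


def kmp_search(pattern: str, text: str) -> list[int]:
    """Return the 1-based starting positions of pattern in text."""
    n, m = len(pattern), len(text)
    f = failure_function(pattern)
    occurrences = []
    c = p = 1                               # pointers into T and P (1-based)
    while c + (n - p) <= m:
        while p <= n and pattern[p - 1] == text[c - 1]:
            p += 1
            c += 1
        if p == n + 1:
            occurrences.append(c - n)       # occurrence starts at c - n
        if p == 1:
            c += 1                          # mismatch at P(1): advance in T
        p = f[p]
    return occurrences


print(kmp_search("abcxabcde", "xyabcxabcxabcdefeg"))   # [7]
```

Checking `p <= n` before comparing characters avoids reading past the end of the pattern, which the pseudocode leaves implicit.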
18. Full KMP Algorithm
xyabcxabcxabcdefeg
abcxabcde
^
1   a != x
p != n + 1
p = 1, so c = 2
p = F´(1) = 1
19. Full KMP Algorithm
xyabcxabcxabcdefeg
 abcxabcde
 ^
 1   a != y
p != n + 1
p = 1, so c = 3
p = F´(1) = 1
20. Full KMP Algorithm
xyabcxabcxabcdefeg
  abcxabcde
  ^^^^^^^^
  12345678   d != x
      abcxabcde   (P after the shift)
p != n + 1
p = 8, so don’t change c
p = F´(8) = 4
21. Full KMP Algorithm
p = 4, c = 10
xyabcxabcxabcdefeg
      abcxabcde
         ^^^^^^
         456789
p = n + 1, so an occurrence of P is reported at position c – n = 16 – 9 = 7 of T
22. Real-Time KMP
• Q: What is meant by real-time algorithms?
• A: Typically these are algorithms that are meant
to interact synchronously in the real world.
– This implies a known fixed turn-around time for
processing a task
– Many embedded scheduling systems are examples
involving real-time algorithms.
– For KMP this means that we require a constant bound on
the time spent processing each character of T.
23. Real-Time KMP
• Q: Why is KMP not real-time?
• A: For any mismatched character in T, we may
try matching it several times.
– Recall that sp´i only guarantees that P(i + 1) and P(sp´i + 1) differ
– There is NO guarantee that P(sp´i + 1) and T(k) match
• We need to ensure that a mismatch at T(k) does
NOT entail additional comparisons at T(k).
• This means that we have to compute sp´i values
with respect to all characters in Σ since any could
appear in T.
24. Real-Time KMP
• Define: sp´(i,x)(P) to be the length of the longest
proper suffix of P[1..i] that matches a prefix of
P, with the added condition that character
P(sp´(i,x) + 1) is x.
• This will tell us exactly what shift to use for
each possible mismatch.
• A mismatched character T(k) will never be
involved in subsequent comparisons.
25. Real-Time KMP
• Q: How do we know that the mismatched
character T(k) will never be involved in
subsequent comparisons?
• A: Because the shift will shift P so that either the
matching character aligns with T(k) or P will be
shifted past T(k).
• This results in a real-time version of KMP.
• Let’s consider how we can find the sp´(i,x)(P)
values in linear time.
26. Real-Time KMP
Thm. For P[i + 1] ≠ x, sp´(i,x)(P) = i - j + 1
– Here j is the smallest position such that j maps to i and
P(Zj + 1) = x.
– If there is no such j then sp´(i,x)(P) = 0
For i = 1 to n { sp´(i,x) = 0 for every character x; }
For j = n downto 2 {
    i = j + Zj(P) – 1;
    x = P(Zj + 1);
    sp´(i,x) = Zj;
}
27. Real-Time KMP
• Notice how this works:
– Starting from the right
• Find i the right end of the Z box associated with j
• Find x the character immediately following the prefix
corresponding to this Z box.
• Set sp´(i,x) = Zj, the length of this Z box.
For i = 1 to n { sp´(i,x) = 0 for every character x; }
For j = n downto 2 {
    i = j + Zj(P) – 1;
    x = P(Zj + 1);
    sp´(i,x) = Zj; }
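A sketch of this table construction in Python. The names `z_values` and `sp_prime_by_char` are illustrative, the table is stored as one dictionary per position (a missing entry stands for 0), and positions with Z_j = 0 are skipped since they would only write zeros; these representation choices are assumptions of the sketch, not part of the slides.

```python
def z_values(pattern: str) -> list[int]:
    """Standard Z algorithm; z[j] is for 0-based position j, z[0] unused."""
    n = len(pattern)
    z = [0] * n
    left = right = 0
    for j in range(1, n):
        if j < right:
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and pattern[z[j]] == pattern[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return z


def sp_prime_by_char(pattern: str) -> list[dict[str, int]]:
    """table[i][x] = sp'_(i,x) for 1-based i; absent keys mean 0."""
    n = len(pattern)
    z = z_values(pattern)
    table: list[dict[str, int]] = [dict() for _ in range(n + 1)]
    for j in range(n, 1, -1):               # j = n downto 2
        zj = z[j - 1]
        if zj == 0:
            continue                        # would only record a zero entry
        i = j + zj - 1                      # right end of the Z-box starting at j
        x = pattern[zj]                     # P(Z_j + 1): character after the prefix
        table[i][x] = zj                    # smaller j (longer match) overwrites later
    return table


table = sp_prime_by_char("abcxabcde")
print(table[7])   # {'x': 3}
```

For P = abcxabcde, a mismatch at position 8 against the text character x gives sp´(7,x) = 3, so the shifted P has P(4) = x aligned with that text character; any other mismatching character yields 0.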