RABIN KARP ALGORITHM STRING MATCHING

PRESENTED BY
ABHISHEK KUMAR SINGH
RABIN KARP
String Matching
Algorithm

 Let text string be T of length N
 Pattern string be P of length M
 Example
 T=“hello world”; N=11;
 P=“llo”; M=3
Problem Statement
H E L L O W O R L D
L L O

 The Rabin-Karp string matching algorithm calculates a hash value
for the pattern, as well as for each M-character subsequence of
given text to be compared.
 If the hash values for a particular subsequence are unequal, the
algorithm will calculate the hash value for next M-character
sequence.
 If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character sequence.
 Therefore there is only one comparison per text subsequence, and
Brute Force is only needed when hash values match.
Note: Only one comparison for sub sequence
Rabin-Karp Algorithm

 Consider an M-character sequence as an M-
digit number in the alphabet. The text
subsequence t[I ... i+M-1] is mapped to the
number
x(i) = t[i]*3^0+ t[i+1]*b^1+...+ t[i+M-1]*3^(M-
1)
 Furthermore, given x(i) we can compute x(i+1)
for the next subsequence t[i+1 .. i+M] in
constant time,as follows:
x(i+1) = (x(i)-(t[i]))/3+(3^(M-1))*t[i+M-1]
Rolling Hash Function

A B D A B C
Example
tHash = Hash(“ABD”) = 1*3^0+2*3^1+4*3^2= 43
pHash = Hash(“ABC”) = 1*3^0+2*3^1+3*3^2 = 34
tHash == pHash  FALSE
tHash = Hash(“BDA”) = 2*3^0+4*3^1+1*3^2 = 23
tHash = Hash(“DAB”) = 4*3^0+1*3^1+2*3^2 = 25
A B D A B C
A B D A B C
A B D A B C
tHash = Hash(“ABC”) = 30+30+40 = 34
tHash == pHash  TRUE

 TEXT :
 PATTERN :
HERE PATTERN MATCHES THE
SUBSTRING SO INDEX NO. 3 IS RETURNED
Incase of HIT
A B D A B C
A B C

int main()
{
string s1,s2;
cin>>s1>>s2;
int
l1=s1.size(),l2=s2.size(),
i,j,k;
int hs1=0,p=0,ar[l1];
for(i=0;i<l2;i++)
hs1+=(int)s2[i]*pow(3,i);
int hs2=0;
for(i=0;i<l1;i++)
{
if(i==0)
int c=0;
if(hs2==hs1)
{
for(j=i;j<i+l2;j++)
if(s1[j]==s2[j-
i])
c++;
else
break;
}
if(c==l2)
ar[p++]=i;
}
cout<<l1<<endl;
cout<<"Stored index

 Hash to two string match then it is called Hit
 There is possibility
 Hash of “abc” is 34
 Hash of “dga” is 34
 This is called Hash Collision
 Minimize Collision by
 Taking mod with prime number
HASH COLLISION

 Hash of Pattern
 O(m)
 Hash Comparison
 O(n-m+1) = O(n) ; m < n
 Average Running Time
 O(m+n)
 Worst Case Running Time
 m comparison in each iteration
 O(mn)
ANALYSIS

 Keyword matching in large files
 Good for Plagiarism detection
 Searching engines
 Database searching
 Bioinformatics

APPLICATION

1. CLRS – 2nd Edition, Page 911-916
2. slideshare.net/GajanandSharma1/rabin-karp-string-
matching-algorithm
3. stoimen.com/blog/2012/04/02/computer-algorithms-
rabin-karp-string-searching/
4. en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algori
thm
5. cs.princeton.edu/courses/archive/fall04/cos226/lectur
es/string.4up.pdf
References

RABIN KARP ALGORITHM STRING MATCHING

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

More from Abhishek Singh

More from Abhishek Singh (7)

Recently uploaded

Recently uploaded (20)

RABIN KARP ALGORITHM STRING MATCHING

Editor's Notes