Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Rabin karp string matching algorithm


Published on

This presentation gives the basic idea about the string matching problem and describes Rabin Karp String Matching algorithm.

Published in: Education
  • Dating for everyone is here: ❶❶❶ ❶❶❶
    Are you sure you want to  Yes  No
    Your message goes here
  • Dating direct: ❤❤❤ ❤❤❤
    Are you sure you want to  Yes  No
    Your message goes here

Rabin karp string matching algorithm

  1. 1. Rabin-Karp String Matching Algorithm -Gajanand Sharma U V C E B A N G A L O R E B A N G
  2. 2. Let a text string T of length n and a pattern string P of length m are given such that m<=n, then the string matching problem is to find all occurrences of P in T. Example- T = “KARNATAKA” and P=“NAT” Applications: • Searching keywords in a file • Searching engines • Database searching Problem Statement…
  3. 3. T : Given text or String e.g. – “JAIPURISCALLEDPARISOFINDIA” |T| : 26, length of T Substring: Ti,j=TiT i+1…Tj e.g. – “PARIS” Subsequence of T: deleting zero or more characters from T e.g. –“RISALL” and “JISINA” Prefix of T: T1,k e.g. –“JAIPUR” is a prefix of T. Suffix of T: Th,|T| e.g. –“INDIA” is a suffix of T. Notations…
  4. 4. The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for each M-character subsequence of given text to be compared. If the hash values for a particular subsequence are unequal, the algorithm will calculate the hash value for next M-character sequence. If the hash values are equal, the algorithm will do a Brute Force comparison between the pattern and the M-character sequence. Therefore there is only one comparison per text subsequence, and Brute Force is only needed when hash values match. Rabin-Karp Algorithm…
  5. 5. String:- LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLM Pattern:-LLLLLM Let Hash Value of “LLLLLL”= 0; And Hash Value of “LLLLLM”= 1; Step-1: LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLM LLLLLM ------------ 0 != 1 (One Comparison) Step-2: LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLM LLLLLM ------------ 0 != 1(One Comparison) :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: :::: Step-N: LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLM LLLLLM --------- 1 = 1 Hash Value is matching so compare elements one by one.- (6+1 Comparisons) Rabin-Karp Example…
  6. 6. Length of pattern = M; Hash(p) = hash value of pattern; Hash(t) = hash value of first M letters in body of text; do if (hash(p) == hash(t)) brute force comparison of pattern and selected section of text hash(t) = hash value of next section of text, one character over while (end of text or brute force comparison == true) Pseudo Code…
  7. 7. Let’s associate one integer with every letter of the alphabet. Hence we can say ‘A’ corresponds to 1, ‘B’ corresponds to 2 , ‘C’ corresponds to 3…… Similarly all other letters are corresponding to their index values. The Hash Value of the String “CAT” will be- 3*100 + 1*10 + 20 = 330 Calculating Hash Value… Index Position
  8. 8. If the hash value matches for two strings then it is called a ‘hit’. It may be possible that two or more different strings produce the same hash value. String 1: “CBJ” hash code=3*100 + 2*10 + 10 = 330 String 2: “CAT” hash code=3*100 + 1*10 + 20 = 330 Hence it is necessary to check whether it is a hit or not? Any hit will have to be tested to verify that it is not spurious and that p[1..m] = T[s+1..s+m] What if two values collide…
  9. 9. Let’s take an m-character sequence as an m-digit number in base b. The text subsequence t[ i .. i + m-1] is mapped to the number as follows- x(i) = t[i].𝑏 𝑚−1 + t[i+1].𝑏 𝑚−2 + t[i+2].𝑏 𝑚−3 +………..t[i+m-1] If m is very large then the hash value will be very large in size, so we can hash the value by taking mod a prime number, say q. h(i)=((t[i] 𝑏 𝑚−1 mod q) +(t[i+1] 𝑏 𝑚−2 mod q) + ...+(t[i+M-1] mod q))mod q Mathematical Function…
  10. 10. • If a large prime number is used to calculate hash function then there a very low possibility of being hashed values equal for two different patterns. • In this case searching takes O(N) time, where N is number of characters in the text body. • In worst case the time complexity may be O(MN), where M is no. of characters in the pattern. This case may occur when the prime no. chosen is very small. Complexity…