Transcript

  • 1. Set and String Problems • Sets and strings both represent collections of objects. • The difference is whether order matters. • Sets are collections of symbols whose order is assumed to carry no significance. • Strings are defined by the sequence or arrangement of symbols.
  • 2. Set and String Problems • I will discuss four subjects: 1. Set Cover 2. Set Packing 3. String Matching 4. Approximate String Matching
  • 3. Set Cover • Input description: A collection of subsets S = {S1, . . . , Sm} of the universal set U = {1, . . . , n}. • Problem description: What is the smallest subset T of S whose union equals the universal set, i.e., $\bigcup_{i=1}^{|T|} T_i = U$?
  • 4. Set Cover • Example: – U = {a, b, c, d, e} – S = {S1, S2, S3, S4} – S1 = {a, b, c} – S2 = {b, c, d} – S3 = {d, e} – S4 = {a, c} – T = {S1, S3} – |T| = 2
  • 5. Set Cover • Are you allowed to cover elements more than once? • This is the distinction between set cover and set packing. – Set cover: elements may be covered more than once. – Set packing: no element may be covered more than once.
  • 6. Set Cover • Are your sets derived from the edges or vertices of a graph? – Set cover is a very general problem and includes several useful graph problems as special cases, such as vertex cover.
  • 7. Set Cover & Vertex Cover • [Figure: graph whose vertices are S1 to S5 and whose edges are a to e; each Si lists the edges at that vertex] – U = {a, b, c, d, e} – S1 = {a, b} – S2 = {a} – S3 = {d, e} – S4 = {c, e} – S5 = {b, c, d} – O(log n).
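The reduction on this slide can be sketched in Python. The edge endpoints below are an assumption: they are read off the subsets listed on the slide (an edge belongs to exactly the two vertices whose subsets contain it), and the function name `vertex_cover_as_set_cover` is hypothetical.

```python
def vertex_cover_as_set_cover(edges):
    """Turn a graph into a set cover instance.

    edges maps an edge label to its two endpoint vertices.
    The universe is the set of edges; each vertex becomes the
    subset of edges incident to it.
    """
    subsets = {}
    for edge, (u, v) in edges.items():
        subsets.setdefault(u, set()).add(edge)
        subsets.setdefault(v, set()).add(edge)
    return set(edges), subsets


# Assumed endpoints, inferred from the subsets S1..S5 on the slide.
slide_edges = {
    "a": ("S1", "S2"),
    "b": ("S1", "S5"),
    "c": ("S4", "S5"),
    "d": ("S3", "S5"),
    "e": ("S3", "S4"),
}
universe, subsets = vertex_cover_as_set_cover(slide_edges)
```

Under these assumed endpoints the derived subsets agree with the ones on the slide, so any set cover of U corresponds to a vertex cover of the graph.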
  • 8. Set Cover & Greedy • Greedy is the most natural and effective heuristic for set cover. 1. Begin by selecting the largest subset for the cover. 2. Delete all its elements from the universal set. 3. Repeatedly add the subset containing the largest number of remaining uncovered elements until all are covered. This heuristic always gives a set cover using at most about ln n times as many sets as optimal. – O(ln n).
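The greedy steps above can be sketched as follows. This is a minimal illustration, not a tuned implementation; the function name `greedy_set_cover` and the dictionary layout are assumptions, and ties are broken by taking the first subset in insertion order.

```python
def greedy_set_cover(universe, subsets):
    """Greedy heuristic: repeatedly take the subset that covers
    the most still-uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # max() returns the first maximal key, so ties favour earlier subsets.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= subsets[best]
    return cover


# The example instance from the Set Cover slide.
example_subsets = {
    "S1": {"a", "b", "c"},
    "S2": {"b", "c", "d"},
    "S3": {"d", "e"},
    "S4": {"a", "c"},
}
print(greedy_set_cover("abcde", example_subsets))  # ['S1', 'S3']
```

On this instance the greedy choice happens to be optimal; in general it is only guaranteed to be within a logarithmic factor.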
  • 9. Set Packing • Input description: A set of subsets S = {S1, . . . , Sm} of the universal set U = {1, . . . , n}. • Problem description: Select an (ideally small) collection of mutually disjoint subsets from S whose union is the universal set.
  • 10. Set Packing • Must every element appear in exactly one selected subset? • In exact covering, we seek some collection of subsets such that each element is covered exactly once. Airplane scheduling has the flavor of exact covering, since every plane and crew has to be employed.
  • 11. Set Packing • Example: – U = {a, b, c, d, e} – S = {S1, S2, S3, S4} – S1 = {a, b, c} – S2 = {b, c, d} – S3 = {d, e} – S4 = {a, c} – T = {S1, S3} – |T| = 2
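A small backtracking search illustrates the "covered exactly once" requirement on the example above. This is a sketch under stated assumptions: the function name `exact_cover` is hypothetical, it returns the first solution found rather than the smallest, and it takes exponential time in the worst case.

```python
def exact_cover(universe, subsets):
    """Return names of mutually disjoint subsets whose union is the
    universe, or None if no such collection exists."""
    names = sorted(subsets)

    def solve(remaining):
        if not remaining:
            return []
        target = min(remaining)  # some element that still needs covering
        for name in names:
            s = subsets[name]
            # s must cover the target and must not touch covered elements.
            if target in s and s <= remaining:
                rest = solve(remaining - s)
                if rest is not None:
                    return [name] + rest
        return None

    return solve(frozenset(universe))


packing_subsets = {
    "S1": {"a", "b", "c"},
    "S2": {"b", "c", "d"},
    "S3": {"d", "e"},
    "S4": {"a", "c"},
}
print(exact_cover("abcde", packing_subsets))  # ['S1', 'S3']
```

S2 is rejected after S1 is chosen because it overlaps S1 in b and c, which is exactly the restriction that separates packing from covering.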
  • 12. String Matching • Input description: A text string t of length n and a pattern string p of length m. • Problem description: Find the first (or all) instances of pattern p in the text.
  • 13. String Matching • Difference: – String matching: matching without error. – Approximate string matching: matching with error. For example, spelling checkers scan an input text for words appearing in the dictionary and reject any strings that do not match.
  • 14. String Matching • Applications: – Searching for keywords in a file. – Search engines (like Google and Openfind). – Database searching (GenBank). • History of String Search – The brute force algorithm: • invented at the dawn of computer history • re-invented many times, still common • worst case O(m*n) – KMP algorithm: • proposed by Knuth, Morris and Pratt in 1977 • O(m+n) – Boyer-Moore algorithm: • proposed by Boyer and Moore in 1977 • O(n/m) in the best case.
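For reference, the brute force algorithm mentioned above can be sketched in a few lines. The function name `brute_force_search` is hypothetical; the sketch returns the first match position or -1.

```python
def brute_force_search(text, pattern):
    """Try every alignment of pattern against text, left to right.
    Worst case O(m*n) character comparisons."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start  # all m characters matched at this alignment
    return -1


print(brute_force_search("ABCEFGABCDE", "ABCD"))  # 6
```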
  • 15. Boyer-Moore • Compares right to left. • Example (text t[0]..t[10], pattern p[0]..p[3]; Y = match, N = mismatch):
        text:    A B C E F G A B C D E
        pattern: A B C D
        result:        N
    There is no E in the pattern, so the pattern cannot match at any alignment that overlaps t[3]. So, move four boxes to the right.
  • 16. Boyer-Moore
        text:    A B C E F G A B C D E
        pattern:         A B C D
        result:                N
    Again, no match. But there is a B in the pattern. So move two boxes to the right.
  • 17. Boyer-Moore
        text:    A B C E F G A B C D E
        pattern:             A B C D
        result:              Y Y Y Y
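The shifts in this example follow the bad-character rule, which can be sketched as below. This is a simplified variant, stated as an assumption: the full Boyer-Moore algorithm also uses a good-suffix rule, which is omitted here, and the function name `bad_char_search` is hypothetical.

```python
def bad_char_search(text, pattern):
    """Boyer-Moore style search using only the bad-character rule.
    Compares right to left and returns the first match position or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Rightmost position of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[start + j]:
            j -= 1
        if j < 0:
            return start
        # Align the mismatched text character with its rightmost
        # occurrence in the pattern, or jump past it if it is absent.
        bad = text[start + j]
        start += max(1, j - last.get(bad, -1))
    return -1


print(bad_char_search("ABCEFGABCDE", "ABCD"))  # 6
```

On the slide's text the alignments visited are 0, 4 and 6: a shift of four because E is absent from the pattern, then a shift of two because B sits at p[1].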
  • 18. Knuth-Morris-Pratt • Searches for occurrences of a "word" W within a main "text string" S. • Bypasses re-examination of previously matched characters.
  • 19. Knuth-Morris-Pratt (Example) • Text t[0]..t[13], pattern p[0]..p[6]; m is the position in the text where the pattern is aligned; _ marks a blank in the text (t[3] and t[10]); Y = match, N = mismatch.
        text:    A B C _ A B C D A B _ A B C
        pattern: A B C D A B D
        result:  Y Y Y N                        m = 0
  • 20. Knuth-Morris-Pratt (Example)
        text:    A B C _ A B C D A B _ A B C
        pattern:         A B C D A B D
        result:          Y Y Y Y Y Y N          m = 4
  • 21. Knuth-Morris-Pratt (Example) • After the mismatch at t[10], the matched prefix AB is reused (alignment m = 8), but p[2] = C also fails against the blank, so the pattern moves on to m = 10.
        text:    A B C _ A B C D A B _ A B C
        pattern:                     A B C D A B D
        result:                      N          m = 10
  • 22. Knuth-Morris-Pratt (Example)
        text:    A B C _ A B C D A B _ A B C
        pattern:                       A B C ..
        result:                        Y Y Y    m = 11
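A compact sketch of KMP in Python, assuming the usual prefix (failure) table formulation; the names `prefix_table` and `kmp_search` are hypothetical. The text in the usage line is the slide's text with blanks at t[3] and t[10], which is itself a reading of the flattened example.

```python
def prefix_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the first match position or -1, never moving backwards
    in the text. Runs in O(m + n)."""
    if not pattern:
        return 0
    fail = prefix_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter matched prefix
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(prefix_table("ABCDABD"))                   # [0, 0, 0, 0, 1, 2, 0]
print(kmp_search("ABC ABCDAB ABC", "ABCDABD"))   # -1
```

The table entry 2 at p[5] is what lets the search reuse the matched prefix AB after the mismatch at t[10] instead of restarting from scratch.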
  • 23. Approximate String Matching • Input description: A text string t and a pattern string p. • Problem description: What is the minimum-cost way to transform t to p using insertions, deletions, and substitutions?
  • 24. Approximate String Matching • Example: – Insertion: cat → cast – Deletion: cat → at – Substitution: cat → car – Transposition: cta → cat
  • 25. Approximate String Matching • Dynamic programming provides the basic approach to approximate string matching. Let D[i, j] denote the cost of editing the first i characters of the pattern string p into the first j characters of the text t. The recurrence follows because we must have done something with the tail characters p_i and t_j. Our only options are matching or substituting one for the other, deleting p_i, or inserting a match for t_j. Thus, D[i, j] is the minimum of the costs of these possibilities: 1. If p_i = t_j then D[i − 1, j − 1], else D[i − 1, j − 1] + substitution cost. 2. D[i − 1, j] + deletion cost of p_i. 3. D[i, j − 1] + deletion cost of t_j (that is, the cost of inserting a match for t_j).
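The recurrence can be sketched directly as a table fill. Several things here are assumptions not stated on the slide: every operation costs 1, the boundary rows are D[i, 0] = i and D[0, j] = j (whole-string edit distance), and transposition is not a separate operation. The function name `edit_distance` is hypothetical.

```python
def edit_distance(p, t):
    """Minimum number of insertions, deletions and substitutions
    needed to turn p into t, assuming unit costs."""
    m, n = len(p), len(t)
    # D[i][j] = cost of editing the first i chars of p into the first j of t.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i  # delete all i characters of p
    for j in range(1, n + 1):
        D[0][j] = j  # insert all j characters of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match_or_sub = D[i - 1][j - 1] + (0 if p[i - 1] == t[j - 1] else 1)
            delete_p = D[i - 1][j] + 1
            insert_t = D[i][j - 1] + 1
            D[i][j] = min(match_or_sub, delete_p, insert_t)
    return D[m][n]


print(edit_distance("cat", "cast"))  # 1
print(edit_distance("cta", "cat"))   # 2
```

Because transposition is not modeled, cta → cat costs 2 here (two substitutions) rather than 1.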