Set and String Problems
• Sets and strings both represent collections of
  objects.
• The difference is whether order matters.
• Sets are collections of symbols whose order is
  assumed to carry no significance.
• Strings are defined by the sequence or
  arrangement of symbols.
Set and String Problems
•     I will discuss four subjects:
    1- Set Cover
    2- Set Packing
    3- String Matching
    4- Approximate String Matching
Set Cover

• Input description: A collection of subsets S =
  {S1, . . . , Sm} of the universal set U = {1, . . . , n}.
• Problem description: What is the smallest
  subset T of S whose union equals the universal
  set, i.e., $\bigcup_{i=1}^{|T|} T_i = U$?
Set Cover

• Example:
  – U = {a, b, c, d, e}
  – S = {S1, S2, S3, S4}
  – |T|=2
  – S1 = {a, b, c}
  – S2 = {b, c, d}
  – S3 = {d, e}
  – S4 = {a, c}
  – T = {S1, S3}
Set Cover

Are you allowed to cover elements more
 than once?
    • The distinction here is between set cover and set
      packing.
        Set cover: elements may be covered more than
        once.
        Set packing: elements may not be covered more
        than once.
Set Cover

Are your sets derived from the edges or
 vertices of a graph?
  – Set cover is a very general problem, and
   includes several useful graph problems as
   special cases.
          » vertex cover.
Set Cover & Vertex Cover
– U = {a, b, c, d, e}
– S1 = {a, b}
– S2 = {a}
– S3 = {d, e}
– S4 = {c, e}
– S5 = {b, c, d}
[Figure: graph with vertices S1 to S5 and edges a to e; each set Si contains the edges incident to vertex Si.]
– O(log n).
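A minimal sketch of how a vertex cover instance turns into a set cover instance: the universe is the set of edges, and each vertex contributes the subset of edges that touch it. The edge endpoints below are not stated on the slide; they are inferred from the sets, since each edge label appears in exactly two of S1 to S5. The function name is hypothetical.

```python
from collections import defaultdict

# Edge endpoints inferred from the slide's sets (an assumption):
# an edge joins the two vertices whose sets both contain its label.
EDGES = {
    "a": ("S1", "S2"),
    "b": ("S1", "S5"),
    "c": ("S4", "S5"),
    "d": ("S3", "S5"),
    "e": ("S3", "S4"),
}


def vertex_cover_as_set_cover(edges):
    """Return (universe, subsets) with one subset of incident edges per vertex."""
    subsets = defaultdict(set)
    for label, (u, v) in edges.items():
        subsets[u].add(label)
        subsets[v].add(label)
    return set(edges), dict(subsets)


universe, subsets = vertex_cover_as_set_cover(EDGES)
```

A set cover of this instance picks vertices so that every edge is touched, which is exactly a vertex cover of the graph.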
Set Cover & Greedy
•    Greedy is the most natural and effective
     heuristic for set cover.
    1.   Begin by selecting the largest subset for the cover, and
         then delete all its elements from the universal set.
    2.   Add the subset containing the largest number of remaining
         uncovered elements.
    3.   Repeat until all elements are covered.
    – This heuristic always gives a set cover, using at most ln n
      times as many sets as the optimal cover: an O(ln n) approximation.
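The greedy heuristic can be sketched in a few lines. This is an illustrative version, not a tuned implementation: the function name and the dictionary-of-sets input format are assumptions, and ties between equally good subsets are broken by dictionary order.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset covering the most uncovered elements.

    `subsets` maps a name to a set of elements. Returns the chosen names
    in selection order; raises ValueError if the subsets cannot cover
    the universe.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first best candidate, so ties follow dict order.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# The example instance from the Set Cover slide.
U = {"a", "b", "c", "d", "e"}
S = {
    "S1": {"a", "b", "c"},
    "S2": {"b", "c", "d"},
    "S3": {"d", "e"},
    "S4": {"a", "c"},
}
cover = greedy_set_cover(U, S)  # ["S1", "S3"]
```

On this instance the greedy choice happens to be optimal; in general it is only guaranteed to stay within the logarithmic factor.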
Set Packing
• Input description: A set of subsets S =
  {S1, . . . , Sm} of the universal set U = {1, . . . ,
  n}.
• Problem description: Select (an ideally small)
  collection of mutually disjoint subsets from S
  whose union is the universal set.
Set Packing
Must every element appear in exactly one
 selected subset?
    • In the exact cover problem, we seek some collection of
      subsets such that each element is covered exactly once.
      The airplane scheduling problem has the flavor of exact
      covering, since every plane and crew has to be
      employed.
Set Packing
• Example:
  – U = {a, b, c, d, e}
  – S = {S1, S2, S3, S4}
  – |T|=2
  – S1 = {a, b, c}
  – S2 = {b, c, d}
  – S3 = {d, e}
  – S4 = {a, c}
  – T = {S1, S3}
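For small instances like the example above, a collection of mutually disjoint subsets whose union is U can be found by plain backtracking. The sketch below is one simple way to do it, with a hypothetical function name; it returns the first solution it finds (not necessarily the smallest) and takes exponential time in the worst case.

```python
def exact_cover(universe, subsets):
    """Return names of mutually disjoint subsets whose union is `universe`,
    or None if no such collection exists (simple backtracking search)."""
    names = sorted(subsets)

    def search(remaining, start):
        if not remaining:
            return []
        for k in range(start, len(names)):
            s = subsets[names[k]]
            # A usable subset is non-empty and lies entirely inside the
            # still-uncovered elements, so chosen subsets never overlap.
            if s and s <= remaining:
                rest = search(remaining - s, k + 1)
                if rest is not None:
                    return [names[k]] + rest
        return None

    return search(frozenset(universe), 0)


U = {"a", "b", "c", "d", "e"}
S = {
    "S1": {"a", "b", "c"},
    "S2": {"b", "c", "d"},
    "S3": {"d", "e"},
    "S4": {"a", "c"},
}
packing = exact_cover(U, S)  # ["S1", "S3"]
```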
String Matching
• Input description: A text string t of length
  n. A pattern string p of length m.
• Problem description: Find the first (or all)
  instances of pattern p in the text.
String Matching
• Difference:
  – String matching: matching without error.
  – Approximate string matching: matching
    with error.


 Spelling checkers scan an input text for
 words appearing in the dictionary and
 reject any strings that do not match.
String Matching
• Applications:
   – Searching keywords in a file.
   – Search engines (like Google and Openfind).
   – Database searching (GenBank).
• History of String Search
   – The brute force algorithm:
       • Invented at the dawn of computer history.
       • Re-invented many times, still common.
       • Worst case: O(mn).

   – KMP algorithm:
       • Proposed by Knuth, Morris and Pratt in 1977.
       • O(m+n) .
   – Boyer-Moore Algorithm:
       • Proposed by Boyer and Moore in 1977.
       • Best case: O(n/m).
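As a baseline for the faster algorithms, here is a sketch of the brute force search: try every alignment of the pattern and compare character by character. The function name is hypothetical; t and p follow the slides' notation for text and pattern.

```python
def brute_force_search(t, p):
    """Return every index i at which pattern p occurs in text t."""
    n, m = len(t), len(p)
    matches = []
    for i in range(n - m + 1):        # each possible alignment of p in t
        j = 0
        while j < m and t[i + j] == p[j]:
            j += 1
        if j == m:                    # all m characters matched
            matches.append(i)
    return matches
```

Each of the n - m + 1 alignments can cost up to m comparisons, which is where the O(mn) worst case comes from.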
Boyer-Moore
  • Compares the pattern with the text from right to left.
  • Boyer-Moore (Example)
   t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10]
   A     B     C     E     F     G     A     B     C     D     E
   p[0]  p[1]  p[2]  p[3]
   A     B     C     D
                     N
There is no E in the pattern, so the pattern cannot match at any alignment in
which one of its characters lies under t[3]. So, move four boxes to the right.
Boyer-Moore
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10]
A     B     C     E     F     G     A     B     C     D     E
                        p[0]  p[1]  p[2]  p[3]
                        A     B     C     D
                                          N
Again, no match: t[7] = B differs from p[3] = D. But B occurs in the pattern
at p[1], so move two boxes to the right to align them.
Boyer-Moore
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10]
A     B     C     E     F     G     A     B     C     D     E
                                    p[0]  p[1]  p[2]  p[3]
                                    A     B     C     D
                                    Y     Y     Y     Y
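The shifts in this example depend only on the text character under the last pattern position, so they can be reproduced with Horspool's simplification of Boyer-Moore, which keeps just a bad-character table. The sketch below is that simplified variant, not the full Boyer-Moore algorithm (which also uses a good-suffix rule); the function name is hypothetical.

```python
def horspool_search(t, p):
    """Return all match positions, comparing right to left and shifting
    with a bad-character table (Horspool's simplification)."""
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    # shift[c]: distance from the last occurrence of c in p[:-1] to the
    # end of the pattern. Later occurrences overwrite earlier ones.
    shift = {c: m - 1 - k for k, c in enumerate(p[:-1])}
    matches = []
    i = 0  # index in t where p[0] is currently aligned
    while i <= n - m:
        j = m - 1
        while j >= 0 and t[i + j] == p[j]:
            j -= 1
        if j < 0:
            matches.append(i)
        # Use the text character under p[m-1]; characters that do not
        # occur in p[:-1] allow a full shift of m boxes.
        i += shift.get(t[i + m - 1], m)
    return matches


positions = horspool_search("ABCEFGABCDE", "ABCD")  # [6]
```

On the slide's text the alignments visited are 0, 4 and 6: a shift of four for E, then a shift of two for B, then the match.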
Knuth-Morris-Pratt
• searches for occurrences of a "word" W
  within a main "text string" S
• Bypasses re-examination of previously
  matched characters.
Knuth-Morris-Pratt
                     (Example)
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10] t[11] t[12] t[13]
A     B     C           A     B     C     D     A     B           A     B     C
p[0]  p[1]  p[2]  p[3]  p[4]  p[5]  p[6]
A     B     C     D     A     B     D
Y     Y     Y     N

    m = 0 (the pattern starts at t[0]; t[3] and t[10] are spaces)
Knuth-Morris-Pratt
                     (Example)
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10] t[11] t[12] t[13]
A     B     C           A     B     C     D     A     B           A     B     C
                        p[0]  p[1]  p[2]  p[3]  p[4]  p[5]  p[6]
                        A     B     C     D     A     B     D
                        Y     Y     Y     Y     Y     Y     N

 m = 4
Knuth-Morris-Pratt
                     (Example)
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10] t[11] t[12] t[13]
A     B     C           A     B     C     D     A     B           A     B     C
                                                            p[0]  p[1]  p[2]  p[3]  p[4]  p[5]  p[6]
                                                            A     B     C     D     A     B     D
                                                            N

 m = 10
Knuth-Morris-Pratt
                     (Example)
t[0]  t[1]  t[2]  t[3]  t[4]  t[5]  t[6]  t[7]  t[8]  t[9]  t[10] t[11] t[12] t[13]
A     B     C           A     B     C     D     A     B           A     B     C
                                                                  p[0]  p[1]  p[2]  ..
                                                                  A     B     C     ..
                                                                  Y     Y     Y

 m = 11
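The walkthrough above can be reproduced with a short KMP sketch. The failure table records, for each prefix of the pattern, how many already-matched characters can be reused after a mismatch, so the text is never re-read. Function names are hypothetical, and the longer text in the usage line is an assumed continuation of the slides' 14-character text, chosen so that a full match exists.

```python
def build_failure_table(p):
    """fail[j] = length of the longest proper prefix of p[:j+1]
    that is also a suffix of it."""
    fail = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[j] != p[k]:
            k = fail[k - 1]
        if p[j] == p[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(t, p):
    """Return every index at which pattern p occurs in text t."""
    if not p:
        return []
    fail = build_failure_table(p)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(t):
        while k > 0 and c != p[k]:
            k = fail[k - 1]      # fall back without moving back in t
        if c == p[k]:
            k += 1
        if k == len(p):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches


table = build_failure_table("ABCDABD")  # [0, 0, 0, 0, 1, 2, 0]
# Assumed longer text (not on the slides) so that the pattern occurs once.
found = kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD")  # [15]
```

The table is built in O(m) time and the scan takes O(n), giving the O(m+n) total.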
Approximate String Matching
• Input description: A text string t and a pattern string p.
• Problem description: What is the minimum-cost way to
  transform t to p using insertions, deletions, and
  substitutions?
Approximate String Matching


Example:
Insertion: cat → cast
Deletion: cat → at
Substitution: cat → car
Transposition: cta → cat
Approximate String Matching
• Dynamic programming provides the basic approach
   to approximate string matching. Let D[i, j] denote the cost
   of editing the first i characters of the pattern string p into
   the first j characters of the text t. The recurrence follows
   because we must have done something with the tail
   characters p_i and t_j. Our only options are
   matching or substituting one for the other, deleting p_i, or
   inserting a match for t_j. Thus, D[i, j] is the minimum of
   the costs of these possibilities:
  1. If p_i = t_j then D[i − 1, j − 1], else D[i − 1, j − 1] +
   substitution cost.
  2. D[i − 1, j] + deletion cost of p_i.
  3. D[i, j − 1] + insertion cost of t_j.
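The recurrence translates directly into a table-filling routine. This is a minimal sketch under stated assumptions: every operation costs 1 by default, and the boundary row and column (editing to or from an empty prefix) are filled with pure insertions and deletions, which the slides do not spell out. The function and parameter names are hypothetical.

```python
def edit_distance(p, t, sub_cost=1, ins_cost=1, del_cost=1):
    """Return D[m][n], the cost of editing all of p into all of t.

    D[i][j] is the cost of editing the first i characters of p into
    the first j characters of t.
    """
    m, n = len(p), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * del_cost          # delete every character of p[:i]
    for j in range(1, n + 1):
        D[0][j] = j * ins_cost          # insert every character of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = p[i - 1] == t[j - 1]
            match = D[i - 1][j - 1] + (0 if same else sub_cost)
            delete = D[i - 1][j] + del_cost
            insert = D[i][j - 1] + ins_cost
            D[i][j] = min(match, delete, insert)
    return D[m][n]


d_insert = edit_distance("cat", "cast")  # 1
d_swap = edit_distance("cta", "cat")     # 2
```

With only insertions, deletions and substitutions, the transposition cta → cat costs 2; treating a swap as a single operation would need an extra case in the recurrence.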
