SlideShare a Scribd company logo
1 of 12
Download to read offline
Boyer-Moore
Ben Langmead
You are free to use these slides. If you do, please sign the
guestbook (www.langmead-lab.org/teaching-materials), or email
me (ben.langmead@gmail.com) and tell me briefly how you’re
using them. For original Keynote files, email me.
Department of Computer Science
Exact matching: slightly less naïve algorithm
There would have been a time for such a wordT:
P: word
word
We match w and o, then mismatch (r ≠ u)
There would have been a time for such a wordT:
P: word
word
word
word
word
skip!
skip!
... since u doesn’t occur in P, we can skip the next two alignments
Mismatched text character (u) doesn’t occur in P
Boyer-Moore
Use knowledge gained from character comparisons to skip future
alignments that definitely won’t match:
1. If we mismatch, use knowledge of the
mismatched text character to skip alignments
2. If we match some characters, use knowledge
of the matched characters to skip alignments
3. Try alignments in one direction, then try
character comparisons in opposite direction
Boyer, RS and Moore, JS. "A fast string searching algorithm." Communications of the ACM 20.10 (1977): 762-772.
“Bad character rule”
“Good suffix rule”
For longer skips
Boyer-Moore: Bad character rule
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Upon mismatch, let b be the mismatched character in T. Skip
alignments until (a) b matches its opposite in P, or (b) P moves past b.
Step 1:
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Step 2:
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Step 3:
(etc)
Case (a)
Case (b)
b
b
Boyer-Moore: Bad character rule
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Step 1:
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Step 2:
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Step 3:
We skipped 8 alignments
In fact, there are 5 characters in T we never looked at
Boyer-Moore: Bad character rule preprocessing
As soon as P is known, build a | Σ |-by-n table. Say b is the character in T that
mismatched and i is the mismatch’s offset into P. The number of skips is given
by element in bth row and ith column.
Gusfield 2.2.2 gives space-efficient alternative.
T:
P:
G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
C C T T T T G C
Boyer-Moore: Good suffix rule
Let t be the substring of T that matched a suffix of P. Skip alignments
until (a) t matches opposite characters in P, or (b) a prefix of P
matches a suffix of t, or (c) P moves past t, whichever happens first
T:
P:
C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
C T T A C T T A C
Step 1:
t
T:
P:
C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
C T T A C T T A C
Step 2:
T:
P:
C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
C T T A C T T A C
Step 3:
Case (a)
Case (b)
t
Boyer-Moore: Good suffix rule
Like with the bad character rule, the number of skips possible using the good
suffix rule can be precalculated into a few tables (Gusfield 2.2.4 and 2.2.5)
Rule on previous slide is the weak good suffix rule; there is also a strong good
suffix rule (Gusfield 2.2.3)
T:
P:
C T T G C C T A C T T A C T T A C T
C T T A C T T A C
t
C T T A C T T A C
C T T A C T T A C
Weak:
Strong:
With the strong good suffix rule (and other minor modifications), Boyer-Moore
is O(m) worst-case time. Gusfield discusses proof.
guaranteed
mismatch!
Boyer-Moore: Putting it together
After each alignment, use bad character or good suffix rule, whichever skips more
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 1:
bc: 6, gs: 0
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 2:
bc: 0, gs: 2
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 3:
bc: 2, gs: 7
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 4:
Bad character rule:
Upon mismatch, let b be the mismatched
character in T. Skip alignments until (a) b
matches its opposite in P, or (b) P moves past b.
Part (a) of good
suffix rule
Part (b) of good
suffix rule
Part (a) of bad
character rule
Good suffix rule:
Let t be the substring of T that matched a suffix of P.
Skip alignments until (a) t matches opposite characters
in P, or (b) a prefix of P matches a suffix of t, or (c) P
moves past t, whichever happens first.
b
b
b
t
t
Boyer-Moore: Putting it together
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 1:
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 2:
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 3:
T:
P:
G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
G T A G C G G C G
Step 4:
Up to now: 15 alignments skipped, 11 text characters never examined
Boyer-Moore: Worst and best cases
Boyer-Moore (or a slight variant) is O(m) worst-case time
What’s the best case?
Every character comparison is a mismatch, and bad character
rule always slides P fully past the mismatch
How many character comparisons? floor(m / n)
Contrast with naive algorithm
Performance comparison
Naïve matchingNaïve matching Boyer-MooreBoyer-Moore
# character
comparisons
wall clock time
# character
comparisons
wall clock time
P:“tomorrow”
T: Shakespeare’s
complete works
P: 50 nt string
from Alu repeat*
T: Human
reference (hg19)
chromosome 1
Comparing simple Python implementations of naïve exact matching and
Boyer-Moore exact matching:
* GCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGG
336 matches
| T | = 249 M
17 matches
| T | = 5.59 M
5,906,125 2.90 s 785,855 1.54 s
307,013,905 137 s 32,495,111 55 s

More Related Content

What's hot

Variants of Turing Machine
Variants of Turing MachineVariants of Turing Machine
Variants of Turing MachineRajendran
 
Pattern matching
Pattern matchingPattern matching
Pattern matchingshravs_188
 
Formal Languages and Automata Theory Unit 1
Formal Languages and Automata Theory Unit 1Formal Languages and Automata Theory Unit 1
Formal Languages and Automata Theory Unit 1Srimatre K
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)Aditya pratap Singh
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSubarno Pal
 
DSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsDSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsRESHAN FARAZ
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by FoysalFoysal Mahmud
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithmKamal Nayan
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expressionvaluebound
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort AlgorithmLemia Algmri
 
Algorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IAlgorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IMohamed Loey
 
Data structure - Graph
Data structure - GraphData structure - Graph
Data structure - GraphMadhu Bala
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment AnalysisRupak Roy
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Operator precedance parsing
Operator precedance parsingOperator precedance parsing
Operator precedance parsingsanchi29
 

What's hot (20)

Variants of Turing Machine
Variants of Turing MachineVariants of Turing Machine
Variants of Turing Machine
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
Pattern matching
Pattern matchingPattern matching
Pattern matching
 
Formal Languages and Automata Theory Unit 1
Formal Languages and Automata Theory Unit 1Formal Languages and Automata Theory Unit 1
Formal Languages and Automata Theory Unit 1
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
DSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) QuestionsDSA (Data Structure and Algorithm) Questions
DSA (Data Structure and Algorithm) Questions
 
8 issues in pos tagging
8 issues in pos tagging8 issues in pos tagging
8 issues in pos tagging
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithm
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
 
Algorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IAlgorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms I
 
Turing machine
Turing machineTuring machine
Turing machine
 
Theory of Computation Unit 2
Theory of Computation Unit 2Theory of Computation Unit 2
Theory of Computation Unit 2
 
Data structure - Graph
Data structure - GraphData structure - Graph
Data structure - Graph
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Operator precedance parsing
Operator precedance parsingOperator precedance parsing
Operator precedance parsing
 

Viewers also liked

Viewers also liked (6)

Plasmids
PlasmidsPlasmids
Plasmids
 
Plasmids
PlasmidsPlasmids
Plasmids
 
Plasmids
PlasmidsPlasmids
Plasmids
 
Gene cloning
Gene cloningGene cloning
Gene cloning
 
Restriction
RestrictionRestriction
Restriction
 
PCR and its types
PCR and  its typesPCR and  its types
PCR and its types
 

More from akruthi k

Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introductionakruthi k
 
Pattern matching programs
Pattern matching programsPattern matching programs
Pattern matching programsakruthi k
 
Physical layer overview
Physical layer overviewPhysical layer overview
Physical layer overviewakruthi k
 
802.11 mgt-opern
802.11 mgt-opern802.11 mgt-opern
802.11 mgt-opernakruthi k
 
Wired equivalent privacy (wep)
Wired equivalent privacy (wep)Wired equivalent privacy (wep)
Wired equivalent privacy (wep)akruthi k
 

More from akruthi k (10)

Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introduction
 
Pattern matching programs
Pattern matching programsPattern matching programs
Pattern matching programs
 
Kmp
KmpKmp
Kmp
 
Physical layer overview
Physical layer overviewPhysical layer overview
Physical layer overview
 
Fhss
FhssFhss
Fhss
 
Dsss phy
Dsss phyDsss phy
Dsss phy
 
802.11 mgt-opern
802.11 mgt-opern802.11 mgt-opern
802.11 mgt-opern
 
802.11i
802.11i802.11i
802.11i
 
802.1x
802.1x802.1x
802.1x
 
Wired equivalent privacy (wep)
Wired equivalent privacy (wep)Wired equivalent privacy (wep)
Wired equivalent privacy (wep)
 

Recently uploaded

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 

Recently uploaded (20)

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 

Boyer moore

  • 1. Boyer-Moore Ben Langmead You are free to use these slides. If you do, please sign the guestbook (www.langmead-lab.org/teaching-materials), or email me (ben.langmead@gmail.com) and tell me briefly how you’re using them. For original Keynote files, email me. Department of Computer Science
  • 2. Exact matching: slightly less naïve algorithm There would have been a time for such a wordT: P: word word We match w and o, then mismatch (r ≠ u) There would have been a time for such a wordT: P: word word word word word skip! skip! ... since u doesn’t occur in P, we can skip the next two alignments Mismatched text character (u) doesn’t occur in P
  • 3. Boyer-Moore Use knowledge gained from character comparisons to skip future alignments that definitely won’t match: 1. If we mismatch, use knowledge of the mismatched text character to skip alignments 2. If we match some characters, use knowledge of the matched characters to skip alignments 3. Try alignments in one direction, then try character comparisons in opposite direction Boyer, RS and Moore, JS. "A fast string searching algorithm." Communications of the ACM 20.10 (1977): 762-772. “Bad character rule” “Good suffix rule” For longer skips
  • 4. Boyer-Moore: Bad character rule T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Upon mismatch, let b be the mismatched character in T. Skip alignments until (a) b matches its opposite in P, or (b) P moves past b. Step 1: T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Step 2: T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Step 3: (etc) Case (a) Case (b) b b
  • 5. Boyer-Moore: Bad character rule T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Step 1: T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Step 2: T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C Step 3: We skipped 8 alignments In fact, there are 5 characters in T we never looked at
  • 6. Boyer-Moore: Bad character rule preprocessing As soon as P is known, build a | Σ |-by-n table. Say b is the character in T that mismatched and i is the mismatch’s offset into P. The number of skips is given by element in bth row and ith column. Gusfield 2.2.2 gives space-efficient alternative. T: P: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A C C T T T T G C
  • 7. Boyer-Moore: Good suffix rule Let t be the substring of T that matched a suffix of P. Skip alignments until (a) t matches opposite characters in P, or (b) a prefix of P matches a suffix of t, or (c) P moves past t, whichever happens first T: P: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A C T T A C T T A C Step 1: t T: P: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A C T T A C T T A C Step 2: T: P: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A C T T A C T T A C Step 3: Case (a) Case (b) t
  • 8. Boyer-Moore: Good suffix rule Like with the bad character rule, the number of skips possible using the good suffix rule can be precalculated into a few tables (Gusfield 2.2.4 and 2.2.5) Rule on previous slide is the weak good suffix rule; there is also a strong good suffix rule (Gusfield 2.2.3) T: P: C T T G C C T A C T T A C T T A C T C T T A C T T A C t C T T A C T T A C C T T A C T T A C Weak: Strong: With the strong good suffix rule (and other minor modifications), Boyer-Moore is O(m) worst-case time. Gusfield discusses proof. guaranteed mismatch!
  • 9. Boyer-Moore: Putting it together After each alignment, use bad character or good suffix rule, whichever skips more T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 1: bc: 6, gs: 0 T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 2: bc: 0, gs: 2 T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 3: bc: 2, gs: 7 T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 4: Bad character rule: Upon mismatch, let b be the mismatched character in T. Skip alignments until (a) b matches its opposite in P, or (b) P moves past b. Part (a) of good suffix rule Part (b) of good suffix rule Part (a) of bad character rule Good suffix rule: Let t be the substring of T that matched a suffix of P. Skip alignments until (a) t matches opposite characters in P, or (b) a prefix of P matches a suffix of t, or (c) P moves past t, whichever happens first. b b b t t
  • 10. Boyer-Moore: Putting it together T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 1: T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 2: T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 3: T: P: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A G T A G C G G C G Step 4: Up to now: 15 alignments skipped, 11 text characters never examined
  • 11. Boyer-Moore: Worst and best cases Boyer-Moore (or a slight variant) is O(m) worst-case time What’s the best case? Every character comparison is a mismatch, and bad character rule always slides P fully past the mismatch How many character comparisons? floor(m / n) Contrast with naive algorithm
  • 12. Performance comparison Naïve matchingNaïve matching Boyer-MooreBoyer-Moore # character comparisons wall clock time # character comparisons wall clock time P:“tomorrow” T: Shakespeare’s complete works P: 50 nt string from Alu repeat* T: Human reference (hg19) chromosome 1 Comparing simple Python implementations of naïve exact matching and Boyer-Moore exact matching: * GCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGG 336 matches | T | = 249 M 17 matches | T | = 5.59 M 5,906,125 2.90 s 785,855 1.54 s 307,013,905 137 s 32,495,111 55 s