Trie Data Structure
LINK: https://leetcode.com/tag/trie/
Easy:
1. Longest Word in Dictionary
Medium:
1. Count Substrings That Differ by One Character
2. Replace Words
3. Top K Frequent Words
4. Maximum XOR of Two Numbers in an Array
5. Map Sum Pairs
Hard:
1. Concatenated Words
2. Word Search II
string searching algorithms. Given two strings P and T over the same alphabet E, determine whether P occurs as a substring in T (or find in which position(s) P occurs as a substring in T). The strings P and T are called pattern and target respectively.
Algorithms Discussed
Knuth–Morris–Pratt algorithm
Boyer–Moore string search algorithm
Bitap algorithm (for exact string searching)
-------------------
Checking whether two or more strings are same or not.
Finding a string (pattern) into another string (text). --> Looking for substring
Trie Data Structure
LINK: https://leetcode.com/tag/trie/
Easy:
1. Longest Word in Dictionary
Medium:
1. Count Substrings That Differ by One Character
2. Replace Words
3. Top K Frequent Words
4. Maximum XOR of Two Numbers in an Array
5. Map Sum Pairs
Hard:
1. Concatenated Words
2. Word Search II
string searching algorithms. Given two strings P and T over the same alphabet E, determine whether P occurs as a substring in T (or find in which position(s) P occurs as a substring in T). The strings P and T are called pattern and target respectively.
Algorithms Discussed
Knuth–Morris–Pratt algorithm
Boyer–Moore string search algorithm
Bitap algorithm (for exact string searching)
-------------------
Checking whether two or more strings are same or not.
Finding a string (pattern) into another string (text). --> Looking for substring
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Abstract: This paper will introduce a new class of fuzzy closed set (briefly F-CS) called fuzzy αm-closed set as well as, introduce the fuzzy αm-kernel set of the fuzzy topological space. The investigation will address and discuss some of the properties of the fuzzy separation axioms such as fuzzy αm-Ri space and fuzzy αm-Tj space (note that, the indexes i and j are natural numbers of the spaces R and T are from 0 to 3 and from 0 to 4 respectively).
Mathematics Subject Classification (2010):54A40.
The string matching problem is a classic of algorithms. In this class, we only look at the Rabin-Karpp algorithm as a classic example of the string matching algorithms
Here i discuss 3 algorithm about String matching.
Those algorithm are:
1. The naive algorithm.
2. The Rabin-Krap algorithm.
3. The Knuth-Morris-Pratt algorithm.
i hope,by readinng this slide, it is easy to undarstand those algorithm.
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Abstract: This paper will introduce a new class of fuzzy closed set (briefly F-CS) called fuzzy αm-closed set as well as, introduce the fuzzy αm-kernel set of the fuzzy topological space. The investigation will address and discuss some of the properties of the fuzzy separation axioms such as fuzzy αm-Ri space and fuzzy αm-Tj space (note that, the indexes i and j are natural numbers of the spaces R and T are from 0 to 3 and from 0 to 4 respectively).
Mathematics Subject Classification (2010):54A40.
The string matching problem is a classic of algorithms. In this class, we only look at the Rabin-Karpp algorithm as a classic example of the string matching algorithms
Here i discuss 3 algorithm about String matching.
Those algorithm are:
1. The naive algorithm.
2. The Rabin-Krap algorithm.
3. The Knuth-Morris-Pratt algorithm.
i hope,by readinng this slide, it is easy to undarstand those algorithm.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
2. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
3. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
4. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
5. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
e.g. 4, 6, 10
4 6 10
6. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• Last lecture we saw the that text indexing problem can be solved using a suffix tree
e.g. 4, 6, 10
4 6 10
which uses O(n) space (when it’s stored compacted)
7. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• Last lecture we saw the that text indexing problem can be solved using a suffix tree
• Queries take O(m + occ) time when the alphabet size is constant
- occ is the number of occurences (matches)
e.g. 4, 6, 10
4 6 10
which uses O(n) space (when it’s stored compacted)
8. Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• Last lecture we saw the that text indexing problem can be solved using a suffix tree
• Queries take O(m + occ) time when the alphabet size is constant
- occ is the number of occurences (matches)
• Suffix trees can be constructed in O(n) time (but we only saw how to achieve O(n2) time)
e.g. 4, 6, 10
4 6 10
which uses O(n) space (when it’s stored compacted)
10. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
11. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa snsuffix
1
12. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
13. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
14. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
15. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
16. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
17. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
18. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
19. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
20. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
21. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
22. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
23. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
24. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
25. The suffix array
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
26. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
27. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
28. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
29. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
30. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in the dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
31. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
If the symbols don’t have a natural order, we use their binary representation in memory
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in the dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
32. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
33. The suffix array
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
10 2 3 4 5 6
34. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
35. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
36. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
37. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
38. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
39. From Suffix Trees to Suffix Arrays
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
Suffix Tree
1 0 625 43
n
Recall that we can get get the Suffix Array from the Suffix Tree
1 3
5
0
2 4
6
10 2 3 4 5 6
using depth-first search in O(n) time
40. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
157 100 1 43 112 1396 1485 12
c db cb a c bc c d adb
41. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
c db cb a c bc c d adb
m
42. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
Find an occurence of P
using binary search
43. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences could start anywhere
Find an occurence of P
using binary search
44. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences could start anywhere
Find an occurence of P
using binary search
45. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences could start anywhere
Find an occurence of P
using binary search
46. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c d adb>
c db cb a c bc c d adb
m
c db cb
occurences could start anywhere
Find an occurence of P
using binary search
47. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c d adb>
c db cb a c bc c d adb
m
c db cb
occurences could start anywhere
Find an occurence of P
using binary search
48. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c d adb>
c db cb a c bc c d adb
m
c db cb
occurences could start anywhere
Find an occurence of P
using binary search
49. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c d adb>
c db cb a c bc c d adb
m
c db cb
occurences must start in here
Find an occurence of P
using binary search
50. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
51. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
52. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
53. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb < c bc db
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
54. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb < c bc db
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
55. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb < c bc db
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
56. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
57. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
occurences must start in here
Find an occurence of P
using binary search
58. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
=
c db cb a c bc c d adb
c db cb
m
c db cb
occurences must start in here
Find an occurence of P
using binary search
59. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
=
c db cb a c bc c d adb
c db cb
m
c db cb
occurences must start in here
we found a match!
Find an occurence of P
using binary search
60. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
m
Find an occurence of P
using binary search
61. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
How long does this take?
m
Find an occurence of P
using binary search
62. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
How long does this take?
O(m) time to compare two strings
m
Find an occurence of P
using binary search
63. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
How long does this take?
O(m) time to compare two strings
so O(m log n) time in total
m
Find an occurence of P
using binary search
64. Searching in the Suffix Array
0
1
2
3
5
6
7
8
9
10
11
12
13
14
15
4
b c dba cb a c dbc cb d a
c dba cb a c bc c d adb
db cb a c bc c d adb
d cb a c bc c d adb
d c a c bc c d adb
c a c bc c d adb
a c bc c d adb
c bc c d adb
c b c d adb
c d a
d a
a
c d ad
c d adb
b c d adb
2
5
6
8
9
12
13
14
d c a c bc c d adb
c a c bc c d adb
c b c d adb
c d a
d a
c d ad
b c dbT a cb a c dbc cb d a
n
15 7 100Suffix Array 1 43 11 2 1396 148 5 12
c db cbP
157 100 1 43 112 1396 1485 12
find
Key Idea:
c db cb a c bc c d adb
How long does this take?
O(m) time to compare two strings
so O(m log n) time in total
This method generalises to O(m log n + occ) time
m
to find all occ occurences.
Find an occurence of P
using binary search
by continuing the binary search
(we will skip the details)
65. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
P aa n
m
an
Finding an occurrence of a pattern (length m) takes O(m log n) time
Finding all occurrences takes O(m log n + occ) time
where occ is the number of occurences
66. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
P aa n
m
an
Finding an occurrence of a pattern (length m) takes O(m log n) time
Finding all occurrences takes O(m log n + occ) time
where occ is the number of occurences
This can be further improved to O(m + log n + occ) time
(using LCP queries which we will see in a future lecture)
67. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
P aa n
m
an
Finding an occurrence of a pattern (length m) takes O(m log n) time
Finding all occurrences takes O(m log n + occ) time
where occ is the number of occurences
Do we really need to build the suffix tree to construct the suffix array?
This can be further improved to O(m + log n + occ) time
(using LCP queries which we will see in a future lecture)
68. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
69. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
70. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
71. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
72. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
73. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
74. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
R1 is split into blocks of length 3
75. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
76. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
77. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
R2 is also split into blocks of length 3
78. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
79. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
80. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
81. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1
82. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2
83. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 3
84. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34
85. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 5
86. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56
87. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
88. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
we assume that the bit representation of each symbol uses O(log n) bits.
(which is a common and realistic assumption)
This can be done by sorting the blocks in O(n) time using radix sort
89. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
90. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
91. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
92. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
93. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
1 2 3 4 5 6 70
94. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
1 2 3 4 5 6 70
0 1 6 4 2 5 3 7
95. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
1 2 3 4 5 6 70
0 1 6 4 2 5 3 7
How do we compute the suffix array for R ?
Recursion!
(Notice that R has length 2n/3)
96. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
1 2 3 4 5 6 70
0 1 6 4 2 5 3 7
97. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
what use1 2 3 4 5 6 70
0 1 6 4 2 5 3 7
is this?!
98. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $R1 =
b d ob a a b b a d $R2 = $
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
Introduce a new
“filler symbol” $.
B2 contains indices with
i mod 3 = 2
Number the blocks in lexicographical order ($ is the smallest symbol)
1 2 4 34 56 7
let R = 1 2 4 34 56 7
compute the suffix array of R :
what use1 2 3 4 5 6 70
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
is this?!
99. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
is this?!
100. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2
is this?!
101. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2
a b d ob a a b b a d
d oa b b a d
is this?!
1
5
102. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
is this?!
1
5
103. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
is this?!
1
5
104. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
is this?!
1
5
105. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
is this?!
1
5
106. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
d oa b b a d $ $
is this?!
1
5
107. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
d oa b b a d $ $
their order is given by the suffix array of R :
is this?!
1
5
108. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
d oa b b a d $ $
their order is given by the suffix array of R :
is this?!
1
5
109. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
a b d ob a a b b a d
d oa b b a d
a b d ob a a b b a d $ b d ob a a b b a d $ $
d oa b b a d $ $
their order is given by the suffix array of R :
is this?!
1
5
Suffix is smaller
than suffix
1
55
110. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
111. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
112. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
ob b a d $ b d ob a a b b a d $ $
113. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
d oa b b a d $ $
ob b a d $ b d ob a a b b a d $ $
114. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
d oa b b a d $ $
ob b a d $ b d ob a a b b a d $ $
115. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
d oa b b a d $ $
ob b a d $ b d ob a a b b a d $ $
116. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
ob b a d7
d oa b b a d $ $
ob b a d $ b d ob a a b b a d $ $
Suffix is smaller
than suffix
7
5
117. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
118. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
b d ob a a b b a d2
119. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
b d ob a a b b a d2
d oa b b a d $ $
b d ob a a b b a d $ $
120. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
b d ob a a b b a d2
d oa b b a d $ $
b d ob a a b b a d $ $
121. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
b d ob a a b b a d2
d oa b b a d $ $
b d ob a a b b a d $ $
122. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
Take any two suffixes in B1 ∪ B2 and find them in R
their order is given by the suffix array of R :
is this?!
d oa b b a d5
b d ob a a b b a d2
d oa b b a d $ $
b d ob a a b b a d $ $
Suffix is smaller
than suffix
2
5
123. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
compute the suffix array of R :
what use
0 1 6 4 2 5 3 7
0 1 2 3 4 5 6 7
is this?!
124. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
125. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
126. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
after relabelling,
127. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
after relabelling, 1 24 111058 7
128. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
after relabelling,
we have the suffix array of just the suffixes from B1 ∪ B2
1 24 111058 7
129. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
after relabelling,
we have the suffix array of just the suffixes from B1 ∪ B2
1 24 111058 7
130. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
a b d ob a a b b a d $ b d ob a a b b a d $ $
Concatenate R1 and R2 to obtain R:
B2 contains indices with
i mod 3 = 2
0 1 2 3 4 5 6 7
The suffix array of R : 0 1 6 4 2 5 3 7
after relabelling,
we have the suffix array of just the suffixes from B1 ∪ B2
1 24 111058 7
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
131. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
132. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2 1 24 111058 7
133. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
134. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
1 24 111058 7
135. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
1 24 111058 7
136. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
suffix 0 is a y
1 24 111058 7
followed by suffix 1
137. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
1 24 111058 7
138. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
1 24 111058 7
139. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
140. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
141. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
142. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
143. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
How do we find the ordering of the suffixes from B0? (where i mod 3 = 0)
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
144. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
145. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
Suffix array for just B0 6 09 3
146. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
Each suffix i ∈ B0 is represented by (T[i], r) where r is the rank of suffix (i + 1)
1 24 111058 7
(the ranks are given by the array below)
rank:
We then sort in O(n) time using radix sort
y a b d ob a a b b a d
d ob a a b b a d
oa b b a d
oa d
0
3
6
9
= + 1y
= + 4b
= + 7a
= +a 10
(y, 0)
(b, 1)
(a, 4)
(a, 6)
Suffix array for just B0 6 09 3 How do we merge these?
147. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
148. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
149. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
which is smaller, suffix or ?1 6
150. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
which is smaller, suffix or ?
+ 7a=
1 6
6
1 +a= 2
151. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
which is smaller, suffix or ?
+ 7a=
1 6
6
1 +a= 2
(a, 4)
(a, 3)
152. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
which is smaller, suffix or ?
+ 7a=
1 6
6
1 +a= 2
(a, 4)
(a, 3)
It takes O(1) time to decide
that 1 is smaller
153. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . .
which is smaller, suffix or ?
+ 7a=
1 6
6
1 +a= 2
(a, 4)
(a, 3)
It takes O(1) time to decide
that 1 is smaller
1
154. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1
which is smaller, suffix or4 6 ?
155. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1
which is smaller, suffix or4 6
+ 7a=6
4 +a= 5
?
156. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1
which is smaller, suffix or4 6
+ 7a=6
4 +a= 5
(a, 4)
(a, 5)
?
157. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1
which is smaller, suffix or4 6
+ 7a=6
4 +a= 5
(a, 4)
(a, 5)
Again, it takes O(1) time to decide
that is smaller6
?
158. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6
159. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6
which is smaller, suffix or4 9 ?
160. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6
)4(which is smaller, suffix or4 9 ? is smaller
161. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
)4(which is smaller, suffix or4 9 ? is smaller
162. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
8which is smaller, suffix or 9 ?
163. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
8
+a=
+b=
9 10
8 9
which is smaller, suffix or 9 ?
164. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
8
+a=
+b=
9 10
8 9
Uh oh! how do we compare 9 10to ?
which is smaller, suffix or 9 ?
165. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
8
Uh oh! how do we compare 9 10to ?
+a=
+b=
9
8
d
a
+
+
11
10
It still takes O(1) time to decide
that is smaller9
which is smaller, suffix or 9 ?
166. The DC3 method
y a b d ob a a b b a d
1 2 3 4 5 6 7 8 9 10 110
T =
B1 contains indices with
i mod 3 = 1
B2 contains indices with
i mod 3 = 2
Suffix array for just B1 ∪ B2
1 2 3 4 5 6 70
1 24 111058 7
Suffix array for just B0 6 09 3 How do we merge these?
Merge them like in mergesort. . . 1 6 4
Overall this merging phase takes O(n) time
(because processing each suffix takes O(1) time)
168. The DC3 method
Theorem
The DC3 algorithm constructs a suffix array in O(n) time.
Proof
Suppose T(n) is the running time. We have
T(n) = T(2n/3) + O(n)
169. The DC3 method
Theorem
The DC3 algorithm constructs a suffix array in O(n) time.
Proof
Suppose T(n) is the running time. We have
T(n) = T(2n/3) + O(n)
radix sorting and merging
recursion to construct
a suffix array of size 2n/3
170. The DC3 method
Theorem
The DC3 algorithm constructs a suffix array in O(n) time.
Proof
Suppose T(n) is the running time. We have
T(n) = T(2n/3) + O(n)
Solving this recurrence gives T(n) ∈ O(n).
radix sorting and merging
recursion to construct
a suffix array of size 2n/3
171. The suffix array
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
P aa n
m
an
Finding an occurrence of a pattern (length m) takes O(m log n) time
Finding all occurrences takes O(m log n + occ) time
where occ is the number of occurences
We can construct the suffix array in O(n) time
This can be further improved to O(m + log n + occ) time
(using LCP queries which we will see in a future lecture)