Advanced Algorithms – COMS31900
Pattern Matching part one
Suffix Trees
Benjamin Sach
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c
a b a
a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
m
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c
a b a
a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
m
4
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
10
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
• A naive algorithm takes O(nm) time
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
• A naive algorithm takes O(nm) time
• Many O(n) time algorithms are known (for example KMP)
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
• We want a query time which depends only on m and occ
- occ is the number of occurences (matches)
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
• We want a query time which depends only on m and occ
- occ is the number of occurences (matches)
• We also want O(n) space and fast preprocessing (prep.) time
e.g. 4, 6, 10
4 6 10
The atomic suffix tree
TT b n aaa sn
n
The atomic suffix tree
TT b n aaa sn
n
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Every edge is labelled with a character from T
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Every edge is labelled with a character from T
• No two edges leaving the same node have the same label
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Each leaf corresponds to a suffix (so there are n leaves)
• Every edge is labelled with a character from T
• No two edges leaving the same node have the same label
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
1
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
3
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
WARNING! How long does it take to find the correct child?
There could be n edges here!
In this lecture we assume the alphabet size is a constant
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
WARNING! How long does it take to find the correct child?
There could be n edges here!
In this lecture we assume the alphabet size is a constant
This may be fine in some applications
(English text or DNA for example)
We can remove the assumption via the magic of hashing
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
P bn a
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a 6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters 23 nodes
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters 23 nodes that’s not so bad, right?
6
how large is the atomic suffix tree?
how large is the atomic suffix tree?
T a b
2
how large is the atomic suffix tree?
T a b
2
a b
b
how large is the atomic suffix tree?
T a b
2
a b
b
4 nodes
how large is the atomic suffix tree?
T a b a b baT
2 4
a b b
b b
a
b
b
a
b
b
4 nodes
9 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa bT T
2 4 6
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa ba bT
T T
2 4 6
8
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
An atomic suffix tree can have
((n/2) + 1)2 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
An atomic suffix tree can have
((n/2) + 1)2 nodes
this is too big!far
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
we only store the end points
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
we only store the end points
we actually store (4, 6)
4 6
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring 1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
TT b b this doesn’t have
n leavesbb
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
TT b b this doesn’t have
n leavesbb
TT b b $ b$
$ b$
this has n
leaves
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Step one: Add a $ (unique symbol) to T
Compacted suffix trees
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$
Compacted suffix trees
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Compacted suffix trees
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
This is normally just called1 3
5
0
2 4
6
a suffix tree
Searching in a compacted suffix tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
1 3
5
0
2 4
6
remember that an edge is
actually stored as a pair
we’re actually looking in T
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
1
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
na
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
2
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
4
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?
O(occ) because it has occ leaves
1 3
5
0
2 4
6
(and each internal node has
at least two children)
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
We can find all the matches in O(m + occ) time (by looking at the whole subtree)
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?
O(occ) because it has occ leaves
1 3
5
0
2 4
6
(and each internal node has
at least two children)
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
we actually store this as (0, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
this is stored as (1, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
ananas$ was stored as (1, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
ananas$ was stored as (1, 7)
ana is stored as (1, 3)
nas$ is stored as (4, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
stored as (6, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
nanas$
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
s$
nas$
1 3
ana
na
nas$
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
ana
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
0
2 4
nas$
a
na
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
This takes O(n) time per suffix. . .
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
This takes O(n) time per suffix. . .
so O(n2) time in total
never actually
do it like this
you should
Suffix tree summary
TT b n aaa sn
n
$
a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
1 3
5
0
2 4
6
nas$
• The (compacted) suffix tree of a (length n) text uses O(n) space
• Finding all matches of a pattern P of length m takes O(m + occ)
where occ is the number of matches
we assumed that the alphabet contained a constant number of symbols
• Suffix trees can be built in O(n) time
but we have only seen the O(n2) time method
actually
do it like this (or build a suffix array instead)
you should
Multiple text indexing
T1 b n aaa sn
n1
T2 a p slp e
n2
How can we index multiple texts?
Multiple text indexing
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
two distinct unique symbols
How can we index multiple texts?
Multiple text indexing
TT
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
two distinct unique symbols
b n aaa sn $ a p slp e &
n
How can we index multiple texts?
Multiple text indexing
TT
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
• Using the linear time method (which we omitted), this takes O(n1 + n2) time
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
• Using the linear time method (which we omitted), this takes O(n1 + n2) time
• Finding all matches of a pattern P of length m still takes O(m + occ) time
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
where occ is the number of matches
The suffix array - a sneak preview
T b n aaT a sn
n
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa snsuffix
1
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in a dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
If the symbols don’t have a natural order, we use their binary representation in memory
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in a dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
Suffix Tree
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
Suffix Tree
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
6
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
6
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
this takes O(n) time
1 3
5
0
2 4
6
10 2 3 4 5 6
Suffix tree summary
TT b n aaa sn
n
$
a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
1 3
5
0
2 4
6
nas$
• The (compacted) suffix tree of a (length n) text uses O(n) space
• Finding all matches of a pattern P of length m takes O(m + occ)
where occ is the number of matches
we assumed that the alphabet contains a constant number of symbols
• Suffix trees can be built in O(n) time
but we have only seen the O(n2) time method

Pattern Matching Part One: Suffix Trees

  • 1.
    Advanced Algorithms –COMS31900 Pattern Matching part one Suffix Trees Benjamin Sach
  • 2.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n m for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 3.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n m 4 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 4.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 5.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 10 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 6.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 7.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed) • A naive algorithm takes O(nm) time
  • 8.
    Exact pattern matching T InputA text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed) • A naive algorithm takes O(nm) time • Many O(n) time algorithms are known (for example KMP)
  • 9.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n
  • 10.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m
  • 11.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T.
  • 12.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. e.g. 4, 6, 10 4 6 10
  • 13.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) e.g. 4, 6, 10 4 6 10
  • 14.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) • We want a query time which depends only on m and occ - occ is the number of occurences (matches) e.g. 4, 6, 10 4 6 10
  • 15.
    Text indexing T Preprocess atext string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) • We want a query time which depends only on m and occ - occ is the number of occurences (matches) • We also want O(n) space and fast preprocessing (prep.) time e.g. 4, 6, 10 4 6 10
  • 16.
    The atomic suffixtree TT b n aaa sn n
  • 17.
    The atomic suffixtree TT b n aaa sn n b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes
  • 18.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 19.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 20.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 21.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 22.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 23.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 24.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 25.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 26.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 27.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6
  • 28.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path
  • 29.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Every edge is labelled with a character from T
  • 30.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Every edge is labelled with a character from T • No two edges leaving the same node have the same label
  • 31.
    The atomic suffixtree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Each leaf corresponds to a suffix (so there are n leaves) • Every edge is labelled with a character from T • No two edges leaving the same node have the same label
  • 32.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 a 6
  • 33.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 a 6
  • 34.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m a 6
  • 35.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 36.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 37.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 38.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 39.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 40.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 41.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 42.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 1 6
  • 43.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 3 6
  • 44.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 45.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 46.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 47.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 48.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 49.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 50.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 51.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 52.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time P bn a 6
  • 53.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) P bn a 6
  • 54.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) WARNING! How long does it take to find the correct child? There could be n edges here! In this lecture we assume the alphabet size is a constant P bn a 6
  • 55.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) WARNING! How long does it take to find the correct child? There could be n edges here! In this lecture we assume the alphabet size is a constant This may be fine in some applications (English text or DNA for example) We can remove the assumption via the magic of hashing P bn a 6
  • 56.
    Searching in anatomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) P bn a 6
  • 57.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a 6
  • 58.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? 6
  • 59.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes 6
  • 60.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 6
  • 61.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 6
  • 62.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 6
  • 63.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 23 nodes 6
  • 64.
    how large isthe atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 23 nodes that’s not so bad, right? 6
  • 65.
    how large isthe atomic suffix tree?
  • 66.
    how large isthe atomic suffix tree? T a b 2
  • 67.
    how large isthe atomic suffix tree? T a b 2 a b b
  • 68.
    how large isthe atomic suffix tree? T a b 2 a b b 4 nodes
  • 69.
    how large isthe atomic suffix tree? T a b a b baT 2 4 a b b b b a b b a b b 4 nodes 9 nodes
  • 70.
    how large isthe atomic suffix tree? T a b a b ba a b baa bT T 2 4 6 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes
  • 71.
    how large isthe atomic suffix tree? T a b a b ba a b baa b a b baa ba bT T T 2 4 6 8 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b
  • 72.
    how large isthe atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b
  • 73.
    how large isthe atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b An atomic suffix tree can have ((n/2) + 1)2 nodes
  • 74.
    how large isthe atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b An atomic suffix tree can have ((n/2) + 1)2 nodes this is too big!far
  • 75.
    Compacted suffix trees sn a s n a s a n a s TTb n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big?
  • 76.
    Compacted suffix trees sn a s n a s a n a s TTb n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one
  • 77.
    Compacted suffix trees sn a s n a s a n a s TTb n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge
  • 78.
    Compacted suffix trees sn a s n a s a n a s TTb n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings
  • 79.
    Compacted suffix trees sn a s n a s a n a s TTb n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters)
  • 80.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas 1 3 5 0 2 4 6
  • 81.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves 1 3 5 0 2 4 6
  • 82.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children 1 3 5 0 2 4 6
  • 83.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges 1 3 5 0 2 4 6
  • 84.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up 1 3 5 0 2 4 6 lots of space?
  • 85.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up we only store the end points 1 3 5 0 2 4 6 lots of space?
  • 86.
    Compacted suffix trees TTb n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up we only store the end points we actually store (4, 6) 4 6 1 3 5 0 2 4 6 lots of space?
  • 87.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas 1 3 5 0 2 4 6
  • 88.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T 1 3 5 0 2 4 6
  • 89.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves 1 3 5 0 2 4 6
  • 90.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children 1 3 5 0 2 4 6
  • 91.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring 1 3 5 0 2 4 6
  • 92.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character 1 3 5 0 2 4 6
  • 93.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T 1 3 5 0 2 4 6
  • 94.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 95.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 96.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist?
  • 97.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist? TT b b this doesn’t have n leavesbb
  • 98.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist? TT b b this doesn’t have n leavesbb TT b b $ b$ $ b$ this has n leaves
  • 99.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 100.
    Compacted suffix trees TTb n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Step one: Add a $ (unique symbol) to T
  • 101.
    Compacted suffix trees a s nas nas nas s na s bananas CompactedSuffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Step one: Add a $ (unique symbol) to T TT b n aaa sn n $
  • 102.
    Compacted suffix trees CompactedSuffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix Step one: Add a $ (unique symbol) to T TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 103.
    Compacted suffix trees CompactedSuffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix Step one: Add a $ (unique symbol) to T TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ This is normally just called1 3 5 0 2 4 6 a suffix tree
  • 104.
    Searching in acompacted suffix tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 105.
    Searching in acompacted suffix tree How do you find a pattern? TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 106.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 107.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 108.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 109.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a an 1 3 5 0 2 4 6
  • 110.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an 1 3 5 0 2 4 6
  • 111.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an 1 3 5 0 2 4 6 remember that an edge is actually stored as a pair we’re actually looking in T
  • 112.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na1 3 5 0 2 4 6
  • 113.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ 1 3 5 0 2 4 6
  • 114.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 115.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 116.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree 1 TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 117.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 118.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 119.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 120.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an na 1 3 5 0 2 4 6
  • 121.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 122.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 123.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 124.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 2 1 3 5 0 2 4 6
  • 125.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 4 1 3 5 0 2 4 6
  • 126.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree?1 3 5 0 2 4 6
  • 127.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree? O(occ) because it has occ leaves 1 3 5 0 2 4 6 (and each internal node has at least two children)
  • 128.
    Searching in acompacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree We can find all the matches in O(m + occ) time (by looking at the whole subtree) P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree? O(occ) because it has occ leaves 1 3 5 0 2 4 6 (and each internal node has at least two children)
  • 129.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ never actually do it like this you should
  • 130.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 131.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 132.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 133.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 134.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 135.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 we actually store this as (0, 7) never actually do it like this you should
  • 136.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 137.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 138.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 139.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ this is stored as (1, 7) never actually do it like this you should
  • 140.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 141.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 142.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 143.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 144.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 145.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 146.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ never actually do it like this you should
  • 147.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ never actually do it like this you should
  • 148.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ ananas$ never actually do it like this you should
  • 149.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ ananas$ ananas$ never actually do it like this you should
  • 150.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana never actually do it like this you should
  • 151.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana ananas$ was stored as (1, 7) never actually do it like this you should
  • 152.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana ananas$ was stored as (1, 7) ana is stored as (1, 3) nas$ is stored as (4, 7) never actually do it like this you should
  • 153.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana never actually do it like this you should
  • 154.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 155.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana stored as (6, 7) never actually do it like this you should
  • 156.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 157.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 158.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ never actually do it like this you should
  • 159.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ nanas$ never actually do it like this you should
  • 160.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ nanas$ nanas$ never actually do it like this you should
  • 161.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 s$ nas$ 1 3 ana na nas$ nas$ never actually do it like this you should
  • 162.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 163.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 164.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 165.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana ana nas$ never actually do it like this you should
  • 166.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana ana ana nas$ never actually do it like this you should
  • 167.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 0 2 4 nas$ a na never actually do it like this you should
  • 168.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 169.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 170.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 171.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 172.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 173.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 174.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 175.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 176.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ This takes O(n) time per suffix. . . never actually do it like this you should
  • 177.
    Naively constructing acompacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ This takes O(n) time per suffix. . . so O(n2) time in total never actually do it like this you should
  • 178.
    Suffix tree summary TTb n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ • The (compacted) suffix tree of a (length n) text uses O(n) space • Finding all matches of a pattern P of length m takes O(m + occ) where occ is the number of matches we assumed that the alphabet contained a constant number of symbols • Suffix trees can be built in O(n) time but we have only seen the O(n2) time method actually do it like this (or build a suffix array instead) you should
  • 179.
    Multiple text indexing T1b n aaa sn n1 T2 a p slp e n2 How can we index multiple texts?
  • 180.
    Multiple text indexing T1b n aaa sn n1 T2 a p slp e n2 $ & two distinct unique symbols How can we index multiple texts?
  • 181.
    Multiple text indexing TT T1b n aaa sn n1 T2 a p slp e n2 $ & two distinct unique symbols b n aaa sn $ a p slp e & n How can we index multiple texts?
  • 182.
    Multiple text indexing TT T1b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n How can we index multiple texts?
  • 183.
    Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 13 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 184.
    Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 13 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 185.
    Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 13 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space • Using the linear time method (which we omitted), this takes O(n1 + n2) time 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 186.
    Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 13 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space • Using the linear time method (which we omitted), this takes O(n1 + n2) time • Finding all matches of a pattern P of length m still takes O(m + occ) time 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts? where occ is the number of matches
  • 187.
    The suffix array- a sneak preview T b n aaT a sn n
  • 188.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 189.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa snsuffix 1
  • 190.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 191.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn
  • 192.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order
  • 193.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs:
  • 194.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 195.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 196.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 197.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 198.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 199.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 200.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a
  • 201.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a
  • 202.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 203.
    The suffix array- a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 204.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 205.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 206.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 207.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 208.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller) just a fancy name for the order the strings would appear in a dictionary In lexicographical ordering we sort strings based on the first symbol that differs:
  • 209.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order If the symbols don’t have a natural order, we use their binary representation in memory b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller) just a fancy name for the order the strings would appear in a dictionary In lexicographical ordering we sort strings based on the first symbol that differs:
  • 210.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 211.
    The suffix array- a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 10 2 3 4 5 6
  • 212.
    The suffix array- a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 213.
    The suffix array- a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 214.
    The suffix array- a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 215.
    The suffix array- a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n The suffix array is much smaller than the suffix tree (in terms of constants) 10 2 3 4 5 6
  • 216.
    The suffix array- a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n The suffix array is much smaller than the suffix tree (in terms of constants) 10 2 3 4 5 6
  • 217.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array Suffix Tree recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n 10 2 3 4 5 6
  • 218.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array Suffix Tree recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 219.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 220.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 221.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 222.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 223.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 224.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 225.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 226.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 227.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 228.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 229.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 230.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 231.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 232.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 233.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 234.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 235.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 236.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 237.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 238.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 239.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 240.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 241.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 242.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 243.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 244.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 245.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 246.
    Constructing the SuffixArray from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) this takes O(n) time 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 247.
    Suffix tree summary TTb n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ • The (compacted) suffix tree of a (length n) text uses O(n) space • Finding all matches of a pattern P of length m takes O(m + occ) where occ is the number of matches we assumed that the alphabet contains a constant number of symbols • Suffix trees can be built in O(n) time but we have only seen the O(n2) time method