Index text, so substringqueries can be answered fast
The Text C G A C G C TSuffix Tree A C G T G T A C A C
The Text C G A C G C TA C G T G T A C A C Substring C G C Query
Trees take too much space.Are there smaller indices?
The Text C G A C G C TSuffix Tree A C G T G T A C A C Suffix Array Sorted List of Suffixes 3 1 4 6 2 5 7
The Text C G A C G C T Burrows-Wheeler Index (an array) Suffix Array 3 1 4 6 2 5 7
How can one compute theSuffix Array in Linear Time?
TaskString of length n with charactersin the range 1..n Sort these suffixes lexicographically Obtain two arrays, O(n log n) f[i]: sorted order of ith comparisons suffix, g[i]: which each taking up suffix is ith highest to n time
Divide and ConquerSeparate odd andeven suffixes; sort each recursively, then combine
Sorting Even Suffixes A1 A2 A3 A4 Sort these n/2 pairs and map them to singlechars in the range 1..n/2 New text of half the length; sort suffixes recursively
Sorting Odd Suffixes O1 O2 O3 O4 A1,E1 A2,E2 A3,E3 A4,E4 Sort these n/2pairs, E’s are the even suffixes,whose order we know
Time ComplexityT(n) = O(n) + T(n/2) + Time for merging even and odd suffixesO(n)
Merging O E A,E B,O Do we have any info to determine the relative order of anodd suffix and an even one?
The Trick Sanders, Karkkainnen 0 1 2 Split suffixes into 3 groupsinstead of 2, so0 mod 3, 1 mod 3 and 2 mod 3
Sorting 0 and 1 Together ABCDEFGHIJKL Sort these 2n/3triplets and map them to single chars New text of length 2n/3; sort suffixes recursively
Sorting Suffixes in 2 21 22 23 24 A1,01 A2,02 A3,03 A4,04 Sort these n/3pairs, 0’s are the mod 0 suffixes,whose order we know
Merging 1 2 AB,0 CD,1 We know theorder of all 0,1 suffixes!
GeneralizationSet D of indices mod v v 2v 3v Sorting suffixes of this string gives the This string has size Time taken to create sorted order of all |D|n/v this string is O(n |D|) suffixes which begin at indices j such that j mod v is in D
Key Property of D x<v x<vFor any 2 indices i and j i-j mod v is the distance between some two beads in D D is a Difference Cover if distances between beads in D generate 0,1…,v-1
Size of D sqrt(v)sqrt(v) There exists a Difference Cover of size 1.5*sqrt(v)!
Time Complexity T(n) = O(n|D|) + T(|D|n/v) + O(nv) T(n) = O(n sqrt(v))+ T(n/srqt(v)) + O(nv) For |D|=2.5 sqrt(v)