Suffix Arrays in Linear Time
Published in: Technology, Education
Suffix arrays

  1. Suffix Arrays in Linear Time
  2. Index text so that substring queries can be answered fast
  3. The Text: C G A C G C T. [Figure: suffix tree of the text]
  4. The Text: C G A C G C T. Substring query: C G C. [Figure: suffix tree of the text]
  5. Trees take too much space. Are there smaller indices?
  6. The Text: C G A C G C T. Suffix Array (sorted list of suffixes): 3 1 4 6 2 5 7. [Figure: suffix tree of the text]
  7. The Text: C G A C G C T. Burrows-Wheeler Index (an array). Suffix Array: 3 1 4 6 2 5 7
  8. How can one compute the Suffix Array in linear time?
  9. Task: Given a string of length n with characters in the range 1..n, sort its suffixes lexicographically. Obtain two arrays: f[i], the sorted order of the ith suffix, and g[i], which suffix is ith highest. Sorting these suffixes directly takes O(n log n) comparisons, each taking up to n time.
  10. Divide and Conquer: Separate odd and even suffixes; sort each recursively, then combine
  11. Sorting Even Suffixes (A1 A2 A3 A4): Sort these n/2 pairs and map them to single chars in the range 1..n/2. New text of half the length; sort suffixes recursively
  12. Sorting Odd Suffixes (O1 O2 O3 O4 as pairs A1,E1 A2,E2 A3,E3 A4,E4): Sort these n/2 pairs; the E's are the even suffixes, whose order we know
  13. Time Complexity: T(n) = O(n) + T(n/2) + time for merging even and odd suffixes. O(n)
  14. Merging (O and E; A,E and B,O): Do we have any info to determine the relative order of an odd suffix and an even one?
  15. The Trick (Sanders, Kärkkäinen): Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3
  16. Sorting 0 and 1 Together (ABCDEFGHIJKL): Sort these 2n/3 triplets and map them to single chars. New text of length 2n/3; sort suffixes recursively
  17. Sorting Suffixes in 2 (21 22 23 24 as pairs A1,01 A2,02 A3,03 A4,04): Sort these n/3 pairs; the 0's are the mod 0 suffixes, whose order we know
  18. Merging (1 and 2; AB,0 and CD,1): We know the order of all 0,1 suffixes!
  19. Time Complexity: T(n) = O(n) + T(2n/3) + O(n) = O(n)
  20. Generalization: Set D of indices mod v (positions v, 2v, 3v marked on the text). Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D. This string has size |D|n/v. Time taken to create this string is O(n |D|)
  21. Key Property of D: For any 2 indices i and j, i-j mod v is the distance between some two beads in D (figure annotation: x<v). D is a Difference Cover if distances between beads in D generate 0,1,…,v-1
  22. Size of D: There exists a Difference Cover of size 1.5*sqrt(v)! (figure labels: sqrt(v), sqrt(v))
  23. Time Complexity: T(n) = O(n|D|) + T(|D|n/v) + O(nv). T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv) for |D| = 2.5 sqrt(v)
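
The example text from slides 3 to 7 can be checked with a few lines of Python. The following is only a minimal sketch of the naive construction described on slide 9 (a comparison sort of all suffixes), not the linear-time algorithm. The function names `naive_suffix_array`, `rank_array` and `find_occurrences` are made up for this illustration, and reading g as the suffix array and f as its inverse is an assumption about the slide's notation. Positions are 1-based, as on the slides.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """Return the 1-based start positions of all suffixes in sorted order.

    This is the slide's g array under our reading. It uses a plain
    comparison sort: O(n log n) comparisons, each up to O(n) time.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def rank_array(sa: List[int]) -> List[int]:
    """Invert the suffix array: entry i-1 holds the sorted rank of suffix i.

    This is the slide's f array under our reading (ranks are 1-based).
    """
    f = [0] * len(sa)
    for rank, start in enumerate(sa, start=1):
        f[start - 1] = rank
    return f


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the 1-based positions where pattern occurs, in increasing order.

    All suffixes that start with pattern are adjacent in the suffix array,
    so two binary searches locate the block.
    """
    m = len(pattern)

    def prefix(k: int) -> str:
        # First m characters of the k-th smallest suffix (0-based k).
        start = sa[k] - 1
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "CGACGCT"
    sa = naive_suffix_array(text)
    print(sa)                                 # [3, 1, 4, 6, 2, 5, 7]
    print(rank_array(sa))                     # [2, 5, 1, 3, 6, 4, 7]
    print(find_occurrences(text, sa, "CGC"))  # [4]
```

The first printed list matches the suffix array 3 1 4 6 2 5 7 shown on slides 6 and 7, and the query C G C from slide 4 is found at position 4.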