Suffix arrays



Suffix Arrays in Linear Time.

Published in: Technology, Education


  1. Suffix Arrays in Linear Time
  2. Index text, so substring queries can be answered fast
  3. The Text: C G A C G C T. Suffix Tree: [figure: suffix tree of the text]
  4. The Text: C G A C G C T. [figure: suffix tree of the text] Substring Query: C G C
  5. Trees take too much space. Are there smaller indices?
  6. The Text: C G A C G C T. Suffix Tree: [figure: suffix tree of the text]. Suffix Array (sorted list of suffixes): 3 1 4 6 2 5 7
  7. The Text: C G A C G C T. Burrows-Wheeler Index (an array). Suffix Array: 3 1 4 6 2 5 7
  8. How can one compute the Suffix Array in Linear Time?
  9. Task: String of length n with characters in the range 1..n. Sort these suffixes lexicographically. Obtain two arrays, f[i]: sorted order of the ith suffix, g[i]: which suffix is ith highest. Naive sorting: O(n log n) comparisons, each taking up to n time
  10. Divide and Conquer: Separate odd and even suffixes; sort each recursively, then combine
  11. Sorting Even Suffixes: A1 A2 A3 A4. Sort these n/2 pairs and map them to single chars in the range 1..n/2. New text of half the length; sort suffixes recursively
  12. Sorting Odd Suffixes: O1 O2 O3 O4 as A1,E1 A2,E2 A3,E3 A4,E4. Sort these n/2 pairs; the E’s are the even suffixes, whose order we know
  13. Time Complexity: T(n) = O(n) + T(n/2) + time for merging even and odd suffixes. O(n)
  14. Merging: O and E, as A,E and B,O. Do we have any info to determine the relative order of an odd suffix and an even one?
  15. The Trick (Sanders, Kärkkäinen): 0 1 2. Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3
  16. Sorting 0 and 1 Together: ABCDEFGHIJKL. Sort these 2n/3 triplets and map them to single chars. New text of length 2n/3; sort suffixes recursively
  17. Sorting Suffixes in 2: 2_1 2_2 2_3 2_4 as A1,0_1 A2,0_2 A3,0_3 A4,0_4. Sort these n/3 pairs; the 0’s are the mod 0 suffixes, whose order we know
  18. Merging: 1 and 2, as AB,0 and CD,1. We know the order of all 0,1 suffixes!
  19. Time Complexity: T(n) = O(n) + T(2n/3) + O(n) = O(n)
  20. Generalization: Set D of indices mod v (figure marks: v, 2v, 3v). Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D. This string has size |D|n/v. Time taken to create this string is O(n |D|)
  21. Key Property of D (figure labels: x<v, x<v): For any 2 indices i and j, (i-j) mod v is the distance between some two beads in D. D is a Difference Cover if distances between beads in D generate 0,1,…,v-1
  22. Size of D (figure labels: sqrt(v), sqrt(v)): There exists a Difference Cover of size 1.5*sqrt(v)!
  23. Time Complexity: T(n) = O(n|D|) + T(|D|n/v) + O(nv). T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv) for |D| = 2.5 sqrt(v)
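
The suffix array from slide 6 and the substring query from slide 4 can be reproduced with a short Python sketch. This is the naive construction that slide 9 describes as the baseline (a comparison sort over whole suffixes), not the linear-time algorithm. The function names are illustrative assumptions, and positions are 0-based in the code while the slides count from 1.

```python
def suffix_array_naive(text):
    """Return the start positions (0-based) of all suffixes in sorted order.

    This is g[] from slide 9. It sorts whole suffixes, so it is the slow
    baseline, not the linear-time construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def ranks_from_suffix_array(sa):
    """Return f[] from slide 9: rank[i] is the sorted position of suffix i."""
    rank = [0] * len(sa)
    for r, start in enumerate(sa):
        rank[start] = r
    return rank


def find_occurrences(text, sa, pattern):
    """Return all 0-based positions where pattern occurs, via binary search.

    Suffixes that start with pattern form one contiguous run in sa, so two
    binary searches on the length-m prefixes find the run's boundaries.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "CGACGCT"
sa = suffix_array_naive(text)
print([i + 1 for i in sa])                  # 1-based, as on the slides
print(find_occurrences(text, sa, "CGC"))    # 0-based match positions
```

For the slide's text the first line printed is the 1-based array 3 1 4 6 2 5 7, and the query C G C is found at 0-based position 3 (the fourth character).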
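
The "sort triplets and map them to single chars" step from slide 16 can be sketched as follows. This is a minimal illustration under several assumptions: positions are 0-based, Python's built-in sort stands in for the radix sort a linear-time version would need, a plain sort of the reduced text stands in for the recursive call, and the helper names are hypothetical. The empty dummy triple added when n is a multiple of 3 is also an assumption of this sketch; it keeps a comparison from running across the boundary between the two blocks.

```python
def reduce_text(text):
    """Build the reduced text for the suffixes at positions 0 and 1 mod 3.

    Returns (positions, reduced): reduced[k] is the name (rank) of the
    triple that starts at positions[k]. Triples cut short by the end of
    the text sort before any longer triple sharing the same prefix, which
    has the same effect as padding with a smallest character.
    """
    n = len(text)
    block0 = list(range(0, n, 3))
    if n % 3 == 0:
        block0.append(n)  # empty dummy triple separating the two blocks
    block1 = list(range(1, n, 3))
    positions = block0 + block1
    triples = [text[i:i + 3] for i in positions]
    name = {t: r for r, t in enumerate(sorted(set(triples)), start=1)}
    return positions, [name[t] for t in triples]


def sorted_sample_suffixes(text):
    """Sorted order of the suffixes at positions 0 and 1 mod 3."""
    positions, reduced = reduce_text(text)
    # Stand-in for the recursive call: sort suffixes of the reduced text.
    order = sorted(range(len(reduced)), key=lambda k: reduced[k:])
    return [positions[k] for k in order if positions[k] < len(text)]


positions, reduced = reduce_text("CGACGCT")
print(positions, reduced)
print(sorted_sample_suffixes("CGACGCT"))
```

The reduced text has about 2n/3 symbols, which is what gives the T(2n/3) term on slide 19.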
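
The key property on slide 21 is what makes the merge on slide 18 work: for any two suffixes there is a shift d < v that lands both on sampled positions, so comparing d characters and then two known ranks settles their order. The sketch below checks the difference-cover property and uses it as a comparator. It assumes 0-based positions, hypothetical function names, and a naive sort in place of the recursive step that would supply the sample ranks.

```python
from functools import cmp_to_key


def is_difference_cover(D, v):
    """True if differences of elements of D cover every residue mod v."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def common_shift(i, j, D, v):
    """Smallest d < v such that i + d and j + d are both in D modulo v."""
    for d in range(v):
        if (i + d) % v in D and (j + d) % v in D:
            return d
    raise ValueError("D is not a difference cover modulo v")


def merge_order(text, D, v):
    """Sort all suffixes using only sample ranks plus fewer than v characters."""
    n = len(text)
    # Stand-in for the recursive step: rank the sampled suffixes naively.
    sample = sorted((i for i in range(n) if i % v in D), key=lambda i: text[i:])
    rank = {i: r for r, i in enumerate(sample)}

    def compare(i, j):
        if i == j:
            return 0
        d = common_shift(i, j, D, v)
        a, b = text[i:i + d], text[j:j + d]
        if a != b:
            return -1 if a < b else 1
        # Position n is the empty suffix, which sorts before everything.
        return rank.get(i + d, -1) - rank.get(j + d, -1)

    return sorted(range(n), key=cmp_to_key(compare))


print(is_difference_cover({0, 1}, 3))       # the groups sorted together on slide 16
print(merge_order("CGACGCT", {0, 1}, 3))
```

With D = {0, 1} and v = 3 this is the three-group scheme from slides 15 to 18; larger covers such as {1, 2, 4} modulo 7 work the same way.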
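
The recurrences on slides 13, 19 and 23 all unroll into geometric series. The derivation below is a sketch under stated assumptions: c is a constant bounding the non-recursive work per character, the merge in the odd/even scheme is assumed to take linear time, and v is treated as a constant with |D| < v.

```latex
% Three groups (slide 19): the subproblem shrinks by a factor of 2/3.
T(n) \le T\!\left(\tfrac{2n}{3}\right) + cn
     \le cn \sum_{k \ge 0} \left(\tfrac{2}{3}\right)^{k} = 3cn = O(n)

% Odd/even split (slide 13), assuming an O(n) merge:
T(n) \le T\!\left(\tfrac{n}{2}\right) + cn
     \le cn \sum_{k \ge 0} \left(\tfrac{1}{2}\right)^{k} = 2cn = O(n)

% General difference cover (slide 23), with \rho = |D|/v < 1:
T(n) \le T(\rho n) + c\,n\,(|D| + v)
     \le \frac{c\,n\,(|D| + v)}{1 - \rho} = O(nv)
```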