Suffix arrays

Transcript

  • 1. Suffix Arrays in Linear Time
  • 2. Index the text so that substring queries can be answered fast
  • 3. The Text: C G A C G C T. Suffix Tree: [suffix tree diagram]
  • 4. The Text: C G A C G C T. Substring Query: C G C [suffix tree diagram]
  • 5. Trees take too much space. Are there smaller indices?
  • 6. The Text: C G A C G C T. Suffix Tree: [suffix tree diagram]. Suffix Array (sorted list of suffixes): 3 1 4 6 2 5 7
  • 7. The Text: C G A C G C T. Burrows-Wheeler Index (an array). Suffix Array: 3 1 4 6 2 5 7
  • 8. How can one compute the Suffix Array in Linear Time?
  • 9. Task: String of length n with characters in the range 1..n. Sort its suffixes lexicographically. Obtain two arrays: f[i], the sorted order of the ith suffix, and g[i], which suffix is ith highest. Sorting directly takes O(n log n) comparisons, each taking up to n time
  • 10. Divide and Conquer: Separate odd and even suffixes; sort each recursively, then combine
  • 11. Sorting Even Suffixes: [diagram: pairs A1 A2 A3 A4]. Sort these n/2 pairs and map them to single chars in the range 1..n/2. New text of half the length; sort suffixes recursively
  • 12. Sorting Odd Suffixes: [diagram: O1 O2 O3 O4 as pairs (A1,E1) (A2,E2) (A3,E3) (A4,E4)]. Sort these n/2 pairs; the E's are the even suffixes, whose order we know
  • 13. Time Complexity: T(n) = O(n) + T(n/2) + time for merging even and odd suffixes; O(n) overall if merging takes O(n)
  • 14. Merging: [diagram: odd suffix O as (A,E), even suffix E as (B,O)]. Do we have any info to determine the relative order of an odd suffix and an even one?
  • 15. The Trick (Sanders, Kärkkäinen): Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3
  • 16. Sorting 0 and 1 Together: [diagram: text ABCDEFGHIJKL]. Sort these 2n/3 triplets and map them to single chars. New text of length 2n/3; sort suffixes recursively
  • 17. Sorting Suffixes in 2: [diagram: 2_1 2_2 2_3 2_4 as pairs (A1,0_1) (A2,0_2) (A3,0_3) (A4,0_4)]. Sort these n/3 pairs; the 0's are the 0 mod 3 suffixes, whose order we know
  • 18. Merging: [diagram: suffix 1 as (AB,0), suffix 2 as (CD,1)]. We know the order of all 0,1 suffixes!
  • 19. Time Complexity: T(n) = O(n) + T(2n/3) + O(n) = O(n)
  • 20. Generalization: Set D of indices mod v [diagram: positions v, 2v, 3v]. Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D. This string has size |D|n/v. Time taken to create this string is O(n|D|)
  • 21. Key Property of D: [diagram: offsets x<v, x<v]. For any 2 indices i and j, (i-j) mod v is the distance between some two beads in D. D is a Difference Cover if distances between beads in D generate 0, 1, …, v-1
  • 22. Size of D: [diagram: sqrt(v), sqrt(v)]. There exists a Difference Cover of size 1.5*sqrt(v)!
  • 23. Time Complexity: T(n) = O(n|D|) + T(|D|n/v) + O(nv). For |D| = 2.5 sqrt(v): T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
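
The slides' running example can be checked with a small Python sketch. This is not the linear-time construction from the slides: it sorts whole suffixes directly, which is the slow baseline from slide 9. It builds the arrays g (the suffix array) and f (the rank of each suffix), answers a substring query by binary search over g, and tests the difference cover property from slide 21. The function names suffix_array, rank_array, find_occurrences and is_difference_cover are illustrative assumptions and do not come from the slides. Positions are 1-based, as in the slides.

```python
def suffix_array(text):
    """Return g: 1-based start positions of the suffixes in sorted order.

    Naive construction that compares whole suffixes, so it is not linear time.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def rank_array(sa):
    """Return f, where f[i - 1] is the 1-based sorted order of suffix i."""
    f = [0] * len(sa)
    for order, start in enumerate(sa, start=1):
        f[start - 1] = order
    return f


def find_occurrences(text, sa, pattern):
    """Return the sorted 1-based positions where pattern occurs in text.

    Suffixes that start with pattern form one contiguous run in sa,
    so two binary searches locate the boundaries of that run.
    """
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        start = sa[k] - 1
        return text[start:start + m]

    # Lower boundary: first suffix whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper boundary: first suffix whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def is_difference_cover(d, v):
    """True if the differences (a - b) mod v over a, b in d hit all of 0..v-1."""
    return {(a - b) % v for a in d for b in d} == set(range(v))


text = "CGACGCT"
g = suffix_array(text)
f = rank_array(g)
print(g)  # [3, 1, 4, 6, 2, 5, 7]
print(f)  # [2, 5, 1, 3, 6, 4, 7]
print(find_occurrences(text, g, "CGC"))  # [4]
print(is_difference_cover({0, 1}, 3))    # True
```

For the text C G A C G C T this reproduces the suffix array 3 1 4 6 2 5 7 shown on slides 6 and 7, and the query C G C from slide 4 is found at position 4. The set {0, 1} mod 3 used in slide 16 passes the difference cover check, since its differences give 0, 1 and 2.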