
# Suffix arrays

Suffix Arrays in Linear Time.

Published in: Technology, Education

### Suffix arrays

1. Suffix Arrays in Linear Time
2. Index text, so substring queries can be answered fast
3. The Text: C G A C G C T. Suffix Tree (figure; edge labels A C G T G T A C A C)
4. The Text: C G A C G C T. Suffix Tree (figure). Substring Query: C G C
5. Trees take too much space. Are there smaller indices?
6. The Text: C G A C G C T. Suffix Tree (figure). Suffix Array, the Sorted List of Suffixes: 3 1 4 6 2 5 7
7. The Text: C G A C G C T. Burrows-Wheeler Index (an array). Suffix Array: 3 1 4 6 2 5 7
8. How can one compute the Suffix Array in Linear Time?
9. Task: String of length n with characters in the range 1..n. Sort these suffixes lexicographically. Obtain two arrays, f[i]: sorted order of ith suffix, g[i]: which suffix is ith highest. Naive sorting needs O(n log n) comparisons, each taking up to n time.
10. Divide and Conquer: Separate odd and even suffixes; sort each recursively, then combine
11. Sorting Even Suffixes (figure: pairs A1 A2 A3 A4): Sort these n/2 pairs and map them to single chars in the range 1..n/2. New text of half the length; sort suffixes recursively
12. Sorting Odd Suffixes (figure: O1 O2 O3 O4, written as pairs A1,E1 A2,E2 A3,E3 A4,E4): Sort these n/2 pairs; the E's are the even suffixes, whose order we know
13. Time Complexity: T(n) = O(n) + T(n/2) + time for merging even and odd suffixes; this is O(n) if the merge takes O(n)
14. Merging (figure: O and E, written as A,E and B,O): Do we have any info to determine the relative order of an odd suffix and an even one?
15. The Trick (Kärkkäinen, Sanders): Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3
16. Sorting 0 and 1 Together (figure: text A B C D E F G H I J K L): Sort these 2n/3 triplets and map them to single chars. New text of length 2n/3; sort suffixes recursively
17. Sorting Suffixes in 2 (figure: 2_1 2_2 2_3 2_4, written as pairs A1,0_1 A2,0_2 A3,0_3 A4,0_4): Sort these n/3 pairs; the 0's are the mod 0 suffixes, whose order we know
18. Merging 1 and 2 (figure: written as AB,0 and CD,1): We know the order of all 0,1 suffixes!
19. Time Complexity: T(n) = O(n) + T(2n/3) + O(n) = O(n)
20. Generalization: Set D of indices mod v (figure: positions v, 2v, 3v). Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D. This string has size |D|n/v. Time taken to create this string is O(n|D|)
21. Key Property of D (figure labels: x<v, x<v): For any 2 indices i and j, (i-j) mod v is the distance between some two beads in D. D is a Difference Cover if distances between beads in D generate 0, 1, …, v-1
22. Size of D (figure labels: sqrt(v), sqrt(v)): There exists a Difference Cover of size 1.5*sqrt(v)!
23. Time Complexity: T(n) = O(n|D|) + T(|D|n/v) + O(nv). For |D| = 2.5 sqrt(v): T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
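
The definitions on slides 6, 9 and 21 can be checked on the deck's own example text with a short Python sketch. This is not the linear-time construction the slides describe: it uses the naive sort that slide 9 calls too slow, purely to reproduce the values shown. All function names are hypothetical, positions are 1-based as on the slides, and reading g as the suffix array itself (with f as its inverse) is an assumption about the garbled slide text.

```python
from itertools import product


def naive_suffix_array(text):
    """Return the 1-based start positions of the suffixes of `text`, sorted
    lexicographically. Naive baseline only, not the linear-time algorithm."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def inverse_permutation(g):
    """Assumed reading of slide 9: g[k-1] is the start of the k-th suffix in
    sorted order, and f[i-1] is the rank of the suffix starting at i."""
    f = [0] * len(g)
    for rank, start in enumerate(g, start=1):
        f[start - 1] = rank
    return f


def contains(text, sa, pattern):
    """Substring query: binary search the suffix array for the first suffix
    whose length-len(pattern) prefix is >= pattern, then test it."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1  # convert to a 0-based Python index
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo] - 1:].startswith(pattern)


def is_difference_cover(d, v):
    """True if every residue 0..v-1 equals (a - b) mod v for some a, b in d."""
    return {(a - b) % v for a, b in product(d, repeat=2)} == set(range(v))


if __name__ == "__main__":
    text = "CGACGCT"
    g = naive_suffix_array(text)
    print(g)                               # [3, 1, 4, 6, 2, 5, 7]
    print(inverse_permutation(g))          # [2, 5, 1, 3, 6, 4, 7]
    print(contains(text, g, "CGC"))        # True
    print(is_difference_cover({0, 1}, 3))  # True
```

The last check is the v = 3 case from slides 15 to 18: the residues {0, 1} produce the differences 0, 1 and 2 mod 3, which is why sorting the 0 and 1 suffixes together is enough to place the remaining group.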