# Suffix Array @ Solr勉強会 (Solr Study Group)


1. Suffix Array @ Solr Study Group, 2011/12/19
2. (@nobu_k) • Preferred Infrastructure (PFI FI) • Sedue (2 )
3. Suffix Array • Suffix Array (SA): • ( ) 1 • Sedue • SA • +Sedue • ” - ”
4. ( ) • n-gram (q-gram)
5. Suffix Array • n-gram
6. Suffix ( ) • mississippi • 0: mississippi, 1: ississippi, 2: ssissippi, 3: sissippi, 4: issippi, 5: ssippi, 6: sippi, 7: ippi, 8: ppi, 9: pi, 10: i
7. Suffix Array • position order: 0: mississippi, 1: ississippi, 2: ssissippi, 3: sissippi, 4: issippi, 5: ssippi, 6: sippi, 7: ippi, 8: ppi, 9: pi, 10: i → sorted order: 10: i, 7: ippi, 4: issippi, 1: ississippi, 0: mississippi, 9: pi, 8: ppi, 6: sippi, 3: sissippi, 5: ssippi, 2: ssissippi
8. 10: i, 7: ippi, 4: issippi, 1: ississippi, 0: mississippi, 9: pi, 8: ppi, 6: sippi, 3: sissippi, 5: ssippi, 2: ssissippi • mississippi ’si’ • ’si’ • 6: sippi, 3: sissippi • 3 6
9. SA[i]: 10 7 4 1 0 9 8 6 3 5 2 • T[i]: m i s s i s s i p p i • i = 6: T[SA[6]] → T[8] → “ppi” • 10: i, 7: ippi, 4: issippi, 1: ississippi, 0: mississippi, 9: pi, 8: ppi, 6: sippi, 3: sissippi, 5: ssippi, 2: ssissippi
10. (1/3) T[i]: 1 2 3 ... n • SA • SA[i]
11. (2/3) RedBull !! 1. RedBull *2 RedBull SA[i] 2. RedBull 1 2 3 ... n
12. (3/3) 3. RedBull 1 2 3 ... n 4. ( 1, 3), ( 2, 4), ( 3, 2), ..., ( n, 2)
13. SA • + • /n-gram • SA
14. SA • (n-gram ) • n-gram • “THIS IS IT” • proximity
15. SA • HDD • ( )
16. ( ) • SAIS • HDD • (dc3, dc7) • Sedue Haskell C++ • @tanakh++
17. ( ) • 1 100GB/day • Sedue • SA n-gram • n-gram • SA n-gram
18. HDD • HDD • OK • SSD • SSD • Sedue 20 (80MB) • SA[i]
19. VS 1. SA 2. SSD+ 500 3. O(N) CPU 4. malloc
20. Sedue 1 56 • : 40 • : 16 (UTF-16) • 2 3 • = • SSD
21. SA • 4(+1) • 2-gram • % OK • ” ”
22.
23. groonga • Sedue groonga • Sedue groonga!!
24. (http://jubat.us/) • http://github.com/jubatus • @JubatusOfficial • with NTT PF
25. Fluentd • Ruby • Treasure Data, Inc. • @frsyuki, @kzk_mover • Solr • gem install fluentd • Visit http://fluentd.org/doc/ now!!
26.
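
The `mississippi` walkthrough on slides 6 to 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction simply sorts whole suffixes, which is far too slow for real data and is not claimed to be how Sedue builds its index. It shows the three ideas from the slides: sorting suffix start positions to get SA, reading a suffix back as T[SA[i]], and locating a query such as ’si’ by binary search over SA.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive illustration only: compares whole suffixes while sorting.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return every start position of pattern in text, using binary search on sa."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix stored at this rank of the array.
        return text[sa[rank]:sa[rank] + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All ranks in [start, lo) begin with pattern; report text positions.
    return sorted(sa[start:lo])


text = "mississippi"
sa = build_suffix_array(text)
print(sa)                                # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
print(text[sa[6]:])                      # ppi
print(find_occurrences(text, sa, "si"))  # [3, 6]
```

Note that the array holds only integers: the suffixes themselves are never stored, and each one is recovered on demand by slicing the original text at SA[i].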