Suffix Array @ Solr Study Group (Solr勉強会)

Dec. 19, 2011

• 1. Suffix Array Solr 2011/12/19
• 2. (@nobu_k) • Preferred Infrastructure (PFI FI) • • • Sedue(2 )
• 3. Suffix Array • Suffix Array(SA): • ( ) 1 • Sedue • SA • • +Sedue • ” - ”
• 4. • • • ( ) • n-gram(q-gram) •
• 6. Suffixes of mississippi: 0: mississippi, 1: ississippi, 2: ssissippi, 3: sissippi, 4: issippi, 5: ssippi, 6: sippi, 7: ippi, 8: ppi, 9: pi, 10: i
• 7. Suffix Array • Suffixes in text order: 0: mississippi, 1: ississippi, 2: ssissippi, 3: sissippi, 4: issippi, 5: ssippi, 6: sippi, 7: ippi, 8: ppi, 9: pi, 10: i • Suffixes in sorted order: 10: i, 7: ippi, 4: issippi, 1: ississippi, 0: mississippi, 9: pi, 8: ppi, 6: sippi, 3: sissippi, 5: ssippi, 2: ssissippi
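
The sorted order on slide 7 can be reproduced with a few lines of Python. This is a minimal sketch and not Sedue's implementation: `build_suffix_array` is a hypothetical helper name, and it sorts whole suffixes naively, whereas slide 16 mentions SAIS and dc3 for real construction.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order."""
    # Naive approach: compare whole suffixes while sorting.
    # Fine for a slide-sized example, far too slow for large corpora.
    return sorted(range(len(text)), key=lambda i: text[i:])


T = "mississippi"
SA = build_suffix_array(T)

# Print each entry in the same "position: suffix" form used on the slide.
for pos in SA:
    print(f"{pos}: {T[pos:]}")
```

The output lists the suffixes from `10: i` down to `2: ssissippi`, matching the sorted order on the slide.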
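• 8. Sorted suffixes: 10: i, 7: ippi, 4: issippi, 1: ississippi, 0: mississippi, 9: pi, 8: ppi, 6: sippi, 3: sissippi, 5: ssippi, 2: ssissippi • Search mississippi for ’si’ • Suffixes beginning with ’si’: 6: sippi, 3: sissippi • Result: positions 3 and 6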
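
The ’si’ lookup on slide 8 can be sketched as two binary searches over the sorted suffixes. The surviving slide text does not say which search method the talk used, so binary search is an assumption here (it is the usual choice for a suffix array), and `find_all` and `build_suffix_array` are hypothetical names.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction, only for illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return every position where `pattern` occurs in `text`, in ascending order."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


T = "mississippi"
SA = build_suffix_array(T)
hits = find_all(T, SA, "si")
print(hits)  # [3, 6]
```

All matches sit next to each other in the array, so the cost is two binary searches plus the size of the result, regardless of how the text would be tokenized.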
• 9. SA[i]: 10 7 4 1 0 9 8 6 3 5 2 • T[i]: m i s s i s s i p p i • i = 6: T[SA[6]] → T[8] → “ppi”
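
Slide 9 shows that the array holds only integer positions and that a suffix is recovered by indexing back into the text. A small sketch of that lookup, using the SA values from the slide; `suffix_at` is a hypothetical helper name.

```python
T = "mississippi"
SA = [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]  # values copied from the slide


def suffix_at(text: str, sa: list[int], rank: int) -> str:
    """Return the suffix with the given rank in sorted order."""
    # sa[rank] is a start position in text; the suffix itself is never stored.
    return text[sa[rank]:]


print(SA[6])                # 8
print(suffix_at(T, SA, 6))  # ppi

# Sanity check: walking the array yields the suffixes in strictly increasing order.
ranked = [suffix_at(T, SA, r) for r in range(len(SA))]
assert ranked == sorted(ranked)
```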
• 10. (1/3) T[i]: 1 2 3 ... n SA SA[i]
• 11. (2/3) RedBull !! 1. RedBull *2 RedBull SA[i] 2. RedBull 1 2 3 ... n
• 12. (3/3) 3. RedBull 1 2 3 ... n 4. ( 1, 3), ( 2, 4), ( 3, 2),...,( n, 2)
• 13. SA • + • /n-gram • SA •
• 14. SA • (n-gram ) • • n-gram • • • “THIS IS IT” • proximity
• 15. SA • • • • • HDD • ( ) • •
• 16. • • ( ) • SAIS • • HDD • (dc3, dc7) • Sedue Haskell C++ • @tanakh++
• 17. ( ) • • • 1 100GB/day • Sedue • SA n-gram • n-gram • SA n-gram •
• 18. HDD • HDD • OK • • • SSD • SSD • Sedue 20 (80MB) • SA[i]
• 19. VS 1. SA • 2. • SSD+ 500 3. • O(N) CPU 4. • • malloc
• 20. Sedue 1 56 • : 40 • : 16 (UTF-16) • 2 3 • • = • • SSD •
• 21. SA • • 4(+1) • 2-gram • • % OK • • ” ” •
• 22. • • • •
• 23. : groonga • Sedue groonga • • • Sedue groonga!!
• 24. : • • • (http://jubat.us/) • http://github.com/jubatus • @JubatusOfficial • with NTT PF
• 25. : Fluentd • Ruby • Treasure Data, Inc. • @frsyuki, @kzk_mover • Solr • gem install fluentd • Visit http://fluentd.org/doc/ now!!
