
Suffix Array @ Solr Study Group


Slides from a talk at the 7th Solr study group.



  1. Suffix Array (Solr study group, 2011/12/19)
  2. • (@nobu_k) • Preferred Infrastructure (PFI) • Sedue(2 )
  3. Suffix Array • Suffix Array (SA): • Sedue • SA • +Sedue
  4. • n-gram (q-gram)
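For comparison with the n-gram (q-gram) approach mentioned on slide 4, here is a minimal sketch of a bigram inverted index over a single string. It is a generic illustration, not Sedue's or Solr's implementation; `build_ngram_index` and `ngram_search` are hypothetical names.

```python
from collections import defaultdict


def build_ngram_index(text: str, n: int = 2) -> dict[str, list[int]]:
    """Map each n-gram of `text` to the ascending list of positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(text) - n + 1):
        index[text[pos:pos + n]].append(pos)
    return dict(index)


def ngram_search(index: dict[str, list[int]], pattern: str, n: int = 2) -> list[int]:
    """Return start positions of `pattern` (at least n characters long).

    Every overlapping n-gram of the pattern must occur at the matching offset.
    Together they cover every character of the pattern, so this sketch needs
    no separate verification pass against the text.
    """
    if len(pattern) < n:
        raise ValueError("patterns shorter than n need a different lookup")
    candidates = set(index.get(pattern[:n], []))
    for offset in range(1, len(pattern) - n + 1):
        gram = pattern[offset:offset + n]
        # Shift each posting back by `offset` so it lines up with a pattern start.
        candidates &= {p - offset for p in index.get(gram, [])}
    return sorted(candidates)


if __name__ == "__main__":
    idx = build_ngram_index("mississippi")
    print(ngram_search(idx, "si"))    # [3, 6]
    print(ngram_search(idx, "issi"))  # [1, 4]
```

The index answers a query by intersecting posting lists, one per n-gram of the pattern, whereas a suffix array answers it with a binary search over sorted suffixes.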
  5. Suffix Array • n-gram
  6. Suffixes of mississippi:
       0: mississippi
       1: ississippi
       2: ssissippi
       3: sissippi
       4: issippi
       5: ssippi
       6: sippi
       7: ippi
       8: ppi
       9: pi
      10: i
  7. Suffix Array: the same suffixes sorted lexicographically
      Before sorting     After sorting
       0: mississippi    10: i
       1: ississippi      7: ippi
       2: ssissippi       4: issippi
       3: sissippi        1: ississippi
       4: issippi         0: mississippi
       5: ssippi          9: pi
       6: sippi           8: ppi
       7: ippi            6: sippi
       8: ppi             3: sissippi
       9: pi              5: ssippi
      10: i               2: ssissippi
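The sorted column on slide 7 can be reproduced with the most direct method: sort the start positions by the suffix each one begins. This is a naive sketch (every sort key is a copy of a suffix, so memory grows as O(n²)), far slower than the construction algorithms named later in the talk; `build_suffix_array` is a hypothetical name.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start positions in lexicographic order of their suffixes.

    Naive construction: each sort key is a full copy of a suffix.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    t = "mississippi"
    for start in build_suffix_array(t):
        # Prints the "After sorting" column: "10: i", " 7: ippi", ...
        print(f"{start:2}: {t[start:]}")
```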
  8. Searching mississippi for ’si’: in the sorted order, the suffixes that begin with ’si’ (6: sippi, 3: sissippi) are adjacent, so binary search finds them as one range; ’si’ occurs at positions 3 and 6.
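The lookup on slide 8 can be sketched as two binary searches over the suffix array: one finds the first suffix whose leading characters are not less than the pattern, the other finds the first suffix whose leading characters are greater. The function names are hypothetical, and the array is built naively for brevity.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of `pattern` in `text`.

    Truncating sorted suffixes to len(pattern) characters keeps them in
    non-decreasing order, so the matches form one contiguous range of ranks.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first rank whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # upper bound: first rank whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    t = "mississippi"
    sa = build_suffix_array(t)
    print(find_occurrences(t, sa, "si"))  # [3, 6]
```

Each probe compares at most len(pattern) characters, so a query costs O(m log n) character comparisons regardless of how many matches the range holds.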
  9. The SA stores only the start position of each suffix, in sorted order:
      i:       0   1   2   3   4   5   6   7   8   9  10
      T[i]:    m   i   s   s   i   s   s   i   p   p   i
      SA[i]:  10   7   4   1   0   9   8   6   3   5   2
      Example: the suffix of rank 6 is T[SA[6]] → T[8] → “ppi”.
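Slide 9's point, that the array holds only integers and the suffix text is read back from T, can be shown directly. This sketch stores the slide's SA with Python's `array` module and typecode `"i"` (a C int, commonly 4 bytes); `suffix_at_rank` is a hypothetical helper.

```python
from array import array

T = "mississippi"
# The suffix array from slide 9, kept as a compact array of C ints.
SA = array("i", [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2])


def suffix_at_rank(text: str, sa: array, rank: int) -> str:
    """Read the rank-th smallest suffix back out of the text: T[SA[rank]:]."""
    return text[sa[rank]:]


if __name__ == "__main__":
    print(SA[6], suffix_at_rank(T, SA, 6))  # 8 ppi
    # Index size depends only on the text length, not on the suffix lengths.
    print(SA.itemsize * len(SA), "bytes for", len(T), "suffixes")
```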
  10. (1/3) T[i]: 1 2 3 ... n • SA • SA[i]
  11. (2/3) RedBull!! 1. RedBull *2 RedBull SA[i] 2. RedBull 1 2 3 ... n
  12. (3/3) 3. RedBull 1 2 3 ... n 4. (1, 3), (2, 4), (3, 2), ..., (n, 2)
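Slides 10 to 12 are only partly legible. They appear to search the texts of documents 1…n for “RedBull” and end with pairs such as (1, 3), (2, 4), which read like (document, hit count). Assuming that reading, a minimal sketch concatenates the documents with a separator, finds the matching SA range, and maps each hit back to its document with a binary search over document start offsets. The separator character and all function names are assumptions, not taken from the talk.

```python
from bisect import bisect_right
from collections import Counter

SEP = "\x00"  # assumed separator; patterns must not contain it


def build_corpus(docs: list[str]) -> tuple[str, list[int], list[int]]:
    """Concatenate docs with SEP; return (text, suffix array, doc start offsets)."""
    starts, pos = [], 0
    for d in docs:
        starts.append(pos)
        pos += len(d) + len(SEP)
    text = SEP.join(docs)
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive build
    return text, sa, starts


def sa_range(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    """Half-open range [first, last) of SA ranks whose suffixes start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return first, lo


def doc_hit_counts(text: str, sa: list[int], starts: list[int],
                   pattern: str) -> list[tuple[int, int]]:
    """Return sorted (1-based document id, number of hits) pairs."""
    first, last = sa_range(text, sa, pattern)
    # bisect_right counts the documents starting at or before the hit,
    # which is the 1-based id of the document that contains it.
    counts = Counter(bisect_right(starts, sa[r]) for r in range(first, last))
    return sorted(counts.items())


if __name__ == "__main__":
    docs = ["RedBull gives you wings", "no RedBull here, RedBull there", "plain water"]
    text, sa, starts = build_corpus(docs)
    print(doc_hit_counts(text, sa, starts, "RedBull"))  # [(1, 1), (2, 2)]
```

Because the separator never appears in a pattern, no match can span two documents, so each hit belongs to exactly one document.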
  13. • SA • + • /n-gram • SA
  14. SA • (n-gram) • n-gram • “THIS IS IT” • proximity
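Slide 14 appears to list phrase search (“THIS IS IT”) and proximity among the suffix array's strengths. Under that reading, here is a sketch: the whole phrase, spaces included, is a single pattern for the same binary search, and because hits come back as exact offsets, a proximity check is a comparison of positions. The helper names and the character-based window are assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Sorted start positions of `pattern`, via lower/upper-bound binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def proximity_pairs(text: str, sa: list[int], a: str, b: str,
                    window: int) -> list[tuple[int, int]]:
    """Pairs (pa, pb) where an occurrence of b starts 1..window chars after one of a."""
    pos_a = find_occurrences(text, sa, a)
    pos_b = find_occurrences(text, sa, b)
    return [(pa, pb) for pa in pos_a for pb in pos_b if 0 < pb - pa <= window]


if __name__ == "__main__":
    t = "THIS IS IT AND THAT IS NOT IT"
    sa = build_suffix_array(t)
    print(find_occurrences(t, sa, "THIS IS IT"))  # [0]
    # Substring matching: "IS" also hits inside "THIS" (position 2).
    print(find_occurrences(t, sa, "IS"))          # [2, 5, 20]
    print(proximity_pairs(t, sa, "THIS", "IT", 10))  # [(0, 8)]
```

Matching is on raw substrings, not words, which is why “IS” also matches inside “THIS”; a real engine would add word-boundary handling on top.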
  15. SA • HDD
  16. • SAIS • HDD • (dc3, dc7) • Sedue Haskell C++ • @tanakh++
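Slide 16 names SA-IS and the difference-cover algorithms (dc3, dc7) for construction. Those are too long to sketch faithfully here, so the block below swaps in a different, simpler technique, prefix doubling. Suffixes are ranked by their first k characters, and k doubles each round by pairing a suffix's rank with the rank of the suffix k positions later. With Python's comparison sort this runs in O(n log² n), not the linear time of SA-IS or DC3. `suffix_array_doubling` is a hypothetical name.

```python
def suffix_array_doubling(text: str) -> list[int]:
    """Build a suffix array by prefix doubling (a stand-in, not SA-IS or DC3)."""
    n = len(text)
    sa = list(range(n))
    rank = [ord(c) for c in text]  # round 0: rank by the first character
    k = 1
    while n > 1:
        # Key: rank of the first k chars, then rank of the next k chars.
        # -1 means the suffix ends first, so it sorts before longer ones.
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=keys.__getitem__)
        new_rank = [0] * n
        for j in range(1, n):
            # Equal keys share a rank; a new key bumps the rank by one.
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # every rank distinct: order is final
            break
        k *= 2
    return sa


if __name__ == "__main__":
    print(suffix_array_doubling("mississippi"))  # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

Each round compares integer pairs instead of whole suffixes, which is the same idea, avoiding repeated character-by-character comparison, that the linear-time algorithms push further.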
  17. • 1 100GB/day • Sedue • SA n-gram • n-gram • SA n-gram
  18. HDD • HDD • OK • SSD • SSD • Sedue 20 (80MB) • SA[i]
  19. VS 1. SA 2. SSD+ 500 3. O(N) CPU 4. malloc
  20. • Sedue 1 56 • : 40 • : 16 (UTF-16) • 2 3 • = • SSD
  21. SA • 4(+1) • 2-gram • % OK
  22.
  23. groonga • Sedue groonga • Sedue groonga!!
  24. Jubatus (http://jubat.us/) • http://github.com/jubatus • @JubatusOfficial • with NTT PF
  25. Fluentd • Ruby • Treasure Data, Inc. • @frsyuki, @kzk_mover • Solr • gem install fluentd • Visit http://fluentd.org/doc/ now!!
  26.
