Suffix Array @ Solr Study Group    2011/12/19

• (@nobu_k)
• Preferred Infrastructure (PFI)
• Sedue

Suffix Array
• Suffix Array (SA):
• Sedue
• SA
• +Sedue

• n-gram (q-gram)

Suffix Array
• n-gram

Suffix
Text: mississippi
 0: mississippi
 1: ississippi
 2: ssissippi
 3: sissippi
 4: issippi
 5: ssippi
 6: sippi
 7: ippi
 8: ppi
 9: pi
10: i

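Suffix Array
Suffixes by position      Suffixes in sorted order
 0: mississippi           10: i
 1: ississippi             7: ippi
 2: ssissippi              4: issippi
 3: sissippi               1: ississippi
 4: issippi                0: mississippi
 5: ssippi                 9: pi
 6: sippi                  8: ppi
 7: ippi                   6: sippi
 8: ppi                    3: sissippi
 9: pi                     5: ssippi
10: i                      2: ssissippi
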
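The sorted listing for mississippi can be reproduced in a few lines of Python. This is only a minimal sketch: `build_suffix_array` is a hypothetical helper name, and sorting whole suffixes is the naive construction, chosen here for readability. It is not presented as the construction method Sedue uses.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction: sort the positions 0..n-1 by the suffix starting there.
    # Copying each suffix makes this slow for large inputs; it is for illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = build_suffix_array("mississippi")
print(sa)  # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
for pos in sa:
    print(f"{pos:2d}: {'mississippi'[pos:]}")
```

The printed positions match the right-hand column of the slide, from 10: i down to 2: ssissippi.
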
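10: i
 7: ippi
 4: issippi
 1: ississippi
 0: mississippi
 9: pi
 8: ppi
 6: sippi
 3: sissippi
 5: ssippi
 2: ssissippi
• Searching mississippi for ’si’
• Hits: positions 3 and 6 (the adjacent entries 6: sippi and 3: sissippi)
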
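Because the suffixes are sorted, every suffix that starts with the query sits in one contiguous range of the array, and that range can be found by binary search. The sketch below assumes this standard technique; `find_all` and `build_suffix_array` are hypothetical names used for illustration, not Sedue's API.

```python
def build_suffix_array(text):
    # Naive construction, for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return every position where pattern occurs in text, in ascending order."""
    m = len(pattern)

    # Lower bound: first entry whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first entry whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # sa[start:lo] holds exactly the suffixes that begin with pattern.
    return sorted(sa[start:lo])


text = "mississippi"
sa = build_suffix_array(text)
print(find_all(text, sa, "si"))  # [3, 6]
```

Each comparison looks only at the first `len(pattern)` characters of a suffix, so a lookup costs on the order of `len(pattern) * log(len(text))` character comparisons.
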
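SA[i]: 10 7 4 1 0 9 8 6 3 5 2
T[i]:  m i s s i s s i p p i

10: i
 7: ippi
 4: issippi
 1: ississippi
 0: mississippi
 9: pi
 8: ppi
 6: sippi
 3: sissippi
 5: ssippi
 2: ssissippi
i = 6: T[SA[6]] → T[8] → “ppi”
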
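The slide's T[SA[6]] → T[8] → “ppi” lookup shows that only the integer array and the original text need to be stored; each suffix is recovered by indexing. A minimal sketch, where `suffix_at_rank` is a hypothetical helper name:

```python
text = "mississippi"
sa = [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]  # the SA[i] row from the slide


def suffix_at_rank(text, sa, rank):
    """Return the suffix that is rank-th (0-based) in sorted order."""
    # sa[rank] is a start position in text; the suffix runs to the end of text.
    return text[sa[rank]:]


print(sa[6])                        # 8
print(suffix_at_rank(text, sa, 6))  # ppi
```
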
(1/3)
T[i]: 1 2 3 ... n
SA, SA[i]

(2/3)
RedBull !!
1. RedBull *2 RedBull SA[i]
2. RedBull
   1 2 3 ... n

(3/3)
3. RedBull
   1 2 3 ... n
4. (1, 3), (2, 4), (3, 2), ..., (n, 2)

• SA
• /n-gram
• SA

SA
• (n-gram)
• n-gram
• “THIS IS IT”
• proximity

SA
• HDD

• SAIS
• HDD
• (dc3, dc7)
• Sedue Haskell C++
• @tanakh++

• 1 100GB/day
• Sedue
• SA n-gram
• n-gram
• SA n-gram

HDD
• HDD
• OK
• SSD
• SSD
• Sedue 20 (80MB)
• SA[i]

VS
1. SA
2. SSD+ 500
3. O(N) CPU
4. malloc

• Sedue 1 56
• : 40
• : 16 (UTF-16)
• 2 3
• SSD

SA
• 4(+1)
• 2-gram
• % OK

groonga
• Sedue groonga
• Sedue groonga!!

Jubatus (http://jubat.us/)
• http://github.com/jubatus
• @JubatusOfficial
• with NTT PF

Fluentd
• Ruby
• Treasure Data, Inc.
• @frsyuki, @kzk_mover
• Solr
• gem install fluentd
• Visit http://fluentd.org/doc/ now!!
Suffix Array @ Solr Study Group (Solr勉強会)


Presentation material from the 7th Solr Study Group meeting.
