Applications Using Ameba Search Data

  1. Ameba (2010/4/23)
  2. [three bullet points; text not recovered]
  3. [text not recovered]
  4. [text not recovered]
  5. [text not recovered]
  6. [checklist; text not recovered]
  7. suggest: [three numbered steps; text not recovered]
  8. JavaScript: suggest.js, jquery suggest
  9. exact match; predictive match (→ 03 ...); common prefix search (→ ...). ※ Key Value Store: exact match
  10. [text not recovered]
  11. tea
  12. te → tea, ted, ten
  13. inn → in, inn
  14. Trie: Double Array Trie Tree (Darts, http://chasen.org/~taku/software/darts/); LOUDS (tx, http://code.google.com/p/tx-trie/)
  15. [J. Aoe 1989] An Efficient Digital Search Algorithm by Using a Double-Array Structure. (BASE/CHECK) Trie
  16. [two bullet points; text not recovered]
  17. BC[x].BASE + CODE[c] = y; BC[y].CHECK = x; BASE=(
  18. BC[x].BASE + CODE[c] = y; BC[y].CHECK = x; BASE=(
  19. http://chasen.org/~taku/software/darts/ : Double Array Trie; LGPL, BSD; MeCab, ChaSen
  20. [O’Neil Delpratt 2006] Engineering the LOUDS Succinct Tree Representation. LOUDS → Level-Order Unary Degree Sequence; 96bit; LOUDS bit. ※ http://www-tsujii.is.s.u-tokyo.ac.jp/~hillbig/papers/kvs_okanohara.pptx
  21. http://code.google.com/p/tx-trie/ : LOUDS; BSD; Google; tx, rx
  22. Darts, linux.words: ALGOL, ANSI, ARCO, ARPA, ARPANET, ASCII. ※ byte
  23. Trie; Trie, LOUDS, tx
  24. [two bullet points; text not recovered]
  25. Jaro-Winkler; x·y / (|x|*|y|); x·y / (Σxi + Σyi - x·y)
  26. Jaro-Winkler; x·y / (|x|*|y|); x·y / (Σxi + Σyi - x·y)
  27. Oluolu (mixi): http://alpha.mixi.co.jp/blog/?p=1425; Lucene: http://wiki.apache.org/lucene-java/SpellChecker
  28. org.unigram.oluolu.rqe.RelatedQueryExtractionReducer.java
  29. org.apache.lucene.search.spell.SpellChecker.java
  30. 1. Hadoop; 2. PPJoin+; 3. [text not recovered]. ※ 0.8, 1:10
  31. [Chuan Xiao+ 2008] Efficient Similarity Joins for Near Duplicate Detection: Similarity Join. ※ (PPJoin+, just-do-neet) http://ameblo.jp/just-do-neet/entry-10317825348.html
  32. [text not recovered]
  33. [text not recovered]
  34. PPJoin+
  35. [checklist; text not recovered]
  36. Trie: Trie; Double Array Trie; LOUDS ... tx, rx
  37. http://nd-ilab.jp/suggest/sample/sada/index.html
  38. http://nd-ilab.jp/suggest/sample/talent/index.html
  39. http://nd-ilab.jp/suggest/sample/search/index.html (Ameba)
  40. http://nd-ilab.jp/suggest/sample/search/index.html: Ameba; Ameba, IME; JavaScript; Trie; Common Prefix Search
  41. [checklist; text not recovered]
  42. or API
  43. [text not recovered]
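
Slide 9 names three lookup types (exact match, predictive match, common prefix search), and slides 11 to 13 appear to give one example of each: "tea", "te" → tea, ted, ten, and "inn" → in, inn. The following is a minimal sketch of those three lookups on a plain dict-based trie. It is not the Darts or tx implementation; the class name `Trie`, its method names, and the five-word dictionary are assumptions made for illustration.

```python
END = ""  # hypothetical marker key: a node holding END terminates a stored word


class Trie:
    """Plain dict-based trie (illustrative only, not Darts or tx)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    def _walk(self, text):
        """Follow text from the root; return the node reached, or None."""
        node = self.root
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def exact_match(self, key):
        """True if key itself is stored."""
        node = self._walk(key)
        return node is not None and END in node

    def predictive_match(self, prefix):
        """All stored words that start with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(prefix, start)]
        while stack:
            text, node = stack.pop()
            if END in node:
                results.append(text)
            for ch, child in node.items():
                if ch != END:
                    stack.append((text + ch, child))
        return sorted(results)

    def common_prefix_search(self, query):
        """All stored words that are prefixes of query, shortest first."""
        results, node = [], self.root
        for i, ch in enumerate(query):
            node = node.get(ch)
            if node is None:
                break
            if END in node:
                results.append(query[: i + 1])
        return results


trie = Trie(["tea", "ted", "ten", "in", "inn"])
print(trie.exact_match("tea"))           # True
print(trie.predictive_match("te"))       # ['tea', 'ted', 'ten']
print(trie.common_prefix_search("inn"))  # ['in', 'inn']
```

A key-value store answers only the first of these queries directly, which is presumably the point of the ※ note on slide 9; the other two rely on the trie sharing prefixes between keys.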

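Slides 17 and 18 give the double-array transition rule: from node x on character c, the child is y = BC[x].BASE + CODE[c], and the move is valid only if BC[y].CHECK = x. The sketch below builds BASE and CHECK with a naive first-fit search and then looks keys up using only that rule. It is an assumption-laden illustration, not the Darts construction algorithm: two parallel lists stand in for the BC array, and the terminator character, the character codes, and the function names are all hypothetical.

```python
from collections import deque

END = "\0"  # hypothetical terminator appended to every key


def build_double_array(words):
    """Naive first-fit construction of BASE/CHECK for a small word list."""
    # Build an ordinary dict-based trie first; every key ends with END.
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    # CODE[c]: a positive integer per character (assignment is arbitrary).
    alphabet = sorted({ch for word in words for ch in word + END})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}
    # Index 0 is unused, index 1 is the root. CHECK == -1 means "free slot".
    base, check = [0, 0], [-1, 0]
    queue = deque([(root, 1)])
    while queue:
        node, x = queue.popleft()
        if not node:
            continue  # leaf: no outgoing transitions
        codes = [code[ch] for ch in node]
        b = 1
        while True:
            # Grow the arrays so that every candidate slot b + c exists.
            while len(check) < b + max(codes) + 1:
                base.append(0)
                check.append(-1)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[x] = b
        for ch, child in node.items():
            y = b + code[ch]  # BASE[x] + CODE[c] = y
            check[y] = x      # CHECK[y] = x
            queue.append((child, y))
    return base, check, code


def contains(base, check, code, word):
    """Exact-match lookup that uses only the transition rule."""
    x = 1
    for ch in word + END:
        if ch not in code:
            return False
        y = base[x] + code[ch]
        if y >= len(check) or check[y] != x:
            return False
        x = y
    return True


base, check, code = build_double_array(["tea", "ted", "ten", "in", "inn"])
print(contains(base, check, code, "ten"))  # True
print(contains(base, check, code, "te"))   # False
```

Each lookup step costs one addition and one comparison regardless of how many children a node has, which is the main attraction of the structure; the first-fit search for a free BASE value is the slow part and is where production libraries spend their engineering effort.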
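
Slides 25 and 26 list two vector formulas whose labels were lost: x·y / (|x|*|y|), which has the form of cosine similarity, and x·y / (Σxi + Σyi - x·y), which has the form of the Jaccard (Tanimoto) coefficient for binary vectors. A minimal sketch of both follows, assuming x and y are equal-length numeric feature vectors; the function names and the sample vectors are illustrative and do not come from the slides.

```python
import math


def dot(x, y):
    """Inner product x·y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """x·y / (|x| * |y|); returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / norm if norm else 0.0


def jaccard_coefficient(x, y):
    """x·y / (sum(x) + sum(y) - x·y), intended for 0/1 vectors."""
    inner = dot(x, y)
    denom = sum(x) + sum(y) - inner
    return inner / denom if denom else 0.0


# Hypothetical 0/1 feature vectors for two queries.
x = [1, 1, 0, 1]
y = [1, 0, 1, 1]
print(cosine_similarity(x, y))    # 2 / 3
print(jaccard_coefficient(x, y))  # 2 / 4 = 0.5
```

For 0/1 vectors, Σxi counts the features present in x, so the second denominator is the size of the union of the two feature sets and the result is intersection over union.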