P2P search engine

  1. http://code.google.com/p/fujene/ (2011/5/30)
  2. P2P distributed-search engine. Yusuke FUJISAKA
  3. [text not extracted]
  4. [text not extracted]
  5. 1. GB (TB); 2. to 4. [text not extracted]
  6. Namazu, Senna, Lucene, Solr, Hyper Estraier, ...
  7. [text not extracted]
  8. Fujene: P2P [remaining text not extracted]
  9. FARE system: Fast, Autonomous, Retrieval, Engine, system
  10. [text not extracted]
  11. [text not extracted]
  12. Content = Title; Content = Body; Appendix = ID; Appendix = URL. Fujene --primary SettingFile
  13. IP: 10.0.1.5. Fujene --secondary 10.0.1.5
  14. Existing node, New node
  15. Node diagram: F, A, B, E, C, D
  16. Node diagram: F, A, B, E, C, D
  17. Node diagram: F, A, B, E, C, D
  18. Replicate chain: Chord chain [1]; Consistent hash [2]
  19. Chord chain (nodes F, A, B, E, C, D)
  20. Chord chain: Hash: 0xEF459AB... (nodes F, A, B, E, C, D)
  21. Topic: [text not extracted]
  22. Node 1: 56%; Node 2: 20%; Node 3: 24%
  23. Node 1: 56%; Node 2: 20%; Node 3: 24%
  24. Node 1: 56%, 42%; Node 2: 20%, 32%; Node 3: 24%, 26%
  25. Node 1: 56%, 42%; Node 2: 20%, 32%; Node 3: 24%, 26%
  26. Indexing: Sen (=MeCab); Bi-gram, Uni-gram
  27. Indexing: [text not extracted]
  28. Indexing: ID: 12345, Title: ..., Body: ..., URL: ...; RPC/API; Content, Term, Term (nodes F, A, B, E, C, D)
  29. Indexing: Content, Hash, Hash (nodes F, A, B, E, C, D)
  30. Indexing: replication=3 (nodes F, A, B, E, C, D)
  31. Skip pointer, Dictionary, Invert index, ID; Skip pointer, Content, Appendix
  32. Steps (1) to (7): Lookup Skip pointer, Lookup Content, Appendix. Steps (8) to (12): Lookup Skip pointer, Lookup Dictionary. Steps (13) to (16): Lookup Invert index
  33. Contents: (1) to (7); Dictionary: (8) to (12); Invert index: (13) to (16)
  34. Searching: [text not extracted]
  35. Searching: Query, Analyze, Term, Term, Term (nodes F, A, B, E, C, D)
  36. Searching: Intersection of ID: 12, 24, 35, 49, ...; ID: 12, 30, 49, 55, ...; ID: 7, 12, 30, 49, ... (nodes F, A, B, E, C, D)
  37. Searching: Output: ID: 12, ID: 49 (nodes F, A, B, E, C, D)
  38. Query: Skip pointer, Dictionary, Invert index, ID; Skip pointer, Content, Appendix; Output, Output
  39. [text not extracted]
  40. beacon (nodes F, A, B, E, C, D)
  41. “live” (nodes F, A, B, E, C, D)
  42. F × A × × B E C D
  43. 6 F × A 5 × × 1 B E 2 C 4 D 3
  44. Columns A B C D E F; rows: 4 5 6 1 2 3 / 5 6 1 2 3 4 / 6 1 2 3 4 5 / 3 4 5
  45. Future work: Web app [other items not extracted]
  46. Topic: [text not extracted]
  47. Topic: Index Server, Search Server; Node Manager /; Search Gather; Store/Lookup, Query Parser; Memory/Disk Blocks
  48. Topic: Intersection. Lists r1, r2, ..., rn; O(∑ r). r1: 1 4 6 10 12 16 22 29 30 31 37 40 43 47; r2: 2 4 6 9 12 14 26 30 32 37 43 44 47 50; r3: 4 5 6 10 11 12 23 27 30 37 39 41 43 47
  49. Topic: Intersection. Steps 1., 2., 2.1., 2.2. [text not extracted]. r1: 1 4 6 10 12 16 22 29 30 31 37 40 43 47; r2: 2 4 6 9 12 14 26 30 32 37 43 44 47 50; r3: 4 5 6 10 11 12 23 27 30 37 39 41 43 47
  50. Topic: MemoryBlockPool: withdraw, deposit; Skip Pointer, Invert Index, Content
  51. Bibliography (1): [1] I. Stoica, et al.; Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications; SIGCOMM 2001; October 2001. [2] D. Karger, et al.; Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web; STOC ’97; 1997
  52. Bibliography (2): [3] C. D. Manning, et al.; An Introduction to Information Retrieval; Cambridge UP; 2009. [4] T. Luu, et al.; ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine; P2PIR ’06; Nov. 2006
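
Slides 18 to 20 and 29 to 30 place hashed content on a ring of nodes with replication=3. The following is a minimal sketch of that idea, not Fujene's actual code: the `Ring` class, the `ring_hash` helper, and the choice of SHA-1 are assumptions made for illustration, and the lookup uses a globally known sorted node list rather than Chord's finger-table routing.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hash ring over a fixed set of node names."""

    def __init__(self, nodes):
        # Sort nodes by their ring position so successors can be found by bisection.
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._points]

    def owners(self, key: str, replication: int = 3):
        """Return the successor node of the key plus the next nodes clockwise."""
        start = bisect.bisect_left(self._hashes, ring_hash(key))
        count = min(replication, len(self._points))
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(count)
        ]


ring = Ring(["A", "B", "C", "D", "E", "F"])
replicas = ring.owners("fujene")  # three distinct, ring-adjacent nodes
```

Because each key maps to its successor and the nodes that follow it, adding or removing one node only changes the owners of keys near that node on the ring.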
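
Slide 26 lists Bi-gram and Uni-gram next to the Sen (=MeCab) analyzer. As an illustration only, character n-grams can be produced as below; `char_ngrams` is a hypothetical helper, and how Fujene actually combines n-grams with morphological analysis is not recoverable from the slides.

```python
def char_ngrams(text: str, n: int):
    """Return every run of n consecutive characters in text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


bigrams = char_ngrams("検索エンジン", 2)   # ['検索', '索エ', 'エン', 'ンジ', 'ジン']
unigrams = char_ngrams("P2P", 1)          # ['P', '2', 'P']
```

N-gram terms need no dictionary, which is why they are a common fallback for text that a morphological analyzer cannot segment.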
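
Slides 36, 48 and 49 intersect the sorted ID lists returned for each term. The steps on slide 49 were not recovered, so the following is only a sketch of one standard approach, a multi-way merge over ascending lists; `intersect_sorted` is a hypothetical name. Every pointer only moves forward, so the number of loop iterations is bounded by the total list length ∑ r.

```python
def intersect_sorted(lists):
    """Intersect ascending ID lists with a single forward pass over each list."""
    if not lists:
        return []
    pos = [0] * len(lists)
    result = []
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        heads = [lst[p] for p, lst in zip(pos, lists)]
        largest = max(heads)
        if all(h == largest for h in heads):
            # Every list agrees on this ID: emit it and advance all pointers.
            result.append(largest)
            pos = [p + 1 for p in pos]
        else:
            # Advance only the lists whose head is behind the largest head.
            pos = [p + 1 if h < largest else p for p, h in zip(pos, heads)]
    return result


r1 = [1, 4, 6, 10, 12, 16, 22, 29, 30, 31, 37, 40, 43, 47]
r2 = [2, 4, 6, 9, 12, 14, 26, 30, 32, 37, 43, 44, 47, 50]
r3 = [4, 5, 6, 10, 11, 12, 23, 27, 30, 37, 39, 41, 43, 47]
common = intersect_sorted([r1, r2, r3])  # [4, 6, 12, 30, 37, 43, 47]
```

Applied to the visible prefixes on slide 36 (12, 24, 35, 49; 12, 30, 49, 55; 7, 12, 30, 49), the same function returns 12 and 49, which matches the output shown on slide 37.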
