P2p search engine

Transcript

  • 1. http://code.google.com/p/fujene/ (May 30, 2011)
  • 2. P2P distributed-search engine. Yusuke FUJISAKA
  • 3. • [ ] (→ ) •
  • 4. • • • • • etc...
  • 5. 1. GB (TB) 2. 3. 4.
  • 6. • Namazu • Senna • Lucene • Solr • Hyper Estraier • ...
  • 7. • • … • •
  • 8. Fujene ( ) • : • • P2P • • •
  • 9. • → FARE system • Fast → ( ) • Autonomous → • Retrieval → • Engine → • system
  • 10. • • • • • •
  • 11. • •
  • 12. • Content = Title • Content = Body • Appendix = ID • Appendix = URL • Fujene --primary SettingFile
  • 13. • IP • 10.0.1.5 • Fujene --secondary 10.0.1.5
  • 14. [Diagram: existing node, new node]
  • 15. [Diagram: nodes A, B, C, D, E, F]
  • 16. [Diagram: nodes A, B, C, D, E, F]
  • 17. [Diagram: nodes A, B, C, D, E, F]
  • 18. Replicate chain • Chord chain [1] : • • Consistent hash [2] •
  • 19. Chord chain [Diagram: nodes A, B, C, D, E, F]
  • 20. Chord chain [Diagram: nodes A, B, C, D, E, F; Hash: 0xEF459AB...]
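The Chord chain and consistent hash named on the slides above can be illustrated with a minimal sketch. It assumes the textbook scheme from [1] and [2]: nodes and keys are hashed onto one ring with SHA-1, and a key belongs to the first node clockwise from its hash. The `HashRing` class, its method names, and the use of the node labels A to F as hash inputs are hypothetical and are not taken from Fujene's source.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-1, as in the Chord paper)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring; not Fujene's actual implementation."""

    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._keys = [point for point, _ in self._points]

    def successors(self, key: str, count: int = 1):
        """Return the first `count` distinct nodes clockwise from hash(key)."""
        start = bisect.bisect_left(self._keys, ring_hash(key))
        count = min(count, len(self._points))
        # The modulo wraps around the end of the ring.
        return [self._points[(start + i) % len(self._points)][1]
                for i in range(count)]

    def owner(self, key: str) -> str:
        """Return the node responsible for `key`."""
        return self.successors(key, 1)[0]


ring = HashRing(["A", "B", "C", "D", "E", "F"])
print(ring.owner("search"), ring.successors("search", 3))
```

A property worth noting in this scheme is that adding or removing a node only moves the keys that node owns; every other key keeps its owner.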
  • 21. Topic: • 1 • 1
  • 22. Node 1: 56%, Node 2: 20%, Node 3: 24% [Diagram: items 1, 2, 3]
  • 23. Node 1: 56%, Node 2: 20%, Node 3: 24% [Diagram: items 1, 2, 3]
  • 24. Node 1: 56%, 42% • Node 2: 20%, 32% • Node 3: 24%, 26% [Diagram: items 1, 2, 3]
  • 25. Node 1: 56%, 42% • Node 2: 20%, 32% • Node 3: 24%, 26% [Diagram: items 1, 2, 3]
  • 26. Indexing • • • ( ) • : Sen (=MeCab) • : Bi-gram, Uni-gram •
  • 27. Indexing • •
  • 28. Indexing [Diagram: document with ID: 12345, Title: ..., Body: ..., URL: ...; labels Content, Term, Term, RPC/API ( ); nodes A, B, C, D, E, F]
  • 29. Indexing [Diagram: nodes A, B, C, D, E, F; labels Content, Hash, Hash]
  • 30. Indexing [Diagram: nodes A, B, C, D, E, F; (replication=3)]
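The Indexing slides name Bi-gram and Uni-gram tokenization. The sketch below shows one way such terms could feed an inverted index. It is an assumption-laden illustration: the `ngrams` and `build_index` helpers are hypothetical, the uni-gram fallback for chunks shorter than two characters is a guess at how the two modes combine, and the morphological-analysis path (Sen/MeCab) is not modelled.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 2):
    """Split text into character n-grams, per whitespace-separated chunk.

    Assumption: chunks shorter than n fall back to single characters
    (uni-grams). The slides do not say how Fujene combines the two.
    """
    terms = []
    for chunk in text.split():
        if len(chunk) < n:
            terms.extend(chunk)  # one term per character
        else:
            terms.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return terms


def build_index(docs):
    """Build term -> sorted list of document IDs (a hypothetical local index)."""
    postings = defaultdict(set)
    for doc_id, body in docs.items():
        for term in ngrams(body):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_index({12345: "search engine", 7: "engine"})
print(index["en"])  # document IDs whose body contains the bi-gram "en"
```

In the distributed setting the slides describe, each term's posting list would be stored on the nodes chosen by hashing the term, rather than in one local dictionary.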
  • 31. [Diagram: Skip pointer … Dictionary … Inverted index … ID; Skip pointer … Content … Appendix …]
  • 32. [Diagram: lookup steps (1) to (16) over Skip pointer, Dictionary, Inverted index, Content, and Appendix]
  • 33. • : • Contents ... (1) to (7) • Dictionary ... (8) to (12) • Inverted index ... (13) to (16) •
  • 34. Searching • → • → •
  • 35. Searching [Diagram: Query, Analyze, Term, Term, Term; nodes A, B, C, D, E, F]
  • 36. Searching [Diagram: Intersection of ID: 12, 24, 35, 49, ...; ID: 12, 30, 49, 55, ...; ID: 7, 12, 30, 49, ...; nodes A, B, C, D, E, F]
  • 37. Searching [Diagram: Output ID: 12, ID: 49; nodes A, B, C, D, E, F]
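The Searching slides show a query split into terms, one posting list fetched per term, and an intersection that outputs IDs 12 and 49. A rough sketch of that gather step follows. The posting lists are the visible prefixes from the slide; the term names `t1` to `t3`, the assignment of lists to nodes A, B, and E, and the `fetch_postings` and `search` helpers are all hypothetical stand-ins for the real RPC calls.

```python
# Hypothetical per-node term stores. Which node holds which list is an
# assumption; the slide only shows three lists somewhere on the ring.
NODES = {
    "A": {"t1": [12, 24, 35, 49]},
    "B": {"t2": [12, 30, 49, 55]},
    "E": {"t3": [7, 12, 30, 49]},
}


def fetch_postings(term):
    """Stand-in for a remote lookup: return the posting list for one term."""
    for store in NODES.values():
        if term in store:
            return store[term]
    return []  # unknown term: no documents


def search(terms):
    """Gather one posting list per term and keep IDs present in all of them."""
    lists = [fetch_postings(term) for term in terms]
    if not lists:
        return []
    common = set(lists[0]).intersection(*lists[1:])
    return sorted(common)


print(search(["t1", "t2", "t3"]))  # [12, 49]
```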
  • 38. [Diagram: Query; Skip pointer … Dictionary … Inverted index … ID; Skip pointer … Content … Appendix …; Output, Output]
  • 39. • •
  • 40. [Diagram: beacon; nodes A, B, C, D, E, F]
  • 41. [Diagram: “live”; nodes A, B, C, D, E, F]
  • 42. [Diagram: nodes A, B, C, D, E, F, with three × marks]
  • 43. [Diagram: nodes A, B, C, D, E, F, numbered 1 to 6, with three × marks]
  • 44. [Table: columns A B C D E F; rows 4 5 6 1 2 3, 5 6 1 2 3 4, 6 1 2 3 4 5; trailing values 3 4 5]
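The beacon and “live” slides suggest a heartbeat-style liveness check. A minimal sketch of that general idea is below, assuming a peer counts as failed once it has not answered “live” within a fixed timeout. The `LivenessTable` class, the timeout value, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical; the slides do not describe Fujene's actual failure-detection rules.

```python
class LivenessTable:
    """Hypothetical record of the last "live" reply seen from each peer."""

    def __init__(self, peers, timeout: float = 3.0):
        self.timeout = timeout
        # Assumption: every peer is treated as last seen at time 0.
        self.last_seen = {peer: 0.0 for peer in peers}

    def record_live(self, peer: str, now: float) -> None:
        """Call when `peer` answers a beacon with "live"."""
        self.last_seen[peer] = now

    def failed_peers(self, now: float):
        """Peers whose last reply is older than the timeout."""
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout)


table = LivenessTable(["B", "C", "D", "E", "F"], timeout=3.0)
for peer in ["B", "C", "D", "E", "F"]:
    table.record_live(peer, now=1.0)
for peer in ["B", "C"]:
    table.record_live(peer, now=5.0)  # only B and C answer the second beacon
print(table.failed_peers(now=5.0))  # ['D', 'E', 'F']
```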
  • 45. Future work • • • • • Web app • •
  • 46. Topic:
  • 47. Topic: [Diagram: Index Server, Search Server, Node Manager, / Search Gather, Store/Lookup, Query Parser, Memory/Disk Blocks]
  • 48. Topic: Intersection • : r1, r2, ..., rn O(∑ r) •
    r1: 1 4 6 10 12 16 22 29 30 31 37 40 43 47
    r2: 2 4 6 9 12 14 26 30 32 37 43 44 47 50
    r3: 4 5 6 10 11 12 23 27 30 37 39 41 43 47
  • 49. Topic: Intersection 1. 2. 2.1. 2.2.
    r1: 1 4 6 10 12 16 22 29 30 31 37 40 43 47
    r2: 2 4 6 9 12 14 26 30 32 37 43 44 47 50
    r3: 4 5 6 10 11 12 23 27 30 37 39 41 43 47
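The Intersection slides give three sorted lists and a cost of O(∑ r). The slide's own steps did not survive extraction, so the sketch below substitutes the standard two-pointer merge, applied pairwise, which stays within that bound because each intermediate result is no longer than the shorter input. The function names are hypothetical.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists with one forward pass over each."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear in b any more
        else:
            j += 1  # b[j] cannot appear in a any more
    return out


def intersect_all(lists):
    """Fold the pairwise intersection over any number of sorted lists."""
    if not lists:
        return []
    return reduce(intersect_two, lists)


r1 = [1, 4, 6, 10, 12, 16, 22, 29, 30, 31, 37, 40, 43, 47]
r2 = [2, 4, 6, 9, 12, 14, 26, 30, 32, 37, 43, 44, 47, 50]
r3 = [4, 5, 6, 10, 11, 12, 23, 27, 30, 37, 39, 41, 43, 47]
print(intersect_all([r1, r2, r3]))  # [4, 6, 12, 30, 37, 43, 47]
```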
  • 50. Topic: MemoryBlockPool [Diagram: withdraw, deposit, …; Skip Pointer, Inverted Index, Content]
  • 51. Bibliography (1)
    [1] I. Stoica, et al.; Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications; SIGCOMM 2001; October 2001
    [2] D. Karger, et al.; Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web; STOC ’97; 1997
  • 52. Bibliography (2)
    [3] C. D. Manning, et al.; An Introduction to Information Retrieval; Cambridge UP; 2009
    [4] T. Luu, et al.; ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine; P2PIR ’06; Nov. 2006