P2P search engine
P2P search engine Presentation Transcript

  • http://code.google.com/p/fujene/
  • P2P distributed-search engine, Yusuke FUJISAKA, 2011-05-30
  • 1. GB (TB)
  • Namazu, Senna, Lucene, Solr, Hyper Estraier, ...
  • Fujene: P2P
  • FARE system: Fast, Autonomous, Retrieval, Engine
  • Setting file: Content = Title, Content = Body, Appendix = ID, Appendix = URL. Command: Fujene --primary SettingFile
  • IP: 10.0.1.5. Command: Fujene --secondary 10.0.1.5
  • Existing node, New node
  • [Figure: nodes A, B, C, D, E, F]
  • Replicate chain: Chord chain [1], Consistent hash [2]
  • Chord chain [Figure: nodes A, B, C, D, E, F]
  • Chord chain [Figure: nodes A, B, C, D, E, F; Hash: 0xEF459AB...]
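The consistent-hash placement named on these slides can be sketched as follows. This is a minimal illustration, not Fujene's code: the `HashRing` class, the `ring_position` helper and the choice of SHA-1 as the hash function are assumptions made for the example. A key is hashed onto the ring and assigned to the first node at or after its position.

```python
import bisect
import hashlib


def ring_position(key):
    """Hash a string onto the ring (SHA-1 is an assumption for this sketch)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring over a fixed set of node names."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def successor(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["A", "B", "C", "D", "E", "F"])
owner = ring.successor("search")
```

With this scheme, adding a node only moves the keys that fall between the new node and its predecessor; every other key keeps its owner.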
  • Topic: Node 1: 56%, Node 2: 20%, Node 3: 24%
  • Node 1: 56%, 42%; Node 2: 20%, 32%; Node 3: 24%, 26%
  • Indexing: Sen (=MeCab); Bi-gram, Uni-gram
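The bi-gram and uni-gram splitting mentioned on this slide can be illustrated with a character n-gram function. This is a generic sketch; the function name `ngrams` is hypothetical and the slide does not say how Fujene combines the two sizes.

```python
def ngrams(text, n):
    """Return every run of n consecutive characters in text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


bigrams = ngrams("tokyo", 2)   # ['to', 'ok', 'ky', 'yo']
unigrams = ngrams("tokyo", 1)  # ['t', 'o', 'k', 'y', 'o']
```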
  • Indexing [Figure: Content (ID: 12345, Title: ..., Body: ..., URL: ...), RPC/API, Term, Term; nodes A, B, C, D, E, F]
  • Indexing [Figure: Content, Hash, Hash; nodes A, B, C, D, E, F]
  • Indexing [Figure: nodes A, B, C, D, E, F (replication=3)]
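One common way to realise replication=3 on a ring is to store each item on its owner and on the next two nodes clockwise. The sketch below assumes that policy; the slide only gives the replication factor, so the successor rule and the `replica_nodes` helper are illustrative assumptions.

```python
def replica_nodes(ring, owner, replication=3):
    """Return the owner plus the following nodes on the ring, wrapping around."""
    start = ring.index(owner)
    count = min(replication, len(ring))
    return [ring[(start + step) % len(ring)] for step in range(count)]


ring_order = ["A", "B", "C", "D", "E", "F"]
replicas = replica_nodes(ring_order, "E")  # ['E', 'F', 'A']
```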
  • [Figure: Skip pointer, Dictionary, Invert index; ID, Skip pointer, Content, Appendix]
  • [Figure: lookup steps. (1) to (7): Lookup Skip pointer, Lookup Content, Appendix. (8) to (12): Lookup Skip pointer, Lookup Dictionary. (13) to (16): Lookup Invert index]
  • Contents: (1) to (7); Dictionary: (8) to (12); Invert index: (13) to (16)
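The lookup chain on these slides (skip pointer, then dictionary, then invert index, then content and appendix) might be modelled as below. This is an interpretation, not the Fujene layout: the slide text does not define the skip pointer, so here it is assumed to be a sparse index holding the first key of each block of sorted records. The class name, the `stride` parameter and the sample data are all hypothetical.

```python
import bisect


class SkipIndexedTable:
    """Records sorted by key, plus one skip pointer per block of `stride` records."""

    def __init__(self, records, stride=2):
        self._records = sorted(records, key=lambda record: record[0])
        self._stride = stride
        # Each skip pointer is the first key of a block.
        self._skip_keys = [
            self._records[i][0] for i in range(0, len(self._records), stride)
        ]

    def lookup(self, key):
        # Use the skip pointers to find the only block that can hold the key.
        block = bisect.bisect_right(self._skip_keys, key) - 1
        if block < 0:
            return None
        start = block * self._stride
        # Scan that block only.
        for record_key, value in self._records[start:start + self._stride]:
            if record_key == key:
                return value
        return None


# Hypothetical sample data.
dictionary = SkipIndexedTable([("engine", 1), ("p2p", 2), ("search", 3)])
invert_index = SkipIndexedTable([(1, [12, 49]), (2, [12, 30, 49]), (3, [7, 12])])
contents = SkipIndexedTable([
    (7, ("content 7", "appendix 7")),
    (12, ("content 12", "appendix 12")),
    (30, ("content 30", "appendix 30")),
    (49, ("content 49", "appendix 49")),
])


def find_contents(term):
    """Dictionary lookup, then invert index lookup, then content lookup."""
    term_id = dictionary.lookup(term)
    if term_id is None:
        return []
    content_ids = invert_index.lookup(term_id) or []
    return [contents.lookup(content_id) for content_id in content_ids]
```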
  • Searching
  • Searching [Figure: Query, Analyze, Term, Term, Term; nodes A, B, C, D, E, F]
  • Searching [Figure: Intersection of ID: 12, 24, 35, 49, ...; ID: 12, 30, 49, 55, ...; ID: 7, 12, 30, 49, ...; nodes A, B, C, D, E, F]
  • Searching [Figure: Output ID: 12, ID: 49; nodes A, B, C, D, E, F]
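The search flow on these slides (analyze the query into terms, fetch one ID list per term, intersect, output) can be sketched as below. The term names and the whitespace-splitting `search` function are hypothetical. The ID lists use only the values visible on the slide; the slide's lists continue with "...", which is not filled in here.

```python
# Visible prefixes of the three ID lists from the slide; term names are made up.
postings_by_term = {
    "term1": [12, 24, 35, 49],
    "term2": [12, 30, 49, 55],
    "term3": [7, 12, 30, 49],
}


def search(query):
    """Split the query into terms and return the IDs common to every term."""
    terms = query.split()
    if not terms:
        return []
    id_sets = [set(postings_by_term.get(term, ())) for term in terms]
    return sorted(set.intersection(*id_sets))


hits = search("term1 term2 term3")  # [12, 49]
```

The result matches the output shown on the slide (ID: 12 and ID: 49).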
  • [Figure: Query; Skip pointer, Dictionary, Invert index; ID; Skip pointer, Content, Appendix; Output]
  • [Figure: beacon; nodes A, B, C, D, E, F]
  • [Figure: “live”; nodes A, B, C, D, E, F]
  • [Figure: × marks; nodes A, B, C, D, E, F]
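The beacon, “live” and × slides suggest a heartbeat-style failure check. The sketch below is one possible reading, not the Fujene protocol: a node records when each peer last answered “live” and treats a peer as failed once no answer has arrived within a timeout. The class name, the timeout value and the explicit `now` arguments are assumptions made so the example is deterministic.

```python
class LivenessTable:
    """Tracks the last time each peer answered a beacon with "live"."""

    def __init__(self, peers, timeout, now):
        self._timeout = timeout
        # Every peer is treated as freshly seen when the table is created.
        self._last_live = {peer: now for peer in peers}

    def record_live(self, peer, now):
        self._last_live[peer] = now

    def failed_peers(self, now):
        """Peers whose last "live" answer is older than the timeout."""
        return sorted(
            peer for peer, seen in self._last_live.items()
            if now - seen > self._timeout
        )


table = LivenessTable(peers=["B", "C", "D", "E", "F"], timeout=3.0, now=0.0)
for peer in ["B", "C", "D"]:
    table.record_live(peer, now=2.0)
failed = table.failed_peers(now=4.0)  # ['E', 'F']
```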
  • [Figure: nodes A, B, C, D, E, F with × marks and numbers 1 to 6]
  • [Table: columns A, B, C, D, E, F; rows 4 5 6 1 2 3; 5 6 1 2 3 4; 6 1 2 3 4 5; partial row 3 4 5]
  • Future work: Web app
  • Topic: [Figure: Index Server, Search Server, Node Manager / Search Gather, Store/Lookup, Query Parser, Memory/Disk Blocks]
  • Topic: Intersection. r1, r2, ..., rn: O(∑ r)
      r1: 1 4 6 10 12 16 22 29 30 31 37 40 43 47
      r2: 2 4 6 9 12 14 26 30 32 37 43 44 47 50
      r3: 4 5 6 10 11 12 23 27 30 37 39 41 43 47
  • Topic: Intersection
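The numbered steps of the slide's intersection procedure did not survive in this transcript, so the following is a standard pointer-based intersection of sorted lists, offered as a sketch rather than the author's algorithm. Each list has a cursor that only moves forward, so the total number of cursor advances is bounded by ∑ r. The function name is hypothetical; r1, r2 and r3 are the lists from the slide.

```python
def intersect_sorted(lists):
    """Return the values present in every sorted list, in ascending order."""
    if not lists or any(not values for values in lists):
        return []
    cursors = [0] * len(lists)
    result = []
    while all(cursors[i] < len(lists[i]) for i in range(len(lists))):
        heads = [lists[i][cursors[i]] for i in range(len(lists))]
        largest = max(heads)
        if min(heads) == largest:
            # Every list agrees on this value: emit it and advance all cursors.
            result.append(largest)
            cursors = [cursor + 1 for cursor in cursors]
        else:
            # Advance every cursor that lags behind the largest head.
            for i, head in enumerate(heads):
                if head < largest:
                    cursors[i] += 1
    return result


r1 = [1, 4, 6, 10, 12, 16, 22, 29, 30, 31, 37, 40, 43, 47]
r2 = [2, 4, 6, 9, 12, 14, 26, 30, 32, 37, 43, 44, 47, 50]
r3 = [4, 5, 6, 10, 11, 12, 23, 27, 30, 37, 39, 41, 43, 47]
common = intersect_sorted([r1, r2, r3])  # [4, 6, 12, 30, 37, 43, 47]
```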
  • Topic: MemoryBlockPool [Figure: withdraw, deposit; Skip Pointer, Invert Index, Content]
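Only the names MemoryBlockPool, withdraw and deposit survive on this slide. The sketch below assumes they describe a pool of fixed-size blocks that callers take out and later return; the block size, the zeroing on return and the `free_blocks` property are illustrative assumptions, not details from the slides.

```python
class MemoryBlockPool:
    """A pool of preallocated, fixed-size blocks (behaviour assumed from the names)."""

    def __init__(self, block_size, block_count):
        self._block_size = block_size
        self._free = [bytearray(block_size) for _ in range(block_count)]

    def withdraw(self):
        """Take one free block out of the pool."""
        if not self._free:
            raise MemoryError("no free blocks left in the pool")
        return self._free.pop()

    def deposit(self, block):
        """Return a block to the pool, clearing its contents first."""
        if len(block) != self._block_size:
            raise ValueError("block size does not match this pool")
        block[:] = bytes(self._block_size)
        self._free.append(block)

    @property
    def free_blocks(self):
        return len(self._free)


pool = MemoryBlockPool(block_size=16, block_count=2)
block = pool.withdraw()
block[0:4] = b"term"
pool.deposit(block)
```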
  • Bibliography (1): [1] I. Stoica, et al.; Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications; SIGCOMM 2001; October 2001. [2] D. Karger, et al.; Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web; STOC ’97; 1997
  • Bibliography (2): [3] C. D. Manning, et al.; An Introduction to Information Retrieval; Cambridge UP; 2009. [4] T. Luu, et al.; ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine; P2PIR ’06; Nov. 2006