C. Yang (Oct. 30): Searching in Peer-to-Peer Networks
Transcript

  • 1. Searching in Peer-to-Peer Networks • Chunlin Yang
  • 2. What’s P2P - Unofficial Definition • All computers in the network are equal • Each computer functions as both a client and a server, with no administrator • The user of each computer decides which of its data will be shared on the network
  • 3. What’s P2P - Continued • Purpose: to share huge volumes of data among the peers in the network • No dedicated servers and no hierarchy among the computers in the network • Examples: Gnutella, Freenet, and Napster
  • 4. Why P2P • Three fundamental Internet assets: information, bandwidth, and storage space • Information: as its amount grows, finding useful information in real time becomes increasingly difficult • Bandwidth: much has been done to increase it, yet hot sites such as Yahoo and eBay get more and more traffic and become bottlenecks
  • 5. Why P2P - Continued • Computing resources: processor speeds increase and storage devices get bigger, but data centers accumulate more and more computation tasks • P2P networking can greatly improve the utilization of Internet resources
  • 6. Why P2P - Continued • Balance traffic to reduce the peak load on the network • Increase the reliability and fault tolerance of the global system • Tolerate server downtime, e.g., for email delivery, or split a large email into small packets and transfer them over multiple paths
  • 7. Basic Searching Algorithms • Gnutella: BFS • Freenet: DFS • Napster: Index Server
  • 8. Basic Searching Algorithm Gnutella • Each node of the network acts simultaneously as a client and a server • A node conducts its own searches while listening for incoming queries • Completely decentralized: every node is equal
  • 9. Basic Searching Algorithm Gnutella - Continued • A node sends the query to all its neighbors; each neighbor searches its own resources and forwards the message to all of its own neighbors • If a query is satisfied, a response is sent back to the original requester along the reverse path
  • 10. Basic Searching Algorithm Gnutella - Continued • Queries are assigned GUIDs so that nodes can recognize and drop repeated copies • A TTL of 7 (reaching about 10,000 nodes) keeps queries from congesting the network • Problem: messages can travel in cycles and cause excessive traffic
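The flooding search described in slides 8 to 10 can be sketched as a single-process Python simulation. This is a minimal sketch under assumed inputs, not Gnutella's wire protocol: the network is a dict mapping each node to its neighbor list, shared content is a dict mapping nodes to keyword sets, and `flood_search` and `reverse_path` are hypothetical names. Each node keeps a table of GUIDs it has already handled and drops duplicates, and each hit carries the reverse path a response would follow.

```python
from collections import deque
import uuid


def reverse_path(parent, node):
    """Follow recorded parents from `node` back to the query source."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def flood_search(graph, data, source, keyword, ttl=7):
    """Gnutella-style BFS flooding (simplified simulation).

    graph: node -> list of neighbors; data: node -> set of shared keywords.
    Returns (hits, messages): each hit is (node, reverse path to source),
    and messages counts every query copy sent, including dropped duplicates.
    """
    guid = uuid.uuid4().hex                  # one GUID identifies this query
    seen = {node: set() for node in graph}   # per-node table of handled GUIDs
    seen[source].add(guid)
    parent = {source: None}                  # who forwarded the query to whom
    hits, messages = [], 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != source and keyword in data.get(node, set()):
            hits.append((node, reverse_path(parent, node)))
        if remaining == 0:
            continue                         # TTL exhausted: stop forwarding
        for nb in graph[node]:
            if nb == parent[node]:
                continue                     # do not send back to the sender
            messages += 1
            if guid in seen[nb]:
                continue                     # duplicate GUID: neighbor drops it
            seen[nb].add(guid)
            parent[nb] = node
            queue.append((nb, remaining - 1))
    return hits, messages
```

For example, on a four-node cycle A-B-D-C-A queried from A, node D receives the query from both B and C and drops the second copy by its GUID; the duplicate still costs a message, which is the excess traffic the slide warns about.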
  • 11. Basic Searching Algorithm Freenet • Cooperative file distribution that improves document distribution efficiency by sharing bandwidth and disk space • Each file has a unique ID, which is associated with its locations • A network of equal nodes, each acting as both client and server
  • 12. Basic Searching Algorithm Freenet - Continued • Information is stored on hosts under searchable keys • Uses a depth-first search with depth limit D: each node forwards the query to a single neighbor and waits for a definite response from that neighbor • If the response reports that the query was not satisfied, the node forwards the query to another of its neighbors
  • 13. Basic Searching Algorithm Freenet - Continued • If the query is satisfied, the response is sent back to the query source along the reverse path • Each node along the path also copies the data into its own database • As a result, popular information becomes easier to access
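The depth-first search with caching from slides 11 to 13 can be sketched recursively, where a function's return value plays the role of the neighbor's "definite response". This is a simplified sketch, not Freenet's actual routing (which picks neighbors by key closeness): the graph and per-node stores are plain dicts, `dfs_search` is a hypothetical name, and `depth_limit` stands in for the slide's D.

```python
def dfs_search(graph, store, node, key, depth_limit, visited=None):
    """Freenet-style depth-first search (simplified simulation).

    graph: node -> list of neighbors; store: node -> {key: data}.
    Returns the data if found within `depth_limit` hops of `node`, else None.
    On success, every node on the reverse path caches a copy of the data.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    local = store.setdefault(node, {})
    if key in local:
        return local[key]
    if depth_limit == 0:
        return None
    for nb in graph[node]:
        if nb in visited:
            continue                    # already on this search: avoid loops
        # Forward to ONE neighbor and wait for its definite answer.
        result = dfs_search(graph, store, nb, key, depth_limit - 1, visited)
        if result is not None:
            local[key] = result         # cache a copy on the way back
            return result
        # Not satisfied via this neighbor: try the next one.
    return None
```

After one successful lookup along a chain A-B-C-D, nodes A, B, and C all hold a copy, so a later request reaching B is answered immediately; this is how popular data becomes easier to access.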
  • 14. Basic Searching Algorithm Napster • A centralized server keeps a database of online users and song locations for quick searching • Once the server returns a song's location, the client downloads it via peer-to-peer file transfer • Legal problem: it ignores copyright • Problem: the same bottleneck as in client-server systems, and the service fails if the index server goes down
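The division of labor in Napster (central lookup, peer-to-peer transfer) can be illustrated with a toy index server. The class and method names here are hypothetical, and only the index is modeled; the download between peers is not.

```python
from collections import defaultdict


class IndexServer:
    """Napster-style central index (toy model; names are hypothetical).

    The server only records which online peer shares which song; the
    actual file transfer happens directly between peers.
    """

    def __init__(self):
        self.locations = defaultdict(set)   # song title -> peer addresses
        self.online = set()                 # peers currently logged in

    def login(self, peer, songs):
        self.online.add(peer)
        for song in songs:
            self.locations[song].add(peer)

    def logout(self, peer):
        self.online.discard(peer)
        for peers in self.locations.values():
            peers.discard(peer)

    def search(self, song):
        # Return online peers holding the song, in a stable order.
        return sorted(self.locations.get(song, set()) & self.online)
```

Every search goes through this single object, which makes the slide's two problems concrete: all lookup traffic converges on one server, and if it goes down no peer can find anything, even though the files themselves are still on the peers.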
  • 15. Improving Search Algorithms in Peer-to-Peer Networks • Iterative Deepening • Directed BFS • Local Indices • Routing Indices • NEVRLATE
  • 16. Iterative Deepening • Multiple breadth-first searches are initiated with successively larger depth limits, until either the query is satisfied or the maximum depth has been reached • Example: with policy P(a, b, c), the first search uses depth a, the second depth b, and the third depth c
  • 17. Iterative Deepening - Continued • A source node S first initiates a BFS of depth a; when a node at depth a receives and processes the query, it stores the query temporarily • The query is thus frozen at all nodes a hops from the source • Meanwhile, S receives response messages from the nodes that have processed the query
  • 18. Iterative Deepening - Continued • After a predefined waiting period W, if the query has been satisfied, S does nothing • Otherwise, S starts another iteration by initiating a BFS of depth b • To do this, S sends a resend message with a TTL of a; nodes only forward the resend message until it reaches the nodes at a hops
  • 19. Iterative Deepening - Continued • A node at hop a drops the resend message and unfreezes the corresponding query by forwarding it to all its neighbors with a TTL of b-a • When the query reaches the nodes at hop b, the process continues in a similar fashion • At the last depth, c, the query is not frozen, and S does not initiate another iteration even if the query is still not satisfied • Problem?
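The freeze/resend mechanism in slides 16 to 19 amounts to resuming a BFS from the frontier where the previous iteration stopped. The sketch below simulates that level by level under stated assumptions: "satisfied" means at least `wanted` results (a hypothetical criterion), the waiting period W and the resend messages themselves are not modeled, and `iterative_deepening` is our own name.

```python
def iterative_deepening(graph, data, source, keyword, policy, wanted=1):
    """Iterative-deepening search (simplified simulation).

    policy: increasing depth limits, e.g. [a, b, c].
    Returns (results, depth_reached, processed), where processed counts the
    nodes that evaluated the query. Nodes are never processed twice.
    """
    visited = {source}
    level = [source]            # nodes at the current depth hold the frozen query
    depth, results, processed = 0, [], 0
    for limit in policy:
        # Unfreeze: push the query from the frozen level out to depth `limit`,
        # i.e. with a TTL of limit - depth. Earlier levels are not reprocessed.
        while depth < limit and level:
            next_level = []
            for node in level:
                for nb in graph[node]:
                    if nb not in visited:
                        visited.add(nb)
                        next_level.append(nb)
            for node in next_level:
                processed += 1
                if keyword in data.get(node, set()):
                    results.append(node)
            level, depth = next_level, depth + 1
        # After the waiting period, S checks whether the query is satisfied.
        if len(results) >= wanted:
            break
    return results, depth, processed
```

On a chain queried from one end with policy [1, 3, 6], a match two hops away is found in the second iteration after only three nodes have processed the query, and the third iteration is never started.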
  • 20. Directed BFS • A node sends the query to a subset of its neighbors that are likely to return many results with minimal response time • A node maintains simple statistics on its neighbors, such as the results of past queries or the latency of the connection to each neighbor • From these statistics, rules such as the following can be used to pick the neighbors to send a query to:
  • 21. Directed BFS - Continued • The neighbor that returned the highest number of results for previous queries • The neighbor whose response messages have the lowest average number of hops • The neighbor that has forwarded the largest number of messages • The neighbor with the shortest message queue
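Each of the four selection rules above is a ranking over per-neighbor statistics. A minimal sketch, assuming the statistics are kept in a small record per neighbor (the field and heuristic names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Per-neighbor statistics a node might keep (field names are hypothetical)."""
    results_returned: int = 0          # results returned for past queries
    avg_hops: float = float("inf")     # average hop count of responses via this neighbor
    messages_forwarded: int = 0        # messages this neighbor has forwarded
    queue_length: int = 0              # current length of its message queue


# One sort key per rule; a smaller key means a better neighbor.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "fewest_hops": lambda s: s.avg_hops,
    "most_forwarded": lambda s: -s.messages_forwarded,
    "shortest_queue": lambda s: s.queue_length,
}


def pick_neighbors(stats, heuristic, k=1):
    """Return the k best neighbors under one rule (ties keep insertion order)."""
    key = HEURISTICS[heuristic]
    return sorted(stats, key=lambda nb: key(stats[nb]))[:k]
```

The query is then sent only to the returned neighbors, which continue with ordinary BFS; `k` controls how much of the neighborhood the first hop covers.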
  • 22. Local Indices • Each node n maintains an index over the data of all nodes within r hops of itself • r is a system-wide variable known as the radius of the index • When a node receives a query, it can process it on behalf of every node within r hops, so data is searched on fewer nodes, reducing cost while keeping the same satisfaction
  • 23. Local Indices - Continued • A system-wide policy specifies the depths at which the query should be processed • Nodes at depths not listed in the policy simply forward the query • Example: with P(1, 5), only nodes at depths 1 and 5 process the query, while nodes at other depths just forward it • Reason: each node has information about its neighbors within 4 hops
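A small simulation helps show how a sparse policy can still cover the whole search space. The sketch below is illustrative only: it builds each processing node's index on the fly (a real node would maintain it in advance), the radius used in the example is a hypothetical value of 2, and the function names are our own.

```python
from collections import deque


def hops_from(graph, start):
    """BFS distances (in hops) from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def build_local_index(graph, data, node, radius):
    """Index: keyword -> set of nodes within `radius` hops of `node` holding it."""
    index = {}
    for other, d in hops_from(graph, node).items():
        if d <= radius:
            for kw in data.get(other, set()):
                index.setdefault(kw, set()).add(other)
    return index


def local_indices_search(graph, data, source, keyword, policy, radius):
    """Only nodes whose depth is in `policy` process the query, using their
    local index; nodes at other depths just forward it. (Nodes deeper than
    max(policy) would never see the query, and are never processed here.)"""
    matches, processed = set(), []
    for node, depth in hops_from(graph, source).items():
        if depth in policy:
            processed.append(node)
            matches |= build_local_index(graph, data, node, radius).get(keyword, set())
    return matches, processed
```

On a chain of nodes 0 to 7 queried from node 0 with radius 2, only nodes 1 and 5 process the query, yet node 1's index covers depths 0 to 3 and node 5's covers depths 3 to 7, so nothing in that range is missed.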
  • 24. Routing Indices • Goal: allow a node to select the “best” neighbors to send a query to • A Routing Index (RI) is a data structure, with associated algorithms, that, given a query, returns a list of neighbors ranked according to their goodness for the query • In general, the goodness should reflect the number of documents in nearby nodes
  • 25. Routing Indices - Continued • Each node has a local index for quickly finding local documents when a query is received • Nodes also maintain a Compound Routing Index (CRI) containing: • The number of documents along each path • The number of documents on each topic of interest
  • 26. Routing Indices Example
  • 27. Routing Indices Example (#docs: documents along each path; DB, N, T, L: documents with each topic)
      Path   #docs    DB     N     T     L
      A        150    30    20     0   100
      B        100    20     0    10    30
      C       1000     0   300     0    50
      D        200   100     0   100   150
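The slides do not say how goodness is computed from such a table. One estimator used in the routing-indices literature, assumed here, treats topics as independent and estimates the matching documents on a path as #docs multiplied by the fraction of documents on each queried topic. The sketch applies it to the example numbers; `goodness` and `rank_paths` are hypothetical names.

```python
# Compound routing index from the example table: path -> (#docs, {topic: #docs}).
CRI = {
    "A": (150, {"DB": 30, "N": 20, "T": 0, "L": 100}),
    "B": (100, {"DB": 20, "N": 0, "T": 10, "L": 30}),
    "C": (1000, {"DB": 0, "N": 300, "T": 0, "L": 50}),
    "D": (200, {"DB": 100, "N": 0, "T": 100, "L": 150}),
}


def goodness(entry, topics):
    """Estimated number of documents on a path matching ALL query topics,
    assuming topic independence: #docs * product(topic_docs / #docs)."""
    total, per_topic = entry
    if total == 0:
        return 0.0
    estimate = float(total)
    for t in topics:
        estimate *= per_topic.get(t, 0) / total
    return estimate


def rank_paths(cri, topics):
    """Paths sorted from best to worst goodness for the query."""
    return sorted(cri, key=lambda p: goodness(cri[p], topics), reverse=True)
```

For a query on DB and L, this ranks the paths D (about 75 matching documents), A (about 20), B (about 6), and C (0), so a node would forward the query along D first.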
  • 28. Routing Indices - Maintenance • When a connection is established between two nodes, they exchange their routing indices; each node updates its own index and sends an update message to its neighbors • When a node I disconnects from the network and its neighbor D detects this, D removes the row for I and sends its new routing index information to all its neighbors so they can update theirs
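The maintenance steps above can be sketched as a small per-node class. The slides do not specify what a node advertises; this sketch assumes, following the usual compound-routing-index scheme, that a node sends each neighbor its local counts plus all rows except the one learned from that neighbor, so information is not echoed back. Counts are vectors of #docs followed by per-topic counts, and all names are hypothetical.

```python
def add_vectors(a, b):
    """Element-wise sum of two count vectors."""
    return [x + y for x, y in zip(a, b)]


class RoutingIndex:
    """Hypothetical per-node routing index: neighbor -> count vector
    (#docs followed by per-topic counts)."""

    def __init__(self, local_counts):
        self.local = list(local_counts)
        self.rows = {}

    def summary_for(self, neighbor):
        """What this node advertises to `neighbor`: its local counts plus
        every row except the one learned from that neighbor."""
        total = list(self.local)
        for nb, row in self.rows.items():
            if nb != neighbor:
                total = add_vectors(total, row)
        return total

    def connect(self, other, other_summary):
        """New connection: store the row the peer advertised and return the
        other neighbors that should now receive an update."""
        self.rows[other] = list(other_summary)
        return [nb for nb in self.rows if nb != other]

    def disconnect(self, lost):
        """Neighbor `lost` left: drop its row and return the neighbors to notify."""
        self.rows.pop(lost, None)
        return list(self.rows)
```

In the slide's scenario, D would call `disconnect("I")`, then send `summary_for(nb)` to each returned neighbor `nb`.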
  • 29. NEVRLATE • Network-Efficient Vast Resource Lookup At The Edge • Directory servers are organized into a logical 2-dimensional grid, i.e., a set of sets of servers • This enables registration in one (“horizontal”) dimension and lookup in the other (“vertical”) dimension
  • 30. NEVRLATE - Continued • Each node is a directory server • Within each set of servers (a “vertical cloud”), every member can reach every other member of the set • The set of sets of servers forms the entire NEVRLATE network
  • 31. NEVRLATE - Continued
  • 32. NEVRLATE - Continued • Each host registers its resources and their location with one node in each set • When a query arrives, only one set needs to be searched to find all locations holding matching information • For fault tolerance, a host can also register with two nodes in each set
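The grid idea in slides 29 to 32 can be shown with a toy model: registration touches one server in every set, so any single set collectively knows every registration. How a host picks its server within a set is not specified in the slides; hashing the host name is an assumption made here, as are the class and method names.

```python
import hashlib


class Nevrlate:
    """Toy model of a NEVRLATE-style 2-D directory grid (names hypothetical).

    sets[i][j] is server j of set i; each server maps resource -> hosts.
    """

    def __init__(self, num_sets, servers_per_set, replicas=1):
        self.sets = [[{} for _ in range(servers_per_set)] for _ in range(num_sets)]
        self.replicas = replicas   # 2 gives the fault-tolerant variant

    def _slots(self, host):
        # Assumed rule: deterministically pick server(s) within a set by hashing.
        h = int(hashlib.sha256(host.encode()).hexdigest(), 16)
        n = len(self.sets[0])
        return [(h + k) % n for k in range(min(self.replicas, n))]

    def register(self, host, resources):
        # "Horizontal" registration: one (or `replicas`) server(s) in EVERY set.
        for servers in self.sets:
            for j in self._slots(host):
                for r in resources:
                    servers[j].setdefault(r, set()).add(host)

    def lookup(self, resource, set_index=0):
        # "Vertical" lookup: the servers of ONE set together know all hosts.
        found = set()
        for server in self.sets[set_index]:
            found |= server.get(resource, set())
        return sorted(found)
```

Any set can answer a query, so lookups can be spread across sets, while each registration costs one message per set (two with the fault-tolerant variant).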
  • 33. Extension • Total rank of a neighbor: weighted sum of all its key ranks • Assumption: high-rank nodes are always better to access or closer to resources • Dominating-set marking process (Rule 1/Rule 2): when removing a node from the DS, remove the one with the lower rank instead of the lower UID
  • 34. Extension - Continued • With the marking process (Wu & Li), the connected dominating set (DS) nodes have relatively higher connectivity than non-DS nodes • DS nodes must hold resource information and resource locations for their neighbor nodes • During a search, requests are sent only to DS nodes, reducing cost and traffic while maintaining satisfaction
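The proposed change to the marking process can be sketched as follows. The slides only name Rule 1 and Rule 2; the conditions below follow the usual statement of Wu and Li's pruning rules (Rule 1: N[v] is contained in N[u] for a marked neighbor u; Rule 2: N(v) is contained in the union of N(u) and N(w) for two marked neighbors u and w), so treat them as our assumption. The extension's twist is that the key deciding which node to unmark is (rank, id) rather than the id alone. Function names are hypothetical.

```python
from itertools import combinations


def marked_nodes(graph):
    """Marking process: v is marked if it has two neighbors that are not adjacent."""
    return {v for v, nbs in graph.items()
            if any(b not in graph[a] for a, b in combinations(nbs, 2))}


def rank_based_cds(graph, rank):
    """Marking process followed by pruning Rules 1 and 2, using (rank, id)
    instead of the plain node id to decide which node to unmark."""
    key = lambda v: (rank[v], v)          # lower key = unmarked first
    marked = marked_nodes(graph)
    closed = {v: set(graph[v]) | {v} for v in graph}
    ds = set(marked)
    for v in marked:
        mnbs = [u for u in graph[v] if u in marked]
        # Rule 1: a marked neighbor u whose closed neighborhood contains
        # v's closed neighborhood, and u has the higher key.
        if any(closed[v] <= closed[u] and key(v) < key(u) for u in mnbs):
            ds.discard(v)
            continue
        # Rule 2: two marked neighbors u, w whose open neighborhoods together
        # contain v's, and v has the lowest key of the three.
        for u, w in combinations(mnbs, 2):
            if (set(graph[v]) <= set(graph[u]) | set(graph[w])
                    and key(v) < min(key(u), key(w))):
                ds.discard(v)
                break
    return ds
```

When two marked nodes cover exactly the same neighborhood, Rule 1 keeps the higher-ranked one; swapping their ranks swaps which node stays in the dominating set.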
  • 35. Extension - Continued • Clustering: when constructing a cluster, choose the node with the highest rank (instead of the lowest UID) as the clusterhead, and choose the node with the lowest rank as the gateway, since it carries low traffic • Consider not only a node's own rank but also the total rank of its neighbors • Max-min ranking: when searching, choose both the maximum and the minimum for the key index rank
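The clustering rule can be sketched as a centralized greedy pass, analogous to lowest-ID clustering but ordered by rank. Two details are our assumptions rather than the slides': ties are broken by node id, and a gateway is taken to be a common neighbor of two clusterheads, chosen as the lowest-rank such node. Function names are hypothetical.

```python
def rank_clusters(graph, rank):
    """Greedy clustering: visit nodes from highest to lowest rank (ties by id);
    an unassigned node becomes a clusterhead and its unassigned neighbors join it.
    Returns node -> clusterhead."""
    head_of = {}
    for v in sorted(graph, key=lambda n: (-rank[n], n)):
        if v not in head_of:
            head_of[v] = v
            for nb in graph[v]:
                head_of.setdefault(nb, v)
    return head_of


def pick_gateways(graph, rank, head_of):
    """For each pair of clusterheads with common neighbors, choose the
    lowest-rank common neighbor as gateway (expected to carry less traffic)."""
    heads = sorted(set(head_of.values()))
    gateways = {}
    for i, h1 in enumerate(heads):
        for h2 in heads[i + 1:]:
            common = (set(graph[h1]) & set(graph[h2])) - {h1, h2}
            if common:
                gateways[(h1, h2)] = min(common, key=lambda n: (rank[n], n))
    return gateways
```

With two high-rank nodes h1 and h2 that share neighbors x (rank 3) and y (rank 1), h1 and h2 become clusterheads and y, the lower-ranked shared neighbor, is chosen as the gateway between them.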
  • 36. Extension - Continued • Reason: the max-ranked node may carry high traffic, while the min-ranked node carries low traffic • Networks and resources are dynamic, so the network should be re-ranked as they change • Example: Glades Rd/Palmetto Park Rd
  • 37. Summary