Searching in Peer-to-Peer Networks
Chunlin Yang
C. Yang (Oct. 30): Searching in Peer-to-Peer Networks

  1. Searching in Peer-to-Peer Networks (Chunlin Yang)
  2. What's P2P: Unofficial Definition
     • All computers in the network are equal
     • Each computer functions as both a client and a server, with no administrator
     • The user of each computer decides which data on it is shared on the network
  3. What's P2P (continued)
     • The goal is to share huge volumes of data among the peers in the network
     • There are no dedicated servers and no hierarchy among the computers in the network
     • Examples: Gnutella, Freenet, and Napster
  4. Why P2P
     • The Internet has three fundamental assets: information, bandwidth, and storage space
     • Information: the amount keeps growing, so finding useful information in real time is increasingly difficult
     • Bandwidth: capacity keeps being added, yet popular sites such as Yahoo and eBay attract more and more traffic and become bottlenecks
  5. Why P2P (continued)
     • Computing resources: processor speeds and storage capacities keep increasing, yet data centers accumulate more and more computation tasks
     • P2P networking can greatly improve the utilization of Internet resources
  6. Why P2P (continued)
     • Balances traffic to reduce the peak load on the network
     • Increases the reliability and fault tolerance of the overall system
     • Tolerates server downtime, for example in email delivery, or by slicing a large email into small packets and transferring them over multiple paths
  7. Basic Searching Algorithms
     • Gnutella: BFS (breadth-first search)
     • Freenet: DFS (depth-first search)
     • Napster: index server
  8. Basic Searching Algorithm: Gnutella
     • Each node of the network acts as a client and a server simultaneously
     • A node conducts its own searches while listening for incoming queries
     • Completely decentralized: every node is equal
  9. Basic Searching Algorithm: Gnutella (continued)
     • A node sends its query to all of its neighbors; each neighbor searches its own resources and forwards the message to all of its own neighbors
     • If a query is satisfied, a response is sent back to the original requester along the reverse path
  10. Basic Searching Algorithm: Gnutella (continued)
     • Queries are assigned GUIDs so that a node can recognize and drop a query it has already seen
     • Queries use a TTL of 7 (reaching about 10,000 nodes) to avoid congesting the network
     • Problem: the overlay can contain cycles, so flooding can cause excessive traffic
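The Gnutella flooding described above can be sketched as a breadth-first traversal with a hop budget. This is a simplified, single-process illustration and not the Gnutella wire protocol: the names `flood_query`, `graph`, and `holders` are assumptions made for the example, and a per-query `seen` set stands in for the GUID check that each real node performs on its own.

```python
from collections import deque


def flood_query(graph, holders, source, ttl=7):
    """Flood a query from `source`; return (nodes with a hit, messages sent).

    graph   : dict mapping node -> list of neighbor nodes (hypothetical overlay)
    holders : set of nodes that can satisfy the query
    ttl     : remaining hop budget carried by the query
    """
    seen = {source}              # stands in for per-node GUID bookkeeping
    hits = []
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if node != source and node in holders:
            hits.append(node)    # a response would travel back on the reverse path
        if remaining == 0:
            continue             # TTL exhausted: do not forward any further
        for neighbor in graph[node]:
            messages += 1        # every forward costs one message, even a duplicate
            if neighbor in seen:
                continue         # duplicate GUID: the receiver drops the query
            seen.add(neighbor)
            frontier.append((neighbor, remaining - 1))
    return hits, messages


overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_query(overlay, {"E"}, "A", ttl=7)
```

Even in this five-node overlay, most messages are duplicates that get dropped on arrival, which is the excess traffic the last bullet refers to.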
  11. Basic Searching Algorithm: Freenet
     • Cooperative file distribution that improves document distribution efficiency by sharing bandwidth and disk space
     • Each file has a unique ID and a set of locations
     • A network of equal nodes, each acting as both client and server
  12. Basic Searching Algorithm: Freenet (continued)
     • Information is stored on hosts under searchable keys
     • Uses a depth-first search with depth limit D: each node forwards the query to a single neighbor and waits for a definite response from that neighbor
     • If that neighbor cannot satisfy the query, the query is forwarded to another neighbor
  13. Basic Searching Algorithm: Freenet (continued)
     • If the query is satisfied, the response is sent back to the query source along the reverse path
     • Each node along the path also copies the data into its own database
     • As a result, more popular information becomes easier to access
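A minimal sketch of the depth-limited DFS with reverse-path caching described above. It is not Freenet's actual routing, which chooses the next neighbor by key closeness; here neighbors are simply tried in list order, and `dfs_query`, `graph`, and `stores` are names assumed for the example.

```python
def dfs_query(graph, stores, node, key, depth_limit, visited=None):
    """Return the path from `node` to a holder of `key`, or None if not found.

    graph  : dict mapping node -> list of neighbors (hypothetical overlay)
    stores : dict mapping node -> set of keys held locally (mutated by caching)
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if key in stores[node]:
        return [node]                      # query satisfied at this node
    if depth_limit == 0:
        return None                        # depth limit D reached
    for neighbor in graph[node]:           # one neighbor at a time, waiting for its answer
        if neighbor in visited:
            continue
        path = dfs_query(graph, stores, neighbor, key, depth_limit - 1, visited)
        if path is not None:
            stores[node].add(key)          # copy the data while the response passes through
            return [node] + path
    return None


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
stores = {"A": set(), "B": set(), "C": set(), "D": {"k"}}
first = dfs_query(overlay, stores, "A", "k", depth_limit=3)
second = dfs_query(overlay, stores, "C", "k", depth_limit=1)
```

After the first query, A and B both hold a copy of `k`, so the second query from C succeeds within one hop. This is the "popular information becomes easier to access" effect.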
  14. Basic Searching Algorithm: Napster
     • A centralized server keeps information about online users and song locations in a database for quick searching
     • Once the server returns the location of a song, clients transfer the file peer to peer
     • Legal problem: ignores copyright
     • Technical problem: the index server is a bottleneck, as in any client-server system, and a single point of failure if it goes down
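The central-index idea can be sketched with a small in-memory class. This is only an illustration of the lookup pattern, not Napster's protocol; `IndexServer` and its methods are hypothetical names.

```python
class IndexServer:
    """Central index: maps a song title to the peers currently sharing it."""

    def __init__(self):
        self._locations = {}               # song title -> set of peer addresses

    def register(self, peer, songs):
        """Called when a peer comes online and announces its shared songs."""
        for song in songs:
            self._locations.setdefault(song, set()).add(peer)

    def unregister(self, peer):
        """Called when a peer goes offline."""
        for peers in self._locations.values():
            peers.discard(peer)

    def lookup(self, song):
        """Return the peers holding `song`; the transfer itself is peer to peer."""
        return sorted(self._locations.get(song, ()))


server = IndexServer()
server.register("peer1:6699", ["song-a", "song-b"])
server.register("peer2:6699", ["song-b"])
```

Every search goes through `lookup`, which makes the bottleneck and the single point of failure easy to see: if `server` is unavailable, no peer can find anything.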
  15. Improving Search Algorithms in Peer-to-Peer Networks
     • Iterative Deepening
     • Directed BFS
     • Local Indices
     • Routing Indices
     • NEVRLATE
  16. Iterative Deepening
     • Multiple breadth-first searches are initiated with successively larger depth limits, until the query is satisfied or the maximum depth has been reached
     • Example: under policy P(a, b, c), the first search has depth a, the second depth b, and the third depth c
  17. Iterative Deepening (continued)
     • A source node S first initiates a BFS of depth a; when a node at depth a receives and processes the query, it stores the query temporarily
     • The query is thus frozen at all nodes a hops from the source
     • S receives response messages from the nodes that have processed the query
  18. Iterative Deepening (continued)
     • After a predefined waiting period W, if the query has been satisfied, S does nothing
     • Otherwise S starts another iteration by initiating a BFS of depth b
     • S sends a Resend message with a TTL of a; nodes simply forward the Resend message until it reaches the nodes a hops away
  19. Iterative Deepening (continued)
     • A node at hop a drops the Resend message and unfreezes the corresponding query by forwarding it to all its neighbors with a TTL of b-a
     • When the message reaches the nodes at hop b, the process continues in a similar fashion
     • At depth c, the last depth in the policy, the query is not frozen, and S does not initiate another iteration even if the query is not satisfied. Problem?
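The iteration scheme above can be sketched in a single process by keeping the frontier between rounds, which plays the role of the frozen queries: each new round resumes from the nodes where the previous one stopped instead of re-processing the inner nodes. The waiting period W and the Resend message are not modeled, and `iterative_deepening`, `graph`, `holders`, and `wanted` are names assumed for the example.

```python
def iterative_deepening(graph, holders, source, policy, wanted=1):
    """Run BFS rounds to the depths in `policy`; stop once `wanted` hits are found.

    Returns (hits, depth limit of the round that satisfied the query, or None).
    """
    seen = {source}
    frontier = [source]          # nodes currently holding the frozen query
    depth = 0
    hits = []
    for limit in policy:
        # "Unfreeze": continue the BFS from the stored frontier up to the new limit.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
                        if neighbor in holders:
                            hits.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(hits) >= wanted:
            return hits, limit   # satisfied: the source starts no further round
    return hits, None            # last depth reached without satisfying the query


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
result = iterative_deepening(chain, {"D"}, "A", policy=(1, 2, 4))
```

A query that can be answered close to the source never pays for the deeper rounds, which is the cost saving the technique is after.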
  20. Directed BFS
     • A node sends the query to only a subset of its neighbors, chosen as those likely to return many results with minimum response time
     • Each node maintains simple statistics on its neighbors, such as the results of past queries or the latency of the connection to that neighbor
     • From these statistics, rules such as the following can be used to pick the neighbors to send a query to:
  21. Directed BFS (continued)
     • Neighbors that have returned the highest number of results for previous queries
     • Neighbors that return response messages with the lowest average number of hops
     • Neighbors that have forwarded the largest number of messages
     • Neighbors that have the shortest message queue
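The four selection rules above can be sketched as sort keys over per-neighbor statistics. The `NeighborStats` fields and the heuristic names are assumptions chosen to mirror the bullets; the slides do not specify how the statistics are stored.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int      # results this neighbor returned for past queries
    avg_hops: float            # average hop count of its response messages
    messages_forwarded: int    # number of messages it has forwarded to us
    queue_length: int          # current length of its message queue


# Each heuristic maps stats to a sort key; a smaller key means a better neighbor.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "fewest_hops": lambda s: s.avg_hops,
    "most_forwarded": lambda s: -s.messages_forwarded,
    "shortest_queue": lambda s: s.queue_length,
}


def pick_neighbors(stats, heuristic, k=1):
    """Return the k best neighbors under the named heuristic (ties broken by name)."""
    key = HEURISTICS[heuristic]
    return sorted(stats, key=lambda name: (key(stats[name]), name))[:k]


stats = {
    "B": NeighborStats(results_returned=40, avg_hops=3.5, messages_forwarded=900, queue_length=12),
    "C": NeighborStats(results_returned=15, avg_hops=1.8, messages_forwarded=300, queue_length=2),
    "D": NeighborStats(results_returned=25, avg_hops=2.4, messages_forwarded=1200, queue_length=7),
}
```

The rules can disagree: in this made-up data, B has returned the most results, but C answers from closer by and has the shortest queue.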
  22. Local Indices
     • Each node n maintains an index over the data of all nodes within r hops of itself
     • r is a system-wide variable known as the radius of the index
     • When a node receives a query, it can process it on behalf of every node within r hops, so fewer nodes need to search their data; this reduces cost while keeping the same level of query satisfaction
  23. Local Indices (continued)
     • A system-wide policy specifies the depths at which the query should be processed
     • All nodes at depths not listed in the policy simply forward the query
     • Example: under P(1, 5), only nodes at depths 1 and 5 process the query, while nodes at other depths just forward it
     • Reason: each node has information about its neighbors within 4 hops
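A minimal sketch of building and querying a radius-r index, assuming a static overlay. The function names and the `data` layout are hypothetical; index maintenance as nodes join and leave is not modeled.

```python
from collections import deque


def nodes_within(graph, start, r):
    """Return {node: hop distance} for every node within r hops of `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue                       # do not expand beyond the radius
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_local_index(graph, data, node, r):
    """Index the data of `node` and of all nodes within r hops: item -> holders."""
    index = {}
    for other in nodes_within(graph, node, r):
        for item in data[other]:
            index.setdefault(item, set()).add(other)
    return index


def answer(index, item):
    """Process a query on behalf of every node covered by the index."""
    return sorted(index.get(item, ()))


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
data = {"A": {"x"}, "B": set(), "C": {"y"}, "D": set(), "E": {"x"}}
index_c = build_local_index(chain, data, "C", r=2)
```

With r = 2, node C alone can answer for the whole five-node chain, so a query that reaches C needs no further processing at A, B, D, or E.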
  24. Routing Indices
     • Goal: allow a node to select the "best" neighbors to send a query to
     • A routing index is a data structure, with associated algorithms, that, given a query, returns a list of neighbors ranked according to their goodness for the query
     • The goodness should in general reflect the number of documents in nearby nodes
  25. Routing Indices (continued)
     • Each node has a local index for quickly finding local documents when a query is received
     • Nodes also have a compound routing index containing:
     • The number of documents along each path
     • The number of documents on each topic of interest
  26. Routing Indices Example
  27. Routing Indices Example

                    Documents with topics
     Path   #docs     DB      N      T      L
     ------------------------------------------
     A        150     30     20      0    100
     B        100     20      0     10     30
     C       1000      0    300      0     50
     D        200    100      0    100    150
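To make the notion of goodness concrete, the sketch below ranks the paths in the example table for a query on several topics. The slides do not give a formula; the estimator used here (number of documents times the fraction of documents on each queried topic, treating topics as independent) is an assumption, as are the names `goodness` and `rank_paths`.

```python
# Rows copied from the example table: path -> document counts.
ROUTING_INDEX = {
    "A": {"docs": 150, "DB": 30, "N": 20, "T": 0, "L": 100},
    "B": {"docs": 100, "DB": 20, "N": 0, "T": 10, "L": 30},
    "C": {"docs": 1000, "DB": 0, "N": 300, "T": 0, "L": 50},
    "D": {"docs": 200, "DB": 100, "N": 0, "T": 100, "L": 150},
}


def goodness(row, topics):
    """Estimate how many documents along a path match all `topics`.

    Assumption: topics are independent, so the estimate is
    docs * product(count[topic] / docs).
    """
    if row["docs"] == 0:
        return 0.0
    estimate = float(row["docs"])
    for topic in topics:
        estimate *= row[topic] / row["docs"]
    return estimate


def rank_paths(index, topics):
    """Return paths ordered from best to worst for the query (ties by name)."""
    return sorted(index, key=lambda path: (-goodness(index[path], topics), path))


ranking = rank_paths(ROUTING_INDEX, ["DB", "L"])
```

For a query on DB and L, path C ranks last despite having by far the most documents, because none of them are on DB.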
  28. Routing Indices: Maintenance
     • When a connection is established between two nodes, they exchange their routing indices; each node updates its own index and sends an update message to its neighbors
     • When node I disconnects from the network and node D detects this, D removes the row for I and sends its updated routing index to all its neighbors
  29. NEVRLATE
     • Network-Efficient Vast Resource Lookup At The Edge
     • Directory servers are organized into a logical 2-dimensional grid, or a set of sets of servers
     • Registration happens in one "horizontal" dimension
     • Lookup happens in the other "vertical" dimension
  30. NEVRLATE (continued)
     • Each node is a directory server
     • Within each set of servers (a vertical cloud), every member can reach every other member
     • The set of sets of servers is the entire NEVRLATE network
  31. NEVRLATE (continued)
  32. NEVRLATE (continued)
     • Each host registers its resources and location with one node of each set
     • When a query arrives, only one set needs to be searched to find all locations holding the requested information
     • A host can also register with two nodes in each set for fault tolerance
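The register-across, look-up-down pattern can be sketched with in-memory dictionaries standing in for directory servers. The class name `NevrlateGrid`, the random choice of server within each set, and the method signatures are assumptions made for the example, not details taken from the NEVRLATE design.

```python
import random


class NevrlateGrid:
    """A set of sets of directory servers; each server is modeled as a dict."""

    def __init__(self, num_sets, servers_per_set, seed=0):
        self.sets = [
            [{} for _ in range(servers_per_set)] for _ in range(num_sets)
        ]
        self._rng = random.Random(seed)    # seeded so that runs are repeatable

    def register(self, resource, location):
        """Register with ONE server in EVERY set (the horizontal dimension)."""
        for servers in self.sets:
            server = self._rng.choice(servers)
            server.setdefault(resource, set()).add(location)

    def lookup(self, resource, set_index=0):
        """Ask EVERY server in ONE set (the vertical dimension)."""
        found = set()
        for server in self.sets[set_index]:
            found |= server.get(resource, set())
        return found


grid = NevrlateGrid(num_sets=3, servers_per_set=4)
grid.register("song-a", "host1")
grid.register("song-a", "host2")
```

Because every registration touches each set once, any single set holds the complete picture, so `lookup` returns the same answer whichever `set_index` it is given.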
  33. Extension
     • Total rank of a neighbor: the weighted sum of all its key ranks
     • Assumption: high-rank nodes are always better to access or closer to the resource
     • Dominating-set marking process (Rule 1/Rule 2): when removing a node from the dominating set (DS), choose the one with the lower rank instead of comparing uids
  34. Extension (continued)
     • Under the marking process (Wu & Li), the nodes in the connected dominating set have relatively higher connectivity than non-DS nodes
     • The dominating-set nodes need to hold the resource information and resource locations of their neighbor nodes
     • When searching, requests are sent only to DS nodes, which reduces cost and traffic while keeping the same level of query satisfaction
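A minimal sketch of searching through dominating-set nodes only. It implements just the basic marking step, in which a node is marked if it has two neighbors that are not directly connected to each other; the Rule 1/Rule 2 pruning and the rank-based variant proposed above are not modeled. The names `marking_process` and `search_via_ds` are assumptions, and the overlay is assumed to be undirected, connected, and not a complete graph.

```python
def marking_process(graph):
    """Return the nodes that have at least two neighbors not connected to each other."""
    marked = set()
    for node, neighbors in graph.items():
        nbrs = sorted(neighbors)
        for i, u in enumerate(nbrs):
            if any(w not in graph[u] for w in nbrs[i + 1:]):
                marked.add(node)
                break
    return marked


def search_via_ds(graph, data, item):
    """Query only the marked (DS) nodes; each answers for itself and its neighbors.

    Returns (nodes holding `item`, number of nodes that had to be queried).
    """
    ds = marking_process(graph)
    holders = set()
    for ds_node in ds:
        for covered in [ds_node, *graph[ds_node]]:
            if item in data[covered]:
                holders.add(covered)
    return holders, len(ds)


chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
data = {"A": {"x"}, "B": set(), "C": set(), "D": set(), "E": {"x", "y"}}
ds_nodes = marking_process(chain)
```

In the chain, the end nodes A and E stay unmarked, yet their data is still found because their marked neighbors B and D answer on their behalf.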
  35. Extension (continued)
     • Clustering: when constructing a cluster, choose the node with the highest rank instead of the one with the lowest uid, and choose the node with the lowest rank as the gateway (low traffic)
     • Consider not only a node's own rank but also the total rank of its neighbors
     • Max-min ranking: when searching, choose the max as well as the min of the key index rank
  36. Extension (continued)
     • Reason: the max-rank node may carry high traffic, while the min-rank node carries low traffic
     • Networks and resources are both dynamic, so this helps to re-rank the network
     • Example: Glades Rd/Palmetto Park Rd
     • SW NE
  37. Summary
