1. Gnutella & Searching Algorithms in Unstructured Peer-to-Peer Networks
CS780-3 Lecture Notes, courtesy of David Bryan
2. What is Gnutella?
- Gnutella is a fully decentralized peer-to-peer protocol for locating resources
- A standard, not a program: there are many implementations of Gnutella (BearShare, LimeWire, Morpheus)
- Each member of the network may act as a client, looking for data, or as a server, sharing that data; hence the term "servent"
- The network is heterogeneous in both the physical nodes and the applications run on them; the protocol requires that different types of clients be able to connect to each other
3. Overlay Networks and Gnutella
- A network on top of a network
- Used only for the search aspect of Gnutella
- Actual transfer of files is not performed over the overlay network
4. Connecting to the Network
- When you start the application, you first need to connect to some node: bootstrapping
- There is no set way to do this; the protocol is silent on the issue
  - Node cache: contact a well-known node that maintains a list of nodes
  - Node web page: lists locations of nodes to bootstrap from and get started
5. A Bit about TTL and Hops
- TTL and Hops are fields in a Gnutella message used to control the diameter, or horizon, of queries
- Except when used as a flag in special cases, each node that forwards a message decrements its TTL and increments its Hops
6. TTL/Hops Example (figure): as a message travels, each hop decrements TTL and increments Hops: TTL=7/Hops=0 → TTL=6/Hops=1 → TTL=5/Hops=2 → TTL=4/Hops=3 → TTL=3/Hops=4
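A minimal sketch of the per-hop rule (not from the slides; the Message class and field names are assumptions for illustration): on receipt a servent decrements TTL and increments Hops, and forwards only while TTL stays positive.

    from dataclasses import dataclass

    @dataclass
    class Message:
        ttl: int   # hops this message may still travel
        hops: int  # hops this message has already traveled

    def on_receive(msg: Message) -> bool:
        """Apply the per-hop rule; return True if the message
        should be forwarded further."""
        msg.ttl -= 1
        msg.hops += 1
        return msg.ttl > 0

    msg = Message(ttl=7, hops=0)
    while on_receive(msg):
        print(f"TTL = {msg.ttl}, Hops = {msg.hops}")  # 6/1, 5/2, ... 1/6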
7. Horizon Example (figure): rings of nodes reachable from the origin at TTL = 1 through TTL = 7
8. More about TTL and Hops
- In general, you greatly increase the number of nodes you can reach as you broaden your horizon
- Where n_hosts is the number of hosts reachable, n_hops the number of hops, and d the average number of hosts reachable from each host (the degree of connectivity), and assuming an acyclic graph:
  n_hosts = (d − 1)^1 + (d − 1)^2 + … + (d − 1)^(n_hops)
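A quick numeric check of the formula, using the degrees quoted in the original notes (roughly 3 for Gnutella as of Oct. 2001; some mention 7); the function name is an illustration:

    def reachable_hosts(d: int, n_hops: int) -> int:
        """Hosts reachable within n_hops, assuming an acyclic overlay where
        each host reaches d others: sum of (d - 1)**i for i = 1..n_hops."""
        return sum((d - 1) ** i for i in range(1, n_hops + 1))

    print(reachable_hosts(3, 7))  # 254    -- a surprisingly small horizon
    print(reachable_hosts(7, 7))  # 335922 -- almost 350K hosts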
9. Ping and Pong
- After learning where to go to join a network, a servent uses Ping and Pong to locate other nodes to connect to
- Originally, a Ping message was sent to the node connected to, which forwarded it to all neighbors
- All servents willing to connect to the sender respond with a Pong
- Pongs convey IP, port, files shared, and distance (hops)
10-20. Ping/Pong Example (animation): the new servent's Ping is broadcast outward hop by hop; each willing servent's Pong travels back to the sender along the reverse path
21. Pong Caching
- The problem with this approach was traffic: broadcasting generates too much of it
- The newer standard uses "pong caching": servents keep information about nearby nodes and send it when pinged
- The protocol suggests one method for maintaining the cache but doesn't require that it be used
22. Pong Caching, Continued
- Modified TTL and Hops values are used
  - TTL=1, Hops=0 is a probe: the servent responds about itself only
  - TTL=2, Hops=0 is a crawler: the servent responds with pongs from its immediate neighbors only
23. Pong Caching, Continued
  - TTL=7, Hops=0 is an ordinary ping: the servent replies about any nearby nodes
    - The protocol suggests returning 10 pongs: the 10 most recently received
    - They must be good pongs (recent responses, from well-behaved servents)
    - Servents should ping every 3 seconds, but since responses come from others' caches, bandwidth is claimed to stay low
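A hypothetical pong-cache sketch (class and method names are illustrative, not from the spec; the crawler case is omitted): keep the ten most recent pongs and answer pings from the cache, special-casing the probe.

    from collections import deque

    class PongCache:
        def __init__(self, size: int = 10):
            self.pongs = deque(maxlen=size)  # oldest entries fall off

        def record(self, pong) -> None:
            self.pongs.append(pong)          # remember every pong we route

        def answer_ping(self, ttl: int, hops: int, self_pong):
            if ttl == 1 and hops == 0:       # probe: reply about self only
                return [self_pong]
            return [self_pong] + list(self.pongs)  # else: cached pongs too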
24-25. Ping/Pong Example with Caching (figures): a single Ping to a neighbor is answered with a batch of Pongs drawn from that neighbor's cache
26. Queries in Gnutella
- Queries work very much like the original Ping/Pong scheme
- Send a Query to neighbors, with TTL=7 and Hops=0
- This defines the default "horizon" for reaching servents; as TTL increases, so do the horizon and the number of hosts that can be reached, improving the odds of a hit
27. Queries in Gnutella, Continued
- Send the Query to neighbors, who send it to their neighbors
- Decrement TTL and increment Hops on receipt
- Continue forwarding while TTL != 0
- If you have the requested file, return a Query Hit message
- This generates lots of messages, hence the name "flood" search (see the sketch below)
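A small simulation of the flood (the overlay and function are illustrative assumptions): every node forwards to all of its neighbors while TTL > 0, and duplicate copies are dropped.

    def flood(graph, origin, ttl=7):
        """Return (nodes reached, messages generated) for a flood query
        over an adjacency-list overlay."""
        reached, messages, frontier = {origin}, 0, [origin]
        while ttl > 0 and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    messages += 1                # every forwarded copy counts
                    if neighbor not in reached:  # duplicates are dropped
                        reached.add(neighbor)
                        next_frontier.append(neighbor)
            frontier, ttl = next_frontier, ttl - 1
        return reached, messages

    overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    print(flood(overlay, "A"))  # 4 nodes reached, at the cost of 8 messages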
28. Query Example (figure): a Query message ("I need a guitar song") floods outward; the node holding the requested file returns a Query Hit message along the reverse path
29. Transferring Data
- When a file is located using the Gnutella protocol, the Query Hit message returns a host/port to contact to obtain the file
  - The connection to get the data is made using HTTP
  - Usually (but not always) the same host/port as used for Gnutella traffic
- Problems occur when the servent is behind a firewall
30. Firewalls and Push
- The Gnutella protocol can usually pass through firewalls if servents send and receive on the same ports: the Ping/Pong mechanism keeps the pinholes open
- The problem comes with direct connections from a remote host
31. Firewalls and Push (figure, narrated)
1. The client behind the firewall connects to an outside host, sending from the same port Gnutella traffic will be received on. The firewall's mapping from inside host to outside host now allows return traffic, so Gnutella traffic to this host is allowed.
2. The client uses the Ping and Pong mechanism to find and connect to several other servents; firewall mappings are created to these servents as well, and are refreshed by the periodic Ping messages used to keep pong caches updated.
3. A remote client uses the Gnutella protocol to search for a file and finds it on the host behind the firewall. The search messages arrive because they pass through the nodes that have firewall mappings, and the return message follows the same path, so it passes back through the firewall.
4. If the remote client now attempts a direct HTTP connection to retrieve the file, the connection is blocked: since the firewalled servent never contacted the requesting host, there is no mapping in the firewall.
5. Instead, a Push message, routed through the Gnutella network, instructs the firewalled machine to send the file to the requestor using HTTP.
6. Of course, if both ends are behind firewalls and not peers…
32. Flow Control
- To control network traffic, Gnutella suggests a very simple flow control method (sketched below)
  - Servents create a FIFO output queue
  - When the queue is half full, enter flow control mode; stay there as long as it is more than one quarter full
    - Drop all input
    - Sort the queue to send responses before requests
    - Long-traveling responses gain priority; long-traveling requests lose priority
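A sketch of that queue discipline under stated assumptions (the capacity and message layout are invented for illustration):

    from collections import deque

    class OutputQueue:
        """FIFO output queue with the suggested flow control: enter
        flow-control mode at half full, leave below one quarter full,
        and drop all input while in the mode."""
        def __init__(self, capacity: int = 100):
            self.q = deque()
            self.capacity = capacity
            self.flow_control = False

        def offer(self, msg, is_response: bool, hops: int) -> bool:
            if self.flow_control and len(self.q) <= self.capacity // 4:
                self.flow_control = False   # drained enough: resume input
            if len(self.q) >= self.capacity // 2:
                self.flow_control = True    # half full: start dropping
            if self.flow_control:
                return False                # drop all input
            self.q.append((msg, is_response, hops))
            return True

        def next_to_send(self):
            if not self.q:
                return None
            # Responses before requests; long-traveling responses gain
            # priority, long-traveling requests lose it.
            best = max(self.q, key=lambda m: (m[1], m[2] if m[1] else -m[2]))
            self.q.remove(best)
            return best[0]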
33. Bye
- A message to tell other clients you have terminated
- Not required, but recommended
- Should include a reason code, loosely defined in the protocol
34. Format and Style of Gnutella Messages
- Connections are made via TCP
- A classic HTTP-style text triple handshake: CONNECT, replied to with 200, replied to with 200 (shown below)
- After this, Gnutella messages are exchanged
  - Gnutella messages are binary
- Each request uses a semi-unique ID number
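Concretely, the 0.6 handshake from [1] looks like this on the wire (each party's capability headers, elided here, follow its line):

    GNUTELLA CONNECT/0.6        <- initiator opens the connection
    GNUTELLA/0.6 200 OK         <- acceptor agrees
    GNUTELLA/0.6 200 OK         <- initiator confirms; binary messages follow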
35. Problems with Flood Query
- The traditional Gnutella flood query has a number of problems
  - A very large number of packets is generated to fulfill queries
    - Study [2] indicates that most searches on Gnutella could be satisfied by a search that visits fewer nodes
  - It is essentially just a Breadth-First Search (BFS)
  - Some proposals [3,4] attempt to address this with alternate schemes for searching
36. Alternatives to Flood Query
- Iterative Deepening
- Directed BFS
- Local Indices
- Random Walkers
37. Iterative Deepening
- Essentially, the idea is multiple, sequential breadth-first searches
- First, search to a shallow depth
- If no results are returned, repeat the search, but to a deeper depth
38. Iterative Deepening Example (figure)
39. Iterative Deepening
- To implement this more efficiently, [3] proposed the following
  - Assume the depths tried are a first, then b, then finally c if needed
  - Nodes that receive the search at depth a store it temporarily
  - If the originating node is satisfied, it doesn't resend, and the nodes at depth a delete the stored message after a time
40. Iterative Deepening, Continued
  - If needed, the search is repeated after a delay, with TTL set to a
    - Nodes on the way to depth a recognize the repeated message (tracked by its Gnutella ID) and ignore it
    - Upon reaching depth a, the message is forwarded with a TTL of b − a
    - The search is now saved at level b
  - Finally, if the search must be repeated again, new packets go out to depth b, with all nodes on the way ignoring them; since c is the final level, there is no need to save on this third iteration
  - This requires the nodes to know the policy (see the sketch below)
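A compact sketch of the loop from the source's perspective, using the policy slide 60 reports as optimal (the search_at_depth callback is an illustrative stand-in for flooding a query whose TTL stops at the given depth):

    import time

    def iterative_deepening(search_at_depth, policy=(5, 6, 7), wait=6.0):
        """Try each depth in the agreed policy; stop at the first hit.
        If the source is satisfied it never resends, and stored copies
        at the frontier simply time out."""
        for i, depth in enumerate(policy):
            results = search_at_depth(depth)
            if results:
                return results
            if i < len(policy) - 1:
                time.sleep(wait)   # give responses time to come back
        return []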
41. Efficient Iterative Deepening Example (figure): send out the packets, but nodes at the frontier keep them for now; if the source gets a response, the stored copies are deleted after a timeout
42. Efficient Iterative Deepening Example (figure, continued): if there is no response, the source resends. Intermediate nodes ignore the repeat, and the packets reach the stored level; the stored packets are deleted and sent on to the next depth in the policy, where they are stored in turn. In further iterations, all nodes between the source and the new stored level ignore the message. On the final iteration (maximum depth), searches are not stored.
43. Directed BFS
- Directed BFS works by selecting a subset of the neighbors and sending only to these
- You are trying to guess who is most likely to respond
- A neighbor may pass the query on using standard BFS, or may use Directed BFS to further limit its messages
- The trick is finding a good way to select neighbors
44. Selection Strategies in Directed BFS (a ranking sketch follows the list)
- Random?
- Highest number of results for recent queries
- Results with low hop counts
- High message count
- Short message queue (flow control)
- Returns results quickly
- Has the most neighbors
- Shortest latency
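One plausible ranking, combining the two signals slide 61 reports as working best (the stats mapping and its fields are assumptions for illustration):

    def pick_neighbors(stats, k=2):
        """stats maps neighbor -> (results on recent queries, average
        response time). Rank by most results, break ties by speed."""
        ranked = sorted(stats, key=lambda n: (-stats[n][0], stats[n][1]))
        return ranked[:k]

    stats = {"B": (42, 0.8), "C": (42, 0.3), "D": (5, 0.1)}
    print(pick_neighbors(stats))  # ['C', 'B']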
45. Local Indices
- The idea is that a node maintains information about which files its neighbors, and possibly its neighbors' neighbors, store: a "radius of knowledge"
- All nodes know the radius and the policy
  - The policy lists which levels respond and which ignore messages
  - Servents look at TTL/Hops to determine whether to process or ignore a query
46. Local Indices Example (figure): each node has a radius of 2, so it knows about the files on its neighbors and its neighbors' neighbors. The policy is that levels 1, 4 and 7 respond. Nodes at level 1 respond with information about levels 1, 2 and 3, then forward to the next level. The search moves through levels 2 and 3, which ignore it and forward. When the search reaches level 4, those nodes respond with information about levels 4, 5 and 6, then forward. Levels 5 and 6 simply ignore and forward. Finally, level 7 responds with its own data and terminates the query. (A respond-or-forward sketch follows.)
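The respond-or-forward decision, sketched under the example's policy (function and variable names are illustrative):

    RESPOND_LEVELS = {1, 4, 7}   # the levels that answer in the example

    def on_query(hops: int, local_index: list[str], query: str):
        """A node at level hops+1 answers from its index (which covers
        its whole radius of knowledge) only if its level is in the
        policy; otherwise it stays silent. Forwarding happens either way."""
        level = hops + 1
        if level in RESPOND_LEVELS:
            return [f for f in local_index if query in f]
        return None  # ignore: not this level's turn to respond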
47. Local Indices
- Memory cost of maintaining the lists
  - Size is far below a megabyte for a radius < 5
- Network cost of building the lists
  - A hit from the extra packets needed to keep the indices current
48. Random Walkers
- Quite a different approach
  - Doesn't use TTLs in the same way
- One or a few messages are sent out over a random link; if the object is found, a hit is returned, otherwise the message is sent on to a different node
- The choice of where to go next is essentially random
- Periodically, when a node receives a query, it checks with the source to see whether the query has already been satisfied
49. Random Walks
- Basic random walk (one walker)
  - Choose a random neighbor at each step
  - The "walker" carries the query it is searching for from hop to hop
50. (figure): a single walker steps through the overlay as its TTL counts down 7, 6, 5, 4, 3 …
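A one-walker sketch (the overlay, predicate, and step cap are illustrative assumptions):

    import random

    def random_walk(graph, start, has_file, max_steps=1024):
        """Hand the query to a uniformly random neighbor each step;
        return the step count on a hit, None if the walk gives up."""
        node = start
        for step in range(1, max_steps + 1):
            node = random.choice(graph[node])  # memoryless next hop
            if has_file(node):
                return step
        return None

    overlay = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
    print(random_walk(overlay, "A", lambda n: n == "D"))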
51. Random Walks, Continued
- Advantage: cuts message overhead (average number of messages per node)
- Disadvantage: increases query search delay (number of hops)
52. Multiple Parallel Random Walks (K Walkers)
- To decrease delay, increase the number of walkers
- Multiple walkers
  - K walkers taking T steps cover as much ground as 1 walker taking K·T steps
  - This cuts the delay by a factor of K
53-54. (figures): to reach 10 nodes, 2 walkers need 5 steps, while 1 walker needs 10 steps
55. How to Terminate the Walkers?
  - TTL? No
    - This terminates after a certain number of hops
    - It runs into the TTL selection issue too
  - Checking? Yes
    - The walker checks with the requesting node frequently
    - Checking once at every fourth step along the way is usually good (sketched below)
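A K-walker simulation with that checking rule (run round-robin here; all names and the step cap are illustrative):

    import random

    def k_walkers(graph, start, has_file, k=4, check_every=4, max_steps=256):
        """K parallel walkers; every check_every steps the walkers ask
        the source whether the query is satisfied and, if so, stop."""
        walkers = [start] * k
        satisfied = False
        for step in range(1, max_steps + 1):
            if satisfied and step % check_every == 0:
                return step          # a check-back finds the answer in hand
            for i in range(k):
                walkers[i] = random.choice(graph[walkers[i]])
                if has_file(walkers[i]):
                    satisfied = True # the source now has its hit
        return None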
56. (figure): walker TTL counting down 7, 6, 5, 4, 3, 2, 1
57. One Improvement: State Keeping
- Each query has a unique ID, and all of its K walkers are tagged with that ID
- For each ID, a node only forwards the query to a neighbor that is new to that query (see the sketch below)
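A sketch of that bookkeeping (the visited map and the fallback behavior are assumptions):

    import random

    def forward_walker(query_id, neighbors, visited):
        """Forward a walker only to a neighbor not yet used for this
        query ID, so the K walkers of one query fan out instead of
        retracing each other."""
        used = visited.setdefault(query_id, set())
        fresh = [n for n in neighbors if n not in used]
        choice = random.choice(fresh if fresh else neighbors)
        used.add(choice)
        return choice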
58. Issues
- Several alternatives (Local Indices, Iterative Deepening) require a global policy to be understood by all nodes
- Sharing information about file indices (Local Indices) or even statistics (Directed BFS) leads to possible security risks
- Most require at least some modification to the servents
59. Evaluating Techniques
- Do the results suffer?
  - Most of these assume a request is satisfied by some number of hits, typically 50-100
  - How long do those results take?
- What network traffic does the technique add?
- What processing is needed?
- What memory is needed?
- How much local state, and how much extra client complexity?
60. Iterative Deepening
- Savings depend on how long to wait between iterations and on the depth of each search
  - Saves 80% of bandwidth and 60% of processing for a wait of 8 seconds between iterations
- A policy with depths of 5, 6, 7 and 6-second delays was optimal
  - Saves 70% of bandwidth and 50% of processing
  - Adds only ~30% to the time to satisfy a query
- Some state is needed for knowing the policy and storing packets; in the worst case, the repeated iterations mean extra searches
61. Directed BFS
- Using the neighbors with the greatest number of past results and the lowest average response time worked best
  - Past performance is an indicator of future performance
  - Selecting by smallest hop count did poorly (low hops to past hits need not mean the hits are close), and highest degree was weaker than expected
- Both top strategies had a probability of satisfying the query >85% of that of a full BFS
- About 70% less bandwidth and processing
- Higher cost than iterative deepening, but slightly faster
62. Local Indices
- The same result quality is claimed as for BFS
- No timing results, since it was simulated on existing data
- Traffic is reduced by 60% and processing by 50% for a radius of 1
- Index size is a factor: a radius of 1 takes about 71 KB, but a radius of 7 takes 21 MB!
- Bandwidth actually increases for large r: there is too much traffic to maintain the indices
63. Random Walkers
- Can give a large gain in efficiency if there is
  - State to prevent duplicates
  - A good policy to terminate once satisfied (not TTL based)
- Very good results were obtained for a low number of hops (as low as 8, which is similar to the depth used for floods)
- This looks like a promising area of research, perhaps combined with some of the metrics Directed BFS uses to select nodes
64. Conclusion
- Almost all the techniques improve on flooding, but with trade-offs
  - Most need to maintain state, with increased coding complexity
  - Usually fewer results and slower performance
  - Security implications of sharing more information
- A good deal of research is left to be done in this area, although most of the work discussed appears to be a year or more old
65. References
[1] Gnutella RFC, http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html
[2] Yang and Garcia-Molina, Comparing Hybrid Peer-to-Peer Systems, Technical Report, http://dbpubs.stanford.edu:8090/pub/2000-35
[3] Yang and Garcia-Molina, Improving Search in Peer-to-Peer Networks, in Proc. of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), June 2002, http://dbpubs.stanford.edu:8090/pub/2002-28
[4] Q. Lv, P. Cao, E. Cohen, K. Li and S. Shenker, Search and Replication in Unstructured Peer-to-Peer Networks, in Proc. of the ACM ICS, 2002, http://parapet.ee.princeton.edu/~sigm2002/papers/p258-lv.pdf
[5] Yang and Garcia-Molina, Improving Search in Peer-to-Peer Networks, technical report version of [3], http://dbpubs.stanford.edu:8090/pub/2001-47
