Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Peer to-peer

1,540 views

Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Peer to-peer

  1. 1. An Overview of Peer-to-Peer Sami Rollins 11/14/02
  2. 2. Outline• P2P Overview – What is a peer? – Example applications – Benefits of P2P• P2P Content Sharing – Challenges – Group management/data placement approaches – Measurement studies
  3. 3. What is Peer-to-Peer (P2P)?• Napster?• Gnutella?• Most people think of P2P as music sharing
  4. 4. What is a peer?• Contrasted with Client-Server model• Servers are centrally maintained and administered• Client has fewer resources than a server
  5. 5. What is a peer?• A peer’s resources are similar to the resources of the other participants• P2P – peers communicating directly with other peers and sharing resources
  6. 6. Levels of P2P-ness• P2P as a mindset – Slashdot• P2P as a model – Gnutella• P2P as an implementation choice – Application-layer multicast• P2P as an inherent property – Ad-hoc networks
  7. 7. P2P Application Taxonomy P2P SystemsDistributed Computing File Sharing Collaboration Platforms SETI@home Gnutella Jabber JXTA
  8. 8. P2P Goals/Benefits• Cost sharing• Resource aggregation• Improved scalability/reliability• Increased autonomy• Anonymity/privacy• Dynamism• Ad-hoc communication
  9. 9. P2P File Sharing• Content exchange – Gnutella• File systems – Oceanstore• Filtering/mining – Opencola
  10. 10. P2P File Sharing Benefits• Cost sharing• Resource aggregation• Improved scalability/reliability• Anonymity/privacy• Dynamism
  11. 11. Research Areas• Peer discovery and group management• Data location and placement• Reliable and efficient file exchange• Security/privacy/anonymity/trust
  12. 12. Current Research• Group management and data placement – Chord, CAN, Tapestry, Pastry• Anonymity – Publius• Performance studies – Gnutella measurement study
  13. 13. Management/Placement Challenges• Per-node state• Bandwidth usage• Search time• Fault tolerance/resiliency
  14. 14. Approaches• Centralized• Flooding• Document Routing
  15. 15. Centralized Bob Alice• Napster model• Benefits: – Efficient search – Limited bandwidth usage – No per-node state• Drawbacks: – Central point of failure Judy Jane – Limited scale
  16. 16. Flooding Carl Jane• Gnutella model• Benefits: – No central point of failure – Limited per-node state• Drawbacks: Bob – Slow searches – Bandwidth intensive Alice Judy
  17. 17. Document Routing 001 012• FreeNet, Chord, CAN, Tapestry, Pastry model 212 ? 212 ?• Benefits: 332 – More efficient searching 212 – Limited per-node state 305• Drawbacks: – Limited fault-tolerance vs redundancy
  18. 18. Document Routing – CAN• Associate to each node and item a unique id in an d-dimensional space• Goals – Scales to hundreds of thousands of nodes – Handles rapid arrival and failure of nodes• Properties – Routing table size O(d) – Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes Slide modified from another presentation
  19. 19. CAN Example: Two Dimensional Space• Space divided between nodes 7• All nodes cover the entire 6 space 5• Each node covers either a 4 square or a rectangular area of ratios 1:2 or 2:1 3 n1• Example: 2 – Node n1:(1, 2) first node that 1 joins  cover the entire space 0 0 1 2 3 4 5 6 7 Slide modified from another presentation
  20. 20. CAN Example: Two Dimensional Space• Node n2:(4, 2) joins  space 7 is divided between n1 and n2 6 5 4 3 n1 n2 2 1 0 0 1 2 3 4 5 6 7 Slide modified from another presentation
  21. 21. CAN Example: Two Dimensional Space• Node n2:(4, 2) joins  space 7 is divided between n1 and n2 6 n3 5 4 3 n1 n2 2 1 0 0 1 2 3 4 5 6 7 Slide modified from another presentation
  22. 22. CAN Example: Two Dimensional Space• Nodes n4:(5, 5) and n5:(6,6) 7 join 6 n5 n3 n4 5 4 3 n1 n2 2 1 0 0 1 2 3 4 5 6 7 Slide modified from another presentation
  23. 23. CAN Example: Two Dimensional Space• Nodes: n1:(1, 2); n2:(4,2); n3: 7 (3, 5); n4:(5,5);n5:(6,6) 6 n5• Items: f1:(2,3); f2:(5,1); f3: n3 n4 5 f4 (2,1); f4:(7,5); 4 f1 3 n1 n2 2 f3 1 0 f2 0 1 2 3 4 5 6 7 Slide modified from another presentation
  24. 24. CAN Example: Two Dimensional Space• Each item is stored by the 7 node who owns its mapping in the space 6 n3 n4 n5 5 f4 4 f1 3 n1 n2 2 f3 1 0 f2 0 1 2 3 4 5 6 7 Slide modified from another presentation
  25. 25. CAN: Query Example• Each node knows its neighbors in the d-space 7• Forward query to the 6 n5 n4 neighbor that is closest to the 5 n3 f4 query id 4• Example: assume n1 queries 3 f1 f4 n1 n2 2• Can route around some f3 1 failures f2 0 – some failures require local 0 1 2 3 4 5 6 7 flooding Slide modified from another presentation
  26. 26. CAN: Query Example• Each node knows its neighbors in the d-space 7• Forward query to the 6 n5 n4 neighbor that is closest to the 5 n3 f4 query id 4• Example: assume n1 queries 3 f1 f4 n1 n2 2• Can route around some f3 1 failures f2 0 – some failures require local 0 1 2 3 4 5 6 7 flooding Slide modified from another presentation
  27. 27. CAN: Query Example• Each node knows its neighbors in the d-space 7• Forward query to the 6 n5 n4 neighbor that is closest to the 5 n3 f4 query id 4• Example: assume n1 queries 3 f1 f4 n1 n2 2• Can route around some f3 1 failures f2 0 – some failures require local 0 1 2 3 4 5 6 7 flooding Slide modified from another presentation
  28. 28. CAN: Query Example• Each node knows its neighbors in the d-space 7• Forward query to the 6 n5 n4 neighbor that is closest to the 5 n3 f4 query id 4• Example: assume n1 queries 3 f1 f4 n1 n2 2• Can route around some f3 1 failures f2 0 – some failures require local 0 1 2 3 4 5 6 7 flooding Slide modified from another presentation
  29. 29. Node Failure Recovery• Simple failures – know your neighbor’s neighbors – when a node fails, one of its neighbors takes over its zone• More complex failure modes – simultaneous failure of multiple adjacent nodes – scoped flooding to discover neighbors – hopefully, a rare event Slide modified from another presentation
  30. 30. Document Routing – Chord N5 N10 N110 K19• MIT project N20• Uni-dimensional ID N99 space N32• Keep track of log N nodes N80• Search through log N nodes to find desired key N60
  31. 31. Doc Routing – Tapestry/Pastry 43F 993 13F E E E• Global mesh• Suffix-based routing 73F F99• Uses underlying network E 0 distance in constructing 04F E mesh 999 ABF 0 E 239 E 129 0
  32. 32. Comparing Guarantees Model Search StateChord Uni- log N log N dimensional Multi-CAN dN1/d 2d dimensionalTapestry Global Mesh logbN b logbNPastry Neighbor logbN b logbN + b map
  33. 33. Remaining Problems?• Hard to handle highly dynamic environments• Usable services• Methods don’t consider peer characteristics
  34. 34. Measurement Studies• “Free Riding on Gnutella”• Most studies focus on Gnutella• Want to determine how users behave• Recommendations for the best way to design systems
  35. 35. Free Riding Results• Who is sharing what?• August 2000 The top Share As percent of whole 333 hosts (1%) 1,142,645 37% 1,667 hosts (5%) 2,182,087 70% 3,334 hosts (10%) 2,692,082 87% 5,000 hosts (15%) 2,928,905 94% 6,667 hosts (20%) 3,037,232 98% 8,333 hosts (25%) 3,082,572 99%
  36. 36. Saroiu et al Study• How many peers are server-like…client- like? – Bandwidth, latency• Connectivity• Who is sharing what?
  37. 37. Saroiu et al Study• May 2001• Napster crawl – query index server and keep track of results – query about returned peers – don’t capture users sharing unpopular content• Gnutella crawl – send out ping messages with large TTL
  38. 38. Results Overview• Lots of heterogeneity between peers – Systems should consider peer capabilities• Peers lie – Systems must be able to verify reported peer capabilities or measure true capabilities
  39. 39. Measured Bandwidth
  40. 40. Reported Bandwidth
  41. 41. Measured Latency
  42. 42. Measured Uptime
  43. 43. Number of Shared Files
  44. 44. Connectivity
  45. 45. Points of Discussion• Is it all hype?• Should P2P be a research area?• Do P2P applications/systems have common research questions?• What are the “killer apps” for P2P systems?
  46. 46. Conclusion• P2P is an interesting and useful model• There are lots of technical challenges to be solved

×