Peer-to-peer

Published in: Technology
  • Upload your index to the central server when you come online. To search, consult the central server. Request the doc directly from the peer that has it.
  • Everyone knows about some small number of nodes. To find a file, ask everyone you know. When you find out who has the doc, ask directly.
  • More systematic approach: ids for docs and nodes. Store each doc at the node with the closest id. Keep track of a small number of nodes with ids close to yours. Route requests toward the document.
  • Most projects address the same goal with slightly different models. Some specifics? Main goals: minimize search time and routing state.
  • Peer-to-peer

    1. An Overview of Peer-to-Peer
       Sami Rollins, 11/14/02
    2. Outline
       • P2P Overview
         – What is a peer?
         – Example applications
         – Benefits of P2P
       • P2P Content Sharing
         – Challenges
         – Group management/data placement approaches
         – Measurement studies
    3. What is Peer-to-Peer (P2P)?
       • Napster?
       • Gnutella?
       • Most people think of P2P as music sharing
    4. What is a peer?
       • Contrasted with Client-Server model
       • Servers are centrally maintained and administered
       • Client has fewer resources than a server
    5. What is a peer?
       • A peer’s resources are similar to the resources of the other participants
       • P2P: peers communicating directly with other peers and sharing resources
    6. Levels of P2P-ness
       • P2P as a mindset
         – Slashdot
       • P2P as a model
         – Gnutella
       • P2P as an implementation choice
         – Application-layer multicast
       • P2P as an inherent property
         – Ad-hoc networks
    7. P2P Application Taxonomy
       • P2P Systems
         – Distributed Computing: SETI@home
         – File Sharing: Gnutella
         – Collaboration: Jabber
         – Platforms: JXTA
    8. P2P Goals/Benefits
       • Cost sharing
       • Resource aggregation
       • Improved scalability/reliability
       • Increased autonomy
       • Anonymity/privacy
       • Dynamism
       • Ad-hoc communication
    9. P2P File Sharing
       • Content exchange
         – Gnutella
       • File systems
         – Oceanstore
       • Filtering/mining
         – Opencola
    10. P2P File Sharing Benefits
       • Cost sharing
       • Resource aggregation
       • Improved scalability/reliability
       • Anonymity/privacy
       • Dynamism
    11. Research Areas
       • Peer discovery and group management
       • Data location and placement
       • Reliable and efficient file exchange
       • Security/privacy/anonymity/trust
    12. Current Research
       • Group management and data placement
         – Chord, CAN, Tapestry, Pastry
       • Anonymity
         – Publius
       • Performance studies
         – Gnutella measurement study
    13. Management/Placement Challenges
       • Per-node state
       • Bandwidth usage
       • Search time
       • Fault tolerance/resiliency
    14. Approaches
       • Centralized
       • Flooding
       • Document Routing
    15. Centralized
       • Napster model
       • Benefits:
         – Efficient search
         – Limited bandwidth usage
         – No per-node state
       • Drawbacks:
         – Central point of failure
         – Limited scale
       [Figure: peers Bob, Alice, Judy, Jane]
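The centralized model can be sketched in a few lines. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class and its method names are hypothetical, and the peer and file names are made up. Peers upload their file lists when they come online, searches hit only the index, and the transfer itself would then happen peer to peer.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index server: file name -> peers that hold it."""

    def __init__(self):
        self._index = defaultdict(set)

    def publish(self, peer, files):
        # A peer uploads its list of files when it comes online.
        for name in files:
            self._index[name].add(peer)

    def unpublish(self, peer):
        # A peer that goes offline is removed from every entry.
        for holders in self._index.values():
            holders.discard(peer)

    def search(self, name):
        # One lookup at the server; no other peer is contacted.
        return sorted(self._index.get(name, set()))


index = CentralIndex()
index.publish("alice", ["song.mp3", "notes.txt"])
index.publish("bob", ["song.mp3"])
print(index.search("song.mp3"))
```

Peers keep no routing state at all here, which is the benefit listed on the slide; the cost is that every search depends on the one index being reachable.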
    16. Flooding
       • Gnutella model
       • Benefits:
         – No central point of failure
         – Limited per-node state
       • Drawbacks:
         – Slow searches
         – Bandwidth intensive
       [Figure: peers Carl, Jane, Bob, Alice, Judy]
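A rough sketch of flooding, assuming a small made-up overlay (the `neighbors` and `files` dictionaries are illustrative, and the peer names are borrowed from the figure labels only). Each peer forwards the query to everyone it knows until a hop limit (TTL) runs out; the message counter shows why the slide calls this bandwidth intensive.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Return (holders, messages_sent) for a TTL-limited flood from `start`."""
    seen = {start}
    queue = deque([(start, ttl)])
    holders, messages = [], 0
    while queue:
        node, hops_left = queue.popleft()
        if wanted in files.get(node, ()):
            holders.append(node)
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward further
        for nxt in neighbors.get(node, ()):
            messages += 1  # every forwarded copy costs bandwidth
            if nxt not in seen:  # duplicates are dropped on arrival
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return holders, messages


neighbors = {
    "alice": ["bob", "carl"],
    "bob": ["alice", "jane"],
    "carl": ["alice", "jane"],
    "jane": ["bob", "carl", "judy"],
    "judy": ["jane"],
}
files = {"bob": {"doc.txt"}, "judy": {"doc.txt"}}

print(flood_search(neighbors, files, "alice", "doc.txt", ttl=1))
print(flood_search(neighbors, files, "alice", "doc.txt", ttl=3))
```

With a small TTL the search is cheap but misses the copy held by `judy`; raising the TTL finds it at the price of several times as many messages.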
    17. Document Routing
       • FreeNet, Chord, CAN, Tapestry, Pastry model
       • Benefits:
         – More efficient searching
         – Limited per-node state
       • Drawbacks:
         – Limited fault-tolerance vs redundancy
       [Figure: nodes labeled 001, 012, 212, 305, 332; a query "212?" is passed from node to node]
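The idea of routing a request toward the node whose id is closest to the document id can be sketched as a greedy walk. The node ids below reuse the labels from the slide's figure as plain integers; the `known` table (who knows whom) is an assumption made for illustration and does not correspond to any of the named systems.

```python
def route(known, start, doc_id):
    """Greedy routing: hop to the known node whose id is closest to doc_id."""
    path = [start]
    current = start
    while True:
        candidates = known[current] + [current]
        best = min(candidates, key=lambda n: (abs(n - doc_id), n))
        if best == current:
            return path  # no known node is closer: the doc lives here
        current = best
        path.append(current)


# Each node tracks only a small number of nodes with ids near its own.
known = {
    1: [12],
    12: [1, 212],
    212: [12, 305],
    305: [212, 332],
    332: [305],
}

print(route(known, 332, 212))
print(route(known, 1, 300))
```

Every hop strictly reduces the distance to the document id, so the walk always terminates, and each node needs only its short `known` list rather than a global index.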
    18. Document Routing – CAN
       • Associate to each node and item a unique id in a d-dimensional space
       • Goals
         – Scales to hundreds of thousands of nodes
         – Handles rapid arrival and failure of nodes
       • Properties
         – Routing table size O(d)
         – Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
       Slide modified from another presentation
    19. CAN Example: Two Dimensional Space
       • Space divided between nodes
       • All nodes cover the entire space
       • Each node covers either a square or a rectangular area of ratios 1:2 or 2:1
       • Example:
         – Node n1:(1, 2) first node that joins → cover the entire space
       [Figure: 2-D grid, both axes labeled 0 to 7; n1]
    20. CAN Example: Two Dimensional Space
       • Node n2:(4, 2) joins → space is divided between n1 and n2
       [Figure: same grid; n1, n2]
    21. CAN Example: Two Dimensional Space
       • Node n3:(3, 5) joins → space is divided between n1 and n3
       [Figure: same grid; n1, n2, n3]
    22. CAN Example: Two Dimensional Space
       • Nodes n4:(5, 5) and n5:(6, 6) join
       [Figure: same grid; n1 to n5]
    23. CAN Example: Two Dimensional Space
       • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
       • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
       [Figure: same grid; nodes n1 to n5 and items f1 to f4]
    24. CAN Example: Two Dimensional Space
       • Each item is stored by the node who owns its mapping in the space
       [Figure: same grid; nodes n1 to n5 and items f1 to f4]
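The join sequence and item placement from the two-dimensional example can be replayed with a small sketch. Several details are assumptions, not taken from the slides: the space is treated as the square [0, 8) x [0, 8), a zone is always halved across its longer side (x on ties), and the existing owner keeps the half that contains its own point. The `Zone` and `Can` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, p):
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def split(self):
        """Halve the zone across its longer side (x on ties)."""
        if self.x1 - self.x0 >= self.y1 - self.y0:
            mid = (self.x0 + self.x1) / 2
            return (Zone(self.x0, mid, self.y0, self.y1),
                    Zone(mid, self.x1, self.y0, self.y1))
        mid = (self.y0 + self.y1) / 2
        return (Zone(self.x0, self.x1, self.y0, mid),
                Zone(self.x0, self.x1, mid, self.y1))


class Can:
    def __init__(self, size=8):  # assumed side length of the space
        self.size = size
        self.points = {}  # node name -> point it joined at
        self.zones = {}   # node name -> zone it owns

    def owner(self, p):
        return next(n for n, z in self.zones.items() if z.contains(p))

    def join(self, name, p):
        if not self.zones:
            # The first node covers the entire space.
            self.zones[name] = Zone(0, self.size, 0, self.size)
        else:
            old = self.owner(p)
            first, second = self.zones[old].split()
            if first.contains(self.points[old]):
                keep, give = first, second
            else:
                keep, give = second, first
            self.zones[old], self.zones[name] = keep, give
        self.points[name] = p


can = Can()
for name, p in [("n1", (1, 2)), ("n2", (4, 2)), ("n3", (3, 5)),
                ("n4", (5, 5)), ("n5", (6, 6))]:
    can.join(name, p)

items = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}
placement = {f: can.owner(p) for f, p in items.items()}
print(placement)
```

Under these assumptions every zone stays a square or a 1:2 rectangle, as the slides describe, and each item ends up at the node whose zone contains the item's point.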
    25. CAN: Query Example
       • Each node knows its neighbors in the d-space
       • Forward query to the neighbor that is closest to the query id
       • Example: assume n1 queries f4
       • Can route around some failures
         – some failures require local flooding
       [Figure: 2-D grid, both axes labeled 0 to 7; nodes n1 to n5 and items f1 to f4]
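The query example (n1 looking for f4) can be traced with a greedy forwarding sketch. The zone boundaries in `ZONES` are assumptions chosen to be consistent with the node coordinates on the slides; the space is treated as a flat square without wraparound, and visited nodes are skipped so that ties on a zone boundary cannot cause a loop.

```python
import math

# Assumed zones as (x0, x1, y0, y1), half-open on the upper side.
ZONES = {
    "n1": (0, 4, 0, 4),
    "n2": (4, 8, 0, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 6, 4, 8),
    "n5": (6, 8, 4, 8),
}


def contains(zone, p):
    x0, x1, y0, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def overlaps(a0, a1, b0, b1):
    return a0 < b1 and b0 < a1


def are_neighbors(a, b):
    """Two zones are neighbors if they share an edge, not just a corner."""
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    side = (ax1 == bx0 or bx1 == ax0) and overlaps(ay0, ay1, by0, by1)
    stacked = (ay1 == by0 or by1 == ay0) and overlaps(ax0, ax1, bx0, bx1)
    return side or stacked


def distance(zone, p):
    """Euclidean distance from point p to the nearest point of the zone."""
    x0, x1, y0, y1 = zone
    dx = max(x0 - p[0], 0, p[0] - x1)
    dy = max(y0 - p[1], 0, p[1] - y1)
    return math.hypot(dx, dy)


def route(start, p):
    """Forward hop by hop to the neighbor whose zone is closest to p."""
    path = [start]
    while not contains(ZONES[path[-1]], p):
        here = ZONES[path[-1]]
        nbrs = [n for n, z in ZONES.items()
                if n not in path and are_neighbors(here, z)]
        path.append(min(nbrs, key=lambda n: (distance(ZONES[n], p), n)))
    return path


print(route("n1", (7, 5)))  # n1 queries f4 at (7, 5)
```

Each node only inspects its own neighbors, so per-node state stays proportional to the number of dimensions rather than the number of nodes.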
    29. Node Failure Recovery
       • Simple failures
         – know your neighbor’s neighbors
         – when a node fails, one of its neighbors takes over its zone
       • More complex failure modes
         – simultaneous failure of multiple adjacent nodes
         – scoped flooding to discover neighbors
         – hopefully, a rare event
    30. Document Routing – Chord
       • MIT project
       • Uni-dimensional ID space
       • Keep track of log N nodes
       • Search through log N nodes to find desired key
       [Figure: ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 with key K19]
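A simplified Chord-style lookup over the ring from the figure (nodes N5 through N110, key K19) might look like the following. The 7-bit identifier space is an assumption, the finger tables are computed from a global node list rather than maintained by a join protocol, and all function names are illustrative.

```python
from bisect import bisect_left

M = 7  # assumed number of identifier bits, so ids live in [0, 128)
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])


def successor(ident):
    """First node at or clockwise after ident."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def fingers(n):
    """Finger k points at the successor of n + 2**k: about log N entries."""
    return [successor(n + 2 ** k) for k in range(M)]


def lookup(start, key):
    """Return (node responsible for key, nodes visited on the way)."""
    node, path = start, [start]
    while True:
        succ = successor(node + 1)
        if key == succ or in_open(key, node, succ):
            return succ, path
        # Jump to the farthest finger that still precedes the key.
        node = next(f for f in reversed(fingers(node))
                    if in_open(f, node, key))
        path.append(node)


print(lookup(80, 19))
```

Starting at N80, the lookup for K19 reaches its successor N20 after visiting a few nodes, because each jump covers a large part of the remaining distance around the ring.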
    31. Doc Routing – Tapestry/Pastry
       • Global mesh
       • Suffix-based routing
       • Uses underlying network distance in constructing mesh
       [Figure: mesh of nodes with hex ids 43FE, 993E, 13FE, 73FE, F990, 04FE, 9990, ABFE, 239E, 1290]
    32. Comparing Guarantees

                   Model               Search       State
       Chord       Uni-dimensional     log N        log N
       CAN         Multi-dimensional   d*N^(1/d)    2d
       Tapestry    Global mesh         log_b N      b log_b N
       Pastry      Neighbor map        log_b N      b log_b N + b
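To get a feel for the search and state bounds in this comparison, the formulas can be evaluated for concrete sizes. This is plain arithmetic on the expressions from the slide; taking Chord's logarithm as base 2 is an assumption, and the function names and sample values of N, d and b are chosen only for illustration.

```python
import math


def chord(n):
    hops = math.log2(n)  # assumed base 2
    return hops, hops  # (search, state)


def can(n, d):
    return d * n ** (1 / d), 2 * d


def tapestry(n, b):
    hops = math.log(n, b)
    return hops, b * hops


def pastry(n, b):
    hops = math.log(n, b)
    return hops, b * hops + b


n = 65536
for name, (search, state) in [
    ("Chord", chord(n)),
    ("CAN, d=2", can(n, 2)),
    ("CAN, d=8", can(n, 8)),
    ("Tapestry, b=16", tapestry(n, 16)),
    ("Pastry, b=16", pastry(n, 16)),
]:
    print(f"{name:15s} search={search:8.1f} state={state:6.1f}")
```

The numbers make the trade-off visible: CAN keeps very little state but pays for it in hops unless d is raised, while the mesh-based schemes buy short searches with larger tables.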
    33. Remaining Problems?
       • Hard to handle highly dynamic environments
       • Usable services
       • Methods don’t consider peer characteristics
    34. Measurement Studies
       • “Free Riding on Gnutella”
       • Most studies focus on Gnutella
       • Want to determine how users behave
       • Recommendations for the best way to design systems
    35. Free Riding Results
       • Who is sharing what?
       • August 2000

       The top               Share        As percent of whole
       333 hosts (1%)        1,142,645    37%
       1,667 hosts (5%)      2,182,087    70%
       3,334 hosts (10%)     2,692,082    87%
       5,000 hosts (15%)     2,928,905    94%
       6,667 hosts (20%)     3,037,232    98%
       8,333 hosts (25%)     3,082,572    99%
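The cumulative figures in the free-riding results can be broken into bands to show how quickly sharing drops off. The sketch below only subtracts consecutive rows of the table; it assumes the "Share" column counts shared files, and the `bands` helper is an illustrative name.

```python
# (hosts in the top group, cumulative share) taken from the table.
ROWS = [
    (333, 1_142_645),
    (1_667, 2_182_087),
    (3_334, 2_692_082),
    (5_000, 2_928_905),
    (6_667, 3_037_232),
    (8_333, 3_082_572),
]


def bands(rows):
    """Return (hosts, share, share per host) for each step between rows."""
    prev_hosts = prev_share = 0
    out = []
    for hosts, share in rows:
        band_hosts = hosts - prev_hosts
        band_share = share - prev_share
        out.append((band_hosts, band_share, band_share / band_hosts))
        prev_hosts, prev_share = hosts, share
    return out


for band_hosts, band_share, per_host in bands(ROWS):
    print(f"{band_hosts:5d} hosts  {band_share:9d} shared  {per_host:8.1f} per host")
```

Each successive band of hosts contributes far less per host than the one before it, which is the concentration the table is meant to show.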
    36. Saroiu et al Study
       • How many peers are server-like…client-like?
         – Bandwidth, latency
       • Connectivity
       • Who is sharing what?
    37. Saroiu et al Study
       • May 2001
       • Napster crawl
         – query index server and keep track of results
         – query about returned peers
         – don’t capture users sharing unpopular content
       • Gnutella crawl
         – send out ping messages with large TTL
    38. Results Overview
       • Lots of heterogeneity between peers
         – Systems should consider peer capabilities
       • Peers lie
         – Systems must be able to verify reported peer capabilities or measure true capabilities
    39. Measured Bandwidth
    40. Reported Bandwidth
    41. Measured Latency
    42. Measured Uptime
    43. Number of Shared Files
    44. Connectivity
    45. Points of Discussion
       • Is it all hype?
       • Should P2P be a research area?
       • Do P2P applications/systems have common research questions?
       • What are the “killer apps” for P2P systems?
    46. Conclusion
       • P2P is an interesting and useful model
       • There are lots of technical challenges to be solved
