    1. Structuring Unstructured Peer-to-Peer Networks
       Stefan Schmid, Roger Wattenhofer
       Distributed Computing Group
       HiPC 2007, Goa, India
    2. Networks…
       • Examples: Internet graph, Web graph, neuron networks, social graphs, public transportation networks
       • Different properties:
         - Natural vs. man-made
         - Robustness
         - Diameter
         - Routability
         - ...
    3. An Interesting Network: Peer-to-Peer Network
       • Popular examples:
         - File sharing: BitTorrent, eMule, Kazaa, ...
         - Streaming: Zattoo, Joost, ...
         - Internet telephony: Skype, ...
         - etc.
       • Important: p2p accounts for much of today's Internet traffic! (source: cachelogic.com)
       • Network of peers, e.g., to share files
       • Desirable properties:
         - Scalability
         - Low degree, low network diameter
         - Fast routing
         - etc.
    4. Some of Our Own Applications
       • Wuala online storage system
         - Student project, start-up, http://wua.la
       • Pulsar streaming
         - tilllate.com, DJ events, ...; pstreams.com
         - Cheap infrastructure at the content provider
       • BitThief BitTorrent downloads
       • Distributed computations
         - BOINC client for the ECC discrete logarithm challenge
       Successful paradigm & technology, but still important research challenges!
    5. Structured vs. Unstructured Topologies
       • Old "p2p" systems such as Napster were based on a central server
         - Server stores the index: searching for content is simple
         - Problem: single point of failure
         - Legacy issues...
       • Unstructured systems, e.g., Gnutella, allow arbitrary topologies and arbitrary data placement
         - Peers just connect to an arbitrary set of other peers
         - No single point of failure
         - But often inefficient: routing is based on flooding or random walks
       • Structured systems, e.g., eMule's Kad network, give guarantees
         - Proactive maintenance of the topology
         - Provable network diameter and peer degree
         - Routing is possible: a lookup takes, e.g., log(n) hops (maybe also with low stretch)
    6. What is "better"?
       • Unstructured systems have less maintenance overhead
         - Peers can join and leave wherever they want
       • Unstructured systems allow for a richer set of queries
         - e.g., range queries, Boolean queries
       • Most importantly: despite the interesting properties of structured networks (and the large body of research on them), today's predominant networks are still unstructured (e.g., Gnutella, BitTorrent, etc.)
       Really? Really? Flooding always possible!
       • But unstructured systems often have scalability problems
         - When Napster was unplugged, Gnutella went down.
       Discussion needs to be continued...!
    7. Routing in Arbitrary Topologies?
       • How to find a file in an arbitrary network?
       • Option 1: Flooding (up to a certain hop radius r)
         - Robust, but does not scale.
         - Does not find the "needles", but does a good job finding popular files.
       • Option 2: Random walks
         - Fewer messages, but no lookup performance guarantee.
         - Potentially large delay (solution: many parallel "walkers")
         - Walkers can be lost...
         - Analysis is difficult.
         - Again: good at finding popular content, bad at finding needles.
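
To make Option 2 concrete, here is a minimal sketch of a random-walk search with several parallel walkers. The function name `random_walk_search`, the adjacency-dictionary graph representation, and the `walkers` and `ttl` parameters are assumptions made for illustration; they do not come from the talk.

```python
import random

def random_walk_search(adj, source, holders, walkers=4, ttl=16, rng=None):
    """Search for a file with `walkers` independent random walks (a sketch).

    adj:     dict mapping each peer to a list of its neighbors
    holders: set of peers that store the requested file
    Returns (hops, messages): the smallest hop count at which any walker
    reached a holder (None if all walkers expired) and the messages sent.
    """
    rng = rng or random.Random()
    if source in holders:
        return 0, 0
    best, messages = None, 0
    for _ in range(walkers):
        node = source
        for hop in range(1, ttl + 1):
            if not adj[node]:
                break  # isolated peer: this walker is lost
            node = rng.choice(adj[node])  # forward to a random neighbor
            messages += 1
            if node in holders:
                best = hop if best is None else min(best, hop)
                break
    return best, messages
```

The message count is bounded by `walkers * ttl`, which is why random walks send fewer messages than a flood; the price is that nothing guarantees a rarely replicated file is found before the walkers expire.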
    8. Flooding
       • This talk considers search operations by flooding.
       • Efficiency of flooding?
         - Very efficient on trees!
         - Many redundant transmissions...
       Flooding efficiency depends on the network topology!
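
The dependence on topology can be illustrated with a small simulation. This is a sketch under simple assumptions (each peer forwards a query once, to every neighbor except the one it first heard it from); the function `flood` and the two example graphs are hypothetical and not taken from the talk.

```python
from collections import deque

def flood(adj, source, r):
    """Flood from `source` up to hop radius r; return (reached, messages).

    adj maps each peer to a list of neighbors (undirected graph).
    A peer forwards the query once, to all neighbors except the peer it
    first received the query from. Every transmission counts as a message.
    """
    dist = {source: 0}
    queue = deque([(source, None)])
    messages = 0
    while queue:
        node, parent = queue.popleft()
        if dist[node] == r:
            continue  # hop budget used up, do not forward
        for nb in adj[node]:
            if nb == parent:
                continue
            messages += 1
            if nb not in dist:  # first copy of the query at nb
                dist[nb] = dist[node] + 1
                queue.append((nb, node))
    return set(dist), messages

# Complete binary tree with 7 peers, and a 7-peer clique for comparison.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
clique = {v: [u for u in range(7) if u != v] for v in range(7)}

print(flood(tree, 0, 2))    # all 7 peers reached with 6 messages
print(flood(clique, 0, 2))  # all 7 peers reached with 36 messages
```

On the tree every message reaches a new peer, while on the clique 30 of the 36 messages are duplicates.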
    9. Clustella
       • We propose Clustella
         - A new P2P client for unstructured peer-to-peer systems
         - Based on flooding, but with "smart neighbor selection"
         - Allows for more efficient flooding!
    10. Vision
       • Clustella vision:
         [Figure: unstructured p2p network with normal clients and Clustella clients]
       • By connecting to peers in far-away parts of the network, small cycles in the topology are avoided, and flooding becomes more efficient.
       • Not only Clustella clients benefit, but also all other clients in the network.
    11. Flood Coverage
       • Main open question: how to connect to remote peers?
       • Given a set of potential neighbors, it would be useful to know the hop distance to each of them!
       • Then we could connect to the one furthest away...
       • Goal: maximize flood coverage, i.e., maximize the minimum number of nodes reached by an r-hop flooding, locally and despite dynamics
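
The flood-coverage objective can be written down directly as the minimum, over all peers, of the number of peers within r hops. The sketch below assumes a global view of the graph (which a real peer does not have) and counts the flooding peer itself as reached; both choices, and the names `ball_size` and `flood_coverage`, are assumptions made for illustration.

```python
from collections import deque

def ball_size(adj, source, r):
    """Number of peers within r hops of `source` (including `source`)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return len(dist)

def flood_coverage(adj, r):
    """Minimum number of peers reached by an r-hop flood from any peer."""
    return min(ball_size(adj, v, r) for v in adj)

# Two topologies with 6 peers of degree 2 each.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(flood_coverage(ring, 2))       # 5
print(flood_coverage(triangles, 2))  # 3
```

With the same degree budget, the topology made of small cycles covers fewer peers per flood than the ring.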
    12. Hop-Estimation With Clustering
       • Main idea: use clustering!
         - Divide the network into different clusters.
         - Peers in different clusters belong to different network regions and can safely be connected without creating small cycles.
       • How to achieve such a clustering? Introduction of beacons!
         - Two parameters: radius R_d and radius R_b (R_d < R_b)
         - If a peer has no beacon in its R_d neighborhood, it becomes a beacon itself.
         - A peer knows all beacons in its R_b neighborhood.
         - R_b roughly equals the flooding radius R
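
The two beacon rules can be tried out in a small centralized simulation. This is only a sketch: the real protocol is distributed, whereas here peers are processed one after another in sorted order (an assumption), and the helper names `hop_distances`, `elect_beacons`, and `known_beacons` are hypothetical.

```python
from collections import deque

def hop_distances(adj, source, limit):
    """Hop distance from `source` to every peer within `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def elect_beacons(adj, r_d):
    """A peer with no beacon in its R_d neighborhood becomes a beacon."""
    beacons = set()
    for peer in sorted(adj):  # processing order is an assumption
        if beacons.isdisjoint(hop_distances(adj, peer, r_d)):
            beacons.add(peer)
    return beacons

def known_beacons(adj, beacons, r_b):
    """Each peer knows all beacons in its R_b neighborhood."""
    return {
        peer: frozenset(b for b in hop_distances(adj, peer, r_b) if b in beacons)
        for peer in adj
    }

# Path of 7 peers: 0 - 1 - 2 - 3 - 4 - 5 - 6
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
beacons = elect_beacons(path, r_d=1)
known = known_beacons(path, beacons, r_b=2)
print(sorted(beacons))                     # [0, 2, 4, 6]
print(sorted(known[0]), sorted(known[6]))  # [0, 2] [4, 6]
```

Peers 0 and 6 sit at opposite ends of the path and know disjoint beacon sets, which is the signal used to recognize that two peers lie in different network regions.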
    13. Clustella Mechanism (1)
       • Figure legend: one beacon in radius R_d; beacon known in radius R_b; flooding radius R
       • Beacons append their ID to all packets (piggy-back)
       • If a packet expires too early, other peers (here: π'') forward the beacon information
       • The entire R_b neighborhood will know beacon π'
       • Peers try to connect to peers with which they have no beacons in common!
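
The selection rule in the last bullet can be sketched as a comparison of beacon sets. The functions `rank_candidates` and `pick_remote_peer`, the tie-breaking by peer ID, and the example beacon IDs are assumptions made for illustration; the talk does not specify how candidates are ranked.

```python
def rank_candidates(own_beacons, candidates):
    """Sort candidate peers by the number of beacons shared with us.

    own_beacons: set of beacon IDs known to the local peer
    candidates:  dict mapping a candidate peer ID to its set of beacon IDs
    Fewest shared beacons first; ties are broken by peer ID (an assumption).
    """
    return sorted(
        candidates,
        key=lambda peer: (len(own_beacons & candidates[peer]), peer),
    )

def pick_remote_peer(own_beacons, candidates):
    """Return a candidate with no beacon in common, or None if none exists."""
    ranked = rank_candidates(own_beacons, candidates)
    if ranked and not (own_beacons & candidates[ranked[0]]):
        return ranked[0]
    return None

own = {"b1", "b2"}
candidates = {"p": {"b1"}, "q": {"b3"}, "r": {"b1", "b2"}}
print(rank_candidates(own, candidates))   # ['q', 'p', 'r']
print(pick_remote_peer(own, candidates))  # q
```

Returning `None` when every candidate shares a beacon keeps the strict reading of the rule; a client could instead fall back to the first entry of the ranking.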
    14. Clustella Mechanism (2)
       • Edges are undirected
       • All peers have degree d or d+1
       • A connection is accepted if the peer's own degree is d or smaller; otherwise, a neighbor may have an open slot, or an existing connection is torn down
       • The invariant is quickly reestablished!
       • Neighbors of an existing neighbor are also good candidates, as they are located in the same network region.
    15. Two Challenges
       • Evaluation of current neighbors
         - Existing neighbors are always in the same network region
         - Evaluating their quality and comparing them to alternative neighbors is difficult
         - Solution: include routes in packets! Exclude beacons that are known only via a neighbor
       • Dynamics
         - Clustella must be robust to churn, i.e., frequent joins and leaves
         - E.g., node crash: a Clustella peer p stores some neighbors of each of its neighbors q; these peers are good candidates, as they are in the same network region as q
    16. Evaluation
       • Simulation of three different neighbor selection strategies
         - Gnutella-like (unfair?): peers join at some well-known entry point and ask for their neighbors' neighbors until they reach full degree
         - Random walk (more interesting?): peers find new peers by a random walk of length L
         - Clustella: peers find new neighbors by exploring the network using a walk of length L and by taking beacon information into account
       • Results
         - Gnutella-like topologies result in very inefficient flooding operations
         - Clustella yields higher flood coverage than random walk
    17. Future Work
       • Hierarchical clustering (beacons with different radii)
         - Already a small hierarchy can yield better flood coverage
         - However, maintenance of the hierarchy can be expensive under churn!
         - Moreover, fairness must be guaranteed: high-level beacon peers should not have more work to do!
       • Smaller messages
         - Reducing the message sizes for large radii is important!
         - Idea: use Bloom filters instead of sending beacon IDs directly
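
The Bloom filter idea can be sketched as follows: instead of a list of beacon IDs, a packet would carry a fixed-size bit array. Everything here is an assumption made for illustration (class name, filter size, number of hash functions, SHA-256 as the hash, and the disjointness check); the talk only names the idea.

```python
import hashlib

class BloomFilter:
    """Fixed-size set summary with false positives but no false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def definitely_disjoint(self, other):
        """True only if the two filters cannot share any inserted item."""
        return (self.bits & other.bits) == 0

mine, theirs = BloomFilter(), BloomFilter()
mine.add("beacon-17")
theirs.add("beacon-17")
print("beacon-17" in mine)               # True
print(mine.definitely_disjoint(theirs))  # False
```

A shared beacon sets the same bits in both filters, so a zero intersection proves that two peers have no beacon in common. The converse does not hold: overlapping bits may be a false positive, so some suitable remote peers could be rejected unnecessarily.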
    18. Conclusion
       • We believe that structuring topologies can be beneficial to peer-to-peer systems!
       • Clustering with beacons is simple and probably also useful in other applications, e.g., in a music graph
       • An implementation must ensure fairness and use small message sizes.
       • A good choice of parameters is important for both efficiency and stability.
       • Incorporation into Gnutella??
    19. Thank you for your interest.