More Related Content


02 - Topologies of Distributed Systems

  1. Topologies of Distributed Systems CS4262 Distributed Systems Dilum Bandara
  2. Outline  Architectural styles  Layered architectures  Object-based architectures  Data-centered architectures  Event-based architectures  System architectures  Client-server  Peer-to-peer  Unstructured  Structured 2
  3. Architecture  Dictionary definitions  Manner of construction of something & disposition of its parts  Design, the way components fits together  Defines  What are the components of the system?  How are they connected to each other?  How do they communicate? 3
  4. Architectural Styles  Layered architectures  Object-based architectures  Data-centered architectures  Event-based architectures  Hybrid architectures combine multiples of these architecture styles  Some real-world systems are like this  e.g., P2P file transfer, networks of sensors 4
  5. Layered Architectures  Well defined layers  Control typically flows from layer-to-layer  Better results through cross-layer coordination  Requests go down while results go up  e.g., OSI model, some P2P systems 5 Application – Tier 2 File sharing, streaming, VoIP, P2P clouds Application – Tier 1 Indexing/DHT, Caching, replication, access control, reputation, trust Overlay Unstructured, structured, & hybrid Gnutella, Chord, Kademlia, CAN Underlay Internet, Ethernet, Wi-Fi, Bluetooth Request Response
  6. Object-Based Architectures  Looser organization of objects  Communication through Remote Procedure Calls (RPC)  e.g., Java RMI, Web services, REST 6 Source:
  7. Data-Centered Architectures  Components communicate through a common repository  Can be passive or active  e.g., distributed file systems, producer- consumer, web-based data services 7 Source: 10/02/11/distributed-computing-architectures/
  8. Event-Based Architectures  Propagation of events  Occasionally carry data  Components are loosely coupled  e.g., publisher/subscriber, ESB, 8 Source: 10/02/11/distributed-computing-architectures/
  9. Enterprise Service Bus (ESB) 9 Source:
  10. System-Level Architectures  Client-server  Peer-to-peer  Hybrid architectures  Some real-world systems are like this  e.g., P2P file transfer, Google File System, Amazon Dynamo 10
  11. Client-Server  Clients request services from a server  Request-reply communication  Multiple servers for resilience & load balancing  Pros  Easier to build & maintain  Cons  Less scalable  Single point of failure  e.g., web, NFS, MapReduce 11 Source: /mapreduce_vs_data_warehouse
  12. Multi-Tiered Architecture 12 Source:  Increased reliability  Increased scalability
  13. Modern Web Applications 13 Source:
  14. Peer-to-Peer  Distributed systems without any central control  Autonomous peers  Equivalent in functionality/privileges  Both a client & a server  Protocol features  Network overlaid on top of Internet  Protocol constructed at application layer  Supports some type of message routing capability  Typically peers have unique IDs  Fairness & performance  Self-scaling  Peer churn 14 Internet
  15. P2P Characteristics  Tremendous scalability  Millions of peers  Globally distributed  Many concurrent connections  Bandwidth intensive  Aggressive/unfair bandwidth utilization  Heterogeneous  Superpeers  Critical for performance/functionality 15 Internet
  16. P2P Overlay  Peers directly talk to each other  If they aren’t directly connected, uses overlay routing via other peers  Peers are autonomous  Determines its own capabilities based on its resources  Decides on its own when to join, leave  Overlay is scalable & resilient 16 Internet
  17. Terminology  Application  Tier 2 – Services provided to end users  Tier 1 – Middleware services  Overlay  How peers are connected  Application layer network  e.g., dial-up on top of telephone network, BGP, PlanetLab, CDNs  Underlay  Internet, Bluetooth  Peers implement top 3 layers  This layering is an over simplification 17 Application – Tier 2 File sharing, streaming, VoIP, P2P clouds Application – Tier 1 Indexing/DHT, Caching, replication, access control, reputation, trust Overlay Unstructured, structured, & hybrid Gnutella, Chord, Kademlia, CAN Underlay Internet, Ethernet, Wi-Fi, Bluetooth
  18. Overlay Connectivity 18 P2P Overlay Unstructured Deterministic Napster BitTorrent JXTA Nondeterministic Gnutella KaZaA Structured Sub-linear state Chord Kademlia CAN Pastry Tapestry Dynamo Constant state Viceroy Cycloid Hybrid Structella Kelip Local minima search
  19. Bootstrapping  How is an initial overlay is formed from a set of nodes?  Use some known information  Use a well-known server to register initial set of peers  Well-known domain name  Dynamic DNS  Some peer addresses are well known  Use a local broadcast to collect nearby peers, & merge such sets to form larger sets 19
  20. How to Bootstrap  Each peer maintains a random subset of peers  Peers in Skype maintain a cache of superpeers  In BitTorrent peers talk to trackers  An incoming peer talks to 1+ known peers  A known peer accepting an incoming peer  Keeps track of incoming peer  May redirect incoming peer to another peer  Give a random set of peers to contact  Discover more peers by random walk, gossiping, or deterministic walk within overlay 20
  21. Options for Indexing Resources 21 Centralized O(1) Fast lookup Single point of failure Unstructured O(hopsmax) Easy network maintenance Not guaranteed to find resources Distribute Hash Table (DHT) O(log N) Guaranteed performance Not for dynamic systems Superpeer O(hopsmax) Better scalability Not guaranteed to find resources
  22. Centralized  Centralized database for lookup  Guaranteed discovery  Low overhead  Single point of failure  Easy to track  Legal issues  e.g., Napster  File transfer directly between peers 22
  23. Unstructured  Fully distributed  Random connections  Initial entry point is known  Peers maintain dynamic list of neighbors  Connections to multiple peers  Highly resilient to node failures  e.g., Gnutella 23
  24. Unstructured P2P (Cont.)  Flooding-based search  Guaranteed discovery  Implosion  High overhead  Expanding-ring flooding  TTL-based random walk  Discovery isn’t guaranteed  Better performance by biasing random walk toward nodes with higher degree  If response follow same path  Anonymity  e.g., KaZaA, BearShare, LimeWire, McAfee 24 D S D s Flooding Random walk
  25. Superpeers  Resource rich peers  Superpeers  Bandwidth, reliability, trust, memory, CPU, etc.  Flooding or random walk  Only superpeers are involved  Lower overhead  More scalable  Discovery isn’t guaranteed  Better performance when superpeers share list of resources/services  e.g., Gnutella v0.6, FastTrack, Freenet KaZaA, Skype 25 s D
  26. Example – BitTorrent  Most popular P2P file sharing system to date  Features  Centralized search  Multiple downloads  Enforce fairness  Rarest-first dissemination  Incentives  Better contribution  Better download speeds (not always)  Enable content delivery networks  Revenue through ads on search engines 26 User Trackers Web-based search engine Content owner Keyword search .torrent file server Download .torrent file Get list of peers Download/ upload chunks
  27. BitTorrent Protocol  Content owner creates a .torrent file  File name, length, hash, list of trackers  Place .torrent file on a server  Publish URL of .torrent file to a web site  Torrent search engine  .torrent file points to a tracker(s)  Registry of leaches & seeds for a given file 27 User Trackers Web-based search engine Content owner Keyword search .torrent file server Download .torrent file Get list of peers Download/ upload chunks 1 2 3 4 1 2 3 4
  28. BitTorrent Protocol (cont.)  Tracker  Provide a random subset of peers sharing same file  Peer contacts subset of peers parallely  Files are shared based on chunk IDs  Chunk – segment of file  Periodically ask tracker for a new set of IPs  E.g., every 15 min  Pick peers with highest upload rate 28 User Trackers Web-based search engine Content owner Keyword search .torrent file server Download .torrent file Get list of peers Download/ upload chunks 1 2 3 4 1 2 3 4
  29. Summary – Unstructured P2P  Separate resource/service discovery & delivery  Resource/service discovery is mostly outside of P2P overlay  Centralized solutions  Not scalable  Affect resource/service delivery when failed  Distributed solutions  High overhead  May not locate the resource/service  No predictable performance  Delay or message bounds  Lack of QoS or QoE 29
  30. Terminology  Hash function  Converts a large amount of data into a small datum  Hash table  Data structure that uses hashing to index content  Distributed Hash Table (DHT)  A hash table that is distributed  Types of hashing  Consistent or random  Locality preserving 30 f() f() f() g() g() g()
  31. Structured P2P  Deterministic approach to locate resources, services, & peers  Resources/services expressed as a (key, value) pair  Unique key  Hash of file name, metadata, or actual content  128-bit or higher  Peers also have a key  Random bit string or IP address  Index keys on a Distributed Hash Table (DHT)  Distributed address space [0, 2m – 1]  Locate peer(s) responsible for a given key  Deterministic overlay to publish & locate content  Bounded performance under standard conditions, typically O(log n) 31
  32. Structured P2P – Example  2 operations  store(key, value)  locate(key) 32 Ring – 16 addresses Song.mp3 Cars.mpeg f() f() Find Cars.mpeg n + 2i – 1, 1  i  m Successor 11 Song.mp3 6 Cars.mpeg O(log N) hops
  33. Chord  Key space arranged as a ring  Peers responsible for segment of the ring  Called successor of a key  1st peer in clockwise direction  Routing table  Keep a pointer (finger) to m peers  Keep a finger to (2i – 1)-th peer, 1 ≤ i ≤ m  Key resolution  Go to peer with the closest key  Recursively continue until key is find  Can be located within O(log n) hops 33 m =3-bit key ring Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
  34. Chord (Cont.)  New peer entering overlay  Takes keys from the successor  Peer leaving overlay  Give keys to the successor  Fingers are updated as peers join & leave  Peer failure or churn makes finger table entries stale 34 New peer with key 6 joins the overlay Peer with key 1 leave the overlay Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
  35. Chord Performance  Path length  Worst case O(log N)  Average ½log2N  Updates O(log2 N)  Fingers O(log N)  Alternative paths (log N)!  Balanced distribution of keys  Under uniform distribution  N(log N) virtual nodes provides best load distribution 35 Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
  36. Structured P2P – Other Solutions  Kademlia  Used in BitTorrent, eMule, aMule, & AZUREUS  Distance between 2 keys is determined by XOR  Routing in the ring is bidirectional  dist(a  b) = dist(b  a)  Enable nodes to learn about new nodes from received messages  Content-Addressable Network (CAN)  Based on a d-Torus  Pastry  Based on a Hypercube  Cycloid  Based on a cube connected cycle 36
  37. Other Well-Known Solutions 37
  38. Summary – Structured P2P  Resource/service discovery is within P2P overlay  Deterministic performance  Chord  Unidirectional routing  Recursive routing  Peer churn & failure is an issue  Issues  MySong.mp3 is not same as mysong.mp3  High churn  Unbalanced distribution of keys & load 38
  39. Structured vs. Unstructured 39 Unstructured P2P Structured P2P Overlay construction High flexibility Low flexibility Resources Indexed locally Indexed remotely on a distributed hash table Query messages Broadcast or random walk Unicast Content location Best effort Guaranteed Performance Unpredictable Predictable bounds Overhead High Relatively low Object types Mutable, with many complex attributes Immutable, with few simple attributes Peer churn & failure Supports high failure rates Supports moderate failure rates Applicable environments Small-scale or highly dynamic, e.g., mobile P2P Large-scale & relatively stable, e.g., desktop file sharing Examples Gnutella, LimeWire, KaZaA, BitTorrent Chord, CAN, Pastry, eMule, BitTorrent
  40. Example – Amazon Dynamo  Highly-available key-value system  Many large datasets/objects that only require primary key access  Shopping carts, best seller lists, customer preferences, product catalogs, etc.  Relational databases aren’t required, too slow, or bulky  Fast reads, high availability for writes  Always failing servers, disks, switches 40
  41. Amazon Dynamo (Cont.)  Objects are replicated in successors  All peers know about each other using gossiping  Can read/write to any replica  Mechanisms to deal with different versions of objects 41
  42. Amazon Dynamo (Cont.) 42 G. DeCandia et al., "Dynamo: amazon's highly available key-value store," In ACM SIGOPS operating systems review, Vol. 41, No. 6, Oct. 2007.

Editor's Notes

  1. Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
  2. P2P is a good example to teach distributed systems as they use many ideas of Distributed Systems
  3. Example overlays – dialup, BGP, PlanetLab, CDNs
  4. Application layer is divided to 2 sub-layers (tier 1 may also be referred to as middleware layer) It’s incorrect to assume DHT as the overlay – it’s only an index implemented on top of the overlay. E.g., Chord finger table form overlay & what’s indexed at a node form DHT
  5. Hybrid – structured topology but unstructured communication. E.g., for chord overlay & then use it to broadcast to all the node efficiently. Lee’s best peer selection. Radars are connected as structured P2P because data fusion group is known in advance. Where as best peers for current data fusion is found using broadcast Local minima search – each node has an ID. Resources are indexed in the local node with the closest ID (local minima). When routing first do a random walk then do a deterministic walk looking for the local minima We’ll not talk about hybrid designs in detail
  6. How is the initial P2P network formed from nodes ?
  7. Before specifics – this slide gives an overview Colored circles can be interpreted as different files Bound give the lookup/query cost
  8. (Lookup – process of finding where the content is) Napster is not the first P2P system, but demonstrated the potential (1999) Before that, many organizations (including Intel) used some kind of P2P application(s) to aggregate idle computing power in there machines Inspired many modern P2P systems, very popular, later a test case for many legal issues Uses central server for storing and searching the directory of files (hence not a full P2P system as many subsequent systems were) Step 1 - Peers report their list of files to centralized database Step 2 - users query central database Sep 3 – file is directly download from a peer that have it (no multiple/parallel downloads)
  9. First full P2P filesharing system. Earliest versions (through V0.4) used unstructured overlay with flooding for queries Due to need for scalability (V0.6 and higher) adopted a superpeer architecture. High-capacity peers are super peers, and all queries are routed using a flooding mechanism among superpeers.
  10. FLOODING & Random Walk: Flooding: Implosion – same node getting multiple messages for same query Both: scalability, RW – may not find obj Enhancements: TTL – time to live Expanding ring flooding – first flood to k-hops, if no result flood k+1 hops, if no response then try k+ 2 hops, similarly continue Random walk: – Query failure determined by a timeout or explicit failure message from last node. Several random walk queries may be issued in parallel as well. Additional techniques in UP2P: Overlay topology a) how do decide on peers (number, who to connect to or retain, etc.); base decision on capacity of peers, type of content, connectivity or peer, etc.; b) clustering – ex. Clusters formed when probl of two nodes being connected is higher if they have a common neighbor; - Prefer connectivity to high-degree nodes, with shared contents, peers with objects closer in key space, etc. Object placement – selection of nodes where an object is placed. E.g., base it on popularity, routing mechanism, etc. - distribute replica’s of popular objects (explicit push, caching) Caching - can inform random walk on what objects are nearby (cache summary information about contents of neighbors etc.) Query forwarding criteria Misc: McAfee use P2P to update virus definition within a local network
  11. Superpeers are selected based on Bandwidth, reliability, trust, memory, CPU Gnutella V0.6 and higher, FastTrack a proprietary system: FastTrack: Another P2P system around same time as Gnutella. Used by no of clients such as KaZaA, Grokster and Imesh. Proprietary system using an encrypted protocol high-capacity nodes are supernodes (SN), and low-capacity nodes are ordinary nodes (ON); Each SN maintains connections to 40-50 other SNs (in a network of ~3M nodes and ~30K SNs - practical numbers) and 50-80 ONs . Each ON connects to one SN; SN provides ON with a list of other SNs which ON caches; after ON issues a query and a SN responds, ON disconnects from current SN and attaches to a new SN from list. ON now receieves a new SN list which it merges with its list. Average SN-SN connection ~35 mins, SN-ON connection ~10 mins (~30% <30 secs) These changes (also in connectivity) help balance load in the network, improving locality, and connection shuffling that increases long range coverage. It also makes tracking peer transfers difficult. Freenet: (Open Source) Proposed in 1999 – P2P file-sharing system - contains security, anonymity and deniability features Objects and peers have identifiers – aka routing keys (created using a hash function). Each peer – a fixed sized routing table (containing keys of peers); mesh; Requests forwarded to peers with closest matching routing key. If request fails, it tries again with peer with next closest routing key. (Algorithm- steepest accent hill climbing with back tracking until TTL expires) -Also caches objects along the return path to reduce failure. FastFreenet: improves the hit-rate by: Peers share a fuzzy description of files it has with neighbors, which allows nodes to forward query to peers likely to have the object. Fuzzy description- an N bit number where each bit corresponds to 1/N segment of the key space.
  12. Users – Better contribution  better download speeds (not always) Content providers – Enable content delivery networks 3rd parties - Revenue through ads on search engines Guaranteed to find content because of centralized search
  13. Trackers can be contacted using TCP, UDP, or HTTP
  14. Unstructured P2P – easy to implement, inefficient routing, inability to locate rare objects. Gradual changes – e.g. clustering, near/rar links, semantic links etc. to improve efficiency. SP2P takes this one step further. QoE – quality of experience
  15. Structured overlay - designs overlays with routing mechanisms that are deterministic, and allowing for location of any objects (in bounded time). P2P supports key based routing - object identifiers are mapped to peer identifier address space, and object requested routed to the nearest peer in P2P address space Goal – Distributed object location and routing (DOLR). A specific scheme is DHT (distributed hash table)
  16. M – key length in bits Original paper use iterative routing (s go to x, x inform y to s, s go to y, y inform z, s go to z …) to implement recursive routing
  17. Fig 2 – 10,000 nodes 1,000,000 keys Virtual node – 1 physical node acting as multiple nodes distributed across the ring Ideally 1 physical node should represent log N virtual nodes
  18. Conceptually chord is recursive – but actual implementation in paper uses iterative routing (no difference in performance just increase hop count)