02 - Topologies of Distributed Systems

Architectural styles of distributed systems and application-level peer-to-peer (P2P) topologies, both unstructured and structured.

Published in: Engineering

  1. 1. Topologies of Distributed Systems CS4262 Distributed Systems Dilum Bandara Dilum.Bandara@uom.lk
  2. 2. Outline  Architectural styles  Layered architectures  Object-based architectures  Data-centered architectures  Event-based architectures  System architectures  Client-server  Peer-to-peer  Unstructured  Structured 2
  3. 3. Architecture  Dictionary definitions  Manner of construction of something & disposition of its parts  Design, the way components fit together  Defines  What are the components of the system?  How are they connected to each other?  How do they communicate? 3
  4. 4. Architectural Styles  Layered architectures  Object-based architectures  Data-centered architectures  Event-based architectures  Hybrid architectures combine several of these architectural styles  Some real-world systems are like this  e.g., P2P file transfer, networks of sensors 4
  5. 5. Layered Architectures  Well-defined layers  Control typically flows from layer-to-layer  Better results through cross-layer coordination  Requests go down while results go up  e.g., OSI model, some P2P systems 5 [Figure: layer stack, top to bottom, with requests flowing down and responses flowing up: Application – Tier 2 (file sharing, streaming, VoIP, P2P clouds); Application – Tier 1 (indexing/DHT, caching, replication, access control, reputation, trust); Overlay (unstructured, structured, & hybrid, e.g., Gnutella, Chord, Kademlia, CAN); Underlay (Internet, Ethernet, Wi-Fi, Bluetooth)]
  6. 6. Object-Based Architectures  Looser organization of objects  Communication through Remote Procedure Calls (RPC)  e.g., Java RMI, Web services, REST 6 Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
  7. 7. Data-Centered Architectures  Components communicate through a common repository  Can be passive or active  e.g., distributed file systems, producer-consumer, web-based data services 7 Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
  8. 8. Event-Based Architectures  Propagation of events  Occasionally carry data  Components are loosely coupled  e.g., publisher/subscriber, ESB, akka.io 8 Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
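The publisher/subscriber style named on this slide can be sketched in a few lines. This is a minimal in-process illustration, not a distributed broker: the `EventBus` class, the topic names, and the handlers are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class EventBus:
    """In-process publish/subscribe broker (hypothetical, for illustration only)."""

    def __init__(self):
        # topic name -> list of handler callables
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload=None):
        """Deliver the event to every subscriber of `topic`; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received = []
bus.subscribe("peer.joined", received.append)
bus.subscribe("peer.joined", lambda p: received.append(("audit", p)))

delivered = bus.publish("peer.joined", {"id": 42})
ignored = bus.publish("peer.left")  # no subscribers: the publisher is unaffected
```

The publisher never refers to a subscriber directly, which is the loose coupling the slide describes; the event may carry data (`payload`) or none at all.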
  9. 9. Enterprise Service Bus (ESB) 9 Source: www.fiorano.com/products/ESB-enterprise-service-bus/Fiorano-ESB-enterprise-service-bus.php
  10. 10. System-Level Architectures  Client-server  Peer-to-peer  Hybrid architectures  Some real-world systems are like this  e.g., P2P file transfer, Google File System, Amazon Dynamo 10
  11. 11. Client-Server  Clients request services from a server  Request-reply communication  Multiple servers for resilience & load balancing  Pros  Easier to build & maintain  Cons  Less scalable  Single point of failure  e.g., web, NFS, MapReduce 11 Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
  12. 12. Multi-Tiered Architecture 12 Source: http://en.kioskea.net/contents/151-networking-3-tier-client-server-architecture  Increased reliability  Increased scalability
  13. 13. Modern Web Applications 13 Source: www.css-cloud.com/solutions/web-application-hosting-in-the-cloud.php
  14. 14. Peer-to-Peer  Distributed systems without any central control  Autonomous peers  Equivalent in functionality/privileges  Both a client & a server  Protocol features  Network overlaid on top of the Internet  Protocol constructed at application layer  Supports some type of message routing capability  Typically peers have unique IDs  Fairness & performance  Self-scaling  Peer churn 14
  15. 15. P2P Characteristics  Tremendous scalability  Millions of peers  Globally distributed  Many concurrent connections  Bandwidth intensive  Aggressive/unfair bandwidth utilization  Heterogeneous  Superpeers  Critical for performance/functionality 15
  16. 16. P2P Overlay  Peers directly talk to each other  If they aren’t directly connected, they use overlay routing via other peers  Peers are autonomous  Each determines its own capabilities based on its resources  Each decides on its own when to join or leave  Overlay is scalable & resilient 16
  17. 17. Terminology  Application  Tier 2 – Services provided to end users  Tier 1 – Middleware services  Overlay  How peers are connected  Application layer network  e.g., dial-up on top of telephone network, BGP, PlanetLab, CDNs  Underlay  Internet, Bluetooth  Peers implement top 3 layers  This layering is an oversimplification 17
  18. 18. Overlay Connectivity 18 P2P overlay taxonomy  Unstructured: deterministic (Napster, BitTorrent, JXTA) or nondeterministic (Gnutella, KaZaA)  Structured: sub-linear state (Chord, Kademlia, CAN, Pastry, Tapestry, Dynamo) or constant state (Viceroy, Cycloid)  Hybrid: Structella, Kelips, local minima search
  19. 19. Bootstrapping  How is an initial overlay formed from a set of nodes?  Use some known information  Use a well-known server to register initial set of peers  Well-known domain name  Dynamic DNS  Some peer addresses are well known  Use a local broadcast to collect nearby peers, & merge such sets to form larger sets 19
  20. 20. How to Bootstrap  Each peer maintains a random subset of peers  Peers in Skype maintain a cache of superpeers  In BitTorrent peers talk to trackers  An incoming peer talks to 1+ known peers  A known peer accepting an incoming peer  Keeps track of incoming peer  May redirect incoming peer to another peer  Give a random set of peers to contact  Discover more peers by random walk, gossiping, or deterministic walk within overlay 20
  21. 21. Options for Indexing Resources 21
      Approach | Lookup cost | Advantage | Limitation
      Centralized | O(1) | Fast lookup | Single point of failure
      Unstructured | O(hops_max) | Easy network maintenance | Not guaranteed to find resources
      Distributed Hash Table (DHT) | O(log N) | Guaranteed performance | Not for dynamic systems
      Superpeer | O(hops_max) | Better scalability | Not guaranteed to find resources
  22. 22. Centralized  Centralized database for lookup  Guaranteed discovery  Low overhead  Single point of failure  Easy to track  Legal issues  e.g., Napster  File transfer directly between peers 22
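A centralized lookup service of the kind this slide describes can be sketched as one dictionary held by the server. This is a minimal sketch, not Napster's actual protocol: the `CentralIndex` class, its method names, and the peer addresses are assumptions made for illustration.

```python
from collections import defaultdict


class CentralIndex:
    """Central directory mapping file names to the peers that hold them (hypothetical)."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        """A peer announces the files it is willing to share."""
        for name in filenames:
            self._holders[name].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every entry."""
        for holders in self._holders.values():
            holders.discard(peer)

    def lookup(self, filename):
        """One dictionary access; the file transfer itself then happens peer to peer."""
        return sorted(self._holders.get(filename, ()))


index = CentralIndex()
index.register("peer-a:6699", ["song.mp3", "talk.ogg"])
index.register("peer-b:6699", ["song.mp3"])
both = index.lookup("song.mp3")
index.unregister("peer-a:6699")
remaining = index.lookup("song.mp3")
missing = index.lookup("unknown.bin")
```

Discovery is guaranteed and cheap because every query goes to one place, which is also why the index is a single point of failure and easy to track.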
  23. 23. Unstructured  Fully distributed  Random connections  Initial entry point is known  Peers maintain dynamic list of neighbors  Connections to multiple peers  Highly resilient to node failures  e.g., Gnutella 23
  24. 24. Unstructured P2P (Cont.)  Flooding-based search  Guaranteed discovery  Implosion  High overhead  Expanding-ring flooding  TTL-based random walk  Discovery isn’t guaranteed  Better performance by biasing random walk toward nodes with higher degree  If responses follow the same path  Anonymity  e.g., KaZaA, BearShare, LimeWire, McAfee 24 [Figure: flooding vs. random walk]
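The two search strategies on this slide can be compared on a toy overlay. This is a simplified sketch under stated assumptions: the overlay is an in-memory adjacency list, the peer names `S`, `A`, `B`, `C`, `D` are hypothetical, and a "message" is counted each time a query is forwarded to a neighbor.

```python
import random
from collections import deque


def flood_search(graph, start, has_resource, ttl):
    """Flood a query up to `ttl` hops; return (peer holding the resource or None, messages sent)."""
    visited = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        peer, hops = frontier.popleft()
        if has_resource(peer):
            return peer, messages
        if hops == ttl:
            continue  # TTL exhausted on this branch
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs a message, duplicates included
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None, messages


def random_walk_search(graph, start, has_resource, ttl, rng):
    """Single walker taking at most `ttl` hops; return (peer or None, messages sent)."""
    peer, messages = start, 0
    while True:
        if has_resource(peer):
            return peer, messages
        if messages == ttl:
            return None, messages
        peer = rng.choice(graph[peer])
        messages += 1


# Hypothetical overlay: S is the source, D holds the resource.
graph = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
is_target = lambda peer: peer == "D"

flood_hit = flood_search(graph, "S", is_target, ttl=3)
flood_miss = flood_search(graph, "S", is_target, ttl=2)
walk_result = random_walk_search(graph, "S", is_target, ttl=8, rng=random.Random(7))
```

On this graph flooding with a sufficient TTL always reaches `D` but pays for duplicate messages, while the walker sends at most `ttl` messages and may return `None`, which is the trade-off between guaranteed discovery and overhead listed above.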
  25. 25. Superpeers  Resource rich peers  Superpeers  Bandwidth, reliability, trust, memory, CPU, etc.  Flooding or random walk  Only superpeers are involved  Lower overhead  More scalable  Discovery isn’t guaranteed  Better performance when superpeers share list of resources/services  e.g., Gnutella v0.6, FastTrack, Freenet, KaZaA, Skype 25
  26. 26. Example – BitTorrent  Most popular P2P file sharing system to date  Features  Centralized search  Multiple downloads  Enforce fairness  Rarest-first dissemination  Incentives  Better contribution  Better download speeds (not always)  Enable content delivery networks  Revenue through ads on search engines 26 [Figure: BitTorrent workflow among user, trackers, web-based search engine, content owner, and .torrent file server: keyword search, download .torrent file, get list of peers, download/upload chunks]
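Rarest-first dissemination, listed among the features above, can be sketched as a sort over chunk availability. This is an illustration only, not the logic of any particular client: the function name, the peer labels, and the tie-break by chunk ID are assumptions.

```python
from collections import Counter


def rarest_first(needed, peer_chunks):
    """Order the chunks we still need by how few known peers hold them.

    `needed` is a set of chunk IDs; `peer_chunks` maps a peer to the set of
    chunk IDs it advertises. Chunks nobody holds are skipped.
    """
    availability = Counter()
    for chunks in peer_chunks.values():
        availability.update(chunks)
    candidates = [c for c in needed if availability[c] > 0]
    # Ties are broken by chunk ID here only to keep the example deterministic.
    return sorted(candidates, key=lambda c: (availability[c], c))


peer_chunks = {
    "p1": {0, 1, 2},
    "p2": {1, 2},
    "p3": {2, 3},
}
order = rarest_first({0, 1, 2, 3, 4}, peer_chunks)
```

Chunks 0 and 3 are each held by one peer, so they are requested first; chunk 4 is not available from any known peer and is left out until some peer advertises it.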
  27. 27. BitTorrent Protocol  Content owner creates a .torrent file  File name, length, hash, list of trackers  Place .torrent file on a server  Publish URL of .torrent file to a web site  Torrent search engine  .torrent file points to one or more trackers  Registry of leeches & seeds for a given file 27 [Figure: BitTorrent workflow as on slide 26, with steps numbered 1 to 4]
  28. 28. BitTorrent Protocol (cont.)  Tracker  Provide a random subset of peers sharing same file  Peer contacts subset of peers in parallel  Files are shared based on chunk IDs  Chunk – segment of file  Periodically ask tracker for a new set of IPs  E.g., every 15 min  Pick peers with highest upload rate 28 [Figure: BitTorrent workflow as on slide 26, with steps numbered 1 to 4]
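The tracker's role of handing out a random subset of peers for a file can be sketched as follows. This does not reproduce the real tracker wire protocol; the `Tracker` class, the `announce` signature, the `max_peers` parameter, and the addresses are hypothetical.

```python
import random


class Tracker:
    """Registry of peers per file that returns a random subset on each announce (hypothetical)."""

    def __init__(self, rng=None):
        self._swarms = {}  # info_hash -> set of peer addresses
        self._rng = rng or random.Random()

    def announce(self, info_hash, peer, max_peers=50):
        """Register `peer` for the file and return up to `max_peers` other peers."""
        swarm = self._swarms.setdefault(info_hash, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return self._rng.sample(others, min(max_peers, len(others)))


tracker = Tracker(rng=random.Random(1))
peers = [f"10.0.0.{i}:6881" for i in range(1, 6)]
first_reply = tracker.announce("abc123", peers[0])  # first peer: nobody else yet
for p in peers[1:]:
    tracker.announce("abc123", p)

newcomer = "10.0.0.9:6881"
subset = tracker.announce("abc123", newcomer, max_peers=3)
```

A peer would call `announce` again periodically (the slide suggests every 15 minutes) to refresh its set, then keep the contacts with the highest upload rate.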
  29. 29. Summary – Unstructured P2P  Separate resource/service discovery & delivery  Resource/service discovery is mostly outside of P2P overlay  Centralized solutions  Not scalable  Affect resource/service delivery when failed  Distributed solutions  High overhead  May not locate the resource/service  No predictable performance  Delay or message bounds  Lack of QoS or QoE 29
  30. 30. Terminology  Hash function  Converts a large amount of data into a small datum  Hash table  Data structure that uses hashing to index content  Distributed Hash Table (DHT)  A hash table that is distributed  Types of hashing  Consistent or random  Locality preserving 30
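As a small illustration of "converts a large amount of data into a small datum", the sketch below maps an arbitrary name onto an m-bit key. SHA-1 and the helper name `key_for` are assumptions made for the example; the slides do not prescribe a particular hash function.

```python
import hashlib


def key_for(name: str, m: int = 16) -> int:
    """Hash `name` and reduce it to an m-bit key in [0, 2**m - 1]."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


small_key = key_for("Song.mp3", m=4)     # a position on a 16-address ring
full_key_a = key_for("Song.mp3", m=160)  # the whole 160-bit SHA-1 digest
full_key_b = key_for("Cars.mpeg", m=160)
```

The same name always yields the same key, which is what lets any peer compute where an item belongs without asking a central directory.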
  31. 31. Structured P2P  Deterministic approach to locate resources, services, & peers  Resources/services expressed as a (key, value) pair  Unique key  Hash of file name, metadata, or actual content  128-bit or higher  Peers also have a key  Random bit string or IP address  Index keys on a Distributed Hash Table (DHT)  Distributed address space [0, 2^m – 1]  Locate peer(s) responsible for a given key  Deterministic overlay to publish & locate content  Bounded performance under standard conditions, typically O(log n) 31
  32. 32. Structured P2P – Example  2 operations  store(key, value)  locate(key) 32 [Figure: ring with 16 addresses; f() hashes Song.mp3 and Cars.mpeg onto the ring (labels: 11 Song.mp3, 6 Cars.mpeg); Successor; finger targets n + 2^(i – 1), 1 ≤ i ≤ m; Find Cars.mpeg in O(log N) hops]
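The two operations on this slide can be sketched on a 16-address ring where each key is stored at its successor, the first peer at or after the key in the clockwise direction. The peer IDs below are hypothetical, and the keys 11 and 6 are assumed to be the hash values of Song.mp3 and Cars.mpeg suggested by the figure labels. Routing is ignored here; the sketch only shows which peer is responsible.

```python
import bisect


class Ring:
    """Toy DHT with global knowledge of all peers (illustration only)."""

    def __init__(self, m, peer_ids):
        self.size = 2 ** m
        self.peers = sorted(peer_ids)
        self.tables = {p: {} for p in self.peers}  # per-peer local hash table

    def successor(self, key):
        """First peer at or after `key` clockwise, wrapping past the top of the ring."""
        i = bisect.bisect_left(self.peers, key % self.size)
        return self.peers[i % len(self.peers)]

    def store(self, key, value):
        peer = self.successor(key)
        self.tables[peer][key] = value
        return peer

    def locate(self, key):
        return self.tables[self.successor(key)].get(key)


ring = Ring(m=4, peer_ids=[1, 4, 8, 12])  # hypothetical peer IDs
song_peer = ring.store(11, "Song.mp3")
cars_peer = ring.store(6, "Cars.mpeg")
found = ring.locate(6)
wrapped = ring.successor(13)  # no peer at or after 13, so the ring wraps to peer 1
```

A real overlay replaces the global `peers` list with per-peer routing state, which is what Chord's finger table provides.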
  33. 33. Chord  Key space arranged as a ring  Peers responsible for segment of the ring  Called successor of a key  1st peer in clockwise direction  Routing table  Keep a pointer (finger) to m peers  For peer n, finger i points to the successor of key (n + 2^(i – 1)) mod 2^m, 1 ≤ i ≤ m  Key resolution  Go to peer with the closest key  Recursively continue until key is found  Can be located within O(log n) hops 33 [Figure: ring with m = 3-bit keys] Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
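Finger tables and key resolution can be sketched on a static ring. This is a simplified illustration assuming every finger is correct and no peer joins or leaves; the class and helper names are hypothetical, and the 3-bit ring with peers 0, 1, and 3 is an assumed example.

```python
import bisect


def in_open(x, a, b, size):
    """True if x lies strictly between a and b, walking clockwise from a."""
    if a == b:
        return x != a
    return 0 < (x - a) % size < (b - a) % size


def in_half_open(x, a, b, size):
    """True if x lies in (a, b], walking clockwise from a."""
    if a == b:
        return True
    return 0 < (x - a) % size <= (b - a) % size


class ChordRing:
    def __init__(self, m, peer_ids):
        self.m, self.size = m, 2 ** m
        self.peers = sorted(peer_ids)
        # Finger i of peer n is the successor of (n + 2^(i-1)) mod 2^m, 1 <= i <= m.
        self.fingers = {
            n: [self.successor((n + 2 ** (i - 1)) % self.size) for i in range(1, m + 1)]
            for n in self.peers
        }

    def successor(self, key):
        i = bisect.bisect_left(self.peers, key % self.size)
        return self.peers[i % len(self.peers)]

    def closest_preceding(self, n, key):
        """Farthest finger of n that still lies before the key."""
        for f in reversed(self.fingers[n]):
            if in_open(f, n, key, self.size):
                return f
        return n

    def lookup(self, start, key):
        """Return (responsible peer, list of peers visited)."""
        n, path = start, [start]
        while not in_half_open(key, n, self.fingers[n][0], self.size):
            nxt = self.closest_preceding(n, key)
            if nxt == n:
                break
            n = nxt
            path.append(n)
        owner = self.fingers[n][0]
        if owner != path[-1]:
            path.append(owner)
        return owner, path


ring = ChordRing(m=3, peer_ids=[0, 1, 3])
owner, path = ring.lookup(start=1, key=6)
```

Each forwarding step jumps to the farthest finger that does not overshoot the key, so the remaining distance shrinks quickly; this is the intuition behind the O(log n) hop bound on the slide.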
  34. 34. Chord (Cont.)  New peer entering overlay  Takes keys from the successor  Peer leaving overlay  Gives keys to the successor  Fingers are updated as peers join & leave  Peer failure or churn makes finger table entries stale 34 [Figure: new peer with key 6 joins the overlay; peer with key 1 leaves the overlay] Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
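Key handoff on join and leave can be sketched without any routing. The example assumes a 3-bit ring with peers 0, 1, and 3 holding keys 1, 2, and 6, then replays the two events named on the slide: a peer with ID 6 joins and the peer with ID 1 leaves. The `KeyRing` class is hypothetical, finger maintenance is left out, and the sketch assumes at least one peer remains after a leave.

```python
import bisect


class KeyRing:
    """Tracks which peer stores which keys as peers join and leave (illustration only)."""

    def __init__(self, m, peer_ids):
        self.size = 2 ** m
        self.store = {p: set() for p in peer_ids}  # peer -> keys it is responsible for

    def successor(self, ident):
        peers = sorted(self.store)
        return peers[bisect.bisect_left(peers, ident % self.size) % len(peers)]

    def put(self, key):
        self.store[self.successor(key)].add(key)

    def join(self, new_peer):
        """The new peer takes over, from its successor, the keys it is now responsible for."""
        old_owner = self.successor(new_peer)
        self.store[new_peer] = set()
        moved = {k for k in self.store[old_owner] if self.successor(k) == new_peer}
        self.store[old_owner] -= moved
        self.store[new_peer] |= moved
        return moved

    def leave(self, peer):
        """The departing peer hands all of its keys to its successor."""
        keys = self.store.pop(peer)
        heir = self.successor(peer)
        self.store[heir] |= keys
        return heir


ring = KeyRing(m=3, peer_ids=[0, 1, 3])
for key in (1, 2, 6):
    ring.put(key)
before = {p: set(ks) for p, ks in ring.store.items()}

moved = ring.join(6)  # key 6 moves from peer 0 to the new peer 6
heir = ring.leave(1)  # key 1 moves from peer 1 to peer 3
```

Only the keys between the new peer and its predecessor change hands, so a join or leave touches a single neighbor's data rather than the whole ring.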
  35. 35. Chord Performance  Path length  Worst case O(log N)  Average ½ log_2 N  Updates O(log^2 N)  Fingers O(log N)  Alternative paths (log N)!  Balanced distribution of keys  Under uniform distribution  N(log N) virtual nodes provide best load distribution 35 Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
  36. 36. Structured P2P – Other Solutions  Kademlia  Used in BitTorrent, eMule, aMule, & Azureus  Distance between 2 keys is determined by XOR  Routing in the ring is bidirectional  dist(a → b) = dist(b → a)  Enable nodes to learn about new nodes from received messages  Content-Addressable Network (CAN)  Based on a d-Torus  Pastry  Based on a Hypercube  Cycloid  Based on a cube connected cycle 36
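Kademlia's XOR metric and its symmetry are easy to check directly. This is a minimal sketch with hypothetical 4-bit node IDs; the helper names are assumptions, and real implementations use much longer IDs and per-bucket contact lists.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs under the XOR metric."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the highest differing bit, i.e. which routing bucket `other_id` falls into."""
    return xor_distance(own_id, other_id).bit_length() - 1


def k_closest(target: int, node_ids, k: int):
    """The k known nodes closest to `target` under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


nodes = [0b0001, 0b0100, 0b0110, 0b1011]  # hypothetical 4-bit node IDs
closest = k_closest(0b0111, nodes, k=2)
```

Because `a ^ b == b ^ a`, a node that receives a message from a peer learns a contact at exactly the distance it would itself use to route back, which is why nodes can learn about new nodes from received messages.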
  37. 37. Other Well-Known Solutions 37
  38. 38. Summary – Structured P2P  Resource/service discovery is within P2P overlay  Deterministic performance  Chord  Unidirectional routing  Recursive routing  Peer churn & failure is an issue  Issues  MySong.mp3 is not same as mysong.mp3  High churn  Unbalanced distribution of keys & load 38
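The "MySong.mp3 is not same as mysong.mp3" issue follows directly from hashing the raw name. The sketch below shows the mismatch and one possible mitigation, normalizing the name before hashing. The normalization rule (Unicode NFKC, trimming, case folding) is an assumption made for illustration, not something the slides prescribe.

```python
import hashlib
import unicodedata


def raw_key(name: str) -> str:
    """Key derived from the name exactly as typed."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def normalized_key(name: str) -> str:
    """Key derived from a canonical form of the name (assumed normalization rule)."""
    canonical = unicodedata.normalize("NFKC", name).strip().casefold()
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


same_raw = raw_key("MySong.mp3") == raw_key("mysong.mp3")
same_normalized = normalized_key("MySong.mp3") == normalized_key("mysong.mp3 ")
```

Normalization only helps with trivial variants; a DHT keyed on exact hashes still cannot answer partial or fuzzy queries.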
  39. 39. Structured vs. Unstructured 39
      Aspect | Unstructured P2P | Structured P2P
      Overlay construction | High flexibility | Low flexibility
      Resources | Indexed locally | Indexed remotely on a distributed hash table
      Query messages | Broadcast or random walk | Unicast
      Content location | Best effort | Guaranteed
      Performance | Unpredictable | Predictable bounds
      Overhead | High | Relatively low
      Object types | Mutable, with many complex attributes | Immutable, with few simple attributes
      Peer churn & failure | Supports high failure rates | Supports moderate failure rates
      Applicable environments | Small-scale or highly dynamic, e.g., mobile P2P | Large-scale & relatively stable, e.g., desktop file sharing
      Examples | Gnutella, LimeWire, KaZaA, BitTorrent | Chord, CAN, Pastry, eMule, BitTorrent
  40. 40. Example – Amazon Dynamo  Highly-available key-value system  Many large datasets/objects that only require primary key access  Shopping carts, best seller lists, customer preferences, product catalogs, etc.  Relational databases aren’t required, too slow, or bulky  Fast reads, high availability for writes  Always failing servers, disks, switches 40
  41. 41. Amazon Dynamo (Cont.)  Objects are replicated in successors  All peers know about each other using gossiping  Can read/write to any replica  Mechanisms to deal with different versions of objects 41
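Replication on successors and version reconciliation can be sketched together. The first function picks the peers that hold copies of a key; the second compares vector clocks, the versioning mechanism described in the Dynamo paper. The ring IDs, the replica count, and the function names are hypothetical, and virtual nodes are left out.

```python
import bisect


def preference_list(ring, key, n_replicas):
    """The first `n_replicas` peers clockwise from `key` on the ring."""
    ring = sorted(ring)
    start = bisect.bisect_left(ring, key)
    count = min(n_replicas, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


def descends(a, b):
    """True if vector clock `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify two versions of an object by their vector clocks."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a supersedes b"
    if b_covers_a:
        return "b supersedes a"
    return "concurrent"  # neither saw the other: the versions must be reconciled


replicas = preference_list([1, 4, 8, 12], key=6, n_replicas=3)

v1 = {"Sx": 2}
v2 = {"Sx": 2, "Sy": 1}  # written via server Sy after v1
v3 = {"Sx": 2, "Sz": 1}  # written via server Sz after v1, unaware of v2
newer = compare(v2, v1)
conflict = compare(v2, v3)
```

Since any replica accepts writes, two clients can update the same object through different servers; the clocks reveal that `v2` and `v3` are concurrent, so both are kept until a later read merges them.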
  42. 42. Amazon Dynamo (Cont.) 42 G. DeCandia et al., "Dynamo: Amazon's highly available key-value store," ACM SIGOPS Operating Systems Review, 41(6), 2007.