Architecture
Dictionary definitions
Manner of construction of something & disposition of its parts
Design, the way components fit together
Defines
What are the components of the system?
How are they connected to each other?
How do they communicate?
Architectural Styles
Layered architectures
Object-based architectures
Data-centered architectures
Event-based architectures
Hybrid architectures combine several of these styles
Many real-world systems are hybrids
e.g., P2P file transfer, networks of sensors
Layered Architectures
Well-defined layers
Control typically flows from layer to layer
Better results through cross-layer coordination
Requests go down while results go up
e.g., OSI model, some P2P systems
[Figure – layered P2P architecture: Application – Tier 2 (file sharing, streaming, VoIP, P2P clouds); Application – Tier 1 (indexing/DHT, caching, replication, access control, reputation, trust); Overlay (unstructured, structured, & hybrid; e.g., Gnutella, Chord, Kademlia, CAN); Underlay (Internet, Ethernet, Wi-Fi, Bluetooth). Requests flow down the layers; responses flow up.]
Object-Based Architectures
Looser organization of objects
Communication through Remote Procedure Calls (RPC)
e.g., Java RMI, Web services, REST
Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
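To make the RPC idea concrete, here is a minimal sketch using Python's standard xmlrpc module; the lookup method, port, & returned data are hypothetical, and real object-based systems (Java RMI, web services, REST) differ in detail.

```python
# Minimal RPC sketch: a server exposes a method that clients can invoke remotely.
# (Illustrative only; "lookup", the port, and the returned data are made up.)
from xmlrpc.server import SimpleXMLRPCServer

def lookup(name):
    """Hypothetical remote method: resolve a name to a value."""
    return {"alice": "10.0.0.5"}.get(name, "unknown")

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(lookup, "lookup")
# A client would call: xmlrpc.client.ServerProxy("http://localhost:8000").lookup("alice")
server.serve_forever()
```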
Data-Centered Architectures
Components communicate through a common repository
Can be passive or active
e.g., distributed file systems, producer-consumer, web-based data services
Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
Event-Based Architectures
Propagation of events
Events occasionally carry data
Components are loosely coupled
e.g., publisher/subscriber, ESB, akka.io
Source: http://computersciencesource.wordpress.com/2010/02/11/distributed-computing-architectures/
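A toy publish/subscribe bus makes the loose coupling concrete: publishers & subscribers share only topic names, never references to each other. This is a minimal sketch, not the API of any real ESB or of akka.io.

```python
from collections import defaultdict

class EventBus:
    """Toy event bus: components register callbacks for topics & publish events."""
    def __init__(self):
        self.subscribers = defaultdict(list)    # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:
            callback(event)                     # deliver the event to each subscriber

bus = EventBus()
bus.subscribe("temperature", lambda e: print("got reading:", e))
bus.publish("temperature", {"sensor": 7, "value": 21.5})   # events occasionally carry data
```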
Enterprise Service Bus (ESB)
Source: www.fiorano.com/products/ESB-enterprise-service-bus/Fiorano-ESB-enterprise-service-bus.php
System-Level Architectures
Client-server
Peer-to-peer
Hybrid architectures
Many real-world systems are hybrids
e.g., P2P file transfer, Google File System, Amazon Dynamo
Client-Server
Clients request services from a server
Request-reply communication
Multiple servers for resilience & load balancing
Pros
Easier to build & maintain
Cons
Less scalable
Single point of failure
e.g., web, NFS, MapReduce
Source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
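A minimal request-reply sketch over TCP (Python sockets); real servers add framing, concurrency, load balancing, & error handling. The host & port are arbitrary, and the server & client would run in separate processes.

```python
import socket

def serve_once(host="localhost", port=9000):
    """Server side: accept one request & send back a reply."""
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)              # read the client's request
            conn.sendall(b"reply to: " + request)  # send the reply

def request(host="localhost", port=9000):
    """Client side: send a request & wait for the reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"hello")
        return sock.recv(1024)
```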
Peer-to-Peer
Distributed systems without any central control
Autonomous peers
Equivalent in functionality/privileges
Both a client & a server
Protocol features
Network overlaid on top of Internet
Protocol constructed at application layer
Supports some type of message routing capability
Typically peers have unique IDs
Fairness & performance
Self-scaling
Peer churn
P2P Characteristics
Tremendous scalability
Millions of peers
Globally distributed
Many concurrent connections
Bandwidth intensive
Aggressive/unfair bandwidth utilization
Heterogeneous
Superpeers
Critical for performance/functionality
P2P Overlay
Peers directly talk to each other
If they aren’t directly connected, they use overlay routing via other peers
Peers are autonomous
Each peer determines its own capabilities based on its resources
Each peer decides on its own when to join & leave
Overlay is scalable & resilient
Terminology
Application
Tier 2 – Services provided to end users
Tier 1 – Middleware services
Overlay
How peers are connected
Application-layer network
e.g., dial-up on top of the telephone network, BGP, PlanetLab, CDNs
Underlay
Internet, Bluetooth
Peers implement the top 3 layers
This layering is an oversimplification
Bootstrapping
How is an initial overlay formed from a set of nodes?
Use some known information
Use a well-known server to register initial set of peers
Well-known domain name
Dynamic DNS
Some peer addresses are well known
Use a local broadcast to collect nearby peers, & merge such sets to form larger sets
How to Bootstrap
Each peer maintains a random subset of peers
Peers in Skype maintain a cache of superpeers
In BitTorrent peers talk to trackers
An incoming peer talks to 1+ known peers
A known peer accepting an incoming peer
Keeps track of the incoming peer
May redirect the incoming peer to another peer
Gives a random set of peers to contact
Discover more peers by random walk, gossiping, or deterministic walk within the overlay (see the sketch below)
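A hedged sketch of the bootstrap step described above: a joining peer falls back to well-known addresses when its peer cache is empty, contacts a few known peers, & lets them hand back (or redirect to) further peers. Names such as WELL_KNOWN_PEERS & contact() are hypothetical, not part of Skype or BitTorrent.

```python
import random

WELL_KNOWN_PEERS = ["peer1.example.net:7000", "peer2.example.net:7000"]   # assumed addresses

def contact(addr):
    """Placeholder for the network call; a real known peer would return a random subset of peers."""
    return []

def bootstrap(peer_cache, fanout=3):
    """Build an initial neighbor set for a newly joining peer."""
    candidates = list(peer_cache) or list(WELL_KNOWN_PEERS)   # fall back to well-known peers
    random.shuffle(candidates)
    neighbors = []
    for addr in candidates[:fanout]:
        neighbors.append(addr)
        for p in contact(addr):                   # the known peer may suggest more peers
            if p not in candidates:
                candidates.append(p)
    return neighbors
```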
Options for Indexing Resources
Centralized – O(1): fast lookup, but a single point of failure
Unstructured – O(hops_max): easy network maintenance, but not guaranteed to find resources
Distributed Hash Table (DHT) – O(log N): guaranteed performance, but not suited for highly dynamic systems
Superpeer – O(hops_max): better scalability, but not guaranteed to find resources
Centralized
Centralized database for lookup
Guaranteed discovery
Low overhead
Single point of failure
Easy to track
Legal issues
e.g., Napster
File transfer directly between peers
Unstructured
Fully distributed
Random connections
Initial entry point is known
Peers maintain dynamic list of neighbors
Connections to multiple peers
Highly resilient to node failures
e.g., Gnutella
Unstructured P2P (Cont.)
Flooding-based search
Guaranteed discovery
Implosion & high overhead
Expanding-ring flooding
TTL-based random walk
Discovery isn’t guaranteed
Better performance by biasing the random walk toward nodes with higher degree
Anonymity, if responses follow the same path back
e.g., KaZaA, BearShare, LimeWire, McAfee
[Figure: flooding forwards a query from source S toward destination D via every neighbor; a random walk forwards it to one neighbor at a time.]
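The two searches above can be sketched in a few lines; neighbors maps each peer to its neighbor list & has_object is the local lookup. This is an illustration of the idea, not Gnutella's actual protocol.

```python
import random

def flood(neighbors, start, has_object, ttl):
    """Forward the query to every neighbor until the TTL expires; nodes may see duplicates (implosion)."""
    frontier, seen = [(start, ttl)], set()
    while frontier:
        node, t = frontier.pop()
        if has_object(node):
            return node
        seen.add(node)
        if t > 0:
            frontier.extend((n, t - 1) for n in neighbors[node] if n not in seen)
    return None

def random_walk(neighbors, start, has_object, ttl):
    """Forward the query to one random neighbor at a time; cheaper, but discovery isn't guaranteed."""
    node = start
    for _ in range(ttl):
        if has_object(node):
            return node
        node = random.choice(neighbors[node])
    return None
```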
Superpeers
Resource-rich peers become superpeers
Selected based on bandwidth, reliability, trust, memory, CPU, etc.
Flooding or random walk
Only superpeers are involved
Lower overhead
More scalable
Discovery isn’t guaranteed
Better performance when superpeers share lists of resources/services
e.g., Gnutella v0.6, FastTrack, Freenet, KaZaA, Skype
Example – BitTorrent
Most popular P2P file-sharing system to date
Features
Centralized search
Multiple downloads
Enforce fairness
Rarest-first dissemination
Incentives
Better contribution brings better download speeds (not always)
Enable content delivery networks
Revenue through ads on search engines
[Figure – BitTorrent overview: a user runs a keyword search on a web-based search engine, downloads the .torrent file from a server where the content owner placed it, gets a list of peers from the trackers, & then downloads/uploads chunks with those peers.]
BitTorrent Protocol
Content owner creates a .torrent file
File name, length, hash, list of trackers
Places .torrent file on a server
Publishes URL of the .torrent file to a web site
Torrent search engine
.torrent file points to tracker(s)
Registry of leeches & seeds for a given file
BitTorrent Protocol (cont.)
Tracker
Provides a random subset of peers sharing the same file
Peer contacts a subset of peers in parallel
Files are shared based on chunk IDs
Chunk – segment of a file
Periodically ask the tracker for a new set of IPs
e.g., every 15 min
Pick peers with the highest upload rates (see the sketch below)
Summary – Unstructured P2P
Separate resource/service discovery & delivery
Resource/service discovery is mostly outside of P2P overlay
Centralized solutions
Not scalable
Affect resource/service delivery when they fail
Distributed solutions
High overhead
May not locate the resource/service
No predictable performance (delay or message bounds)
Lack of QoS or QoE
Terminology
Hash function
Converts a large amount of data into a small datum
Hash table
Data structure that uses hashing to index content
Distributed Hash Table (DHT)
A hash table that is distributed
Types of hashing
Consistent or random
Locality preserving
Structured P2P
Deterministic approach to locate resources, services, & peers
Resources/services expressed as a (key, value) pair
Unique key
Hash of file name, metadata, or actual content
128-bit or higher
Peers also have a key
Random bit string or IP address
Index keys on a Distributed Hash Table (DHT)
Distributed address space [0, 2^m – 1]
Locate peer(s) responsible for a given key
Deterministic overlay to publish & locate content
Bounded performance under standard conditions, typically O(log n) (see the sketch below)
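A minimal sketch of mapping resource & peer names into the address space [0, 2^m – 1] with a cryptographic hash (Chord uses SHA-1; the small m & the names below are only illustrative).

```python
import hashlib

M = 16                                    # small address space for illustration: [0, 2^16 - 1]

def dht_key(name, m=M):
    """Hash an arbitrary name (file name, metadata, or peer address) onto the ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

print(dht_key("Song.mp3"), dht_key("Cars.mpeg"))   # keys for two files
print(dht_key("10.0.0.5:7000"))                    # a peer's key derived from its address
```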
Structured P2P – Example
2 operations
store(key, value)
locate(key)
[Figure: a ring with 16 addresses; Song.mp3 hashes to key 11 & Cars.mpeg to key 6, each stored at the successor of its key; fingers point to successor(n + 2^(i-1)), 1 ≤ i ≤ m, so "Find Cars.mpeg" completes in O(log N) hops.]
Chord
Key space arranged as a ring
Peers are responsible for segments of the ring
Called the successor of a key
1st peer in the clockwise direction
Routing table
Keeps a pointer (finger) to m peers
Finger i points to the peer succeeding key n + 2^(i-1), 1 ≤ i ≤ m
Key resolution
Go to the peer with the closest key
Recursively continue until the key is found
Can be located within O(log n) hops
[Figure: m = 3-bit key ring]
Stoica et al., "Chord: A scalable peer-to-peer lookup service for internet applications," ACM SIGCOMM Computer Communication Review, 31(4), 149-160, 2001.
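A simplified, single-process sketch of Chord's successor & finger-table lookup on the 3-bit ring above (peers 0, 1, & 3, as in the Stoica et al. example); the real protocol is distributed & also handles joins, leaves, & stale fingers.

```python
M = 3                                    # 3-bit key ring, addresses 0..7
peers = [0, 1, 3]                        # peer IDs present on the ring

def successor(key):
    """First peer clockwise from the key."""
    candidates = [p for p in peers if p >= key]
    return min(candidates) if candidates else min(peers)

def fingers(n):
    """Finger i points to successor(n + 2^(i-1)) mod 2^M, for 1 <= i <= M."""
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

def lookup(n, key):
    """Route toward the key: jump to the farthest finger that doesn't overshoot it."""
    hops = 0
    while successor(key) != n:
        n = max((f for f in fingers(n) if (f - n) % 2 ** M <= (key - n) % 2 ** M),
                key=lambda f: (f - n) % 2 ** M,
                default=successor(key))   # no usable finger: the key's successor is next
        hops += 1
        if hops > M:                      # safety bound for this sketch
            break
    return n, hops

print(fingers(0))                         # [1, 3, 0], as in the paper's 3-bit example
print(lookup(3, 6))                       # key 6 is resolved at peer 0 in 1 hop
```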
Chord (Cont.)
New peer entering the overlay
Takes keys from its successor
Peer leaving the overlay
Gives keys to its successor
Fingers are updated as peers join & leave
Peer failure or churn makes finger-table entries stale
[Figures: a new peer with key 6 joins the overlay; the peer with key 1 leaves the overlay]
Chord Performance
Path length
Worst case O(log N)
Average (1/2)·log2 N
Updates O(log^2 N)
Fingers O(log N)
Alternative paths (log N)!
Balanced distribution of keys
Under uniform distribution
N·log N virtual nodes provide the best load distribution
Structured P2P – Other Solutions
Kademlia
Used in BitTorrent, eMule, aMule, & AZUREUS
Distance between 2 keys is determined by XOR
Routing in the ring is bidirectional
dist(a, b) = dist(b, a)
Enable nodes to learn about new nodes from received messages
Content-Addressable Network (CAN)
Based on a d-Torus
Pastry
Based on a Hypercube
Cycloid
Based on a cube-connected cycle
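Kademlia's XOR metric is easy to show in code: the distance between two keys is their bitwise XOR read as an integer, so dist(a, b) = dist(b, a). The 4-bit keys below are illustrative.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: bitwise XOR of the two keys."""
    return a ^ b

print(xor_distance(0b1010, 0b0111))   # 13
print(xor_distance(0b0111, 0b1010))   # 13 (symmetric)
```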
Summary – Structured P2P
Resource/service discovery is within P2P overlay
Deterministic performance
Chord
Unidirectional routing
Recursive routing
Peer churn & failure is an issue
Issues
MySong.mp3 is not the same as mysong.mp3 (exact-match lookups only)
High churn
Unbalanced distribution of keys & load
Structured vs. Unstructured
Overlay construction – Unstructured: high flexibility; Structured: low flexibility
Resources – Unstructured: indexed locally; Structured: indexed remotely on a distributed hash table
Query messages – Unstructured: broadcast or random walk; Structured: unicast
Content location – Unstructured: best effort; Structured: guaranteed
Performance – Unstructured: unpredictable; Structured: predictable bounds
Overhead – Unstructured: high; Structured: relatively low
Object types – Unstructured: mutable, with many complex attributes; Structured: immutable, with few simple attributes
Peer churn & failure – Unstructured: supports high failure rates; Structured: supports moderate failure rates
Applicable environments – Unstructured: small-scale or highly dynamic (e.g., mobile P2P); Structured: large-scale & relatively stable (e.g., desktop file sharing)
Examples – Unstructured: Gnutella, LimeWire, KaZaA, BitTorrent; Structured: Chord, CAN, Pastry, eMule, BitTorrent
Example – Amazon Dynamo
Highly available key-value store
Many large datasets/objects only require primary-key access
Shopping carts, best-seller lists, customer preferences, product catalogs, etc.
Relational databases aren’t required, or are too slow or bulky
Fast reads, high availability for writes
Servers, disks, & switches are always failing
Amazon Dynamo (Cont.)
Objects are replicated on successor peers (see the sketch below)
All peers know about each other using gossiping
Can read/write to any replica
Mechanisms to deal with different versions of objects
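A hedged sketch of Dynamo-style placement: the object's key is hashed onto the ring & the object is replicated on the next N peers clockwise (its successors, the preference list). The hash choice, ring size, peer names, & N below are illustrative.

```python
import hashlib

RING_BITS = 16
N_REPLICAS = 3

def ring_position(name, bits=RING_BITS):
    """Hash a key or a peer name onto the ring [0, 2^bits - 1]."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big") % (2 ** bits)

def preference_list(key, peer_positions, n=N_REPLICAS):
    """Return the first n peers clockwise from the key's position (the replica holders)."""
    ordered = sorted(peer_positions, key=lambda p: (p - key) % (2 ** RING_BITS))
    return ordered[:n]

peers = [ring_position(f"node-{i}") for i in range(8)]    # hypothetical peer IDs
cart_key = ring_position("cart:alice")
print(preference_list(cart_key, peers))                   # the peers holding Alice's cart
```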
Amazon Dynamo (Cont.)
G. DeCandia et al., "Dynamo: Amazon's highly available key-value store," ACM SIGOPS Operating Systems Review, Vol. 41, No. 6, Oct. 2007.
Editor's Notes
Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
P2P is a good example for teaching distributed systems, as P2P systems use many distributed-systems ideas
Example overlays – dialup, BGP, PlanetLab, CDNs
Application layer is divided into 2 sub-layers (tier 1 may also be referred to as the middleware layer)
It’s incorrect to treat the DHT as the overlay – it’s only an index implemented on top of the overlay.
E.g., Chord finger tables form the overlay & what’s indexed at each node forms the DHT
Hybrid – structured topology but unstructured communication.
E.g., form a Chord overlay & then use it to broadcast to all the nodes efficiently.
Lee’s best-peer selection: radars are connected as structured P2P because the data-fusion group is known in advance, whereas the best peers for the current data fusion are found using broadcast.
Local-minima search – each node has an ID. Resources are indexed at the local node with the closest ID (local minimum). When routing, first do a random walk, then a deterministic walk looking for the local minimum.
We’ll not talk about hybrid designs in detail
How is the initial P2P network formed from nodes?
Before specifics – this slide gives an overview
Colored circles can be interpreted as different files
Bounds give the lookup/query cost
(Lookup – process of finding where the content is)
Napster is not the first P2P system, but demonstrated the potential (1999)
Before that, many organizations (including Intel) used some kind of P2P application(s) to aggregate idle computing power in their machines
Inspired many modern P2P systems, very popular, later a test case for many legal issues
Uses central server for storing and searching the directory of files (hence not a full P2P system as many subsequent systems were)
Step 1 – peers report their lists of files to the centralized database
Step 2 – users query the central database
Step 3 – the file is directly downloaded from a peer that has it (no multiple/parallel downloads)
First full P2P filesharing system.
Earliest versions (through V0.4) used unstructured overlay with flooding for queries
Due to the need for scalability, later versions (V0.6 and higher) adopted a superpeer architecture.
High-capacity peers are super peers, and all queries are routed using a flooding mechanism among superpeers.
Flooding & random walk: Flooding: implosion – the same node gets multiple messages for the same query
Both have scalability issues; a random walk may not find the object
Enhancements: TTL – time to live
Expanding-ring flooding – first flood to k hops; if no result, flood to k+1 hops; if no response, try k+2 hops; continue similarly
Random walk – query failure is determined by a timeout or an explicit failure message from the last node.
Several random walk queries may be issued in parallel as well.
Additional techniques in UP2P:
Overlay topology: a) how to decide on peers (number, who to connect to or retain, etc.); base the decision on peer capacity, type of content, connectivity of the peer, etc.; b) clustering – e.g., clusters form when the probability of two nodes being connected is higher if they have a common neighbor; prefer connectivity to high-degree nodes, peers with shared content, or peers with objects closer in key space, etc.
Object placement – selection of nodes where an object is placed, e.g., based on popularity, routing mechanism, etc.; distribute replicas of popular objects (explicit push, caching)
Caching - can inform random walk on what objects are nearby (cache summary information about contents of neighbors etc.)
Query forwarding criteria
Misc: McAfee uses P2P to update virus definitions within a local network
Superpeers are selected based on Bandwidth, reliability, trust, memory, CPU
Gnutella V0.6 and higher; FastTrack, a proprietary system:
FastTrack: another P2P system from around the same time as Gnutella.
Used by a number of clients such as KaZaA, Grokster, and iMesh.
Proprietary system using an encrypted protocol
high-capacity nodes are supernodes (SN), and low-capacity nodes are ordinary nodes (ON);
Each SN maintains connections to 40-50 other SNs (in a network of ~3M nodes and ~30K SNs – practical numbers) and 50-80 ONs.
Each ON connects to one SN; the SN provides the ON with a list of other SNs, which the ON caches; after the ON issues a query and an SN responds, the ON disconnects from its current SN and attaches to a new SN from the list. The ON then receives a new SN list, which it merges with its own.
Average SN-SN connection ~35 mins, SN-ON connection ~10 mins (~30% <30 secs)
These connectivity changes help balance load in the network, improve locality, and shuffle connections to increase long-range coverage. They also make tracking peer transfers difficult.
Freenet: (Open Source)
Proposed in 1999 – P2P file-sharing system - contains security, anonymity and deniability features
Objects and peers have identifiers – aka routing keys (created using a hash function).
Each peer has a fixed-size routing table (containing keys of peers); mesh topology.
Requests are forwarded to the peer with the closest matching routing key. If the request fails, it tries again with the peer with the next closest routing key. (Algorithm: steepest-ascent hill climbing with backtracking until the TTL expires.)
Also caches objects along the return path to reduce failures.
FastFreenet: improves the hit-rate by:
Peers share a fuzzy description of the files they have with neighbors, which allows nodes to forward a query to peers likely to have the object.
Fuzzy description – an N-bit number where each bit corresponds to a 1/N segment of the key space.
Users – better contribution brings better download speeds (not always)
Content providers – Enable content delivery networks
3rd parties - Revenue through ads on search engines
Guaranteed to find content because of centralized search
Trackers can be contacted using TCP, UDP, or HTTP
Unstructured P2P – easy to implement, inefficient routing, inability to locate rare objects.
Gradual changes – e.g., clustering, near/far links, semantic links, etc. to improve efficiency.
SP2P takes this one step further.
QoE – quality of experience
Structured overlay - designs overlays with routing mechanisms that are deterministic, and allowing for location of any objects (in bounded time).
P2P supports key-based routing – object identifiers are mapped to the peer identifier address space, and object requests are routed to the nearest peer in the P2P address space
Goal – Distributed object location and routing (DOLR). A specific scheme is DHT (distributed hash table)
M – key length in bits
The original paper uses iterative routing (s goes to x, x tells s about y, s goes to y, y tells s about z, s goes to z, …) to implement the conceptually recursive lookup
Fig 2 – 10,000 nodes 1,000,000 keys
Virtual node – 1 physical node acting as multiple nodes distributed across the ring
Ideally 1 physical node should represent log N virtual nodes
Conceptually Chord is recursive, but the actual implementation in the paper uses iterative routing (no fundamental difference in performance, just an increased hop count)