This document presents Scarce, an economic approach to building scalable and highly available distributed applications. Scarce uses an agent-based architecture: each server runs an agent that communicates with other agents to make decentralized placement decisions. Component placement balances performance, availability, and load. Servers charge a virtual rent based on the resources used, and components pay it. Rents are gossiped between agents with no central authority. The approach dynamically replicates, migrates, or removes components to maintain availability and load balance as resources and demand change over time. The evaluation shows that Scarce adapts to new resources and workloads better than static placement and provides fairness between applications sharing cloud resources.
An economic approach for scalable and highly-available distributed applications
1. An economic approach for scalable and highly-available distributed applications
Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer
CLOUD 2010, July 5-10 2010, Miami, Florida, USA
nicolas.bonvin@epfl.ch
LSIR - EPFL
2. Introduction
● A distributed application = many (remote) components
● A component is
– A piece of software
– Loosely coupled
– Self-Contained
● e.g. a SOA-based application
[Figure: an application made of five components, A to E]
2 EPFL – LSIR - Nicolas Bonvin
3. Placement: first problem
● Where should the components be placed to maximize the application performance?
[Figure: components A to E waiting to be placed on servers 1 to 4]
4. Placement: first problem
● Where should the components be placed to maximize the application performance?
– Random placement?
[Figure: components A to E placed at random on servers 1 to 4]
Bad resource utilization!
5. Placement: first problem
● Where should the components be placed to maximize the application performance?
– "Clever" random placement?
[Figure: components A to E spread over servers 1 to 4]
D and E should probably be hosted on the same server!
Not always optimal!
6. Even more components!
● High Availability: software, hardware, network failures
● Scalability: growing load, peaks, scaling down, ...
Replication!
[Figure: every component is replicated: three replicas of A, two each of B, C, D, and E]
7. Placement: second problem
● Where should the components be placed to maximize the application availability?
[Figure: replicated components A to E waiting to be placed on Racks 1 to 4, spread over Datacenter 1 and Datacenter 2]
8. Multi-Objective Optimization Problem
● Maximize the geographical distance between replicas
– Greater availability
● Minimize the geographical distance between related components
– Lower latency
● Balance the load (disk I/O, network I/O, CPU) between the servers
– Better application performance
● The resulting placement problem is NP-complete
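To make the three objectives concrete, here is a minimal sketch that scores one candidate placement. The slides do not say how the objectives are combined, so the weighted sum, the `placement_score` name, and the `distance` callback are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import pstdev


def placement_score(placement, servers, related, load, distance,
                    weights=(1.0, 1.0, 1.0)):
    """Score a placement; higher is better (hypothetical weighted sum).

    placement: {(component, replica_index): server}
    servers:   all available servers, including empty ones
    related:   set of (component, component) pairs that talk to each other
    load:      {component: load generated by one replica}
    distance:  function (server, server) -> geographical distance
    """
    w_avail, w_latency, w_load = weights
    pairs = list(combinations(placement, 2))

    # Objective 1: replicas of the same component should be far apart.
    replica_spread = sum(
        distance(placement[a], placement[b])
        for a, b in pairs if a[0] == b[0]
    )
    # Objective 2: related components should be close to each other.
    related_distance = sum(
        distance(placement[a], placement[b])
        for a, b in pairs
        if (a[0], b[0]) in related or (b[0], a[0]) in related
    )
    # Objective 3: load should be spread evenly over all servers.
    per_server = {s: 0.0 for s in servers}
    for (component, _), server in placement.items():
        per_server[server] += load[component]
    imbalance = pstdev(list(per_server.values()))

    return (w_avail * replica_spread
            - w_latency * related_distance
            - w_load * imbalance)
```

Enumerating every placement and keeping the best score is only feasible for toy inputs, which is why a decentralized heuristic is attractive.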
10. Architecture overview
● An agent on each server
– Starts, stops, and monitors the components
– Makes decisions on behalf of the components
● An agent communicates with other agents
– Routing table
– Status of the server (resource usage)
[Figure: servers, each running an agent and hosting components; the agents exchange information by gossiping and broadcast]
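The exchange of server status between agents could look like the sketch below. It is an illustration only: the `Agent` class, the push-pull exchange, and the per-server version counter are assumptions, not the protocol used by the authors. The version is incremented only by the server that owns the entry, so no clock synchronization between servers is needed.

```python
import random


class Agent:
    """Hypothetical agent that gossips the rent of every server it knows."""

    def __init__(self, server_id, rent):
        self.server_id = server_id
        # server_id -> (version, rent); an agent always knows its own entry.
        self.rents = {server_id: (0, rent)}

    def update_own_rent(self, version, rent):
        self.rents[self.server_id] = (version, rent)

    def merge(self, other_rents):
        # Keep the freshest entry known for each server.
        for sid, (version, rent) in other_rents.items():
            if sid not in self.rents or self.rents[sid][0] < version:
                self.rents[sid] = (version, rent)

    def gossip_with(self, peer):
        # Push-pull: both sides end up with the union of their knowledge.
        snapshot = dict(self.rents)
        self.merge(peer.rents)
        peer.merge(snapshot)


def gossip_round(agents, rng):
    """Each agent contacts one randomly chosen peer."""
    for agent in agents:
        peer = rng.choice([a for a in agents if a is not agent])
        agent.gossip_with(peer)
```

Calling `gossip_round(agents, random.Random())` once per epoch spreads rent updates without any central board.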
11. An economic approach
● Time is split into epochs (no synchronization between servers)
● Servers charge a virtual rent for hosting a component according to
– Current resource usage (I/O, CPU, ...) of the server
– Technical factors (HW, connectivity, ...)
– Non-technical factors (country stability, ...)
● Components
– Pay virtual rent at each epoch
– Gain virtual money by processing requests
– Make decisions based on their balance (= gain – rent): replicate, migrate, suicide, or stay
● Virtual rents are updated by gossiping (no centralized board)
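The rent and balance bookkeeping can be sketched as follows. The slides do not give the rent formula, so the pricing rule below (a base price scaled by the busiest resource) and the names `Server`, `Component`, `base_price`, and `price_per_request` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    cpu_usage: float   # fraction of CPU in use, between 0 and 1
    io_usage: float    # fraction of I/O capacity in use, between 0 and 1
    base_price: float  # assumed to lump technical and non-technical factors

    def rent(self):
        # Assumed rule: a busier server charges a higher virtual rent.
        usage = max(self.cpu_usage, self.io_usage)
        return self.base_price * (1.0 + usage)


@dataclass
class Component:
    price_per_request: float  # virtual money gained per processed request

    def balance(self, requests_served, rent):
        # balance = gain - rent, evaluated once per epoch
        gain = requests_served * self.price_per_request
        return gain - rent
```

For example, a server at 50% CPU with a base price of 10 charges 15 per epoch; a component earning 0.5 per request needs more than 30 requests in that epoch to keep a positive balance.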
12. Economic model
● Replication of a component
– If minimum availability is not reached
– If b' > 0 for the last n epochs
● Migration/Suicide of a component
– If balance c < 0 for the last n epochs
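The rules above can be sketched as a per-epoch decision function. Two points are assumptions: the quantities written b' and c on the slide are both treated here as the component's recorded balance, and the choice between migration and suicide (suicide only when availability stays sufficient without this replica) is not stated on the slide.

```python
from enum import Enum


class Action(Enum):
    REPLICATE = "replicate"
    MIGRATE = "migrate"
    SUICIDE = "suicide"
    STAY = "stay"


def decide(balances, n, availability, min_availability,
           availability_without_me):
    """Pick an action for one component at the end of an epoch.

    balances: balance of the component at each past epoch, oldest first
    n:        number of consecutive epochs that must agree
    """
    # Availability comes first: replicate until the minimum is reached.
    if availability < min_availability:
        return Action.REPLICATE

    recent = list(balances)[-n:]
    if len(recent) < n:
        return Action.STAY  # not enough history yet

    if all(b > 0 for b in recent):
        return Action.REPLICATE  # consistently profitable: add a replica
    if all(b < 0 for b in recent):
        # Assumed tie-break: disappear only if the others suffice.
        if availability_without_me >= min_availability:
            return Action.SUICIDE
        return Action.MIGRATE  # look for a cheaper server instead
    return Action.STAY
```

Requiring n consecutive epochs avoids reacting to a single noisy epoch.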
13. Availability (i)
● Increase availability by increasing geographical diversity
● Handled by replication
– Granularity: rack, room, datacenter, country, ...
– Label: NA-US-NY1-C01-R12-S02
● Each component must satisfy a minimum availability
● S_i is the set of servers hosting a replica of component i
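A location label such as NA-US-NY1-C01-R12-S02 can be split into its hierarchy levels as sketched below. The field names (continent, country, datacenter, room, rack, server) are a guess based on the granularity list above; the slides do not name the fields.

```python
from typing import NamedTuple


class Location(NamedTuple):
    # Field names are assumptions; only the label format comes from the slide.
    continent: str
    country: str
    datacenter: str
    room: str
    rack: str
    server: str


def parse_label(label):
    """Split a label like 'NA-US-NY1-C01-R12-S02' into its six levels."""
    parts = label.split("-")
    if len(parts) != 6:
        raise ValueError(f"expected 6 levels, got {len(parts)}: {label!r}")
    return Location(*parts)


def shared_levels(a, b):
    """Count how many leading levels two locations have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count
```

Two servers in the same datacenter but in different rooms share three levels; the fewer levels two replica hosts share, the more geographically diverse they are.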
14. Availability (ii)
● Similarity: computes the distance between 2 servers
● Diversity:
● Choosing a candidate server j
● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component
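The formulas for similarity, diversity, and candidate selection did not survive in this transcript. The sketch below is therefore not the authors' definition: it assumes similarity is the fraction of leading label levels two servers share, diversity is its complement, and a candidate server j is scored as g_j times its total diversity to the current replicas, minus its rent. The function names and the example labels are hypothetical.

```python
def similarity(label_a, label_b):
    """Fraction of leading label levels shared by two servers (assumed)."""
    a, b = label_a.split("-"), label_b.split("-")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def diversity(label_a, label_b):
    """Assumed complement of similarity: 1.0 means nothing in common."""
    return 1.0 - similarity(label_a, label_b)


def choose_candidate(replica_servers, candidates, g, rent):
    """Return the candidate server with the best assumed net benefit.

    replica_servers: labels of the servers in S_i
    candidates:      labels of servers that could host a new replica
    g:               {label: proximity weight g_j}
    rent:            {label: virtual rent of the server}
    """
    def net_benefit(j):
        spread = sum(diversity(j, s) for s in replica_servers)
        return g[j] * spread - rent[j]

    return max(candidates, key=net_benefit)
```

With equal weights and rents, a replica currently in a New York rack is best complemented by a server on another continent rather than by its rack neighbor.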
15. Summary
● High Availability: software, hardware, network failures
– Geographically aware placement (net benefit maximization)
– Minimum availability level per component
● Scalability: growing load, peaks, scaling down, ...
– Quick replication of busy components
● Load Balancing: load has to be shared by all available servers
– Replication of busy components
– Migration of less busy components
– Reach equilibrium when load is stable
● No synchronization, fully decentralized