An economic approach for
scalable and highly-available
distributed applications

 Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

 CLOUD 2010, July 5-10 2010, Miami, Florida, USA

 nicolas.bonvin@epfl.ch
 LSIR - EPFL
Introduction

    ●    A distributed application = many (remote) components
    ●    A component is
                     –   A piece of software
                     –   Loosely coupled
                     –   Self-Contained
    ●    e.g. a SOA-based application

    [Figure: example application made of five components: A, B, C, D and E]

2   EPFL – LSIR - Nicolas Bonvin
Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?

    [Figure: components A, B, C, D and E waiting to be placed on servers 1 to 4]
Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?
                     –   Random placement?

    [Figure: random placement on servers 1 to 4: server 1 hosts A and C, server 2 is empty, server 3 hosts D, server 4 hosts B and E]

                                     Bad resource utilization!
Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?
                     –   « Clever » random placement?

    [Figure: placement on servers 1 to 4: server 1 hosts A and C, server 2 hosts E, server 3 hosts D, server 4 hosts B]

                D and E should probably be hosted on the same server!

                                       Not always optimal!
Even more components!

    ●    High Availability: software, hardware, network failures
    ●    Scalability: growing load, peaks, scaling down, ...

                                               Replication!

    [Figure: replicated application with three replicas of A and two replicas each of B, C, D and E]
Placement: second problem

    ●    Where should the components be placed to maximize the
         application availability?

    [Figure: replicas of A, B, C, D and E to be placed across Rack 1 and Rack 2 in Datacenter 1 and Rack 3 and Rack 4 in Datacenter 2]
Multi-Objective Optimization Problem

    ●    Maximize the geographical distance between replicas
                     –   Greater availability
    ●    Minimize the geographical distance between related
         components
                     –   Lower latency
    ●    Balance the load (disk I/O, network I/O, CPU) between the
         servers
                     –   Better application performance


                                                NP-Complete
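
To give a feel for why exhaustive placement does not scale, here is a minimal sketch that brute-forces only the load-balancing objective for five components on four servers. The per-component loads and the imbalance metric are illustrative assumptions, not values from the slides; the availability and latency objectives are left out.

```python
from itertools import product

# Hypothetical per-component loads in arbitrary units (not from the slides).
LOAD = {"A": 4, "B": 2, "C": 2, "D": 3, "E": 1}
SERVERS = [1, 2, 3, 4]


def imbalance(placement):
    """Difference between the most and the least loaded server."""
    per_server = {server: 0 for server in SERVERS}
    for component, server in placement.items():
        per_server[server] += LOAD[component]
    return max(per_server.values()) - min(per_server.values())


def exhaustive_search():
    """Try every assignment of components to servers and keep the best one."""
    names = sorted(LOAD)
    best, best_score, tried = None, None, 0
    for choice in product(SERVERS, repeat=len(names)):
        tried += 1
        placement = dict(zip(names, choice))
        score = imbalance(placement)
        if best_score is None or score < best_score:
            best, best_score = placement, score
    return tried, best, best_score


tried, best, best_score = exhaustive_search()
print(f"{tried} placements tried, best imbalance {best_score}: {best}")
```

With m servers and n components the loop visits m^n placements (4^5 = 1024 here), so the search space explodes as soon as replicas and more servers are added.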




Scarce:
a framework to build scalable cloud applications
Architecture overview

     ●    An agent on each server
                      –    Starts, stops and monitors the components
                      –    Makes decisions on behalf of the components
     ●    An agent communicates with other agents
                      –    Routing table
                      –    Status of the server (resource usage)


     [Figure: a server hosting components A, B and E together with its agent; the agents of all servers exchange information by gossiping and broadcast]
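
The status exchange between agents could look roughly like the sketch below. It is only an illustration: the `Agent` class, the version counters and the push-pull exchange are assumptions made for this example, and the routing table and the broadcast channel are not modelled.

```python
import random


class Agent:
    """One agent per server; keeps a local view of every server's status."""

    def __init__(self, server, usage):
        self.server = server
        # view maps server name -> (version, resource usage in [0, 1])
        self.view = {server: (1, usage)}

    def update_local(self, usage):
        """Record a fresh local measurement under a higher version number."""
        version, _ = self.view[self.server]
        self.view[self.server] = (version + 1, usage)

    def merge(self, remote_view):
        """Keep, for each server, the entry with the highest version."""
        for server, entry in remote_view.items():
            if server not in self.view or entry[0] > self.view[server][0]:
                self.view[server] = entry


def exchange(a, b):
    """Push-pull exchange: both agents end up with the merged view."""
    snapshot = dict(a.view)
    a.merge(b.view)
    b.merge(snapshot)


def gossip_round(agents, rng):
    """Each agent exchanges its view with one randomly chosen peer."""
    for agent in agents:
        peer = rng.choice([other for other in agents if other is not agent])
        exchange(agent, peer)


agents = [Agent(f"server-{i}", usage=0.1 * i) for i in range(1, 5)]
rng = random.Random(0)
for _ in range(5):
    gossip_round(agents, rng)
```

No agent needs a global clock or a central registry: each one only talks to a random peer per round, which matches the "no synchronization, fully decentralized" goal of the framework.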


An economic approach

     ●    Time is split into epochs (no synchronization between servers)
     ●    Servers charge a virtual rent for hosting a component according to
                      –   Current resource usage (I/O, CPU, ...) of the server
                      –   Technical factors (HW, connectivity, ...)
                      –   Non-technical factors (country stability, ...)


     ●    Components
                      –   Pay virtual rent at each epoch
                      –   Gain virtual money by processing requests
                      –   Make decisions based on their balance (= gain – rent)
                                    ●   Replicate, migrate, suicide, stay

     ●    Virtual rents are updated by gossiping (no centralized board)
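
The rent and balance bookkeeping can be sketched as follows. The rent formula (a base price scaled by current usage) and the names `base_price`, `usage` and `price_per_request` are assumptions for illustration; the slides only state which factors influence the rent.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical: one number folding technical and non-technical factors.
    base_price: float
    # Current resource usage of the server, in [0, 1].
    usage: float

    def rent(self) -> float:
        """Virtual rent charged per epoch; grows with the server's usage."""
        return self.base_price * (1.0 + self.usage)


@dataclass
class Component:
    # Hypothetical virtual money earned per processed request.
    price_per_request: float

    def balance(self, requests: int, server: Server) -> float:
        """Balance for one epoch: gain from requests minus the rent paid."""
        gain = requests * self.price_per_request
        return gain - server.rent()


server = Server(base_price=10.0, usage=0.5)
component = Component(price_per_request=0.5)
print(component.balance(requests=40, server=server))
```

A busy server becomes more expensive, so components with a low gain are pushed towards cheaper, less loaded servers.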

Economic model




     ●    Replication of a component
                      –   If minimum availability is not reached
                      –   If b' > 0 for the last n epochs
     ●    Migration/Suicide of a component
                      –   If balance c < 0 for the last n epochs
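
The rules above can be read as a small per-epoch decision function. This is a sketch under assumptions: b' is interpreted here as the balance a new replica would obtain, and the choice between migration and suicide (via the hypothetical flags `cheaper_server_exists` and `can_drop_replica`) is not specified on the slide.

```python
def decide(balances, replica_balances, n, availability, min_availability,
           cheaper_server_exists, can_drop_replica):
    """Return one of 'replicate', 'migrate', 'suicide' or 'stay'.

    balances          -- the component's balance per epoch, oldest first
    replica_balances  -- assumed reading of b': balance a new replica would get
    n                 -- number of consecutive epochs to look at (n >= 1)
    """
    # Rule 1: replicate whenever the minimum availability is not reached.
    if availability < min_availability:
        return "replicate"
    last = balances[-n:]
    last_replica = replica_balances[-n:]
    # Rule 2: replicate if a replica would have been profitable for n epochs.
    if len(last_replica) == n and all(b > 0 for b in last_replica):
        return "replicate"
    # Rule 3: a component losing money for n epochs migrates or removes itself.
    if len(last) == n and all(b < 0 for b in last):
        if cheaper_server_exists:
            return "migrate"
        if can_drop_replica:
            return "suicide"
    return "stay"


print(decide([-1, -2, -3], [-1, -1, -1], 3, 10, 10, True, True))
```

Requiring the condition to hold for n consecutive epochs keeps a component from reacting to a single noisy measurement.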



Availability (i)

     ●    Increase availability by increasing geographical diversity
     ●    Handled by replication
                      –   Granularity: rack, room, datacenter, country, ...
                      –   Label: NA-US-NY1-C01-R12-S02
     ●    Each component must satisfy a minimum availability




     ●    S_i is the set of servers hosting a replica of component i
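
As an illustration of how location labels could feed an availability check, the sketch below scores a replica set by summing a pairwise diversity over S_i. The formula on the original slide is not recoverable from this text, so the integer diversity measure and the threshold test are assumptions.

```python
from itertools import combinations


def levels(label):
    """Split a location label such as NA-US-NY1-C01-R12-S02 into its fields."""
    return label.split("-")


def shared_levels(a, b):
    """Number of leading label fields two servers have in common."""
    count = 0
    for x, y in zip(levels(a), levels(b)):
        if x != y:
            break
        count += 1
    return count


def diversity(a, b):
    """Assumed measure: label depth minus the number of shared leading fields."""
    return len(levels(a)) - shared_levels(a, b)


def availability(replica_servers):
    """Assumed availability of a component: pairwise diversity summed over S_i."""
    return sum(diversity(a, b) for a, b in combinations(replica_servers, 2))


def meets_minimum(replica_servers, threshold):
    """True if the replica set reaches the required minimum availability."""
    return availability(replica_servers) >= threshold


same_rack = ["NA-US-NY1-C01-R12-S02", "NA-US-NY1-C01-R12-S05"]
two_continents = ["NA-US-NY1-C01-R12-S02", "EU-CH-GE1-C02-R03-S01"]
print(availability(same_rack), availability(two_continents))
```

Two replicas in the same rack score far lower than two replicas on different continents, which is the behaviour the geographical-diversity bullet asks for.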




Availability (ii)

     ●    Similarity: computes the distance between 2 servers




     ●    Diversity:

     ●    Choosing a candidate server j




     ●    g_j : weight related to the proximity of the server location to the
          geographical distribution of the client requests to the component
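
Candidate selection can be sketched as a net benefit maximization. The exact formulas are missing from this text, so the scoring below (g_j times the summed diversity to the existing replicas, minus the rent of server j) is an assumption, as are the helper names and the sample rents and weights.

```python
def similarity(a, b):
    """Assumed similarity: number of leading label fields two servers share."""
    count = 0
    for x, y in zip(a.split("-"), b.split("-")):
        if x != y:
            break
        count += 1
    return count


def diversity(a, b):
    """Assumed diversity: label depth minus similarity."""
    return len(a.split("-")) - similarity(a, b)


def net_benefit(candidate, replicas, rent, g):
    """Weighted diversity gained by adding the candidate, minus its rent."""
    gained = sum(diversity(candidate, server) for server in replicas)
    return g[candidate] * gained - rent[candidate]


def choose_candidate(candidates, replicas, rent, g):
    """Pick the candidate server j with the highest net benefit."""
    return max(candidates, key=lambda j: net_benefit(j, replicas, rent, g))


replicas = ["NA-US-NY1-C01-R12-S02"]
near = "NA-US-NY1-C01-R12-S05"   # same rack as the existing replica
far = "EU-CH-GE1-C02-R03-S01"    # different continent
rent = {near: 1.0, far: 3.0}     # hypothetical virtual rents
g = {near: 1.0, far: 1.0}        # hypothetical client-proximity weights
print(choose_candidate([near, far], replicas, rent, g))
```

Lowering g for the distant server (few clients nearby) can tip the choice back to the nearby one, which is how the weight trades availability against latency.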


Summary

     ●    High Availability: software, hardware, network failures
                      –   Geographically aware placement (net benefit maximization)
                      –   Minimum availability level per component


     ●    Scalability: growing load, peaks, scaling down, ...
                      –   Quick replication of busy components


     ●    Load Balancing: load has to be shared by all available servers
                      –   Replication of busy components
                      –   Migration of less busy components
                      –   Reach equilibrium when load is stable


     ●    No synchronization, fully decentralized

Evaluation
Evaluation: Setup

     ●    E-Ticketing application (print@home)




     ●    1 or 3 applications deployed in the cloud
     ●    7 or 15 servers (Intel Core i7 920, 2.67 GHz, 8 GB,
          Linux 2.6.32-trunk-amd64)
     ●    Servers dedicated to the components: 4 or 10

Static vs Dynamic placement (i)




Static vs Dynamic placement (ii)




Adaptability to new resources




     ●    1500 concurrent users



Fairness between applications




Conclusion

     ●    Framework for building cloud applications
     ●    Maximize cloud resource utilization
     ●    Maximize availability
     ●    React to sudden load changes
     ●    Elastic (add/remove resources)
     ●    No synchronization
     ●    Fully decentralized




Thank you!
