An economic approach for
scalable and highly-available
distributed applications

 Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

 CLOUD 2010, July 5-10 2010, Miami, Florida, USA

 nicolas.bonvin@epfl.ch
 LSIR - EPFL
Introduction

    ●    A distributed application = many (remote) components
    ●    A component is
                     –   A piece of software
                     –   Loosely coupled
                     –   Self-contained
    ●    e.g. a SOA-based application

    [Diagram: example application made of components A, B, C, D and E]

2   EPFL – LSIR - Nicolas Bonvin
Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?

    [Diagram: components A, B, C, D and E waiting to be assigned to servers 1 to 4]

Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?
                     –   Random placement?

    [Diagram: random placement on servers 1 to 4: server 1 hosts A and C, server 2 is empty, server 3 hosts D, server 4 hosts B and E]

                                     Bad resource utilization!

Placement: first problem

    ●    Where should the components be placed to maximize the
         application performance?
                     –   "Clever" random placement?

    [Diagram: placement on servers 1 to 4: server 1 hosts A and C, server 2 hosts E, server 3 hosts D, server 4 hosts B]

                D and E should probably be hosted on the same server!

                                       Not always optimal!

Even more components!

    ●    High Availability: software, hardware, network failures
    ●    Scalability: growing load, peaks, scaling down, ...

                                               Replication!

    [Diagram: the same application with replicated components: three replicas of A and two replicas each of B, C, D and E]

Placement: second problem

    ●    Where should the components be placed to maximize the
         application availability?

    [Diagram: the replicas of A, B, C, D and E waiting to be assigned to Rack 1 and Rack 2 in Datacenter 1 and to Rack 3 and Rack 4 in Datacenter 2]

Multi-Objective Optimization Problem

    ●    Maximize the geographical distance of replicas
                     –   Greater availability
    ●    Minimize the geographical distance between related
         components
                     –   Lower latency
    ●    Balance the load (disk I/O, network I/O, CPU) between the
         servers
                     –   Better application performance


                                                NP-Complete




Scarce: a framework to build scalable cloud applications
Architecture overview

     ●    An agent on each server
                      –    Starts/stops/monitors the components
                      –    Takes decisions on behalf of the components
     ●    An agent communicates with other agents
                      –    Routing table
                      –    Status of the server (resource usage)

     [Diagram: a server hosting components A, B and E together with its agent; the agents of all servers communicate by gossiping + broadcast]

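The status exchange between agents could look roughly like the push-gossip sketch below. It is only an illustration: the `Agent` and `Status` classes, the per-entry version counter, and the `fanout` parameter are assumptions made for this example. The slides only state that agents exchange a routing table and the server status by gossiping and broadcast.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Status:
    """Resource usage reported by one server (hypothetical fields)."""
    cpu: float     # CPU load in [0, 1]
    version: int   # increases every time the owner refreshes its status


@dataclass
class Agent:
    name: str
    view: dict = field(default_factory=dict)  # server name -> Status

    def refresh(self, cpu: float) -> None:
        """Publish a newer status entry for this agent's own server."""
        old = self.view.get(self.name)
        version = old.version + 1 if old else 1
        self.view[self.name] = Status(cpu, version)

    def merge(self, remote_view: dict) -> None:
        """Keep, for every server, the entry with the highest version."""
        for server, status in remote_view.items():
            local = self.view.get(server)
            if local is None or status.version > local.version:
                self.view[server] = status

    def gossip(self, peers: list, fanout: int, rng: random.Random) -> None:
        """Push the local view to `fanout` randomly chosen peers."""
        for peer in rng.sample(peers, min(fanout, len(peers))):
            peer.merge(self.view)


def run_round(agents: list, fanout: int, rng: random.Random) -> None:
    """One gossip round: every agent pushes its view once."""
    for agent in agents:
        others = [a for a in agents if a is not agent]
        agent.gossip(others, fanout, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [Agent(f"server-{i}") for i in range(6)]
    for i, agent in enumerate(agents):
        agent.refresh(cpu=0.1 * i)
    for _ in range(10):
        run_round(agents, fanout=2, rng=rng)
    print(sorted(agents[0].view))
```

Versioned entries let an agent discard stale information without any clock synchronization between servers, which matches the decentralized setting described above.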
An economic approach

     ●    Time is split into epochs (no synchronization between servers)
     ●    Servers charge a virtual rent for hosting a component according to
                      –   Current resource usage (I/O, CPU, ...) of the server
                      –   Technical factors (HW, connectivity, ...)
                      –   Non-technical factors (country stability, ...)


     ●    Components
                      –   Pay virtual rent at each epoch
                      –   Gain virtual money by processing requests
                      –   Take decisions based on their balance (= gain - rent)
                                    ●   Replicate, migrate, suicide, stay

     ●    Virtual rents are updated by gossiping (no centralized board)
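A minimal sketch of the per-epoch accounting follows. The slides do not give the rent formula, so the linear dependence on usage, the `base_rent` field (assumed to fold in the technical and non-technical factors), and the constant `price_per_request` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    base_rent: float  # assumed to fold in hardware, connectivity and non-technical factors
    usage: float      # current resource usage in [0, 1]

    def rent(self) -> float:
        """Virtual rent for one epoch; grows with the server's load (assumed linear)."""
        return self.base_rent * (1.0 + self.usage)


@dataclass
class Component:
    price_per_request: float  # virtual money earned per processed request (assumed constant)

    def balance(self, requests: int, server: Server) -> float:
        """Balance for one epoch: gain from processed requests minus the rent paid."""
        gain = requests * self.price_per_request
        return gain - server.rent()


if __name__ == "__main__":
    busy = Server(base_rent=10.0, usage=0.5)
    component = Component(price_per_request=0.5)
    for requests in (10, 30, 100):
        print(requests, component.balance(requests, busy))
```

Because a loaded server charges more, a component with few requests on a busy server ends up with a negative balance, which is what pushes it towards cheaper servers.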

Economic model




     ●    Replication of a component
                      –   If minimum availability is not reached
                      –   If b' > 0 for last n epochs
     ●    Migration/Suicide of a component
                      –   If balance c < 0 for last n epochs
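The rules above can be sketched as a small decision function. The equations of this slide were not preserved, so the sketch uses a single per-epoch balance value, and the way it separates migration from suicide (a replica may only disappear if the remaining replicas still meet the minimum availability) is an assumption. The names `DecisionMaker` and `Action` are hypothetical.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    REPLICATE = "replicate"
    MIGRATE = "migrate"
    SUICIDE = "suicide"
    STAY = "stay"


class DecisionMaker:
    """Keeps the balances of the last n epochs and derives an action."""

    def __init__(self, n: int):
        self.history = deque(maxlen=n)  # older balances are dropped automatically

    def record(self, balance: float) -> None:
        self.history.append(balance)

    def decide(self, availability: float, min_availability: float,
               availability_without_me: float) -> Action:
        # Availability comes first: replicate until the minimum is reached.
        if availability < min_availability:
            return Action.REPLICATE
        # Wait until n epochs have been observed before an economic decision.
        if len(self.history) < self.history.maxlen:
            return Action.STAY
        if all(b > 0 for b in self.history):
            return Action.REPLICATE
        if all(b < 0 for b in self.history):
            # Assumption: disappear only if the other replicas are sufficient.
            if availability_without_me >= min_availability:
                return Action.SUICIDE
            return Action.MIGRATE
        return Action.STAY


if __name__ == "__main__":
    maker = DecisionMaker(n=3)
    for balance in (-1.0, -2.0, -0.5):
        maker.record(balance)
    print(maker.decide(availability=2.0, min_availability=1.0,
                       availability_without_me=0.5))
```

Requiring the same sign over n consecutive epochs avoids reacting to a single noisy epoch.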



Availability (i)

     ●    Increase availability by increasing geographical diversity
     ●    Handled by replication
                      –   Granularity: rack, room, datacenter, country, ...
                      –   Label: NA-US-NY1-C01-R12-S02
     ●    Each component must satisfy a minimum availability




     ●    S_i is the set of servers hosting a replica of component i
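To make the label format concrete, here is a small sketch that splits a label into levels and reports the coarsest level at which two servers differ. The meaning of the six fields (continent, country, datacenter, room, rack, server) is an assumption inferred from the example label and the granularity list. The function names and every label other than NA-US-NY1-C01-R12-S02 are made up for the example.

```python
# Assumed meaning of the six label fields, from coarsest to finest.
LEVELS = ("continent", "country", "datacenter", "room", "rack", "server")


def parse_label(label: str) -> dict:
    """Split a label such as 'NA-US-NY1-C01-R12-S02' into named levels."""
    parts = label.split("-")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} fields, got {len(parts)}: {label!r}")
    return dict(zip(LEVELS, parts))


def first_difference(label_a: str, label_b: str):
    """Return the coarsest level at which two servers differ, or None if equal."""
    a, b = parse_label(label_a), parse_label(label_b)
    for level in LEVELS:
        if a[level] != b[level]:
            return level
    return None


if __name__ == "__main__":
    reference = "NA-US-NY1-C01-R12-S02"
    for other in ("NA-US-NY1-C01-R12-S05",   # hypothetical: same rack
                  "NA-US-NY1-C01-R07-S02",   # hypothetical: same room, other rack
                  "EU-CH-GVA-C01-R12-S02"):  # hypothetical: other continent
        print(other, first_difference(reference, other))
```

Two replicas that first differ at a coarse level (country, datacenter) are unlikely to fail together, which is why such pairs contribute more to availability than two replicas in the same rack.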




Availability (ii)

     ●    Similarity: computes the distance between 2 servers




     ●    Diversity:

     ●    Choosing a candidate server j




     ●    g_j: weight related to the proximity of the server location to the
          geographical distribution of the client requests to the component
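The formulas for similarity, diversity and candidate choice were not preserved on this slide, so the following is only a sketch under stated assumptions: similarity counts the leading location fields two servers share, diversity is its complement, and a candidate j is scored as g_j times the diversity it adds to S_i, minus its virtual rent (a net-benefit style score). The function names, locations, weights and rents are all hypothetical.

```python
def similarity(a: tuple, b: tuple) -> int:
    """Number of leading location fields shared by two servers (assumed definition)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared


def diversity(a: tuple, b: tuple) -> int:
    """Complement of similarity: larger means geographically further apart."""
    return len(a) - similarity(a, b)


def choose_candidate(hosts: list, candidates: dict):
    """Pick the candidate j with the highest g_j * added diversity - rent_j.

    hosts      : locations of the servers already in S_i
    candidates : name -> (location, g_j, rent_j)
    """
    best_name, best_score = None, float("-inf")
    for name, (location, g, rent) in candidates.items():
        added = sum(diversity(location, host) for host in hosts)
        score = g * added - rent
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    hosts = [("NA", "US", "NY1", "C01", "R12", "S02")]
    candidates = {
        "same-rack": (("NA", "US", "NY1", "C01", "R12", "S07"), 1.0, 1.0),
        "other-dc": (("NA", "US", "SF1", "C03", "R01", "S01"), 1.0, 2.0),
        "other-continent": (("EU", "CH", "GVA", "C02", "R04", "S09"), 0.5, 2.0),
    }
    print(choose_candidate(hosts, candidates))
```

In this made-up example the remote continent adds the most diversity, but its low client-proximity weight g_j makes the other datacenter in the same country the better trade-off.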


Summary

     ●    High Availability: software, hardware, network failures
                      –   Geography-aware placement (net benefit maximization)
                      –   Minimum availability level per component


     ●    Scalability: growing load, peaks, scaling down, ...
                      –   Quick replication of busy components


     ●    Load Balancing: load has to be shared by all available servers
                      –   Replication of busy components
                      –   Migration of less busy components
                      –   Reach equilibrium when load is stable


     ●    No synchronization, fully decentralized

Evaluation
Evaluation: Setup

     ●    E-Ticketing application (print@home)




     ●    1 or 3 applications deployed in the cloud
     ●    7 or 15 servers (Intel Core i7 920, 2.67 GHz, 8 GB,
          Linux 2.6.32-trunk-amd64)
     ●    Servers dedicated to the components: 4 or 10

Static vs Dynamic placement (i)




Static vs Dynamic placement (ii)




Adaptability to new resources




     ●    1500 concurrent users



Fairness between applications




Conclusion

     ●    Framework for building cloud applications
     ●    Maximize cloud resource utilization
     ●    Maximize availability
     ●    React to sudden load changes
     ●    Elastic (add/remove resources)
     ●    No synchronization
     ●    Fully decentralized




Thank you!
