Autonomic SLA-driven Provisioning for Cloud Applications

Significant progress has been made in the automated allocation of cloud resources. However, the performance of applications may be poor in peak load periods unless their cloud resources are dynamically adjusted. Moreover, although the cloud resources dedicated to different applications are virtually isolated, performance fluctuations do occur because of resource sharing and software or hardware failures (e.g. unstable virtual machines, power outages, etc.). We propose a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. According to our approach, the dynamic economic fitness of a Web service determines whether it is replicated, migrated to another server, or deleted. The economic fitness of a Web service depends on its individual performance constraints, its load, and the utilization of the resources where it resides. Cascading performance objectives are dynamically calculated for individual tasks in the application workflow according to the user requirements.

  1. Autonomic SLA-driven Provisioning for Cloud Applications
      Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer
      CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA
      nicolas.bonvin@epfl.ch
      LSIR - EPFL
  2. Cloud Apps – Issue #1: Placement
      ● A distributed, component-based application running on an elastic infrastructure
      [Figure: components C1, C2, C3, C4]
      EPFL – LSIR - Nicolas Bonvin
  3. Cloud Apps – Issue #1: Placement
      ● A distributed, component-based application running on an elastic infrastructure
      [Figure: components C1–C4 deployed on VM1–VM4]
  4. Cloud Apps – Issue #1: Placement
      ● A distributed, component-based application running on an elastic infrastructure
      ● Performance of C1, C2 and C3 is probably lower than that of C4
      ● No information about other VMs colocated on the same server!
      [Figure: C1–C4 on VM1–VM4, hosted on Server 1 and Server 2]
  5. Cloud Apps – Issue #1: Placement
      ● A distributed, component-based application running on an elastic infrastructure
      ● Performance of C1, C2 and C3 is probably lower than that of C4
      ● No information about other VMs colocated on the same server!
      [Figure: C1–C4 on VM1–VM4, hosted on Server 1 and Server 2]
      No control over placement
  6. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
      [Figure: C1 replicas on VM1–VM4; response times 100 ms, 100 ms, 100 ms, 100 ms]
  7. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
        – VM performance can vary by a factor of up to 4! [Dej2009]
      ● Physical server, hypervisor, storage, ...
      [Figure: C1 replicas on VM1–VM4; response times 100 ms, 140 ms, 100 ms, 100 ms]
  8. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
        – VM performance can vary by a factor of up to 4! [Dej2009]
      ● Physical server, hypervisor, storage, ...
      ● Component overloaded
      [Figure: C1 replicas on VM1–VM4; response times 130 ms, 140 ms, 100 ms, 100 ms]
  9. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
        – VM performance can vary by a factor of up to 4! [Dej2009]
      ● Physical server, hypervisor, storage, ...
      ● Component overloaded
      ● Component bug, crash, deadlock, ...
      [Figure: C1 replicas on VM1–VM4; response times 130 ms, 140 ms, 100 ms, infinity]
  10. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
        – VM performance can vary by a factor of up to 4! [Dej2009]
      ● Physical server, hypervisor, storage, ...
      ● Component overloaded
      ● Component bug, crash, deadlock, ...
      ● Failure of C1 on VM4 -> load is rebalanced
      [Figure: C1 replicas on VM1–VM4; response times 140 ms, 150 ms, 130 ms, infinity]
  11. Cloud Apps – Issue #2: Instability
      ● Load-balanced traffic to 4 identical components on 4 identical VMs
        – VM performance can vary by a factor of up to 4! [Dej2009]
      ● Physical server, hypervisor, storage, ...
      ● Component overloaded
      ● Component bug, crash, deadlock, ...
      ● Failure of C1 on VM4 -> load is rebalanced
      [Figure: C1 replicas on VM1–VM4; response times 140 ms, 150 ms, 130 ms, infinity]
      The application should react early!
  12. Cloud Apps – Overview
      ● Build for failures
        – Do not trust the underlying infrastructure
        – Do not trust your components either!
      ● Components should adapt to changing conditions
        – Quickly
        – Automatically
        – e.g. by replacing a wonky VM with a new one
  13. Scarce: a framework to build scalable cloud applications
  14. Architecture Overview
      ● An agent on each server / VM
        – starts/stops/monitors the components
        – takes decisions on behalf of the components
      ● An agent communicates with other agents
        – Routing table
        – Status of the server (resource usage)
      [Figure: agents on several servers, hosting components A, B and E, exchanging state by gossiping + broadcast]
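The slide does not specify the gossip protocol. A minimal sketch of the gossip half, assuming each agent keeps one status record per server, tagged with a per-server version counter, and that a newer version always replaces an older one; `ServerStatus`, `Agent`, `merge` and `gossip_round` are hypothetical names, and the broadcast channel is not modelled.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServerStatus:
    """Hypothetical status record that agents gossip about one server."""
    version: int                          # bumped by the owning agent each epoch (assumption)
    cpu_usage: float                      # fraction of CPU in use, 0.0-1.0
    components: frozenset = frozenset()   # components currently hosted there


@dataclass
class Agent:
    server_id: str
    view: dict = field(default_factory=dict)  # server_id -> ServerStatus

    def merge(self, remote_view: dict) -> None:
        """Keep, for every server, whichever status record is newer."""
        for sid, status in remote_view.items():
            local = self.view.get(sid)
            if local is None or status.version > local.version:
                self.view[sid] = status


def gossip_round(agents: list, rng: random.Random) -> None:
    """Push gossip: every agent sends its whole view to one random peer."""
    for agent in agents:
        peer = rng.choice([a for a in agents if a is not agent])
        peer.merge(agent.view)


# Three agents that initially only know their own server.
rng = random.Random(42)
agents = [Agent(f"s{i}", {f"s{i}": ServerStatus(1, 0.2 * i)}) for i in range(3)]
for _ in range(30):
    gossip_round(agents, rng)
```

No agent ever reads a central table: every server's status reaches every agent only through repeated pairwise merges, which matches the slide's "no centralized board" design.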
  15. An economic approach
      ● Time is split into epochs (no synchronization between servers)
      ● Servers charge a virtual rent for hosting a component according to
        – Current resource usage (I/O, CPU, ...) of the server
        – Technical factors (HW, connectivity, ...)
        – Non-technical factors (country stability, ...)
  16. An economic approach
      ● Time is split into epochs (no synchronization between servers)
      ● Servers charge a virtual rent for hosting a component according to
        – Current resource usage (I/O, CPU, ...) of the server
        – Technical factors (HW, connectivity, ...)
        – Non-technical factors (country stability, ...)
      ● Components
        – Pay virtual rent at each epoch
        – Gain virtual money by processing requests
        – Take decisions based on their balance (= gain - rent)
          ● Replicate, migrate, suicide, stay
      ● Virtual rents are updated by gossiping (no centralized board)
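The slide lists the four possible decisions but not the rule that picks one. A minimal sketch, assuming a hypothetical replication threshold and the rule order below (replicate when clearly profitable, stay when breaking even, migrate when a cheaper server would make the replica profitable, otherwise suicide); `decide`, `Action` and the parameters are illustrative names, not Scarce's API.

```python
from enum import Enum


class Action(Enum):
    REPLICATE = "replicate"
    MIGRATE = "migrate"
    SUICIDE = "suicide"
    STAY = "stay"


def decide(gain: float, rent: float, cheapest_other_rent: float,
           replicate_threshold: float = 10.0) -> Action:
    """Choose a component's action for this epoch from its balance (= gain - rent).

    The threshold and the order of the rules are illustrative assumptions.
    """
    balance = gain - rent
    if balance > replicate_threshold:
        return Action.REPLICATE   # demand clearly exceeds what one replica covers
    if balance >= 0:
        return Action.STAY        # the replica pays for itself where it is
    if gain - cheapest_other_rent >= 0:
        return Action.MIGRATE     # it would break even on a cheaper server
    return Action.SUICIDE         # unprofitable everywhere: remove this replica
```

A real policy would also have to keep the minimum number of replicas needed for availability before letting one commit suicide.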
  17. Economic model (i)
      ● The rent of a server is different for each component!
  18. Economic model (ii)
      [Figure: VM1 at CPU 70%, I/O 20%; VM2 at CPU 25%, I/O 65%; component C1 (CPU 30%, I/O 5%) to be placed on one of them]
      ● VM1 and VM2 have an « identical » resource usage: 45%
      ● Server rent = server's resource usage, weighted by the component's weights
        – Rent for C1 @ VM1 > rent for C1 @ VM2
      ● Multiplexing of server resources
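The rent comparison on this slide can be checked with a little arithmetic. The sketch assumes that a component's rent is proportional to the server's per-resource usage weighted by the component's own resource needs (the slide only says "with the component's weights"); the proportionality constant is left out and `weighted_usage` is a hypothetical helper.

```python
def weighted_usage(server_usage: dict, component_needs: dict) -> float:
    """Server usage averaged over resources, weighted by the component's needs."""
    total_need = sum(component_needs.values())
    return sum(server_usage[res] * need / total_need
               for res, need in component_needs.items())


vm1 = {"cpu": 70, "io": 20}
vm2 = {"cpu": 25, "io": 65}
c1 = {"cpu": 30, "io": 5}                   # C1 is CPU-bound

plain_vm1 = sum(vm1.values()) / len(vm1)    # 45.0
plain_vm2 = sum(vm2.values()) / len(vm2)    # 45.0
rent_c1_vm1 = weighted_usage(vm1, c1)       # (30*70 + 5*20) / 35, about 62.86
rent_c1_vm2 = weighted_usage(vm2, c1)       # (30*25 + 5*65) / 35, about 30.71
```

The unweighted averages are equal, but once usage is weighted by what C1 actually consumes, the CPU-loaded VM1 becomes roughly twice as expensive for C1 as VM2, which is the slide's point.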
  19. Economic model (iii)
      ● Choosing a candidate server j during replication/migration of a component i
        – net benefit maximization
      ● 2 optimization goals:
        – high availability through geographical diversity of replicas
        – low latency by grouping related components
      ● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component
      ● S_i is the set of servers hosting a replica of component i
  20. SLA Performance Guarantees (i)
      ● Each component has its own SLA constraints
      ● SLAs are derived directly from those of the entry components
      [Figure: workflow of components C1–C5; entry component C1 with SLA: 500 ms]
      ● Resp. Time = Service Time + max(Resp. Time of Dependencies)
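A small worked instance of the response-time formula. The dependency edges (C1 calls C2 and C3, C2 calls C4, C3 calls C5) are one hypothetical reading of the slide's diagram, and the service times are made-up values; taking the max over dependencies treats the calls to them as parallel.

```python
# Hypothetical reading of the workflow diagram; service times are illustrative, in ms.
DEPENDENCIES = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"], "C4": [], "C5": []}
SERVICE_TIME_MS = {"C1": 50, "C2": 120, "C3": 80, "C4": 200, "C5": 150}


def response_time(component: str) -> float:
    """Resp. Time = Service Time + max(Resp. Time of Dependencies).

    Leaf components have no dependencies, so their response time is
    just their own service time.
    """
    slowest_dependency = max(
        (response_time(dep) for dep in DEPENDENCIES[component]), default=0)
    return SERVICE_TIME_MS[component] + slowest_dependency


end_to_end = response_time("C1")   # 50 + max(120 + 200, 80 + 150) = 370
meets_sla = end_to_end <= 500      # entry SLA from the slide
```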
  21. SLA Performance Guarantees (ii)
      ● SLA propagation from parents to children
      ● Parent j sends its performance constraints (e.g. response time upper bound) to its dependencies D(j):
      ● Child i computes its own performance constraints:
      ● : group of constraints sent by the replicas of the parent g
  22. SLA Performance Guarantees (iii)
      ● SLA propagation from parents to children
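The propagation formulas on these slides did not survive extraction. One consistent reading follows from the response-time formula of slide 20: if parent j has bound B_j and service time S_j, then its response time stays within B_j as long as every child stays within B_j - S_j. The sketch pushes that budget down the workflow; letting a child with several parents keep the tightest bound is an assumption, and the aggregation over a parent's replicas (the "group of constraints" on slide 21) is not modelled. `propagate_sla` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def propagate_sla(entry: str, entry_sla_ms: float,
                  dependencies: dict, service_time_ms: dict) -> dict:
    """Return a response-time upper bound for every component reachable from entry."""
    bounds = {entry: entry_sla_ms}
    # graphlib treats the mapped values as predecessors, so static_order() lists
    # children before parents; reversing it visits every parent before its children.
    order = list(TopologicalSorter(dependencies).static_order())[::-1]
    for parent in order:
        if parent not in bounds:          # not reachable from the entry component
            continue
        budget = bounds[parent] - service_time_ms[parent]
        for child in dependencies.get(parent, []):
            # A child shared by several parents must satisfy the tightest budget.
            bounds[child] = min(bounds.get(child, float("inf")), budget)
    return bounds


deps = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C5"], "C4": [], "C5": []}
service = {"C1": 50, "C2": 120, "C3": 80, "C4": 200, "C5": 150}
bounds = propagate_sla("C1", 500, deps, service)
# C1: 500, C2: 450, C3: 450, C4: 330, C5: 370
```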
  23. Automatic Provisioning
      ● Usage of allocated resources is maximized:
        – autonomic migration / replication / suicide of components
        – not enough to ensure end-to-end response time
      ● Cloud resources are managed by the framework via the cloud API
      ● Each individual component has to satisfy its own SLA
        – SLA easily met -> decrease resources (scale down)
        – SLA not met -> increase resources (scale up, scale out)
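The per-component rule on this slide can be sketched as a three-way decision. Using the 95th-percentile response time and the `easy_margin` threshold that decides when an SLA is "easily met" are assumptions; `provisioning_action` is a hypothetical name, and actually acquiring or releasing VMs through a cloud API is outside the sketch.

```python
def provisioning_action(p95_response_ms: float, sla_ms: float,
                        easy_margin: float = 0.5) -> str:
    """Compare a component's observed latency with its own SLA bound.

    `easy_margin` is a hypothetical knob: the SLA counts as "easily met"
    when the 95th-percentile response time is below easy_margin * sla_ms.
    """
    if p95_response_ms > sla_ms:
        return "increase"   # SLA not met: scale up (bigger VM) or out (more VMs)
    if p95_response_ms < easy_margin * sla_ms:
        return "decrease"   # SLA easily met: release resources (scale down)
    return "keep"
```

Here `sla_ms` is the component's own bound (for instance one derived from the entry SLA), not the end-to-end SLA, since each component is judged against its own constraint.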
  24. Adaptivity to slow servers
      ● Each component keeps statistics about its children
        – e.g. 95th percentile response time
      ● A routing coefficient is computed for each child at each epoch
        – Send more requests to better-performing children
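The slide does not give the formula for the routing coefficient. A minimal sketch, assuming coefficients inversely proportional to each child's 95th-percentile response time and normalised to sum to 1, with each request then routed by weighted random choice; the function names and latency figures are illustrative.

```python
import random
import statistics


def p95(samples: list) -> float:
    """95th percentile: the last of the 19 cut points splitting the data into 20 groups."""
    return statistics.quantiles(samples, n=20)[-1]


def routing_coefficients(latencies_by_child: dict) -> dict:
    """Share of requests per child, inversely proportional to its p95 latency (assumption)."""
    inverse = {child: 1.0 / p95(samples)
               for child, samples in latencies_by_child.items()}
    total = sum(inverse.values())
    return {child: weight / total for child, weight in inverse.items()}


def pick_child(coefficients: dict, rng: random.Random) -> str:
    """Route one request to a child, with probability equal to its coefficient."""
    children = list(coefficients)
    return rng.choices(children, weights=[coefficients[c] for c in children])[0]


latencies = {
    "replica-fast": [100.0] * 50,   # healthy replica
    "replica-slow": [300.0] * 50,   # e.g. on a server with an extra 200 ms delay
}
coeffs = routing_coefficients(latencies)   # fast: 0.75, slow: 0.25
```

Because coefficients are recomputed every epoch, a server that slows down (as in the slow-server experiment) loses traffic gradually rather than being cut off outright.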
  25. Evaluation
  26. Evaluation: Setup
      ● 5 components, mostly CPU-intensive (wc >> wm, wn, wd)
      [Figure: workflow of components C1–C5; entry component C1 with SLA: 500 ms]
      ● 8 servers, 8 cores each (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64)
      ● d=0, C=110, k=10000, xs* = 25%
  27. Adaptation to Varying Load (i)
      ● Load from 5 rps to 60 rps at minute 8, in steps of 5 rps/min
      ● Static setup: 2 servers with 2 cores
  28. Adaptation to Varying Load (ii)
      ● Load from 5 rps to 60 rps at minute 8, in steps of 5 rps/min
      ● Static setup: 2 servers with 2 cores
  29. Adaptation to Slow Server
      ● Max 2 cores/server, 25 rps
      ● At minute 4, a server gets slower (200 ms delay)
  30. Scalability
      ● Add 5 rps per minute until 150 rps
      ● Max 6 cores/server
  31. Conclusion
  32. Conclusion
      ● Framework for building cloud applications
      ● Elasticity: add/remove resources
      ● High Availability: software, hardware, network failures
      ● Scalability: growing load, peaks, scaling down, ...
        – Quick replication of busy components
      ● Load Balancing: load has to be shared by all available servers
        – Replication of busy components
        – Migration of less busy components
        – Reach equilibrium when load is stable
      ● SLA performance guarantees
        – Automatic provisioning
      ● No synchronization, fully decentralized
  33. Thank you!
  34. Availability (i)
      ● Increase availability by increasing geographical diversity
      ● Handled by replication
        – Granularity: rack, room, datacenter, country, ...
        – Label: NA-US-NY1-C01-R12-S02
      ● Each component must satisfy a minimum availability
      ● S_i is the set of servers hosting a replica of component i
  35. Availability (ii)
      ● Similarity: computes the distance between 2 servers
      ● Diversity:
      ● Choosing a candidate server j
      ● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component
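The similarity and diversity formulas of these backup slides did not survive extraction. As a stand-in, the sketch treats a server label such as NA-US-NY1-C01-R12-S02 as a path from the coarsest to the finest location level, measures similarity as the length of the shared prefix, and scores a candidate j by g_j times its distance to the closest existing replica in S_i. These formulas, the EU label and the g_j values are hypothetical, and the score ignores the rent and latency terms of the net benefit on slide 19.

```python
def similarity(label_a: str, label_b: str) -> int:
    """Number of leading location levels two server labels share."""
    shared = 0
    for part_a, part_b in zip(label_a.split("-"), label_b.split("-")):
        if part_a != part_b:
            break
        shared += 1
    return shared


def diversity(candidate: str, replica_labels: list) -> int:
    """Levels separating the candidate from its closest existing replica."""
    depth = len(candidate.split("-"))
    return min((depth - similarity(candidate, r) for r in replica_labels),
               default=depth)


def choose_server(candidates: list, replica_labels: list, g: dict) -> str:
    """Pick the candidate j maximising g[j] * diversity (hypothetical score)."""
    return max(candidates, key=lambda j: g[j] * diversity(j, replica_labels))


S_i = ["NA-US-NY1-C01-R12-S02"]              # servers already hosting component i
candidates = ["NA-US-NY1-C01-R12-S03",       # differs only at the last level
              "NA-US-NY1-C02-R01-S01",       # shares NA-US-NY1 only
              "EU-CH-GVA1-C01-R01-S01"]      # shares nothing
g = {candidates[0]: 1.0, candidates[1]: 1.0, candidates[2]: 0.4}  # clients mostly in NA
best = choose_server(candidates, S_i, g)     # scores about 1.0, 3.0, 2.4
```

The proximity weight keeps the most diverse server from always winning: the EU server is farther from the existing replica, but most client requests come from North America, so the second candidate is chosen.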