SLAs and Performance in the Cloud: Because There is More Than "Just" Availability

An SLA is only useful if it guarantees a certain level of quality. Current Cloud SLAs cover availability but ignore a key ingredient: Response Time and Throughput Performance. A Performance SLA would need to relate to the application's own performance, something that no Cloud Provider has control over. We will discuss how Application Performance Monitoring (APM) can be used to define, measure and enforce an SLA that is usable for both sides. We will talk about the differences between IaaS and PaaS cloud providers concerning such an SLA. We will also show how this leads to a better User Experience with less R&D effort. Finally, it lets us easily compare cloud performance across vendors in terms that really matter: Response Time per Cost.
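As a rough illustration of measuring inside the application itself, the sketch below times a single call and records both elapsed (wall-clock) time and the CPU time consumed by the calling thread. The helper names `measure_call` and `violates_latency_sla` and the SLA threshold are hypothetical; they are not taken from the talk or from any APM product.

```python
import time


def measure_call(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds).

    wall_seconds is elapsed time as the caller experiences it (latency).
    cpu_seconds is CPU time used by the calling thread, which is a
    measure of consumed capacity, not of host utilization.
    """
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()  # available in Python 3.7+
    result = fn(*args, **kwargs)
    cpu_seconds = time.thread_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def violates_latency_sla(wall_seconds, sla_seconds):
    """True if the measured latency exceeds the (assumed) SLA limit."""
    return wall_seconds > sla_seconds


if __name__ == "__main__":
    # time.sleep stands in for a call to a cloud service such as a database.
    _, wall, cpu = measure_call(time.sleep, 0.05)
    print(f"latency={wall:.3f}s cpu={cpu:.3f}s")
    print("SLA violated:", violates_latency_sla(wall, sla_seconds=0.02))
```

A large gap between the two numbers means the call spent its time waiting rather than computing, which is the kind of evidence needed to separate a cloud-side latency problem from an application hotspot.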

  1. SLAs and Performance in the Cloud: Because There is More Than “Just” Availability [February 20, 2012]
  2. Everybody wants a Cloud SLA • Security • Availability • Performance. Amazon did not violate its SLA.
  3. Current State of Cloud SLAs • GoGrid – 100% Server Uptime, credit is given if violated – reasonable efforts to ensure that server storage is “persistent” – Network Latency SLA, credit for prepaid fees – Load Balancer: uptime, latency and throughput • Rackspace – 100% Network, excluding scheduled maintenance – Server outage repaired within the hour • Amazon – 99.95% Annual Regional Availability; a Region counts as unavailable when: • more than 1 Zone in the same Region is unavailable • Instances have no outside connectivity for at least 5 minutes • API is not available to start new instances
  4. No Capacity Guarantees [Two charts of Response Time and Throughput over time, 09:00 to 09:18; annotations: “Steal Time!”, “Shared Resources!”]
  5. Priorities have changed! • I don’t care about the underlying Hardware • Focus is on Business Value: My own Application • Performance Management reflects that. Performance SLA must impact Application.
  6. Meaningful SLAs • Application Performance – End-to-End Response Time – Throughput • Application Availability – Reachable by the End Users. Performance SLA is Application specific. Cloud SLA cannot cover that directly.
  7. Possible Cloud Performance SLAs • IaaS – Guaranteed Capacity (CPU, Memory, Bandwidth…) – Guaranteed Latencies (Network, Load balancer, Disk…) – Meaning and Enforcement outside app context? • PaaS – Guaranteed on Application Interfaces – Meaning and Enforcement outside app context?
  8. Side Effect of missing Performance SLAs: No viable way to compare Price/Performance between multiple providers
  9. APM to the Rescue
  10. What we care about: Faster is not better, but slow is bad
  11. End-to-End Response Time Performance [Figure labels: User Click, On the Web Server, In the Application, In the Cloud]
  12. Application Response Time [Figure labels: Cloud DB Latency, Performance]
  13. Cloud Performance SLA • Response Time SLA is Application based • Latencies can be measured in the Application • Latency SLA impacts Application and is enforceable
  14. Capacity Usage [Figure label: Used CPU Time]
  15. AWS Elastic Map/Reduce Performance
  16. Cloud Performance SLA • CPU Usage can be measured in the Application (Attention: this is not utilization!) • Capacity SLA is measurable and enforceable
  17. Detect application hotspots
  18. Putting Cloud Monitoring in Context [Figure labels: Steal Time or out of CPU?, Cause for Latency]
  19. Benefits of APM for Cloud Application? • Identify Performance Problems End-to-End! • Determine Cloud vs. Application Issue • Enforce Cloud Performance SLA • Enforce Third-Party SLAs. Optimization can reduce the number of instances: Reduces Cost!
  20. Side Effect: A Price Performance Index • Dollar Value for acceptable Performance: 90th percentile response time / (Total Cost / Number of Transactions); Desired Throughput / Total Cost – Mind Volatility – Price Performance Index is comparable • Cost Scalability – Cost per Transaction must remain stable. Performance is no longer defined by Capacity. It is a function of desired User Experience and associated Cost.
  21. Questions. THANK YOU. Michael Kopp, Michael.kopp@dynaTrace.com
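
The two ratios on slide 20 can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function names are hypothetical, the sample figures are made up for the example, and the nearest-rank percentile is an assumption because the slides do not say which percentile method is meant.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile of samples (assumed method, pct in 1..100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def cost_per_transaction(total_cost, transactions):
    """Total Cost / Number of Transactions."""
    return total_cost / transactions


def price_performance_index(response_times_ms, total_cost, transactions):
    """90th percentile response time / (Total Cost / Number of Transactions),
    following the ratio as written on slide 20."""
    p90 = percentile_nearest_rank(response_times_ms, 90)
    return p90 / cost_per_transaction(total_cost, transactions)


def throughput_per_cost(desired_throughput, total_cost):
    """Desired Throughput / Total Cost."""
    return desired_throughput / total_cost


if __name__ == "__main__":
    # Made-up sample: ten response times in ms, $500 total, 1000 transactions.
    times = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400]
    print(percentile_nearest_rank(times, 90))
    print(price_performance_index(times, 500, 1000))
    print(throughput_per_cost(1000, 500))
```

Computing the same ratios from the same workload on each provider is what makes the index comparable. Running it repeatedly over different time windows addresses the "Mind Volatility" point, and tracking `cost_per_transaction` as load grows checks the Cost Scalability requirement.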