SLAs and Performance in the Cloud: Because There is More Than "Just" Availability

An SLA is only useful if it guarantees a certain level of quality. Current cloud SLAs cover availability but ignore a key ingredient: response time and throughput performance. A performance SLA would need to relate to the performance of the application itself, something that no cloud provider has control over. We will discuss how Application Performance Monitoring (APM) can be used to define, measure and enforce an SLA that is usable for both sides. We will talk about the differences between IaaS and PaaS cloud providers concerning such an SLA. We will also show how this leads to a better user experience with less R&D effort. Finally, it enables us to easily compare cloud performance across vendors in terms that really matter: response time per cost.

Published in: Technology, Design
  • This means that two or more applications can impact each other. This impact is hidden from your application; all it sees is that it slows down or that it doesn't get 100% CPU. Moreover, things can now affect each other that couldn't before: network and I/O.

    1. SLAs and Performance in the Cloud: Because There is More Than "Just" Availability [February 20, 2012]
    2. Everybody wants a Cloud SLA: Security • Availability • Performance. Amazon did not violate its SLA.
    3. Current State of Cloud SLAs
        • GoGrid
            – 100% Server Uptime, Credit is given if violated
            – reasonable efforts to insure that server storage is "persistent"
            – Network Latency SLA, Credit for prepared fees
            – Load Balancer: Uptime, latency and throughput
        • Rackspace
            – 100% Network excluding scheduled maintenance
            – Server outage repaired within the hour
        • Amazon
            – 99.95% Annual Regional Availability; a region counts as unavailable when:
                • more than 1 Zone in the same Region is unavailable
                • Instances have no outside connectivity for at least 5 minutes
                • API is not available to start new instances
    4. No Capacity Guarantees [Two charts: Response Time and Throughput over time, 09:00 to 09:18. Annotations: "Steal Time!", "Shared Resources!"]
    5. Priorities have changed!
        • I don't care about the underlying Hardware
        • Focus is on Business Value → My own Application
        • Performance Management reflects that
        Performance SLA must impact Application
    6. Meaningful SLAs
        • Application Performance
            – End-to-End Response Time
            – Throughput
        • Application Availability
            – Reachable by the End Users
        Performance SLA is Application specific. Cloud SLA cannot cover that directly.
    7. Possible Cloud Performance SLAs
        • IaaS
            – Guaranteed Capacity (CPU, Memory, Bandwidth…)
            – Guaranteed Latencies (Network, Load balancer, Disk…)
            – Meaning and Enforcement outside app context?
        • PaaS
            – Guaranteed on Application Interfaces
            – Meaning and Enforcement outside app context?
    8. Side Effect of missing Performance SLAs: No viable way to compare Price/Performance between multiple providers
    9. APM to the Rescue
    10. What we care about: Faster is not better, but slow is bad
    11. End-to-End Response Time Performance: User Click • On the Web Server • In the Application • In the Cloud
    12. Application Response Time • Cloud DB Latency • Performance
    13. Cloud Performance SLA
        • Response Time SLA is Application based
        • Latencies can be measured in the Application
        • Latency SLA impacts Application and is enforceable
    14. Capacity Usage: Used CPU Time
    15. AWS Elastic Map/Reduce Performance: http://blog.dynatrace.com/2012/01/25/about-the-performance-of-map-reduce-jobs/
    16. Cloud Performance SLA
        • CPU Usage can be measured in the Application (Attention: this is not utilization!)
        • Capacity SLA is measurable and enforceable
    17. Detect application hotspots
    18. Putting Cloud Monitoring in Context: Steal Time or out of CPU? Cause for Latency
    19. Benefits of APM for Cloud Applications?
        • Identify Performance Problems End-to-End!
        • Determine Cloud vs. Application Issue
        • Enforce Cloud Performance SLA
        • Enforce Third-Party SLAs
        Optimization can reduce the number of instances. Reduces Cost!
    20. Side Effect: A Price Performance Index
        • Dollar Value for acceptable Performance:
            – 90th percentile response time / (Total Cost / Number of Transactions)
            – Desired Throughput / Total Cost
            – Mind Volatility
            – Price Performance Index is comparable
        • Cost Scalability
            – Cost per Transaction must remain stable
        Performance is no longer defined by Capacity. It is a function of desired User Experience and associated Cost.
    21. Questions. THANK YOU. Michael Kopp, Michael.kopp@dynaTrace.com, http://blog.dynatrace.com, @mikopp
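
Slides 13 and 16 state that latencies and CPU usage can be measured inside the application, and that CPU usage is not the same as utilization. The following is a minimal Python sketch of that idea, not the tooling used in the talk: it records elapsed (wall-clock) time and consumed CPU time for a block of code. The names `measure` and `exceeds_latency_limit` and the 0.5 second limit are assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure(results, name):
    """Record wall-clock latency and consumed CPU time for a code block.

    Hypothetical helper: 'results' is a plain dict that receives one
    entry per measured block.
    """
    wall_start = time.perf_counter()   # elapsed time, includes waiting
    cpu_start = time.process_time()    # CPU time used by this process only
    try:
        yield
    finally:
        results[name] = {
            "wall_s": time.perf_counter() - wall_start,
            "cpu_s": time.process_time() - cpu_start,
        }


def exceeds_latency_limit(measurement, max_wall_s):
    """Return True if the measured latency is above the agreed limit."""
    return measurement["wall_s"] > max_wall_s


if __name__ == "__main__":
    results = {}
    with measure(results, "cloud_db_call"):
        time.sleep(0.05)  # stands in for a call to a cloud service
    m = results["cloud_db_call"]
    print(m, "limit exceeded:", exceeds_latency_limit(m, max_wall_s=0.5))
```

The difference between `wall_s` and `cpu_s` is time the code spent waiting rather than computing. From inside the application alone, that waiting time cannot be attributed to a specific cause such as I/O, network or CPU contention; it only shows that the time was not spent on the application's own CPU work.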

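
The two ratios on slide 20 can be computed directly from measured transactions and the provider bill. The sketch below is one possible reading of the slide, assuming the nearest-rank method for the 90th percentile (the slide does not say which percentile method is meant); all function names and sample figures are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_transaction(total_cost, transactions):
    """Total Cost / Number of Transactions; should stay stable under load."""
    return total_cost / transactions


def response_time_index(response_times_ms, total_cost, transactions):
    """90th percentile response time / (Total Cost / Number of Transactions)."""
    p90 = percentile(response_times_ms, 90)
    return p90 / cost_per_transaction(total_cost, transactions)


def throughput_index(desired_throughput, total_cost):
    """Desired Throughput / Total Cost."""
    return desired_throughput / total_cost


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    times_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    print(response_time_index(times_ms, total_cost=64.0, transactions=1024))
    print(throughput_index(desired_throughput=200, total_cost=64.0))
```

Because response times and prices fluctuate ("Mind Volatility"), such an index is only comparable across providers when it is computed over the same workload and the same time window.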