What does performance mean in the cloud

Performance problems are one of the most cited concerns about the cloud. But is it really the cloud or the application? What does performance mean anyway when you can scale to thousands of servers? This session discusses why traditional means of performance management and troubleshooting no longer work and how this affects everything. Most importantly, we will look at how to identify the root cause of performance problems in such dynamic environments. Finally, we will explain how to assess and manage performance when capacity is no longer the issue.

Published in: Technology, Business

Slide notes
  • Surveys find that performance concerns about the cloud are rising (http://callcenterinfo.tmcnet.com/analysis/articles/149923-survey-finds-cloud-application-performance-concern-delaying-adoption.htm): cloud adoption is delayed due to perceived and measured bad performance. When we look at it, however, the real problem is not the bad performance itself, but that it is not understood what to do in such a case. Cloud provider SLAs are purely availability metrics, and there mostly on the availability of their APIs, not of the instances themselves. There are no SLAs on actually provided capacity, nor reports on actually consumed capacity. To make matters worse, due to the technology itself traditional APM tools fail to deliver these metrics, so the cloud customer is left in the lurch. Is it the cloud or is it the application? Or both? Or neither? So the first thing we need in order to solve the cloud performance concern is the ability to measure our application and identify the root cause of performance issues, be it the cloud, a third-party service, the application itself, or something further up front in the delivery chain.
    That brings up a far more important question: what does performance mean? Here it can be said that the term performance does not actually change in the cloud. If we define performance as pure speed, then it is independent of the cloud; it does not matter how many instances we have. Speed is defined by the response time of a single transaction under defined circumstances. To make things simple, let's define performance as the speed of a single transaction when nothing else is going on. Raw speed can be impacted by cloud hardware, services and everything else. While we can measure that by looking at things like node response time, the only way to analyze it is to get visibility into the transaction flow. Then we see whether the application is slow, is squandering resources, is waiting for resources, or is simply not getting enough CPU. The beauty is that this can now be compared with speed on premise in a similar distributed setup. A comparison will show the differences, and while we can never analyze cloud issues on premise, we can understand where the cloud has an impact in comparison to on premise. And we can identify these issues even if we don't compare.
    Now about scalability, the main case for the cloud. Scalability defines how many parallel transactions can be served without degradation of response time, or, if we talk about batch or transaction processing, how throughput increases when adding another node. If "performance" goes down under load, we scale up. If performance is then satisfactory again, we say it scales. If performance goes down although we add resources, then it does not scale. And if we need three times the resources for twice the load, it might scale, but not very well. The important thing to understand is that these scalability issues can again be in the application or in the cloud. Only here it will most likely not be a matter of CPU or disk; the most likely congestion will happen in cloud services and the network. And again we see why the currently offered cloud monitoring is not enough to help: while we might be able to see a service slow down under load, we will not see whether it is uniformly slower or slower only for certain requests, so we do not see whether it is really the load that is the problem. The same is true for the network. And for the application itself it is of course even worse if we can't look inside.
    Scaling on application metrics: understand application impact and business impact. In order to solve this we must again look inside the application. What's more, we need to understand what the application is doing, which different transactions are doing what, and how they might affect each other. In reality it is not so different from an on-premise installation, but with many more moving parts. With proper tools we can master this challenge. Now that we can measure, understand and diagnose our applications in the cloud, we can finally understand what performance means in the cloud, or more precisely how the performance and scalability of our application differ there. We can now define what performance in the cloud means: Response Time/$ or Throughput/$ (a small sketch of this index follows these notes). The response time or throughput is something that you define and measure; once you achieve it, performance in the cloud of your choice is no longer a "concern". More importantly, this kind of price performance index allows you to compare not only cloud against on premise, it allows you to compare cloud vendors to each other!
  • A common misconception is that scalability takes care of performance. That is not true. Performance is about the speed of a single transaction, or throughput at a given size. Scalability is about getting the same speed with more transactions and more nodes; it is about doubling throughput when doubling the size. This actually means that an application needs to perform in order to scale! (A small check for this follows these notes.) First, a cloud built upon shared resources can never perform better than a dedicated environment. But that is not even the question. The real question is …
  • End User Performance equals Pure Performance + Scalability
  • Profilers will not work, and cloud monitoring is not application monitoring. Application monitoring in its traditional sense only tells us when something is slow, but not why. This matters because we cannot replicate the problem in a normal environment and we need to understand it fast: tomorrow we will deploy again, new changes will make analysis all the harder and might add new problems. On the other hand, if we find it fast, we have a chance of fixing and improving tomorrow without changing our schedule.
  • As we have seen, even the real utilization cannot tell us about performance. Time is relative. Utilization in the guest is useless. Utilization on the host does not allow us to infer performance. Thresholds cannot be managed. Performance cannot be inferred from resource usage.
  • This can and should be measured outside the cloud. We can do this via synthetic transaction monitoring, which gives us a good feel for the baseline performance and broad degradations. Of course we need to make sure we measure from the most important locations in the world to take backbones into account. Another way of doing this is even closer to the user, called RUM or UEM: it measures the response time directly from the customer's browser via injected JavaScript agents.
  • PurePath
  • If you don’t see anything here, then you really don’t care about it.
  • One general and one detailed transaction flow with database impact. About business transactions.
  • CPU usage on the web server is the cause of volatility here. This is really usage, not a percentage, which means it is really an application issue. If, on the other hand, we saw wait or I/O growing, then it might well be virtualization that causes the volatility. This is of course only a high-level picture, but I think you get the idea.
  • Scalability comes before performance in the cloud. Or to be more specific, scalability trumps resource usage. We used to make a tradeoff between scalability and resource usage like CPU, memory or disk usage. That does not hold true in a cloud: we have CPU, memory and disk. The things that are still limiting factors are the network and the database. That needs to be taken care of in the design. We can remove sync points in the database with NoSQL and data denormalization. We can take care of the network by using multiple zones, clouds and CDNs to some degree, but to a larger degree bandwidth needs to be taken care of in the design. All of that makes our application more scalable; the downside is that it makes single transactions harder to understand, harder to monitor and harder to analyze. And of course, once we have an application, finding scalability issues is not easy, and cloud sizing makes it all the harder.
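
The notes above boil cloud performance down to Response Time/$ and Throughput/$. As a rough illustration of how such a price performance index could be computed from measured transaction data, here is a minimal Python sketch; the sample response times and cost figures are made-up assumptions, not numbers from the talk.

```python
import statistics

def price_performance_index(response_times_ms, total_cost_usd):
    """Compute the two ratios suggested in the talk for a set of measured transactions:
    90th percentile response time over cost per transaction, and throughput per dollar."""
    n = len(response_times_ms)
    p90 = statistics.quantiles(response_times_ms, n=10)[-1]  # 90th percentile response time
    cost_per_txn = total_cost_usd / n
    return {
        "p90_ms": round(p90, 1),
        "cost_per_txn_usd": round(cost_per_txn, 4),
        "response_time_per_dollar": round(p90 / cost_per_txn, 1),
        "throughput_per_dollar": round(n / total_cost_usd, 1),
    }

# Made-up samples: the same transaction measured in a cloud and in an on-premise setup.
cloud = price_performance_index([120, 135, 150, 180, 240, 260, 300, 310, 420, 500], total_cost_usd=2.5)
on_prem = price_performance_index([100, 110, 115, 130, 150, 160, 170, 200, 220, 260], total_cost_usd=4.0)
print("cloud:     ", cloud)
print("on-premise:", on_prem)
```

Because every environment is reduced to the same two ratios, the same calculation can be used to compare one cloud vendor against another, or a cloud against an on-premise setup.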
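The notes also define scalability as doubling throughput when doubling the size. That can be turned into a simple check by comparing measured throughput at different node counts against ideal linear scaling; the node counts and throughput figures below are hypothetical load-test results, used only to illustrate the calculation.

```python
def scaling_efficiency(baseline_nodes, baseline_tps, nodes, tps):
    """Measured throughput as a fraction of ideal linear scaling from the baseline.
    1.0 means throughput doubles when the node count doubles; values well below
    1.0 mean adding nodes no longer buys proportional throughput."""
    ideal_tps = baseline_tps * (nodes / baseline_nodes)
    return tps / ideal_tps

# Hypothetical load-test results: throughput (transactions/sec) per node count.
results = {2: 400, 4: 760, 8: 1300, 16: 1900}
base_nodes, base_tps = 2, results[2]
for nodes, tps in sorted(results.items()):
    eff = scaling_efficiency(base_nodes, base_tps, nodes, tps)
    print(f"{nodes:>2} nodes: {tps:>5} tps, scaling efficiency {eff:.0%}")
```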
  • Transcript

    • 1. What does performance mean in the cloud?
    • 2. What are the risks of moving to the cloud? IDC Survey (Q4 '09) vs. results from actual pilots (March 2010): the perceived primary benefit shifts from Reduced IT costs to Scalability and Agility, and the biggest issue shifts from Security to Performance and SLA Management. "The Maturing Cloud: What It Will Take to Win" (published Mar 2010): What are the major risks in the Cloud? • Security – 87.5% • Availability – 83.3% • Performance – 82.9% (88.6% stated that cloud service providers need to provide SLAs). "All About The Cloud" conference (May 2010): "Security in the Cloud isn't any harder than it is in the Enterprise – it's just different" (Unisys); "[Application] Performance Management in the Cloud is becoming the hot topic" (THINKstrategies).  Projects fail to deliver acceptable performance  Moving Legacy Applications is harder than thought
    • 3. What is Performance?
    • 4. Performance ≠ Scalability The Cloud scales, but does it perform?
    • 5. How do we measure Performance  Response Time  Transaction Level Metric  Don’t use averages  High Volatility  Be specific  Which type of transaction  Throughput  Volume of Transactions per Timeframe  Average Speed of Transaction  Be specific  Which type of transactions
    • 6. What does Scalability mean  More concurrent Transactions with the same response time  Linearly growing Throughput with linearly more hardware Scalability depends on Performance
    • 7. Performance in the cloud  “Pure Performance” is never better in a Cloud!  Co Tenancy  Resource sharing  Commodity and generally smaller hardware  Scalability can be better in the Cloud  Rapid elasticity  Depends on Application Design and Performance  Legacy Applications have limitations  End User Performance depends on both and more  Web Delivery Chain  Network!  Can be better than on premise!
    • 8. Performance Management in theCloud
    • 9. Traditional Performance Management - Fails  Sniffing and other appliances do not work  Are based on System metrics which are  Corrupted  Do not answer application performance questions  Are not manageable  Too many unrelated metrics  Does not deal well with the Exponential Complexity Increase
    • 10. Why is Cloud Monitoring not enough?  Only System and High Level Response Metrics  No Visibility into Application (Regressions, MTTR, Application Dependencies)  No Visibility into End User Impact  Business Impact We need Application Focus
    • 11. What we really care about  Availability and Baseline Performance  End User Performance  Detailed Contribution Times  [architecture diagram: Web 2.0 client, Load Balancer, WebServer, Frontend(s), Backend(s), Private Datacenter]
    • 12. Key Challenge - Volatility  Real vs. Measured Performance  Performance ≠ F(Capacity)  [chart: Utilization over time]
    • 13. Measure Performance where it matters  Faster is not better  But slow is bad
    • 14. Understand your Transactions
    • 15. End To End: Don't forget the Chain  User Click  On the Web Server  In the Application  In the Cloud
    • 16. Details, Details, Details, but be aware…  High Volatility  Steal Time  Shared I/O  Shared Network  Virtualization-Aware Timers
    • 17. Monitoring the Complex
    • 18. Cloud Designs are simple, yet…  Everything Fails!  Tightly Couple End User Delivery Components  Few Tiers  Response Time  Scale Upfront for 100,000s of users
    • 19. Cloud Designs are simple, yet…  Everything Fails!  Tightly Couple End User Delivery Components  Few Tiers  Response Time  Scale Upfront  Loosely Couple everything else  Throughput  Scale everything independently Simple Designs still lead to Complex Systems Complex Systems are hard to manage
    • 20. Monitoring Complex Systems – Look at what matters
    • 21. Context matters  Too much Aggregation will blur the picture  Buying Books  Buying DVDs  Buying Clothes  Context matters!
    • 22. Measure what Matters  The Application and its Business Transactions  Measure End User Performance  Measure Throughput on Transaction Type Level  How Performance affects your business  e.g. Conversion Rate  SLA Window  Cost vs. Gain  Prioritize based on what matters most
    • 23. Identify cause of End User Impact  Flow of a single Transaction  Response Time Hotspots
    • 24. Cloud vs. Application  Cloud Monitoring would show CPU  Application shows otherwise
    • 25. Application or Cloud Instance? Application Hotspots: CPU, Wait, I/O, Sync, Suspension? Cause for Volatility?
    • 26. Putting Cloud Monitoring in Context  Steal Time or out of CPU?  Cause for Latency
    • 27. We want to scale the Application and not the Cloud  Auto Scaling on System metrics  Is indirect and not goal oriented  Fails when application changes  Scale on Application Metrics and Application Components  Transaction Load  Response Time Contribution and Trend  Throughput Goals (a sketch of such a scaling decision follows the transcript)
    • 28. Rapid Deployment and Availability
    • 29. Understand your Flow  Understand the Application Flow  Always Capture Performance Data  Everything is transitory  Reproducing problems is hard  Analyze offline  Identify Contributors
    • 30. Automatically detect Regressions  Deploy  Compare  Fix small  Start again
    • 31. Reacting Automatically to Issues  Disk Latency Degradation  Too much steal time  Hardware Issues  Detect “Application” Degradation  Terminate! And start new (a sketch of this reaction follows the transcript)
    • 32. Make sure you are not blind  Application Monitoring must be highly available  Outside and Inside  Failover  not in the same zone  Automated Deployments  Zero Configuration Monitoring
    • 33. Assessing Performance/Value
    • 34. What is the goal?  Performance and Scalability are not self serving  “Desired” End User Experience  Faster than that is not better  Using fewer resources is cheaper!
    • 35. A Price Performance Index  Dollar Value for acceptable Performance: 90th percentile response time / (Total Cost / Number of Transactions)  Desired Throughput / Total Cost  Mind Volatility  Price Performance Index is comparable  Cost Scalability  Cost per Transaction must remain stable  Performance is not based on Capacity  It is a function of desired User Experience and associated Cost
    • 36. Questions Michael Kopp Michael.kopp@dynaTrace.com http://blog.dynatrace.com @mikopp
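
Slide 27 argues for scaling on application metrics (transaction load, response time contribution and trend, throughput goals) instead of raw system metrics. The sketch below shows what such a scaling decision could look like in Python; the metric names, thresholds and the AppMetrics structure are illustrative assumptions, not part of any specific cloud or APM API.

```python
from dataclasses import dataclass

@dataclass
class AppMetrics:
    # Application-level metrics for one evaluation window (hypothetical APM source).
    p90_response_ms: float      # 90th percentile response time of key transactions
    throughput_tps: float       # transactions per second actually served
    throughput_goal_tps: float  # throughput goal for this transaction type

def scaling_decision(m: AppMetrics, slo_ms: float = 500.0) -> str:
    """Decide a scaling action from application metrics rather than CPU utilization.
    This is a simplistic heuristic: scale out when transactions miss the response
    time SLO or fall short of the throughput goal; scale in when the SLO is met
    with ample headroom."""
    if m.p90_response_ms > slo_ms or m.throughput_tps < 0.9 * m.throughput_goal_tps:
        return "scale-out"
    if m.p90_response_ms < 0.5 * slo_ms and m.throughput_tps >= m.throughput_goal_tps:
        return "scale-in"
    return "hold"

# Hypothetical evaluation: slow transactions and a missed throughput goal trigger scale-out.
print(scaling_decision(AppMetrics(p90_response_ms=730.0, throughput_tps=150.0, throughput_goal_tps=200.0)))
```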
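Slide 31 recommends reacting automatically to cloud-side degradation (disk latency, steal time, hardware issues) by terminating the affected instance and starting a new one. A minimal sketch of that reaction loop follows; should_replace, sweep and all thresholds are hypothetical, and in practice the metrics would come from an APM tool and the replacement would go through the cloud provider's API.

```python
# Hypothetical hooks: real metrics would come from an APM tool, and the replacement
# would call the cloud provider's API (terminate the instance, launch a new one).

def should_replace(p90_response_ms: float, steal_time_pct: float, disk_latency_ms: float) -> bool:
    """Flag an instance whose degradation looks cloud-side (steal time, disk latency)
    or whose transactions have become unacceptably slow. Thresholds are assumed values."""
    return steal_time_pct > 20.0 or disk_latency_ms > 50.0 or p90_response_ms > 1000.0

def sweep(instances: dict) -> None:
    """One evaluation cycle over per-instance application metrics."""
    for instance_id, metrics in instances.items():
        if should_replace(**metrics):
            print(f"{instance_id}: degraded -> terminate and start a new instance")
        else:
            print(f"{instance_id}: healthy")

# Made-up metrics: i-0002 suffers from excessive steal time and would be replaced.
sweep({
    "i-0001": {"p90_response_ms": 320.0, "steal_time_pct": 3.0, "disk_latency_ms": 8.0},
    "i-0002": {"p90_response_ms": 410.0, "steal_time_pct": 35.0, "disk_latency_ms": 12.0},
})
```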