Application Performance Management in the Clouds - Lessons Learned

We face the challenge of monitoring and managing performance in clouds every other day. Not only is application performance management different in a cloud, but all clouds are not equal either. This lessons learned session will show how to do APM in several different Clouds (Azure, EC2, VMware private Clouds) and how it differs from more traditional environments. The session will also cover performance monitoring, troubleshooting and tuning in environments where resources are virtually infinite, but application performance is not.

Published in: Technology
  • Last updated or created: April ’11. Key themes: You improve application performance to improve your business. It is a business issue more than a technical one. Talk track: Why worry about application performance? Because it improves your business. There are numerous studies that prove that improving application performance can reduce cost and increase revenue. Reduce Cost: one study demonstrated that improving application performance lowered the effort, and cost, needed to resolve problems by 83%. That not only saves money and effort, but it delivers results more quickly. Another study determined that improving application performance reduced calls to the call center by 61%. If those calls are customer calls, that will also directly increase revenue. Improve Revenue: There is a direct and clear correlation between website performance and customer conversion rates. Time and time again customers are proving with their actions that the faster the site is, the more likely they are to stay on it and move through a conversion process. We’ve seen, on average, that conversion rates can increase by over 70% if page load times decrease from 8 seconds to 2 seconds. There is also a direct correlation between abandonment rates and website performance. Using another set of observed data we’ve seen a 39% DECREASE in abandonment rates when page load times drop from 8 seconds to 2 seconds. Bottom line: improving app performance improves your business.
  • This means that two applications, or more, can impact each other. This impact is really hidden from your application; all it sees is that it slows down or that it doesn’t get 100% CPU. Even more, things can affect each other that couldn’t before: network and I/O.
  • In a sense virtualization is the opposite of DevOps, which has been my last for this group. Ops – App People: a problem, but a direct relation. Now App People don’t see when they have an ops problem? Ops says everything is fine. This is not only a problem in production. Think about testing in a virtualized environment: you either have to make sure you get a “dedicated” environment or you have to filter out the noise. This abstraction makes problem solving even harder than today. It makes it even more obvious that the total separation of app and ops is no longer feasible.
  • Correlation even harder
  • Now, especially in a public cloud, we have less visibility and control than in a private cloud. At the same time, more of what we use is third party (internet backbone, CDN, load balancers, databases). To know what is going on we need that visibility back, and we need to start where it matters, in our case at the user. End user → Application (impact of IT) → Services like DB and WebService. Azure? EC2?
  • Last updated or created: April ’11. Key themes: major change #3: the Cloud has arrived. Talk track: If it wasn’t complicated enough to have the data center and the web be more complex, now we also have the cloud as part of the equation. More and more companies are moving some or all of their applications to a private or public cloud. And that certainly changes the way you do APM: the cloud is opaque, so you can’t monitor its inner workings, and the cloud is shared, so you need to be careful that someone else’s app is not making yours slow. THIS is today’s app delivery chain. Far more complex than just a few years ago.
  • From virtualization we already knew that timing is sometimes a problem. However, to do proper fault domain resolution we need accurate timing at least at the tier and service level. The timing problem: there is more. The timing issue means that guest measurements are skewed, which is a problem for APM because we need to know how utilized things are. In a private cloud we can use the VM and vHost metrics to make up for that. We can correlate them on a time basis, and thus ignore the guest metrics for the most part. But for performance analysis we need more detailed CPU breakdowns of our application. Luckily, vendors like VMware ensure to a large degree that the CPU time accounted on threads works out; in addition, we correlate the steal time so that we know which transactions we must simply ignore in the analysis because they are skewed beyond repair. In a public cloud things are a little more difficult: we get less insight into the metrics. But there are other caveats. Let’s take EC2, we found that CPU… Azure?
  • A common misconception is that scalability takes care of performance. That is not true. Performance is about the speed of a single transaction or the throughput at a given size. Scalability is about being able to get the same speed with more transactions and more nodes. Scalability is about doubling throughput when doubling the size. This actually means that an application needs to perform in order to scale!
  • Add Load Balancer and RDS Dashboard. Correlate as much of the Cloud and guest metrics as we can.
  • And finally, that brought us to the most important lesson learned: we don’t really care about resource usage in a public cloud at all. We care about application SLAs and about cost effectiveness. And in a public cloud, cost effectiveness is not the same as resource effectiveness. … So again we need to monitor the right things. We need to know the cost structure of a transaction and what kind of revenue it brings in order to set priorities. E.g. optimizing the search function so that it A) delivers better results and is not executed 5 times by every user and B) uses fewer database calls saves us money, even if it maybe uses a little bit more CPU and is not a bit faster for the end user. Remember, the end user’s performance depends on more than the server anyway.
  • … that’s why we still do some planning. In the cloud, we plan to get a cost estimation, not for the purpose of purchasing infrastructure. Since the cloud is agile, we don’t need to over-provision our resources, since we can easily acquire them. In this case, we are lucky and save resources and thus cost.
  • Add QR Code
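The timing note above mentions correlating steal time to decide which transactions to ignore because they are skewed beyond repair. A minimal sketch of that idea follows. It is an illustration only, not the method of any particular APM product: the `Transaction` record, the `suspended_ms` field, the sample values and the 20% threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    name: str
    response_ms: float   # wall-clock response time as measured inside the guest
    suspended_ms: float  # time the VM was suspended (stolen) during the transaction


def usable(transactions, max_suspended_fraction=0.2):
    """Keep only transactions whose suspended share of the response time
    stays at or below the (hypothetical) threshold."""
    return [
        t for t in transactions
        if t.response_ms > 0
        and t.suspended_ms / t.response_ms <= max_suspended_fraction
    ]


def mean_adjusted_response_ms(transactions):
    """Average response time after subtracting the suspended time."""
    return mean(t.response_ms - t.suspended_ms for t in transactions)


# Made-up sample data: the second transaction was suspended for most of its run.
samples = [
    Transaction("search", 120.0, 10.0),
    Transaction("search", 900.0, 700.0),
    Transaction("purchase", 300.0, 30.0),
]

kept = usable(samples)
print([t.name for t in kept], mean_adjusted_response_ms(kept))
```

The heavily suspended sample is dropped rather than corrected, which matches the note's point that some measurements cannot be repaired and should simply be left out of the analysis.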
  • Application Performance Management in the Clouds - Lessons Learned

    1. Application Performance Management in the Clouds: Lessons Learned
    2. Application Performance Management. $. DB Servers, App Servers, App Servers, Web Servers. Maria, Carl, Tenant. Manage End User Satisfaction. Manage Application Performance. Ensure SLA compliance. Ensure optimal Performance and Resource Utilization.
    3. Application Performance is a Business Issue. Improving performance lowers cost and increases revenue. Cost REDUCED: Reduce Problem Resolution Time; Reduce Hardware and other operational Cost; Less Production Issues. Revenue IMPROVED: Improved Conversion Rate; Improved Capacity; Improved Productivity. Sources varied, including Compuware ROI studies and actual observed user behavior over 180M+ page views.
    4. Lesson #1: Private and Public Clouds are not alike
    5. Private Cloud: Applications in a private Cloud. Community portal: Web Server, Application Server, Backend, Database. Wiki: Web Server, Application Server, Backend, Database.
    6. Private Cloud: Applications in a private Cloud
    7. Identifying hidden Application Impact. [Charts: Response Time and Throughput] Another Application? Infrastructure?
    8. Private Cloud: Overcoming Organisational Barriers. Application Team One, Application Team Two: No Line of Communication. Operations Team One, Operations Team Two, Virtualization Team.
    9. Private Cloud: APM in a private Cloud
    10. Private Cloud: Application Balance and Resource Optimization. How healthy is my application? Do we have any problems? Is this due to application failures? Or do we have an Infrastructure Issue? What is the resource utilization of my virtualized hosts and guests? CPU Overcommit? Memory Overcommit? Which other Application is impacted or impacts ours?
    11. Public Cloud: Hidden Impact in the Public Cloud. [Charts: Response Time and Throughput] No Resource Utilization. No Visibility in underlying Layer. Goal!
    12. Public Cloud: Your Application is your only concern. Infrastructure Issue? Did we reach capacity? Steal time? Scale up or recycle! Application is King! Infrastructure is a black box and commodity.
    13. Lesson #2: Cloud Monitoring must be Application (Performance) Monitoring
    14. Application Performance starts with the End User. Public Cloud: Load Balancers ▪ Web ▪ Application logic ▪ Database ▪ Network. Third Party: ISPs ▪ Mobile carriers ▪ CDNs ▪ Third party services. Users: Browsers ▪ Devices ▪ AJAX ▪ JavaScript ▪ Mobile apps. Diagram labels: Customers, Load Balancer, Application, Application, Database, CDN, Employees, Browser & Device. Infrastructure Cloud Monitoring Cloud Internet Backbone Performance.
    15. Public Cloud: Managing End User Experience (EUE). What did the user do? Where do my users come from? Which devices do they use? Which users suffer from bad user experience? Is the problem in the Browser? In the AJAX Call? The Web-Server? In the Application? Or is it a 3rd party service? Any Bandwidth Issues? Mobile Carrier? If the problem is my 3rd Party Content, who was it? Does Facebook, LinkedIn or Google Ads have a negative impact?
    16. Real End-to-End Application Performance. End User, Our Application, Third Party External Services. Identify Fault Domain. Impact of Cloud Resources.
    17. Lesson #3: Time and resources are relative in the Clouds
    18. How do you measure response time…
    19. …when your clock is skewed? • Use virtualization-aware timers • and/or exclude heavily suspended transactions from analysis. Tier Response Times, VM Suspension, Inter-Tier Latency.
    20. Public Cloud: Utilization of Cloud Resources? Real Instance CPU Time. Over 60% Steal Time? Know how to interpret resource metrics! EC2 shows real CPU, but you cannot use more than you bought. VMware shows allocated CPU, but you might not get all of it.
    21. Impact on Business Transactions! Use Latency and Transaction Impact instead of Utilization. Latency can be easily correlated. Latency on specific Transaction.
    22. Lesson #4: Rapid Elasticity is painful, but a success requirement!
    23. Why do we want elasticity? Enterprise Data Center. Resources: Database, CPU, Storage. Static Provisioning is easy, safe, and performance can be guaranteed. Elastic Scaling has no advantage for the Application!
    24. Why do we want elasticity? Enterprise Data Center. Resources: Database, CPU, Storage. Available Capacity. Unplanned and unhandled load.
    25. Why do we want elasticity? Enterprise Data Center. Resources: Database, CPU, Storage. Unplanned and unhandled load. Slow Application. Unsatisfied Users. Loss of Revenue.
    26. The Cloud Reason. Enterprise Data Center: Unplanned and unhandled load. Slow Application. Unsatisfied Users. Loss of Revenue. Cloud: On-Demand Provisioning. No Capacity Barrier. Resources: CPU, Database, Storage.
    27. The Cloud Reason. Enterprise Data Center: Purchased capacity. Unplanned and unhandled load. Slow Application. Unsatisfied Users. Loss of Revenue. Cloud: Easy Scaling. No Capacity Barrier. Public Cloud is about On Demand Provisioning. Scaling up is easy! Scaling down is hard! So why do we want to scale down again? Resources: CPU, Database, Storage.
    28. Why we scale down again. Enterprise Data Center: Overprovisioned, but already paid. Cloud: Saving Resources and Costs! Additional Runtime $$. Resources: CPU, Database, Storage.
    29. Why we scale down again. Enterprise Data Center, Cloud. Resources: CPU, Database, Storage. Elastic Scaling is a Business Requirement. Not a technical one!
    30. Lesson #5: Public Cloud APM is not about resources… it’s about operational cost!
    31. Capacity Planning and Resource Optimization. [Chart: Load and Resources over Time; series: Available Resources, Load, Resource Usage] Capacity planning: estimate future load and buy infrastructure. Next investment: estimate and buy… Performance optimization: use existing infrastructure as long as possible… …and postpone new investments.
    32. On Demand Provisioning and Resource Optimization. [Chart: Load and Resources over Time; series: Available Resources, Load, Resource Usage] Performance optimization: lower the cost structure.
    33. On Demand Provisioning. [Chart: Load and Resources over Time; series: Available Resources, Load, Resource Usage] Performance optimization: lower the cost structure. Optimization in the Cloud is about Cost Savings!
    34. Cloud Resources. Where does the cost come from? Resource Usage: how our application consumes these resources.
    35. Cloud Resources. User behavior, Developer decisions, Implementation. Resource Usage: how our application consumes these resources.
    36. Manage by planning? Enterprise Data Center: Purchased capacity, Planned capacity. Cloud: User behavior, Developer decisions, Implementation. Cost estimation.
    37. Manage by monitoring! Enterprise Data Center: Purchased capacity, Planned capacity. Cloud: User behavior, Developer decisions, Implementation. Cost estimation. Resources: CPU, Database, Storage. APM, UEM. Business Transactions. Application-centric.
    38. Managing Cost. Cost functions of our resources ($ against Amount of Resource): Compute, Database, Storage, Billing, … Cloud Resources. Resource Usage: how our application consumes these resources. How these resources are used by our application: Search .2% 18; Purchase .4% 3 1. Identify costly transactions. Identify costly features.
    39. Managing Cost. Cost functions of our resources ($ against Amount of Resource): Compute, Database, Storage, Billing, … How these resources are used by our application: Search .2% 18; Purchase .4% 3 1. Identify costly transactions. Identify costly features. Identify costly user behavior. Identify costly tenants.
    40. Managing Cost. Cost functions ($ against Amount of Resource): Compute, Database, Storage. End-user visibility. Understand how our application drives cost. Search .2% 18; Purchase .4% 3.
    41. Managing Cost. Cost functions ($ against Amount of Resource): Compute, Database, Storage. APM lets you Optimize the Cost Structure of your Business Transactions. Search .2% 18; Purchase .4% 3.
    42. THANK YOU. Michael Kopp, Technology Strategist. michael.kopp@compuware.com, @mikopp42, blog.dynatrace.com
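The "Managing Cost" slides describe combining the cost functions of each resource with how each business transaction uses those resources, in order to identify costly transactions. A minimal sketch of that calculation follows. All figures are hypothetical: the resource names, the unit prices in `PRICE_MICRO_USD`, the per-execution usage in `USAGE` and the execution counts are invented for illustration and are not taken from the slides or from any provider's price list. The sketch also assumes linear cost functions, which real cloud pricing often is not.

```python
# Hypothetical unit prices in micro-dollars per unit of resource.
# Integers are used so the arithmetic is exact.
PRICE_MICRO_USD = {"cpu_ms": 2, "db_call": 5, "storage_kb": 1}

# Hypothetical average resource usage per execution of a business transaction.
USAGE = {
    "search":   {"cpu_ms": 40, "db_call": 18, "storage_kb": 0},
    "purchase": {"cpu_ms": 90, "db_call": 3,  "storage_kb": 1},
}

# Hypothetical number of executions in the billing period.
EXECUTIONS = {"search": 50_000, "purchase": 1_000}


def cost_per_execution(usage):
    """Cost of one execution: sum of unit price times amount for each resource."""
    return sum(PRICE_MICRO_USD[resource] * amount for resource, amount in usage.items())


def rank_by_total_cost(usage_by_tx, executions):
    """Return (transaction, total cost in micro-dollars), most expensive first."""
    totals = {
        tx: cost_per_execution(usage) * executions[tx]
        for tx, usage in usage_by_tx.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total_cost(USAGE, EXECUTIONS)
for tx, micro_usd in ranking:
    print(f"{tx}: {micro_usd / 1_000_000:.2f} USD")
```

With these made-up numbers a single purchase costs more than a single search, but search dominates the total because it runs far more often. That is the kind of priority the talk argues should be set by cost per transaction rather than by resource utilization.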
