Application Performance Management in the Clouds - Lessons Learned
 



We face the challenge of monitoring and managing performance in clouds every other day. Not only is application performance management different in a cloud, but all clouds are not equal either. This lessons learned session will show how to do APM in several different Clouds (Azure, EC2, VMware private Clouds) and how it differs from more traditional environments. The session will also cover performance monitoring, troubleshooting and tuning in environments where resources are virtually infinite, but application performance is not.

  • Last updated or created: April '11. Key themes: You improve application performance to improve your business. It is a business issue more than a technical one. Talk track: Why worry about application performance? Because it improves your business. Numerous studies show that improving application performance can reduce cost and increase revenue. Reduce cost: one study demonstrated that improving application performance lowered the effort, and cost, needed to resolve problems by 83%. That not only saves money and effort, it also delivers results more quickly. Another study determined that improving application performance reduced calls to the call center by 61%. If those calls are customer calls, that will also directly increase revenue. Improve revenue: there is a direct and clear correlation between website performance and customer conversion rates. Time and time again customers prove with their actions that the faster the site is, the more likely they are to stay on it and move through a conversion process. We've seen, on average, that conversion rates can increase by over 70% if page load times decrease from 8 seconds to 2 seconds. There is also a direct correlation between abandonment rates and website performance. Using another set of observed data, we've seen a 39% decrease in abandonment rates when page load times drop from 8 seconds to 2 seconds. Bottom line: improving app performance improves your business.
  • This means that two or more applications can impact each other. This impact is hidden from your application; all it sees is that it slows down or that it doesn't get 100% CPU. Even more, things can affect each other that couldn't before: network and I/O.
  • In a sense, virtualization is the opposite of DevOps, which was my last talk for this group. Ops and app people: a problem, but a direct relation. Now app people don't see when they have an ops problem, and ops says everything is fine. This is not only a problem in production. Think about testing in a virtualized environment: you either have to make sure you get a "dedicated" environment or you have to filter out the noise. This abstraction makes problem solving even harder than today. It makes it even more obvious that the total separation of app and ops is no longer feasible.
  • Correlation is even harder.
  • Especially in a public cloud, we have less visibility and control than in a private cloud. At the same time, more of what we use is third party (internet backbone, CDN, load balancers, databases). To know what is going on we need that visibility back, and we need to start where it matters, in our case at the user. End user → application (impact of IT) → services like DB and web service. Azure? EC2?
  • Last updated or created: April '11. Key themes: major change #3: the Cloud has arrived. Talk track: If it wasn't complicated enough to have the data center and the web be more complex, now we also have the cloud as part of the equation. More and more companies are moving some or all of their applications to a private or public cloud. And that certainly changes the way you do APM: the cloud is opaque, so you can't monitor its inner workings, and the cloud is shared, so you need to be careful that someone else's app is not making yours slow. THIS is today's app delivery chain. Far more complex than just a few years ago.
  • From virtualization we already knew that timing is sometimes a problem. However, to do proper fault domain resolution we need accurate timing at least at the tier and service level. The timing problem. There is more: the timing issue means that guest measurements are skewed. This is a problem for APM, as we need to know how utilized things are. In a private cloud we can use the VM and vHost metrics to make up for that. We can correlate them on a time basis, so we ignore the guest metrics for the most part. But for performance analysis we need more detailed CPU breakdowns for our application. Luckily, vendors like VMware ensure to a large degree that the CPU time accounted on threads works out. In addition, we correlate the steal time so that we know which transactions we must simply ignore in the analysis because they are skewed beyond repair. In a public cloud things are a little more difficult: we get less insight into the metrics. But there are other caveats. Let's take EC2, we found that CPU… Azure?
  • A common misconception is that scalability takes care of performance. That is not true. Performance is about the speed of a single transaction or the throughput at a given size. Scalability is about being able to get the same speed with more transactions and more nodes. Scalability is about doubling throughput when doubling the size. This actually means that an application needs to perform in order to scale!
  • Add Load Balancer and RDS Dashboard. Correlate as much of the cloud and guest metrics as we can.
  • And finally, that brought us to the most important lesson learned: we don't really care about resource usage in a public cloud at all. We care about application SLAs and about cost effectiveness. And in a public cloud, cost effectiveness is not the same as resource effectiveness. So we again need to monitor the right things. We need to know the cost structure of a transaction and what kind of revenue it brings in order to set priorities. For example, optimizing the search function so that it A) delivers better results and is not executed 5 times by every user and B) uses fewer database calls saves us money, even if it uses a little more CPU and is not any faster for the end user. Remember, the end user's performance depends on more than the server anyway.
  • … that's why we still do some planning. In the cloud, we plan to get a cost estimation, not for the purpose of purchasing infrastructure. Since the cloud is agile, we don't need to over-provision our resources, since we can easily acquire them. In this case, we are lucky and save resources and thus cost.
  • Add QR Code
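
The notes above argue that in a public cloud the cost structure of a transaction matters more than its raw resource usage. A minimal sketch of that idea follows, assuming a simple linear pricing model. The unit prices, the `TransactionProfile` fields, and the example numbers are all hypothetical and are not taken from the talk or from any provider's price list.

```python
from dataclasses import dataclass

# Hypothetical unit prices; a real cloud bill has more dimensions.
CPU_SECOND_PRICE = 0.00002  # assumed cost of one CPU second
DB_CALL_PRICE = 0.00001     # assumed cost of one database call


@dataclass(frozen=True)
class TransactionProfile:
    """Average resource consumption of one business transaction type."""
    name: str
    executions_per_day: int
    cpu_seconds: float  # CPU time per execution
    db_calls: int       # database calls per execution


def cost_per_execution(profile: TransactionProfile) -> float:
    """Cost of a single execution under the assumed linear prices."""
    return (profile.cpu_seconds * CPU_SECOND_PRICE
            + profile.db_calls * DB_CALL_PRICE)


def daily_cost(profile: TransactionProfile) -> float:
    """Cost of all executions of this transaction type in one day."""
    return cost_per_execution(profile) * profile.executions_per_day


def rank_by_daily_cost(profiles):
    """Most expensive transaction types first."""
    return sorted(profiles, key=daily_cost, reverse=True)


# Search before and after a hypothetical optimization: fewer database
# calls and fewer repeated searches, at the price of slightly more CPU.
search_before = TransactionProfile("search", 500_000, 0.05, 18)
search_after = TransactionProfile("search", 300_000, 0.06, 6)
purchase = TransactionProfile("purchase", 20_000, 0.10, 3)

ranking = [p.name for p in rank_by_daily_cost([purchase, search_before])]
```

With these assumed numbers the optimized search is cheaper per execution and per day even though it uses more CPU, which is the trade-off the search example in the notes describes.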

Application Performance Management in the Clouds - Lessons Learned: Presentation Transcript

  • Application Performance Management in the Clouds: Lessons Learned
  • Application Performance Management [diagram: DB, app, and web servers; Maria, Carl, tenant]. Manage end user satisfaction. Manage application performance. Ensure SLA compliance. Ensure optimal performance and resource utilization.
  • Application Performance is a Business Issue. Improving performance lowers cost and increases revenue. Cost, reduced: reduce problem resolution time; reduce hardware and other operational cost; fewer production issues. Revenue, improved: improved conversion rate; improved capacity; improved productivity. Sources varied, including Compuware ROI studies and actual observed user behavior over 180M+ page views.
  • Lesson #1: Private and Public Clouds are not alike
  • Private Cloud: Applications in a private Cloud. Community portal: Web Server, Application Server, Backend, Database. Wiki: Web Server, Application Server, Backend, Database.
  • Private Cloud: Applications in a private Cloud
  • Identifying hidden Application Impact [charts: Response Time and Throughput]. Another application? Infrastructure?
  • Private Cloud: Overcoming Organisational Barriers. Application Team One, Application Team Two, Operations Team One, Operations Team Two, Virtualization Team. No line of communication.
  • Private Cloud: APM in a private Cloud
  • Private Cloud: Application Balance and Resource Optimization. How healthy is my application? Do we have any problems? Is this due to application failures? Or do we have an infrastructure issue? What is the resource utilization of my virtualized hosts and guests? CPU overcommit? Memory overcommit? Which other application is impacted or impacts ours?
  • Public Cloud: Hidden Impact in the Public Cloud [charts: Response Time and Throughput]. No resource utilization goal! No visibility in underlying layer.
  • Public Cloud: Your Application is your only concern. Infrastructure issue? Steal time? Did we reach capacity? Scale up or recycle! Application is King! Infrastructure is a black box and commodity.
  • Lesson #2: Cloud Monitoring must be Application (Performance) Monitoring
  • Application Performance starts with the End User. Public Cloud: load balancers, web, application logic, database, network. Third Party: ISPs, mobile carriers, CDNs, third party services. Users: browsers, devices, AJAX, JavaScript, mobile apps. [diagram: customers and employees, browser and device, internet backbone, CDN, load balancer, application, database; infrastructure monitoring, cloud monitoring, application performance]
  • Public Cloud: Managing End User Experience (EUE). What did the user do? Where do my users come from? Which devices do they use? Which users suffer from bad user experience? Is the problem in the browser? In the AJAX call? The web server? In the application? Or is it a 3rd party service? Any bandwidth issues? Mobile carrier? If the problem is my 3rd party content, who was it? Does Facebook, LinkedIn or Google Ads have a negative impact?
  • Real End-to-End Application Performance: End User, Our Application, Third Party External Services. Identify fault domain. Impact of cloud resources.
  • Lesson #3: Time and resources are relative in the Clouds
  • How do you measure response time…
  • …when your clock is skewed? Use virtualization-aware times and/or exclude heavily suspended transactions from analysis. [chart: Tier Response Times, VM Suspension, Inter Tier Latency]
  • Public Cloud: Utilization of Cloud Resources? Real instance CPU time. Over 60% steal time? Know how to interpret resource metrics! EC2 shows real CPU, but you cannot use more than you bought. VMware shows allocated CPU, but you might not get all of it.
  • Impact on Business Transactions! Use latency and transaction impact instead of utilization. Latency can be easily correlated. Latency on specific transaction.
  • Lesson #4: Rapid Elasticity is painful, but a success requirement!
  • Why do we want elasticity? Enterprise Data Center [resources: Database, CPU, Storage]. Static provisioning is easy and safe, and performance can be guaranteed. Elastic scaling has no advantage for the application!
  • Why do we want elasticity? Enterprise Data Center [resources: Database, CPU, Storage]. Unplanned and unhandled load. Available capacity.
  • Why do we want elasticity? Enterprise Data Center [resources: Database, CPU, Storage]. Unplanned and unhandled load. Slow application. Unsatisfied users. Loss of revenue.
  • The Cloud Reason. Enterprise Data Center: unplanned and unhandled load, slow application, unsatisfied users, loss of revenue. Cloud: on-demand provisioning, no capacity barrier. [resources: CPU, Database, Storage]
  • The Cloud Reason. Enterprise Data Center: purchased capacity, unplanned and unhandled load, slow application, unsatisfied users, loss of revenue. Cloud: Public Cloud is about On Demand Provisioning, easy scaling, no capacity barrier. Scaling up is easy! Scaling down is hard! So why do we want to scale down again? [resources: CPU, Database, Storage]
  • Why we scale down again. Enterprise Data Center: overprovisioned, but already paid. Cloud: additional runtime $$. Saving resources and costs! [resources: CPU, Database, Storage]
  • Why we scale down again. Elastic scaling is a business requirement, not a technical one! [resources: CPU, Database, Storage]
  • Lesson #5: Public Cloud APM is not about resources… it's about operational cost!
  • Capacity Planning and Resource Optimization [chart: load and resources over time; available resources, load, resource usage]. Capacity planning: estimate future load and buy infrastructure. Performance optimization: use existing infrastructure as long as possible… …and postpone new investments. Next investment: estimate and buy…
  • On Demand Provisioning and Resource Optimization [chart: load and resources over time; available resources, load, resource usage]. Performance optimization: lower the cost structure.
  • On Demand Provisioning [chart: load and resources over time; available resources, load, resource usage]. Performance optimization: lower the cost structure. Optimization in the Cloud is about Cost Savings!
  • Where does the cost come from? Cloud Resources. Resource Usage: how our application consumes these resources.
  • Cloud Resources. Resource Usage: how our application consumes these resources. User behavior. Developer decisions. Implementation.
  • Manage by planning? Enterprise Data Center: purchased capacity, planned capacity. Cloud: cost estimation. User behavior. Developer decisions. Implementation.
  • Manage by monitoring! Enterprise Data Center: purchased capacity, planned capacity. Cloud: cost estimation. User behavior. Developer decisions. Implementation. [resources: CPU, Database, Storage] APM: Business Transactions. UEM. Application-centric.
  • Managing Cost. Cost functions of our resources [charts: $ vs. amount of resource for Compute, Database, Storage, Billing, …]. Cloud Resources. Resource Usage: how our application consumes these resources. How these resources are used by our application [table residue: Search .2% 18; Purchase .4% 3 1]. Identify costly transactions. Identify costly features.
  • Managing Cost. Cost functions of our resources [charts: $ vs. amount of resource for Compute, Database, Storage, Billing, …]. How these resources are used by our application [table residue: Search .2% 18; Purchase .4% 3 1]. Identify costly transactions. Identify costly features. Identify costly user behavior. Identify costly tenants.
  • Managing Cost [charts: $ vs. amount of resource for Compute, Database, Storage; table residue: Search .2% 18; Purchase .4% 3]. End-user visibility. Understand how our application drives cost.
  • Managing Cost [charts: $ vs. amount of resource for Compute, Database, Storage; table residue: Search .2% 18; Purchase .4% 3]. APM lets you optimize the cost structure of your business transactions.
  • THANK YOU. Michael Kopp, Technology Strategist, michael.kopp@compuware.com, @mikopp, blog.dynatrace.com
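
Lesson #3 recommends excluding heavily suspended transactions from analysis when the guest clock is skewed. A minimal sketch of that filtering step follows. It assumes each measured transaction already carries the time the VM was suspended or had its CPU stolen while the transaction ran. The `TransactionSample` type, its field names, and the 20% threshold are hypothetical and do not come from the talk or from any APM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionSample:
    """One measured transaction (hypothetical shape)."""
    name: str
    response_ms: float   # response time as measured inside the guest
    suspended_ms: float  # time the VM was suspended during the transaction


def suspension_ratio(sample: TransactionSample) -> float:
    """Fraction of the measured response time spent suspended (0.0 to 1.0)."""
    if sample.response_ms <= 0:
        return 0.0
    return min(sample.suspended_ms / sample.response_ms, 1.0)


def split_by_suspension(samples, max_ratio=0.2):
    """Separate usable samples from those skewed beyond the threshold."""
    usable = [s for s in samples if suspension_ratio(s) <= max_ratio]
    skewed = [s for s in samples if suspension_ratio(s) > max_ratio]
    return usable, skewed


def mean_response_ms(samples) -> float:
    """Average response time; raises ValueError for an empty input."""
    if not samples:
        raise ValueError("no samples to average")
    return sum(s.response_ms for s in samples) / len(samples)


samples = [
    TransactionSample("search", 100.0, 5.0),
    TransactionSample("search", 400.0, 300.0),  # mostly suspension
    TransactionSample("search", 120.0, 10.0),
]
usable, skewed = split_by_suspension(samples)
average_ms = mean_response_ms(usable)
```

Dropping skewed samples rather than subtracting the suspended time is a deliberate simplification here: it matches the advice to ignore transactions that are skewed beyond repair, at the cost of a smaller sample.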