Application Performance
Management in the Clouds
Lessons Learned
Application Performance Management
[Diagram labels: $; DB Servers, App Servers, App Servers, Web Servers; Maria, Carl; Tenant]


   Manage End User Satisfaction
   Manage Application Performance
   Ensure SLA compliance
   Ensure optimal Performance and Resource Utilization
Application Performance is a Business Issue
Improving performance lowers cost and increases revenue




Cost REDUCED…
• Reduce Problem Resolution Time
• Reduce Hardware and other operational Cost
• Fewer Production Issues

Revenue IMPROVED…
• Improved Conversion Rate
• Improved Capacity
• Improved Productivity




 Sources varied, including Compuware ROI studies and actual observed user behavior over 180M+ page views
Lesson #1
    Private and Public Clouds are not alike



4
Private Cloud
Applications in a private Cloud
Community portal   Web Server Application Server   Backend   Database




      wiki         Web Server Application Server   Backend   Database
Private Cloud
Applications in a private Cloud
Identifying hidden Application Impact
[Two charts, each showing Response Time and Throughput]
Another Application?
Infrastructure?
Private Cloud
   Overcoming Organisational Barriers
Application Team Two      No Line of Communication      Application Team One
Operations Team Two                                     Operations Team One
                          Virtualization Team
Private Cloud
APM in a private Cloud
Private Cloud
 Application Balance and Resource Optimization
• How healthy is my application? Do we have any problems?
• Is this due to application failures?
• Or do we have an Infrastructure Issue?
• What is the resource utilization of my virtualized hosts and guests?
• CPU Overcommit?
• Memory Overcommit?
• Which other Application is impacted or impacts ours?

10
Public Cloud
  Hidden Impact in the Public Cloud
[Two charts, each showing Response Time and Throughput]
No Resource Utilization
No Visibility in underlying Layer
Goal!
Public Cloud
 Your Application is your only concern




• Infrastructure Issue?
• Did we reach capacity?
• Steal time?

Application is King!
Infrastructure is a black box and commodity
Scale up or recycle!



12
Lesson #2
     Cloud Monitoring must be
     Application (Performance) Monitoring




13
Application Performance starts with the End User


Public Cloud: Load Balancers ▪ Web ▪ Application logic ▪ Database ▪ Network ▪ CDNs ▪ Third party services
Third Party
Users: ISPs ▪ Mobile carriers ▪ Browsers ▪ Devices ▪ AJAX ▪ JavaScript ▪ Mobile apps
[Diagram labels: Customers, Employees, Load Balancer, Application, Database, CDN, Infrastructure, Cloud Monitoring, Cloud Internet Backbone, Browser & Device Performance]
Public Cloud
     Managing End User Experience (EUE)
• What did the user do?
• Where do my users come from?
• Which devices do they use?
• Which users suffer from bad user experience?
• Any Bandwidth Issues? Mobile Carrier?
• Is the problem in the Browser? In the AJAX Call? The Web-Server? In the Application? Or is it a 3rd party service?
• If the problem is my 3rd Party Content, who was it? Does Facebook, LinkedIn or Google Ads have a negative impact?
Real End-to-End Application Performance



[Diagram labels: End User, Third Party, Our Application, External Services]
Identify Fault Domain
Impact of Cloud Resources



16
Lesson #3
     Time and resources are relative in the Clouds




17
How do you measure response time…
…when your clock is skewed?

   • Use virtualization-aware timing
   • and/or exclude heavily suspended transactions from analysis
[Chart: Tier Response Times, with VM Suspension and Inter Tier Latency marked]
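
The second bullet can be sketched as a simple filter. This is only an illustration, assuming each transaction record carries its measured wall-clock time and the VM suspension time that overlapped it; the `Transaction` type, its field names and the 20% threshold are hypothetical and not taken from any APM product.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    wall_ms: float       # measured response time, including suspension
    suspended_ms: float  # time the VM was suspended during the transaction


def usable(transactions, max_suspended_ratio=0.2):
    """Keep transactions whose suspended share is at most the threshold."""
    kept = []
    for t in transactions:
        if t.wall_ms <= 0:
            continue  # nothing measurable, skip
        if t.suspended_ms / t.wall_ms <= max_suspended_ratio:
            kept.append(t)
    return kept


def corrected_ms(t):
    """Response time with the suspension subtracted (never below zero)."""
    return max(t.wall_ms - t.suspended_ms, 0.0)
```

Transactions that survive the filter can still be reported with `corrected_ms`, while the dropped ones are treated as skewed beyond repair.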
Public Cloud
 Utilization of Cloud Resources?




[Charts: Real Instance CPU Time; Over 60% Steal Time?]

Know how to interpret resource metrics!
• EC2 shows allocated CPU, but you might not get all of it
• VMware shows real CPU, but you cannot use more than you bought
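
One way to see steal time from inside a Linux guest is to compare two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below is an assumption about how one might do this and is not the tooling used in the talk; the function names are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def steal_percent(before, after):
    """Share of CPU time stolen by the hypervisor between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal ...
    # Guest time is already counted in user/nice, so only the first
    # eight counters are summed.
    deltas = [y - x for x, y in zip(a[:8], b[:8])]
    if len(deltas) < 8:
        return 0.0  # kernel does not report steal time
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

The two input strings would be the first line of `/proc/stat` read a few seconds apart.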
Impact on Business Transactions!
Use Latency and Transaction Impact instead of Utilization
• Latency can be easily correlated
• Latency on specific Transaction
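
As an illustration of correlating latency with an infrastructure metric, the sketch below computes a plain Pearson correlation between two equally long series, for example per-interval transaction latency and steal time. The choice of Pearson correlation and the sample series are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)
```

A value close to 1 for a specific transaction suggests that its latency moves together with the metric; a value near 0 suggests the cause lies elsewhere.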




21
Lesson #4
     Rapid Elasticity is painful, but a success
     requirement!


22
Why do we want elasticity?
Enterprise Data Center
[Diagram: Resources (CPU, Storage, Database)]
Static Provisioning is easy, safe and performance can be guaranteed.
Elastic Scaling has no advantage for the Application!


23
Why do we want elasticity?
Enterprise Data Center
[Diagram: Resources (CPU, Storage, Database) with Available Capacity]
Unplanned and unhandled load




24
Why do we want elasticity?
Enterprise Data Center
[Diagram: Resources (CPU, Storage, Database)]
Unplanned and unhandled load
Slow Application
Unsatisfied Users
Loss of Revenue




    25
The Cloud Reason
Enterprise Data Center: Unplanned and unhandled load, Slow Application, Unsatisfied Users, Loss of Revenue
Cloud: On-Demand Provisioning, No Capacity Barrier
[Diagram: CPU, Storage and Database resources in each environment]
    26
The Cloud Reason
Enterprise Data Center: Purchased capacity, Unplanned and unhandled load, Slow Application, Unsatisfied Users, Loss of Revenue
Cloud: Easy Scaling, No Capacity Barrier
[Diagram: CPU, Storage and Database resources in each environment]
Public Cloud is about On Demand Provisioning
Scaling up is easy! Scaling down is hard!
So why do we want to scale down again?
    27
Why we scale down again
Enterprise Data Center: Overprovisioned, But already paid
Cloud: Saving Resources and Costs! Additional Runtime $$
[Diagram: CPU, Storage and Database resources in each environment]
28
Why we scale down again
Enterprise Data Center                Cloud
[Diagram: CPU, Storage and Database resources in each environment]
Elastic Scaling is a Business Requirement
Not a technical one!

29
Lesson #5
Public Cloud APM is not about resources…it’s
about operational cost!
Capacity Planning and Resource Optimization
[Chart: Load and Resources over time; series: Available Resources, Load, Resource Usage]
• Capacity planning: estimate future load and buy infrastructure
• Performance optimization: use existing infrastructure as long as possible…
• …and postpone new investments
• Next investment: estimate and buy…
On Demand Provisioning and Resource Optimization
[Chart: Load and Resources over time; series: Available Resources, Load, Resource Usage]
Performance optimization: lower the cost structure
On Demand Provisioning
[Chart: Load and Resources over time; series: Available Resources, Load, Resource Usage]
Performance optimization: lower the cost structure
Optimization in the Cloud is about Cost Savings!
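
The difference between the two charts can be sketched with a toy cost model. It assumes load is expressed in instances needed per hour and that both models pay the same hypothetical price per instance-hour, which is a simplification: purchased hardware and cloud instances are priced differently in practice.

```python
from math import ceil


def static_cost(hourly_load, price_per_instance_hour):
    """Capacity is bought for the peak and paid for in every hour."""
    if not hourly_load:
        return 0.0
    peak = max(ceil(load) for load in hourly_load)
    return peak * len(hourly_load) * price_per_instance_hour


def on_demand_cost(hourly_load, price_per_instance_hour):
    """Capacity follows the load, so only the instances in use are paid."""
    return sum(ceil(load) for load in hourly_load) * price_per_instance_hour
```

In this model, lowering the resource usage of the application lowers `on_demand_cost` immediately, while `static_cost` only changes at the next purchase.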
Cloud Resources
Where does the cost come from?
Resource Usage: how our application consumes these resources




34
Cloud Resources
[Diagram labels: User behavior, Developer decisions, Implementation]
Resource Usage: how our application consumes these resources




35
Manage by planning?
Enterprise Data Center: Purchased capacity
Cloud: Planned capacity, Cost estimation
[Diagram labels: User behavior, Developer decisions, Implementation]




36
Manage by monitoring!
Enterprise Data Center: Purchased capacity
Cloud: Planned capacity, Cost estimation
[Diagram labels: Developer decisions, Implementation, User behavior; CPU, Database, Storage; APM, Business Transactions, UEM, Application-centric]

37
Managing Cost
Cloud Resources: cost functions of our resources ($ over Amount of Resource) for Compute, Database, Storage, Billing, …
Resource Usage: how our application consumes these resources

            Compute   Database   Storage   Billing   …
Search        .2%                   18
Purchase      .4%        3                    1

Identify costly transactions
Identify costly features

38
Managing Cost
Cost functions of our resources ($ over Amount of Resource) for Compute, Database, Storage, Billing, …
How these resources are used by our application:

            Compute   Database   Storage   Billing   …
Search        .2%                   18
Purchase      .4%        3                    1

Identify costly transactions
Identify costly features
Identify costly user behavior
Identify costly tenants
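
The idea of combining per-resource cost functions with per-transaction usage can be sketched as follows. The resource names, unit prices and usage figures are hypothetical illustrations (they are not the values from the slide), and the cost functions are assumed to be linear.

```python
# Hypothetical unit prices per resource (linear cost functions).
UNIT_PRICE = {"cpu_seconds": 0.5, "db_calls": 0.25, "storage_requests": 0.125}

# Hypothetical resource usage of one execution of each business transaction.
USAGE = {
    "search": {"cpu_seconds": 2, "storage_requests": 16},
    "purchase": {"cpu_seconds": 4, "db_calls": 3},
}


def cost_per_execution(usage, unit_price):
    """Sum of amount * unit price over the resources one execution uses."""
    return sum(amount * unit_price[res] for res, amount in usage.items())


def rank_by_total_cost(usage_by_tx, unit_price, executions):
    """Transactions sorted by total cost (cost per execution * count)."""
    totals = {
        name: cost_per_execution(usage, unit_price) * executions.get(name, 0)
        for name, usage in usage_by_tx.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Grouping the same usage records by feature, user or tenant instead of by transaction would give the other three views named on the slide.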

39
Managing Cost
End-user visibility: Understand how our application drives cost
($ over Amount of Resource for Compute, Database, Storage)

            Compute   Database   Storage
Search        .2%                   18
Purchase      .4%        3




40
Managing Cost
APM lets you Optimize the Cost Structure of your Business Transactions
($ over Amount of Resource for Compute, Database, Storage)

            Compute   Database   Storage
Search        .2%                   18
Purchase      .4%        3




41
THANK YOU
     Michael Kopp, Technology Strategist
     michael.kopp@compuware.com
     @mikopp
42
     blog.dynatrace.com


Editor's Notes

  • #4 Last updated or created: April ‘11
    Key themes: You improve application performance to improve your business. It is a business issue more than a technical one.
    Talk track: Why worry about application performance? Because it improves your business. There are numerous studies that prove that improving application performance can reduce cost and increase revenue.
    Reduce Cost
    -- One study demonstrated that improving application performance lowered the effort, and cost, needed to resolve problems by 83%. That not only saves money and effort, but it delivers results more quickly.
    -- Another study determined that improving application performance reduced calls to the call center by 61%. If those calls are customer calls, that will also directly increase revenue.
    Improve Revenue
    -- There is a direct and clear correlation between website performance and customer conversion rates. Time and time again customers are proving with their actions that the faster the site is, the more likely they are to stay on it and move through a conversion process. We’ve seen, on average, that conversion rates can increase by over 70% if page load times decrease from 8 seconds to 2 seconds.
    -- There is also a direct correlation between abandonment rates and website performance. Using another set of observed data we’ve seen a 39% DECREASE in abandonment rates when page load times drop from 8 seconds to 2 seconds.
    Bottom line: improving app performance improves your business.
  • #8 This means that two applications, or more, can impact each other. This impact is really hidden from your application; all it sees is that it slows down or that it doesn't get 100% CPU. Even more, things can affect each other that couldn't before: network and I/O.
  • #9 In a sense virtualization is the opposite of DevOps, which has been my last for this group. Ops and App People: problem, but direct relation. Now App People don't see when they have an ops problem? Ops says everything is fine. This is not only a problem in production. Think about testing in a virtualized environment: you either have to make sure you get a "dedicated" environment or you have to filter out the noise. This abstraction makes problem solving even harder than today. It makes it even more obvious that the total separation of app and ops is no longer feasible.
  • #10 Correlation even harder
  • #12 This means that two applications, or more, can impact each other. This impact is really hidden from your application; all it sees is that it slows down or that it doesn't get 100% CPU. Even more, things can affect each other that couldn't before: network and I/O.
  • #14 Now especially in a public cloud we have less visibility and control than in a private cloud. At the same time more of what we use is third party (internet backbone, CDN, load balancers, databases). To know what is going on we need that visibility back, and we need to start where it matters, in our case at the user. End user → Application (impact of IT) → Services like DB and web service. Azure? EC2?
  • #15 Last updated or created: April ‘11
    Key themes: major change #3: the Cloud has arrived
    Talk track: If it wasn’t complicated enough to have the data center and the web be more complex, now we also have the cloud as part of the equation. More and more companies are moving some or all of their applications to a private or public cloud. And that certainly changes the way you do APM: the cloud is opaque, so you can’t monitor its inner workings, and the cloud is shared, so you need to be careful that someone else’s app is not making yours slow. THIS is today’s app delivery chain. Far more complex than just a few years ago.
  • #18 From virtualization we already knew that timing is sometimes a problem. However, to do proper fault domain resolution we needed to have accurate timing at least at the tier and service level. The timing problem: there is more. The timing issue means that guest measurements are skewed, which is a problem for APM as we need to know how utilized things are. In a private cloud we can use the VM and vHost metrics to make up for that. We can correlate them on a time basis, thus we ignore the guest metrics for the most part. But for performance analysis we need more detailed CPU breakdowns for our application. Luckily vendors like VMware ensure to a large degree that the CPU time accounted on threads works out; in addition we correlate the steal time so that we know which transactions we must simply ignore in the analysis because they are skewed beyond repair. In a public cloud things are a little more difficult: we get less insight into the metrics. But there are other caveats. Let’s take EC2, we found that CPU… Azure?
  • #19 A common misconception is that scalability takes care of performance. That is not true. Performance is about the speed of a single transaction or the throughput at a given size. Scalability is about being able to get the same speed with more transactions and more nodes. Scalability is about doubling throughput when doubling the size. This actually means that an application needs to perform in order to scale!
  • #20 A common misconception is that scalability takes care of performance. That is not true. Performance is about the speed of a single transaction or the throughput at a given size. Scalability is about being able to get the same speed with more transactions and more nodes. Scalability is about doubling throughput when doubling the size. This actually means that an application needs to perform in order to scale!
  • #21 Add Load Balancer and RDS Dashboard. Correlate as much of the Cloud and guest metrics as we can.
  • #31 And finally, that brought us to the most important lesson learned: we don’t really care about resource usage in a public cloud at all. We care about application SLAs and about cost effectiveness. And in a public cloud cost effectiveness is not the same as resource effectiveness. … So we need again to monitor the right things. We need to know the cost structure of a transaction and what kind of revenue it brings in order to set priorities. E.g. optimizing the search function so that it A) delivers better results and is not executed 5 times by every user and B) uses fewer database calls saves us money, even if it maybe uses a little bit more CPU and is not a bit faster for the end user. Remember, the end user's performance depends on more than the server anyway.
  • #37 … that's why we still do some planning. In the cloud, we plan to get a cost estimation, not for the purpose of purchasing infrastructure. Since the cloud is agile, we don't need to over-provision our resources, since we can easily acquire them. In this case, we are lucky and save resources and thus cost.
  • #38 … that's why we still do some planning. In the cloud, we plan to get a cost estimation, not for the purpose of purchasing infrastructure. Since the cloud is agile, we don't need to over-provision our resources, since we can easily acquire them. In this case, we are lucky and save resources and thus cost.
  • #43 Add QR Code
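
Notes #19 and #20 describe scalability as doubling throughput when doubling the size. A minimal sketch of that check, with a hypothetical function name and made-up measurements, might look like this:

```python
def scaling_efficiency(base_nodes, base_throughput, nodes, throughput):
    """Ratio of measured throughput to the throughput expected if it
    grew in proportion to the node count; 1.0 means linear scaling."""
    if base_nodes <= 0 or base_throughput <= 0 or nodes <= 0:
        raise ValueError("node counts and base throughput must be positive")
    expected = base_throughput * nodes / base_nodes
    return throughput / expected
```

A result well below 1.0 after adding nodes indicates that the application does not scale, however fast a single transaction is.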