Cost Based Performance Modeling:

Addressing Uncertainties




 Eugene Margulis
 Telus Health Solutions
 eugene_margulis@yahoo.ca

 October, 2009
Outline




          • Background
          • What issues/questions are addressed by performance “stuff” ?
          • Performance/capacity cost model
          • Examples and demo
          • Using the cost based model
          • Cost Based Model and the development cycle
          • Benefits




                                         2
Areas addressed by Performance “Stuff”




      • Key performance “competencies”:

         –Validation
         –Tracking
         –Non-degradation
         –Characterization
         –Forecasting
         –Business Mapping
         –Optimization


      • All these activities have a common goal...




                                     3
Performance activities Goal




       • The ability to articulate/quantify resource
         requirements for a given system behaviour
         on a given h/w, subject to a number of
         constraints.


       • Many ways of “getting” there – direct testing, load
         measurements, modelling, etc.




                                        4
Reference System



     [Diagram: inputs/devices (periodic and streaming events/data) feed the System,
      which uses DISK and produces outputs: GUIs, devices, data, reports]

               •   General purpose OS (Unix, VxWorks, Windows, etc.)
               •   Multiprocessor / virtual instances (cloud)
               •   Non-HRT, but may have a Hard Real Time component
               •   Has 3rd party code – binary only, no access
               •   Heterogeneous s/w – scripts, C, multiple JVMs




                                                  5
Real world “challenges” - uncertainties:




      • Requirements / Behavior uncertainty:
         – Performance Requirements are not well defined
         – Load levels (S/M/L) or “KPI”s are speculated, not measured (cannot
           be measured)
      • Code uncertainty:
         – No Access to large portions of code / can’t rebuild/recompile
      • H/W uncertainty:
         – Underlying H/W architecture is not fixed (and can be very different)



      • This is not strictly a Testing/Verification activity but rather an exploratory
        exercise where we need to discover/understand rather than verify




                                            6
Additional Complications:




      • Performance limits are multi-dimensional (CPU, Threading, IO,
        disk space)


      • Designers in India, Testers in Vietnam, Architects in Canada,
        Customers in Spain (How to exchange information??)



      • Need ability to articulate/communicate performance results
        efficiently




                                      7
Examples of questions addressed by performance “stuff”


      • Will timing requirements be met? All the time? Under what conditions? Can
        we guarantee it? What is the relationship between latency and the max rate?
      • Will we have enough disk space (for how long)?
      • What if we run the system on HW with 32 slow processors? (instead of 4 fast
        ones?) What would be max supported rate of events then?
      • What if the amount of memory is reduced? What would be max supported rate
        of events then?
      • What if some GUIs are in China? (increase RTT)
      • Do we have enough spare capacity for an additional application X?
      • Is our system performing better/worse compared with the last release
        (degradation)?
      • What customer-visible activity (not a process name/id, not an IP port, not a 3rd
        party DB) uses the most resources? (e.g. CPU? Memory? Heap? BW?
        Disk?)
      • What if we have two times as many of type A devices? What is the max size
        of network we can support? How does performance map to Business Model?



                                             8
How can these be answered?




      • Yes, we can test it in the lab (at least some) ….

                      … but can we have the answers by




                               tomorrow ??




                                     9
What we need...




      • Lab testing alone does not address this (efficiently)
      • Addressed by a combination of methods/approaches
      • But need a common “framework” to drive this




                                   10
What we really need...




       • A flexible mapping between customer behaviour
         and performance/capacity metrics of the system
         (recall performance goals)




       • But there is a problem… there is a HUGE number of different
         behaviours – even in the simplest of systems…




                                   11
Can we simplify the problem?




      • Can we reduce the problem space and still have
        something useful/practical?
         – Very few performance aspects are pass/fail (outside of
           HRT/military/etc.)


      • Willing to trade off accuracy for speed
         – No need to be more accurate than the inputs




                                      12
Transaction – an “atomic performance unit”




       • System processes TRANSACTIONS


         – 80/20 Rule - 20% of TRANSACTIONS responsible for 80%
           of “performance” during Steady state operations
         – Focus on steady state (payload) - but other operation
           states can be defined




                                      13
What is a TRANSACTION from performance perspective?



      • What does the system do most of the time (payload)?
         – Processes events of type X from device B (….transaction T1)
         – Produces reports of type Y (… transaction T2)
         – Updates GUI (… transaction T3)
         – Processes logins from GUI (… transaction T4)

      • How often does it do it?
         – Processes events of type X from device B – on avg, 3 per sec.
         – Produces reports of type Y – once per hour
         – Updates GUI – once every 30 sec
         – Processes logins from GUI – on demand, on avg 1 per 10 min.

      • How much do we “pay” for it?
        – cpu?
        – Memory?
        – Disk?
        …




                                        14
Cost Based Model




                   15
Performance/Capacity – 3+ way view

     [Diagram: Behaviour + Costs + HW + Latency and other constraints feed the
      COST MODEL, which produces Resource Requirements]

    • Behaviour – transactions and frequencies
       – E.g. faults, 10 faults/sec
       – authentication, 1 authentication/sec

    • Costs – the price in terms of resources “paid” per transaction
       – E.g. 2% of CPU for every fault/sec
       – E.g. 8% of CPU for every RAD Authentication/sec
    • Resource Utilization – the price in terms of resources for the given
      behaviour:
       – E.g. (2% of CPU for every fault/sec * 10 faults/sec) +
         (8% of CPU for every Authentication/sec * 1 authentication/sec) =
         28%
    • Costs can be used directly to estimate latency impact (lower bound)
       – E.g.: 2 AA/sec -> 16% CPU impact
       – A 3 sec burst of 10 AA/sec with only 10% CPU available -> 24 sec latency
         (at least!) – see the numeric sketch below
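
       A minimal sketch of the arithmetic above, in Python, using only the illustrative
       costs and rates from this slide (names and numbers are examples, not from a real
       system):

           # Transaction costs in "% CPU per transaction/sec"; utilization is the weighted sum.
           costs = {"fault": 2.0, "authentication": 8.0}   # % CPU per (transaction/sec)
           rates = {"fault": 10.0, "authentication": 1.0}  # transactions/sec

           cpu_util = sum(costs[tx] * rates[tx] for tx in costs)
           print(f"CPU utilization: {cpu_util:.0f}%")      # 2*10 + 8*1 = 28%

           def burst_latency_lower_bound(cost_pct, burst_rate, burst_secs, cpu_available_pct):
               """Lower bound on time to drain a burst: CPU work demanded / CPU available."""
               work = cost_pct * burst_rate * burst_secs   # CPU%-seconds of work in the burst
               return work / cpu_available_pct

           # 10 AA/sec for 3 sec at 8% per AA/sec with only 10% CPU free -> at least 24 sec
           print(burst_latency_lower_bound(8.0, 10.0, 3.0, 10.0))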


                                              16
Steps to build the Cost Model


       • Behaviour
         – Decompose system into mutually-orthogonal performance
           transactions
         – Identify expected frequencies (ranges of frequencies) per transaction

       • Costs
         – Measure the incremental costs per transaction on a given h/w – one TX
           at a time
         – Identify boundary conditions (Cpu? Threading? Memory? Heap?)

       • Constraints
         – Identify latency requirements and other constraints

       • Build spreadsheet model
         – COSTS x BEHAVIOR -> REQUIREMENTS (assume linearity at first; see the
           matrix sketch below)
         – Calibrate based on combined tests
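
       One way to picture the COSTS x BEHAVIOR step is a small matrix calculation;
       the sketch below assumes a hypothetical cost matrix and limits purely for
       illustration:

           import numpy as np

           # Hypothetical cost matrix: rows = resources, columns = transactions.
           transactions = ["T1_event", "T2_report", "T3_gui_update"]
           resources    = ["CPU_MHz", "Disk_KBps", "BW_KBps"]
           C = np.array([[300.0,  50.0, 120.0],      # CPU MHz per (tx/sec)
                         [ 20.0, 400.0,   0.0],      # disk write KB/s per (tx/sec)
                         [  5.0,  10.0,  80.0]])     # bandwidth KB/s per (tx/sec)

           behaviour = np.array([3.0, 1 / 3600, 1 / 30])        # expected rates, tx/sec
           requirements = C @ behaviour                         # linearity assumed, as above
           limits = np.array([6400 * 0.75, 2000.0, 1000.0])     # e.g. 75% of a 6400 MHz box

           for name, need, limit in zip(resources, requirements, limits):
               print(f"{name}: need {need:.1f}, limit {limit:.1f}, "
                     f"{'OK' if need <= limit else 'EXCEEDED'}")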




                                          17
Identifying Transactions


   • Identify the main end-to-end “workflows”
     through the system and their frequencies

   • However, since workflows contain common
     portions they are not “orthogonal” from a
     performance perspective (resources/rates
     may not be additive)




   • Identify common portions of the workflows

   • The common portions are “transactions”

   • A workflow is represented by a sequence of
     one or more transactions




                                                  18
Costs example

     [Chart 1: vmstat CPU% over time (TOTAL, usr, sys) while the transaction load runs]
     [Chart 2: CPU% and latency vs. request rate (1–12 requests/sec), with linear fit
      y = 0.048x + 0.006, R² = 0.9876]




       Resources (CPU% / latency) measured at 2/4/6/8/10/12 requests/sec
       LATENCY grows exponentially above 10 RPS => MAX RATE = 10 RPS
          • The process is NOT CPU bound (there is lots of spare CPU% @ 10 RPS)
          • (In this case it is limited by the size of a JVM’s heap)
       Incremental CPU utilization = 4.8% of CPU per request/sec
          • Measured on a Sun N440 (4 CPUs, 1.6 GHz each) – 6400 MHz total capacity
          • COST = 4.8% * 6400 MHz = 307.2 MHz per request/sec (see the fit sketch below)
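
       The incremental cost can be extracted with a simple linear fit over the per-rate
       measurements; the sketch below uses made-up CPU readings of the same shape as the
       chart above:

           import numpy as np

           rates = np.array([2, 4, 6, 8, 10, 12], dtype=float)     # requests/sec
           cpu   = np.array([0.10, 0.20, 0.29, 0.39, 0.49, 0.58])  # CPU fraction (illustrative)

           slope, intercept = np.polyfit(rates, cpu, 1)            # cf. y = 0.048x + 0.006
           capacity_mhz = 4 * 1600                                 # Sun N440: 4 CPUs x 1.6 GHz

           print(f"incremental cost: {slope * 100:.1f}% CPU per request/sec "
                 f"= {slope * capacity_mhz:.0f} MHz per request/sec")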



                                                                                                                                                                                                                                            19
Transaction Cost Matrix


       • Transaction Costs
         – Include resource cost (can be multiple resources)
         – Can depend on additional parameters (e.g. “DB Insert" depends on the
           number of DB records)
         – Can include MaxRate (if limited by a constraint other than the resource,
           e.g. CPU).

       • Example of a transaction cost matrix (SA is a parameter that this
         particular transaction depends on – the DB size); an interpolation sketch
         follows the matrix


         ALMINS Cost
               SA (DB records)   Cost (MHz)   MaxRate
                            0          12.0     125.0
                        10000          15.5      96.6
                        30000          16.5      90.8
                        60000          18.0      83.3
                       100000          20.0      75.0
                       200000          24.9      60.1
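
       Since the cost varies with the SA parameter, intermediate DB sizes can be handled
       by interpolating over the matrix rows; a small sketch (the interpolation choice is
       an assumption, not something prescribed by the model):

           import numpy as np

           sa       = np.array([0, 10000, 30000, 60000, 100000, 200000], dtype=float)
           cost_mhz = np.array([12.0, 15.5, 16.5, 18.0, 20.0, 24.9])
           max_rate = np.array([125.0, 96.6, 90.8, 83.3, 75.0, 60.1])

           def almins_cost(db_records):
               """Interpolated (cost in MHz, MaxRate) for a given DB size."""
               return (float(np.interp(db_records, sa, cost_mhz)),
                       float(np.interp(db_records, sa, max_rate)))

           print(almins_cost(45000))   # somewhere between the 30000 and 60000 rows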




                                           20
Constraints / Resources

 • CPU
    – Overall CPU utilization is additive per transaction (most of the time)
    – If not, the transactions are not orthogonal – break them down further or use the worst case

 • MEMORY / Java HEAPs
    – If there is no virtual memory (e.g. vxWorks) then additive; treat like CPU
    – If there is virtual memory it is much trickier: there is no concept of X% utilization, so direct testing is needed.
    – Heap sizes for each JVM – can be additive within each JVM

 • DISK
    – Additive; must take purging policies and retention periods into account.

 • IO
    – Additive; read/write rates are additive, but total capacity depends on %wait / service time and on the
      manufacturer, IO pattern, etc. Safe limits can be tested separately

 • BW
    – Additive
    – “effective” BW depends on RTT

 • Threading
    – Identify the threading model for each TX – if the TX is single-threaded then scale w.r.t. the clock rate of a single HW thread; if
      multi-threaded then scale w.r.t. the entire system, e.g. (see the sketch below):
          • Suppose a transaction X “costs” 1000 MHz and is executed on a 32 CPU system with 500 MHz per CPU
          • If it is single-threaded – it will take NO LESS than 2 seconds
          • If it is multi-threaded – it will take NO LESS than 1000/(32*500) ≈ 0.06 seconds

 • Latency
    – For “long” transactions, measure the base latency, then scale using threading. Use RTT to compute impact if relevant
    – Measure the MAX rate on different architectures – to calibrate
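
 A tiny sketch of the threading-dependent lower bound described above (the function name
 is made up; the numbers are the ones from the Threading bullet):

     def latency_lower_bound(cost_mhz, cpu_mhz, n_cpus, single_threaded):
         """Lower bound on latency from CPU cost and threading model: a single-threaded
         TX can only use one HW thread; a perfectly multi-threaded one can use them all."""
         available = cpu_mhz if single_threaded else cpu_mhz * n_cpus
         return cost_mhz / available

     # 1000 MHz transaction on 32 CPUs of 500 MHz each
     print(latency_lower_bound(1000, 500, 32, single_threaded=True))    # >= 2.0 s
     print(latency_lower_bound(1000, 500, 32, single_threaded=False))   # >= 0.0625 s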



                                                                   21
Do we need to address everything???




  • There are lots of constraints…

  • There may be additional constraints based on 3rd party processing
    – Addressing ALL of them in a single model may be impractical

  • However – not all of them need to be addressed in every case for
    a useful model. For example:
    – vxWorks, 1 CPU, 512MB of memory, no virtual memory, pre-emptive
      scheduling – focus on MEM
    – Solaris, 8 CPUs, 32 h/w strands, 32G memory, - focus on
      CPU/Threading

  • Only model what is relevant for the system




                                         22
Model / Example


               Behaviour – workflow rates (/sec):
                     AU 5, AUPE 7, RAD 0, PAMFTP 0, PAMTEL 0, PAMFTPC 0, PAMTELC 0, MGMUSR 0

               Costs – workflow s-MHz:
                     AU 111, AUPE 222, GET 333, RAD 777, PAMFTP 555, PAMTEL 444

               Constraint Audit (Total CPU% 64.2%):
                     Security/AM          Total Security rate greater than AM Max
                     Security/PAM         OK
                     Sustainability /     At least one rate is not sustainable;
                     Alarm Rate           composite alarm rate (INS+UPD) not sustainable
                     NOS Trigger          OK
                     CWD Clients          OK
                     Overall CPU          Unlikely sustainable

               COST MODEL -> Resource Requirements:
                     Constraint           CPU      Es Disk    Nes Disk    BW
                     Max Utilization      75%      80%        90%         80%
                     Max NE Supported     623      800        4482        21836

                     Constraint           AEPS     RRPS
                     Max Utilization      80       5
                     Max NE Supported     3154     4485

                     Projected Max NEs    623




                                          23
Model Hierarchy


      • Transaction Model
         – Cost and constraints per individual transaction w.r.t. a number
           of options/parameters
         – E.g. 300Mhz to process an event

      • System Model
         – Composite Cost of executing a specific transaction load on a
           given h/w
         – E.g. 35% cpu for 10 events/sec and 4 user queries/sec on N440

      • Business Model
         – Mapping of System model to Business metrics
         – E.g. N440 can support up to 100 NE




                                       24
Using model for scalability and bottleneck analysis




       • Mapping between any behavior and capacity requirements

       • Mapping the model to different processor architectures

       • Can Quantify the impact of a Business request

       • Can iterate over multiple “behaviors”
          – Extends “What-if” analysis
          – Enables operating envelope visualization
          – Enables resource bottleneck identification




                                          25
Using the Cost Based Model / Demo




                                26
Identifying resource allocation – by TRANSACTIONS /
Applications

     [Chart: CPU distribution by feature (500A x 15K cph) – features include Base CP,
      Queuing, Broadcasts, Collect Digits, Give IVR, Give RAN, MlinkScrPop, Hdx,
      Intrinsics, CalByCall, Blue DB, RTDisplay, RTData API, Reports]

     [Chart: RAM – top 10 users (IPComms, Base, IMF, Logs, OSI STACK, AppLd, PP, GES,
      HIDiags, OTHER)]

     [Chart: Disk allocation in GB – Disk_NE_LOG 57, spare 37, Disk_PM 11,
      Disk_Alarm_DB 10, Disk_NE_Loads 7, Disk NE B&R 7, Disk_Alarms_REDO 5,
      Disk CACP 3, Disk_Security_Logs 1, FREE]




                                                                                                                                                     27
Compute operating envelope


      Iterate over multiple behaviours to compute the operating envelope (a sketch of
      this iteration follows the charts below)
          [Chart: 3-D operating envelope surface – EVENT RATE (10–18) vs. NUSERS
           (25000–34000) vs. NRECORDS1 (10000–50000)]

          [Chart: operating envelope – #NE2 (0–80) vs. #NE1 (0–1200), one curve per
           constraint: MAX, MAX_cpu, MAX_ed, MAX_ned, MAX_bw, MAX_aeps]
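
          A sketch of the iteration itself: sweep the behaviour dimensions, evaluate each
          modelled resource, and keep the points that stay under every limit (the cost
          functions and limits here are invented for illustration):

              import itertools

              def cpu_pct(event_rate, n_users):          # hypothetical linear cost functions
                  return 4.8 * event_rate + 0.9 * n_users

              def disk_gb_per_day(event_rate, n_users):
                  return 0.4 * event_rate + 0.05 * n_users

              LIMITS = {"cpu": 75.0, "disk": 50.0}

              envelope = []
              for event_rate, n_users in itertools.product(range(1, 21), range(10, 101, 10)):
                  if (cpu_pct(event_rate, n_users) <= LIMITS["cpu"]
                          and disk_gb_per_day(event_rate, n_users) <= LIMITS["disk"]):
                      envelope.append((event_rate, n_users))

              print(f"{len(envelope)} sustainable behaviour points out of 200")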
                                                                                          28
Nice charts – but how accurate are they?




       Models are from God…. Data is from the Devil
         (http://www.perfdynamics.com/)


       • Initially WAY more accurate than the behaviour data

       • Within 10% of combined metrics – for an “established” model

       • Less accurate as you extrapolate further from measurements

       • The model includes guesses as well as measurements

       • The value is in establishing patterns rather than absolute numbers.




                                          29
Projects where this was applied




       • Call Centre Server (WinNT platform, C++)

       • Optical Switch (VxWorks, C, Java)

       • Network Management System (Solaris, Mixed, 3rd party, Java)

       • Management Platform Application (Solaris, Mixed, 3rd party, Java)

       • …




                                       30
Addressing Uncertainties - recap




       Uncertainty    Cost Based Model                  “Traditional”

       Behaviour      Forecast ANY behaviour EARLY;     Worst-case over-engineering;
                      compute the operating envelope    Tiger Team – LATE

       Code           Treat as a “black box” – no       ??? KPI ??? BST ???
                      access needed; costs are w.r.t.
                      behaviour, not code

       H/W            Forecast h/w impact EARLY with    Worst-case over-engineering;
                      a small number of “pilot” tests;  Tiger Team – LATE
                      compute the operating envelope




                                         31
Cost Reduction




     • Significantly reduces the number of tests needed to compute
       operating envelope.
        – Suppose the system has 5 transactions defined, need to compute operating
          envelope with 10 “steps” for each transaction (e.g. 1 per sec, 2 per sec, ... 10
          per sec).
        – Using BST-type “brute force” testing we would need to run 10 * 10 * 10 * 10 * 10
          tests (one for each rate combination), 100,000 tests in total
        – Using the model approach we would need to run 10+10+10+10+10 tests, 50 tests in
          total (there will be additional work for calibration, model building, etc.,
          but the total cost will be much smaller than running 100K big-system tests)
        – Each individual test is much simpler than a BST and can be automated
        – H/w cost reduction – less reliance on BST h/w; using pilot tests we can map from
          one h/w platform to another




                                               32
How does the Cost Model fit in the dev cycle?




                                   33
Performance/Capacity
Typical Focus at the wrong places




             Planning -> Development -> Product Verification -> Tiger Team

             KPI Definition (PLM)  ->  ?  ->  KPI Validation (PT/SV)


    • Uncertainty of expected customer scenarios at planning stage (at the time of
      KPI commitment – specifically for platform)
    • Issues discovered late – expensive to fix (=tiger teams) or over-engineering
    • No early capacity/performance estimates to customers
    • No sensitivity analysis – what is the smallest/greatest contributor to resources?
      Under what conditions?
    • Validation involves BST type of tests; expensive; small number of scenarios
      (S/M/L)
    • No results portability: validation results are difficult/impossible to map/generalize
      to specific customer requirements



                                                  34
Performance/Capacity – Activities




       Performance “Competency”          With Cost Based Model               “Traditional”

       Validation                        Validate the model                  Validate requirements;
                                                                             BST (S/M/L) ???

       Tracking                          Transaction costs                   ??? KPI ???

       Non-Degradation                   w.r.t. transaction                  ??? KPI ???

       Characterization /                w.r.t. transaction                  ??? Worst case ???
       Forecasting / Sensitivity

       Optimization                      Proactive, focused on a specific    Tiger Team
                                         transaction/behaviour

       Perf Info Communication /         Model based / transaction based     ??? KPI ???
       Portability



                                              35
Performance/Capacity – Key approaches




    • All activities are focused on “transaction” metrics (these are
      “atomic” metrics and are much easier to deal with than the
      “composite” metrics such as KPI, BST, etc.)


    • All activities are flexible and proactive


    • Start performance activities as early as possible and increase
      accuracy throughout the design cycle




                                         36
Performance/Capacity – Model driven



    • Identify key transactions throughout the dev cycle
    • Quantify behaviour in terms of transactions
    • Automate test/measurements per transaction (not all, but most
      important)
    • Automate monitor/measurement/tracking of transaction costs – as
      part of sanity process (weekly? Daily? – automated)
    • Tight cooperation between testers/designers
    • Model is developed in small steps and contains latest
      measurements and guesses
    • Product verification – focus on model verification/calibration
      – runs “official” test suite (automated) per transaction
      – Runs combined “BST” (multiple transactions) – to calibrate the model




                                        37
Automated Transaction Cost Tracking


  •             Approximately 40 performance/capacity CRs raised prior to the system
                verification stage
  •             Identification of bottlenecks (and feed-back to design)
  •             Continuous capacity monitoring – load-to-load view
  •             Other metrics collected regularly

               [Chart: CPU% by load (cj … ef) – TotCPU(vmstat), JavaCPU, OracleCPU,
                TotCPU(prstat), SysCPU(vmstat), PagingCPU, OtherCPU, MPSTDEV, SY/msec,
                CS/msec]

               [Chart: delay components (ms) by load (1400co … 1400gu) – PropD, QueueD,
                PubD, ProcD]
                                                                 38
Cost Based Approach – Responsibilities and Roles



       [Diagram: the cost model (Behaviour + Costs -> Resource Requirements) with the
        roles around it]

       • Design focus – decompose the system into transactions
       • Monitoring focus – track transaction costs
       • Business focus – quantify behaviour
       • Forecasting focus – estimate requirements, sensitivity analysis, what if...
       • Validation focus – verify that capacity is as estimated




                                     39
Benefits of using the model-driven performance engineering




                                  40
Benefits – technical and others



    • Communication across groups – everyone speaks the same language (well
      defined transactions/costs).
    • “De-politicization” of performance eng – no arguing/negotiating – the numbers and
      trade-offs are clear.
    • Better requirements – quantifiable; PLM/customer can see the value in quantifying
      behaviour
    • Documentation reduction – engineering guides are replaced by the model; the
      perf related documentation can focus on improvements, etc.
    • Early problem detection - most performance problems are discovered before
      the official verification cycle
    • Easy resource leak detection – easily traceable to code changes
    • Reproducible/automated tests – the same test scripts are used by design/PV
    • Cost Reduction – less need for BST type of tests, less effort to run PV, reduced
      “over-engineering”




                                              41
Things not discussed here…




                             42
Other issues to consider


    • Tools
       – Automation (!!!!)
       – perf tracing/collection tools, transaction stat tools, transaction load, visualization, data
         archiving
       – native, simple, ascii + excel
    • Organization (info flow/responsibilities)
       – good question; it would depend on the size and maturity of the project
       – Best if driven by design rather than QA/verification
       – Start slowly
    • Performance Requirements definition
       – trade-offs, customer traceable, never “locked”
    • Performance documentation
       – Is ENG Guide necessary?
    • Using LOADS instead of transactions
       – possible if measurable directly
    • Linear Regression instead of single TX testing
       – possibly for stable systems



                                                        43
Questions?




             44
Appendix: useful links

 http://technet.microsoft.com/en-us/commerceserver/bb608757.aspx
                              – Microsoft’s Transaction Cost Analysis
 www.contentional.nl          – mBrace – Transaction Aware Performance Modelling
 www.spe-ed.com               – Software Performance Engineering
 www.perfdynamics.com         – Performance Dynamics
 www.cmg.org                  – CMG: Computer Measurement Group




                                             45
Appendix: Good Test




                  46
Transaction cost testing


    • How to measure workflow cost?
      – For each workflow, run at least 4 test cases, each corresponding to a different rate of
        workflow execution.
             • For example, for RAD1 run 4 test cases for 1, 3, 6 and 10 radius requests per
               second. The actual rate should result in CPU utilization between 20% and 75% for
               the duration of the test. If the resulting CPU is outside of these boundaries – modify
               the rate and rerun the test (the reason is that we want the results to represent
               sustainable scenarios, short term burst analysis is a separate issue).
      – For each test collect and report CPU, memory and latency (as well as failure rate) before,
        during and after the test (about 5 min before, 5 min for test, 5 min after).
      – Preserve all raw data (top/prstat, etc. outputs) for all tests – these may be required for
        further analysis.
      • Automate the test-case so that it is possible to run it after each sanity to
        track changes
      • Data to report/publish
         – Marginal CPU/resource per workflow rate
         – I can help with details

      [Chart: resource (CPU%) vs. time for e.g. 10 RAD1 per second – CPU_bcg, CPU_tst,
       CPU_pp levels with markers at T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end]
                                                                         47
Metrics to be recorded/collected during a test
In this chart CPU is used as an example, but the same methodology applies to all resources – memory, heap, disk io, CPU, etc.

Key metrics to collect during a test:
   T_R_start    Time data recording started
   T_E_start    Time event injection started (events are assumed to be injected at a
                constant rate for the entire duration of the test)
   T_E_end      Time event injection ended
   EPS          Rate of event injection during the test (between T_E_start and T_E_end);
                the rate is constant during the test
   T_PP_end     Time post-processing ended
   T_R_end      Time recording ended
   CPU_tst      CPU% utilization during the test
   CPU_pp       CPU% utilization during post-processing
   CPU_bcg      Background CPU% utilization

Derived metrics – to be included in the performance report (see the sketch below):
   mCPU_tst     CPU_tst – CPU_bcg (marginal test cost)
   mCPU_pp      CPU_pp – CPU_bcg (marginal post-processing cost; if post-processing is
                not 0 then the EPS rate is not sustainable over a long time)
   mT_tst       T_E_end – T_E_start (duration of the test/injection)
   mT_pp        T_PP_end – T_E_end (duration of post-processing – ideally this should be 0)
   dCPU_bcg     CPU_5 – CPU_1 (if >0 then the resource is not fully released after the test)
   dCPU_tst     CPU_3 – CPU_2 (if >0 then there may be a resource “leak” during the test)

Ideally the resource utilization during the test is “flat” and returns to pre-test levels
after the test is completed. To verify this, compare the measurements before/after the test
(points 1 and 5 on the chart) and at the beginning and end of the test (points 2 and 3).

Enough samples must be collected to produce a chart as below for all resources: CPU (total
and by process), memory (total and by process), heap (for specific JVMs), IO, disk. The
chart does not need to be included in the report but it must be available for analysis.

The application should also monitor/record its latency and failure rate – this is
application specific, but it should be collected/recorded in such a way that it can be
correlated with the resource chart. Average latency and average failure rate during the
test are NOT sufficient – they do not show the trends.

TOOLS: any tool can be used to collect the metrics, as long as it can collect multiple
periodic samples. As a rule of thumb collect about 100 samples for the pre-test idle,
200 samples per test and another 100 after the test. If you collect a sample once per
10 sec the overall test duration will be a bit more than 1 hour. Examples:
   prstat -n700   for individual process CPU and memory (-n700 to collect up to 700
                  processes regardless of their cpu% – to make sure you get a complete
                  memory picture)
   top / ps       can be used instead of prstat
   vmstat         for global memory/CPU/kernel CPU
   iostat         if you suspect IO issues
   jstat -gc      for individual JVM heap/GC metrics; look at the OC and OU parameters

[Chart: resource (CPU%, memory, latency, heap) vs. time with sample points 1–5,
 levels CPU_bcg, CPU_tst, CPU_pp and markers T_R_start, T_E_start, T_E_end,
 T_PP_end, T_R_end]
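
A small sketch of the derived-metric computations above, assuming CPU% samples have
already been split by test phase (the sample values are invented):

    from statistics import mean

    samples = {"pre_idle": [5.1, 5.0, 5.2], "test": [38.0, 39.5, 40.1, 39.8],
               "post_processing": [12.0, 11.5], "post_idle": [5.3, 5.2, 5.4]}

    CPU_bcg = mean(samples["pre_idle"])
    CPU_tst = mean(samples["test"])
    CPU_pp  = mean(samples["post_processing"])

    mCPU_tst = CPU_tst - CPU_bcg                          # marginal test cost
    mCPU_pp  = CPU_pp - CPU_bcg                           # marginal post-processing cost (ideally ~0)
    dCPU_bcg = mean(samples["post_idle"]) - CPU_bcg       # >0 -> resource not fully released
    dCPU_tst = samples["test"][-1] - samples["test"][0]   # >0 -> possible "leak" during the test

    print(mCPU_tst, mCPU_pp, dCPU_bcg, dCPU_tst)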



                                                                                              48
Perfect Case - SUSTAINABLE

No post-processing          mT_pp = 0
Resource utilization is     dCPU_tst = dMEM_tst = 0
flat during the test
All resources recover       dCPU_bcg = dMEM_bcg = 0
completely
CPU% per 1 EPS              mCPU_tst / EPS
Memory specifics            Process/system memory and heap may grow – if, logically, the events create
                            objects that should stay in memory (e.g. you are doing discovery and are adding
                            new data structures). Memory/heap may also grow initially after T_E_start but
                            should stabilize before T_E_end – this represents the build-up of working sets.
                            In this case memory may not be released fully upon completion of the test; run
                            the test again – if the memory keeps on increasing this may indicate a leak.

    [Chart: resource (CPU%) vs. time for e.g. 10 RAD1 per second – flat CPU_tst during
     injection, returning to CPU_bcg afterwards; markers T_R_start, T_E_start, T_E_end,
     T_PP_end, T_R_end]

    The overall CPU used in this case is the area under the utilization curve.

    Make sure that the latency/success rate during the test is acceptable. It is possible
    that the resource profile looks perfect but the events are rejected due to the lack of
    resources.




                                                                      49
Post-Processing – NOT SUSTAINABLE / BURST test
Post-processing detected: mT_pp > 0 means that the system was not able to process events at the rate they arrived; this can be due to CPU utilization, or to threading or other resource contention. In this case you may see that:
        – latency is continuously increasing between T_E_start and T_E_end
        – memory (or the old-generation heap partition of a JVM) is continuously increasing between T_E_start and T_E_end and then starts decreasing during post-processing, because the events that cannot be processed in time must be stored somewhere (see the green line on the chart)
        – the failure rate may increase towards the end of the test
Load unsustainable: This load is unsustainable over a long period of time – it may take hours or days, but the system/process will either run out of memory or be forced to drop outstanding events.
May be acceptable for short bursts/peaks: Although this rate is unsustainable over a long period, it may be acceptable for short bursts or peaks. The duration of post-processing and the rate of growth of the bounding factor (memory/heap/threads) help determine the maximum duration of the burst.
CPU% per 1 EPS: The overall CPU used in this case is the area under the utilization curve – the blue square plus the pink square. It is possible to predict how much CPU would be used by 1 EPS if the bottleneck were removed (e.g. if threading is the bottleneck and we add more threads):
        (mCPU_tst + mCPU_pp*mT_pp/mT_tst) / EPS
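To make the formula concrete, here is a minimal sketch (illustrative function names and numbers, not the author's tooling) that applies it to a burst run and also bounds the burst duration from the growth rate of the bounding resource.

```python
# Sketch: analyse a NOT SUSTAINABLE (burst) run.
# m_cpu_tst / m_cpu_pp are marginal CPU% during test and post-processing;
# m_t_tst / m_t_pp are the test and post-processing durations (same units).

def burst_cpu_per_eps(m_cpu_tst, m_cpu_pp, m_t_tst, m_t_pp, eps):
    """CPU% that 1 EPS would use if the non-CPU bottleneck were removed.

    Spreads the post-processing area of the utilization curve back over
    the injection window: (mCPU_tst + mCPU_pp * mT_pp / mT_tst) / EPS.
    """
    return (m_cpu_tst + m_cpu_pp * m_t_pp / m_t_tst) / eps

def max_burst_seconds(headroom_mb, growth_mb_per_sec):
    """Upper bound on burst duration before the bounding resource
    (memory/heap/thread pool) is exhausted."""
    return headroom_mb / growth_mb_per_sec

# Example: 40% marginal CPU during a 600 s test at 20 EPS, 10% marginal CPU
# for 300 s of post-processing; heap grows 2 MB/s with 600 MB of headroom.
print(burst_cpu_per_eps(40.0, 10.0, 600, 300, 20))  # -> 2.25 CPU% per EPS
print(max_burst_seconds(600, 2.0))                  # -> 300 s max burst
```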

[Chart: Resource (CPU%) and memory/latency/heap vs. time, showing CPU_bcg, CPU_tst and CPU_pp between T_R_start, T_E_start, T_E_end, T_PP_end and T_R_end]

In the case of post-processing it is important to determine the boundary condition:

– If CPU utilization during the test is 90% or more, then it is likely that we are bound by CPU.
– If the memory/heap of component A grows and component A has to pass events to component B, then B may be the bottleneck.
– If component B uses one full CPU (25% on the 4-CPU reference box), then it is likely single-threaded and threading is the issue.
– If component B does disk I/O, or another type of access that requires waiting, then this can be the bottleneck.
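These boundary-condition checks can be captured as a small decision helper. The sketch below is an assumption-laden illustration – the 90% and one-full-CPU thresholds and the argument names are placeholders to be adapted to the platform under test, not a definitive classifier.

```python
# Sketch: rough bottleneck classification after a burst run, following the
# heuristics above. Thresholds are illustrative only.

def likely_bottleneck(cpu_pct_during_test, component_cpu_pct,
                      upstream_mem_growing, component_blocks_on_io, n_cpus=4):
    one_full_cpu = 100.0 / n_cpus            # e.g. 25% on a 4-CPU box
    if cpu_pct_during_test >= 90.0:
        return "CPU bound"
    if upstream_mem_growing:
        return "downstream component cannot keep up (check the consumer)"
    if abs(component_cpu_pct - one_full_cpu) < 2.0:
        return "likely single-threaded: threading is the bottleneck"
    if component_blocks_on_io:
        return "waiting on disk or other blocking I/O"
    return "inconclusive: collect more per-component data"

print(likely_bottleneck(60.0, 25.0, False, False))
# -> "likely single-threaded: threading is the bottleneck"
```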
                                                                    50
