The document presents cost-based performance modeling as a way to address uncertainties in requirements, code, and hardware. System behaviour is modeled as a set of transactions, each with measured costs that map to resource requirements. Questions such as the maximum supported load on different hardware can then be answered. The approach involves defining orthogonal transactions, measuring their individual costs, and combining them in a spreadsheet model to estimate overall resource utilization and constraint violations for a given transaction load. This makes it possible to explore performance across different architectures and identify bottlenecks.
Slides Cost Based Performance Modelling
1. Cost Based Performance Modeling:
Addressing Uncertainties
Eugene Margulis
Telus Health Solutions
eugene_margulis@yahoo.ca
October, 2009
2. Outline
• Background
• What issues/questions are addressed by performance “stuff” ?
• Performance/capacity cost model
• Examples and demo
• Using the cost based model
• Cost Based Model and the development cycle
• Benefits
3. Areas addressed by Performance “Stuff”
• Key performance “competencies”:
–Validation
–Tracking
–Non-degradation
–Characterization
–Forecasting
–Business Mapping
–Optimization
• All these activities have a common goal...
4. Performance activities Goal
• Ability to articulate/quantify resource requirements for a given system behaviour on given h/w, subject to a number of constraints.
• Many ways of “getting” there – direct testing, load measurements, modelling, etc.
5. Reference System
[Diagram: reference system – inputs/devices feeding events/data (periodic, streaming) into the system; outputs to GUIs, devices, data, reports; disk attached; internal components A, B, C]
• General purpose OS (Unix, VxWorks, Windows, etc.)
• Multiprocessor / virtual instances (cloud)
• Non-HRT, but may have a Hard Real Time component
• Has 3rd party code – binary only, no access
• Heterogeneous s/w – scripts, C, multiple JVMs
6. Real world “challenges” – uncertainties:
• Requirements / Behaviour uncertainty:
– Performance requirements are not well defined
– Load levels (S/M/L) or “KPI”s are speculated, not measured (cannot be measured)
• Code uncertainty:
– No access to large portions of code / can’t rebuild/recompile
• H/W uncertainty:
– Underlying h/w architecture is not fixed (and can be very different)
• This is not strictly a testing/verification activity but rather an exploratory exercise where we need to discover/understand rather than verify
7. Additional Complications:
• Performance limits are multi-dimensional (CPU, threading, IO, disk space)
• Designers in India, testers in Vietnam, architects in Canada, customers in Spain (how to exchange information?)
• Need the ability to articulate/communicate performance results efficiently
8. Examples of questions addressed by performance “stuff”
• Will timing requirements be met? All the time? Under what conditions? Can we guarantee it? What is the relationship between latency and max rate?
• Will we have enough disk space (for how long)?
• What if we run the system on h/w with 32 slow processors instead of 4 fast ones? What would be the max supported rate of events then?
• What if the amount of memory is reduced? What would be the max supported rate of events then?
• What if some GUIs are in China (increased RTT)?
• Do we have enough spare capacity for an additional application X?
• Is our system performing better/worse compared to the last release (degradation)?
• What customer-visible activity (not a process name/id, not an IP port, not a 3rd party DB) uses the most resources (CPU? memory? heap? BW? disk?)
• What if we have twice as many type A devices? What is the max size of network we can support? How does performance map to the business model?
9. How can these be answered?
• Yes, we can test it in the lab (at least some of it)…
… but can we have the answers by tomorrow??
10. What we need...
• Lab testing alone does not address this (efficiently)
• Addressed by a combination of methods/approaches
• But need a common “framework” to drive this
11. What we really need...
• A flexible mapping between customer behaviour and the performance/capacity metrics of the system (recall the performance goals)
• But there is a problem… there is a HUGE number of different behaviours – even in the simplest of systems…
12. Can we simplify the problem?
• Can we reduce the problem space and still have something useful/practical?
– Very few performance aspects are pass/fail (outside of HRT/military/etc.)
• Willing to trade off accuracy for speed
– No need to be more accurate than the inputs
13. Transaction – an “atomic performance unit”
• The system processes TRANSACTIONS
– 80/20 rule – 20% of transactions are responsible for 80% of “performance” during steady-state operation
– Focus on steady state (payload) – but other operating states can be defined
14. What is a TRANSACTION from a performance perspective?
• What does the system do most of the time (payload)?
– Processes events of type X from device B (… transaction T1)
– Produces reports of type Y (… transaction T2)
– Updates the GUI (… transaction T3)
– Processes logins from the GUI (… transaction T4)
• How often does it do it?
– Processes events of type X from device B – on avg, 3 per sec
– Produces reports of type Y – once per hour
– Updates the GUI – once every 30 sec
– Processes logins from the GUI – on demand, on avg 1 per 10 min
• How much do we “pay” for it?
– CPU?
– Memory?
– Disk?
…
16. Performance/Capacity – 3+ way view
[Diagram: Behaviour (transactions and frequencies) and Costs feed the COST MODEL, which, together with latency, h/w and other constraints, produces Resource Requirements]
• Behaviour – transactions and frequencies
– E.g. faults, 10 faults/sec
– E.g. authentication, 1 authentication/sec
• Costs – the price in terms of resources “paid” per transaction
– E.g. 2% of CPU for every fault/sec
– E.g. 8% of CPU for every RAD authentication/sec
• Resource utilization – the price in terms of resources for the given behaviour:
– E.g. (2% of CPU per fault/sec × 10 faults/sec) + (8% of CPU per authentication/sec × 1 authentication/sec) = 28%
• Costs can be used directly to estimate latency impact (lower bound) – see the sketch below
– E.g.: 2 AA/sec → 16% CPU impact
– A 3 sec burst of 10 AA/sec with only 10% CPU available → 24 sec latency (at least!)
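To make the arithmetic concrete, here is a minimal Python sketch of the linear cost model from this slide. The transaction names and costs come from the slide's own example; the actual model in the talk was a spreadsheet, so this is an illustration, not the author's implementation:

```python
# Minimal sketch of the linear cost model: utilization = sum(cost_i * rate_i).
costs = {"fault": 2.0, "authentication": 8.0}        # % CPU per (transaction/sec)
behaviour = {"fault": 10.0, "authentication": 1.0}   # transactions/sec

cpu_utilization = sum(costs[tx] * rate for tx, rate in behaviour.items())
print(f"CPU utilization: {cpu_utilization:.1f}%")    # -> 28.0%

# Lower-bound latency for a burst: total CPU work / CPU headroom available.
burst_rate, burst_secs, cpu_available = 10.0, 3.0, 10.0   # 10 AA/sec for 3 sec, 10% CPU free
work = costs["authentication"] * burst_rate * burst_secs  # %CPU-seconds of work
print(f"Burst latency lower bound: {work / cpu_available:.0f} sec")  # -> 24 sec
```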
17. Steps to build the Cost Model
• Behaviour
– Decompose the system into mutually orthogonal performance transactions
– Identify expected frequencies (ranges of frequencies) per transaction
• Costs
– Measure the incremental cost per transaction on given h/w – one TX at a time
– Identify boundary conditions (CPU? threading? memory? heap?)
• Constraints
– Identify latency requirements and other constraints
• Build a spreadsheet model (see the sketch below)
– COSTS × BEHAVIOUR → REQUIREMENTS (assume linearity at first)
– Calibrate based on combined tests
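With several resources at once, the COSTS × BEHAVIOUR step is just a matrix-vector product. A hedged sketch of that step (resource and transaction names, and all numbers, are invented for illustration):

```python
import numpy as np

# Rows = resources, columns = transactions; entries = cost per (transaction/sec).
resources = ["CPU %", "disk MB/hr", "BW kb/s"]
transactions = ["event", "report", "gui_update"]
costs = np.array([
    [2.0, 5.0, 0.5],    # CPU % per tx/sec
    [1.0, 20.0, 0.0],   # disk MB/hr per tx/sec
    [4.0, 50.0, 8.0],   # BW kb/s per tx/sec
])
rates = np.array([3.0, 1.0 / 3600.0, 1.0 / 30.0])  # tx/sec for each transaction

requirements = costs @ rates  # linear model: COSTS x BEHAVIOUR -> REQUIREMENTS
for name, value in zip(resources, requirements):
    print(f"{name}: {value:.2f}")
```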
18. Identifying Transactions
• Identify the main end-to-end “workflows” through the system and their frequencies
• However, since workflows contain common portions, they are not “orthogonal” from a performance perspective (resources/rates may not be additive)
• Identify the common portions of the workflows
• The common portions are the “transactions”
• A workflow is represented by a sequence of one or more transactions (see the sketch below)
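One way to picture the decomposition: each workflow is a composition of transactions, so transaction rates are workflow rates pushed through that composition. A hypothetical sketch (workflow and transaction names are invented, not from the slides):

```python
# Hypothetical decomposition: how many times each transaction occurs per workflow.
workflows = {"login_and_query": {"authenticate": 1, "db_query": 3, "gui_update": 1},
             "event_ingest":    {"db_insert": 1, "gui_update": 1}}
workflow_rates = {"login_and_query": 0.5, "event_ingest": 10.0}  # workflows/sec

# Transaction rates are additive across workflows once common portions are factored out.
tx_rates: dict[str, float] = {}
for wf, composition in workflows.items():
    for tx, count in composition.items():
        tx_rates[tx] = tx_rates.get(tx, 0.0) + count * workflow_rates[wf]
print(tx_rates)  # {'authenticate': 0.5, 'db_query': 1.5, 'gui_update': 10.5, 'db_insert': 10.0}
```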
19. Costs example
[Chart: CPU% (total/usr/sys, via vmstat) and latency measured over time for rates of 2/4/6/8/10/12 requests/sec; linear fit y = 0.048x + 0.006, R² = 0.9876]
• Latency becomes exponential after 10 RPS ⇒ MAX RATE = 10 RPS
• The process is NOT CPU bound (there is lots of spare CPU% @ 10 RPS)
• (In this case it is limited by the size of a JVM’s heap)
• Incremental CPU utilization = 4.8% of CPU per request (from the fit – see the sketch below)
• Measured on a Sun N440 (4 CPUs, 1.6 GHz each) – 6400 MHz total capacity
• COST = 4.8% × 6400 MHz = 307.2 MHz per request
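The incremental cost is the slope of a straight-line fit of CPU utilization against request rate. A minimal sketch of that step (the sample points are invented so the fit lands near the slide's y = 0.048x + 0.006; only the hardware figures are from the slide):

```python
import numpy as np

# Offered rates (requests/sec) and measured total CPU utilization (fraction).
rates = np.array([2, 4, 6, 8, 10, 12], dtype=float)
cpu = np.array([0.10, 0.20, 0.29, 0.39, 0.49, 0.58])

slope, intercept = np.polyfit(rates, cpu, 1)  # linear fit: cpu = slope*rate + intercept
total_mhz = 4 * 1600  # Sun N440: 4 CPUs x 1.6 GHz = 6400 MHz
print(f"incremental cost: {slope * 100:.1f}% CPU per request "
      f"= {slope * total_mhz:.0f} MHz per request")  # ~4.8% = ~307 MHz
```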
20. Transaction Cost Matrix
• Transaction costs
– Include the resource cost (can be multiple resources)
– Can depend on additional parameters (e.g. “DB insert” depends on the number of DB records)
– Can include MaxRate (if limited by a constraint other than the resource, e.g. CPU)
• Example of a transaction cost matrix (SA is a parameter the particular transaction depends on – db size); see the sketch below:

ALMINS cost
SA       MHz    MaxR
0        12.0   125.0
10000    15.5   96.6
30000    16.5   90.8
60000    18.0   83.3
100000   20.0   75.0
200000   24.9   60.1
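Because the cost varies with the SA parameter, the model needs a cost at arbitrary db sizes. A simple sketch interpolating the matrix above (linear interpolation is my assumption; the slides do not say how intermediate values are obtained):

```python
import numpy as np

# ALMINS cost matrix from the slide: SA (db size) -> MHz per transaction, MaxRate.
sa   = np.array([0, 10000, 30000, 60000, 100000, 200000], dtype=float)
mhz  = np.array([12.0, 15.5, 16.5, 18.0, 20.0, 24.9])
maxr = np.array([125.0, 96.6, 90.8, 83.3, 75.0, 60.1])

def almins_cost(db_size: float) -> tuple[float, float]:
    """Interpolated (MHz, MaxRate) cost for a given db size (assumed linear)."""
    return float(np.interp(db_size, sa, mhz)), float(np.interp(db_size, sa, maxr))

print(almins_cost(45000))  # -> roughly (17.25, 87.05)
```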
21. Constraints / Resources
• CPU
– Overall CPU utilization is additive per transaction (most of the time)
– If not, the transactions are not orthogonal – break them down further or use the worst case
• MEMORY / Java HEAPs
– If there is no virtual memory (e.g. VxWorks), then additive; treat like CPU
– If there is virtual memory, it is much trickier; there is no concept of X% utilization – direct testing is needed
– Heap sizes for each JVM – can be additive within each JVM
• DISK
– Additive; must take purging policies and retention periods into account
• IO
– Read/write rates are additive, but total capacity depends on %waiting / service time and on the manufacturer, IO pattern, etc. Safe limits can be tested separately
• BW
– Additive
– “Effective” BW depends on RTT
• Threading (see the sketch below)
– Identify the threading model for each TX – if the TX is single-threaded, scale w.r.t. the clock rate of a single h/w thread; if multi-threaded, scale w.r.t. the entire system, e.g.:
• Suppose a transaction X “costs” 1000 MHz and is executed on a 32-CPU system with 500 MHz per CPU
• If it is single-threaded – it will take NO LESS than 1000/500 = 2 seconds
• If it is multi-threaded – it will take NO LESS than 1000/(32 × 500) ≈ 0.06 seconds
• Latency
– For “long” transactions, measure base latency, then scale using threading. Use RTT to compute the impact if relevant
– Measure MAX rate on different architectures – to calibrate
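The threading rule above reduces to a one-line lower bound; a sketch using the slide's own numbers:

```python
# Lower bound on wall-clock time for a transaction costing `cost_mhz` of work
# on a machine with `n_cpus` CPUs at `cpu_mhz` each (slide 21's scaling rule).
def min_duration_secs(cost_mhz: float, n_cpus: int, cpu_mhz: float,
                      multithreaded: bool) -> float:
    """Lower bound only; the real duration can only be longer."""
    capacity = n_cpus * cpu_mhz if multithreaded else cpu_mhz
    return cost_mhz / capacity

print(min_duration_secs(1000, 32, 500, multithreaded=False))  # 2.0 seconds
print(min_duration_secs(1000, 32, 500, multithreaded=True))   # 0.0625 seconds
```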
22. Do we need to address everything???
• There are lots of constraints…
• There may be additional constraints based on 3rd party processing
– Addressing ALL of them in a single model may be impractical
• However, not all of them need to be addressed in every case for a useful model. For example:
– VxWorks, 1 CPU, 512 MB of memory, no virtual memory, pre-emptive scheduling – focus on MEM
– Solaris, 8 CPUs, 32 h/w strands, 32G memory – focus on CPU/threading
• Only model what is relevant for the system
23. Model / Example
[Spreadsheet screenshot: behaviour inputs as workflow rates (AU 5/sec, AUPE 7/sec; RAD, PAMFTP, PAMTEL, PAMFTPC, PAMTELC, MGMUSR 0/sec) and per-workflow costs in s-MHz (AU 111, AUPE 222, GET 333, RAD 777, PAMFTP 555, PAMTEL 444) feed the COST MODEL, producing Resource Requirements. Outputs include constraint checks (Audit: total CPU 64.2%; Security/AM: total security rate greater than AM max; Security/PAM: OK), behaviour sustainability (alarm rate: at least one rate is not sustainable; composite alarm rate (INS+UPD) not sustainable; NOS trigger OK; CWD clients OK; overall CPU unlikely sustainable), and max NEs supported per constraint at max utilization (CPU @75%: 623; Es Disk @80%: 800; NEs Disk @90%: 4482; BW @80%: 21836; AEPS @80: 3154; RRPS @5: 4485). Projected max NEs = 623 – the minimum across constraints; see the sketch below]
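The "Projected Max NEs" cell is simply the tightest constraint winning. A sketch of that final step, using the numbers visible in the screenshot:

```python
# Max network elements (NEs) supported under each constraint, from the example.
max_ne_per_constraint = {
    "CPU": 623, "Es Disk": 800, "NEs Disk": 4482,
    "BW": 21836, "AEPS": 3154, "RRPS": 4485,
}
bottleneck = min(max_ne_per_constraint, key=max_ne_per_constraint.get)
print(f"Projected max NEs: {max_ne_per_constraint[bottleneck]} "
      f"(bottleneck: {bottleneck})")  # -> 623 (bottleneck: CPU)
```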
24. Model Hierarchy
• Transaction model
– Cost and constraints per individual transaction w.r.t. a number of options/parameters
– E.g. 300 MHz to process an event
• System model
– Composite cost of executing a specific transaction load on given h/w
– E.g. 35% CPU for 10 events/sec and 4 user queries/sec on an N440
• Business model
– Mapping of the system model to business metrics
– E.g. an N440 can support up to 100 NEs
25. Using model for scalability and bottleneck analysis
• Mapping between any behaviour and capacity requirements
• Mapping the model to different processor architectures
• Can quantify the impact of a business request
• Can iterate over multiple “behaviours”
– Extends “what-if” analysis
– Enables operating-envelope visualization
– Enables resource bottleneck identification
27. Identifying resource allocation – by TRANSACTIONS / Applications
[Charts: CPU distribution by feature (500A × 15K cph), largest shares 32%, 27%, 14%, 13%, with legend entries including IPComms, Base, IMF, Logs, OSI STACK, AppLd, PP and features such as GES, Give IVR, Give RAN, RTData API, RTDisplay, Intrinsics, Queuing, MlinkScrPop, CalByCall, Blue DB, Base CP, Collect, Hdx, Reports, Broadcasts, Digits, DB, HIDiags, OTHER; RAM – top 10 users; disk allocation in GB – Disk_NE_LOG 57, FREE/spare 37, Disk_PM 11, Disk_Alarm_DB 10, Disk_NE_Loads 7, Disk NE B&R 7, Disk_Alarms_REDO 5, Disk CACP 3, Disk_Security_Logs 1]
29. Nice charts – but how accurate are they?
Models are from God…. Data is from the Devil (http://www.perfdynamics.com/)
• Initially WAY more accurate than the behaviour data
• Within 10% of combined metrics – for an “established” model
• Less accurate as you extrapolate further from the measurements
• The model includes guesses as well as measurements
• The value is in establishing patterns rather than absolute numbers.
30. Projects where this was applied
• Call Centre Server (WinNT platform, C++)
• Optical Switch (VxWorks, C, Java)
• Network Management System (Solaris, Mixed, 3rd party, Java)
• Management Platform Application (Solaris, Mixed, 3rd party, Java)
• …
31. Addressing Uncertainties - recap
Uncertainty | Cost Based Model                                                      | “Traditional”
Behaviour   | Forecast ANY behaviour EARLY; compute operating envelope              | Worst-case over-engineering; Tiger Team – LATE
Code        | Treat as “black box”; no access needed; costs w.r.t. behaviour,       | ??? KPI ??? BST ???
            | not code                                                              |
H/W         | Forecast h/w impact EARLY; small number of “pilot” tests;             | Worst-case over-engineering; Tiger Team – LATE
            | compute operating envelope                                            |
32. Cost Reduction
• Significantly reduces the number of tests needed to compute the operating envelope.
– Suppose the system has 5 transactions defined, and we need to compute the operating envelope with 10 “steps” for each transaction (e.g. 1 per sec, 2 per sec, … 10 per sec).
– Using BST-type “brute force” testing we would need to run 10 × 10 × 10 × 10 × 10 tests (one for each rate combination) – 100,000 tests in total
– Using the model approach we would need to run 10 + 10 + 10 + 10 + 10 tests – 50 tests in total (there is additional work for calibration, model building, etc., but the total cost is much smaller than running 100K big-system tests)
– Each individual test is much simpler than a BST and can be automated
– H/w cost reduction – less reliance on BST h/w; using pilot tests we can map from one h/w platform to another
33. How does the Cost Model fit in the dev cycle?
34. Performance/Capacity
Typical focus is at the wrong places
[Diagram: dev cycle – Planning (KPI definition by PLM) → Development (?) → Product Verification (KPI validation by PT/SV) → Tiger Team]
• Uncertainty about expected customer scenarios at the planning stage (at the time of KPI commitment – specifically for a platform)
• Issues discovered late – expensive to fix (= tiger teams) or over-engineering
• No early capacity/performance estimates to customers
• No sensitivity analysis – what is the smallest/greatest contributor to resources? Under what conditions?
• Validation involves BST-type tests; expensive; small number of scenarios (S/M/L)
• No results portability: validation results are difficult/impossible to map/generalize to specific customer requirements
35. Performance/Capacity – Activities
Performance “Competency”                   | With Cost Based Model                           | “Traditional”
Validation                                 | Validate model                                  | Validate requirements; BST (S/M/L) ???
Tracking                                   | Transaction costs                               | ??? KPI ???
Non-degradation                            | w.r.t. transaction                              | ??? KPI ???
Characterization / Forecasting / Sensitivity | w.r.t. transaction                            | ??? Worst case ???
Optimization                               | Proactive; focus on specific transaction/behaviour | Tiger Team
Perf info communication / Portability      | Model based; transaction based                  | ??? KPI ???
36. Performance/Capacity – Key approaches
• All activities are focused on “transaction” metrics (these are “atomic” metrics and are much easier to deal with than “composite” metrics such as KPI, BST, etc.)
• All activities are flexible and proactive
• Start performance activities as early as possible and increase accuracy throughout the design cycle
37. Performance/Capacity – Model driven
• Identify key transactions throughout the dev cycle
• Quantify behaviour in terms of transactions
• Automate tests/measurements per transaction (not all, but the most important)
• Automate monitoring/measurement/tracking of transaction costs – as part of the sanity process (weekly? daily? – automated)
• Tight cooperation between testers/designers
• The model is developed in small steps and contains the latest measurements and guesses
• Product verification – focus on model verification/calibration
– Runs the “official” test suite (automated) per transaction
– Runs combined “BST” (multiple transactions) – to calibrate the model
38. Automated Transaction Cost Tracking
• Approximately 40 performance/capacity CRs raised prior to the system verification stage
• Identification of bottlenecks (and feedback to design)
• Continuous capacity monitoring – load-to-load view
• Other metrics collected regularly
[Charts: CPU% broken down per software load (TotCPU via vmstat/prstat, JavaCPU, OracleCPU, SysCPU, PagingCPU, OtherCPU), delay components in ms (PropD, QueueD, PubD, ProcD), and SY/msec, CS/msec rates tracked across successive builds/loads]
39. Cost Based Approach – Responsibilities and Roles
[Diagram: the cost model (Costs, Resource Req, Behavior) at the centre, with roles around it:
• Design focus – decompose into transactions
• Monitoring focus – track transaction costs
• Validation focus – verify capacity is as estimated
• Forecasting focus – estimate requirements, sensitivity analysis, what-if…
• Business focus – quantify behaviour]
41. Benefits – technical and others
• Communication across groups – everyone speaks the same language (well-defined transactions/costs)
• “De-politicization” of performance engineering – can’t argue/negotiate – the numbers and trade-offs are clear
• Better requirements – quantifiable; PLM/customers can see value in quantifying behaviour
• Documentation reduction – engineering guides are replaced by the model; the perf-related documentation can focus on improvements, etc.
• Early problem detection – most performance problems are discovered before the official verification cycle
• Easy resource-leak detection – easily traceable to code changes
• Reproducible/automated tests – the same test scripts are used by design/PV
• Cost reduction – less need for BST-type tests, less effort to run PV, reduced “over-engineering”
43. Other issues to consider
• Tools
– Automation (!!!!)
– Perf tracing/collection tools, transaction stat tools, transaction load, visualization, data archiving
– Native, simple, ASCII + Excel
• Organization (info flow/responsibilities)
– Good question; depends on the size and maturity of the project
– Best if driven by design rather than QA/verification
– Start slowly
• Performance requirements definition
– Trade-offs, customer traceable, never “locked”
• Performance documentation
– Is an ENG guide necessary?
• Using LOADS instead of transactions
– Possible if measurable directly
• Linear regression instead of single-TX testing
– Possibly, for stable systems
47. Transaction cost testing
• How to measure workflow cost?
– For each workflow, run at least 4 test cases, each corresponding to a different rate of workflow execution.
• For example, for RAD1 run 4 test cases at 1, 3, 6 and 10 RADIUS requests per second. The actual rate should result in CPU utilization between 20% and 75% for the duration of the test. If the resulting CPU is outside these boundaries, modify the rate and rerun the test (the reason is that we want the results to represent sustainable scenarios; short-term burst analysis is a separate issue).
– For each test, collect and report CPU, memory and latency (as well as the failure rate) before, during and after the test (about 5 min before, 5 min for the test, 5 min after).
– Preserve all raw data (top/prstat, etc. outputs) for all tests – these may be required for further analysis.
• Automate the test case so that it is possible to run it after each sanity build, to track changes
• Data to report/publish
– Marginal CPU/resource per workflow rate
– I can help with details
[Chart: resource (CPU%) over time for e.g. 10 RAD1/sec – background level CPU_bcg, test level CPU_tst, post-processing level CPU_pp, with markers T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end]
48. Metrics to be recorded/collected during a test
In this chart CPU is used as an example, but the same methodology applies to all resources – memory, heap, disk IO, CPU, etc.

Key metrics to collect during a test:
– T_R_start: time data recording started
– T_E_start: time event injection started (events are assumed to be injected at a constant rate for the entire duration of the test)
– T_E_end: time event injection ended
– EPS: rate of event injection during the test (between T_E_start and T_E_end); the rate is constant during the test
– T_PP_end: time post-processing ended
– T_R_end: time recording ended
– CPU_tst: CPU% utilization during the test
– CPU_pp: CPU% utilization during post-processing
– CPU_bcg: background CPU% utilization

Derived metrics – to be included in the performance report (see the sketch below):
– mCPU_tst = CPU_tst – CPU_bcg (marginal test cost)
– mCPU_pp = CPU_pp – CPU_bcg (marginal post-processing cost; if post-processing is not 0 then the EPS rate is not sustainable over a long time)
– mT_tst = T_E_end – T_E_start (duration of the test/injection)
– mT_pp = T_PP_end – T_E_end (duration of post-processing – ideally this should be 0)

Ideally, resource utilization during the test is “flat” and returns to pre-test levels after the test is completed. To verify this, compare the measurements before/after the test (points 1 and 5 on the chart) and at the beginning and end of the test (points 2 and 3 on the chart):
– dCPU_bcg = CPU_5 – CPU_1 (if > 0 then the resource is not fully released after the test)
– dCPU_tst = CPU_3 – CPU_2 (if > 0 then there may be a resource “leak” during the test)

Enough samples must be collected to produce a chart as below for all resources: CPU (total and by process), memory (total and by process), heap (for specific JVMs), IO, disk. The chart does not need to be included in the report, but it must be available for analysis.

The application should also monitor/record its latency and failure rate – this is application specific, but it should be collected/recorded in such a way that it can be correlated with the resource chart. Avg latency and avg failure rate during the test are NOT sufficient – they do not show the trends.

TOOLS: any tool can be used to collect the metrics, as long as it can collect multiple periodic samples. As a rule of thumb, collect about 100 samples for the pre-test idle, 200 samples per test and another 100 after the test. If you collect a sample once per 10 sec, the overall test duration will be a bit more than 1 hour. The following are examples:
– prstat –n700: for individual process CPU and memory (-n700 to collect up to 700 processes regardless of their CPU%, to make sure you get a complete memory picture)
– top / ps: can be used instead of prstat
– vmstat: for global memory/CPU/kernel CPU
– iostat: if you suspect IO issues
– jstat -gc: for individual JVM heap/GC metrics; look at the OC and OU parameters

[Chart: resource (CPU%, memory, latency, heap) over time with levels CPU_bcg (points 1, 5), CPU_tst (points 2, 3), CPU_pp (point 4), and markers T_R_start, T_E_start, T_E_end, T_PP_end, T_R_end]
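A small sketch of the bookkeeping above, computing the derived metrics from the raw per-test measurements (names follow the slide; the `sustainable` flag is my own framing of the slide-49/50 criterion):

```python
# Derived metrics from slide 48, computed from raw per-test measurements.
def derive(t_e_start, t_e_end, t_pp_end, cpu_tst, cpu_pp, cpu_bcg,
           cpu_1, cpu_2, cpu_3, cpu_5):
    m = {
        "mCPU_tst": cpu_tst - cpu_bcg,    # marginal test cost
        "mCPU_pp":  cpu_pp - cpu_bcg,     # marginal post-processing cost
        "mT_tst":   t_e_end - t_e_start,  # injection duration
        "mT_pp":    t_pp_end - t_e_end,   # post-processing duration (ideally 0)
        "dCPU_bcg": cpu_5 - cpu_1,        # > 0: resource not fully released
        "dCPU_tst": cpu_3 - cpu_2,        # > 0: possible leak during test
    }
    m["sustainable"] = m["mT_pp"] == 0    # no post-processing (slides 49/50)
    return m

m = derive(t_e_start=0, t_e_end=300, t_pp_end=300,
           cpu_tst=45.0, cpu_pp=5.0, cpu_bcg=5.0,
           cpu_1=5.0, cpu_2=45.0, cpu_3=45.0, cpu_5=5.0)
print(m)  # mT_pp == 0 and the deltas are 0 -> a sustainable, leak-free run
```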
49. Perfect Case - SUSTAINABLE
– No post-processing: mT_pp = 0
– Resource utilization is flat during the test: dCPU_tst = dMEM_tst = 0
– All resources recover completely: dCPU_bcg = dMEM_bcg = 0
– CPU% per 1 EPS = mCPU_tst / EPS
• Memory specifics: process/system memory and heap may grow – if logically the events create objects that should stay in memory (e.g. you are doing discovery and are adding new data structures). Memory/heap may also grow initially after T_E_start but should stabilize before T_E_end – this represents the build-up of working sets. In this case memory may not be released fully upon completion of the test. In that case, run the test again – if the memory keeps increasing, this may indicate a leak.
• The overall CPU used in this case is the area under the utilization curve – the blue square.
• Make sure that the latency/success rate during the test is acceptable. It is possible that the resource profile looks perfect while events are being rejected due to the lack of resources.
[Chart: CPU% over time for e.g. 10 RAD1/sec – flat CPU_tst plateau between T_E_start and T_E_end, immediate return to CPU_bcg, no CPU_pp phase]
50. Post-Processing – NOT SUSTAINABLE / BURST test
• Post-processing detected: mT_pp > 0 means the system was not able to process events at the rate they arrived; this can be due to CPU utilization, or due to threading or other resource contention. In this case you may see that:
– Latency is continuously increasing between T_E_start and T_E_end
– Memory (or the old heap partition of a JVM) is continuously increasing between T_E_start and T_E_end and then starts decreasing during post-processing. This is because the events that cannot be processed in time must be stored somewhere (see the green line on the chart)
– The failure rate may increase towards the end of the test
• Load unsustainable: this load is unsustainable over a long period of time – it can take hours or days – but the system/process will either run out of memory or be forced to drop outstanding events
• May be acceptable for short bursts/peaks: although this rate is unsustainable over a long time, it may be acceptable for short bursts/peaks. The duration of post-processing and the rate of growth of the bounding factor (memory/heap/threads) will help determine the max duration of the burst
• CPU% per 1 EPS: the overall CPU used in this case is the area under the utilization curve – the blue square plus the pink square. It is possible to predict how much CPU would be used by 1 EPS if the bottleneck were removed (e.g. if threading is the bottleneck and we add more threads); see the sketch below:
(mCPU_tst + mCPU_pp × mT_pp / mT_tst) / EPS
• In the case of post-processing it is important to determine what the boundary condition is:
– If CPU utilization during the test is 90% or more, then it is likely that we are bounded by CPU
– If the memory/heap of component A grows, and component A has to pass events to component B, then B may be the bottleneck
– If component B uses 1 full CPU (e.g. 25% on a 4-CPU box), it is likely single-threaded and threading is the issue
– If component B does disk IO, or another type of access that requires waiting, then this can be the bottleneck
[Chart: CPU% over time – CPU_tst plateau between T_E_start and T_E_end, followed by a CPU_pp plateau until T_PP_end; memory/heap (green line) grows during injection and drains during post-processing]
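A sketch of the bottleneck-corrected cost formula from this slide (the formula is from the slide; the example numbers are invented for illustration):

```python
# Bottleneck-corrected CPU cost per event (slide 50): spread the post-processing
# work back over the injection period before normalizing by the event rate.
def cpu_per_eps(m_cpu_tst, m_cpu_pp, m_t_pp, m_t_tst, eps):
    return (m_cpu_tst + m_cpu_pp * m_t_pp / m_t_tst) / eps

# Invented example: 40% CPU during a 300 s test at 10 EPS, plus 20% CPU
# for 150 s of post-processing.
print(cpu_per_eps(40.0, 20.0, 150.0, 300.0, 10.0))  # -> 5.0 (% CPU per EPS)
```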