Predictive Performance Testing: Integrating Statistical Tests into Agile Development Life-cycles

Tom Kleingarn
Lead, Performance Engineering
Digital River
http://www.linkedin.com/in/tomkleingarn
http://www.perftom.com

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Agenda
> Introduction
> Performance engineering
> Agile
> Outputs from LoadRunner
> Basic statistics
> Advanced statistics
> Summary
> Practical application

About Me
> Tom Kleingarn
> Lead, Performance Engineering - Digital River

> 4 years in performance engineering
> Tested over 100 systems/applications
> Hundreds of performance tests
> Tools
> LoadRunner
> JMeter
> Webmetrics, Keynote, Gomez
> ‘R’ and Excel
> Quality Center
> QuickTest Professional

> Leading provider of global e-commerce solutions
> Builds and manages online businesses for software and game
publishers, consumer electronics manufacturers, distributors,
online retailers and affiliates.
> Comprehensive platform offers:
> Site development and hosting
> Order management
> Fraud management
> Export control
> Tax management
> Physical and digital product fulfillment
> Multi-lingual customer service
> Advanced reporting and strategic marketing

Performance Engineering
> The process of experimental design, test execution, and
results analysis, utilized to validate system performance as
part of the Software Development Lifecycle (SDLC).
> Performance requirements – measurable targets of speed,
reliability, and/or capacity used in performance validation.
> Latency < 10ms, measured at the 99th percentile
> 99.95% uptime
> Throughput of 1,000 requests per second
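
A minimal sketch of how targets like these could be checked against raw measurements in R. The latency samples are simulated and the 10-second test duration is an assumption made only for illustration; neither comes from a real test.

```r
# Simulated latency samples in milliseconds (hypothetical, not real test data).
set.seed(42)
latency_ms <- rgamma(10000, shape = 2, rate = 1)

# 99th percentile of the observed latencies.
p99 <- unname(quantile(latency_ms, probs = 0.99))

# Target from the slide: latency < 10 ms, measured at the 99th percentile.
meets_latency <- p99 < 10

# Throughput = completed requests / test duration (duration assumed to be 10 s).
duration_sec <- 10
throughput_rps <- length(latency_ms) / duration_sec
meets_throughput <- throughput_rps >= 1000

cat(sprintf("p99 = %.2f ms, meets latency target: %s\n", p99, meets_latency))
cat(sprintf("throughput = %.0f req/s, meets throughput target: %s\n",
            throughput_rps, meets_throughput))
```

Stating the percentile and the unit in the requirement is what makes a pass/fail check like this possible.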

Performance Testing Cycle
1. Requirements Analysis
2. Create test plan
3. Create automated scripts
4. Define workload model
5. Execute scenarios
6. Analyze results
> Rinse and repeat if…
> Defects identified
> Change in requirements
> Setup or environment issues
> Performance requirement not met

Digital River Test Automation

Agile
> A software development paradigm that emphasizes rapid
process cycles, cross-functional teams, frequent
examination of progress, and adaptability.
[Figure: Agile cycle with stages Initial Plan, Scrum, and Deploy]

Agile Performance Engineering
> Clear and constant communication
> Involvement in initial requirements and design phase
> Identify key business processes before they are built
> Coordinate with analysts and development to build key
business processes first
> Integrate load generation requirements into project schedule

> Test immediately with v1.0
> Schedule tests to auto-start, run independently
> Identify invalid test results before deep analysis

LoadRunner Results

> Measures of central tendency
> Average = ∑(all samples)/(sample size): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
> Median = 50th percentile
> Mode – highest frequency, the value that occurred the most

> Measures of variability
> Min, max
> Standard Deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
> 90th percentile
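
The summary measures listed above can be reproduced from raw samples in R. This is a sketch on simulated latencies (assumed values in seconds), not output from LoadRunner. Base R has no built-in statistical mode, so the most frequent value is taken from a frequency table.

```r
# Simulated latencies in seconds, rounded to 0.1 s (hypothetical data).
set.seed(1)
latency <- round(rlnorm(500, meanlog = 1, sdlog = 0.4), 1)

avg <- mean(latency)                      # average
med <- median(latency)                    # median = 50th percentile
freq <- table(latency)                    # frequency of each distinct value
mode_value <- as.numeric(names(freq)[which.max(freq)])  # most frequent value
stdev <- sd(latency)                      # sample standard deviation (n - 1)
p90 <- unname(quantile(latency, 0.90))    # 90th percentile

print(c(mean = avg, median = med, mode = mode_value,
        min = min(latency), max = max(latency), sd = stdev, p90 = p90))
```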

LoadRunner Results

[Figure: distribution plot with regions labeled 90%, 50%, 50%, and 10%]

Basic Statistics – Sample vs. Population
> Performance requirement: average latency < 3 seconds

> What if you ran 50 rounds? 100 rounds?

> Sample – set of values, subset of population
> Population – all potentially observable values
> Measurements
> Statistic – the estimated value from a collection of samples
> Parameter – the “true” value you are attempting to estimate
[Figure callout: Not a representative sample!]

> Sampling distribution – the probability distribution of a given
statistic based on a random sample of size n
> Dependent on the underlying population

> How do you know the system under test met the performance requirement?
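
One way to see why a single round is not enough is to simulate many rounds and look at how the average moves between them. The sketch below assumes a right-skewed population with a true mean of exactly 3 seconds. Both the exponential shape and the sample sizes are illustrative assumptions.

```r
set.seed(2010)
true_mean <- 3   # assumed "true" average latency in seconds (the parameter)

# One round of testing: draw n latencies and report their average (the statistic).
one_round <- function(n) mean(rexp(n, rate = 1 / true_mean))

# Repeat the test for 100 rounds at two sample sizes.
means_small <- replicate(100, one_round(30))
means_large <- replicate(100, one_round(500))

# The spread of the round averages shrinks roughly as 1/sqrt(n).
cat("sd of round averages, n = 30: ", sd(means_small), "\n")
cat("sd of round averages, n = 500:", sd(means_large), "\n")
cat("rounds with average < 3 s (n = 500):", sum(means_large < 3), "of 100\n")
```

The last line shows the problem with a bare pass/fail on the average: a system sitting on the threshold passes in some rounds and fails in others.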

Basic Statistics – Normal Distribution
> With larger samples, the distribution of the sample average tends toward a bell curve clustered around the true mean

Basic Statistics – Normal Distribution

Sir Francis Galton’s “Bean Machine”
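
The bean machine can be simulated in a few lines. In this sketch each bean bounces left or right with equal probability at every row of pegs, so its final bin is a binomial draw. The number of rows and beans are assumed values.

```r
set.seed(7)
rows <- 12       # rows of pegs (assumed)
beans <- 5000    # number of beans dropped (assumed)

# Each bean goes right (1) or left (0) at every peg; its bin is the count of rights.
bins <- rbinom(beans, size = rows, prob = 0.5)

# Beans per bin, including bins that received none.
counts <- table(factor(bins, levels = 0:rows))
print(counts)

# The bin counts approximate a normal curve with mean rows/2 and sd sqrt(rows)/2.
cat("mean bin:", mean(bins), " expected:", rows / 2, "\n")
cat("sd of bins:", sd(bins), " expected:", sqrt(rows) / 2, "\n")
```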

Confidence Intervals
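> An interval, defined by two endpoints, constructed so that it
contains the true mean parameter μ with a stated probability
> 95% confidence interval: $\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample average, $s$ the sample standard deviation, and $n$ the sample size
> … where 1.96 is the score from the standard normal distribution associated with 95% probability: $P(-1.96 \le Z \le 1.96) \approx 0.95$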

Confidence Intervals
> In repeated rounds of testing, a confidence interval will contain the
true mean parameter with a certain probability:

[Figure: confidence intervals from repeated rounds of testing, plotted against the true average]
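
The "repeated rounds" idea can be checked by simulation. The sketch below assumes a normal population with a known true average, builds a 95% interval in each of 1,000 simulated rounds, and counts how often the interval contains the true average. All parameter values are assumptions chosen for illustration.

```r
set.seed(123)
true_mean <- 3.4   # assumed true average latency (seconds)
true_sd <- 1.45    # assumed true standard deviation
n <- 500           # samples per round
rounds <- 1000     # number of simulated rounds

covers <- replicate(rounds, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  margin <- qnorm(0.975) * sd(x) / sqrt(n)   # 95% margin of error
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

coverage <- mean(covers)
cat("Share of intervals containing the true average:", coverage, "\n")
```

The share should land near 0.95. The rounds where the interval misses are the ones in which a single test would mislead.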

Confidence Intervals in Excel
Statistic            Value 95%   Value 99%   Formula
Average              3.40        3.40
Standard Deviation   1.45        1.45
Sample size          500         500
Confidence Level     0.95        0.99
Significance Level   0.05        0.01        =1-(Confidence Level)
Margin of Error      0.127       0.167       =CONFIDENCE(Sig. Level, Std Dev, Sample Size)
Lower Bound          3.273       3.233       =Average - Margin of Error
Upper Bound          3.527       3.567       =Average + Margin of Error

> 95% confidence: true average latency is between 3.273 and 3.527 seconds
> 99% confidence: true average latency is between 3.233 and 3.567 seconds
> Our range is wider at 99% compared to 95%, 0.334 sec vs. 0.254 sec
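
The same bounds can be computed in R. This sketch uses the average, standard deviation, and sample size from the Excel example. The helper name `confidence_interval` is hypothetical. `qnorm` supplies the normal score that Excel's CONFIDENCE function uses internally.

```r
avg <- 3.40      # sample average (seconds), from the Excel example
std_dev <- 1.45  # sample standard deviation, from the Excel example
n <- 500         # sample size, from the Excel example

# Hypothetical helper: margin of error and bounds for a given confidence level.
confidence_interval <- function(avg, std_dev, n, conf_level) {
  sig_level <- 1 - conf_level
  # z * sd / sqrt(n), the quantity Excel's CONFIDENCE() returns
  margin <- qnorm(1 - sig_level / 2) * std_dev / sqrt(n)
  c(margin = margin, lower = avg - margin, upper = avg + margin)
}

ci95 <- confidence_interval(avg, std_dev, n, 0.95)
ci99 <- confidence_interval(avg, std_dev, n, 0.99)
print(round(ci95, 3))
print(round(ci99, 3))
```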

The T-test
> Test whether the true mean is greater than or less than a
certain value, based on your sample
> Performance requirement: mean latency < 3 seconds
> Null hypothesis: mean latency >= 3 seconds
> Alternative hypothesis: mean latency < 3 seconds

T-test – Raw Data from LoadRunner

n = 500

T-test in ‘R’
> ‘R’ for statistical analysis
> http://www.r-project.org/

Load test data from a file:
> datafile <- read.table("C:/Data/test.data",
header = FALSE, col.names = c("latency"))

Attach the dataframe:
> attach(datafile)

Create a “vector” from the dataframe:
> latency <- datafile$latency

T-test in ‘R’
> t.test(latency, alternative="less", mu=3)
One Sample t-test
data: latency
t = -2.9968, df = 499, p-value = 0.001432
alternative hypothesis: true mean is less than 3

> If the true average latency were 3 seconds or more, a sample
average this low would occur with a probability of at most 0.14%.
In this case we reject the null hypothesis.
> The data support the alternative hypothesis: the true average
latency is less than 3 seconds
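
To make the test less of a black box, the t statistic and p-value can be computed by hand and compared with `t.test`. This sketch uses simulated latencies, not the LoadRunner data from the slides, so the numbers will differ from the output shown.

```r
set.seed(99)
latency <- rnorm(500, mean = 2.95, sd = 0.4)  # simulated latencies (seconds)
mu0 <- 3                                      # requirement threshold

n <- length(latency)
# t = (sample average - threshold) / standard error of the average
t_stat <- (mean(latency) - mu0) / (sd(latency) / sqrt(n))
# One-sided p-value: lower tail of the t distribution with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1)

result <- t.test(latency, alternative = "less", mu = mu0)
cat("manual: t =", t_stat, " p =", p_value, "\n")
cat("t.test: t =", unname(result$statistic), " p =", result$p.value, "\n")
```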

T-test – Number of Samples Required
> power.t.test(sd=sd(latency), sig.level=0.05, power=0.90,
delta=mean(latency)*0.01, type="one.sample")
One-sample t test power calculation

n = 215.5319
delta = 0.03241267
sd = 0.1461401
sig.level = 0.05
power = 0.9
alternative = two.sided

> We need at least 216 samples to detect a 1% shift in mean latency with 90% power
> Our sample size is 500, so we have enough samples to proceed
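
The required sample size can be cross-checked without the raw data by passing the `delta` and `sd` values printed in the power calculation output straight to `power.t.test`. The normal-approximation formula in this sketch is a standard rough check, added here as an illustration. It is expected to come out slightly below the t-based answer.

```r
# Values copied from the power calculation output above.
delta <- 0.03241267       # smallest shift in the mean we want to detect
sd_latency <- 0.1461401   # standard deviation of latency

pw <- power.t.test(delta = delta, sd = sd_latency, sig.level = 0.05,
                   power = 0.90, type = "one.sample")
required_n <- ceiling(pw$n)   # round up to a whole number of samples

# Rough cross-check: n ~ ((z_{1 - alpha/2} + z_{power}) * sd / delta)^2
approx_n <- ((qnorm(0.975) + qnorm(0.90)) * sd_latency / delta)^2

cat("power.t.test n:", pw$n, "-> round up to", required_n, "\n")
cat("normal approximation n:", approx_n, "\n")
```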

Test for Normality
> Test that the data is “normal”
> Clustered around a central value, no outliers
> Roughly fits the normal distribution
> shapiro.test(latency)
Shapiro-Wilk normality test
data: latency
p-value = 0.8943
> A p-value < 0.05 indicates the distribution is not normal
> Our p-value is 0.8943, so we find no evidence against normality
and treat the sample distribution as approximately normal
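
A short sketch of how the test behaves on data that is and is not bell-shaped. Both samples are simulated. The exponential sample stands in for the long right tail that latency data often shows, which is an assumption about typical data and not a result from the slides.

```r
set.seed(5)
normal_sample <- rnorm(500, mean = 3, sd = 0.15)  # bell-shaped (simulated)
skewed_sample <- rexp(500, rate = 1 / 3)          # long right tail (simulated)

normal_test <- shapiro.test(normal_sample)
skewed_test <- shapiro.test(skewed_sample)

cat("normal sample: W =", unname(normal_test$statistic),
    " p =", normal_test$p.value, "\n")
cat("skewed sample: W =", unname(skewed_test$statistic),
    " p =", skewed_test$p.value, "\n")

# A small p-value is evidence against normality;
# a large p-value is not proof of normality.
reject_skewed <- skewed_test$p.value < 0.05
```

Note that `shapiro.test` accepts between 3 and 5,000 values, so larger result sets need to be subsampled first.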

Review
> Sample vs. Population
> Normal distribution
> Confidence intervals
> T-test
> Sample size
> Test for normality
> Practical application
> Performance requirements
> Compare two code builds
> Compare system infrastructure changes
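
For the build comparison, one possible approach is a two-sample t-test. The sketch below uses simulated latencies for a current and a candidate build; the means, spreads, and variable names are assumptions. R's default for two samples is the Welch test, which does not assume equal variances.

```r
set.seed(11)
build_a <- rnorm(500, mean = 3.00, sd = 0.30)  # simulated latencies, current build
build_b <- rnorm(500, mean = 2.85, sd = 0.30)  # simulated latencies, candidate build

# Welch two-sample t-test.
# Null hypothesis: build B's mean latency is not lower than build A's.
# Alternative ("less"): mean(build_b) - mean(build_a) < 0.
cmp <- t.test(build_b, build_a, alternative = "less")
print(cmp)

faster <- cmp$p.value < 0.05
cat("Candidate build significantly faster at the 0.05 level:", faster, "\n")
```

The same pattern applies to an infrastructure change: run the same workload before and after, then compare the two sets of samples.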

Case Study
> Engaged in a new web service project
> Average latency < 25ms
> Applied statistical analysis
> System did not meet requirement
> Identified problem transaction
> Development fix applied
> Additional test, requirement met

> Prevented a failure in production

Implementation in Agile Projects
> Involvement in early design stages
> Identify performance requirements
> Build key business processes first

> Calculate required sample size
> Apply statistical analysis

> Run fewer tests with greater confidence in your results
> Prevent performance defects from entering production
> Prevent SLA violations in production
