What is A/B-testing? An Introduction

A/B-Testing: An Introduction
What is it? Why Use it?

Prediction in Predictable Environments
Predictable Models Excel in Deterministic
Environments
Statics & Dynamics Don’t Change
• ‘Fitness’ for purpose always
measured the same
• Frictionless Pendulum swing Very
Predictable
– Simple Harmonic Motion
• Control Systems
– e.g. Anti-lock Braking System
Sacrilege:
Learning is pointless (it’s all known), thus
Waterfall/Heavy Development Methods
Excel! :-O
Time Period given by
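For reference, the time period of a frictionless pendulum swinging through small angles is given by the standard simple-harmonic-motion result below. The symbols (length L, gravitational acceleration g) are assumed here and are not taken from the slide:

```latex
T = 2\pi\sqrt{\frac{L}{g}}
```

Nothing in it depends on feedback or observation, which is why the swing can be predicted any number of time-steps ahead.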

Uncertain/Unpredictable Contexts
• Human Interaction is
Uncertain.
• Everyone is…
– Different
– [Relatively] fickle
– Growing Older
– Influenced By Other Stuff
– …
• Definition of fitness for
purpose changes
• In fact, Everything
Changes!

Story of the Foot
• Once upon a time there was a foot which Belonged to the King of a Powerful Kingdom
• He Reigned Supreme because All Swords Had to be 7 ft Long
• The King dies naturally and a new King is Crowned
• But he has a Big Ego and Really Small Feet
– Half the length of the Previous King’s
• He Ordains All Swords are Now not Fit for Purpose
• So they’re Melted & Remade to 7 of his feet
• Along comes an Evil Army with swords now Twice as Long
• Nobody in the Kingdom Lived Happily Ever After! :-(

Q: HOW CAN WE EVER BE
PREDICTABLE?

Pick Your Tool: Certainty v Uncertainty
Predictable Environments
• Lots known up front
• ‘Variables/factors’ can all be identified…
• …So can predict with high certainty
where whole systems will be in t time-steps
– seconds, minutes, hours, days, weeks,
months, years…
• Little Need to Adapt
• Most appropriate for Standards Models
– SI Units
– HTTP/SMTP/POP3…
• ‘Dictate works’, not nice, but true
• e.g. ‘7ft’ Swords would have continued to
exist
– Even if the heads of the blacksmiths didn’t.
Uncertain Environments
• Very little known up front
• Variable levels of traffic,
experience etc.
• ‘Fitness function’ itself
changes
– e.g. King changes = Foot
changes
• Continual need to check the
fitness function…
– e.g. Customer reviews,
performance metrics
• Implies Continual Need to
Change/Improve Systems

EXAMPLE: Running a Bath (Uncertain)
Predictable Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 minutes
2. Cold Tap on for 2 minutes
3. Get in
Risks
Scalding your Jewels and More!
Uncertainty Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 seconds
2. Put cold tap on for 2 seconds
3. Dip toe in
4. If
• Too hot add cold water
• Too cold add hot water
• Else get in & relax
5. Go to 1 (Rinse, Repeat)
Risks
Slightly more time to get to ideal
temperature, but gets there with much less
risk of burning crucial elements and
potentially less water waste.
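The ‘Uncertainty Models’ steps above can be sketched as a small feedback loop. This is a minimal illustration only: the function name `run_bath` and every number in it (target temperature, tolerance, step size) are assumptions made for the sketch, not values from the slides.

```python
def run_bath(temp, target=38.0, tolerance=1.0, step=2.0, max_cycles=50):
    """Adjust the bath in small steps until the 'toe test' passes.

    All numbers (target, tolerance, step) are illustrative assumptions.
    Returns the final temperature and the number of adjustment cycles used.
    """
    cycles = 0
    while cycles < max_cycles:
        error = temp - target          # measure: dip a toe in
        if abs(error) <= tolerance:    # learn: close enough, get in & relax
            break
        # build: add a little cold or hot water, never one big batch
        temp += -step if error > 0 else step
        cycles += 1
    return temp, cycles


final_temp, cycles = run_bath(temp=50.0)
print(final_temp, cycles)
```

Each pass costs a little time, but no single step can overshoot by more than `step` degrees, which is the trade described under Risks.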

EXAMPLE: Running a Bath Cycle
Cycle:
1. Run Water (Hot and Cold) - Build
2. Test with ‘Toe’ - Measure
3. Evaluate Temperature - Learn
“Best test this with my toe, so I don’t scald myself…”
“Ahh, F@#*!!! THAT’S HOT!”
“I burnt my toe! Not doing that again!”

Dealing with Uncertainty
• More variables than equations to solve them…
• …Hence optimisation problem (no unique solution)
• Like it or not, iterative cycles work best
– Build-Measure-Learn; DMAIC
• Frequent Experiments & Actionable Change
• Control by Experimental Design Principles
– Test one change in isolation
– Compare against a control group/result
– Randomise Groupings
– Double Blind
• Plus, smaller tasks = smaller variance = greater certainty
Gold Standard: Randomised Double Blind Controlled Trial

Definition: Randomised
• Two groups
• Randomly Assign
Subjects to Each Group
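Random assignment as defined above might look like the following sketch. The function name `assign_groups`, the group labels and the example subject names are hypothetical; the seed is there only to make the sketch repeatable.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into two groups of (near) equal size."""
    rng = random.Random(seed)      # seeded only so the sketch is repeatable
    shuffled = list(subjects)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


subjects = [f"user{i}" for i in range(10)]
groups = assign_groups(subjects, seed=42)
print(groups)
```

Shuffling before splitting means no property of a subject (sign-up order, name, location) can decide which group they land in.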

Definition: Double Blind
Neither Researcher nor Subject
knows which group the subject
is assigned to.
So researcher and subject
behave the same for A
and B tests.
TIP: Automated allocation
Image via ‘John the Math Guy’
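One way to automate allocation so that it stays blind is to hand out opaque group codes and keep the meaning of each code sealed until analysis. The sketch below assumes that approach; the names `blind_allocation`, `allocation` and `key`, and the codes "X" and "Y", are all hypothetical.

```python
import random


def blind_allocation(subjects, seed=None):
    """Allocate subjects to opaque codes; keep the unblinding key separate."""
    rng = random.Random(seed)
    codes = ["X", "Y"]
    rng.shuffle(codes)   # which code means 'control' is itself random
    key = {codes[0]: "control", codes[1]: "experimental"}  # sealed until analysis
    # Each subject gets a random code; group sizes may differ slightly.
    allocation = {subject: rng.choice(["X", "Y"]) for subject in subjects}
    return allocation, key


allocation, key = blind_allocation(["ann", "bob", "cat", "dan"], seed=7)
print(allocation)   # visible while the experiment runs
```

While the experiment runs, only `allocation` is shared; `key` is opened once the results are in.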

Definition: Controlled
Every potential factor is
fixed aside from the factor
under test.
Minimises ‘confounding
variables’
e.g. If someone goes outside and
gets wet, does it mean it’s raining?
Image via ‘Not the average’ blog

Designing Experiments
• Start with Hypothesis
– Include theory if analytical
• Experiment AGAINST a control group!
– Control Group = Baseline to compare against (B-test)
– Experimental Group is A-test
• Randomly Allocate Control & Experimental Group
– Ideally Researcher & Subject Can’t Know
• Analyse Results, Conclude AND Act!

Caution
• Change only one thing at once!
– Can do A/B/n tests, but the variables have to be linearly independent
• Results are statistical, not a certainty!
• Objective: Make sure results aren’t by chance (e.g. against placebo)!
• Analyse against ‘Null’ Hypothesis
– No real difference: the opposite of what you are trying to show
• Your test is the alternate hypothesis
– …which you are trying to show!
• Factor in type 1 & 2 statistical errors
– False Positives and False Negatives
• ‘Small-p’ = probability of seeing a result at least this extreme if the null hypothesis (Chance) is true
• If ‘small-p’ is very, very small, reject the null and accept the alternate hypothesis
• Otherwise, no choice but to keep the null hypothesis (not rejected, which is not the same as proven)
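The null/alternate reasoning above can be sketched for the common case of comparing two conversion rates. This uses a two-sided two-proportion z-test, which is one standard choice and not necessarily the one the author had in mind; the function name, the counts and the 0.05 threshold are all made up for illustration. Following the deck's labelling, A is the experimental group and B the control.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)      # shared rate under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))              # two-sided tail probability
    return z, p_value


# Hypothetical counts: 120/1000 conversions for A, 100/1000 for B (control).
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
alpha = 0.05                                      # assumed significance level
print(f"z = {z:.3f}, p = {p_value:.3f}")
print("reject null" if p_value < alpha else "fail to reject null")
```

With these made-up counts the p-value comes out at roughly 0.15, so a two-point lift on 1,000 visitors per group is not enough to rule out chance.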

Q: Where Can A/B-testing Be Used?
A: EVERYWHERE!

Where Can A/B-Tests Be Used?
• Guerrilla testing
• Lean-Startup A/B-Tests (tech, marketing etc.)
• Pilots
• Experiments
• Proof of Concepts
• Software Development Team Retrospectives
• Manufacturing Processes
• Change Programmes
• Departmental Effectiveness
• …

Q: What tools can we use?
A: STATISTICS

Toolbox: Normal Distribution
Data that is normally distributed
shown as a continuous line.
Fixed width histogram = Same
(right)
Pros:
1. Incredibly diverse
2. Tables/Excel Functions exist
Cons:
1. Needs many samples (25+)
– With fewer, errors significantly impact the
result & other methods are needed (e.g.
t-test)
2. Can’t Always Force Normality
– But story point estimates can!
Source: Critical Numbers Group Sheffield University
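As a small illustration of the ‘tables/functions exist’ point, Python's standard library has a normal distribution built in. The sketch below (helper name `share_within` is hypothetical) computes how much of a normal distribution lies within k standard deviations of the mean.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

These are the familiar 68 / 95 / 99.7 percent bands used when reading results against a normal curve.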

Toolbox: Confidence Intervals
Indicates reliability of an estimate, given
the data = range of x standard errors
either side of the estimate that would
contain the true value in the chosen
proportion (e.g. 95%) of repeated experiments.
Answers “How sure are you of this
result?”
Pros:
1. Easy to do
2. Excel Functions/Libraries exist
Cons:
1. Same weakness as normal
distribution
2. Arbitrary confidence levels
– Researcher chooses, but 95% is the
de facto standard (approx. 2 sigma)
Source: Moz.com
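A confidence interval for a mean can be sketched with the standard library alone. This assumes the normal approximation discussed in the previous toolbox entry; the function name and the sample values are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `samples`."""
    n = len(samples)
    centre = mean(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * stdev(samples) / sqrt(n)            # z standard errors
    return centre - margin, centre + margin


samples = [10, 12, 11, 13, 9, 10, 12, 11]            # made-up measurements
low, high = mean_confidence_interval(samples)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With this few samples the normal approximation is optimistic; a t-based interval would be somewhat wider, which is the weakness the Cons list points at.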

Toolbox: Correlation Matrix
Matrix in which each element is the
correlation coefficient of one data series v another.
“How strongly does this relate to
that?”
High correlation -> dig deeper
Pros:
1. Excel Functions/Libraries exist
Cons:
1. Correlation isn’t Causation!
2. More of a ‘faff’ in Excel
– Prone to human error in analysis
Source: Genome biology
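A correlation matrix is small enough to build by hand, which avoids the spreadsheet ‘faff’. The sketch below uses the Pearson coefficient; the function names and the three data series are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every named series with every other (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


data = {                       # made-up series for illustration
    "visits": [10, 20, 30, 40, 50],
    "signups": [1, 2, 3, 4, 5],
    "bounce": [9, 7, 6, 3, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values near +1 or -1 are the ‘dig deeper’ candidates; they still say nothing about which way any cause runs.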

Toolbox: Factor Analysis
Uses the correlation matrix to identify
underlying factors, i.e. which independent
variables drive the dependent variables.
Pros:
1. Linear Algebra tools to help
2. Identifies combinations of
factors
Cons:
1. Excel doesn’t support it natively
2. ‘Cancelling’ factors or
confounding factors problematic
3. Have to understand linear
algebra
4. Basically an approximation (so
what’s good enough?)
Source: Kovach Computing Services
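The linear algebra involved can be hinted at with a simplified sketch. It extracts only the single dominant factor of a correlation matrix by power iteration, which is a principal-component-style first step and not a full factor analysis; the function name and the example matrix are assumptions.

```python
from math import sqrt


def dominant_factor(corr, iterations=200):
    """Power iteration: largest eigenvalue and its eigenvector (loadings)."""
    n = len(corr)
    # Non-uniform start so it is unlikely to be orthogonal to the answer.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vec


corr = [                 # hypothetical correlations between three variables
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
eigenvalue, loadings = dominant_factor(corr)
print(round(eigenvalue, 3), [round(v, 3) for v in loadings])
```

The first two variables load heavily on the same factor while the third barely contributes, the kind of grouping factor analysis is used to surface. How many factors to keep is the ‘what’s good enough?’ question from the Cons list.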

Definitions
TERM: DESCRIPTION
Dependent Variable: A variable that depends on one or more other variables (in y = x + 2, y is dependent, x is independent).
Independent Variable: A variable that does not depend on the value of any other variable.
Confounding Variable: A variable that could independently present the same result as some other variable. This reduces the credibility and certainty of a result (e.g. if I go outside and I get wet, is it because it was raining?).
Distribution: The ‘shape’ of the graph of a random variable.
Type 1 Error (False Positive): Declaring a result as confirmed when it’s not, usually through experimental error.
Type 2 Error (False Negative): Declaring a result as false when it’s true, usually through experimental or interpretive error.

Thanks for Viewing
Further Reading
Random Variables and Probability Distributions (Khan Academy)
https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
Confidence Intervals (Wikipedia)
http://en.wikipedia.org/wiki/Confidence_interval
Normal Distribution (Wikipedia)
http://en.wikipedia.org/wiki/Normal_distribution
Correlation & Dependence (Wikipedia)
http://en.wikipedia.org/wiki/Correlation_and_dependence
Factor Analysis (Wikipedia)
http://en.wikipedia.org/wiki/Factor_analysis
Genome Biology (publishes research, software and new methods)
http://genomebiology.com/
Ethar Alali @EtharUK @Dynacognetics
Managing Director & Chief Architect
Polymath-MathMo. Programming since 9 years old. TOGAF 9 Certified, change agent.
Blog: GoadingtheITGeek.blogspot.co.uk
About Us
Specialist ICT Strategists & Advisors.
Member of HiveMind Network for some of
the biggest household and corporate multi-nationals.
Accredited Growth Voucher Advisors
certified to deliver IT & Web Growth
Consultancy as part of the government’s
Growth Voucher Scheme.
Accreditations & Associations
