A/B-Testing: An Introduction 
What is it? Why Use it?
Prediction in Predictable Environments
Predictable Models Excel in Deterministic Environments
Statics & Dynamics Don’t Change
• ‘Fitness’ for purpose is always measured the same way
• A frictionless pendulum’s swing is very predictable
– Simple Harmonic Motion
• Control Systems
– e.g. Anti-lock Braking System
Sacrilege:
Learning is pointless (it’s all known), thus Waterfall/Heavy Development Methods Excel! :-O
Time period given by $T = 2\pi\sqrt{L/g}$ (pendulum length $L$, gravitational acceleration $g$, small swings)
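A minimal sketch of why the pendulum counts as a predictable system: under the standard small-angle approximation, its period depends only on length and gravity, so it can be computed up front without running any experiment. The function name and the default gravity constant are illustrative choices, not something from the slides.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2, conventional standard gravity (assumed default)


def pendulum_period(length_m: float, gravity: float = STANDARD_GRAVITY) -> float:
    """Small-angle period of an ideal, frictionless pendulum, in seconds."""
    if length_m <= 0 or gravity <= 0:
        raise ValueError("length and gravity must be positive")
    return 2 * math.pi * math.sqrt(length_m / gravity)


if __name__ == "__main__":
    for length in (0.25, 1.0, 4.0):
        print(f"L = {length} m -> T = {pendulum_period(length):.3f} s")
```

Quadrupling the length doubles the period, every time. That fixed relationship is what makes an up-front plan safe here.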
Uncertain/Unpredictable Contexts
• Human interaction is uncertain.
• Everyone is…
– Different
– [Relatively] fickle
– Growing older
– Influenced by other stuff
– …
• The definition of fitness for purpose changes
• In fact, everything changes!
Story of the Foot
• Once upon a time there was a foot which belonged to the King of a powerful Kingdom
• He reigned supreme because all swords had to be 7 ft long
• The King dies naturally and a new King is crowned
• But he has a big ego and really small feet
– Half the length of the previous King’s
• He decrees that all swords are no longer fit for purpose
• So they’re melted & remade to 7 of his feet
• Along comes an Evil Army with swords now twice as long
• Nobody in the Kingdom lived happily ever after! :-(
Q: HOW CAN WE EVER BE 
PREDICTABLE?
Pick Your Tool: Certainty v Uncertainty
Predictable Environments
• Lots known up front
• ‘Variables/factors’ can all be identified…
• …so you can predict with high certainty where whole systems will be in t time-steps
– seconds, minutes, hours, days, weeks, months, years…
• Little need to adapt
• Most appropriate for standards models
– SI Units
– HTTP/SMTP/POP3…
• ‘Dictate works’: not nice, but true
• e.g. ‘7 ft’ swords would have continued to exist
– Even if the heads of the blacksmiths didn’t.
Uncertain Environments
• Very little known up front
• Variable levels of traffic, experience etc.
• ‘Fitness function’ itself changes
– e.g. King changes = foot changes
• Continual need to check the fitness function…
– e.g. customer reviews, performance metrics
• Implies a continual need to change/improve systems
EXAMPLE: Running a Bath (Uncertain)
Predictable Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 minutes
2. Put cold tap on for 2 minutes
3. Get in
Risks
Scalding your Jewels and More!
Uncertainty Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 seconds
2. Put cold tap on for 2 seconds
3. Dip toe in
4. If
• Too hot: add cold water
• Too cold: add hot water
• Else: get in & relax
5. Go to 1 (Rinse, Repeat)
Risks
Slightly more time to get to the ideal temperature, but gets there with much less risk of burning crucial elements and potentially less water waste.
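The iterative bath recipe above can be sketched as a small feedback loop. This is a toy model under stated assumptions, not a plumbing simulation: the tap temperatures, the one-litre bursts, the 38 °C target and perfect mixing are all invented for illustration.

```python
def run_bath(target_c=38.0, tolerance_c=1.0, hot_c=60.0, cold_c=10.0,
             burst_l=1.0, min_volume_l=50.0, max_bursts=200):
    """Fill a bath in small bursts, checking the temperature after each one.

    All parameter values are hypothetical; water is assumed to mix perfectly.
    Returns (final temperature, final volume, list of 'toe dip' readings).
    """
    volume_l = 0.0
    temp_c = 0.0  # placeholder: an empty bath has no temperature yet
    readings = []
    for _ in range(max_bursts):
        # Build: one small burst from whichever tap the last reading calls for.
        tap_c = hot_c if temp_c < target_c else cold_c
        temp_c = (temp_c * volume_l + tap_c * burst_l) / (volume_l + burst_l)
        volume_l += burst_l
        # Measure: the "toe dip".
        readings.append(temp_c)
        # Learn: stop once the bath is full enough and comfortable.
        if volume_l >= min_volume_l and abs(temp_c - target_c) <= tolerance_c:
            break
    return temp_c, volume_l, readings


if __name__ == "__main__":
    temp, volume, readings = run_bath()
    print(f"{volume:.0f} litres at {temp:.1f} C after {len(readings)} toe dips")
```

Each correction is small, so no single mistake is expensive, and the swings shrink as the bath fills. The 5-minutes-then-2-minutes plan only discovers its error once you are already in the water.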
EXAMPLE: Running a Bath Cycle
[Cycle diagram: Run Water (Hot and Cold) = Build → Test with ‘Toe’ = Measure → Evaluate Temperature = Learn → back to Build]
[Speech bubbles: “Best test this with my toe, so I don’t scald myself…” “Ahh, F@#*!!! THAT’S HOT!” “I burnt my toe! Not doing that again!”]
Dealing with Uncertainty 
• More variables than equations to solve them… 
• …Hence optimisation problem (no unique solution) 
• Like it or not, iterative cycles work best 
– Build-Measure-Learn; DMAIC 
• Frequent Experiments & Actionable Change 
• Control by Experimental Design Principles 
– Test one change in isolation 
– Compare against a control group/result 
– Randomise Groupings 
– Double Blind 
• Plus, smaller tasks = smaller variance = greater certainty 
Gold Standard: Randomised Double Blind Controlled Trial
Definition: Randomised 
• Two groups 
• Randomly assign subjects to each group
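Random assignment into two groups can be sketched in a few lines. The function name, the group labels and the optional seed (included only so a run can be reproduced) are assumptions for illustration.

```python
import random


def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two near-equal groups."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


if __name__ == "__main__":
    groups = assign_groups([f"subject-{i}" for i in range(10)], seed=2024)
    print(groups["A"])
    print(groups["B"])
```

Shuffling first means nobody's judgement decides who lands in which group. Without that, the comparison between groups is not a fair one.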
Definition: Double Blind
Neither the researcher nor the subject knows which group the subject is assigned to.
So researcher and subject behave the same for A and B tests.
TIP: Automated allocation
Image via ‘John the Math Guy’
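One common way to automate allocation, sketched here as an assumption rather than the deck's own method, is to hash a subject identifier into an opaque arm label. Nobody picks the group by hand, and the labels (`arm-1`, `arm-2`, both hypothetical) do not reveal which arm is the control. That mapping would be kept separately until the analysis.

```python
import hashlib


def allocate(subject_id: str, experiment: str, arms=("arm-1", "arm-2")) -> str:
    """Deterministically map a subject to an opaque arm label.

    The same subject always gets the same arm within one experiment, and
    changing the experiment name reshuffles everyone.
    """
    digest = hashlib.sha256(f"{experiment}:{subject_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


if __name__ == "__main__":
    for subject in ("alice", "bob", "carol"):
        print(subject, allocate(subject, "new-signup-form"))
```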
Definition: Controlled
Every potential factor is fixed aside from the factor under test.
Minimises ‘confounding variables’
e.g. If someone goes outside and gets wet, does it mean it’s raining?
Image via ‘Not the average’ blog
Designing Experiments 
• Start with Hypothesis 
– Include theory if analytical 
• Experiment AGAINST a control group! 
– Control Group = Baseline to compare against (B-test) 
– Experimental Group is A-test 
• Randomly Allocate Control & Experimental Group 
– Ideally Researcher & Subject Can’t Know 
• Analyse Results, Conclude AND Act!
Caution
• Change only one thing at once!
– Can do A/B/n tests, but the variables have to be linearly independent
• (independent statistically, not with certainty!)
• Objective: Make sure results aren’t by chance (e.g. against placebo)!
• Analyse against the ‘Null’ Hypothesis
– Opposite of what you are trying to prove
• Factor in type 1 & 2 statistical errors
– False positives and false negatives
• Your test is the alternate hypothesis
• If the probability of getting your result under the Null hypothesis (chance) is very, very small, reject the Null and accept the Alternate hypothesis…
– ‘Small-p’ = probability of seeing a result at least this extreme if the null hypothesis were true
• …which you are trying to prove!
• Otherwise, no choice but to keep (fail to reject) the null hypothesis
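To make the null-hypothesis check concrete, here is a hedged sketch of one standard test for comparing two conversion rates, the pooled two-proportion z-test. The slides do not name a specific test, so this choice, the function name and the example counts are all assumptions. It relies on the normal approximation, so it needs reasonably large groups.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups convert at the same rate."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if not 0 < pooled < 1:
        raise ValueError("test is undefined when nobody or everybody converts")
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: experimental group A versus control group B.
    z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
    print("reject null" if p < 0.05 else "fail to reject null")
```

With these made-up counts, a 12% versus 10% conversion rate looks like a win but does not clear the conventional 5% threshold. This is the kind of result-by-chance the slide warns about.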
Q: Where Can A/B-testing Be Used? 
A: EVERYWHERE!
Where Can A/B-Tests Be Used? 
• Guerrilla testing 
• Lean-Startup A/B-Tests (tech, marketing etc.) 
• Pilots 
• Experiments 
• Proofs of Concept
• Software Development Team Retrospectives 
• Manufacturing Processes 
• Change Programmes 
• Departmental Effectiveness 
• …
Q: What tools can we use? 
A: STATISTICS
Toolbox: Normal Distribution
Normally distributed data is shown as a continuous line. A fixed-width histogram shows the same thing (right).
Pros:
1. Incredibly widely applicable
2. Tables/Excel functions exist
Cons:
1. Needs many samples (25+)
– With fewer, errors significantly impact the result & other methods are needed (e.g. t-test)
2. Can’t always force normality
– But story point estimates can!
Source: Critical Numbers Group Sheffield University
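The tables mentioned above are also available in code. As a small sketch using Python's standard library (`statistics.NormalDist`, Python 3.8+), this computes the share of a normal distribution that lies within k standard deviations of the mean. The helper name is an illustrative assumption.

```python
from statistics import NormalDist


def share_within(k_sigma: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k_sigma) - standard.cdf(-k_sigma)


if __name__ == "__main__":
    for k in (1, 1.96, 2, 3):
        print(f"within {k} sigma: {share_within(k):.4f}")
```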
Toolbox: Confidence Intervals
Indicates the reliability of an estimate, given the data = likelihood that the result falls within x standard deviations of the mean.
Answers “How sure are you that this result was expected?”
Pros:
1. Easy to do
2. Excel Functions/Libraries exist
Cons:
1. Same weakness as normal distribution
2. Arbitrary confidence levels
– Researcher chooses, but 95% is the de facto standard (roughly 2 sigma)
Source: Moz.com
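A hedged sketch of a normal-approximation confidence interval for a mean, using only the standard library. It inherits the weakness noted above: with small samples a t-based interval is the usual substitute. The function name and the sample data are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # z is about 1.96 for 95%, which is the 'roughly 2 sigma' rule of thumb.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(samples)
    margin = z * stdev(samples) / math.sqrt(len(samples))
    return centre - margin, centre + margin


if __name__ == "__main__":
    page_load_ms = [float(v) for v in range(180, 240, 2)]  # made-up measurements
    low, high = mean_confidence_interval(page_load_ms)
    print(f"95% CI for the mean: {low:.1f} to {high:.1f} ms")
```

Raising the confidence level widens the interval. The researcher's choice of level is therefore a trade-off between certainty and precision.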
Toolbox: Correlation Matrix
A matrix in which each element is the correlation coefficient of one data series v another.
“How strongly does this relate to that?”
High correlation -> dig deeper
Pros:
1. Excel Functions/Libraries exist
Cons:
1. Correlation isn’t Causation!
2. More of a ‘faff’ in Excel
– Prone to human error in analysis
Source: Genome Biology
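For readers who would rather avoid the spreadsheet ‘faff’, a correlation matrix can be sketched in plain Python. This uses the Pearson coefficient, which is an assumption since the slide does not say which coefficient is meant. The column names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of named columns to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    data = {
        "visits": [10, 20, 30, 40, 50],
        "signups": [1, 3, 4, 7, 9],
        "bounce_rate": [0.9, 0.7, 0.6, 0.4, 0.2],
    }
    for name, row in correlation_matrix(data).items():
        print(name, {other: round(value, 2) for other, value in row.items()})
```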
Toolbox: Factor Analysis
Uses the correlation matrix to identify factors, i.e. to determine which independent variables drive the dependent variables.
Pros:
1. Linear algebra tools to help
2. Identifies combinations of factors
Cons:
1. Excel doesn’t support it natively
2. ‘Cancelling’ factors or confounding factors are problematic
3. Have to understand linear algebra
4. Basically an approximation (so what’s good enough?)
Source: Kovach Computing Services
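To give a flavour of the linear algebra involved, here is a deliberately simplified sketch. It is not a full factor analysis: it uses power iteration to pull out only the dominant principal component of a correlation matrix, a common starting point for factor extraction. The example matrix is invented, and the method assumes the starting vector is not orthogonal to the dominant direction.

```python
import math


def dominant_component(corr, iterations=200):
    """Power iteration: largest eigenvalue and loadings of a correlation matrix."""
    size = len(corr)
    vector = [1.0 / math.sqrt(size)] * size  # unit-length starting guess
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
        eigenvalue = norm  # converges to the largest eigenvalue
    # Loadings: eigenvector scaled by the square root of its eigenvalue.
    loadings = [value * math.sqrt(eigenvalue) for value in vector]
    return eigenvalue, loadings


if __name__ == "__main__":
    # Hypothetical correlations: variables 0 and 1 move together, 2 is unrelated.
    corr = [
        [1.0, 0.8, 0.0],
        [0.8, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    eigenvalue, loadings = dominant_component(corr)
    print(f"eigenvalue = {eigenvalue:.3f}")
    print("loadings =", [round(value, 3) for value in loadings])
```

In this toy matrix the first two variables load heavily on the component and the third barely at all. That is the sort of "combination of factors" the technique is meant to surface.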
Definitions
TERM: DESCRIPTION
Dependent Variable: A variable that depends on one or more other variables (in y = x + 2, y is dependent and x is independent).
Independent Variable: A variable that does not depend on the value of any other variable.
Confounding Variable: A variable that could independently produce the same result as some other variable. This reduces the credibility and certainty of a result (e.g. if I go outside and I get wet, is it because it was raining?).
Distribution: The ‘shape’ of the graph of a random variable.
Type 1 Error (False Positive): Declaring a result as confirmed when it’s not (rejecting a null hypothesis that is actually true), usually through chance or experimental error.
Type 2 Error (False Negative): Declaring a result as false when it’s true (failing to reject a null hypothesis that is actually false), usually through experimental or interpretive error.
Thanks for Viewing 
Further Reading
• Random Variables and Probability Distributions (Khan Academy): https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
• Confidence Intervals (Wikipedia): http://en.wikipedia.org/wiki/Confidence_interval
• Normal Distribution (Wikipedia): http://en.wikipedia.org/wiki/Normal_distribution
• “Correlation & Dependence” (Wikipedia): http://en.wikipedia.org/wiki/Correlation_and_dependence
• Factor Analysis (Wikipedia): http://en.wikipedia.org/wiki/Factor_analysis
• Genome Biology (publishes research, software and new methods): http://genomebiology.com/
Ethar Alali @EtharUK @Dynacognetics 
Managing Director & Chief Architect 
Polymath-MathMo. Programming since 9 years old. TOGAF 9 Certified, change agent. 
Blog: GoadingtheITGeek.blogspot.co.uk 
About Us 
Specialist ICT Strategists & Advisors. 
Member of HiveMind Network for some of 
the biggest household and corporate multi-nationals. 
Accredited Growth Voucher Advisors 
certified to deliver IT & Web Growth 
Consultancy as part of the government’s 
Growth Voucher Scheme. 


What is A/B-testing? An Introduction

  • 1. A/B-Testing: An Introduction What is it? Why Use it?
  • 2. Prediction in Predictable Environments Predictable Models Excel in Deterministic Environments Statics & Dynamics Don’t Change • ‘Fitness’ for purpose always measured the same • Frictionless Pendulum swing Very Predictable – Simple Harmonic Motion • Control Systems – e.g. Anti-lock Braking System Sacrilege: Learning is pointless (it’s all known), thus Waterfall/Heavy Development Methods Excel! :-O Time Period give by
  • 3. Uncertain/Unpredictable Contexts • Human Interaction Uncertain. • Everyone is… – Different – [Relatively] fickle – Growing Older – Influenced By Other Stuff – … • Definition of fitness for purposes changes • In fact, Everything Changes!
  • 4. Story of the Foot • Once upon a time there was a foot which Belonged to the King of a Powerful Kingdom • He Reigned Supreme because All Swords Had to be 7 ft Long • King dies naturally and a new King is Coronated • But he has a Big Ego and Really Small Feet – Half the length of Previous King • He Ordains All Swords Now not Fit for Purpose • So they’re Melted & Remade to 7 of his feet • Along come Evil Army with swords now Twice as Long • Nobody in the Kingdom Lived Happily Ever After! :-(
  • 5. Q: HOW CAN WE EVER BE PREDICTABLE?
  • 6. Pick Your Tool: Certainty v Uncertainty Predictable Environments • Lots known up front • ‘Variables/factors’ can all be identified… • …So can predict with high certainty where whole systems will be in t time-steps – seconds, minutes, hours, days, weeks, months, years… • Little Need to Adapt • Most appropriate for Standards Models – SI Units – HTTP/SMTP/POP3… • ‘Dictate works’, not nice, but true • e.g. ‘7ft’ Swords will have continued to exist – Even if the heads of the blacksmiths didn’t. Uncertain Environments • Very little known up front • Variable levels of traffic, experience etc. • ‘Fitness function’ itself changes – e.g. King changes = Foot changes • Continual need to check the fitness function… – e.g. Customer reviews, performance metrics • Infers Continual Need to Change/Improve Systems
  • 7. EXAMPLE: Running a Bath (Uncertain) Predictable Models • Don’t know the water temperature • Never done it before 1. Put hot tap on for 5 minutes 2. Cold Tap on for 2 minutes 3. Get in Risks Scolding your Jewels and More! Uncertainty Models • Don’t know the water temperature • Never done it before 1. Put hot tap on for 5 seconds 2. Put cold tap on for 2 seconds 3. Dip toe in 4. If • Too hot add cold water • Too cold add hot water • Else get in & relax 5. Go to 1 (Rinse, Repeat) Risks Slightly more time to get to ideal temperature, but gets there with much less risk of burning crucial elements and potential less water waste.
  • 8. EXAMPLE: Running a Bath Cycle Run Water (Hot and Cold) - Build Test with ‘Toe’ - Measure Evaluate Temperature - Learn Best test this with my toe, so I don’t scald myself… Ahh, F@#*!!! THAT’S HOT! I burnt my toe! Not doing that again!
  • 9. Dealing with Uncertainty • More variables than equations to solve them… • …Hence optimisation problem (no unique solution) • Like it or not, iterative cycles work best – Build-Measure-Learn; DMAIC • Frequent Experiments & Actionable Change • Control by Experimental Design Principles – Test one change in isolation – Compare against a control group/result – Randomise Groupings – Double Blind • Plus, smaller tasks = smaller variance = greater certainty Gold Standard: Randomised Double Blind Controlled Trial
  • 10. Definition: Randomised • Two groups • Randomly Assign Subjects to Each Group
  • 11. Definition: Double Blind Both Researcher & Subject Don’t know which group they are assigned to. So researcher and subject behave the same for A and B tests. TIP: Automated allocation Image via ’John the Math Guy’
  • 12. Definition: Controlled Every potential factor is fixed aside from the factor under test. Minimises ‘confounding variables’ e.g. If someone goes outside and gets wet, does it mean it’s raining? Image via ‘Not the average’ blog
  • 13. Designing Experiments • Start with Hypothesis – Include theory if analytical • Experiment AGAINST a control group! – Control Group = Baseline to compare against (B-test) – Experimental Group is A-test • Randomly Allocate Control & Experimental Group – Ideally Researcher & Subject Can’t Know • Analyse Results, Conclude AND Act!
  • 14. Caution • Change only one thing at once! – Can do A/B/n tests, but have to be linearly independent variables • statistically, not a certainty! • Objective: Make sure results aren’t by chance (e.g. against placebo)! • Analyse against ‘Null’ Hypothesis – Opposite of what you are trying to prove • Factor in type 1 & 2 statistical errors – False Positives and False Negatives • Your test is the alternate hypothesis • If the probability of seeing your result under the Null hypothesis (chance alone) is very, very small, reject the Null in favour of the Alternate hypothesis… – ‘Small-p’ (the p-value) = probability of a result at least this extreme if the null hypothesis were true • …which you are trying to prove! • Otherwise, no choice but to retain the null hypothesis (you have failed to reject it)
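The null-hypothesis check above can be sketched as a two-proportion z-test. The slides do not name a specific test, so the choice of a z-test, the function name `two_proportion_p_value` and the conversion counts in the usage lines are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for 'A and B convert at the same rate' (the null).

    Uses the pooled-proportion z-test, a normal approximation that is only
    reasonable with fairly large groups. Assumes the pooled rate is neither
    0 nor 1, otherwise the standard error is zero.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Probability of a difference at least this extreme if the null were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical counts: 200 of 1000 convert on A, 100 of 1000 on B.
    p = two_proportion_p_value(200, 1000, 100, 1000)
    print("reject the null" if p < 0.05 else "retain the null", p)
```

The 0.05 threshold in the usage lines is the conventional choice, not a rule; a stricter threshold lowers the type 1 error rate at the cost of more type 2 errors.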
  • 15. Q: Where Can A/B-testing Be Used? A: EVERYWHERE!
  • 16. Where Can A/B-Tests Be Used? • Guerrilla testing • Lean-Startup A/B-Tests (tech, marketing etc.) • Pilots • Experiments • Proofs of Concept • Software Development Team Retrospectives • Manufacturing Processes • Change Programmes • Departmental Effectiveness • …
  • 17. Q: What tools can we use? A: STATISTICS
  • 18. Toolbox: Normal Distribution Normally distributed data shown as a continuous line; fixed-width histogram of the same data (right). Pros: 1. Incredibly diverse 2. Tables/Excel Functions exist Cons: 1. Needs many samples (25+) – With fewer, errors significantly impact the result & other methods are needed (e.g. t-test) 2. Can’t Always Force Normality – But story point estimates can! Source: Critical Numbers Group Sheffield University
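As a rough sketch of putting the normal distribution to work, the standard library's `statistics.NormalDist` can fit a normal curve to samples and answer "how likely is a value below x?". The helper name `fit_normal` and the sample values are invented for illustration, and the fit is only meaningful if the data really is roughly normal.

```python
from statistics import NormalDist


def fit_normal(samples):
    """Estimate a normal distribution from the sample mean and std deviation."""
    return NormalDist.from_samples(samples)


if __name__ == "__main__":
    # Hypothetical task durations in days (made-up numbers).
    durations = [3, 5, 4, 6, 5, 4, 5, 7, 4, 5]
    dist = fit_normal(durations)
    print(f"mean={dist.mean:.2f}, stdev={dist.stdev:.2f}")
    # Estimated probability that a task takes 6 days or fewer.
    print(f"P(duration <= 6) = {dist.cdf(6):.2f}")
```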
  • 19. Toolbox: Confidence Intervals Indicates reliability of estimate, given data = Likelihood that result falls within values of x-standard deviations of the mean. Answers “How sure are you that this result was expected?” Pros: 1. Easy to do 2. Excel Functions/Libraries exist Cons: 1. Same weakness as normal distribution 2. Arbitrary confidence intervals – Researcher chooses, but 95% is the de facto standard (roughly 2 sigma; more precisely about 1.96) Source: Moz.com
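A confidence interval for a mean can be sketched with the normal approximation as below. The function name `mean_confidence_interval` and the sample data are assumptions; with small samples a t-distribution would be the more careful choice, as the normal-distribution slide notes.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `samples`."""
    n = len(samples)
    centre = mean(samples)
    std_err = stdev(samples) / sqrt(n)
    # z is about 1.96 for 95%, the 'roughly 2 sigma' rule of thumb.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_err, centre + z * std_err


if __name__ == "__main__":
    # Hypothetical page-load times in seconds (made-up numbers).
    low, high = mean_confidence_interval([10, 12, 11, 13, 9, 10, 12, 11])
    print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Raising `confidence` widens the interval, which is the trade-off behind the "arbitrary confidence intervals" con on the slide.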
  • 20. Toolbox: Correlation Matrix Matrix of elements. Each is correlation coefficient of data v data. “How strongly does this relate to that?” High correlation -> dig deeper Pros: 1. Excel Functions/Libraries exist Cons: 1. Correlation isn’t Causation! 2. More of a ‘faff’ in Excel – Prone to human error in analysis Source: Genome biology
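A correlation matrix can be computed without Excel, which avoids some of the manual ‘faff’. The sketch below uses the Pearson coefficient in plain Python; the names `pearson` and `correlation_matrix` and the example columns are invented for illustration, and it assumes no column is constant (which would divide by zero).

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named data columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical weekly figures (made-up numbers).
    data = {
        "visits": [100, 120, 130, 150, 170],
        "signups": [10, 13, 12, 16, 18],
        "bounce_rate": [0.60, 0.55, 0.56, 0.50, 0.45],
    }
    for name, row in correlation_matrix(data).items():
        print(name, {other: round(r, 2) for other, r in row.items()})
```

A value near +1 or -1 is the cue to "dig deeper"; it still says nothing about which variable, if either, causes the other.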
  • 21. Toolbox: Factor Analysis Using the correlation matrix to identify factors and determine the independent variables for dependent variables. Pros: 1. Linear Algebra tools to help 2. Identifies combinations of factors Cons: 1. Excel doesn’t support it natively 2. ‘Cancelling’ factors or confounding factors problematic 3. Have to understand linear algebra 4. Basically an approximation (so what’s good enough?) Source: Kovach Computing Services
  • 22. Definitions (TERM: DESCRIPTION)
    – Dependent Variable: A variable that depends on one or more other variables (in y = x + 2, y is dependent, x is independent).
    – Independent Variable: A variable that does not depend on the value of any other variable.
    – Confounding Variable: A variable that could independently present the same result as some other variable. This reduces the credibility and certainty of a result (e.g. if I go outside and I get wet, is it because it was raining?).
    – Distribution: The ‘shape’ of the graph of a random variable.
    – Type 1 Error (False Positive): Declaring a result as confirmed when it’s not, usually through experimental error.
    – Type 2 Error (False Negative): Declaring a result as false when it’s true, usually through experimental or interpretive error.
  • 23. Thanks for Viewing
    Further Reading
    – “Random Variables and Probability Distributions”, Khan Academy: https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
    – “Confidence Intervals”, Wikipedia: http://en.wikipedia.org/wiki/Confidence_interval
    – “Normal Distribution”, Wikipedia: http://en.wikipedia.org/wiki/Normal_distribution
    – “Correlation & Dependence”, Wikipedia: http://en.wikipedia.org/wiki/Correlation_and_dependence
    – “Factor Analysis”, Wikipedia: http://en.wikipedia.org/wiki/Factor_analysis
    – Genome Biology (publishes research, software and new methods): http://genomebiology.com/
    Ethar Alali, @EtharUK, @Dynacognetics. Managing Director & Chief Architect. Polymath-MathMo. Programming since 9 years old. TOGAF 9 Certified, change agent. Blog: GoadingtheITGeek.blogspot.co.uk
    About Us: Specialist ICT Strategists & Advisors. Member of HiveMind Network for some of the biggest household and corporate multi-nationals. Accredited Growth Voucher Advisors certified to deliver IT & Web Growth Consultancy as part of the government’s Growth Voucher Scheme.