The basics of statistical hypothesis testing in e-commerce
By Anatoly Vuets
Agenda
• Why do we use (and why should we use) statistical hypothesis testing in e-commerce?
• Statistical test: how it works and what its main parameters are
• Key features for e-commerce
Why do we need statistical testing in e-commerce?
We need the right decisions
• A/B tests
• Ad-hoc analyses
• Building models
We need the right decisions
• A question: which of these groups makes more profit?
• What is missing here?
We need the right decisions
• A/B test: which version is better?
Statistical test: let’s recall the basics!
Basics of statistics
• Random variable (discrete or continuous)
• Probability distribution: probability mass function PMF(x) for a discrete variable, probability density function PDF(x) for a continuous one
• Mean M or μ
• Standard deviation SD or σ
Basics of statistics: standard distribution
Statistical test: uncertainty.
Uncertainty
[Figure: a statistical population with its true metric value. The one sample actually drawn gives the observed value; the other possible samples give other possible values (a distribution).]
Why is this important?
We want to draw conclusions about the statistical population from the single sample that we have observed.
[Figure: statistical population, observed sample, possible samples]
Distribution of the metric estimate
Statistical test: basic idea and main parameters.
Idea
• We want to test a statement (typically the existence of an effect).
• We have a set of observations (a sample) from which we draw a conclusion about the statement.
• The scenario in which the statement is TRUE is called the alternative hypothesis H1.
• The scenario in which the statement is FALSE is called the null hypothesis H0.
• Estimate the probability, under H0, of observing a result at least as extreme as the one in our sample (the p-value).
• If this probability is high enough, we cannot reject H0, so H1 is not accepted. Otherwise, we reject H0 and accept H1.
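The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's own code: it assumes the conversion example used later in the deck (H0: C = 5%, n = 3600), treats the number of conversions as binomial, and uses a hypothetical observed count of 205 conversions.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

def p_value_upper(c, n, p0):
    """P(X >= c) under H0: probability of a result at least as extreme as c."""
    return sum(math.exp(binom_log_pmf(k, n, p0)) for k in range(c, n + 1))

n, p0, alpha = 3600, 0.05, 0.05
c_observed = 205  # hypothetical observed number of conversions (assumption)

p_value = p_value_upper(c_observed, n, p0)
decision = "accept H1" if p_value < alpha else "cannot reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

The one-sided tail matches the form of the alternative used in the deck (C > 5%); a two-sided alternative would need both tails.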
Statistical test
[Diagram: sample S → test T(S) → decision H0 or H1]
H0: C = 5%, H1: C > 5%
Statistical test parameters

                 Test T(s) says H0      Test T(s) says H1
Truth is H0      Correct, P = 1 - α     Error T1, P = α
Truth is H1      Error T2, P = β        Correct, P = 1 - β

• Error T1 (type I error): accept H1 when H0 is true.
• Error T2 (type II error): accept H0 when H1 is true.
• We would like to have a perfect test (α = 0, β = 0). However, as we shall see later, this is impossible in practice. Because of this, test design and result interpretation are crucial for proper decision making.
Statistical test parameters: metal detector at the airport
A detector can be considered a binary classifier: the passenger either has no metal objects (H0) or has metal objects such as a weapon (H1).
The detector has a sensitivity knob (decision boundary).
If the sensitivity is low, the detector falsely detects metal in α = 5% of cases but misses metal in β = 67% of cases.
If the sensitivity is high, it falsely detects metal in α = 50% of cases but misses it in only β = 0.3% of cases.
Intermediate sensitivity values allow choosing the trade-off between missing a passenger who has hidden metal objects (which increases the probability of an incident) and the service speed (additional airport costs and lower passenger satisfaction).
Statistical test parameters: increasing web-page conversion rate
A statistical test based on data obtained from an A/B test can be treated as a classifier that is supposed to tell whether the conversion rate increased (H1) or remained the same (H0).
Question: which trade-off between α and β would you choose?
How does a statistical test work: distribution P(T|H0)
• H0: C = 5%, H1: C > 5%
• T(s) = c/n, n = 3600
• significance level = 5%
• P(T|H0) - ?
Theory:
Simulation: bootstrap
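A rough sketch of how the two routes to P(T|H0) can be compared. The slide names the bootstrap; since the transcript contains no observed data to resample, this sketch instead draws simulated experiments directly from the H0 model (a plain Monte Carlo simulation), and uses the normal approximation of a binomial proportion as the "theory". The seed and the number of simulated experiments are arbitrary choices.

```python
import math
import random
import statistics

n, p0 = 3600, 0.05
rng = random.Random(42)  # fixed seed so the sketch is reproducible

def simulate_rate(n, p, rng):
    """One simulated experiment: n visitors, each converts with probability p."""
    conversions = sum(1 for _ in range(n) if rng.random() < p)
    return conversions / n

# Simulation: distribution of T = c/n over many simulated experiments under H0
rates = [simulate_rate(n, p0, rng) for _ in range(1000)]
sim_mean = statistics.mean(rates)
sim_sd = statistics.stdev(rates)

# Theory: normal approximation of a binomial proportion
theory_mean = p0
theory_sd = math.sqrt(p0 * (1 - p0) / n)

print(f"simulated: mean = {sim_mean:.4f}, SD = {sim_sd:.4f}")
print(f"theory:    mean = {theory_mean:.4f}, SD = {theory_sd:.4f}")
```

With n = 3600 the two agree closely, which is why the later sketches rely on the normal approximation alone.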
How does a statistical test work: significance level and decision boundary
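One way to turn the significance level into a decision boundary, sketched under the normal approximation of T = c/n under H0 (an assumption; an exact binomial boundary would differ slightly). The helper name `decide` is introduced here for illustration only.

```python
import math
from statistics import NormalDist

n, p0, alpha = 3600, 0.05, 0.05

# Normal approximation of T = c/n under H0
sd0 = math.sqrt(p0 * (1 - p0) / n)

# Boundary chosen so that P(T > boundary | H0) = alpha
boundary = NormalDist(mu=p0, sigma=sd0).inv_cdf(1 - alpha)
print(f"decision boundary: T > {boundary:.4f} (about {boundary * n:.0f} conversions)")

def decide(c, n, boundary):
    """Return the accepted hypothesis for c conversions out of n visitors."""
    return "H1" if c / n > boundary else "H0"

print(decide(190, n, boundary), decide(210, n, boundary))
```

Lowering the significance level moves the boundary to the right: fewer false detections, but a real effect is then harder to detect.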
How does a statistical test work: distribution P(T|H1)
• H0: C = 5%, H1: C > 5%
• T(s) = c/n, n = 3600
• significance level = 5%
• P(T|H1) - ?
Hypothesis H1 consists of an infinite number of hypotheses: C = 5.1%, C = 5.2%, … Which one should we consider?
• H1: C = 5.5% (+10% relative, the minimum expected boost)
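Once a single alternative (C = 5.5%) is fixed, β and the power follow from the two distributions. A minimal sketch, again assuming the normal approximation for T = c/n; the helper `rate_dist` is a name introduced for this example.

```python
import math
from statistics import NormalDist

n, p0, p1, alpha = 3600, 0.05, 0.055, 0.05

def rate_dist(p, n):
    """Normal approximation of T = c/n when the true conversion rate is p."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))

boundary = rate_dist(p0, n).inv_cdf(1 - alpha)  # accept H1 when T > boundary
beta = rate_dist(p1, n).cdf(boundary)           # P(T <= boundary | H1): missed effect
power = 1 - beta

print(f"boundary = {boundary:.4f}, beta = {beta:.2f}, power = {power:.2f}")
```

Under these assumptions the power comes out at roughly 40%, so with n = 3600 a real +10% boost would be missed more often than it is detected.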
How does a statistical test work: significance level vs power
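The trade-off can be made concrete by sweeping the significance level while keeping n and the alternative fixed. A sketch under the same assumptions as the deck's example (H0: C = 5%, H1: C = 5.5%, n = 3600, normal approximation); the list of α values is arbitrary.

```python
import math
from statistics import NormalDist

n, p0, p1 = 3600, 0.05, 0.055

def power_at(alpha, n, p0, p1):
    """Power of the one-sided test at significance level alpha (normal approximation)."""
    sd0 = math.sqrt(p0 * (1 - p0) / n)
    sd1 = math.sqrt(p1 * (1 - p1) / n)
    boundary = NormalDist(p0, sd0).inv_cdf(1 - alpha)
    return 1 - NormalDist(p1, sd1).cdf(boundary)

alphas = (0.01, 0.05, 0.10, 0.20, 0.50)
powers = [power_at(a, n, p0, p1) for a in alphas]
for a, pw in zip(alphas, powers):
    print(f"alpha = {a:.2f}  power = {pw:.2f}  beta = {1 - pw:.2f}")
```

Each row is one position of the "sensitivity knob": a larger α buys more power, and with fixed n and effect size there is no setting that makes both errors small.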
Important features of statistical testing in e-commerce
Growth dynamics of metrics
Significance level vs power trade-off improvement: sample size
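A sketch of how the sample size needed for a target α and power can be estimated. It uses the standard normal-approximation formula for a one-sided, one-sample proportion test; the target power of 80% is an illustrative assumption, not a value from the deck.

```python
import math
from statistics import NormalDist

def required_sample_size(p0, p1, alpha, power):
    """Smallest n at which the one-sided test reaches the target power.

    Solves sqrt(n) * (p1 - p0) = z_alpha * sqrt(p0 * q0) + z_power * sqrt(p1 * q1).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_power * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)

n_needed = required_sample_size(p0=0.05, p1=0.055, alpha=0.05, power=0.80)
print(f"visitors needed for alpha = 5%, power = 80%: {n_needed}")
```

For the deck's example this gives a sample size on the order of 12,000, several times the 3600 used so far. A two-group A/B comparison would need its own formula.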
Significance level vs power trade-off improvement: effect size
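The same calculation can be run the other way: keep n and α fixed and vary the effect size. A sketch assuming the deck's baseline (C = 5%, n = 3600, α = 5%) and a hypothetical set of relative uplifts.

```python
import math
from statistics import NormalDist

n, p0, alpha = 3600, 0.05, 0.05
sd0 = math.sqrt(p0 * (1 - p0) / n)
boundary = NormalDist(p0, sd0).inv_cdf(1 - alpha)  # same boundary for every alternative

def power_for_uplift(uplift):
    """Power when the true rate is p0 * (1 + uplift), normal approximation."""
    p1 = p0 * (1 + uplift)
    sd1 = math.sqrt(p1 * (1 - p1) / n)
    return 1 - NormalDist(p1, sd1).cdf(boundary)

uplifts = (0.05, 0.10, 0.20, 0.30)  # hypothetical relative boosts
powers = [power_for_uplift(u) for u in uplifts]
for u, pw in zip(uplifts, powers):
    print(f"uplift = +{u:.0%}  power = {pw:.2f}")
```

Larger effects are detected far more reliably at the same traffic, which is the reasoning behind preferring features with potentially big effects when traffic is low.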
Uncertainty of p-value
Question: what should we do if we chose α = 10% but got a p-value of 12%?
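To see how much a p-value can vary from one run of the same experiment to the next, one can simulate repeated experiments. This sketch assumes the true conversion rate is 5.5% (the deck's H1), n = 3600, and a one-sided p-value from the normal approximation; the seed and the number of repetitions are arbitrary.

```python
import math
import random
from statistics import NormalDist, quantiles

n, p0, p_true = 3600, 0.05, 0.055  # p_true: assumed real conversion rate
rng = random.Random(7)             # fixed seed so the sketch is reproducible
null_dist = NormalDist(p0, math.sqrt(p0 * (1 - p0) / n))

def one_experiment_p_value():
    """Simulate one experiment and return its one-sided p-value under H0."""
    c = sum(1 for _ in range(n) if rng.random() < p_true)
    return 1 - null_dist.cdf(c / n)

p_values = [one_experiment_p_value() for _ in range(500)]
q1, median, q3 = quantiles(p_values, n=4)
share_below = sum(p < 0.10 for p in p_values) / len(p_values)

print(f"p-value quartiles: {q1:.3f}, {median:.3f}, {q3:.3f}")
print(f"share of experiments with p < 0.10: {share_below:.2f}")
```

Identical experiments on the same real effect give p-values spread over a wide range, so a result of 12% against a threshold of 10% is well within run-to-run noise.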
Summary
• The key parameters of a statistical test are the significance level α (the probability of a false detection) and the power 1 - β (the probability of not missing a real effect).
• Test power can be increased in two ways: by increasing the sample size or by increasing the effect size.
• Keep in mind that the p-value is a random statistic! It is important to account for its uncertainty.
• Keep in mind that some metrics (like conversion from registration to buyer) may take significant time to measure.
• Anomalies in the data may dramatically affect test results.
Conclusions
• In e-commerce, test power (the probability of not missing an effect) is often the most important parameter.
• For a high-traffic business: the required trade-off between significance level and power can easily be achieved by increasing the sample size.
• For a low-traffic business: focus on features which:
1) are cheap, easy to implement and not risky, or
2) have potentially big effects.
Thank you for your attention!
