Controlled Experiments for Decision-Making in
e-Commerce Search
Anjan Goswami Wei Han Zhenrui Wang Angela Jiang
October 26, 2015
Anjan Goswami, Wei Han, Zhenrui Wang, Angela Jiang (WalmartLabs)IEEE big data 2016 October 26, 2015 1 / 1
Agenda of this Presentation
Background of Controlled Experiments
Proposed Guidelines for Feature Development for E-commerce Search
Know your Bias
Know your Metrics
Know your Tests
Know your Results
Summary
Background of Controlled Experiments
A statistically sound approach to causal inference
A/B testing is its simplest form
More complex approaches exist, such as Randomized Complete Block Design, Factorial Design, and Split-Plot Design (with a regression model behind the scenes)
Three core pillars: randomization, replication, blocking
Know Your Bias
Bias: additional impacting factors that, if not quantified or eliminated, lead to an unfair comparison and a misleading experimental conclusion
visit-level factors: visits from old users vs. visits from new users [who may click out of curiosity]
query-level factors: a query-level performance boost != a visit-level boost; particular targeted queries
item-level factors: impact of item differences (popular/unpopular, price-competitive/high-end)
Know Your Bias
Avoiding or reducing bias is non-trivial
Proper randomization/blocking in experiment design
Correct selection of target population
Suitable statistical analysis [proper tests]
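As an illustration of the randomization point, here is a minimal sketch of deterministic, hash-based assignment of visitors to groups. The function name `assign_group`, the identifiers, and the 50/50 split are hypothetical and not taken from the slides; the sketch covers randomization only, not blocking.

```python
import hashlib


def assign_group(visitor_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Assign a visitor to 'control' or 'variation' (hypothetical helper).

    Hashing the experiment name together with the visitor id gives each
    visitor a stable group within one experiment, while assignments in
    different experiments are effectively independent of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "variation" if bucket < treatment_share else "control"


# Illustrative identifiers only.
groups = [assign_group(f"visitor-{i}", "ranker-v2") for i in range(10000)]
variation_share = groups.count("variation") / len(groups)
```

Because the assignment depends only on the identifiers, a returning visitor sees the same experience on every visit, which avoids mixing treatments within one user.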
Know Your Metrics
Numerous metrics are generated from users' activities on walmart.com, for example:
Product View Rate (PVR) = (num of visits/query sessions with at least one click on an item) / (num of visits)
Add to Cart Rate (ATC) = (num of visits/query sessions with at least one cart addition) / (num of visits)
Conversion Rate (CR) = (num of visits/query sessions with at least one converted item) / (num of visits)
Average Order Size (AOS) = (total revenue) / (num of visits with at least one converted item)
Revenue Per Visit (RPV) = (total revenue) / (num of visits)
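The ratios above can be sketched in a few lines of Python. The `Visit` record and its fields are assumptions made for this illustration (they are not a schema from the slides), and a visit counts as converted when its revenue is positive.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-visit record; field names are illustrative."""
    clicked: bool        # at least one click on an item
    added_to_cart: bool  # at least one cart addition
    revenue: float       # 0.0 when nothing was purchased


def search_metrics(visits):
    """Compute PVR, ATC, CR, AOS and RPV at the visit level."""
    n = len(visits)
    converted = [v for v in visits if v.revenue > 0]
    total_revenue = sum(v.revenue for v in visits)
    return {
        "PVR": sum(v.clicked for v in visits) / n,
        "ATC": sum(v.added_to_cart for v in visits) / n,
        "CR": len(converted) / n,
        # AOS is undefined when no visit converted.
        "AOS": total_revenue / len(converted) if converted else float("nan"),
        "RPV": total_revenue / n,
    }


visits = [
    Visit(clicked=True, added_to_cart=True, revenue=40.0),
    Visit(clicked=True, added_to_cart=False, revenue=0.0),
    Visit(clicked=False, added_to_cart=False, revenue=0.0),
    Visit(clicked=True, added_to_cart=True, revenue=0.0),
]
metrics = search_metrics(visits)
```

Note that RPV equals CR times AOS, so a change in RPV can come from either factor.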
Know Your Metrics
Collect descriptive statistics: mean, median, variance, and quantiles
Plot the empirical distribution to check for (i) normality, (ii) long tails, (iii) skewness, (iv) multi-modality, etc.
A normal distribution leads to straightforward analysis. For a non-normal distribution, check whether the conditions of the central limit theorem (CLT) can be met. Use a non-parametric test when the CLT is not applicable
For long-tailed data, verify that the observations in the tail are correct
Multi-modal data suggest that the data come from multiple sub-populations and call for further segmentation
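A minimal sketch of such a descriptive summary, using only the Python standard library, might look as follows. The sample values are invented for illustration; the share of zeros and the moment-based skewness are included here because they are the features most relevant to a revenue-like metric.

```python
import statistics


def describe(values):
    """Summarize a metric: location, spread, quartiles, zero share, skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Moment-based (population) skewness; 0.0 for a constant sample.
    skewness = sum((v - mean) ** 3 for v in values) / n / sd ** 3 if sd > 0 else 0.0
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": median,
        "variance": statistics.pvariance(values),
        "q1": q1,
        "q3": q3,
        "zero_share": sum(v == 0 for v in values) / n,
        "skewness": skewness,
    }


# Invented revenue-per-visit sample: mostly zeros with a long right tail.
revenue_per_visit = [0.0] * 95 + [12.0, 25.0, 40.0, 80.0, 300.0]
summary = describe(revenue_per_visit)
```

A mean far above the median together with a large positive skewness is a quick signal that a plain normal-theory analysis needs checking.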
Know Your Metrics: Test design and analysis
This knowledge is then used to:
Determine the sample size from these descriptive statistics to avoid an underpowered test
Remove erroneous measurements and choose proper metrics based on their distributions
Choose the hypothesis test based on the distribution; for example, RPV is highly skewed with zero inflation (more than 95% of the data are zeros)
Account for seasonality: weekly seasonality suggests the test should be run for multiple weeks
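To make the sample-size step concrete, here is a sketch of the standard normal-approximation formula for comparing two proportions (for example, conversion rates). The baseline rate, the target rate, and the defaults of 5% significance and 80% power are assumptions chosen for the example, not values from the slides.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Visits needed per group for a two-sided test of two proportions.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Assumed example: detect a lift in conversion rate from 5.0% to 5.5%.
n_needed = sample_size_per_group(0.05, 0.055)
# Halving the lift roughly quadruples the required sample size.
n_needed_small_lift = sample_size_per_group(0.05, 0.0525)
```

The formula assumes independent visits and a roughly normal sampling distribution, so it is a starting point rather than a guarantee for heavily skewed metrics such as RPV.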
Know Your Tests
Hypothesis Testing
A method of statistical inference for testing a statistical hypothesis (e.g. has the product view rate improved after deploying a new feature?)
Components of a hypothesis test:
Hypotheses:
two contradicting beliefs (the null and alternative hypotheses, denoted H0 and H1) regarding the metric of interest
the testing procedure decides which belief the collected data support
Test statistic: a random variable whose distribution depends on which of H0 and H1 holds
Rejection region: the set of values of the test statistic that support rejecting H0
Know Your Tests
Common Tests on Location Parameter
IID normal data, or large samples where the CLT applies: two-sample t-test
Paired, non-normal data: Wilcoxon signed-rank test, bootstrapping (sampling with replacement)
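The bootstrapping option can be sketched as a percentile bootstrap confidence interval for the mean of paired differences. The differences below are invented for illustration, and the percentile method is one of several bootstrap variants; the slides do not say which one the authors used.

```python
import random
import statistics


def bootstrap_mean_ci(diffs, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each resample mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    low_index = int((1 - confidence) / 2 * n_resamples)
    high_index = int((1 + confidence) / 2 * n_resamples) - 1
    return means[low_index], means[high_index]


# Invented paired differences (variation minus control), right-skewed.
diffs = [0.8, 1.2, -0.1, 0.5, 2.9, 0.3, 0.7, 1.1, 0.2, 0.6, 4.0, 0.4]
ci_low, ci_high = bootstrap_mean_ci(diffs)
observed_mean = statistics.fmean(diffs)
```

Resampling whole pairs (here, their differences) keeps the pairing intact. Like the t-test, this sketch still assumes the differences are independent of one another.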
Know Your Tests
Example Pitfall
One common pitfall is to overlook the independence assumption and apply a t-test to autocorrelated data (e.g. the daily metric difference between the control and variation groups). The original slide showed a simulation of 100 replications of a t-test on zero-mean autocorrelated data (figure not reproduced here).
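A simulation in the same spirit can be sketched as follows. This is not the authors' code: the AR(1) model, the coefficient 0.7, the 30-day series length, and the 2000 replications are all assumptions made for the illustration. Every series has true mean zero, so each rejection is a false positive.

```python
import math
import random
import statistics

N_DAYS = 30
# Two-sided 5% critical value of Student's t with N_DAYS - 1 = 29 degrees of freedom.
T_CRIT_DF29 = 2.045


def ar1_series(phi, rng):
    """Zero-mean AR(1) series x_t = phi * x_{t-1} + e_t, started at stationarity."""
    x = rng.gauss(0, 1) / math.sqrt(1 - phi ** 2)
    series = []
    for _ in range(N_DAYS):
        series.append(x)
        x = phi * x + rng.gauss(0, 1)
    return series


def t_test_rejects(sample):
    """One-sample t-test of H0: mean = 0 at the 5% level."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return abs(statistics.fmean(sample) / standard_error) > T_CRIT_DF29


def false_positive_rate(phi, replications=2000, seed=1):
    rng = random.Random(seed)
    rejections = sum(t_test_rejects(ar1_series(phi, rng)) for _ in range(replications))
    return rejections / replications


rate_independent = false_positive_rate(phi=0.0)     # close to the nominal 5%
rate_autocorrelated = false_positive_rate(phi=0.7)  # far above 5%
```

With positive autocorrelation the sample standard deviation understates the uncertainty of the mean, so the test rejects a true H0 far more often than the nominal level suggests.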
Know Your Result
Visualization
time series plots for all business metrics
show the trend over time
different performance for weekdays and weekends
compare business metrics across different segments/sub-populations
segmented by query intent in terms of product category
segmented by users' devices and browsers
segmented by new users and returning users
Know Your Result
[Figure not reproduced: one panel per product category, Category 1 through Category 15]
Know Your Result
Representation of A/B testing result
Confidence interval: a type of interval estimate of a population parameter. It is an observed interval (i.e. calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently is given by the confidence level associated with the interval, usually 95%
P-value: the probability, given that the hypothesis H0 is true, of the test statistic being at least as extreme as the one observed
A small p-value leads to rejection of H0; statistical significance != practical significance
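As a worked illustration of both quantities, the sketch below computes a confidence interval and a two-sided p-value for the difference between two conversion rates, using the normal approximation. The visit and conversion counts are invented, and the z-test is an assumed choice for the example rather than the authors' stated procedure.

```python
import math
from statistics import NormalDist


def two_proportion_summary(x_control, n_control, x_variation, n_variation, confidence=0.95):
    """Difference in proportions with a Wald CI and a pooled z-test p-value."""
    p_c = x_control / n_control
    p_v = x_variation / n_variation
    diff = p_v - p_c

    # p-value: standard error under H0 (equal rates), using the pooled proportion.
    p_pool = (x_control + x_variation) / (n_control + n_variation)
    se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variation))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error around the observed difference.
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variation)
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value


# Invented counts: 5.00% vs 5.25% conversion on 100,000 visits per group.
diff, (ci_low, ci_high), p_value = two_proportion_summary(5000, 100000, 5250, 100000)
```

Here the interval excludes zero and the p-value is below 0.05, yet the lift is only a quarter of a percentage point. Whether that matters for the business is a separate question from whether it is statistically significant.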
Know Your Result
Interpretation of Test Results
Hypothesis tests are decision-making processes subject to two types of decision errors
type I error: rejecting H0 when H0 is true; related to the significance level (and hence the confidence level of the interval)
type II error: failing to reject H0 when H1 is true; related to power
Summary
Many factors (query/item/visit level) can bias estimates in controlled experiments in an e-commerce setting
Understand the chosen metric (target population/distribution/impacting factors) in order to design the experiment
Verify the assumptions of the chosen hypothesis test (autocorrelation? A/A test?)
Interpret the test results (p-value/power/significance)
Thank you

Controlled Experiments for Decision-Making in e-Commerce Search

  • 1. Controlled Experiments for Decision-Making in e-Commerce Search. Anjan Goswami, Wei Han, Zhenrui Wang, Angela Jiang (WalmartLabs). IEEE big data 2016, October 26, 2015.
  • 2. Agenda of this Presentation. Background of Controlled Experiments; Proposed Guidelines for Feature Development for E-commerce Search: Know your Bias, Know your Metrics, Know your Tests, Know your Results; Summary.
  • 3. Background of Controlled Experiments. A statistically sound approach to causal inference. A/B testing is its simplest form. More complex designs exist, such as Randomized Complete Block Design, Factorial Design, and Split-Plot Design (with a regression model behind the scenes). Three core pillars: randomization, replication, blocking.
  • 4. Know Your Bias. Bias: additional influencing factors that, if not quantified or eliminated, result in an unfair comparison and a misleading experiment conclusion. Visit-level factors: visits from old users vs. visits from new users [clicks out of curiosity]. Query-level factors: a query-level performance boost != a visit-level boost; particular targeted queries. Item-level factors: impact of item differences (popular/unpopular, price-competitive/high-end).
  • 5. Know Your Bias. Avoiding or reducing bias is non-trivial. It requires: proper randomization/blocking in the experiment design; correct selection of the target population; suitable statistical analysis [proper tests].
  • 6. Know Your Metrics. Numerous metrics are generated from users' activities on walmart.com, for example: Product View Rate (PVR) = (number of visits/query sessions with at least one click on an item) / (number of visits); Add to Cart Rate (ATC) = (number of visits/query sessions with at least one cart addition) / (number of visits); Conversion Rate (CR) = (number of visits/query sessions with at least one converted item) / (number of visits); Average Order Size (AOS) = (total revenue) / (number of visits with at least one converted item); Revenue Per Visit (RPV) = (total revenue) / (number of visits).
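The five ratios on slide 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Visit` record, its field names, and the sample data are assumptions made for the example, and each visit is treated as already aggregated to one row.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-visit record; the field names are assumptions."""
    clicked: bool        # at least one click on an item
    added_to_cart: bool  # at least one cart addition
    converted: bool      # at least one converted (purchased) item
    revenue: float       # revenue attributed to the visit


def search_metrics(visits):
    """Compute PVR, ATC, CR, AOS and RPV as defined on the slide."""
    n = len(visits)
    if n == 0:
        raise ValueError("need at least one visit")
    n_converted = sum(v.converted for v in visits)
    total_revenue = sum(v.revenue for v in visits)
    return {
        "PVR": sum(v.clicked for v in visits) / n,
        "ATC": sum(v.added_to_cart for v in visits) / n,
        "CR": n_converted / n,
        # AOS is undefined when no visit converted.
        "AOS": total_revenue / n_converted if n_converted else float("nan"),
        "RPV": total_revenue / n,
    }


# Made-up sample data for illustration only.
visits = [
    Visit(True, True, True, 40.0),
    Visit(True, False, False, 0.0),
    Visit(False, False, False, 0.0),
    Visit(True, True, False, 0.0),
]
metrics = search_metrics(visits)
print(metrics)
```

Note that AOS divides by converting visits only, while the other four metrics divide by all visits, so AOS and RPV can move in opposite directions when the conversion rate changes.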
  • 7. Know Your Metrics. Collect descriptive statistics: mean, median, variance, and quantiles. Plot the empirical distribution to check for (i) normality, (ii) a long tail, (iii) skewness, (iv) multi-modality, etc. A normal distribution leads to straightforward analysis. For a non-normal distribution, check whether the conditions of the central limit theorem (CLT) are met; use a non-parametric test when the CLT is not applicable. For long-tailed data, verify that the observations in the tail are correct. Multi-modal data suggest that the data come from multiple sub-populations and call for further segmentation.
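As a rough illustration of the descriptive checks on slide 7, the sketch below summarises a revenue-like metric using only the Python standard library. The `describe` helper and the sample values are hypothetical; the skewness is the simple moment-based estimate, one of several common definitions.

```python
import statistics


def describe(values):
    """Summary statistics used to judge normality, skew and zero inflation."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Moment-based skewness: positive values indicate a long right tail.
    skew = sum((x - mean) ** 3 for x in values) / (n * sd ** 3) if sd > 0 else 0.0
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "quartiles": (q1, q2, q3),
        "skewness": skew,
        "zero_share": sum(1 for x in values if x == 0) / n,
    }


# Made-up revenue-per-visit sample: 95% zeros plus a few large orders.
revenue = [0.0] * 95 + [20.0, 35.0, 50.0, 80.0, 400.0]
summary = describe(revenue)
print(summary)
```

A mean far above the median together with a large positive skewness is the pattern that argues against assuming normality without first checking whether the CLT applies.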
  • 8. Know Your Metrics: Test design and analysis. This knowledge is then used to: determine the sample size from the descriptive statistics, to avoid an underpowered test; remove erroneous measurements and choose proper metrics based on the distribution; choose the hypothesis test based on the distribution (for example, RPV is highly skewed and zero-inflated, with more than 95% of the data being zeros); account for seasonality (weekly seasonality suggests the test should run for multiple weeks).
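The sample-size step on slide 8 can be illustrated for a rate metric such as PVR or CR. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name, the default significance level and power, and the example rates are assumptions, not values from the talk. A skewed metric such as RPV would need a different calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_variation, alpha=0.05, power=0.80):
    """Visits needed per group to detect p_control vs p_variation.

    Two-sided test on two proportions, normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p_control == p_variation:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_variation) ** 2)


# Hypothetical example: baseline rate 10%, hoping to detect a lift to 11%.
n_needed = sample_size_per_group(0.10, 0.11)
print(n_needed)
```

Halving the detectable lift roughly quadruples the required sample, which is why small expected effects combined with weekly seasonality tend to push tests to multiple weeks.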
  • 9. Know Your Tests: Hypothesis Testing. A method of statistical inference for testing a statistical hypothesis (e.g., has the product view rate improved after deploying a new feature?). Components of a hypothesis test: Hypotheses: two contradicting beliefs (the null and alternative hypotheses, denoted H0 and H1) regarding the metric of interest; the testing procedure decides which belief the collected data support. Test statistic: a random variable whose distribution depends on whether H0 or H1 holds. Rejection region: the values of the test statistic that support rejecting H0.
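To make the three components on slide 9 concrete, here is a minimal sketch of a two-sided, two-proportion z-test on product view rate. H0 is "the rates are equal", the test statistic is z, and the rejection region at the 5% level is |z| > 1.96. The function name and the counts are made up for illustration; the talk does not prescribe this particular test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: rate_a == rate_b."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: visits with a product view out of all visits.
z, p_value = two_proportion_z_test(5000, 20000, 5200, 20000)
reject_h0 = abs(z) > 1.96  # rejection region for a 5% two-sided test
print(z, p_value, reject_h0)
```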
  • 10. Know Your Tests: Common Tests on Location Parameter. IID normal data, or CLT with a large sample: two-sample t-test. Paired, non-normal data: Wilcoxon signed-rank test, bootstrapping (sampling with replacement).
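Bootstrapping, mentioned on slide 10 for paired non-normal data, can be sketched as a percentile confidence interval for the mean of the paired differences. The helper name, the number of resamples, and the sample differences are assumptions for illustration. Plain resampling still treats the pairs as independent, so it does not address autocorrelation.

```python
import random
import statistics


def bootstrap_mean_ci(diffs, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    n = len(diffs)
    # Resample the differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up per-query metric differences (variation minus control).
diffs = [0.8, 1.2, -0.3, 2.5, 0.4, 1.9, 0.1, 3.2, 0.6, 1.1, -0.2, 0.9, 1.4, 0.3]
ci_low, ci_high = bootstrap_mean_ci(diffs)
print(ci_low, ci_high)
```

If the interval excludes zero, the data support a non-zero mean difference at roughly the chosen confidence level, without assuming the differences are normally distributed.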
  • 11. Know Your Tests: Example Pitfall. One common pitfall is to overlook the independence assumption and use a t-test on autocorrelated data (e.g., the daily metric difference between the control and variation groups). The slide shows one simulation: 100 replications of a t-test on autocorrelated data with zero mean.
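The pitfall on slide 11 can be reproduced with a small simulation. The sketch below is not the authors' simulation: it assumes an AR(1) process for the daily differences, 30 days per test, and 2000 replications (rather than the slide's 100) so that the estimated rates are stable. The true mean is zero, so every rejection is a false positive.

```python
import math
import random
import statistics

T_CRIT_29 = 2.045  # two-sided 5% critical value of Student's t with 29 df


def ar1_series(n, phi, rng):
    """Zero-mean AR(1) series x_t = phi * x_{t-1} + noise; phi=0 gives IID data."""
    x = rng.gauss(0.0, 1.0) / math.sqrt(1 - phi ** 2)  # stationary start
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def t_statistic(sample):
    """One-sample t statistic for H0: mean == 0 (assumes independence)."""
    n = len(sample)
    return statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))


def false_positive_rate(phi, n_days=30, replications=2000, seed=1):
    """Share of replications where the t-test wrongly rejects a zero mean."""
    rng = random.Random(seed)
    rejections = sum(
        abs(t_statistic(ar1_series(n_days, phi, rng))) > T_CRIT_29
        for _ in range(replications)
    )
    return rejections / replications


rate_iid = false_positive_rate(phi=0.0)   # independence holds
rate_ar1 = false_positive_rate(phi=0.7)   # positively autocorrelated
print(rate_iid, rate_ar1)
```

With independent data the rejection rate should sit near the nominal 5%, while positive autocorrelation makes the t-test reject far more often because the standard error is underestimated.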
  • 12. Know Your Result: Visualization. Time series plots for all business metrics: show the trend over time; reveal different performance on weekdays and weekends. Compare business metrics across segments/sub-populations: segmented by query intent in terms of product category; segmented by users' devices and browsers; segmented by new users and returning users.
  • 13. Know Your Result. [Figure: one panel per category, Category 1 through Category 15]
  • 14. Know Your Result: Representation of A/B testing results. Confidence interval: a type of interval estimate of a population parameter. It is an observed interval (i.e., calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. "Frequently" refers to a probability value associated with the interval, usually 95%. P-value: the probability of the test statistic being at least as extreme as the observed one, given that the hypothesis H0 is true. A small p-value leads to rejection of H0; statistical significance != physical (practical) significance.
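A confidence interval for the difference between two rates, as described on slide 14, can be sketched with the usual normal approximation (Wald interval). The function name, the counts, and the `MIN_WORTHWHILE_LIFT` threshold are hypothetical; the threshold only illustrates how statistical and practical significance can diverge.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_rates_ci(success_a, n_a, success_b, n_b, confidence=0.95):
    """Wald confidence interval for rate_b - rate_a."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical counts: control 5000/20000, variation 5200/20000.
low, high = diff_in_rates_ci(5000, 20000, 5200, 20000)

# Assumed business threshold, not from the talk.
MIN_WORTHWHILE_LIFT = 0.02

statistically_significant = low > 0 or high < 0
practically_significant = low >= MIN_WORTHWHILE_LIFT
print(low, high, statistically_significant, practically_significant)
```

Here the interval excludes zero, so the lift is statistically significant, yet the whole interval lies below the assumed two-point threshold, so it would not count as practically significant under that threshold.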
  • 15. Know Your Result: Interpretation of Test Results. Hypothesis tests are a decision-making process subject to two types of decision errors. Type I error: rejecting H0 when H0 is true; related to the significance level (and hence the confidence level of the interval). Type II error: failing to reject H0 when H1 is true; related to power.
  • 16. Summary. Many factors (query/item/visit level) contribute to biased estimation in controlled experiments in the e-Commerce setting. Understand the chosen metric (target population, distribution, impacting factors) to design the experiment. Verify the assumptions of the chosen hypothesis test (autocorrelation? A/A test?). Interpret the test results (p-value, power, significance).