Consistent Transformation of Ratio Metrics
for Efficient Online Controlled Experiments
Roman Budylin, Alexey Drutsa,
Ilya Katsev, Valeriya Tsoy
About Yandex
• The largest internet company in Europe
• “Google of Russia”, “Amazon of Russia”, “Uber of Russia” and so on
• More than 50M DAU
A/B testing methodology
• Split the traffic of users randomly into two groups, A and B.
• Expose each group to one of two variants of the service: the variant for A (e.g., the current production version) or the variant for B (e.g., an evaluated update).
• Calculate a key measure $X(u)$ for each user $u$ (e.g., $X(u)$ is the number of sessions of the user $u$): $X(u_{A1}), X(u_{A2}), \dots$ in group A and $X(u_{B1}), X(u_{B2}), \dots$ in group B.
• Calculate the Overall Evaluation Criterion (OEC) for each group as the mean value: $\mu_A(X) = \operatorname{avg}_{u \in A} X(u)$ for group A and $\mu_B(X) = \operatorname{avg}_{u \in B} X(u)$ for group B.
• Compare $\Delta(X) = \mu_B(X) - \mu_A(X)$ with 0: its sign shows whether the evaluated update is positive or negative.
• Apply a statistical significance test (e.g., Student’s t-test) to decide whether the difference is caused by noise or by the treatment effect.
Overall Evaluation Criterion (OEC) [Kohavi et al., DMKD’2009]
Overall Acceptance Criterion (OAC) [Drutsa et al., CIKM’2015]
Key properties: sensitivity and directionality
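The comparison of a user-level OEC between the two groups can be sketched as follows. This is a minimal illustration, not the authors' tooling: it uses Welch's variant of the t-test with a normal approximation for the p-value (reasonable for the large groups typical of online experiments), and the per-user values are made up.

```python
import math
from statistics import mean, variance


def welch_t(x_a, x_b):
    """Welch's t-statistic for Delta(X) = mu_B(X) - mu_A(X)."""
    se = math.sqrt(variance(x_a) / len(x_a) + variance(x_b) / len(x_b))
    return (mean(x_b) - mean(x_a)) / se


def two_sided_p(t):
    """Two-sided p-value under a standard normal approximation of t."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical key measure X(u), e.g. the number of sessions of each user.
x_a = [3, 5, 2, 4, 6, 3, 4, 5]  # control group A
x_b = [4, 6, 3, 5, 7, 4, 5, 6]  # treatment group B

delta = mean(x_b) - mean(x_a)  # Delta(X); its sign gives the direction
t = welch_t(x_a, x_b)
p = two_sided_p(t)
print(f"Delta={delta}, t={t:.3f}, p={p:.3f}")
```

With only eight users per group the difference of 1 session is not significant; real experiments have many thousands of users per group.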
OEC levels
1. User-level metrics
2. Non-user-level metrics
For example, a ratio OEC: $R(X, Y) = \dfrac{\sum_u X(u)}{\sum_u Y(u)}$
Example – average length of session
[Figure: a user's activity split into sessions, annotated "> 30m"]
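Let user $u$ have sessions with lengths $l_1(u), \dots, l_{n(u)}(u)$. Then the average length over all sessions is
$\dfrac{\sum_u \sum_{i=1}^{n(u)} l_i(u)}{\sum_u n(u)}$,
i.e., a ratio OEC with $X(u) = \sum_{i=1}^{n(u)} l_i(u)$ (the total session length of $u$) and $Y(u) = n(u)$ (the number of sessions of $u$).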
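A small numeric sketch of this example with made-up session lengths: it computes the average session length as a ratio of per-user sums and contrasts it with the average of per-user averages, which is a different, user-level metric.

```python
from statistics import mean

# Hypothetical session lengths (in minutes) of three users.
sessions = {
    "u1": [10.0, 20.0, 30.0],
    "u2": [5.0],
    "u3": [12.0, 8.0],
}

x = [sum(lengths) for lengths in sessions.values()]  # X(u): total session length
y = [len(lengths) for lengths in sessions.values()]  # Y(u): number of sessions

# Ratio OEC: average over all sessions = sum_u X(u) / sum_u Y(u) = mu(X) / mu(Y).
ratio_oec = sum(x) / sum(y)

# A different, user-level metric: the mean of per-user average lengths.
avg_of_user_avgs = mean(sum(s) / len(s) for s in sessions.values())

print(f"ratio OEC = {ratio_oec:.3f}, mean of per-user averages = {avg_of_user_avgs:.3f}")
```

The two values differ (about 14.17 vs 11.67 minutes) because the ratio OEC implicitly weights each user by the number of sessions.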
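Problem
For a ratio OEC, neither the t-test nor the sensitivity improvement techniques designed for user-level metrics can be applied directly: the ratio is not an average of independent per-user values.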
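Why a plain t-test over sessions is problematic can be illustrated with a simulation under hypothetical assumptions (each user has a typical session length and 1 to 10 sessions). Sessions of the same user are correlated, so treating them as independent observations understates the standard error of the ratio; the Delta-method standard error, computed over users, accounts for this.

```python
import math
import random
from statistics import mean, stdev, variance


def simulate_users(n, rng):
    """Hypothetical users: each has a typical session length and 1..10 sessions."""
    users = []
    for _ in range(n):
        typical = rng.uniform(5, 20)
        users.append([rng.gauss(typical, 3) for _ in range(rng.randint(1, 10))])
    return users


rng = random.Random(0)
users = simulate_users(2000, rng)
x = [sum(s) for s in users]  # X(u): total session length
y = [len(s) for s in users]  # Y(u): number of sessions
ratio = sum(x) / sum(y)      # average session length (ratio OEC)

# Naive SE: treats every session as an independent observation.
all_sessions = [length for s in users for length in s]
se_naive = stdev(all_sessions) / math.sqrt(len(all_sessions))

# Delta-method SE over users, via the residuals X(u) - ratio * Y(u).
residuals = [a - ratio * b for a, b in zip(x, y)]
se_delta = math.sqrt(variance(residuals) / (len(x) * mean(y) ** 2))

print(f"ratio={ratio:.2f}, naive SE={se_naive:.4f}, delta SE={se_delta:.4f}")
```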
Existing approaches
• Bootstrap test
Problem: computationally expensive
• Delta method
Problem: does not allow regression adjustment techniques to be applied directly
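The Delta-method test for the difference of two ratio OECs can be sketched as below. It uses the standard first-order Taylor approximation of the variance of $\mu(X)/\mu(Y)$ with independent users; the exact variant used in the authors' comparison is not shown on the slides, and the data are hypothetical.

```python
import math
from statistics import mean


def sample_cov(u, v):
    """Sample covariance with the n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def ratio_with_delta_var(x, y):
    """Return R = mu(X) / mu(Y) and its Delta-method variance estimate."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var = (sample_cov(x, x) / my**2
           - 2 * mx * sample_cov(x, y) / my**3
           + mx**2 * sample_cov(y, y) / my**4) / n
    return mx / my, var


def delta_method_z(x_a, y_a, x_b, y_b):
    """Asymptotically standard normal statistic for R_B - R_A (independent groups)."""
    r_a, v_a = ratio_with_delta_var(x_a, y_a)
    r_b, v_b = ratio_with_delta_var(x_b, y_b)
    return (r_b - r_a) / math.sqrt(v_a + v_b)


# Hypothetical per-user totals: X(u) = total session length, Y(u) = number of sessions.
x_a, y_a = [60, 5, 20, 30], [3, 1, 2, 2]
x_b, y_b = [70, 8, 25, 36], [3, 1, 2, 2]
z = delta_method_z(x_a, y_a, x_b, y_b)
```

The statistic is built from group-level ratios, so there is no per-user value to which a regression adjustment could be attached; this is the limitation noted above.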
Linearization transformation
Our paper, WSDM 2018
Consistent Transformation of Ratio Metrics for Efficient Online
Controlled Experiments
Roman Budylin, Alexey Drutsa, Ilya Katsev, Valeriya Tsoy
Data Quality metrics • OEC metrics • Guard rail metrics • Local feature/Diagnostic metrics
The best comment: «This is too good to be true!» (someone from
Facebook)
Our contribution
We found a transformation that turns a ratio OEC into a user-level OEC.
NB: It preserves directionality and significance level!
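Linearization
Let us have a ratio OEC: $R(X, Y) = \dfrac{\sum_u X(u)}{\sum_u Y(u)}$.
Consider the per-user expression $L(u) = X(u) - \kappa\, Y(u)$, where $\kappa = \dfrac{\sum_{u \in A} X(u)}{\sum_{u \in A} Y(u)}$ is the value of the ratio OEC in the control group A.
Let us use its average $\mu(L) = \operatorname{avg}_u L(u)$ as a metric.
Now we have a linearized OEC, which is a user-level metric.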
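A minimal sketch of the linearization, assuming that $\kappa$ is the ratio OEC of the control group A: every user gets a single value $L(u)$, after which an ordinary two-sample t-test (here Welch's, on hypothetical data) applies to $\mu(L)$.

```python
import math
from statistics import mean, variance


def linearize(x, y, kappa):
    """Per-user linearized values L(u) = X(u) - kappa * Y(u)."""
    return [a - kappa * b for a, b in zip(x, y)]


def linearized_test(x_a, y_a, x_b, y_b):
    """Return Delta(L) = mu_B(L) - mu_A(L) and Welch's t-statistic for it."""
    kappa = sum(x_a) / sum(y_a)  # ratio OEC of the control group A
    l_a = linearize(x_a, y_a, kappa)
    l_b = linearize(x_b, y_b, kappa)
    delta_l = mean(l_b) - mean(l_a)
    se = math.sqrt(variance(l_a) / len(l_a) + variance(l_b) / len(l_b))
    return delta_l, delta_l / se


# Hypothetical per-user totals: X(u) = total session length, Y(u) = number of sessions.
x_a, y_a = [60, 5, 20, 30], [3, 1, 2, 2]
x_b, y_b = [70, 8, 25, 36], [3, 1, 2, 2]
delta_l, t = linearized_test(x_a, y_a, x_b, y_b)
```

Here $\mu_A(L) = 0$ by construction, and $\Delta(L) = 6$ equals $\mu_B(Y)\,(R_B - R_A) = 2 \cdot (17.375 - 14.375)$. Because $L(u)$ is a per-user value, user-level techniques such as regression adjustment can be applied to it unchanged.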
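Theorems: directionality
Let A and B be the control and the experiment. Let us denote $R_A = \dfrac{\sum_{u \in A} X(u)}{\sum_{u \in A} Y(u)}$, $R_B = \dfrac{\sum_{u \in B} X(u)}{\sum_{u \in B} Y(u)}$, $\Delta(R) = R_B - R_A$, the linearized metric $L(u) = X(u) - R_A\, Y(u)$, and $\Delta(L) = \mu_B(L) - \mu_A(L)$.
Theorem 1: Let $\mu_A(Y)$ and $\mu_B(Y)$ be positive. Then, for any $X$, the following holds: $\operatorname{sign} \Delta(L) = \operatorname{sign} \Delta(R)$.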
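The sign preservation can be checked directly, assuming $\kappa = R_A = \mu_A(X)/\mu_A(Y)$ (the control-group ratio) and $\mu_A(Y), \mu_B(Y) > 0$; this is a short derivation, not necessarily the paper's proof.

```latex
\begin{aligned}
\mu_A(L) &= \mu_A(X) - \kappa\,\mu_A(Y) = \mu_A(X) - \frac{\mu_A(X)}{\mu_A(Y)}\,\mu_A(Y) = 0,\\
\mu_B(L) &= \mu_B(X) - \kappa\,\mu_B(Y) = \mu_B(Y)\left(\frac{\mu_B(X)}{\mu_B(Y)} - R_A\right) = \mu_B(Y)\,(R_B - R_A),\\
\Delta(L) &= \mu_B(L) - \mu_A(L) = \mu_B(Y)\,\Delta(R).
\end{aligned}
```

Since $\mu_B(Y) > 0$, $\Delta(L)$ and $\Delta(R)$ always have the same sign, and $\Delta(L) = 0$ exactly when $\Delta(R) = 0$.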
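Theorems: significance level
Theorem 2: Let $\mathbb{E}\,Y(u)$ be positive and let the remaining technical conditions of the theorem hold. Let $T_L$ be the t-statistic applied to the OEC $\mu(L)$, and let $T_\delta$ be the asymptotically standard normal statistic of $R$ obtained via the Delta method. Then, as the group sizes grow, under the null hypothesis that A and B do not differ:
1. the t-statistic $T_L$ is asymptotically normal;
2. $T_L / T_\delta$ converges to 1 in probability.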
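A simulation can illustrate the second claim of Theorem 2 under assumptions chosen here for illustration: both groups are drawn from the same hypothetical distribution (so the null hypothesis holds), $\kappa$ is the control-group ratio, and Welch's t-statistic is used for $T_L$. With many users per group, $T_L / T_\delta$ should be close to 1.

```python
import math
import random
from statistics import mean, variance


def simulate_group(n, rng):
    """Hypothetical users: 1..10 sessions with exponential lengths (mean 12)."""
    y = [rng.randint(1, 10) for _ in range(n)]
    x = [sum(rng.expovariate(1 / 12) for _ in range(k)) for k in y]
    return x, y


def t_linearized(x_a, y_a, x_b, y_b):
    """T_L: Welch's t-statistic for mu(L) with L(u) = X(u) - kappa * Y(u)."""
    kappa = sum(x_a) / sum(y_a)  # control-group ratio
    l_a = [a - kappa * b for a, b in zip(x_a, y_a)]
    l_b = [a - kappa * b for a, b in zip(x_b, y_b)]
    se = math.sqrt(variance(l_a) / len(l_a) + variance(l_b) / len(l_b))
    return (mean(l_b) - mean(l_a)) / se


def z_delta(x_a, y_a, x_b, y_b):
    """T_delta: Delta-method statistic for R_B - R_A."""
    def ratio_and_var(x, y):
        r = sum(x) / sum(y)
        # Var(X - R*Y) / (n * mu(Y)^2) equals the usual three-term Delta formula.
        resid = [a - r * b for a, b in zip(x, y)]
        return r, variance(resid) / (len(x) * mean(y) ** 2)

    r_a, v_a = ratio_and_var(x_a, y_a)
    r_b, v_b = ratio_and_var(x_b, y_b)
    return (r_b - r_a) / math.sqrt(v_a + v_b)


rng = random.Random(2018)
x_a, y_a = simulate_group(20000, rng)
x_b, y_b = simulate_group(20000, rng)
t_l = t_linearized(x_a, y_a, x_b, y_b)
t_d = z_delta(x_a, y_a, x_b, y_b)
print(f"T_L={t_l:.4f}, T_delta={t_d:.4f}, ratio={t_l / t_d:.4f}")
```

Both statistics share the factor $\Delta(R)$, so their ratio stays near 1 even when $\Delta(R)$ itself is small; the remaining differences come from $\mu_A(Y) \ne \mu_B(Y)$ and from using $R_A$ instead of $R_B$ in group B, both of which vanish as the groups grow.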
Comparison of approaches
Experiments
• Internet search, 2013-2016, 390 experiments
• The transformation + regression adjustment
• +34% sensitivity
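The slides do not say which regression adjustment was combined with the transformation. As one well-known example, the sketch below applies a CUPED-style adjustment (a single pre-experiment covariate, with $\theta$ estimated on the pooled groups) to linearized values. The simulated users and the use of the pre-period linearized metric as the covariate are assumptions made for illustration.

```python
import random
from statistics import mean


def sample_cov(u, v):
    """Sample covariance with the n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def linearize(x, y, kappa):
    """L(u) = X(u) - kappa * Y(u)."""
    return [a - kappa * b for a, b in zip(x, y)]


def simulate_group(n, rng):
    """Hypothetical users observed in a pre-period and in the experiment period."""
    x_pre, y_pre, x, y = [], [], [], []
    for _ in range(n):
        typical = rng.uniform(5, 20)  # latent per-user typical session length
        for xs, ys in ((x_pre, y_pre), (x, y)):
            k = rng.randint(1, 10)
            ys.append(k)
            xs.append(sum(rng.gauss(typical, 3) for _ in range(k)))
    return x_pre, y_pre, x, y


def cuped(l_a, c_a, l_b, c_b):
    """Return L(u) - theta * (C(u) - mean(C)), theta = cov(L, C) / var(C) on pooled data."""
    l_all, c_all = l_a + l_b, c_a + c_b
    theta = sample_cov(l_all, c_all) / sample_cov(c_all, c_all)
    c_mean = mean(c_all)

    def adjust(ls, cs):
        return [lv - theta * (cv - c_mean) for lv, cv in zip(ls, cs)]

    return adjust(l_a, c_a), adjust(l_b, c_b)


rng = random.Random(7)
xa_pre, ya_pre, xa, ya = simulate_group(3000, rng)
xb_pre, yb_pre, xb, yb = simulate_group(3000, rng)

kappa = sum(xa) / sum(ya)              # control ratio in the experiment period
kappa_pre = sum(xa_pre) / sum(ya_pre)  # control ratio in the pre-period
l_a, l_b = linearize(xa, ya, kappa), linearize(xb, yb, kappa)
c_a, c_b = linearize(xa_pre, ya_pre, kappa_pre), linearize(xb_pre, yb_pre, kappa_pre)
adj_a, adj_b = cuped(l_a, c_a, l_b, c_b)

raw_var = sample_cov(l_a + l_b, l_a + l_b)
adj_var = sample_cov(adj_a + adj_b, adj_a + adj_b)
print(f"variance of L: raw={raw_var:.1f}, CUPED-adjusted={adj_var:.1f}")
```

Lower per-user variance at an unchanged pooled mean is what lets the same experiment detect smaller effects; the size of the gain depends entirely on how well the covariate predicts $L(u)$.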
The code

More Related Content
Similar to Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments
• Which Vertical Search Engines are Relevant? Understanding Vertical Relevance ... (Mounia Lalmas-Roelleke)
• Trend descriptor and Similarity Finder (Pauloesteban)
• Incremental Queries and Transformations for Engineering Critical Systems (Ákos Horváth)
• Measuring the New Wikipedia Community (PyData SV 2013) (PyData)
• Towards a Macrobenchmark Framework for Performance Analysis of Java Applications (Gábor Szárnyas)
• Rokach-GomaxSlides.pptx (Jadna Almeida)
• Rokach-GomaxSlides (1).pptx (Jadna Almeida)
• Big Data Science - hype? (BalaBit)
• Aggregation Workflow at Europeana Aggregator Forum (Europeana)
• Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen... (Matthias Braunhofer)
• Aline Pichon: NextBuy (Aline Pichon)
• A Pragmatic Approach to Semantic Repositories Benchmarking (Dhaval Thakker)
• Combinatorial optimization and deep reinforcement learning (민재 정)
• Crowdsourcing Predictors of Behavioral Outcomes (Alekya Yermal)
• Three qualities of OTT service: A mixed methods approach (Jeffery Chang)
• Web Rec Final Report (weichen)
• Survey of Recommendation Systems (youalab)
 
