This document discusses A/B testing at large internet companies. It describes how companies like Amazon, Microsoft, Google, and LinkedIn use A/B testing to evaluate new ideas, measure their impact, and gain customer feedback. It outlines best practices for A/B testing: measuring one change at a time, choosing success metrics that reflect long-term impact, checking statistical significance, properly powering experiments, and guarding against multiple testing. The document also describes the key components of a scalable A/B testing system: experiment management, online infrastructure for traffic routing and data logging, and automated offline analysis.
5. Amazon Shopping Cart Recommendation
• At Amazon, Greg Linden had the idea of showing recommendations based on the items in the shopping cart
• Trade-offs
• Pro: cross-sell more items (increase average basket size)
• Con: distract people from checking out (reduce conversion)
• HiPPO (Highest Paid Person’s Opinion): stop the project
From Greg Linden’s Blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html
http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
6. MSN Real Estate
§ “Find a house” widget variations
§ Revenue to MSN is generated every time a user clicks the search/find button
[Figure: “Find a house” widget variants A and B]
http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
7. Take-away
Experiments are the only way to prove causality.
Use A/B testing to:
§ Guide product development
§ Measure impact (assess ROI)
§ Gain “real” customer feedback
11. What to A/B Test
§ Evaluating new ideas:
– Visual changes
– Complete redesign of web page
– Relevance algorithms
– …
§ Platform changes
§ Code refactoring
§ Bug fixes
Test Everything!
12. Startups vs. Big Websites
§ Do startups have enough users to A/B test?
– Startups typically look for larger effects
– 5% vs. 0.5% difference → 100 times more users! (required sample size grows with 1/δ², so a 10× smaller effect needs 10² = 100× more users)
§ Startups should establish an A/B testing culture early
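The “100 times more users” figure follows from the usual sample-size formula for comparing two means, n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². A minimal Python sketch, assuming a two-sided z-test at α = 0.05 with 80% power, and treating 5% and 0.5% as differences measured in units of the metric's standard deviation (σ = 1 is an illustrative choice; the ratio is the same for any fixed σ):

```python
from statistics import NormalDist

def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Users needed in each arm of a two-sided, two-sample z-test on a mean.

    delta: absolute difference to detect; sigma: standard deviation of the
    metric, assumed equal in both arms.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 2 * ((z_alpha + z_power) * sigma / delta) ** 2

# Hypothetical metric with standard deviation 1.0 (delta is in units of sigma).
big_effect = sample_size_per_arm(delta=0.05, sigma=1.0)
small_effect = sample_size_per_arm(delta=0.005, sigma=1.0)
print(f"{big_effect:,.0f} vs {small_effect:,.0f} users per arm "
      f"({small_effect / big_effect:.0f}x)")
```

Because δ appears squared in the denominator, any 10× reduction in the detectable effect multiplies the required traffic by 100, regardless of α, power, or σ.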
16. 1. Experiment Management
§ Define experiments
– Whom to target?
– How to split traffic?
§ Start/stop an experiment
§ Important addition:
– Define success criteria
– Power analysis
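One way to enforce the “important addition” is to make success criteria and the power-analysis effect size required fields of the experiment definition, so an experiment cannot be started without them. A minimal sketch; the class and all field names are hypothetical, not any platform's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical experiment definition; field names are illustrative."""
    name: str
    target: str                   # whom to target (e.g. a segment name)
    split: dict                   # how to split traffic: variant -> % of targeted users
    success_metric: str           # success criterion, fixed before the start
    min_detectable_effect: float  # effect size the power analysis is sized for

    def __post_init__(self):
        if "control" not in self.split:
            raise ValueError("split must include a control variant")
        if any(pct <= 0 for pct in self.split.values()):
            raise ValueError("every variant needs a positive share of traffic")
        if sum(self.split.values()) > 100:
            raise ValueError("split cannot exceed 100% of targeted traffic")
        if self.min_detectable_effect <= 0:
            raise ValueError("run a power analysis and record a positive effect size")

spec = ExperimentSpec(
    name="cart-recommendations",
    target="en-US visitors with a non-empty cart",
    split={"control": 50, "treatment": 50},
    success_metric="revenue_per_visitor",
    min_detectable_effect=0.01,
)
```

Validating at definition time means a missing control group or an unpowered design is rejected before any traffic is spent.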
17. 2. Online Infrastructure
1) Hash & partition: random & consistent
2) Deploy: server-side, as a change to
– The default configuration (Bing)
– The default code path (LinkedIn)
3) Data logging
[Figure: Hash(ID) maps each user to a point between 0% and 100%; the range is partitioned into Treatment1 (20%), Treatment2 (20%), and Control]
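The “random & consistent” requirement for step 1 can be met by hashing the member ID with an experiment-specific salt and mapping the hash onto percentage ranges. A minimal sketch; SHA-256 is one reasonable hash choice, and the salt string and the 60% control share (the remainder after the two 20% treatments) are assumptions for illustration:

```python
import hashlib

# Hypothetical allocation mirroring the figure: two 20% treatments, and
# (assumed) control receives the remaining 60% of traffic.
ALLOCATION = [("Treatment1", 20), ("Treatment2", 20), ("Control", 60)]

def bucket(member_id: str, salt: str, n_buckets: int = 100) -> int:
    """Hash an ID to a bucket in [0, n_buckets): pseudo-random across IDs,
    but always the same bucket for the same ID and salt (consistent)."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def assign(member_id: str, salt: str = "exp-cart-recs") -> str:
    """Map the member's bucket onto contiguous percentage ranges."""
    b = bucket(member_id, salt)
    upper = 0
    for variant, pct in ALLOCATION:
        upper += pct
        if b < upper:
            return variant
    raise ValueError("ALLOCATION must sum to 100")

print(assign("member-123"), assign("member-123"))  # same variant both times
```

No assignment table is stored: any server can recompute the variant from the ID alone, which is what makes server-side deployment and later log joins straightforward.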
18. Hash & Partition @ Scale (I)
§ Pure bucket system (Google/Bing before 200X)
[Figure: traffic from 0% to 100% split into disjoint buckets: Exp. 1 (20%), Exp. 2 (20%), Exp. 3 (60%, with variants red 15%, green 15%, yellow 30%)]
• Does not scale
• Traffic management: each experiment needs its own exclusive slice of traffic
19. Hash & Partition @ Scale (II)
§ Fully overlapping system
[Figure: Exp. 1 (A1, B1, control) and Exp. 2 (A2, B2, control) each span the full 0% to 100% of traffic, so every user falls into one variant of each]
• Each experiment gets 100% of traffic
• A user is in “all” experiments simultaneously
• Randomization across experiments is independent (each experiment uses a unique hash ID)
• Cannot avoid interactions between experiments
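Independence comes from including the experiment ID in the hash input, so knowing a user's variant in one experiment says nothing about their variant in another. A minimal sketch with two hypothetical 50/50 experiments, checking that the treatment groups cross roughly evenly:

```python
import hashlib

def in_treatment(member_id: str, experiment_id: str, treatment_pct: int = 50) -> bool:
    """Fully overlapping design: every experiment sees all traffic, and the
    experiment ID is part of the hash input, so each experiment randomizes
    independently of the others."""
    digest = hashlib.sha256(f"{experiment_id}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < treatment_pct

members = [f"member-{i}" for i in range(20_000)]
in_exp2 = [m for m in members if in_treatment(m, "exp2")]
in_both = [m for m in in_exp2 if in_treatment(m, "exp1")]
# With independent hashing this share should be close to 50%.
print(f"share of exp2-treatment members also in exp1 treatment: {len(in_both) / len(in_exp2):.3f}")
```

Independent randomization keeps each experiment's comparison unbiased on average, but it does not remove interactions: if the two changes conflict on the same page, users in both treatments still see the combination.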
20. Hash & Partition @ Scale (III)
§ Hybrid: Layer + Domain
• Centralized management (Bing)
  – A central experimentation team creates and manages layers/domains
• Decentralized management (LinkedIn)
  – Each experiment is one “layer” by default
  – The experimenter controls the hash ID to create a “domain”
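The hybrid scheme can be sketched as two levels of hashing: a top-level hash carves out domains, and inside a domain each layer hashes with its own seed, holding experiments whose bucket ranges must not overlap. This is a simplified illustration in the spirit of the layer/domain model, not Bing's or LinkedIn's implementation; every name, seed, and percentage below is hypothetical:

```python
import hashlib

def bucket(member_id: str, seed: str) -> int:
    """Deterministic bucket in [0, 100) for a member under a given hash seed."""
    digest = hashlib.sha256(f"{seed}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

# Hypothetical configuration. A "domain" is a traffic slice picked by a top-level
# hash; outside it, each layer has its own seed and non-overlapping ranges.
EXCLUSIVE_DOMAIN_PCT = 10                 # slice reserved for one isolated experiment
EXCLUSIVE_EXPERIMENT = "checkout-redesign"
LAYERS = {
    "ui": [("new-header", 0, 30), ("blue-button", 30, 60)],
    "ranking": [("ranker-v2", 0, 50)],
}

def assignments(member_id: str) -> dict:
    if bucket(member_id, "domain-split") < EXCLUSIVE_DOMAIN_PCT:
        # Members in the exclusive domain are eligible for that experiment only.
        return {"exclusive": EXCLUSIVE_EXPERIMENT}
    result = {}
    for layer, ranges in LAYERS.items():
        b = bucket(member_id, seed=layer)  # independent hash per layer
        # Within a layer, ranges are disjoint, so at most one experiment matches:
        # experiments in the same layer never overlap; different layers do.
        result[layer] = next((exp for exp, start, end in ranges if start <= b < end), None)
    return result

print(assignments("member-7"))
```

Putting two experiments that touch the same UI element into one layer rules out their interaction by construction, while unrelated experiments in separate layers still share all of the traffic.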
21. Data Logging
§ Trigger
§ Trigger-based logging
– Log whether a request is actually affected by the experiment
– Log for both the factual and the counterfactual (control requests also record whether the treatment would have changed them)
[Figure: all LinkedIn members (300MM+) vs. the triggered subset: members visiting the contacts page]
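A minimal sketch of trigger-based logging with a counterfactual: each request is rendered for its actual variant and evaluated for the other one, and it counts as triggered only if the two outputs differ. The experiment (a change to the contacts page), the render function, and the log format are all hypothetical:

```python
import hashlib

def variant_for(member_id: str) -> str:
    """Hypothetical 50/50 split for an experiment on the contacts page."""
    digest = hashlib.sha256(f"contacts-exp:{member_id}".encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"

def render(page: str, variant: str) -> str:
    """Hypothetical feature: only the contacts page differs in treatment."""
    if page == "contacts" and variant == "treatment":
        return "contacts-v2"
    return page

def handle_request(member_id: str, page: str, log: list) -> str:
    variant = variant_for(member_id)
    served = render(page, variant)                     # factual
    other = "control" if variant == "treatment" else "treatment"
    would_serve = render(page, other)                  # counterfactual
    # Triggered = the experiment actually changes this request. Computing it the
    # same way in both arms keeps the triggered populations comparable.
    log.append({"member": member_id, "variant": variant, "page": page,
                "triggered": served != would_serve})
    return served

log = []
for member, page in [("alice", "contacts"), ("bob", "home"), ("carol", "contacts")]:
    handle_request(member, page, log)
print([(row["member"], row["variant"], row["triggered"]) for row in log])
```

Restricting the analysis to triggered members in both arms removes the diluting effect of the many users who never visit the affected page, while keeping treatment and control comparable.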
22. 3. Automated Offline Analysis
§ Large-scale data processing, e.g. daily @LinkedIn
– 200+ experiments
– 700+ metrics
– Billions of experiment trigger events
§ Statistical analysis
– Metrics design
– Statistical significance test (p-value, confidence interval)
– Deep-dive: slicing & dicing capability
§ Monitoring & alerting
– Data quality
– Early termination
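The per-metric significance test can be sketched as a large-sample (normal-approximation) comparison of two means that reports the delta, a two-sided p-value, and a confidence interval. This is one standard choice, not necessarily the test any particular platform uses; the simulated data is hypothetical:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

def compare_means(control, treatment, alpha=0.05):
    """Normal-approximation (large-sample) test for a difference in means.

    Returns (delta, two-sided p-value, (ci_low, ci_high)) for treatment minus control.
    """
    delta = fmean(treatment) - fmean(control)
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    z = delta / se
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    half_width = std_normal.inv_cdf(1 - alpha / 2) * se
    return delta, p_value, (delta - half_width, delta + half_width)

# Simulated data (hypothetical): treatment shifts the mean by 0.05.
rng = random.Random(0)
control = [rng.gauss(1.00, 1.0) for _ in range(20_000)]
treatment = [rng.gauss(1.05, 1.0) for _ in range(20_000)]
delta, p, (lo, hi) = compare_means(control, treatment)
print(f"delta={delta:.3f}  p={p:.2g}  95% CI=({lo:.3f}, {hi:.3f})")
```

Reporting the confidence interval alongside the p-value shows the plausible size of the effect, not just whether it is distinguishable from zero.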
25. What to Experiment?
Measure one change at a time.
[Figure: En-US traffic split 50/50 between pre-unified search and unified search, where unified search bundles experiments 1+2+…N into a single change]
26. What to Measure?
§ Success metrics: summarize whether the treatment is better
§ Puzzling example:
– Key metrics for Bing: number of searches & revenue
– A ranking bug in an experiment resulted in poor search results
– Yet the number of searches went up 10% and revenue went up 30% (users had to search more to find what they needed)
Success metrics should reflect long-term impact
27. Scientific Experiment Design
§ How long to run the experiment?
§ How much traffic to allocate to treatment?
Story:
§ Site speed matters
– Bing: +100msec = -0.6% revenue
– Amazon: +100msec = -1.0% revenue
– Google: +100msec = -0.2% queries
§ But not for Etsy.com?
“Faster results better? … meh”
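The two design questions on this slide, how long to run and how much traffic to give the treatment, are linked through the same power calculation: with unequal arms the variance term becomes σ²(1/n_treatment + 1/n_control), so a small treatment allocation stretches the run. A minimal sketch, assuming a two-sided z-test at α = 0.05 with 80% power and (hypothetically) 1,000 new eligible users per day:

```python
from math import ceil
from statistics import NormalDist

def days_needed(delta, sigma, daily_eligible_users, treatment_share, control_share,
                alpha=0.05, power=0.8):
    """Days until both arms are large enough to detect `delta` (normal approximation).

    treatment_share / control_share: fractions of eligible daily traffic sent to
    each arm. Assumes every day brings `daily_eligible_users` new users.
    """
    if treatment_share <= 0 or control_share <= 0 or treatment_share + control_share > 1:
        raise ValueError("shares must be positive and sum to at most 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Need sigma^2 * (1/(t*M) + 1/(c*M)) <= (delta / z_sum)^2 for M eligible users.
    users = (z_sum * sigma / delta) ** 2 * (1 / treatment_share + 1 / control_share)
    return ceil(users / daily_eligible_users)

for t, c in [(0.5, 0.5), (0.1, 0.5), (0.1, 0.9)]:
    print(f"treatment {t:.0%}, control {c:.0%}: {days_needed(0.05, 1.0, 1_000, t, c)} days")
```

In practice the result would be rounded up to full weeks, in line with the advice later in the deck to rely on full-week results.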
28. Power
§ Power: the chance of detecting a difference when there really is one.
§ Two reasons your feature doesn’t move metrics:
1. No “real” impact
2. Not enough power
Properly power up your experiment!
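Power can be computed directly for a planned sample size: under a two-sided z-test on a mean, it is the probability that the test statistic lands beyond either critical value when the true difference is δ. A minimal sketch, assuming equal arms and a known common standard deviation σ (illustrative values below):

```python
from math import sqrt
from statistics import NormalDist

def power_of_test(n_per_arm, delta, sigma, alpha=0.05):
    """Power of a two-sided, two-sample z-test on a mean with equal arms."""
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_arm)     # standard error of the difference in means
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se              # expected z-statistic under the true effect
    # Probability the z-statistic falls beyond either critical value.
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)

for n in (1_000, 6_279, 20_000):
    print(f"n per arm = {n:>6}: power = {power_of_test(n, delta=0.05, sigma=1.0):.2f}")
```

With a 0.05σ effect, 1,000 users per arm detect it only about one time in five, so a flat result at that size says little about whether the feature had “real” impact.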
31. Statistical Significance
§ Must consider statistical significance
– A 12.9% delta can still be noise!
– Distinguish signal from noise; focus on the “real” movers
– Ensure results are reproducible
Metric (delta)   Experiment 1   Experiment 2
Pageviews        1.5%           12.9%
Revenue          0.8%           2.4%
[Slide annotation: “Stat. significant”]
33. Multiple Testing Concerns
§ Multiple ramps
– Pre-decide a ramp to base decision on (e.g. 50/50)
§ Multiple “peeks”
– Rely on “full”-week results
§ Multiple variants
– Choose the best, then rerun to see if it replicates
§ Multiple metrics
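Why “multiple peeks” matter can be shown with an A/A simulation, where there is no real difference: stopping at the first significant look declares far more false winners than testing once at the end. A minimal sketch, assuming a N(0, 1) metric, 10 equally spaced looks, and a two-sided z-test at α = 0.05 (all parameters illustrative):

```python
import random
from statistics import NormalDist

def aa_false_positive_rates(n_sims=2000, peeks=10, batch=100, alpha=0.05, seed=7):
    """Simulate A/A tests (no real difference) on a N(0, 1) metric.

    Returns (rate when testing once at the end,
             rate when stopping at the first significant peek).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    end_hits = peek_hits = 0
    for _ in range(n_sims):
        diff_sum = 0.0        # running sum of (arm A - arm B) over paired draws
        any_significant = False
        for k in range(1, peeks + 1):
            # A batch of `batch` draws per arm adds N(0, 2 * batch) to diff_sum.
            diff_sum += rng.gauss(0.0, (2 * batch) ** 0.5)
            z = diff_sum / (2 * batch * k) ** 0.5
            any_significant = any_significant or abs(z) > z_crit
        end_hits += abs(z) > z_crit       # z from the final look only
        peek_hits += any_significant
    return end_hits / n_sims, peek_hits / n_sims

final_only, with_peeking = aa_false_positive_rates()
print(f"false positives, one look: {final_only:.1%}; ten peeks: {with_peeking:.1%}")
```

The single final look stays near the nominal 5%, while stopping at the first significant peek inflates the false-positive rate several-fold; deciding on pre-planned, full-week results avoids this.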
34. An irrelevant metric is statistically significant. What to do?
§ Which metric?
§ How “significant”? (p-value)
– 1st-order metrics (directly impacted by the experiment): p-value < 0.05
– 2nd-order metrics (maybe impacted by the experiment): p-value < 0.01
– All other metrics: p-value < 0.001
Watch out for multiple testing: with 100 metrics, how many would you see statistically significant even if your experiment does NOTHING? 5 (at p < 0.05)
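The “5 out of 100” answer is simply the expected number of false positives, 100 × 0.05. Below is a short sketch of that arithmetic plus the tiered thresholds from this slide; the tier-to-threshold mapping is our reading of the slide, and the “at least one” probability assumes independent metrics, which real metrics usually are not:

```python
# Significance thresholds by metric tier, as read from the slide (assumed mapping).
TIER_THRESHOLDS = {"first_order": 0.05, "second_order": 0.01, "other": 0.001}

def expected_false_positives(n_metrics: int, alpha: float) -> float:
    """Expected number of 'significant' metrics when the experiment does nothing."""
    return n_metrics * alpha

def prob_any_false_positive(n_metrics: int, alpha: float) -> float:
    """Chance that at least one metric looks significant, assuming independent metrics."""
    return 1 - (1 - alpha) ** n_metrics

def is_significant(p_value: float, tier: str) -> bool:
    """Stricter bars for metrics less likely to be moved by the change."""
    return p_value < TIER_THRESHOLDS[tier]

print(expected_false_positives(100, 0.05))           # 5.0
print(round(prob_any_false_positive(100, 0.05), 3))  # 0.994
print(is_significant(0.02, "first_order"), is_significant(0.02, "other"))  # True False
```

A p-value of 0.02 on a metric the change could not plausibly affect therefore falls below the bar for that tier and is best treated as likely noise, unless it replicates.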
35. References
§ Tang, Diane, et al. “Overlapping Experiment Infrastructure: More, Better, Faster Experimentation.” Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
§ Kohavi, Ron, et al. “Online Controlled Experiments at Large Scale.” Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013.
§ LinkedIn Engineering blog post: http://engineering.linkedin.com/ab-testing/xlnt-platform-driving-ab-testing-linkedin
Additional resources: RecSys’14 A/B testing workshop