War Stories and a Few Lessons Learned
Tommy Guy
Principal Data Scientist
Microsoft
Content from the entire Analysis and Experimentation Team at Microsoft.
ExP Background
250,000+ scorecards/year
2,000+ experiments/month
4,000+ users/month
Too much of a good thing
CONTROL (12 slides) TREATMENT (16 slides)
WHAT HAPPENED
MSN noticed that competitors were displaying more content on their infopanes.
They decided to run an experiment to test whether or not a similar feature
would be useful for their customers.
Despite the initial hypothesis, the test results appeared to indicate that the
treatment (16 panes) was significantly worse than the control (12 panes).
A sample-ratio-mismatch (SRM) alert notified the experimenters there were
fewer users in treatment than control (49.8% instead of 50%).
THE REAL CULPRIT
It turns out that the 16-pane variant was doing
extremely well. A little TOO well.
Treatment engagement increased so much that the
heaviest users were being classified as bots.
These power users were removed from the experiment
altogether, causing an uneven distribution of visitors
and the subsequent SRM alert.
CONCLUSION
Once the issue was fixed, the SRM alert no longer
fired and the 16-pane variant performed extremely
well. This resulted in a significant improvement in
engagement and $1.2M in annual revenue.
But this wasn't the only insight. MSN's machine
learning algorithms had to be retrained to account
for the change in user behavior, which helped
prevent lost revenue during the experiment.
SRM notifications helped uncover major issues that,
when resolved, both dramatically changed the
outcome of an experiment and shed light on
deeper classification flaws.
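A minimal sketch of how an SRM check of this kind can work, assuming a chi-squared goodness-of-fit test against the configured 50/50 split. The function name and the user counts are hypothetical (the slides give only the 49.8% share, not the sample size), and this is not necessarily the test ExP itself runs.

```python
import math


def srm_p_value(control_users: int, treatment_users: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = control_users + treatment_users
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((treatment_users - expected_treatment) ** 2 / expected_treatment
            + (control_users - expected_control) ** 2 / expected_control)
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: the same 49.8% treatment share at two sample sizes.
small = srm_p_value(control_users=5_020, treatment_users=4_980)
large = srm_p_value(control_users=502_000, treatment_users=498_000)
print(f"10,000 users:    p = {small:.3f}")
print(f"1,000,000 users: p = {large:.2e}")
```

The same 0.2-point imbalance is unremarkable in a small experiment but very unlikely to be chance at large scale, which is why a split as close as 49.8% can still fire an alert.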
Uneven Telemetry Loss
TREATMENT
A subscription prompt was added for the treatment.
CONTROL
Windows Store app landing page.
THE RESULT
An experiment quality alert indicated there were more users in the
treatment than in the control.
THIS IS AN ODD (RARE) EVENT. WHY?
• Our Store telemetry has two event types: page views and clicks.
• All events should be logged, but first-impression event loss had occurred before.
• Typically, event loss occurs equally in treatment and control (or occasionally more in treatment).
• Increased event loss in the control is unusual.
What was the Reason?
An analysis of the scorecard revealed that the SRM was caused
by a change in user behavior interacting with logging anomalies.
Meaning…
• Treatment and control had the same rate of lost first impressions.
• The treatment had extra clicks from users closing the lightbox.
• Hence the treatment ended up with more total users.
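The mechanism above can be sketched with a little arithmetic. This is an illustration only: it assumes a user is counted if at least one of their events is logged, that event loss is independent of user behavior, and all rates and counts below are hypothetical rather than taken from the Store experiment.

```python
def observed_users(assigned: int, first_impression_loss: float,
                   p_other_event: float) -> float:
    """Expected number of users with at least one logged event.

    assigned: users actually assigned to the variant
    first_impression_loss: probability the first page-view event is lost
    p_other_event: probability the user produces any other logged event
    """
    # A user disappears from the data only if the first impression is lost
    # AND no other event (for example a click) is logged for them.
    p_missing = first_impression_loss * (1.0 - p_other_event)
    return assigned * (1.0 - p_missing)


# Same assignment and the same loss rate in both variants (hypothetical values).
control = observed_users(100_000, first_impression_loss=0.02, p_other_event=0.30)
# Closing the lightbox adds a click for most treatment users.
treatment = observed_users(100_000, first_impression_loss=0.02, p_other_event=0.80)

share = treatment / (control + treatment)
print(f"control={control:.0f} treatment={treatment:.0f} treatment share={share:.4f}")
```

Even with identical loss rates, the extra click lets more treatment users be recovered, so the observed split tilts toward treatment.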
The Value of Speed
STRATEGIC QUESTIONS
Should Bing invest in reducing time to
display search results?
Bing needed a testable, product-wide
guardrail that teams can work within
independently.
EXPERIMENT DEFINITION
Add artificial delays to Page Load Time.
Study the effects of small and large
differences on Bing metrics like revenue
and sessions.
aka.ms/exp/paper/speed
What’s your estimate? An improvement of _______ seconds can pay for a senior
engineer for one year.
What was the result?
IMPACT AND DECISIONS MADE
A ‘performance’ budget was created per
team to enable distributed ship decisions.
Using the ExP metrics authoring service,
Bing added Ads Revenue to the set of
required Bing guardrail metrics, ensuring
it is computed for all future Bing
experiments.
A 4 millisecond gain can fund one engineer for a year.
Making a Successful Experiment Better
TREATMENT CONTROL
The Outlook team had a hypothesis that adding an 'unread'
badge would increase frequent engagement with the
application. An experiment was run to test this idea on
Android phones.
THE RESULT?
The test was a strong winner at the
aggregate level. The experiment
was a good ship candidate.
THE DISCOVERY
However, Segments of Interest
alerted the Outlook team to
additional insights:
On certain devices the lift was flat,
while others saw substantial
positive results.
Why? Because some manufacturers
used a custom skin that hid the badge!
THE OUTCOME
Armed with this knowledge,
Outlook invested additional
resources to create a version of the
badge for each incompatible
device.
The initial test success was
replicated across all devices.
Using Segments of Interest helped
Outlook maximize the reach of a
successful idea.
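A rough sketch of the idea behind a segment breakdown, assuming per-segment engagement rates are already computed. The class, the segment names, the rates, and the fixed 1% threshold are all hypothetical; a production feature such as Segments of Interest would presumably rely on statistical tests per segment rather than a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class SegmentResult:
    segment: str
    control_rate: float    # engagement rate in control
    treatment_rate: float  # engagement rate in treatment

    @property
    def relative_lift(self) -> float:
        return (self.treatment_rate - self.control_rate) / self.control_rate


def flat_segments(results, min_lift=0.01):
    """Return segments whose relative lift is smaller than min_lift in magnitude."""
    return [r.segment for r in results if abs(r.relative_lift) < min_lift]


# Hypothetical per-manufacturer results: a clear aggregate win, one flat segment.
results = [
    SegmentResult("manufacturer_a", 0.400, 0.440),
    SegmentResult("manufacturer_b", 0.410, 0.411),
    SegmentResult("manufacturer_c", 0.390, 0.425),
]

flat = flat_segments(results)
print(flat)
```

A segment that stays flat while the aggregate wins is a prompt to check whether the feature actually reached those users, as with the hidden badge.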
We’re hiring!
https://aka.ms/aejobs
http://bitly.com/HIPPOExplained


