@arburbank
Building a culture
of experimentation
scaling data science at Pinterest
Andrea Burbank
σ, μ
Organizational maturity model
use source control
write unit tests
track bugs
write a spec
build often
Experimentation maturity model
get started
get big
get better
get out
get tools
Stage 1:
get started
problem:
people making bad decisions
Run experiments
entire
population
control
enabled
Cultural maturity model
get started
entire
population
control
enabled
data
data
insight
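The split of the entire population into control and enabled groups can be sketched roughly as follows. This is only an illustration, not Pinterest's implementation: the helper name `assign_group`, the hash-based bucketing, and the default 1% group sizes are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def assign_group(user_id: str, experiment: str,
                 control_pct: float = 1.0,
                 enabled_pct: float = 1.0) -> Optional[str]:
    """Deterministically place a user in 'control', 'enabled', or neither.

    Hypothetical helper: hashing the experiment name together with the user
    id gives every experiment its own repeatable split of the population.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the hash to a bucket in [0, 100) with 0.01% resolution.
    bucket = (int(digest, 16) % 10_000) / 100.0
    if bucket < control_pct:
        return "control"
    if bucket < control_pct + enabled_pct:
        return "enabled"
    return None  # the rest of the population is not in the experiment
```

Because the assignment depends only on the user id and the experiment name, the same user always lands in the same group, and the data collected from the two groups can be compared directly.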
Stage 2:
get big
problem:
underutilization
http://altmba.com/wp-content/uploads/2015/06/fieldofdreamscorn.jpg
if you build it, they won’t come
marketing
evangelism
salesmanship
Cultural maturity model
evangelize
educate
explain
get big
Stage 3:
get better
problem:
guidance needed
you are the human in the loop
run test
YOU
ensure success
It is not your career
goal to be the
experiments person
(you should have
higher ambitions)
Cultural maturity model
how can I
help?
get better
Stage 4:
get out
problem:
scale yourself
Write down the process
What mistakes do you see in experiments?
What questions do you answer repeatedly?
How will learning this help others?
“if you let engineers
run experiments, they
will screw them up in
every way possible.”
“if you let untrained
engineers run
experiments, they will
screw them up in every
way possible.”
For every important
mistake, explain why
it’s wrong and how to
avoid it.
launch
in-flight
landing
Make a list, check it twice
launch
in-flight
landing
e+r+
@experiments-help
names matter:
“help,” not “on-call”
engineer partners:
move fast, own the process
the right people:
thoughtful, well-respected
Implement a process
1. Checklists for experiments
2. @experiments-help mention in code review
3. e+ as part of code review
4. Mailing list: experiments-help@
5. Experiment document template
6. Rotation of experiment helpers
Train your successors
So you want to be an experiment helper?
• Step 1: read the documentation
• Step 2: take the experiment quiz
• Step 3: review all experiments for a week
50
trained
experiment
helpers
Cultural maturity model
how would
you answer
that?
get out
Stage 5:
get tools
problem:
simple mistakes
get tools
launch
simplify experiment API
remove untriggered experiments
create helper functions
get tools
in-flight
add a control group automatically
when a new variant is introduced
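One way to picture this automation is a config object that always pairs a new variant with a fresh control group of the same size. The sketch below is an assumption-laden illustration: the `Experiment` class, the `add_variant` method, and the `control_<variant>` naming are invented for the example and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Experiment:
    """Hypothetical experiment config: group name -> percent of users."""
    name: str
    groups: Dict[str, float] = field(default_factory=dict)

    def add_variant(self, variant: str, pct: float) -> None:
        """Add a variant and, automatically, a matching control group."""
        control = f"control_{variant}"
        if variant in self.groups or control in self.groups:
            raise ValueError(f"group {variant!r} already exists")
        # Both new groups must fit in the still-unallocated population.
        if sum(self.groups.values()) + 2 * pct > 100:
            raise ValueError("not enough unallocated users")
        self.groups[variant] = pct
        self.groups[control] = pct


exp = Experiment("new_button")
exp.add_variant("blue", 1.0)   # also creates "control_blue" at 1%
```

The likely motivation is that a variant added later enrolls users during a different period than the original control, so a control created at the same moment gives a fairer comparison.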
expand experiment groups
at the same rate
get tools
landing
detect errors
Automation: analysis
chi-squared test on group sizes
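A chi-squared check on group sizes might look like the sketch below. It assumes exactly two groups (control and enabled) with a known intended split; the function name `group_size_p_value` and the example counts are made up for illustration. With two groups the test has one degree of freedom, so the p-value can be computed from `math.erfc` without extra libraries.

```python
import math


def group_size_p_value(n_control: int, n_enabled: int,
                       expected_control_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for two group sizes (1 d.o.f.)."""
    total = n_control + n_enabled
    exp_control = total * expected_control_share
    exp_enabled = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_enabled - exp_enabled) ** 2 / exp_enabled)
    # Survival function of chi-squared with 1 d.o.f.: erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: groups meant to be equal but 5000 vs. 5300 in practice.
p = group_size_p_value(5000, 5300)
```

A very small p-value suggests the groups are not the sizes the assignment was supposed to produce, which usually points to a bug in triggering or logging rather than a real effect.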
Automation: analysis
test that groups grew at the same rate
Automation: analysis
verify similar distributions of users
Automation: analysis
hide results that are likely to be wrong
simplify analysis
Automation: analysis
automatically track important metrics
(and compute statistical significance)
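For a conversion-style metric, computing statistical significance could be sketched with a pooled two-proportion z-test. The talk does not say which test is used, so this choice is an assumption, as are the function name `conversion_lift` and its inputs; the sketch also assumes both groups have a conversion rate strictly between 0 and 1.

```python
import math
from typing import Tuple


def conversion_lift(conv_control: int, n_control: int,
                    conv_enabled: int, n_enabled: int) -> Tuple[float, float]:
    """Return (relative lift of enabled over control, two-sided p-value)."""
    p_c = conv_control / n_control
    p_e = conv_enabled / n_enabled
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_enabled) / (n_control + n_enabled)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_enabled))
    z = (p_e - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_e / p_c - 1, p_value
```

Running a calculation like this automatically for every tracked metric means nobody has to remember to do it by hand for each experiment.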
Automation: analysis
segment important populations
Automation: analysis
measure novelty vs. long-term effects
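A very simple way to separate the two effects is to compare the average lift in a user's first days with a change against the average afterwards. The sketch below assumes per-day lifts indexed by days since first exposure and a 7-day novelty window; both the window and the function name `novelty_vs_long_term` are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence, Tuple


def novelty_vs_long_term(daily_lift: Sequence[float],
                         novelty_days: int = 7) -> Tuple[float, float]:
    """Return (average lift inside the novelty window, average lift after it).

    daily_lift[i] is the measured lift i days after users first saw the change.
    """
    if len(daily_lift) <= novelty_days:
        raise ValueError("need data beyond the novelty window")
    return mean(daily_lift[:novelty_days]), mean(daily_lift[novelty_days:])
```

An early average much larger than the later one hints that users reacted to the newness of the change rather than to a lasting improvement.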
Cultural maturity model
just use
humans for
the hard part:
thinking
get tools
Experimentation maturity model
get started
get big
get better
get out
get tools
Stage 6:
the future ??
data science:
changing minds, one at a time
andrea@pinterest.com

 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

A/B Testing at Pinterest: Building a Culture of Experimentation

Editor's Notes

  1. Hi there! I’m Andrea Burbank, and I’m a data scientist at Pinterest. WHAT MOVES THE NEEDLE When I was asked to speak at a conference for data scientists, I thought for a while about what that means, and which aspects of my experience would be most interesting to folks who are working on the same sorts of problems that I tackle every day. What I ultimately decided was that the work that moves the needle at Pinterest wasn’t just the analysis we do to understand our ecosystem or to predict user engagement, but the culture of experimentation we’ve built up across the entire company. It wasn’t something that happened overnight, and I hope that by sharing our experience I can help you scale data science at your companies as well.
  2. As data scientists, we often think of ourselves as a hybrid between a software engineer and a statistician, blending the best of both to build a talented data machine. When we hit on a problem like AB testing, we tend to approach it from that perspective: what tools and frameworks should I engineer and what statistical comparisons are most relevant in order to build a successful AB testing program? Those are both tremendously important tools, obviously. But what will end up making or breaking your experimentation program is neither of those: it’s the people. It’s building up a culture of AB testing, one person at a time.
  3. Perhaps you’ve heard of a notion of an organizational maturity model. In software, there are basic steps you follow to improve your software engineering quality: Use source control, write unit tests, and so on.
  4. MODEL + PEOPLE + ANTICIPATE Along those lines, I’d like to propose a model for the cultural maturity of experimentation. Every time you solve THE BIG problem facing you at the moment, you move on to the next stage of experimentation and create a new problem. And fundamentally, each of the problems you face is about the people and the culture, and the solutions you form are only as successful as the culture you foster to nourish them. For us, we didn’t recognize this pattern until we’d already stumbled partway through this evolution, and even when we recognized that the solution was in the culture and the people, it took us a while to embrace that approach. My hope is that by talking about these stages I can help you, unlike us, to recognize the stage of the maturity model you’re currently in, to frame and solve it as a human problem, and then to start to anticipate the next phase before it becomes absolutely necessary. So what are those stages?
  5. So what are those stages? I’d say they look like this. Get STARTED. Get BIG. Get BETTER. Get OUT. Get TOOLS. Let’s dive in.
  6. Stage 1: get started. This is where you actually build the experiment framework.
  7. The problem: people are making bad decisions. Maybe they’re shipping things willy-nilly without measuring them at all, or maybe they’re watching trends over time and attributing change to newly released products when in fact the change might be completely unrelated. So you decide to build an AB testing framework.
  8. FRAMEWORK + PIPELINE + UI -> WE MADE IT! In my first couple months at Pinterest, I built up our experiment framework to have all the capabilities I thought were important: - triggering users at the moment the experiment actually affected their experience, - keeping track of novelty effects, and - functioning correctly for offline experiments. I built a data pipeline to capture all the most important metrics automatically and a UI to surface those metrics. I ran a few experiments myself, validated the findings with A/A tests and on real experiments, and figured AHA! We’d made it. Now we could run experiments and actually understand the effects of the feature changes we made.
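To make "triggering users at the moment the experiment actually affected their experience" concrete, here is a minimal sketch. It is not Pinterest's framework: the hash-based assignment, the `Experiment` class, and the `activate` method are all assumptions made for illustration. The idea is that a user is only recorded as part of the experiment when the code path that shows the change actually runs for them.

```python
import hashlib
from typing import Dict, Optional


def assign_group(user_id: str, experiment: str, groups: Dict[str, float]) -> Optional[str]:
    """Deterministically map a user to a group, or None if they are not in the experiment.

    `groups` maps a group name to the fraction of all users it should receive;
    the fractions are assumed to sum to at most 1.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    upper = 0.0
    for name, fraction in groups.items():
        upper += fraction
        if point < upper:
            return name
    return None


class Experiment:
    """Hypothetical experiment object that records who was actually triggered."""

    def __init__(self, name: str, groups: Dict[str, float]):
        self.name = name
        self.groups = groups
        self.triggered: Dict[str, str] = {}  # user_id -> group

    def activate(self, user_id: str) -> Optional[str]:
        """Call this at the point where the user would see the change."""
        group = assign_group(user_id, self.name, self.groups)
        if group is not None:
            # Only users who reach this code path are counted in the analysis.
            self.triggered.setdefault(user_id, group)
        return group


exp = Experiment("new_button", {"control": 0.05, "enabled": 0.05})
print(exp.activate("user_1"), len(exp.triggered))
```

Because assignment depends only on the experiment name and user id, the same user lands in the same group on every visit, and users who never reach `activate` never dilute the metrics.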
  9. When you’ve toiled and coded and tested and built, you may think you’re done. After all, you now have a working framework. But in fact, you only just got started.
  10. ACTUALLY USE IT The next stage is to get BIG. What I mean by that is to get people to actually use the framework you’ve built.
  11. DRIVE ADOPTION The problem you’re facing now is that your framework on its own is useless; you need to drive adoption. I think it’s easy … to underestimate how important this phase is.
  12. A GREAT PRODUCT SPEAKS FOR ITSELF I think it’s easy to underestimate how important this phase is. Again, we are engineers. There’s a part of us that really wants to believe that a great product speaks for itself. It’s so tremendously useful! You’ve anticipated all the use cases and made it easy to actually understand the effect your feature is having on users! How on earth is this not the holy grail??
  13. Unfortunately, this is almost never actually true. Even high-quality tools don’t magically attract users. So stage 2 is about getting people to actually adopt your new framework, to buy into the idea of running experiments.
  14. Once you have your experiment framework in place, your #1 priority is to get people to use it. That means you do marketing.
  15. That means you do evangelism.
  16. TECH TALKS + DEMO + PM + BENEFITS + STRATEGIC PROJECT That means that you have to be a salesman (or woman). Give tech talks. Do a demo. Give impassioned speeches to anyone who will listen. Whenever you hear about a feature going out, go find the PM, chat with the engineer, try to convince them to run an experiment. Tell them what they will learn, how they will benefit, how easy it will be. Find a strategically important project, suggest running it as an experiment, and don’t take no for an answer.
  17. SHOW VALUE -> AGAIN AND AGAIN If you demonstrate the metrics effect of a strategic initiative, or you earn people call-outs at the company all-hands for lifting a metric by 5%, or you help the company avoid a huge mistake, people like it. They want you to do it again. And again. And now, you’ve done it: you got big.
  18. LOTS OF PEOPLE -> NUDGES In stage 3, your experiment framework is big and things are going swimmingly. People start running lots of experiments, and they firmly believe that running an AB test is the best way to understand the performance of their feature. But now that you’re not the person running all the experiments, you find that they need some nudges here and there to make their experiments run correctly. Instead of evangelizing, you spend your time helping people run experiments: come up with a hypothesis, determine how they’ll detect failure, consider how changes might affect individual users’ experience.
  19. DECIDE ON OWN -> NEED YOUR GUIDANCE In stage 2, no one was trying to run experiments unless you cajoled them into it, so you were always right there to help with implementation. Now that folks have bought in and are doing it on their own, guidance is needed, and you become the human to provide that guidance.
  20. FUN! And depending on your personality, your patience, and how quickly your company is growing, this stage might last a while. If you’re in this stage now, you might think it’s pretty great. Your framework is getting used, people are making good decisions, and you have the added perk that you get to be connected to feature development across the whole company, so you always know what’s going on. And honestly, that’s a lot of fun.
  21. SPOF! NAPA But after a while, you realize that you are a single point of failure. When you’re not there, people ship experiments when there aren’t enough users. They add new variants without thinking about how to measure them. They start experiments that accidentally trigger for everyone instead of only the affected users. For me, stage 3 became suboptimal pretty abruptly when I found myself trying to do code reviews on my iPhone while on my anniversary bike trip in Napa.
  22. GO INSANE OR STOP LEARNING Now, I hit this problem fairly quickly because Pinterest was growing at a breakneck pace. You might think you can last in stage 3 for a while, or even indefinitely. But as someone who enjoyed that stage tremendously, I’d advise against it. If your organization grows and you don’t scale, the culture will spin apart and you’ll go insane. If it doesn’t grow, you’ll keep needing to play the same role of experiments diva, and you won’t get a chance to learn what else you can contribute.
  23. This is important. It’s not your career goal to be the experiments person. (You should have higher ambitions.)
  24. Making experiments run is important. It’s interesting. But it’s not what you should be doing with the rest of your life.
  25. HELP PEOPLE SUCCEED. So that’s stage three. Once you have momentum behind your experiment framework, help people succeed with it. Help them think through their setup and their data. Help them figure out whether they have enough people, or whether it’s not working for a subset of the population. Help review their code, check their triggering, and figure out how to relaunch when things go wrong. Having you in the loop will make your company’s experiments successful. But also: start thinking about how you can move on to stage 4.
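The "enough people" question can be estimated before launch. The sketch below uses the standard normal-approximation sample size formula for comparing two proportions; the function name, the defaults (5% significance, 80% power), and the example rates are assumptions for illustration, not values from the talk.

```python
import math
from statistics import NormalDist


def users_per_group(baseline_rate: float, relative_lift: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each group to detect a relative lift.

    Uses n = (z_{alpha/2} + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    for a two-sided test on two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical example: 10% baseline rate, hoping to detect a 5% relative lift.
print(users_per_group(0.10, 0.05))
```

With these made-up inputs the answer is in the tens of thousands of users per group, which shows how an experiment can quietly ship with too few users when nobody is checking.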
  26. TEACH OTHERS In stage 4, you start thinking about how you can teach others to fulfill the role you’ve been taking on in helping people to run successful experiments, and how you can get out. In stage 4, you start to (flip) scale yourself.
  27. Scale yourself. Figure out what you do and write it down. Develop repeatable processes, guidelines, checklists.
  28. LIST ERRORS At this point, you’ve been helping people with experiments for a while. What mistakes do you see happening? What questions do you answer repeatedly? And how can you get others to want to understand experiments better? The first thing I did was try to write down every problem I’d seen in an experiment. I dug up that list when I was writing this talk.
  29. It was three pages long in small font.
  31. It was three pages long in small font. When I shared this list with a coworker, he said:
  33. TRAIN ENGINEERS -> PEOPLE PROBLEM + PEOPLE SOLUTION I rephrased it as follows. But then the question is: how do you train engineers to run experiments accurately? Again, this is a people problem, and the solution again comes from people.
  34. The answer: make your process clear and easily repeatable. If you haven’t read Atul Gawande’s book, you should. Even the most complex human processes: performing surgery, flying a plane, building a skyscraper, can be improved by simple checklists. There are so many pieces to keep track of that having a simple list can help you get the important things right.
  35. To make a checklist: for every important mistake, explain why it’s wrong and how to avoid it.
  36. LIFECYCLE We also thought about the experiment lifecycle. In the end, there are three major phases of every experiment. First, the experiment has to launch. Before it actually takes off with users on board, you want to make sure that it’s configured correctly so we can learn what we want. Once an experiment is in flight, we may need to make adjustments. Perhaps we had an idea for a new take on the feature we’re testing, or we just want to increase our experimental power to measure the effect on a larger population. And finally, when we’re ready to land the experiment, we need to make sure that we’ve learned what we want to learn and that we’re making the right decision from the data. So we built checklists: what should you watch for in each of these phases?
  37. THINKING Launch is the most important thing. If the experiment is trying to measure the wrong thing or is set up incorrectly, you won’t learn anything from it. Before an experiment begins, most of the work is in the thinking. What are you trying to do? Why? Can you measure what you want to change?
  38. WHY CHANGE? Sometimes an experiment owner will want to make changes to an experiment after launch. Usually they want to increase group sizes to get more statistical power, but sometimes they want to change the population they’re measuring or add new types of treatment. Sometimes they want to change an experiment but they haven’t actually checked to see whether it’s working as expected. All of these things then turned into the in-flight checklist.
  39. READY? RIGHT DECISION? Lastly, at some point every experiment should be shut down. Sometimes people try to shut it down too early, before they have enough data or before we can understand the long-term effects. Or they’re right that it’s time to end the experiment, but they make the wrong call: they turn it off when it actually should be shipped because metrics are up, or they ship it even though metrics are down and don’t acknowledge that. The landing checklist tries to anticipate these issues and make sure we’re avoiding them.
  40. IMPLEMENT: NO FRICTION + GET OTHERS ON BOARD But all the checklists in the world are meaningless if nobody implements them. So we spent a while thinking about two things: how we could try to improve the quality of experiments being run without introducing too much friction, and how we could get others on board to help monitor experiments’ quality.
  41. PIGGYBACK ON R+ = OK. OPTIONAL IN THEORY. INTERNS. E+. YOUR CULTURE. To improve the quality of experiments without introducing friction, we piggybacked on the concept we already have of getting an r+. If you’re not familiar with an r+, it’s a naming convention that we adopted from Mozilla, but it’s just a way of signing off on a code review. When a code reviewer signs off with r+ on a review, it means that they think the new code improves the codebase. We had a culture of r+ that we stole for e+. We never said it was mandatory, just that it was recommended, but practically it was mandatory. No one ships code without an r+ except for new people and interns. For e+, we took that and said, hey, just as with code review, making sure that an experiment goes out correctly is critical. When you set up or change an experiment in a code review, ask someone who knows about experiments to take a look at it and provide feedback on your experiment setup. You need to find something that works within your culture. We could leverage this part of our existing engineering culture to create improved experiments. What is that lever at your company?
  42. The other key to our success was getting others on board to be the experiment reviewers. I think there were a couple pieces that were important here: 1) Calling it experiments-help. We considered experiments on-call but who finds on-call glamorous? Everyone wants to help others. 2) Getting partners in engineering: move faster, badge value (certification) and owning the process themselves, not gatekeepers. 3) Choosing the right people. The first few helpers were really thoughtful, well-respected engineers in the organization. Other people looked to them as leaders.
  46. And so we announced a process. We introduced the experiments-help@ email alias and just asked people to come to us for help if they wanted to learn from their experiments.
  47. ALL THE PIECES. TRAIN FIRST SET Now we had all the pieces in place: checklists for experiments, a way for people to ask for help in code review and for a certified helper to sign off, and a way for people to write about experiments in a standard way. Now we just had to train our first set of experiment helpers.
  48. LEARN BY DOING -> APPRENTICESHIP We are strong believers in learning by doing. So we set up the experiment helper program as an apprenticeship.
  49. QUIZ + ON THE HOOK Sure, people could read the documentation. But it’s not until they were put on the spot that they’d really begin to develop a sense of what to do. We created a quiz for prospective experiment helpers to test their understanding and ability to detect common problems. Nothing fancy – ours was just a Google doc with an answer key at the end. And then when your week of the rotation came along, you were on the hook for every question that came into experiments-help and every code review. When you were exposed to the variety of experiments people ran and had to be the person who kept them going in the right direction, you learned quickly.
  50. And so we expanded from just me, to me and Dan and John.
  51. And from Dan and John to a small set of respected engineers on a variety of teams, who start to build up the culture of experiments within their own smaller organizations.
  52. 50 PEOPLE + SELF-PROPELLING: QUEUE, COMMUNITY, TEAMS And now we have 50 trained experiment helpers distributed across all the product engineering teams at the company. It’s become self-propelling: we have queues of folks waiting to train as helpers, folks jumping in to answer each other’s questions, and individual engineering teams honing their own team’s experiment processes. We add questions to the quiz as new problems arise, and we now have a small army of folks equipped with experimental understanding who can explain new changes to their teams and help our process continue to grow.
  53. REMOVE YOURSELF FROM THE LOOP BY TRAINING OTHERS. GROWING VOLUME OF EXPERIMENTS. So that’s stage four. Remove yourself from the loop by training others to take over your role. Get them to ask the hard questions, to help experiment owners avoid pitfalls and follow best practices. At this point, you’ve built a well-oiled, self-sustaining machine. The volume of experiments grows and grows. Problems that were rare when you started now crop up often enough that they’re really starting to get irritating, and so you start to think about what else you could invest in to simplify experiments and increase their likelihood of success.
  54. MANY ERRORS ARE HUMAN, BUT SIMPLE ONES (FLIP) ARE PREVENTABLE. LETS HUMANS FOCUS ON THINKING. A lot of the things that can go wrong with an experiment are human: you can’t automate them away. Is it worth running an experiment in the first place? Does your hypothesis make sense, given the feature you’re building? Have you thought about what will happen to users if you remove the treatment? How will we decide whether the experiment is a success? But as you step back from individually reviewing everyone’s experiments, you may start to notice patterns of where simple things are going wrong,(flip) and you have the opportunity to step back and try to eliminate the problems that can be solved by better tools and automation. By solving this set of problems, you allow humans to focus on the hard stuff: the thinking.
  56. Some of the simple mistakes happen at launch. If you can remove all the implementation details, you allow the experiment helper to focus on the important questions of what the experiment is trying to measure.
  57. For us, that meant simplifying the experiment API, removing untriggered experiments, and creating helper functions for common user populations, like only experimenting on the latest app version.
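A helper function for a common user population might look like the sketch below. The talk does not show the actual API, so the function names and the dotted version format are hypothetical; the point is that experiment owners call one well-tested helper instead of re-implementing the check in every experiment.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.12.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def on_latest_app_version(user_version: str, latest_version: str) -> bool:
    """True if the user's app is at least the latest released version.

    Comparing tuples avoids the string-comparison trap where '5.9' > '5.12'.
    """
    return parse_version(user_version) >= parse_version(latest_version)


# Hypothetical usage: only trigger the experiment for users on the newest build.
print(on_latest_app_version("5.12.0", "5.12.0"))
print(on_latest_app_version("5.9.3", "5.12.0"))
```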
  60. Other mistakes happen in-flight. By building tools to take care of those details, we allowed the experiment helper to pay attention instead to why the experiment was changing and how it would be measured.
  62. WRONG DECISION -> HURTS USERS, WRONG DIRECTION Perhaps the most worrisome set of mistakes happens when someone decides to land an experiment. If they make the wrong decision here, it could result not only in shipping a product that hurts users, but in shaping future product decisions based on erroneous learnings! So we invested especially heavily in helping people avoid mistakes in interpreting their experiment results.
  63. First off, an experiment will be invalid if the randomization produced groups that aren’t actually the same, so we built a number of tools to detect errors.
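One common way to detect groups that "aren't actually the same" is a sample ratio mismatch check: compare the observed group sizes with the sizes the configuration should have produced. The talk does not say which checks Pinterest built, so this is only an illustrative sketch. It assumes two groups, a chi-square test with one degree of freedom, and a strict alert threshold of 0.001.

```python
import math
from typing import Tuple


def sample_ratio_mismatch(control_n: int, treatment_n: int,
                          expected_control_share: float = 0.5,
                          threshold: float = 0.001) -> Tuple[float, bool]:
    """Return (p_value, is_suspicious) for the observed group sizes.

    For one degree of freedom, the chi-square survival function is
    erfc(sqrt(chi2 / 2)), so no external statistics library is needed.
    """
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < threshold


# Made-up counts: a 50/50 experiment where treatment has 3% more users.
print(sample_ratio_mismatch(50000, 51500))
```

A tiny p-value here says nothing about the feature itself. It means the assignment or logging is probably broken, so the metric comparison should not be trusted until that is explained.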
  65. Other mistakes resulted from people trying to do their own analysis on metrics: querying the data incorrectly, making comparisons that didn’t make sense, or just not thinking about statistical significance.
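As an example of the significance check that ad hoc analyses tend to skip, the sketch below runs a two-sided z-test for the difference between two conversion rates. The function name and the counts are assumptions for illustration, and the test assumes samples large enough for the normal approximation and a pooled rate strictly between 0 and 1.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two_sided_p_value) for the rate in group B versus group A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 10.0% versus 11.0% conversion with 10,000 users per group.
z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
print(round(z, 2), round(p, 3))
```

Baking a check like this into the dashboard, rather than leaving it to each analyst, is the kind of automation that removes a whole class of mistaken ship decisions.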
  66. So that’s stage five, where Pinterest currently finds itself. After stepping back from the day-to-day review of experiments, we built tools so that the experiment helpers can focus on the important part: deciding what to build and understanding how it affects our users.
  67. SUMMARIZE. NOT JUST ENGINEERING: PEOPLE. BUY-IN, TEACHING, HARDER TO SHOOT FOOT. We’ve built an experiment framework that allows us to track changes on all parts of our service, gotten it widespread adoption, built up a core set of 50 engineers who lead their teams in running experiments, and automated tools to make all of the aspects of the experiment lifecycle harder to screw up. At each stage, while engineering and statistical know-how were part of the equation, the real solution lay in building a culture of experimentation: getting the humans who make up the organization to buy into experiments, teaching them to help each other make decisions, and building tools that make it harder to shoot yourself in the foot.
  68. NEXT?? LESSONS BEYOND EXPERIMENTATION. I don’t know yet what the next stage will look like. If you do, I’d love to find out. But I think the lessons extend beyond just experimentation. (flip) Data science is not just engineering and statistics: your recommendation model will not be used unless you convince someone it’s useful, and your analysis will not change product strategy until it’s changed people’s minds.
  69. NOT JUST ENGINEERING AND STATS. CONVINCE PEOPLE. Data science is not just engineering and statistics: your recommender system will gather dust unless you convince someone it’s useful, and your analysis will not change product strategy until it’s changed people’s minds. Spending time actively investing in building a data-driven culture will pay off handsomely in the long run.