Building an A/B Testing Analytics System with R and Shiny
Emily Robinson
@robinson_es
About Me
➔ Data Scientist at DataCamp
➔ R user ~7 years
➔ Enjoy talking about:
◆ Building and finding data
science community
◆ Diversity in STEM
◆ R
Learn | datacamp.com/courses
What is A/B Testing?
Life B.D. (Before DataCamp)
➔ Worked on 60+
experiments with search
team
➔ 8+ year history of
experimentation
➔ 500+ experiments per year
Life B.D. (Before DataCamp)
➔ 5 data engineers working on the experimentation platform
➔ Over a thousand metrics computed for each experiment
➔ Fancy UI
From “How Etsy Handles Peeking in A/B Testing” by Callie McRee and Kelly Shen
First weeks at DataCamp
➔ No system for planning,
analyzing, or presenting
experiment results
➔ And no data engineers to
build it
4 Lessons
1. Build tools to save yourself time
Who here has had a “first this then that” question?
➔ Who tried X then did Y?
➔ What percent of people who did X then did Y?
➔ What was the last thing people did before doing Y?
➔ What are all the things people did after doing X?
Questions I might answer about an A/B test
➔ What percent of people in the treatment vs. control registered?
➔ What were the ad clicks that had a course start within 2 days?
Lengthy, repetitive code
➔ Lots of copying and
pasting
➔ Hard to switch between
types of funnels
And when you’re doing repetitive tasks ...
Package
Unfortunately …
Me and writing packages
Fortunately …
I had David Robinson
Sorry, this David Robinson
Funneljoin package: github.com/datacamp/funneljoin
Structure
1. Table 1
2. Table 2
3. User column name(s)
4. Time column name(s)
5. Type of afterjoin
6. Type of join
Example: first-any
➔ What are all the courses people started after visiting the homepage
for the first time?
Example: first-firstafter
➔ What percent of people saw the pricing page and then subscribed?
Example: max-gap argument
➔ What percent of people saw the pricing page and then subscribed within four
days?
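The funnel questions in these examples can be sketched in a few lines of plain Python. This is only an illustration of the "first-firstafter" idea with an optional maximum gap; it is not funneljoin's API or implementation. The function names, the (user, timestamp) tuple layout, the "strictly after" rule, and the sample events are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def first_firstafter(first_events, second_events, max_gap=None):
    """Pair each user's first event in `first_events` with their first
    later event in `second_events`.

    Both inputs are lists of (user_id, timestamp) tuples. Users with no
    qualifying second event are paired with None (a left-join-style result).
    """
    # Earliest first-table event per user.
    first_seen = {}
    for user, ts in first_events:
        if user not in first_seen or ts < first_seen[user]:
            first_seen[user] = ts

    # Earliest second-table event strictly after it, within max_gap if given.
    result = {user: (ts, None) for user, ts in first_seen.items()}
    for user, ts in second_events:
        if user not in first_seen:
            continue  # never did the first step, so not in the funnel
        start = first_seen[user]
        if ts <= start:
            continue  # assumption: the second step must come strictly after
        if max_gap is not None and ts - start > max_gap:
            continue
        current = result[user][1]
        if current is None or ts < current:
            result[user] = (start, ts)
    return result

def conversion_rate(pairs):
    """Share of users in the funnel who reached the second step."""
    if not pairs:
        return 0.0
    converted = sum(1 for _, second in pairs.values() if second is not None)
    return converted / len(pairs)

# Hypothetical events: pricing page views and subscriptions.
pricing_views = [
    ("a", datetime(2019, 1, 1)),
    ("a", datetime(2019, 1, 3)),
    ("b", datetime(2019, 1, 2)),
    ("c", datetime(2019, 1, 5)),
]
subscriptions = [
    ("a", datetime(2019, 1, 2)),
    ("b", datetime(2019, 1, 10)),
    ("d", datetime(2019, 1, 4)),
]

anytime = first_firstafter(pricing_views, subscriptions)
within_four_days = first_firstafter(
    pricing_views, subscriptions, max_gap=timedelta(days=4)
)
print(conversion_rate(anytime), conversion_rate(within_four_days))
```

In this toy data, user "b" subscribes eight days after first seeing the pricing page, so they count as converted in the unrestricted funnel but not in the four-day one.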
➔ Many funnel types:
◆ Lastbefore-firstafter, any-any, first-any …
➔ Supports all types of dplyr joins:
◆ Inner, left, right, full, semi, and anti
➔ Works on remote tables
➔ Bug fixes, pull requests, feature requests welcome
➔ Try it yourself!
Funneljoin: github.com/datacamp/funneljoin
2. Everything that can go wrong, will
go wrong
Things that have happened …
➔ People are put in both control and treatment
➔ People in the experiment have no page views
➔ People have multiple experiment starts in the same group
➔ There aren’t the same number of people in control and treatment
➔ Experiment starts didn’t have cookies (so we couldn’t track the user)
You need to check your assumptions
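A few of these checks are easy to automate. The sketch below is one possible way to flag users assigned to both groups, repeated experiment starts, and unbalanced group sizes. It is not the code behind the talk's dashboard: the function names, the (user, group) tuple layout, and the 50/50 expected split are assumptions. The group-size check is a standard chi-square goodness-of-fit test with one degree of freedom.

```python
import math
from collections import defaultdict

def find_users_in_both_groups(assignments):
    """Users who appear in more than one group.

    `assignments` is an iterable of (user_id, group) tuples,
    one per experiment start.
    """
    groups_by_user = defaultdict(set)
    for user, group in assignments:
        groups_by_user[user].add(group)
    return {user for user, groups in groups_by_user.items() if len(groups) > 1}

def find_repeat_starts(assignments):
    """Users with more than one experiment start in the same group."""
    counts = defaultdict(int)
    for user, group in assignments:
        counts[(user, group)] += 1
    return {user for (user, _group), n in counts.items() if n > 1}

def sample_ratio_p_value(n_control, n_treatment, expected_control_share=0.5):
    """P-value for the observed split against the expected split.

    A very small p-value suggests the assignment is not splitting
    users the way the experiment was designed to.
    """
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical experiment starts.
starts = [
    ("u1", "control"),
    ("u1", "treatment"),
    ("u2", "control"),
    ("u2", "control"),
    ("u3", "treatment"),
]
print(find_users_in_both_groups(starts))
print(find_repeat_starts(starts))
print(sample_ratio_p_value(5000, 5300))
```

Group sizes will rarely match exactly, so the test is there to separate ordinary random imbalance from a split that points to an assignment bug.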
Initial solution
As a famous data scientist once said …
When you’ve run the same process three
times, make a dashboard
3. Build tools that empower others
Health Metrics Dashboard
* These are fake numbers
By metric view
* These are fake numbers
Individual experiments view
* These are fake numbers
Leveling up …
➔ Common request: What % increase can we detect in a 2 week test?
➔ Can I make a tool so people can answer this themselves without code?
➔ Delivering information -> discovering information
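The "what % increase can we detect" question comes down to a power calculation. The sketch below shows one common approximation for a two-sided test comparing two conversion rates. It is not the impact calculator from the talk: the function name, the default significance level and power, and the traffic numbers are assumptions, and the formula uses the baseline rate's variance for both groups.

```python
from statistics import NormalDist

def minimum_detectable_effect(baseline_rate, users_per_group, alpha=0.05, power=0.8):
    """Approximate smallest relative lift a test can reliably detect.

    Uses the normal approximation for a two-sided two-proportion test,
    with the baseline variance applied to both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    standard_error = (2 * baseline_rate * (1 - baseline_rate) / users_per_group) ** 0.5
    absolute_lift = (z_alpha + z_power) * standard_error
    return absolute_lift / baseline_rate

# Hypothetical traffic: 1,000 new users per group per day, 10% baseline conversion.
daily_users_per_group = 1000
two_week_mde = minimum_detectable_effect(0.10, daily_users_per_group * 14)
four_week_mde = minimum_detectable_effect(0.10, daily_users_per_group * 28)
print(f"2 weeks: {two_week_mde:.1%}, 4 weeks: {four_week_mde:.1%}")
```

With these made-up inputs, a two-week test can detect roughly a 10% relative lift, and doubling the duration shrinks that by a factor of about the square root of two.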
Impact calculator
4. Make it easy to do the right thing
Best Practice 1: Have one key metric per experiment
➔ Clarifies decision-making
➔ Can have additional “guardrail” metrics that you don’t want to negatively impact
Airtable Field
Best Practice 2: Run your experiment for the length you planned on
➔ Otherwise, you may quadruple your
false positive rate!
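A small simulation shows why stopping early is risky. The sketch below runs A/A tests, where there is no true difference, and compares two rules: stop as soon as any interim look is significant, or test only once at the planned end. All parameters (number of experiments, number of looks, users per look, conversion rate, seed) are assumptions chosen for illustration, and the size of the inflation depends on how often you peek.

```python
import random
from statistics import NormalDist

Z_CRITICAL = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05

def is_significant(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test using the pooled conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        return False  # no conversions (or all conversions): nothing to test
    z = (conv_a / n_a - conv_b / n_b) / variance ** 0.5
    return abs(z) > Z_CRITICAL

def simulate_aa_tests(n_experiments=500, n_peeks=20, users_per_peek=50,
                      rate=0.1, seed=1):
    """Return (false positive rate with peeking, false positive rate without)."""
    rng = random.Random(seed)
    flagged_at_any_peek = 0
    flagged_at_end = 0
    for _experiment in range(n_experiments):
        conv_a = conv_b = n = 0
        seen_significant = False
        last_look = False
        for _peek in range(n_peeks):
            # Both groups share the same true rate, so any "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(users_per_peek))
            conv_b += sum(rng.random() < rate for _ in range(users_per_peek))
            n += users_per_peek
            last_look = is_significant(conv_a, n, conv_b, n)
            seen_significant = seen_significant or last_look
        flagged_at_any_peek += seen_significant
        flagged_at_end += last_look
    return flagged_at_any_peek / n_experiments, flagged_at_end / n_experiments

peeking_rate, fixed_rate = simulate_aa_tests()
print(f"stop at first significant peek: {peeking_rate:.1%}")
print(f"test once at the planned end:   {fixed_rate:.1%}")
```

Testing once at the planned end keeps the false positive rate near the nominal 5%, while acting on the first significant interim look pushes it well above that.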
Show start and end date in dashboard
Conclusion
Recap
1. Build tools to save yourself time
2. Everything that can go wrong will go wrong
3. Build tools that empower others
4. Make it easy to do the right thing
Many thanks to …
➔ The growth and data science teams at DataCamp
➔ Anthony Baker & David Robinson, co-authors of funneljoin
➔ Analytics & Data Engineering team at Etsy
Thank you!
hookedondata.org
@robinson_es
github.com/datacamp/funneljoin
