@gregdetre, gregdetre.co.uk
1-line AB tests in Django
23rd Feb, 2014
PyData, London
Greg Detre
@gregdetre
1-Line AB Tests in Django by Greg Detre

  1. 1-line AB tests in Django. Greg Detre (@gregdetre, gregdetre.co.uk), PyData London, Sunday, 23 February 2014. I will show you how to write a 1-line AB test in Django. But it’s only 1 line if you start sufficiently far to the left.
  2. INTRO
  3. I'm Greg Detre. My PhD was on human memory & forgetting.
  4. I spent my days scanning people’s brains, including my own. It turned out to be smaller than I’d hoped.
  5. Founded with Ed Cooke, a grandmaster of memory who can remember a deck of cards in a minute flat. We set out to combine the art and the science of memory, to help people learn 10 times faster. Venture capital dance, millions of users. We did a lot of AB testing and built our own internal framework.
  6. Helped build up their data science team, and distilled AB testing best practices for them.
  7. YOU
  8. Hands up if... you’ve run an AB test
  9. Hands up if... you’ve used Django
  10. WHAT IS AN AB TEST?
  11-12. When you release a change, you need to know whether you’ve made a big step forward... or taken two steps back. The idea behind AB testing is very simple:
      - when you change something
      - show some people the old version
      - show some people the new version
      - look at which group are happiest
    i.e. it’s a scientific experiment on your product.
  13. WHY RUN AB TESTS?
  14. AB testing for making decisions
  15. This has nothing to do with the talk.
  16. Control for external factors. Say I’m a designer at The Guardian, and I change the font today. Tomorrow, traffic increases by 50%. Should I get a pay rise? Not if the paper just published the NSA leaks this afternoon. By running old vs new simultaneously, you control for that surge in traffic. Both groups will show the boost, but you’re only looking at the difference between them.
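The Guardian example on slide 16 can be made concrete with a little arithmetic. This is a minimal sketch with made-up numbers: the visitor counts, the conversion rates, and the helper `conversions` are all hypothetical. It assumes the new font has no real effect, and shows that a before/after comparison still reports a large lift during a traffic surge, while a simultaneous AB comparison does not.

```python
# Hypothetical numbers throughout: nothing here comes from real traffic data.

def conversions(visitors, rate):
    """Expected number of conversions for a group of visitors."""
    return visitors * rate

OLD_RATE = 0.10  # assumed conversion rate with the old font
NEW_RATE = 0.10  # assume the new font makes no real difference

normal_day = 10_000
surge_day = 15_000  # traffic up 50% because of a big news story

# Before/after comparison: old font yesterday vs new font today.
before = conversions(normal_day, OLD_RATE)
after = conversions(surge_day, NEW_RATE)
naive_lift = after / before - 1

# AB comparison: both fonts run today, with half the traffic each.
old_bucket = conversions(surge_day / 2, OLD_RATE)
new_bucket = conversions(surge_day / 2, NEW_RATE)
ab_lift = new_bucket / old_bucket - 1

print(f"before/after lift: {naive_lift:.0%}")
print(f"AB lift: {ab_lift:.0%}")
```

The surge scales both buckets equally, so it cancels out of the bucket-to-bucket comparison.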
  17. Improve your intuitions: feedback loops, error-driven learning.
  18. PREFACE
  19. Yes, there are gotchas to AB testing. But the main problem in AB testing is that people don’t AB test often enough.
  20. CODE
  21. I want to be able to do this:

        bucket = ab(user,
                    'Expt 37 - red vs green buy button',
                    ['red', 'green'])
        if bucket == 'red':
            pass  # show a red button
        elif bucket == 'green':
            pass  # show a green button
        else:
            raise Exception(...)
  22. Experiment model

        from django.db.models import CharField, DateTimeField, ManyToManyField, Model
        from django.utils import timezone

        class Experiment(Model):
            name = CharField(max_length=100, unique=True, db_index=True)
            # creation timestamp
            cre = DateTimeField(default=timezone.now, db_index=True)
            users = ManyToManyField('auth.User',
                                    through='ExperimentUser',
                                    related_name='experiments')
  23. ExperimentUser model

        from django.db.models import CASCADE, CharField, DateTimeField, ForeignKey, Model
        from django.utils import timezone

        class ExperimentUser(Model):
            # on_delete is required since Django 2.0; CASCADE was the implicit default before that
            user = ForeignKey('auth.User', on_delete=CASCADE, related_name='exptusers')
            experiment = ForeignKey(Experiment, on_delete=CASCADE, related_name='exptusers')
            bucket = CharField(max_length=100)
            cre = DateTimeField(default=timezone.now, editable=False)

            class Meta:
                unique_together = ('experiment', 'user',)

      Minimize FKs and indexes on ExperimentUser.
  24. Putting a user in a bucket

        import random

        def ab(user, name, buckets):
            expt = Experiment.objects.get_or_create(name=name)[0]
            # get_or_create returns an (object, created) tuple
            exptuser, created = ExperimentUser.objects.get_or_create(
                experiment=expt, user=user)
            if created:
                # first visit: pick a bucket at random and persist it
                exptuser.bucket = random.choice(buckets)
                exptuser.save()
            return exptuser.bucket

      Probably should be using defaults= in the ExperimentUser get_or_create. Actually, why not ExperimentUser.objects.get_or_create(experiment__name=name)?
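The key property of `ab()` on slide 24 is that assignment is sticky: a user is bucketed at random once, and every later call returns the stored bucket. The following is a Django-free sketch of that behaviour so it can run standalone. The in-memory `_assignments` dict stands in for the `ExperimentUser` table, and the `rng` parameter is added purely to make the example reproducible; both are assumptions for illustration, not part of the talk's code.

```python
import random

# In-memory stand-in for the ExperimentUser table (hypothetical; the talk
# stores this in the database with a unique (experiment, user) constraint).
_assignments = {}

def ab(user, name, buckets, rng=random):
    """Return the user's bucket for this experiment, assigning one on first call.

    The assignment is sticky: later calls return the stored bucket.
    """
    key = (name, user)
    if key not in _assignments:
        # Equivalent of the "created" branch: choose and store a bucket once.
        _assignments[key] = rng.choice(buckets)
    return _assignments[key]

first = ab('greg', 'Expt 37 - red vs green buy button', ['red', 'green'])
again = ab('greg', 'Expt 37 - red vs green buy button', ['red', 'green'])
print(first, again)  # the same bucket both times
```

Because the key is the (experiment name, user) pair, the same user can land in different buckets across different experiments, which matches the `unique_together` constraint on slide 23.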
  25. SQL for calculating retention

        select
            d0.user,
            d0.dt as activity_date,
            'd01'::text as retention_type,
            case when dXX.dt is not NULL then true else false end as user_returned
        from
            user_activity_per_day as d0
        left join
            user_activity_per_day as dXX
        on
            d0.user = dXX.user
            and
            d0.dt + 1 = dXX.dt
  27. username  visited
      greg      20 Feb 2014
      ed        20 Feb 2014
      greg      21 Feb 2014
      greg      22 Feb 2014
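To see what the retention query on slide 25 produces for the example rows on slide 27, here is a small Python sketch of the same day-1 logic. It assumes one row per user per day, as the `user_activity_per_day` name suggests; the function name `day1_retention` is hypothetical.

```python
from datetime import date, timedelta

# (username, visit date) rows from the example table on slide 27.
visits = [
    ('greg', date(2014, 2, 20)),
    ('ed', date(2014, 2, 20)),
    ('greg', date(2014, 2, 21)),
    ('greg', date(2014, 2, 22)),
]

def day1_retention(rows):
    """For each (user, day) row, report whether the user came back the next day.

    Mirrors the left self-join condition in the SQL: d0.dt + 1 = dXX.dt.
    """
    seen = set(rows)
    return [
        (user, day, (user, day + timedelta(days=1)) in seen)
        for user, day in rows
    ]

for user, day, returned in day1_retention(visits):
    print(user, day.isoformat(), returned)
```

With these rows, greg counts as retained for 20 Feb and 21 Feb, while ed's visit on 20 Feb and greg's visit on 22 Feb have no next-day visit in the data.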
  28. github.com/gregdetre/abracadjabra
  29. PRO TIPS
  30-35. Do’s:
      - measure the right, high-level things ($, retention, activation, sharing)
      - run on a subset
      - focus the analysis on relevant users
      - make your prediction first
      - url for each expt (method, results)
    Measure the right, high-level thing, so you can see if you're making things worse elsewhere or down the line, e.g. eBay hurt their sale of books, but increased sale of cars.
  36-40. Don’ts:
      - don’t get lost in the weeds
      - don’t expect your AB tests to succeed very often
      - don’t keep checking the results
  41-44. Sanity checks:
      - AA test: should make no difference
      - does making things worse make things worse?
    e.g. if you make the site slower, how much does that hurt you? Prioritise dev efforts. Or what if you get rid of components? Or get rid of ads?
  45. Software is the easy bit.
      - culture
      - human intuition to generate hypotheses vs being receptive to the results
      - most AB tests are null results
      - storing & sharing conclusions
      - the big changes are the most important to test, but the hardest
  46. WORKING TOGETHER
  47. Software, science, startups. gregdetre.co.uk, @gregdetre. I’m moving back to London. Happy to help if you drop me a line. Or you can hire me.
  48. THE END
  49. Link to this presentation
  50. Resources
      - Eric Ries, The one line split-test, or how to A/B all the time. http://www.startuplessonslearned.com/2008/09/one-line-split-test-or-how-to-ab-all.html
      - Kohavi et al (2007), Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO. http://exp-platform.com/Documents/GuideControlledExperiments.pdf
      - Kohavi et al (2013), Online Controlled Experiments at Large Scale, KDD. http://www.exp-platform.com/Documents/2013%20controlledExperimentsAtScale.pdf
      - Miller (2010), How not to run an AB test. http://www.evanmiller.org/how-not-to-run-an-ab-test.html
  51. APPENDIX
  52. No peeking. DO NOT peek at your results daily and stop when you see an improvement; see Miller (2010).
      - say you start with a 50% conversion rate
      - 2 buckets
      - and you decide to stop at 5% significance or after 150 observations
      - 26% chance of a false positive!
    This is the worst-case scenario (running a significance test after every observation), but peeking to see if there’s a difference and stopping when there is inflates your chances of seeing a spurious difference.
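The effect of peeking can be checked with a small simulation. This is a rough sketch under stated assumptions, not a reproduction of Miller's calculation: it runs AA tests (both buckets convert at 50%), uses a pooled two-proportion z-test, takes 150 observations per bucket, and starts peeking after 10 observations per bucket. The function names, the trial count, and the seed are all hypothetical choices.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation yet, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def run_aa_test(rng, n_per_bucket=150, rate=0.5, peek=False, alpha=0.05):
    """Simulate one AA test; return True if it (wrongly) declares a winner."""
    conv_a = conv_b = 0
    for n in range(1, n_per_bucket + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        # Peeking: test after every new pair of observations, stop on "significance".
        if peek and n >= 10 and p_value(conv_a, n, conv_b, n) < alpha:
            return True
    # No early stop: run a single test at the planned sample size.
    return p_value(conv_a, n_per_bucket, conv_b, n_per_bucket) < alpha

rng = random.Random(0)
trials = 2000
peeking = sum(run_aa_test(rng, peek=True) for _ in range(trials)) / trials
fixed = sum(run_aa_test(rng, peek=False) for _ in range(trials)) / trials
print(f"false positives with peeking: {peeking:.1%}")
print(f"false positives with a fixed sample size: {fixed:.1%}")
```

Since the two buckets are identical, every "significant" result here is a false positive. The fixed-sample rate should sit near the nominal 5%, while the peeking rate should come out several times higher.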
