Eytan Bakshy
Sean J. Taylor
Facebook Core Data Science
WWW 2015, Florence, Italy
May 19, 2015
Online Experiments for
Computational Social Science
  1. 1. Eytan Bakshy Sean J. Taylor Facebook Core Data Science WWW 2015, Florence, Italy May 19, 2015 Online Experiments for Computational Social Science WWW 15 Tutorial
  2. 2. Outline
 1. Introduction and Causal Inference (30 minutes)
 2. Planning Experiments (30 minutes)
 <30 minute break>
 Discussion + analysis exercise (15 minutes)
 3. Designing and Implementing Experiments (45 minutes)
 4. Analyzing Experiments (30 minutes)
  3. 3. Everything we assume
 ▪ Minimum requirements
   ▪ Some basic knowledge of statistics
   ▪ The ability to follow code
 ▪ Necessary to understand 90%
   ▪ Intermediate knowledge of R and beginner knowledge of Python
 ▪ Necessary to understand 100%
   ▪ Advanced knowledge of R, intermediate Python, intermediate stats, design of experiments
  4. 4. Don’t panic!
  5. 5. Don’t panic! Buddy up! Group learning is good for you!
  6. 6. Software requirements
 ▪ If you are the kind of person who wants to tinker with code yourself, here are the software requirements
   ▪ Section 1: None
   ▪ Section 2: IPython + R
   ▪ Part 3: IPython + PlanOut
   ▪ Part 4: Python + R
 ▪ Can’t install the software? Buddy up
  7. 7. Follow along: http://bit.ly/www15experiments
  8. 8. Section 1: Introduction and Causal Inference
  9. 9. Obligatory
  10. 10. Associations (Xi, Yi)
 ▪ Health: (smoking, cancer)
 ▪ Sports: (running the ball, winning games)
 ▪ Education: (completes MOOC course, gets a promotion)
 ▪ Social Media: (likes a page on Facebook, buys a product)
  11. 11. Possible Relationships
 ▪ Pr(Yi | Xi) = Pr(Yi) (independence)
 ▪ Pr(Yi | Xi) ≠ Pr(Yi) (dependence)
 Dependence between variables is useful, but interpretation of this relationship can often be tricky.
  12. 12. Possible Causal Relationships
 All three causal relationships are possible when there’s a dependency between variables.
 [Diagrams: three Xi, Yi pairs]
 Shouldn’t there be an arrow going both ways here? I also wonder if this is confusing because it doesn’t leave room for a mediating variable, X->M->Y.
  13. 13. Correlation without Causation: Introducing Zi
 ▪ Smoking causes cancer (genetics)
 ▪ Running the ball causes winning games (having a lead)
 ▪ Completing a MOOC causes a promotion (self-motivation)
 ▪ Liking a page on Facebook causes a person to buy a product (brand loyalty)
 [Diagram: Xi, Yi, Ui, ?]
  14. 14. Bigger Example
  15. 15. Why Causal Inference?
 1. Science: Why did something happen?
 2. Decisions: What will happen if I change something?
  16. 16. Causal Inference for Science
 1. Associations between two variables are always more interesting when they’re causal.
 2. Understanding a phenomenon is different from predicting it.
 Explanatory Modeling | Predictive Modeling
 model captures causal function | model captures association
 model carefully constructed from theory | models constructed from data
 retrospective | forward-looking
 minimize bias | minimize variance
 basic science | applied science
 make better data | make better features
  17. 17. Two Kinds of Out-of-Sample
 ▪ From people we observe to similar people: Machine Learning / Statistics
 ▪ From people we observe to similar people under different circumstances: Experiments
  18. 18. Attribute Outcomes to Causes
 [Images: social ad, non-social ad]
 Social Influence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.
  19. 19. Attribute Outcomes to Causes
 ▪ 1 liking friend, 0 friends shown
 ▪ 1 liking friend, 1 friend shown
 Social Influence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.
  20. 20. Causal Inference for Decisions
 ▪ Health: Quit smoking? Brush your teeth? :)
 ▪ Sports: run the ball more?
 ▪ Social media: recruit more Facebook fans for my page?
 ▪ Advertising: purchase ads?
 ▪ Education: invest time in completing a MOOC?
  21. 21. News Feed (2011) News Feed (2014) Test complete alternatives
  22. 22. News Feed (2011) News Feed (2014) Explore a design space
  23. 23. A Social Science Problem
 Causal inference is easy for the natural sciences:
 ▪ Manipulation is easy!
 ▪ Molecules, cells, animals, plants are exchangeable!
 For people, there’s always something we may not be observing perfectly.
 ▪ Latent traits provide alternative explanations that we often cannot rule out.
 ▪ Experiments are often the only way of ruling them out.
  24. 24. Potential Outcomes Framework
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | 10 | -5 | | 15
 Anna | 0 | 5 | | -5
 Gary | -20 | 5 | | -25
 Linda | -10 | 10 | | -20
 Edna | -5 | 0 | | -5
 Sean | 10 | 5 | | 5
 Mean | -2.14 | 2.86 | | -5
 Yi(1) is the outcome under treatment
 Yi(0) is the outcome under control
  25. 25. Fundamental Problem of Causal Inference
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | 10 | | 1 | ?
 Anna | | 5 | 0 | ?
 Gary | | 5 | 0 | ?
 Linda | | 10 | 0 | ?
 Edna | | 0 | 0 | ?
 Sean | 10 | | 1 | ?
 Mean | | | |
 We only ever observe a unit in either treatment or control. Individual level effects are never defined.
  26. 26. Confounding
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | 10 | | 1 | ?
 Anna | 0 | | 1 | ?
 Gary | | 5 | 0 | ?
 Linda | -10 | | 1 | ?
 Edna | -5 | | 1 | ?
 Sean | | 5 | 0 | ?
 Here we assumed they selected Di to maximize their outcome.
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | 10 | | 1 | ?
 Anna | | 5 | 0 | ?
 Gary | | 5 | 0 | ?
 Linda | | 10 | 0 | ?
 Edna | | 0 | 0 | ?
 Sean | 10 | | 1 | ?
 Mean | 10 | 5 | 0.29 | 5
  27. 27. Randomization
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | 10 | | 1 | ?
 Anna | 0 | | 1 | ?
 Gary | | 5 | 0 | ?
 Linda | -10 | | 1 | ?
 Edna | -5 | | 1 | ?
 Sean | | 5 | 0 | ?
 With randomly assigned Di, we get unbiased effect estimates. The difference in means is called the Average Treatment Effect.
 Subject | Yi(1) | Yi(0) | Di | Effect
 Eytan | | -5 | 0 | ?
 Anna | 0 | | 1 | ?
 Gary | -20 | | 1 | ?
 Linda | | 10 | 0 | ?
 Edna | | 0 | 0 | ?
 Sean | 10 | | 1 | ?
 Mean | -3.33 | 1.67 | 0.5 | -5
  28. 28. The Average Treatment Effect
 ▪ Due to the linearity of expectations, we can separate the ATE into two measurements.
 ▪ In practice, we estimate these expectations using the means in our treatment and control groups.
 $ATE = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$
 $\widehat{ATE} = \frac{1}{N} \sum_{i \in T} Y_i(1) - \frac{1}{M} \sum_{i \in C} Y_i(0)$
  29. 29. Uncertainty
 ▪ How sure are we about the ATE we measured?
 Variability in ATE estimates comes from:
 1. variation in random assignment
 2. variation in subjects
  30. 30. Variation due to Random Assignment
 [Figure: histogram of reps; x-axis: reps (-25 to 10), y-axis: Frequency (0 to 60)]
 For a given set of potential outcomes and a randomization, we may not always arrive at the right estimate, but it will not be biased.
 - What is this illustrating? Sampling variation in subjects, or variation in terms of permutation tests / randomization inference?
 - I wonder if it’s worth mentioning variability due to sampling different subjects vs. assignments?
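 The claim on this slide (individual estimates miss, but the procedure is unbiased) can be checked by brute force on a tiny example. The sketch below is an illustration only: the potential outcomes are invented, and `difference_in_means` is a hypothetical helper name. It enumerates every way of treating three of six units and compares the average estimate with the true ATE.

```python
from itertools import combinations
from statistics import mean

# Hypothetical potential outcomes for six units (invented for illustration).
y1 = [4, 2, 7, 1, 5, 3]  # outcome if treated
y0 = [1, 2, 3, 0, 4, 2]  # outcome if in control


def difference_in_means(treated):
    """ATE estimate when the units in `treated` get treatment and the rest get control."""
    t = [y1[i] for i in treated]
    c = [y0[i] for i in range(len(y0)) if i not in treated]
    return mean(t) - mean(c)


true_ate = mean(y1) - mean(y0)

# Every possible assignment of exactly three of the six units to treatment.
estimates = [difference_in_means(set(tr)) for tr in combinations(range(6), 3)]

print(f"true ATE: {true_ate:.3f}")
print(f"estimates range from {min(estimates):.3f} to {max(estimates):.3f}")
print(f"mean of estimates: {mean(estimates):.3f}")
```

 A histogram of `estimates` would play the role of the figure on the slide: spread around the truth, centered on it.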
  31. 31. Standard Errors
 ▪ How can we quantify uncertainty about ATEs we measure?
 ▪ SE decreases with √N
 ▪ SE is smaller when variances of potential outcomes are smaller
 ▪ want n0 and n1 to be similar if the variances are the same
 ▪ want more observations for higher variance conditions
 $\widehat{SE}(\widehat{ATE}) = \sqrt{\frac{1}{n_0 + n_1}} \sqrt{\frac{n_1}{n_0} Var(Y_i(0)) + \frac{n_0}{n_1} Var(Y_i(1))}$
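 The bullets above can be read directly off the formula. The sketch below implements the slide's expression as written, assuming n0 is the control group size and n1 the treatment group size; the function name and all numbers are hypothetical.

```python
from math import sqrt


def ate_standard_error(n0, n1, var0, var1):
    """Standard error of the ATE estimate, using the formula on the slide.

    n0, n1: assumed to be the control and treatment group sizes.
    var0, var1: variances of Yi(0) and Yi(1).
    """
    return sqrt(1 / (n0 + n1)) * sqrt((n1 / n0) * var0 + (n0 / n1) * var1)


# Equal variances: a balanced split beats an unbalanced one of the same total size.
balanced = ate_standard_error(100, 100, 1.0, 1.0)
unbalanced = ate_standard_error(150, 50, 1.0, 1.0)

# Unequal variances: shifting observations to the noisier arm helps.
even_split = ate_standard_error(100, 100, 1.0, 4.0)
more_treated = ate_standard_error(67, 133, 1.0, 4.0)

print(balanced, unbalanced)
print(even_split, more_treated)
```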
  32. 32. Confidence Intervals
 ▪ 95% confidence interval is $\widehat{ATE} \pm 1.96 \times \widehat{SE}(\widehat{ATE})$
 ▪ SEs shrink with the square root of the number of observations, so to double the precision of your experiment you need four times the number of subjects.
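 A small numeric illustration of both bullets. The estimate, standard error and standard deviation below are made up, and `confidence_interval_95` and `se_of_mean` are hypothetical helper names; `se_of_mean` uses the textbook sd/√n form only to show the square-root scaling.

```python
from math import sqrt


def confidence_interval_95(estimate, standard_error):
    """Normal-approximation 95% interval: estimate plus or minus 1.96 standard errors."""
    half_width = 1.96 * standard_error
    return estimate - half_width, estimate + half_width


def se_of_mean(sd, n):
    """Standard error of a mean of n observations with standard deviation sd."""
    return sd / sqrt(n)


# Made-up numbers: an estimated ATE of -5 with a standard error of 2.
low, high = confidence_interval_95(estimate=-5.0, standard_error=2.0)
print(low, high)

# Four times the subjects gives half the standard error (double the precision).
se_n = se_of_mean(sd=10.0, n=1000)
se_4n = se_of_mean(sd=10.0, n=4000)
print(se_4n / se_n)
```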
  33. 33. Confidence Intervals vs p-values
 ▪ Often easy to get statistical significance with big data. Most things have effects!
 ▪ Harder to get big, practically significant, effects!
 ▪ CIs make uncertainty about an estimate more credible.
 ▪ Should favor reporting CIs.
  34. 34. Section 2: Planning Experiments
  35. 35. The Routine
  36. 36. The routine
 ▪ Step 1: Formulate a research hypothesis
 ▪ Step 2: State an expected effect size
 ▪ Step 3: Design your experiment
 ▪ Step 4: Power analysis
 ▪ Step 5: Write up analysis plan
 ▪ Step 6: Collect data
 ▪ Step 7: Analyze data according to plan
 Step 1 calls for a specific hypothesis:
 ▪ Population
 ▪ Empirical context
 ▪ Treatment(s)
 ▪ Subgroups
 ▪ Outcomes
  37. 37. The routine (Step 2: State an expected effect size)
 ▪ Use prior literature
 ▪ Use existing data
 ▪ Collect your own observational data
 ▪ Small effects need big data
  38. 38. The routine (Step 3: Design your experiment)
 ▪ Follows from step 1
 ▪ Identify:
   ▪ Constraints
   ▪ Threats to validity
   ▪ Ways to increase precision
  39. 39. The routine (Step 4: Power analysis)
 ▪ Simulate experiment with posited effects and design
 ▪ You should feel comfortable that you’ll find a clinically significant effect
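 "Simulate experiment with posited effects and design" can be as simple as the loop below. This is a minimal sketch under stated assumptions, not the tutorial's notebook: the outcome is binary, the test is a two-sided two-proportion z-test, and the rates (0.20 in control vs 0.18 in treatment) and the name `simulate_power` are invented for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(p_control, p_treated, n_per_arm, n_sims=200, alpha=0.05, seed=0):
    """Share of simulated experiments in which a two-sided z-test rejects no effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Draw one simulated experiment under the posited rates.
        c = sum(rng.random() < p_control for _ in range(n_per_arm))
        t = sum(rng.random() < p_treated for _ in range(n_per_arm))
        rate_c, rate_t = c / n_per_arm, t / n_per_arm
        se = sqrt(rate_c * (1 - rate_c) / n_per_arm + rate_t * (1 - rate_t) / n_per_arm)
        if se > 0 and abs(rate_t - rate_c) / se > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical design: the same posited effect at two sample sizes.
power_small = simulate_power(0.20, 0.18, n_per_arm=500)
power_large = simulate_power(0.20, 0.18, n_per_arm=5000)
print(power_small, power_large)
```

 If the simulated power at the planned sample size is low, the design or the sample size needs to change before any data is collected.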
  40. 40. The routine (Step 5: Write up analysis plan)
 ▪ Makes it easier to communicate your study
 ▪ Helps catch problems with plan
 ▪ Keeps you honest as a scientist
  41. 41. The routine (Step 6: Collect data)
 ▪ Collect pre-treatment data (if applicable)
 ▪ Implement
 ▪ Collect experiment data
 ▪ Log sane data
  42. 42. The routine (Step 7: Analyze data according to plan)
 ▪ Wrangle data to get it into a format that you can analyze in R
 ▪ Apply appropriate statistical procedures
 ▪ Analysis should be easy!
  43. 43. Walking through the routine for an experiment
  44. 44. Step 1: Formulating research question
 ▪ Prior research suggests individuals tend to avoid ideologically cross-cutting information
 ▪ Recent paper shows: 1-Pr(click|op)/Pr(click|same)
   ▪ Conservatives: -17%
   ▪ Liberals: -6%
 ▪ Is this causal?
 [Figure: percent cross-cutting content (20% to 50%) at each stage (Random, Potential from network, Exposed, Selected), by viewer affiliation (Conservative, Liberal)]
 Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
  45. 45. Step 1: Hypothesis
 ▪ Hypothesis: Source cues have a causal effect on users’ willingness to consider reading summaries of content when given many choices
 ▪ Treatment: Randomly assigned sources induce selective exposure effects
 ▪ Context: Convenience sample of Turkers on AMT
 ▪ Outcomes: Choosing to read an article summary
 Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
  46. 46. Step 2: Effect size
 ▪ We empirically observe a risk ratio of -6% and -17% (liberals and conservatives, respectively)
 ▪ We might expect the effect to be in the ballpark of this, but smaller, e.g., 4-10%
 ▪ Caveats:
   ▪ Different population
   ▪ Different outcomes (processing summary, vs clicks)
 Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
  47. 47. Step 3: Design experiment … …
  48. 48. Next steps
 ▪ Section 2: Estimation + Power analysis:
   ▪ Power analysis (step 4)
   ▪ Write up analysis plan (step 5)
 ▪ Section 3: Collect data (step 3+6)
   ▪ Code up experiment yourself in PlanOut
   ▪ Data already collected :)
 ▪ Section 4: Analyze data according to plan (step 7)
  49. 49. $ ipython notebook 0-estimation-and-power.ipynb Estimation and Power analysis
  50. 50. Section 3: Designing and Implementing Experiments with PlanOut
  51. 51. PlanOut scripts are high-level descriptions of randomized parameterizations
  52. 52. The PlanOut Idea
 ▪ User experiences are parameterized by experimental assignments
 ▪ PlanOut scripts describe assignment procedures
 ▪ Experiments are PlanOut scripts plus a population
 ▪ Parallel or follow-on experiments are centrally managed
  53. 53. The PlanOut framework
 ▪ Two ways of writing PlanOut scripts
   ▪ Using the PlanOut language
   ▪ Using native API (e.g., Python)
 ▪ PlanOut is multiplatform
   ▪ Reference: Python
   ▪ Production: PHP, Java
   ▪ Alpha: JavaScript, Ruby, Go
  54. 54. Sample PlanOut script (2x2 factorial design)
 button_color = uniformChoice(
   choices=["#ff0000", "#00ff00"],
   unit=userid);
 button_text = uniformChoice(
   choices=["I'm voting", "I'm a voter"],
   unit=userid);
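 To make the idea of a per-user assignment procedure concrete, here is a standard-library sketch of the same 2x2 design. It is not PlanOut's API or implementation: `uniform_choice` and `assign` are hypothetical helpers, and the hashing scheme (SHA-1 of a parameter-specific salt plus the unit) is an assumption chosen so that assignment is deterministic per user and the two parameters vary independently of each other.

```python
import hashlib


def uniform_choice(choices, unit, salt):
    """Deterministically pick one of `choices` for `unit` (hypothetical helper, not PlanOut)."""
    digest = hashlib.sha1(f"{salt}.{unit}".encode("utf-8")).hexdigest()
    return choices[int(digest[:15], 16) % len(choices)]


def assign(userid):
    """Return the experimental parameters for one user in the 2x2 design."""
    return {
        "button_color": uniform_choice(["#ff0000", "#00ff00"], userid, salt="button_color"),
        "button_text": uniform_choice(["I'm voting", "I'm a voter"], userid, salt="button_text"),
    }


# The same user always gets the same parameters.
print(assign(42))
print(assign(42) == assign(42))

# Across many users, all four cells of the factorial design show up.
cells = {(a["button_color"], a["button_text"]) for a in map(assign, range(1000))}
print(len(cells))
```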
  55. 55. Compiled PlanOut Code
 {
   "op": "seq",
   "seq": [
     {
       "op": "set",
       "var": "button_color",
       "value": {
         "choices": {"op": "array", "values": ["#ff0000", "#00ff00"]},
         "unit": {"op": "get", "var": "userid"},
         "op": "uniformChoice"
       }
     },
     {
       "op": "set",
       "var": "button_text",
       "value": {
         "choices": {"op": "array", "values": ["I'm voting", "I'm a voter"]},
         "unit": {"op": "get", "var": "userid"},
         "op": "uniformChoice"
       }
     }
   ]
 }
  56. 56. $ ipython notebook 1-planout-intro.ipynb Using PlanOut
  57. 57. $ ipython notebook 2-making-your-own-data.ipynb Using PlanOut
  58. 58. open: webapp/app.py Python Web Application Walkthrough
  59. 59. open: webapp/extract_data.py Extracting Data from Logs
  60. 60. Section 4: Analyzing Experimental Data
  61. 61. Outline
 1. Dependence (non-i.i.d. data)
 2. The bootstrap
 3. Using covariates
 4. Data reduction
 5. “Big Data Guide”
 6. Example in R
  62. 62. http://www.tucsonsentinel.com/local/report/062712_az_students/state-natl-rankings-vary-widely-az-student-performance/ 100 items 100 users 100,000 impressions Does N = 100,000?
  63. 63. Dependence
 ▪ Most formulas for confidence intervals assume that each individual data point is independent of all the others.
 ▪ In practice, we often have repeated observations of users or content items.
 ▪ Ignoring this fact in inference will tend to make confidence intervals anti-conservative.
 See Bakshy and Eckles, “Uncertainty in Online Experiments with Dependent Data” (KDD 2013)
  64. 64. The Bootstrap confidence intervals
  65. 65. I both like this and find it to be confusing. The way I explain it is that the bootstrap simulates the process of drawing samples from a very large population. I think if you just relabeled the top as ‘Sampling from the whole population’ and ‘Sampling from your own data (bootstrap)’, this might make it clearer? I also wonder if you can use my benchmarking paper as illustrations for SEs vs N as a function of ICC and/or my plot that shows how the bootstrap
  66. 66. Bootstrapping in Practice
 1. Generate random sub-samples R1, R2, …, R500 from all your data
 2. Compute statistics or estimate model parameters s1, s2, …, s500
 3. Get a distribution over statistic of interest (e.g. the ATE)
   - take mean
   - CIs == 95% quantiles
   - SEs == standard deviation
 [Figure: histogram of the statistic; x-axis: Statistic (-2 to 2), y-axis: Count (0.0 to 7.5)]
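 The three steps on this slide map onto a few lines of code. The sketch below assumes i.i.d. data and reads the slide's "sub-samples" as resamples drawn with replacement at the original sample size (the usual bootstrap); the outcome data is simulated and `bootstrap_ate` is a hypothetical helper name.

```python
import random
from statistics import mean, stdev


def bootstrap_ate(control, treated, n_reps=500, seed=0):
    """Bootstrap the difference in means; returns (mean, SE, 95% interval)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        # Resample each arm with replacement, then recompute the statistic.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        reps.append(mean(t) - mean(c))
    reps.sort()
    lower = reps[int(0.025 * n_reps)]       # approximate 2.5% quantile
    upper = reps[int(0.975 * n_reps) - 1]   # approximate 97.5% quantile
    return mean(reps), stdev(reps), (lower, upper)


# Simulated outcomes (invented): the true effect is 0.5 with unit variance.
data_rng = random.Random(1)
control = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
treated = [data_rng.gauss(0.5, 1.0) for _ in range(200)]

estimate, se, (lower, upper) = bootstrap_ate(control, treated)
print(estimate, se, lower, upper)
```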
  67. 67. Using Covariates
 ▪ With simple random assignment, using covariates is not necessary.
 ▪ However, you can improve precision of ATE estimates if covariates explain a lot of variation in the potential outcomes.
 ▪ Can be added to a linear model and SEs should decrease if they are helpful.
 ▪ Should always at least report results without using covariates.
  68. 68. Data Reduction
 Raw data (N rows):
 Subject | Di | Yi
 Evan | 0 | 1
 Ashley | 0 | 1
 Greg | 1 | 0
 Leena | 1 | 0
 Ema | 0 | 0
 Seamus | 1 | 1
 Reduced data (# treatments rows):
 D | Y=1 | Y=0
 0 | 2 | 1
 1 | 1 | 2
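 The reduction on this slide is a group-and-count. The sketch below uses the six rows from the slide and shows that the difference in success rates can be computed from the reduced counts alone; the variable and function names are hypothetical.

```python
from collections import Counter

# (subject, D, Y) rows as listed on the slide.
rows = [
    ("Evan", 0, 1), ("Ashley", 0, 1), ("Greg", 1, 0),
    ("Leena", 1, 0), ("Ema", 0, 0), ("Seamus", 1, 1),
]

# Reduce N rows to one count per (treatment, outcome) cell.
counts = Counter((d, y) for _, d, y in rows)


def success_rate(d):
    """Share of Y=1 outcomes in condition d, computed from the reduced counts."""
    successes, failures = counts[(d, 1)], counts[(d, 0)]
    return successes / (successes + failures)


ate = success_rate(1) - success_rate(0)
print(dict(counts))
print(ate)
```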
  69. 69. Data Reduction with Covariates
 Raw data (N rows):
 Subject | Xi | Di | Yi
 Evan | M | 0 | 1
 Ashley | F | 0 | 1
 Greg | M | 1 | 0
 Leena | F | 1 | 0
 Ema | F | 0 | 0
 Seamus | M | 1 | 1
 Reduced data (# treatments X # groups rows):
 X | D | Y=1 | Y=0
 M | 0 | 1 | 0
 M | 1 | 1 | 1
 F | 0 | |
 F | 1 | |
 Can analyze the reduced data using a weighted linear model.
  70. 70. Data Reduction with Dependent Data
 Subject | Di | Yij
 Evan | 1 | 1
 Evan | 1 | 0
 Ashley | 0 | 1
 Ashley | 0 | 1
 Ashley | 0 | 1
 Greg | 1 | 0
 Leena | 1 | 0
 Leena | 1 | 1
 Ema | 0 | 0
 Seamus | 1 | 1
 1. Create bootstrap replicates R1, R2, R3
 2. Reduce the replicates as if they’re i.i.d.: r1, r2, r3
 3. Compute statistics on reduced data: s1, s2, s3
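 One way to read these three steps in code, using the ten rows from the slide. This is a sketch under an assumption: replicates are formed by resampling whole subjects with replacement, so each subject's repeated observations stay together. The slide does not say which resampling scheme is used, and all names below are hypothetical.

```python
import random
from collections import Counter, defaultdict

# (subject, D, Y) rows as listed on the slide; subjects appear more than once.
rows = [
    ("Evan", 1, 1), ("Evan", 1, 0), ("Ashley", 0, 1), ("Ashley", 0, 1),
    ("Ashley", 0, 1), ("Greg", 1, 0), ("Leena", 1, 0), ("Leena", 1, 1),
    ("Ema", 0, 0), ("Seamus", 1, 1),
]

by_subject = defaultdict(list)
for subject, d, y in rows:
    by_subject[subject].append((d, y))
subjects = sorted(by_subject)


def ate_from_counts(counts):
    """Difference in Y=1 rates from reduced counts; None if an arm is empty."""
    rates = {}
    for d in (0, 1):
        n = counts[(d, 1)] + counts[(d, 0)]
        if n == 0:
            return None
        rates[d] = counts[(d, 1)] / n
    return rates[1] - rates[0]


full_counts = Counter(pair for s in subjects for pair in by_subject[s])
full_ate = ate_from_counts(full_counts)

rng = random.Random(0)
stats = []
for _ in range(500):
    # Step 1: a replicate resamples subjects, not individual rows.
    sample = rng.choices(subjects, k=len(subjects))
    # Step 2: reduce the replicate to counts as if its rows were i.i.d.
    counts = Counter(pair for s in sample for pair in by_subject[s])
    # Step 3: compute the statistic on the reduced data.
    est = ate_from_counts(counts)
    if est is not None:  # with so few subjects, some replicates miss an arm
        stats.append(est)

print(full_ate, len(stats))
```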
  71. 71. Experiment Analysis (i.i.d. data)
 Data Size | Fits in memory (< 2M rows) | Doesn’t fit in memory
 Using Covariates | linear regression | data reduction + weighted linear models
 No Covariates | t-test | data reduction
  72. 72. Experiment Analysis (dependent data)
 Data Size | Fits in memory (< 2M rows) | Doesn’t fit in memory
 Using Covariates | random effects models or bootstrap + linear models | bootstrap + data reduction + weighted linear models
 No Covariates | random effects models or bootstrap in R | bootstrap + data reduction
  73. 73. $ ipython notebook 4-analyzing-experiments.ipynb Hands On Analyzing Experiments
  74. 74. (c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0
