Successfully reported this slideshow.
Upcoming SlideShare
×

# Www tutorial

5,789 views

Published on

Online Experiments for Computational Social Science
WWW 15 Tutorial

Published in: Science
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

### Www tutorial

1. 1. Eytan Bakshy Sean J. Taylor Facebook Core Data Science WWW 2015, Florence, Italy May 19, 2015 Online Experiments for Computational Social Science WWW 15 Tutorial
2. 2. Outline 1. Introduction and Causal Inference (30 minutes) 2. Planning Experiments (30 minutes)    <30 minute break>  Discussion + analysis exercise (15 minutes) 3. Designing and Implementing Experiments (45 minutes) 4. Analyzing Experiments (30 minutes)
3. 3. Everything we assume ▪ Minimum requirements ▪ Some basic knowledge of statistics ▪ The ability to follow code ▪ Necessary to understand 90% ▪ Intermediate knowledge of R and beginner knowledge of Python ▪ Necessary to understand 100% ▪ Advanced knowledge of R, intermediate Python, intermediate stats, design of experiments
4. 4. Don’t panic!
5. 5. Don’t panic! Buddy up! Group learning is good for you!
6. 6. Software requirements ▪ If you are the kind of person who wants to tinker with code yourself, here are the software requirements ▪ Section 1: None ▪ Section 2: IPython + R ▪ Part 3: IPython + PlanOut ▪ Part 4: Python + R ▪ Can’t install the software? Buddy up
7. 7. Follow along: http://bit.ly/ www15experiments
8. 8. Section 1: Introduction and Causal Inference
9. 9. Obligatory
10. 10. Associations (Xi,Yi) ▪ Health:(smoking,cancer) ▪ Sports:(runningtheball,winninggames) ▪ Education:(completesMOOCcourse,getsapromotion) ▪ SocialMedia:(likesapageonFacebook,buysaproduct)
11. 11. Possible Relationships ▪Pr(Yi |Xi)=Pr(Yi)  (independence) ▪Pr(Yi |Xi)≠Pr(Yi)  (dependence) Dependencebetweenvariablesisuseful,but interpretationofthisrelationshipcanoftenbetricky.
12. 12. Possible Causal Relationships Allthreecausalrelationshipsarepossiblewhen there’sadependencybetweenvariables. Xi Yi Xi Yi Xi Yi shouldn’t there be an arrow going both ways here? I also wonder if this is confusing because it doesn’t leave room for a mediating variable, X->M->Y.
13. 13. Correlation without Causation: Introducing Zi ▪Smoking causes cancer   (genetics) ▪Running the ball causes winning games   (having a lead) ▪Completing a MOOC causes a promotion  (self-motivation) ▪Liking a page on Facebook causes a person to buy a product  (brand loyalty) Xi Yi Ui ?
14. 14. Bigger Example
15. 15. Why Causal Inference? 1. Science:   Why did something happen? 2. Decisions:  What will happen if I change something?
16. 16. Causal Inference for Science 1. Associationsbetweentwo variablesarealwaysmore interestingwhenthey’re causal. 2. Understandinga phenomenonisdifferent frompredictingit. Explanatory Modeling Predictive Modeling model captures causal function model captures association model carefully constructed from theory models constructed from data retrospective forward-looking minimize bias minimize variance basic science applied science make better data make better features
17. 17. Two Kinds of Out-of-Sample People we Observe Similar People Similar People under different circumstances Machine Learning / Statistics Experiments
18. 18. social adnon-social ad Attribute Outcomes to Causes Social Inﬂuence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.
19. 19. Attribute Outcomes to Causes Social Inﬂuence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012. 1 liking friend 0 friends shown 1 liking friend 1 friend shown
21. 21. News Feed (2011) News Feed (2014) Test complete alternatives
22. 22. News Feed (2011) News Feed (2014) Explore a design space
23. 23. A Social Science Problem Causalinferenceiseasyforthenaturalsciences: ▪ Manipulationiseasy! ▪ Molecules,cells,animals,plantsareexchangeable! Forpeople,there’salwayssomethingwemaynotbeobserving perfectly. ▪ Latenttraitsprovidealternativeexplanationsthatweoftencannot ruleout. ▪ Experimentsareoftentheonlywayofrulingthemout.
24. 24. Potential Outcomes Framework Yi(1) Yi(0) Di Effect Eytan 10 -5 15 Anna 0 5 -5 Gary -20 5 -25 Linda -10 10 -20 Edna -5 0 -5 Sean 10 5 5 Mean -2.14 2.86 -5 Yi(1) is the outcome under treatment Yi(0) is the outcome under control
25. 25. Fundamental Problem of Causal Inference Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 5 0 ? Gary 5 0 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean We only ever observe a unit in either treatment or control. Individual level effects are never deﬁned.
26. 26. Confounding Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 0 1 ? Gary 5 0 ? Linda -10 1 ? Edna -5 1 ? Sean 5 0 ? Here we assumed they selected Di to maximize their outcome. Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 5 0 ? Gary 5 0 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean 10 5 0.29 5
27. 27. Randomization Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 0 1 ? Gary 5 0 ? Linda -10 1 ? Edna -5 1 ? Sean 5 0 ? With randomly assigned Di, we get unbiased effect estimates. The difference in means is called the Average Treatment Effect Yi(1) Yi(0) Di Effect Eytan -5 0 ? Anna 0 1 ? Gary -20 1 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean -3.33 1.67 0.5 -5
28. 28. The Average Treatment Effect ▪ Duetothelinearityofexpectations,wecanseparatethe ATEintotwomeasurements. ▪ Inpractice,weestimatetheseexpectationsusingthe meansinourtreatmentandcontrolgroups. ATE = = E[Yi(1) Yi(0)] = E[Yi(1)] E[Yi(0)] [ATE = 1 N P i2T Yi(1) 1 M P i2C Yi(0)
29. 29. Uncertainty ▪ HowsureareweabouttheATEwemeasured? VariabilityinATEestimatescomesfrom: 1. variationinrandomassignment 2. variationinsubjects
30. 30. Variation due to Random Assignment Histogram of reps reps Frequency -25 -20 -15 -10 -5 0 5 10 0102030405060 For a given set of potential outcomes and a randomization, we may not always arrive at the right estimate, but it will not be biased. -what is this illustrating? sampling variation in subjects or in terms of permutation tests / randomization inference? -I wonder if its worth mentioning variability due to sampling diﬀerent subjects vs assignments?
31. 31. Standard Errors ▪ HowcanwequantifyuncertaintyaboutATEswemeasure? ▪ SEdecreaseswith√N ▪ SEissmallerwhenvariancesofpotentialoutcomesare smaller ▪ wantn0 andni tobesimilarifthevariancesarethesame ▪ wantmoreobservationsforhighervarianceconditions cSE([ATE) = q 1 n0+n1 q n1 n0 Var(Yi(0)) + n0 n1 Var(Yi(1))
32. 32. Confidence Intervals ▪ 95%confidenceintervalis1.96times ▪ SEsshrinkwiththesquarerootofthenumberof observations,sotodoubletheprecisionofyour experimentyouneedfourtimesthenumberofsubjects. cSE([ATE)
33. 33. Confidence Intervals vs p-values ▪ Ofteneasytogetstatisticalsignificancewithbigdata. Mostthingshaveeffects! ▪ Hardertogetbig,practicallysignificant,effects! ▪ CIsmakeuncertaintyaboutanestimatemorecredible. ▪ ShouldfavorreportingCIs.
34. 34. Section 2: Planning Experiments
35. 35. The Routine
36. 36. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ A specific hypothesis: ▪ Population ▪ Empirical context ▪ Treatment(s) ▪ Subgroups ▪ Outcomes
37. 37. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Use prior literature ▪ Use existing data ▪ Collect your own  observational data ▪ Small effects need big data
38. 38. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Follows from step 1 ▪ Identify: ▪ Constraints ▪ Threats to validity ▪ Ways to increase  precision
39. 39. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Simulate experiment with posited effects and design ▪ You should feel comfortable that you’ll find a clinically significant effect
40. 40. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Makes it easier to communicate your study ▪ Helps catch problems with plan ▪ Keeps you honest as a scientist
41. 41. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Collect pre-treatment data (if applicable) ▪ Implement ▪ Collect experiment data ▪ Log sane data
42. 42. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Wrangle data to get it into a format that you can analyze in R ▪ Apply appropriate statistical procedures ▪ Analysis should be easy!
43. 43. Walking through the routine for an experiment
44. 44. Step 1: Formulating research question  ▪ Prior research suggests individuals tend to avoid ideologically cross-cutting information ▪ Recent paper shows:  1-Pr(click|op)/Pr(click|same) ▪ Conservatives = -17% ▪ Liberals: -6% ▪ Is this causal? Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015 ● ● ● ● ● ● ● ● 20% 30% 40% 50% Random Potential from network Exposed Selected Percentcross−cuttingcontent Viewer affiliation ● ● Conservative Liberal
45. 45. Step 1: Hypothesis ▪ Hypothesis: Source cues have a causal effect on users’ willingness to consider reading summaries of content when given many choices ▪ Treatment: Randomly assigned sources induce selective exposure effects ▪ Context: Convenience sample of Turkers on AMT ▪ Outcomes: Choosing read an article summary Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
46. 46. Step 2: Effect size ▪ We empirically observe a risk ratio of -6% and -17% ▪ We might expect the effect to be in the ballpark of this, but smaller, e.g., 4-10% ▪ Caveats: ▪ Different population ▪ Different outcomes (processing summary, vs clicks) Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
47. 47. Step 3: Design experiment … …
48. 48. Next steps ▪ Section 2: Estimation+Power analysis: ▪ Power analysis (step 4) ▪ Write up analysis plan (step 5) ▪ Section 3: Collect data (step 3+6) ▪ Code up experiment yourself in PlanOut ▪ Data already collected :) ▪ Section 4: Analyze data according to plan (step 7)
49. 49. \$ ipython notebook 0-estimation-and-power.ipynb Estimation and Power analysis
50. 50. Section 3: Designing and Implementing Experiments with PlanOut
51. 51. PlanOut scripts are high-level descriptions of randomized parameterizations
52. 52. The PlanOut Idea ▪ Userexperiencesareparameterizedbyexperimental assignments ▪ PlanOutscriptsdescribeassignmentprocedures ▪ ExperimentsarePlanOutscriptsplusapopulation ▪ Parallelorfollow-onexperimentsarecentrallymanaged
53. 53. The PlanOut framework ▪ TwowaysofwritingPlanOutscripts ▪ UsingthePlanOutlanguage ▪ UsingnativeAPI(e.g.,Python) ▪ PlanOutismultiplatform ▪ Reference:Python ▪ Production:PHP,Java ▪ Alpha:JavaScript,Ruby,Go
54. 54. Sample PlanOut script button_color = uniformChoice( choices=["#ff0000", "#00ff00"], unit=userid); button_text = uniformChoice( choices=["I'm voting", "I'm a voter”], unit=userid); 2x2 factorial design
55. 55. Compiled PlanOut Code { "op": "seq", "seq": [ { "op": "set", "var": "button_color", "value": { "choices": { "op": "array", "values": [ "#ff0000", "#00ff00" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" } }, { "op": "set", "var": "button_text", "value": { "choices": { "op": "array", "values": [ "I'm voting", "I'm a voter" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" } } ]
56. 56. \$ ipython notebook 1-planout-intro.ipynb Using PlanOut
57. 57. \$ ipython notebook 2-making-your-own-data.ipynb Using PlanOut
58. 58. open: webapp/app.py Python Web Application Walkthrough
59. 59. open: webapp/extract_data.py Extracting Data from Logs
60. 60. Section 4: Analyzing Experimental Data
61. 61. Outline 1. Dependence(non-i.i.d.data) 2. Thebootstrap 3. Usingcovariates 4. Datareduction 5. “BigDataGuide” 6. ExampleinR
62. 62. http://www.tucsonsentinel.com/local/report/062712_az_students/state-natl-rankings-vary-widely-az-student-performance/ 100 items 100 users 100,000 impressions Does N = 100,000?
63. 63. Dependence ▪ Mostformulasforconfidenceintervalsassumethateach individualdatapointisindependentofalltheothers. ▪ Inpractice,weoftenhaverepeatedobservationsofusers orcontentitems. ▪ Ignoringthisfactininferencewilltendtomakeconfidence intervalsanti-conservative. SeeBakshyandEckles,“UncertaintyinOnlineExperiments withDependentData”(KDD2013)
64. 64. The Bootstrap confidence intervals
65. 65. I both like this and ﬁnd it to be confusing. The way I explain it is that the bootstrap simulates the process of drawing samples from a very large population. I think if you just relabeled the top as ‘Sampling from the whole population’ and ‘Sampling from your own data (bootstrap)’, this might make it clearer? I also wonder if you can use my benchmarking paper as illustrations for SEs vs N as a function of ICC and/or my plot that shows how the bootstrap
66. 66. Bootstrapping in Practice R1 All Your Data R2 … R500 Generate random sub-samples s1 s2 s500 Compute statistics or estimate model parameters … } 0.0 2.5 5.0 7.5 -2 -1 0 1 2 Statistic Count Get a distribution over statistic of interest (e.g. the ATE) - take mean - CIs == 95% quantiles - SEs == standard deviation
67. 67. Using Covariates ▪ Withsimplerandomassignment,usingcovariatesisnot necessary. ▪ However,youcanimproveprecisionofATEestimatesif covariatesexplainalotofvariationinthepotential outcomes. ▪ CanbeaddedtoalinearmodelandSEsshoulddecreaseif theyarehelpful. ▪ Shouldalwaysatleastreportresultswithoutusing covariates.
68. 68. Data Reduction Subject Di Yi Evan 0 1 Ashley 0 1 Greg 1 0 Leena 1 0 Ema 0 0 Seamus 1 1 D Y=1 Y=0 0 2 1 1 1 2 N # treatments
69. 69. Data Reduction with Covariates Subject Xi Di Yi Evan M 0 1 Ashley F 0 1 Greg M 1 0 Leena F 1 0 Ema F 0 0 Seamus M 1 1 X D Y=1 Y=0 M 0 1 0 M 1 1 1 F 0 F 1 N # treatments X # groups Can analyze the reduced data using a weighted linear model.
70. 70. Data Reduction with Dependent Data Subject Di Yij Evan 1 1 Evan 1 0 Ashley 0 1 Ashley 0 1 Ashley 0 1 Greg 1 0 Leena 1 0 Leena 1 1 Ema 0 0 Seamus 1 1 Create bootstrap replicates R1 R2 R3 reduce the replicates as if they’re i.i.d. r1 r2 r3 s1 s2 s3 compute statistics on reduced data
71. 71. Experiment Analysis (i.i.d. data) Data Size Fits in memory < 2M rows Doesn’t ﬁt in memory Using Covariates linear regression data reduction + weighted linear models No Covariates t-test data reduction
72. 72. Experiment Analysis (dependent data) Data Size Fits in memory < 2M rows Doesn’t ﬁt in memory Using Covariates random effects models or bootstrap + linear models bootstrap + data reduction + weighted linear models No Covariates random effects models or bootstrap in R bootstrap + data reduction
73. 73. \$ ipython notebook 4-analyzing-experiments.ipynb Hands On Analyzing Experiments