Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

5,789 views

Published on

Online Experiments for Computational Social Science

WWW 15 Tutorial

Published in:
Science

No Downloads

Total views

5,789

On SlideShare

0

From Embeds

0

Number of Embeds

4,371

Shares

0

Downloads

57

Comments

0

Likes

5

No embeds

No notes for slide

- 1. Eytan Bakshy Sean J. Taylor Facebook Core Data Science WWW 2015, Florence, Italy May 19, 2015 Online Experiments for Computational Social Science WWW 15 Tutorial
- 2. Outline 1. Introduction and Causal Inference (30 minutes) 2. Planning Experiments (30 minutes) <30 minute break> Discussion + analysis exercise (15 minutes) 3. Designing and Implementing Experiments (45 minutes) 4. Analyzing Experiments (30 minutes)
- 3. Everything we assume ▪ Minimum requirements ▪ Some basic knowledge of statistics ▪ The ability to follow code ▪ Necessary to understand 90% ▪ Intermediate knowledge of R and beginner knowledge of Python ▪ Necessary to understand 100% ▪ Advanced knowledge of R, intermediate Python, intermediate stats, design of experiments
- 4. Don’t panic!
- 5. Don’t panic! Buddy up! Group learning is good for you!
- 6. Software requirements ▪ If you are the kind of person who wants to tinker with code yourself, here are the software requirements ▪ Section 1: None ▪ Section 2: IPython + R ▪ Part 3: IPython + PlanOut ▪ Part 4: Python + R ▪ Can’t install the software? Buddy up
- 7. Follow along: http://bit.ly/ www15experiments
- 8. Section 1: Introduction and Causal Inference
- 9. Obligatory
- 10. Associations (Xi,Yi) ▪ Health:(smoking,cancer) ▪ Sports:(runningtheball,winninggames) ▪ Education:(completesMOOCcourse,getsapromotion) ▪ SocialMedia:(likesapageonFacebook,buysaproduct)
- 11. Possible Relationships ▪Pr(Yi |Xi)=Pr(Yi) (independence) ▪Pr(Yi |Xi)≠Pr(Yi) (dependence) Dependencebetweenvariablesisuseful,but interpretationofthisrelationshipcanoftenbetricky.
- 12. Possible Causal Relationships Allthreecausalrelationshipsarepossiblewhen there’sadependencybetweenvariables. Xi Yi Xi Yi Xi Yi shouldn’t there be an arrow going both ways here? I also wonder if this is confusing because it doesn’t leave room for a mediating variable, X->M->Y.
- 13. Correlation without Causation: Introducing Zi ▪Smoking causes cancer (genetics) ▪Running the ball causes winning games (having a lead) ▪Completing a MOOC causes a promotion (self-motivation) ▪Liking a page on Facebook causes a person to buy a product (brand loyalty) Xi Yi Ui ?
- 14. Bigger Example
- 15. Why Causal Inference? 1. Science: Why did something happen? 2. Decisions: What will happen if I change something?
- 16. Causal Inference for Science 1. Associationsbetweentwo variablesarealwaysmore interestingwhenthey’re causal. 2. Understandinga phenomenonisdifferent frompredictingit. Explanatory Modeling Predictive Modeling model captures causal function model captures association model carefully constructed from theory models constructed from data retrospective forward-looking minimize bias minimize variance basic science applied science make better data make better features
- 17. Two Kinds of Out-of-Sample People we Observe Similar People Similar People under different circumstances Machine Learning / Statistics Experiments
- 18. social adnon-social ad Attribute Outcomes to Causes Social Inﬂuence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012.
- 19. Attribute Outcomes to Causes Social Inﬂuence in Social Advertising: Evidence from Field Experiments. Bakshy, Eckles, Yan, Rosenn. EC 2012. 1 liking friend 0 friends shown 1 liking friend 1 friend shown
- 20. Causal Inference for Decisions ▪ Health:Quitsmoking?Brushyourteeth?:) ▪ Sports:runtheballmore? ▪ Socialmedia:recruitmoreFacebookfansformypage? ▪ Advertising:purchaseads? ▪ Education:investtimeincompletingaMOOC?
- 21. News Feed (2011) News Feed (2014) Test complete alternatives
- 22. News Feed (2011) News Feed (2014) Explore a design space
- 23. A Social Science Problem Causalinferenceiseasyforthenaturalsciences: ▪ Manipulationiseasy! ▪ Molecules,cells,animals,plantsareexchangeable! Forpeople,there’salwayssomethingwemaynotbeobserving perfectly. ▪ Latenttraitsprovidealternativeexplanationsthatweoftencannot ruleout. ▪ Experimentsareoftentheonlywayofrulingthemout.
- 24. Potential Outcomes Framework Yi(1) Yi(0) Di Effect Eytan 10 -5 15 Anna 0 5 -5 Gary -20 5 -25 Linda -10 10 -20 Edna -5 0 -5 Sean 10 5 5 Mean -2.14 2.86 -5 Yi(1) is the outcome under treatment Yi(0) is the outcome under control
- 25. Fundamental Problem of Causal Inference Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 5 0 ? Gary 5 0 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean We only ever observe a unit in either treatment or control. Individual level effects are never deﬁned.
- 26. Confounding Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 0 1 ? Gary 5 0 ? Linda -10 1 ? Edna -5 1 ? Sean 5 0 ? Here we assumed they selected Di to maximize their outcome. Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 5 0 ? Gary 5 0 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean 10 5 0.29 5
- 27. Randomization Yi(1) Yi(0) Di Effect Eytan 10 1 ? Anna 0 1 ? Gary 5 0 ? Linda -10 1 ? Edna -5 1 ? Sean 5 0 ? With randomly assigned Di, we get unbiased effect estimates. The difference in means is called the Average Treatment Effect Yi(1) Yi(0) Di Effect Eytan -5 0 ? Anna 0 1 ? Gary -20 1 ? Linda 10 0 ? Edna 0 0 ? Sean 10 1 ? Mean -3.33 1.67 0.5 -5
- 28. The Average Treatment Effect ▪ Duetothelinearityofexpectations,wecanseparatethe ATEintotwomeasurements. ▪ Inpractice,weestimatetheseexpectationsusingthe meansinourtreatmentandcontrolgroups. ATE = = E[Yi(1) Yi(0)] = E[Yi(1)] E[Yi(0)] [ATE = 1 N P i2T Yi(1) 1 M P i2C Yi(0)
- 29. Uncertainty ▪ HowsureareweabouttheATEwemeasured? VariabilityinATEestimatescomesfrom: 1. variationinrandomassignment 2. variationinsubjects
- 30. Variation due to Random Assignment Histogram of reps reps Frequency -25 -20 -15 -10 -5 0 5 10 0102030405060 For a given set of potential outcomes and a randomization, we may not always arrive at the right estimate, but it will not be biased. -what is this illustrating? sampling variation in subjects or in terms of permutation tests / randomization inference? -I wonder if its worth mentioning variability due to sampling diﬀerent subjects vs assignments?
- 31. Standard Errors ▪ HowcanwequantifyuncertaintyaboutATEswemeasure? ▪ SEdecreaseswith√N ▪ SEissmallerwhenvariancesofpotentialoutcomesare smaller ▪ wantn0 andni tobesimilarifthevariancesarethesame ▪ wantmoreobservationsforhighervarianceconditions cSE([ATE) = q 1 n0+n1 q n1 n0 Var(Yi(0)) + n0 n1 Var(Yi(1))
- 32. Confidence Intervals ▪ 95%confidenceintervalis1.96times ▪ SEsshrinkwiththesquarerootofthenumberof observations,sotodoubletheprecisionofyour experimentyouneedfourtimesthenumberofsubjects. cSE([ATE)
- 33. Confidence Intervals vs p-values ▪ Ofteneasytogetstatisticalsignificancewithbigdata. Mostthingshaveeffects! ▪ Hardertogetbig,practicallysignificant,effects! ▪ CIsmakeuncertaintyaboutanestimatemorecredible. ▪ ShouldfavorreportingCIs.
- 34. Section 2: Planning Experiments
- 35. The Routine
- 36. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ A specific hypothesis: ▪ Population ▪ Empirical context ▪ Treatment(s) ▪ Subgroups ▪ Outcomes
- 37. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Use prior literature ▪ Use existing data ▪ Collect your own observational data ▪ Small effects need big data
- 38. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Follows from step 1 ▪ Identify: ▪ Constraints ▪ Threats to validity ▪ Ways to increase precision
- 39. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Simulate experiment with posited effects and design ▪ You should feel comfortable that you’ll find a clinically significant effect
- 40. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Makes it easier to communicate your study ▪ Helps catch problems with plan ▪ Keeps you honest as a scientist
- 41. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Collect pre-treatment data (if applicable) ▪ Implement ▪ Collect experiment data ▪ Log sane data
- 42. The routine ▪ Step 1: Formulate a research hypothesis ▪ Step 2: State an expected effect size ▪ Step 3: Design your experiment ▪ Step 4: Power analysis ▪ Step 5: Write up analysis plan ▪ Step 6: Collect data ▪ Step 7: Analyze data according to plan ▪ Wrangle data to get it into a format that you can analyze in R ▪ Apply appropriate statistical procedures ▪ Analysis should be easy!
- 43. Walking through the routine for an experiment
- 44. Step 1: Formulating research question ▪ Prior research suggests individuals tend to avoid ideologically cross-cutting information ▪ Recent paper shows: 1-Pr(click|op)/Pr(click|same) ▪ Conservatives = -17% ▪ Liberals: -6% ▪ Is this causal? Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015 ● ● ● ● ● ● ● ● 20% 30% 40% 50% Random Potential from network Exposed Selected Percentcross−cuttingcontent Viewer affiliation ● ● Conservative Liberal
- 45. Step 1: Hypothesis ▪ Hypothesis: Source cues have a causal effect on users’ willingness to consider reading summaries of content when given many choices ▪ Treatment: Randomly assigned sources induce selective exposure effects ▪ Context: Convenience sample of Turkers on AMT ▪ Outcomes: Choosing read an article summary Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
- 46. Step 2: Effect size ▪ We empirically observe a risk ratio of -6% and -17% ▪ We might expect the effect to be in the ballpark of this, but smaller, e.g., 4-10% ▪ Caveats: ▪ Different population ▪ Different outcomes (processing summary, vs clicks) Exposure to Ideologically Diverse News and Opinion on Facebook. Bakshy, Messing, Adamic. Science. 2015
- 47. Step 3: Design experiment … …
- 48. Next steps ▪ Section 2: Estimation+Power analysis: ▪ Power analysis (step 4) ▪ Write up analysis plan (step 5) ▪ Section 3: Collect data (step 3+6) ▪ Code up experiment yourself in PlanOut ▪ Data already collected :) ▪ Section 4: Analyze data according to plan (step 7)
- 49. $ ipython notebook 0-estimation-and-power.ipynb Estimation and Power analysis
- 50. Section 3: Designing and Implementing Experiments with PlanOut
- 51. PlanOut scripts are high-level descriptions of randomized parameterizations
- 52. The PlanOut Idea ▪ Userexperiencesareparameterizedbyexperimental assignments ▪ PlanOutscriptsdescribeassignmentprocedures ▪ ExperimentsarePlanOutscriptsplusapopulation ▪ Parallelorfollow-onexperimentsarecentrallymanaged
- 53. The PlanOut framework ▪ TwowaysofwritingPlanOutscripts ▪ UsingthePlanOutlanguage ▪ UsingnativeAPI(e.g.,Python) ▪ PlanOutismultiplatform ▪ Reference:Python ▪ Production:PHP,Java ▪ Alpha:JavaScript,Ruby,Go
- 54. Sample PlanOut script button_color = uniformChoice( choices=["#ff0000", "#00ff00"], unit=userid); button_text = uniformChoice( choices=["I'm voting", "I'm a voter”], unit=userid); 2x2 factorial design
- 55. Compiled PlanOut Code { "op": "seq", "seq": [ { "op": "set", "var": "button_color", "value": { "choices": { "op": "array", "values": [ "#ff0000", "#00ff00" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" } }, { "op": "set", "var": "button_text", "value": { "choices": { "op": "array", "values": [ "I'm voting", "I'm a voter" ] }, "unit": { "op": "get", "var": "userid" }, "op": "uniformChoice" } } ]
- 56. $ ipython notebook 1-planout-intro.ipynb Using PlanOut
- 57. $ ipython notebook 2-making-your-own-data.ipynb Using PlanOut
- 58. open: webapp/app.py Python Web Application Walkthrough
- 59. open: webapp/extract_data.py Extracting Data from Logs
- 60. Section 4: Analyzing Experimental Data
- 61. Outline 1. Dependence(non-i.i.d.data) 2. Thebootstrap 3. Usingcovariates 4. Datareduction 5. “BigDataGuide” 6. ExampleinR
- 62. http://www.tucsonsentinel.com/local/report/062712_az_students/state-natl-rankings-vary-widely-az-student-performance/ 100 items 100 users 100,000 impressions Does N = 100,000?
- 63. Dependence ▪ Mostformulasforconfidenceintervalsassumethateach individualdatapointisindependentofalltheothers. ▪ Inpractice,weoftenhaverepeatedobservationsofusers orcontentitems. ▪ Ignoringthisfactininferencewilltendtomakeconfidence intervalsanti-conservative. SeeBakshyandEckles,“UncertaintyinOnlineExperiments withDependentData”(KDD2013)
- 64. The Bootstrap confidence intervals
- 65. I both like this and ﬁnd it to be confusing. The way I explain it is that the bootstrap simulates the process of drawing samples from a very large population. I think if you just relabeled the top as ‘Sampling from the whole population’ and ‘Sampling from your own data (bootstrap)’, this might make it clearer? I also wonder if you can use my benchmarking paper as illustrations for SEs vs N as a function of ICC and/or my plot that shows how the bootstrap
- 66. Bootstrapping in Practice R1 All Your Data R2 … R500 Generate random sub-samples s1 s2 s500 Compute statistics or estimate model parameters … } 0.0 2.5 5.0 7.5 -2 -1 0 1 2 Statistic Count Get a distribution over statistic of interest (e.g. the ATE) - take mean - CIs == 95% quantiles - SEs == standard deviation
- 67. Using Covariates ▪ Withsimplerandomassignment,usingcovariatesisnot necessary. ▪ However,youcanimproveprecisionofATEestimatesif covariatesexplainalotofvariationinthepotential outcomes. ▪ CanbeaddedtoalinearmodelandSEsshoulddecreaseif theyarehelpful. ▪ Shouldalwaysatleastreportresultswithoutusing covariates.
- 68. Data Reduction Subject Di Yi Evan 0 1 Ashley 0 1 Greg 1 0 Leena 1 0 Ema 0 0 Seamus 1 1 D Y=1 Y=0 0 2 1 1 1 2 N # treatments
- 69. Data Reduction with Covariates Subject Xi Di Yi Evan M 0 1 Ashley F 0 1 Greg M 1 0 Leena F 1 0 Ema F 0 0 Seamus M 1 1 X D Y=1 Y=0 M 0 1 0 M 1 1 1 F 0 F 1 N # treatments X # groups Can analyze the reduced data using a weighted linear model.
- 70. Data Reduction with Dependent Data Subject Di Yij Evan 1 1 Evan 1 0 Ashley 0 1 Ashley 0 1 Ashley 0 1 Greg 1 0 Leena 1 0 Leena 1 1 Ema 0 0 Seamus 1 1 Create bootstrap replicates R1 R2 R3 reduce the replicates as if they’re i.i.d. r1 r2 r3 s1 s2 s3 compute statistics on reduced data
- 71. Experiment Analysis (i.i.d. data) Data Size Fits in memory < 2M rows Doesn’t ﬁt in memory Using Covariates linear regression data reduction + weighted linear models No Covariates t-test data reduction
- 72. Experiment Analysis (dependent data) Data Size Fits in memory < 2M rows Doesn’t ﬁt in memory Using Covariates random effects models or bootstrap + linear models bootstrap + data reduction + weighted linear models No Covariates random effects models or bootstrap in R bootstrap + data reduction
- 73. $ ipython notebook 4-analyzing-experiments.ipynb Hands On Analyzing Experiments
- 74. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

No public clipboards found for this slide

Be the first to comment