K vasilaky action_designdc_15jan2013_final



Katya Vasilaky's slides on experimental design, from Action Design DC's Jan 15th meetup. Event held at LivingSocial in DC.

Published in: Technology



  1. 1. Basics of Experimentation
     Katya Vasilaky, PhD
  2. 2. Overview
     • Experimental Design
     • Data Analysis
     • Power Calculations
     • Sampling
  3. 3. Most Important
     • You need a control group (C) and a treatment group (T).
     • Change only one thing at a time in your treatment groups.
       • Answer your question: what needs to change?
     • Calculate your sample size for T and C (power) before you begin the experiment.
     • Individuals should be randomly assigned.
  4. 4. Why a control group?
     What's causing a change in user behavior?
     • Your experiment?
     • Or external events?
  5. 5. Experimental Design
     • What's your unit? A person or a group?
       • If the treatment spills over into the control group, you'll need a higher clustering level than the individual user.
     • What's your design?
       • 1 control, 1 treatment
       • 1 control, multiple treatments
       • One period or multiple periods?
       • "Ethical" designs: roll-in, within-group, and encouragement
  6. 6. Think Ahead: Analysis
     • Either you'll have a one-time treatment:
       • Cross section: ΔT-C (the difference between the treatment and control groups)
     • Or the treatment(s) will last over time:
       • Panel: ΔT-ΔC (the change over time in T minus the change over time in C)
     Which one you have changes your analysis. Here Δ is a difference in the average (or other statistic) of the outcome variable you are interested in. If it's not the average, things are a tad more complicated (as variances and medians are not Gaussian).
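To make the two estimators on this slide concrete, here is a minimal sketch using only Python's standard library. The function names and the tiny data set are hypothetical and purely illustrative; the panel version is the usual difference-in-differences of group means, which is one reading of "ΔT-ΔC".

```python
from statistics import mean

def cross_section_effect(treated, control):
    """One-time treatment: difference in average outcome, T minus C."""
    return mean(treated) - mean(control)

def panel_effect(t_before, t_after, c_before, c_after):
    """Treatment over time: change in T minus change in C."""
    change_in_t = mean(t_after) - mean(t_before)
    change_in_c = mean(c_after) - mean(c_before)
    return change_in_t - change_in_c

# Made-up outcomes (e.g. weekly sales per user), for illustration only.
t_before, t_after = [10, 12, 14], [13, 15, 17]
c_before, c_after = [9, 11, 13], [10, 12, 14]

print(cross_section_effect(t_after, c_after))              # T vs C after treatment
print(panel_effect(t_before, t_after, c_before, c_after))  # change in T minus change in C
```

In this made-up data the control group also improves by one unit over the period, so the panel estimate is smaller than the cross-section estimate; that gap is the part of the change attributed to external events rather than to the experiment.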
  7. 7. How many users? -> Power
     Terms in the sample-size formula:
     • Sample size in each group (assumes equal-sized groups).
     • The z-value representing the desired power (typically 0.84 for 80% power).
     • The z-value representing the desired level of statistical significance (typically 1.96).
     • Standard deviation of the outcome variable, e.g. sales.
     • Effect size (the difference in means), e.g. the change in sales.
     Next steps:
     • Then check a tool (e.g. http://www.statisticalsolutions.net/pss_calc.php).
     • Then use a stats tool for bells and whistles (R, Stata, SPSS, Matlab): R (pwr), Stata (sampsi, sampclus).
     • To maximize power, set the ratio of the sample sizes equal to the ratio of the standard deviations of the outcomes (List, 2008).
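The terms labelled on this slide line up with the standard normal-approximation formula for comparing two means with equal-sized groups, n = 2 σ² (z_α/2 + z_β)² / δ² per group. That this is exactly the formula shown on the slide is an assumption; the sketch below implements the textbook version with Python's standard library, and the function name and example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Users needed in EACH group (equal-sized groups, two-sided test).

    sigma: standard deviation of the outcome variable (e.g. sales)
    delta: effect size, i.e. the difference in means you want to detect
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return ceil(n)  # round up: you cannot recruit a fraction of a user

# Illustrative numbers only: outcome s.d. of 10, hoping to detect a change of 2.
print(sample_size_per_group(sigma=10, delta=2))
```

Because the sample size scales with (σ/δ)², halving the effect you want to detect roughly quadruples the number of users required. Dedicated tools such as R's pwr use the t distribution and may give slightly larger numbers.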
  8. 8. Factors Affecting Power
     1. Size of the effect
     2. Standard deviation of the characteristic
     3. Bigger sample size
     4. Significance level desired
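The four factors can be seen by varying one input at a time in an approximate power calculation. This is a sketch under stated assumptions: a two-sided test of two means with equal-sized groups and a normal approximation, ignoring the negligible chance of rejecting in the wrong direction. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power to detect a difference in means of size delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n_per_group)
    return std_normal.cdf(abs(delta) / standard_error - z_crit)

baseline = power_two_sample(delta=2, sigma=10, n_per_group=393)
bigger_effect = power_two_sample(delta=3, sigma=10, n_per_group=393)
noisier_outcome = power_two_sample(delta=2, sigma=15, n_per_group=393)
more_users = power_two_sample(delta=2, sigma=10, n_per_group=800)
stricter_alpha = power_two_sample(delta=2, sigma=10, n_per_group=393, alpha=0.01)

for label, value in [
    ("baseline", baseline),
    ("bigger effect", bigger_effect),
    ("noisier outcome", noisier_outcome),
    ("more users", more_users),
    ("stricter significance level", stricter_alpha),
]:
    print(f"{label}: {value:.2f}")
```

A bigger effect or more users raises power; a noisier outcome or a stricter significance level lowers it.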
  9. 9. Power and A/B Testing
     Built-in A/B testing tools for Ruby and Python often run experiments without a pre-specified sample size. Issues with that:
     • When do you stop the experiment if you find no statistically significant effect between T and C?
     • Just wait and wait?
     • No. As the sample size increases and the variance of the estimated outcome decreases, you'll have the power to detect very small differences between μ0 and μ1.
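One way to see why "just wait" is not a stopping rule is to compute the smallest difference between μ0 and μ1 that a test can reliably detect as the sample grows. The sketch below assumes the same normal-approximation setup as a standard two-sample power calculation (equal-sized groups, two-sided test); the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_per_group, sigma, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with the given power."""
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return z_total * sigma * sqrt(2 / n_per_group)

# Illustrative outcome s.d. of 10; watch the detectable difference shrink.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(minimum_detectable_effect(n, sigma=10), 3))
```

The detectable difference shrinks with the square root of the sample size, so a long-running experiment will eventually flag differences too small to matter. Fixing the sample size up front ties the stopping point to an effect size you actually care about.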
  10. 10. Sampling
      • So we figured out how many people to randomly select into T (N1) and C (N2). Now what?
      • Randomly assign N1 individuals to T and N2 individuals to C.
      • Tools: R (pwr), Stata (bsample)
      • You might want to stratify by gender, user type, etc., which would require a discussion of sampling weights.
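The random-assignment step can be sketched in a few lines of standard-library Python. The function name, the user IDs, and the seed are hypothetical; this is simple (unstratified) random assignment, so it does not cover stratification or sampling weights.

```python
import random

def assign_groups(user_ids, n_treatment, n_control, seed=None):
    """Randomly draw n_treatment users for T and n_control users for C."""
    user_ids = list(user_ids)
    if n_treatment + n_control > len(user_ids):
        raise ValueError("not enough users for the requested group sizes")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    # sample() draws without replacement and returns the picks in random order,
    # so the first n_treatment picks form a random subset of the chosen users.
    chosen = rng.sample(user_ids, n_treatment + n_control)
    return chosen[:n_treatment], chosen[n_treatment:]

treatment, control = assign_groups(range(1, 1001), n_treatment=393, n_control=393, seed=2013)
print(len(treatment), len(control))
```

Recording the seed alongside the experiment lets you reproduce exactly who was assigned to which group.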
  11. 11. Some References
      • Cochran (1977) Sampling Techniques (everyone should read)
      • List et al (2008) So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design
      • Levy & Lemeshow (1991) Sampling of Populations: Methods and Applications
      • Valliant, Dever & Kreuter (2010) Practical Tools for Designing and Weighting Survey Samples
      • Brady West (http://www-personal.umich.edu/~bwest/)
      • I follow Andrew Gelman in the blog world; he has a note (http://www.stat.columbia.edu/~gelman/stuff_for_blog/chap20.pdf) which may be useful.