Basics of Experimentation, Katya Vasilaky, PhD
Overview Experimental Design Data Analysis Power Calculations Sampling
Most Important You need a control group (C), and a treatment (T) group Only change one thing at time in your treatment g...
Why a control group?                    External events. Your experiment.                        What’s causing a change  ...
Experimental Design What’s your unit? A person or a group?   If the treatment spills into the control, then you’ll need ...
Think Ahead: Analysis Either you’ll have a one time treatment:    Cross section: ΔT-C Or the treatment(s) will last ove...
How many users? ->Power                                                 Represents the desired power                      ...
Factors Affecting Power1. Size of the effect2. Standard deviation of the characteristic3. Bigger sample size4. Significanc...
Power and A/B Testing Ruby, Python built-in A/B testing tools often run  experiments without a pre-specified sample size....
Sampling So we figured out how many people to randomly select  into T (N1) and C (N2). Now what? Randomly assign N1 indi...
Some References Cochran (1977) Sampling Techniques (Everyone should read) List et al (2008) So you want to run an experi...
Upcoming SlideShare
Loading in...5
×

K vasilaky action_designdc_15jan2013_final

Katya Vasilaky's slides on experimental design, from Action Design DC's Jan 15th meetup. Event held at LivingSocial in DC.


  1. Basics of Experimentation, Katya Vasilaky, PhD
  2. 2. Overview Experimental Design Data Analysis Power Calculations Sampling
  3. 3. Most Important You need a control group (C), and a treatment (T) group Only change one thing at time in your treatment groups  Answer your question-What needs to change? Calculate your sample size for T & C (power) before you begin the experiment Individuals should be randomly assigned
  4. Why a control group?
     What is causing a change in user behavior:
     • your experiment?
     • or external events?
     The control group experiences the same external events but not the treatment, so comparing T with C separates the two.
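A small simulation makes the point of this slide concrete. It is a sketch under made-up assumptions chosen only for illustration: outcomes are normally distributed, a hypothetical external event (say, a holiday) adds +5 to everyone during the experiment, and the true treatment effect is +2. A before/after comparison on the treated users alone picks up the shock plus the effect. The T minus C comparison recovers only the effect.

```python
import random
import statistics

def simulate(n_per_group=5000, true_effect=2.0, external_shock=5.0, seed=1):
    """Hypothetical simulation: an external event raises everyone's outcome by
    `external_shock` during the experiment; the treatment adds `true_effect`
    only for treated users. All numbers are illustrative assumptions."""
    rng = random.Random(seed)
    # Treated users' outcomes before the experiment (no shock, no treatment).
    before_t = [rng.gauss(100, 10) for _ in range(n_per_group)]
    # During the experiment, both groups experience the external shock.
    after_t = [rng.gauss(100 + external_shock + true_effect, 10) for _ in range(n_per_group)]
    after_c = [rng.gauss(100 + external_shock, 10) for _ in range(n_per_group)]
    naive_before_after = statistics.mean(after_t) - statistics.mean(before_t)
    t_minus_c = statistics.mean(after_t) - statistics.mean(after_c)
    return naive_before_after, t_minus_c

naive, diff = simulate()
print(f"before/after on T only: {naive:.2f}")  # close to 7 = shock + effect
print(f"T - C:                  {diff:.2f}")   # close to 2 = effect alone
```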
  5. 5. Experimental Design What’s your unit? A person or a group?  If the treatment spills into the control, then you’ll need a higher clustering level than the individual user What’s your design?  1 control, 1 treatment  1 control, multiple treatments  One period or multiple periods?  “Ethical” Designs:  Roll-in, Within-group, and Encouragement
  6. 6. Think Ahead: Analysis Either you’ll have a one time treatment:  Cross section: ΔT-C Or the treatment(s) will last over time:  Panel:ΔT-ΔCWhich changes your analysis.Where the Δ is a change between the average (or otherstatistic) of your interested outcome variable and control.If it’s not the average, then things are just a tad morecomplique (as variance and medians are not Gaussian).
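Here is a short sketch of the two comparisons. It assumes the statistic of interest is the mean. For the panel case, it also assumes each group is observed once before and once after the treatment starts. The function names and the data are hypothetical.

```python
from statistics import mean

def cross_section_delta(y_t, y_c):
    """One-time treatment: average outcome in T minus average outcome in C."""
    return mean(y_t) - mean(y_c)

def panel_delta(y_t_pre, y_t_post, y_c_pre, y_c_post):
    """Treatment lasting over time: the change in T's average minus the
    change in C's average (a difference-in-differences)."""
    return (mean(y_t_post) - mean(y_t_pre)) - (mean(y_c_post) - mean(y_c_pre))

# Hypothetical outcome data, e.g. weekly sales per user.
print(cross_section_delta([12, 14, 13], [10, 11, 12]))
print(panel_delta([10, 10], [15, 15], [10, 10], [12, 12]))
```

Swapping `mean` for another statistic, such as `statistics.median`, changes the point estimate easily. As the slide notes, the harder part is the inference around it.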
  7. How many users? -> Power
     Sample size in each group (assuming equal-sized groups):
     $n = \frac{2\,(z_{\alpha/2} + z_{\beta})^2\,\sigma^2}{\delta^2}$
     • $z_{\beta}$: represents the desired power (typically 0.84 for 80% power).
     • $z_{\alpha/2}$: represents the desired level of statistical significance (typically 1.96, for a two-sided test at the 5% level).
     • $\sigma$: standard deviation of the outcome variable, e.g. sales.
     • $\delta$: effect size (the difference in means), e.g. the change in sales.
     Then check a tool (e.g. http://www.statisticalsolutions.net/pss_calc.php). Then use a stats package for the bells and whistles (R, Stata, SPSS, Matlab): R (pwr), Stata (sampsi, sampclus).
     To maximize power when the outcome's standard deviation differs between groups, set the ratio of the sample sizes equal to the ratio of the standard deviations, $n_T/n_C = \sigma_T/\sigma_C$ (List et al., 2008).
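The standard per-group sample-size formula for comparing two means with equal-sized groups is $n = 2(z_{\alpha/2} + z_{\beta})^2\sigma^2/\delta^2$. The sketch below computes it, using the same normal approximation the slide's 1.96 and 0.84 come from. It also implements the unequal-variance allocation rule $n_T/n_C = \sigma_T/\sigma_C$. The function names and the example values (σ = 20, δ = 5) are hypothetical, for illustration only. For real designs, cross-check against a dedicated tool such as R's pwr.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Users needed in EACH of T and C (equal-sized groups) to detect a
    difference in means `delta` with a two-sided test at level `alpha`:
        n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
    Normal approximation; rounds up to a whole user."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

def optimal_split(total_n, sigma_t, sigma_c):
    """Split a fixed total sample so that n_T / n_C = sigma_T / sigma_C."""
    n_t = round(total_n * sigma_t / (sigma_t + sigma_c))
    return n_t, total_n - n_t

print(n_per_group(sigma=20, delta=5))   # 252 users per group
print(optimal_split(1000, 30, 10))      # (750, 250)
```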
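  8. Factors Affecting Power
     1. Size of the effect: larger effects are easier to detect.
     2. Standard deviation of the characteristic: a noisier outcome lowers power.
     3. Sample size: a bigger sample raises power.
     4. Significance level desired: a stricter level (e.g. 1% rather than 5%) lowers power.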
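The sketch below shows the four factors at work. It computes approximate power for a two-sided, two-sample z-test with n users per group. This is the same normal approximation behind the sample-size formula. The `power` function and the example numbers are hypothetical and do not come from a library. Each comparison changes exactly one factor from the baseline.

```python
from statistics import NormalDist

def power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n users per
    group, true difference in means `delta`, and outcome SD `sigma`."""
    z = NormalDist()
    se = sigma * (2 / n) ** 0.5          # standard error of mean(T) - mean(C)
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability the test statistic falls beyond either critical value.
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)

base = power(delta=5, sigma=20, n=252)
print(f"baseline:         {base:.2f}")
print(f"bigger effect:    {power(delta=10, sigma=20, n=252):.2f}")
print(f"smaller SD:       {power(delta=5, sigma=10, n=252):.2f}")
print(f"bigger sample:    {power(delta=5, sigma=20, n=500):.2f}")
print(f"stricter alpha:   {power(delta=5, sigma=20, n=252, alpha=0.01):.2f}")
```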
  9. 9. Power and A/B Testing Ruby, Python built-in A/B testing tools often run experiments without a pre-specified sample size. Issues with that:  So when do you stop the experiment if you find no statistically significant effect between T and C?  Just wait and wait?  No, As the sample size increases and the variance of the outcome decreases, you’ll have the power to detect very small differences between μ0 and μ1.
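You can see why "just wait" is not a stopping rule by inverting the sample-size formula. This gives the minimum detectable effect for a given per-group size n. The sketch uses the normal approximation. `minimum_detectable_effect` is a hypothetical helper, and σ = 20 is an illustrative value. The detectable difference shrinks in proportion to $1/\sqrt{n}$, so with enough users almost any nonzero difference between μ0 and μ1 becomes statistically significant.

```python
from statistics import NormalDist

def minimum_detectable_effect(sigma, n, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power when
    each group has n users (normal approximation, inverting
    n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2 for delta)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * sigma * (2 / n) ** 0.5

# Each tenfold increase in n shrinks the detectable effect by sqrt(10).
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(minimum_detectable_effect(sigma=20, n=n), 3))
```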
  10. 10. Sampling So we figured out how many people to randomly select into T (N1) and C (N2). Now what? Randomly assign N1 individuals into T and N2 individuals into C. Tools: R(pwr), Stata(bsample) You might want to stratify by gender, user type etc., which would require a discussion on sampling weights.
  11. 11. Some References Cochran (1977) Sampling Techniques (Everyone should read) List et al (2008) So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design Levy & Lemeshow (1991) Sampling Populations: Methods and Applications Valliant, Dever & Kreute (2010) Practical Tools for Designing and Weighting Survey Samples Brady West (http://www-personal.umich.edu/~bwest/) I follow Andrew Gelman on the blogworld, who has a note (http://www.stat.columbia.edu/~gelman/stuff_for_blog/chap20.pd f) which may be useful.
