# Data simulation basics

Slides from talk at Oxford Reproducibility School: Kickstart your reproducible research, Sep 26 2018

Published in: Science

### Data simulation basics

1. 1. Simulating data to gain insights into experimental design and statistics Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee
2. 2. Before we get started… • I will show you some exercises for simulating data. It’s fine if you just want to listen and learn. There are materials online that you can work through later. • If you would like to work along with the exercises, that is also fine, but I won’t be able to answer many questions. • The early exercises in this lesson use Microsoft Excel, which most people will have installed. • The later exercises use R and RStudio. If you are familiar with these and have them installed, feel free to work along. You will need the packages yarrr, MASS (which provides the mvrnorm function) and Hmisc.
3. 3. Have a bright idea Collect data Think about how to analyse data Hit problems: Take advice from statistician How most people do experiments
4. 4. 1890-1962
5. 5. Have a bright idea Simulate data Think about how to analyse simulated data If problems: Take advice from statistician Collect real data A better way to do experiments
6. 6. Why invent data? • If you can anticipate what your data will look like, you will also anticipate a lot of issues about study design that you might not have thought of • Analysing a simulated dataset can clarify what is optimal analysis/ how the analysis works • Simulating data with an anticipated effect is very useful for power analysis – deciding what sample size to use • Simulating data with no effect (i.e. random noise) gives unique insights into how easy it is to get a false positive result through p-hacking
7. 7. Ways to simulate data • For newbies: to get the general idea: Excel • Far better but involves steeper learning curve: R • Also (but not covered here) options in SPSS and Matlab: • e.g. https://www.youtube.com/watch?v=XBmvYORP5EU • http://uk.mathworks.com/help/matlab/random-number-generation.html
8. 8. Basic idea • Anything you measure can be seen as a combination of an effect of interest plus random noise • The goal of research is to find out • (a) whether there is an effect of interest • (b) if yes, how big it is • Classic hypothesis-testing with p-values focuses just on (a) – i.e. have we just got noise or a real effect? • We can simulate most scenarios by generating random noise, with or without a consistent added effect
9. 9. Basic idea: generate a set of random numbers in Excel • Open a new workbook • In cell A1 type random number • In cell A2 type = rand() Grab the little square in the bottom right of A2 and pull it down to autofill the cells below to A8
10. 10. Random numbers in Excel, ctd • You have just simulated some data! • Are your numbers the same as mine? • What happens when you type rand() in A9?
11. 11. Random numbers in Excel, ctd. • Your numbers will be different to mine – that’s because they are random. • The numbers will change whenever you open the worksheet, or make any change to it. • Sometimes that’s fine, but for this demo we want to keep the same numbers. To control when random numbers update, select Manual in Formula|Calculation Options. • To update to new numbers use Calculate Now button. Remember to reset to Automatic afterwards!
12. 12. Random numbers in Excel, ctd. • The rand() function generates random numbers between 0 and 1: Are these the kind of numbers we want?
13. 13. Realistic data usually involves normally distributed numbers • Nifty way to do this in Excel: treat the generated numbers as p-values (probabilities between 0 and 1) • The normsinv() function turns a p-value into the corresponding z-score
14. 14. Normally distributed random numbers Try this: • Type = normsinv(A2) in cell B2 • Drag formula down to cell B8 • Now look at how the numbers in column A relate to those in column B. NB. In practice, we can generate normally distributed random numbers (i.e. z-scores) in just one step with formula: = normsinv(rand())
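The same two-step trick carries over to R, the language used later in this talk. The following is a minimal sketch, assuming `runif()` and `qnorm()` as the R counterparts of Excel's `rand()` and `normsinv()`; the variable names and the seed are made up for illustration and are not part of the talk's materials.

```r
# Sketch only: R counterparts of the Excel steps above.
set.seed(1)                # arbitrary seed so the numbers repeat on every run
p <- runif(7)              # like =rand() in A2:A8: 7 uniform numbers between 0 and 1
z <- qnorm(p)              # like =normsinv(A2): probability -> z-score
data.frame(p = p, z = z)   # compare the two columns side by side

# One-step version, like =normsinv(rand())
z_one_step <- qnorm(runif(7))
```

Values of `p` below .5 map to negative z-scores and values above .5 to positive ones, which is the relationship to look for between columns A and B.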
15. 15. Now we are ready to simulate a study where we have 2 groups to be compared on a t-test • Pull down the formula from columns A and B to extend to A11:B11 • Type a header ‘group’ in C1 • Type 1 in C2:C6 and 2 in C7:C11
16. 16. What is formula for t-test in Excel? Basic rule for life, especially in programming: if you don’t know it, Google it TTEST formula in xls: You specify: Range 1 Range 2 tails (1 or 2) type 1 = paired 2 = unpaired equal variance 3 = unpaired unequal variance
17. 17. Try entering the formula for the t-test in C12 =TTEST(B2:B6, B7:B11,2,2) What is the number that you get? This formula gives you a p-value Now press ‘calculate now’ 20 times, and keep a tally of how many p-values are < .05 in 20 simulations
18. 18. • What has this shown you? • P-values ‘dance about’ even when data are entirely random • On average, one in 20 runs will give p < .05 when null hypothesis is true – no difference between groups • Doesn’t mean you get EXACTLY 1 in 20 p-values < .05: need a long run to converge on that value. See Geoff Cumming: Dance of the p-values https://www.youtube.com/watch?v=5OL1RqHrZQ8 Congratulations! You have done your first simulation
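Pressing ‘Calculate now’ 20 times gives only a rough tally. As a hedged sketch of the "long run" idea, the R code below repeats the same null simulation several thousand times; the number of runs, the seed and the variable names are assumptions chosen for illustration, and `var.equal = TRUE` is used to mirror TTEST type 2.

```r
# Sketch only: repeat the two-group null simulation many times.
set.seed(2)                       # arbitrary seed for repeatability
n_sims <- 5000                    # number of simulated 'studies' (assumption)
n_per_group <- 5                  # same group size as the Excel exercise

p_values <- replicate(n_sims, {
  group1 <- rnorm(n_per_group)    # both groups come from the same population,
  group2 <- rnorm(n_per_group)    # so the null hypothesis is true
  # var.equal = TRUE mirrors TTEST type 2 (unpaired, equal variance)
  t.test(group1, group2, var.equal = TRUE)$p.value
})

false_positive_rate <- mean(p_values < 0.05)
false_positive_rate               # settles close to 0.05 in a long run
```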
19. 19. We’ll stick with Excel for one more simulation • So far, we’ve simulated the null hypothesis - random data. If we find a ‘significant’ difference, we know it’s a false positive • Next, we’ll simulate data with a genuine effect. • It’s easy to do this: we just add a constant to all the values for group 2 • Since we’re using z-scores, the constant will correspond to the effect size (expressed as Cohen’s d). • Let’s try an effect size of .5 • In cell B7, change the formula to = normsinv(A7)+.5 • Drag the formula down to cell B11 and hit ‘Calculate now’
20. 20. I’ve added formulae to show the mean and SD for the two groups: = AVERAGE(B2:B6) = STDEV(B2:B6) = AVERAGE(B7:B11) = STDEV(B7:B11) Your values will differ. Why isn’t the difference in means for the two groups exactly .5?
21. 21. Why isn’t the difference in means for the two groups exactly .5? ANSWER: the means (0 and .5) and SD (1) that we specified describe the population; each group here is just a small sample from that population, so its mean and SD vary by chance
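To see how far a sample of 5 per group can stray from the population values, here is a small R sketch. The number of repetitions, the seed and the variable names are illustrative assumptions, not taken from the talk.

```r
# Sketch only: how much does the observed mean difference vary with 5 per group?
set.seed(3)                                 # arbitrary seed for repeatability
n_sims <- 1000                              # number of repeated samples (assumption)

mean_diffs <- replicate(n_sims, {
  group1 <- rnorm(5, mean = 0,   sd = 1)
  group2 <- rnorm(5, mean = 0.5, sd = 1)    # population effect of .5
  mean(group2) - mean(group1)               # observed difference in this sample
})

mean(mean_diffs)    # the long-run average is close to the population value of .5
range(mean_diffs)   # but individual samples stray a long way from it
```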
22. 22. Now type the formula for the t-test =TTEST(B2:B6,B7:B11,2,2) Is p < .05 ? It’s pretty unlikely you will see a significant result. Why?
23. 23. It’s pretty unlikely you will see a significant result. Why? ANSWER: Sample too small – can’t pick out signal from noise
24. 24. What have we learned so far? • The first simulation gave some insights into false positive rates: it shows how you can get a ‘significant’ result from random data • The second simulation illustrates the opposite situation: showing how often you can fail to get a significant p-value, even when there is a true effect (false negative) • This brings us on to the topic of statistical power: the probability of detecting a real effect with a given sample size • To build on these insights we need to do lots of simulations, and for that it’s best to move to R
25. 25. Fire up RStudio Console: try commands out here Environment: check variables here Cursor: console is ready for you to type here
26. 26. At the cursor type: scoresA <- rnorm(n = 5, m = 0, sd = 1) • This creates a vector of z-scores (i.e. random normal deviates with mean of 0 and SD of 1) • But where is it? • To see the numbers you can either look in the Environment pane (top right) and/or just type the vector’s name at the cursor scoresA [1] -0.15348659 0.01984155 0.18353508 0.23524739 1.18143805 Blue courier: what you type at cursor. Black courier, output at cursor rnorm is an inbuilt R function that generates random normal deviates
27. 27. We’ll now create another vector for group B. Same command but we’ll make scores for group B an average .5 points higher: scoresB <- rnorm(n = 5, m = 0.5, sd = 1) You can inspect this as before: type its name at the console. Now we can do a t-test t.test(scoresA,scoresB) Welch Two Sample t-test data: scoresA and scoresB t = -1.502, df = 5.8215, p-value = 0.1853 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.0909662 0.5076982 sample estimates: mean of x mean of y 0.2933151 1.0849491 • Console shows results for a Welch 2-sample t-test (i.e. t-test with correction for unequal variances)
28. 28. We’ll now do exactly the same thing, but with N of 50 per group scoresA <- rnorm(n = 50, m = 0, sd = 1) scoresB <- rnorm(n = 50, m = 0.5, sd = 1) t.test(scoresA,scoresB) Welch Two Sample t-test data: scoresA and scoresB t = -2.6022, df = 94.313, p-value = 0.01076 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.9723062 -0.1307207 sample estimates: mean of x mean of y 0.1208312 0.6723447
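A single run at each sample size could be a fluke in either direction. The sketch below repeats the comparison many times for N = 5 and N = 50 per group and records how often p < .05; the helper function `prop_significant()`, the number of runs and the seed are hypothetical choices for illustration.

```r
# Sketch only: proportion of 'significant' runs for two sample sizes.
set.seed(4)                             # arbitrary seed for repeatability

# Hypothetical helper: simulate n_sims studies and return the share with p < .05
prop_significant <- function(n, effect = 0.5, n_sims = 2000) {
  p_values <- replicate(n_sims, {
    scoresA <- rnorm(n = n, mean = 0, sd = 1)
    scoresB <- rnorm(n = n, mean = effect, sd = 1)
    t.test(scoresA, scoresB)$p.value    # Welch test, the t.test default
  })
  mean(p_values < 0.05)
}

prop_n5  <- prop_significant(5)
prop_n50 <- prop_significant(50)
c(n5 = prop_n5, n50 = prop_n50)         # far more 'significant' runs with 50 per group
```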
29. 29. Benefits of simulating data in R • Much faster than Excel, and reproducible • Can generate different distributions, correlated variables, etc. • Powerful plotting functions • A good way of starting to learn R • Can write a script that executes commands to generate data and then run it automatically many times with different parameters (e.g. N and effect size) and store results Downside: Steep initial learning curve But remember: Google is your friend Tons of material about R on the internet
30. 30. Self-teaching scripts on https://osf.io/skz3j/ Download, save and open this one: Simulation_ex1_multioutput.R Source pane: script Console: window moves down when we open a script file
31. 31. First thing to do: Set working directory • Working directory is where R will default to when reading and writing stuff • Easiest way to set it: Go to Session|Set working directory Note that when you do this, the command to set working directory will pop up on the console. On my computer I see: setwd("~/deevybee_repo")
32. 32. Take note of location of Run button
33. 33. Simulation_ex1_multioutput.R This repeatedly runs the steps you put into the console, plots the results and saves the plots in a pdf: • There are some additional steps to reorganize the numbers: for an explanation of the details please see Simulation_ex1_intro.R • You run the simulation repeatedly, with two different values for N The script is structured as 2 nested loops: for (i in 1:2){ #line 15 ……… #various commands here for (j in 1:10){ #line 21 ……… #various commands here } } • The outer loop runs twice; the inner loop, which is nested inside it, runs 10 times on each pass. So overall there are 20 runs • The value, i, in the outer loop controls sample size, which is either myNs[1] or myNs[2] • The value, j, in the inner loop just acts as a counter, to give 10 repetitions
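The nested-loop structure can be sketched as follows. This is not the contents of Simulation_ex1_multioutput.R: it leaves out the plotting and pdf steps, and apart from `myNs`, `i` and `j` every name (`results`, `effect_size`, `n_runs`, `myt`) is an assumption made for illustration. The sample sizes 20 and 100 are the ones shown on the later output slides.

```r
# Sketch only: the real script also draws plots and writes pdf files.
set.seed(5)                          # arbitrary seed for repeatability
myNs <- c(20, 100)                   # per-group sample sizes
effect_size <- 0.5
n_runs <- 10
results <- data.frame()              # one row will be added per simulated run

for (i in 1:2) {                     # outer loop: picks the sample size
  myN <- myNs[i]
  for (j in 1:n_runs) {              # inner loop: a counter giving 10 repetitions
    scoresA <- rnorm(myN, mean = 0, sd = 1)
    scoresB <- rnorm(myN, mean = effect_size, sd = 1)
    myt <- t.test(scoresA, scoresB)
    results <- rbind(results,
                     data.frame(N = myN, run = j,
                                t = unname(myt$statistic),
                                p = myt$p.value))
  }
}

nrow(results)                        # 20 rows: 2 sample sizes x 10 runs
```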
34. 34. Let’s run the whole script! • Select all the code in the source (upper left-hand pane) by clicking in that pane and then typing Ctrl+A or Command+A • Now hit the Run button on the menu bar to run the script • Click on the Files tab in the bottom right-hand pane, and you’ll see you have created two new pdf files (you may need to scroll down to see them):
35. 35. 10 runs of simulation with N = 20 per group and effect size (d) = .5 [Figure: ten plots of Score by Group (1 vs 2), one per run. Panel titles: t = −1.8, p = 0.0767; t = −3.4, p = 0.0018 (**); t = −0.48, p = 0.637; t = −1.4, p = 0.165; t = −1.4, p = 0.164; t = 0.044, p = 0.965; t = −1.9, p = 0.0638; t = −0.86, p = 0.394; t = −2.6, p = 0.0139 (*); t = −2.3, p = 0.0256 (*)]
36. 36. 10 runs of simulation with N = 100 per group and effect size (d) = .5 [Figure: ten plots of Score by Group (1 vs 2), one per run, all with p < .05. Panel titles: t = −2.9, p = 0.00396; t = −4.4, p = 0.0000159; t = −3.5, p = 0.000486; t = −2, p = 0.0417; t = −3.9, p = 0.000137; t = −3.4, p = 0.00084; t = −4.7, p = 0.00000539; t = −2.9, p = 0.00463; t = −3.8, p = 0.000218; t = −3.3, p = 0.00117]
37. 37. 10 runs of simulation with N = 100 per group and effect size (d) = .3 [Figure: ten plots of Score by Group (1 vs 2), one per run, five of them with p < .05. Panel titles: t = −2.9, p = 0.00406; t = −1.5, p = 0.128; t = −0.93, p = 0.354; t = −1.2, p = 0.242; t = −2.6, p = 0.00932; t = −2.9, p = 0.00463; t = −0.63, p = 0.529; t = −2.9, p = 0.00443; t = −2.6, p = 0.011; t = −1.4, p = 0.151]
38. 38. Points to note • Smaller samples are associated with more variable results. • With small sample sizes, true but weak effects will usually not give you a ‘significant’ result (i.e. p < .05). • In the example here, with effect size of .3, a sample of 100 per group only gives a significant result on around 60% of runs (when we do many runs of simulation). • This is the same as saying the power of the study to detect an effect size of .3 is around 60% • Many statisticians recommend power should be 80% or more (though this will depend on the purpose of the study).
39. 39. Body of table shows sample size per group Jacob Cohen worked this all out in 1988
40. 40. Estimating statistical power for your study You can compute power without needing to simulate: for simple designs you can use the G*Power package (or Cohen’s formulae). But simulation gives more insight into what power means. It is also more flexible: you can use it with complex datasets and analytic methods. Simulate data, run the analysis 10,000 times and then see how frequently your result is ‘significant’ by whatever criterion you plan to use. This requires you to have a sense of what your data will look like, and an estimate of the smallest effect size that you’d be interested in.
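Both routes can be tried side by side in R. The sketch below is an illustration, not part of the talk's scripts: it uses `power.t.test()` from R's built-in stats package as the formula-based route (standing in for G*Power) and a 10,000-run simulation for the other, for the earlier example of d = .3 with 100 per group. Variable names and the seed are assumptions.

```r
# Sketch only: power for d = .3 with 100 per group, computed two ways.
set.seed(6)                          # arbitrary seed for repeatability
n_per_group <- 100
effect_size <- 0.3

# (1) Formula-based power for a two-sample, two-sided t-test
analytic <- power.t.test(n = n_per_group, delta = effect_size, sd = 1,
                         sig.level = 0.05)$power

# (2) Simulated power: proportion of runs with p < .05
n_sims <- 10000
p_values <- replicate(n_sims, {
  scoresA <- rnorm(n_per_group, mean = 0, sd = 1)
  scoresB <- rnorm(n_per_group, mean = effect_size, sd = 1)
  t.test(scoresA, scoresB, var.equal = TRUE)$p.value
})
simulated <- mean(p_values < 0.05)

c(analytic = analytic, simulated = simulated)   # the two should agree closely
```

The simulation route is slower here, but the same loop still works when the analysis is something for which no power formula is available.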
41. 41. “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.” (Newcombe, 1987) Weak statistical power has been, and continues to be, a major cause of problems with replication of findings
42. 42. Low power plagues much research in biomedical science and psychology What can be done?! • Take steps to improve effect size: minimize noise Use better measures – check they are reliable Take more samples of dependent variable – e.g. more trials • Think hard about experimental design – simulate different possibilities E.g. Sometimes a within-subjects design is more sensitive • Work collaboratively to increase sample size
43. 43. Within-subjects vs between-subjects design: Matched pairs vs. independent t-test • See simulation_ex1a_withinsubs.R If some of the noise reflects a consistent attribute of subjects, then testing 20 people twice is more powerful than testing 2 groups of 20. [Figure: ten plots of Difference scores, one per run, axis from −2 to 4. Panel titles: t = −2.8, p = 0.0115 (*); t = −4.1, p = 0.000678 (***); t = −1.4, p = 0.167; t = −5.2, p = 0.0000558 (***); t = −3.3, p = 0.00337 (**); t = −3.4, p = 0.00296 (**); t = −2.4, p = 0.026 (*); t = −2.5, p = 0.0207 (*); t = −2, p = 0.0564; t = −2.7, p = 0.0152 (*)] Difference scores pre-post treatment, N = 20: effect size = .5, correlation time1/2 = .5
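A rough way to check this claim is to simulate both designs with the same effect size. The sketch below is not simulation_ex1a_withinsubs.R; it is an illustration under stated assumptions: time 1 and time 2 scores share a per-subject component so that they correlate .5, and all names, the seed and the number of runs are made up.

```r
# Sketch only: compare a paired design with an independent-groups design.
set.seed(7)                                            # arbitrary seed
n <- 20; effect_size <- 0.5; r <- 0.5; n_sims <- 2000

paired_p <- replicate(n_sims, {
  subject <- rnorm(n)                                  # stable attribute of each person
  time1 <- sqrt(r) * subject + sqrt(1 - r) * rnorm(n)  # variance 1, correlation r with time2
  time2 <- sqrt(r) * subject + sqrt(1 - r) * rnorm(n) + effect_size
  t.test(time1, time2, paired = TRUE)$p.value          # matched-pairs t-test
})

independent_p <- replicate(n_sims, {
  group1 <- rnorm(n)                                   # 20 different people per group
  group2 <- rnorm(n, mean = effect_size)
  t.test(group1, group2)$p.value                       # independent (Welch) t-test
})

power_paired <- mean(paired_p < 0.05)
power_independent <- mean(independent_p < 0.05)
c(paired = power_paired, independent = power_independent)
```

Raising `r` increases the advantage of the paired design; with `r` set to 0 there is no consistent subject attribute and the advantage should largely disappear.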
44. 44. See: DeclareDesignIntro on https://github.com/oscci/simulate_designs Also R package: simstudy – simulate datasets with different properties, including multilevel data
45. 45. Low power plagues much research in biomedical science and psychology What can be done?! • Work collaboratively to increase sample size https://psysciacc.org/ Nature 561, 287 (2018) doi: 10.1038/d41586-018-06692-8
46. 46. Part 2: Simulating null results to illustrate p-hacking
47. 47. P-hacking and type 1 error (false positives) Simulation_ex2_correlations.R Often studies have multiple variables of interest. This script uses the mvrnorm function from the MASS package to simulate multivariate normal data It also demonstrates the dangers of p-hacking by showing how easy it is to get some values with p < .05 if you have a large selection of variables
48. 48. Thought experiment: we’ll simulate 7 uncorrelated variables. In a single run, how likely is it that we’ll see: • No significant correlations • Some significant correlations Suppose you make a specific prediction in advance that your two favourite variables (e.g. V1 and V3) will be significantly correlated: what’s the probability you will be correct?
49. 49. Correlation matrix for run 1 Output from simulation of 7 independent variables, where true correlation = 0 N = 30 Red denotes p < .05 ( r > .31 or < -.31); Sample size not relevant for this demonstration: With larger N, smaller r will be significant at .05
50. 50. Correlation matrix for run 2 Output from simulation of 7 independent variables, where true correlation = 0 N = 30 Red denotes p < .05 ( r > .31 or < -.31); Why do we get significant values when we have specified true r = 0 ?
51. 51. Correlation matrix for run 3 Output from simulation of 7 independent variables, where true correlation = 0 N = 30 Red denotes p < .05 ( r > .31 or < -.31); On any one run, we are looking at 21 correlations. So we should use Bonferroni corrected p-value: .05/21 = .002, corresponds to r = .51
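One run of this demonstration can be sketched in base R as below. The talk's script uses `mvrnorm` from MASS; here independent columns from `rnorm()` stand in for it, which is equivalent only because the true correlations are all zero. Variable names and the seed are assumptions, and `cor.test()` is two-sided by default, so its cut-off for r need not match the value quoted on the slide.

```r
# Sketch only: 7 uncorrelated variables, N = 30, all 21 pairwise correlations.
set.seed(8)                                          # arbitrary seed
n <- 30; n_vars <- 7
scores <- matrix(rnorm(n * n_vars), ncol = n_vars)   # independent columns, true r = 0
var_pairs <- combn(n_vars, 2)                        # each column is one pair of variables

p_values <- apply(var_pairs, 2, function(idx) {
  cor.test(scores[, idx[1]], scores[, idx[2]])$p.value
})

n_tests <- length(p_values)                          # number of correlations examined
bonferroni_alpha <- 0.05 / n_tests                   # corrected threshold
sum(p_values < 0.05)                                 # 'significant' before correction
sum(p_values < bonferroni_alpha)                     # 'significant' after correction
```

Changing the seed gives a different run, and the pairs that come out ‘significant’ will generally change with it.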
52. 52. • Use of .05 cutoff makes sense only in relation to an a-priori hypothesis Focusing just on ‘significant’ associations in a dataset is classic p-hacking – also known as ‘data dredging’ It is very commonly done, and many people fail to appreciate how misleading it is. It’s fine to look for patterns in complex data as a way of exploring and deriving a hypothesis, but it must then be tested in another sample. Consider: we saw particular patterns in our random noise data – but they did not replicate in another run. Key point: p-values can only be interpreted in terms of the context in which they are computed
53. 53. Other ways in which ‘hidden multiplicity’ of testing can give false positive (p < .05) results • Multi-way Anova with many main effects/interactions • Cramer, A. O. J., et al (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 23(2), 640-647. doi:10.3758/s13423-015-0913-5
54. 54. Illustrated with field of ERP/EEG • Flexibility in analysis in terms of: • Electrodes • Time intervals • Frequency ranges • Measurement of peaks • etc, etc • Often see analyses with 4- or 5-way ANOVA (group x side x site x condition x interval) • Standard stats packages correct p-values for N levels WITHIN a factor, but not for overall N factors and interactions . Cramer AOJ, et al 2016. Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review 23:640-647
55. 55. • Subgroup analysis Other ways in which ‘hidden multiplicity’ of testing can give false positive (p < .05) results
56. 56. You run a study investigating how a drug, X, affects anxiety. You plot the results by age, and see this: No significant effect of X on anxiety overall [Figure: Treatment effect by age; x-axis: Age (yr), 16 to 60; y-axis: Symptom improvement, −1 to 1]
57. 57. But you notice that there is a treatment effect for those aged over 36 [Figure: Treatment effect by age; x-axis: Age (yr), 16 to 60; y-axis: Symptom improvement, −1 to 1]
58. 58. Close link between p-hacking and HARKing You are HARKing if you have no prior predictions, but on seeing results you write up paper as if you planned to look at effect of age on drug effect. This kind of thing is endemic in psychology. • It is OK to say that this association was observed in exploratory analysis, and that it suggests a new hypothesis that needs to be tested in a new sample. • It is NOT OK to pretend that you predicted the association if you didn’t. • And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis (e.g. dropping those aged below 36 from the analysis) [Figure: Treatment effect by age; x-axis: Age (yr), 16 to 60; y-axis: Symptom improvement, −1 to 1]
59. 59. The problem: analytic flexibility that allows analysis to be influenced by the results • Analytic flexibility affects not just subgroups, but also selection of measures, type of analysis, removal of outliers, etc. • ‘Garden of forking paths’ ("El jardín de senderos que se bifurcan") • In many cases, hard to apply any statistical correction, because we are unaware of all the potential analyses
60. 60. 1 contrast Probability of a ‘significant’ p-value < .05 = .05 Large population database used to explore link between ADHD and handedness https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379 Demonstration of rapid expansion of comparisons with binary divisions
61. 61. Focus just on Young subgroup: 2 contrasts at this level Probability of a ‘significant’ p-value < .05 = .10 Large population database used to explore link between ADHD and handedness
62. 62. Focus just on Young on measure of hand skill: 4 contrasts at this level Probability of a ‘significant’ p-value < .05 = .19 Large population database used to explore link between ADHD and handedness
63. 63. Focus just on Young, Females on measure of hand skill: 8 contrasts at this level Probability of a ‘significant’ p-value < .05 = .34 Large population database used to explore link between ADHD and handedness
64. 64. Focus just on Young, Urban, Females on measure of hand skill: 16 contrasts at this level Probability of a ‘significant’ p-value < .05 = .56 Large population database used to explore link between ADHD and handedness
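The probabilities quoted on these forking-paths slides are consistent with the usual formula for the chance of at least one p < .05 among k tests, 1 − (1 − .05)^k. The sketch below reproduces them, assuming the k contrasts are independent and every null hypothesis is true; the variable names are illustrative.

```r
# Sketch only: chance of at least one p < .05 among k independent tests of true nulls.
alpha <- 0.05
k <- c(1, 2, 4, 8, 16)                     # contrasts at each level of the forking paths
familywise_error <- 1 - (1 - alpha)^k      # 1 minus the chance that no test is 'significant'
data.frame(contrasts = k, p_any_significant = familywise_error)
# approximately .05, .10, .19, .34 and .56, as on the slides
```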
65. 65. Richard Peto: ISIS-2 study group (1988) Lancet 332, 349-410
66. 66. 1956 De Groot Failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests Further reading: de Groot, A. D. (2014). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, et al]. Acta Psychologica, 148, 188-194. doi:http://dx.doi.org/10.1016/j.actpsy.2014.02.001
67. 67. A comprehensive solution: Pre-registration
68. 68. Some general points to help you learn R 1. Basic rule for life, especially in programming: if you don’t know it, Google it In R, Google your error message 2. Best way to learn is by making mistakes If you see a line of code you don’t understand, play with it to find out what it does. Look at Environment tab, or type name of variable on the console to check its value Don’t be afraid to experiment; E.g., you want repeating numbers? Type in the console to compare: rep (1,3) and
69. 69. R scripts available on : https://osf.io/view/reproducibility2017/ • Simulation_ex1_intro.R Suitable for R newbies. Demonstrates ‘dance of the p-values’ in a t-test. Bonus, you learn to make pirate plots • Simulation_ex2_correlations Generate correlation matrices from multivariate normal distribution. Bonus, you learn to use ‘grid’ to make nicely formatted tabular outputs. • Simulation_ex3_multiwayAnova.R Simulate data for a 3-way mixed ANOVA. Demonstrates need to correct for N factors and interactions when doing exploratory multiway Anova. • Simulation_ex4_multipleReg.R Simulate data for multiple regression. • Simulation_ex5_falsediscovery.R Simulate data for mixture of null and true effects, to demonstrate that the probability of the data given the hypothesis is different from the probability of the hypothesis given the data. Two simulations from Daniel Lakens’ Coursera Course – with notes! • 1.1 WhichPvaluesCanYouExpect.R • 3.2 OptionalStoppingSim.R Now even more: See OSF!