# Estimating sample size through simulations

Determining the sample size is a critical step in designing an experiment. For most statistical models, the sample size can be calculated easily with the POWER procedure. However, PROC POWER cannot be used for more complicated statistical models. This paper reviews a more general method that estimates the sample size through simulation using SAS® software. The simulation approach applies to simple designs as well as to more complex ones.

• hai.. i want to ask.. how the coding if i want to do simulation for unequal sample sizes? for the equal sample sizes, i got it which is Data Try;
Do treat=1 to 3;
Do j=1 to 40;
y=rannorm(12345);
output;
End;
End;
Run;
Quit;
plez help me.. :(

### Estimating sample size through simulations

1. Estimating Sample Size through Simulations
   - Wuchen Zhao, Department of Preventive Medicine, University of Southern California
   - Arthur Li, City of Hope National Cancer Center
2. Background
   - Determining the sample size is one of the important components of planning a statistical study
   - An adequate sample size is critical for producing a statistically significant result
   - The larger the sample size, the more precise the estimates of the population parameters
3. Background
   - A larger sample size also increases the research budget
   - The sample size should be just large enough to detect the association of research interest
4. PROC POWER
   - Suppose that we would like to test a new diet pill for weight loss
     - A group of overweight patients is recruited
     - 50% of patients → drug group
     - 50% of patients → placebo group
5. PROC POWER
   - Before the trial starts: $\mu_{weight} = 265$
   - Assume: $\sigma_{weight} = 50$
   - After the trial, we would like to detect a 15-pound weight loss in the drug group
     - Power ($1 - \beta$) = 80%
     - Significance level ($\alpha$) = 5%
6. PROC POWER

   ```sas
   proc power;
      twosamplemeans test = diff
         groupmeans = 250 | 265
         stddev     = 50
         alpha      = 0.05
         sides      = 1
         npergroup  = .
         power      = 0.8;
   run;
   ```
7. PROC POWER

   The POWER Procedure: Two-sample t Test for Mean Difference

   | Fixed Scenario Elements | Value  |
   |-------------------------|--------|
   | Distribution            | Normal |
   | Method                  | Exact  |
   | Number of Sides         | 1      |
   | Alpha                   | 0.05   |
   | Group 1 Mean            | 250    |
   | Group 2 Mean            | 265    |
   | Standard Deviation      | 50     |
   | Nominal Power           | 0.8    |
   | Null Difference         | 0      |

   Computed N Per Group

   | Actual Power | N Per Group |
   |--------------|-------------|
   | 0.802        | 139         |
8. Calculating Sample Size Through Simulation
   - Calculating N for a two-sample t-test: [equation not preserved]
     - $Q_e$: proportion of subjects allocated to the drug group
     - $Q_c = 1 - Q_e$
     - $d = \mu_1 - \mu_2$
9. Calculating Sample Size Through Simulation
   - Calculating N for a two-sample t-test: [equation not preserved]
10. Calculating Sample Size Through Simulation
    - Calculating N for a two-sample t-test:
    - Let [equation not preserved]
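The equations on slides 8 to 10 did not survive in this transcript. The following is a hedged reconstruction, not the authors' slide content: it assumes the standard normal-approximation formula for a two-sample comparison with common standard deviation $\sigma$ (a symbol not defined on slide 8), and it is written so that it agrees with the SAS code on the later slides. $N$ is the total sample size, and $z_\alpha$ and $z_\beta$ are the normal quantiles for the significance level and the power.

```latex
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \, \sigma^2}{d^2}
     \left( \frac{1}{Q_e} + \frac{1}{Q_c} \right) \\[4pt]
\text{Let } E(T) &= \frac{|d|}{\sigma \sqrt{1/Q_e + 1/Q_c}}
\quad\Longrightarrow\quad
N = \frac{(z_\alpha + z_\beta)^2}{E(T)^2}
\end{aligned}
```

For the diet pill example ($|d| = 15$, $\sigma = 50$, $Q_e = Q_c = 0.5$) this gives $E(T) = 15 / (50 \times 2) = 0.15$ and $N \approx 6.1825 / 0.0225 \approx 274.8$, or about 137.4 per group. That is slightly below the 139 per group from PROC POWER, which uses the exact t distribution rather than the normal approximation.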
11. Calculating Sample Size Through Simulation
    - $E(T)$ can be calculated with the following steps:
      - Simulate a large data set with M subjects based on the design parameters
      - Analyze the simulated data set to obtain the test statistic
      - Calculate the expected test statistic, depending on the distribution of the test statistic
    - Calculate the sample size N
12. Calculating Sample Size Through Simulation
    - Step 1: Simulate a large data set

    ```sas
    %let sim_n      = 2000000;   /* M: number of simulated subjects */
    %let mu_drug    = 250;
    %let mu_placebo = 265;
    %let sigma      = 50;

    data sim (drop = i seed);
       retain seed 1;
       length group $ 7;
       do i = 1 to &sim_n;
          /* allocate each subject to drug or placebo with probability 0.5 */
          if ranuni(seed) < 0.5 then do;
             group  = 'drug';
             weight = &mu_drug + &sigma * rannor(seed);
          end;
          else do;
             group  = 'placebo';
             weight = &mu_placebo + &sigma * rannor(seed);
          end;
          output;
       end;
    run;
    ```
13. Calculating Sample Size Through Simulation
    - Step 1: Simulate a large data set

    ```sas
    title "The first 5 observations of simulated data";
    proc print data = sim (obs = 5) noobs;
    run;
    ```

    The first 5 observations of simulated data

    | group   | weight  |
    |---------|---------|
    | drug    | 240.039 |
    | drug    | 269.829 |
    | placebo | 318.472 |
    | drug    | 218.788 |
    | placebo | 376.986 |
14. Calculating Sample Size Through Simulation
    - Step 2: Analyze the simulated data set to obtain the test statistic

    ```sas
    proc ttest data = sim;
       class group;
       var weight;
       ods output ttests = stats;   /* save the t-test table as a data set */
    run;

    title 'The T-test Result';
    proc print data = stats noobs;
    run;
    ```

    The T-test Result

    | Variable | Method        | Variances | tValue  | DF  | Probt  |
    |----------|---------------|-----------|---------|-----|--------|
    | weight   | Pooled        | Equal     | -211.37 | 2E6 | <.0001 |
    | weight   | Satterthwaite | Unequal   | -211.37 | 2E6 | <.0001 |
15. Calculating Sample Size Through Simulation
    - Step 3: Calculate $E(T)$

    The T-test Result

    | Variable | Method        | Variances | tValue  | DF  | Probt  |
    |----------|---------------|-----------|---------|-----|--------|
    | weight   | Pooled        | Equal     | -211.37 | 2E6 | <.0001 |
    | weight   | Satterthwaite | Unequal   | -211.37 | 2E6 | <.0001 |

    - $E(T)$: the average contribution of each person to the total test statistic
16. Calculating Sample Size Through Simulation
    - Step 3: Calculate $E(T)$
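The formula on slide 16 is missing from the transcript. Judging from the sample-size code on slide 17, which divides the absolute t value by the square root of the simulated sample size, $E(T)$ is presumably computed as follows, where $M$ is the number of simulated subjects (`sim_n`). This is an inference from that code, not the original slide:

```latex
E(T) = \frac{|t|}{\sqrt{M}}
     = \frac{211.37}{\sqrt{2{,}000{,}000}}
     \approx 0.14946
```

Dividing by $\sqrt{M}$ reflects the fact that, for a fixed effect size, the t statistic grows in proportion to the square root of the sample size.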
17. Calculating Sample Size Through Simulation
    - Calculate the sample size N

    ```sas
    %let alpha = 0.05;
    %let power = 0.80;
    %let sides = 1;

    data size_n (keep = z_alpha z_beta tvalue expected_t n);
       set stats;
       if method = 'Pooled';
       z_alpha    = probit(1 - &alpha/&sides);
       z_beta     = probit(&power);
       expected_t = abs(tvalue)/sqrt(&sim_n);   /* E(T): per-subject share of |t| */
       /* total N, divided by 2 to get the size of each of the two equal groups */
       n = ((z_alpha + z_beta)**2 / expected_t**2) / 2;
    run;
    ```
18. Calculating Sample Size Through Simulation
    - Calculate the sample size N

    ```sas
    title 'The required sample size for each group';
    proc print data = size_n noobs;
    run;
    ```

    The required sample size for each group

    | tValue  | z_alpha | z_beta  | expected_t | n       |
    |---------|---------|---------|------------|---------|
    | -211.37 | 1.64485 | 0.84162 | 0.14946    | 138.378 |
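As a quick arithmetic check of the printed values, the per-group sample size can be reproduced by hand. This is a worked example only; it uses the rounded numbers shown on the slide, so the last digits differ slightly from the SAS output:

```latex
n = \frac{(z_\alpha + z_\beta)^2}{2 \, E(T)^2}
  = \frac{(1.64485 + 0.84162)^2}{2 \times 0.14946^2}
  \approx \frac{6.1825}{0.044677}
  \approx 138.4
```

Rounding up gives 139 subjects per group, the same figure PROC POWER reported on slide 7.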
19. Calculating Sample Size For Interaction Through Simulation
    - Estimate N for detecting the interaction effect between caloric intake and treatment (diet pill or placebo)
    - Drug group: $weightloss = (\alpha + \beta_1) + (\beta_2 + \beta_3) \cdot calorie + \varepsilon$
    - Placebo group: $weightloss = \alpha + \beta_2 \cdot calorie + \varepsilon$
20. Calculating Sample Size For Interaction Through Simulation
    - Assume a negative correlation between caloric intake and weight loss
    - With the diet pill, weight loss declines more slowly as caloric intake increases
    - Parameters: $\alpha = 10$, $\beta_1 = 5$, $\beta_2 = -0.004$, and $\beta_3 = 0.0025$
    - Drug group: $weightloss = 15 - 0.0015 \cdot calorie + \varepsilon$
    - Placebo group: $weightloss = 10 - 0.004 \cdot calorie + \varepsilon$
21. Calculating Sample Size For Interaction Through Simulation
    - We would like to assign
      - 30% of patients to the diet pill group
      - 70% of patients to the placebo group
    - We would like to detect a significant interaction with
      - Power ($1 - \beta$) = 80%
      - Significance level ($\alpha$) = 5%
22. Calculating Sample Size For Interaction Through Simulation
    - Step 1. Simulate a large data set based on the design parameters
    - Set the known parameters:

    ```sas
    %let sim_n    = 1000000;   /* M: number of simulated subjects */
    %let p_drug   = 0.3;       /* proportion allocated to the drug group */
    %let mean_cal = 2500;
    %let sd_cal   = 1000;
    %let alpha    = 10;        /* intercept */
    %let beta1    = 5;         /* drug effect */
    %let beta2    = -0.004;    /* calorie slope */
    %let beta3    = 0.0025;    /* drug-by-calorie interaction */
    %let sd_error = 5;
    ```
23. Calculating Sample Size For Interaction Through Simulation
    - Step 1. Simulate a large data set based on the design parameters

    ```sas
    data sim (drop = seed i);
       retain seed 1;
       do i = 1 to &sim_n;
          if ranuni(seed) < &p_drug then drug = 1;
          else drug = 0;
          calorie    = &mean_cal + &sd_cal * rannor(seed);
          weightloss = &alpha + &beta1*drug + &beta2*calorie
                       + &beta3*drug*calorie + &sd_error*rannor(seed);
          output;
       end;
    run;
    ```
24. Calculating Sample Size For Interaction Through Simulation
    - Step 1. Simulate a large data set based on the design parameters

    ```sas
    title 'The first 5 observations of the simulated data';
    proc print data = sim (obs = 5) noobs;
    run;
    ```

    The first 5 observations of the simulated data

    | drug | calorie | weightloss |
    |------|---------|------------|
    | 1    | 2300.78 | 18.7863    |
    | 0    | 1416.68 | 15.5247    |
    | 0    | 3187.84 | 8.4472     |
    | 1    | 1905.82 | 12.3007    |
    | 0    | 1258.70 | -2.8413    |
25. Calculating Sample Size Through Simulation
    - Step 2. Analyze the simulated data set to obtain the test statistic

    ```sas
    proc glm data = sim;
       model weightloss = drug calorie drug*calorie / solution;
       ods output parameterestimates = pe;
    quit;

    title 'The result from the linear regression';
    proc print data = pe;
    run;
    ```

    The result from the linear regression

    | Obs | Dependent  | Parameter    | Estimate     | StdErr     | tValue  | Probt  |
    |-----|------------|--------------|--------------|------------|---------|--------|
    | 1   | weightloss | Intercept    | 9.999851985  | 0.01605913 | 622.69  | <.0001 |
    | 2   | weightloss | drug         | 5.032458567  | 0.02931489 | 171.67  | <.0001 |
    | 3   | weightloss | calorie      | -0.003996279 | 0.00000597 | -669.75 | <.0001 |
    | 4   | weightloss | drug*calorie | 0.002481769  | 0.00001089 | 227.91  | <.0001 |
26. Calculating Sample Size Through Simulation
    - Step 3. Calculate $E(T)$

    The result from the linear regression

    | Obs | Dependent  | Parameter    | Estimate     | StdErr     | tValue  | Probt  |
    |-----|------------|--------------|--------------|------------|---------|--------|
    | 1   | weightloss | Intercept    | 9.999851985  | 0.01605913 | 622.69  | <.0001 |
    | 2   | weightloss | drug         | 5.032458567  | 0.02931489 | 171.67  | <.0001 |
    | 3   | weightloss | calorie      | -0.003996279 | 0.00000597 | -669.75 | <.0001 |
    | 4   | weightloss | drug*calorie | 0.002481769  | 0.00001089 | 227.91  | <.0001 |
27. Calculating Sample Size Through Simulation
    - Calculate the sample size

    ```sas
    /* Note: this reuses the macro name alpha and overwrites the intercept set in Step 1 */
    %let alpha = 0.05;
    %let power = 0.80;
    %let sides = 2;

    data size_n (keep = z_alpha z_beta tvalue expected_t n);
       set pe;
       if parameter = 'drug*calorie';
       z_alpha    = probit(1 - &alpha/&sides);
       z_beta     = probit(&power);
       expected_t = abs(tvalue)/sqrt(&sim_n);   /* E(T): per-subject share of |t| */
       n = (z_alpha + z_beta)**2 / expected_t**2;   /* total sample size */
    run;
    ```
28. Calculating Sample Size Through Simulation
    - Calculate the sample size

    ```sas
    title 'The Sample Size for Testing the Interaction';
    proc print data = size_n noobs;
    run;
    ```

    The Sample Size for Testing the Interaction

    | tValue | z_alpha | z_beta  | expected_t | n       |
    |--------|---------|---------|------------|---------|
    | 227.91 | 1.95996 | 0.84162 | 0.22791    | 151.112 |

    - We need at least 152 people to detect the interaction
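One way to sanity-check an estimate like 152 is to simulate many trials of exactly that size and count how often the interaction test rejects. The sketch below is not part of the original slides. The data set names (`check`, `pe_check`, `reject`), the macro variables `n_total` and `n_rep`, the seed, and the use of `rand` with `call streaminit` in place of `ranuni`/`rannor` are all illustrative assumptions. The design parameters are typed in as literals so the block runs on its own.

```sas
%let n_total = 152;    /* hypothetical: total sample size under test */
%let n_rep   = 1000;   /* hypothetical: number of simulated trials   */

data check;
   call streaminit(2024);                     /* arbitrary seed */
   do rep = 1 to &n_rep;
      do i = 1 to &n_total;
         drug       = (rand('uniform') < 0.3);          /* 30% drug, 70% placebo */
         calorie    = 2500 + 1000 * rand('normal');
         weightloss = 10 + 5*drug - 0.004*calorie
                      + 0.0025*drug*calorie + 5*rand('normal');
         output;
      end;
   end;
run;

ods exclude all;                              /* suppress printed output per replicate */
proc glm data = check;
   by rep;
   model weightloss = drug calorie drug*calorie / solution;
   ods output ParameterEstimates = pe_check;
quit;
ods exclude none;

data reject;
   set pe_check;
   if parameter = 'drug*calorie';
   reject = (probt < 0.05);                   /* 1 if the interaction is significant */
run;

title 'Empirical power: proportion of replicates rejecting';
proc means data = reject mean;
   var reject;
run;
```

If the sample-size estimate is reasonable, the reported mean of `reject` should land near the 80% target, with some Monte Carlo error that shrinks as `n_rep` grows.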
29. Conclusion
    - Three steps for estimating the expected test statistic
      - 1. Simulate a large number of subjects M based on the design parameters
      - 2. Analyze the simulated data set to obtain the test statistic
      - 3. Calculate $E(T)$
    - Calculate the sample size: $N = (z_\alpha + z_\beta)^2 / E(T)^2$
30. Conclusion
    - Calculating the sample size is one of the most important tasks for statisticians
    - When the study design is too complex for the POWER procedure, the simulation method can be used instead