1.1 Introduction
Fundamental Points
Clinical trials should have sufficient statistical power to detect differences between groups that are considered to be of clinical interest. Calculating the sample size, with provision for adequate levels of significance and power, is therefore an essential part of planning.
Five Key Questions Regarding the Sample Size
• What is the main purpose of the trial?
• What is the principal measure of patient outcome?
• How will the data be analyzed to detect a treatment difference? (The test statistic: t-test, χ² or CI.)
• What type of results does one anticipate with standard treatment? (H0 and HA.)
• How small a treatment difference is it important to detect, and with what degree of certainty? (δ, α and β.)
• How will treatment withdrawals and protocol violations be dealt with? (Data set used.)
SSC: Only an Estimate
• Parameters used in the calculation are estimates with uncertainty, often based on very small prior studies
• The population may be different
• Publication bias: prior estimates are overly optimistic
• Different inclusion and exclusion criteria
• Mathematical models are approximations
What should be in the protocol?
Sample size justification
Methods of calculation
Quantities used in calculation:
• variances
• mean values
• response rates
• difference to be detected
Realistic and Conservative
Overestimated size:
• unfeasible
• early termination
Underestimated size:
• need to justify an increase
• extension in follow-up
• incorrect conclusion (WORSE)
What is α (Type I error)? The probability of erroneously rejecting the null hypothesis. (Putting a useless medicine on the market!)
What is β (Type II error)? The probability of erroneously failing to reject the null hypothesis. (Keeping a good medicine away from patients!)
What is Power? Power quantifies the ability of the study to find true differences of various values of δ. Power = 1 − β = P(accept HA | HA is true), that is, the chance of correctly identifying HA (correctly identifying a better medicine).
What is δ?
δ is the minimum difference between groups that is judged to be clinically important:
• the minimal effect which has clinical relevance in the management of patients
• the anticipated effect of the new treatment (larger)
The Choice of α and β depends on:
• the medical and practical consequences of the two kinds of errors
• the prior plausibility of the hypothesis
• the desired impact of the results
The Choice of α and β
• α=0.10 and β=0.2 for preliminary trials that are likely to be replicated.
• α=0.01 and β=0.05 for trials that are unlikely to be replicated.
• α=β if both test and control treatments are new, about equal in cost, and there are good reasons to consider them both relatively safe.
The Choice of α and β
• α>β if there is no established control treatment and the test treatment is relatively inexpensive, easy to apply and not known to have any serious side effects.
• α<β (the most common approach: 0.05 and 0.2) if the control treatment is already widely used and is known to be reasonably safe and effective, whereas the test treatment is new, costly, and produces serious side effects.
1.2 SSC for Continuous Outcome Variables
$H_0: \delta = \mu_C - \mu_I = 0$
$H_A: \delta = \mu_C - \mu_I \neq 0$
If the variance is known:
$$Z = \frac{\bar{x}_C - \bar{x}_I}{\sigma \sqrt{\frac{1}{N_C} + \frac{1}{N_I}}}$$
If $|Z| > Z_\alpha$, $H_0$ will be rejected at the α level of significance.
A total sample of 2N would be needed to detect a true difference δ between µI and µC with power (1−β) and significance level α, by the formula:
$$2N = \frac{4 (Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2}$$
Example 1
An investigator wishes to estimate the sample size necessary to detect a 10 mg/dl difference in cholesterol level in a diet intervention group compared to the control group. The variance from other data is estimated to be (50 mg/dl)². For a two-sided 5% significance level, Zα=1.96, and for 90% power, Zβ=1.282.
$$2N = \frac{4 (1.96 + 1.282)^2 (50)^2}{10^2} \approx 1050$$
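As a rough check of the arithmetic in Example 1, the formula can be evaluated in a few lines of Python. This is only a sketch: the function name `total_sample_size` is illustrative, and it uses the rounded Z values quoted on the slide rather than exact normal quantiles.

```python
def total_sample_size(z_alpha, z_beta, sigma, delta):
    """Total sample size 2N for comparing two means with equal allocation."""
    return 4 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

# Example 1: sigma = 50 mg/dl, delta = 10 mg/dl, two-sided 5% level, 90% power
two_n = total_sample_size(z_alpha=1.96, z_beta=1.282, sigma=50, delta=10)
print(round(two_n))  # roughly 1050 in total, in line with the slide
```

Halving the result gives the approximate number needed in each group.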
Example 1a: Baseline Adjustment
An investigator interested in the mean levels of change might want to test whether diet intervention lowers serum cholesterol from baseline levels when compared with a control.
$H_0: \Delta_C - \Delta_I = 0$
$H_A: \Delta_C - \Delta_I \neq 0$
σ = 20 mg/dl, δ = 10 mg/dl
$$2N = \frac{4 (1.96 + 1.282)^2 (20)^2}{10^2} \approx 170$$
A Professional Statement
A sample size of 85 in each group will have 90% power to detect a difference in means of 10.0, assuming that the common standard deviation is 20.0, using a two-group t-test with a 0.05 two-sided significance level.
Values of f(α, β) to be used in the formula for sample size calculation, where $f(\alpha, \beta) = (Z_\alpha + Z_\beta)^2$

                     β (Type II error)
α (Type I error)     0.05    0.1     0.2     0.5
0.1                  10.8    8.6     6.2     2.7
0.05                 13.0    10.5    7.9     3.8
0.02                 15.8    13.0    10.0    5.4
0.01                 17.8    14.9    11.7    6.6
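The table entries can be approximately regenerated from standard normal quantiles. The sketch below assumes that α is two-sided and β one-sided, which is what the tabulated numbers appear to imply; the helper name `f_alpha_beta` is illustrative only. The results should agree with the table to within rounding of the last digit.

```python
from statistics import NormalDist

def f_alpha_beta(alpha, beta):
    """(Z_alpha + Z_beta)^2, assuming a two-sided alpha and one-sided beta."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2

for alpha in (0.1, 0.05, 0.02, 0.01):
    row = [f_alpha_beta(alpha, beta) for beta in (0.05, 0.1, 0.2, 0.5)]
    print(f"{alpha:<5}", "  ".join(f"{value:5.1f}" for value in row))
```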
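1.3 SSC for a Binary Outcome
Two independent samples
$$Z = \frac{\hat{p}_C - \hat{p}_I}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{N_C} + \frac{1}{N_I}\right)}}, \qquad \bar{p} = \frac{r_I + r_C}{N_I + N_C}$$
For the sample size calculation, with $\bar{p} = (p_C + p_I)/2$:
$$2N = \frac{4 (Z_\alpha + Z_\beta)^2 \, \bar{p}(1 - \bar{p})}{(p_C - p_I)^2}$$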
Example 2
Suppose the annual event rate in the control group is anticipated to be 20%. The investigator hopes that the intervention will reduce the annual rate to 15%. The study is planned so that each participant will be followed for 2 years. Therefore, if the assumptions are accurate, approximately 40% of the participants in the control group and 30% of the participants in the intervention group will develop an event.
A Professional Statement
A two-group χ² test with a 0.05 two-sided significance level will have 90% power to detect the difference between a Group 1 proportion, P1, of 0.40 and a Group 2 proportion, P2, of 0.30 (odds ratio of 0.643) when the sample size in each group is 480.
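The figure of about 480 per group in Example 2 can be checked against the binary-outcome formula. The sketch below plugs in pC = 0.40 and pI = 0.30 with the rounded values Zα = 1.96 and Zβ = 1.282 used earlier in these notes; the function name is illustrative, and dedicated software may give a slightly different answer because of rounding or continuity corrections.

```python
import math

def total_sample_size_binary(p_c, p_i, z_alpha=1.96, z_beta=1.282):
    """Total sample size 2N for comparing two proportions, equal allocation."""
    p_bar = (p_c + p_i) / 2  # average of the two anticipated event rates
    return 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p_c - p_i) ** 2

two_n = total_sample_size_binary(p_c=0.40, p_i=0.30)
per_group = math.ceil(two_n / 2)
print(round(two_n), per_group)  # close to 960 in total, about 480 per group
```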
From Table 1.3 you can see:
• δ ↑ → N ↓
• Power (1−β) ↑ → N ↑
• α ↓ → N ↑
Paired Binary Outcome
McNemar’s test
$$N_p = \frac{(Z_\alpha + Z_\beta)^2 f}{d^2}$$
d = difference in the proportion of successes (d = pI − pC)
f = the proportion of participants whose response is discordant (the pair of outcomes are not the same)
Example 3 Consider an eye study where one eye is treated for loss in visual acuity by a new laser procedure and the other eye is treated by standard therapy. The failure rate on the control, pC, is estimated to be 0.4, and the new procedure is projected to reduce the failure rate to 0.20. The discordant rate f is assumed to be 0.50.
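Example 3 can be worked through with the paired formula. The slide does not state the significance level or power for this example, so the sketch below assumes the same two-sided 5% level and 90% power (Zα = 1.96, Zβ = 1.282) used in the earlier examples; the function name is illustrative.

```python
import math

def paired_sample_size(d, f, z_alpha=1.96, z_beta=1.282):
    """Number of pairs Np for a paired binary outcome (McNemar-type test)."""
    return (z_alpha + z_beta) ** 2 * f / d ** 2

d = 0.40 - 0.20  # size of the difference in failure rates; only d squared is used
n_pairs = paired_sample_size(d, f=0.50)
print(math.ceil(n_pairs))  # number of pairs (here, participants with two eyes)
```

Under these assumptions the calculation gives roughly 130 pairs.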
1.4 Adjusting for Non-adherence
RO = drop-out rate
RI = drop-in rate
$$N^* = \frac{N}{(1 - R_O - R_I)^2}$$
If RO = 0.20 and RI = 0.05, N* = 1.78N
1.5 Adjusting for Multiple Comparisons
α' = α/k
k = the number of multiple comparison variables
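The inflation factor for non-adherence is easy to tabulate for other drop-out and drop-in rates. A minimal sketch, with an illustrative function name:

```python
def inflate_for_nonadherence(n, drop_out, drop_in):
    """Adjusted sample size N* = N / (1 - R_O - R_I)^2."""
    return n / (1 - drop_out - drop_in) ** 2

factor = inflate_for_nonadherence(1, drop_out=0.20, drop_in=0.05)
print(round(factor, 2))  # inflation factor for R_O = 0.20, R_I = 0.05
```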
Table 1.4 Adjusting for Randomization Ratio

Randomization Ratio    Increase in total N
1:1                    0
1:2                    +12.5%
1:3                    +33%
1:4                    +56%
1:5                    +80%
1:6                    +100%
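The entries in Table 1.4 are consistent with the usual result that, for a 1:k allocation, the total sample size needed for the same power is (1+k)²/(4k) times that of a 1:1 design. That formula is not stated on the slide and is used here as an assumption (it holds for two groups with a common variance). The sketch reproduces the table; for 1:6 it gives about +104%, which the slide appears to round to +100%.

```python
def total_n_increase(k):
    """Fractional increase in total N for 1:k allocation relative to 1:1."""
    return (1 + k) ** 2 / (4 * k) - 1

for k in range(1, 7):
    print(f"1:{k}  +{100 * total_n_increase(k):.1f}%")
```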
1.6 Adjusting for loss to follow-up
If p is the proportion of subjects lost to follow-up, the number of subjects must be increased by a factor of 1/(1−p).
1.7 Other Factors
• the rate of attrition of subjects during a trial
• intermediate analyses
Sample size re-estimation
• Event rates are lower than anticipated
• Variability is larger than expected
• Carried out without unblinding the data or making treatment comparisons
1.8 Power Calculation (assuming we compare two medicines)
Power depends on 4 elements:
• The real difference between the two medicines, δ: big δ ⇒ big power
• The variation among individuals, σ: small σ ⇒ big power
• The sample size, n: large n ⇒ big power
• Type I error, α: large α ⇒ big power
Sensitivity of the sample size estimate to a variety of deviations from these assumptions: a power table
Table 1 Statistical Power of the Tanzania Vitamin and HIV Infection Trial (N=960)

Effect of B:          0%                 15%                30%
Loss to follow-up:    0%   20%  33%      0%   20%  33%      0%   20%  33%
Effect of A = 30%     89%  82%  74%      85%  76%  68%      79%  69%  61%
Effect of A = 25%     75%  65%  58%      69%  59%  52%      62%  52%  45%
Example 4: Regret for Low Power Due to Small Sample?
I have a set of data in which the mean change between the 2 groups is significantly different (p<0.05). But when I calculate the power, it gives only 50%. How should I interpret this? Also, can someone kindly advise as to whether it is meaningful (or pointless) to calculate the power when the result is statistically significant?
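One way to look at the question in Example 4 is to compute the "observed" power directly. The sketch below assumes a two-sided z-test and assumes that the power was calculated by treating the observed difference as the true difference; the question does not say how it was obtained, so this is an interpretation. Under those assumptions the observed power is a function of the p-value alone.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z-test if the true effect equals the observed one."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)    # critical value for the test
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)

for p in (0.05, 0.01, 0.001):
    print(p, round(observed_power(p), 2))
```

A result sitting right at p = 0.05 corresponds to an observed power of about 50%. On this reading, a post hoc power figure restates the p-value and adds little to a result that is already significant.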
Books and Software
• Sample Size Tables for Clinical Studies (second edition), by David Machin, Michael Campbell, Peter Fayers and Alain Pinol. Blackwell Science, 1997.
• PASS 2000 (available in CCTER)
• nQuery 4.0 (available in CCTER)
Randomization Definition: randomization is a process by which each participant has the same chance of being assigned to either intervention or control.
Fundamental Point
Randomization tends to produce study groups comparable with respect to known and unknown risk factors, removes investigator bias in the allocation of participants, and guarantees that statistical tests will have valid significance levels.
Two Types of Bias in Randomization Selection bias occurs if the allocation process is predictable. If any bias exists as to what treatment particular types of participants should receive, then a selection bias might occur. Accidental bias can arise if the randomization procedure does not achieve balance on risk factors or prognostic covariates especially in small studies.
Fixed Allocation Randomization
Fixed allocation randomization procedures assign the intervention to participants with a pre-specified probability, usually equal, and that allocation probability is not altered as the study progresses.
• Simple randomization
• Blocked randomization
• Stratified randomization
Simple Randomization
• Option 1: toss an unbiased coin, for a randomized trial with two treatments (call them A and B).
• Option 2: use a random digit table. A randomization list may be generated by using the digits, one per treatment assignment, starting with the top row and working downwards.
• Option 3: use a random number-producing algorithm, available on most digital computer systems.
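Option 3 might look like the following in Python. This is only an illustration using the standard library's pseudo-random generator: the function name and the seed value are arbitrary choices, and a real trial would normally use a validated randomization system with the list held by someone independent of recruitment.

```python
import random

def simple_randomization(n, treatments=("A", "B"), seed=None):
    """Assign each of n participants independently, with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    return [rng.choice(treatments) for _ in range(n)]

allocation = simple_randomization(20, seed=2024)
print("".join(allocation))
print("A:", allocation.count("A"), "B:", allocation.count("B"))
```

Running this for small n with different seeds shows how easily the two group sizes can differ under simple randomization.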
Advantages
• Each treatment assignment is completely unpredictable.
• Probability theory guarantees that in the long run the numbers of patients on each treatment will not be radically different.
• Easy to implement.
Disadvantages
• Unequal groups: one treatment is assigned more often than another.
• Time imbalance or chronological bias: one treatment is given with greater frequency at the beginning of a trial and another with greater frequency at the end of the trial.
Simple randomization is not often used, even for large studies.
Blocked Randomization (permuted block randomization)
Blocked randomization ensures exactly equal treatment numbers at certain equally spaced points in the sequence of patient assignments.
A table of random permutations is used, containing, in random order, all possible combinations (permutations) of a small series of figures.
Block size: 6, 8, 10, 16, 20.
Advantages
The balance between the number of participants in each group is guaranteed during the course of randomization. The number in each group will never differ by more than b/2, where b is the length of the block.
Disadvantages
• Analysis may be more complicated (in theory); the correct analysis could have greater power.
• Changing the block size can keep the randomization from being predictable.
• Mid-block inequality might occur if an interim analysis is intended.
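A permuted-block list can be sketched as follows. Instead of reading from a printed table of random permutations, this version shuffles each block with Python's pseudo-random generator, which is a substitution made here for illustration; the function name and default block size are likewise arbitrary.

```python
import random

def blocked_randomization(n_blocks, block_size=6, treatments=("A", "B"), seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    if block_size % len(treatments):
        raise ValueError("block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_arm = block_size // len(treatments)
    sequence = []
    for _ in range(n_blocks):
        block = list(treatments) * per_arm  # equal numbers of each treatment
        rng.shuffle(block)                  # random order within the block
        sequence.extend(block)
    return sequence

sequence = blocked_randomization(n_blocks=5, block_size=6, seed=3)
print("".join(sequence))
```

With two treatments, the running difference between the groups returns to zero at the end of every block and cannot exceed b/2 in between.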
Randomization Types: Stratified randomization
[Figure: stratification tree. Geographic location (U.S., Europe) → previous exposure (Yes, No) → site (lymph, skin, breast)]
Stratified Randomization Stratified randomization process involves measuring the level of the selected factors for participants, determining to which stratum each belongs, and performing the randomization within the stratum. Within each stratum, the randomization process itself could be simple randomization, but in practice most clinical trials use some blocked randomization strategy.
Table 3. Stratification Factors and Levels (3×2×3=18 Strata)

Age             Sex          Smoking history
1. 40-49 yr     1. Male      1. Current smoker
2. 50-59 yr     2. Female    2. Ex-smoker
3. 60-69 yr                  3. Never smoked

Table 4 Stratified Randomization with Block Size of Four

Stratum   Age     Sex   Smoking   Group assignment
1         40-49   M     Current   ABBA BABA ..
2         40-49   M     Ex        BABA BBAA ..
3         40-49   M     Never     etc.
4         40-49   F     Current
5         40-49   F     Ex
6         40-49   F     Never
7         50-59   M     Current
8         50-59   M     Ex
9         50-59   M     Never
10        50-59   F     Current
11        50-59   F     Ex
12        50-59   F     Never
etc.
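Lists like those in Table 4 can be generated by running a blocked randomization separately within each stratum. The sketch below builds all 18 strata from Table 3 with blocks of four; the function name, the number of blocks per stratum and the seed are illustrative assumptions.

```python
import itertools
import random

AGES = ("40-49", "50-59", "60-69")
SEXES = ("M", "F")
SMOKING = ("Current", "Ex", "Never")

def make_stratum_lists(n_blocks=2, block_size=4, seed=None):
    """Return one permuted-block allocation list per stratum."""
    rng = random.Random(seed)
    lists = {}
    for stratum in itertools.product(AGES, SEXES, SMOKING):
        sequence = []
        for _ in range(n_blocks):
            block = ["A", "B"] * (block_size // 2)  # balanced block
            rng.shuffle(block)
            sequence.extend(block)
        lists[stratum] = sequence
    return lists

lists = make_stratum_lists(seed=1)
print(len(lists))  # 3 x 2 x 3 = 18 strata
print("".join(lists[("40-49", "M", "Current")]))
```

Each new participant is classified into a stratum and receives the next unused assignment from that stratum's list.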
Advantages
• Makes the two study groups comparable with regard to the specified factors.
• The power of the study can be increased by taking the stratification into account in the analysis.
Disadvantages
The prognostic factors used in stratified randomization may turn out to be unimportant, and other factors identified later may be of more importance.
Mechanism

Trial type                        Mechanism
No central registration office    Randomization list, sealed envelopes
Double-blind drug trial           Pharmacist will be involved
Multi-centre trial                Central registration office
Single-centre trial               Independent person responsible for patient registration and randomization
An Example of Stratified Randomization
Patients will be stratified according to the following criteria:
1) Treatment center (Hospital A vs Hospital B vs Hospital C)
2) N-stage (N2 vs N3)
3) T-stage (T1-2 vs T3-4)
What should be in the protocol?
A dynamic allocation scheme will be used to randomize patients in equal proportions within each of 12 strata. The scheme first creates time-ordered blocks of size divisible by three and then uses simple randomization to divide the patients in each block into three treatment arms, in equal proportion. The block sizes will be chosen randomly so that each block contains either 6 or 9 patients.
Cont… This procedure helps to ensure both randomness and investigator blinding (the block sizes are known only to the statistician), as recommended by Freedman et al. Randomization will be generated by the consulting statistician in sealed envelopes, labeled by stratum, which will be unsealed after patient registration.
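The scheme described in the protocol text could be sketched for a single stratum as follows. The arm labels, function name and seed are hypothetical, and the list is built from whole blocks, so it may run slightly longer than the number of patients requested. In practice one such list would be prepared for each of the 12 strata.

```python
import random

ARMS = ("A", "B", "C")  # hypothetical labels for the three treatment arms

def random_block_sequence(n_patients, block_sizes=(6, 9), seed=None):
    """Allocation list for one stratum using randomly sized balanced blocks."""
    rng = random.Random(seed)
    sequence, sizes = [], []
    while len(sequence) < n_patients:
        size = rng.choice(block_sizes)            # block of 6 or 9, chosen at random
        block = list(ARMS) * (size // len(ARMS))  # equal numbers per arm
        rng.shuffle(block)                        # random order within the block
        sequence.extend(block)
        sizes.append(size)
    return sequence, sizes

sequence, sizes = random_block_sequence(30, seed=7)
print(sizes)
print("".join(sequence))
```

Because the block sizes vary and are known only to the statistician, the end of a block is harder to guess than with a fixed block size.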
Biased Coin Method
Advantages
• Investigators cannot determine the next assignment by discovering the blocking factor.
Disadvantages
• Complexity in use
• Statistical analysis cumbersome
Minimization
Minimization is a well-accepted statistical method to limit imbalance in relatively small randomized clinical trials in conditions with known important prognostic baseline characteristics. It is called minimization because imbalances in the distribution of prognostic factors are minimized.
Table 1 Some baseline characteristics of patients in a controlled trial of mustine versus talc in the control of pleural effusions in patients with breast cancer (Fentiman et al., 1983)

                                        Mustine (n=23)    Talc (n=23)
Mean age (SE)                           50.3 (1.5)        55.3 (2.2)
Stage of disease: 1 or 2                52%               74%
                  3 or 4                48%               26%
Mean interval in months between
BC diag. and effusion diag. (SE)        33.1 (6.2)        60.4 (13.1)
Postmenopausal                          43%               74%
Minimization Factors

Age (years)                                   <=50      or   >50
Stage of disease                              1 or 2    or   3 or 4
Time between diagnosis of cancer
and diagnosis of effusions (months)           <=30      or   >30
Menopausal status                             Pre       or   Post

Table 2 Characteristics of the first 29 patients in a clinical trial using minimization to allocate treatment

                           Mustine   Talc
Age             <=50       7         6
                >50        8         8
Stage           1 or 2     11        11
                3 or 4     4         3
Time interval   <=30m      6         4
                >30m       9         10
Menopausal      Pre        7         5
                Post       8         9

Table 3 Calculation of imbalance in patient characteristics for allocating treatment to the thirtieth patient

                           Mustine (n=15)   Talc (n=14)
Age >50                    8                8
Stage 3 or 4               4                3
Time interval <=30m        6                4
Postmenopausal             8                9
Total                      26               24
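The totals in Table 3 can be reproduced by summing, for each arm, the counts of existing patients who share each of the new patient's characteristics. The sketch below uses the counts from the table and then picks the arm with the smaller total, which is the basic minimization rule. The variable and function names are illustrative, and the handling of ties or any random element in the final assignment is not covered by the slides and is left out.

```python
# Counts of the first 29 patients sharing each characteristic of patient 30
counts = {
    "Mustine": {"age >50": 8, "stage 3 or 4": 4, "interval <=30m": 6, "postmenopausal": 8},
    "Talc":    {"age >50": 8, "stage 3 or 4": 3, "interval <=30m": 4, "postmenopausal": 9},
}

def minimization_totals(counts, patient_factors):
    """Sum, per arm, the counts for the factor levels the new patient has."""
    return {arm: sum(levels[factor] for factor in patient_factors)
            for arm, levels in counts.items()}

patient_30 = ["age >50", "stage 3 or 4", "interval <=30m", "postmenopausal"]
totals = minimization_totals(counts, patient_30)
preferred = min(totals, key=totals.get)  # arm with the smaller total
print(totals, "->", preferred)
```

Assigning the thirtieth patient to the arm with the smaller total reduces the overall imbalance across the four factors.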
Advantages
• It can reduce imbalance to a minimum, especially in small trials.
• A computer program is available (called Mini), and it is also not difficult to perform ‘by hand’.
• Minimization and stratification on the same prognostic factors produce similar levels of power, but minimization may add slightly more power if stratification does not include all of the covariates.
Disadvantages
It is a somewhat complicated process compared to simple randomization.
Practical Considerations

Study type                            Randomization
Large studies                         Blocked
Large, multicentre studies            Stratified by centre
Small studies                         Blocked and stratified by centre
Large number of prognostic factors    Minimization
Large studies                         Stratified analysis without stratified randomization