Sample Size Determination in Health Research

Dr. D. K. Yadav
Department of Statistics & Demography (S&D)
The National Institute of Health & Family Welfare
New Delhi-110067

Learning Objectives
1. Understand sample size and power estimation
2. Understand why sample size is an important part of both study design and analysis
3. Understand the difference between sample size calculations in different studies
4. Learn how to perform a sample size calculation
   (a) For discrete or qualitative data
   (b) For continuous or quantitative data

What is Sample Size and why does it matter?
Sample size is a measure of how many patients are needed in a study. Nearly all clinical studies
entail studying a sample of patients with a particular characteristic rather than the entire
patient population. The information obtained from this sample is then used to draw
inferences about the whole population.
Sample size estimations are used by researchers to determine how many subjects are needed to
answer the research question (or reject the null hypothesis) under predefined assumptions.

Error in decision

Clinical trial result     | Ultimate truth: benefit from treatment | Ultimate truth: no benefit from treatment
Benefit from treatment    | √ Correct result                       | Type I error (p), false +ve result
No benefit from treatment | Type II error (β), false -ve result    | √ Correct result

Sample size calculations tell us how many patients are required
in order to reduce the risk of a type I or a type II error.

Factors affecting sample size
- The precision and variance of measurements within a sample
- Magnitude of a clinically significant difference
- How certain we want to be to avoid type I error
- The type of statistical test being applied

Basic approach to sample size
- Estimates from single sample
  - Estimation of prevalence
  - Estimation of proportions
  - Estimation of population mean
- Comparison of two groups
  - Cross sectional studies
  - Case control studies
  - Cohort studies and
  - Clinical trials
- Comparison of more than two groups
- Prediction problems (regression)
- Comparison of survival times

What type of measurement?
- Means of quantitative data
- Correlation between variables
- Proportions from binary variables
- Count of cases in different groups
- Ordered scales (pain score)
- Survival/failure times etc.

Sample Size of a Study
The statistical reasoning
The fundamental question: How many cases do we need?
Five key questions to answer the above:
1. What is the main purpose of the study?
2. What is the principal measure of outcome?
3. What statistical method will be used to detect a significant difference?
4. What is the standard or anticipated result to be compared with the study outcome?
5. How small a difference between the study outcome and the anticipated value is
   practically important, and with what degree of certainty?

Estimating disease occurrence
Tuberculosis in children under five
Example: A local health authority wants to estimate the prevalence of
tuberculosis in children under 5 years. It is known that the true rate is
unlikely to exceed 30%. To estimate this figure to within 5% of the true value
with 95% confidence, how many children will be needed?
Ans1. To estimate the prevalence of tuberculosis in children
in a community. (purpose)
Ans2. Cases of tuberculosis among under-5 children (per
100) reported within a year from the community are the
primary indicator of outcome. (principal outcome)
Ans3. 95% confidence interval, i.e. p < 0.05. (statistical method
to be used)
Ans4. The assumed figure is the standard in this case, i.e. 30% of
children are suffering from tuberculosis. (anticipated
result)
Ans5. A 5% margin of error around the anticipated value is acceptable.
(smallest difference practically useful)

Formula for sample size
For a survey design based on a simple random
sample, the sample size required can be calculated
according to the following formula:

$$n = \frac{z^{2} \times p(1-p)}{m^{2}}$$

n = required sample size
z = standard normal value for the confidence level (1.96 for 95% confidence)
p = expected prevalence of tuberculosis in the community
m = margin of error (0.05 for 5%)
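
To make the formula concrete, here is a minimal Python sketch of it. The function name `srs_sample_size` and the rounding-up step are illustrative choices, not part of the original slides. The loop shows how the required n grows as the expected prevalence p approaches 0.5, where p(1-p) is largest.

```python
import math


def srs_sample_size(p, m, z=1.96):
    """Sample size for estimating a proportion from a simple random sample.

    p: expected prevalence as a fraction (e.g. 0.30)
    m: margin of error as a fraction (e.g. 0.05)
    z: standard normal value for the confidence level (1.96 for 95%)
    """
    n = z ** 2 * p * (1 - p) / m ** 2
    # Round up, since a fraction of a subject cannot be sampled.
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical prevalence values, all with a 5% margin of error.
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"p = {p:.1f}: n = {srs_sample_size(p, m=0.05)}")
```

When nothing is known about the prevalence, p = 0.5 gives the most conservative (largest) sample size for a given margin of error.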

Estimating sample size for disease occurrence
Now n = ?
With z = 1.96, p = 0.3 and m = 0.05:

$$n = \frac{1.96^{2} \times 0.3(1-0.3)}{0.05^{2}} = \frac{3.8416 \times 0.21}{0.0025} = \frac{0.8067}{0.0025} \approx 322.7$$

which is rounded up to n = 323.
Design Effect: The survey is designed as a cluster sample (a
representative selection of villages), not a simple random
sample. To correct for the difference in design, the sample
size is multiplied by the design effect D = Var(CS)/Var(SRS),
generally taken as 2.
Therefore N = n x D = 323 x 2 = 646
children under 5 years of age
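
The whole calculation for the tuberculosis example, including the design-effect correction, can be sketched in a few lines of Python. The function name `cluster_sample_size` is hypothetical, and the default design effect of 2 is simply the value assumed on this slide; a real survey would estimate it from earlier data.

```python
import math


def cluster_sample_size(p, m, design_effect=2.0, z=1.96):
    """Return (n_srs, n_cluster) for estimating a prevalence.

    n_srs is the simple-random-sample size, rounded up.
    n_cluster is n_srs multiplied by the design effect, rounded up.
    """
    n_srs = math.ceil(z ** 2 * p * (1 - p) / m ** 2)
    n_cluster = math.ceil(n_srs * design_effect)
    return n_srs, n_cluster


if __name__ == "__main__":
    # Values from the example: p = 30%, margin of error 5%, D = 2.
    n_srs, n_cluster = cluster_sample_size(p=0.30, m=0.05)
    print("Simple random sample:", n_srs)
    print("Cluster sample:", n_cluster)
```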

Estimating difference in proportions
Example: Drug delivery and discontinuance of treatment in
tuberculosis
Is checking the empty foils of combipacks when the next dose
is issued helpful for continuity of treatment
in tuberculosis patients?
Ans1. To see whether patients who are checked for empty foils when the
next dose is issued have a lower rate of discontinuance during the first three
months of treatment. (purpose)
Ans2. Stopping treatment within the first three months is the primary
indicator of outcome. (principal outcome)
Ans3. Difference in the percentage of patients discontinuing treatment during the
first three months between patients asked to present empty
foils and those on standard practice. A Z test for proportions will be
used at p < 0.05. (statistical method to be used)
Ans4. The normal way of drug delivery (without asking for empty foils)
is the standard in this case. 10% of patients on this drug delivery
stop the treatment (and 90% continue) up to 3 months.
(anticipated result)
Ans5. If empty foils are checked, only 5% of patients are likely to
discontinue (and 95% will continue) within 3 months. This
difference should be detected with 90% certainty. (smallest difference
practically useful)

Size of a Study contd.
Difference in proportions
If
p1 = Percentage of expected successes on
standard method (usual drug distribution)
p2 = Percentage of expected successes on new
method (checking empty foils)
α = Level of significance for statistical test
used (0.05)
1-β = Degree of certainty that the difference (p1 - p2),
if present, would be detected (usually 0.90)
n = Required number of subjects (TB patients)
in each study arm
then

$$n = \frac{p_1(100-p_1) + p_2(100-p_2)}{(p_1-p_2)^{2}} \times f(\alpha, \beta)$$
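
The multiplier f in this formula depends only on the significance level and the power, and is normally read from a table. The sketch below assumes the usual textbook definition for a two-sided test, f = (z for 1-α/2 + z for 1-β)², which the slides do not state explicitly. It uses `statistics.NormalDist` from the Python standard library, and the function name `f_alpha_beta` is illustrative.

```python
from statistics import NormalDist


def f_alpha_beta(alpha=0.05, beta=0.10):
    """Multiplier for two-group sample size formulas.

    Assumes a two-sided test at level alpha and power 1 - beta:
    f = (z_{1 - alpha/2} + z_{1 - beta}) ** 2
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(1 - beta)
    return (z_alpha + z_beta) ** 2


if __name__ == "__main__":
    # alpha = 0.05 and beta = 0.10, as used in the examples here.
    print(round(f_alpha_beta(0.05, 0.10), 1))
```

Under this assumption, α = 0.05 with 90% power gives a value that rounds to the tabulated 10.5.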

Size of Study contd.
Minimum size: Drug delivery in TB
In the drug delivery study:
p1 = 90% (continuing if standard
drug delivery)
p2 = 95% (continuing if new drug
delivery)
α = 0.05 (level of significance)
β = 0.10 (1-β = 0.9, hence β = 0.1)
f(α, β) = 10.5 (from table)

$$n = \frac{90 \times 10 + 95 \times 5}{(95-90)^{2}} \times 10.5 = 577.5 \approx 578$$

patients of tuberculosis in each group
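
As a check on the arithmetic, the calculation for the drug delivery study can be reproduced with a short Python sketch. The function name `n_per_arm_proportions` is hypothetical; percentages are entered on the 0 to 100 scale, as on the slide, and the multiplier 10.5 is taken as given from the table.

```python
import math


def n_per_arm_proportions(p1, p2, f=10.5):
    """Subjects per arm to detect a difference between two percentages.

    p1, p2: expected success percentages (0 to 100) in the two arms
    f: tabulated multiplier for the chosen significance level and power
    """
    numerator = p1 * (100 - p1) + p2 * (100 - p2)
    n = numerator / (p1 - p2) ** 2 * f
    # Round up to the next whole patient.
    return math.ceil(n)


if __name__ == "__main__":
    # 90% continue on standard delivery, 95% expected with foil checks.
    print(n_per_arm_proportions(p1=90, p2=95))
```

Because the difference enters as a square in the denominator, halving the detectable difference roughly quadruples the number of patients needed in each arm.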

Size of Clinical trials contd.
Example: Vitamin D and Neonatal Hypocalcaemia
Supplementation of Vitamin D
to mothers for prevention
of neonatal hypocalcaemia
Ans1. To see whether supplementation of vitamin D during pregnancy has
any role in neonatal hypocalcaemia. (purpose)
Ans2. Serum calcium level of infants one week after birth
is the primary indicator of the treatment response.
(principal outcome)
Ans3. Difference in mean calcium level between the placebo and
supplemented groups. A two-sample t test will be used
at p < 0.05. (statistical method to be used)
Ans4. Placebo is the standard in this case. The mean serum
calcium level is assumed to be 9.0 ± 1.8 mg/100 ml. (result
anticipated with std. treatment)
Ans5. In the vitamin D supplemented group the expected mean
serum calcium level would be 9.5 ± 1.8 mg/100 ml (min. diff. = 0.5).
This difference should be detected with 90% certainty. (smallest
difference with clinical value)

Size of Clinical trials contd.
Statistical methods: difference of means
If
m1 = Mean of expected level on standard treatment (placebo)
m2 = Mean of expected level with new treatment (intervention)
σ = Standard deviation in response variable
α = Level of significance for statistical test used (0.05)
1-β = Degree of certainty that the difference (m1 - m2), if present,
would be detected (usually 0.90)
n = Required number of patients in each treatment group
then

$$n = \frac{2\sigma^{2}}{(m_1-m_2)^{2}} \times f(\alpha, \beta)$$

Size of Clinical trials contd.
Minimum size: Vitamin D trial
In the vitamin D trial:
m1 = 9.0 mg/100 ml (infant's serum calcium level one week after
birth in placebo group)
m2 = 9.5 mg/100 ml (infant's serum calcium level one week after
birth in supplemented group)
σ = 1.8 mg per 100 ml
α = 0.05 (level of significance)
β = 0.10 (1-β = 0.9, hence β = 0.1)
f(α, β) = 10.5 (from table)

$$n = \frac{2 \times 1.8^{2}}{(9.0-9.5)^{2}} \times 10.5 = 272.16 \approx 273$$

mothers in each group
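
The vitamin D calculation can be reproduced the same way. This is a minimal sketch with a hypothetical function name, `n_per_group_means`, and it takes the multiplier 10.5 as given from the table rather than deriving it.

```python
import math


def n_per_group_means(m1, m2, sd, f=10.5):
    """Subjects per group to detect a difference between two means.

    m1, m2: expected means in the two groups (same units)
    sd: standard deviation of the response, assumed equal in both groups
    f: tabulated multiplier for the chosen significance level and power
    """
    n = 2 * sd ** 2 / (m1 - m2) ** 2 * f
    # Round up to the next whole subject.
    return math.ceil(n)


if __name__ == "__main__":
    # Serum calcium in mg/100 ml: placebo 9.0, supplemented 9.5, SD 1.8.
    print(n_per_group_means(m1=9.0, m2=9.5, sd=1.8))
```

The required size depends on the ratio of the standard deviation to the difference in means, so a more variable outcome or a smaller clinically important difference both increase n.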
