INTRODUCTORY EPIDEMIOLOGICAL CONCEPTS – SAMPLING, BIAS AND ERROR

XNN001 Population nutrition and physical activity assessment
Study design
• Research question
• Target population
• Study design
• Sampling frame
• Sampling selection
• Data collection tools
• Data collection methods
• Generalisability of findings
Sampling – what is it?
• Selection of a smaller number of units from a larger group
• In research, the aim is to enable generalisation to a target population
Why do we sample?
1. Not possible to study ALL people in a population
2. Feasible and realistic financially to study a smaller subset of a population
3. Unethical if sample is larger than necessary (overpowered)

Aim to provide an accurate representation of the target population
• Allows for generalisation from sample to broader population (EXTRAPOLATE or MAKE INFERENCES)
• Need to minimise sampling error and bias
How big should a sample be?
• Sample size calculations to determine required size
• Based on variables to be measured: expected difference, expected response rates, cluster effect, attrition etc.
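
As a rough illustration of how the inputs listed above feed a sample size calculation, the sketch below uses the standard normal-approximation formula for comparing two group means, then inflates the result for clustering, non-response and attrition. The function names, the 5% significance level, 80% power and all input values are assumptions chosen for illustration; they are not taken from the lecture.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`
    (two-sided test, equal group sizes, common standard deviation `sd`)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def inflate(n, design_effect=1.0, response_rate=1.0, attrition=0.0):
    """Scale n up for cluster effect, expected non-response and drop-out."""
    return math.ceil(n * design_effect / (response_rate * (1 - attrition)))


# Hypothetical inputs: expected difference 5 units, SD 10 units.
base = n_per_group(delta=5, sd=10)
needed = inflate(base, design_effect=1.5, response_rate=0.6, attrition=0.1)
print(math.ceil(base), needed)
```

Halving the expected difference roughly quadruples the required size, which is why the expected difference is usually the most influential input.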



Small sample size
• Less likely that sample is representative of target population
• Limited POWER to detect ‘effect’

Larger sample size
• More likely that sample is representative of target population
• Increased POWER to detect ‘effect’
Sampling methodology

Probability sampling
1. simple random
2. systematic
3. stratified
4. cluster
5. multi-stage

Non-probability sampling
1. convenience
2. quota
3. purposive
4. snowball
1. Simple Random Sampling
• Subset of individuals chosen from a list of individuals from the broader population (sampling frame)
• Each individual chosen at random: all subjects have an equal chance of being selected
• Most likely to achieve a sample representative of the population (least selection bias)
• May be difficult to achieve in practice
• Not ideal for special interest groups / population minorities

[Figure: Simple Random Sampling]
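
A minimal sketch of simple random sampling from a sampling frame, using Python's standard library. The frame of 500 made-up identifiers and the function name are assumptions for illustration only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # seed only so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical sampling frame: a list of 500 person identifiers.
frame = [f"person_{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), sample[:3])
```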
2. Systematic sampling
• Units sampled at regular intervals
• Starting point chosen at random; the width of the interval is then fixed (commonly frame size ÷ required sample size)
• Risk of inadequate sampling of rare individuals who may be of interest
• Chance that the random starting point is “unlucky” and the resulting sample inadequate
• Researcher must ensure the sampling interval does not hide a pattern in the list

[Figure: Systematic sampling - example]
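
One common way to implement systematic sampling is sketched below: a random start within the first interval, then every k-th unit. The frame of ID numbers and the function name are hypothetical, and the interval is assumed to be the frame size divided by the required sample size (rounded down).

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k-th unit from the frame after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n                        # fixed sampling interval
    start = random.Random(seed).randrange(k)   # random start in first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: ID numbers 1..500, so the interval is 10.
frame = list(range(1, 501))
sample = systematic_sample(frame, 50, seed=1)
print(sample[:5])
```

If the frame is ordered with a cycle that matches the interval (for example every 10th entry is a corner house), the sample can be badly unrepresentative, which is the hidden-pattern risk noted above.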
3. Stratified sampling
• Population divided into subgroups prior to sampling
• To ensure adequate numbers of subjects from subgroups are included
  - e.g. male and female subgroups
  - Then take a simple random sample of individuals from the male group and then from the female group

Stratified sampling - example
Target population – Brisbane households
Sampling frame – electoral roll
• Sampling frame – electoral roll MALES
• Sampling frame – electoral roll FEMALES
SAMPLE (drawn from both subgroups)
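
The stratified procedure above can be sketched as follows: split the frame into strata, then take a simple random sample inside each one. The toy "electoral roll" records, the equal allocation of 20 per stratum and the function name are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


# Hypothetical roll: 100 males and 200 females.
roll = [{"id": i, "sex": "male" if i % 3 == 0 else "female"}
        for i in range(300)]
sample = stratified_sample(roll, lambda person: person["sex"], 20, seed=1)
print({name: len(units) for name, units in sample.items()})
```

Equal allocation guarantees adequate numbers in the smaller subgroup; proportional allocation (sample sizes matching subgroup shares) is the usual alternative.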
4. Cluster sampling
• Total population broken down into ‘groups’ or ‘clusters’
• Number of clusters then randomly selected from all eligible clusters
  - All individuals in each selected cluster become potential subjects

One-stage cluster sampling
• Clusters are selected randomly
• All individuals within clusters are invited to participate in the study

Two-stage cluster sampling
• Clusters are selected randomly
• Lists of all elements within clusters are obtained; random samples are drawn from the lists

Cluster sampling - example
All Schools in Brisbane
• Stage 1 (simple random sampling of schools): School A – all students; School B – all students
• Stage 2: random sample of students from School A; random sample of students from School B
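
A minimal sketch of one-stage and two-stage cluster sampling, loosely following the schools example. The eight made-up schools of 100 students each and the function name are assumptions; passing `n_within=None` is simply this sketch's way of saying "take everyone in the cluster".

```python
import random


def cluster_sample(clusters, n_clusters, n_within=None, seed=None):
    """Stage 1: randomly select clusters.
    Stage 2 (only if n_within is given): randomly sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_within is None:
            result[name] = list(members)            # one-stage: all members
        else:
            size = min(n_within, len(members))
            result[name] = rng.sample(members, size)  # two-stage
    return result


# Hypothetical clusters: 8 schools with 100 students each.
schools = {f"School {c}": [f"{c}-{i}" for i in range(100)] for c in "ABCDEFGH"}
one_stage = cluster_sample(schools, 2, seed=1)
two_stage = cluster_sample(schools, 2, n_within=25, seed=1)
print(sorted(one_stage), [len(v) for v in two_stage.values()])
```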
5. Multi-stage sampling
• Complex form of cluster sampling
  - Population divided into clusters and sub-clusters
• Used when selecting from a very large population

Multi-stage sampling - example: nationwide retail chain
1. Random selection of regions: Region 1, Region 2
2. Random selection of stores within each region: Store 1, Store 2
3. Stratified sampling within each store: Male, Female
4. Random selection of 20 from each stratum (2 regions × 2 stores × 2 strata × 20 = 160)
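
The retail chain example can be sketched as nested random selections. Everything about the data below is hypothetical (4 regions, 5 stores per region, 200 people per store), as is the function name; only the structure of regions, then stores, then male/female strata, then 20 per stratum follows the example.

```python
import random


def multi_stage_sample(chain, n_regions, n_stores, n_per_sex, seed=None):
    """Regions -> stores -> male/female strata -> random selection."""
    rng = random.Random(seed)
    sample = {}
    for region in rng.sample(sorted(chain), n_regions):
        for store in rng.sample(sorted(chain[region]), n_stores):
            people = chain[region][store]
            for sex in ("male", "female"):
                stratum = [p for p in people if p["sex"] == sex]
                sample[(region, store, sex)] = rng.sample(stratum, n_per_sex)
    return sample


# Hypothetical chain: 4 regions x 5 stores x 200 people (half male).
chain = {
    f"Region {r}": {
        f"Store {s}": [
            {"id": f"{r}-{s}-{i}", "sex": "male" if i % 2 else "female"}
            for i in range(200)
        ]
        for s in range(1, 6)
    }
    for r in range(1, 5)
}
sample = multi_stage_sample(chain, n_regions=2, n_stores=2, n_per_sex=20, seed=1)
print(len(sample), sum(len(v) for v in sample.values()))
```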
Non-probability sampling
Sampling techniques that do not rely on random selection. Used:
• When a sampling frame cannot be identified (e.g. visitors to a particular internet site)
• When sampling populations are difficult to access (e.g. drug users, street-based sex workers)
• When very strict inclusion and exclusion criteria are necessary (e.g. in pharmaceutical drug testing)

1. Convenience sampling
• Units ‘selected’ based on ease of access
  - Volunteers
  - Shoppers in a supermarket
  - Respondents to advertisements
  - Clinic attendees
• The sample usually is different from the target population
  - Cannot generalise results to general population
2. Quota sample
• Population divided into defined subgroups
  - e.g. males; females
• Proportions of subgroups in population identified
• Convenience sample of each subgroup to make up required numbers
3. Purposive sample
• Deliberate selection of individuals by researchers based on predefined criteria (INCLUSION & EXCLUSION CRITERIA)
  - Often used in pharmaceutical drug testing
  - Also called judgmental sampling
4. Snowball sampling
• Involves asking subjects to provide names of others who may meet study criteria
• Also called networking
• Useful for sampling populations difficult to access
  - drug users
  - street-based sex workers
  - underground networks

[Figure: Snowball sampling]
Measurement issues
Error - validity
• Error occurs when an estimate (e.g. incidence, prevalence, mortality) or association (RR, OR) deviates from the ‘true’ situation in nature
• May be introduced at any point during the study:
  - Study design (quality)
  - Sampling
  - Measurement
  - Analysis
• Two types: random error and systematic bias
Random error
• Fluctuations around a true value
• Related to poor precision
• Sources:
  - individual biological variation (always present)
  - sampling variation
  - measurement variation (protocols and training)
• Reduced by:
  - larger sample sizes
  - standard protocols and equipment
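
To illustrate why larger samples reduce random error, the sketch below repeatedly draws samples from a simulated population and measures how much the sample means fluctuate. The population (mean 70, SD 12), the sample sizes and the function name are all made up for illustration.

```python
import random
import statistics


def spread_of_sample_means(population, n, repeats=500, seed=0):
    """Standard deviation of the means of many random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)


# Hypothetical population of 20,000 values with mean 70 and SD 12.
rng = random.Random(42)
population = [rng.gauss(70, 12) for _ in range(20000)]

small = spread_of_sample_means(population, 25)
large = spread_of_sample_means(population, 400)
print(round(small, 2), round(large, 2))
```

The fluctuation shrinks roughly in proportion to one over the square root of the sample size, so a 16-fold larger sample gives about a 4-fold smaller spread. It does nothing to remove systematic bias.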

Systematic bias
• Any systematic error in the design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of a disease
• Due to causes other than random error
• Problem of validity
  - internal and/or external validity
I. Selection bias
• Arises when different criteria are used to select participants, so the study population does not represent the population of interest
• For example:
  1. Referral Bias (Berkson’s Bias)
  2. Surveillance Bias
  3. Prevalence-Incidence Bias (Neyman’s Bias)
  4. Response Bias
     - Attrition Bias
     - Participation Bias
Types of bias
1. Referral bias
• Occurs in case-control studies conducted in hospitals
• Causes a spurious association between the exposure and the disease, because of the different probabilities of admission to a hospital for those with/without a disease (or with/without the exposure)

2. Surveillance bias
• For example:
  - When conducting a case-control study to examine the relationship between oral contraceptive (OC) use and diabetes
  - Women taking OCs are likely to have more doctor visits, so diabetes is more likely to be diagnosed in OC users than in non-OC users
3. Prevalence-incidence bias
• Also known as Neyman’s bias
• Usually occurs when prevalent cases are used to investigate a disease-exposure association
  - Prevalent cases represent survivors, who may be atypical with respect to exposure status
  - Once a person is diagnosed with the disease, they may change their exposure
Types of bias
Participation bias
• People who participate in research studies are often different to those who do not take part
  - Differences in demographic, socioeconomic, cultural, lifestyle and medical characteristics
  - Self-selection bias (individual consent is essential in research, except for publicly available information)

Attrition bias
• Occurs when study participants withdraw before the study is completed; withdrawal is often differential (it differs between the groups being compared)
II. Information bias
• Arises when inaccurate measurement or misclassification of study variables occurs
• Can affect exposure or outcome (or even confounders)
• Extent of bias depends on the particular variable
  - and on whether the misclassification is non-differential or differential
Non-differential info-bias
• Error in measurement does not vary according to other variables (cases vs controls; exposed vs unexposed)
• Usually results in an underestimate of the true association (bias towards the null)
• Any association that is observed is likely to be true
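
A small worked example of the underestimation described above, for a case-control study with a yes/no exposure. The counts, and the 80% sensitivity and 90% specificity of exposure measurement, are hypothetical; the key assumption is that the same error rates apply to cases and controls.

```python
def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Odds ratio from a 2x2 case-control table."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured with error."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    return observed_exposed, (exposed + unexposed) - observed_exposed


# Hypothetical true table: cases 400 exposed / 600 unexposed,
# controls 200 exposed / 800 unexposed.
true_or = odds_ratio(400, 600, 200, 800)

# Same measurement error in both groups = non-differential.
obs_cases = misclassify(400, 600, sensitivity=0.8, specificity=0.9)
obs_controls = misclassify(200, 800, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(round(true_or, 2), round(observed_or, 2))
```

The observed odds ratio sits between 1 and the true value: the association is diluted but still points in the right direction.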
Differential info-bias
• Systematic error (i.e. non-random)
• May over-estimate or under-estimate the actual association, depending upon the situation
Types of information bias
1. Recall Bias
• Cases and controls recall their exposures differently
• It is human nature to look for reasons when something goes wrong
• “If you seek, you will find.”

2. Detection Bias
• The exposed group is monitored more closely

3. Interviewer/observer Bias
• Not blinded
• Not properly trained
Types of information bias
4. Reporting Bias
“Objectively”
• Cases tend to have better information
• Individuals who are part of a study may behave differently (Hawthorne effect)

“Subjectively”
• Reluctant to report: attitudes, beliefs, perception
• Wish bias: subjects trying to answer the question “why me?” tend to attribute the disease to causes outside their control (e.g. work-related exposure) rather than to their own lifestyle
III. Confounding - definition
Occurs when the association between a given exposure and outcome is distorted by a third variable, the confounding factor.

To be a confounder, the variable must:
1. Be a risk factor for the disease
2. Be associated with the exposure
3. Not be a result of the exposure
   - Not be an intermediate between the exposure and the outcome (i.e. must not lie on the causal pathway)
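
To make the definition concrete, the sketch below compares a crude risk ratio with a Mantel-Haenszel risk ratio adjusted for a third variable. The cohort counts are invented so that the third variable is both a risk factor for disease and much more common among the exposed; the function names are this sketch's own.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(strata):
    """Pooled risk ratio across strata of the third variable.
    Each stratum is (cases_exp, n_exp, cases_unexp, n_unexp)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Hypothetical cohort, split by presence of the third variable.
strata = [
    (160, 800, 40, 200),  # third variable present: risk 20% in both groups
    (10, 200, 40, 800),   # third variable absent: risk 5% in both groups
]
totals = [sum(column) for column in zip(*strata)]

crude = risk_ratio(*totals)
adjusted = mantel_haenszel_rr(strata)
print(round(crude, 3), round(adjusted, 3))
```

Within each stratum the exposure makes no difference to risk, yet the crude comparison suggests the exposed have about double the risk. The apparent effect comes entirely from the uneven distribution of the third variable.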
Validity
• Do the study conclusions reflect the true value/relationship?
• External validity (generalisability): can the findings be generalised to other similar samples or the population-at-large?
• Internal validity: are the results correct for the particular group you have studied?
Reliability
• Accuracy: how close to the true population value is your measurement value?
  - Assess accuracy by comparing to “gold standard”
• Precision: if you repeat your measurement / sample selection / analysis on numerous occasions, will you get consistent results?
  - Assess precision by inter-observer and intra-observer comparisons
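
One simple way to put numbers on the two ideas above is sketched here: accuracy as the average difference from a gold-standard value, precision as the spread of repeated measurements. The three sets of repeated weighings (in kg) and the gold-standard value of 70.0 are invented for illustration.

```python
import statistics


def accuracy_and_precision(measurements, gold_standard):
    """Return (bias, spread): mean error against the gold standard,
    and the standard deviation of the repeated measurements."""
    bias = statistics.mean(measurements) - gold_standard
    spread = statistics.stdev(measurements)
    return bias, spread


# Hypothetical repeated weighings of the same 70.0 kg reference weight.
scale_a = [70.1, 69.9, 70.0, 70.2, 69.8]  # accurate and precise
scale_b = [72.0, 72.1, 71.9, 72.0, 72.0]  # precise but not accurate
scale_c = [68.0, 72.5, 69.0, 71.5, 69.0]  # accurate on average, imprecise

for name, readings in [("A", scale_a), ("B", scale_b), ("C", scale_c)]:
    bias, spread = accuracy_and_precision(readings, gold_standard=70.0)
    print(name, round(bias, 2), round(spread, 2))
```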
