Bias results from systematic errors in study design or data collection. This document discusses selection bias and information bias, including examples. Selection bias occurs when individuals have different probabilities of being included in the study based on characteristics like exposure or outcome. Information bias results from imperfect definition or collection of exposure and outcome data, potentially causing misclassification. Differential misclassification differs between groups and inflates or deflates associations, while non-differential misclassification occurs equally and biases toward the null. Careful study design and data collection can help prevent and control for bias.
2. Bias-definition
Result of systematic error in the design or conduct of the study
Systematic error results from flaws in either
• Method of selection of study participants (study design)
• In the procedure of collecting information on exposure or outcome
Compare with random error, which results from using a sample to give population estimates
(random variability):
• measurement,
• biological variability
• Sampling/ sample size
A study without systematic error is considered unbiased/valid. On
average its results will tend to be correct
3. Main types of Bias
•Selection bias
•Information bias (observation bias)
NB: there are many types of bias described in epidemiological
literature
4. Selection bias
• Present when individuals have different probabilities of being included in the study
sample according to relevant study characteristics (exposure/outcome of interest)
• Arises when one exposure group has a higher probability of having the study
outcome detected, due to increased surveillance, screening or testing for the
outcome itself, or for an associated symptom
• Examples:
1. Post-menopausal exposure to estrogen is associated with an increased risk of
bleeding that can trigger screening for endometrial cancers, leading to a higher
probability of early stage endometrial cancers being detected. Any association
between estrogen exposure and endometrial cancer potentially overestimates risk,
because unexposed patients with sub-clinical cancers would have a lower probability
of their cancer being diagnosed or recorded.
5. Selection bias
• Examples:
2. Women who use oral contraceptives (OC) visit the hospital more often and are therefore
more likely to be screened for diabetes. A study of OC use and diabetes might then include
more exposed cases (OC users with diagnosed diabetes), and a non-existent association
might be observed.
3. In a study of the relationship between SES and cervical cancer, cases were patients with
cervical cancer at a referral hospital, while controls were recruited from the community
during the daytime. Most of the controls were likely to be of low SES (available at home
during working hours).
4. A study aims to explore whether smoking increases the risk of experiencing a stroke.
Cases are patients admitted for stroke; controls are patients admitted for any other condition.
Because smokers are at higher risk than non-smokers for other diseases that lead to
hospitalization (lung cancer, COPD, etc.), smoking is more common among hospitalized
non-cases than among non-cases in the source population, so the association between
smoking and stroke is underestimated.
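The hospital-control problem in example 4 can be illustrated with a small calculation. This is only a sketch: the smoking prevalences and the admission ratio below are hypothetical numbers chosen for illustration, not data from any study, and the names (`odds`, `admission_ratio`, etc.) are made up for this example.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

# All values below are hypothetical, for illustration only.
smoking_prev_source = 0.20  # smoking prevalence among non-cases in the source population
smoking_prev_cases = 0.40   # smoking prevalence among stroke cases
admission_ratio = 2.0       # smokers assumed twice as likely to be admitted for non-stroke conditions

# Smoking prevalence among hospitalized non-cases (the hospital controls)
smokers = smoking_prev_source * admission_ratio
nonsmokers = (1 - smoking_prev_source) * 1.0
smoking_prev_hospital = smokers / (smokers + nonsmokers)

# Odds ratio with population controls vs. hospital controls
or_source = odds(smoking_prev_cases) / odds(smoking_prev_source)
or_hospital = odds(smoking_prev_cases) / odds(smoking_prev_hospital)

print(f"Smoking prevalence, source non-cases:   {smoking_prev_source:.2f}")
print(f"Smoking prevalence, hospital non-cases: {smoking_prev_hospital:.2f}")
print(f"OR using population controls: {or_source:.2f}")
print(f"OR using hospital controls:   {or_hospital:.2f}")
```

Under these assumed numbers, the hospital controls smoke more than the source population they are meant to represent, so the odds ratio computed with them is smaller than the one computed with population controls.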
6. Information bias
Results from either
• Imperfect definition of study variables
• Imperfect data collection procedures
Data collection concerning outcomes or exposure is faulty
May result in misclassification of exposure and/or outcome
status
7. Examples of information bias
Exposure identification bias
•Recall bias
•Observer/interviewer bias
Outcome identification bias
•Observer bias
•Respondent bias
8. Exposure identification bias
Recall bias
• Recall bias, resulting from inaccurate recall of past exposure, is the most often cited
type of exposure identification bias
• Errors in recall of these past exposures result in misclassification of exposure
status, thus biasing the results of the study.
e.g. mothers of children with birth defects will remember the drugs
they took during pregnancy better than mothers of children with no
birth defect
9. Preventing recall bias
•Verification of exposure information obtained
from participants
•Use of diseased controls in case-control studies
•Use of objective markers of exposure
•Use of cohort study design (case-control within
cohort)
10. Exposure identification bias
Observer/interviewer bias
• When data collection in a case-control study is not
masked with regard to the disease status of study
participants, observer/interviewer bias in ascertaining
exposure may occur
- Observer/Interviewer bias may be a consequence of
• trying to “clarify” questions when such clarifications are not
part of the study protocol
• failing to follow either the protocol-determined probing or
skipping rules of questionnaires
11. Preventing observer/interviewer bias
• Blind the interviewers with regards to the outcome status
• Careful design and conduct of quality assurance and
control activities
• Development of detailed manual of operations
• Training of staff
• Standardization of data collection procedures
• Reliability and validity sub-studies in samples
• Reliability is challenging to assess due to intra-participant variability
• Validity: independent sources can be used to assess the accuracy of data
12. Outcome identification bias
Observer bias
Can be affected by knowledge of the exposure status of the study
participants or clinical criteria
E.g.
• the outcome is alcoholic cirrhosis, and you know the participant is
alcoholic
• In a study of the effect of patient’s race on the diagnosis of hypertensive
end-stage renal disease, black patients were more likely to be diagnosed with
the disease than white patients
14. Outcome identification bias
Respondent bias
Outcome ascertainment bias may occur during follow-up of a
cohort when information on the outcome is obtained by
participant response.
E.g.
• If study participants consistently give the answer that the
investigator wants to hear, information bias would
occur
15. Preventing respondent bias
• Obtain detailed information on presence vs. absence of
event as well as related symptoms
• E.g. not just presence of severe headache, but also presence of
nausea, and fatigue accompanying headache
• Use standardized questionnaires to measure event when
possible
• E.g., validated questionnaire
•Whenever possible, information given by a participant should be
confirmed by more objective means (hospital chart, medicines taken,
medical records)
16. Consequence of information bias
Misclassification
•Differential misclassification
•Non-differential misclassification
17. Differential misclassification
Differential misclassification occurs when the level of
misclassification differs between the groups being compared.
• Sensitivity and/or specificity of exposure ascertainment differs
between cases and controls
E.g. Case-control study
• Mothers of newborns with birth defects may recall past
events better than mothers with healthy babies
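A small numeric sketch may help show how differential recall distorts an odds ratio. All counts and recall probabilities below are hypothetical, and the helper names (`odds_ratio`, `reported_counts`) are made up for this illustration. It assumes cases recall every true exposure, controls recall only 80% of theirs, and nobody reports an exposure they did not have.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc, where a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

def reported_counts(exposed, unexposed, sensitivity, specificity):
    """Expected reported (exposed, unexposed) counts for one group,
    given the sensitivity and specificity of exposure ascertainment."""
    rep_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    rep_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return rep_exposed, rep_unexposed

# Hypothetical true exposure counts
true_or = odds_ratio(a=100, b=50, c=50, d=50)

# Differential recall: sensitivity differs between cases and controls (assumed values)
case_exp, case_unexp = reported_counts(100, 50, sensitivity=1.0, specificity=1.0)
ctrl_exp, ctrl_unexp = reported_counts(50, 50, sensitivity=0.8, specificity=1.0)
observed_or = odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp)

print(f"True OR:     {true_or:.2f}")
print(f"Observed OR: {observed_or:.2f}")
```

With these assumed values the observed odds ratio is larger than the true one, i.e. the association is overestimated. Other combinations of sensitivity and specificity can instead push the estimate toward the null.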
21. Non-differential misclassification
• Non-differential misclassification occurs when the
level of misclassification does not differ between the
groups being compared
• Sensitivity and specificity of exposure ascertainment are
the same in cases and controls (errors in exposure or
outcome status occur with approximately equal frequency
in the groups being compared).
22. Examples of Non-Differential Misclassification
• Both groups have the same difficulty of remembering exposures in
the past (case-control study of physical activity (PA) and stroke).
• An inaccurately calibrated scale that creates the same systematic error for
everyone (e.g. it does not show 0 pounds when nobody stands on it).
• A survey of PA including vigorous, moderate, and leisure activities
where participants overestimate their activity – but this happens
regardless of case-control status.
24. Example of non-differential misclassification
True distribution
              Cases   Controls   Total
Exposed         100         50     150
Nonexposed       50         50     100
Total           150        100     250
OR = ad/bc = 2.0
Non-differential misclassification: exposure overestimated in 10 cases and 10 controls
              Cases   Controls   Total
Exposed         110         60     170
Nonexposed       40         40      80
Total           150        100     250
OR = ad/bc = 1.8
*** Biases the association towards the null value of 1
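The two odds ratios in this example can be checked with a few lines of Python. The counts are the ones given in the tables above. The function name `odds_ratio` and the variable `shift` are just labels chosen for this sketch.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc, where a = exposed cases, b = exposed controls,
    c = nonexposed cases, d = nonexposed controls."""
    return (a * d) / (b * c)

# True distribution
true_or = odds_ratio(a=100, b=50, c=50, d=50)

# Non-differential misclassification: 10 nonexposed cases and
# 10 nonexposed controls are wrongly recorded as exposed
shift = 10
misclassified_or = odds_ratio(a=100 + shift, b=50 + shift,
                              c=50 - shift, d=50 - shift)

print(f"True OR:          {true_or:.2f}")
print(f"Misclassified OR: {misclassified_or:.2f}")
```

The misclassified odds ratio (110 × 40) / (60 × 40) is about 1.83, which the example rounds to 1.8. It lies between the true value of 2.0 and the null value of 1.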
25. Prevention and control of bias
• Ensuring that the study design is appropriate
for addressing the study hypotheses –
selection of controls in case control studies
• Establishing and carefully monitoring the
procedures of data collection (valid and
reliable)
26. Summary
• Information bias
• Exposure identification bias
• Recall bias
• Interviewer bias
• Outcome identification bias
• Observer bias
• Respondent bias
• Misclassification:
• Non-differential misclassification
• Differential misclassification
There is always risk of error in any
study!!
We have to be aware of it, try to prevent
it, and minimize its impact on the
findings.
Editor's Notes
Reliability: consistency of a measure: whether the results can be reproduced under the same condition
Validity: accuracy of a measure: whether the results really do represent what they are supposed to measure
Stratifying by certainty: certain, somewhat certain, not certain
Overestimated the association
Think of recall bias among mothers with babies with birth defect
Underestimated the association
In non-differential misclassification, this mostly produces a bias toward the null hypothesis