Case control study - Part 1

Dr. Rizwan S A, M.D.,
1
Outline of presentation
• Some history
• Planning and conducting a study
• Matching
• Sources of bias
• Applications

2
A scenario
• Assume you are the senior health advisor to the GOI
• Recently, several isolated reports of neurological
illness following DPT vaccination have come up in
the country

• Media is adding fuel to the fire
• Parents and doctors are reluctant to vaccinate and the
vaccination rates are going down the drain!

• What will you do?
3
Some history
• 1788 - Early concepts found in works of Parisian
physician PCA Louis
• 1843 - First explicit description by William Augustus
Guy (occupational exposure and pulmonary disease)

• 1862 - Baker, case control comparisons of marriage and
fertility in breast cancer patients
• 1926 - Lane-Claypon's breast cancer study
• 1950 - Levin et al ; Wynder & Graham ; Schrek et al.
and Doll & Hill; (smoking and lung cancer)
4
Planning and conducting
• Research question
• Definition of case
• Definition of control
• Selecting the cases & controls
• Research instrument

5
Case Control Studies | Cohort Studies
Proceeds from effect to cause | Proceeds from cause to effect
Starts with the disease | Starts with people exposed to the risk factor or suspected cause
Tests whether the suspected cause occurs more frequently in those with disease than those without disease | Tests whether disease occurs more frequently in those exposed than in those not exposed
Usually the 1st approach to the testing of hypothesis, but also useful for exploratory studies | Reserved for the testing of precisely formulated hypothesis
Involves fewer study subjects | Involves larger number of subjects
Yields results relatively quickly | Long follow-up, delayed results
Suitable for study of rare diseases | Inappropriate when disease or exposure under investigation is rare
Generally, yields only estimate of relative risk (Odds ratio) | Yields incidence rates, relative risk, attributable risk
Cannot yield information about disease other than that under study | Can give information about more than one disease outcome
Relatively inexpensive | Expensive
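
A minimal sketch of how the odds ratio mentioned in the comparison above is obtained from a 2×2 table of exposure by case/control status. The counts are invented for illustration, the function name `odds_ratio` is hypothetical, and the log-based (Woolf) confidence interval is an assumption, since the slides do not name an interval method.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts: 100 cases (40 exposed), 100 controls (20 exposed)
or_, lower, upper = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 suggests the exposure odds differ between cases and controls, subject to the biases discussed later in the deck.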

6
Research question
• Begin with broad and ambitious question
• Later, narrow and more precise
• Considerations of time, cost
• Eg.
1. Does tobacco cause cancer?
2. Does smoking tobacco cause bronchogenic CA?
3. Do persons having broncho. CA have h/o greater exposure to tobacco smoking as compared to persons w/o the disease?

• Poor questions can spoil the entire study
7
Definition of case - 1
• Eligibility
• Definition of disease

8
Definition of case - 2
• Eligibility (2 components)
– Objective criteria for diagnosis
– Stating the eligibility criteria

• Eligibility criteria – should reflect
‘potentially at risk for exposure’ both for
the case & control
– Eg. recent OCP and MI; (sterilized, postmenopausal, CIs to OCPs)
9
Definition of case - 3
• Cases sh. have reasonable possibility of
having had their disease induced by the
exposure
– Eg. OCP and Thromboembolism – sh. exclude
postpartum and postoperative cases (Why?)

• Incident cases
– Will be more uniform
– Recall more accurate
– More certain that exposure preceded the disease
– Berkson and Neyman
10
Definition of case - 4
• Definition of disease
– Objective criteria to reliably diagnose the
disease
– Eg. Rh. Arthritis (several diagnostic criteria causing confusion)
– To reduce misclassification

• Sources of cases
– Hospital lists, special reporting systems like
cancer registries, disease surveillance, death
certificates
11
Definition of control - 1
• Eligibility criteria
– Sh. be similar to the cases with regard to potential
for exposure
– Problems arise in hospital based controls
• We want to select controls that are likely to reflect the
exposure rate in the population
• We sh. exclude those hospital controls whose condition
is associated with the exposure (Eg. Aspirin and MI;
controls with chronic pain/peptic ulcer)

– One solution – include controls with a variety of
diagnoses not associated with exposure
12
Definition of control - 2
• Sources of controls
– Hospital based
– Dead controls
– Controls with similar diseases
– Neighborhood controls
• Population based
• Best friend control/ Sibling control

13
Definition of control - 3
• Hospital based
– Referral pattern is similar to cases (from the same study
base)
– Similar quality of information
– Convenience
– May not be representative of the population

• Dead controls
– In a study where the case is death from a particular cause
– Information obtained from ‘proxy’ informants
– But dead controls differ from living controls
• Controls with similar diseases
– Cancer (of different type) controls for cancer cases
– Minimize recall bias, interviewer bias, examine specificity
of exposure

14
Definition of control - 4
• Neighborhood controls
– Best friend control/Sibling control
• Inexpensive, easy and quick
• Ability to match on a number of variables that
are associated with neighborhood/friendship
• May introduce selection bias (‘smoking’ cases
nominate ‘smoking’ friends) related to the
exposure and overmatching

– Population based
• Truly representative sample
• From tax lists, voting lists, telephone directories
15
Definition of control - 5
Source | Advantage | Disadvantage
Hospital based | Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (less recall bias). | Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study. Berksonian bias.
Population based | Most representative of the general population. Generally healthy. | Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occup.).
Neighbourhood controls/ Telephone exchange random dialing | Controls and cases similar in residence. Easier than sampling the population. | Non cooperation. Not representative of general population.
Best friend control/ Sibling control | Accessible, cooperative. Similar to cases in most aspects. | Overmatching.
16
Selection process - 1

[Figure: total population, reference population, cases, controls]

17
Selection process - 2
• Cases
– In practice; we use all eligible cases within a
defined time period
• From disease registry or hospital
• We are implicitly sampling from a subset of total
population of cases

• Controls
– Sampling is most pertinent here because in
rare diseases, the no. of potential controls greatly
exceeds the no. of cases
18
Selection of cases - 1
• Representativeness
– Ideally, cases sh. be a random sample of all cases of
interest in the source population (e.g. from vital
data, registry data)
– But commonly they are a selection of available cases
from a medical care facility. (e.g. from hospitals,
clinics)

• Method of Selection
– Selection may be from incident or prevalent cases
– Incident cases are those derived from ongoing
ascertainment of cases over time
– Prevalent cases are derived from a cross-sectional
survey
19
Selection of cases - 2
• Incident cases are preferable
• These should be all newly diagnosed cases over a
given period of time in a defined population.
(However we are excluding patients who died
before diagnosis)
• Prevalent cases do not include patients with a
short course of disease (patients who recovered
early and those who died will not be included)

• Can be partly overcome by including deceased
cases as well as those alive
20
Selection of cases - 3
• Validity is more important than generalizability
i.e. the need to establish an etiologic relationship
is more important than the need to generalise
results to the population
• Eg.
– In a study on breast cancer – we can include all
cases or we can include only premenopausal
women with lobular cancer
• If we take the latter group as cases, we can elicit the
etiology better

– Studies done in nurses for OCP use
21
Selection of controls - 1
• The four principles of Wacholder
1. The study base

2. De-confounding
3. Comparable accuracy
4. Efficiency
22
Selection of controls - 2
• Should the controls be similar to the cases
in all respects other than having the
disease? i.e. comparable

• Should the controls be representative of
all non-diseased people in the population
from which the cases are selected? i.e.
representative
23
Selection of controls - 3
• Representativeness
– Sh. be representative of the general population
in terms of probability of exposure to the risk
factor

• Comparability
– Sh. also have had the same opportunity to be
exposed as the cases have

• Not that both cases and controls are equally
exposed; but only that they have had the
same opportunity for exposure.
24
Selection of controls - 4
• Usually, cases are not a random sample of
all cases in the population. So, the
controls must be selected in the same way
(and with the same biases) as the cases.
• It follows from the above, that a pool of
potential controls must be defined. This is
a universe of people from whom controls
may be selected (study base)
25
Selection of controls - 5
• The study base is composed of a population
at risk of exposure over a period
• Cases emerge within a study base. Controls
should also emerge from the same study
base, except that they are not cases.
• Eg. If cases are selected exclusively from
hospitalized patients, controls must also be
selected from hospitalized patients.
26
Selection of controls - 6
• Comparability is more important than
representativeness in the selection of
controls

• The control should resemble the case in
all respects except for the presence of
disease

27
Selection of controls - 7
• Number of controls
– Large study; equal numbers
– Small study; multiple controls

• Use of multiple controls
– Controls of same type
– Multiple controls of different types
• Hospital and neighborhood controls
• e.g. case - children with brain tumor, control - children with other cancer, normal children
28
Selection of controls - 8
[Figure: exposure to radiation compared across children with brain tumors, children with other cancers and children without cancer; alternative hypotheses shown are "radiation causes cancers" and "radiation causes brain cancers only"]

Multiple controls of different types are valuable for exploring alternative
hypotheses & for taking into account potential recall bias.
29
Sampling for cases/controls - 1
• Frame – list of all potentially eligible cases and
controls in the target population (a subset of the
general pop. both at risk of exposure and disease
development)
• The frame sh. not be biased in any manner, else the
sample will also be biased even if random
• Types of sampling
– SRS
– Systematic
– Stratified
– Matched

• The objective is to avoid bias in selection; each
eligible case or control sh. have an equal chance of
being selected
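
A minimal sketch of two of the sampling types listed above (SRS and systematic), drawn from a sampling frame so that every listed person has the same chance of selection. The frame of 1,000 numeric IDs and the function names are hypothetical; a real frame would be a registry or population list.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """SRS: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=0):
    """Systematic: random start, then every k-th unit of the frame."""
    rng = random.Random(seed)
    step = len(frame) // n          # sampling interval k
    start = rng.randrange(step)     # random start within the first interval
    return [frame[start + i * step] for i in range(n)]

frame = list(range(1000))           # hypothetical frame of 1,000 eligible IDs
srs = simple_random_sample(frame, 50)
systematic = systematic_sample(frame, 50)
```

Neither method repairs a biased frame: if the list itself omits part of the target population, the sample inherits that bias even though selection is random.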
30
Sampling for cases/controls - 2
• If we are using all incident cases occurring in a
defined area and time period, then controls selected at
random from the gen. pop. are the best choice (sound
basis for calculating RR, AR, etiologic fraction)
• If cases are selected from hospital(s), population
controls are not necessarily the best choice; a
carefully chosen hospital control series can also be
valid
• However, hosp. controls often leave room for doubt
about validity of comparison (cost and practicality)
31
Sampling for cases/controls - 3
• Random digit dialing
– Prerequisite; extensive telephone coverage
– Used either to screen for potential controls or to conduct telephone interviews

• Method
– All area codes and prefix numbers are obtained
– Add all possible two digit numbers
– The first 8 digits – PSU
– Select a PSU at random – if response obtained then
retain PSU
– Then the last two digits are randomly selected and
continued until required sample is reached
– The no. of PSUs and total houses depend on design
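
The two-stage method above can be sketched roughly as follows. This is an illustration only: the 6-digit area code plus prefix strings, the `is_household` callback (standing in for an actual phone call that reaches a residence) and the function name `rdd_sample` are all hypothetical.

```python
import random

def rdd_sample(prefixes, is_household, n_psu, per_psu, seed=0):
    """Two-stage random digit dialing over 10-digit numbers.

    prefixes: 6-digit strings (area code + prefix).
    A PSU is the first 8 digits; it is retained only if the first
    number dialed within it turns out to be a household.
    """
    rng = random.Random(seed)
    retained, sample = [], []
    while len(retained) < n_psu:
        # Stage 1: build a candidate PSU and dial one number inside it
        psu = rng.choice(prefixes) + f"{rng.randrange(100):02d}"
        if psu in retained:
            continue
        first = psu + f"{rng.randrange(100):02d}"
        if not is_household(first):
            continue                      # no response: PSU is rejected
        retained.append(psu)
        # Stage 2: keep drawing last-two-digit endings within the PSU
        numbers, tried = [first], {first}
        while len(numbers) < per_psu and len(tried) < 100:
            candidate = psu + f"{rng.randrange(100):02d}"
            if candidate in tried:
                continue
            tried.add(candidate)
            if is_household(candidate):
                numbers.append(candidate)
        sample.extend(numbers)
    return sample

# Hypothetical stand-in for dialing: numbers ending in an even digit "answer"
answers = lambda number: int(number[-1]) % 2 == 0
sample = rdd_sample(["011234", "011235"], answers, n_psu=4, per_psu=3)
```

Rejecting PSUs whose first number is not residential concentrates later calls in blocks that contain households, which is the reason for the screening step.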
32
Sampling for cases/controls - 4
• Examples
– Artificial sweeteners and bladder cancer
• Cases; 21-84 years, newly diagnosed bladder cancer in 10
designated counties in metropolitan areas
• Controls; age-sex stratified random sample of the general
populations in the ten counties frequency matched at 2:1
ratio

– Oral contraception and congenital malformations
• Cases of malformation from all newborns and stillborns
delivered at five major hospitals between 1974 and 1976
• Controls; all unaffected newborns in the five hospitals,
sampling days were rotated to represent all 7 days
33
Matching - 1
• Matching is defined as the process of selecting
controls so that they are similar to cases in
certain characteristics such as age, sex, race,
socioeconomic status and occupation
• What is post-matching?
– Pairing controls to cases from unmatched data during
analysis

• We often want a constant case control ratio, but
sometimes matching is incomplete so that we end up
with a variable ratio

34
Matching - 2
• Objective – to eliminate biased comparison between
cases and controls
• Two step process
1. The matched design
2. The matched analysis

• One immediate effect of matching is the balance
between no. of cases and controls
• Sometimes we can deliberately match on a factor
which comes in the causal path to confirm or
refute its role. (Eg. Smoking and MI, matched on
cholesterol)
35
Matching - 3
• What variables to match?
– Factors which are independent risk factors for the disease
– Assoc. with the exposure but non-causally
– May not be directly a risk factor, but may be assoc. with
other causal factors excluding the study exposure

• Similar to something?

36
Matching - 4
• Situations to match or not?
• Causal
• Non-causal

37
Matching - 5
• Examples
• 1. E = alcohol; F = smoking; D = lung CA
– Implication if not matched?

• 2. E = OCP; F = smoking; D = MI
– Implication if not matched?

• 3. E = blood grp O; F = age, sex; D = thrombosis
– Implication if matched?

• 4. E = OCP; F = prescribing physician; D = MI
– Implication if not matched or matched?
38
Matching - 6
• In summary, the decision to match or not depends
on the residual association of the factor with
disease and exposure after controlling other
variables
• Overmatching
– Reduces validity or statistical efficiency
– Two general meanings
• Unmatched analysis in matched studies
• Matching for unnecessary variables

– If one matches on a factor that is associated with
exposure but not the disease
• Paired analysis may correctly estimate the odds ratio but the variance
will be larger than in an unmatched study of the same size
(overmatching increases the frequency of exposure concordant
pairs which are discarded in paired analysis)
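
A minimal sketch of the paired analysis referred to above, showing why exposure concordant pairs contribute nothing: the matched odds ratio uses only the discordant pairs. The pair counts are invented and the function name `matched_pair_or` is hypothetical; the McNemar statistic is shown without continuity correction.

```python
def matched_pair_or(pairs):
    """pairs: list of (case_exposed, control_exposed) booleans, one per matched pair."""
    # Discordant pairs: only these carry information about the association
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    control_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    concordant = len(pairs) - case_only - control_only   # discarded in the analysis
    or_matched = case_only / control_only
    mcnemar_chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    return or_matched, mcnemar_chi2, concordant

# Hypothetical 100 matched pairs
pairs = (
    [(True, True)] * 10      # both exposed (concordant)
    + [(True, False)] * 30   # case exposed only
    + [(False, True)] * 10   # control exposed only
    + [(False, False)] * 50  # neither exposed (concordant)
)
or_matched, chi2, concordant = matched_pair_or(pairs)
print(or_matched, chi2, concordant)
```

Here 60 of the 100 pairs are concordant and drop out, so any matching that pushes more pairs into concordance leaves fewer informative pairs and a wider interval.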
39
Matching - 7
• If one matches on a factor that is causally or non-causally
assoc. with disease but not exposure then OR will be
biased towards unity
• If one matches on a factor which is assoc. with disease but
not exposure then OR will be correctly estimated
whether or not the pairing is retained
– Paired analysis will be less efficient than unpaired one

• Matching on highly correlated variables is also
unnecessary
• Finally, matching sh. be done for factors which have
strongest relationship to the disease and are least
correlated
40
Matching - 8
• Alternatives to matching
– At the sampling phase
• Stratified sampling
• Frequency matching

– At analysis phase
• Post-stratification
• Regression analysis

• Stratified sampling
• Pre-determined number of cases and controls in each
subgroup created by the cross-classification
• Eg. Age (4 groups), sex (2), race (4 groups)
– Total 32 subgroups
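
The subgroup count in the example above is simply the product of the category counts (4 × 2 × 4 = 32). A small sketch, with hypothetical category labels and a hypothetical quota of 5 cases and 5 controls per subgroup:

```python
from itertools import product

# Hypothetical category labels; only the counts (4, 2, 4) come from the slide
age_groups = ["<30", "30-44", "45-59", "60+"]
sexes = ["M", "F"]
races = ["A", "B", "C", "D"]

# Cross-classification: one stratum per combination of categories
strata = list(product(age_groups, sexes, races))

# Pre-determined number of cases and controls in each subgroup
quota = {stratum: {"cases": 5, "controls": 5} for stratum in strata}
print(len(strata))
```

The number of cells grows multiplicatively with each added variable, which is why filling every subgroup quickly becomes impractical.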
41
Matching - 9
• Frequency matching
– Controls being taken from the corresponding subgroups in
proportion to the no. of cases
– Eg. If 30% of cases are males of Hindu religion in 60-65 years then we
take 30% of similar controls
– More practical than stratified sampling but it requires one to continually
update on the distribution of accumulating cases to maintain a fixed
case-control ratio

• Post-stratification
– Stratify the subgroups and analyze
– Very flexible in that variables need not be pre-specified
– Limitation - the number of variables that can be stratified is
restricted by lack of numbers

• Regression analysis
– Most useful when the number of variables/subgroups increase
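
The frequency matching step described on this slide can be sketched as follows: count the cases in each subgroup, then set control quotas in the same proportions. The stratum labels, the 2:1 ratio and the function name `control_quota` are assumptions for illustration.

```python
from collections import Counter

def control_quota(case_strata, ratio=2):
    """Number of controls to recruit per stratum, proportional to the cases in it."""
    counts = Counter(case_strata)
    return {stratum: n * ratio for stratum, n in counts.items()}

# Hypothetical accumulating cases: 30% fall in one subgroup
case_strata = ["male, 60-65"] * 30 + ["female, 60-65"] * 70
quota = control_quota(case_strata, ratio=2)

total_controls = sum(quota.values())
share = quota["male, 60-65"] / total_controls   # same share as among the cases
print(quota, share)
```

Because cases accrue over time, the quotas have to be recomputed as the case distribution changes, which is the practical burden noted above.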

42
Matching - 10
• Effectiveness of matching
– Removal of bias
– Reduction of variance
• Matched design only gives a modest increase in efficiency
• Greatest improvement is when there is strong assoc. between
disease and the confounder
• Also efficient when only a small proportion of the target
population is exposed to the study factor

• The added cost and complexity of matching should be
weighed against any expected gains in precision

43
Matching - 11
• Advantages
– Cases and controls will be comparable with respect to the
matched variables
– Provides the best means to investigate a very specific
hypothesis

• Disadvantages
– One can no longer study the matched variable in
relation to the risk of disease
– Increase in cost, time and labor
– A certain fraction of cases are discarded as a result of
failure to find a matching control
44
Matching - 12
• Summary
– Unless one has very good reason to match, one
is better off avoiding it
– Frequency matching within rather broad
categories of the matching variables will
suffice for most studies

45
Sources of bias - 1
• Bias – systematic error in the design, conduct, or analysis of a study
that results in a mistaken estimate of the risk measure

1. Ascertainment and selection bias
a) Surveillance
b) Diagnosis
c) Referral
d) Selection
e) Non-response
f) Length of stay
g) Survival
h) Admission diagnoses

2. Bias in estimation of exposure
a) Recall
b) Interviewer
c) Prevarication
d) Improper analysis

3. Misclassification
4. Other sources

46
Sources of bias - 2
1. Ascertainment and selection bias
- Not peculiar to case-control, can occur in cohort studies also

a) Differential Surveillance
– In asymptomatic/mild diseases , cases are more likely to be
detected in persons who are closely examined
– Eg. OCP and endometrial cancer/phlebitis
• Women taking OCPs were more thoroughly evaluated
• Based on preliminary reports of OCP use and phlebitis, clinicians
started looking for phlebitis in such exposed patients

– Exposed cases would have a greater likelihood of being
diagnosed as compared to unexposed cases
– This bias can be checked by doing a stratified analysis in
subgroups having equal surveillance (based on some index
of medical care) or by restricting the study to the time prior to
publication of such findings
47
Sources of bias - 3
1. Ascertainment and selection bias

b) Diagnosis
• In conditions like cervical dysplasia, knowledge of
exposure may alter the assessment
• This is most likely to occur in cases of uncertain
diagnosis

c) Differential Referral
• OR' = b × OR, where b = (s1 × s4)/(s2 × s3)
– s1, s2, s3, s4 are the proportions selected of exposed cases,
unexposed cases, exposed controls and unexposed controls resp.
– A biased selection of cases can be compensated by an
equally biased selection of controls
– No bias arises if the probability of selecting an exposed case
equals that of an unexposed case, and likewise for controls
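
A small numeric sketch of the bias factor b defined above. The selection proportions and the true OR of 2.0 are invented, and the function name `selection_bias_factor` is hypothetical.

```python
def selection_bias_factor(s1, s2, s3, s4):
    """b = (s1*s4)/(s2*s3).

    s1, s2 = proportions selected of exposed and unexposed cases;
    s3, s4 = proportions selected of exposed and unexposed controls.
    """
    return (s1 * s4) / (s2 * s3)

true_or = 2.0

# Exposed cases over-selected, controls selected evenly: OR is inflated
b_biased = selection_bias_factor(s1=0.9, s2=0.6, s3=0.3, s4=0.3)
observed_or = b_biased * true_or

# Same relative over-selection of the exposed among controls: bias cancels
b_compensated = selection_bias_factor(s1=0.9, s2=0.6, s3=0.6, s4=0.4)

print(round(b_biased, 2), round(observed_or, 2), round(b_compensated, 2))
```

The second call illustrates the compensation argument: what matters is the ratio s1/s2 relative to s3/s4, not the absolute selection proportions.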
48
Sources of bias - 4
1. Ascertainment and selection bias

c) Differential Referral (cont.)
• Eg.
• A study of alcohol and kidney failure, where income is
assoc. with alcohol intake
• A hospital only admits wealthy patients, so cases of
kidney failure in this hospital will be more exposed to
alcohol than patients in the gen. pop.
• But if patients with other diseases also have similar
income characteristics and they were taken as controls,
bias won’t occur
• If controls are taken from gen. pop. then we have to
match/stratify on income to eliminate income as a source
of selection bias
49
Sources of bias - 5
1. Ascertainment and selection bias

d) Selection
• Eg. Interviewer ‘keying’ on cases who are exposed (one
particular nurse was searching out all the cases of ectopic
pregnancy with IUD usage)
• To avoid this, we must specify precisely and in advance the
methods by which cases and controls are selected, carefully
train staff, and apply quality control

e) Non-Response
• A worst case analysis taking all non-responding cases as
unexposed and all non-responding controls as exposed will
show if the non-response is likely to bias the estimates
• If the exposure rates were equal between responders and
non-responders, there will be no bias
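
The worst case analysis described above can be sketched as below: non-responding cases are added to the unexposed cases and non-responding controls to the exposed controls, and the odds ratio is recomputed. All counts are hypothetical, as is the function name `worst_case_or`.

```python
def worst_case_or(a, b, c, d, nonresp_cases, nonresp_controls):
    """a, c = exposed/unexposed cases; b, d = exposed/unexposed controls (responders)."""
    observed = (a * d) / (b * c)
    # Worst case: every non-responding case unexposed, every non-responding control exposed
    worst = (a * d) / ((b + nonresp_controls) * (c + nonresp_cases))
    return observed, worst

observed, worst = worst_case_or(a=40, b=20, c=60, d=80,
                                nonresp_cases=10, nonresp_controls=10)
print(round(observed, 2), round(worst, 2))
```

If the association survives this deliberately unfavourable assumption, non-response alone is unlikely to explain it; if it disappears, the result is sensitive to who declined to take part.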
50
Sources of bias - 6
1. Ascertainment and selection bias

f) Length of stay
• In hospital study – incident cases sh. be selected rather
than prevalent cases otherwise,
– Patients who stay longer will have more probability of
being selected
– Cases of short duration would be under represented

• We check this by stratifying the analysis on the basis of
the duration b/w admission and selection

g) Survival
• In a situation where disease accompanied by mortality
is studied only in survivors
• Eg. A study in survivors of MI may reveal factors that
are assoc. with surviving an MI rather than sustaining
one
• Unless one can justify that exposure is not related to
duration/survival one sh. take only incident cases
• This bias can be checked by stratifying by date of onset
51
Sources of bias - 7
1. Ascertainment and selection bias

h) Admission diagnoses
• Eg. In hospital based study – assoc. b/w smoking
and MI, if controls are lung cancer patients; this
will underestimate the effect
• To avoid this bias we must select controls with a
variety of diseases which are believed to be
unrelated with study exposure (neither + nor -)

2. Bias in the estimation of exposure
a) Recall
• Eg. A mother with a malformed baby will try with
more care and intensity to recall a pelvic X-ray
compared to a woman with a normal baby
52
Sources of bias - 8
2. Bias in the estimation of exposure
a) Recall (cont.)
• Sometimes, the disease itself affects memory (dementia)
• This bias can be reduced by using controls with another disease
who will also keep thinking of reasons for their disease
• Independent verification of h/o exposure can be sought

b) Interviewer
• Interviewer may probe cases more intensely for histories of
exposure than in controls if they know the hypothesis
• Reduced by training staff, keeping staff ignorant of hypothesis
(ideal but unobtainable), keeping interview time constant

c) Prevarication
• Subjects may have ulterior motives for deliberately
overestimating or underestimating exposure
• Eg. A worker who may receive disability pay may exaggerate his
exposure; if it means loss of job, he may minimize it
• May be overcome by several independent raters
53
Sources of bias - 9
2. Bias in the estimation of exposure
d) Improper analysis
• Unmatched analysis for a matched study

3. Misclassification
– The disease/exposure status classification may be erroneous
– Some controls may actually have the study disease but this is
very improbable with rare diseases
– The most likely source of misclassification will occur in the
determination of exposure
– Any measure to reduce misclassification sh. be addressed at the
design stage; a pilot study will reveal many errors

4. Other sources of error
– Insufficient sample size, errors of interpretation, not accounting
for effect of extraneous variables
54
Sources of bias - 10
4. Other sources of error
– Cases and controls sh. be similar with respect to factors that
might have affected both the development of disease and the
opportunity for past exposure
– For eg. Medical conditions like HTN, DM preclude the use of
OCPs, thus users of these would inherently be at a lower risk
– An agent found in assoc. with study disease was prescribed due
to an early manifestation of the disease
– For eg. Estrogens prescribed for irregular bleeding that was the
first symptom of undetected endometrial cancer. If this was the
case then later diagnosis of the cancer would find an apparent
assoc. with estrogen usage.

55
Sources of bias - 11
Summary
– Before starting a study, one should list the
likely sources of bias and plan the
investigation and analyses so as to
prevent/minimize them

56
Specific limitations of Case control study
• Is not useful to study weak associations
(OR < 1.5)
• Non-participation rates are freq. high and
differential for cases and controls
• Differential recall bias

57
Applications of Case control study
1. Vaccine effectiveness
2. Evaluation of treatment and program
efficacy
3. Evaluation of screening programs
4. Outbreak investigations
5. Demography
6. Genetic epidemiology
7. Occupational epidemiology
58
Pertussis vacc. in UK - 1
Year | Event(s)
1906 | Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media
1912-14 | Pert. vaccine used by many researchers
Next few years | Many versions of vaccine developed
1942 | Several local authorities in UK start vaccination
1947-48 | First published reports appear of irreversible brain damage after whole-cell pertussis vaccine
1957 | 85,000 cases of pertussis reported. Vaccination scaled up to national level
1975 | Cases came down to 8,900. Pertussis incidence peaks every 4 years. The peaks became smaller and smaller, the smallest was in 1974-75

The next peak at 1978 should have been the smallest, but was it?
59
Pertussis vacc. in UK - 2

60
Pertussis vacc. in UK - 3
Year | Event(s)
1974-75 | Adverse publicity by media about the side effects of pert. vaccine. Parents and doctors hesitated to give vaccine
1976-79 | National Childhood Encephalopathy Study (NCES) commissioned by the Dept. of Health and Social Security
1974 | Vaccine acceptance rate came down (from 78% in 1971) to 37%
1977-79 | An epidemic of pertussis occurs in Great Britain. > 100,000 cases and 36 deaths
1979 | Vaccine Damage Payment Act passed in Great Britain. The act provides a mechanism for government compensation to those with vaccine-associated injuries

61
Pertussis vacc. in UK - 4
• Findings of the NCES study;

• Attributable risk –
– Serious neurological disorders = 1 in 110,000 injs.
– Persistent neurological sequelae = 1 in 310,000 injs.
62
Pertussis vacc. in UK - 5
Year | Event(s)
1982 | British Child Health and Education Study: long-term neurologic problems are not found to be related to pertussis immunizations.
1983 | Communicable Diseases Surveillance Centre Study, or North West Thames Study, followed a large group of children after pertussis vaccination, finds no convincing evidence relating DPT vaccine to neurologic damage.
1988 | Loveday judgment in Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage. Considered as "test case" meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded.

63
Pertussis vacc. in UK - 6
1990 - Happy ending?

64
Critical Appraisal of NCES - 1
• Research question
– Intended and actual

• Study design
– Case control – reasons for choosing
– Cohort – reasons for not choosing

• Case selection

65
Critical Appraisal of NCES - 2

• Only hospital admitted cases were selected as
cases – any comments?
• Control selection

– Comments?
66
Critical Appraisal of NCES - 3
• Exposure measurement

67
Critical Appraisal of NCES - 4
• Results

• There was no noticeable clustering in any area
68
Critical Appraisal of NCES - 5
• Results
• 3.5% of cases and 1.7% of controls had been
immunized
• OR of 2.4, p value < 0.001

69
Critical Appraisal of NCES - 6
• Results
• There was no significant association between serious
neurological illness and diphtheria and tetanus vaccine
• Confounders
• History of fits
– Is a known contraindication to immunization, including such cases
will underestimate OR,
– A separate analysis limited to normal children with no past history
of fits gave a RR of 3.2

• Social class
– Could not be controlled
– But analysis in those pairs of children in which both the affected
and control were of the same social class – no differences
70
Critical Appraisal of NCES - 7
• Causation Vs. association
A) clinically distinctive
B) restricted to immunized children
C) closely related in time to immunization
D) biologically plausible
E) without alternative explanation

• Attributable risk
– Can this be calculated in a case control study?
– Covered an entire national population (in theory represents
the total incidence of serious neurological illnesses,
assumption about immunization coverage)

– Serious neurological disorders = 1 in 110,000 injs.
– Persistent neurological sequelae = 1 in 310,000 injs.
– Is this appropriate?
71
Thank you
72

Case control study – part 1

  • 1.
    Case control study- Part 1 Dr. Rizwan S A, M.D., 1
  • 2.
    Outline of presentation • • • • • Somehistory Planning and conducting a study Matching Sources of bias Applications 2
  • 3.
    A scenario • Assumeyou are the senior health advisor to the GOI • Recently, several isolated reports of neurological illness following DPT vaccination have come up in the country • Media is adding fuel to the fire • Parents and doctors are reluctant to vaccinate and the vaccination rates are going down the drain! • What will you do? 3
  • 4.
    Some history • 1788- Early concepts found in works of Parisian physician PCA Louis • 1843 - First explicit description by William Augustus Guy (occupational exposure and pulmonary disease) • 1862 - Baker, case control comparisons of marriage and fertility in breast cancer patients • 1926 - Lane Claypon’s Breast cancer study • 1950 - Levin et al ; Wynder & Graham ; Schrek et al. and Doll & Hill; (smoking and lung cancer) 4
  • 5.
    Planning and conducting • • • • • Researchquestion Definition of case Definition of control Selecting the cases & controls Research instrument 5
  • 6.
    Case Control Studies CohortStudies Proceeds from effect to cause Proceeds from cause to effect Starts with the disease Starts with people exposed to the risk factor or suspected cause Tests whether the suspected cause occurs more frequently in those with disease than those without disease Tests whether disease occurs more frequently in those exposed than in those not exposed Usually the 1st approach to the testing of hypothesis, but also useful for exploratory studies Reserved for the testing of precisely formulated hypothesis Involves fewer study subjects Involves larger number of subjects Yields results relatively quickly Long follow-up, delayed results Suitable for study of rare diseases Inappropriate when disease or exposure under investigation is rare Generally, yields only estimate of relative risk (Odds ratio) Yields incidence rates, relative risk, attributable risk Cannot yield information about disease other than that under study Can give information about more than one disease outcome Relatively inexpensive Expensive 6
  • 7.
    Research question • • • • Begin withbroad and ambitious question Later, narrow and more precise Considerations of time, cost Eg. 1. Does tobacco cause cancer? 2. Does smoking tobacco cause bronchogenic CA? 3. Do persons having broncho. CA have h/o greater exposure to tobacco smoking as compared to persons w/o the disease? • Poor questions can spoil the entire study 7
  • 8.
    Definition of case- 1 • Eligibility • Definition of disease 8
  • 9.
    Definition of case- 2 • Eligibility (2 components) – Objective criteria for diagnosis – Stating the eligibility criteria • Eligibility criteria – should reflect ‘potentially at risk for exposure’ both for the case & control – Eg. recent OCP and MI; (sterilized, postmenopausal, CIs to OCPs) 9
  • 10.
    Definition of case- 3 • Cases sh. have reasonable possibility of having had their disease induced by the exposure – Eg. OCP and Thromboembolism – sh. exclude postpartum and postoperative cases (Why?) • Incident cases – – – – Will be more uniform Recall more accurate More certain that exposure preceded the disease Berkson and Neyman 10
  • 11.
    Definition of case- 4 • Definition of disease – Objective criteria to reliably diagnose the disease – Eg. Rh. Arthritis (several diagnostic criteria causing confusion) – To reduce misclassification • Sources of cases – Hospital lists, special reporting systems like cancer registries, disease surveillance, death certificates 11
  • 12.
    Definition of control- 1 • Eligibility criteria – Sh. be similar to the cases with regard to potential for exposure – Problems arise in hospital based controls • We want to select controls that are likely to reflect the exposure rate in the population • We sh. exclude those hospital controls whose condition is associated with the exposure (Eg. Aspirin and MI; controls with chronic pain/peptic ulcer) – One solution – include controls with a variety of diagnoses not associated with exposure 12
  • 13.
    Definition of control- 2 • Sources of controls – Hospital based – Dead controls – Controls with similar diseases – Neighborhood controls • Population based • Best friend control/ Sibling control 13
  • 14.
    Definition of control- 3 • Hospital based – Referral pattern is similar to cases (form the same study base) – Similar quality of information – Convenience – May not be representative of the population • Dead controls – In a study where the case is death from a particular cause – Information obtained from ‘proxy’ informants – But dead controls differ from living controls • Controls with similar diseases – Cancer (of different type) controls for cancer cases – Minimize recall bias, interviewer bias, examine specificity of exposure 14
  • 15.
    Definition of control- 4
    • Neighborhood controls, best friend control/sibling control
      – Inexpensive, easy and quick
      – Ability to match on a number of variables that are associated with neighborhood/friendship
      – May introduce selection bias related to the exposure (e.g. ‘smoking’ cases nominate ‘smoking’ friends) and overmatching
    • Population based
      – Truly representative sample
      – From tax lists, voting lists, telephone directories
  • 16.
    Definition of control- 5
    Source: Hospital based
      Advantages: Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (less recall bias).
      Disadvantages: Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study. Berksonian bias.
    Source: Population based
      Advantages: Most representative of the general population. Generally healthy.
      Disadvantages: Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occupation).
    Source: Neighbourhood controls / telephone exchange random dialing
      Advantages: Controls and cases similar in residence. Easier than sampling the population.
      Disadvantages: Non-cooperation. Not representative of general population.
    Source: Best friend control / sibling control
      Advantages: Accessible, cooperative. Similar to cases in most aspects.
      Disadvantages: Overmatching.
  • 17.
    Selection process -1 Total population Reference population cases controls 17
  • 18.
    Selection process -2 • Cases – In practice; we use all eligible cases within a defined time period • From disease registry or hospital • We are implicitly sampling from a subset of total population of cases • Controls – Sampling is most pertinent here because in rare diseases, the no. of controls greatly exceed no. of cases 18
  • 19.
    Selection of cases- 1 • Representativeness – Ideally, cases sh. be a random sample of all cases of interest in the source population (e.g. from vital data, registry data) – But commonly they are a selection of available cases from a medical care facility. (e.g. from hospitals, clinics) • Method of Selection – Selection may be from incident or prevalent cases – Incident cases are those derived from ongoing ascertainment of cases over time – Prevalent cases are derived from a cross-sectional survey 19
  • 20.
    Selection of cases- 2 • Incident cases are more optimal • These should be all newly diagnosed cases over a given period of time in a defined population. (However we are excluding patients who died before diagnosis) • Prevalent cases do not include patients with a short course of disease (patients who recovered early and those who died will not be included) • Can be partly overcome by including deceased cases as well as those alive 20
  • 21.
    Selection of cases- 3
    • Validity is more important than generalizability, i.e. the need to establish an etiologic relationship is more important than the need to generalise results to the population
    • E.g.
      – In a study on breast cancer we can include all cases, or we can include only premenopausal women with lobular cancer
        • If we take the latter group as cases, we can elicit the etiology better
      – Studies done in nurses for OCP use
  • 22.
    Selection of controls- 1
    • The four principles of Wacholder
      1. The study base
      2. De-confounding
      3. Comparable accuracy
      4. Efficiency
  • 23.
    Selection of controls- 2 • Should the controls be similar to the cases in all respects other than having the disease? i.e. comparable • Should the controls be representative of all non-diseased people in the population from which the cases are selected? i.e. representative 23
  • 24.
    Selection of controls- 3 • Representativeness – Sh. be representative of the general population in terms of probability of exposure to the risk factor • Comparability – Sh. also have had the same opportunity to be exposed as the cases have • Not that both cases and controls are equally exposed; but only that they have had the same opportunity for exposure. 24
  • 25.
    Selection of controls- 4
    • Usually, cases are not a random sample of all cases in the population. So, the controls must be selected in the same way (and with the same biases) as the cases.
    • It follows from the above that a pool of potential controls must be defined. This is a universe of people from whom controls may be selected (study base)
  • 26.
    Selection of controls- 5 • The study base is composed of a population at risk of exposure over a period • Cases emerge within a study base. Controls should also emerge from the same study base, except that they are not cases. • Eg. If cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients. 26
  • 27.
    Selection of controls- 6 • Comparability is more important than representativeness in the selection of controls • The control should resemble the case in all respects except for the presence of disease 27
  • 28.
    Selection of controls- 7
    • Number of controls
      – Large study: equal numbers
      – Small study: multiple controls
    • Use of multiple controls
      – Controls of same type
      – Multiple controls of different types
        • Hospital and neighborhood controls
        • E.g. cases: children with brain tumor; controls: children with other cancer, normal children
  • 29.
    Selection of controls- 8
    [Figure: exposure to radiation compared across children with brain tumors, children with other cancers and children without cancer; the two interpretations shown are "radiation causes cancers" and "radiation causes brain cancers only"]
    • Multiple controls of different types are valuable for exploring alternative hypotheses and for taking possible recall bias into account
  • 30.
    Sampling for cases/controls- 1
    • Frame: list of all potentially eligible cases and controls in the target population (a subset of the general population at risk of both exposure and disease development)
    • The frame should not be biased in any manner; otherwise the sample will also be biased, even if random
    • Types of sampling
      – SRS
      – Systematic
      – Stratified
      – Matched
    • The objective is to avoid bias in selection: each case or control has an equal chance of being selected
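    The first three sampling types listed above can be sketched in a few lines. This is only an illustration: the frame, the function names and the stratum variable (sex) are hypothetical, and the systematic sampler gives exactly equal selection chances only when the frame size is a multiple of the sample size.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Systematic: random start, then every k-th unit, with k = len(frame) // n."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Stratified: an SRS of fixed size drawn separately within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {s: rng.sample(units, n_per_stratum) for s, units in strata.items()}

# Hypothetical frame of 100 eligible controls, each recorded as (id, sex).
rng = random.Random(0)
frame = [(i, "F" if i % 2 else "M") for i in range(100)]

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, lambda unit: unit[1], 5, rng)
```

    Passing an explicit `random.Random` instance keeps the draw reproducible, which helps when the selection procedure has to be documented in advance.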
  • 31.
    Sampling for cases/controls- 2
    • If we are using all incident cases occurring in a defined area and time period, then controls selected at random from the general population are the best choice (sound basis for calculating RR, AR, etiologic fraction)
    • If cases are selected from hospital(s), population controls are not necessarily the best choice; a well-chosen hospital control series can also be valid
    • However, hospital controls often leave room for doubt about the validity of the comparison (cost and practicality)
  • 32.
    Sampling for cases/controls- 3
    • Random digit dialing
      – Prerequisite: extensive telephone coverage
      – Used either to screen for potential controls or to conduct telephone interviews
    • Method
      – All area codes and prefix numbers are obtained
      – All possible two-digit numbers are added to each
      – The first 8 digits form the PSU
      – A PSU is selected at random; if a response is obtained, the PSU is retained
      – The last two digits are then randomly selected, and this continues until the required sample is reached
      – The no. of PSUs and total houses depend on the design
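    The dialing procedure above can be sketched as a two-stage sampler. This is a rough sketch under stated assumptions: `rdd_two_stage_sample` and `toy_is_household` are hypothetical names, "response obtained" is modelled as a yes/no check on the dialed number, and the toy rule for which numbers are households is invented purely so the example runs.

```python
import random

def rdd_two_stage_sample(prefixes, is_household, n_psus, per_psu, rng):
    """Two-stage random digit dialing over 8-digit PSUs.

    prefixes: 8-digit strings (area code + prefix + two added digits).
    is_household: callable that returns True when a dialed number responds.
    A PSU is retained only if the first number dialed in it responds.
    """
    sample = {}
    candidates = list(prefixes)
    rng.shuffle(candidates)  # visit PSUs in random order
    for psu in candidates:
        if len(sample) == n_psus:
            break
        suffixes = [f"{i:02d}" for i in range(100)]
        rng.shuffle(suffixes)  # random last two digits, without repeats
        first = psu + suffixes[0]
        if not is_household(first):
            continue  # no response on the first call: PSU is rejected
        households = [first]
        for suffix in suffixes[1:]:
            if len(households) == per_psu:
                break
            number = psu + suffix
            if is_household(number):
                households.append(number)
        sample[psu] = households
    return sample

def toy_is_household(number):
    """Invented stand-in for a real call outcome: even endings respond."""
    return int(number[-2:]) % 2 == 0

rng = random.Random(1)
prefixes = [f"011234{i:02d}" for i in range(100)]  # hypothetical PSUs
controls = rdd_two_stage_sample(prefixes, toy_is_household, 5, 3, rng)
```

    In a real study the household check would be the outcome of an actual call, and the numbers of PSUs and households per PSU would follow from the design.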
  • 33.
    Sampling for cases/controls- 4
    • Examples
      – Artificial sweeteners and bladder cancer
        • Cases: 21-84 years, newly diagnosed bladder cancer, in 10 designated counties in metropolitan areas
        • Controls: age-sex stratified random sample of the general population in the ten counties, frequency matched at a 2:1 ratio
      – Oral contraception and congenital malformations
        • Cases: malformations among all newborns and stillborns delivered at five major hospitals between 1974 and 1976
        • Controls: all unaffected newborns in the five hospitals; sampling days were rotated to represent all 7 days
  • 34.
    Matching - 1 •Matching is defined as the process of selecting controls so that they are similar to cases in certain characteristics such as age, sex, race, socioeconomic status and occupation • What is post-matching? – Pairing controls to cases from an unmatched data during analysis • We often want a constant case control ratio, but sometimes matching is incomplete so that we end up with a variable ratio 34
  • 35.
    Matching - 2
    • Objective: to eliminate biased comparison between cases and controls
    • Two step process
      1. The matched design
      2. The matched analysis
    • One immediate effect of matching is the balance between the no. of cases and controls
    • Sometimes we can deliberately match on a factor that lies on the causal pathway to confirm or refute its role (e.g. smoking and MI, matched on cholesterol)
  • 36.
    Matching - 3
    • What variables to match?
      – Factors which are independent risk factors for the disease
      – Assoc. with the exposure, but non-causally
      – May not be directly a risk factor, but may be assoc. with other causal factors excluding the study exposure
    • Similar to something?
  • 37.
    Matching - 4
    • Situations to match or not?
    • Causal / Non-causal
  • 38.
    Matching - 5
    • Examples (E = exposure, F = matching factor, D = disease)
      1. E = alcohol, F = smoking, D = lung CA
         – Implication if not matched?
      2. E = OCP, F = smoking, D = MI
         – Implication if not matched?
      3. E = blood grp O, F = age, sex, D = thrombosis
         – Implication if matched?
      4. E = OCP, F = prescribing physician, D = MI
         – Implication if not matched or matched?
  • 39.
    Matching - 6
    • In summary, the decision to match or not depends on the residual association of the factor with disease and exposure after controlling for other variables
    • Overmatching
      – Reduces validity or statistical efficiency
      – Two general meanings
        • Unmatched analysis in matched studies
        • Matching for unnecessary variables
      – If one matches on a factor that is associated with exposure but not the disease
        • Paired analysis may correctly estimate the odds ratio, but its variance will be larger than in an unmatched study (overmatching increases the frequency of exposure-concordant pairs, which are discarded in paired analysis)
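    The remark about concordant pairs can be made concrete with the usual paired estimate for 1:1 matched data, in which the odds ratio is the ratio of the two kinds of discordant pairs. The sketch below uses made-up pair counts and a hypothetical function name.

```python
def matched_pair_odds_ratio(pairs):
    """Paired odds ratio for 1:1 matched data.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Returns (odds ratio, number of concordant pairs that carry no information).
    """
    case_only = control_only = concordant = 0
    for case_exposed, control_exposed in pairs:
        if case_exposed and not control_exposed:
            case_only += 1
        elif control_exposed and not case_exposed:
            control_only += 1
        else:
            concordant += 1  # both exposed or both unexposed
    if control_only == 0:
        raise ValueError("no pairs with only the control exposed; ratio undefined")
    return case_only / control_only, concordant

# Hypothetical study of 100 matched pairs.
pairs = (
    [(True, True)] * 10      # both exposed
    + [(True, False)] * 30   # only the case exposed
    + [(False, True)] * 10   # only the control exposed
    + [(False, False)] * 50  # neither exposed
)
odds_ratio, discarded = matched_pair_odds_ratio(pairs)
print(odds_ratio, discarded)  # 3.0 60
```

    Here 60 of the 100 pairs do not enter the estimate at all, which is why matching on a factor that only raises exposure concordance costs precision.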
  • 40.
    Matching - 7
    • If one matches on a factor that is causally or non-causally assoc. with exposure but not disease, and the pairing is then ignored in the analysis, the OR will be biased towards unity
    • If one matches on a factor which is assoc. with disease but not exposure, the OR will be correctly estimated whether or not pairing is retained
      – Paired analysis will be less efficient than an unpaired one
    • Matching on highly correlated variables is also unnecessary
    • Finally, matching should be done for factors which have the strongest relationship to the disease and are least correlated
  • 41.
    Matching - 8 •Alternatives to matching – At the sampling phase • Stratified sampling • Frequency matching – At analysis phase • Post-stratification • Regression analysis • Stratified sampling • Pre-determined number of cases and controls in each subgroup created by the cross-classification • Eg. Age (4 groups), sex (2), race (4 groups) – Total 32 subgroups 41
  • 42.
    Matching - 9
    • Frequency matching
      – Controls are taken from the corresponding subgroups in proportion to the no. of cases
      – E.g. if 30% of cases are males of Hindu religion aged 60-65 years, then we take 30% of controls from the same subgroup
      – More practical than stratified sampling, but it requires one to continually update the distribution of accumulating cases to maintain a fixed case-control ratio
    • Post-stratification
      – Stratify the subgroups and analyze
      – Very flexible in that variables need not be pre-specified
      – Limitation: the number of variables that can be stratified is restricted by lack of numbers
    • Regression analysis
      – Most useful when the number of variables/subgroups increases
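    Frequency matching as described above amounts to drawing controls stratum by stratum in proportion to the cases. A minimal sketch, assuming a fixed pool of potential controls and hypothetical record layouts of (id, sex, age band); a real study would redo the counts as cases accumulate.

```python
import random
from collections import Counter

def frequency_match(cases, control_pool, stratum_of, ratio, rng):
    """Draw `ratio` controls per case from each stratum of the control pool."""
    case_counts = Counter(stratum_of(c) for c in cases)
    pool = {}
    for person in control_pool:
        pool.setdefault(stratum_of(person), []).append(person)
    selected = []
    for stratum, n_cases in case_counts.items():
        needed = ratio * n_cases
        available = pool.get(stratum, [])
        if len(available) < needed:
            raise ValueError(f"not enough controls in stratum {stratum!r}")
        selected.extend(rng.sample(available, needed))
    return selected

def stratum(person):
    """Stratum = (sex, age band); both fields are illustrative."""
    return person[1], person[2]

# Hypothetical data: 10 cases, 30% of them males aged 60-65.
cases = (
    [(i, "M", "60-65") for i in range(3)]
    + [(i, "F", "60-65") for i in range(3, 7)]
    + [(i, "F", "50-59") for i in range(7, 10)]
)
control_pool = [
    (1000 + i, sex, age)
    for i, (sex, age) in enumerate(
        [("M", "60-65"), ("F", "60-65"), ("F", "50-59"), ("M", "50-59")] * 25
    )
]
controls = frequency_match(cases, control_pool, stratum, 2, random.Random(0))
```

    With a 2:1 ratio the 20 selected controls reproduce the cases' distribution exactly: 6 males aged 60-65, 8 females aged 60-65 and 6 females aged 50-59.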
  • 43.
    Matching - 10 •Effectiveness of matching – Removal of bias – Reduction of variance • Matched design only gives a modest increase in efficiency • Greatest improvement is when there is strong assoc. between disease and the confounder • Also efficient when only a small proportion of the target population is exposed to the study factor • The added cost and complexity of matching should be weighed against any expected gains in precision 43
  • 44.
    Matching - 11 •Advantages – Cases and controls will be comparable to the matched variables – Provides the best means to investigate a very specific hypothesis • Disadvantages – One can no longer study the matched variable in relation to the risk of disease – Increase in cost, time and labor – A certain fraction of cases are discarded as a result of failure to find a matching control 44
  • 45.
    Matching - 12 •Summary – Unless one has very good reason to match, one is better off avoiding it – Frequency matching within rather broad categories of the matching variables will suffice for most studies 45
  • 46.
    Sources of bias- 1
    • Bias: systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of the risk measure
    1. Ascertainment and selection bias
       a) Surveillance
       b) Diagnosis
       c) Referral
       d) Selection
       e) Non-response
       f) Length of stay
       g) Survival
       h) Admission diagnoses
    2. Bias in estimation of exposure
       a) Recall
       b) Interviewer
       c) Prevarication
       d) Improper analysis
    3. Misclassification
    4. Other sources
  • 47.
    Sources of bias- 2
    1. Ascertainment and selection bias
    • Not peculiar to case-control studies; can occur in cohort studies also
    a) Differential surveillance
      – In asymptomatic/mild diseases, cases are more likely to be detected in persons who are closely examined
      – E.g. OCP and endometrial cancer/phlebitis
        • Women taking OCPs were more thoroughly evaluated
        • Based on preliminary reports of OCP use and phlebitis, clinicians started looking for phlebitis in such exposed patients
      – Exposed cases would have a greater likelihood of being diagnosed than unexposed cases
      – This bias can be checked by doing a stratified analysis in subgroups having equal surveillance (based on some index of medical care), or by restricting the study to the time prior to publication of such findings
  • 48.
    Sources of bias- 3
    1. Ascertainment and selection bias
    b) Diagnosis
      – In conditions like cervical dysplasia, knowledge of exposure may alter the assessment
      – This is most likely to occur in cases of uncertain diagnosis
    c) Differential referral
      – OR’ = b × OR, where b = (s1 × s4)/(s2 × s3)
      – s1, s2, s3, s4 are the proportions of exposed cases, unexposed cases, exposed controls and unexposed controls, respectively, that are selected into the study
      – There is no bias (b = 1) when a biased selection of cases is compensated by an equally biased selection of controls
      – There is also no bias when the probability of selecting an exposed case equals that of selecting an unexposed case, and likewise for controls
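    A short numerical sketch may help with the bias factor b. It assumes the reading that s1 to s4 are the selection proportions for exposed cases, unexposed cases, exposed controls and unexposed controls, in that order; the function names and all numbers are made up for illustration.

```python
import math

def selection_bias_factor(s1, s2, s3, s4):
    """b = (s1 * s4) / (s2 * s3).

    s1, s2: selection proportions of exposed and unexposed cases.
    s3, s4: selection proportions of exposed and unexposed controls.
    """
    return (s1 * s4) / (s2 * s3)

def observed_odds_ratio(true_or, s1, s2, s3, s4):
    """Odds ratio seen in the study: OR' = b * OR."""
    return selection_bias_factor(s1, s2, s3, s4) * true_or

# Exposed cases are referred more readily than unexposed cases,
# while controls are selected without regard to exposure.
biased = observed_odds_ratio(2.0, s1=0.9, s2=0.6, s3=0.5, s4=0.5)

# The same referral pattern operates among controls, so it cancels out.
compensated = observed_odds_ratio(2.0, s1=0.9, s2=0.6, s3=0.3, s4=0.2)

print(round(biased, 2), round(compensated, 2))  # 3.0 2.0
```

    In the first scenario a true odds ratio of 2 is inflated to 3; in the second the ratio s1/s2 equals s3/s4, so b = 1 and the estimate is unaffected.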
  • 49.
    Sources of bias- 4
    1. Ascertainment and selection bias
    c) Differential referral (cont.)
      – E.g. a study of alcohol and kidney failure, where income is assoc. with alcohol intake
      – A hospital admits only wealthy patients, so cases of kidney failure in this hospital will be more exposed to alcohol than patients in the general population
      – But if patients with other diseases, who have similar income characteristics, are taken as controls, bias will not occur
      – If controls are taken from the general population, we have to match or stratify on income to eliminate it as a source of selection bias
  • 50.
    Sources of bias- 5
    1. Ascertainment and selection bias
    d) Selection
      – E.g. interviewer ‘keying’ on cases who are exposed (one particular nurse was searching out all the cases of ectopic pregnancy with IUD usage)
      – To avoid this, we must specify precisely and in advance the methods by which cases and controls are selected, train staff carefully, and apply quality control
    e) Non-response
      – A worst-case analysis, taking all non-responding cases as unexposed and all non-responding controls as exposed, will show whether the non-response is likely to bias the estimates
      – If the exposure rates were equal between responders and non-responders, there will be no bias
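    The worst-case analysis for non-response can be sketched as a recalculation of the 2×2 table. The counts and the function name below are hypothetical; the assignment of non-responders follows the rule stated above (cases counted as unexposed, controls as exposed), which is the unfavourable direction for a positive association.

```python
import math

def worst_case_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls,
                          nonresp_cases, nonresp_controls):
    """Return (observed OR among responders, worst-case OR).

    Worst case: every non-responding case is counted as unexposed and
    every non-responding control as exposed.
    """
    observed = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    worst = (exp_cases * unexp_controls) / (
        (unexp_cases + nonresp_cases) * (exp_controls + nonresp_controls)
    )
    return observed, worst

# Hypothetical study: 100 responding cases, 100 responding controls,
# 10 non-responding cases and 20 non-responding controls.
observed, worst = worst_case_odds_ratio(60, 40, 30, 70, 10, 20)
print(round(observed, 2), round(worst, 2))  # 3.5 1.68
```

    If the worst-case figure still points the same way as the observed one, as it does here, non-response alone is unlikely to explain the association.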
  • 51.
    Sources of bias- 6 1. Ascertainment and selection bias f) Length of stay • In hospital study – incident cases sh. be selected rather than prevalent cases otherwise, – Patients who stay longer will have more probability of being selected – Cases of short duration would be under represented • We check this by stratifying the analysis on the basis of the duration b/w admission and selection g) Survival • In a situation where disease accompanied by mortality is studied only in survivors • Eg. A study in survivors of MI may reveal factors that are assoc. with surviving an MI rather than sustaining one • Unless one can justify that exposure is not related to duration/survival one sh. take only incident cases • This bias can be checked by stratifying date of onset 51
  • 52.
    Sources of bias- 7 1. Ascertainment and selection bias h) Admission diagnoses • Eg. In hospital based study – assoc. b/w smoking and MI, if controls are lung cancer patients; this will underestimate the effect • To avoid this bias we must select controls with a variety of diseases which are believed to be unrelated with study exposure (neither + nor -) 2. Bias in the estimation of exposure a) Recall • Eg. A mother with malformed baby will try with more care and intensity to recall a pelvic X-ray compared to women with normal baby 52
  • 53.
    Sources of bias- 8 2. Bias in the estimation of exposure a) Recall (cont.) • Sometimes, the disease itself affects memory (dementia) • This bias can be reduced by using controls with another disease who will also keep thinking of reasons for their disease • Independent verification of h/o exposure can be sought b) Interviewer • Interviewer may probe cases more intensely for histories of exposure than in controls if they know the hypothesis • Reduced by training staff, keeping staff ignorant of hypothesis (ideal but unobtainable), keeping interview time constant c) Prevarication • Subjects may have ulterior motives for deliberately overestimating or underestimating exposure • Eg. A worker who may receive disability pay may exaggerate his exposure; if it means loss of job, he may minimize it • May be overcome by several independent raters 53
  • 54.
    Sources of bias- 9
    2. Bias in the estimation of exposure
    d) Improper analysis
      – Unmatched analysis for a matched study
    3. Misclassification
      – The disease/exposure status classification may be erroneous
      – Some controls may actually have the study disease, but this is very improbable with rare diseases
      – The most likely source of misclassification is the determination of exposure
      – Any measure to reduce misclassification should be addressed at the design stage; a pilot study will reveal many errors
    4. Other sources of error
      – Insufficient sample size, errors of interpretation, not accounting for the effect of extraneous variables
  • 55.
    Sources of bias- 10 4. Other sources of error – Cases and controls sh. be similar with respect to factors that might have affected both the development of disease and the opportunity for past exposure – For eg. Medical conditions like HTN, DM preclude the use of OCPs, thus users of these would inherently be at a lower risk – An agent found in assoc. with study disease was prescribed due to an early manifestation of the disease – For eg. Estrogens prescribed for irregular bleeding that was the first symptom of undetected endometrial cancer. If this was the case then later diagnosis of the cancer would find an apparent assoc. with estrogen usage. 55
  • 56.
    Sources of bias- 11 Summary – Before starting a study, one should list the likely sources of bias and plan the investigation and analyses so as to prevent/minimize them 56
  • 57.
    Specific limitations of case control study
    • Is not useful to study weak associations (OR < 1.5)
    • Participation rates are frequently low and differ between cases and controls
    • Differential recall bias
  • 58.
    Applications of case control study
    1. Vaccine effectiveness
    2. Evaluation of treatment and program efficacy
    3. Evaluation of screening programs
    4. Outbreak investigations
    5. Demography
    6. Genetic epidemiology
    7. Occupational epidemiology
  • 59.
    Pertussis vacc. in UK - 1
    Year: Event(s)
    1906: Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media
    1912-14: Pertussis vaccine used by many researchers
    Next few years: Many versions of vaccine developed
    1942: Several local authorities in UK start vaccination
    1947-48: First published reports appear of irreversible brain damage after whole-cell pertussis vaccine
    1957: 85,000 cases of pertussis reported. Vaccination scaled up to national level
    1975: Cases came down to 8,900
    • Pertussis incidence peaks every 4 years
    • The peaks became smaller and smaller; the smallest was in 1974-75
    • The next peak, in 1978, should have been the smallest, but was it?
  • 60.
  • 61.
    Pertussis vacc. in UK - 3
    Year: Event(s)
    1974-75: Adverse publicity by media about the side effects of pertussis vaccine. Parents and doctors hesitated to give vaccine
    1976-79: National Childhood Encephalopathy Study (NCES) commissioned by the Dept. of Health and Social Security
    1974: Vaccine acceptance rate came down (from 78% in 1971) to 37%
    1977-79: An epidemic of pertussis occurs in Great Britain: > 100,000 cases and 36 deaths
    1979: Vaccine Damage Payment Act passed in Great Britain. The act provides a mechanism for government compensation to those with vaccine-associated injuries
  • 62.
    Pertussis vacc. in UK - 4
    • Findings of the NCES study
    • Attributable risk
      – Serious neurological disorders = 1 in 110,000 injections
      – Persistent neurological sequelae = 1 in 310,000 injections
  • 63.
    Pertussis vacc. in UK - 5
    Year: Event(s)
    1982: British Child Health and Education Study. Long-term neurologic problems are not found to be related to pertussis immunizations.
    1983: Communicable Diseases Surveillance Centre Study, or North West Thames Study, followed a large group of children after pertussis vaccination and finds no convincing evidence relating DPT vaccine to neurologic damage.
    1988: Loveday judgment in Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage. Considered a "test case", meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded.
  • 64.
    Pertussis vacc. in UK - 6
    1990: Happy ending?
  • 65.
    Critical Appraisal of NCES - 1
    • Research question
      – Intended and actual
    • Study design
      – Case control: reasons for choosing
      – Cohort: reasons for not choosing
    • Case selection
  • 66.
    Critical Appraisal of NCES - 2
    • Only hospital-admitted patients were selected as cases: any comments?
    • Control selection
      – Comments?
  • 67.
    Critical Appraisal of NCES - 3
    • Exposure measurement
  • 68.
    Critical Appraisal of NCES - 4
    • Results
      – There was no noticeable clustering in any area
  • 69.
    Critical Appraisal of NCES - 5
    • Results
      – 3.5% of cases and 1.7% of controls had been immunized
      – OR of 2.4, p value < 0.001
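    As a quick check of scale, a crude odds ratio can be computed from the two percentages alone. This is only a sketch with hypothetical function names. The crude figure comes out lower than the 2.4 quoted above; presumably the study's estimate came from its own matched analysis, which cannot be reproduced from the percentages, but that is an assumption rather than something stated on the slide.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

def odds_ratio_from_proportions(p_cases, p_controls):
    """Crude (unmatched) odds ratio from exposure proportions."""
    return odds(p_cases) / odds(p_controls)

# Proportions immunized, as given on the slide.
crude_or = odds_ratio_from_proportions(0.035, 0.017)
print(round(crude_or, 2))  # 2.1
```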
  • 70.
    Critical Appraisal of NCES - 6
    • Results
      – There was no significant association between serious neurological illness and diphtheria and tetanus vaccine
    • Confounders
      – History of fits
        • Is a known contraindication to immunization; including such cases will underestimate the OR
        • A separate analysis limited to normal children with no past history of fits gave a RR of 3.2
      – Social class
        • Could not be controlled
        • But analysis of those pairs in which both the affected child and the control were of the same social class showed no differences
  • 71.
    Critical Appraisal of NCES - 7
    • Causation vs. association
      A) Clinically distinctive
      B) Restricted to immunized children
      C) Closely related in time to immunization
      D) Biologically plausible
      E) Without alternative explanation
    • Attributable risk
      – Can this be calculated in a case control study?
      – Covered an entire national population (in theory represents the total incidence of serious neurological illnesses; assumption about immunization coverage)
      – Serious neurological disorders = 1 in 110,000 injections
      – Persistent neurological sequelae = 1 in 310,000 injections
      – Is this appropriate?
  • 72.