Case control study – part 1


Dr. Rizwan S A, M.D.

Outline of presentation
• Some history
• Planning and conducting a study
• Matching
• Sources of bias
• Applications

A scenario
• Assume you are the senior health advisor to the Government of India (GOI)
• Recently, several isolated reports of neurological illness following DPT vaccination have surfaced in the country
• The media is adding fuel to the fire
• Parents and doctors are reluctant to vaccinate, and vaccination rates are falling sharply!
• What will you do?

Some history
• 1788 - Early concepts found in the works of the Parisian physician P.C.A. Louis
• 1843 - First explicit description by William Augustus Guy (occupational exposure and pulmonary disease)
• 1862 - Baker: case control comparisons of marriage and fertility in breast cancer patients
• 1926 - Lane-Claypon's breast cancer study
• 1950 - Levin et al.; Wynder & Graham; Schrek et al.; Doll & Hill (smoking and lung cancer)

Planning and conducting
• Research question
• Definition of case
• Definition of control
• Selecting the cases & controls
• Research instrument

Case control studies | Cohort studies
Proceeds from effect to cause | Proceeds from cause to effect
Starts with the disease | Starts with people exposed to the risk factor or suspected cause
Tests whether the suspected cause occurs more frequently in those with the disease than in those without it | Tests whether the disease occurs more frequently in those exposed than in those not exposed
Usually the first approach to testing a hypothesis, but also useful for exploratory studies | Reserved for testing precisely formulated hypotheses
Involves fewer study subjects | Involves a larger number of subjects
Yields results relatively quickly | Long follow-up, delayed results
Suitable for the study of rare diseases | Inappropriate when the disease under investigation is rare (but well suited to rare exposures)
Generally yields only an estimate of relative risk (the odds ratio) | Yields incidence rates, relative risk and attributable risk
Cannot yield information about diseases other than the one under study | Can give information about more than one disease outcome
Relatively inexpensive | Expensive
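The odds ratio in the comparison above is computed from a 2x2 table of exposure among cases and controls, while a cohort gives the relative risk directly. The sketch below uses a hypothetical population (the numbers are invented for illustration) to show that, for a rare disease, the odds ratio is close to the relative risk. It is a teaching sketch, not a full analysis: a real case control study samples controls rather than using every non-case.

```python
def odds_ratio(a, b, c, d):
    """Case control 2x2 table: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls. OR = (a/c) / (b/d) = ad/bc."""
    return (a * d) / (b * c)


def relative_risk(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Cohort: incidence in the exposed divided by incidence in the unexposed."""
    return (exposed_ill / exposed_total) / (unexposed_ill / unexposed_total)


# Hypothetical population: 10,000 exposed (30 develop the disease) and
# 10,000 unexposed (10 develop it), i.e. a rare disease.
rr = relative_risk(30, 10_000, 10, 10_000)
# The same population viewed as cases vs non-cases (non-cases act as controls).
or_ = odds_ratio(30, 10_000 - 30, 10, 10_000 - 10)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")  # RR = 3.00, OR = 3.01
```

As the disease becomes more common, the gap between the two measures widens.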

Research question
• Begin with a broad and ambitious question
• Later, narrow it to a more precise one
• Consider time and cost
• E.g.
1. Does tobacco cause cancer?
2. Does smoking tobacco cause bronchogenic carcinoma?
3. Do persons with bronchogenic carcinoma have a history of greater exposure to tobacco smoking than persons without the disease?

• A poorly framed question can spoil the entire study

Definition of case - 1
• Eligibility
• Definition of disease


• Eligibility (2 components)
– Objective criteria for diagnosis
– Stating the eligibility criteria

• Eligibility criteria should reflect being ‘potentially at risk for exposure’, for both cases and controls
– E.g. recent OCP use and MI: exclude women who are sterilized, postmenopausal, or have contraindications (CIs) to OCPs

• Cases should have a reasonable possibility of having had their disease induced by the exposure
– E.g. OCP and thromboembolism: postpartum and postoperative cases should be excluded (Why?)

• Incident cases
– Will be more uniform
– Recall is more accurate
– More certain that exposure preceded the disease
– Berkson and Neyman

• Definition of disease
– Objective criteria to reliably diagnose the disease
– E.g. rheumatoid arthritis (several diagnostic criteria exist, causing confusion)
– Reduces misclassification

• Sources of cases
– Hospital lists, special reporting systems such as cancer registries, disease surveillance, death certificates

Definition of control - 1
• Eligibility criteria
– Should be similar to the cases with regard to potential for exposure
– Problems arise with hospital-based controls
• We want to select controls that are likely to reflect the exposure rate in the population
• We should exclude hospital controls whose condition is associated with the exposure (e.g. aspirin and MI: controls with chronic pain or peptic ulcer)

– One solution: include controls with a variety of diagnoses not associated with the exposure

• Sources of controls
– Hospital based
– Dead controls
– Controls with similar diseases
– Neighborhood controls
– Best friend control/sibling control
– Population based

• Hospital based
– Referral pattern is similar to that of the cases (drawn from the same study base)
– Similar quality of information
– Convenience
– May not be representative of the population

• Dead controls
– Used in studies where the case is death from a particular cause
– Information is obtained from ‘proxy’ informants
– But dead controls differ from living controls
• Controls with similar diseases
– Cancer (of a different type) controls for cancer cases
– Minimize recall bias and interviewer bias; allow the specificity of the exposure to be examined

• Neighborhood controls
– Best friend control/sibling control
• Inexpensive, easy and quick
• Can be matched on a number of variables associated with neighborhood/friendship
• May introduce selection bias related to the exposure (‘smoking’ cases nominate ‘smoking’ friends) and overmatching

– Population based
• Truly representative sample
• Drawn from tax lists, voting lists, telephone directories

Source | Advantages | Disadvantages
Hospital based | Easily identified; available for interview; more willing to cooperate; tend to give complete and accurate information (less recall bias) | Not typical of the general population; possess more risk factors for disease; some diseases may share risk factors with the disease under study; Berksonian bias
Population based | Most representative of the general population; generally healthy | Costly in time, money and energy; opportunity for exposure may not be the same as that of cases (location, occupation)
Neighbourhood controls/telephone exchange random digit dialing | Controls and cases similar in residence; easier than sampling the population | Non-cooperation; not representative of the general population
Best friend control/sibling control | Accessible, cooperative; similar to cases in most aspects | Overmatching

Selection process - 1

[Figure: cases and controls drawn from a reference population within the total population]

Selection process - 2
• Cases
– In practice, we use all eligible cases within a defined time period
• From a disease registry or hospital
• We are implicitly sampling from a subset of the total population of cases

• Controls
– Sampling is most pertinent here because, in rare diseases, the number of potential controls greatly exceeds the number of cases

Selection of cases - 1
• Representativeness
– Ideally, cases should be a random sample of all cases of interest in the source population (e.g. from vital data, registry data)
– But commonly they are a selection of available cases from a medical care facility (e.g. from hospitals, clinics)

• Method of selection
– Selection may be from incident or prevalent cases
– Incident cases are derived from ongoing ascertainment of cases over time
– Prevalent cases are derived from a cross-sectional survey

• Incident cases are preferable
• These should be all newly diagnosed cases over a given period of time in a defined population (however, this excludes patients who died before diagnosis)
• Prevalent cases do not include patients with a short course of disease (patients who recovered early and those who died will not be included)

• This can be partly overcome by including deceased cases as well as living ones

• Validity is more important than generalizability, i.e. establishing an etiologic relationship matters more than generalizing the results to the population
• E.g.
– In a study on breast cancer, we can include all cases, or only premenopausal women with lobular cancer
• If we take the latter group as cases, we can elicit the etiology better

– Studies done in nurses for OCP use

Selection of controls - 1
• The four principles of Wacholder
1. The study base
2. De-confounding
3. Comparable accuracy
4. Efficiency

• Should the controls be similar to the cases
in all respects other than having the
disease? i.e. comparable

• Should the controls be representative of
all non-diseased people in the population
from which the cases are selected? i.e.
representative

• Representativeness
– Should be representative of the general population in terms of probability of exposure to the risk factor

• Comparability
– Should also have had the same opportunity to be exposed as the cases

• This does not mean that cases and controls are equally exposed, only that they have had the same opportunity for exposure

• Usually, cases are not a random sample of all cases in the population. So, the controls must be selected in the same way (and with the same biases) as the cases.
• It follows that a pool of potential controls must be defined. This is the universe of people from whom controls may be selected (the study base).

• The study base is composed of a population
at risk of exposure over a period
• Cases emerge within a study base. Controls
should also emerge from the same study
base, except that they are not cases.
• Eg. If cases are selected exclusively from
hospitalized patients, controls must also be
selected from hospitalized patients.

• Comparability is more important than
representativeness in the selection of
controls

• The control should resemble the case in
all respects except for the presence of
disease


• Number of controls
– Large study: equal numbers of cases and controls
– Small study: multiple controls per case

• Use of multiple controls
– Multiple controls of the same type
– Multiple controls of different types
• Hospital and neighborhood controls
• E.g. cases: children with brain tumors; controls: children with other cancers, and normal children
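Why multiple controls help in a small study: a commonly quoted approximation puts the statistical efficiency of a design with r controls per case at r/(r + 1), relative to having unlimited controls. The sketch below simply tabulates that approximation; it is an approximation, not an exact property of every study, and the function name is illustrative.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of a 1:r design relative to unlimited controls: r / (r + 1)."""
    r = controls_per_case
    if r < 1:
        raise ValueError("need at least one control per case")
    return r / (r + 1)


for r in range(1, 7):
    print(f"1:{r} -> {relative_efficiency(r):.2f}")
# 1:1 -> 0.50, 1:2 -> 0.67, 1:3 -> 0.75, 1:4 -> 0.80, 1:5 -> 0.83, 1:6 -> 0.86
```

The gains flatten quickly beyond about four controls per case, which is why extra controls are mainly worthwhile when cases are scarce.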

[Figure: children with brain tumors (cases) compared on exposure to radiation with two control groups, children with other cancers and children without cancer; panel labels "Radiation causes cancers" and "Radiation causes brain cancers only"]

Multiple controls of different types are valuable for exploring alternative hypotheses and for taking possible recall bias into account.

Sampling for cases/controls - 1
• Frame: a list of all potentially eligible cases and controls in the target population (a subset of the general population at risk of both exposure and disease development)
• The frame should not be biased in any manner; otherwise the sample will also be biased, even if drawn at random
• Types of sampling
• SRS (simple random sampling)
• Systematic
• Stratified
• Matched

• The objective is to avoid bias in selection: each case or control has an equal chance of being selected
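The first three sampling types listed above can be sketched in a few lines. This is a minimal illustration under assumptions: the sampling frame is a hypothetical list of (id, sex) records, and the helper names are illustrative rather than taken from any particular package.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """SRS: every member of the frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th member (k = sampling interval)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Draw a fixed number at random from each stratum (e.g. sex, age group)."""
    strata = defaultdict(list)
    for person in frame:
        strata[stratum_of(person)].append(person)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


rng = random.Random(42)
frame = [(i, "M" if i % 2 else "F") for i in range(1000)]  # hypothetical frame
srs = simple_random_sample(frame, 50, rng)
sys_s = systematic_sample(frame, 50, rng)
strat = stratified_sample(frame, lambda p: p[1], 25, rng)
print(len(srs), len(sys_s), len(strat))  # 50 50 50
```

Whatever the method, a biased frame still yields a biased sample; randomization only protects against bias in the drawing step.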

• If we are using all incident cases occurring in a defined area and time period, then controls selected at random from the general population are the best choice (a sound basis for calculating RR, AR and the etiologic fraction)
• If cases are selected from hospital(s), population controls are not necessarily the only good choice; a well-chosen series of hospital controls can also be valid
• However, hospital controls, though favoured for cost and practicality, often leave room for doubt about the validity of the comparison

• Random digit dialing
– Prerequisite: extensive telephone coverage
– Used either to screen for potential controls or for telephone interviews

• Method
– All area codes and prefix numbers are obtained
– All possible two-digit numbers are added to them
– These first 8 digits form the primary sampling unit (PSU)
– A PSU is selected at random; if a response is obtained (the call reaches a household), the PSU is retained
– Then the last two digits are randomly selected, and this is continued until the required sample is reached
– The number of PSUs and total households depends on the design
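The two-stage procedure above (it resembles what is often called the Mitofsky-Waksberg design) can be sketched as follows. Assumptions: numbers are 10 digits (3-digit area code, 3-digit prefix, 4 more digits), and `fake_is_household` is a hypothetical stand-in for "the call reached a residence", which in a real study is learned only by dialing.

```python
import random


def build_psus(area_prefixes):
    """Append every two-digit block 00-99 to each 6-digit area code + prefix,
    giving 8-digit primary sampling units (PSUs)."""
    return [ap + f"{d:02d}" for ap in area_prefixes for d in range(100)]


def random_digit_dial(area_prefixes, is_household, n_psus, per_psu, rng):
    """Keep a PSU only if a first random number in it reaches a household,
    then dial further random last-two-digit numbers within that PSU."""
    psus = build_psus(area_prefixes)
    rng.shuffle(psus)                      # visit PSUs in random order
    sample, retained = [], 0
    for psu in psus:
        if retained == n_psus:
            break
        first = psu + f"{rng.randrange(100):02d}"
        if not is_household(first):
            continue                       # first call failed: PSU rejected
        retained += 1
        households = [first]
        suffixes = [f"{d:02d}" for d in range(100)]
        rng.shuffle(suffixes)              # random order of last two digits
        for s in suffixes:
            if len(households) == per_psu:
                break
            number = psu + s
            if number != first and is_household(number):
                households.append(number)
        sample.extend(households)
    return sample


def fake_is_household(number):
    """Hypothetical rule standing in for a successful residential call."""
    return int(number[-2:]) % 3 == 0


rng = random.Random(7)
sample = random_digit_dial(["212555", "415555"], fake_is_household,
                           n_psus=5, per_psu=3, rng=rng)
print(len(sample))  # 15
```

Retaining a PSU only after a successful first call concentrates later dialing in blocks that contain households, which is the point of the two-stage design.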

• Examples
– Artificial sweeteners and bladder cancer
• Cases: aged 21-84 years, newly diagnosed bladder cancer in 10 designated counties in metropolitan areas
• Controls: age- and sex-stratified random sample of the general population in the ten counties, frequency matched at a 2:1 ratio

– Oral contraception and congenital malformations
• Cases: malformations among all newborns and stillborns delivered at five major hospitals between 1974 and 1976
• Controls: unaffected newborns in the same five hospitals; sampling days were rotated to represent all 7 days of the week

Matching - 1
• Matching is the process of selecting controls so that they are similar to cases in certain characteristics, such as age, sex, race, socioeconomic status and occupation
• What is post-matching?
– Pairing controls to cases from unmatched data during analysis

• We often want a constant case-control ratio, but sometimes matching is incomplete, so we end up with a variable ratio

Matching - 2
• Objective: to eliminate biased comparison between cases and controls
• Two-step process
1. The matched design
2. The matched analysis

• One immediate effect of matching is a balance between the numbers of cases and controls
• Sometimes we deliberately match on a factor in the causal path to confirm or refute its role (e.g. smoking and MI, matched on cholesterol)

Matching - 3
• What variables to match?
– Factors that are independent risk factors for the disease
– Factors associated with the exposure, but non-causally
– Factors that may not directly be risk factors, but are associated with other causal factors (excluding the study exposure)

• Similar to something?

Matching - 4
• Situations to match or not?
[Figure: two panels, "Causal" and "Non-causal"]

Matching - 5
• Examples (E = exposure, F = factor considered for matching, D = disease)
• 1. E = alcohol; F = smoking; D = lung CA
– Implication if not matched?
• 2. E = OCP; F = smoking; D = MI
– Implication if not matched?
• 3. E = blood group O; F = age, sex; D = thrombosis
– Implication if matched?
• 4. E = OCP; F = prescribing physician; D = MI
– Implication if not matched, or if matched?

Matching - 6
• In summary, the decision to match or not depends on the residual association of the factor with disease and exposure after controlling for other variables
• Overmatching
– Reduces validity or statistical efficiency
– Two general meanings
• Unmatched analysis in matched studies
• Matching on unnecessary variables

– If one matches on a factor that is associated with the exposure but not the disease
• A paired analysis may correctly estimate the odds ratio, but the variance will be larger than in an unmatched study of the same size (overmatching increases the frequency of exposure-concordant pairs, which are discarded in a paired analysis)
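The point about concordant pairs can be made concrete. In a 1:1 matched analysis the odds ratio is the ratio of the two kinds of discordant pairs, so concordant pairs contribute nothing. The sketch below uses hypothetical pair data and the usual large-sample standard error of log(OR) for matched pairs; the function name is illustrative.

```python
from math import exp, log, sqrt


def matched_pair_or(pairs):
    """1:1 matched pairs as (case_exposed, control_exposed) booleans.
    Concordant pairs (both exposed or both unexposed) drop out;
    the matched OR is the ratio of the two discordant counts."""
    b = sum(1 for case, ctrl in pairs if case and not ctrl)  # only case exposed
    c = sum(1 for case, ctrl in pairs if ctrl and not case)  # only control exposed
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs of both kinds")
    or_ = b / c
    se = sqrt(1 / b + 1 / c)                  # approximate SE of log(OR)
    ci = (exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se))
    return or_, ci, b + c


# Hypothetical data: 100 pairs, of which only 30 are discordant.
pairs = ([(True, True)] * 30 + [(False, False)] * 40
         + [(True, False)] * 20 + [(False, True)] * 10)
or_, ci, informative = matched_pair_or(pairs)
print(f"OR = {or_:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, "
      f"informative pairs = {informative} of {len(pairs)}")
```

If matching on a factor tied to the exposure turns more pairs concordant, fewer discordant pairs remain and the confidence interval widens, which is the efficiency loss described above.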

Matching - 7
• If one matches on a factor that is causally or non-causally associated with the exposure but not the disease, and the matching is then ignored in the analysis, the OR will be biased towards unity
• If one matches on a factor that is associated with the disease but not the exposure, the OR will be correctly estimated whether or not the pairing is retained
– A paired analysis will be less efficient than an unpaired one

• Matching on highly correlated variables is also unnecessary
• Finally, matching should be done on factors that have the strongest relationship to the disease and are least correlated

Matching - 8
• Alternatives to matching
– At the sampling phase
• Stratified sampling
• Frequency matching

– At the analysis phase
• Post-stratification
• Regression analysis

• Stratified sampling
• A predetermined number of cases and controls in each subgroup created by the cross-classification
• E.g. age (4 groups), sex (2), race (4 groups)
– 4 × 2 × 4 = 32 subgroups in total

Matching - 9
• Frequency matching
• Controls are taken from the corresponding subgroups in proportion to the number of cases
– E.g. if 30% of cases are Hindu males aged 60-65 years, then 30% of the controls are taken from the same subgroup
– More practical than stratified sampling, but it requires one to continually update the distribution of accumulating cases to maintain a fixed case-control ratio
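Frequency matching amounts to allocating the control quota across strata in proportion to the cases seen so far. A minimal sketch, assuming each case is labeled with a hypothetical (sex, religion, age group) stratum and that controls are wanted at a 2:1 ratio; largest-remainder rounding is one reasonable way (an assumption here) to keep the quotas summing to the target.

```python
from collections import Counter
from math import floor


def frequency_match_quotas(case_strata, n_controls):
    """Allocate controls to strata in proportion to the cases' distribution.
    Largest-remainder rounding keeps the quotas summing to n_controls."""
    counts = Counter(case_strata)
    total = sum(counts.values())
    exact = {s: n_controls * k / total for s, k in counts.items()}
    quotas = {s: floor(x) for s, x in exact.items()}
    leftover = n_controls - sum(quotas.values())
    # Give any remaining controls to the strata with the largest fractional parts.
    for s in sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)[:leftover]:
        quotas[s] += 1
    return quotas


# Hypothetical accumulating cases, each labeled by (sex, religion, age group).
cases = ([("M", "Hindu", "60-65")] * 30
         + [("F", "Hindu", "60-65")] * 50
         + [("M", "Hindu", "45-59")] * 20)
quotas = frequency_match_quotas(cases, n_controls=2 * len(cases))
print(quotas)
```

Because cases keep accumulating, the quotas would be recomputed periodically, which is the bookkeeping burden the slide mentions.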

Post-stratification
• Stratify into subgroups at the analysis stage and analyze within them
• Very flexible, in that variables need not be pre-specified
• Limitation: the number of variables that can be stratified on is restricted by small numbers in each stratum

• Regression analysis
– Most useful when the number of variables/subgroups increases
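A common way to pool results after post-stratification is the Mantel-Haenszel odds ratio (a standard technique the slide does not name): the sum of a·d/n over strata divided by the sum of b·c/n. The hypothetical data below are built so the stratifying factor is linked to both exposure and disease: each stratum has an OR of 1, yet the crude OR is inflated.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR across strata of (a, b, c, d): sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical strata of a confounding factor.
strata = [
    (80, 20, 40, 10),   # stratum 1: OR = 1
    (10, 40, 20, 80),   # stratum 2: OR = 1
]
crude = odds_ratio(*(sum(col) for col in zip(*strata)))
print(f"crude OR = {crude:.2f}, "
      f"stratum ORs = {[odds_ratio(*s) for s in strata]}, "
      f"MH OR = {mantel_haenszel_or(strata):.2f}")
# crude OR = 2.25, stratum ORs = [1.0, 1.0], MH OR = 1.00
```

With many variables the strata thin out quickly, which is where regression analysis takes over.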

Matching - 10
• Effectiveness of matching
– Removal of bias
– Reduction of variance
• A matched design gives only a modest increase in efficiency
• The greatest improvement is when there is a strong association between the disease and the confounder
• Also efficient when only a small proportion of the target population is exposed to the study factor

• The added cost and complexity of matching should be weighed against any expected gains in precision

Matching - 11
• Advantages
– Cases and controls will be comparable with respect to the matched variables
– Provides the best means to investigate a very specific hypothesis

• Disadvantages
– One can no longer study the matched variable in relation to the risk of disease
– Increase in cost, time and labor
– A certain fraction of cases is discarded because no matching control can be found

Matching - 12
• Summary
– Unless one has very good reason to match, one
is better off avoiding it
– Frequency matching within rather broad
categories of the matching variables will
suffice for most studies


Sources of bias - 1
• Bias: systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of the risk measure

1. Ascertainment and selection bias
a) Surveillance
b) Diagnosis
c) Referral
d) Selection
e) Non-response
f) Length of stay
g) Survival
h) Admission diagnoses

2. Bias in estimation of exposure
a) Recall
b) Interviewer
c) Prevarication
d) Improper analysis

3. Misclassification
4. Other sources

Sources of bias - 2
- Not peculiar to case control studies; can occur in cohort studies also

a) Differential surveillance
– In asymptomatic/mild diseases, cases are more likely to be detected in persons who are examined closely
– E.g. OCP and endometrial cancer/phlebitis
• Women taking OCPs were evaluated more thoroughly
• Based on preliminary reports linking OCP use and phlebitis, clinicians started looking for phlebitis in such exposed patients

– Exposed cases would have a greater likelihood of being diagnosed than unexposed cases
– This bias can be checked by doing a stratified analysis in subgroups having equal surveillance (based on some index of medical care), or by restricting the study to the period before publication of such findings

Sources of bias - 3

b) Diagnosis
• In conditions like cervical dysplasia, knowledge of exposure may alter the assessment
• This is most likely to occur in cases of uncertain diagnosis

c) Differential referral
• $OR' = b \times OR$, where $b = \dfrac{s_1 s_4}{s_2 s_3}$
– $s_1$, $s_2$, $s_3$, $s_4$ are the probabilities of selecting exposed cases, unexposed cases, exposed controls and unexposed controls, respectively
– A biased selection of cases will be compensated by an equally biased selection of controls ($s_1/s_2 = s_3/s_4$ gives $b = 1$)
– There is also no bias when the probability of selecting an exposed case equals that of an unexposed case, and likewise for controls ($s_1 = s_2$ and $s_3 = s_4$)
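The referral-bias formula can be checked numerically: apply selection probabilities s1 to s4 to a hypothetical source-population table and compare the OR of the selected subjects with b × OR. The counts and probabilities below are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def selection_bias_factor(s1, s2, s3, s4):
    """b = (s1*s4)/(s2*s3); s1..s4 are the selection probabilities of exposed cases,
    unexposed cases, exposed controls and unexposed controls."""
    return (s1 * s4) / (s2 * s3)


# Hypothetical source population with a true OR of 2.0.
a, b, c, d = 100, 200, 100, 400
s1, s2, s3, s4 = 0.9, 0.6, 0.5, 0.5      # exposed cases are referred more often
observed = odds_ratio(a * s1, b * s3, c * s2, d * s4)
bias = selection_bias_factor(s1, s2, s3, s4)
print(f"true OR = {odds_ratio(a, b, c, d):.2f}, b = {bias:.2f}, observed OR = {observed:.2f}")
# true OR = 2.00, b = 1.50, observed OR = 3.00

# Compensating selection: s1/s2 == s3/s4, so b is 1 and the OR is undistorted.
print(f"b = {selection_bias_factor(0.9, 0.6, 0.6, 0.4):.2f}")  # b = 1.00
```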

Sources of bias - 4

c) Differential referral (cont.)
• E.g. a study of alcohol and kidney failure, where income is associated with alcohol intake
• A hospital admits only wealthy patients, so cases of kidney failure in this hospital will be more exposed to alcohol than patients in the general population
• But if patients with other diseases in the same hospital have similar income characteristics and are taken as controls, bias won't occur
• If controls are taken from the general population, then we have to match or stratify on income to eliminate it as a source of selection bias

Sources of bias - 5

d) Selection
• E.g. an interviewer ‘keying’ on cases who are exposed (one particular nurse was searching out all the cases of ectopic pregnancy with IUD usage)
• To avoid this, we must specify precisely and in advance the methods by which cases and controls are selected, carefully train staff, and apply quality control

e) Non-response
• A worst-case analysis, taking all non-responding cases as unexposed and all non-responding controls as exposed, will show whether non-response is likely to bias the estimates
• If the exposure rates were equal between responders and non-responders, there will be no bias
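The worst-case analysis for non-response can be sketched directly: add the non-responding cases to the unexposed cases and the non-responding controls to the exposed controls, then recompute the OR. All counts below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def worst_case_or(a, b, c, d, nonresp_cases, nonresp_controls):
    """Worst case for a positive association: every non-responding case is
    counted as unexposed and every non-responding control as exposed."""
    return odds_ratio(a, b + nonresp_controls, c + nonresp_cases, d)


# Hypothetical responders, plus 10 non-responding cases and 20 non-responding controls.
observed = odds_ratio(60, 40, 40, 60)
worst = worst_case_or(60, 40, 40, 60, nonresp_cases=10, nonresp_controls=20)
print(f"observed OR = {observed:.2f}, worst-case OR = {worst:.2f}")
# observed OR = 2.25, worst-case OR = 1.20
```

If the worst-case OR stays above 1, non-response alone cannot explain away the association; if it crosses 1, it could.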

Sources of bias - 6

f) Length of stay
• In a hospital study, incident cases should be selected rather than prevalent cases; otherwise,
– Patients who stay longer will have a higher probability of being selected
– Cases of short duration would be under-represented

• We check this by stratifying the analysis on the duration between admission and selection

g) Survival
• Arises when a disease accompanied by mortality is studied only in survivors
• E.g. a study in survivors of MI may reveal factors that are associated with surviving an MI rather than sustaining one
• Unless one can justify that exposure is not related to duration/survival, one should take only incident cases
• This bias can be checked by stratifying on date of onset

Sources of bias - 7

h) Admission diagnoses
• E.g. in a hospital-based study of the association between smoking and MI, if controls are lung cancer patients, the effect will be underestimated
• To avoid this bias, we must select controls with a variety of diseases believed to be unrelated to the study exposure (neither positively nor negatively)

2. Bias in the estimation of exposure
a) Recall
• E.g. a mother with a malformed baby will try with more care and intensity to recall a pelvic X-ray than a woman with a normal baby

Sources of bias - 8
a) Recall (cont.)
• Sometimes, the disease itself affects memory (dementia)
• This bias can be reduced by using controls with another disease, who will also keep thinking about reasons for their disease
• Independent verification of the history of exposure can be sought

b) Interviewer
• An interviewer may probe cases more intensely than controls for histories of exposure if they know the hypothesis
• Reduced by training staff, keeping staff ignorant of the hypothesis (ideal but unobtainable), and keeping interview time constant

c) Prevarication
• Subjects may have ulterior motives for deliberately overestimating or underestimating exposure
• E.g. a worker who may receive disability pay may exaggerate his exposure; if it means loss of his job, he may minimize it
• May be overcome by using several independent raters

Sources of bias - 9
d) Improper analysis
• Unmatched analysis for a matched study

3. Misclassification
– The disease/exposure status classification may be erroneous
– Some controls may actually have the study disease, but this is very improbable with rare diseases
– The most likely source of misclassification is the determination of exposure
– Any measures to reduce misclassification should be addressed at the design stage; a pilot study will reveal many errors
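Why misclassification of exposure matters: when the exposure measure is equally imperfect in cases and controls (non-differential misclassification, an assumption of this sketch), the observed OR is pulled towards 1. The sketch applies an assumed sensitivity and specificity to hypothetical true counts.

```python
def classify(exposed, unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts for one group."""
    called_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    called_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return called_exposed, called_unexposed


def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


# Hypothetical true counts: cases 100 exposed / 100 unexposed, controls 50 / 150.
true_or = odds_ratio(100, 50, 100, 150)   # 3.0
se, sp = 0.8, 0.9                          # same in cases and controls (non-differential)
case_e, case_u = classify(100, 100, se, sp)
ctrl_e, ctrl_u = classify(50, 150, se, sp)
observed_or = odds_ratio(case_e, ctrl_e, case_u, ctrl_u)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Differential errors, such as recall bias, do not follow this rule and can push the estimate in either direction.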

4. Other sources of error
– Insufficient sample size, errors of interpretation, not accounting
for effect of extraneous variables

Sources of bias - 10
4. Other sources of error
– Cases and controls should be similar with respect to factors that might have affected both the development of disease and the opportunity for past exposure
– E.g. medical conditions like HTN and DM preclude the use of OCPs; thus OCP users would inherently be at lower risk
– An agent found to be associated with the study disease may have been prescribed because of an early manifestation of the disease
– E.g. estrogens prescribed for irregular bleeding that was the first symptom of undetected endometrial cancer; a later diagnosis of the cancer would then show an apparent association with estrogen use

Sources of bias - 11
Summary
– Before starting a study, one should list the
likely sources of bias and plan the
investigation and analyses so as to
prevent/minimize them


Specific limitations of case control studies
• Not useful for studying weak associations (OR < 1.5)
• Participation rates are frequently low and differ between cases and controls
• Differential recall bias

Applications of Case control study
1. Vaccine effectiveness
2. Evaluation of treatment and program
efficacy
3. Evaluation of screening programs
4. Outbreak investigations
5. Demography
6. Genetic epidemiology
7. Occupational epidemiology

Pertussis vaccination in the UK - 1
Year | Event(s)
1906 | Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media
1912-14 | Pertussis vaccine used by many researchers
Next few years | Many versions of the vaccine developed
1942 | Several local authorities in the UK start vaccination
1947-48 | First published reports of irreversible brain damage after whole-cell pertussis vaccine
1957 | 85,000 cases of pertussis reported; vaccination scaled up to the national level
1975 | Cases came down to 8,900; pertussis incidence peaks every 4 years, and the peaks became smaller and smaller, the smallest being in 1974-75

The next peak, in 1978, should have been the smallest, but was it?

Year | Event(s)
1974-75 | Adverse media publicity about the side effects of pertussis vaccine; parents and doctors hesitated to give the vaccine
1976-79 | National Childhood Encephalopathy Study (NCES) commissioned by the Department of Health and Social Security
1974 | Vaccine acceptance rate fell from 78% (in 1971) to 37%
1977-79 | An epidemic of pertussis in Great Britain: > 100,000 cases and 36 deaths
1979 | Vaccine Damage Payments Act passed in Great Britain, providing a mechanism for government compensation to those with vaccine-associated injuries

• Findings of the NCES study

• Attributable risk
– Serious neurological disorders: 1 in 110,000 injections
– Persistent neurological sequelae: 1 in 310,000 injections

Year | Event(s)
1982 | British Child Health and Education Study: long-term neurologic problems are not found to be related to pertussis immunization
1983 | Communicable Disease Surveillance Centre study (North West Thames study) follows a large group of children after pertussis vaccination and finds no convincing evidence relating DPT vaccine to neurologic damage
1988 | Loveday judgment: Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage; considered a "test case", meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded

1990: Happy ending?

Critical Appraisal of NCES - 1
• Research question
– Intended and actual

• Study design
– Case control – reasons for choosing
– Cohort – reasons for not choosing

• Case selection



• Only hospital-admitted cases were selected as cases: any comments?
• Control selection
– Comments?

• Exposure measurement


• Results

• There was no noticeable clustering in any area

• Results
• 3.5% of cases and 1.7% of controls had been immunized
• OR of 2.4, p value < 0.001
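The odds ratio is the odds of immunization among cases divided by the odds among controls. A quick check from the two rounded percentages on this slide gives a crude value of about 2.1. The reported 2.4 need not conflict with this: the percentages are rounded, and (an assumption here, not stated on the slide) the published estimate may come from a matched analysis of the case and control sets. The arithmetic, as a sketch:

```python
def odds_from_proportion(p):
    """Convert a proportion exposed into odds of exposure."""
    return p / (1 - p)


p_cases, p_controls = 0.035, 0.017   # proportions immunized, from the results slide
crude_or = odds_from_proportion(p_cases) / odds_from_proportion(p_controls)
print(f"crude OR = {crude_or:.2f}")  # crude OR = 2.10
```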

• Results
• There was no significant association between serious neurological illness and diphtheria and tetanus vaccine
• Confounders
• History of fits
– Is a known contraindication to immunization; including such cases will underestimate the OR
– A separate analysis limited to normal children with no past history of fits gave an RR of 3.2

• Social class
– Could not be controlled for
– But in an analysis of those pairs in which the affected child and the control were of the same social class, there were no differences

• Causation vs. association
A) Clinically distinctive
B) Restricted to immunized children
C) Closely related in time to immunization
D) Biologically plausible
E) Without alternative explanation

• Attributable risk
– Can this be calculated in a case control study?
– The study covered an entire national population (in theory representing the total incidence of serious neurological illness, with an assumption about immunization coverage)

– Serious neurological disorders: 1 in 110,000 injections
– Persistent neurological sequelae: 1 in 310,000 injections
– Is this appropriate?
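How a "1 in N injections" figure can come out of a case control study: if every case in the population is ascertained, the excess cases among the exposed are exposed cases × (OR − 1)/OR (using the OR as an estimate of RR for a rare outcome), and dividing the number of injections by that excess gives N. The inputs below are hypothetical, not NCES data; the injection count stands for the coverage assumption mentioned on the slide.

```python
def attributable_cases(exposed_cases, odds_ratio):
    """Excess cases among the exposed, using OR as an estimate of RR:
    exposed_cases * (OR - 1) / OR (attributable fraction among the exposed)."""
    return exposed_cases * (odds_ratio - 1) / odds_ratio


def one_in_n(attributable, exposures):
    """Express excess risk as '1 in N exposures' (e.g. per injection)."""
    return exposures / attributable


# Hypothetical inputs: complete case ascertainment, and the number of injections
# given during the study period estimated from immunization coverage data.
excess = attributable_cases(exposed_cases=30, odds_ratio=3.0)
print(f"attributable cases = {excess:.1f}; "
      f"risk = 1 in {one_in_n(excess, 2_000_000):,.0f} injections")
# attributable cases = 20.0; risk = 1 in 100,000 injections
```

Errors in either assumption (incomplete ascertainment, or a wrong injection count) feed straight into N, which is one reason to ask whether such a figure is appropriate.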
