2. CONTENTS
• Introduction
• Design of case control study
• Basic steps
  o Selection of cases and controls
  o Matching
  o Measurement of exposure
  o Analysis of exposure
• Potential biases in case control studies
• Confounding and bias
4. INTRODUCTION TO ANALYTICAL STUDY
• Subject of interest is the individual within the population
[Figure: Analytical epidemiology, comprising the case control study (retrospective) and the cohort study (prospective)]
5. CASE CONTROL STUDY
[Schematic diagram on design of case control study: from the population, cases (people with disease) and controls (people without disease) are selected; the direction of inquiry runs backwards in time to classify each group as exposed or not exposed]
6. CASE CONTROL STUDY
• Also called the "retrospective study"
• 3 distinct features:
  o Both exposure and outcome (disease) have occurred before the start of the study
  o The study proceeds backwards from effect to cause
  o It uses a control or comparison group to support or refute an inference
• Hence case control studies are basically comparison studies, in which cases and controls must be comparable with respect to known "confounding factors" like age, sex, occupation, social status, etc.
8. BASIC STEPS
1. Selection of cases and controls
2. Matching
3. Measurement of exposure
4. Analysis and interpretation
9. 1A. SELECTION OF CASES
• Identification of cases is relatively easy, but selection of suitable controls may pose difficulties
• A group of individuals with the disease are the cases
• Definition of case: what constitutes a case must be defined before the study begins. This involves two specifications:
  o Diagnostic criteria
    • The diagnostic criteria of the disease, and its stage if any (e.g., stage 1 cancer), must be specified before the study is undertaken
    • Once the criteria are established, they should not be changed or altered until the study is over
10. Eligibility criteria: incident cases are considered more eligible than prevalent cases
    • The reason is that any risk factors identified in a study using prevalent cases may be related more to survival with the disease than to development of the disease (incidence)
• Sources of cases: cases may be drawn from hospitals or from the general population
  o Hospitals: often convenient
    • Only newly diagnosed (incident) cases within a specified period of time are recruited, rather than old cases or cases in advanced stages (prevalent)
    • Cases may be drawn from a single hospital or a network of hospitals, admitted during a specific period of time
    • Either the entire case series or a random sample from it is used
  o General population: in population-based studies, all cases of the study disease within a defined geographical area during a specified period of time are selected
    • Cases are ascertained through a survey, a disease registry or a hospital network
    • Either the entire case series or a random sample from it is used
    • Cases must fairly represent all the cases in the community
12. 1B. SELECTION OF CONTROLS
• Controls must be free from the disease under study
• They must be as similar as possible to the cases, except for the absence of disease
• Difficulties in selecting controls may arise when the disease under investigation occurs in subclinical forms that are difficult to diagnose
• Sources of controls:
  o Hospitalized (hospital) controls
  o Non-hospitalized controls
13. HOSPITAL CONTROLS
• Controls may be selected from the same hospital as the cases, but should have a different illness from the disease under study
• Relatively more economical
• Disadvantages
  o A source of "selection bias"
  o Controls may have diseases that are also influenced by the factor under study
  o E.g.: if the relationship between tobacco and oral cancer is being studied and patients with bladder cancer (which is also tobacco related) are chosen as controls, the relationship may not be demonstrated
14. NON HOSPITALIZED CONTROLS
• Relatives
  o Controls drawn from siblings and spouses
  o Siblings are unsuitable as controls in studies of genetic conditions
• Neighborhood controls
  o Drawn from persons living in the same locality as the cases
  o Or from the same factory, or children attending the same schools
• General population
  o Obtained from defined geographic areas by taking a random sample of individuals free of the disease
15. HOW MANY CONTROLS ARE NEEDED?
• If many cases are available, a large study is contemplated and the cost of collecting a case and a control is about equal, one tends to use one control for each case
• If the study group is small (e.g., under 50), 2, 3 or even 4 controls may be selected for each study subject
17. 2. MATCHING
• Ensures comparability between cases and controls
• DEFINITION: the process by which we select controls in such a way that they are similar to cases with regard to certain pertinent variables (age, sex, occupation, etc.) which are known to influence the outcome of the disease and which, if not adequately matched for comparability, could distort or confound the results
• Protects against an unexpected strong association between the matched factor and the disease
• TWO TYPES:
  o Group matching / frequency matching
  o Individual matching / matched pairs
18. GROUP MATCHING/FREQUENCY MATCHING
• Selecting the controls in such a manner that the proportion of controls with a certain characteristic (age, occupation, social status, etc.) is identical to the proportion of cases with the same characteristic
• For example, if 25% of the cases are married, the controls will be selected so that 25% of that group is also married
• The proportions of the relevant characteristics are first calculated in the group of cases; a control group in which the same characteristics occur in the same proportions is then selected
• The frequency distribution of the matched variable must be similar in the study and control groups
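The frequency matching procedure above can be sketched as stratified random sampling: count the cases in each category of the matching variable, then draw the same number of controls (times a chosen ratio) from each category of the control pool. This is a minimal sketch under stated assumptions: subjects are represented as dictionaries, and the helper `frequency_match`, its parameters and the data are hypothetical, invented only to reproduce the 25% married example.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, key, ratio=1, seed=0):
    """Hypothetical helper: draw controls so that each level of `key`
    has the same share among controls as among cases.

    cases, control_pool: lists of dicts describing subjects (assumed format)
    key: name of the matching variable, e.g. "married"
    ratio: number of controls wanted per case
    """
    rng = random.Random(seed)
    case_counts = Counter(subject[key] for subject in cases)
    controls = []
    for level, n_cases in case_counts.items():
        eligible = [s for s in control_pool if s[key] == level]
        needed = n_cases * ratio
        if len(eligible) < needed:
            raise ValueError(f"not enough controls with {key}={level!r}")
        # Random sample within this level of the matching variable
        controls.extend(rng.sample(eligible, needed))
    return controls


# Invented data: 2 of the 8 cases (25%) are married
cases = [{"id": i, "married": i < 2} for i in range(8)]
pool = [{"id": 100 + i, "married": i % 3 == 0} for i in range(60)]

controls = frequency_match(cases, pool, key="married")
share = sum(c["married"] for c in controls) / len(controls)
print(f"married among controls: {share:.0%}")  # married among controls: 25%
```

Sampling within each level, rather than sampling the whole pool and hoping the proportions come out right, is what guarantees identical frequency distributions of the matched variable in the two groups.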
19. INDIVIDUAL MATCHING/MATCHED PAIRS
• For each case selected for the study, a control is selected who is similar to the case in terms of the specific variable or variables of concern
• E.g.: a 50-year-old mason with a particular disease will have a 50-year-old mason without the disease as a control
• One can obtain pairs of patients and controls of the same sex, age, duration and severity of illness, etc.
• Often used in case-control studies that use hospital controls
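Individual matching can be sketched as a greedy search: for each case, keep the unused controls that agree exactly on the chosen variables and fall within an age window, then take the closest in age. This is only one simple way to form matched pairs, assumed here for illustration. The helper `pair_match`, the two-year age caliper and the subject records are all hypothetical.

```python
def pair_match(cases, control_pool, exact=("sex", "occupation"), age_caliper=2):
    """Hypothetical greedy 1:1 matching helper.

    A control qualifies if it agrees exactly on every variable in `exact`
    and its age is within `age_caliper` years of the case; the closest-aged
    unused control is chosen. Cases with no qualifying control get None.
    """
    used = set()
    pairs = []
    for case in cases:
        candidates = [
            c for c in control_pool
            if c["id"] not in used
            and all(c[v] == case[v] for v in exact)
            and abs(c["age"] - case["age"]) <= age_caliper
        ]
        if not candidates:
            pairs.append((case["id"], None))  # no suitable control found
            continue
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        used.add(best["id"])  # each control is used only once
        pairs.append((case["id"], best["id"]))
    return pairs


cases = [
    {"id": "case1", "age": 50, "sex": "M", "occupation": "mason"},
    {"id": "case2", "age": 34, "sex": "F", "occupation": "teacher"},
]
pool = [
    {"id": "ctl1", "age": 51, "sex": "M", "occupation": "mason"},
    {"id": "ctl2", "age": 50, "sex": "M", "occupation": "clerk"},
    {"id": "ctl3", "age": 45, "sex": "F", "occupation": "teacher"},
]
pairs = pair_match(cases, pool)
print(pairs)  # [('case1', 'ctl1'), ('case2', None)]
```

The second case finds no control, which hints at the practical problem of matching on too many characteristics: every extra variable narrows the set of acceptable controls.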
20. PROBLEMS WITH MATCHING
o Practical problems: if an attempt is made to match according to too many characteristics, it may prove difficult or impossible to identify an appropriate control
  • Overmatching also makes it impossible to statistically analyze the variables used in matching
o Conceptual problems: once we have matched controls to cases according to a given characteristic, we cannot study that characteristic as a risk factor
  • For example, suppose we are interested in studying age as a risk factor for periodontitis. If we match the cases (with periodontitis) and the controls (without periodontitis) for age, we can no longer study whether or not age is a risk factor for periodontitis
• By using matching to impose comparability for a certain factor, we ensure the same prevalence of that factor in the cases and the controls
• Clearly, we will then not be able to ask whether cases differ from controls in the prevalence of that factor
o Unplanned matching may inadvertently occur in case-control studies
  • E.g.: if we use neighborhood controls, we are in effect matching for socioeconomic status as well as for cultural and other characteristics of a neighborhood
o Unplanned matching on a variable that is strongly related to the exposure being investigated in the study is called overmatching
  • Overmatching tends to bias the odds ratio towards 1, so that a real association may be missed
22. USE OF MULTIPLE CONTROLS
• Matching 2:1, 3:1 or 4:1 will increase the statistical power of the study; therefore, many case-control studies have more controls than cases
• These controls may be either:
  (1) controls of the same type, or
  (2) controls of different types, such as hospital and neighborhood controls, or controls with different diseases
23. MULTIPLE CONTROLS OF SAME TYPE
• Multiple controls of the same type, such as two or three controls for each case, are used to increase the power of the study
• A noticeable increase in power is gained only up to a ratio of about 1 case to 4 controls
• Why not keep the ratio of controls to cases at 1:1 and just increase the number of cases?
  o For many of the relatively infrequent diseases we study (which are best studied using case-control designs), there may be a limit to the number of potential cases available for study
  o Because the number of cases cannot be increased without either extending the study in time to enroll more cases or developing a collaborative multicenter study, the option of increasing the number of controls per case is often chosen
• These controls are of the same type (e.g., neighborhood controls); only the ratio of controls to cases has changed
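The levelling-off of power at about 1:4 can be illustrated with a commonly quoted approximation. With a fixed number of cases, using k controls per case gives a relative efficiency of roughly k/(k+1) compared with an unlimited number of controls. The sketch below assumes that approximation, which is not derived in these slides, and simply tabulates it. The function name is hypothetical.

```python
def relative_efficiency(k: int) -> float:
    """Approximate efficiency of k controls per case relative to unlimited controls.

    Assumes the commonly quoted approximation k / (k + 1) for a fixed number of cases.
    """
    if k < 1:
        raise ValueError("need at least one control per case")
    return k / (k + 1)


previous = None
for k in range(1, 7):
    eff = relative_efficiency(k)
    gain = "" if previous is None else f"  (gain over 1:{k - 1}: {eff - previous:.3f})"
    print(f"1 case : {k} controls  relative efficiency {eff:.3f}{gain}")
    previous = eff
```

Under this approximation, moving from 1:1 to 1:2 adds about 0.17, but moving from 1:4 to 1:5 adds only about 0.03. This is why ratios much beyond 1:4 are rarely worth the extra cost.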
24. MULTIPLE CONTROLS OF DIFFERENT TYPE
• The exposure of the hospital controls used in a study may not represent the rate of exposure "expected" in a population of non-diseased persons; that is, the controls may be a highly selected subset of non-diseased individuals and may have a different exposure experience
• To address this problem, we may choose to use an additional control group, such as neighborhood controls
25. 3. MEASUREMENT OF EXPOSURE
• Information about exposure must be obtained precisely
• It is obtained by interviews, questionnaires, or by studying past records of cases, such as hospital records, employment records, etc.
26. 4. ANALYSIS
• The final step, to find out:
  o Exposure rates among cases and controls to the suspected factor
  o Estimation of the disease risk associated with exposure (odds ratio)
1. Exposure rates
• Case control studies give a direct estimate of the exposure rates (frequency of exposure) to a suspected factor in the disease and non-disease groups
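Exposure rates come straight from the 2x2 table used in these slides. In that table, a and c are the exposed and unexposed cases, and b and d are the exposed and unexposed controls. The exposure rate among cases is a/(a+c), and among controls it is b/(b+d). The counts below are invented purely for illustration and do not come from any study, and the function name is hypothetical.

```python
def exposure_rates(a, b, c, d):
    """Exposure rates among cases and controls.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    among_cases = a / (a + c)
    among_controls = b / (b + d)
    return among_cases, among_controls


# Hypothetical counts for tobacco use and oral cancer (not real data)
cases_rate, controls_rate = exposure_rates(a=60, b=30, c=40, d=70)
print(f"tobacco use among cases:    {cases_rate:.0%}")     # 60%
print(f"tobacco use among controls: {controls_rate:.0%}")  # 30%
```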
• The particular test of significance will depend upon the variables under investigation
• By convention, if P is less than or equal to 0.05, the result is regarded as statistically significant
• The smaller the P value, the stronger the evidence against the hypothesis that the association is due to chance alone
• A significant P value does not imply causation
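For a 2x2 table of exposure against case/control status, one common choice of test is Pearson's chi-square test with 1 degree of freedom. The sketch below assumes that test, without continuity correction. It uses the fact that, for 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)). The counts are the same invented ones used for the exposure rates, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its P value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_2x2(a=60, b=30, c=40, d=70)
print(f"chi-square = {chi2:.2f}, P = {p:.2g}")
print("statistically significant at 0.05" if p <= 0.05 else "not significant at 0.05")
```

A small P here only says the difference in exposure rates is unlikely to be due to chance alone. It says nothing about bias, confounding or causation.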
29. 2. ESTIMATION OF RISK
• Estimates the disease risk associated with exposure
• Obtained by an index called "relative risk" (RR) or "risk ratio"
• It is defined as the ratio between the incidence of disease among exposed persons and the incidence among non-exposed persons
• Formula

$$\text{Relative risk} = \frac{\text{Incidence among exposed}}{\text{Incidence among non-exposed}} = \frac{a/(a+b)}{c/(c+d)}$$

                  Cases (with disease)   Controls (without disease)   Total
Tobacco users     a                      b                            a + b
Non-users         c                      d                            c + d
Total             a + c                  b + d                        a + b + c + d
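The RR formula can be checked with a small calculation. It only has meaning when a + b and c + d are the full exposed and non-exposed groups followed over time (cohort data), not case and control counts. The cohort below is invented: 1000 tobacco users, of whom 50 develop oral cancer, and 1000 non-users, of whom 20 do. These numbers were chosen so that the RR equals the 2.5 used in the interpretation example. The function name is hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)] using the 2x2 layout above.

    Only meaningful for cohort data, where a + b and c + d are the whole
    exposed and non-exposed groups at risk.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort (invented numbers)
rr = relative_risk(a=50, b=950, c=20, d=980)
print(f"RR = {rr:.2f}")  # RR = 2.50
```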
• A typical case control study does not provide incidence rates from which the RR can be calculated directly, as there is no appropriate denominator or population at risk from which to calculate these rates
• The RR can be calculated directly only in cohort studies
Interpretation of relative risk
• The value ranges from 0 to infinity
• An RR equal to 1 indicates no association between the exposure and the health-related event
• An RR greater than 1 indicates a positive association, and an RR less than 1 indicates a negative association
• Taking the previous example, if the relative risk equals 2.5, smokers are 2.5 times as likely to develop oral cancer as non-smokers
32. 2. ODDS RATIO/RELATIVE ODDS
• Also called the "cross product ratio"
• A measure of the strength of association between a risk factor and an outcome
• Closely related to the RR
• Its derivation is based on 3 assumptions:
  o The disease being investigated must be relatively rare
  o The cases must be representative of those with the disease
  o The controls must be representative of those without the disease
                Disease: Yes    Disease: No
Exposed         a               b
Not exposed     c               d
Total: a + b + c + d

$$\text{Odds ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

• Key parameter in the analysis of case control studies
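The first assumption, that the disease is rare, is what lets the odds ratio stand in for the RR. The sketch below computes both measures on two invented cohort-style tables that share an RR of 2.5. In one table the disease is rare (5% vs 2% incidence). In the other it is common (50% vs 20%). In a real case control study only the OR can be computed. Full tables are made up here only so that the two measures can be compared. The function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc for the 2x2 table above."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Ratio of incidences, valid only for cohort-style tables."""
    return (a / (a + b)) / (c / (c + d))


# Invented tables: (a, b, c, d), both with RR = 2.5
rare = (50, 950, 20, 980)       # 5% vs 2% incidence: disease is rare
common = (500, 500, 200, 800)   # 50% vs 20% incidence: disease is common

for label, table in (("rare", rare), ("common", common)):
    print(f"{label:>6}: RR = {relative_risk(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

For the rare disease the OR (about 2.58) is close to the RR of 2.50. For the common disease the OR (4.00) overstates the RR considerably.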
36. POTENTIAL BIASES IN CASE CONTROL STUDIES
• Bias is any systematic error in the determination of the association between the exposure and the disease
• It can increase or decrease the relative risk estimate
Selection bias: occurs because of an ill-defined population, errors during sampling, etc.
• There may be systematic differences in characteristics between cases and controls
• It can be controlled only by preventing it
Examples:
  o Health care access bias: when cases admitted to a certain facility do not represent the cases in the community
  o Popularity bias: when admissions are based on the interest of the investigator
  o Neyman bias / selective survivor bias / late look bias / prevalence-incidence bias: occurs when the exposure of interest is a prognostic determinant and mild, clinically resolved or fatal cases are excluded from the case group, so that the association between the disease and the risk factor may be underestimated
  o Berksonian bias (Berkson's bias): occurs because of different rates of hospital admission for people with different diseases
  o Inclusion bias: controls with one or more conditions related to the exposure are selected
  o Exclusion bias: controls with conditions related to the exposure are excluded
  o Mimicry bias: conditions clinically close to the disease may be diagnosed as the disease itself
  o Non-response bias: when participants differ from non-participants. The healthy worker effect is a particular case, in which the participants are healthier than the general population
Information bias / measurement bias: occurs because of systematic measurement error or misclassification of subjects on one or more variables
Examples:
  o Observer bias: when different observers obtain different measurements for the same case
  o Interviewer bias: when the interviewer knows the hypothesis and also knows who the cases are, errors can be introduced by questioning the cases more thoroughly, asking leading questions, emphasizing some questions and helping with responses
    • Can be minimized by comparing the average time taken to interview cases and controls, and by double blinding
  o Memory/recall bias: when asked about past history, cases are more likely than controls to recall certain events or factors
  o Reporting bias: participants try to help the researchers by giving answers in the direction of interest, or do not answer sensitive questions
  o Hawthorne effect: when participants are aware that they are being observed, there can be an increase in productivity or outcome
  o Surrogate bias: when the case is not available to give information, someone other than the case is interviewed
40. CONFOUNDING AND BIAS
• Bias that occurs because of a confounding factor
• A confounding factor is a third factor that is associated with both the exposure and the disease, and is distributed unequally in the study and control groups
• Confounding occurs because of a non-random distribution of risk factors in the study population
• To be a confounder, the factor must:
  o Be associated with the exposure without being a consequence of the exposure
  o Be associated with the outcome independently of the exposure (i.e., not be an intermediary)
Example
• In a study of the role of smoking in oral cancer, alcohol is a confounding factor
• Confounding can be controlled by matching
• By convention, when a third variable masks or weakens a true association between two variables, this is negative confounding (the observed association is biased towards the null)
• When a third variable produces an association that does not actually exist, this is positive confounding (the observed association is biased away from the null)
• To be clear, neither type of confounding is a "good thing" (i.e., neither is a positive factor); both are "bad" (i.e., negative in terms of effect)
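One way to see the direction of confounding is to compare the crude odds ratio with an odds ratio adjusted for the third variable. The slides do not name an adjustment method. The sketch below swaps in the Mantel-Haenszel summary odds ratio, a standard estimator for stratified 2x2 tables. The data are invented: within each alcohol stratum, smoking has no effect (OR = 1), but alcohol is linked to both smoking and oral cancer. The function names and the log-scale tolerance used to call "little or no confounding" are hypothetical choices.

```python
import math


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over 2x2 strata given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def confounding_direction(crude, adjusted, tolerance=0.05):
    """Compare distances from the null (OR = 1) on the log scale.

    `tolerance` is an arbitrary, hypothetical threshold.
    """
    crude_dist = abs(math.log(crude))
    adjusted_dist = abs(math.log(adjusted))
    if abs(crude_dist - adjusted_dist) <= tolerance:
        return "little or no confounding"
    # Crude further from the null than adjusted: biased away from the null
    return "positive confounding" if crude_dist > adjusted_dist else "negative confounding"


# Invented smoking/oral cancer data stratified by alcohol use:
# (smoking cases, smoking controls, non-smoking cases, non-smoking controls)
strata = {
    "drinkers": (40, 20, 20, 10),
    "non-drinkers": (10, 20, 30, 60),
}
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(list(strata.values()))
direction = confounding_direction(crude, adjusted)

for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")
print(f"crude OR = {crude:.2f}, Mantel-Haenszel OR = {adjusted:.2f} -> {direction}")
```

Here the crude OR of 1.75 is produced entirely by alcohol. Once the data are stratified, the association disappears. This is the "association that does not actually exist" pattern of positive confounding.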
42. SYNERGISM AND EFFECT MODIFICATION
• Synergism (from Greek roots meaning "work together") is the interaction of two or more presumably causal variables, so that the combined effect is clearly greater than the sum of the individual effects
• E.g.: uncontrolled diabetes combined with poor oral hygiene carries a greater risk of periodontitis than the sum of the risks from each factor alone
• When the association between an exposure and a disease outcome is modified by the level of an extrinsic risk factor, that extrinsic variable is called an effect modifier
• This phenomenon is usually called effect modification by epidemiologists and interaction by biostatisticians
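The definition of synergism, "combined effect greater than the sum of the individual effects", can be made concrete on the additive scale. Each factor's effect is taken as its excess risk over people with neither factor. Synergism is then present when the joint excess risk exceeds the sum of the separate excess risks. This reading is an assumption made for illustration. The risks per 100 people below are invented, not taken from any study, and the function names are hypothetical.

```python
def excess_risk(risk, baseline):
    """Risk attributable to exposure(s) on the additive scale."""
    return risk - baseline


def is_synergistic(r_neither, r_a_only, r_b_only, r_both):
    """True when the joint excess risk exceeds the sum of the separate excess risks."""
    joint = excess_risk(r_both, r_neither)
    separate_sum = excess_risk(r_a_only, r_neither) + excess_risk(r_b_only, r_neither)
    return joint > separate_sum


# Invented periodontitis risks per 100 people
risks = {"neither": 5, "diabetes only": 10, "poor hygiene only": 15, "both": 40}
print(is_synergistic(risks["neither"], risks["diabetes only"],
                     risks["poor hygiene only"], risks["both"]))  # True
```

With these numbers the joint excess risk is 35 per 100, against 5 + 10 = 15 per 100 expected if the two effects simply added.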
43. CONFOUNDING V/S EFFECT MODIFICATION
CONFOUNDING
• An effect or association between an exposure and an outcome is distorted by the presence of another variable
• An observed association is not correct because a different variable is associated with both the potential risk factor and the outcome, but is not itself a causal factor
EFFECT MODIFICATION
• A variable that differentially (positively and negatively) modifies the observed effect of a risk factor on disease status
• Different groups have different risk estimates when effect modification is present
• The effect is real, but its magnitude differs between groups of individuals (e.g., males vs. females, or blacks vs. whites)
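The practical difference shows up in the stratum-specific estimates. With confounding, the strata agree with each other but differ from the crude estimate. With effect modification, the strata themselves disagree, so a single pooled estimate would hide a real difference between groups. The tables below are invented, stratified by sex, to show the second pattern. A formal test of whether stratum estimates differ (a test of homogeneity) is not shown.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Invented 2x2 tables by sex:
# (exposed cases, exposed controls, unexposed cases, unexposed controls)
tables = {
    "male": (30, 10, 20, 40),
    "female": (15, 15, 20, 20),
}
stratum_ors = {group: odds_ratio(*cells) for group, cells in tables.items()}
for group, value in stratum_ors.items():
    print(f"{group}: OR = {value:.1f}")
```

An OR of 6.0 in males and 1.0 in females would be reported separately rather than averaged into one figure.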
45. CONCLUSION
• In recent years, considerable attention has focused on whether it is possible to combine the benefits of both case control and cohort studies in a single study
• The resulting combination is a HYBRID DESIGN, in which a case control study is initiated within a cohort study
• Here the population is identified and followed over time
• Based on the approaches used in the selection of controls, there are two types:
  o Nested case control study
  o Case-cohort study
47. DESIGN OF CASE COHORT STUDY
• Cases develop at the same times as in the nested case-control design, but the controls are randomly chosen from the defined cohort with which the study began
• This subset of the full cohort is called the subcohort
• An advantage of this design is that, because controls are not individually matched to each case, it is possible to study different diseases (different sets of cases) in the same case-cohort study, using the same subcohort as controls
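The case-cohort selection step amounts to a single random draw at baseline. A subcohort is sampled once from the whole cohort and then serves as the comparison group for cases of any disease that arises during follow-up. In a nested case-control study, by contrast, controls are sampled separately for each case from those still at risk when that case develops. This is a minimal sketch assuming a 10% sampling fraction. The cohort size, case identifiers and the helper `draw_subcohort` are hypothetical.

```python
import random


def draw_subcohort(cohort_ids, fraction, seed=0):
    """Hypothetical helper: randomly sample a subcohort from the full cohort at baseline."""
    rng = random.Random(seed)
    size = round(len(cohort_ids) * fraction)
    return set(rng.sample(sorted(cohort_ids), size))


cohort = set(range(1, 1001))  # hypothetical cohort of 1000 people
subcohort = draw_subcohort(cohort, fraction=0.10)

# Hypothetical cases of two different diseases arising during follow-up
cases_by_disease = {
    "disease A": {3, 17, 250, 618, 999},
    "disease B": {42, 250, 777},
}
for disease, cases in cases_by_disease.items():
    overlap = len(cases & subcohort)
    print(f"{disease}: {len(cases)} cases vs. the same {len(subcohort)}-person subcohort "
          f"({overlap} of the cases also belong to the subcohort)")
```

Because the subcohort is drawn without reference to any particular case, the same draw is reused for both diseases. A case can also happen to fall inside the subcohort, since sampling took place before anyone became a case.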
If cancer is being studied, all the cases included should be of the same histological type
Incident: newly diagnosed cases within a specific period of time
Prevalent: old cases or cases in advanced stages
Why ?
Controls can differ from cases in many factors, like age, sex and occupation
When the study is used to find associations, bias should be ruled out
If a/(a+c) is higher than b/(b+d), the frequency of tobacco use is higher among people with oral cancer than among those without it; the next step is to see whether the difference is statistically significant, for which the P value is found
A case control study does not provide incidence rates from which the RR can be calculated directly, as there is no appropriate denominator or population at risk from which to calculate these rates
The OR has attractive mathematical properties: it can be calculated using logistic regression, and an adjusted odds ratio is obtained when confounding variables are included
Berkson's bias is reduced by randomization
If the crude OR > the stratified OR, there is positive confounding; if the crude OR < the stratified OR, there is negative confounding
In a nested case-control study, controls are a sample of individuals who are at risk for the disease at the time each case of the disease develops