Designs vary based on the following key elements:
1. How are subjects sampled (on disease status or exposure)
2. Data collection methods (survey, observations, experimental)
3. Timing of data collection (when)
4. Unit of observation (Individuals or groups)
5. Number of observations made (data collection points)
6. Availability of subjects
• Things to consider in choosing a study design:
The prevalence of the EXPOSURE (Frequent vs. Rare)
The prevalence of the OUTCOME (Frequent vs. Rare)
How much is known about the relationship?
How strong do you want the evidence to be?
Choosing a good study design is essential in
The general common choices are:
Case reports and Case series
Randomized-controlled Trial (experimental design)
Manipulation of study factor
Was exposure of interest controlled by the investigator?
Randomization of study subjects
Was there use of a random process to determine exposure of
These studies help us generate hypotheses
(Observational/ descriptive Studies =
the investigator observes the natural
occurrence of the events)
Physician or group of physicians seeing a handful of cases of
a very rare disease, or an unusual presentation of a disease
They are useful as a starting point for thinking about etiology (cause).
Has no comparison group
Without any comparison group, we CANNOT assume any
I have a
dots on my feet.
A case series is an extension of a case report. You now have
several cases of the disease
You can start looking for commonalities, which may help you develop a hypothesis.
You cannot make ANY argument about an exposure being
associated with disease.
For that, we need a comparison group.
Selection bias (major disadvantage)
Error due to systematic differences in characteristics between those who
take part in a study and those who do not (nonresponse problems).
Ecologic Studies (group level analysis)
Aggregates of individuals defined by units:
geographic region, school, health care facility.
Does the overall occurrence of disease in a
population correlate with occurrence of the
No individual level data
People argue that a high-fat
diet is associated with breast
cancer since countries with
high-fat diets have higher
breast cancer incidence.
Ecologic Study Drawbacks
Can we be sure that the pattern we are seeing is
reflecting the real cause?
No, not with this design.
We are looking at data on groups, not individuals.
Tendency to apply the relationship seen at the group
level to an individual == ecologic fallacy
Tendency to think what is meaningful for the group is
meaningful to the individual
Likely to be many confounding factors we haven’t
So, to look at causal
should have a
group… …and I should
have data on
Cross-Sectional Studies (Surveys)
Collect data (single point in TIME) on individuals and
allows for comparison between groups.
Sampling is done on a single group of people of interest
(e.g. college students and drinking)
The main tool is a survey.
After collecting data, you assign an exposure and
an outcome status to everyone (e.g., drinkers vs. nondrinkers)
Compute a measure of association to compare by either
exposure or disease status
Considers prevalence of disease or condition
Cross-Sectional 2 x 2
      D+   D-
E+    A    B
E-    C    D
Now, you distribute your total
N based on E and D status
e.g., obesity and diabetes
Example: Cross-Sectional Study
              Diabetes   No Diabetes   Total
Obese            150          50        200
Not obese         75         125        200
Total            225         175        400
Prevalence of exposure comparison in diseased and
OR = (a/c) / (b/d) = ad / bc
   = (150 x 125) / (50 x 75) = 18750 / 3750 = 5.0
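The arithmetic for this example can be checked with a short Python sketch. The function name `odds_ratio` and the variable names are illustrative choices, not part of the original notes; the counts are the obesity/diabetes figures from the example table.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 table laid out as
             D+   D-
        E+   a    b
        E-   c    d
    """
    return (a * d) / (b * c)


# Counts from the obesity/diabetes example
a, b = 150, 50    # obese: diabetes, no diabetes
c, d = 75, 125    # not obese: diabetes, no diabetes

# Prevalence of diabetes within each exposure group
prevalence_obese = a / (a + b)        # 150 / 200 = 0.75
prevalence_not_obese = c / (c + d)    # 75 / 200 = 0.375

or_value = odds_ratio(a, b, c, d)     # 18750 / 3750 = 5.0

print(prevalence_obese, prevalence_not_obese, or_value)
```

Because the data are cross-sectional, these are prevalence figures, so the result describes an association at one point in time rather than risk over time.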
Usually easy to conduct
Describe patterns of disease occurrence
Can study several diseases or exposures
We have prevalence data only, no incidence data.
We collect exposure and outcome information at the
same time, so we cannot argue any temporal
relationship between the two (exposure must precede
disease in order to be causal)
Selection bias (non-response)
So what is our next step?
So far we have looked at descriptive designs that allow us
to look at correlations and associations, and provide
knowledge for hypothesis development
But, we need to consider incident cases and time
sequence if we want to look at cause-effect
For that, we must turn to the analytic study designs.
case-control, and cohort (hypothesis testing)
Design of a Case-Control Study
Case-control Study Analytic study
The study subjects are selected on the basis of whether they
have the disease (cases) or not (controls) and then
We determine how many in each group had the risk factor
A case definition is a statement of the characteristics a subject must
have to be considered a “case” of the disease to be studied.
Usually confirmed by clinical signs or laboratory test
Signs must distinguish the specific disease from similar
conditions that might be caused by different exposures, genetic
traits, interactions, etc.
May consist of combinations of symptoms. (e.g., flu vs. cold)
Exposure collection is retrospective
You will ask cases and controls about their exposures before
they became a case (or before the date they were enrolled as a control)
Need to identify a comparison group - controls
These could be people in the hospital with other conditions,
people from the neighborhood where the case lives, people
identified from voter registration
The goal is to identify a control group that represents
the source population of the cases
Patients from the same hospital as the cases
Relatives of cases
Friends of cases--SES control
Any exclusion or inclusion criteria applied to the
selection of cases must also be applied in the
selection of controls.
Case-control 2 x 2 Odds Ratio
Your total group
You assign exposure
status for the case and
control groups, and this
completes the table
Definition of odds: the ratio of the probability of an event occurring
(disease) to the probability of it not occurring = P / (1 - P)
Calculating Odds Ratio
               Cases   Controls
Exposed          a        b
Not Exposed      c        d
Odds of exposure among cases:
(Proportion of exposed among cases) / (Proportion of non-exposed among cases)
= (a/(a+c)) / (c/(a+c)) = a/c
Odds of exposure among controls:
(Proportion of exposed among controls) / (Proportion of non-exposed among controls)
= (b/(b+d)) / (d/(b+d)) = b/d
OR = (a/c) / (b/d) = ad / bc
Data from a case-control study of current oral contraceptive (OC)
use and myocardial infarction in premenopausal female nurses
Data from L. Rosenberg et al., Oral contraceptive use in relation to non-fatal
myocardial infarction. Am J. Epidemiol. 111:59, 1980.
Current OC use    MI cases    Controls
Yes               a = 23      b = 304
No                c = 133     d = 2816
OR = ad / bc = (23 x 2816) / (304 x 133) = 64768 / 40432 = 1.6
Those who used OC had 60% higher odds of having MI
compared to women who did not use. Or
the relative odds are 1.6 times as high among those who used OC
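As a check on the OC/MI example, the sketch below computes the odds of exposure among cases and among controls separately, then confirms that their ratio equals the cross-product ad/bc. Variable names are illustrative; the counts are the a, b, c, d values quoted from Rosenberg et al.

```python
import math

# Counts from the OC use / myocardial infarction example
a, b = 23, 304      # current OC users: cases, controls
c, d = 133, 2816    # non-users: cases, controls

# Odds of exposure among cases: (a/(a+c)) / (c/(a+c)) reduces to a/c
odds_cases = (a / (a + c)) / (c / (a + c))
# Odds of exposure among controls: (b/(b+d)) / (d/(b+d)) reduces to b/d
odds_controls = (b / (b + d)) / (d / (b + d))

or_from_odds = odds_cases / odds_controls
or_cross_product = (a * d) / (b * c)   # 64768 / 40432

# Both routes give the same value, up to floating-point rounding
assert math.isclose(or_from_odds, or_cross_product)

print(round(or_cross_product, 2))  # 1.6
```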
Relatively inexpensive, small sample sizes, fast study technique
Great for rare diseases (sampling on disease status lets you
guarantee enough cases to make comparisons)
Can study multiple exposures
•Hard to find an appropriate control group (should match the
cases in other characteristics)
•Potential for recall bias – biggest problem (limited recall or
•Can assume, but cannot confirm the temporal sequence of
exposure and disease
A cohort is defined as a population group, or subset
thereof, that is followed over a period of time.
• Instead of sampling on disease status, we sample on exposure
(Subjects are defined on the basis of presence or absence of exposure to a
risk factor; but you are not assigning an exposure to occur)
• Follow people prospectively until disease occurs or the study
ends (thus, you can demonstrate that exposure precedes disease)
At the time exposure status is defined subjects are outcome negative.
What are the requirements for the Cohort Population
All members of the cohort population must:
• Be free of the disease at the start of the study period.
• Be at risk of developing the disease
– Must be alive (exclude dead people).
– Must not be immune (exclude people who have been immunized or
have had the disease before, if the first episode confers immunity).
– Must not be in a non-susceptible group (men don’t get
The purpose of following a cohort is to measure the
occurrence of one or more specific diseases/outcomes
during the period of follow-up, usually with the aim of
comparing the disease rates for two or more cohorts.
TYPES OF COHORT STUDIES
PROSPECTIVE (OR CONCURRENT)
RETROSPECTIVE (OR NON-CONCURRENT)
cohort studies with sampling unrelated to exposure (common)
cohort studies with exposure-based sampling (rare exposure)
C. POPULATION BASE
D. TYPE OF COHORTS
• OPEN - people moving in and out
• CLOSED - fixed population
cohort is defined at the start of the study.
No subjects are added after the start,
some cohort members may drop out or die before the end
of the study.
Investigators try to follow all cohort members to the study
Examples: Clinical trials, Framingham study, Nurses' Health Study
Open Cohort = The cohort takes on new members and
may lose members as the study progresses. Also called a
Dynamic Cohort or Dynamic Population.
Examples: Cancer registries, school studies, hospital
infection surveillance studies
Types of Cohorts
Exposure Study starts
Prospective cohort study
Retrospective cohort study
Prospective Study: Observes a cohort of subjects over a
long period, watching for an event to occur (e.g., disease,
death, or cure) during the study period and taking note of
suspected risk or protective factor(s).
1. The outcome of interest should be clear and defined.
2. Use incidence of an outcome or the relative risk of an
outcome based on exposure.
Prospective vs. Retrospective studies
Retrospective Study: Looks backward and examines
exposures to suspected risk or protective factors in relation
to an outcome, e.g., cancer.
1. Many valuable case-control studies are retrospective and
ask for patient histories.
2. the odds ratio provides an estimate of relative risk (in Rare
• Framingham Heart Study
• Study began in 1948 by recruiting an Original Cohort of
5,209 men and women between the ages of 30 and 62 from
the town of Framingham, Massachusetts, who had not yet
developed overt symptoms of cardiovascular disease or
suffered a heart attack or stroke
• Since that time the Study has added an Offspring Cohort
in 1971, the Omni Cohort in 1994, a Third Generation
Cohort in 2002, a New Offspring Spouse Cohort in 2003,
and a Second Generation Omni Cohort in 2003.
• identification of major CVD risk factors
Examples of Prospective Cohort Studies
Evans County, GA (biracial)
Bogalusa, LA (children)
OCCUPATION BASED TO STUDY
Coke-oven workers (lung cancer)
Asbestos workers (lung cancer)
Radium dial painters (oral cancer)
• Mustard-gas poisoning from WW I (lung disease)
• Vietnam Veterans (post-traumatic stress disorder, agent orange
• Gulf War Veterans (Gulf war syndrome)
The National Children’s Study will examine the effects of environmental influences on
the health and development of 100,000 children across the United States, following them
from before birth until age 21.
• 105 Study locations (counties or groups of counties) across the United
• 79 metropolitan areas (urban, suburban, and small cities), as well as 26 rural
• All locations were selected using a probability-based sampling method to
ensure that children and families across the nation—from diverse ethnic,
racial, economic, religious, geographic, and social groups—are represented in
• Enroll women who are either pregnant or likely to have a child during the
• Each Study location will recruit enough women for 250 infant births per year
during the four-year enrollment period.
Study Locations (Texas)
A - Bexar County, TX
B - Childress County, TX *
C - Collingsworth County, TX *
D - Dallas County, TX *
E - Donley County, TX *
F - Hall County, TX *
G - Harris County, TX
H - Hidalgo County, TX A
I - Lamar County, TX
J - Stephens County, TX *
K - Travis County, TX *
L - Young County, TX *
Research efforts are geared toward studying children’s
health and development and will form the basis of
child health guidance, interventions, and policy for
generations to come.
The NCS will examine important health issues to
establish links between children’s environments and
their health, including:
birth defects and pregnancy-related problems
behavior, learning, and mental health disorders
* To become active 2009 - 2010
Families will join the Study, or enroll, beginning in some communities
in the winter of 2009. Other communities will begin enrolling families
over the next couple of years.
Cohort Data Analysis
The tabulation and analysis of morbidity or mortality rates
The study base is the person-time experience of the
individuals in whom the outcome is ascertained.
Calculation of person-years at risk is the means of achieving
equivalence of study base in cohort studies.
The denominator depends on whether all subjects were
followed till the end of the study or not.
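To make the person-time denominator concrete, here is a minimal sketch. The follow-up records are invented purely for illustration (they do not come from any study in these notes), and the variable names are assumptions.

```python
# Hypothetical follow-up records: (years followed, developed disease?)
# These values are invented for illustration only.
follow_up = [
    (5.0, False),   # followed to the end of a 5-year study
    (2.5, True),    # developed the disease at 2.5 years
    (1.0, False),   # lost to follow-up after 1 year
    (5.0, False),   # followed to the end
    (4.0, True),    # developed the disease at 4 years
    (3.5, False),   # moved away after 3.5 years
]

# Each subject contributes only the time they were observed and at risk
person_years = sum(years for years, _ in follow_up)
new_cases = sum(1 for _, diseased in follow_up if diseased)

# Incidence rate: new cases per unit of person-time at risk
incidence_rate = new_cases / person_years

# Cumulative incidence uses people as the denominator instead, which is
# only appropriate when everyone is followed for the whole period.
cumulative_incidence = new_cases / len(follow_up)

print(f"{new_cases} cases over {person_years} person-years")
print(f"rate = {incidence_rate * 1000:.1f} per 1,000 person-years")
```

When some subjects drop out early, as in this invented data, the person-years denominator credits them only for the time they were actually observed.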
Measuring Exposure: The Scale
• There are different ways to measure exposure, and
their association with outcomes will vary.
• For example, measuring smoking exposure:
  – Yes / No
  – Never / Past but quit (pack-years, type of smoking) / Current (pack-years, type of smoking)
  – Pack-years of smoking
  – How many cigarettes do you currently smoke per day?
Then we can categorize accordingly.
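The different smoking scales can be sketched in code as follows. Pack-years is computed in the usual way (packs per day times years smoked, assuming 20 cigarettes per pack). The function names and the cut points in `intensity_band` are hypothetical choices for illustration, not values from these notes.

```python
def pack_years(cigarettes_per_day, years_smoked, cigarettes_per_pack=20):
    """Pack-years = packs smoked per day x years smoked."""
    return (cigarettes_per_day / cigarettes_per_pack) * years_smoked


def smoking_category(ever_smoked, currently_smokes):
    """Three-level scale: never / past / current."""
    if not ever_smoked:
        return "never"
    return "current" if currently_smokes else "past"


def intensity_band(cigarettes_per_day):
    """Hypothetical cut points for cigarettes smoked per day."""
    if cigarettes_per_day == 0:
        return "none"
    if cigarettes_per_day < 10:
        return "light"
    if cigarettes_per_day < 20:
        return "moderate"
    return "heavy"


# A past smoker who smoked 30 cigarettes a day for 10 years
print(smoking_category(True, False), pack_years(30, 10), intensity_band(30))
```

The same person can look different on each scale, which is one reason the measured association with an outcome depends on how exposure is defined.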
Cohort Study 2 x 2
You selected study subjects by
exposure status, so you know
these row totals.
(a+b) and (c+d)
Now you separate your two
exposure groups by disease status
We can calculate either the Relative Risk as a valid measure of association,
Risk ratios or Relative Risk
What if we started with exposure instead of disease status?
We could compute risk directly, so the measure of association
is the risk ratio.
Essentially, you compute the risk in the exposed group
and divide by the risk in the unexposed group.
Risk in exposed = a/(a+b)
Risk in unexposed = c/(c+d)
RR = [a/(a+b)] / [c/(c+d)]
      D+   D-
E+    a    b
E-    c    d
Imagine that the data from Example 1 were actually from a cohort
(meaning we followed OJ drinkers and non-drinkers over time).
The relative risk would be (50/200) / (20/200) = 2.5, meaning orange
juice drinkers had a 150% increase (2.5-1) in the risk of stomach
ulcer as compared to non-drinkers.
         D+     D-    Total
E+       50    150     200
E-       20    180     200
Total    70    330     400
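The relative risk for the orange juice example can be reproduced with a short sketch. The function name `risk_ratio` is an illustrative choice; the cell counts (50, 150, 20, 180) are the ones given in the example.

```python
import math


def risk_ratio(a, b, c, d):
    """Risk ratio for a cohort 2 x 2 table laid out as
             D+   D-
        E+   a    b
        E-   c    d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# OJ drinkers (E+) vs. non-drinkers (E-), stomach ulcer as the outcome
rr = risk_ratio(50, 150, 20, 180)      # (50/200) / (20/200)
percent_increase = (rr - 1) * 100      # excess risk relative to the unexposed

print(round(rr, 2), round(percent_increase))
```

Unlike the odds ratio, this measure needs the row totals (a+b) and (c+d) to be meaningful denominators, which is why it applies when subjects are sampled on exposure.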
Life Table Methods
Give estimates for survival during time intervals and present
the cumulative survival probability at the end of the interval.
Example: Life tables can be constructed to portray the
survival times of patients in clinical trials.
There are two life table methods:
• Cohort life table:
– Shows the mortality experience of all persons born during a
• Period life table:
– Enables us to project the future life expectancy of persons
born during the year as well as the remaining life expectancy
of persons who have attained a certain age.
A method for portraying survival times
In order to construct a survival curve, the following
information is required:
Time of entry into the study
Time of death or other outcome
Status of patient at time of outcome, e.g., dead or censored
(patient is lost to follow-up)
– 15 subjects followed over 36
months; all entered the study at
the same time.
– Nine died at different points of the
– Deaths of two patients caused a
steep drop at 19 months.
– Each step indicates the death(s)
of one or more patients.
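One common way to build such a step curve is the Kaplan-Meier (product-limit) estimate. The notes do not name the method, so treat that choice as an assumption. The follow-up times below are invented to resemble the example (15 subjects, 36 months, nine deaths, two of them at 19 months); they are not the actual data behind the figure.

```python
def kaplan_meier(times, died):
    """Return [(time, cumulative survival)] at each distinct death time.

    times: follow-up time for each subject
    died:  True if the subject died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, di in zip(times, died) if ti == t and di)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Conditional probability of surviving this time point
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set
        at_risk -= leaving
    return curve


# Invented data: months of follow-up and whether the subject died
times = [3, 7, 10, 12, 15, 19, 19, 20, 24, 25, 28, 30, 36, 36, 36]
died = [True, True, False, True, True, True, True, False,
        True, False, True, True, False, False, False]

for month, s in kaplan_meier(times, died):
    print(f"month {month:2d}: cumulative survival {s:.3f}")
```

Each printed row corresponds to one downward step in the curve; the two deaths at month 19 produce a larger drop than the single-death steps around it.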
Advantages of Cohort Studies
- Can examine rare exposures (asbestos = lung
- Temporal relationship can be inferred (prospective
design); temporal order between exposure and
- Time-to-event analysis is possible
- Multiple outcomes of a single exposure can be studied
(smoking = lung cancer, COPD, larynx cancer); may
uncover unanticipated associations with outcome
- Cohort studies are usually but not exclusively
- outcome is measured after exposure and can
determine multiple effects of a single exposure.
- Lengthy/ Time consuming and expensive (if
- If retrospective, requires availability of adequate
- May require very large samples
- Not suitable for rare diseases: When outcomes are
rare, large populations need to be followed for
- Not suitable for diseases with long-latency
- Unexpected environmental changes may influence
- Nonresponse, migration and loss-to-follow-up
biases (Tracking for a long time)
- Potential for bias from loss to follow up affecting
- Sampling, ascertainment and observer biases are
Inefficient for study of rare disease
Bias or issues in Interpretation of the results
1. Loss to Follow Up
Large loss to follow up (more than 30%)
will raise issues about validity.
Obtain as much data as possible.
Loss to follow up analysis using baseline
data to compare those interviewed and
those lost to follow up.
Loss to follow up is the major source of
bias in cohort studies.
Retention rate related to length of follow
Those who agree to participate may differ from non-
Non-response affects generalizability and MAY affect
Non-response affects validity if it is associated to both
Analytic study designs used in epidemiology
Sampling Best to use when:
Cross-sectional Entire study group
Exposure and outcome are common
The relationship is not well
Recruit cases of
disease and a
Outcome/disease is rare (roughly
<10% of the population has the
who are exposed
to a factor and
those who are not
Exposure is rare
Disease does not have a long
Closed Cohort --
Open Cohort --
people to be
(control or placebo
You want to see if an intervention or
drug is more effective than another
There is not a long latency period for
People exposed Outbreak situation
Attack rate ratio
(RR or OR)