PHC215
By Dr. Khaled Ouanes Ph.D.
E-mail: k.ouanes@seu.edu.sa
Twitter: @khaled_ouanes
INTRODUCTION TO
HEALTHCARE RESEARCH
METHODS
Correlational/Ecological
Studies
Correlational studies are also called
ecological or aggregate studies.
This type of study uses population-level
data to examine the relationship between
exposure rates and disease rates.
In such a study, the units of analysis are
populations or groups of people rather than
individuals.
That is, the focus is on comparing
populations or groups rather than individual patients
or participants.
Examples
• Does the percentage of adults with multiple sclerosis
tend to be higher in countries farther from the
equator?
• Does the rate of asthma tend to be higher in cities with
higher levels of air pollution?
• Does the prevalence of diabetes tend to be higher
in populations with a higher prevalence of obesity?
Population-level data are used
to look for associations between
two or more group
characteristics
Data Sources
At least one data source (if not more) that
contains comparable information about the
population characteristics of interest must be
identified.
Information about all the variables of interest
must be available for a suitable number of
populations, which can be grouped by place or
by time.
Examples of Populations
All Western European countries
The largest 25 metropolitan areas in the Arab
world
All sub-Saharan African countries
A random sample of survey areas in London
Historical data from past decades for one or
more place-based populations
Exposures and Outcomes
At least one characteristic of the populations
being examined is designated as an exposure
Exposures are often environmental measures likely to be fairly consistent across an
entire population
At least one characteristic is designated as an
outcome
Aggregate Data
Population characteristics are in the form of
aggregate (grouped) data, such as:
• the proportion of each population with a particular characteristic
• the average value of the variable in the population
Examples of Exposures
• The percentage of adults older than 30 who have not completed
at least 12 years of education
• The mean income in the population
• The median age
• The number of rainy days over a given year in the population
• The average ultraviolet radiation index during midday in the
hottest month of the year
Examples of Diseases
The prevalence of obesity among adults
The mean BMI (body mass index) among adults
The annual mortality rate from asthma
Cautions
Correlational studies are valid only if the data
points are comparable.
A data point is a discrete unit of information. Generally, any single fact is a data point.
In a statistical or analytical context, a data point is usually derived from a
measurement or research and can be represented numerically and/or graphically.
In some populations, exposures and diseases may
be routinely undercounted or routinely
over-diagnosed compared to other populations.
Cautions
If multiple sources of data are used or if the data
were collected over a lengthy period of time,
then the definition of exposure or disease may
differ from one population to another and may
not be comparable.
Data Management Example
Data should be entered into a spreadsheet
Each population (A, B, C, etc.) is in its own row
Each exposure and each outcome is in its own
column
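The layout described above can be sketched in a few lines of Python. This is only an illustration: the population labels, the column names (`obesity_pct`, `diabetes_pct`), and every number are invented placeholders, not data from any study.

```python
# Hypothetical illustration only: populations and values are invented.
# One row per population; one column per exposure or outcome.
rows = [
    {"population": "A", "obesity_pct": 18.0, "diabetes_pct": 6.1},
    {"population": "B", "obesity_pct": 24.5, "diabetes_pct": 8.0},
    {"population": "C", "obesity_pct": 31.2, "diabetes_pct": 10.4},
]

columns = ["population", "obesity_pct", "diabetes_pct"]


def to_csv_text(rows, columns):
    """Render the table as comma-separated text that a spreadsheet can open."""
    lines = [",".join(columns)]
    for row in rows:
        lines.append(",".join(str(row[c]) for c in columns))
    return "\n".join(lines)


print(to_csv_text(rows, columns))
```

Keeping one population per row means each row later becomes one point on the scatterplot.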
Analysis: Correlation
• On a scatterplot used to illustrate correlation, each point
represents one population in the study.
• The exposure is plotted on the x-axis, and the outcome or
disease is plotted on the y-axis.
Do you see a Correlation?
Analysis: Correlation
1. When all the points fall neatly in a line, then the
correlation is strong.
2. When the points are not exactly linear but a line for
trend can be drawn, then the correlation is mild or
moderate.
3. When the points appear to be randomly placed
and no obvious line can be drawn through them,
then the correlation is weak or nonexistent.
Analysis: Correlation
• If higher levels of exposure are linked to higher rates of
disease, then the slope is positive.
• If higher levels of exposure are linked to lower rates of
disease, then the slope is negative.
Analysis: Correlation
• For continuous variables and other variables with
responses that can be plotted on a number line, a
Pearson correlation coefficient (r) should be used to
calculate the correlation.
• For variables that assign a rank to responses or that have
ordered categories, use the Spearman rank-order
correlation (designated by the letter r or the Greek letter
ρ (rho) in most statistical programs).
Analysis: Correlation
The Pearson method is built on the idea that if
Measurement 1 tracks Measurement 2 (directly or
inversely), you can gauge how closely they are
linked by calculating Pearson's r, the
correlation coefficient. This quantity is
derived from the products of the differences
between each M1 value and the M1 average and
each M2 value and the M2 average.
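A minimal sketch of that calculation follows, written from scratch so that each step is visible. The function name `pearson_r` and the example numbers are assumptions made for illustration; in practice a statistical package would normally be used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: summed products of deviations from the means,
    scaled by the spread of each variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sum_products / sqrt(ss_x * ss_y)


# Invented values: each position is one population (exposure, outcome).
exposure = [1.0, 2.0, 3.0, 4.0]
outcome = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(exposure, outcome)
print(r, r ** 2)
```

Because the invented outcome is exactly twice the exposure, the points fall on a line with a positive slope and r comes out at 1.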
Analysis: Correlation
Spearman's rank coefficient is similar to Pearson's in
producing a value from -1 to +1, but you would
use Spearman when the rank order of the data
is important in some way.
The Pearson test is more widely used.
Analysis: Correlation
• r = –1: all points lie perfectly on a line with a negative slope
• r = 1: all points lie perfectly on a line with a positive slope
• r = 0: no linear association between the exposure and outcome
• r² shows how strong a correlation is without indicating the
direction of the association
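Both coefficients range from -1 to +1. One common way to obtain Spearman's coefficient is to replace each value by its rank and then compute Pearson's r on the ranks. The sketch below assumes that approach; the helper names (`average_ranks`, `spearman_rho`) and the data are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Invented data: the outcome always rises with exposure, but not in a line.
exposure = [1, 2, 3, 4, 5]
outcome = [1, 4, 9, 16, 25]

print(pearson_r(exposure, outcome))     # below 1: points are not on a line
print(spearman_rho(exposure, outcome))  # rank order agrees perfectly
```

The contrast shows why the choice matters: a relationship that is perfectly ordered but curved gives a Spearman value of 1 while Pearson's r stays below 1.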
Analysis
Use linear regression models when the goal is to:
compare more than two variables
understand the relationship between two variables
while controlling or adjusting for the effects of other
variables
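As a rough sketch of what adjusting for another variable looks like, the code below fits an ordinary least squares model with one exposure and one additional variable. Ordinary least squares is assumed here as the fitting method; the function names, variable names, and all values are hypothetical, and real analyses would normally rely on a statistical package.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_model(predictor_rows, outcomes):
    """Least squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + list(row) for row in predictor_rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcomes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented populations: (exposure level, median age) and an outcome rate
# constructed as 2 + 0.5 * exposure + 0.1 * median age.
predictors = [(10, 30), (20, 25), (30, 40), (40, 35), (50, 28)]
outcomes = [10.0, 14.5, 21.0, 25.5, 29.8]

intercept, slope_exposure, slope_age = fit_linear_model(predictors, outcomes)
print(intercept, slope_exposure, slope_age)
```

The slope for the exposure is its estimated association with the outcome while median age is held constant, which is what "controlling or adjusting for" means in this setting.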
Age Adjustment
When the populations being compared have
very different age structures, age adjustment
may be necessary to make a fair comparison
among populations.
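One common way to age-adjust is direct standardization: apply each population's age-specific rates to a single shared "standard" age distribution. The slides do not name a method, so this choice is an assumption, and the age groups, counts, and weights below are invented.

```python
def crude_rate(cases_by_age, people_by_age):
    """Overall rate, ignoring the age structure."""
    return sum(cases_by_age.values()) / sum(people_by_age.values())


def directly_standardized_rate(cases_by_age, people_by_age, standard_weights):
    """Weight each age-specific rate by a shared standard age distribution."""
    total_weight = sum(standard_weights.values())
    weighted = sum(
        (cases_by_age[g] / people_by_age[g]) * standard_weights[g]
        for g in standard_weights
    )
    return weighted / total_weight


# Hypothetical standard population: 70% younger, 30% older.
standard = {"young": 0.7, "old": 0.3}

# Two invented populations with identical age-specific rates
# (1 per 1,000 in the young, 10 per 1,000 in the old)
# but very different age structures.
people_a = {"young": 9000, "old": 1000}
cases_a = {"young": 9, "old": 10}
people_b = {"young": 5000, "old": 5000}
cases_b = {"young": 5, "old": 50}

print(crude_rate(cases_a, people_a), crude_rate(cases_b, people_b))
print(
    directly_standardized_rate(cases_a, people_a, standard),
    directly_standardized_rate(cases_b, people_b, standard),
)
```

The crude rates differ only because population B is older; once both are weighted to the same age structure, the adjusted rates agree.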
Avoiding the Ecological Fallacy
• Correlational studies compare groups rather than
individuals.
• No individual-level data are included in the analysis,
only population-level data.
• The incorrect attribution of population-level
associations to individuals is called the ecological
fallacy.
Avoiding the Ecological Fallacy
Even though a population with a higher rate of
exposure to something has a higher rate of
disease than populations with lower exposure
rates, individuals in that population who have a
high level of exposure do not necessarily have
the disease.
Avoiding the Ecological Fallacy
The experience of an individual in a population
may vary significantly from the population
average.
It would be incorrect to assume that any one
individual from a country with a high average
body mass index (BMI) will be obese or that an
individual from a country with a low average BMI
will not be obese.
Avoiding the Ecological Fallacy
However, it is appropriate to identify trends in
populations and to use those observations to
generate hypotheses for individual-level studies
that will test for relationships between the
characteristics of interest in individuals.
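A deliberately extreme toy example can make the fallacy concrete. Every individual below is invented: the population with more exposure also has more disease, yet none of its exposed individuals is diseased.

```python
# Hypothetical individuals as (exposed, diseased) pairs. All values invented.
population_low = (
    [(True, False)] * 2 + [(False, True)] * 1 + [(False, False)] * 7
)
population_high = (
    [(True, False)] * 6 + [(False, True)] * 3 + [(False, False)] * 1
)


def share(people, index):
    """Proportion of people for whom the given field is True
    (index 0 = exposed, index 1 = diseased)."""
    return sum(1 for p in people if p[index]) / len(people)


def disease_rate_among(people, exposed):
    """Disease proportion within the exposed or the unexposed subgroup."""
    group = [p for p in people if p[0] == exposed]
    return sum(1 for p in group if p[1]) / len(group)


# Population-level view: the only data an ecological study would have.
print(share(population_low, 0), share(population_low, 1))    # 0.2 0.1
print(share(population_high, 0), share(population_high, 1))  # 0.6 0.3

# Individual-level view inside the high-exposure population.
print(disease_rate_among(population_high, exposed=True))   # 0.0
print(disease_rate_among(population_high, exposed=False))  # 0.75
```

The population-level numbers alone would suggest that exposure goes with disease, which is why such findings should only generate hypotheses for individual-level studies.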
Key Characteristics of Correlational
(Ecological) Studies
Case Series
Uses of Case Series
• Describing the characteristics of and similarities
among a group of individuals with the same signs
and/or symptoms of disease
• Identifying new syndromes and refining case
definitions
• Clarifying typical disease progression
• Developing hypotheses for future research
Sample Size
Some case series for rare conditions may
require only a few participants
Other studies may include several hundred
individuals
Getting Started…
• Select one disease or condition of interest
• Determine what will be new and interesting about
the study
• Identify an appropriate and available source of
cases
• Establish a clear case definition that spells out
inclusion criteria and exclusion criteria
Case Definitions
Specify characteristics related to:
The disease or procedure
• ICD codes (International Classification of Diseases codes) are often used as
part of the definition
Person
Place
Time
Sample Case Definitions
Data Collection
Primary data: interviews of cases using a
questionnaire and/or qualitative techniques
Secondary data from patient charts (medical
records)
• It is often helpful to create a questionnaire that guides the extraction of
information from medical records
• Be aware that patient charts are often incomplete; missing information
about a symptom does not mean that the patient did not experience it
Most case series do not require
any advanced analyses or any
numbers beyond simple counts
and frequencies.
Key Characteristics of a
Case Series
Cross-Sectional Surveys
Overview
The goal of a cross-sectional survey, also
called a prevalence study, is to measure the
proportion of a population with a particular
exposure or disease at one point in time based
on a representative sample of a population.
Cross-sectional surveys are among the
most popular study approaches in the
health sciences because they allow for
the relatively rapid collection of new
data.
Uses
Cross-sectional surveys are used to:
Describe communities
Assess population needs
Evaluate programs
Establish baseline data prior to the initiation
of longitudinal studies
Representative Populations
Cross-sectional studies use a simple study design:
The researcher asks a few hundred people to
complete a short questionnaire and then analyzes the
data.
However, there is one very important requirement: the
participants must be reasonably representative of
some larger population.
Representative Populations
If a researcher wants the results of a survey to be
generalizable to all town residents, it is NOT acceptable to
use a convenience population such as:
• Friends
• Fans attending a football game
• Shoppers at a store at a given time on a chosen day
• Individuals attending a clinic
• Pupils attending a neighbourhood school
Representative Populations
If the results of a cross-sectional survey are
intended to reflect the profile of an entire
town (or other population group), then the
study’s sampling strategy must recruit a
population that is as diverse as the town.
Analysis: Prevalence
Prevalence = the proportion of the population with a given trait at
the time of the survey
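As a small worked example of this definition, prevalence is the number of respondents with the trait divided by the number surveyed. The function name and the survey counts below are invented for illustration.

```python
def prevalence(n_with_trait, n_surveyed):
    """Proportion of surveyed people who have the trait at survey time."""
    if n_surveyed <= 0:
        raise ValueError("n_surveyed must be positive")
    if not 0 <= n_with_trait <= n_surveyed:
        raise ValueError("n_with_trait must be between 0 and n_surveyed")
    return n_with_trait / n_surveyed


# Hypothetical survey: 45 of 300 respondents report the trait.
print(f"{prevalence(45, 300):.1%}")  # 15.0%
```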
Analysis: Comparative Statistics
• Prevalence rate ratios (PRRs) compare the prevalence of a characteristic in
two population subgroups by taking a ratio of their prevalence rates
• Note: An exposure can be said to be “associated” or “related” to a disease,
but a cross-sectional survey cannot show that an exposure caused a
disease.
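The ratio described above can be sketched as follows. The function name `prevalence_ratio`, the subgroup labels, and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def prevalence_ratio(cases_a, n_a, cases_b, n_b):
    """Prevalence in subgroup A divided by prevalence in subgroup B."""
    prev_a = cases_a / n_a
    prev_b = cases_b / n_b
    if prev_b == 0:
        raise ValueError("ratio is undefined when the comparison prevalence is 0")
    return prev_a / prev_b


# Invented survey counts: 30 of 100 exposed respondents and
# 15 of 100 unexposed respondents report the disease.
prr = prevalence_ratio(30, 100, 15, 100)
print(round(prr, 2))  # 2.0
```

A ratio of 2 says the disease is twice as prevalent in the exposed subgroup at the time of the survey; it describes an association and does not show that the exposure caused the disease.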
Key Characteristics of
Cross-Sectional Surveys
Based on the textbook Introduction to Health Research Methods by K.H. Jacobsen
