Applied statistics part 3

Applied Statistics
Part 3
By:
M. H. Farjoo MD, PhD, Bioanimator
Shahid Beheshti University of Medical Sciences
Instagram: @bio_animation

Applied Statistics
Part 3
 Comparing 2 variables
 t Tests
 Comparing 3 or more variables
 ANOVA
 MANOVA
 ANCOVA
 MANCOVA
 Question
 Contingency Tables, Relative Risk, and Odds Ratio
 Chi-square
 Fisher's Exact Test
 McNemar Test
 Cochran Q Test
 Choosing Tests
 Survival Analysis

t Tests
 Comparing 1 group with a value:
 Parametric: One sample t test
 Non-parametric: Wilcoxon signed rank test
 Compare 2 groups:
 Paired:
 Parametric: Paired t test
 Non-parametric: Wilcoxon matched-pairs test
 Unpaired:
 Parametric: Unpaired t test (2 independent sample t
test)
 Non-parametric: Mann-Whitney test

Comparing 1 Group With a Value
 This Test tells whether the mean of a single variable
differs from a specified constant.
 It compares the mean of a single column of numbers
against a hypothetical mean that we assume.
 For example comparing IQ of a group with the
standard IQ (100).
 The parametric test is one sample t test.
 The non-parametric test is Wilcoxon signed rank test.
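As a rough illustration of the IQ example, the one-sample t statistic can be sketched in plain Python. The IQ scores below are hypothetical, and the function name is only illustrative; a statistics package would also report the P value.

```python
import math
import statistics

def one_sample_t(values, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)        # sample SD (n - 1 in the denominator)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error, n - 1

# Hypothetical IQ scores of a small group (illustrative data only).
iq_scores = [104, 98, 110, 95, 107, 102, 99, 113]
t_stat, df = one_sample_t(iq_scores, 100)       # compare with the standard IQ of 100
print(f"t = {t_stat:.3f}, df = {df}")
```

The t statistic is then compared with the t distribution with n - 1 degrees of freedom to obtain the P value.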

Comparing 2 Paired Groups
 It tells you if the variables in two matched groups are
distinct.
 Matching should be by the experimental design, but
not based on the variable you are comparing.
 For comparing blood pressures, it is OK to match
based on age or gender, but not OK to match based
on blood pressure.
 Parametric test is paired (or matched) t test.
 Non parametric test is Wilcoxon matched pairs test.

Comparing 2 Paired Groups
 Choose a paired test when the columns of data are
matched.
 That means that values on the same row are related to
each other, some examples:
 Measuring a variable in each subject before and after an
intervention (very common in life science).
 In matched pairs, one of the pair gets one treatment (or
placebo); the other gets an alternative treatment.
 A laboratory experiment is run in 2 time points in the same
sample.
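A minimal sketch of why pairing matters: the paired t test works on the within-row differences, so it reduces to a one-sample t test of those differences against zero. The before/after blood pressures are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    differences = [b - a for a, b in zip(first, second)]   # one difference per row
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1

# Hypothetical systolic blood pressure before and after an intervention.
before = [140, 152, 138, 147, 160, 155]
after = [135, 148, 137, 140, 151, 150]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because each subject serves as its own control, variation between subjects does not enter the standard error.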

Comparing 2 Unpaired Groups
 The unpaired t test compares the means of two
unmatched groups.
 Choose unpaired t test when the values of data on the
same row are independent of each other.
 For this test, the subjects are randomly assigned to
two groups.
 Any difference in response is due to the treatment (or
lack of treatment) and not to other factors.
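For comparison, a sketch of the unpaired t statistic, assuming the classic pooled-variance (equal variances) form. The two groups are hypothetical and the values on the same row are unrelated.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_difference / standard_error, n_a + n_b - 2

# Hypothetical responses in two independent, randomly assigned groups.
treated = [12, 15, 11, 14, 13]
control = [16, 18, 15, 17, 19]
t_stat, df = unpaired_t(treated, control)
print(f"t = {t_stat:.3f}, df = {df}")
```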

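t Tests
Hands-on practice
 To calculate paired and unpaired t tests in Excel:
 =T.TEST(array1, array2, tails, type), where type 1 is paired,
2 is unpaired with equal variances, and 3 is unpaired with unequal
variances.
 T.TEST always needs two arrays; Excel has no built-in one-sample
t test function.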
 To calculate parametric t Tests in SPSS:
 Analyze => Compare Means => One-Sample T Test...
 Analyze => Compare Means => Paired-Samples T Test...
 Analyze => Compare Means => Independent-Samples T
Test...

t Tests
Hands-on practice
 To calculate the non-parametric alternatives to t tests in SPSS:
 Analyze => Nonparametric Tests => One Sample…=>
Objective Tab => Automatically compare observed data to
hypothesized => Settings Tab => Customize test radio
button => Compare median to hypothesized…. Check box
 Analyze => Nonparametric Tests => Legacy Dialogs => 2
Related Samples...
 Analyze => Nonparametric Tests => Legacy Dialogs => 2
Independent Samples...

t Tests
Hands-on practice
 To calculate One-Sample t Test (both parametric and
Non-parametric) in Prism:
 Column (from welcome screen) => t test-one sample =>
Analyze => Column statistics => Inferences Section =>
check the appropriate check boxes and determine the
hypothetical value
 To calculate both paired and unpaired t Test (both
parametric and Non-parametric) in Prism:
 Column (from welcome screen) => t test-unpaired =>
Analyze => t tests (and nonparametric tests)

ANOVA Tests
 Variance is SD squared (SD²), and ANOVA stands for
analysis of variance.
 ANOVA is used for comparing 3 or more groups for
their mean differences.
 ANOVA tests a continuous (scale or interval)
response variable (dependent variable).
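To make the name concrete, here is a minimal sketch of the one-way ANOVA F statistic: the variance between group means divided by the variance within groups. The three drug groups are hypothetical data.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for an ordinary one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical responses to three drugs (three levels of one factor).
drug_a, drug_b, drug_c = [4, 5, 6], [6, 7, 8], [9, 10, 11]
f_ratio, df_between, df_within = one_way_anova_f([drug_a, drug_b, drug_c])
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A large F means the group means differ more than the scatter within groups would explain.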

ANOVA Tests
 One-way ANOVA
 Not matched groups:
 Parametric: ordinary one-way ANOVA test
 Non parametric: Kruskal-Wallis test
 Matched groups:
 Parametric: Repeated measures one-way ANOVA test
 Non parametric: Friedman's test
 Two-way ANOVA
 Parametric: Repeated measures two-way ANOVA test
 Non parametric: there is no consensus, use parametric

One-way ANOVA
 One-way ANOVA, is also known as one-factor, or
single-factor ANOVA.
 “One-way” means a response is affected by one
factor.
 If we measure a response to 3 different drugs, drug
treatment is just one factor.
 Since there are 3 drugs, the factor is said to have 3
levels.
 ‘Repeated measures' means we treat or observe each
subject repeatedly (more than once).

One-way ANOVA
One-way because there is one factor (level of education) but 3 levels
(college, graduate, high school).
[Figure: test score plotted for each level of education]
 Factor (independent variable): Education
 Levels of the factor (3 levels): College, Graduate, High school
 Dependent, response, or outcome variable: Test score

Repeated Measure One Way ANOVA

One-way ANOVA
 Some examples of matched designs for one-way ANOVA are:
 Measuring a variable in each subject several times: before,
during and after an intervention (repeated measure ANOVA).
 Recruiting subjects as matched groups for age, disease
severity, etc. (but NOT based on the desired variable)
 Running a laboratory experiment several times, each time
with several treatments handled in parallel.
 Again, matching should not be based on the variable you
are comparing.
 If you have data from three or more groups, do not compare
two groups at a time with t tests; the repeated tests inflate the
type I error.

One-way ANOVA
 The concept of homo- and heteroscedasticity also
applies here.
 “Sphericity” means that the variances of the differences
between all pairs of repeated measurements are equal.
 In practice it is plausible when you waited long enough
between treatments for any treatment effect to wash
away.
 This concept is not relevant for nonparametric tests,
or if your data are not matched.
 So sphericity is applicable only for repeated measure
ANOVA.

One-way ANOVA
 After ANOVA we should compare every mean with the
others, and we will have multiple P values.
 The problem is, with multiple P values, we are likely to get at
least one P < 0.05 by chance, even if all null hypotheses are true
(Type I error).
 Some tests do these comparisons and correct for the type I
error.
 They are called “post tests” or “Post-hoc tests”.
 Tukey-Kramer test is the most commonly used post test
after one-way ANOVA.
 We can decrease type I error by “planned comparisons”.

Goal                               Report also CI?   Method
Compare every mean to every        Yes               Tukey
other mean                         No                Holm-Sidak, Dunn (nonparametric)
Compare every mean to a            Yes               Dunnett, Sidak, Bonferroni
control mean                       No                Holm-Sidak
Compare selected pairs of          Yes               Bonferroni-Dunn, Sidak-Bonferroni
means (up to 40)                   No                Holm-Sidak
Linear trend? Do column means      No                Test for linear trend. Only
correlate with column order?                         available with one-way ANOVA.

One-way ANOVA
 The term planned comparison is used when:
 You focus on a few sensible comparisons rather than
every possible comparison.
 The choice of which comparisons to make, was part of
the experimental design.
 You did not do more comparisons after looking at the
data.
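To see why fewer comparisons help, the chance of at least one false positive can be sketched as 1 - (1 - alpha)^k for k comparisons. This assumes the comparisons are independent, which is only an approximation for pairwise comparisons of the same groups.

```python
from math import comb

def familywise_error(n_comparisons, alpha=0.05):
    """Chance of at least one P < alpha when every null hypothesis is true.

    Assumes independent comparisons (a simplifying assumption).
    """
    return 1 - (1 - alpha) ** n_comparisons

for n_groups in (3, 4, 5):
    n_pairs = comb(n_groups, 2)      # number of all pairwise comparisons
    print(f"{n_groups} groups, {n_pairs} comparisons: "
          f"{familywise_error(n_pairs):.3f}")
```

With 5 groups, all 10 pairwise comparisons give roughly a 40% chance of at least one false positive, while 2 planned comparisons give under 10%.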

Two-way ANOVA
 Two-way ANOVA (or two-factor ANOVA)
determines how a response is affected by two factors.
 If we measure a response to 3 different drugs in men
and women, drug is one factor, and gender is the
other.
 In the above example, two-way ANOVA answers
these questions:
 Is the response affected by drug?
 Is the response affected by gender?
 Do the two factors interact?

Two-way ANOVA
Two-way because there are 2 factors (education and gender).
[Figure: test score plotted for each level of education, separately for men and women]
 First factor (independent variable): Education
 Levels of the first factor (3 levels): College, Graduate, High school
 Second factor (independent variable): Gender
 Levels of the second factor (2 levels): Men, Women
 Dependent, response, or outcome variable: Test score

Two-way ANOVA
 Use two-way ANOVA when:
 You have one quantitative and 2 nominal variables.
 Each value of one nominal variable occurs in combination with
each value of the other nominal variable.
 It therefore tests 3 null hypotheses:
 The means of the measurement variable are equal for
different values of the first nominal variable.
 The means are equal for different values of the second
nominal variable.
 There is no interaction (the effects of one nominal variable
don't depend on the value of the other nominal variable).

Two-way ANOVA
Interaction of Factors
A significant interaction indicates that the effect that one independent
variable has on the dependent variable depends on the level of the
other independent variable.
This pattern is true for all 2 factor analyses (ANOVA, ANCOVA,
MANCOVA)

Two-way ANOVA
[Figure] Do not use this kind of graph in two-way ANOVA.
Which bar is higher: fs in females or ss in males?

Two-way ANOVA
[Figure] Use these kinds of graphs in two-way ANOVA.

ANOVA
Hands-on practice
 To calculate ANOVA in SPSS:
 For One-way ANOVA: Analyze => Compare Means => One-
Way ANOVA
 For Two-way ANOVA: Analyze => General Linear Model =>
Univariate...
 To calculate ANOVA in Prism (parametric,
nonparametric, repeated measure):
 For One- way ANOVA: Column (from welcome screen) => one
way ANOVA=> Analyze => one way ANOVA (and
nonparametric)
 For Two-way ANOVA: Grouped (from welcome screen) =>
Analyze => grouped analyses => 2 way ANOVA

MANOVA
 MANOVA stands for: Multivariate Analysis of
Variance.
 MANOVA is an ANOVA with two or more
continuous response (dependent) variables.
 MANOVA may also be one-way or two-way.
 The number of factor variables distinguishes a one-
way MANOVA from a two-way MANOVA.

One-way MANOVA
One-way because there is one factor.
MANOVA because there are 2 response variables.
 Factor (independent variable, for categorization): Level of Education
(High school, College, Graduate)
 Responses (dependent or outcome variables, which we are looking for):
Test score, Annual Income

Two-way MANOVA
Two-way because there are 2 factors.
MANOVA because there are 2 response variables.
 Factors (independent variables): Gender; Level of Education
(High school, College, Graduate)
 Responses (dependent variables, which we are interested in):
Test score, Annual Income

ANCOVA
 ANCOVA stands for: Analysis of Covariance.
 A variable that confounds the results is called a
covariate; hence the name ANCOVA.
 ANCOVA is used to control or remove the effects of
covariate(s) [also called confounding variable(s)].
 ANCOVA is the most suitable tool for pre-, and post-
tests in two independent groups (explained later).
 Similar to ANOVA, there are 2 types of ANCOVA:
one-way and two-way ANCOVA.

One-way ANCOVA
One-way because there is one factor.
ANCOVA because there is / are covariate(s).
 Factor: Level of Education (High school, College, Graduate)
 Response (dependent or outcome variable): Test score
 Covariate(s) or confounding variable(s): Number of Hours Spent Studying
The covariate(s) are not considered a “factor” and do not determine
one-way or two-way test.

One-way ANCOVA
Example 1
Sys. BP. = Systolic blood pressure
 Independent variable or factor for categorization: Exercise 1,
Exercise 2, Exercise 3
 Covariate(s) or confounding variable: Sys. BP. before exercise
(measured in each group)
 Dependent / outcome / response variable, which we are interested in:
Sys. BP. after exercise (measured in each group)

One-way ANCOVA
Example 2
 Independent variable or factor for categorization: Exercise 1,
Exercise 2, Control
 Covariate(s) or confounding variable: Cholesterol before exercise
(measured in each group)
 Dependent / outcome / response variable: Cholesterol after exercise
(measured in each group)

Two-way ANCOVA
 The two-way ANCOVA is similar to two-way
ANOVA but controls for covariate(s).
 Two-way ANCOVA determines:
 The effect of the first factor on the response variable.
 The effect of the second factor on the response variable.
 The interaction between the two factors on the response
variable.
 Two-way ANCOVA is used for two types of study:
observational, and experimental study.

Two-way ANCOVA
Observational Example
 Independent variables or factors for categorization (Gender & Anxiety):
Gender (Male, Female) and Anxiety (Low, Moderate, High), giving 6 groups
 Covariate(s) or confounding variable: Revision time (measured in each
of the 6 groups)
 Dependent / outcome / response variable: Exam score (measured in each
of the 6 groups)

Two-way ANCOVA
Experimental Example 1
 Independent variables or factors for categorization (Drug & Treatment):
Drug (Drug A (Placebo), Drug B) and Treatment (Rest (Control), Exercise,
Diet), giving 6 groups
 Covariate(s) or confounding variable: Weight (measured in each of the
6 groups)
 Dependent / outcome / response variable: Cholesterol (measured in each
of the 6 groups)
Note that the covariate(s) may, or may not, be the same as the dependent
variable.

Two-way ANCOVA
Experimental Example 2
 Independent variables or factors for categorization (Diet & Exercise):
Diet (With Diet, No Diet) and Exercise (Low, Moderate, High), giving
6 groups
 Covariate(s) or confounding variable: Weight (measured in each of the
6 groups)
 Dependent / outcome / response variable: Cholesterol (measured in each
of the 6 groups)
Note that the covariate(s) may, or may not, be the same as the dependent
variable.

MANCOVA
 MANCOVA stands for: Multivariate Analysis of
Covariance.
 MANCOVA incorporates features of both MANOVA
and ANCOVA.
 MANCOVA tests two or more response (dependent)
variables, and controls for one or more covariates.
 Again, there are one-way, and two-way MANCOVA.

One-way MANCOVA
One-way because there is one factor.
MANCOVA because there are 2 response variables and one covariate.
 Factor: Level of Education (High school, College, Graduate)
 Responses: Test score, Annual Income
 Covariate or confounding variable: Number of Hours Spent Studying
The covariate(s) are NOT considered a “factor” and do NOT determine
one-way or two-way test.

One-way MANCOVA
One-way because there is one factor (exercise with 3 levels: low, moderate,
high), and MANCOVA because there are 3 response variables (cholesterol,
CRP, systolic blood pressure), and one covariate (weight).

Two-way MANCOVA
Two-way because there are 2 factors (exercise with 3 levels: low, moderate,
high; gender with 2 levels: male and female), and MANCOVA because there
are 3 response variables (cholesterol, CRP, systolic blood pressure), and one
covariate (weight).

Question
• The content of this slide and the following slides on this question was
presented by Mr. Abolfazl Ghoodjani and is taken from this website:
https://graphpad.ir/repeated-measures-or-ancova/
• Consider the following study design:
• We have two groups, control and experimental: some subjects are in
the case group and others are in the control group.
• We measure one response quantity (the dependent variable) in both
the control and the experimental group.
• The measurement is taken once before the intervention (Pre) and once
after the intervention (Post). That is, subjects in both groups are
measured twice: before and after.
• Now the important question is: how should this study be analyzed
statistically?

Question
[Diagram: Case pre-test => Intervention => Case post-test; Control pre-test
=> Control post-test; case and control are independent groups]
This study is a combination of paired, and independent
patterns. Which statistical test should we choose?

Question
• The dilemma over which method to use arises because we have only
two measurement times (before and after).
• When there are more than two measurement times (for example before,
after, and follow-up), we use the Repeated Measure method.
• So which method?
• In this type of study, with before and after measurements and two
groups (control and experimental), we use either ANCOVA or Repeated
Measure.
• Which one we choose depends on our research question and hypothesis.

Question
• If we want to compare the mean post-test of the experimental and
control groups while the pre-test scores are controlled for, ANCOVA is
a good solution.
• In other words, if we want to be sure that the difference between the
post-tests is really due to the intervention, ANCOVA is recommended.
• So when the research question is about the difference in means after
the test, ANCOVA is an excellent choice.
• This situation is very common in medical studies, because the focus is
mostly on the size of the treatment effect.

Question
• If the research question is whether the mean change between before
and after differs between the control and experimental groups, then
Repeated Measure is preferable.
• In other words, when the research question is about gain, growth, or
change, Repeated Measure analysis is recommended.
• In a nutshell:
– If you want to compare the post-tests (the after measurements) of the
control and experimental groups, use ANCOVA.
– If you want to compare the change between the before and after
measurements in the control and experimental groups, Repeated Measure
is suggested.

Question
[Diagram: Case pre-test => Intervention => Case post-test; Control pre-test
=> Control post-test; the candidate analyses are ANCOVA and Repeated
Measure]

Contingency Tables (Crosstabs)
 Contingency tables (crosstabs) examine the relationship
between two categorical (nominal or ordinal) variables.
 Since the nominal variable often (not always) has 2
outcomes, it is called a “binomial” or “binary” variable.
 Some examples of outcome (binomial variable) are:
 Disease / no disease
 Pass / fail
 Artery open / artery obstructed
 Survive or not
 Metastasis or not

 Contingency tables are heavily used in basic science
experiments.
 The rows represent alternative treatments or events,
and the columns tabulate alternative outcomes.
 Crosstabs are NOT a statistical test, and are used
when an outcome has only two (or a few)
possibilities.
 The appropriate test is chosen after arranging data in
contingency tables.

It is claimed that there
is an association
between living near
Electromagnetic
Fields (EMF) and
leukemia.
In how many ways
can we investigate
whether this
association really
exists?

 In crosstabs, data may be arranged for 4 types of
studies.
 The sequence of choosing data (especially the first
step) is crucial in the design and outcome of the
study.
 The 4 types of study in crosstabs are:
1. Cross-sectional
2. Experimental (Clinical trial)
3. Prospective (Cohort)
4. Retrospective (Case-Control)

Cross-sectional Study
[Flowchart: a large sample from a population => exposure assessment =>
exposed / non-exposed => leukemia assessment in each => leukemic / healthy]

                        Leukemia
Exposed to EMF          Affected    Not Affected
Not Exposed to EMF      Affected    Not Affected
EMF: Electromagnetic Fields
1. Choose a large sample of people selected from the general population.
2. Assess whether or not each subject has been exposed to high levels of EMF.
3. Then check the subjects to see whether or not they have leukemia.
4. It would not be a cross-sectional study if you selected subjects based on EMF exposure or
presence of leukemia.

Experimental (Clinical trial) Study
[Flowchart: a group of animals => deliberate exposure / deliberate
non-exposure => follow up => leukemia assessment in each => leukemic / healthy]

                                        Leukemia
Half deliberately exposed to EMF        Affected    Not Affected
Half deliberately not exposed to EMF    Affected    Not Affected
1. Half of the animals are exposed to EMF, while half are not.
2. After a period of time, assess leukemia in the animals.
3. Affected and not affected animals make the columns.

Prospective (Cohort) Study
[Flowchart: known exposed individuals / known non-exposed individuals =>
follow up => leukemia assessment in each => leukemic / healthy]

                            Leukemia
Known exposed to EMF        Affected    Not Affected
Known non-exposed to EMF    Affected    Not Affected
1. Select one group with no exposure to EMF and another group with high
exposure.
2. Then follow all subjects and tabulate the numbers that get leukemia.
3. Subjects affected are in one column; the rest in the other column.

Retrospective (Case-Control) Study
[Flowchart: known leukemic individuals (cases) / known healthy individuals
(controls) => exposure assessment in each => exposed / non-exposed]

                                Leukemia
Exposed to low levels of EMF    Known Affected (Case)    Known Non-Affected (Control)
Exposed to high levels of EMF   Known Affected (Case)    Known Non-Affected (Control)
1. Recruit one group of subjects with leukemia and a control group that does not
have leukemia but is otherwise similar.
2. Then assess high or low EMF exposure in all subjects.
3. Enter the number with low exposure in one row, and the number with high
exposure in the other row.

 Crosstabs show the association between exposure and
outcome.
 Crosstabs do not determine the intensity of the association
per se.
 To quantify intensity of association we may calculate:
 Relative risk or Risk ratio (RR)
 Absolute risk reduction (ARR)
 Odds ratio (OR)
 What is the difference between odds ratio and relative
risk?

Relative Risk or Risk Ratio (RR)
Relative risk is the risk of an
adverse event in the exposed
group relative to unexposed
group.
The left group is exposed to
treatment, but the right
group is not exposed.
The risk of an adverse
outcome is shown in brown.
The relative risk is reduced
by 50% (RR = 0.5)

Relative Risk or Risk Ratio (RR)
            Progress   No Progress   Total   Risk of Progress   Relative Risk
AZT         76         399           475     0.16               0.57
Placebo     129        332           461     0.28
Total       205        731           936
AZT: azidothymidine (zidovudine)
The risk is the first column of data (whatever it is) divided by the
row total.

            No Progress   Progress   Total   Risk of No Progress   Relative Risk
AZT         399           76         475     0.84                  1.17
Placebo     332           129        461     0.72
Total       731           205        936
For calculating the relative risk, the order of the two columns matters:
swapping them changes which outcome the risk describes. Swapping the
rows only inverts the ratio (RR becomes 1/RR).
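The arithmetic behind the AZT table can be sketched as follows; the counts come from the table, while the function and variable names are only illustrative.

```python
def risk(events, total):
    """Proportion of a group that has the outcome."""
    return events / total

# Counts from the AZT table: (progress, total) per row.
azt_progress, azt_total = 76, 475
placebo_progress, placebo_total = 129, 461

risk_azt = risk(azt_progress, azt_total)                  # 76 / 475
risk_placebo = risk(placebo_progress, placebo_total)      # 129 / 461
relative_risk = risk_azt / risk_placebo

# Same table with the columns swapped: risk of NOT progressing.
rr_no_progress = (risk(azt_total - azt_progress, azt_total)
                  / risk(placebo_total - placebo_progress, placebo_total))

print(f"Risk with AZT     = {risk_azt:.2f}")
print(f"Risk with placebo = {risk_placebo:.2f}")
print(f"RR (progress)     = {relative_risk:.2f}")
print(f"RR (no progress)  = {rr_no_progress:.2f}")
```

Note that the two relative risks (0.57 and 1.17) are not reciprocals of each other, which is why the column order matters.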

Absolute Risk Reduction (ARR)
Absolute risk reduction is the
difference between the risk of
an adverse event in the exposed
group and in the unexposed group.
The left group is exposed to
treatment, but the right
group is not exposed.
The risk of an adverse
outcome is shown in brown.
The absolute risk is reduced
by 25% (ARR = 0.25)
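A small sketch contrasting the two measures. The risks 0.25 (exposed) and 0.50 (unexposed) are an assumption inferred from the stated values RR = 0.5 and ARR = 0.25; the figure itself is not reproduced here.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the two risks."""
    return risk_exposed / risk_unexposed

def absolute_risk_reduction(risk_exposed, risk_unexposed):
    """Absolute difference of the two risks."""
    return abs(risk_unexposed - risk_exposed)

# Assumed risks, chosen to be consistent with RR = 0.5 and ARR = 0.25.
risk_exposed, risk_unexposed = 0.25, 0.50
rr = relative_risk(risk_exposed, risk_unexposed)
arr = absolute_risk_reduction(risk_exposed, risk_unexposed)
print(f"RR = {rr:.2f}, ARR = {arr:.2f}")

# The same RR with a rarer outcome gives a much smaller ARR.
rare_rr = relative_risk(0.01, 0.02)
rare_arr = absolute_risk_reduction(0.01, 0.02)
print(f"RR = {rare_rr:.2f}, ARR = {rare_arr:.2f}")
```

The second case shows why a relative risk alone can overstate a benefit when the baseline risk is low.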

Relative Risk Vs. Absolute Risk

Odds Ratio (OR)
 Odds ratio quantifies the strength of association between
an exposure and an outcome by comparing the odds of exposure
in cases with the odds of exposure in controls.
 Odds ratio is most commonly used in case-control
studies.
 It may also be used in cross-sectional and cohort
studies.

Odds Ratio in a Case-Control Study
                              Cases (Lung cancer)   Controls (No lung cancer)
Smoked (Exposed)              688                   650
Never smoked (Not Exposed)    21                    59
• The odds of a case being a smoker (exposed cases): 688/21 = 32.8
• The odds of a control being a smoker (exposed controls): 650/59 = 11.0
• The odds ratio (exposed cases to exposed controls): 32.8 / 11.0 = 3.0
• The odds of lung cancer in smokers are about 3 times the odds in never-smokers.
Because lung cancer is rare, this approximates the relative risk.
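The odds ratio from the smoking table can be sketched as below. The counts are from the table; the variable names are illustrative.

```python
# Counts from the case-control table.
cases_smoked, controls_smoked = 688, 650
cases_never, controls_never = 21, 59

odds_exposure_cases = cases_smoked / cases_never            # 688 / 21
odds_exposure_controls = controls_smoked / controls_never   # 650 / 59
odds_ratio = odds_exposure_cases / odds_exposure_controls

# Equivalent "cross-product" form: (a * d) / (b * c).
cross_product = (cases_smoked * controls_never) / (controls_smoked * cases_never)

print(f"Odds of exposure in cases    = {odds_exposure_cases:.1f}")
print(f"Odds of exposure in controls = {odds_exposure_controls:.1f}")
print(f"Odds ratio                   = {odds_ratio:.1f}")
```

The cross-product form gives the same value, which is why the odds ratio does not depend on how many cases and controls were recruited.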

Odds Ratio (OR) Vs. Relative Risk (RR)
• Odds Ratio (OR):
• Odds of being exposed among the cases to odds of being exposed among
the controls
• OR compares based on event (usually a disease)
• Relative Risk (RR):
• Risk of outcome in the exposed group to risk of outcome in the
unexposed group
• RR compares based on exposure (usually a risk factor)

Interpretation:
RR > 1
Risk in exposed > Risk in non-
exposed
Positive association; causal?
(maybe, maybe not)
RR = 1
Risk in exposed = Risk in non-
exposed
No association
RR < 1
Risk in exposed < Risk in non-
exposed
Negative association;
protective? (maybe, maybe not)
Interpretation:
OR > 1
Exposure is positively related to
disease
Positive association: causal?
(maybe, maybe not)
OR = 1
Exposure is not related to
disease
No association: independent
OR < 1
Exposure is negatively related to
disease
Negative association:
protective? (maybe, maybe not)

 Relative risk and odds ratio both give the same
information, just on different scales.
 In clinical studies the parameter of greatest interest is
often the risk (RR).
 However, frequently the available data only allows
the computation of the OR.

Chi-square Test
 Chi-square test is applied for 2 categorical (ordinal or
nominal) variables from a single population.
 It determines whether there is an association between
levels of the 2 categorical variables.
 It will NOT tell you anything about the nature of
relationship (eg: strength) between them.
 For example:
 Is gender related to voting preference?
 Is satisfaction category related to marital status?

Chi-square
Voting Preferences
Total
Republican Democrat Independent
Male 200 150 50 400
Female 250 300 50 600
Total 450 450 100 1000
Is gender related to voting preference?

Chi-square Test
 In contingency tables, the “observed” frequencies are
compared with the “expected” frequencies (O vs. E).
 The comparison is performed by “chi-square” test
and a P value is reported.
 The null hypothesis (H0) dictates one or some of the
followings:
 Proportions are not different in 2 (or more) groups.
 The observed and expected values are not different.
 Alternative treatments do not cause alternative results.
 The rows and columns are not related / associated.

Chi-square Test
 Chi-square test is used in 2 main contexts:
1. To compare the observed distribution of 1 nominal
variable with an expected distribution from a known
(external) theory.
 we know the proportions before collecting the data, eg:
1:1 sex ratio, or 1:2:1 ratio in a genetic cross.
 The expected frequencies are theoretical, and we do not
really believe the observed and expected values match
exactly.
2. To analyze a contingency table with 2 or more
nominal variables, and expected values are computed
from (internal) data.

Observed Expected
Right Bill 1752 1823.5 (3647* 0.5)
Left Bill 1895 1823.5 (3647* 0.5)
Total 3647
P=0.018, so there are significantly more left-billed, and H0 is rejected.
Chi-square from a known theory

Is this dice fair?

Outcome Frequencies from a Dice after 36 Rolls
Outcome   Observed   Expected   (E − O)² / E
1         8          6          0.667
2         5          6          0.167
3         9          6          1.500
4         2          6          2.667
5         7          6          0.167
6         5          6          0.167
The value of Chi Square is 5.333. The probability (P value) of a Chi Square of 5.333 or
larger is 0.377. Therefore, the null hypothesis that the dice is fair cannot be rejected.
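The dice calculation can be reproduced with a short sketch. The helper `chi2_sf` is a hypothetical name; it uses the closed-form upper tail of the chi-square distribution for whole-number degrees of freedom, so no external library is assumed.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for chi-square with integer dof."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

observed = [8, 5, 9, 2, 7, 5]                 # counts from the dice table
expected = [sum(observed) / 6] * 6            # a fair dice: 36 / 6 = 6 each
chi_square = sum((e - o) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1                       # 6 outcomes - 1
p_value = chi2_sf(chi_square, dof)
print(f"chi-square = {chi_square:.3f}, dof = {dof}, P = {p_value:.3f}")
```

The same helper applied to the right-bill / left-bill counts (chi-square about 5.61, 1 degree of freedom) gives P close to 0.018.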

Frequencies for Diet and Health Study.
Diet            Cancers   Heart Disease   Healthy   Total
American        15        49              239       303
Mediterranean   7         22              273       302
Total           22        71              512       605
Note that the row totals and the column totals add up to the same grand
total (605). This means each subject was counted only once, so the
observations are independent. This is an important assumption of the
chi-square test.

Observed and Expected Frequencies (expected values are in parentheses)
Diet            Cancers      Heart Disease   Healthy        Total
American        15 (11.02)   49 (35.56)      239 (256.42)   303
Mediterranean   7 (10.98)    22 (35.44)      273 (255.58)   302
Total           22           71              512            605
Example (expected cancers on the Mediterranean diet):
22 / 605 = 0.0364
0.0364 × 302 = 10.98
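A sketch of how the expected values and the chi-square statistic for the diet table could be computed. Expected counts are row total × column total / grand total; the P value line uses the fact that, for 2 degrees of freedom only, the upper tail equals exp(-chi-square / 2).

```python
import math

observed = [
    [15, 49, 239],    # American diet: cancers, heart disease, healthy
    [7, 22, 273],     # Mediterranean diet
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under "rows and columns are not associated".
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)    # (rows - 1) * (columns - 1)
p_value = math.exp(-chi_square / 2)                   # valid only because dof == 2

for row in expected:
    print([round(value, 2) for value in row])
print(f"chi-square = {chi_square:.2f}, dof = {dof}, P = {p_value:.5f}")
```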

Fisher's Exact Test
 For crosstabs with only two rows and two columns,
Fisher's test is the best choice as it gives the exact P
value.
 It is better than chi-square test when:
 Any number in crosstab is less than 6, and / or
 Total sample size is less than 1000
 Fisher's test is the counterpart of the unpaired t test, but for
binomial variables / outcomes.
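A minimal sketch of how an exact P value arises for a 2×2 table: with the row and column totals fixed, every possible table has a hypergeometric probability, and the two-sided P value sums the probabilities of all tables no more likely than the observed one. The counts are hypothetical, and other two-sided conventions exist.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def table_probability(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)   # tolerance for ties
    )

# Hypothetical small table: 3 of 4 treated improve, 1 of 4 controls improve.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p_value:.4f}")
```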

McNemar Test
 McNemar test is used when the two samples are
dependent, so the observations come in matched pairs,
such as:
 Individuals before and after a treatment.
 Individuals diagnosed using two different techniques.
 McNemar test is the counterpart of the paired t test (pretest-
posttest study designs) but for binomial variables /
outcomes.
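As a sketch, the McNemar statistic uses only the discordant pairs, the subjects whose outcome changed. The counts below are hypothetical, and the version shown is the simple form without continuity correction.

```python
import math

def mcnemar(changed_to_positive, changed_to_negative):
    """Return (chi-square statistic, P value) from the two discordant counts."""
    b, c = changed_to_positive, changed_to_negative
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical pretest-posttest data: 15 subjects changed from negative to
# positive and 5 changed from positive to negative; unchanged pairs are ignored.
statistic, p_value = mcnemar(15, 5)
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```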

Cochran Q Test
 Cochran's Q test determines differences on a binomial
variable between three or more related groups.
 It is used when data from 2×2 tables are repeated at
different times or locations.
 There are 3 variables in Cochran's Q: two variables of
the 2×2 table, and the third for the repeats.
 Cochran's Q test is used to analyze longitudinal study
designs with multiple different treatments.
 It is similar to the one-way repeated measures
ANOVA, but for a binomial variable.

Crosstabs & related Tests
Hands-on practice
 To calculate crosstabs in Excel:
 Calculate manually!
 To calculate crosstabs, chi square, Odds ratio, and
Relative risk in SPSS:
 Analyze => Descriptive Statistics => Crosstabs…
 To calculate crosstabs, chi square, Odds ratio, and
Relative risk in Prism:
 Contingency (from welcome screen) => Analyze =>
choose appropriate option

Choosing Tests
Rows: goal. Columns: type of data.
Goal                                        Measurement (normal)      Rank, Score, or Measurement (non normal)   Binomial (Two Possible Outcomes)
Compare one group to a hypothetical value   One-sample t              Wilcoxon                                   Chi-square
Compare two unpaired groups                 Unpaired t                Mann-Whitney                               Fisher's test (chi-square for large samples)
Compare two paired groups                   Paired t                  Wilcoxon                                   McNemar's test
Compare three or more unmatched groups      One-way ANOVA             Kruskal-Wallis                             Chi-square test
Compare three or more matched groups        Repeated-measures ANOVA   Friedman                                   Cochran's Q

Survival Analysis
 Survival curves measure follow-up time from a
starting point to the occurrence of an event eg:
 Time until death.
 Time to first metastasis of a tumor.
 Time until marriage after graduation.
 Time for restoration of renal function.
 Time of discharge from a hospital.
 The goal is usually to determine whether a treatment
changes survival.
 The null hypothesis indicates the treatment did not
change survival.

Survival Analysis
 Survival curves are estimated for each group by
Kaplan-Meier method.
 Kaplan-Meier estimates the probability of survival
past given time points eg:
 Abstinence time from alcohol between those receiving
brief intervention and standard care.
 Survival between chemotherapy before and after
gastric surgery.
 Then, the curves are compared by log-rank test eg:

Survival Analysis
Abstinence Time from Alcohol

Survival Analysis
Chemotherapy Before and After Surgery

Survival Analysis
Goal                          Survival Time
Describe one group            Kaplan-Meier
Compare two unpaired groups   Log-rank test or Mantel-Haenszel

Survival Analysis
 Sometimes we have to Censor data because:
 Some subjects may still be alive at the end of the study.
 Persons dropped out of the study (eg: took a medication
disallowed).
 In both cases, information about these patients is said to
be censored.
 Every subject in a survival study either dies (marries, etc)
or is censored.
 The survival methods are only useful if the horizontal
axis is time, and you know the survival time for each
subject.
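A minimal sketch of how the Kaplan-Meier estimate handles censoring: censored subjects count as "at risk" until they leave, but never as deaths. The follow-up data are hypothetical.

```python
def kaplan_meier(observations):
    """Return [(time, survival probability)] at each time with an event.

    observations: list of (time, event) pairs, where event is True for a
    death (or other event) and False for a censored subject.
    """
    survival = 1.0
    curve = []
    at_risk = len(observations)
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, event in observations if time == t and event)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            survival *= 1 - deaths / at_risk     # conditional survival at time t
            curve.append((t, survival))
        at_risk -= leaving                       # deaths and censored both leave
    return curve

# Hypothetical follow-up times in months; False marks a censored subject.
follow_up = [(2, True), (3, False), (4, True), (5, True), (6, False)]
curve = kaplan_meier(follow_up)
for time, probability in curve:
    print(f"month {time}: S = {probability:.3f}")
```

The subject censored at month 3 lowers the number at risk for month 4 without producing a step in the curve.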

Survival Analysis
 Hazard is the slope of the survival curve or a measure
of how rapidly subjects are dying.
 If the hazard ratio is 2, then the rate of deaths in one
group is twice the other group.
 The hazard ratio is not computed at one point, but
from all the data in the survival curve.
 If two survival curves cross, the hazard ratios are not
consistent.

Survival Analysis
Hands-on practice
 To calculate survival analysis in SPSS:
 Analyze => Survival => Life Tables...
 Analyze => Survival => Kaplan-Meier...
 To calculate survival analysis in Prism:
 Survival (from welcome screen) => Choose
appropriate option (data is automatically analyzed)

Thank you
Any question?
mh_farjoo@yahoo.com

Magnitude of the Difference
 Risk (or Rate): the percentage of people in a group who have
developed a complication.

Effect of above-normal progesterone on breast cancer
                     Healthy   Breast cancer
High progesterone    40        10
Low progesterone     49        1
• Risk of cancer in the high-progesterone group? 20%
• This defines the EER (Experimental Event Rate).
• Risk of cancer in the low-progesterone group? 2%

Ways to calculate the magnitude of the difference?
I. Difference of the two risks
II. Ratio of the two risks

Ratio of the two risks = EER/CER = relative risk (RR)
                     Healthy   Breast cancer
High progesterone    40        10
Low progesterone     49        1
What is the RR in this study? 10: positive association

Effect of aspirin on gastric ulcer
 EER = 10/50 = 20%
 CER = 8/50 = 16%
 RR = 20/16 = 1.25: positive association
               Healthy   Gastric ulcer
Aspirin use    40        10
No aspirin     42        8

Effect of giving an antihypertensive drug on complications
(stroke-MI-death)
 EER = 2/100 = 2%
 CER = 28/100 = 28%
 RR = 2/28 = 0.07: negative association
             Without complication   With complication
Drug use     98                     2
No drug      69                     28

Effect of smoking on lung cancer
 What is the RR in this study?
(30/40) / (20/60) = 2.25
 After doubling the size of the control group:
(30/50) / (20/100) = 3.0
              Healthy   Lung cancer
Smoker        10        30
Non-smoker    40        20

 Odds: the ratio of the number of times an event occurs to the number
of times it does not occur.
 Odds in the cancer group = 30/20 = 1.5
 Odds in the non-cancer group = 10/40 = 0.25
Odds ratio: the ratio of the odds in the group with the complication to
the odds in the group without the complication.
 Odds ratio = 1.5/0.25 = 6
              Healthy   Lung cancer
Smoker        10        30
Non-smoker    40        20

Difference of the two risks:
 CER: baseline risk
 |EER-CER|: the risk due to the exposure itself
|EER-CER| = Absolute Risk Reduction (ARR) = ARI
[Figure: level of risk in the exposure and non-exposure groups, with the
baseline risk marked]

             Without complication   With complication
Drug use     98                     2
No drug      69                     28
ARR = | 2-27 | = 25%

               Healthy   Gastric ulcer
Aspirin use    40        10
No aspirin     42        8
ARR = | 20-16 | = 4%
