This document discusses multivariate analysis and the relationship between smoking and lung cancer. It provides several key studies that established this relationship:
- A 1950 case-control study that associated lung cancer with smoking.
- An 1898 study finding elevated lung tumours in tobacco workers exposed to tobacco dust.
- Later studies in the 1930s-1950s further strengthened the relationship by showing higher rates of lung cancer in heavy smokers.
Smoking & lung cancer
Good case-control study associating lung cancer with smoking (Wynder EL, Graham E. Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma: a study of 684 proven cases. JAMA 1950;143:329-36.)
Tobacco dust (not smoke) might be causing the elevated incidence of lung tumours among German tobacco
workers. (Hermann Rottmann in Würzburg 1898)
Difference Causal
Smoking might be related to lung cancer, but lung cancer is still rare (Adler I. Primary Malignant
Growths of the Lungs and Bronchi. London: Longmans, 1912:22)
86 lung cancer patients were likely to have smoked (Müller FH. Tabakmissbrauch und Lungencarcinom. Zeitschrift für Krebsforschung 1939;49:57–85.)
Smoking 35 cigarettes per day increases the risk 40 times (Doll R, Hill AB. The mortality of doctors in relation to their smoking habits. BMJ 1954;1:1451–5.)
Animal study associating cigarette smoke tar with cancer (Wynder E,
Graham EA, Croninger AB. Experimental production of carcinoma with
cigarette tar. Cancer Res 1953;13:855–66)
8November2016(C)JamalludinAbRahman2015
It is about relationship
Analysis of the relationships between two or more variables.
Multivariate?
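Multivariate: a general term for analyses with multiple independent variables (IV)
May involve multiple dependent variables (DV)
[Diagram: several IVs pointing to one DV, and several IVs pointing to several DVs]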
Exercise & fitness
[Chart: Fitness level (y-axis) against Exercise intensity (x-axis): Low, Moderate, High]
Is there any difference (%) between Low & Moderate intensity?
How big is the difference (%) between Low & Moderate?
Is there any pattern now?
What is your conclusion?
The 3rd factor can be a...
1. Confounder
2. Mediator or intervening factor
3. Moderator or effect modifier (interaction)
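Of the three roles, a confounder is the easiest to illustrate numerically. The sketch below is a simulation with made-up effect sizes (the names `z`, `x`, `y` and every coefficient are hypothetical, not taken from any study cited here). A third factor `z` drives both `x` and `y`, so the crude slope of `y` on `x` is clearly non-zero even though `x` has no direct effect, and the slope shrinks toward zero once `z` is added to the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z is a confounder that drives both x and y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.5 * z + rng.normal(size=n)   # note: x has no direct effect on y

def ols(columns, outcome):
    """Least-squares coefficients; an intercept column is added first."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

crude_slope = ols([x], y)[1]        # model: y ~ x
adjusted_slope = ols([x, z], y)[1]  # model: y ~ x + z

print(f"crude slope of x:    {crude_slope:.2f}")
print(f"adjusted slope of x: {adjusted_slope:.2f}")
```

A mediator or a moderator would need a different simulation (a causal chain, or an interaction term); this sketch covers only the confounding case.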
Stress vs. MS vs. Coping mechanism
[Diagram: Stress → new multiple sclerosis lesions, with coping mechanism acting on that relationship]
Mohr, D. C., Goodkin, D. E., Nelson, S., Cox, D., & Weiner, M. (2002).
Moderating Effects of Coping on the Relationship Between Stress and the
Development of New Brain Lesions in Multiple Sclerosis. Psychosom Med,
64(5), 803-809.
OR = 1.62, p = 0.009
Coping: distraction (OR = 0.69, p = 0.009), instrumental (OR = 0.77, p = 0.081), emotional preoccupation (OR = 1.46, p = 0.088) & palliative (NS)
Why multivariate?
Multi-factorial: which are the significant factors?
Multiple outcomes
Multiple units of measurement
Exploration of associations
Regression
Most common multivariate technique
Best line to fit the data - OLS
Many types:
Linear regression
Logistic regression
& many others!!
[Plot: y against x with the fitted line; the line crosses the y-axis at $\beta_0$]
$y = \beta_0 + \beta_1 x$
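Under OLS, the "best line" is the one that minimises the sum of squared vertical distances between the points and the line. A minimal sketch using the closed-form estimates, with made-up `x` and `y` values chosen only for illustration:

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for y = b0 + b1 * x.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals are the vertical distances that OLS squares and sums.
residuals = y - (b0 + b1 * x)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.4f}")
```

The same estimates can be obtained with `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.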
Regression equation
e.g. Linear regression
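$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$
$Y$ = Dependent variable
$\beta_0$ = Intercept
$\beta_1$ = Coefficient for variable $x_1$
$x_1$ = Explanatory variable
$\varepsilon$ = Error/Residual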
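With more than one explanatory variable, the same least-squares idea is usually written in matrix form. The sketch below simulates data from a made-up model (the coefficients 2.0, 1.5 and -0.7 are hypothetical) and recovers them with `numpy.linalg.lstsq`; the column of ones in the design matrix plays the role of the intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated explanatory variables and a known, made-up model.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)          # error term
y = 2.0 + 1.5 * x1 - 0.7 * x2 + eps

# Design matrix: a column of ones for the intercept, then x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated [b0, b1, b2]:", np.round(beta_hat, 2))
```

The estimates should land close to the values used to generate the data, with the gap shrinking as `n` grows.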
Logistic regression
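$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$
$P$ = probability that $Y = 1$, i.e. the event occurs
$1 - P$ = probability that the event does not occur
$\ln\left(\frac{P}{1-P}\right)$ = Ln of the odds = Logit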
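The logit links a probability to the linear predictor, and exponentiating a coefficient gives the odds ratio for a one-unit increase in that variable. A small sketch with hypothetical coefficients `b0` and `b1` (illustrative values, not from any fitted model):

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Probability corresponding to a log-odds value z."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
b0, b1 = -2.0, 0.5

p_at_0 = inverse_logit(b0 + b1 * 0)
p_at_1 = inverse_logit(b0 + b1 * 1)

# Odds ratio for a one-unit increase in x equals exp(b1).
odds_0 = p_at_0 / (1 - p_at_0)
odds_1 = p_at_1 / (1 - p_at_1)
print(f"odds ratio from probabilities: {odds_1 / odds_0:.4f}")
print(f"exp(b1):                       {math.exp(b1):.4f}")
```

The two printed values agree, which is why logistic regression results are commonly reported as odds ratios.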
R² = 0.986, meaning about 99% of the variation in arterial blood pressure (ABP) is explained by Age, Body Weight, Pulse Rate & Stress (F(4,15) = 291.948, P < 0.001)
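For a linear model with an intercept, the overall F statistic and R² are linked by the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of predictors. The sketch below inverts that identity using the reported F(4, 15). It assumes the reported F is this ordinary overall F test; because the reported figures are rounded, the result should agree with the reported R² only approximately.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Invert F = (R2 / df_model) / ((1 - R2) / df_resid) to recover R2."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)

# Values reported above: F(4, 15) = 291.948.
r2 = r_squared_from_f(291.948, df_model=4, df_resid=15)
print(f"R2 implied by the F statistic: {r2:.3f}")
```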
Main Result
Arterial BP = 17.3 + 0.6(Age) + 0.9(Body weight) + 0.09(Pulse rate) + 0.01(Stress)
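A fitted equation like this can be used directly for prediction, and each coefficient reads as the expected change in the outcome per one-unit change in that variable with the others held fixed. A minimal sketch; the function name and the subject's values are hypothetical, and the units of each variable are not stated in the source.

```python
def predict_arterial_bp(age, body_weight, pulse_rate, stress):
    """Apply the fitted equation reported above."""
    return (17.3 + 0.6 * age + 0.9 * body_weight
            + 0.09 * pulse_rate + 0.01 * stress)

# Hypothetical subject; the source does not state the units of each variable.
baseline = predict_arterial_bp(age=50, body_weight=90, pulse_rate=70, stress=30)
one_year_older = predict_arterial_bp(age=51, body_weight=90, pulse_rate=70, stress=30)

print(f"predicted arterial BP: {baseline:.1f}")
print(f"change for one extra year of age: {one_year_older - baseline:.1f}")
```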
Example #2
Snoring and risk of cardiovascular disease in women. Hu 2000.
From The Nurses’ Health Study. Cohort. Baseline, N = 71,779 women 40 to 65 years old and without diagnosed CVD or cancer in 1986. Followed until 31 May 1994.
CVD = Snoring + Age + Smoking + BMI + Alcohol + Physical Activity + Menopausal status + Family history of MI + DM + Cholesterol + Hours sleeping + Sleeping position
Influential data
[Plot: y against x with a fitted line and three marked points A, B and C]
A = Outlier, still within the range of x, large residual value
B & C = Leverage points
B = Good leverage. It won’t impact the regression line
C = Bad leverage. It will change the regression line
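The three cases can be reproduced with a small numerical sketch. Leverage is computed here as the diagonal of the hat matrix, which for simple linear regression is 1/n + (x - mean(x))² / Σ(x - mean(x))². All data points, and the positions chosen for A, B and C, are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def leverage(x):
    """Diagonal of the hat matrix for simple linear regression."""
    x = np.asarray(x, dtype=float)
    return 1 / len(x) + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

# Made-up base data lying close to the line y = x.
base_x = np.arange(1.0, 9.0)
base_y = base_x + np.array([0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05])

# One extra point per scenario, mirroring A, B and C above.
extra = {"A": (4.5, 10.0), "B": (15.0, 15.0), "C": (15.0, 2.0)}

slopes = {"base": fit_line(base_x, base_y)[1]}
lev = {}
for name, (px, py) in extra.items():
    x = np.append(base_x, px)
    y = np.append(base_y, py)
    slopes[name] = fit_line(x, y)[1]
    lev[name] = leverage(x)[-1]   # leverage of the added point
    print(f"{name}: leverage = {lev[name]:.2f}, slope = {slopes[name]:.2f}")
```

In this sketch A has low leverage and leaves the slope essentially unchanged (it mainly shifts the intercept), B has high leverage but sits on the trend, and C has the same high leverage and pulls the slope well away from the base fit.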
Types of multivariate tests
(Cont = continuous, Cat = categorical)

| Dependent Variables | Independent Variables | Test |
| --- | --- | --- |
| 1 – Cont | ≥ 2 – All Cont | Linear Regression |
| 1 – Cont | ≥ 2 – All Cat | ANOVA |
| 1 – Cont | ≥ 2 – Cont + Cat | ANCOVA |
| > 1 – Cont | All Cat | MANOVA |
| > 1 – Cont | Cat + Cont | MANCOVA |
| 1 – Dichotomous | ≥ 2 – Cont + Cat | Binary Logistic Regression |
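The table can be read as a lookup from variable types to a test. A minimal sketch of that lookup follows; the function name `suggest_test` and its argument encoding are hypothetical, and the number of independent variables is not checked. For a dichotomous outcome it returns binary logistic regression for any mix of independent-variable types, which is a simplifying assumption since the table lists only the mixed case.

```python
def suggest_test(n_dv, dv_type, iv_types):
    """Look up a test from the table above.

    n_dv     -- number of dependent variables
    dv_type  -- "cont" or "dichotomous"
    iv_types -- independent-variable types, drawn from {"cont", "cat"}
    """
    iv_types = frozenset(iv_types)
    if dv_type == "dichotomous" and n_dv == 1:
        # Assumption: applies to any IV mix, not only Cont + Cat.
        return "Binary Logistic Regression"
    if dv_type == "cont" and n_dv == 1:
        return {
            frozenset({"cont"}): "Linear Regression",
            frozenset({"cat"}): "ANOVA",
            frozenset({"cont", "cat"}): "ANCOVA",
        }.get(iv_types)
    if dv_type == "cont" and n_dv > 1:
        return {
            frozenset({"cat"}): "MANOVA",
            frozenset({"cont", "cat"}): "MANCOVA",
        }.get(iv_types)
    return None  # combination not covered by the table

print(suggest_test(1, "cont", {"cont", "cat"}))
print(suggest_test(2, "cont", {"cat"}))
```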