Multivariate Analysis
Prof. Dr. Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine

Introduction to Multivariate analysis

The updated slides on Introduction to Multivariate Analysis

Published in: Education
  1. Multivariate Analysis. Prof. Dr. Jamalludin Ab Rahman MD MPH, Department of Community Medicine, Kulliyyah of Medicine
  2. Smoking & lung cancer. Diagram labels: Difference; Causal.
     • Good case-control study associating lung cancer with smoking (Wynder EL, Graham E. Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma: a study of 684 proven cases. JAMA 1950;143:329-36.)
     • Tobacco dust (not smoke) might be causing the elevated incidence of lung tumours among German tobacco workers (Hermann Rottmann, Würzburg, 1898)
     • Smoking might be related to lung cancer, but lung cancer is still rare (Adler I. Primary Malignant Growths of the Lungs and Bronchi. London: Longmans, 1912:22)
     • 86 lung cancer patients were likely to have smoked (Müller FH. Tabakmissbrauch und Lungencarcinom. Zeitschrift für Krebsforschung 1939;49:57–85.)
     • Smoking 35 cigarettes per day increases the risk 40-fold (Doll R, Hill AB. The mortality of doctors in relation to their smoking habits. BMJ 1954;1:1451–5.)
     • Animal study associating cigarette smoke tar with cancer (Wynder E, Graham EA, Croninger AB. Experimental production of carcinoma with cigarette tar. Cancer Res 1953;13:855–66)
     8 November 2016 (C) Jamalludin Ab Rahman 2015
  3. It is about relationship: analysis of the relationships between two or more variables.
  4. Multivariate? • "Multivariate" is a general term for analysis with multiple independent variables (IV) • May involve multiple dependent variables (DV). [Diagram: IVs linked to DVs]
  5. [Diagram labels: Outcome; Exposure (five boxes); Effect modifier or Moderator; Confounder; Mediator]
  6. (No text on this slide.)
  7. (No text on this slide.)
  8. Exercise & fitness. [Chart: fitness level by exercise intensity (Low, Moderate, High)] Is there any difference (%) between Low and Moderate intensity? How big is the difference (%) between Low and Moderate? Is there any pattern now? What is your conclusion?
  9. Physical activity & blood pressure. Data as (Time in minutes/week, SBP in mmHg): (30, 140), (40, 145), (85, 130), (90, 143), (100, 130), (110, 120), (110, 110), (120, 120), (130, 110), (130, 109), (140, 98), (150, 100), (140, 110), (135, 120), (160, 100), (160, 96), (170, 100), (200, 89), (200, 100), (240, 80). Fitted line: y = -0.3287x + 155.89, R² = 0.8508. [Scatter plot: SBP (mmHg) against Time (minutes/week)]
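The trend line on slide 9 can be checked by hand. The sketch below is a minimal pure-Python least-squares fit of the 20 (time, SBP) pairs listed on the slide; the function name `ols_fit` is illustrative and does not come from the slides.

```python
# Data transcribed from slide 9: physical activity (minutes/week) and SBP (mmHg).
time = [30, 40, 85, 90, 100, 110, 110, 120, 130, 130,
        140, 150, 140, 135, 160, 160, 170, 200, 200, 240]
sbp = [140, 145, 130, 143, 130, 120, 110, 120, 110, 109,
       98, 100, 110, 120, 100, 96, 100, 89, 100, 80]


def ols_fit(x, y):
    """Return (slope, intercept, r_squared) for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = ols_fit(time, sbp)
print(f"SBP = {slope:.4f} * time + {intercept:.2f}, R^2 = {r_squared:.4f}")
```

With the transcribed data, the fit agrees with the equation and R² printed on the slide to the precision shown there.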
  10. The 3rd variable. [Diagram labels: Exposure; Exposure; Outcome]
  11. The 3rd factor can be a: (1) confounder; (2) mediator or intervening factor; (3) moderator or effect modifier (interaction).
  12. The 3rd variable: Confounder. A confounder influences a relationship (between two variables) but is not part of the pathway. [Diagram labels: Exposure; Exposure; Outcome; Confounder]
  13. (No text on this slide.)
  14. The 3rd variable: Moderator. Moderation occurs when an exposure has different effects on disease at different values of another variable (interaction). [Diagram labels: Exposure; Exposure; Outcome; Moderator]
  15. (No text on this slide.)
  16. Stress vs. MS vs. coping mechanism. [Diagram labels: Stress; Coping mech.; Multiple sclerosis new lesions] OR = 1.62, p = 0.009. Distraction (OR = 0.69, p = 0.009), instrumental (OR = 0.77, p = 0.081), emotional preoccupation (OR = 1.46, p = 0.088) & palliative (NS). Source: Mohr, D. C., Goodkin, D. E., Nelson, S., Cox, D., & Weiner, M. (2002). Moderating Effects of Coping on the Relationship Between Stress and the Development of New Brain Lesions in Multiple Sclerosis. Psychosom Med, 64(5), 803-809.
  17. The 3rd variable: Mediator. A mediator influences a relationship (between two variables) and is also part of the pathway. [Diagram labels: Exposure; Exposure; Outcome; Mediator]
  18. (No text on this slide.)
  19. (No text on this slide.)
  20. Why multivariate? • Multi-factorial: which are the significant factors? • Multiple outcomes • Multiple units of measurement • Exploration of associations
  21. Regression • Most common multivariate technique • Finds the best line to fit the data by ordinary least squares (OLS) • Many types: linear regression, logistic regression & many others! [Plot: y against x with fitted line $y = \beta_0 + \beta_1 x$ and intercept $\beta_0$]
  22. Regression equation, e.g. linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$, where $Y$ is the dependent variable, $\beta_0$ the intercept, $\beta_1$ the coefficient for variable $x_1$, $x_1$ an explanatory variable and $\varepsilon$ the error (residual).
  23. Example #1: Arterial BP = Constant + Age + Body weight + Pulse rate + Stress + Residual, following $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$
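  24. Logistic regression • $\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$ • $P$ = probability that $Y = 1$, i.e. the event occurs • $1 - P$ = probability that the event does not occur • $\ln\left(\frac{P}{1-P}\right)$ = ln of the odds = logit; each $e^{\beta_i}$ is the odds ratio (OR) for a one-unit increase in $x_i$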
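To make the logit on slide 24 concrete, the following minimal sketch converts between probabilities and log-odds and turns a coefficient into an odds ratio. The coefficient values and function names are hypothetical, chosen only for illustration; they are not from the slides.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(log_odds):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds))


def predicted_probability(intercept, coefficients, values):
    """P(Y = 1) for one subject from a logistic model's linear predictor."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return inverse_logit(log_odds)


# Hypothetical coefficients for illustration only (not from the slides).
b0, b1 = -2.0, 0.5
odds_ratio = math.exp(b1)  # odds ratio per one-unit increase in x1
p = predicted_probability(b0, [b1], [4.0])  # log-odds = -2.0 + 0.5 * 4.0 = 0
print(round(p, 3), round(odds_ratio, 3))
```

A log-odds of 0 corresponds to a probability of 0.5, which is why the example subject lands exactly at even odds.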
  25. R² = 0.986, meaning 98.6% (about 99%) of the variation in arterial BP (ABP) is explained by Age, Body Weight, Pulse Rate & Stress (F(4,15) = 291.948, P < 0.001).
  26. Main result: Arterial BP = 17.3 + 0.6(Age) + 0.9(Body weight) + 0.09(Pulse rate) + 0.01(Stress)
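The fitted equation on slide 26 can be applied directly to obtain a predicted value. The sketch below uses the rounded coefficients from the slide; the example subject is hypothetical, and the slides do not state the units of each predictor, so the inputs are purely illustrative.

```python
def predict_arterial_bp(age, body_weight, pulse_rate, stress):
    """Apply the slide 26 equation (coefficients as rounded on the slide)."""
    return 17.3 + 0.6 * age + 0.9 * body_weight + 0.09 * pulse_rate + 0.01 * stress


# Hypothetical subject; units are an assumption, not given in the slides.
estimate = predict_arterial_bp(age=50, body_weight=90, pulse_rate=70, stress=30)
print(round(estimate, 1))

# Each coefficient is the change in predicted arterial BP for a one-unit
# increase in that predictor while the others are held constant.
older = predict_arterial_bp(age=51, body_weight=90, pulse_rate=70, stress=30)
print(round(older - estimate, 2))
```

For this subject the terms are 17.3 + 30 + 81 + 6.3 + 0.3, and raising age by one unit shifts the prediction by the age coefficient, 0.6.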
  27. Example #2 • Snoring and risk of cardiovascular disease in women (Hu 2000), from the Nurses' Health Study. Cohort study; baseline N = 71,779 women aged 40 to 65 years without diagnosed CVD or cancer in 1986, followed until 31 May 1994. • CVD = Snoring + Age + Smoking + BMI + Alcohol + Physical Activity + Menopausal status + Family history of MI + DM + Cholesterol + Hours sleeping + Sleeping position
  28. (No text on this slide.)
  29. (No text on this slide.)
  30. Testing assumptions for linear regression
     1. Linearity between DV & IVs: scatter plot of residuals vs. predicted values
     2. Normality: histogram of residuals
     3. No outliers: casewise diagnostics (within ±2 SD), Cook's D (for influential points, < 1), leverage (< 4/n)
     4. Homogeneity: scatter plot of residuals vs. predicted values
     5. Independence (no autocorrelation): Durbin-Watson between 1.5 and 2.5 (the statistic itself ranges from 0 to 4, with values near 2 indicating no autocorrelation)
     6. No multicollinearity: tolerance > 0.1, VIF < 10
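Two of the diagnostics on slide 30 are simple enough to compute by hand. The sketch below uses the standard definitions of the Durbin-Watson statistic and of VIF and tolerance; the residual sequences and the R² value are hypothetical inputs, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def tolerance(r_squared_j):
    """Tolerance of predictor j, where r_squared_j is the R^2 obtained by
    regressing predictor j on all the other predictors."""
    return 1 - r_squared_j


def vif(r_squared_j):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r_squared_j)


# Hypothetical residual sequences, for illustration only.
alternating = [1, -1, 1, -1]   # sign flips every step (negative autocorrelation)
clustered = [1, 1, -1, -1]     # neighbours tend to share a sign
print(durbin_watson(alternating), durbin_watson(clustered))

# A predictor that is 90% explained by the others sits right at the cut-offs
# quoted on the slide (tolerance 0.1, VIF 10).
print(round(tolerance(0.9), 3), round(vif(0.9), 3))
```

Values above 2 point towards negative autocorrelation and values below 2 towards positive autocorrelation, which is how the two toy sequences behave.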
  31. Residuals. Residual = Observed - Predicted. [Plot: y against x with fitted line $y = \beta_0 + \beta_1 x$ and intercept $\beta_0$]
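As a small worked case of the residual definition, the sketch below reuses the fitted line quoted on slide 9 (SBP = -0.3287 × time + 155.89) and three of the data points listed there. The function name is illustrative.

```python
def predict_sbp(time_minutes):
    """Fitted line quoted on slide 9."""
    return -0.3287 * time_minutes + 155.89


# Three (time, observed SBP) pairs taken from the slide 9 data.
observed = [(30, 140), (40, 145), (240, 80)]

# Residual = Observed - Predicted
residuals = [sbp - predict_sbp(t) for t, sbp in observed]
for (t, sbp), r in zip(observed, residuals):
    print(f"time={t}: observed={sbp}, predicted={predict_sbp(t):.2f}, residual={r:.2f}")
```

A negative residual means the point lies below the fitted line, and a positive residual means it lies above.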
  32. Residual statistics - Linearity. [Two plots of residuals against predicted values: linear vs. non-linear]
  33. Residual statistics - Normality. [Two plots of residuals against predicted values: normal distribution vs. not normal]
  34. Residual statistics - Heterogeneity. [Two plots of residuals against predicted values: homogeneous vs. heterogeneous]
  35. Influential data. [Plot: y against x with points A, B and C] A = outlier: still within the range of x, but with a large residual. B & C = leverage points. B = good leverage: it won't affect the regression line. C = bad leverage: it will change the regression line.
  36. Types of multivariate tests
     Dependent variables | Independent variables | Test
     1, continuous | ≥ 2, all continuous | Linear regression
     1, continuous | ≥ 2, all categorical | ANOVA
     1, continuous | ≥ 2, continuous + categorical | ANCOVA
     > 1, continuous | All categorical | MANOVA
     > 1, continuous | Categorical + continuous | MANCOVA
     1, dichotomous | ≥ 2, continuous + categorical | Binary logistic regression
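The table on slide 36 amounts to a lookup from variable types to a test. The sketch below encodes only the six rows shown and returns `None` for any other combination; the function name and the string labels ("cont", "cat", "dichotomous") are illustrative choices, not terms fixed by the slides.

```python
def choose_test(n_dv, dv_type, iv_types):
    """Look up the test suggested by the slide 36 table.

    n_dv: number of dependent variables
    dv_type: "cont" or "dichotomous"
    iv_types: list with one "cont" or "cat" entry per independent variable
    Returns the test name, or None if the combination is not in the table.
    """
    kinds = set(iv_types)
    mixed = kinds == {"cont", "cat"}
    if dv_type == "dichotomous":
        if n_dv == 1 and len(iv_types) >= 2 and mixed:
            return "Binary Logistic Regression"
        return None
    if dv_type != "cont":
        return None
    if n_dv == 1 and len(iv_types) >= 2:
        if kinds == {"cont"}:
            return "Linear Regression"
        if kinds == {"cat"}:
            return "ANOVA"
        if mixed:
            return "ANCOVA"
    if n_dv > 1:
        if kinds == {"cat"}:
            return "MANOVA"
        if mixed:
            return "MANCOVA"
    return None


# Example #1 from the slides: one continuous outcome, four continuous predictors.
print(choose_test(1, "cont", ["cont", "cont", "cont", "cont"]))
```

Returning `None` for unlisted combinations keeps the sketch from implying recommendations the table does not make.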
