Introduction to
Logistic Regression
Rachid Salmi,
Jean-Claude Desenclos,
Thomas Grein,
Alain Moren
Oral contraceptives (OC) and
myocardial infarction (MI)
Case-control study, unstratified data
OC MI Controls OR
Yes 693 320 4.8
No 307 680 Ref.
Total 1000 1000
Oral contraceptives (OC) and
myocardial infarction (MI)
Case-control study, unstratified data
Smoking MI Controls OR
Yes 700 500 2.3
No 300 500 Ref.
Total 1000 1000
Smokers
OC MI Controls OR
Yes 517 160 6.0
No 183 340 Ref.
Total 700 500
Nonsmokers
OC MI Controls OR
Yes 176 160 3.0
No 124 340 Ref.
Total 300 500
Odds ratio for OC adjusted for smoking = 4.5
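The crude, stratum-specific and adjusted odds ratios above can be checked by hand. The slide does not name the adjustment method; the sketch below assumes a Mantel-Haenszel summary odds ratio over the two smoking strata, which reproduces the quoted 4.5. All function and variable names are illustrative.

```python
# Each 2x2 table: (exposed cases, exposed controls, unexposed cases, unexposed controls)
STRATA = {
    "smokers": (517, 160, 183, 340),
    "nonsmokers": (176, 160, 124, 340),
}


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio (assumed adjustment method)."""
    numerator = denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


crude_or = odds_ratio(693, 320, 307, 680)        # unstratified OC table
adjusted_or = mantel_haenszel_or(STRATA.values())

print(f"Crude OR: {crude_or:.1f}")
for name, table in STRATA.items():
    print(f"OR among {name}: {odds_ratio(*table):.1f}")
print(f"OR adjusted for smoking: {adjusted_or:.1f}")
```

The adjusted value lies between the two stratum-specific odds ratios and slightly below the crude one.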
[Figure: epidemic curve. Number of cases (y-axis, 0 to 10) by day of onset (x-axis, days 13 to 27); legend: one case]
Cases of gastroenteritis among residents of a nursing
home, by date of onset, Pennsylvania, October 1986
Protein suppl. Total Cases AR(%) RR
YES 29 22 76 3.3
NO 74 17 23
Total 103 39 38
Cases of gastroenteritis among residents of a nursing home according to
protein supplement consumption, Pa, 1986
Sex-specific attack rates of gastroenteritis
among residents of a nursing home, Pa, 1986
Sex Total Cases AR(%) RR & 95% CI
Male 22 5 23 Reference
Female 81 34 42 1.8 (0.8-4.2)
Total 103 39 38
Attack rates of gastroenteritis
among residents of a nursing home,
by place of meal, Pa, 1986
Meal Total Cases AR(%) RR & 95% CI
Dining room 41 12 29 Reference
Bedroom 62 27 44 1.5 (0.9-2.6)
Total 103 39 38
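The attack rates and risk ratios in the protein supplement, sex and place-of-meal tables can be recomputed from the counts. The slides do not say how the 95% confidence intervals were obtained; the sketch below assumes the usual log-based (Katz) interval, which agrees with the intervals shown for sex and place of meal. Names are illustrative.

```python
import math


def risk_ratio_ci(cases_exp, total_exp, cases_ref, total_ref, z=1.96):
    """Risk ratio with a log-based 95% CI (assumed method, not stated on the slides)."""
    rr = (cases_exp / total_exp) / (cases_ref / total_ref)
    se_log_rr = math.sqrt(
        1 / cases_exp - 1 / total_exp + 1 / cases_ref - 1 / total_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# (cases exposed, total exposed, cases reference, total reference)
TABLES = {
    "protein supplement (yes vs no)": (22, 29, 17, 74),
    "sex (female vs male)": (34, 81, 5, 22),
    "place of meal (bedroom vs dining room)": (27, 62, 12, 41),
}

for name, (ce, te, cr, tr) in TABLES.items():
    rr, lower, upper = risk_ratio_ci(ce, te, cr, tr)
    print(
        f"{name}: AR {100 * ce / te:.0f}% vs {100 * cr / tr:.0f}%, "
        f"RR = {rr:.1f} (95% CI {lower:.1f}-{upper:.1f})"
    )
```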
Age – specific attack rates of gastroenteritis
among residents of a nursing home, Pa, 1986
Age group Total Cases AR(%)
50-59 2 1 50
60-69 9 2 22
70-79 28 9 32
80-89 45 17 38
90+ 19 10 53
Total 103 39 38
Attack rates of gastroenteritis
among residents of a nursing home,
by floor of residence, Pa, 1986
Floor Total Cases AR (%)
One 12 3 25
Two 32 17 53
Three 30 7 23
Four 29 12 41
Total 103 39 38
Multivariate analysis
• Multiple models
– Linear regression
– Logistic regression
– Cox model
– Poisson regression
– Loglinear model
– Discriminant analysis
– ......
• Choice of the tool according to the objectives,
the study, and the variables
Simple linear regression
Age SBP Age SBP Age SBP
22 131 41 139 52 128
23 128 41 171 54 105
24 116 46 137 56 145
27 106 47 111 57 141
28 114 48 115 58 153
29 123 49 133 59 157
30 117 49 128 63 155
32 122 50 183 67 176
33 99 51 130 71 172
35 121 51 133 77 178
40 147 51 144 81 217
Table 1 Age and systolic blood pressure (SBP) among 33 adult women
[Figure: scatter plot of SBP (mm Hg, 80 to 220) against age (years, 20 to 90), with the fitted regression line]
adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
$\mathrm{SBP} = 81.54 + 1.222 \times \mathrm{Age}$
Simple linear regression
• Relation between 2 continuous variables (SBP and age)
• Regression coefficient β1
– Measures association between y and x
– Amount by which y changes on average when x changes by
one unit
– Least squares method
$y = \alpha + \beta_1 x_1$ (slope: $\beta_1$)
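As a check on the least squares method, the fit can be recomputed from Table 1. This is a minimal sketch using the closed-form estimates for one predictor; the function name is illustrative, and the data are typed in from the table as extracted. If the table is transcribed faithfully, the result should be close to the fitted line quoted on the scatter-plot slide.

```python
# Table 1: age (years) and systolic blood pressure (mm Hg) of 33 adult women
AGE = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
       41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
       52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
SBP = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
       139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
       128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def least_squares(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


alpha, beta1 = least_squares(AGE, SBP)
print(f"SBP = {alpha:.2f} + {beta1:.3f} x Age")
```

The slope is read as the average change in SBP (mm Hg) per additional year of age.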
Multiple linear regression
• Relation between a continuous variable and a set of
i continuous variables
• Partial regression coefficients βi
– Amount by which y changes on average
when xi changes by one unit
and all the other xi remain constant
– Measures association between xi and y adjusted for all other xi
• Example
– SBP versus age, weight, height, etc
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Multiple linear regression
Names for y             Names for x1 ... xi
Predicted               Predictor variables
Response variable       Explanatory variables
Outcome variable        Covariables
Dependent               Independent variables
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Logistic regression (1)
Age CD Age CD Age CD
22 0 40 0 54 0
23 0 41 1 55 1
24 0 46 0 58 1
27 0 47 0 60 1
28 0 48 0 60 0
30 0 49 1 62 1
30 0 49 0 65 1
32 0 50 1 67 1
33 0 51 0 71 1
35 1 51 1 77 1
38 0 52 0 81 1
Table 2 Age and signs of coronary heart disease (CD)
How can we analyse these data?
• Compare mean age of diseased and non-diseased
– Non-diseased: 38.6 years
– Diseased: 58.7 years (p<0.0001)
• Linear regression?
Dot-plot: Data from Table 2
[Figure: signs of coronary disease (No/Yes) plotted against age (years, 0 to 100)]
Logistic regression (2)
Table 3 Prevalence (%) of signs of CD according to age group
Age group   No. in group   Diseased (no.)   Diseased (%)
20 - 29 5 0 0
30 - 39 6 1 17
40 - 49 7 2 29
50 - 59 7 4 57
60 - 69 5 4 80
70 - 79 2 2 100
80 - 89 1 1 100
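Table 3 is Table 2 grouped into ten-year age bands. A short sketch of that grouping, with illustrative names, shows how the prevalence column is obtained from the individual records:

```python
# Table 2: (age in years, signs of coronary disease: 1 = yes, 0 = no)
TABLE_2 = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]


def prevalence_by_decade(records):
    """Return rows of (start of age band, number in group, diseased, % diseased)."""
    groups = {}
    for age, diseased in records:
        band = age // 10 * 10
        n, d = groups.get(band, (0, 0))
        groups[band] = (n + 1, d + diseased)
    return [(band, n, d, round(100 * d / n)) for band, (n, d) in sorted(groups.items())]


for band, n, d, pct in prevalence_by_decade(TABLE_2):
    print(f"{band} - {band + 9}: {d}/{n} diseased ({pct}%)")
```

Grouping turns the 0/1 outcome into proportions that rise with age, which is what motivates modelling a probability rather than the raw outcome.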
Dot-plot: Data from Table 3
[Figure: diseased % (0 to 100) plotted against age group (axis 0 to 8)]
Logistic function (1)
[Figure: logistic curve; probability of disease (0.0 to 1.0) plotted against x]
$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
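$\ln\left[\dfrac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$
Transformation: the left-hand side is the logit of P(y|x)
$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
α = log odds of disease
in unexposed
β = log odds ratio associated
with being exposed
$e^{\beta}$ = odds ratio
$\dfrac{P(y \mid x)}{1 - P(y \mid x)}$ = odds of disease given x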
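The link between the logistic function, the logit and the odds ratio can be illustrated numerically. The sketch below is an assumption-laden example: it plugs in α and β computed from the unstratified OC/MI table at the start of the deck, with x = 1 for OC users and x = 0 for non-users. In a case-control sample α reflects the sampling fractions, so only β is interpretable; the function names are illustrative.

```python
import math


def logistic(alpha, beta, x):
    """P(y|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))


def logit(p):
    """Log odds: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Illustrative values from the unstratified OC/MI table (cases/controls)
alpha = math.log(307 / 680)                 # log odds among the unexposed (x = 0)
beta = math.log((693 * 680) / (307 * 320))  # log odds ratio for exposure

p_unexposed = logistic(alpha, beta, 0)
p_exposed = logistic(alpha, beta, 1)

# The logit is linear in x: one unit of x adds beta to the log odds
difference = logit(p_exposed) - logit(p_unexposed)
print(f"beta = {beta:.3f}, logit difference = {difference:.3f}")
print(f"odds ratio = e^beta = {math.exp(beta):.1f}")
```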
Fitting equation to the data
• Linear regression: Least squares
• Logistic regression: Maximum likelihood
• Likelihood function
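– Estimates parameters α and β
– Practically easier to work with log-likelihood
$L(\alpha, \beta) = \ln\left[l(\alpha, \beta)\right] = \sum_{i=1}^{n} \left\{ y_i \ln\left[\pi(x_i)\right] + (1 - y_i) \ln\left[1 - \pi(x_i)\right] \right\}$
where l is the likelihood, L the log-likelihood, and $\pi(x_i) = P(y \mid x_i)$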
Maximum likelihood
• Iterative computing
– Choice of an arbitrary value for the coefficients (usually 0)
– Computing of log-likelihood
– Variation of coefficients’ values
– Reiteration until maximisation (plateau)
• Results
– Maximum Likelihood Estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
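The iterative procedure can be sketched on the Table 2 data. The slide does not say how the coefficients are varied at each step; the example below assumes Newton-Raphson updates, one common choice, starting from 0 for both coefficients. Function names and the stopping tolerance are illustrative.

```python
import math

# Table 2: (age in years, signs of coronary disease: 1 = yes, 0 = no)
TABLE_2 = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]


def predict(alpha, beta, x):
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def log_likelihood(alpha, beta, data):
    # y*ln(p) + (1-y)*ln(1-p) simplifies to y*eta - ln(1 + e^eta)
    return sum(y * (alpha + beta * x) - math.log1p(math.exp(alpha + beta * x))
               for x, y in data)


def fit_logistic(data, max_iter=25, tol=1e-10):
    alpha = beta = 0.0                        # arbitrary starting values
    history = [log_likelihood(alpha, beta, data)]
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = predict(alpha, beta, x)
            w = p * (1.0 - p)
            g0 += y - p                       # score for alpha
            g1 += (y - p) * x                 # score for beta
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        history.append(log_likelihood(alpha, beta, data))
        if abs(step_a) < tol and abs(step_b) < tol:
            break                             # plateau reached
    return alpha, beta, history


alpha, beta, history = fit_logistic(TABLE_2)
print(f"alpha = {alpha:.3f}, beta = {beta:.4f}, OR per year = {math.exp(beta):.3f}")
print(f"log-likelihood: {history[0]:.3f} -> {history[-1]:.3f}")
print(f"P(CD | age 50) = {predict(alpha, beta, 50):.2f}")
```

Statistical packages use the same idea; the last line shows the second result listed above, an estimated probability for a given value of x.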
Multiple logistic regression
• More than one independent variable
– Dichotomous, ordinal, nominal, continuous …
• Interpretation of βi
– Increase in log-odds for a one unit increase in xi with all
the other xi constant
– Measures association between xi and log-odds adjusted
for all other xi
$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Statistical testing
• Question
– Does model including given independent variable
provide more information about dependent variable than
model without this variable?
• Three tests
– Likelihood ratio statistic (LRS)
– Wald test
– Score test
Likelihood ratio statistic
• Compares two nested models
Log(odds) = α + β1x1 + β2x2 + β3x3 (model 1)
Log(odds) = α + β1x1 + β2x2 (model 2)
• LR statistic
-2 log (likelihood model 2 / likelihood model 1) =
-2 log (likelihood model 2) minus -2 log (likelihood model 1)
LR statistic follows a χ² distribution with DF = number of extra parameters
in the larger model (here model 1 has one extra parameter, so DF = 1)
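To turn an LR statistic into a p-value it is compared with the χ² distribution. The sketch below is a standard-library implementation of the χ² upper tail for integer degrees of freedom; the function names are illustrative, and the test values are the statistics reported in the software output slides later in this deck.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """-2 log(likelihood reduced / likelihood full), from the two log-likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        p, a = math.erfc(math.sqrt(half)), 0.5   # exact tail for df = 1
    else:
        p, a = math.exp(-half), 1.0              # exact tail for df = 2
    # Recurrence Q(a + 1, h) = Q(a, h) + h^a e^(-h) / Gamma(a + 1) adds 2 df per step
    while 2 * a < df:
        p += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return p


# Statistics quoted in the output slides of this deck
print(f"LR = 15.4889, 7 df: p = {chi2_upper_tail(15.4889, 7):.4f}")
print(f"Score = 17.1727, 7 df: p = {chi2_upper_tail(17.1727, 7):.4f}")
print(f"LR = 34.8068, 8 df: p = {chi2_upper_tail(34.8068, 8):.6f}")
```

A small p-value means the larger model carries more information about the outcome than the nested one.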
Coding of variables (2)
• Nominal variables or ordinal with unequal
classes:
– Tobacco smoked: no=0, grey=1, brown=2, blond=3
– Model assumes that OR for blond tobacco
= (OR for grey tobacco)³
– Use indicator variables (dummy variables)
Indicator variables: Type of tobacco
• Neutralises artificial hierarchy between classes in the
variable "type of tobacco"
• No assumptions made
• 3 variables (3 df) in model using same reference
• OR for each type of tobacco adjusted for the others in
reference to non-smoking
Dummy variables
Tobacco consumption Grey Brown Blond
Blond 0 0 1
Brown 0 1 0
Grey 1 0 0
None 0 0 0
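The coding in the table can be written as a small helper. This is a sketch with illustrative names: each tobacco type gets its own 0/1 variable and non-smokers, coded 0 on all three, serve as the common reference.

```python
LEVELS = ("Grey", "Brown", "Blond")   # one indicator variable per tobacco type
REFERENCE = "None"                    # reference category: all indicators 0


def indicator_row(category):
    """Return the (Grey, Brown, Blond) indicator values for one subject."""
    if category != REFERENCE and category not in LEVELS:
        raise ValueError(f"unknown tobacco category: {category!r}")
    return tuple(int(category == level) for level in LEVELS)


for category in ("Blond", "Brown", "Grey", "None"):
    print(category, indicator_row(category))
```

With this coding each coefficient is the log odds ratio of one tobacco type against non-smoking, with no ordering imposed between types.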
Reference
• Hosmer DW, Lemeshow S. Applied logistic
regression. Wiley & Sons, New York, 1989
Logistic regression
Synthesis
Salmonella enteritidis
Protein supplement
S. Enteritidis
gastroenteritis
Sex
Floor
Age
Place of meal
Blended diet
• Unconditional Logistic Regression
Term Odds Ratio 95% C.I. Coef. S. E. Z-Statistic P-Value
AGG (2/1) 1,6795 0,2634 10,7082 0,5185 0,9452 0,5486 0,5833
AGG (3/1) 1,7570 0,3249 9,5022 0,5636 0,8612 0,6545 0,5128
Blended (Yes/No) 1,0345 0,3277 3,2660 0,0339 0,5866 0,0578 0,9539
Floor (2/1) 1,6126 0,2675 9,7220 0,4778 0,9166 0,5213 0,6022
Floor (3/1) 0,7291 0,0991 5,3668 -0,3159 1,0185 -0,3102 0,7564
Floor (4/1) 1,1137 0,1573 7,8870 0,1076 0,9988 0,1078 0,9142
Meal 1,5942 0,4953 5,1317 0,4664 0,5965 0,7819 0,4343
Protein (Yes/No) 9,0918 3,0219 27,3533 2,2074 0,5620 3,9278 0,0001
Sex 1,3024 0,2278 7,4468 0,2642 0,8896 0,2970 0,7665
CONSTANT * * * -3,0080 2,0559 -1,4631 0,1434
• Unconditional Logistic Regression
Term Odds Ratio 95% C.I. Coefficient S. E. Z-Statistic P-Value
Age 1,0234 0,9660 1,0842 0,0231 0,0294 0,7848 0,4326
Blended (Yes/No) 1,0184 0,3220 3,2207 0,0183 0,5874 0,0311 0,9752
Floor (2/1) 1,6440 0,2745 9,8468 0,4971 0,9133 0,5443 0,5862
Floor (3/1) 0,7132 0,0972 5,2321 -0,3379 1,0167 -0,3324 0,7396
Floor (4/1) 1,0708 0,1522 7,5322 0,0684 0,9953 0,0687 0,9452
Meal 1,6561 0,5236 5,2379 0,5045 0,5875 0,8587 0,3905
Protein (Yes/No) 8,7678 2,9521 26,0403 2,1711 0,5554 3,9091 0,0001
Sex 1,1957 0,2135 6,6981 0,1787 0,8791 0,2033 0,8389
CONSTANT * * * -4,2896 2,8908 -1,4839 0,1378
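The columns of this output are linked by simple formulas: the odds ratio is the exponential of the coefficient, the Z-statistic is the coefficient divided by its standard error (Wald test), and the confidence limits are exp(coefficient ± z × S.E.). The sketch below assumes z = 1.96 and a two-sided normal p-value, and checks the protein supplement row (coefficient 2.1711, S.E. 0.5554); names are illustrative and small differences come from rounding in the printed output.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Odds ratio, 95% CI, Wald Z and two-sided p-value from a coefficient and its S.E."""
    z = coef / se
    return {
        "odds_ratio": math.exp(coef),
        "lower": math.exp(coef - z_crit * se),
        "upper": math.exp(coef + z_crit * se),
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided normal tail
    }


protein = wald_summary(2.1711, 0.5554)
print(
    f"OR = {protein['odds_ratio']:.4f} "
    f"(95% CI {protein['lower']:.4f}-{protein['upper']:.4f}), "
    f"Z = {protein['z']:.4f}, p = {protein['p']:.4f}"
)
```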
Logistic Regression Model
Summary Statistics
Value DF p-value
Deviance 107,9814 95
Likelihood ratio test 34,8068 8 < 0.001
Parameter Estimates 95% C.I.
Terms Coefficient Std.Error p-value OR Lower Upper
%GM -1,8857 1,0420 0,0703 0,1517 0,0197 1,1695
SEX ='2' 0,2139 0,8812 0,8082 1,2385 0,2202 6,9662
FLOOR ='2' 0,4987 0,9083 0,5829 1,6466 0,2776 9,7659
FLOOR ='3' -0,3235 1,0150 0,7500 0,7236 0,0990 5,2909
FLOOR ='4' 0,1088 0,9839 0,9119 1,1150 0,1621 7,6698
MEAL ='2' 0,5308 0,5613 0,3443 1,7002 0,5659 5,1081
Protein ='1' 2,1809 0,5303 < 0.001 8,8541 3,1316 25,034
TWOAGG ='2' 0,1904 0,5162 0,7122 1,2098 0,4399 3,3272
Termwise Wald Test
Term Wald Stat. DF p-value
FLOOR 1,0812 3 0,7816
Poisson Regression Model
Summary Statistics
Value DF p-value
Deviance 60,2622 95
Likelihood ratio test 67,7378 8 < 0.001
Parameter Estimates 95% C.I.
Terms Coefficient Std.Error p-value RR Lower Upper
%GM -1,8213 0,8446 0,0310 0,1618 0,0309 0,8471
SEX ='2' 0,1295 0,7106 0,8554 1,1383 0,2827 4,5828
FLOOR ='2' 0,2503 0,6867 0,7154 1,2844 0,3344 4,9343
FLOOR ='3' -0,1422 0,8032 0,8595 0,8674 0,1797 4,1877
FLOOR ='4' 0,1368 0,7263 0,8506 1,1466 0,2761 4,7608
MEAL ='2' 0,2373 0,3854 0,5381 1,2678 0,5956 2,6987
Protein ='1' 1,0658 0,3413 0,0018 2,9032 1,4871 5,6679
TWOAGG ='2' 0,0645 0,3682 0,8611 1,0666 0,5182 2,1951
Termwise Wald Test
Term Wald Stat. DF p-value
FLOOR 0,4178 3 0,9365
Cox Proportional Hazards
Term Hazard Ratio 95% C.I. Coefficient S. E. Z-Statistic P-Value
_AGG (2/1) 1,0666 0,5183 2,195 0,0645 0,3682 0,175 0,8611
Floor(2/1) 1,2844 0,3344 4,9342 0,2503 0,6867 0,3646 0,7154
Floor(3/1) 0,8674 0,1797 4,1876 -0,1422 0,8032 -0,177 0,8595
Floor(4/1) 1,1466 0,2761 4,7607 0,1368 0,7263 0,1883 0,8506
Meal (2/1) 1,2678 0,5957 2,6986 0,2373 0,3854 0,6157 0,5381
Protein(Yes/No) 2,9032 1,4871 5,6678 1,0658 0,3413 3,1225 0,0018
Sex (2/1) 1,1383 0,2827 4,5827 0,1295 0,7106 0,1822 0,8554
Convergence: Converged
Iterations: 5
-2 * Log-Likelihood: 346,0200
Test Statistic D.F. P-Value
Score 17,1727 7 0,0163
Likelihood Ratio 15,4889 7 0,0302
