Introduction to
Logistic Regression
Rachid Salmi,
Jean-Claude Desenclos,
Thomas Grein,
Alain Moren
Oral contraceptives (OC) and
myocardial infarction (MI)
Case-control study, unstratified data
OC MI Controls OR
Yes 693 320 4.8
No 307 680 Ref.
Total 1000 1000
Smoking and
myocardial infarction (MI)
Case-control study, unstratified data
Smoking MI Controls OR
Yes 700 500 2.3
No 300 500 Ref.
Total 1000 1000
Smokers
OC MI Controls OR
Yes 517 160 6.0
No 183 340 Ref.
Total 700 500
Nonsmokers
OC MI Controls OR
Yes 176 160 3.0
No 124 340 Ref.
Total 300 500
Odds ratio for OC adjusted for smoking = 4.5
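The slides do not say how the adjusted odds ratio of 4.5 was obtained. The sketch below assumes a Mantel-Haenszel pooled estimate, which is one common choice for stratified 2x2 tables. The function names are illustrative only; the counts are taken from the tables above.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a, b = exposed cases, exposed controls
    c, d = unexposed cases, unexposed controls
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# OC use versus MI, unstratified and then stratified by smoking
crude = odds_ratio(693, 320, 307, 680)
smokers = (517, 160, 183, 340)
nonsmokers = (176, 160, 124, 340)
adjusted = mantel_haenszel_or([smokers, nonsmokers])

print(f"crude OR       = {crude:.1f}")
print(f"OR, smokers    = {odds_ratio(*smokers):.1f}")
print(f"OR, nonsmokers = {odds_ratio(*nonsmokers):.1f}")
print(f"adjusted OR    = {adjusted:.1f}")
```

Under this assumption the pooled estimate agrees with the slide's 4.5 after rounding, and it lies between the two stratum-specific odds ratios.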
[Figure: epidemic curve; number of cases (0 to 10, one box = one case) by day of onset, 13 to 27 October]
Cases of gastroenteritis among residents of a nursing
home, by date of onset, Pennsylvania, October 1986
Protein suppl. Total Cases AR (%) RR
Yes 29 22 76 3.3
No 74 17 23 Ref.
Total 103 39 38
Cases of gastroenteritis among residents of a nursing home according to
protein supplement consumption, Pa, 1986
Sex-specific attack rates of gastroenteritis
among residents of a nursing home, Pa, 1986
Sex Total Cases AR(%) RR & 95% CI
Male 22 5 23 Reference
Female 81 34 42 1.8 (0.8-4.2)
Total 103 39 38
Attack rates of gastroenteritis
among residents of a nursing home,
by place of meal, Pa, 1986
Meal Total Cases AR(%) RR & 95% CI
Dining room 41 12 29 Reference
Bedroom 62 27 44 1.5 (0.9-2.6)
Total 103 39 38
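The relative risks and confidence intervals in the attack-rate tables can be checked with a few lines of code. The slides do not state how the intervals were computed; this sketch assumes the usual log-scale (Katz) approximation, and `risk_ratio_ci` is a name made up for the illustration.

```python
import math


def risk_ratio_ci(cases_exp, total_exp, cases_ref, total_ref, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (cases_exp / total_exp) / (cases_ref / total_ref)
    # Standard error of ln(RR) for cumulative-incidence data
    se = math.sqrt(1 / cases_exp - 1 / total_exp + 1 / cases_ref - 1 / total_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# (cases, total) in the exposed group, then in the reference group
sex = risk_ratio_ci(34, 81, 5, 22)       # female versus male
meal = risk_ratio_ci(27, 62, 12, 41)     # bedroom versus dining room
protein = risk_ratio_ci(22, 29, 17, 74)  # protein supplement yes versus no

for label, (rr, lo, hi) in [("Sex", sex), ("Meal", meal), ("Protein", protein)]:
    print(f"{label}: RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With this method the results for sex and place of meal agree with the tabulated values to one decimal place, which suggests (but does not prove) that a similar approximation was used for the slides.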
Age-specific attack rates of gastroenteritis
among residents of a nursing home, Pa, 1986
Age group Total Cases AR(%)
50-59 2 1 50
60-69 9 2 22
70-79 28 9 32
80-89 45 17 38
90+ 19 10 53
Total 103 39 38
Attack rates of gastroenteritis
among residents of a nursing home,
by floor of residence, Pa, 1986
Floor Total Cases AR (%)
One 12 3 25
Two 32 17 53
Three 30 7 23
Four 29 12 41
Total 103 39 38
Multivariate analysis
• Multiple models
– Linear regression
– Logistic regression
– Cox model
– Poisson regression
– Loglinear model
– Discriminant analysis
– ......
• Choice of the tool according to the objectives,
the study, and the variables
Simple linear regression
Age SBP Age SBP Age SBP
22 131 41 139 52 128
23 128 41 171 54 105
24 116 46 137 56 145
27 106 47 111 57 141
28 114 48 115 58 153
29 123 49 133 59 157
30 117 49 128 63 155
32 122 50 183 67 176
33 99 51 130 71 172
35 121 51 133 77 178
40 147 51 144 81 217
Table 1 Age and systolic blood pressure (SBP) among 33 adult women
[Figure: scatter plot of SBP (mm Hg, 80 to 220) against age (years, 20 to 90) with the fitted regression line]
$\text{SBP} = 81.54 + 1.222 \times \text{Age}$
Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
Simple linear regression
• Relation between 2 continuous variables (SBP and age)
• Regression coefficient β1
– Measures association between y and x
– Amount by which y changes on average when x changes by
one unit
– Least squares method
[Figure: regression line of y on x, with slope β1]
$y = \alpha + \beta_1 x_1$
Multiple linear regression
• Relation between a continuous variable and a set of
i continuous variables
• Partial regression coefficients βi
– Amount by which y changes on average
when xi changes by one unit
and all the other xi remain constant
– Measures association between xi and y adjusted for all other xi
• Example
– SBP versus age, weight, height, etc.
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Multiple linear regression
Names for y | Names for the xi
Predicted | Predictor variables
Response variable | Explanatory variables
Outcome variable | Covariables
Dependent | Independent variables
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Logistic regression (1)
Age CD Age CD Age CD
22 0 40 0 54 0
23 0 41 1 55 1
24 0 46 0 58 1
27 0 47 0 60 1
28 0 48 0 60 0
30 0 49 1 62 1
30 0 49 0 65 1
32 0 50 1 67 1
33 0 51 0 71 1
35 1 51 1 77 1
38 0 52 0 81 1
Table 2 Age and signs of coronary heart disease (CD)
How can we analyse these data?
• Compare mean age of diseased and non-diseased
– Non-diseased: 38.6 years
– Diseased: 58.7 years (p<0.0001)
• Linear regression?
Dot-plot: Data from Table 2
[Figure: signs of coronary disease (No / Yes) against age (years, 0 to 100)]
Logistic regression (2)
Table 3 Prevalence (%) of signs of CD according to age group
Diseased
Age group # in group # %
20 - 29 5 0 0
30 - 39 6 1 17
40 - 49 7 2 29
50 - 59 7 4 57
60 - 69 5 4 80
70 - 79 2 2 100
80 - 89 1 1 100
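Table 3 and the two mean ages quoted earlier can be rebuilt from the individual records of Table 2. The sketch below groups subjects into 10-year age bands; variable names are illustrative.

```python
from collections import defaultdict

# Table 2: age (years) and signs of coronary heart disease (1 = yes, 0 = no)
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
cd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Mean age by disease status
diseased = [a for a, d in zip(ages, cd) if d == 1]
healthy = [a for a, d in zip(ages, cd) if d == 0]
mean_diseased = sum(diseased) / len(diseased)
mean_healthy = sum(healthy) / len(healthy)
print(f"Non-diseased: {mean_healthy:.1f} years, diseased: {mean_diseased:.1f} years")

# Prevalence by 10-year age group: band start -> [number in group, number diseased]
groups = defaultdict(lambda: [0, 0])
for age, d in zip(ages, cd):
    band = (age // 10) * 10
    groups[band][0] += 1
    groups[band][1] += d

for band in sorted(groups):
    n, k = groups[band]
    print(f"{band} - {band + 9}: {k}/{n} diseased = {100 * k / n:.0f}%")
```

Grouping turns the 0/1 outcome into proportions that rise with age along an S-shaped pattern, which is what motivates modelling the probability of disease rather than the raw outcome.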
Dot-plot: Data from Table 3
[Figure: diseased (%, 0 to 100) against age group (0 to 8)]
Logistic function (1)
[Figure: S-shaped curve of the probability of disease (0.0 to 1.0) against x]
$P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
Transformation
$\ln\left[\dfrac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$
(logit of P(y|x))
where $P(y \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
α = log odds of disease
in unexposed
β = log odds ratio associated
with being exposed
$e^{\beta}$ = odds ratio
Fitting equation to the data
• Linear regression: Least squares
• Logistic regression: Maximum likelihood
• Likelihood function
– Estimates parameters α and β
– Practically easier to work with log-likelihood
$L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln[1 - \pi(x_i)] \right\}$, where $\pi(x_i) = P(y \mid x_i)$
Maximum likelihood
• Iterative computing
– Choice of an arbitrary value for the coefficients (usually 0)
– Computing of log-likelihood
– Variation of coefficients’ values
– Reiteration until maximisation (plateau)
• Results
– Maximum Likelihood Estimates (MLE) for α and β
– Estimates of P(y) for a given value of x
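The iterative procedure can be sketched on the Table 2 data. The slides do not name a specific algorithm; this example assumes Newton-Raphson, a standard choice, starting from coefficients of 0 and updating until the steps become negligible. All function names are illustrative.

```python
import math

# Table 2: age (years) and signs of coronary heart disease (1 = yes, 0 = no)
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
cd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def prob(alpha, beta, x):
    """Logistic function P(y=1|x)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def log_likelihood(alpha, beta, xs, ys):
    """Sum over subjects of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(y * math.log(prob(alpha, beta, x))
               + (1 - y) * math.log(1 - prob(alpha, beta, x))
               for x, y in zip(xs, ys))


def score(alpha, beta, xs, ys):
    """Gradient of the log-likelihood with respect to (alpha, beta)."""
    g0 = sum(y - prob(alpha, beta, x) for x, y in zip(xs, ys))
    g1 = sum((y - prob(alpha, beta, x)) * x for x, y in zip(xs, ys))
    return g0, g1


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson maximisation of the log-likelihood, starting at (0, 0)."""
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        g0, g1 = score(alpha, beta, xs, ys)
        # Information matrix entries: sums of p*(1-p) times 1, x, x^2
        h00 = h01 = h11 = 0.0
        for x in xs:
            w = prob(alpha, beta, x) * (1 - prob(alpha, beta, x))
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the gradient
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return alpha, beta


alpha_hat, beta_hat = fit_logistic(ages, cd)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.4f}")
print(f"log-likelihood at (0, 0): {log_likelihood(0.0, 0.0, ages, cd):.3f}")
print(f"log-likelihood at MLE:    {log_likelihood(alpha_hat, beta_hat, ages, cd):.3f}")
print(f"estimated P(CD | age 50): {prob(alpha_hat, beta_hat, 50):.2f}")
```

At the maximum the gradient is zero, which is the "plateau" referred to in the slide; exp(beta) is then the estimated odds ratio per additional year of age.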
Multiple logistic regression
• More than one independent variable
– Dichotomous, ordinal, nominal, continuous …
• Interpretation of βi
– Increase in log-odds for a one unit increase in xi with all
the other xi constant
– Measures association between xi and log-odds adjusted
for all other xi
$\ln\left(\dfrac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Statistical testing
• Question
– Does a model including a given independent variable
provide more information about the dependent variable than
a model without this variable?
• Three tests
– Likelihood ratio statistic (LRS)
– Wald test
– Score test
Likelihood ratio statistic
• Compares two nested models
Log(odds) = α + β1x1 + β2x2 + β3x3 (model 1)
Log(odds) = α + β1x1 + β2x2 (model 2)
• LR statistic
-2 log (likelihood model 2 / likelihood model 1) =
-2 log (likelihood model 2) minus -2 log (likelihood model 1)
LR statistic follows a χ² distribution with DF = number of extra parameters
in model 1
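A minimal sketch of how the statistic is turned into a p-value, using only the standard library. The helper `chi2_sf` (upper tail of the χ² distribution for integer degrees of freedom) and `lr_statistic` are names invented for this example; the test statistics plugged in at the end are those reported in the software outputs later in the deck, rewritten with decimal points.

```python
import math


def lr_statistic(neg2_loglik_reduced, neg2_loglik_full):
    """LR statistic = (-2 log L of the smaller model) minus (-2 log L of the larger model)."""
    return neg2_loglik_reduced - neg2_loglik_full


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total


# Statistics reported in the outputs shown later in the deck
print(f"LR test, logistic model (34.8068, 8 df): p = {chi2_sf(34.8068, 8):.5f}")
print(f"LR test, Cox model (15.4889, 7 df):      p = {chi2_sf(15.4889, 7):.4f}")
print(f"Wald test for FLOOR (1.0812, 3 df):      p = {chi2_sf(1.0812, 3):.4f}")
```

The same χ² reference distribution serves the Wald and score tests, which is why the termwise Wald test for FLOOR can be checked with the same helper.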
Coding of variables (2)
• Nominal variables or ordinal with unequal
classes:
– Tobacco smoked: no=0, grey=1, brown=2, blond=3
– Model assumes that OR for blond tobacco
= (OR for grey tobacco)³
– Use indicator variables (dummy variables)
Indicator variables: Type of tobacco
• Neutralises artificial hierarchy between classes in the
variable "type of tobacco"
• No assumptions made
• 3 variables (3 df) in model using same reference
• OR for each type of tobacco adjusted for the others in
reference to non-smoking
Tobacco consumption | Dummy variables: Grey Brown Blond
Blond 0 0 1
Brown 0 1 0
Grey 1 0 0
None 0 0 0
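The coding table above can be expressed as a small function. This is a sketch only; `dummy_code` and the lower-case category labels are assumptions made for the illustration, with "none" as the reference category.

```python
# Non-reference categories, in the column order Grey, Brown, Blond
TOBACCO_LEVELS = ("grey", "brown", "blond")


def dummy_code(tobacco):
    """Return the (grey, brown, blond) indicator variables for one subject.

    "none" is the reference category and maps to (0, 0, 0).
    """
    if tobacco != "none" and tobacco not in TOBACCO_LEVELS:
        raise ValueError(f"unknown tobacco type: {tobacco!r}")
    return tuple(int(tobacco == level) for level in TOBACCO_LEVELS)


for kind in ("blond", "brown", "grey", "none"):
    print(kind, dummy_code(kind))
```

Each indicator gets its own coefficient in the model, so the three odds ratios are estimated freely instead of being forced onto a 1, 2, 3 scale.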
Reference
• Hosmer DW, Lemeshow S. Applied logistic
regression. Wiley & Sons, New York, 1989
Logistic regression
Synthesis
Salmonella enteritidis
[Diagram: variables considered for S. Enteritidis gastroenteritis: protein supplement, sex, floor, age, place of meal, blended diet]
Unconditional Logistic Regression
Term Odds Ratio 95% C.I. Coef. S. E. Z-Statistic P-Value
AGG (2/1) 1,6795 0,2634 10,7082 0,5185 0,9452 0,5486 0,5833
AGG (3/1) 1,7570 0,3249 9,5022 0,5636 0,8612 0,6545 0,5128
Blended (Yes/No) 1,0345 0,3277 3,2660 0,0339 0,5866 0,0578 0,9539
Floor (2/1) 1,6126 0,2675 9,7220 0,4778 0,9166 0,5213 0,6022
Floor (3/1) 0,7291 0,0991 5,3668 -0,3159 1,0185 -0,3102 0,7564
Floor (4/1) 1,1137 0,1573 7,8870 0,1076 0,9988 0,1078 0,9142
Meal 1,5942 0,4953 5,1317 0,4664 0,5965 0,7819 0,4343
Protein (Yes/No) 9,0918 3,0219 27,3533 2,2074 0,5620 3,9278 0,0001
Sex 1,3024 0,2278 7,4468 0,2642 0,8896 0,2970 0,7665
CONSTANT * * * -3,0080 2,0559 -1,4631 0,1434
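The columns of this output are linked by simple formulas: odds ratio = exp(coefficient), 95% C.I. = exp(coefficient ± 1.96 × S.E.), and Z = coefficient / S.E. The sketch below checks the protein supplement row, with the decimal commas of the output rewritten as decimal points. The function name is illustrative, and the two-sided normal p-value is an assumption about how the software derives its P-Value column.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Odds ratio, Wald 95% CI, Z statistic and two-sided p-value for one term."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z_crit * se)
    upper = math.exp(coef + z_crit * se)
    z = coef / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, lower, upper, z, p_value


# Protein (Yes/No): coefficient 2.2074, standard error 0.5620
or_, lower, upper, z, p = wald_summary(2.2074, 0.5620)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}, Z = {z:.2f}, p = {p:.4f}")
```

The wide intervals for the other terms follow directly from their large standard errors relative to the coefficients.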
Unconditional Logistic Regression
Term Odds Ratio 95% C.I. Coefficient S. E. Z-Statistic P-Value
Age 1,0234 0,9660 1,0842 0,0231 0,0294 0,7848 0,4326
Blended (Yes/No) 1,0184 0,3220 3,2207 0,0183 0,5874 0,0311 0,9752
Floor (2/1) 1,6440 0,2745 9,8468 0,4971 0,9133 0,5443 0,5862
Floor (3/1) 0,7132 0,0972 5,2321 -0,3379 1,0167 -0,3324 0,7396
Floor (4/1) 1,0708 0,1522 7,5322 0,0684 0,9953 0,0687 0,9452
Meal 1,6561 0,5236 5,2379 0,5045 0,5875 0,8587 0,3905
Protein (Yes/No) 8,7678 2,9521 26,0403 2,1711 0,5554 3,9091 0,0001
Sex 1,1957 0,2135 6,6981 0,1787 0,8791 0,2033 0,8389
CONSTANT * * * -4,2896 2,8908 -1,4839 0,1378
Logistic Regression Model
Summary Statistics
Value DF p-value
Deviance 107,9814 95
Likelihood ratio test 34,8068 8 < 0.001
Parameter Estimates 95% C.I.
Terms Coefficient Std.Error p-value OR Lower Upper
%GM -1,8857 1,0420 0,0703 0,1517 0,0197 1,1695
SEX ='2' 0,2139 0,8812 0,8082 1,2385 0,2202 6,9662
FLOOR ='2' 0,4987 0,9083 0,5829 1,6466 0,2776 9,7659
FLOOR ='3' -0,3235 1,0150 0,7500 0,7236 0,0990 5,2909
FLOOR ='4' 0,1088 0,9839 0,9119 1,1150 0,1621 7,6698
MEAL ='2' 0,5308 0,5613 0,3443 1,7002 0,5659 5,1081
Protein ='1' 2,1809 0,5303 < 0.001 8,8541 3,1316 25,034
TWOAGG ='2' 0,1904 0,5162 0,7122 1,2098 0,4399 3,3272
Termwise Wald Test
Term Wald Stat. DF p-value
FLOOR 1,0812 3 0,7816
Poisson Regression Model
Summary Statistics
Value DF p-value
Deviance 60,2622 95
Likelihood ratio test 67,7378 8 < 0.001
Parameter Estimates 95% C.I.
Terms Coefficient Std.Error p-value RR Lower Upper
%GM -1,8213 0,8446 0,0310 0,1618 0,0309 0,8471
SEX ='2' 0,1295 0,7106 0,8554 1,1383 0,2827 4,5828
FLOOR ='2' 0,2503 0,6867 0,7154 1,2844 0,3344 4,9343
FLOOR ='3' -0,1422 0,8032 0,8595 0,8674 0,1797 4,1877
FLOOR ='4' 0,1368 0,7263 0,8506 1,1466 0,2761 4,7608
MEAL ='2' 0,2373 0,3854 0,5381 1,2678 0,5956 2,6987
Protein ='1' 1,0658 0,3413 0,0018 2,9032 1,4871 5,6679
TWOAGG ='2' 0,0645 0,3682 0,8611 1,0666 0,5182 2,1951
Termwise Wald Test
Term Wald Stat. DF p-value
FLOOR 0,4178 3 0,9365
Cox Proportional Hazards
Term Hazard Ratio 95% C.I. Coefficient S. E. Z-Statistic P-Value
_AGG (2/1) 1,0666 0,5183 2,195 0,0645 0,3682 0,175 0,8611
Floor(2/1) 1,2844 0,3344 4,9342 0,2503 0,6867 0,3646 0,7154
Floor(3/1) 0,8674 0,1797 4,1876 -0,1422 0,8032 -0,177 0,8595
Floor(4/1) 1,1466 0,2761 4,7607 0,1368 0,7263 0,1883 0,8506
Meal (2/1) 1,2678 0,5957 2,6986 0,2373 0,3854 0,6157 0,5381
Protein(Yes/No) 2,9032 1,4871 5,6678 1,0658 0,3413 3,1225 0,0018
Sex (2/1) 1,1383 0,2827 4,5827 0,1295 0,7106 0,1822 0,8554
Convergence: Converged
Iterations: 5
-2 * Log-Likelihood: 346,0200
Test Statistic D.F. P-Value
Score 17,1727 7 0,0163
Likelihood Ratio 15,4889 7 0,0302
