Logistic Regression
Dr Mike Blyth
February 2006

Logistic Regression
A way to look at the effect of
– a "numeric" (interval or ratio) independent variable
on
– a binary (yes-no) dependent variable

Review Linear Regression
Dependent variable is continuous: interval or ratio (numeric)
Independent variables are also interval or ratio
Examples
– Effect of weight on blood pressure
– Effect of drug dose on reticulocyte count

[Figure: Linear Regression. Diagram relating the independent variable to the dependent variable.]
[Figure: Logistic Regression. Diagram relating the independent variable to the dependent variable.]

Logistic Regression
Dependent variable is a binary (yes/no) outcome.
Independent variables are continuous (interval or ratio).
Examples:
– Relation of weight and BP to 10-year risk of death
– Relation of CD4 count to 1-year risk of AIDS diagnosis

Why do we need it?
Could use categorical analysis such as a frequency table

                   AIDS   No AIDS
CD4 > 350           80      20
150 < CD4 < 350     50      50
CD4 < 150           20      80

• Problems
a) some information is lost when we collapse the numeric data into categories. This leads to loss of power.
b) no estimate of the magnitude of the relation

Odds
Probability:
– p = probability of the event
– 1 - p = probability of not having the event (also called q)
– p varies from 0 to 1
Odds:
– Ratio of the probability of the event to the probability of not having the event: odds = p/(1 - p)
– When p = 0.5, odds = 1 (or "1:1 odds")
– When p = 0.1, odds = 0.1/0.9 = 0.11
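
The conversion between probability and odds can be sketched in a few lines of Python. The function names `odds_from_probability` and `probability_from_odds` are illustrative and do not come from the slides.

```python
def odds_from_probability(p: float) -> float:
    """Return the odds p / (1 - p) for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Invert the conversion: p = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # The two probabilities used on the slide.
    for p in (0.5, 0.1):
        print(f"p = {p}: odds = {odds_from_probability(p):.2f}")
```

Odds run from 0 upward without limit, while probability stays between 0 and 1, which is why the two scales diverge as p approaches 1.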

Log Odds
The log odds (also called "logit") is simply the natural logarithm of the odds:
logit = ln(odds)
      = ln(p/(1-p))
      = ln(p) - ln(1-p)
ln(1) = 0, so the logit is 0 when the odds are 1:1, or probability = 50%
The logit for an event of probability p is the negative of the logit for the probability of not having the event.
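
A small Python sketch of the logit and of its sign symmetry follows. The function name `logit` is chosen here for illustration.

```python
import math


def logit(p: float) -> float:
    """Natural log of the odds, ln(p / (1 - p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p) - math.log(1.0 - p)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        # logit(p) and logit(1 - p) differ only in sign.
        print(f"p = {p}: logit = {logit(p):+.3f}, logit(1 - p) = {logit(1 - p):+.3f}")
```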

[Figure: Relation between probability p and logit. Vertical axis: p, from 0 to 1. Horizontal axis: logit = ln[p/(1-p)], from -8 to 8.]
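
The curve relating logit to probability can be tabulated by inverting the logit. This is a minimal sketch; the name `inverse_logit` is an assumption made here, and the sample points simply follow the horizontal axis range of -8 to 8.

```python
import math


def inverse_logit(z: float) -> float:
    """Map a logit z back to a probability, p = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form to avoid overflow in exp(-z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    for z in range(-8, 9, 2):
        print(f"logit = {z:+d}  ->  p = {inverse_logit(z):.4f}")
```

The probability stays strictly between 0 and 1 however large or small the logit becomes, which is what makes the logit a convenient scale for a linear model.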

Logistic regression model
The linear regression model with one variable is
y = a + bx + e
The logistic regression model with one variable is
logit = a + bx + e
where
logit = ln(p/(1-p))

Logistic regression model
The logistic regression model with one variable is
logit = a + bx where logit = ln(p/(1-p))
In other words, the model says the log odds of the event happening are
– a constant (a)
– plus some other constant (b)
– times a numeric risk factor (x) (for example, SBP)

Logistic regression model
Given values of the independent variables, the regression equation predicts the log odds.

Logistic regression model
The statistics program calculates the coefficient b
The coefficient b shows how much the odds change with a change in the independent variable
Positive b → higher risk with higher values
Negative b → lower risk with higher values
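
The slides do not say how the statistics program estimates b. A common approach is maximum likelihood solved by Newton-Raphson iteration, and the sketch below assumes that method. The data are made up for illustration, and `fit_logistic` is a hypothetical helper, not a function of Epi Info or any other package.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit = a + b*x by maximum likelihood using Newton-Raphson.

    xs: numeric predictor values; ys: 0/1 outcomes. Returns (a, b).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient of the log-likelihood) and information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += x * (y - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Made-up illustrative data: the outcome becomes more common as x rises.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_logistic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, e^b = {math.exp(b):.3f}")
```

With these data the fitted b is positive, matching the rule above that a positive b means higher risk at higher values. This simple version has no safeguards for data in which the outcome is perfectly separated by x, where the estimates do not converge.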

Logistic regression model
Hypothetical example given above examining the relation of BP to risk of stroke/death. The model predicts, with c as the constant:
$\ln(\text{odds}) = c + b \cdot \text{SBP}$
$e^{\ln(\text{odds})} = e^{c + b \cdot \text{SBP}}$
$\text{odds} = e^{c + b \cdot \text{SBP}} = e^{c} \cdot e^{b \cdot \text{SBP}}$

Logistic regression model
The coefficient b shows how much the odds change with a change in the independent variable
$\text{odds} = e^{c} \cdot e^{bx}$
In other words,
$\text{odds} = \text{something} \cdot (e^{b})^{x}$

Logistic regression model
$\text{odds} = \text{constant} \cdot (e^{b})^{x}$
So $e^{b}$ is the factor indicating the effect of x on the event.
Each one-unit change in x will multiply the odds by a factor of $e^{b}$.

Logistic regression model
$\text{odds} = \text{constant} \cdot (e^{b})^{x}$
– Suppose b = 0.693, so $e^{b} = 2$
– A one-unit increase in x will double the odds
– Suppose b = -0.693, so $e^{b} = 0.5$
– A one-unit increase in x will halve the odds
– If b = 0, $e^{b} = 1$, and x has no effect on the odds
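
The three cases can be checked numerically. In this sketch the constant 0.05 and the x values 10 and 11 are arbitrary illustrative choices; the ratio of odds for a one-unit step does not depend on them.

```python
import math


def odds(constant: float, b: float, x: float) -> float:
    """Odds predicted by the model: constant * (e^b)^x."""
    return constant * math.exp(b) ** x


if __name__ == "__main__":
    constant = 0.05  # hypothetical value, chosen only for illustration
    for b in (math.log(2), -math.log(2), 0.0):
        ratio = odds(constant, b, 11) / odds(constant, b, 10)
        print(f"b = {b:+.3f}: e^b = {math.exp(b):.3f}, odds(11)/odds(10) = {ratio:.3f}")
```

The value 0.693 on the slide is ln(2) rounded to three decimals, which is why it produces an exact doubling.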

Logistic regression model
For the hypothetical example above, the report is given by Epi Info as

Term    Odds Ratio   95% CI          Coeff     S. E.    Z       P
BP      1.0597       1.022  1.098    0.0579    0.0185   3.131   0.0017
Const   *            *      *        -7.201    2.2994   3.131   0.0017
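
Given the reported coefficients, the model can be turned into a predicted probability for any BP value. The sketch below uses the constant at the four-decimal precision shown on the later slides (-7.2014); the BP values in the loop are arbitrary illustrations, and `predicted_probability` is a name introduced here.

```python
import math

# Coefficients as reported in the Epi Info output for the hypothetical example.
CONSTANT = -7.2014
B_BP = 0.0579


def predicted_probability(bp: float) -> float:
    """Probability of the event predicted by logit = CONSTANT + B_BP * bp."""
    log_odds = CONSTANT + B_BP * bp
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for bp in (100, 120, 140, 160, 180):  # illustrative BP values
        print(f"BP = {bp}: predicted probability = {predicted_probability(bp):.3f}")
```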

Logistic regression model

Term       Odds Ratio   95% CI          Coefficient   S. E.   Z       P-value
BP         1.0597       1.022  1.098    0.0579        0.018   3.131   0.0017
Constant   *            *      *        -7.2014       2.299   3.131   0.0017

Coefficient, or beta, or b, is the slope or magnitude of the effect.

Logistic regression model

Term       Odds Ratio   95% CI            Coefficient   S. E.    Z        P-value
BP         1.0597       1.0220  1.0987    0.0579        0.0185   3.1319   0.0017
Constant   *            *       *         -7.2014       2.2994   3.1319   0.0017

Odds ratio for a one-unit change in the independent variable (e.g. BP). This is the calculated $e^{b}$.
A one-unit change in BP multiplies the odds by 1.0597.

Logistic regression model

Term       Odds Ratio   95% CI            Coeff     S. E.    Z        P-value
BP         1.0597       1.022   1.098     0.0579    0.0185   3.1319   0.0017
Constant   *            *       *         -7.2014   2.2994   3.1319   0.0017

95% confidence interval for that odds ratio.
The confidence interval does not include 1, so the effect is statistically significant.
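
The columns of the report are tied together by simple arithmetic. The slides do not state how Epi Info derives the interval and the p-value; the sketch below assumes the usual Wald approach (b plus or minus 1.96 standard errors, and a two-sided normal test on Z = b / S.E.). Small differences from the printed figures are expected because the inputs are rounded.

```python
import math

coef = 0.0579  # b for BP, from the table above
se = 0.0185    # standard error of b, from the table above

odds_ratio = math.exp(coef)
# Wald 95% interval: exponentiate b +/- 1.96 standard errors.
ci_low = math.exp(coef - 1.96 * se)
ci_high = math.exp(coef + 1.96 * se)
z = coef / se
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"OR = {odds_ratio:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"Z = {z:.4f}, p = {p_value:.4f}")
```

An interval that excludes 1 on the odds ratio scale corresponds to an interval for b that excludes 0, which is the same condition as the p-value falling below 0.05.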

Using more than one independent variable
Single variable:
logit = c + bx
$\text{odds} = c' \cdot (e^{b})^{x}$, where $c' = e^{c}$
Multiple variables:
$\text{logit} = c + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
$\text{odds} = c' \cdot (e^{b_1})^{x_1} \cdot (e^{b_2})^{x_2} \cdots (e^{b_n})^{x_n}$
Note that the terms multiply their effects on the odds.
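
The equivalence of the additive form on the logit scale and the multiplicative form on the odds scale can be checked directly. The coefficients and predictor values below are hypothetical, and both function names are introduced only for this sketch.

```python
import math


def odds_multi(c: float, coefs, xs) -> float:
    """Odds from logit = c + sum(b_i * x_i), computed by exponentiating the logit."""
    return math.exp(c + sum(b * x for b, x in zip(coefs, xs)))


def odds_as_product(c: float, coefs, xs) -> float:
    """The same odds written as c' * product of (e^b_i)^x_i, with c' = e^c."""
    result = math.exp(c)
    for b, x in zip(coefs, xs):
        result *= math.exp(b) ** x
    return result


if __name__ == "__main__":
    # Hypothetical coefficients and predictor values, for illustration only.
    c, coefs, xs = -3.0, [0.02, 0.7], [50.0, 1.0]
    print(odds_multi(c, coefs, xs), odds_as_product(c, coefs, xs))
```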

Using more than one independent variable
Analysis reports a b coefficient for each independent variable.
That coefficient is the effect of the given independent variable, separated from the effects of all the other independent variables.

Real Life Example
Prospective cohort study of causes of cardiac disease: Evans County Study 1965
Independent variables = age, gender, race, social index, SBP, diabetes, smoking, cholesterol, and an obesity index
Dependent variable = risk of dying during 10-year period

Variable        Range        b coeff   SE       p
Constant                     -6.376    1.634    <0.001
Age             40-69 y      0.086     0.115    <0.001
Gender          0=m, 1=f     1.500     0.967    0.121
Age x gender                 -0.043    0.017    0.011
Social index    20-84        -0.056    0.040    0.160
(Soc ind)^2     400-7056     0.0006    0.0003   0.082
SBP             88-310       0.019     0.002    <0.001
Diabetes        0=n, 1=y     1.123     0.261    <0.001
Smoking         0=n, 1=y     0.317     0.157    0.043
Cholesterol     94-546       0.0031    0.0015   0.041
Quetelet        2.11-8.76    -1.064    0.432    0.014
(Quetelet)^2    4.44-76.8    0.112     0.049    0.022

Cited in Kelsey et al., Methods in Observational Epidemiology, 1986

Statistical Significance
The p value indicates statistical significance
Age is positively correlated with risk of death
Gender has a positive b coefficient, but the p value is 0.12, indicating that we cannot say that there is a significant relationship.

Variable   Range      b coeff   SE      p
Age        40-69 y    0.086     0.115   <0.001
Gender     0=m, 1=f   1.500     0.967   0.121

Dichotomous (yes-no) variables
Gender is coded as 0 for male, 1 for female
$e^{b}$ [$e^{1.5} = 4.48$] is the factor by which the odds change for a 1-unit change in gender, i.e. the OR for females relative to males
$e^{b}$ for any dummy variable (coded 0-1) is the adjusted OR for that risk factor, since "1 unit of change" = presence vs. absence of the risk factor

Variable   Range      b coeff   SE      p
Constant              -6.376    1.634   <0.001
Age        40-69 y    0.086     0.115   <0.001
Gender     0=m, 1=f   1.500     0.967   0.121
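
The same exponentiation applies to every 0/1 variable in the Evans County table. The sketch below copies the b coefficients for gender, diabetes and smoking from that table; the dictionary names are illustrative.

```python
import math

# b coefficients for the 0/1 variables, copied from the Evans County table.
dummy_coefficients = {"Gender (f vs m)": 1.500, "Diabetes": 1.123, "Smoking": 0.317}

# For a 0/1 variable, e^b is the adjusted odds ratio for presence vs. absence.
adjusted_odds_ratios = {name: math.exp(b) for name, b in dummy_coefficients.items()}

for name, odds_ratio in adjusted_odds_ratios.items():
    print(f"{name}: OR = {odds_ratio:.2f}")
```

Because the model also contains an age x gender term, the gender figure here describes the coefficient taken on its own, as the slide does.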

Squared terms
Social index squared is included as well as social index itself.
Squared terms allow for curvilinear relationships, just as in ordinary regression

Variable       Range      b coeff   SE       p
Age x gender              -0.043    0.017    0.011
Social index   20-84      -0.056    0.040    0.160
(Soc ind)^2    400-7056   0.0006    0.0003   0.082
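
To see the curve the two social index terms describe, their combined contribution to the logit can be evaluated across the observed range. This sketch uses the coefficients from the table; the function name is hypothetical. Since neither term reaches p < 0.05 on its own in the table, the shape should be read as illustrative rather than established.

```python
B_LINEAR = -0.056   # coefficient of social index, from the table
B_SQUARED = 0.0006  # coefficient of (social index)^2, from the table


def social_index_logit_contribution(s: float) -> float:
    """Contribution of social index s to the logit: b1*s + b2*s^2."""
    return B_LINEAR * s + B_SQUARED * s * s


# A parabola b1*s + b2*s^2 with b2 > 0 has its minimum at s = -b1 / (2*b2).
turning_point = -B_LINEAR / (2 * B_SQUARED)

print(f"turning point at social index = {turning_point:.1f}")
for s in (20, 45, 84):
    print(f"s = {s}: contribution to logit = {social_index_logit_contribution(s):+.3f}")
```

The fitted contribution falls until the middle of the range and then rises again, a pattern a single linear term could not represent.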

Interaction terms
Age and gender are entered into the model as separate terms
Age x gender is included to see whether age has a different effect in males than in females.

Variable       Range                b coeff   SE      p
Age            40-69 y              0.086     0.115   <0.001
Gender         0=m, 1=f             1.500     0.967   0.121
Age x gender   M: 0-0, F: 40-69     -0.043    0.017   0.011
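
Because gender is coded 0 or 1, the interaction term vanishes for males and adds -0.043 per year of age for females. The sketch below works this out from the three coefficients in the table; the function names are introduced here for illustration, and the ages 40 and 69 are the ends of the study's age range.

```python
import math

B_AGE = 0.086            # age coefficient, from the table
B_GENDER = 1.500         # gender coefficient (0 = male, 1 = female)
B_AGE_X_GENDER = -0.043  # interaction coefficient


def age_slope(gender: int) -> float:
    """Change in logit per year of age for the given gender code."""
    return B_AGE + B_AGE_X_GENDER * gender


def female_vs_male_odds_ratio(age: float) -> float:
    """Odds ratio for females relative to males at a given age."""
    return math.exp(B_GENDER + B_AGE_X_GENDER * age)


for gender, label in ((0, "male"), (1, "female")):
    slope = age_slope(gender)
    print(f"{label}: slope = {slope:.3f}, OR per year = {math.exp(slope):.3f}")
for age in (40, 69):
    print(f"age {age}: OR female vs male = {female_vs_male_odds_ratio(age):.2f}")
```

With an interaction in the model, the gender effect is no longer a single number: it has to be stated at a particular age.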

Interpretation
With binary dummy variables, $e^{b}$ is the odds ratio. You can compare the strength (slope) of the effect by comparing b.
With numeric variables, b is not a direct measure of strength of effect.
– Example: b is quite small for the effect of BP on mortality, because it is the effect of only a one mmHg change in BP. BP is still an important factor in mortality because there is a wide range in BP.
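
The point about range can be made concrete by scaling the SBP coefficient from the Evans County table to larger differences in blood pressure. The differences of 10, 20 and 50 mmHg are arbitrary illustrations, and `odds_ratio_for_change` is a name introduced for this sketch.

```python
import math

B_SBP = 0.019  # SBP coefficient per mmHg, from the Evans County table


def odds_ratio_for_change(b: float, delta: float) -> float:
    """Odds ratio associated with a change of `delta` units: (e^b)^delta."""
    return math.exp(b * delta)


for delta in (1, 10, 20, 50):  # illustrative differences in SBP, in mmHg
    print(f"{delta:>3} mmHg: OR = {odds_ratio_for_change(B_SBP, delta):.2f}")
```

A coefficient that looks negligible per mmHg compounds multiplicatively, so differences of tens of mmHg correspond to substantially different odds.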

Interpretation
In a prospective cohort study we can use the logistic regression model to predict the probability of the event given the independent variables. We can also derive the relative risk.
In a cross-sectional study we only have the odds ratio.
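
Once predicted probabilities are available, relative risk and odds ratio can both be computed and compared. The probabilities in this sketch are hypothetical, chosen to show one rare and one common outcome.

```python
def relative_risk(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of two event probabilities."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the corresponding odds."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Hypothetical predicted probabilities for two groups in a cohort.
    for p1, p0 in ((0.02, 0.01), (0.40, 0.20)):
        print(f"p1 = {p1}, p0 = {p0}: RR = {relative_risk(p1, p0):.2f}, "
              f"OR = {odds_ratio(p1, p0):.2f}")
```

When the event is rare the two measures nearly coincide; when it is common the odds ratio is further from 1 than the relative risk.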

Selection of variables
Same principle as with ordinary regression
Forward selection: add one variable at a time until there are no more that make a significant difference
Backward selection: start with all, remove one at a time to see whether each made a significant contribution
Epi Info has suggestions on how to do this
