2. Introduction
Analysis of discrete data, or discrete data analysis (DDA), refers
to methods for discrete response variables.
DDA in practice is the analysis of count response variables.
Statistical computations and analyses assume that the variables
have specific levels of measurement.
2
3. Categorical/Discrete/Qualitative data
Measurements on categorical or discrete variables consist of
assigning observations to one of a number of categories, in terms
of counts or proportions.
Counts are variables representing frequency of occurrence of an event:
Example: number of graduate students in the department of public
health, SPHMMC.
Proportions or “bounded counts” are ratios of counts:
Example: number of graduate students in the department of public
health divided by the total number of graduate students, SPHMMC.
4. Discretely measured responses
Discretely measured responses can be:
Nominal (unordered) variables: e.g., gender, ethnic background,
religious or political affiliation
Ordinal (ordered) variables, e.g., grade levels, income levels, school
grades
Discrete interval variables with only a few values,
e.g., number of times married
Continuous variables grouped into small number of categories,
e.g., income grouped into subsets, blood pressure levels (normal,
high-normal etc)
4
5. What is a categorical (qualitative) variable?
A categorical variable has a measurement consisting of a set of
categories. For example:
Field goal result – success or failure
Patient survival – yes or no
Criminal offense convictions – murder, robbery, assault, …
Highest attained education level – HS, BSc, MSc, PhD
Monthly income – < 650, 651-1200, …, > 10,999 ETB
• Religious affiliation
∴ We are living in a categorical world.
5
6. Types of Categorical Variables
1. Ordinal – categories have an ordering
Education level (none, BSc., MSc., PhD, Professor)
Social class (upper, middle, lower)
Patient condition (good, fair, serious, critical)
2. Nominal – categories do not have an ordering
Religious affiliation (Orthodox, Catholic, Protestant, Muslim, other)
Mode of transportation to work (walk, bike, automobile, bus)
Favorite type of music (classical, country, folk, jazz, rock)
Choice of residence (apartment, condominium, house, other)
6
7. Note: The way that a variable is measured determines its classification.
For example, “education” is
only nominal when measured as public school or private school;
it is ordinal when measured by highest degree attained, using the
categories none, high school, bachelor’s, master’s, and doctorate;
it is interval when measured by number of years of education,
using the integers 0,1, 2, … .
7
8. Where do Categorical Data occur?
Social sciences: opinions on issues
Health sciences: response to treatments/drugs
Behavioral sciences: type of mental illness
Public health: AIDS awareness
Zoology: animals' food preferences
Education: students' response to exams
Marketing: consumer preferences
Categorical data occur almost everywhere.
8
9. Variables and types of data for CDA
Response variable(s) is categorical
Explanatory/predictor variable(s) may be categorical or continuous; they can be
of any type
Discrete Distributions
Statistical inference requires assumptions about the probability distribution (i.e.,
random mechanism, sampling model) that generated the data.
For example for a t-test, we assume that a random variable follows a normal
distribution.
For discrete data key distributions are: Bernoulli, Binomial, Poisson and
Multinomial.
9
10. Bernoulli Probability Distributions
10
Suppose Y = 1 is a success, where the probability of a success is π.
Also, suppose Y = 0 is a failure, where the probability of a failure is 1 − π.
Bernoulli probability mass function (pmf):
P(Y = y) = π^y (1 − π)^(1 − y), y = 0, 1
Notice that P(Y = 1) = π and P(Y = 0) = 1 − π
Since E(Yi) = E(Yi²) = 1 · π + 0 · (1 − π) = π,
E(Yi) = π and Var(Yi) = E(Yi²) − [E(Yi)]² = π − π² = π(1 − π)
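The mean and variance identities just derived can be checked numerically from the pmf; this is a minimal sketch, where π = 0.3 is an arbitrary illustrative value:

```python
# Bernoulli pmf: P(Y = y) = pi^y * (1 - pi)^(1 - y), y in {0, 1}
def bernoulli_pmf(y, pi):
    return pi**y * (1 - pi)**(1 - y)

pi = 0.3  # arbitrary illustrative success probability
mean = sum(y * bernoulli_pmf(y, pi) for y in (0, 1))     # E(Y) = pi
e_y2 = sum(y**2 * bernoulli_pmf(y, pi) for y in (0, 1))  # E(Y^2) = pi
var = e_y2 - mean**2                                     # Var(Y) = pi(1 - pi)
```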
11. The Binomial distribution
• It is one of the most widely encountered discrete distributions.
• The origin of binomial distribution lies in Bernoulli’s trials.
• When a single trial of some experiment can result in only one of two
mutually exclusive outcomes (success or failure; dead or alive; sick or
well; male or female), the trial is called a Bernoulli trial.
• Suppose an event can have only binary outcomes A and B. Let the
probability of A is π and that of B is 1 - π. The probability π stays the
same each time the event occurs.
11
12. • If an experiment is repeated n times and the outcomes are
independent from one trial to another, the probability that
outcome A occurs exactly y times is
• P(Y = y) = (n choose y) π^y (1 − π)^(n − y), y = 0, 1, …, n
• We write Y ∼ B(n, π)
12
13. Characteristics of a Binomial Distribution
• The experiment consist of n identical trials.
• There are only two possible outcomes on each trial.
• The probability of A remains the same from trial to trial. This probability is
denoted by π, and the probability of B is denoted by 1 − π.
• The trials are independent.
• The binomial random variable Y is the number of A’s in n trials.
• n and π are the parameters of the binomial distribution.
• The mean is nπ and the variance is nπ(1- π)
13
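The characteristics above can be verified numerically from the pmf; n = 10 and π = 0.4 below are arbitrary illustrative values:

```python
from math import comb

# Binomial pmf: P(Y = y) = C(n, y) * pi^y * (1 - pi)^(n - y)
def binomial_pmf(y, n, pi):
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

n, pi = 10, 0.4  # arbitrary illustrative values
probs = [binomial_pmf(y, n, pi) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(probs))              # = n * pi = 4.0
var = sum(y**2 * p for y, p in enumerate(probs)) - mean**2  # = n*pi*(1-pi) = 2.4
```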
14. Poisson distribution
• A different kind of discrete data arise when we count the number of
occurrences of an event , perhaps for different subjects or for units of time.
Examples:
- Daily number of new cases of breast cancer notified to a cancer registry
- Number of abnormal cells in a fixed area of histological slides from a
series of liver biopsies
• Suppose events happen randomly and independently in time at a constant
rate.
- If events happen with rate λ events per unit time, the probability of y
events happening in unit time is P(Y = y) = e^(−λ) λ^y / y!, y = 0, 1, 2, …; λ > 0
14
15. Characteristics of a Poisson distribution
• The Poisson distribution is very asymmetric when its mean is small
• With large means it becomes nearly symmetric
• It has no theoretical maximum value, but the probabilities tail off
towards zero very quickly
• λ is the parameter of the Poisson distribution
• The mean is λ and the variance is also λ
15
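The mean-equals-variance property can be checked numerically from the pmf; λ = 3 below is an arbitrary illustrative rate:

```python
from math import exp, factorial

# Poisson pmf: P(Y = y) = e^(-lam) * lam^y / y!
def poisson_pmf(y, lam):
    return exp(-lam) * lam**y / factorial(y)

lam = 3.0  # arbitrary illustrative rate
probs = [poisson_pmf(y, lam) for y in range(100)]  # tail beyond 100 is negligible
mean = sum(y * p for y, p in enumerate(probs))              # ~ lam
var = sum(y**2 * p for y, p in enumerate(probs)) - mean**2  # ~ lam
```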
17. • The chi-square test is the most frequently employed statistical
technique for the analysis of count or frequency data.
• For example, we may know for a sample of hospitalized patients how many
are male and how many are female.
• For the same sample we may also know how many have private insurance
coverage, how many have Medicare insurance, and how many are on
Medicaid assistance.
• We may wish to know, for the population from which the sample was
drawn, if the type of insurance coverage differs according to gender.
• Chi-square analysis provides a solution to such questions.
• The chi-square distribution may be derived from normal distributions.
17
18. Chi square distribution
1. The chi-square distribution is a nonsymmetrical distribution
2. Chi-square distributions are determined by their degrees of freedom
18
19. Chi-square test statistic
χ² = Σ (O − E)² / E
It cannot be negative, because all discrepancies are squared.
It will be zero only in the unusual event that each observed
frequency exactly equals the corresponding expected frequency.
The larger the discrepancy between the expected frequencies and
their corresponding observed frequencies, the larger the
observed value of chi-square.
19
20. Types of Chi-Square Tests
i. Tests of goodness-of-fit
ii. Tests of independence
iii. Tests of homogeneity
20
21. Tests of goodness-of-fit
• All of the chi-square tests that we employ may be thought of as goodness-of-
fit tests.
• We use the phrase “goodness-of-fit” in a more restricted sense.
• We use it to refer to a comparison of a sample distribution to some theoretical
distribution that it is assumed describes the population from which the
sample came.
• Karl Pearson showed that the chi-square distribution may be used as a
test of the agreement between observation and hypothesis whenever the data
are in the form of frequencies.
21
22. Observed versus Expected Frequencies
• Observed frequencies: are the number of subjects or objects in our sample
that fall into the various categories of the variable of interest.
For example: if we have a sample of 100 hospital patients, we may observe that
50 are married, 30 are single, 15 are widowed, and 5 are divorced.
• Expected frequencies: are the number of subjects or objects in our sample
that we would expect to observe if some null hypothesis about the variable is
true.
For example, our null hypothesis might be that the four categories of marital
status are equally represented in the population from which we drew our
sample. In that case we would expect our sample to contain 25 married, 25
single, 25 widowed, and 25 divorced patients.
22
23. Chi-Square Test Statistic
• The test statistic is χ² = Σ (Oi − Ei)² / Ei
• When the null hypothesis is true, χ² is distributed approximately as chi-square
with k − r degrees of freedom, where:
k is the number of groups for which observed and expected frequencies
are available
r is the number of restrictions or constraints imposed on the given comparison
Oi is the observed frequency for the ith category of the variable of interest
Ei is the expected frequency for the ith category
• This test is a right-tailed test, since when the O − E values are squared, the
answer will be positive or zero.
23
24. Two assumptions are needed for the goodness-of-fit test.
1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.
- The steps for the chi-square goodness-of-fit test are summarized in this
procedure Table.
24
25. • When there is perfect agreement between the observed and the expected values,
χ2= 0. Also, 𝜒2 can never be negative.
• Finally, the test is right-tailed because “H0: Good fit” and “H1: Not a good fit”
mean that 𝜒2 will be small in the first case and large in the second case.
For example, suppose as a market analyst you wished to see whether consumers
have any preference among five flavors of a new fruit soda. A sample of 100 people
provided these data:
If there were no preference, you would expect each flavor to be selected with equal
frequency, i.e. 100/5 =20.
25
26. - Is there enough evidence to reject the claim that there is no preference in the
selection of fruit soda flavors? Let 𝛼 = 0.05.
Solution
- Step 1: State the hypotheses and identify the claim.
H0 : Consumers show no preference for flavors (claim).
H1: Consumers show a preference.
- Step 2: Find the critical value. The degrees of freedom are 5 -1= 4, and 𝛼 = 0.05.
Hence, the critical value from Chi-square table is 9.488.
- Step 3: Compute the test value by subtracting the expected value from the
corresponding observed value, squaring the result and dividing by the expected
value, and finding the sum. The expected value for each category is 20.
26
27. Step 4: Make the decision. The decision is to reject the null hypothesis,
since 18.0 > 9.488
Step 5: Summarize the results. There is enough evidence to reject the
claim that consumers show no preference for the flavors.
27
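Steps 3 and 4 can be sketched in code. The slide's data table is not reproduced above, so the observed flavor counts below are hypothetical values chosen only to sum to 100; the critical value 9.488 is the slide's table value for df = 4, α = 0.05:

```python
# Chi-square goodness-of-fit sketch with hypothetical observed counts.
observed = [32, 28, 16, 14, 10]                             # hypothetical, n = 100
expected = [sum(observed) / len(observed)] * len(observed)  # 100/5 = 20 each under H0

chi_sq = sum((o - e)**2 / e for o, e in zip(observed, expected))
critical = 9.488            # chi-square table value, df = 5 - 1 = 4, alpha = 0.05
reject_h0 = chi_sq > critical
```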
28. • The chi-square goodness-of-fit test can be used to test a variable to see if it
is normally distributed. The hypotheses are
H0: The variable is normally distributed.
H1: The variable is not normally distributed.
28
29. TESTS OF INDEPENDENCE
• The chi-square independence test can be used to test the independence of two
variables.
• To test the null hypothesis by using the chi-square independence test,
expected frequencies must be computed
• When data are arranged in table form for the chi-square independence test,
the table is called a contingency table.
• The table is made up of R rows and C columns.
• The degrees of freedom for any contingency table are (rows − 1) times
(columns − 1); that is, d.f. = (R − 1)(C − 1).
• The reason for this formula for d.f. is that all the expected values except one
are free to vary in each row and in each column.
29
30. • For example, suppose a new postoperative procedure is administered to a
number of patients in a large hospital.
• The researcher can ask the question, do the doctors feel differently about this
procedure from the nurses, or do they feel basically the same way?
• Note that the question is not whether they prefer the procedure but whether
there is a difference of opinion between the two groups.
• To answer this question, a researcher selects a sample of nurses and doctors
and tabulates the data in table form, as shown.
30
31. H0:The opinion about the procedure is independent of the profession.
H1: The opinion about the procedure is dependent on the profession
• The degree of freedom for this case is (2-1)(3-1)= (1)(2) =2
31
34. • The final steps are to make the decision and summarize the results.
• This test is always a right-tailed test, and the degrees of freedom are
(R − 1)(C − 1) = (2 − 1)(3 − 1) = 2.
• If 𝛼=0.05,the critical value from Chi-square table is 5.991. Hence, the decision is
to reject the null hypothesis, since 26.67 > 5.991
• The conclusion is that there is enough evidence to support the claim that
opinion is related to (dependent on) profession—that is, that the doctors and
nurses differ in their opinions about the procedure.
34
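The expected-frequency computation behind this test can be sketched as follows. The 2×3 table below is hypothetical (the slide's doctors/nurses counts are not reproduced above), so its χ² value is illustrative only:

```python
# Chi-square test of independence for an R x C contingency table.
# Expected frequency under H0: E_ij = (row_i total)(col_j total) / n.
table = [[100, 80, 20],   # hypothetical counts, e.g. doctors
         [100, 30, 70]]   # hypothetical counts, e.g. nurses

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_sq = sum((table[i][j] - expected[i][j])**2 / expected[i][j]
             for i in range(len(table)) for j in range(len(table[0])))
df = (len(table) - 1) * (len(table[0]) - 1)   # (R - 1)(C - 1) = 2
```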
35. The 2 X 2 Contingency Table
• Sometimes each of two criteria of classification may be broken down into only
two categories, or levels.
• When data are cross classified in this manner, the result is a contingency table
consisting of two rows and two columns.
• Such a table is commonly referred to as a 2X2 table.
• In the case of a 2×2 contingency table, however, χ² may be calculated by the
following shortcut formula:
χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]
35
36. • Where a, b, c, and d are the observed cell frequencies as shown in the
following table.
• When we apply the (r-1)(c-1) rule for finding degrees of freedom to a 2X2
table, the result is 1 degree of freedom.
A 2X2 Contingency Table
36
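The shortcut can be checked against the general Σ(O − E)²/E form; the identity holds for any 2×2 table, and the cell counts below are hypothetical:

```python
# Verify the 2x2 shortcut chi-square against the general computation.
a, b, c, d = 30, 10, 15, 25   # hypothetical cell counts
n = a + b + c + d

# Shortcut: chi^2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
shortcut = n * (a * d - b * c)**2 / ((a + b) * (c + d) * (a + c) * (b + d))

# General form with E = (row total)(column total)/n for each cell
observed = [[a, b], [c, d]]
rows, cols = [a + b, c + d], [a + c, b + d]
general = sum((observed[i][j] - rows[i] * cols[j] / n)**2 / (rows[i] * cols[j] / n)
              for i in range(2) for j in range(2))
# shortcut and general agree up to floating-point error
```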
37. Example:
• According to the findings of a study by Silver and Aiello, falls are of major
concern among polio survivors.
• Researchers wanted to determine the impact of a fall on lifestyle changes.
• The following table shows the results of a study of 233 polio survivors on
whether fear of falling resulted in lifestyle changes.
37
38. • Solution:
1. Data. From the information given we may construct the 2X2 contingency
table
2. Assumptions. We assume that the sample is equivalent to a simple random
sample.
3. Hypotheses.
H0: Fall status and lifestyle change because of fear of falling are independent.
H1: The two variables are not independent.
Let 𝛼 =.05
4. Test statistic. The test statistic is
χ² = Σ (O − E)² / E, or the 2×2 shortcut formula
χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]
Answer: χ²cal = 31.74
38
39. Small Expected Frequencies
• The problems of how to handle small expected frequencies and small total sample
sizes may arise in the analysis of 2X2 contingency tables.
• Cochran suggests that the 𝜒2 test should not be used if n<20 or if 20<n<40 and any
expected frequency is less than 5.
• When n ≥ 40, an expected cell frequency as small as 1 can be tolerated.
39
40. • Yates’s Correction
• The observed frequencies in a contingency table are discrete and thereby give
rise to a discrete statistic, χ², which is approximated by the chi-square
distribution, which is continuous.
• Yates proposed a procedure for correcting for this in the case of 2×2 tables:
χ²corrected = Σ (|O − E| − 0.5)² / E
• No correction is necessary for larger contingency tables
40
41. For a 2×2 table, the appropriate analysis depends on n and the expected frequencies E:
• n ≥ 40 and E ≥ 5: uncorrected chi-square,
χ² = Σ (O − E)² / E = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]
• n ≥ 40 and 1 ≤ E < 5: Yates-corrected chi-square,
χ² = Σ (|O − E| − 0.5)² / E = n(|ad − bc| − n/2)² / [(a + b)(c + d)(a + c)(b + d)]
• n < 40 or E < 1: Fisher's exact test,
P = (a + b)! (c + d)! (a + c)! (b + d)! / (n! a! b! c! d!)
41
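Cochran's rules for choosing among these analyses, plus the exact probability of a single 2×2 table, can be sketched in code; the cell counts in the check below are hypothetical:

```python
from math import factorial

def fisher_table_prob(a, b, c, d):
    """Exact probability of one 2x2 table with fixed margins:
    P = (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!)"""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d) *
           factorial(a + c) * factorial(b + d))
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return num / den

def choose_analysis(n, min_expected):
    """Select the 2x2 analysis following Cochran's rules."""
    if n < 40 or min_expected < 1:
        return "Fisher's exact test"
    if min_expected < 5:                 # n >= 40 and 1 <= E < 5
        return "Yates-corrected chi-square"
    return "uncorrected chi-square"      # n >= 40 and E >= 5
```

For instance, with all four cells equal to 1 (every margin 2), the single-table probability is C(2,1)C(2,1)/C(4,2) = 2/3, which the factorial formula reproduces.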
43. Logistic Regression
• Much research in the health sciences is motivated by a desire to
understand and describe the relationship between independent
variables and a categorical dependent variable.
• Particularly plentiful are circumstances in which the outcome
variable is dichotomous (a variable that can assume only one of two
mutually exclusive values).
• These values are usually coded as Y = 1 for a success and Y = 0 for a
failure
43
44. • Logistic regression is the type of regression analysis that is usually
employed when the dependent variable is categorical.
• There can be many predictor variables (x’s) that could be categorical
or continuous.
44
45. Types of Logistic Regression
• Binary logistic regression: a regression analysis used to model outcome
variable with two categories
• Multinomial logistic regression: a regression analysis used to model
outcome variable of nominal scale with more than two categories
• Ordinal Logistic regression: a regression analysis used to model
outcome variable of ordinal scale with more than two categories
45
46. Linear vs. Logistic Regression
• What distinguishes the logistic regression model from the linear regression
model is the type of outcome variable.
• Linear regression: Outcome variable y is continuous
• Logistic regression: Outcome variable y is categorical
• The question a researcher needs to ask when choosing a regression
method is:
o What does my outcome look like?
46
47. • The difference is reflected both in
o the choice of a parametric model and
o the assumptions.
• However the methods employed in an analysis using logistic regression follow the
same general principles used in linear regression.
• Why not use the linear regression model for categorical outcome variables?
o Because having a categorical outcome variable violates the assumption of
linearity in linear regression.
o The error terms are heteroskedastic and not normally distributed, because Y
takes on only two values (0 and 1).
47
48. • The predicted probabilities can be greater than 1 or less than 0 which can
be a problem if the predicted values are used in a subsequent analysis.
• Some people try to solve this problem by setting probabilities that are
greater than (less than) 1 (0) to be equal to 1 (0).
• This amounts to an interpretation that a high probability of the Event
(Nonevent) occurring is considered a sure thing.
48
49. Objectives of Logistic Regression
• Estimating magnitude of outcome/exposure relationship
oTo evaluate the association of a binary outcome with a set of predictors
• Prediction
oDevelop an equation to determine the probability or likelihood that
individual has the condition (y = 1) that depends on the independent
variables (the x’s)
49
50. Assumptions of Logistic regression
• The outcome must be categorical
• Requires enough responses in each category of a given variable
• Outcome groups should be mutually exclusive; complete separation of the
outcome by a predictor will make maximum likelihood estimation impossible
• There is no assumption about the predictors being linearly related to each other
• There should not be multicollinearity
• There should not be outliers and influential observations
• Independence of errors –assumes a between subjects design.
50
51. Logistic Regression Model
• The probability of the outcome is measured by the odds of occurrence of an event.
• If P is the probability of an event, then (1-P) is the probability of it not occurring.
o Odds of event = P/(1 − P)
• In linear regression the estimates of effect directly quantified by the mean value
of response variable
• In logistic regression the estimates of effect are instead quantified by “Odds
Ratios”
51
54. • Taking the logarithm of both sides gives
ln(P/(1 − P)) = β0 + β1X
• Sometimes written as:
logit(P) = β0 + β1X
• Where ln (or log) is the natural logarithm (base e)
54
55. Cont’d…
Logistic Vs. Linear Regression Equation
Logistic Regression: logit(P) = ln(P/(1 − P)) = β0 + β1X
Linear Regression: Y = β0 + β1X + ε
• The other difference between linear and logistic regression models
concerns the conditional distribution of error.
55
56. Cont’d…
• In the linear regression model we assume that an observation of
the outcome variable may be expressed as y = E(𝒀𝒊|𝑿𝒊) + 𝜺.
• The error (𝜺) is an observation's deviation from the conditional
mean of y.
• The errors 𝜺 are normally distributed with mean 0 and constant
variance σ² (equal variance). That is: 𝜀 ∼ N(0, σ²)
56
57. Cont’d…
• With a dichotomous outcome variable the conditional distribution of error term is
different.
• In this situation we may express the value of the outcome variable given x as y =
P(x)+ 𝜺.
• Here the quantity 𝜺 may assume one of two possible values.
o If y = 1 then 𝜺 =1-P(x)
o If y = 0 then 𝜺 = -P(x)
• Thus, 𝜺 are distributed with mean zero and variance equal to P(x)[1-P(x)].
• The conditional distribution of the outcome variable follows a binomial
distribution with probability given by the conditional mean, P(x).
57
59. Why log transformation?
• The odds has a range of 0 to ∞
• Odds > 1 associated with an event being more likely to occur than
to not occur
• Odds <1 associated with an event that is less likely to occur than
not occur
• The transformation is useful because it creates a variable that ranges
from −∞ to +∞
• Hence, it solves the problem we encountered in fitting a linear
model to probabilities
59
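The range argument above can be illustrated with a small sketch of the logit transformation:

```python
from math import log

def logit(p):
    """Map a probability p in (0, 1) to log-odds in (-inf, +inf)."""
    return log(p / (1 - p))

half = logit(0.5)                # 0.0: even odds
hi, lo = logit(0.9), logit(0.1)  # equal magnitude, opposite sign
```

Probabilities above 0.5 map to positive log-odds and probabilities below 0.5 to negative log-odds, which is why a linear model on this scale is unconstrained.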
60. Estimating Logistic regression
Simple Logistic regression
• The logistic model with a single independent variable X, where the effects of
other variables are uncontrolled.
Multiple Logistic regression
oThe logistic model with a single predictor variable X can be extended to
two or more predictor variables.
60
61. Interpretation of slope
• 𝛽1 is the estimated change in the log odds of the outcome for a one unit
increase in 𝑥1
• It estimates the log odds ratio for comparing two groups of observations
• This estimated slope can be exponentiated to get the corresponding estimated
odds ratio.
What about the Intercept?
• The intercept is mathematically necessary to specify the entire equation.
61
62. Maximum Likelihood Estimation
• The method used to estimate the regression coefficients in logistic regression
is called Maximum Likelihood Estimation (MLE)
• Ordinary least square(OLS) is method used to estimate the regression
coefficients in linear regression
• MLE yields values for the unknown parameters which maximize the
probability of obtaining the observed set of data.
62
63. Cont’d…
• Basically, the resulting estimates of the slope and intercept are the
values that make the observed data most likely among all choices
of values for 𝛽0and 𝛽1.
• Along with the estimates of 𝛽0and 𝛽1this method yields estimates
of the standard error for each: that can be used to create confidence
intervals and do hypothesis tests
63
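A minimal numerical sketch of this idea: fitting a simple logistic regression by Newton-Raphson, which maximizes the log-likelihood; the x and y values below are made up for illustration:

```python
from math import exp

# Hypothetical data: predictor x and binary outcome y
x = [2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [1 / (1 + exp(-(b0 + b1 * xi))) for xi in x]
    # Gradient of the log-likelihood with respect to (b0, b1)
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
    # Observed information matrix (negative Hessian)
    w = [pi * (1 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01
    # Newton-Raphson update: beta <- beta + I^{-1} g
    b0 += (i11 * g0 - i01 * g1) / det
    b1 += (i00 * g1 - i01 * g0) / det
# At convergence the gradient is ~0: b0, b1 are the maximum likelihood estimates
```

The inverse of the information matrix at convergence also supplies the standard errors mentioned above (its diagonal entries are the estimated variances of β̂0 and β̂1).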
64. Test of Significance of Coefficients
• The fitted relationship i.e. the estimated value of 𝛽0 & 𝛽1 may simply be
the result of chance phenomena.
• We need to test whether or not the sample data set exhibits sufficient
evidence to indicate that X actually contributes significantly to the
prediction of the log odds of Y for a given value of X
• The test statistic is: z = β̂1 / SE(β̂1)
64
65. Example: Coronary Heart Disease (CD) and Age: In this study sampled individuals
were examined for signs of CD (present = 1 / absent = 0) and the potential
relationship between this outcome and their age (yrs.) was considered.
65
• For the CHD-age data set, we could try to estimate the following:
ln(p/(1 − p)) = β0 + β1x1
• p = probability of CHD evidence (proportion of persons with CHD evidence), 𝑥1 = age
• β0 and β1 are called regression coefficients
• Another way to write the above equation:
p = e^(β0 + β1x1) / (1 + e^(β0 + β1x1))
• Recall, the higher the odds of an event, the larger the probability of an event
• A predictor 𝑥1 that is positively associated with the odds will also be positively associated
with the probability of the event (i.e. estimated slope 𝛽1will be positive)
66
67. • A predictor 𝑥1 that is negatively associated with the odds will also be negatively
associated with the probability of the event (i.e. estimated slope 𝛽1will be
negative)
• Results from logistic regression of log odds of CHD evidence on age:
• The resulting equation: ln(p̂/(1 − p̂)) = −5.34 + 0.11 × Age
67
68. Cont’d..
• Where p is estimated probability of persons to have CHD amongst persons of a
given age
• The estimated coefficient (𝛽1) of age (𝑥1) is positive; hence we have
o Estimated a positive association between age and log odds of CHD
o Estimated a positive association between age and probability of CHD
• How can we actually interpret the value 0.11?
• Lets write out the equation comparing two groups of individuals who differ in
age by one year:
• Group 1, age = k years; Group 2, age = k + 1 years
68
69. Cont’d…
• The resulting equations estimating the ln odds of CHD evidence in each age
group:
ln(odds | age = k + 1) = β̂0 + β̂1(k + 1)
ln(odds | age = k) = β̂0 + β̂1k
• Multiplying out, and taking the difference (subtracting):
[ln(odds | age = k + 1)] − [ln(odds | age = k)] = β̂1
69
70. Cont’d…
• So, when the dust settles:
β̂1 = ln(odds | age = k + 1) − ln(odds | age = k)
• Reversing one of the famous properties of logarithms:
β̂1 = ln[(odds | age = k + 1) / (odds | age = k)]
• So β̂1, the estimated slope for 𝑥1, is the natural log of an estimated odds
ratio
• To get the estimated odds ratio, exponentiate β̂1, i.e.: OR = e^(β̂1)
70
71. Cont’d…
• In our example, recall β̂1 = 0.11
• Here, OR = e^(β̂1) = e^0.11 = 1.116
• The estimated odds ratio of CHD evidence for a one year age difference is
1.116, older compared to younger.
o60 year olds compared to 59 years olds
o45 year old compared to 44 year olds
71
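The exponentiation step can be reproduced directly:

```python
from math import exp

b1 = 0.11                # estimated slope for age, from the slide
odds_ratio = exp(b1)     # OR for a one-year age difference
print(round(odds_ratio, 3))  # 1.116
```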
72. Interpretation of slope
• Change in the log odds of CHD for a one year increase in age
• One group with 𝑥1 one unit higher than the other
The Intercept?
• The resulting equation:
ln(p̂/(1 − p̂)) = −5.34 + 0.11 × Age
• Here, the intercept estimate β̂0 is the estimated ln odds of CHD evidence
for persons of age 0
72
73. Test of Significance of Coefficients
• Hypothesis: H0: β1 = 0 vs. H1: β1 ≠ 0
• Assume the null is true, and calculate the standardized “distance” of β̂1 from 0:
z = β̂1 / SE(β̂1) = 0.11 / 0.03 = 3.67
• The p-value is the probability of being 3.67 or more standard errors away from 0 on a
normal curve: very low in this example, p < 0.001
73
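The Wald statistic and its two-sided normal p-value can be reproduced with the slide's estimates:

```python
from math import erfc, sqrt

beta1_hat, se = 0.11, 0.03        # slope estimate and its standard error (slide values)
z = beta1_hat / se                # Wald statistic
p_value = erfc(abs(z) / sqrt(2))  # two-sided normal p-value = 2 * (1 - Phi(|z|))
print(round(z, 2), p_value < 0.001)  # 3.67 True
```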
74. Multiple Logistic Regression
• Multiple logistic regression allows us to model the relationships of several independent
variables to a response variable.
• These independent variables may be either continuous or discrete or a combination of
the two
• We can also estimate the association between each predictor and Pr(y = 1) controlling
for all other predictors
• In the previous example we found a statistically significant positive association
between CHD and age:
ln(p̂/(1 − p̂)) = −5.34 + 0.11 × Age
74
75. Cont’d…
• Smoking status of study participants was also included in the model to assess
whether it has a relationship with CHD
• What if smoking is also associated with age?
• Age could be a confounder of the smoking and CHD relationship (and vice-
versa)
• Can we estimate the age adjusted relationship between CHD and smoking?
• Even if smoking and age not related, and hence there is no confounding, both
predictors may tell more about CHD evidence than either alone.
75
76. Cont’d…
• Here, we need a logistic regression model with 2 predictors (𝑋𝑠):
ln(p/(1 − p)) = β0 + β1X1 + β2X2
• Where p = Pr(CHD evidence), 𝑋1 = age, 𝑋2 = smoking status (1 = yes)
• How would we interpret the coefficients from a multiple logistic regression?
And the resulting odds ratio estimates?
76
77. Cont’d…
• 𝛽1 is the estimated regression coefficient associated with age:
• It estimates the ln odds ratio for comparing two individuals (groups) who
differ by one year in age and are either both smokers or both non-smokers
• 𝛽1 is the estimated smoking-adjusted log odds ratio for age
• Just to demonstrate: Write out 2 equations for two groups of persons who
differ by one year in age and are all smokers
77
79. Cont’d…
• 𝑿𝟏 is the age variable
• 𝛽1 is the estimated adjusted ln OR of CHD associated with age, after adjusting
for smoking status
• 𝑒𝛽1 is the estimated adjusted OR of CHD associated with age, after adjusting
for smoking status
• This 𝑂𝑅 compares two groups of individuals of the same smoking status but
who differ by one year in age (older to younger)
79
80. Cont’d…
• 𝑿𝟐 is the smoking variable
• 𝛽2 is the estimated regression coefficient associated with smoking:
• It estimates the ln odds ratio for comparing two groups of individuals of the
same age, where one group is smokers and the other is non-smokers
• 𝑒𝛽2 estimates the odds ratio for comparing two groups of individuals of the
same age, where one group is smokers and the other is non-smokers
80
81. Inference in Multiple Logistic Regression
• We can estimate each regression coefficients and ORs by constructing a
range of plausible values i.e. CIs
• We can also test the statistical significance of regression coefficients and
ORs using magnitude of test statistics or corresponding p-values or CI
• Each coefficient estimate has its own associated standard error
• Approach very similar to approach from simple logistic regression
81
83. Model Development
• The approach to model development in multiple logistic regression analysis
is similar to the approach in normal theory multiple linear regression.
• Models are compared to assess the statistical significance of the extra
predictors in the larger model, controlling for the predictors in the smaller
model.
• This is done using the likelihood ratio test.
• If the likelihood ratio statistic is significant, we say that the added variables
are significant in adjusted analysis.
83
101. Multicategory response
• The binary logistic regression provided analysis methods when there were
binary responses.
• What about more than two response categories?
Examples:
• Canadian political party affiliation – Conservative, New Democratic, Liberal
• Chemical compounds in drug discovery experiments – Positive, blocker, or
neither
• Five-level Likert scale – Strongly disagree, disagree, neutral, agree, or strongly
agree.
101
102. Cont’d…
• For these examples, some responses are ordinal (e.g., Likert scale) and
some are not (e.g., chemical compounds).
• We will investigate both nominal (unordered) and ordinal multicategory
responses.
Multinomial Probability Distribution
• The multinomial probability distribution is the extension of the binomial
distribution to situations where there are more than two categories for a
response.
• The probability mass function for observing particular values of n1, …, nc
is
P(n1, …, nc) = [n! / (n1! ⋯ nc!)] π1^n1 ⋯ πc^nc
102
103. 𝑊here
• Y denotes the response category with levels of j = 1, …, c
• Each category has a probability of 𝜋𝑗= P(Y=j).
• n denotes the number of trials
• n1, …, nc denote the response counts for categories j = 1, …, c
103
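The multinomial pmf just defined can be sketched directly; the counts and probabilities below are illustrative:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(n1, ..., nc) = n!/(n1! ... nc!) * prod_j probs[j]**counts[j]."""
    coef = factorial(sum(counts))
    for nj in counts:
        coef //= factorial(nj)
    return coef * prod(pj**nj for pj, nj in zip(probs, counts))

# With c = 2 categories the multinomial reduces to the binomial:
p = multinomial_pmf([2, 1], [0.4, 0.6])   # C(3, 2) * 0.4^2 * 0.6 ~ 0.288
```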
104. NOMINAL RESPONSES: BASELINE-CATEGORY LOGIT MODELS
• Multinomial logistic regression is an extension of the (binary) logistic
regression model when the categorical response variable has more than two
levels.
• One possible way to handle such situations is to split the categorical response
variable and apply binary logistic regression to each dichotomous variable.
• However, this will result in several different analyses for only one categorical
response.
• A more structured approach is to formulate one model for the categorical
response by means of so-called generalized logits.
104
105. Cont’d…
• Suppose there are J categories for the response variable with corresponding
probabilities 𝜋1, 𝜋2, …, 𝜋𝐽.
• Using the first category as a “baseline”, we can form “baseline-category logits” as
log(πj/π1) for j = 2, …, J, which are simply log odds.
• When J = 2, we have log(π2/π1) = log(π2/(1 − π2)), which is equivalent to
log(π/(1 − π)) in binary logistic regression with π = π2.
• When there is only one explanatory variable x, we can form the multinomial
logistic regression model of
log(πj/π1) = βj0 + βj1x for j = 2, …, J
105
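The model's category probabilities can be recovered from the baseline-category logits; a sketch with hypothetical coefficients (J = 3, category 1 as baseline):

```python
from math import exp

def category_probs(betas, x):
    """Recover (pi_1, ..., pi_J) from baseline-category logits
    log(pi_j / pi_1) = b_j0 + b_j1 * x; betas holds (b_j0, b_j1) for j = 2..J."""
    etas = [0.0] + [b0 + b1 * x for b0, b1 in betas]  # eta_1 = 0 for the baseline
    denom = sum(exp(e) for e in etas)
    return [exp(e) / denom for e in etas]

probs = category_probs([(0.5, -0.2), (-1.0, 0.3)], x=2.0)  # hypothetical coefficients
```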
106. Cont’d…
• One can easily compare other categories so that category 1 is not always
used.
• For example, suppose you would like to compare category 2 to 3 for J ≥ 3.
Then
log(π2/π3) = log(π2/π1) − log(π3/π1) = (β20 − β30) + (β21 − β31)x
• For more than one explanatory variable, the model becomes:
log(πj/π1) = βj0 + βj1x1 + ⋯ + βjpxp for j = 2, …, J
106
107. Odds ratios
• Because the log-odds are being modeled directly in a
multinomial regression model, odds ratios are useful for
interpreting an explanatory variable's relationship with the
response.
• Consider the model again of
• The odds of a category j response vs. a category 1 response are
exp(βj0 + βj1x). This directly leads to using odds ratios as a way
to understand the explanatory variable in the model.
107
108. Cont’d…
• Thus, the odds of a category j vs. a category 1 response change by e^(cβj1) times for
every c-unit increase in x.
• In a similar manner, we could also compare category j to category j′ (j ≠ j′, j > 1, j′ > 1):
log(πj/πj′) = (βj0 − βj′0) + (βj1 − βj′1)x
• Wald and LR-based inference methods for odds ratios are performed.
108
109. Ordinal response models
• Suppose that the response categories are ordered in the following way:
category 1 < category 2 <….< category J
• For example, a response variable may be measured using a Likert scale with
categories strongly disagree, disagree, neutral, agree, or strongly agree.
• Logit transformations of the probabilities can incorporate these orderings in a
variety of ways.
• In this section, we focus on one way where probabilities are cumulated based on
these orderings.
109
110. Cont’d…
• The cumulative probability for Y is
P(Y ≤ j) = π1 + … + πj for j = 1, …, J.
• Note that: P(Y ≤ J) = 1.
• The logit of the cumulative probabilities can be written as
logit(P(Y ≤ j)) = log[P(Y ≤ j) / (1 − P(Y ≤ j))]
for j = 1, …, J – 1. For each j, we are computing the log odds of being in
categories 1 through j vs. categories j + 1 through J.
110
111. Cont’d…
• When there is only one explanatory variable x, we can allow the log odds to vary
by using a proportional odds model:
logit(P(Y ≤ j)) = βj0 + β1x for j = 1, …, J – 1.
• The proportional odds name comes from there being no j subscript on the
slope parameter β1, which means this parameter is the same for each possible log-
odds that can be formed. This leads to each odds being a multiple of exp(βj0).
111
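The proportional-odds structure can be sketched numerically: one shared slope, increasing intercepts, cumulative probabilities that are nondecreasing in j, and category probabilities obtained by differencing. All coefficient values below are hypothetical:

```python
from math import exp

def cumulative_probs(intercepts, b1, x):
    """P(Y <= j) = 1/(1 + exp(-(b_j0 + b1 * x))) for j = 1..J-1, plus P(Y <= J) = 1."""
    cps = [1 / (1 + exp(-(b0 + b1 * x))) for b0 in intercepts]
    return cps + [1.0]

intercepts = [-1.0, 0.5, 2.0]   # hypothetical; must satisfy b_10 < b_20 < b_30
cum = cumulative_probs(intercepts, b1=0.4, x=1.0)
cat = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]  # category probs
```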
112. Cont’d…
• Notes:
• β10 < ⋯ < βJ−1,0 due to the cumulative probabilities. Thus, the odds increasingly become
larger for j = 1, …, J – 1.
• A proportional odds model actually is a special case of a cumulative probability
model, which allows the parameter coefficient on each explanatory variable to vary
as a function of j.
112
113. Cont’d…
• For more than one explanatory variable, the model becomes:
logit(P(Y ≤ j)) = βj0 + β1x1 + ⋯ + βpxp
• Consider the case of one explanatory variable x again:
113
114. Odds ratio
• Odds ratios are easily formed because the proportional odds model
equates log-odds to the linear predictor.
• The main difference now is the odds involve cumulative probabilities.
• Consider the model again of logit(P(Y ≤ j)) = βj0 + β1x
• The odds ratio is
OR = Odds(x+c)(Y ≤ j) / Oddsx(Y ≤ j) = exp(cβ1)
where Oddsx(Y ≤ j) denotes the odds of observing category j or smaller for
Y at explanatory-variable value x.
114
115. Cont’d…
• The formal interpretation of the odds ratio is:
- The odds of Y ≤ j vs. Y > j change by exp(cβ1) times for a c-unit increase in x.
Notes:
• When there is more than one explanatory variable, we will need to include a
statement like “holding the other variables in the model constant”.
• Adjustments need to be made to an odds ratio interpretation when
interactions or transformations are present in the model.
• Wald and LR-based inference methods for odds ratios are performed
115
117. Logistic Regression - Multiple Dependent Variables
• Is it possible to list multiple dependent variables (DVs) in a single SPSS
logistic regression procedure?
• The Logistic Regression procedure does not allow you to list more than one
dependent variable, even in a syntax command.
• It is possible to write a short macro that loops through a list of dependent
variables.
• The list is an argument in the macro call and the Logistic Regression command
is embedded in the macro.
117
118. * compute a set of binary dependent variables to illustrate the macro.
do repeat y = y1 to yn.
compute y = (uniform(1) > .6).
end repeat.
exe.
define lrdef (!pos !charend('/') )
!do !i !in ( !1)
logistic regression !i
/method = enter v1 v2 v3 ... vn
/contrast (v1)=indicator /contrast (v2)=indicator
/save = pred
/criteria = pin(.05) pout(.10) iterate(20) cut(.5) .
!doend
!enddefine.
lrdef y1 y2 y3 …yn/.
118