Working Papers N° 6 - July 2012 Ministry of Economy

Working Papers
N° 6 - July 2012
Ministry of Economy and Finance
Department of the Treasury
ISSN 1972-411X
lclogit: a Stata module for estimating a mixed
logit model with discrete mixing distribution
via the Expectation-Maximization algorithm
Daniele Pacifico, Hong il Yoo
Working Papers
The working paper series promotes the dissemination of
economic research produced in
the Department of the Treasury (DT) of the Italian Ministry of
Economy and Finance
(MEF) or presented by external economists on the occasion of
seminars organised by
MEF on topics of institutional interest to the DT, with the aim

of stimulating comments
and suggestions.
The views expressed in the working papers are those of the
authors and do not
necessarily reflect those of the MEF and the DT.
Copyright: ©
2012, Daniele Pacifico, Hong il Yoo.
The document can be downloaded from the Website
www.dt.tesoro.it and freely
used, providing that its source and author(s) are quoted.
Editorial Board: Lorenzo Codogno, Mauro Marè, Libero
Monteforte, Francesco Nucci, Franco Peracchi
Organisational coordination: Marina Sabatini
1
CONTENTS
1 INTRODUCTION
...............................................................................................
..................... 2
2 EM ALGORITHM FOR LATENT CLASS LOGIT
................................................................... 3

3 THE LCLOGIT COMMAND
...............................................................................................
.... 4
4 POST-ESTIMATION COMMAND: LCLOGITPR
................................................................... 6
5 POST-ESTIMATION COMMAND: LCLOGITCOV
................................................................ 6
6 APPLICATION
...............................................................................................
........................ 7
7 ACKNOWLEDGMENTS
...............................................................................................
....... 12
8 REFERENCES
...............................................................................................
...................... 12
2
lclogit: a Stata module for estimating a mixed
logit model with discrete mixing distribution
via the Expectation-Maximization algorithm
Daniele Pacifico (

), Hong il Yoo (
)
Abstract
This paper describe lclogit, a Stata module to fit latent class
logit models through the
Expectation-Maximization algorithm. The stability of this
estimation method allows
overcoming some of the computational difficulties that normally
arise when fitting such models
with many latent classes. This, in turn, permits users to estimate
nonparameterically the mixing
distribution of the random coefficients because the more the
mass points of the latent class
model, the better the approximation of the unknown joint
density of the random coefficients.
Keywords: st0001, lclogit, latent class model, EM algorithm,
mixed logit.
(

) Works with the Italian Department of the Treasury (Rome,
Italy). E-mail: [email protected]
(
) Is a PhD student at the School of Economics, the University
of New South Wales (Sydney, Australia).
E-mail: [email protected]
mailto:[email protected]
mailto:[email protected]
3
4
5

10
11
12
Address:
Via XX Settembre, 97
00187 - Rome
Websites:
www.mef.gov.it
www.dt.tesoro.it

e-mail:
[email protected]
Telephone:
+39 06 47614202
+39 06 47614197
Fax:
+39 06 47821886
Ministry of Economy and Finance
Department of the Treasury
Directorate I: Economic and Financial Analysis
An Endogenous Segmentation Mode Choice Model with an
Application to Intercity Travel
Author(s): CHANDRA R. BHAT
Source: Transportation Science, Vol. 31, No. 1 (February 1997),
pp. 34-48
Published by: INFORMS
Stable URL: https://www.jstor.org/stable/25768750
Accessed: 07-11-2019 05:27 UTC
JSTOR is a not-for-profit service that helps scholars,
researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information

technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about
JSTOR, please contact [email protected]
Your use of the JSTOR archive indicates your acceptance of the
Terms & Conditions of Use, available at
https://about.jstor.org/terms
INFORMS is collaborating with JSTOR to digitize, preserve and
extend access to
Transportation Science
This content downloaded from 134.153.184.170 on Thu, 07 Nov
2019 05:27:50 UTC
All use subject to https://about.jstor.org/terms
An Endogenous Segmentation Mode Choice
Model with an Application to Intercity
Travel
CHANDRA R. BHAT
Department of Civil and Environmental Engineering,
University of Massachusetts, Amherst
This article uses an endogenous segmentation approach to
model mode choice. This approach
jointly determines the number of market segments in the travel
population, assigns individuals
probabilistically to each segment, and develops a distinct mode
choice model for each segment
group. The author proposes a stable and effective hybrid

estimation approach for the endoge
nous segmentation model that combines an Expectation-
Maximization algorithm with stan
dard likelihood maximization routines. If access to general
maximum-likelihood software is not
available, the multinomial-logit based Expectation-
Maximization algorithm can be used in
isolation. The endogenous segmentation model, and other
commonly used models in the travel
demand field to capture systematic heterogeneity, are estimated
using a Canadian intercity
mode choice dataset. The results show that the endogenous
segmentation model fits the data
best and provides intuitively more reasonable results compared
to the other approaches.
T J. he estimation of travel mode choice models is an
important component of urban and intercity travel
demand analysis and has received substantial at
tention in the transportation literature (see BEN
AKIVA and LERMAN, 1985). The most widely used
model for urban as well as intercity mode choice is
the multinomial logit model (MNL). The MNL model
is derived from random utility maximizing behavior
at the disaggregate individual level. Therefore, ide
ally, we should estimate the logit model at the indi
vidual level and obtain individual-specific parame
ters for the intrinsic mode biases and for the mode
level-of-service attributes. However, the data used
for mode choice estimation is usually cross-sectional;
that is, there is only one observation per individual.
This precludes estimation of the logit parameters at
the individual level and constrains the modeler to
pool the data across individuals (even in panel data
comprising repeated choices from the same individ
ual, the number of observations per individual is

rarely sufficient for consistent and efficient estima
tion of individual-specific parameters). In such
pooled estimations, it is important to accommodate
differences in intrinsic mode biases (preference het
erogeneity) and differences in responsiveness to lev
el-of-service attributes (response heterogeneity)
across individuals. In particular, imposing an as
sumption of preference and response homogeneity in
the population is rather strong and is untenable in
most cases (hensher, 1981). Further, if the as
sumption of homogeneity is imposed when, in fact,
there is heterogeneity, the result is biased and in
consistent parameter and choice probability esti
mates (see Chamberlain, 1980).
An issue of interest then is: how can preference
and response heterogeneity be incorporated into the
multinomial logit model when studying mode choice
behavior from cross-sectional data?
One approach is to estimate a model with (pure)
random coefficients where the logit mode bias and
level-of-service parameters are assumed to be ran
domly distributed in the population. This approach
ignores any systematic variations in preferences and
response across individuals. As such, it cannot be
considered as a substitute for careful identification
of systematic variations in the population. The ran
dom coefficients cannot even be considered as an
alternative approach (i.e., alternative to accommo
dating systematic effects) to account for heterogene
ity in choice models; it can only be considered as a
potential "add-on" to a model that has attributed as
much heterogeneity to systematic variations as pos

34
Transportation Science
Vol. 31, No. 1, February 1997
0041-1655/97/3101-0034 $05.00
? 1997 Institute for Operations Research and the Management
Sciences
2019 05:27:50 UTC
AN ENDOGENOUS SEGMENTATION MODE CHOICE
MODEL / 35
sible (HOROWITZ, 1991 makes a similar point in the
context of the use of multinomial probit models in
travel demand modeling). Typical applications of the
random-coefficients specification have used a base
systematic model with some allowance for system
atic preference heterogeneity (by including individ
ual-related variables directly in the utility function),
but with little (or no) allowance for systematic re
sponse heterogeneity.1
It is clear from above that accommodating system
atic preference and response heterogeneity in multi
nomial logit models must be a critical focus in mode
choice modeling. Systematic heterogeneity may be
accommodated in one of two broad ways: exogenous
market segmentation or endogenous market seg
mentation. We discuss each of these two approaches
in the subsequent two paragraphs (as indicated ear

lier, a random coefficients specification can be su
perimposed over the systematic model in each of
these approaches; however, in the rest of this paper,
we focus only on the systematic specifications).
The exogenous market segmentation approach to
capturing systematic heterogeneity assumes the ex
istence of a fixed, finite number of mutually exclu
sive market segments (each individual can belong to
one and only one segment). The segmentation is
based on key sociodemographic variables (sex, in
come, etc.) and possibly trip characteristics (whether
an individual travels alone, distance of trip, etc.).
Within each segment, all individuals are assumed to
have identical preferences and identical sensitivities
to level-of-service variables (i.e., the same utility
function).2 The total number of segments is a func
tion of the number of segmentation variables and
the number of segments defined for each segmenta
tion variable. Ideally, the analyst would consider all
sociodemographic and trip-related variables avail
able in the data for segmentation (we will refer to
such a segmentation scheme as full-dimensional ex
ogenous market segmentation). However, a practi
cal problem with the full-dimensional exogenous
xThe reader is referred to a report by electric power re
search institute, 1977, for an application of the random-coef
ficients logit formulation using cross-sectional data; gonul and
srinivasan (1993) applied the random-coefficient logit formula
tion to estimate brand choice models from panel data. The ran
dom-coefficients formulation has also been used in
combination
with the probit model of choice (Hausman and Wise, 1978;
Fischer and Nagin, 1981; Horowitz, 1993).

2To be precise, all individuals in a segment are assumed to
have
identical preference patterns and sensitivity to level -of-service
patterns, not necessarily identical preferences and level -of-
service
sensitivity. This is because continuous variables such as
income
and distance can appear in the segment-specific utility
functions
even if they are used as segmentation variables. For example,
within each segment, preference may be a function of income,
and
level-of-service sensitivity may be a function of trip distance.
segmentation scheme is that the number of seg
ments grows very fast with the number of segmen
tation variables, creating both interpretational and
estimation problems due to inadequate observations
in each segment. To overcome this limitation, re
searchers have used one of two methods. The first
method, which we label as the Refined Utility Func
tion Specification method, accommodates preference
heterogeneity by introducing key segmentation vari
ables directly into the utility function as alternative
specific variables and recognizes response heteroge
neity by interacting the level-of-service variables
with the segmentation variables (we will refer to
sociodemographic and trip characteristics that are
likely to impact mode preferences and level of ser
vice sensitivity as segmentation variables). The re
fined utility function specification method is a re
strictive version of the full-dimensional exogenous
market segmentation approach where only lower
order interaction effects of the segmentation vari
ables on preference and response are allowed. The

second method, which we label the Limited-Dimen
sional Exogenous Market Segmentation method,
overcomes the practical problem of the full-segmen
tation approach by using only a subset of the demo
graphic and trip variables (typically one or two) for
segmentation. The advantage of the two methods
just discussed is that they are practical (the param
eters can be efficiently estimated with data sizes
generally available for mode choice analysis) and are
easy to implement (requires only the MNL soft
ware). The disadvantage is that their practicality
comes at the expense of suppressing potentially
higher-order interaction effects of the segmentation
variables on preference and response to level-of-ser
vice measures. In addition, an intrinsic problem
with all exogenous market segmentation methods is
that the threshold values of the continuous segmen
tation variables (for example, income) which define
segments have to be established in a rather ad hoc
fashion.
The endogenous market segmentation approach,
on the other hand, attempts to accommodate sys
tematic heterogeneity in a practical manner not by
suppressing higher-order interaction effects of seg
mentation variables (on preference and response to
level-of-service measures), but by reducing the di
mensionality of the segment-space. Each segment,
however, is allowed to be characterized by a large
number of segmentation variables. The appropriate
number of segments representing the reduced seg
ment-space is determined statistically by succes
sively adding an additional segment till a point is
reached where an additional segment does not re

sult in a significant improvement in fit. Individuals
2019 05:27:50 UTC
36 / C. R. BHAT
are assigned to segments in a probabilistic fashion
based on the segmentation variables. The approach
jointly determines the number of segments, the as
signment of individuals to segments, and segment
specific choice model parameters. Since this ap
proach identifies segments without requiring a
multi-way partition of data as in the full-dimen
sional exogenous market segmentation method, it
allows the use of many segmentation variables in
practice and, therefore, facilitates incorporation of
the full order of interaction effects of the segmenta
tion variables on preference and level-of-service sen
sitivity. The method also obviates the need to (arbi
trarily) establish the threshold values defining
segments for continuous segmentation variables. As
we indicate later, the approach also does not exhibit
the individual-level independence from irrelevant
alternatives (IIA) property of the exogenous segmen
tation approach. A potential disadvantage is that
the model cannot be estimated directly using the
MNL software (we overcome this issue in the cur
rent paper).
The model formulation in this paper falls under
the endogenous segmentation approach to accommo

dating systematic heterogeneity. We use a multino
mial logit formulation for modeling both segment
membership as well as mode choice. To the author's
knowledge, no previous market segmentation anal
ysis in the travel demand field has adopted this
approach. But the approach has been applied earlier
in the marketing field. kamakura and russell
(1989) originally proposed such a method to model
brand choice. Their model assumed that the seg
ment membership probabilities were invariant
across households (i.e., the only variables in their
multinomial logit model for segment membership
were segment-specific constants). Such a model is of
limited value since the objective of segmentation
schemes is to associate heterogeneity with observ
able individual characteristics. gupta and chinta
gunta (1994) extended the work of Kamakura and
Russell by including segmentation variables in the
segment membership MNL model (dayton and Ma
cready, 1988 also propose an endogenous segmen
tation model in which segment membership is func
tionally related to individual-related variables;
however, the segment membership and choice prob
abilities do not take a MNL structure).
The model in this paper takes the same form as
the model of Gupta and Chintagunta. However, the
paper proposes and applies an efficient, stable, hy
brid estimation procedure that combines an Expec
tation-Maximization (EM) formulation (which ex
ploits the special structure of the model) with a
traditional quasi-Newton maximization algorithm
(both Kamakura and Russell, 1989 and Gupta and
Chintagunta, 1994 use a direct maximum likelihood
method, which we found to be rather unstable and to

require more time than the method proposed here).
The EM formulation requires only the standard
MNL software and can be used in isolation to esti
mate the endogenous segmentation model. Hence, it
should be of interest to researchers and practitio
ners who do not have access to, or who are not in a
position to invest time and effort in learning to use,
general-purpose maximum likelihood software. The
endogenous segmentation model is applied in a
travel demand context to study intercity business
mode choice behavior in the Toronto-Montreal cor
ridor of Canada and its empirical performance is
assessed relative to alternative methods to account
for systematic preference and response heterogene
ity.
The rest of this paper is structured as follows. The
next section presents the structure for the mode
choice model with endogenous market segmenta
tion. Section 2 outlines the estimation procedure.
Section 3 discusses the empirical results obtained
from applying the model to intercity mode choice
modeling. Section 4 presents the choice elasticities
and examines the policy implications of the results.
The final section provides a summary of the re
search findings.
1. MODEL STRUCTURE
THE MODE CHOICE MODEL with endogenous segmen
tation rests on the assumption that there are S
relatively homogenous segments in the intercity
travel market (S is to be determined); within each
segment, the pattern of intrinsic mode preferences
and the sensitivity to level of service measures are
identical across individuals. However, there are dif

ferences in intrinsic preference patterns and level
of-service sensitivity among the segments. Thus,
there is a distinct mode choice model for each seg
ment s (s = 1, 2, 3, . . . , S).
We assume a random utility framework as the
basis for individuals' choice of mode. We also assume
that the random components in the mode utilities
have a type I extreme value distribution and are
independent and identically distributed. Then, the
probability that an individual q chooses mode i from
the set Cq of available alternatives, conditional on
the individual belonging to segment s, takes the
familiar multinomial logit form (McFADDEN, 1973),
Po(i)s = ^-n (1)
2019 05:27:50 UTC
MODEL / 37
where xqi is a vector comprising level-of-service and
alternative-specific variables associated with alter
native i and individual q, and J8S is a parameter
vector to be estimated.
The probability that individual q belongs to seg
ment s is next written as a function of a vector zq of
sociodemographic and trip-related variables associ
ated with the individual (zq includes a constant).
Using a multinomial logit formulation, this proba

bility can be expressed as
ey'sZq
The unconditional (on segment membership)
probability of individual q choosing mode i from the
set Cq of available alternatives can be written from
equations (1) and (2) as
Pqd) = l PqsxlPq(i)s] (3)
The assignment of individuals to segments based
on sociodemographic and trip characteristics is a
critical part of market segmentation, as discussed in
section 1. In the current model, this assignment is
probabilistic and is based on equation (2) after re
placing the yss with their estimated counterparts.
The size of each segment (in terms of share), Rs, may
be obtained as
y p
RS = ?Q (4)
where Q is the total number of individuals in the
estimation sample.
The sociodemographic and trip-related attributes
that characterize each segment can be inferred from
the signs of the coefficients in equation (2). A more
intuitive way is to estimate the mean of the at
tributes in each segment as follows:
2 q PqsZq
zs= y p (5) Z^q * qs
The model can be used to predict the choice of

mode at the individual level, segment level, or the
market level. The individual-level choice probabili
ties can be obtained from equation (3). The segment
level mode choice shares can be obtained as
LqPqsX [Pa(l)s] Ws(i) = q y p q (6) 2Liq * qs
Finally, the market-level mode choice shares may be
computed as
W/' Pqs X [Pq(i)S] V D v/Tir/-
w(0 =-q-= 2j Rs x
(7)
The elasticities of the effect of level-of-service at
tributes can also be computed at the individual
level, the segment-level, or the market-level. These
can be derived in a straight-forward fashion from
the expressions above and are not presented here
due to space considerations. An examination of the
individual-level cross-elasticities from the model
will show that the endogenous segmentation model
is not saddled with the independence from irrele
vant alternatives property of the MNL model.
2. MODEL ESTIMATION
THE PARAMETERS to be estimated in the mode choice
model with endogenous market segmentation are
the parameter vectors j8s and ys for each s, and the
number of segments S. The log likelihood function to
be maximized for given S can be written as (we
discuss the procedure employed to determine S in

section 2.2)
q=Q ( s I
%=iz log 2 [pqsx n ipqa)sy? q = l [ 8 = 1 i<ECq I
(8)
where Cq is the choice set of alternatives for the gth
individual and 8qi is defined as follows:
1 if the qth individual chooses
alternative i
(9 = 1,2.q;iEC,) (9)
0 otherwise,
Equation (8) represents an unconditional likeli
hood of an observed choice sample and is character
istic of finite probability mixture models. It has been
noted earlier (see McLACHLAN and basford, 1988
and redner and walker, 1984) that maximization
of the likelihood function using the usual Newton or
quasi-Newton (secant) routines in such mixture
models can be computationally unstable (we docu
ment our own experience of this unstable behavior
in section 3.4). A critical issue in such cases is to
start the maximum likelihood iterations with good
start parameter estimates. To obtain good start val
ues, we develop a two-stage iterative method which
belongs to the EM family of algorithms (dempster,
Laird and Rubin, 1977). This family of algorithms
has been suggested earlier as a natural candidate
for estimation in the general class of finite-mixture
models since it is stable and tends to increase the

2019 05:27:50 UTC
38 / C. R. BHAT
log-likelihood function more than usual quadratic
maximization routines in areas of the parameter
space distant from the likelihood maximum (RUUD,
1991). The specific EM method is discussed below.
2.1. EM Method for Start Values
Consider the likelihood function in equation (8)
and rewrite it as
2 = I logf 1 PqsLq} (10)
where
lqs = n [pms]"*. Jqs
The value Ln? is the choice likelihood for individual
q conditional on the individual belonging to segment
s. Next, note that the expected value of membership
in segment s for individual q conditional on sociode
mographic/trip characteristics and on the observed
mode choice of the individual (i.e., the posterior seg
ment membership probability Pqs) can be obtained
by revising the expected value of segment member
ship conditional only on the individual's sociodemo
graphic/trip characteristics (i.e., the prior segment
membership probability Pqs) in a Bayesian fashion
as

Pqs=y p~^t * s' * qs'qs'
The necessary first-order conditions for maximizing
the likelihood function can be written from equation
(10) and (11) as:
_ / Pm dL
= 1 dPs q 2s> PqsLj a/3,
J qs
Pqs dLn
I Lqsj dBs
dlogLgs= (12)
for s = 1, 2, . . . , S
where
L*=UlPqsf- (14)
s
Equation (12) indicates that for given values of Pqs,
the maximum likelihood estimate of each /3S vector
(s = 1, 2, . . . , S) is obtained from a standard multi
nomial logit estimation for choice of mode with the
corresponding Pqs values as weights (thus, there are
S individual multinomial logit estimations). Equa
tion (13) shows that for given values of Pqs, the
maximum likelihood estimate of the ys vector is
obtained from a single multinomial logit estimation
for choice of segment with the posterior probabilities
Pqs being used as the "dependent variable" instead
of the "missing" segment choice data. A more effi
cient way to simultaneously obtain the estimates of

j3s for each segment and estimates of ys (for given
values of Pqs) is to construct a new log likelihood
function as follows:
^* = l[lPqslogLqs + ogL*] (15)
= 2 2 P?{log Lqs + log PJ. q s
Since the log-likelihood function above is a combina
tion of MNL log-likelihood functions, it is globally
concave for given Pqs values.
Equations (11) and (15) lend themselves gainfully
to the application of a EM algorithm. The E-step
involves the computation of the a posteriori mem
bership probabilities using equation (11) for some
given initial guesses for the parameters. In the M
step, equation (15) is maximized with respect to
parameters using the a posteriori membership prob
abilities computed in the E-step. Since the function
in equation (15) is globally concave for given Pqs, a
simple Newton-Raphson search will converge
quickly. The two steps are then alternated. Demp
ster, Laird and Rubin (1977) prove (under a very
general setting) that, at each iteration, the EM al
gorithm provides monotone increasing values of the
original log-likelihood function in equation (8).
In principle, the EM steps can be iterated until
convergence (at convergence, the EM algorithm will
return the maximum-likelihood estimates). But, as
we show in section 3.4, the rate of convergence of the
EM algorithm is slow in the neighborhood of the
likelihood maximum (this has also been documented
in other applications of the EM algorithm; see Red
ner and Walker, 1984 and WATSON and ENGLE,
1983). So we stopped the EM algorithm prematurely
after a few iterations and used the resulting param

2019 05:27:50 UTC
MODEL / 39
eter estimates as the start values for the direct
maximization of the log-likelihood in equation (8).
However, the reader will note that the EM algo
rithm proposed here can be used in isolation (with
out combining with the direct maximum likelihood
procedure) to obtain the parameters of the endog
enously segmented mode choice model if access to
general maximum-likelihood software is not avail
able (the EM method requires only the standard
multinomial logit software).
2.2. Determination of Number of Segments
The estimation procedure discussed above is ap
plicable for a given number of segments S. To iden
tify the most appropriate value for S, we estimate
the model for increasing values of S (S = 2, 3, 4,
. . .) until a point is reached where an additional
segment does not significantly improve model fit.
The evaluation of model fit is based on the Bayesian
Information Criterion (BIC)
BIC= -X + 0.5-iMn(iV). (16)
The first term on the right side is the negative of the
log-likelihood value at convergence; R is the number
of parameters estimated, and N is the number of

observations (see ALLENBY, 1990). As the number of
segments, S, increases, the BIC value keeps declin
ing till a point is reached where an increase in S
results in an increase in the BIC value. Estimation
is terminated at this point and the number of seg
ments corresponding to the lowest value of BIC is
considered the appropriate number for S.3
3. AN APPLICATION TO INTERCITY MODE CHOICE
MODELING
3.1. Background
In this section, we report the empirical results
obtained from applying the endogenous segmenta
tion model and other alternative methods of accom
modating systematic heterogeneity to an analysis of
intercity travel mode choice behavior. The data used
in the current empirical analysis was assembled by
VIA Rail (the Canadian national rail carrier) in 1989
to develop travel demand models to forecast future
intercity travel in the Toronto-Montreal corridor
3An alternative procedure to determine the number of seg
ments is to apply the Akaike Information Criterion developed
by
Akaike (1974). This is the procedure adopted by Kamakura and
Russell (1989). Gupta and Chintagunta (1994), on the other
hand,
use the BIC. An advantage of the BIC is that it takes into
consideration the sample size used in the analysis. The Akaike
Information Criterion tends to unduly favor a model with more
number of segments for large data samples; the BIC corrects
this
situation (see Allenby, 1990).

and to estimate shifts in mode split in response to a
variety of potential rail service improvements (see
KPMG Peat Marwick and Koppelman, 1990 for a
detailed description of this data). The data includes
sociodemographic and general trip-making charac
teristics of the traveler, and detailed information on
the current trip (purpose, party size, origin and des
tination cities, etc.). The set of modes available to
travelers for their intercity travel was determined
based on the geographic location of the trip (the
universal choice set included car, air, train, and
bus). Level of service data were generated for each
available mode and each trip based on the origin/
destination information of the trip (the assembly of
the level-of-service data was done by KPMG Peat
Marwick for VIA Rail; for information on detailed
procedures to compute level-of-service measures,
the reader is referred to KPMG Peat Marwick and
Koppelman, 1990).
In this article, we focus on intercity mode choice
for paid business travel in the corridor. The study is
further confined to a mode choice examination
among air, train, and car due to the very few number
of individuals choosing the bus mode in the sample
and also because of the poor quality of the bus data
(see forinash, 1992). This is not likely to affect the
applicability of the mode choice model in any serious
way since the bus share for paid business travel in
the corridor is less than one percent. The sample
used is a choice-based sample and comprises 3593
business travelers. The share of the choice of each
mode in the population of travelers was available
from supplementary aggregate data. Since the sam
ple is choice-based with known aggregate shares, we

employed the Weighted Exogenous Sample Maxi
mum Likelihood method proposed by manski and
lerman (1977) in estimation. The asymptotic co
variance matrix of parameters is computed as
H_1AH_1, where H is the hessian and A is the cross
product matrix of the gradients (H and A are eval
uated at the estimated parameter values). This pro
vides consistent standard errors of the parameters
(Borsch-Supan, 1987).
Five variables associated with individual sociode
mographics and trip characteristics were available
in the data, and all of these were considered for
market segmentation. The variables are: income,
sex (female or male), travel group size (traveling
alone or traveling in a group), day of travel (weekend
travel or weekday travel), and (one-way) trip dis
tance. The segmentation variables were introduced
as alternative-specific variables in the logit model of
equation (3) with the last segment being the base.
The level-of-service variables used to model choice
of mode included modal level-of-service measures
2019 05:27:50 UTC
40 / C. R. BHAT
TABLE I
Mean (and Standard Deviation) of Modal Level-of-Service
Variables

Frequency Total Cost In-Vehicle Time Out-of-Vehicle Time
Mode (departures/day) (in Canadian $) (in mins.) (in mins.)
Train 4.21 (2.3) 58.58 (17.7) 244.50 (115.0) 86.32 (22.0)
Air 25.24 (14.0) 157.33 (21.7) 57.72 (19.2) 106.74 (24.9)
Car Not applicable 70.56 (32.7) 249.60 (107.5) 0.00 (0.0)
Note: All measures are one-way values. Air and train frequency
refer to total departures by all carriers.
(frequency of service, total cost, in-vehicle travel
time and out-of-vehicle travel time) and a large city
indicator which identified whether a trip originated,
terminated, or originated and terminated in a large
city. This variable may be viewed as a "proxy" level
of-service measure capturing mode characteristics
associated with large cities. Table I provides descri p
tive statistics on the modal level-of-service measures
in the sample.
3.2. Endogenous Market Segmentation Model
Results
We estimated the endogenous segmentation
model with S = 2, 3, and 4 segments. In all these
estimations, travel time was best represented sim
ply by decomposition into in-vehicle and out-of-vehi
cle travel times. For all S, we found that the use of
travel cost was preferred to travel cost deflated by
income. The final mode choice model specification
for the segmented models included the alternative
specific mode-bias constants and the level-of-service
variables (we examined if there were differences in
preferences based on income and trip distance
within each segment by including these variables as

alternative-specific variables in the mode choice
utility, but did not find any statistically significant
effects).
We computed the BIC value as shown in equation
(16) for each number of segments. The BIC value
decreased as we moved from two to three segments,
but increased when we added the fourth segment
(the BIC value was 2291 for the two-segment case,
2247.9 for the three-segment case and 2259.8 for the
four-segment case). Thus, we selected three seg
ments as the appropriate number of segments.
The estimation results for the three-segment so
lution are shown in Table II. The signs of all the
level-of-service variables in the segment-specific
mode choice models are consistent with a priori ex
pectations. However, there are clear differences in
intrinsic preferences and sensitivities to level-of-ser
vice measures among the segments. Turning to the
segmentation model, the coefficients in this model
TABLE II
Intercity Mode Choice Analysis: Parameter Estimates for
Three-Segment
Solution
Model

Segment 1 Segment 2 Segment 3
Variable Parameter ^-statistic Parameter ^-statistic
Parameter
Segment-specific
mode choice
model
Segmentation
model
Segment size
Mode constants
Train
Air
Large city indicator
Train
Air
Frequency of service (deps./day)
Travel cost (Canadian $)
Travel time (minutes)

In-vehicle
Out-of-vehicle
Constant
Income
Female Sex
Traveling Alone
Weekend Travel
Trip Distance
-3.0617
-1.0516
1.9273
2.2240
0.1615
-0.0591
-0.0254
-0.0436
4.4227
-0.0293
-0.7614

-0.1657
0.2423
-0.0047
-2.54
-1.82
2.20
3.46
6.38
-4.53
-3.25
-2.91
7.62
-5.73
-3.46
-1.70
0.65
-5.91
4.7763

-1.3691
0.2146
-1.3691
0.5784
-0.1728
-0.0030
-0.0239
1.5366
-0.0447
0.9703
-0.7226
1.5326
-0.0030
2.12
-1.01
0.32
-1.01
3.49

-3.27
-1.20
-1.84
2.56
-8.60
4.05
-4.07
4.71
-3.79
1.1737
4.3404
-0.0840
2.6892
0.1790
-0.0166
-0.0657
-0.1627

0.60
3.36
-0.12
2.37
3.92
-0.54
-5.21
-5.01
0.4866 0.1220
Base Segment
0.3914
Note: The log-likelihood value at zero is -3947.3; the log-
likelihood value for the unsegmented model with only
alternative specific
constants is -3648.2. The log-likelihood value at convergence
for the three segment solution is -2100.5.

2019 05:27:50 UTC
MODEL / 41
TABLE III
Qualitative Characterization of Mode Choice Preferences and
Sociodemographic I Trip Attributes of Segments
Preferences/Attributes Characteristic Segment 1 Segment 2
Segment 3
Mode choice
preferences
Sociodemographic
and trip
attributes
Intrinsic preference for
Sensitivity to frequency

Sensitivity to cost
Sensitivity to IVTT
Sensitivity to OVTT
Income
Sex
Travel group size
Day of travel
Trip distance
Car; if trip originates or
ends in large city, air
is preferred
Low
Medium
Medium
(MV = $26/hour)
Medium

(MV = $44/hour)
About average market
income
High proportion of
males
Same proportion as
overall market
High proportion of
weekday travelers
Short
Train
High
High
Low
(MV = $1.05/hour)
Low

(MV = $8.30/hour)
Lower than average
market income
High proportion of
females
High proportion of
individuals
traveling in a group
High proportion of
weekend travelers
Medium
Air
Low
Low
High
(MV = $237/hour)

High
(MV = $588/hour)
Higher than average
market income
Same proportion as
overall market
High proportion of
individuals
traveling alone
High proportion of
weekday travelers
Long
Note: MV stands for "money value of time".
are, in general, very significant.4 A comparison of
the coefficients across the segments provides impor
tant qualitative information regarding the charac
teristics of each segment vis-a-vis other segments.
The constants in the segmentation model contribute

to the size of each segment and do not have any
substantive interpretation. The segment sizes are
computed using equation (4); segment one is largest
(48.7% of intercity travelers) and segment two is the
smallest (12.2% of intercity travelers).
Table III summarizes (qualitatively) the mode
choice characteristics and the sociodemographic and
trip characteristics of individuals within each seg
ment. Segment two has a high overall preference for
train, is highly sensitive to cost, and is low in sen
sitivity to travel time. This result is reasonable since
individuals in segment two earn low incomes, and
travel in a group, and on weekends; the high intrin
sic preference for train in this segment can also be
attributed to the intermediate trip distance of travel
in the segment. Further, previous studies (for exam
ple, KPMG Peat Marwick et al., 1993) have indi
cated that women prefer common-carrier surface
modes more than men do, which is consistent with
the result obtained here (segment two has the high
est female proportion). The high frequency sensitiv
ity of segment two may be associated with group

4It is useful to note that had we adopted a full-dimensional
exogenous segmentation scheme, there would have been (assum
ing only two income segments and two trip-distance segments)
25 = 32 market segments with an average of only about 110
observations within each segment.
travel; individuals traveling together may have dif
ferent schedule conveniences and may therefore
place a high premium on frequency of service. Seg
ment three has a high overall preference for air, is
not very sensitive to cost, and is highly sensitive to
travel time; this is consistent with the high-income
of individuals in this segment and the single-indi
vidual, weekday, long distance travel characteristics
of the segment. The money value of time is very high
in this segment. Segment one is the intermediate
segment, both in level-of-service sensitivity and so
ciodemographic/trip characteristics. Individuals in
the first segment whose trips do not originate or end
in a large city have an intrinsic preference for the
car and air modes over the train mode, and for the
car mode over the air mode. For individuals whose
trips originate or end in a large city, there is a
greater intrinsic preference for air over the car mode
in this segment.

To characterize each segment more intuitively, we
compute the mean values of the segmentation vari
TABLE IV
Mean Values of Demographic and Trip Variables in Each
Segment
Variable
Overall
Segment 1 Segment 2 Segment 3 Market
Income (X 103 Can$)
Female
Traveling alone
Weekend travel
Trip distance (km)
52.16
0.13
0.69
0.20

311.80
44.09
0.48
0.57
0.62
373.37
60.28
0.20
0.77
0.19
54.36
0.20
0.70
0.25
444.76 371.35
2019 05:27:50 UTC

42 / C. R. BHAT
TABLE V
Empirical Comparison of the Endogenous Segmentation Model
with Refined Utility Function Specification Models
Statistic
First-Order
Preference
Interactions
First-Order Preference and First-Order
Response Interactions
Second-Order Preference and First-Order
Response Interactions
Full version Preferred version Full version Preferred version
Number of parameters
Log-likelihood at

convergence
Adjusted likelihood
ratio index, p2 a
Result of non-nested
likelihood ratio index
test with endogenous
segmentation model
18
-2299.38
0.4129
46
-2199.07
0.4312
24
-2211.48
0.4337

66
-2154.09
0.4376
32
-2174.71
0.4409
Reject very strongly the hypotheses that any (and each) of the
above models is as good as the
endogenous segmentation model.
Note: The likelihood value at convergence for the endogenous
segmentation model is -2100.5 and the adjusted likelihood ratio
index
for this model is 0.4587 (number of parameters in the model is
36).
aThe adjusted likelihood ratio index is computed as p2 = 1 -
[(i?(j3) - k)/!?(0)], where i?(j3), i?(0), and k are the log-
likelihood function
value at convergence, the log-likelihood value at zero, and the

number of model parameters, respectively.
ables within each segment (see equation 5). The
results are presented in Table IV and support our
previous observations regarding segment character
istics.
3.3. Comparison of Endogenous Market
Segmentation with Alternative
Approaches
In this section, we compare the empirical perfor
mance of the endogenous market segmentation ap
proach relative to the two alternative extant meth
ods to accommodating systematic heterogeneity in
preference and response across the population
(these alternative methods are the refined utility
function specification method and the limited
dimensional exogenous market segmentation
method).
Table V summarizes the results of models ob
tained using the refined utility function method. We
present the overall measures of fit for three different
specifications. The first specification includes a)

first-order interactions of all the five segmentation
variables with mode preferences, b) the large city
indicator variable, and c) the four modal level-of
service variables: frequency, cost, and in-vehicle and
out-of-vehicle travel times (variants of this specifi
cation that incorporate the cost level-of-service as
cost/income and the out-of-vehicle time measure as
out-of-vehicle over distance, as has been done in
some earlier mode choice analyses, were rejected by
the specification reported here). All the five segmen
tation variables had a statistically significant im
pact on train and air mode preferences (car is the
base; the impact of each of these variables was also
not the same across the train and air modes). The
second specification adds first-order interactions of
the five segmentation variables with the level-of
service measures.5 This specification, in its full ver
sion, effectively adds 28 level-of-service variables to
the first specification. However, of these, only six of
the interaction terms were significant and these are
retained in the preferred version. The third specifi
cation, in its full version, adds all (ten) second-order
interactions of the five segmentation variables with

mode preferences to the full version of the second
specification (since there are three modes, this third
specification adds 20 additional parameters to the
full version of the second specification). In this third
specification, only seven of the second-order interac
tions with preference and seven of the first-order
interactions with the level-of-service variables were
significant. These are retained in the preferred ver
sion. As can be observed from the table, the adjusted
likelihood ratio index of the refined utility function
specification models are smaller than that of the
endogenous segmentation model. The endogenous
segmentation model rejects the refined utility func
tion models in Table V in non-nested adjusted like
lihood ratio tests (see Ben-Akiva and Lerman, 1985;
page 172 for the structure of the test). Thus, it
provides a better fit to the data. We also found that
some of the statistically significant coefficients in
the refined utility models were non-intuitive. For
example, individuals in the lower income bracket
were found to be less sensitive to cost and more
sensitive to out-of-vehicle time than individuals in
the middle and higher income brackets.
5For the two continuous variables, income and trip distance,

we
defined three distinct ranges and allowed the level-of-service
sensitivity to be different for the different ranges. The income
segments were: less than $40,000, $40,000-$65,000, and
greater
than $65,000. The segments for trip-distance were less than
250
miles, 250-500 miles, and greater than 500 miles.
2019 05:27:50 UTC
MODEL / 43
TABLE VI
Empirical Comparison of the Endogenous Segmentation Model
with Limited-Dimensional Exogenous Segmentation Models
Segmentation
Variable

Number of
Segments
Number of
Parameters
LL at
Convergence
Two-Way
Segmentation
With . . .
Number of Number of LL at
Segments Parameters Convergence
Incomed
Sex
Group size
Day of travel

Trip distance
52 -2242.9 0.4186 0.4272 Sex 6 92 -2214.8 0.4156
Group size 6 92 -2208.9 0.4186
Day of travel 6 92 -2216.6 0.4151
Trip distance 9 156 -2108.1 0.4264
32 -2287.4 0.4124 0.4160 Group size 4 64 -2262.0 0.4107
Day of travel 4 64 -2260.1 0.4112
Trip distance 6 96 -2179.7 0.4235
32 -2278.5 0.4147 0.4182 Day of travel 4 64 -2250.9 0.4135
Trip distance 6 96 -2163.2 0.4277
32 -2279.3 0.4145 0.4180 Trip distance 6 96 -2172.3 0.4253
54 -2212.4 0.4258 0.4350 - - - - -
0.4298
0.4313
0.4293
0.4568
0.4178
0.4183
0.4387
0.4206

0.4429
0.4406
Note: The adjusted likelihood ratio index for the endogenous
segmentation model is 0.4587 and the number of parameters in
this
model is 36.
aThe adjusted likelihood ratio index is computed as p2 = 1 -
[(i?(/3) - k)/!?(0)], where ??(j3), i?(0), and k are the log-
likelihood function
value at convergence, the log-likelihood value at zero, and the
number of model parameters, respectively.
bThe "favorable" adjusted likelihood ratio index for the one -
way segmentations is computed by using the log-likelihood
value at
convergence for the segmented models, but by assuming the
number of parameters to be equal to that in the unsegmented
model.
cThe "favorable" adjusted likelihood ratio index for two-way
segmentations is computed by using the log-likelihood at
convergence from
the segmented models, but assuming the number of parameters

to be the same as that in the endogenous segmentation model.
dThere was no variation in income in the highest income
segment and hence income is not introduced as an alternative-
specific
variable in the high-income segment.
Table VI presents model results for one-way seg
mentations and for all possible combinations of two
way variable segmentations in the spirit of the lim
ited-dimensional exogenous market segmentation
method. The specification adopted within each seg
ment is based on the first specification of the refined
utility function specification in Table V.
The one-way segmentation results show that, on
the basis of the adjusted likelihood ratio index p2,
distance is the most important segmentation vari
able (all segmentations reject the unsegmented
model in nested likelihood ratio tests at the 0.05
significance level or better). We also find that all the
exogenously segmented models have a p2 which is
significantly less than that of the endogenous seg

mentation model based on a non-nested test. Since
these results may be deceiving (the analyst can spec
ify improved exogenous segmentation models by
constraining coefficients across segments), we com
puted the p2 for the one-way segmented models un
der the most favorable (albeit unrealistic) scenario
where the likelihood value corresponds to the exog
enously segmented model, but the number of param
eters is assumed to be equal to that of the unseg
mented model. The resulting pfav values are shown
in the Table and indicate that, even in this most
favorable scenario, the p2 for the one-way segmen
tation models are significantly smaller than that of
the endogenous segmentation model.
The results of the two-way exogenous segmenta
tion models reject the corresponding one-way seg
mentation models in nested likelihood ratio tests.
This suggests that there is benefit to increasing the
order of interaction effects of the segmentation vari
ables on preference and response. But, even for
these two-way segmentations, the p2 values are sig
nificantly smaller than that of the endogenous seg
mentation model. As in the case of the one-way
segmentation model, we computed a "favorable" p2

value for the two-way models, this time assuming
that the number of parameters in the two-way seg
mentation models is the same as in the endogenous
segmentation model (i.e., 36 parameters). These val
ues are presented in the final column and indicate
that, even under this favorable scenario, the only
model that is close to the endogenous segmentation
model is the one that segments by income and trip
distance. However, a non-nested test convincingly
rejects even this best "favorable-scenario"-specifica
tion relative to the endogenous segmentation model.
In addition, as in the refined utility specifications,
we obtained non-intuitive results in the income
distance two-way segmentation model; the parame
ter on cost was positive in all the segments (and in
two segments, significantly so) except in the high
income-high trip distance segment. Further, in all
but two segments, the parameter on in-vehicle time
was statistically insignificant (in part, this is a con
2019 05:27:50 UTC

44 / C. R. BHAT
sequence of imprecision in coefficient estimates due
to the small number of observations in the seg
ments).
In summary, the empirical results indicate clearly
that the endogenous segmentation model provides a
better fit to the data and offers more intuitively
appealing results compared to other extant ap
proaches to incorporating systematic heterogeneity.
This can be attributed to the inclusion of the full
order of interaction effects of the segmentation vari
ables on preference and level-of-service sensitivity.
3.4. Assessment of the Performance of the
Hybrid EM-DFP Algorithm
The EM algorithm produces iteration sequences
with good global convergence characteristics (i.e.,
the EM iteration sequence converges to a local max
imizer of the likelihood function) in mixture models,
even when the starting values are distant from the
maximizing values (the theory and proofs of this

result, and a review of an exhaustive body of empir
ical evidence to support this result, can be found in
Redner and Walker, 1984). On the other hand, the
complex nature of the dependence of the log-likeli
hood function on the mixture-model parameters can
produce Newton and quasi-Newton iteration se
quences that get stalled at some point and never
make further progress toward convergence, espe
cially when the starting parameter values are dis
tant from the maximizing values (this is because the
Newton and quasi-Newton iterative schemes rely on
approximations that are valid only in the neighbor
hood of the maximizing parameter values; see Red
ner and Walker, 1984 and CRAMER, 1986, page 72).
Our own empirical experience during the maximi
zation of the likelihood function of the endogenous
segmentation model reinforces the theoretical anal
ysis and empirical evidence discussed earlier. A nat
ural starting point for the iterations in the endoge
nous segmentation model is to use the unsegmented
mode choice model parameters for the mode choice
model in each segment and set all segment member
ship model parameters to zero.6 We considered three
algorithms for estimation: the EM method, the DFP

(Davidson, Fletcher, Powell) algorithm (which is a
widely used and effective quasi-Newton algorithm,
see GREENE, 1990; page 370), and a hybrid EM-DFP
algorithm that starts the iteration with the EM al
gorithm but switches to the DFP algorithm later.
6In practice, at least one mode choice parameter should be
shifted slightly (up or down) in one of the segments to start the
iterations; otherwise, there is no incentive for the segment mem
bership parameters to be updated and "convergence" will be
achieved immediately.
In terms of stability, we found that the EM algo
rithm outperformed the DFP algorithm. In particu
lar, the EM algorithm was predictable; it produced
very similar and monotonically increasing (in the
likelihood function) iteration sequences in repeated
trials. The DFP algorithm, on the other hand, was
quite unpredictable in its iteration sequence in re
peated trials. Some iteration sequences appeared to
proceed well, while others wandered aimlessly with
out any improvement to the log-likelihood function
and without ever converging. The cause of such be
havior appeared to be due to the line search; the

DFP algorithm had some difficulty in early itera
tions in finding an appropriate step length and,
therefore, entered a random direction-search mode.
The random search sometimes resulted in a subse
quent stable iteration sequence and sometimes did
not proceed anywhere. We never obtained a conver
gent iteration sequence with the DFP algorithm for
the endogenous segmentation model with four seg
ments. We obtained a convergent sequence in three
out of five trials for the two-segment solution and in
two of five trials for the three-segment solution.7 We
did not encounter any stability problems in the hy
brid EM-DFP method since the DFP iterations were
started with parameter values close to their maxi
mizing values.
We now compare the computational speeds of the
EM algorithm, the DFP algorithm, and the hybrid
EM-DFP algorithm. We focus on the three-segment
solution. For the DFP algorithm, we present the
results for the fastest successful run. All estimations
were done using the GAUSS matrix programming
language on a 486-DX2, 66 MHz. computer. The
gradient of the log-likelihood function was coded for

the DFP algorithm and the gradient and hessian
were coded for the MNL estimation in the EM algo
rithm. The results are shown in Figure 1. Each point
represents an iteration of the indicated algorithm.
The vertical axis represents the log-likelihood func
tion divided by the number of observations in the
sample.
The EM algorithm has a stable S-shaped pattern.
It takes a few small steps at the beginning before
taking large steps toward increasing the log-likeli
7Both Kamakura and Russell (1989) and Gupta and Chinta
gunta (1994) do not cite any problems in their estimations. It is
not clear whether they did not encounter any problems or
whether they just did not report them. It should also be empha
sized that whereas the study here and many earlier empirical
studies with mixture models have documented difficulties with
the usual maximization algorithms, this need not be a universal
finding in all data contexts. Theoretical and empirical examina
tion of the relationship between data contexts and algorithm
performance would be a useful direction for future research.
2019 05:27:50 UTC

MODEL / 45
Execution Time in Minutes
Convergence Value = -0.584609
Best DFP Algorithm Trial
Convergence Time = 149.1 minutes
20 30 40 50 60 70 80 90 100 110
120 130 140 150 160
Convergence Value = -0.584609
EM-DFP Hybrid Algorithm
Convergence Time = 63.3 minutes
0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160

Fig. 1. Performance of alternative algorithms for the three
segment solution.
hood function (this is in contrast to some earlier
applications of the algorithm where the algorithm
makes the largest leap in the first iteration and the
rate of the leap decreases monotonically thereafter;
see Ruud, 1991 and Watson and Engle, 1983). The
secondary MNL iterations in each main EM itera
tion ranged from two to three with the largest EM
iteration leaps being associated with three MNL
iterations. As documented in other applications of
the EM algorithm (see Watson and Engle, 1983 and
Redner and Walker, 1984), most of the change in the
log-likelihood function is realized in the first few
iterations and the algorithm is very slow in conver
gence rate near the likelihood maximum.
Each iteration of the DFP algorithm, in general, is
less time-consuming than the EM iteration; how
ever, because of entering into a random direction
search mode, some early iterations take a substan

tial amount of time. The iteration sequence for the
DFP is not smooth; there are sequences with little
improvement in the log-likelihood function. But the
algorithm converges much more rapidly near the
likelihood maximum relative to the EM method.
In the hybrid EM-DFP algorithm, we used a cri
terion of less than 0.01 difference in successive val
ues of the average log-likelihood function of equation
(15) to switch from the EM algorithm to the DFP
algorithm (this criterion was applied only after a
minimum of five EM iterations). The results clearly
show the effectiveness of the hybrid algorithm which
combines the large and stable steps of the EM algo
rithm at the beginning of the maximization with the
stable and relatively rapid convergence of the DFP
algorithm near the likelihood maximum.
4. CHOICE ELASTICITIES AND POLICY
IMPLICATIONS
THE RESULTS IN TABLE VII present the choice elas
ticities and the mode shares obtained from the en
dogenous segmentation model at the segment-level
and the market level. We present the relevant elas

ticities for changes in rail level-of-service character
istics only. The entries in the segment-level columns
represent the contribution of the corresponding seg
ment to the overall percentage change in market
share of the modes due to a one percent increase in
the train level-of-service variables. We present these
values rather than the actual segment-level elastic
ities since the current entries provide a better basis
to examine the differential effects (across segments)
of policy actions to improve rail level-of-service (the
segment-level elasticities provide information only
on within-segment shifts; they are not useful for
cross-segment comparisons since they do not ac
count for the different segment sizes and because
they are a function of segment-level mode choice
probabilities).
The choice elasticities in the table reflect our ear
lier qualitative discussion of the level-of-service sen
sitivity differences among the segments. It is also
useful to note that the market share sensitivity of
car and air are about the same due to an increase in
rail frequency or rail cost for segments two and
8Since the objective of the original study for which the data

were collected was to examine the effect of alternative improve
ments in rail level-of-service characteristics, we focus on the
elasticity matrix corresponding to changes in rail level -of-
service
here.
2019 05:27:50 UTC
46 / C. R. BHAT
TABLE VII
Choice Elasticities (in Response to Change in Rail Service) and
Mode Choice Shares from the Endogenous Segmentation Model
Segment 1 Segment 2 Segment 3 Market3 Rail Level of Service
?
Attribute Rail Air Car Rail Air Car Rail Air Car Rail Air Car
Frequency 0.15 -0.01 -0.04 0.20 -0.03 -0.04 0.06 -0.01 -0.01
0.41 -0.05 -0.10
Cost -0.57 0.06 0.14 -0.77 0.14 0.15 -0.14 0.03 0.03 -1.48 0.22

0.31
IVTTb -0.32 0.01 0.09 -0.03 0.00 0.00 -0.51 0.07 0.12 -0.86
0.09 0.21
0VTTb -0.35 0.03 0.09 -0.18 0.03 0.04 -0.67 0.08 0.16 -1.20
0.14 0.29
Mode choice 0.07 0.20 0.73 0.80 0.07 0.13 0.06 0.76 0.18 0.15
0.40 0.44
sharesc
aThe overall market elasticity is the sum of the "elasticity"
entries in the individual segments. The "elasticity" entries at the
segment
level represent the contribution of the corresponding segment
to the overall percentage change in market share due to a 1%
increase in
rail level-of-service attribute. The actual increase (decrease) in
rail share at the market level is equal to the decrease (increase)
in the air
and car shares at the market level. These conditions do not hold
exactly in the table due to rounding of the elasticity entries.
bIVTT represents In-Vehicle Travel Time; OVTT represents
Out-of-Vehicle Travel Time.
cThe overall market share of each mode in the joint

segmentation/mode choice model is equal to the weighted
average of the
corresponding segment-level mode shares, where the weights
are the segment sizes.
three, but the sensitivity of the car mode is greater
than air for segment one.
The mode choice shares in individual segments in
the segmented model show that the car share is
highest for segment one, the rail share is highest for
segment two, and the air share is highest for seg
ment three. The market share predictions from the
segmented model are not exactly equal to the actual
market shares (as represented by the weighted es
timation sample) if we consider all significant digits;
they are however the same as the actual market
shares up to two decimal places in this empirical
analysis (if we use the a posteriori segment mem
bership probabilities instead of the a priori segment
membership probabilities in computing the mode
shares, then it is relatively straightforward to show

using equation (12) that the market share predic
tions from the segmented model will equal the ac
tual market shares represented by the weighted es
timation sample).
The choice elasticities of the segmented model
provide valuable insights for the targeting of infor
mational campaigns regarding rail service improve
ments and for positioning of rail improvements in
the travel market. In particular, the results indicate
that it is most efficient to target individuals who
have low-income earnings, who are female, and who
travel in a group (segment two) for informational
campaigns on rail frequency improvements or rail
cost decreases. Rail frequency improvements and
rail cost reductions are best positioned in the medi
um-distance intercity travel corridors and in the
weekend business travel market. The above strate
gies lead to the maximum increase in rail share in
response to frequency improvements and cost reduc
tions, but the increase in rail share is due to almost
equal percentage decreases in car and air shares. If
the primary objective is to decrease auto-congestion
on highways, while at the same time ensuring a

reasonable increase in rail share, then it may be
more appropriate to target the male traveling pop
ulation with medium incomes, and to position fre
quency improvements and cost reductions in the
short-distance travel corridors and in the weekday
travel market (segment one). Informational cam
paigns regarding rail travel time improvements are
best directed toward the high-income travelers; such
improvements are best positioned in the long-dis
tance travel corridors and in the weekday travel
market.
5. SUMMARY AND CONCLUSIONS
IN THIS PAPER, we propose the use of an endogenous
market segmentation model in a travel demand con
text. In this model, membership in a segment is
probabilistic and is a function of the sociodemo
graphic and trip characteristics of each individual.
The formulation enables the use of a number of
segmentation variables without requiring the multi
way partition of data, and obviates the need to es
tablish ad hoc threshold values of continuous seg
mentation variables. An examination of the
individual-level cross-elasticities indicates that the

endogenous segmentation model does not have the
independence from irrelevant alternatives property
of the standard multinomial logit formulation. We
have also developed a stable, efficient estimation
procedure for estimati ng the segmentation model.
This procedure involves the use of an EM algorithm
to obtain starting values for the subsequent direct
maximization of the likelihood function of the seg
mentation model. Whereas we have developed the
hybrid EM-DFP technique in the context of a multi
2019 05:27:50 UTC
MODEL / 47
nomial logit formulation for mode choice, the same
technique can be used for more complex mode choice
formulations such as the nested logit or multinomial
probit. A particularly valuable contribution of our
estimation approach is that the EM algorithm

(which requires only the standard multinomial logit
software for estimation) can be used in isolation to
obtain the parameters of the segmentation model
and their asymptotic standard errors if access to
general maximum-likelihood software is not avail
able (this approach will, however, be quite time
consuming due to the slow nature of convergence of
the EM algorithm in the vicinity of the likelihood
maximum).
We applied the endogenous segmentation model
to the estimation of inter-city travel mode choice in
the Toronto-Montreal corridor. Using the Bayesian
Information Criterion, we found that the preferred
specification used a three-segment solution. The
probability of belonging to any segment was a func
tion of income, sex, travel group size, day of travel,
and trip distance. In a comparative empirical assess
ment of the endogenous segmentation model with
other commonly used methods to account for sys
tematic heterogeneity, we found the endogenous
segmentation model to be the most appropriate,
based on fit to the data and reasonability of the
results. Our assessment of the performance of the
hybrid EM-DFP algorithm vis-a-vis the DFP

method also clearly indicated the superiority of the
former over the latter.
The results of the preferred three-segment solu
tion of the endogenous segmentation model showed
that the intrinsic preferences for modes and level
of-service sensitivity were quite different among the
three segment groups. These differences, in combi
nation with the sociodemographic and trip-related
characteristics of individuals in each segment, pro
vide important information for effective targeting
and strategic positioning of rail service improve
ments. More generally, the endogenous segmenta
tion model represents a valuable methodology for
evaluating the effects of intercity traffic congestion
alleviation strategies in urban and intercity travel
contexts.
REFERENCES
G. Allenby, "Hypothesis Testing with Scanner Data: The
Advantage of Bayesian Methods," Journal of Market
ing Research 27, 379-389 (1990).
M. Ben-Akiva and s. R. lerman, Discrete Choice Analysis:

Theory and Application to Travel Demand, The MIT
Press, Cambridge, 1985.
A. b6rsch-supan, Econometric Analysis of Discrete
Choice, Lecture Notes in Economics and Mathematical
Systems, p. 34, Springer-Verlag, Berlin, Germany,
1987.
G. Chamberlain, "Analysis of Covariance with Qualita
tive Data," Review of Economic Studies 47, 225-238
(1980).
J. S. cramer, Econometric Applications of Maximum Like
lihood Methods, Cambridge University Press, Cam
bridge, London, 1986.
M. C. Dayton and G. B. Macready, "Concomitant-Vari
able Latent Class Models," Journal of the American
Statistical Society 83, 173-179 (1988).
A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum
Likelihood from Incomplete Data via the EM Algo
rithm," Journal of the Royal Statistical Society B39,
1-38 (1977).

Electric Power Research Institute, Methodology for
Predicting the Demand for New Electricity-Using
Goods, EA-593, Project 488-1, Final Report, 1977.
G. W. Fischer and D. Nagin, "Random Versus Fixed
Coefficient Quantal Choice Models," in Structural
Analysis of Discrete Data with Econometric Applica
tions, pp. 273-304, Manski, C. F. and D. McFadden
(eds.), MIT Press, Cambridge, Massachusetts, 1981.
C. V. forinash, A Comparison of Model Structures for
Intercity Travel Mode Choice, unpublished Master's
Thesis, Department of Civil Engineering, Northwest
ern University, Evanston, Illinois, 1992.
F. Gonul and K. Srinwasan, "Modeling Multiple Sources
of Heterogeneity in Multinomial Logit Models: Meth
odological and Managerial Issues," Marketing Science
12, 213-229 (1993).
W. H. Greene, Econometric Analysis, Macmillan Publish
ing Company, New York, 1990.
S. Gupta and P. K. Chintagunta, "On Using Demo

graphic Variables to Determine Segment Membership
in Logit Mixture Models," Journal of Marketing Re
search 31, 128-136 (1994).
J. A. Hausman and D. Wise, "A Conditional Probit Model
for Qualitative Choice: Discrete Decisions Recognizing
Interdependence and Heterogenous Preferences,"
Econometrica 46, 403-426 (1978).
D. A. Hensher, "Two Contentions Related to Conceptual
Context in Behavioral Travel Modeling," in New Hori
zons in Travel Behavior Research, pp. 509-514, P. R.
Stopher, A. H. Meyburg, and W. Brog (eds.), Lexington
Books, Lexington, Massachusetts, 1981.
J. L. Horowitz, "Reconsidering the Multinomial Probit
Model," Transportation Research 25B, 433-438 (1991).
J. L. horowitz, "Semiparametric Estimation of a Work
Trip Mode Choice Model," Journal of Econometrics 58,
49-70 (1993).
W. A. Kamakura and G. J. Russell, "A Probabilistic
Choice Model for Market Segmentation and Elasticity
Structure," Journal of Marketing Research 26, 379

390 (1989).
KPMG Peat Marwick and F. S. Koppelman, Proposal for
Analysis of the Market Demand for High Speed Rail in
the Quebec/Ontario Corridor, submitted to Ontario/
Quebec Rapid Train Task Force, 1990.
KPMG Peat Marwick in association with ICF Kaiser En
2019 05:27:50 UTC
48 / C. R. BHAT
gineers, Inc., Midwest System Sciences, Resource
Systems Group, Comsis Corporation and Transporta
tion Consulting Group, Florida High Speed and Inter
city Rail Market and Ridership Study: Final Report, sub
mitted to Florida Department of Transportation, 1993.
C. Manski and S. Lerman, "The Estimation of Choice
Probabilities from Choice-Based Samples," Economet

rica 45, 1977-1988 (1977).
D. McFadden, "Conditional Logit Analysis of Qualitative
Choice Behavior" in Frontiers in Econometrics, Za
remmbka, P. (ed.), Academic Press, New York, 1973.
G. J. McLachlan and K. E. Basford, Mixture Models:
Inference and Applications to Clustering, Marcel Dek
ker, Inc., New York, 1988.
R. A. Redner and H. F. Walker, "Mixture densities, Max
imum Likelihood, and the EM Algorithm," SIAM Re
view 2, 195-239 (1984).
P. A. Ruud, Extensions of Estimation Methods Using the
EM Algorithm, Journal of Econometrics 49, 305-341
(1991).
M. W. Watson and R. F. Engle, "Alternative Algorithms
for the Estimation of Dynamic Factor, MIMIC and
Varying Coefficient Regression Models," Journal of
Econometrics 23, 385-400 (1983).
(Submitted: April 1995; revisions received: September 1995,
Feb

ruary 1996; accepted: February 1996)
2019 05:27:50 UTC
Contentsp. 34p. 35p. 36p. 37p. 38p. 39p. 40p. 41p. 42p. 43p.
44p. 45p. 46p. 47p. 48Issue Table of ContentsTransportation
Science, Vol. 31, No. 1 (February 1997) pp. 1-100Front
Matterþÿ�þ�ÿ��I��N��
��M��E��M��O��R��I��A��M��:
�� W��i��l��l��i��a��m��
��S��.��
��V��i��c��k��r��e��y��,��
��1��9��1��4��1��9��9��6��
� ��[��p��p��.�� 1��-��2��]In
This Issue [pp. 3-4]Controlled Optimization of Phases at an
Intersection [pp. 5-17]Analytical Models for Vehicle/Gap
Distribution on Automated Highway Systems [pp. 18-33]An
Endogenous Segmentation Mode Choice Model with an
Application to Intercity Travel [pp. 34-48]A Tabu Search
Heuristic for the Vehicle Routing Problem with Backhauls and
Time Windows [pp. 49-59]Heuristic Algorithms for the
Handicapped Persons Transportation Problem [pp. 60-71]Airline
Scheduling for the Temporary Closure of Airports [pp. 72-
82]"TSS Dissertation Abstracts": Abstracts for the 1996

Transportation Science Section Dissertation Prize Competition
[pp. 83-94]Bibliographic SectionReview: untitled [pp. 95-
96]Review: untitled [pp. 96-97]Back Matter
www.asb.unsw.edu.au
Last updated: 12/11/12 CRICOS Code: 00098G
Australian School of Business Research Paper No. 2012 ECON
49
lclogit: A Stata module for estimating latent class conditi onal
logit models via the
Expectation-Maximization algorithm

Daniele Pacifico
Hong il Yoo
This paper can be downloaded without charge from
The Social Science Research Network Electronic Paper
Collection:
http://ssrn.com/abstract=2174146

Australian School of Business
Working Paper
http://ssrn.com/abstract=2174146�
The Stata Journal (yyyy) vv, Number ii, pp. 1–15
lclogit: A Stata module for estimating latent
class conditional logit models via the
Expectation-Maximization algorithm
Daniele Pacifico

Italian Department of the Treasury
[email protected]
Hong il Yoo
University of New South Wales
[email protected]
Abstract. This paper describes lclogit, a Stata module for
estimating a discrete
mixture or latent class logit model via the EM algorithm.
Keywords: st0001, lclogit, latent class model, EM algorithm,
mixed logit
1 Introduction
Mixed logit or random parameter logit is used in many
empirical applications to cap-
ture more realistic substitution patterns than traditional
conditional logit. The random
parameters are usually assumed to follow a normal distribution
and the resulting model
is estimated through simulated maximum likelihood, as in Hole
(2007)’s Stata module
mixlogit. Several recent studies, however, note potential gains

from specifying a dis-
crete instead of normal mixing distribution, including the
ability to approximate the
true parameter distribution more flexibly at lower computational
costs.1
Pacifico (2012) implements the Expectation-Maximization (EM)
algorithm for esti-
mating a discrete mixture logit model, also known as latent
class logit model, in Stata.
As Bhat (1997) and Train (2008) emphasize, the EM algorithm
is an attractive alterna-
tive to the usual (quasi-)Newton methods in the present context,
because it guarantees
numerical stability and convergence to a local maximum even
when the number of la-
tent classes is large. In contrast, the usual optimization
procedures often fail to achieve
convergence since inversion of the (approximate) Hessian
becomes numerically difficult.
With this contribution, we aim at generalizing Pacifico (2012)’s
code with a Stata
module that introduces a series of important functionalities and
provides an improved

performance in terms of run time and stability.
2 EM algorithm for latent class logit
This section recapitulates the EM algorithm for estimating a
latent class logit model
(LCL).2 Suppose that each of N agents faces, for notational
simplicity, J alternatives
1. For example, see Hess et al. (2011), Shen (2009) and Greene
and Hensher (2003).
2. Further details are available in Bhat (1997) and Train (2008).
c© yyyy StataCorp LP st0001
2 Latent class logit model
in each of T choice scenarios.3 Let ynjt denote a binary variable
which equals 1 if agent
n chooses alternative j in scenario t and 0 otherwise. Each
alternative is described by
alternative-specific characteristics, xnjt, and each agent by
agent-specific characteristics

including a constant, zn.
LCL assumes that there are C distinct sets (or classes) of taste
parameters, β =
(β1,β2, ...,βC ). If agent n is in class c, the probability of
observing her sequence of
choices is a product of conditional logit formulas:
Pn(βc) =
T∏
t=1
J∏
j=1
(
exp(βcxnjt)∑J
k=1 exp(βcxnkt))
)ynjt
(1)
Since the class membership status is unknown, the researcher

needs to specify the
unconditional likelihood of agent n’s choices, which equals the
weighted average of
equation 1 over classes. The weight for class c, πcn(θ), is the
population share of that
class and usually modeled as fractional multinomial logit:
πcn(θ) =
exp(θczn)
1 +
∑C−1
l=1 exp(θlzn)
(2)
where θ = (θ1,θ2, ...,θC−1) are class membership model
parameters; note that θC has
been normalized to zero for identification.
The sample log likelihood is then obtained by summing each
agent’s log uncondi-
tional likelihood:
ln L(β,θ) =

N∑
n=1
ln
C∑
c=1
πcn(θ)Pn(βc) (3)
Bhat (1997) and Train (2008) note numerical difficulties
associated with maximizing
equation 3 directly. They show that β and θ can be more
conveniently estimated via
a well-known EM algorithm for likelihood maximization in the
presence of incomplete
data, treating each agent’s class membership status as the
missing information. Let
superscript s denote the estimates obtained at the sth iteration of
this algorithm. Then,
at iteration s + 1, the estimates are updated as:
βs+1 = argmaxβ
∑N

n=1
∑C
c=1 ηcn(β
s,θs) ln Pn(βc)
θs+1 = argmaxθ
∑N
n=1
∑C
c=1 ηcn(β
s,θs) ln πcn(θ)
(4)
where ηcn(β
s,θs) is the posterior probability that agent n is in class c
evaluated at the
3. lclogit is also applicable when the number of scenarios varies
across agents, and that of alternatives

varies both across agents and over scenarios.
D. Pacifico and H. Yoo 3
sth estimates:
ηcn(β
s,θs) =
πcn(θ
s)Pn(β
s
c)∑C
l=1 πln(θ
s)Pn(β
s
l )
(5)

The updating procedure can be implemented easily in Stata,
exploiting clogit and
fmlogit routines as follows.4 βs+1 is computed by estimating a
conditional logit model
(clogit) C times, each time using ηcn(β
s,θs) for a particular c to weight observations on
each n. θs+1 is obtained by estimating a fractional multinomial
logit model (fmlogit)
which takes η1n(β
s,θs), η2n(β
s,θs), · · · , ηCn(βs,θs) as dependent variables. When zn
only includes the constant term so that each class share is the
same for all agents, ie
πcn(θ) = πc(θ), each class share can be directly updated using
the following analytical
solution:
πc(θ
s+1) =
∑N
n=1 ηcn(β

s,θs)∑C
l=1
∑N
n=1 ηln(β
s,θs)
(6)
without estimating the fractional multinomial logit model.
With a suitable selection of starting values, the updating
procedure can be repeated
until changes in the estimates and/or improvement in the log
likelihood between itera-
tions are small enough.
An often highlighted feature of LCL is its ability to
accommodate unobserved inter-
personal taste variation without restricting the shape of the
underlying taste distribu-
tion. Hess et al. (2011) have recently emphasized that LCL also
provides convenient
means to account for observed interpersonal heterogeneity in

correlations among tastes
for different attributes. For example, let βq and βh denote taste
coefficients on the q
th
and hth attributes respectively. Each coefficient may take one of
C distinct values, and
is a random parameter from the researcher’s perspective. Their
covariance is given by:
covn(βq,βh) =
C∑
c=1
πcn(θ)βc,qβc,h −
(
C∑
c=1
πcn(θ)βc,q

)(
C∑
c=1
πcn(θ)βc,h
)
(7)
where βc,q is the value of βq when agent n is in class c, and
βc,h is defined similarly.
As long as zn in equation 2 includes a non-constant variable,
this covariance will vary
across agents with different observed characteristics through the
variation in πcn(θ).
3 The lclogit command
lclogit is a Stata module which impleme nts the EM iterative
scheme outlined in the
previous section. This module generalizes Pacifico (2012)’s
step-by-step procedure and
4. fmlogit is a user-written program. See footnote 5 for a further

description.
introduces an improved internal loop along with other important
functionalities. The
overall effect is to make the estimation process more
convenient, significantly faster and
more stable numerically.
To give a few examples, the internal code of lclogit executes
fewer algebraic oper-
ations per iteration to update the estimates; uses the standard
generate command to
perform tasks which were previously executed with slightly
slower egen functions; and
works with log probabilities instead of probabilities when
possible. All these changes
substantially reduce the estimation run time, especially in the
presence of a large num-
ber of parameters and/or observations. Taking the 8-class model
estimated by Pacifico
(2012) for example, lclogit produces the same results as the

step-by-step procedure
while using less than a half of the latter’s run time.
The data setup for lclogit is identical to that required by clogit.
The generic syntax for lclogit is:
lclogit depvar
[
varlist
] [
if
][
in
]
, group(varname) id(varname) nclasses(#)[
options
]
The options for lclogit are:
• group(varname) is required and specifies a numeric identifier

variable for the
choice scenarios.
• id(varname) is required and specifies a numeric identifier
variable for the choice
makers or agents. When only one choice scenario is available
for each agent, users
may specify the same variable for both group() and id().
• nclasses(#) is required and specifies the number of latent
classes. A minimum
of 2 latent classes is required.
• membership(varlist) specifies independent variables that enter
the fractional multi-
nomial logit model of class membership, i.e. the variables
included in the vector
zn of equation 2. These variables must be constant within the
same agent as iden-
tified by id().5 When this option is not specified, the class
shares are updated
algebraically following equation 6.
• convergence(#) specifies the tolerance for the log likelihood.
When the propor-

tional increase in the log likelihood over the last five iterations
is less than the
specified criterion, lclogit declares convergence. The default is
0.00001.
5. Pacifico (2012) specified a ml program with the method lf to
estimate the class membership model.
lclogit uses another user-written program from Maarten L. Buis
- fmlogit - which performs the
same estimation with the significantly faster and more accurate
d2 method. lclogit is downloaded
with a modified version of the prediction command of fmlogit -
fmlogit pr - since we had to
modify this command to obtain double-precision class shares.
• iterate(#) specifies the maximum number of iterations. If
convergence is not
achieved after the selected number of iterations, lclogit stops
the recursion and
notes this fact before displaying the estimation results. The
default is 150.

• seed(#) sets the seed for pseudo uniform random numbers. The
default is
c(seed).
The starting values for taste parameters are obtained by splitting
the sample into
nclasses() different subsamples and estimating a clogit model
for each of them.
During this process, a pseudo uniform random number is
generated for each agent
to assign the agent into a particular subsample.6 As for the
starting values for
the class shares, the module assumes equal shares, i.e.
1/nclasses().
• constraints(Class#1numlist: Class#2 numlist: ..) specifies the
constraints
that are imposed on the taste parameters of the designated
classes, i.e. βc in equa-
tion 1. For instance, suppose that x1 and x2 are alternative-
specific characteristics
included in indepvars for lclogit and the user wishes to restrict
the coefficient
on x1 to zero for Class1 and Class4, and the coefficient on x2 to

2 for Class4.
Then, the relevant series of commands would look like:
. constraint 1 x1 = 0
. constraint 2 x2 = 2
. lclogit depvar indepvars, gr() id() ncl() constraints(Class1 1:
Class4 1 2)
• nolog suppresses the display of the iteration log.
4 Post-estimation command: lclogitpr
lclogitpr predicts the probabilities of choosing each alternative
in a choice situation
(choice probabilities hereafter), the class shares or prior
probabilities of class member-
ship, and the posterior probabilities of class membership. The
predicted probabilities
are stored in a variable named stubname# where # refers to the
relevant class number;
the only exception is the unconditional choice probability, as it
is stored in a variable
named stubname. The syntax for lclogitpr is:

lclogitpr stubname
[
if
][
in
]
,
[
options
]
The options for lclogitpr are:
• class(numlist) specifies the classes for which the probabilities
are going to be
predicted. The default setting assumes all classes.
6. More specifically, the unit interval is divided into nclasses()
equal parts and if the agent’s pseudo
random draw is in the cth part, the agent is allocated to the
subsample whose clogit results serve
as the initial estimates of Class c’s taste parameters. Note that
lclogit is identical to asmprobit in

that the current seed as at the beginning of the command’s
execution is restored once all necessary
pseudo random draws have been made.
• pr0 predicts the unconditional choice probability, which
equals the average of
class-specific choice probabilities weighted by the
corresponding class shares. That
is,
∑C
c=1 πcn(θ)[exp(βcxnjt)/(
∑J
k=1 exp(βcxnkt))] in the context of Section 2.
• pr predicts the unconditional choice probability and the choice
probabilities condi-
tional on being in particular classes; exp(βcxnjt)/(
∑J

k=1 exp(βcxnkt)) in equation
1 corresponds to the choice probability conditional on being in
class c. This is the
default option when no other option is specified.
• up predicts the class shares or prior probabilities that the
agent is in particular
classes. They correspond to the class shares predicted by using
the class member-
ship model parameter estimates; see equation 2 in Section 2.
• cp predicts the posterior probabilities that the agent is in
particular classes taking
into account her sequence of choices. They are computed by
evaluating equation
5 at the final estimates for each c = 1, 2, · · · ,C.
5 Post-estimation command: lclogitcov
lclogitcov predicts the implied variances and covariances of
taste parameters by eval-
uating equation 7 at the active lclogit estimates. They could be
a useful tool for
studying the underlying taste patterns; see Hess et al. (2011) for

a related application.
The generic syntax for lclogitcov is:
lclogitcov
[
varlist
] [
if
][
in
]
,
[
options
]
The default is to store the predicted variances in a set of hard-
coded variables named
var 1, var 2, ... where var q is the predicted variance of the
coefficient on the qth variable
listed in varlist, and the predicted covariances in cov 12, cov

13, ..., cov 23, ... where
cov qh is the predicted covariance between the coefficients on
the qth variable and the
hth variable in varlist.
The averages of these variance and covariances over agents - as
identified by the
required option id() of lclogit - in the prediction sample are
reported as a covariance
matrix at the end of lclogitcov’s execution.
The options for lclogitcov are:
• nokeep drops the predicted variances and covariances from the
data set at the end
of the command’s execution. The average covariance matrix is
still displayed.
• varname(stubname) requests the predicted variances to be
stored as stubname1,
stubname2,...
• covname(stubname) requests the predicted covariances to be
stored as stubname12,
stubname13,...

• matrix(name) stores the reported average covariance matrix in
a Stata matrix
called name.
6 Post-estimation command: lclogitml
lclogitml is a wrapper for gllamm (Rabe-Hesketh et al., 2002)
which uses the d0
method to fit generalised linear latent class and mixed models
including LCL via the
Newton-Rhapson (NR) algorithm for likelihood maximization.7
This post-estimation
command passes active lclogit specification and estimates to
gllamm, and its primary
usage mainly depends on how iterate() option is specified; see
below for details.
The default setting relabels and transforms the ereturn results of
gllamm in ac-
cordance with those of lclogit, before reporting and posting

them. Users can exploit
lclogitpr and lclogitcov, as well as Stata’s usual post-estimation
commands re-
quiring the asymptotic covariance matrix such as nlcom. When
switch is specified,
the original ereturn results of gllamm are reported and posted;
users gain access to
gllamm’s post-estimation commands, but lose access to lclogitpr
and lclogitcov.
lclogitml can also be used as its own post-estimation command,
for example to
pass the currently active lclogitml results to gllamm for further
NR iterations.
The generic syntax for lclogitml is:
lclogitml
[
if
][
in
]
,

[
options
]
The options for lclogitml are:
• iterate(#) specifies the maximum number of NR iterations for
gllamm’s likeli-
hood maximization process. The default is 0 in which case the
likelihood function
and its derivatives are evaluated at the current lclogit estimates;
this allows
obtaining standard errors associated with the current estimates
without boot-
strapping.
With a non-zero argument, this option can implement a hybrid
estimation strategy
similar to Bhat (1997)’s. He executes a relatively small number
of EM iterations to
obtain intermediate estimates, and use them as starting values
for direct likelihood
maximization via a quasi-Newton algorithm until convergence,
because the EM
algorithm tends to slow down near a local maximum.

Specifying a non-zero argument for this option can also be a
useful tool for check-
ing whether lclogit has declared convergence prematurely, for
instance because
convergence() has not been set stringently enough for an
application at hand.
• level(#) sets confidence level; the default is 95.
• nopost restores the currently active ereturn results at the end
of the command’s
execution.
7. gllamm can be downloaded by entering ssc install gllamm
into the command window.
• switch displays and posts the original gllamm estimation
results, without relabel-
ing and transforming them in accordance with the lclogit output.

• compatible gllamm options refer to gllamm’s estimation
options which are com-
patible with the LCL model specification. See gllamm’s own
help menu for more
information.
7 Application
We illustrate the use of lclogit and its companion post-
estimation commands by ex-
panding upon the example Pacifico (2012) uses to demonstrate
his step-by-step proce-
dure for estimating LCL in Stata. This example analyzes the
stated preference data on
household’s electricity supplier choice accompanying Hole
(2007)’s mixlogit module,
which in turn are a subset of data used in Huber and Train
(2001). There are 100
customers who face up to 12 different choice occasions, each of
them consisting of a
single choice among 4 suppliers with the following
characteristics:
• The price of the contract (in cents per kWh) whenever the
supplier offers a contract

with a fixed rate (price)
• The length of contract that the supplier offered, expressed in
years (contract)
• Whether the supplier is a local company (local)
• Whether the supplier is a well-known company (wknown)
• Whether the supplier offers a time-of-day rate instead of a
fixed rate (tod)
• Whether the supplier offers a seasonal rate instead of a fixed
rate (seasonal)
The dummy variable y collects the stated choice in each choice
occasion whilst the
numeric variables pid and gid identify customers and choice
occasions respectively. To
illustrate the use of membership() option, we generate a pseudo
random regressor x1
which mimics a demographic variable. The data are organized
as follows:
. use http://fmwww.bc.edu/repec/bocode/t/traindata.dta, clear

. set seed 1234567890
. bysort pid: egen _x1=sum(round(rnormal(0.5),1))
. list in 1/12, sepby(gid)
y price contract local wknown tod seasonal gid pid _x1
1. 0 7 5 0 1 0 0 1 1 26
2. 0 9 1 1 0 0 0 1 1 26
3. 0 0 0 0 0 0 1 1 1 26
4. 1 0 5 0 1 1 0 1 1 26
5. 0 7 0 0 1 0 0 2 1 26
6. 0 9 5 0 1 0 0 2 1 26
7. 1 0 1 1 0 1 0 2 1 26
8. 0 0 5 0 0 0 1 2 1 26
9. 0 9 5 0 0 0 0 3 1 26

10. 0 7 1 0 1 0 0 3 1 26
11. 0 0 0 0 1 1 0 3 1 26
12. 1 0 0 1 0 0 1 3 1 26
In empirical applications, it is common to choose the optimal
number of latent classes
by examining information criteria such as BIC and CAIC. The
next lines show how to
estimate 9 LCL specifications repeatedly and obtain the related
information criteria: 8
. forvalues c = 2/10 {
2. lclogit y price contract local wknown tod seasonal,
group(gid) id(pid)
> nclasses(`c´) membership(_x1) seed(1234567890)
3. matrix b = e(b)
4. matrix ic =
nullmat(ic)è(nclasses)´,è(ll)´,`=colsof(b)´,è(caic)´,è(bic)´
5. }
(output omitted)
. matrix colnames ic = "Classes" "LLF" "Nparam" "CAIC"
"BIC"

. matlist ic, name(columns)
Classes LLF Nparam CAIC BIC
2 -1211.232 14 2500.935 2486.935
3 -1117.521 22 2358.356 2336.356
4 -1084.559 30 2337.273 2307.273
5 -1039.771 38 2292.538 2254.538
6 -1027.633 46 2313.103 2267.103
7 -999.9628 54 2302.605 2248.605
8 -987.7199 62 2322.96 2260.96
9 -985.1933 70 2362.748 2292.748
10 -966.3487 78 2369.901 2291.901
CAIC and BIC are minimized with 5 and 7 classes respectively.
In the remainder of
this section, our analysis focuses on the 5-class specification to
economize on space.
lclogit reports the estimation results as follows:
. lclogit y price contract local wknown tod seasonal, group(gid)
id(pid) nclass
> es(5) membership(_x1) seed(1234567890)

Iteration 0: log likelihood = -1313.967
(output omitted)
Latent class model with 5 latent classes
Choice model parameters and average classs shares
Variable Class1 Class2 Class3 Class4 Class5
8. lclogit saves three information criteria in its ereturn list:
AIC, BIC and CAIC. AIC equals
−2 ln L + 2m, where ln L is the maximized sample log
likelihood and m is the total number of
estimated model parameters. BIC and CAIC penalize models
with extra parameters more heavily,
by using penalty functions increasing in the number of choice
makers, N: BIC = −2 ln L + m ln N
and CAIC = −2 ln L + m(1 + ln N).

price -0.902 -0.325 -0.763 -1.526 -0.520
contract -0.470 0.011 -0.533 -0.405 -0.016
local 0.424 3.120 0.527 0.743 3.921
wknown 0.437 2.258 0.325 1.031 3.063
tod -8.422 -2.162 -5.379 -15.677 -6.957
seasonal -6.354 -2.475 -7.763 -14.783 -6.941
Class Share 0.113 0.282 0.162 0.243 0.200
Class membership model parameters : Class5 = Reference class
Variable Class1 Class2 Class3 Class4 Class5
_x1 0.045 0.040 0.047 0.048 0.000
_cons -1.562 -0.544 -1.260 -0.878 0.000
Note: Model estimated via EM algorithm
It is worth noting that the reported class shares are the average
shares over agents,
because the class shares vary across agents when the
membership() option is included

in the syntax. If needed, agent-specific class shares can be
easily computed by using the
post-estimation command lclogitpr with the up option.
In order to obtain a quantitative measure of how well the model
does in differen-
tiating several classes of preferences, we use lclogitpr to
compute the average (over
respondents) of the highest posterior probability of class
membership:9
. bys `e(id)´: gen first = _n==1
. lclogitpr cp, cp
. egen double cpmax = rowmax(cp1-cp5)
. sum cpmax if first, sep(0)
Variable Obs Mean Std. Dev. Min Max
cpmax 100 .9596674 .0860159 .5899004 1
As it can be seen, the mean highest posterior probability is
about 0.96, meaning that

the model does very well in distinguishing among different
underlying taste patterns for
the observed choice behavior.
We next examine the model’s ability to make in-sample
predictions of the actual
choice outcomes. For this purpose, we first classify a
respondent as a member of class c
if class c gives her highest posterior membership probability.
Then, for each subsample
of such respondents, we predict the unconditional probability of
actual choice and the
probability of actual choice conditional on being in class c:
. lclogitpr pr, pr
. gen byte class = .
(4780 missing values generated)
. forvalues c = 1/`e(nclasses)´ {
9. A dummy variable which equals 1 for the first observation on
each respondent is generated because
not every agent faces the same number of choice situations in
this specific experiment.

Working Papers N° 6 - July 2012 Ministry of Economy

Working Papers N° 6 - July 2012 Ministry of Economy

Recommended

Recommended

More Related Content

Similar to Working Papers N° 6 - July 2012 Ministry of Economy

Similar to Working Papers N° 6 - July 2012 Ministry of Economy (20)

More from MikeEly930

More from MikeEly930 (20)

Recently uploaded

Recently uploaded (20)

Working Papers N° 6 - July 2012 Ministry of Economy