ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE School Of
Architecture, Civil and Environmental Engineering
Semester Project in Civil Engineering
Enhancing the Serial Estimation of Discrete
Choice Models Sequences
by
Youssef Kitane
Under the direction of Prof. Michel Bierlaire
Under the supervision of Nicola Ortelli and Gael Lederrey
TRANSP-OR: Transport and Mobility Laboratory
Lausanne, June 2020
Contents
1 Introduction 3
2 Literature Review 4
3 Methodology 5
3.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Warm Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Case Study 7
4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2 Sequence of Discrete Choice Models . . . . . . . . . . . . . . . . . . . . . . . 8
5 Results 9
5.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5.2 Warm Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.3 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5.4 All Methods Combined . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6 Conclusion 14
1 Introduction
Discrete Choice Models (DCMs) have played an essential role in transportation modeling
for the last 25 years [1]. Discrete choice modeling is a field designed to capture in detail the
underlying behavioral mechanisms at the foundation of the decision-making process that
drives consumers [2]. Because they must be behaviorally realistic while properly fitting
the data, appropriate utility specifications for discrete choice models are hard to develop.
In particular, modelers usually start by including a number of variables that are seen as
"essential" in the specification; these originate from their context knowledge or intuition.
Then, small changes are tested sequentially so as to improve the goodness of fit of the model
while ensuring its behavioral realism. The result is that many model specifications are usually
tested before the modeler is satisfied with the result. Thus, this approach leads to extensive
computational time because each model is optimized separately. A faster optimization time
would allow researchers to test many more specifications in the same amount of time.
In this project, the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
is used to estimate the parameters of each DCM. Three techniques are implemented to
accelerate the process of estimating a sequence of DCMs:
• Standardization (ST) of the variables: The goal is to change the values of numeric
columns in the dataset to a common scale, without distorting differences in the ranges
of values.
• Warm Start (WS): This technique uses the knowledge acquired by the precedent model
to initialize the values of the parameters for the estimation of the next model.
• Early Stopping (ES): This consists in stopping the estimation of a model earlier than
expected, based on how promising the improvement in log likelihood looks in the last
iterations of the optimization algorithm.
The next Section is dedicated to a review of existing methods that speed up an optimization process. Then, in Section 3, the three techniques are presented in detail for a sequence of DCMs. Section 4 presents the data considered in this project, as well as the sequences of models that we use to measure the effectiveness of the three techniques. Section 5 gathers the results obtained by the implemented methods. The last Section summarizes the findings of this project and highlights possible improvements and directions of research for the future.
2 Literature Review
In large-scale convex optimization, first-order methods are the methods of choice due to their cheap iteration cost [3]. While second-order methods, such as Newton's method, make use of curvature information, the cost of computing the Hessian can become prohibitive. Quasi-Newton methods are thus a good compromise between curvature information and low computation time: they use an approximation of the Hessian instead of its exact computation. The BFGS algorithm, named after its inventors Broyden, Fletcher, Goldfarb and Shanno [4], is one of the most well-known quasi-Newton methods. A new method for solving linear systems is proposed in [5]. The algorithm is specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves, which opens the way for many applications in the field of optimization. Under a careful choice of the parameters of the method, and depending on the problem structure and conditioning, acceleration can result in significant speedups both for the matrix inversion problem and for the stochastic BFGS algorithm. It is confirmed experimentally that these accelerated methods can lead to speedups when compared to the classical BFGS algorithm, but no convergence analysis is provided yet.
The increase in the size of choice modeling datasets in recent years has led to a growing interest in research to accelerate the estimation of DCMs. Researchers have used techniques inspired by Machine Learning (ML) to speed up the estimation of a single DCM [6]. This is achieved by proposing new efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. These algorithms are built on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. The paper shows that a second-order method combined with a small batch size is a good starting point for DCM optimization, and that BFGS works particularly well once such a starting point has been found.
The problem of initializing the parameters of a model is central in ML. One particularly common scenario is where an ML algorithm must be constantly updated with new data. This situation frequently occurs in finance, online advertising, recommendation systems, fraud detection, and many other domains where machine learning systems are used for prediction and decision making in the real world [7]. When new data arrive, the model needs to be updated so that it remains as accurate as possible. While the majority of existing methods start the configuration process of an algorithm from scratch by randomly initializing the parameters, it is possible to exploit previously learned information in order to "warm start" its configuration on the new data.
In most common optimization algorithms, and more specifically in ML, the modeler often decides to stop the optimization procedure before reaching the required tolerance on the solution [8]. Stopping an optimization process early is a trick used to control the generalization performance of the model during the training phase and avoid over-fitting. In discrete choice modeling, the main objective is not to reach the highest accuracy but to obtain parameters that are behaviorally realistic.
3 Methodology
This section briefly introduces the principles underlying the BFGS algorithm before presenting the techniques used to speed up the estimation of a sequence of DCMs.
As a reminder, the iterates $\{x_j\}$ of a line search optimization method following a descent direction $d_j$ with a step size $\alpha_j$ are defined as follows:
$$ x_{j+1} = x_j + \alpha_j d_j \quad (1) $$
where the direction of descent is obtained by preconditioning the gradient and is defined as:
$$ d_j = -D_j \nabla f(x_j) \quad (2) $$
assuming that the matrix $D_j$ at the iterate $x_j$ is positive semi-definite.
For quasi-Newton methods, $D_j$ is an approximation of the Hessian. A slightly different version of BFGS consists in approximating the inverse of the Hessian. The BFGS$^{-1}$ algorithm uses the following approximation [9]:
$$ D_{j+1}^{-1} = D_j^{-1} + \frac{(s_j^T y_j + y_j^T D_j^{-1} y_j)\, s_j s_j^T}{(s_j^T y_j)^2} - \frac{D_j^{-1} y_j s_j^T + s_j y_j^T D_j^{-1}}{s_j^T y_j} \quad (3) $$
where $s_j = x_{j+1} - x_j$ and $y_j = \nabla f(x_{j+1}) - \nabla f(x_j)$.
The step size is calculated with an inexact line search method based on the two Wolfe conditions. The first condition [10], also known as the Armijo rule, guarantees that the step gives a sufficient decrease in the objective function. The second condition [11], known as the curvature condition, prevents the step length from being too short.
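To make the above concrete, the following is a minimal NumPy sketch of one BFGS$^{-1}$ iteration as described by Equations (1)-(3), assuming the objective `f` is the negative log likelihood to be minimized; the use of `scipy.optimize.line_search` to satisfy the Wolfe conditions and all function names are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np
from scipy.optimize import line_search  # inexact line search satisfying the two Wolfe conditions

def bfgs_inverse_update(D_inv, s, y):
    """Update of the inverse-Hessian approximation, Equation (3)."""
    sy = s @ y
    term = ((sy + y @ D_inv @ y) * np.outer(s, s)) / sy**2
    term -= (np.outer(D_inv @ y, s) + np.outer(s, D_inv @ y)) / sy
    return D_inv + term

def bfgs_step(f, grad, x, D_inv):
    """One iterate x_{j+1} = x_j + alpha_j * d_j, Equations (1)-(2)."""
    g = grad(x)
    d = -D_inv @ g                         # descent direction: preconditioned gradient
    alpha = line_search(f, grad, x, d)[0]  # step size satisfying the Wolfe conditions
    if alpha is None:                      # fall back to a small step if the line search fails
        alpha = 1e-3
    x_new = x + alpha * d
    s, y = x_new - x, grad(x_new) - g
    return x_new, bfgs_inverse_update(D_inv, s, y)
```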
3.1 Standardization
The concept of standardization is relevant when continuous independent variables are mea-
sured at different scales. Indeed, standardization is a technique often applied as part of data
preparation for ML. The goal is to change the values of numeric columns in the dataset to
a common scale, without distorting differences in the ranges of values. More formally, let’s
suppose that a variable $x$ takes values from the set $S = \{x_1, x_2, \ldots, x_n\}$. The standardization of one value $x_i$ in $S$ is applied as follows:
$$ \hat{x}_i = \frac{x_i - \bar{x}}{\sigma} \quad (4) $$
where $\bar{x}$ is the mean of the values in $S$ and $\sigma$ is the corresponding standard deviation. This re-scales the variable so as to obtain a mean equal to 0 and a standard deviation of 1.
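As a simple illustration, Equation (4) can be applied column-wise with pandas; the column names in the commented usage line are hypothetical continuous attributes and only serve as an example.

```python
import pandas as pd

def standardize(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Re-scale each selected column to zero mean and unit standard deviation (Equation (4))."""
    out = df.copy()
    for col in columns:
        out[col] = (df[col] - df[col].mean()) / df[col].std()
    return out

# Example with hypothetical continuous attributes (travel times and costs):
# data = standardize(data, ["TRAIN_TT", "TRAIN_CO", "SM_TT", "SM_CO", "CAR_TT", "CAR_CO"])
```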
3.2 Warm Start
A method commonly used in the field of ML consists in initializing a set of parameters with non-arbitrary values. In our case, we initialize the parameters of a model with the values obtained by the BFGS$^{-1}$ algorithm on the previous model.
Formally, we define the set of parameters of model $m$ as $x_m \in \mathbb{R}^{N_m}$, where $N_m$ corresponds to the number of parameters in model $m$. The set of parameters of the following model is defined similarly, i.e. $x_{m+1} \in \mathbb{R}^{N_{m+1}}$, where $N_{m+1}$ corresponds to the number of parameters of that model. To generate the initial values of model $m+1$, i.e. $x^0_{m+1}$, we use the optimized values of the previous model, i.e. $x^*_m$. In the case where the Box-Cox parameter of a variable is incremented or decremented from model $m$ to $m+1$, we initialize the corresponding parameter at 0 instead of using the previously optimized value. We thus define the initialization of $x^0_{m+1}$ for each index $i \in \{1, \ldots, N_{m+1}\}$ such that:
$$ x^0_{m+1,i} = \begin{cases} 0 & \text{if } i \notin \{1, \ldots, N_m\} \text{ or the Box-Cox parameter is modified,} \\ x^*_{m,i} & \text{otherwise.} \end{cases} \quad (5) $$
The same procedure is used for the initialization of the Hessian between model $m$ and the following model $m+1$. Instead of initializing it with the identity matrix, we define the initialization of $H^0_{m+1}$ for each combination of indexes $i, k \in \{1, \ldots, N_{m+1}\}$ such that:
$$ H^0_{m+1,(i,k)} = \begin{cases} H^*_{m,(i,k)} & \text{if } i, k \in \{1, \ldots, N_m\}, \\ 1 & \text{if } i = k, \\ 0 & \text{otherwise.} \end{cases} \quad (6) $$
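A possible implementation of Equations (5) and (6) is sketched below, assuming that parameters can be matched by name between consecutive models; the dictionary-based matching and the `boxcox_modified` argument are assumptions made for illustration.

```python
import numpy as np

def warm_start(prev_names, x_opt, H_opt, new_names, boxcox_modified=()):
    """Build x0_{m+1} (Equation (5)) and H0_{m+1} (Equation (6)) from model m."""
    prev_index = {name: i for i, name in enumerate(prev_names)}
    n = len(new_names)

    # Equation (5): reuse optimized values, except for new parameters
    # and parameters whose Box-Cox transform was modified.
    x0 = np.zeros(n)
    for i, name in enumerate(new_names):
        if name in prev_index and name not in boxcox_modified:
            x0[i] = x_opt[prev_index[name]]

    # Equation (6): copy the optimized Hessian block, identity elsewhere.
    H0 = np.eye(n)
    for i, ni in enumerate(new_names):
        for k, nk in enumerate(new_names):
            if ni in prev_index and nk in prev_index:
                H0[i, k] = H_opt[prev_index[ni], prev_index[nk]]
    return x0, H0
```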
3.3 Early Stopping
The early stopping method consists in stopping the estimation process before convergence to the maximum is achieved. Because the objective is to select the best model among a sequence of DCMs, the log likelihood $LL(x_i)$ obtained at an iteration is compared to the highest log likelihood $LL_{\text{best}}$ of all previous models. When $LL(x_i)$ is higher than $LL_{\text{best}}$, the estimation process is not stopped, because this value can only be further increased by the BFGS$^{-1}$ algorithm until convergence; $LL_{\text{best}}$ is then updated with the estimated value $LL^*(x_i)$. When $LL(x_i)$ is lower than $LL_{\text{best}}$, the estimation process may be stopped based on a criterion that estimates the relative evolution of the function in order to detect a plateau, i.e. a region where the function no longer shows a significant improvement. Three evaluations of the function are considered in order to be confident that the log likelihood has converged.
Let's consider the last three evaluations of the log likelihood, $LL(x_i)$, $LL(x_{i-1})$ and $LL(x_{i-2})$, obtained during the estimation process of one model. It is possible to assess this stagnation by evaluating the two following relative changes and comparing them to a predefined threshold $\varepsilon$:
$$ \frac{LL(x_{i-1})}{LL(x_i)} - 1 < \varepsilon \quad (7) $$
$$ \frac{LL(x_{i-2})}{LL(x_{i-1})} - 1 < \varepsilon \quad (8) $$
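The stopping test can be sketched as below; the exact arithmetic form of the plateau criterion is reconstructed from Equations (7)-(8) and should be read as an assumption, while the comparison with $LL_{\text{best}}$ follows the description above. The default `eps=2e-5` corresponds to the threshold eventually retained in Section 5.3.

```python
def should_stop_early(ll_history, ll_best, eps=2e-5):
    """Return True if the current model can be abandoned: it cannot beat the best
    log likelihood found so far and its last three evaluations form a plateau."""
    if len(ll_history) < 3:
        return False
    ll_i, ll_im1, ll_im2 = ll_history[-1], ll_history[-2], ll_history[-3]
    if ll_i > ll_best:
        return False  # still promising: let BFGS^-1 run until convergence
    # Relative evolution over the last three evaluations (Equations (7)-(8)).
    return (ll_im1 / ll_i - 1.0 < eps) and (ll_im2 / ll_im1 - 1.0 < eps)
```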
Even though the goal of early stopping is to reduce the estimation time of DCMs, it is important to keep in mind that no important difference should arise between the solution obtained by applying early stopping to the BFGS$^{-1}$ algorithm and the one obtained with the standard BFGS$^{-1}$. For example, Figure 1 shows the value of the log likelihood during the estimation process of a random model. As we can see in this example, there is a stagnation in the middle of the estimation. We do not want to stop early at this point, since the estimation is far from being finished. We thus have to be careful with the threshold and conduct a sensitivity analysis on this parameter.
Figure 1: Difference between a possible stagnation of the log likelihood and the real convergence
4 Case Study
4.1 Dataset
The Swissmetro dataset [12] corresponds to survey data collected during March 1998. It was used to study the market penetration of the Swissmetro, a revolutionary maglev underground system. Three alternatives - train, car and Swissmetro - were generated for each of the 1192 respondents. A sample of 10'728 observations was obtained by generating 9 types of situations. The pre-selected attributes of the alternatives are categorical (travel card ownership, gender, type of luggage, etc.) and continuous (travel time, cost and headway).
4.2 Sequence of Discrete Choice Models
For the purpose of this project, two sequences of one hundred DCMs, respectively denoted by S1 and S2, are considered. Each sequence starts with a given choice model; a random perturbation is then applied at each step. These small modifications correspond to the typical elementary perturbations that modelers consider when developing DCMs (an illustrative sketch of how such a perturbation could be drawn at random is given after the list):
• Add a non-selected variable to the utility of an alternative
• Remove a variable from the utility of an alternative
• Increment the Box-Cox parameter of a given variable
• Decrement the Box-Cox parameter of a given variable
• Interact a variable with a socioeconomic variable
• Deactivate the interaction of the considered variable with a socioeconomic variable
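As referenced above, the following is a purely illustrative sketch of how one such elementary perturbation could be drawn at random; the representation of a specification as nested dictionaries and all names are assumptions, not the actual generator used in the project.

```python
import random

# A specification is represented here as
# {alternative: {variable: {"boxcox": int, "interaction": str or None}}}.
def perturb(spec, candidate_vars, socioeco_vars):
    """Apply one random elementary perturbation to a copy of the specification."""
    spec = {alt: {v: dict(p) for v, p in variables.items()} for alt, variables in spec.items()}
    alt = random.choice(list(spec))
    move = random.choice(["add", "remove", "boxcox_up", "boxcox_down", "interact", "deactivate"])
    in_utility = list(spec[alt])
    if move == "add":
        unused = [v for v in candidate_vars if v not in spec[alt]]
        if unused:
            spec[alt][random.choice(unused)] = {"boxcox": 0, "interaction": None}
    elif in_utility and move == "remove":
        del spec[alt][random.choice(in_utility)]
    elif in_utility and move in ("boxcox_up", "boxcox_down"):
        spec[alt][random.choice(in_utility)]["boxcox"] += 1 if move == "boxcox_up" else -1
    elif in_utility and move == "interact":
        spec[alt][random.choice(in_utility)]["interaction"] = random.choice(socioeco_vars)
    elif in_utility and move == "deactivate":
        spec[alt][random.choice(in_utility)]["interaction"] = None
    return spec
```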
The first sequence S1 begins with an alternative specific constant model and its complexity increases along the sequence, while the sequence S2 starts with a random model and its complexity remains approximately constant over the hundred models. The number of parameters for each sequence of DCMs is shown in Figure 2:
Figure 2: Number of parameters for the two sequences S1 and S2.
5 Results
In order to avoid misunderstandings, abbreviations are given to the different methods. The base method estimates the parameters without applying any of the previously mentioned methods and is denoted by Base. The standardization of the variables is denoted by ST. The warm start of the parameters is denoted by WSx, the warm start of the Hessian by WSH, and the combination of the warm start of the Hessian and the variables by WSxH. The early stopping method is denoted by ES.
5.1 Standardization
A benchmark of ten estimations for the methods Base and ST is conducted for the sequences S1 and S2. Tables 1 and 2 present a summary of the statistics for these methods. The lowest, mean and highest times among the ten estimations are reported. The average speedup corresponds to the ratio between the mean time of the Base method and the mean time of the ST method.
Table 1: Summary of statistics for 10 estimations by method for the sequence S1
Statistics Base ST
Minimum [s] 224.9 198.1
Mean [s] 229.5 201.7
Maximum [s] 231.3 203.2
Average Speedup / 1.15
Table 2: Summary of statistics for 10 estimations by method for the sequence S2
Statistics Base ST
Minimum [s] 534.5 478.3
Mean [s] 536.2 480.9
Maximum [s] 539.4 485.9
Average Speedup / 1.12
It appears that standardizing the variables is useful. For the first sequence S1, the mean estimation time is reduced from 229.5 s to 201.7 s. Concerning the sequence S2, a reduction of about 10 % of the estimation time is obtained. Since the initial Hessian of each model corresponds to the identity matrix, the initial direction of descent of BFGS$^{-1}$ corresponds to that of gradient descent. Because gradient descent does not take the curvature into account, an error surface with high curvature means that many steps may be taken in directions that are far from optimal. When the variables are scaled, the curvature is reduced, which makes methods that ignore curvature work much better and reach convergence faster. The standardization of the variables thus yields interesting results and should be applied beforehand for every sequence of DCMs whose variables present differences in their ranges of values.
5.2 Warm Start
A benchmark of ten estimations for the methods Base, WSx, WSH and WSxH is conducted for the sequences S1 and S2. Tables 3 and 4 present a summary of the statistics for these methods.
Table 3: Summary of statistics for 10 estimations by method for the sequence S1
Statistics Base WSx WSH WSxH
Minimum [s] 224.9 187.4 104.9 60.8
Mean [s] 229.5 188.7 105.4 61.0
Maximum [s] 231.3 189.5 106.1 61.5
Average Speedup / 1.21 2.20 3.84
Table 4: Summary of statistics for 10 estimations by method for the sequence S2
Statistics Base WSx WSH WSxH
Minimum [s] 534.6 459.3 210.4 119.2
Mean [s] 536.2 460.3 211.3 119.7
Maximum [s] 539.4 461.4 212.5 120.4
Average Speedup / 1.17 2.56 4.50
The results show that the WSx and WSH methods achieve a gain of time compared to the Base method. The WSx method reduces the estimation time of the Base method by respectively 19 % and 15 % for S1 and S2. For the sequences S1 and S2, WSH is also an efficient method, as it accelerates the estimation by factors of 2.20 and 2.56 respectively. Indeed, the BFGS$^{-1}$ algorithm is a line search optimization method whose iterates use the Hessian information together with the first-order information given by the gradient. Since the parameters involved from one model to the next do not differ significantly, and since the majority of the parameters are initialized with the estimated values of the previous model, the gradient and approximated Hessian learned on the previous model lead to a near-optimal initial value. The next iterate is then very close to the previous one, and subsequent update directions are expected to differ little. This explains the time improvement of the WSx and WSH methods compared to the Base method. WSH is more efficient than WSx: the former reuses the curvature information, while the latter initializes the Hessian matrix with the identity matrix. Even though its iterates are close to the optimal value, WSx initially behaves like a gradient descent method, which is known to have a low convergence speed compared to second-order methods.
Because the WSx and WSH methods both improve the estimation time, it is reasonable to expect that their combination, the WSxH method, leads to an even better average speedup ratio. WSxH speeds up the estimation by a factor of 3.84 for S1 and 4.50 for S2, which is even larger than the sum of the individual speedup ratios of WSx and WSH. By starting from near-optimal parameter values, the BFGS$^{-1}$ algorithm is faster in the first iterations, and it also converges faster by exploiting the curvature information provided by the warm-started Hessian.
5.3 Early Stopping
A sensitivity analysis is conducted for the ES method. A sequence of 20 thresholds ranging from $10^{-7}$ to $5 \cdot 10^{-4}$ is used in order to test the performance of the ES method against the Base method. Figures 3 and 4 present the relative estimation time of the ES method for the 20 thresholds compared to the Base method, for S1 and S2 respectively. The black lines correspond to the mean time observed for the Base method over 10 estimations. The grey lines represent a 95 % confidence interval around the mean estimation time of the Base method. A box plot with a 95 % confidence interval is plotted for every threshold.
As the value of the threshold parameter increases, the estimation time of the ES method decreases. This is the expected behavior, since higher thresholds are more permissive and the estimation process can stop at iterations where the log likelihood is still far from the convergence region. For sequence S1, a threshold of $10^{-7}$ leads to a speedup of 3 %, while for S2 a speedup of approximately 4 % is observed. It is possible to obtain a better speedup by increasing the value of the threshold: a reduction of 35 % and 15 % of the optimization time is obtained for S1 and S2 respectively when the higher threshold of $5 \cdot 10^{-4}$ is used.
Figure 3: Sensitivity analysis of the threshold parameter for S1
Figure 4: Sensitivity analysis of the threshold parameter for S2
In order to select the best threshold, the improvement in estimation time is not the only criterion that should be taken into account. Even though the number of models stopped early increases with the threshold and the total optimization time decreases, the main drawback is that the method may stop at a plateau that is far away from the real convergence plateau of the log likelihood. These models are falsely stopped early and should be distinguished from the models that have reached the real convergence plateau. Figures 5 and 6 show that, from a certain threshold on, some models are falsely stopped. Indeed, a threshold of $10^{-4}$ leads to 6 models, among the 76 models stopped early, that do not reach the real convergence of the log likelihood for the sequence S1. Concerning sequence S2, the higher threshold of $3 \cdot 10^{-4}$ falsely stops 3 models among the 90 models stopped early. Even though the main objective is to speed up the optimization of a sequence of models and higher thresholds lead to lower optimization times, the modeler has to be careful with models that are falsely stopped early. We now want to select a threshold that provides both a good speedup ratio and a low number of falsely stopped models. A threshold of $2 \cdot 10^{-5}$ is acceptable in the sense that no model is falsely stopped for S1 and the speedup performance is equivalent to that of higher thresholds. Concerning S2, higher thresholds do not falsely stop any model either, but they do not offer a substantial improvement in terms of speedup ratio compared to $2 \cdot 10^{-5}$. We therefore propose to use the threshold $2 \cdot 10^{-5}$.
Figure 5: Number of models falsely stopped earlier for S1
Figure 6: Number of models falsely stopped earlier for S2
5.4 All Methods Combined
We now want to test together all the methods that improve the estimation time compared to the Base method. For both S1 and S2, the ST, WSxH and ES (with a threshold of $2 \cdot 10^{-5}$) methods speed up the estimation process. The results obtained by combining all these methods for both S1 and S2 are compared to the Base method in Tables 5 and 6. The combination of the WSxH, ES (threshold $2 \cdot 10^{-5}$) and ST methods leads to an improvement by a factor of 5.26 and 6.67 compared to the Base method for S1 and S2 respectively. The WSxH method is the best of the three main techniques in terms of speedup ratio, but its association with the ST and ES methods leads to an even better improvement compared to the Base method.
Table 5: Summary of statistics for 10 estimations for the sequence S1 : Comparison between
the combined methods and the Base method
Statistics Base Final
Minimum [s] 224.9 44.3
Mean [s] 229.5 44.3
Maximum [s] 231.3 44.8
Average Speedup / 5.26
Table 6: Summary of statistics for 10 estimations for the sequence S2 : Comparison between
the combined methods and the Base method
Statistics Base Final
Minimum [s] 534.6 83.8
Mean [s] 536.2 84.2
Maximum [s] 539.4 84.6
Average Speedup / 6.67
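To summarize how the three techniques fit together, the sketch below outlines the combined procedure: standardize once, then estimate each model of the sequence with warm-started parameters and Hessian and the early-stopping test. It reuses the illustrative helpers sketched in Section 3; the `model` interface (`parameter_names`, `boxcox_modified`, `estimate`) is an assumption made for illustration, not an existing API.

```python
import numpy as np

def estimate_sequence(models, data, continuous_cols, eps=2e-5):
    """Sketch of the combined ST + WSxH + ES estimation of a sequence of DCMs."""
    data = standardize(data, continuous_cols)   # ST: applied once to the dataset
    ll_best = -np.inf
    prev = None                                 # (names, x_opt, H_opt) of the previous model
    results = []
    for model in models:
        names = model.parameter_names()
        if prev is None:
            x0, H0 = np.zeros(len(names)), np.eye(len(names))
        else:
            x0, H0 = warm_start(*prev, names, model.boxcox_modified)      # WSxH
        stop = lambda history, best=ll_best: should_stop_early(history, best, eps)  # ES
        x_opt, H_opt, ll = model.estimate(data, x0, H0, stop)
        ll_best = max(ll_best, ll)
        prev = (names, x_opt, H_opt)
        results.append((model, x_opt, ll))
    return results
```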
6 Conclusion
Enhancing the estimation of a sequence of DCMs is a direction of research that has not yet been explored. The objective of this project was to propose different methods to improve the total estimation time of a sequence of DCMs. The BFGS$^{-1}$ algorithm is used to estimate the two sequences of DCMs, S1 and S2. The standardization of the variables slightly accelerates the estimation and should be used at the beginning of every optimization task because of its simplicity. The second approach was to implement a method commonly used in ML, which reuses knowledge acquired on a previous task for a new one. The WSxH method speeds up the estimation by a factor of 3.84 and 4.5 compared to the Base method for S1 and S2 respectively. The last approach is the ES method, which shows an interesting improvement of the estimation time, but the applied threshold has to be carefully chosen in order not to stop at a spurious convergence plateau of the log likelihood; a threshold of $2 \cdot 10^{-5}$ is chosen. The combination of all the methods that speed up the estimation of both S1 and S2 leads to improvements by factors of 5.26 and 6.67 respectively compared to the Base method.
Future directions of research include two main improvements. The first one concerns the ES method, which may stop at a convergence plateau that is far away from the real convergence of the log likelihood; a more robust ES method could be implemented by finding an efficient way to detect these regions of convergence. The second possible improvement concerns the warm start, for which a more detailed analysis could be done. Even though the total estimation time is reduced by the WSxH method, some models to which it is applied have a higher optimization time than when no warm start is used.
References
[1] Bierlaire M. (1998). Discrete Choice Models. In: Labbé M., Laporte G., Tanczos K., Toint P. (eds) Operations Research and Decision Aid Methodologies in Traffic and Transportation Management. NATO ASI Series (Series F: Computer and Systems Sciences), vol 166. Springer, Berlin, Heidelberg.
[2] Ben-Akiva M., Bierlaire M. (1999). Discrete Choice Methods and their Applications to Short Term Travel Decisions. In: Hall R.W. (ed) Handbook of Transportation Science. International Series in Operations Research & Management Science, vol 23. Springer, Boston, MA.
[3] Devolder, O., Glineur, F. and Nesterov, Y. (2014). First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146, 37-75. https://doi.org/10.1007/s10107-013-0677-5
[4] Hennig, P. and Kiefel, M. (2013). Quasi-Newton methods: A new direction. The Journal of Machine Learning Research, 14(1):843-865.
[5] Gower, R. M., Hanzely, F., Richtárik, P. and Stich, S. (2018). Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization.
[6] Lederrey G., Lurkin V., Hillel T. and Bierlaire M. (2020). Estimation of Discrete Choice Models with Hybrid Stochastic Adaptive Batch Size Algorithms.
[7] Ash, J. T. and Adams, R. P. (2019). On the Difficulty of Warm-Starting Neural Network Training.
[8] Prechelt L. (2012). Early Stopping - But When?. In: Montavon G., Orr G.B., Müller K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg.
[9] Fletcher, R. (1987). Practical Methods of Optimization (2nd ed.). Wiley-Interscience, New York, NY, USA.
[10] Wolfe, P. (1969). Convergence Conditions for Ascent Methods. SIAM Review, 11(2):226-235.
[11] Wolfe, P. (1971). Convergence Conditions for Ascent Methods. II: Some Corrections. SIAM Review, 13(2):185-188.
[12] Bierlaire, M., Axhausen, K. and Abay, G. (2001). The acceptance of modal innovation: The case of Swissmetro. In Proceedings of the Swiss Transport Research Conference, Ascona, Switzerland.
15

More Related Content

What's hot

DSUS_MAO_2012_Jie
DSUS_MAO_2012_JieDSUS_MAO_2012_Jie
DSUS_MAO_2012_JieMDO_Lab
 
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...ijcsit
 
ModelSelection1_WCSMO_2013_Ali
ModelSelection1_WCSMO_2013_AliModelSelection1_WCSMO_2013_Ali
ModelSelection1_WCSMO_2013_AliMDO_Lab
 
Iee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veraciniIee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veracinigrssieee
 
Computational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringComputational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringSpringer
 
Machine Learning techniques for the Task Planning of the Ambulance Rescue Team
Machine Learning techniques for the Task Planning of the Ambulance Rescue TeamMachine Learning techniques for the Task Planning of the Ambulance Rescue Team
Machine Learning techniques for the Task Planning of the Ambulance Rescue TeamFrancesco Cucari
 
AIAA-SciTech-ModelSelection-2014-Mehmani
AIAA-SciTech-ModelSelection-2014-MehmaniAIAA-SciTech-ModelSelection-2014-Mehmani
AIAA-SciTech-ModelSelection-2014-MehmaniOptiModel
 
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...ijcseit
 
AMS_Aviation_2014_Ali
AMS_Aviation_2014_AliAMS_Aviation_2014_Ali
AMS_Aviation_2014_AliMDO_Lab
 
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION ijscai
 
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...TELKOMNIKA JOURNAL
 
MAST30013 Techniques in Operations Research
MAST30013 Techniques in Operations ResearchMAST30013 Techniques in Operations Research
MAST30013 Techniques in Operations ResearchLachlan Russell
 
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...mabualsh
 
PEMF-1-MAO2012-Ali
PEMF-1-MAO2012-AliPEMF-1-MAO2012-Ali
PEMF-1-MAO2012-AliMDO_Lab
 
A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...ijcsit
 
Fuzzy clustering1
Fuzzy clustering1Fuzzy clustering1
Fuzzy clustering1abc
 
Methods for solving ‘or’ models
Methods for solving ‘or’ modelsMethods for solving ‘or’ models
Methods for solving ‘or’ modelsJishnu Rajan
 

What's hot (20)

DSUS_MAO_2012_Jie
DSUS_MAO_2012_JieDSUS_MAO_2012_Jie
DSUS_MAO_2012_Jie
 
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...
Validation Study of Dimensionality Reduction Impact on Breast Cancer Classifi...
 
FCMPE2015
FCMPE2015FCMPE2015
FCMPE2015
 
ModelSelection1_WCSMO_2013_Ali
ModelSelection1_WCSMO_2013_AliModelSelection1_WCSMO_2013_Ali
ModelSelection1_WCSMO_2013_Ali
 
Iee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veraciniIee egold2010 presentazione_finale_veracini
Iee egold2010 presentazione_finale_veracini
 
Session 4 plan verification prostate
Session 4 plan verification prostateSession 4 plan verification prostate
Session 4 plan verification prostate
 
Computational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineeringComputational intelligence systems in industrial engineering
Computational intelligence systems in industrial engineering
 
Machine Learning techniques for the Task Planning of the Ambulance Rescue Team
Machine Learning techniques for the Task Planning of the Ambulance Rescue TeamMachine Learning techniques for the Task Planning of the Ambulance Rescue Team
Machine Learning techniques for the Task Planning of the Ambulance Rescue Team
 
AIAA-SciTech-ModelSelection-2014-Mehmani
AIAA-SciTech-ModelSelection-2014-MehmaniAIAA-SciTech-ModelSelection-2014-Mehmani
AIAA-SciTech-ModelSelection-2014-Mehmani
 
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
 
Defuzzification
DefuzzificationDefuzzification
Defuzzification
 
AMS_Aviation_2014_Ali
AMS_Aviation_2014_AliAMS_Aviation_2014_Ali
AMS_Aviation_2014_Ali
 
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION
MARGINAL PERCEPTRON FOR NON-LINEAR AND MULTI CLASS CLASSIFICATION
 
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...Half Gaussian-based wavelet transform for pooling layer for convolution neura...
Half Gaussian-based wavelet transform for pooling layer for convolution neura...
 
MAST30013 Techniques in Operations Research
MAST30013 Techniques in Operations ResearchMAST30013 Techniques in Operations Research
MAST30013 Techniques in Operations Research
 
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
Applications of Markov Decision Processes (MDPs) in the Internet of Things (I...
 
PEMF-1-MAO2012-Ali
PEMF-1-MAO2012-AliPEMF-1-MAO2012-Ali
PEMF-1-MAO2012-Ali
 
A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...A comparative study of dimension reduction methods combined with wavelet tran...
A comparative study of dimension reduction methods combined with wavelet tran...
 
Fuzzy clustering1
Fuzzy clustering1Fuzzy clustering1
Fuzzy clustering1
 
Methods for solving ‘or’ models
Methods for solving ‘or’ modelsMethods for solving ‘or’ models
Methods for solving ‘or’ models
 

Similar to Sequential estimation of_discrete_choice_models

Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionIJECEIAES
 
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...IAEME Publication
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classificationcsandit
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...cscpconf
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
 
A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...IJECEIAES
 
local_learning.doc - Word Format
local_learning.doc - Word Formatlocal_learning.doc - Word Format
local_learning.doc - Word Formatbutest
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...cscpconf
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPINGTOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPINGijdkp
 
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...Amir Ziai
 
Multi objective predictive control a solution using metaheuristics
Multi objective predictive control  a solution using metaheuristicsMulti objective predictive control  a solution using metaheuristics
Multi objective predictive control a solution using metaheuristicsijcsit
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Long-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure RecoveryLong-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure RecoveryTELKOMNIKA JOURNAL
 

Similar to Sequential estimation of_discrete_choice_models (20)

Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
 
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...
 
report
reportreport
report
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
C013141723
C013141723C013141723
C013141723
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
 
A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...A comparative study of three validities computation methods for multimodel ap...
A comparative study of three validities computation methods for multimodel ap...
 
local_learning.doc - Word Format
local_learning.doc - Word Formatlocal_learning.doc - Word Format
local_learning.doc - Word Format
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
A SIMPLE PROCESS TO SPEED UP MACHINE LEARNING METHODS: APPLICATION TO HIDDEN ...
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPINGTOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
 
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
 
Multi objective predictive control a solution using metaheuristics
Multi objective predictive control  a solution using metaheuristicsMulti objective predictive control  a solution using metaheuristics
Multi objective predictive control a solution using metaheuristics
 
Selfadaptive report
Selfadaptive reportSelfadaptive report
Selfadaptive report
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Long-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure RecoveryLong-Term Robust Tracking Whith on Failure Recovery
Long-Term Robust Tracking Whith on Failure Recovery
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

Sequential estimation of_discrete_choice_models

  • 1. ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE School Of Architecture, Civil and Environmental Engineering Semester Project in Civil Engineering Enhancing the Serial Estimation of Discrete Choice Models Sequences by Youssef Kitane Under the direction of Prof. Michel Bierlaire Under the supervision of Nicola Ortelli and Gael Lederrey TRANSP-OR: Transport and Mobility Laboratory Lausanne, June 2020 1
  • 2. Contents 1 Introduction 3 2 Literature Review 4 3 Methodology 5 3.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2 Warm Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4 Case Study 7 4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.2 Sequence of Discrete Choice Models . . . . . . . . . . . . . . . . . . . . . . . 8 5 Results 9 5.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5.2 Warm Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.3 Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.4 All Methods Combined . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6 Conclusion 14 2
  • 3. 1 Introduction Discrete Choice Models (DCMs) have played an essential role in transportation modeling for the last 25 years [1]. Discrete choice modeling is a field designed to capture in detail the underlying behavioral mechanisms at the foundation of the decision-making process that drives consumers [2]. Because they must be behaviorally realistic while properly fitting the data, appropriate utility specifications for discrete choice models are hard to develop. In particular, modelers usually start by including a number of variables that are seen as ”essential” in the specification; these originate from their context knowledge or intuition. Then, small changes are tested sequentially so as to improve the goodness of fit of the model while ensuring its behavioral realism.The result is that many model specifications are usually tested before the modeler is satisfied with the result. Thus, this approach leads to extensive computational time because each model is optimized separately. A faster optimization time would allow researchers to test many more specifications in the same amount of time. In this project, the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is used to estimate the parameters of each DCM. Three techniques are implemented to accelerate the process of estimating a sequence of DCMs: • Standardization (ST) of the variables: The goal is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. • Warm Start (WS): This technique uses the knowledge acquired by the precedent model to initialize the values of the parameters for the estimation of the next model. • Early Stopping (ES): This consists in stopping the estimation of a model earlier than expected, based on how promising the improvement in log likelihood looks in the last iterations of the optimization algorithm. The next Section is dedicated to the literature review of the existent methods that speed up an optimization process. Then, in Section 3, the three techniques are presented in detail for a sequence of DCMs. Section 4 presents the data considered in this project, as well as the sequences of models that we use to measure the effectiveness of the three techniques. Section 5 gathers the results obtained by the implemented methods. The last Section resumes the findings of this project and highlights possible improvements and directions of research for the future. 3
2 Literature Review

In large-scale convex optimization, first-order methods are the methods of choice because of their cheap iteration cost [3]. Second-order methods, such as Newton's method, exploit curvature information, but the cost of computing the Hessian can become prohibitive. Quasi-Newton methods are therefore a good compromise between curvature information and computation time: they use an approximation of the Hessian instead of computing it exactly. The BFGS algorithm, named after its inventors Broyden, Fletcher, Goldfarb and Shanno [4], is one of the best-known quasi-Newton methods.

A new method for solving linear systems is proposed in [5]. The algorithm is specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are themselves positive definite, which opens the way for many applications in optimization. Under a careful choice of the parameters of the method, and depending on the problem structure and conditioning, acceleration can yield significant speedups both for the matrix inversion problem and for the stochastic BFGS algorithm. It is confirmed experimentally that these accelerated methods can lead to speedups compared to the classical BFGS algorithm, but no convergence analysis is provided yet.

The growth of choice modeling datasets in recent years has led to increasing research interest in accelerating the estimation of DCMs. Techniques inspired by Machine Learning (ML) have been used to speed up the estimation of a single DCM [6], by proposing new efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. These algorithms rest on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. The paper shows that a second-order method with a small batch size is a good starting point for DCM optimization, and that BFGS works particularly well once such a starting point has been found.

The problem of initializing the parameters of a model is central in ML. One particularly common scenario is when an ML algorithm must be constantly updated with new data; this occurs in finance, online advertising, recommendation systems, fraud detection, and many other domains where ML systems are used for prediction and decision making in the real world [7]. When new data arrive, the model needs to be updated so that it remains as accurate as possible. While most existing methods configure an algorithm from scratch by initializing its parameters randomly, it is possible to exploit previously learned information to "warm start" the configuration on new data.

In most common optimization algorithms, and more precisely in ML, the modeler often decides to stop the optimization procedure before reaching the required tolerance on the solution [8]. Stopping an optimization process early is a trick used to control the generalization performance of the model during the training phase and to avoid over-fitting at test time. In discrete choice modeling, however, the main objective is not to reach the highest accuracy but to obtain parameters that are behaviorally realistic.
3 Methodology

This section briefly introduces the principles underlying the BFGS algorithm before presenting the techniques used to speed up the estimation of a sequence of DCMs.

As a reminder, the iterates {x_j} of a line-search optimization method following a descent direction d_j with a step size \alpha_j are defined as follows:

x_{j+1} = x_j + \alpha_j d_j    (1)

where the descent direction is obtained by preconditioning the gradient:

d_j = -D_j^{-1} \nabla f(x_j)    (2)

assuming that the matrix D_j at the iterate x_j is positive semi-definite. For quasi-Newton methods, D_j is an approximation of the Hessian. A slightly different version of BFGS consists in approximating the inverse of the Hessian directly. The BFGS^{-1} algorithm uses the following update [9]:

D_{j+1}^{-1} = D_j^{-1} + \frac{(s_j^T y_j + y_j^T D_j^{-1} y_j)\, s_j s_j^T}{(s_j^T y_j)^2} - \frac{D_j^{-1} y_j s_j^T + s_j y_j^T D_j^{-1}}{s_j^T y_j}    (3)

where s_j = x_{j+1} - x_j and y_j = \nabla f(x_{j+1}) - \nabla f(x_j).

The step size is calculated with an inexact line search based on the two Wolfe conditions. The first condition [10], also known as the Armijo rule, guarantees that the step yields a sufficient decrease in the objective function. The second condition [11], known as the curvature condition, prevents the step length from being too short.

3.1 Standardization

The concept of standardization is relevant when continuous independent variables are measured on different scales. Standardization is a technique often applied as part of data preparation for ML; the goal is to bring the values of the numeric columns of the dataset to a common scale, without distorting differences in the ranges of values. More formally, suppose that a variable x takes values from the set S = {x_1, x_2, ..., x_n}. The standardization of one value x_i in S is applied as follows:

\hat{x}_i = \frac{x_i - \bar{x}}{\sigma}    (4)

where \bar{x} is the mean of the values in S and \sigma is their standard deviation. The variable is thus re-scaled so as to obtain a mean equal to 0 and a standard deviation of 1.
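To make the formulas above concrete, the following sketch implements the standardization of Eq. (4) and the inverse-Hessian update of Eq. (3) with NumPy. It is a minimal illustration under our own naming conventions, not the code used in the project; the line search and the convergence test of the complete BFGS^{-1} algorithm are omitted.

```python
import numpy as np

def standardize(column):
    """Re-scale a continuous variable to zero mean and unit standard deviation (Eq. 4)."""
    return (column - column.mean()) / column.std()

def bfgs_inverse_update(D_inv, s, y):
    """One BFGS^-1 update of the inverse-Hessian approximation (Eq. 3).

    D_inv : current approximation of the inverse Hessian (n x n array)
    s     : step x_{j+1} - x_j
    y     : gradient difference grad f(x_{j+1}) - grad f(x_j)
    """
    sy = s @ y                                    # scalar s_j^T y_j
    Dy = D_inv @ y                                # vector D_j^{-1} y_j
    term1 = (sy + y @ Dy) * np.outer(s, s) / sy**2
    term2 = (np.outer(Dy, s) + np.outer(s, Dy)) / sy
    return D_inv + term1 - term2

# The updated matrix preconditions the gradient to obtain the next descent direction:
# d = -bfgs_inverse_update(D_inv, s, y) @ grad
```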
3.2 Warm Start

A method commonly used in ML consists in initializing a set of parameters with non-arbitrary values. In our case, we initialize the parameters of a model with the values obtained by the BFGS^{-1} algorithm on the previous model.

Formally, we define the set of parameters of model m as x_m \in R^{N_m}, where N_m corresponds to the number of parameters in model m. The set of parameters of the following model is defined similarly, i.e. x_{m+1} \in R^{N_{m+1}}, where N_{m+1} corresponds to the number of parameters of that model. To generate the initial parameters of model m+1, i.e. x^0_{m+1}, we use the optimized parameters of the previous model, i.e. x^*_m. When the Box-Cox parameter of a variable is incremented or decremented between model m and model m+1, we initialize the corresponding parameter at 0 instead of reusing the previous optimized value. We thus define the initialization of x^0_{m+1} for each index i \in \{1, \dots, N_{m+1}\} as:

x^0_{m+1,i} = \begin{cases} 0 & \text{if } i \notin \{1,\dots,N_m\} \text{ or the Box-Cox parameter of variable } i \text{ was modified,} \\ x^*_{m,i} & \text{otherwise.} \end{cases}    (5)

The same procedure is used for the initialization of the Hessian approximation between model m and model m+1. Instead of starting from the identity matrix, we define H^0_{m+1} for each pair of indexes i, k \in \{1, \dots, N_{m+1}\} as:

H^0_{m+1,(i,k)} = \begin{cases} H^*_{m,(i,k)} & \text{if } i, k \in \{1,\dots,N_m\}, \\ 1 & \text{if } i = k, \\ 0 & \text{otherwise.} \end{cases}    (6)
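As an illustration of Eqs. (5) and (6), the sketch below builds the warm-started parameters and Hessian approximation for model m+1. It assumes, as in the equations above, that the parameters shared with the previous model occupy the first N_m indices; the function name and signature are ours, not the project's.

```python
import numpy as np

def warm_start(x_prev, H_prev, n_new, reset_indices=()):
    """Initial point and Hessian approximation for model m+1 (Eqs. 5 and 6).

    x_prev        : optimized parameters x*_m of the previous model (length N_m)
    H_prev        : final Hessian approximation H*_m of the previous model (N_m x N_m)
    n_new         : number of parameters N_{m+1} of the next model
    reset_indices : indices whose Box-Cox parameter was incremented or decremented
    """
    k = min(len(x_prev), n_new)          # parameters shared with the previous model
    x0 = np.zeros(n_new)
    x0[:k] = x_prev[:k]                  # reuse the previous optimum where it exists
    for i in reset_indices:
        x0[i] = 0.0                      # reset parameters whose transform changed

    H0 = np.eye(n_new)                   # identity rows/columns for new parameters
    H0[:k, :k] = H_prev[:k, :k]          # carry over the learned curvature information
    return x0, H0
```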
3.3 Early Stopping

The early stopping method consists in stopping the estimation process before convergence to the maximum is reached. Because the objective is to select the best model among a sequence of DCMs, the log likelihood LL(x^i) obtained at an iteration is compared to the highest log likelihood LL_best reached by all previous models. When LL(x^i) is higher than LL_best, the estimation process is not stopped, since the value can only be further increased by the BFGS^{-1} algorithm until convergence; LL_best is then updated with the estimated value LL(x^*). When LL(x^i) is lower than LL_best, the estimation process may be stopped based on a criterion that measures the relative evolution of the function in order to detect a plateau, i.e. a region where the function no longer improves significantly. Three consecutive evaluations of the function are considered so as to be reasonably sure of the convergence of the log likelihood.

Let us consider the last three evaluations of the log likelihood, LL(x^i), LL(x^{i-1}) and LL(x^{i-2}), during the estimation of one model. Stagnation is assessed by evaluating the two following ratios and comparing them to a predefined threshold \varepsilon:

\frac{LL(x^{i-1})}{LL(x^{i})} - 1 < \varepsilon    (7)

\frac{LL(x^{i-2})}{LL(x^{i-1})} - 1 < \varepsilon    (8)

Even though the goal of early stopping is to reduce the estimation time of DCMs, the solution obtained when early stopping is applied to the BFGS^{-1} algorithm should not differ substantially from the one obtained with the standard BFGS^{-1}. For example, Figure 1 shows the value of the log likelihood during the estimation of a random model. In this example, there is a stagnation in the middle of the estimation; we do not want to stop early at this point, since the estimation is far from finished. We therefore have to be careful with the threshold and perform a sensitivity analysis on this parameter.

Figure 1: Difference between a possible stagnation of the log likelihood and the real convergence
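For illustration, the stopping rule of Eqs. (7) and (8), combined with the comparison to the best log likelihood reached so far, could look like the following sketch. This is a schematic reconstruction, not the project's code; the function name and the way the log-likelihood history is passed are assumptions.

```python
def should_stop_early(ll_history, ll_best, eps=2e-5):
    """Decide whether to stop the current estimation early (Eqs. 7 and 8).

    ll_history : log-likelihood values recorded at each BFGS^-1 iteration (negative floats)
    ll_best    : best log likelihood reached by the previously estimated models
    eps        : plateau-detection threshold
    """
    if len(ll_history) < 3:
        return False                             # need three evaluations to detect a plateau
    ll_i, ll_im1, ll_im2 = ll_history[-1], ll_history[-2], ll_history[-3]
    if ll_i > ll_best:
        return False                             # the model may still beat the best one
    stalled_now = ll_im1 / ll_i - 1 < eps        # Eq. (7)
    stalled_before = ll_im2 / ll_im1 - 1 < eps   # Eq. (8)
    return stalled_now and stalled_before
```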
4 Case Study

4.1 Dataset

The Swissmetro dataset [12] corresponds to survey data collected in March 1998 to study the market penetration of the Swissmetro, a revolutionary underground maglev system. Three alternatives - train, car and Swissmetro - were generated for each of the 1192 respondents. A sample of 10'728 observations was obtained by generating 9 types of situations per respondent. The pre-selected attributes of the alternatives are categorical (travel card ownership, gender, type of luggage, etc.) and continuous (travel time, cost and headway).

4.2 Sequence of Discrete Choice Models

For the purpose of this project, two sequences of one hundred DCMs, respectively denoted by S1 and S2, are considered. Each sequence starts with a given choice model; then, a random perturbation is applied to obtain each subsequent model. These small modifications correspond to the typical elementary perturbations that modelers consider when developing DCMs; a schematic sketch of one perturbation step is given at the end of this subsection:

• Add a non-selected variable to the utility of an alternative
• Remove a variable from the utility of an alternative
• Increment the Box-Cox parameter of a given variable
• Decrement the Box-Cox parameter of a given variable
• Interact a variable with a socioeconomic variable
• Deactivate the interaction of the considered variable with a socioeconomic variable

The first sequence S1 begins with an alternative-specific-constants model and its complexity increases along the sequence, while S2 starts with a random model and its complexity remains approximately constant over the hundred models. The number of parameters of each model in the two sequences is shown in Figure 2.

Figure 2: Number of parameters for the two sequences S1 and S2.
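The following sketch shows what one perturbation step of such a sequence might look like. The dictionary-based representation of a specification, the helper name and its attributes are purely illustrative assumptions; the project's actual model representation is not reproduced here.

```python
import random

def perturb(spec, all_variables, socio_variables):
    """Apply one random elementary perturbation to a model specification (illustrative only).

    spec is assumed to map each selected variable to its Box-Cox parameter and an
    optional interaction with a socioeconomic variable.
    """
    operation = random.choice(
        ["add", "remove", "increment_boxcox", "decrement_boxcox", "interact", "deactivate"]
    )
    new_spec = {var: dict(attrs) for var, attrs in spec.items()}   # copy the specification
    if operation == "add":
        candidates = [v for v in all_variables if v not in new_spec]
        if candidates:
            new_spec[random.choice(candidates)] = {"boxcox": 0, "interaction": None}
    elif operation == "remove" and new_spec:
        del new_spec[random.choice(list(new_spec))]
    elif operation == "increment_boxcox" and new_spec:
        new_spec[random.choice(list(new_spec))]["boxcox"] += 1
    elif operation == "decrement_boxcox" and new_spec:
        new_spec[random.choice(list(new_spec))]["boxcox"] -= 1
    elif operation == "interact" and new_spec:
        new_spec[random.choice(list(new_spec))]["interaction"] = random.choice(socio_variables)
    elif operation == "deactivate" and new_spec:
        new_spec[random.choice(list(new_spec))]["interaction"] = None
    return new_spec
```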
5 Results

In order to avoid misunderstandings, abbreviations are given to the different methods. The base method, which estimates the parameters without applying any of the techniques presented above, is denoted by Base. The standardization of the variables is denoted by ST. The warm start of the parameters is denoted by WSx, the warm start of the Hessian by WSH, and the combination of both by WSxH. The early stopping method is denoted by ES.

5.1 Standardization

A benchmark of ten estimations of each of the Base and ST methods is conducted for the sequences S1 and S2. Tables 1 and 2 present a summary of the statistics for these methods: the lowest, mean and highest times among the ten estimations are reported. The average speedup corresponds to the ratio between the mean time of the Base method and the mean time of the ST method.

Table 1: Summary of statistics for 10 estimations by method for the sequence S1

Statistics         Base    ST
Minimum [s]        224.9   198.1
Mean [s]           229.5   201.7
Maximum [s]        231.3   203.2
Average Speedup    /       1.15

Table 2: Summary of statistics for 10 estimations by method for the sequence S2

Statistics         Base    ST
Minimum [s]        534.5   478.3
Mean [s]           536.2   480.9
Maximum [s]        539.4   485.9
Average Speedup    /       1.12

It appears that standardizing the variables is useful. For the first sequence S1, the mean estimation time is reduced from 229.5 s to 201.7 s. Concerning the sequence S2, a reduction of about 10 % of the estimation time is obtained. Since the initial Hessian approximation of each model is the identity matrix, the first descent direction of the BFGS^{-1} algorithm corresponds to that of gradient descent. Because gradient descent does not take curvature into account, an error surface with high curvature forces many steps that may not be in the optimal direction. Scaling the variables reduces this curvature, which makes methods that ignore curvature work much better and reach convergence faster. Standardization of the variables yields interesting results and should be applied beforehand to every sequence of DCMs whose variables present differences in their ranges of values.
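For completeness, the timing protocol behind the tables can be reproduced with a small harness along the following lines; `run_estimation` is a placeholder for whatever routine estimates every model of a sequence with a given method, and the average speedup is simply the ratio of the mean Base time to the mean time of the method under test.

```python
import time
import statistics

def benchmark(run_estimation, runs=10):
    """Repeat the full estimation of a sequence and report min/mean/max wall-clock times."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        run_estimation()                     # estimates every model of the sequence
        times.append(time.perf_counter() - start)
    return min(times), statistics.mean(times), max(times)

# average_speedup = mean_time_base / mean_time_method
```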
5.2 Warm Start

A benchmark of ten estimations for each of the methods Base, WSx, WSH and WSxH is conducted for the sequences S1 and S2. Tables 3 and 4 present a summary of the statistics for these methods.

Table 3: Summary of statistics for 10 estimations by method for the sequence S1

Statistics         Base    WSx     WSH     WSxH
Minimum [s]        224.9   187.4   104.9   60.8
Mean [s]           229.5   188.7   105.4   61.0
Maximum [s]        231.3   189.5   106.1   61.5
Average Speedup    /       1.21    2.20    3.84

Table 4: Summary of statistics for 10 estimations by method for the sequence S2

Statistics         Base    WSx     WSH     WSxH
Minimum [s]        534.6   459.3   210.4   119.2
Mean [s]           536.2   460.3   211.3   119.7
Maximum [s]        539.4   461.4   212.5   120.4
Average Speedup    /       1.17    2.56    4.50

The results show that the WSx and WSH methods achieve a gain of time compared to the Base method. The WSx method reduces the estimation time of the Base method by respectively 19 % and 15 % for S1 and S2. For both sequences, WSH is also an efficient method, since it accelerates the estimation by a factor of 2.20 for S1 and 2.56 for S2. Indeed, the BFGS^{-1} algorithm is a line-search optimization method whose iterates use both the curvature information carried by the Hessian approximation and the first-order information given by the gradient. Since the parameters of two consecutive models do not differ significantly, and since most parameters are initialized with the estimated values of the previous model, the gradient and approximated Hessian learned on the previous model provide a near-optimal starting point. The next iterate is very close to the previous one, and subsequent update directions are expected to differ only slightly. This explains the gains of the WSx and WSH methods over the Base method.

The WSH method is more efficient than WSx: the former reuses the curvature information, while the latter initializes the Hessian approximation with the identity matrix. Even though its iterates start close to the optimal values, WSx initially behaves like a gradient descent method, which is known to converge slowly compared to second-order methods.

Because the WSx and WSH methods both improve the estimation time, it is reasonable that their combination, the WSxH method, leads to a better average speedup ratio. WSxH speeds up the estimation by a factor of 3.84 for S1 and 4.50 for S2, which is an even better improvement than what the individual gains of WSx and WSH would suggest. By starting from parameter values that are already close to the optimum, the BFGS^{-1} algorithm saves iterations at the beginning of the estimation, and it also converges faster by exploiting the curvature information carried over in the Hessian.
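To show how the warm start chains through a sequence, a driver loop in the spirit of the WSxH method could look like the sketch below. The `model.n_params` and `model.reset_indices` attributes and the `estimate_bfgs_inv` routine are placeholders; the project's actual implementation is not reproduced here.

```python
import numpy as np

def estimate_sequence_warm_started(models, estimate_bfgs_inv):
    """Estimate a sequence of DCMs, warm-starting parameters and Hessian (WSxH).

    models            : iterable of model specifications; model.n_params gives N_m and
                        model.reset_indices the indices whose Box-Cox parameter changed
    estimate_bfgs_inv : routine running BFGS^-1 from (x0, H0); returns (x*, H*, LL*)
    """
    x_prev = H_prev = None
    log_likelihoods = []
    for model in models:
        n = model.n_params
        x0, H0 = np.zeros(n), np.eye(n)          # cold start for the very first model
        if x_prev is not None:                   # warm start from the previous model
            k = min(len(x_prev), n)
            x0[:k] = x_prev[:k]
            H0[:k, :k] = H_prev[:k, :k]
            x0[list(model.reset_indices)] = 0.0  # reset modified Box-Cox parameters
        x_prev, H_prev, ll = estimate_bfgs_inv(model, x0, H0)
        log_likelihoods.append(ll)
    return log_likelihoods
```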
5.3 Early Stopping

A sensitivity analysis is conducted for the ES method. A sequence of 20 thresholds ranging from 10^-7 to 5·10^-4 is used to compare the performance of the ES method to that of the Base method. Figures 3 and 4 present the estimation time of the ES method for the 20 thresholds, relative to the Base method, for S1 and S2 respectively. The black lines correspond to the mean time observed for the Base method over 10 estimations, and the grey lines represent a 95 % confidence interval around this mean. A box plot with a 95 % confidence interval is drawn for every threshold.

As the value of the threshold increases, the estimation time of the ES method decreases. This is expected behavior, since higher thresholds are more permissive and the estimation process can then stop at iterations where the log likelihood is still far from the convergence region. For sequence S1, a threshold of 10^-7 leads to a speedup of 3 %, while for S2 a speedup of approximately 4 % is observed. A better speedup is obtained when the threshold is increased: a reduction of 35 % and 15 % of the optimization time is obtained for S1 and S2 respectively when the higher threshold of 5·10^-4 is used.

Figure 3: Sensitivity analysis of the threshold parameter for S1
Figure 4: Sensitivity analysis of the threshold parameter for S2

In order to select the best threshold, the improvement in estimation time is not the only criterion to take into account. Even though the number of models stopped earlier increases with the threshold and the total optimization time decreases, the main drawback is that the method may stop at a plateau that is far away from the real convergence plateau of the log likelihood. Such models are falsely stopped earlier and should be distinguished from the models that have truly reached the convergence plateau. Figures 5 and 6 show that, above a certain threshold, some models are falsely stopped. Indeed, a threshold of 10^-4 leads to 6 models, among the 76 models stopped earlier, that do not reach the real convergence of the log likelihood for the sequence S1. Concerning sequence S2, the higher threshold of 3·10^-4 falsely stops 3 models among the 90 models stopped earlier. Even though the main objective is to speed up the optimization of a sequence of models, and higher thresholds lead to lower optimization times, the modeler has to be careful with models that are falsely stopped earlier.

We now want to select a threshold that provides both a good speedup ratio and a low number of models falsely stopped earlier. A threshold of 2·10^-5 is acceptable in the sense that no model is falsely stopped earlier for S1 and the speedup is equivalent to that of higher thresholds. Concerning S2, slightly higher thresholds do not falsely stop any model either, but they do not offer a substantial improvement in terms of speedup compared to 2·10^-5. We therefore propose to use the threshold 2·10^-5.
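The sensitivity analysis itself amounts to sweeping the threshold over the chosen range and recording, for each value, the total estimation time and the number of models stopped (and falsely stopped) earlier. A hedged sketch is given below; `run_with_early_stopping` is a placeholder for a routine that estimates the whole sequence with a given threshold and compares each early-stopped log likelihood with the fully converged one.

```python
import numpy as np

def threshold_sensitivity(run_with_early_stopping, thresholds=None):
    """Sweep early-stopping thresholds and report time and falsely stopped models."""
    if thresholds is None:
        thresholds = np.geomspace(1e-7, 5e-4, 20)    # 20 thresholds from 1e-7 to 5e-4
    results = []
    for eps in thresholds:
        elapsed, n_stopped, n_false = run_with_early_stopping(eps)
        results.append((eps, elapsed, n_stopped, n_false))
        print(f"eps={eps:.1e}  time={elapsed:.1f}s  "
              f"stopped early={n_stopped}  falsely stopped={n_false}")
    return results
```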
Figure 5: Number of models falsely stopped earlier for S1

Figure 6: Number of models falsely stopped earlier for S2

5.4 All Methods Combined

We now want to combine all the methods that improve the estimation time compared to the Base method. For both S1 and S2, the ST, WSxH and ES (with a threshold of 2·10^-5) methods speed up the estimation process. The results obtained by combining these methods for S1 and S2 are compared to the Base method in Tables 5 and 6.

The combination of the WSxH, ES (with a threshold of 2·10^-5) and ST methods leads to an improvement by a factor of 5.26 and 6.67 compared to the Base method for S1 and S2 respectively. The WSxH method is the best of the three main techniques in terms of speedup ratio, but its association with the ST and ES methods leads to an even better improvement compared to the Base method.
Table 5: Summary of statistics for 10 estimations for the sequence S1: comparison between the combined methods and the Base method

Statistics         Base    Final
Minimum [s]        224.9   44.3
Mean [s]           229.5   44.3
Maximum [s]        231.3   44.8
Average Speedup    /       5.26

Table 6: Summary of statistics for 10 estimations for the sequence S2: comparison between the combined methods and the Base method

Statistics         Base    Final
Minimum [s]        534.6   83.8
Mean [s]           536.2   84.2
Maximum [s]        539.4   84.6
Average Speedup    /       6.67

6 Conclusion

Enhancing the estimation of a sequence of DCMs is a direction of research that has not yet been explored. The objective of this project was to propose different methods to improve the total estimation time of a sequence of DCMs. The BFGS^{-1} algorithm is used to estimate the two sequences of DCMs, S1 and S2. The standardization of the variables slightly accelerates the estimation and should be used at the beginning of every optimization task because of its simplicity. The second approach was to implement a method commonly used in ML, which reuses the knowledge acquired on a previous task for a new one: the WSxH method speeds up the estimation by a factor of 3.84 and 4.50 compared to the Base method for S1 and S2 respectively. The last approach is the ES method, which also yields an interesting reduction of the estimation time, but the threshold has to be chosen carefully in order not to stop at a spurious convergence plateau of the log likelihood; a threshold of 2·10^-5 is chosen. The combination of all the methods that speed up the estimation of both S1 and S2 leads to an improvement of 5.26 and 6.67 respectively compared to the Base method.

Future directions of research include two main improvements. The first one concerns the ES method, which can stop at a plateau that is far from the real convergence of the log likelihood; a more robust ES method could be obtained by finding an efficient way to detect such regions. The second possible improvement concerns the warm start: even though the total estimation time is reduced by the WSxH method, some models to which it is applied take longer to optimize than without warm start, and a more detailed analysis could explain this behavior.
References

[1] Bierlaire, M. (1998). Discrete Choice Models. In: Labbé, M., Laporte, G., Tanczos, K., Toint, P. (eds) Operations Research and Decision Aid Methodologies in Traffic and Transportation Management. NATO ASI Series (Series F: Computer and Systems Sciences), vol 166. Springer, Berlin, Heidelberg.

[2] Ben-Akiva, M. and Bierlaire, M. (1999). Discrete Choice Methods and their Applications to Short Term Travel Decisions. In: Hall, R.W. (ed) Handbook of Transportation Science. International Series in Operations Research & Management Science, vol 23. Springer, Boston, MA.

[3] Devolder, O., Glineur, F. and Nesterov, Y. (2014). First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146:37-75. https://doi.org/10.1007/s10107-013-0677-5

[4] Hennig, P. and Kiefel, M. (2013). Quasi-Newton methods: A new direction. The Journal of Machine Learning Research, 14(1):843-865.

[5] Gower, R. M., Hanzely, F., Richtárik, P. and Stich, S. (2018). Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization.

[6] Lederrey, G., Lurkin, V., Hillel, T. and Bierlaire, M. (2020). Estimation of Discrete Choice Models with Hybrid Stochastic Adaptive Batch Size Algorithms.

[7] Ash, J. T. and Adams, R. P. (2019). On the Difficulty of Warm-Starting Neural Network Training.

[8] Prechelt, L. (2012). Early Stopping - But When?. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg.

[9] Fletcher, R. (1987). Practical Methods of Optimization (2nd ed.). Wiley-Interscience, New York, NY, USA.

[10] Wolfe, P. (1969). Convergence Conditions for Ascent Methods. SIAM Review, 11(2):226-235.

[11] Wolfe, P. (1971). Convergence Conditions for Ascent Methods. II: Some Corrections. SIAM Review, 13(2):185-188.

[12] Bierlaire, M., Axhausen, K. and Abay, G. (2001). The acceptance of modal innovation: The case of Swissmetro. In Proceedings of the Swiss Transport Research Conference, Ascona, Switzerland.