A presentation by SMART Infrastructure Facility Research Director Dr Pascal Perez to the International Symposium For Next Generation Infrastructure, Vienna, 30 September - 1 October 2014.
Tour-based Travel Mode Estimation based on Data Mining and Fuzzy Techniques
1. Tour-based Travel Mode Choice Estimation based on Data Mining and Fuzzy Techniques
Nagesh Shukla, Jun Ma, Rohan Wickramasuriya, Nam Huynh, Pascal Perez
Presented by:
Pascal Perez
Research Director
perez@uow.edu.au
2. Classification of literature in mode choice
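Studies are grouped by trip type, by method (discrete choice models or machine learning) and by data type (crisp, or crisp & fuzzy).
• Independent trips
– Discrete choice models, crisp data: Gaudry (1980); McFadden (1973); Daly & Zachary (1979); Hensher & Ton (2000)
– Discrete choice models, crisp & fuzzy data: Dell'Orco et al. (2007)
– Machine learning, crisp data: Xie et al. (2003); Reggiani & Tritapepe (1998); Cantarella et al. (2003); Shmueli et al. (1996); Edara (2003); Hensher & Ton (2000)
– Machine learning, crisp & fuzzy data: Yaldi (2005)
• Linked individual trips (tour-based)
– Discrete choice models, crisp data: Miller et al. (2005)
– Discrete choice models, crisp & fuzzy data: -
– Machine learning, crisp data: Biagioni et al. (2008)
– Machine learning, crisp & fuzzy data: This study
• Linked household trips
– Discrete choice models, crisp data: Miller et al. (2005)
– Discrete choice models, crisp & fuzzy data: -
– Machine learning, crisp data: Future work
– Machine learning, crisp & fuzzy data: Future work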
3. Machine learning methods
[Figure: neural network diagram with input layer, hidden layer and output layer]
Back-propagation algorithms for ANN training
- Scaled conjugate gradient (Moller 1993)
- Levenberg-Marquardt optimization (Hagan and Menhaj, 1994)
(http://iasri.res.in/ebook/win_school_aa/notes/Decision_tree.pdf)
Decision trees
- are easy for humans to understand thanks to their intuitive representation
- do not require many parameter settings
- can be constructed fairly fast, with accuracy comparable to other classification models.
DT algorithms such as C4.5 and Classification and Regression Trees (CART) have been identified among the top 10 data mining algorithms because of their wide applicability.
4. Case study - HTS data for Sydney GMA
• 3000-3500 household participants each
year. Dataset covers 5 years.
• 14 variables include
– Day of the week
– Household type
– Occupancy
– Number of vehicles
– Household income
– Number of people holding a valid licence
– Number of students
– Working at home
– Total number of residents
– Trip time *
– Trip purpose
– Road distance travelled
– Departure time *
– Travel mode
(http://www.bts.nsw.gov.au/Images/UserUploadedImages/86/hts-gma-map.jpg)
5. Data pre-processing
• Linking consecutive trips of an individual
Let $(X, Y)$ be a survey dataset of trips made by $L$ travellers, where $(x_{lm}, y_{lm})$ represents the information of the $m$th trip made by traveller $l$, with $m \in \{1, 2, \ldots, M_l\}$ and $l \in \{1, 2, \ldots, L\}$.
Here $x_{lm}$ is the collection of explanatory variables and $y_{lm}$ is the travel mode of the $m$th trip made by traveller $l$.
To account for the impact of consecutive trips, a new explanatory variable representing the mode of the previous trip is defined, that is, $y_{l,m-1}$ for $m > 1$.
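The linking step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record keys (`traveller`, `trip_no`, `mode`), the function name `add_previous_mode` and the placeholder value `"none"` used for the first trip of each traveller are all assumptions, since the slides do not say how the first trip is handled.

```python
from collections import defaultdict

def add_previous_mode(trips, first_trip_value="none"):
    """Attach the mode of the preceding trip to every trip record.

    trips: list of dicts with hypothetical keys 'traveller', 'trip_no', 'mode'.
    first_trip_value: assumed placeholder for the first trip of a traveller.
    """
    # Group the trips by traveller l.
    by_traveller = defaultdict(list)
    for trip in trips:
        by_traveller[trip["traveller"]].append(trip)

    linked = []
    for own_trips in by_traveller.values():
        # Order each traveller's trips by trip index m.
        own_trips.sort(key=lambda t: t["trip_no"])
        previous = first_trip_value
        for trip in own_trips:
            # prev_mode plays the role of y_{l,m-1}.
            linked.append({**trip, "prev_mode": previous})
            previous = trip["mode"]
    return linked

trips = [
    {"traveller": 1, "trip_no": 1, "mode": "walk"},
    {"traveller": 1, "trip_no": 2, "mode": "public_transport"},
    {"traveller": 2, "trip_no": 1, "mode": "car_driver"},
    {"traveller": 1, "trip_no": 3, "mode": "walk"},
]
linked = add_previous_mode(trips)
```

Each output record keeps its original fields and gains a `prev_mode` field that can be fed to the classifier as an extra explanatory variable.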
6. Data pre-processing (cont.)
• Fuzzifying explanatory variables: departure time
Four fuzzy sets of departure time are defined: “2 hour am peak (7-9am), 6 hour inter-peak
(9am-3pm), 3 hour evening peak (3-6pm) and the remaining evening/night period”
(Sydney Strategic Travel Model – Modelling future travel patterns, February 2011 Release, Technical Documentation)
7. Data pre-processing (cont.)
• Fuzzifying explanatory variables: household income
Low income: “Persons in the second and third income deciles”
Middle income: “Persons in the middle income quintile”
High income: “Persons in the top income quintile”
(Australian Bureau of Statistics – Household Income
and Income Distribution, 6523.0, 2011-2012)
Household income in survey data, ranging from AU$5006 to AU$402741, is classified
into three fuzzy sets ‘low income’, ‘middle income’, and ‘high income’.
8. Experiments
Experiment 1 (Base) | Experiment 2 (Fuzzy variables) | Experiment 3 (Linked trips) | Experiment 4 (Fuzzy variables and linked trips)
Day of the week | Day of the week | Day of the week | Day of the week
Household type | Household type | Household type | Household type
Occupancy | Occupancy | Occupancy | Occupancy
Number of vehicles | Number of vehicles | Number of vehicles | Number of vehicles
Household income | Fuzzy household income | Household income | Fuzzy household income
Number of licences | Number of licences | Number of licences | Number of licences
Number of students | Number of students | Number of students | Number of students
Working at home | Working at home | Working at home | Working at home
Number of residents | Number of residents | Number of residents | Number of residents
Trip time | Trip time | Trip time | Trip time
Trip purpose | Trip purpose | Trip purpose | Trip purpose
Road distance travelled | Road distance travelled | Road distance travelled | Road distance travelled
Departure time | Fuzzy departure time | Departure time | Fuzzy departure time
- | - | Previous trip mode | Previous trip mode
9. Results
Household travel survey data is partitioned into three subsets, a training dataset (30%),
a testing dataset (35%) and a validation dataset (35%).
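A three-way partition with these proportions could be produced as follows. This is only a sketch: the slides do not say how records were assigned to subsets, so the random shuffle, the fixed seed and the function name `partition` are assumptions.

```python
import random

def partition(records, fractions=(0.30, 0.35, 0.35), seed=0):
    """Shuffle the records and cut them into training, testing and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

# Stand-in record identifiers; real use would pass the trip records.
train, test, validation = partition(range(1000))
```

Giving the validation subset "whatever remains" keeps the three subsets disjoint and exhaustive even when the record count does not divide evenly.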
Experiment | Fuzzy sets | Dependent trip | PCI (%), DT | PCI (%), ANN
1 | No | No | 64.71 | 68.1
2 | Yes | No | 67.67 | 68.7
3 | No | Yes | 85.63 | 85.9
4 | Yes | Yes | 86.17 | 86.8
Travel mode | HTS data | DT prediction | ANN prediction
Car_driver | 40.95% | 43.50% | 43.11%
Car_passenger | 20.65% | 30.76% | 19.05%
Public_transport | 8.37% | 7.54% | 7.74%
Walk | 29.26% | 17.68% | 29.55%
Bicycle | 0.77% | 0.53% | 0.53%
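The gap between predicted and observed mode shares can be read off with a short calculation. The sketch below uses only the figures in the table above; the function name `share_gaps` is an illustrative choice.

```python
# Mode shares in percent, copied from the table above.
hts = {"Car_driver": 40.95, "Car_passenger": 20.65, "Public_transport": 8.37,
       "Walk": 29.26, "Bicycle": 0.77}
dt = {"Car_driver": 43.50, "Car_passenger": 30.76, "Public_transport": 7.54,
      "Walk": 17.68, "Bicycle": 0.53}
ann = {"Car_driver": 43.11, "Car_passenger": 19.05, "Public_transport": 7.74,
       "Walk": 29.55, "Bicycle": 0.53}

def share_gaps(predicted, observed):
    """Predicted minus observed share, in percentage points, for each mode."""
    return {mode: round(predicted[mode] - observed[mode], 2) for mode in observed}

dt_gaps = share_gaps(dt, hts)
ann_gaps = share_gaps(ann, hts)

# Mode with the largest absolute gap for each classifier.
dt_worst = max(dt_gaps, key=lambda mode: abs(dt_gaps[mode]))
ann_worst = max(ann_gaps, key=lambda mode: abs(ann_gaps[mode]))
```

On these figures the decision tree is about 10 points too high for Car_passenger and about 12 points too low for Walk, while every ANN share is within about 2 points of the survey share.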
10. Conclusions
• New methodology for travel mode choice using artificial
neural networks and decision trees.
• The methodology considers
– Expert judgements, by using fuzzy sets instead of crisp data for some
explanatory variables.
– A tour-based model that accounts for the dependency of modes
between trips.
• Travel mode prediction using fuzzified explanatory variables
combined with the tour-based model outperformed
predictions using crisp variables.
• Future work could involve more explanatory variables, new
fuzzy sets, and account for dependencies between trips of
individuals in the same household.
An ANN consists of a set of interconnected processing nodes, called neurons, that is used to estimate the mapping between explanatory variables and responses. In this diagram, the explanatory variables (the 12-13 attributes from HTS data) are x1 to xn. The value of each node in the hidden layer is estimated by equation 1, and the value of each node in the output layer is estimated by equation 2.
Phi is a predetermined transfer function. Weights and bias values are estimated iteratively during training until the output layer is satisfactory.
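Equations 1 and 2 are not reproduced in these notes, so the following is only a generic sketch of a forward pass through a network with one hidden layer. The choice of tanh as the hidden transfer function, softmax at the output, and all weight values are assumptions made for illustration.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> hidden layer -> output layer."""
    # Hidden nodes: transfer function applied to a weighted sum plus bias
    # (tanh is an assumed choice of transfer function).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output nodes: weighted sum of hidden values plus bias.
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    # Softmax (assumed) turns the scores into class probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy network: 3 inputs, 2 hidden nodes, 3 output classes (arbitrary weights).
x = [0.5, -1.0, 2.0]
w_hidden = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
b_out = [0.0, 0.0, 0.0]
probabilities = forward(x, w_hidden, b_hidden, w_out, b_out)
predicted_class = probabilities.index(max(probabilities))
```

Training would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` iteratively; only the prediction step is shown here.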
A decision tree is used to learn a classification function which predicts the value of a dependent attribute given the values of independent attributes.
In a decision tree, an instance is classified by sorting it down the tree to the appropriate leaf node and then returning the classification associated with that leaf (in this case, one of the transport modes).
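The sorting of an instance down to a leaf can be sketched as below. The tree is hand-built for illustration only: the split variables, thresholds and leaf labels are hypothetical and are not taken from the trees trained in this study.

```python
# Hypothetical tree: internal nodes test one feature against a threshold,
# leaves carry a travel mode.
tree = {
    "feature": "road_distance_km", "threshold": 1.0,
    "left": {"leaf": "Walk"},
    "right": {
        "feature": "num_vehicles", "threshold": 0.5,
        "left": {"leaf": "Public_transport"},
        "right": {"leaf": "Car_driver"},
    },
}

def classify(node, instance):
    """Follow the branches until a leaf is reached, then return its label."""
    while "leaf" not in node:
        go_left = instance[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]

short_trip = {"road_distance_km": 0.4, "num_vehicles": 2}
no_car_trip = {"road_distance_km": 5.0, "num_vehicles": 0}
car_trip = {"road_distance_km": 5.0, "num_vehicles": 1}
modes = [classify(tree, trip) for trip in (short_trip, no_car_trip, car_trip)]
```

Algorithms such as C4.5 and CART learn the split variables and thresholds from training data; the traversal at prediction time works as shown.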
Trip time and departure time are categorical variables (originally continuous variables)
Trip times are grouped into blocks of 20 minutes.
Departure times are grouped into blocks of 1 hour (?)
A common finding in machine learning is that categorising continuous variables can increase classification accuracy.
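The grouping into blocks amounts to integer division. A minimal sketch follows, assuming trip time is given in minutes and departure time in hours after midnight; the function names are illustrative, and the one-hour block width for departure time is marked as uncertain in the notes above.

```python
def trip_time_block(minutes, width=20):
    """Index of the 20-minute block a trip time falls into (0 = 0-19 min)."""
    return int(minutes // width)

def departure_hour_block(hours_after_midnight):
    """Index of the one-hour block of a departure time (assumed block width)."""
    return int(hours_after_midnight) % 24

blocks = [trip_time_block(m) for m in (0, 19, 20, 45)]
hour = departure_hour_block(6.5)
```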
The vertical axis is membership. According to this new graph of fuzzified departure time, the departure time of a trip is classified into the category in which it has the highest membership. For example, departure time 6.30 is classified as ‘evening/night period’ rather than ‘morning peak’ because 6.30 has a higher membership in the former category.
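The rule of picking the category with the highest membership can be sketched with trapezoidal membership functions. The shapes in the original graph are not recoverable from these notes, so the trapezoids below, with a one-hour overlap centred on each period boundary, are an assumption; only the period boundaries (7am, 9am, 3pm, 6pm) come from the slides.

```python
def trapezoid(t, a, b, c, d):
    """Membership rising from a to b, equal to 1 from b to c, falling from c to d."""
    if t <= a or t >= d:
        return 0.0
    if t < b:
        return (t - a) / (b - a)
    if t <= c:
        return 1.0
    return (d - t) / (d - c)

def memberships(t):
    """Membership of departure time t (hours after midnight) in each fuzzy set."""
    # The evening/night set wraps past midnight, so it is also evaluated at t + 24.
    night = max(trapezoid(t, 17.5, 18.5, 30.5, 31.5),
                trapezoid(t + 24, 17.5, 18.5, 30.5, 31.5))
    return {
        "am_peak": trapezoid(t, 6.5, 7.5, 8.5, 9.5),
        "inter_peak": trapezoid(t, 8.5, 9.5, 14.5, 15.5),
        "evening_peak": trapezoid(t, 14.5, 15.5, 17.5, 18.5),
        "evening_night": night,
    }

def classify_departure(t):
    """Return the fuzzy set in which t has the highest membership."""
    m = memberships(t)
    return max(m, key=m.get)

label_0630 = classify_departure(6.5)
label_0715 = classify_departure(7.25)
```

Under these assumed shapes, 6.30 still lies fully in the evening/night set, which agrees with the example above, while 7.15 already belongs mostly to the morning peak.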