2. The Survey Method
• Conventional survey methods for activity-travel diary data
• Application of new data collection methods
– GPS logger (original traces)
– User participation
• Socio-demographic information
• Personal profile
• Downloading and uploading data
• Validating activity-trip agendas
– Web-based prompt recall survey
• Embedded in TraceAnnotator
5. Survey Management
• Time horizon
– 4 waves, each wave takes 3 months
– Each individual is invited for 3 months continuously
• Location
– Rijnmond and Eindhoven regions
• Respondents
– People living in the area
– Companies recruit their own panels
• The statistics that follow use data from the Rijnmond region as an example
6. User Participation (# of days)
[Figure: bar chart of user participation in the Rijnmond area]
Number of days   Percentage of the participation
0~7              19%
8~14             6%
15~31            11%
32~60            5%
60~              59%
• 300 of 434 respondents are fully or partly involved in the survey
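The shares on this slide can be derived from a list of participation days per respondent. The sketch below is only an illustration: the function name, the treatment of the last bin as "more than 60 days", and the sample day counts are assumptions, not part of the survey software.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds of the first four bins; the last bin is open-ended.
# Assumption: "60~" means more than 60 days, so that bins do not overlap.
BIN_UPPER = [7, 14, 31, 60]
BIN_LABELS = ["0~7", "8~14", "15~31", "32~60", "60~"]


def participation_shares(days_per_respondent):
    """Return the percentage of respondents that fall in each day-count bin."""
    counts = Counter(
        BIN_LABELS[bisect_left(BIN_UPPER, days)] for days in days_per_respondent
    )
    total = len(days_per_respondent)
    return {label: 100.0 * counts[label] / total for label in BIN_LABELS}


# Hypothetical day counts for four respondents.
shares = participation_shares([0, 7, 8, 61])

# Share of respondents fully or partly involved, from the slide: 300 of 434.
involved_share = round(100.0 * 300 / 434, 1)
```

With the figures on the slide, `involved_share` is about 69%.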
12. Approach
• Classification of transport modes and activity episodes
– Bayesian Belief Network (BBN)
• Replaces ad hoc rules
• A graphical representation of probabilistic causal information, incorporating sets of conditional probability tables;
• Represents the interrelationship between spatial and temporal factors (input) and the activity-travel pattern (output), i.e. transportation modes and activity episodes;
• Learning-based: accuracy improves as consistent evidence is obtained over time from more samples.
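To make the idea of conditional probability tables concrete, here is a minimal sketch of inference in the simplest possible network: one class node (mode or activity episode) with two evidence nodes as children. The network structure, the variable names `speed` and `displacement`, and all probabilities are hypothetical; the slides do not specify the structure or tables used in TraceAnnotator.

```python
# Hypothetical prior over the class node.
PRIOR = {"activity": 0.4, "walking": 0.3, "car": 0.3}

# Hypothetical conditional probability tables: P(evidence value | class).
CPT = {
    "speed": {
        "activity": {"low": 0.9, "high": 0.1},
        "walking": {"low": 0.8, "high": 0.2},
        "car": {"low": 0.1, "high": 0.9},
    },
    "displacement": {
        "activity": {"small": 0.95, "large": 0.05},
        "walking": {"small": 0.3, "large": 0.7},
        "car": {"small": 0.05, "large": 0.95},
    },
}


def posterior(evidence):
    """Return P(class | evidence) by enumeration over the class node."""
    scores = {}
    for cls, prob in PRIOR.items():
        for variable, value in evidence.items():
            prob *= CPT[variable][cls][value]
        scores[cls] = prob
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


def classify(evidence):
    """Return the class with the highest posterior probability."""
    post = posterior(evidence)
    return max(post, key=post.get)


fast_epoch = classify({"speed": "high"})
slow_moving_epoch = classify({"speed": "low", "displacement": "large"})
```

In a learning-based setting, the tables would be re-estimated as respondents confirm or correct imputed episodes; that step is omitted here.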
15. Theoretical support and applications
• Accuracy of the algorithm
– Limited sample and transportation modes
– Full sample and full transportation modes
• Comparison of different imputation algorithms
• Improve the imputed activity/trip sequence
• Map matching between GPS traces and road networks
• Impact of equity of travel time uncertainty
16. Accuracy of the Algorithm
Source: Anastasia, et al. (2010) Semi-Automatic Imputation of Activity-Travel Diaries Using GPS Traces, Prompted Recall and Context-Sensitive Learning Algorithms. Journal of Transportation Research Record, 2183.
17. Accuracy of the Algorithm
Activity Walking Running Cycling Bus Motorcycle Car Train Metro Tram Light rail
Activity 84% 4% 0% 0% 0% 0% 1% 9% 2% 0% 0%
Walking 2% 97% 0% 0% 1% 0% 0% 0% 0% 0% 0%
Running 0% 0% 98% 0% 1% 0% 1% 0% 0% 0% 0%
Cycling 0% 0% 0% 100% 0% 0% 0% 0% 0% 0% 0%
Bus 1% 0% 0% 0% 87% 0% 0% 0% 0% 12% 0%
Motorcycle 0% 0% 0% 0% 0% 100% 0% 0% 0% 0% 0%
Car 0% 0% 0% 0% 1% 0% 98% 0% 0% 0% 1%
Train 0% 0% 0% 0% 0% 0% 5% 58% 36% 0% 0%
Metro 1% 0% 0% 0% 0% 0% 0% 1% 98% 0% 0%
Tram 0% 0% 0% 0% 0% 0% 2% 0% 0% 98% 0%
Light rail 0% 0% 0% 0% 2% 0% 0% 0% 0% 0% 98%
GPS Only
Activity 84%
Walking 97%
Running 98%
Cycling 100%
Bus 87%
Motorcycle 100%
Car 98%
Train 58%
Metro 98%
Tram 98%
Light rail 98%
Source: Feng, T and Timmermans, H. (2012) Recognition of
transportation mode using GPS and accelerometer data. International
Conference of IATBR, Toronto, Canada, 15-20, July, 2012.
18. Comparison of Imputation Algorithms
Id Algorithms
1 Bayesian Network (BN)
2 Naive Bayesian (NB)
3 Logistic regression (LR)
4 Multilayer Perceptron (MP)
5 Decision Table (DT)
6 Support Vector Machine (SVM)
7 C4.5 (C45)
8 CART (CART)
             Training Data                 Test Data
Algorithm    CCI (%)   ICI (%)   Kappa     CCI (%)   ICI (%)   Kappa
BN           99.805    0.195     0.997     99.474    0.526     0.993
NB           86.966    13.034    0.822     86.648    13.352    0.818
LR           94.865    5.135     0.926     94.510    5.490     0.921
MP           97.118    2.882     0.958     96.816    3.184     0.954
DT           98.886    1.114     0.984     98.100    1.900     0.973
SVM          94.667    5.333     0.923     94.458    5.542     0.920
C45          99.825    0.175     0.998     99.309    0.691     0.990
Table 3 Prediction accuracy and model performance
• Training data and test data
• We use the indicators of the correctly classified
instances (CCI), incorrectly classified instances (ICI) and
Kappa value (Kappa).
• Data are for each time epoch
- WCTRS 2013
Count Percentage
Training data 39,942 75%
Test data 13,316 25%
Total 53,258 100%
Training and test datasets
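The three indicators on this slide can all be computed from a confusion matrix. The sketch below assumes that Kappa refers to Cohen's kappa and that rows hold the true classes and columns the predicted classes; the function name and the example counts are made up for illustration.

```python
def classification_indicators(confusion):
    """Return (CCI %, ICI %, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of instances of true class i
    that were predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(size))

    cci = 100.0 * correct / total  # correctly classified instances
    ici = 100.0 - cci              # incorrectly classified instances

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    observed = correct / total
    kappa = (observed - expected) / (1.0 - expected)
    return cci, ici, kappa


# Hypothetical two-class example with 100 epochs.
cci, ici, kappa = classification_indicators([[40, 10], [5, 45]])
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is reported alongside CCI when classes are unevenly represented.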
19. Comparison of Imputation Algorithms
Table 4 Hit ratios by transportation mode and activity episode
Note: A-Activity episode; B-Train; C-Walking; D-Bike; E-Car; F-Bus; G-Motorbike; H-Running; I-Tram; J-Metro
• BN and C45 appear to perform more stably than the other algorithms.
• Hit ratios for the test data are not necessarily lower than those for the training data, except for BN and C45.
• The hit ratios of the BN model are comparable with those of the other methods.
Training Data A B C D E F G H I J
BN 0.997 0.997 0.999 1 0.999 0.999 1 0.999 1 1
NB 0.848 0.969 0.934 0.799 0.836 0.926 0.949 0.98 1 0.983
LR 0.989 0.991 0.818 0.928 0.891 0.758 0.947 0.76 1 1
MP 0.998 0.974 0.916 0.926 0.965 0.743 0.989 0.985 1 1
DT 0.999 0.971 0.958 0.985 0.979 0.99 0.991 0.974 0.982 0.98
SVM 0.987 0.999 0.76 0.925 0.876 0.888 0.971 0.654 1 1
C45 1 0.999 0.993 0.997 0.997 0.994 0.998 0.999 0.996 0.99
Test Data A B C D E F G H I J
BN 0.996 0.993 0.988 0.997 0.994 0.977 0.999 1 1 0.983
NB 0.849 0.964 0.942 0.789 0.826 0.9 0.946 0.963 1 0.975
LR 0.99 0.994 0.815 0.915 0.882 0.733 0.935 0.752 1 1
MP 0.998 0.976 0.896 0.926 0.962 0.708 0.987 0.974 1 1
DT 0.998 0.948 0.939 0.973 0.97 0.973 0.982 0.963 0.892 0.959
SVM 0.987 0.998 0.763 0.931 0.869 0.844 0.968 0.641 0.985 1
C45 0.998 0.998 0.974 0.992 0.987 0.98 0.991 0.956 1 0.992
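A per-class hit ratio such as those in Table 4 can be read off a confusion matrix row by row. The sketch below assumes the hit ratio of a class is the share of its true instances that were classified correctly; the slides do not define it explicitly, and the labels and counts here are hypothetical.

```python
def hit_ratios(confusion, labels):
    """Return the share of correctly classified instances per true class.

    confusion[i][j] is the number of instances of true class labels[i]
    predicted as class labels[j]. Classes without instances map to None.
    """
    ratios = {}
    for index, (label, row) in enumerate(zip(labels, confusion)):
        row_total = sum(row)
        ratios[label] = row[index] / row_total if row_total else None
    return ratios


# Hypothetical counts for two classes.
ratios = hit_ratios([[9, 1], [2, 8]], ["A", "B"])
```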
20. Superimposing the activity/trip sequence
[Figure: example activity/trip sequence. Labels: L1 = L4 (Home), L2 (Work), L3 (Shop), Sport, L5 (Restaurant); Trips 1 to 6; markers 1 to 3]
• Method 1
o The frequency of the transportation mode that has the highest probability is identified for each trip episode separately. The transportation mode that has the highest frequency for all trips is selected.
• Method 2
o The frequencies of all transportation modes over all trip episodes that belong to the same tour are pooled. The mode with the highest frequency and the highest probabilities is then selected to replace the others.
• Method 3
o For tours with three or more trips, the transportation mode is identified using Method 1 for all trips except the first and the last. The confirmed mode then replaces the modes of the first and last trips.
- NTTS2013
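The three methods can be sketched as majority votes over the mode imputed for each time epoch. This is one possible reading of the slide, under these assumptions: a tour is a list of trips, each trip is a list of the most probable mode per epoch, Method 1 assigns one mode per trip, ties go to the mode seen first, and in Method 3 the "confirmed mode" is the most frequent mode among the inner trips. All function names are hypothetical.

```python
from collections import Counter


def majority_mode(modes):
    """Return the most frequent mode; ties go to the mode encountered first."""
    return Counter(modes).most_common(1)[0][0]


def method_1(tour):
    """One mode per trip: the most frequent epoch-level mode within that trip."""
    return [majority_mode(trip) for trip in tour]


def method_2(tour):
    """One mode per tour: pool the epochs of all trips and take the majority."""
    pooled = [mode for trip in tour for mode in trip]
    return [majority_mode(pooled)] * len(tour)


def method_3(tour):
    """Copy the mode of the inner trips to the first and last trip of the tour."""
    modes = method_1(tour)
    if len(tour) < 3:
        return modes  # the method only applies to three or more trips
    confirmed = majority_mode(modes[1:-1])
    return [confirmed] + modes[1:-1] + [confirmed]


# Hypothetical tour of three trips with per-epoch imputed modes.
tour = [
    ["walking", "walking", "car"],
    ["car", "car", "car", "car"],
    ["train", "train", "car"],
]
result_1 = method_1(tour)
result_2 = method_2(tour)
result_3 = method_3(tour)
```

In this example Method 1 keeps a different mode for each trip, while Methods 2 and 3 both assign car to the whole tour.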
21.
                   Morning peak   Evening peak
Original imputed   60,5%          71,1%
Method 1           65,8%          76,3%
Method 2           76,3%          65,4%
Method 3           63,2%          68,4%
• Hit ratios of car mode (AM vs. PM)
                   BIKE    BUS     CAR     METRO   TRAIN   TRAM    WALKING
Original  BIKE     4,3%    -       6,4%    4,8%    -       5,6%    20,9%
          BUS      4,3%    -       34,6%   9,5%    -       -       21,3%
          CAR      4,3%    42,9%   2,3%    6,3%    57,1%   -       24,4%
          METRO    -       -       0,5%    27,0%   -       -       2,2%
          RUNNING  48,9%   -       0,3%    -       -       -       12,5%
          TRAIN    -       4,8%    42,7%   34,9%   28,6%   -       17,2%
          TRAM     -       47,6%   1,8%    -       -       79,6%   0,9%
          WALKING  38,3%   4,8%    11,5%   17,5%   14,3%   14,8%   0,6%
Method 1  BIKE     34,0%   -       2,8%    4,8%    -       1,9%    14,1%
          BUS      4,3%    4,8%    22,6%   9,5%    -       -       9,7%
          CAR      -       28,6%   26,2%   11,1%   85,7%   -       44,4%
          METRO    -       4,8%    0,8%    23,8%   -       -       1,9%
          RUNNING  34,0%   -       0,3%    -       -       -       5,3%
          TRAIN    -       9,5%    28,8%   33,3%   -       -       14,4%
          TRAM     -       38,1%   1,3%    -       -       72,2%   3,4%
          WALKING  27,7%   14,3%   17,3%   17,5%   14,3%   25,9%   6,9%
Method 2  BIKE     19,1%   -       3,1%    4,8%    -       1,9%    15,0%
          BUS      4,3%    -       19,6%   9,5%    -       -       7,8%
          CAR      2,1%    33,3%   26,7%   11,1%   71,4%   -       44,1%
          METRO    -       -       0,8%    20,6%   -       -       1,6%
          RUNNING  34,0%   -       0,3%    -       -       -       6,3%
          TRAIN    -       9,5%    31,6%   36,5%   14,3%   -       14,4%
          TRAM     -       47,6%   2,0%    -       -       77,8%   2,5%
          WALKING  40,4%   9,5%    16,0%   17,5%   14,3%   20,4%   8,4%
Method 3  BIKE     17,0%   -       4,8%    4,8%    -       1,9%    13,8%
          BUS      4,3%    -       23,2%   9,5%    -       -       14,4%
          CAR      2,1%    38,1%   13,7%   6,3%    57,1%   -       29,7%
          METRO    -       -       1,3%    27,0%   -       -       1,6%
          RUNNING  29,8%   -       0,3%    -       -       5,6%    10,6%
          TRAIN    -       9,5%    34,4%   36,5%   28,6%   -       16,3%
          TRAM     -       38,1%   0,8%    -       -       68,5%   2,5%
          WALKING  46,8%   14,3%   21,6%   15,9%   14,3%   24,1%   11,3%
Total              100,0%  100,0%  100,0%  100,0%  100,0%  100,0%  100,0%
• Confusion matrix of original imputed data and new methods
• The confusion matrix shows that the suggested algorithm could substantially improve the accuracy of the imputation;
• As shown in the hit ratios, all improved methods lead to increased accuracy for morning peak trips relative to the originally imputed data;
• Method 1 is better than the other two methods, especially for the prediction of motorized commute trips during peak times.
22. Feedback from Respondents
• Problems during the survey
– Problems using BT747
• Different Windows systems (64-bit systems)
• Internet browser (Firefox sometimes has problems)
• Can’t download data (complex reasons)
• Can’t upload data (wrong data file or data format)
– Problems with the website
• Small bugs in the website program (improved)
• Multiple persons in the same household (user-account specific)
• Long processing time (not cleaning data)
– Missing days
• Forgotten GPS logger or problematic data (view as a schedule)
23. Other Issues
• Sufficient number of respondents
• Monitoring and reminding respondents
• Completeness of personal profile data (socio-demographics)
• Post data processing