1. Automated Setting of Bus
Schedule Coverage using
Unsupervised Machine Learning
Jihed Khiary, Luis Moreira-Matias, Vitor Cerqueira
Intelligent Transport Systems Group
Social Solutions Research division
NEC Laboratories Europe, Heidelberg, DE
Oded Cats
Dep. Transport and Planning, TU Delft, NL
3. 3 © NEC Corporation 2015
Problem Overview
Tactical Planning Stages:
1. Network and Resource Pool
2. Schedule Number and Coverage
3. Timetabling and Service
4. Service Assignment
5. Problem Overview
[Diagram labels: GPS satellites; Automatic Vehicle Location (AVL) data; Automatic Passenger Counting (APC) data; cloud data servers; analytics]
6. Example of Schedule Coverage
7. Methodology
8. Methodology – 1,2) Daily Profile Generation
Daily profiles are computed from round-trip times. Link travel times are derived from AVL data;
Dwell-time constants (boarding, alighting and dead times) are estimated from APC data;
Robust linear regression (i.e. w/ Huber loss) is used to (over/under) estimate dwell times;
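A minimal sketch of the dwell-time regression step, assuming a linear model of the form dwell = dead time + boarding time × boardings + alighting time × alightings, fitted with Huber loss through iteratively reweighted least squares. The model form, the function names (`huber_dwell_fit`, `solve_linear_system`), the threshold `delta` and the input layout are illustrative assumptions, not taken from the slides, and the (over/under) estimation mentioned above is not reproduced.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def huber_dwell_fit(stops, delta=5.0, iterations=50):
    """Fit [dead_time, time_per_boarding, time_per_alighting] with Huber loss.

    stops: list of (boardings, alightings, dwell_seconds) tuples (assumed layout).
    delta: residual size (seconds) beyond which a stop is down-weighted (assumed value).
    """
    rows = [(1.0, float(b), float(a)) for b, a, _ in stops]
    y = [float(d) for _, _, d in stops]
    weights = [1.0] * len(stops)
    coef = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        # Weighted normal equations: (X^T W X) coef = X^T W y
        xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
                 for j in range(3)] for i in range(3)]
        xtwy = [sum(w * r[i] * t for w, r, t in zip(weights, rows, y))
                for i in range(3)]
        coef = solve_linear_system(xtwx, xtwy)
        residuals = [t - sum(c * v for c, v in zip(coef, r))
                     for r, t in zip(rows, y)]
        # Huber weights: 1 inside the threshold, delta/|residual| outside it
        weights = [1.0 if abs(e) <= delta else delta / abs(e) for e in residuals]
    return coef
```

Calling `huber_dwell_fit(stops)` on per-stop APC records returns the three constants; a single stop with an abnormally long dwell then shifts the estimates far less than it would under ordinary least squares.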
9. Methodology – 3) DTW-Flavoured Distance
Dynamic Time Warping (DTW) is used to compute a distance-like measure between days using those profiles;
A feature matrix is created using DTW. A distance matrix is then created using the Euclidean distance over that feature matrix.
[Figure: DTW between Day i and Day j yields the feature matrix; the $L_2$-distance over it yields the distance matrix]
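The two-stage distance can be sketched as follows. This is an illustration under assumptions: each daily profile is taken to be a plain list of numbers, the feature vector of a day is assumed to be its DTW distances to all days, and the local cost inside DTW is the absolute difference. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def feature_matrix(profiles):
    """Row i holds the DTW distances from day i to every day (assumed layout)."""
    return [[dtw_distance(p, q) for q in profiles] for p in profiles]


def euclidean_distance_matrix(features):
    """Pairwise L2 distance between the rows of the feature matrix."""
    return [[math.dist(u, v) for v in features] for u in features]
```

With `profiles` holding one list per day, `euclidean_distance_matrix(feature_matrix(profiles))` gives the day-to-day distance matrix that a clustering step could consume.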
10. Methodology – 4) Clustering w/ GMM delivered by E-M
(Note: Illustrative Example from synthetic data.)
11. Methodology – 5) Selecting the best route-based k
1) Test a pool of admissible values for $k \in [2, 7]$;
2) Learn a Gaussian Mixture Model (GMM) w/ E-M for each possible k value;
3) Evaluate each model using the Bayesian Information Criterion:
$\mathrm{BIC} = 2 \ln p(x \mid \theta, M) - k \cdot \ln(n)$
$x$ – observed data
$\theta$ – model parameters
$M$ – resulting model
$n$ – sample size
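A small sketch of the selection loop, assuming the log-likelihood of each fitted GMM is already available. It uses the sign convention written on the slide, under which a larger BIC is better, and takes k as it appears in that formula (textbook BIC counts free parameters in the penalty term). The function names and the input dictionary are hypothetical.

```python
import math


def bic(log_likelihood, k, n):
    """BIC as written on the slide: 2 ln p(x | theta, M) - k ln(n)."""
    return 2.0 * log_likelihood - k * math.log(n)


def select_k_by_bic(log_likelihood_by_k, n):
    """Return (best_k, scores); larger BIC is better in this convention.

    log_likelihood_by_k: {k: log-likelihood of the GMM fitted with that k}
                         (hypothetical input produced by the E-M step).
    n: sample size, i.e. number of days being clustered.
    """
    scores = {k: bic(ll, k, n) for k, ll in log_likelihood_by_k.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

For instance, `select_k_by_bic({2: -520.0, 3: -500.0, 4: -499.0}, n=180)` prefers k = 3, because the small likelihood gain from k = 4 does not offset the extra penalty.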
12. Methodology – 5) Selecting the best route-based k
$\mathrm{BIC} = \underbrace{2 \ln p(x \mid \theta, M)}_{\text{goodness of fit}} - \underbrace{k \cdot \ln(n)}_{\text{linear penalty}}$
$x$ – observed data
$\theta$ – model parameters
$M$ – resulting model
$n$ – sample size
Is the best BIC-model a good schedule plan?
• Punctuality;
• Adequacy;
• Interpretability;
• Reliability;
15. Methodology – 5) Selecting the best route-based k
$m(k) = \frac{nbic(k) - f(k)}{2} + q(k) - \sigma(k)$
• Punctuality:
nbic(k), normalized/scaled BIC;
• Adequacy:
$f(k) \in [0, 1]$, logarithmic penalty for high k values;
• Interpretability:
q(k), quality of the produced schedules (sequence mining using PrefixSpan) in terms of the regularity of the proposed outputs;
• Reliability:
$\sigma(k)$, penalty for outliers (e.g. Sundays occasionally grouped with workdays in some weeks);
16. Methodology – 5) Selecting the best route-based k
• Interpretability:
q(k), quality of the produced schedules (sequence mining using PrefixSpan) in terms of the regularity of the proposed outputs;
Two simple ideas:
Consider every week as a transaction and each day in it as an item;
A cluster with a large frequent itemset that has large support is a good-quality cluster (e.g. Monday to Friday), and the converse also holds;
How to use those to find highly interpretable schedules:
1) Use PrefixSpan to find such frequent sequences/itemsets;
2) Compute the one which provides the best support/number-of-items ratio;
3) Use that one to assign a quality value to each cluster;
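The week-as-transaction idea can be illustrated with a short sketch. It is not PrefixSpan: because a week has only seven days, the sketch simply enumerates all weekday subsets and counts their support, which is a brute-force stand-in. How support and itemset size are combined into q(k) is not spelled out here, so the sketch only reports the largest itemset that clears an assumed support threshold. All names, the input layout and `min_support` are hypothetical.

```python
from itertools import combinations

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def weekly_transactions(cluster_days):
    """Group (week_id, weekday) pairs of one cluster into one weekday set per week."""
    weeks = {}
    for week_id, weekday in cluster_days:
        weeks.setdefault(week_id, set()).add(weekday)
    return list(weeks.values())


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every weekday subset reaching min_support.

    Support is the fraction of the cluster's weeks containing the whole itemset.
    Brute-force enumeration (127 subsets), used here instead of PrefixSpan.
    """
    result = {}
    for size in range(1, len(WEEKDAYS) + 1):
        for items in combinations(WEEKDAYS, size):
            hits = sum(1 for t in transactions if t.issuperset(items))
            support = hits / len(transactions)
            if support >= min_support:
                result[items] = support
    return result


def largest_frequent_itemset(transactions, min_support=0.8):
    """Pick the largest frequent itemset, breaking ties by higher support."""
    found = frequent_itemsets(transactions, min_support)
    if not found:
        return (), 0.0
    best = max(found, key=lambda items: (len(items), found[items]))
    return best, found[best]
```

A cluster that contains Monday to Friday in every week yields the five-day itemset with support 1.0, while a cluster whose weekdays change from week to week yields only small or weakly supported itemsets.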
17. Methodology – 5) Selecting the best route-based k
Illustrating BIC limitations on this problem...
[Figure panels: nbic = 0.3987; nbic = 0.4462]
18. Methodology – 5) Selecting the best route-based k
Illustrating how sequence mining can promote easy-to-memorize schedules
[Figure panels: nbic = 0.4166, q = 0.7; nbic = 0.4877, q = 0.33]
19. Methodology – 6) Merging the Results w/ Consensual Clustering
1) Get a consensual K through a weighted average of the quality of each partitioning of each route (where high-frequency routes weigh more than low-frequency ones);
2) Perform Consensual Clustering for the consensual K (using the mean co-association matrix);
3) Interpret the results using a rule induction model (e.g. RIPPER) to extract understandable rules;
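Steps 1 and 2 above can be sketched as follows, under stated assumptions: the consensual K is taken to be the candidate k with the highest weighted-average quality across routes, and the mean co-association matrix records, for each pair of days, the share of route-level partitionings that put them in the same cluster. The function names and input layouts are hypothetical, and the final clustering of the matrix and the rule induction step are left out.

```python
def consensual_k(quality_by_route, route_weights):
    """Pick the k with the best weighted-average quality over all routes.

    quality_by_route: {route: {k: quality of that route's partitioning with k}}
    route_weights:    {route: weight}, e.g. larger for high-frequency routes
    (both layouts are assumptions made for this sketch).
    """
    routes = list(quality_by_route)
    total_weight = sum(route_weights[r] for r in routes)
    candidates = set(quality_by_route[routes[0]])
    for r in routes[1:]:
        candidates &= set(quality_by_route[r])  # only k values scored on every route

    def weighted_quality(k):
        return sum(route_weights[r] * quality_by_route[r][k] for r in routes) / total_weight

    return max(sorted(candidates), key=weighted_quality)


def mean_co_association(partitions):
    """Entry [i][j] is the fraction of partitions placing days i and j together.

    partitions: one list of cluster labels per route, all over the same days.
    """
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    return [[value / len(partitions) for value in row] for row in matrix]
```

The resulting matrix can be read as a similarity between days; one minus it gives a dissimilarity that any standard clustering routine could partition into the consensual K groups.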
20. Real-World Case Study: Swedish Transit Operator
Case Study: SWEDEN (EMEA)
21. Real-World Case Study: Swedish Transit Operator
[Figure labels: Suggested Schedule Switch; Original Schedule Switch]
▌Input: 6 months of AVL/APC data from four distinct routes were used;
▌Output: the method suggests moving a one-month period from Schedule #1 (workdays-Summer) to Schedule #3 (workdays-Winter).
22. Real-World Case Study: Swedish Transit Operator
▌A simulation was used to assess the impact on the operator’s KPIs;
▌The winter timetables were put in place one month earlier;
▌EWT improved by 70%; On-Time Performance (OTP) improved by 10%.
23. Final Remarks
1. We proposed the first methodology to evaluate the reliability of both the number and the coverage of schedules for mass transit operators using AVL/APC data;
2. It involves multiple steps, including constrained profiling with a specific loss function, model-based clustering, k selection using an ad-hoc metric specific to this problem, and a network-wide solution based on consensual clustering;
3. A real-world case study was used to evaluate the methodology. The results suggested keeping the same number of schedules, with a small change to the Schedule Coverage definition;
4. A simulation uncovered the potential benefits of adopting the novel schedule (i.e. +10% On-Time Performance);
24. Thank you for your time!