unischeduler_pakdd_v3

Automated Setting of Bus Schedule Coverage using Unsupervised Machine Learning
Jihed Khiary, Luis Moreira-Matias, Vitor Cerqueira
Intelligent Transport Systems Group
Social Solutions Research division
NEC Laboratories Europe, Heidelberg, DE
Oded Cats
Dep. Transport and Planning, TU Delft, NL

Outline
• Problem Overview
• Methodology
• Real-world Case Study
• Conclusions

© NEC Corporation 2015
Problem Overview
• Tactical Planning Stages:
1. Network and Resource Pool
2. Schedule Number and Coverage
3. Timetabling and Service
4. Service Assignment

Problem Overview
[Figure labels: Automatic Vehicle Location (AVL) Data; GPS Satellites; Automatic Passenger Counting (APC) Data; Cloud Data Servers; Analytics]

Example of Schedule Coverage

Methodology

Methodology – 1,2) Daily Profile Generation
• Daily Profiles are computed based on round-trip times. Link travel times are derived from AVL;
• Dwell time constants (boarding/alighting/dead times) are estimated from APC;
• Robust Linear Regression (i.e. with Huber Loss) is used to (over/under) estimate dwells;
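
As an illustration of the robust regression step, here is a minimal sketch of a Huber-loss line fit computed by iteratively reweighted least squares. It is not the authors' implementation: the single-predictor model (dwell time = dead time + per-passenger time × passengers), the function name `huber_line_fit`, the threshold `delta`, and the sample data are all assumptions made for the example.

```python
def huber_line_fit(x, y, delta=1.0, iterations=50):
    """Fit y ~ intercept + slope * x under Huber loss (IRLS sketch).

    Assumed model: dwell = dead_time + per_passenger_time * passengers.
    Assumes x contains at least two distinct values.
    """
    weights = [1.0] * len(x)
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        # Weighted least squares for the current weights.
        total = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
        sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mean_x) * (yi - mean_y)
                  for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Huber weights: full weight inside delta, down-weighted outside.
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        weights = [1.0 if abs(r) <= delta else delta / abs(r)
                   for r in residuals]
    return intercept, slope


# Hypothetical APC sample: passengers per stop and dwell time in seconds,
# with one anomalous stop (e.g. a long hold) at index 4.
passengers = [0, 1, 2, 3, 4, 5, 6, 7]
dwell_seconds = [5, 7, 9, 11, 60, 15, 17, 19]
dead_time, per_passenger_time = huber_line_fit(passengers, dwell_seconds)
```

The anomalous stop receives a small weight after the first pass, so the fitted constants stay close to those of the regular stops.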

Methodology – 3) DTW-Flavoured Distance
• Dynamic Time Warping (DTW) is used to compute a distance-like measure between days using those profiles;
• A Feature Matrix is created using DTW. A Distance Matrix is then created using the Euclidean Distance over that Feature Matrix.
[Figure: DTW between Day i and Day j → Feature matrix → L2-distance → Distance matrix]
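
The two-stage construction can be sketched as follows. This is a minimal sketch using textbook DTW with an absolute-difference cost, which may differ from the "DTW-flavoured" variant used in the talk. The function names and the toy profiles are assumptions.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]


def dtw_feature_matrix(profiles):
    """Row i holds the DTW cost from day i to every day."""
    return [[dtw_distance(p, q) for q in profiles] for p in profiles]


def euclidean_distance_matrix(features):
    """L2 distance between the feature rows of every pair of days."""
    return [[math.dist(u, v) for v in features] for u in features]


# Hypothetical daily profiles (round-trip times in minutes).
profiles = [
    [10, 12, 15, 12],
    [10, 10, 12, 15, 12],
    [20, 25, 30, 25],
]
features = dtw_feature_matrix(profiles)
distances = euclidean_distance_matrix(features)
```

In this sketch, two days end up close in the final distance matrix when they have similar DTW costs to all other days, not only to each other.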

Methodology – 4) Clustering w/ GMM delivered by E-M
(Note: Illustrative Example from synthetic data.)

Methodology – 5) Selecting the best route-based k
1) Test a pool of admissible values for k ∈ [2, 7];
2) Learn a Gaussian Mixture Model (GMM) with E-M for each possible k value;
3) Evaluate each model using the Bayesian Information Criterion:
$$\mathrm{BIC} = 2 \ln p(x \mid \theta, M) - k \cdot \ln(n)$$
x – observed data
θ – model parameters
M – resulting model
n – sample size
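
The selection loop can be sketched as below, assuming the log-likelihood of each fitted GMM is already available. The sketch follows the formula as written on the slide, with k itself in the penalty term. With this sign convention a larger BIC is better. The function names, the log-likelihood values, and the sample size are hypothetical.

```python
import math


def bic_score(log_likelihood, k, n):
    """BIC as written on the slide: 2 ln p(x | theta, M) - k * ln(n)."""
    return 2.0 * log_likelihood - k * math.log(n)


def select_k(log_likelihoods, n):
    """Return the k with the largest BIC, plus all scores.

    log_likelihoods maps each candidate k to the log-likelihood of the
    model fitted with that k.
    """
    scores = {k: bic_score(ll, k, n) for k, ll in log_likelihoods.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical log-likelihoods for k = 2..7 on n = 180 days of data.
fitted = {2: -520.0, 3: -480.0, 4: -478.0, 5: -477.5, 6: -477.2, 7: -477.0}
best_k, scores = select_k(fitted, n=180)
```

Beyond k = 3 the likelihood gains in this toy input are too small to offset the penalty, which is the trade-off the criterion encodes.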

BIC terms: goodness of fit (first term) and linear penalty (second term)
Is the best BIC-model a good schedule plan?
• Punctuality;
• Adequacy;
• Interpretability;
• Reliability;

𝑚 𝑘 = 𝑛𝑏𝑖𝑐 𝑘 − 𝑓 𝑘 2 + 𝑞 𝑘 − 𝜎 𝑘
• Punctuality: nbic(k), the normalized/scaled BIC;
• Adequacy: f(k) ∈ [0, 1], a logarithmic penalty for high k values; q(k), the quality of the produced schedules (sequence mining using PrefixSpan) in terms of the regularity of the proposed outputs;
• Reliability: σ(k), a penalty for outliers (e.g. Sundays occasionally grouped with workdays in some weeks).

q(k): quality of the produced schedules (sequence mining using PrefixSpan) in terms of the regularity of the proposed outputs
Two simple ideas:
• Consider every week as a transaction and each day in it as an item;
• A cluster with a large frequent itemset that has large support is a good-quality cluster (e.g. Monday to Friday), and the opposite also holds.
How to use these to find highly interpretable schedules:
1) Use PrefixSpan to find such frequent sequences/itemsets;
2) Select the one with the best ratio of support to number of items;
3) Use that one to assign a quality value to each cluster.
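
The week-as-transaction idea can be sketched as below. This sketch does not use PrefixSpan: since a week has at most seven items, it enumerates all itemsets by brute force as a stand-in. The selection rule (largest itemset meeting a minimum support), the `min_support` value, and the toy data are assumptions, not the scoring used in the talk.

```python
from itertools import combinations


def itemset_supports(weeks):
    """Support of every weekday itemset within one cluster.

    weeks: list of sets, one per week, holding the weekdays that the
    clustering assigned to this cluster in that week.
    """
    items = sorted(set().union(*weeks))
    supports = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            hits = sum(1 for week in weeks if set(itemset) <= week)
            supports[itemset] = hits / len(weeks)
    return supports


def largest_frequent_itemset(weeks, min_support=0.8):
    """Pick the largest itemset whose support reaches min_support."""
    supports = itemset_supports(weeks)
    frequent = {s: v for s, v in supports.items() if v >= min_support}
    if not frequent:
        return (), 0.0
    best = max(frequent, key=lambda s: (len(s), frequent[s]))
    return best, frequent[best]


# Hypothetical cluster: Monday to Friday every week, plus one stray Sunday.
workweek = {"Mon", "Tue", "Wed", "Thu", "Fri"}
weeks = [set(workweek) for _ in range(4)] + [workweek | {"Sun"}]
pattern, support = largest_frequent_itemset(weeks)
```

A cluster whose best pattern is both large and well supported maps to a schedule that is easy to state, such as "workdays".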

Illustrating BIC limitations on this problem...
[Figure: two panels, with nbic = 0.3987 and nbic = 0.4462]

Illustrating how Sequence Mining can Promote Easy-to-Memorize Schedules
[Figure: two panels, with (nbic = 0.4166, q = 0.7) and (nbic = 0.4877, q = 0.33)]

Methodology – 6) Merging the Results w/ Consensual Clustering
1) Get a consensual K through a weighted average of the quality of each partitioning of each route (where high-frequency routes weigh more than low-frequency ones);
2) Perform Consensual Clustering for the consensual K (using the mean co-association matrix);
3) Interpret the results using a rule induction model (e.g. RIPPER) to extract understandable rules.
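
The first two steps can be sketched as follows. This is one plausible reading of them, not the authors' code: the function names, the quality values, the route weights, and the toy partitions are assumptions. The final clustering of the co-association matrix and the rule induction step are omitted.

```python
def consensual_k(quality_per_route, route_weights):
    """Pick the k with the best weighted-average quality across routes.

    quality_per_route: {route: {k: quality of that route's partition}}
    route_weights: {route: weight}, e.g. proportional to route frequency.
    """
    routes = list(quality_per_route)
    total = sum(route_weights[r] for r in routes)
    # Only consider k values evaluated on every route.
    candidates = set.intersection(*(set(quality_per_route[r]) for r in routes))
    averaged = {
        k: sum(route_weights[r] * quality_per_route[r][k] for r in routes) / total
        for k in candidates
    }
    return max(averaged, key=averaged.get)


def mean_co_association(partitions):
    """Fraction of partitions in which each pair of days shares a cluster.

    partitions: one list of cluster labels per route, all in the same
    day order.
    """
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    return [[value / len(partitions) for value in row] for row in matrix]


# Hypothetical inputs: two routes, the first one three times as frequent.
quality = {"route_a": {2: 0.5, 3: 0.9}, "route_b": {2: 0.8, 3: 0.4}}
weights = {"route_a": 3.0, "route_b": 1.0}
k_star = consensual_k(quality, weights)

# Hypothetical per-route partitions of the same four days.
co_association = mean_co_association([[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]])
```

Cluster labels need not match across routes, since the co-association matrix only records whether two days share a cluster.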

Real-World Case Study: Swedish Transit Operator
Case Study: SWEDEN (EMEA)

[Figure labels: Suggested Schedule Switch; Original Schedule Switch]
▌Input: 6 months of AVL/APC data from four distinct routes were used;
▌Output: a period of one month is suggested to change from Schedule #1 (workdays-Summer) to Schedule #3 (workdays-Winter).

▌A simulation was used to assess the impact on the operator's KPIs;
▌The winter timetables were put in place one month earlier;
▌Excess Waiting Time (EWT) improved by 70%. On-Time Performance (OTP) improved by 10%.

Final Remarks
1. We proposed the first methodology to evaluate the reliability of both the Number and Coverage of Schedules for mass transit operators using AVL/APC data;
2. It involves multiple steps, including constrained profiling using a specific loss function, model-based clustering, k selection using an ad-hoc metric specific to this problem, and a network-wide solution based on consensual clustering;
3. A real-world case study was used to evaluate the methodology. The results suggested keeping the same number of schedules with a small change to the Schedule Coverage definition;
4. A simulation uncovered the potential benefits of adopting the novel schedule (i.e. +10% On-Time Performance).

Thank you for your time!
