Automated Setting of Bus Schedule Coverage using Unsupervised Machine Learning
Jihed Khiary, Luis Moreira-Matias, Vitor Cerqueira
Intelligent Transport Systems Group
Social Solutions Research division
NEC Laboratories Europe, Heidelberg, DE
Oded Cats
Dep. Transport and Planning, TU Delft, NL
Outline
• Problem Overview
• Methodology
• Real-world Case Study
• Conclusions
3 © NEC Corporation 2015
Problem Overview
• Tactical Planning Stages:
1. Network and Resource Pool
2. Schedule Number and Coverage
3. Timetabling and Service
4. Service Assignment
Problem Overview
[Figure labels: GPS Satellites; Automatic Vehicle Location (AVL) Data; Automatic Passenger Counting (APC) Data; Cloud Data Servers; Analytics]
Example of Schedule Coverage
Methodology
Methodology – 1,2) Daily Profile Generation
• Daily profiles are computed from round-trip times; link travel times are derived from AVL data;
• Dwell time constants (boarding, alighting, and dead times) are estimated from APC data;
• Robust linear regression (i.e. with Huber loss) is used to (over/under)estimate dwell times;
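The robust regression step above can be illustrated with a minimal sketch. It assumes a single-regressor model, dwell = dead time + seconds per boarding × boardings, and fits it with the Huber loss through iteratively reweighted least squares. The model form, the `delta` threshold, the function names and the data are all hypothetical; the slides do not specify them.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def huber_line_fit(xs, ys, delta=5.0, iterations=50):
    """Huber-loss line fit via iteratively reweighted least squares.

    Residuals within +/- delta keep weight 1 (quadratic regime); larger
    residuals get weight delta / |r| (linear regime), which limits the
    influence of outliers. delta is an assumed value, in seconds.
    """
    weights = [1.0] * len(xs)
    intercept, slope = weighted_line_fit(xs, ys, weights)
    for _ in range(iterations):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        weights = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
        intercept, slope = weighted_line_fit(xs, ys, weights)
    return intercept, slope


# Hypothetical APC-style records: boardings at a stop and dwell time in seconds.
boardings = list(range(10))
dwell_seconds = [10.0 + 2.0 * b for b in boardings]
dwell_seconds[9] = 120.0  # one anomalous dwell, e.g. a long hold at the stop

ols_intercept, ols_slope = weighted_line_fit(boardings, dwell_seconds, [1.0] * 10)
dead_time, seconds_per_boarding = huber_line_fit(boardings, dwell_seconds)
print(f"ordinary least squares slope: {ols_slope:.2f}")
print(f"Huber fit: dead time {dead_time:.2f} s, {seconds_per_boarding:.2f} s per boarding")
```

In this toy data the ordinary least-squares slope is pulled far from the underlying 2 s per boarding by the single outlier, while the Huber fit stays much closer to it.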
Methodology – 3) DTW-Flavoured Distance
• Dynamic Time Warping (DTW) is used to compute a distance-like measure between days from those profiles;
• A feature matrix is built using DTW, and a distance matrix is then built by applying the Euclidean distance over that feature matrix.
[Figure: DTW between Day i and Day j fills the feature matrix; the $L_2$-distance over the feature matrix gives the distance matrix]
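A minimal sketch of this two-stage distance follows. It assumes that row i of the feature matrix holds the DTW distances from day i to every day, and that the final distance between two days is the Euclidean distance between their rows. That reading, the absolute-difference local cost and the three daily profiles are assumptions made for illustration only.

```python
import math


def dtw(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def feature_matrix(profiles):
    """Row i holds the DTW distance from day i to every day (assumed layout)."""
    return [[dtw(p, q) for q in profiles] for p in profiles]


def l2_distance_matrix(features):
    """Euclidean distance between every pair of feature-matrix rows."""
    return [[math.dist(u, v) for v in features] for u in features]


# Hypothetical daily profiles (round-trip times in minutes per time slot).
workday_a = [30, 45, 40, 35, 50, 30]
workday_b = [30, 30, 45, 40, 35, 50]  # same shape, shifted by one slot
sunday = [25, 25, 26, 25, 26, 25]

features = feature_matrix([workday_a, workday_b, sunday])
distances = l2_distance_matrix(features)
print(features[0])
print(distances[0])
```

The two workdays differ mainly by a time shift, which DTW absorbs, so they end up much closer to each other than either is to the Sunday profile.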
Methodology – 4) Clustering w/ GMM delivered by E-M
(Note: Illustrative Example from synthetic data.)
Methodology – 5) Selecting the best route-based k
1) Test a pool of admissible values for $k \in [2, 7]$;
2) Learn a Gaussian Mixture Model (GMM) w/ E-M for each possible k value;
3) Evaluate each model using the Bayesian Information Criterion:
$BIC = 2 \ln p(x \mid \theta, M) - k \cdot \ln(n)$
$x$ – observed data
$\theta$ – model parameters
$M$ – resulting model
$n$ – sample size
In this expression, $2 \ln p(x \mid \theta, M)$ measures goodness of fit and $k \cdot \ln(n)$ is a linear penalty.
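The BIC-based selection can be sketched as below, using the expression exactly as written on the slide, so larger scores are better under this sign convention. The log-likelihood values and the sample size are hypothetical placeholders; in practice they would come from the GMMs fitted with E-M, which are not fitted here. Note also that general BIC definitions multiply ln(n) by the number of free model parameters, whereas this sketch follows the slide and uses k.

```python
import math


def bic(log_likelihood, k, n):
    """BIC as written on the slide: 2 ln p(x | theta, M) - k * ln(n)."""
    return 2.0 * log_likelihood - k * math.log(n)


def best_k(log_likelihoods, n):
    """Return the k with the highest BIC, plus all scores."""
    scores = {k: bic(ll, k, n) for k, ll in log_likelihoods.items()}
    return max(scores, key=scores.get), scores


# Hypothetical log-likelihoods of fitted mixtures for k = 2..7 on n = 180 days.
log_likelihoods = {2: -520.0, 3: -495.0, 4: -490.0, 5: -488.5, 6: -487.8, 7: -487.5}
chosen_k, scores = best_k(log_likelihoods, n=180)
print(chosen_k)
for k in sorted(scores):
    print(k, round(scores[k], 2))
```

In this made-up example the fit keeps improving with k, but the penalty outweighs the gain beyond k = 4.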
Is the best BIC-model a good schedule plan?
• Punctuality;
• Adequacy;
• Interpretability;
• Reliability;
Methodology – 5) Selecting the best route-based k
𝑚 𝑘 = 𝑛𝑏𝑖𝑐 𝑘 − 𝑓 𝑘 2 + 𝑞 𝑘 − 𝜎 𝑘
• Punctuality: nbic(k), the normalized/scaled BIC;
• Adequacy: f(k) ∈ [0, 1], a logarithmic penalty for high k values;
• Interpretability: q(k), the quality of the produced schedules in terms of the regularity of the proposed outputs, obtained by sequence mining with PrefixSpan;
• Reliability: σ(k), a penalty for outliers (e.g. Sundays occasionally grouped with workdays in some weeks);
Methodology – 5) Selecting the best route-based k
• Interpretability: q(k), the quality of the produced schedules in terms of the regularity of the proposed outputs, obtained by sequence mining with PrefixSpan.
Two simple ideas:
• Consider every week as a transaction and each day in it as an item;
• A cluster containing a large frequent itemset with high support (e.g. Monday to Friday) is a good-quality cluster, and the opposite is also true;
How to use those to find highly interpretable schedules:
1) Use PrefixSpan to find such frequent sequences/itemsets;
2) Select the one that provides the best ratio of support to number of items;
3) Use that one to assign a quality value to each cluster;
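The week-as-transaction idea can be illustrated with a small sketch. It does not implement PrefixSpan: with only seven possible items, it swaps in a brute-force enumeration of day subsets, and it stops at finding the largest itemset whose support reaches a threshold. The `min_support` value, the function names and the example weeks are hypothetical, and the exact formula that turns support and itemset size into q(k) is not given in the slides, so it is not reproduced here.

```python
from itertools import combinations

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def support(itemset, transactions):
    """Fraction of transactions (weeks) that contain every item (day) of the itemset."""
    hits = sum(1 for week in transactions if itemset <= week)
    return hits / len(transactions)


def largest_frequent_itemset(transactions, min_support=0.8):
    """Brute-force search, largest itemsets first (a stand-in for PrefixSpan)."""
    for size in range(len(DAYS), 0, -1):
        scored = [(support(frozenset(c), transactions), c) for c in combinations(DAYS, size)]
        frequent = [(s, c) for s, c in scored if s >= min_support]
        if frequent:
            best_support, best_days = max(frequent, key=lambda pair: pair[0])
            return best_days, best_support
    return (), 0.0


# Hypothetical cluster: the days assigned to it in each of five weeks.
weeks = [
    {"Mon", "Tue", "Wed", "Thu", "Fri"},
    {"Mon", "Tue", "Wed", "Thu", "Fri"},
    {"Mon", "Tue", "Wed", "Thu", "Fri", "Sun"},  # a Sunday grouped with workdays
    {"Mon", "Tue", "Wed", "Thu", "Fri"},
    {"Mon", "Tue", "Wed", "Thu"},
]
itemset, itemset_support = largest_frequent_itemset(weeks)
print(itemset, itemset_support)
```

Here the cluster is dominated by the Monday-to-Friday pattern, which appears in four of the five weeks, so it would count as a regular, easy-to-describe schedule.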
Methodology – 5) Selecting the best route-based k
Illustrating BIC limitations on this problem...
[Figure: two clustering examples, with nbic = 0.3987 and nbic = 0.4462]
Methodology – 5) Selecting the best route-based k
Illustrating how sequence mining can promote easy-to-memorize schedules
[Figure: two clustering examples, one with nbic = 0.4166 and q = 0.7, the other with nbic = 0.4877 and q = 0.33]
Methodology – 6) Merging the Results w/ Consensual Clustering
1) Get a consensual K through a weighted average of the quality of each partitioning of each route (where high-frequency routes weigh more than low-frequency ones);
2) Perform consensual clustering for the consensual K (using the mean co-association matrix);
3) Interpret the results using a rule induction model (e.g. RIPPER) to extract understandable rules;
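The mean co-association matrix used in step 2 can be sketched as follows. Entry (i, j) is the fraction of route-level partitions in which day i and day j share a cluster, so it does not depend on how each route names its clusters. The three partitions are hypothetical, and the sketch stops at the matrix: the slides do not say how the final consensual clusters are extracted from it, nor how routes are weighted.

```python
def mean_co_association(partitions):
    """Average, over partitions, of the indicator 'days i and j share a cluster'."""
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("every partition must label the same set of days")
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    count = len(partitions)
    return [[value / count for value in row] for row in matrix]


# Hypothetical cluster labels for the same five days on three routes.
partitions = [
    [0, 0, 0, 1, 1],  # route A
    [1, 1, 1, 0, 0],  # route B: same grouping as A, different label names
    [0, 0, 1, 1, 1],  # route C: disagrees about the third day
]
co_association = mean_co_association(partitions)
for row in co_association:
    print([round(value, 2) for value in row])
```

Days that every route groups together get a value of 1, days never grouped together get 0, and disputed days such as the third one fall in between.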
Real-World Case Study: Swedish Transit Operator
Case Study: SWEDEN (EMEA)
Real-World Case Study: Swedish Transit Operator
[Figure labels: Suggested Schedule Switch; Original Schedule Switch]
▌Input: 6 months of AVL/APC data from four distinct routes;
▌Output: a period of one month is suggested to change from Schedule #1 (workdays-Summer) to Schedule #3 (workdays-Winter).
Real-World Case Study: Swedish Transit Operator
▌A simulation was used to assess the impact on the operator's KPIs;
▌The winter timetables were put in place one month earlier;
▌Excess Waiting Time (EWT) improved by 70% and On-Time Performance (OTP) by 10%.
Final Remarks
1. We proposed the first methodology to evaluate the reliability of both the Number and Coverage of Schedules for mass transit operators using AVL/APC data;
2. It involves multiple steps, including constrained profiling using a specific loss function, model-based clustering, k selection using an ad-hoc metric specific to this problem, and a network-wide solution based on consensual clustering;
3. A real-world case study was used to evaluate the methodology. The results suggested keeping the same number of schedules with a small change to the Schedule Coverage definition;
4. A simulation uncovered the potential benefits of adopting the novel schedule (i.e. +10% On-Time Performance);
Thank you for your time!