2. Facts
Over 250 million riders annually
118 miles of track
Over 13 disruptions per day
3. Problem Statement
Highly publicized safety lapses & deferred maintenance
1-year timeframe
Estimated $60,000,000 price tag
Improved safety & reliability?
4. Hypothesis
The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding
regions. The SafeTrack project is meant to increase system safety and reliability. While technical and
operational disruptions are inevitable, we believe that available data can provide insight into how
frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro
commute expectations.
To quantify the outcome, we will explore several scenarios to provide riders with a
clearer picture of their post-SafeTrack commute.
[Figure: five improvement scenarios, Scenario #1 through Scenario #5]
5. Data Ingestion & Wrangling
System Operations Data: used to determine system behavior under optimal conditions
Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays)
Ridership Data: used in conjunction with operational datasets to quantify and extrapolate the scope of Metro delays
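As a rough illustration of how ridership data might scale logged delays into rider impact, the sketch below multiplies each delay by an assumed average number of riders per train. The slides do not show the actual calculation, so the function name, the riders-per-train figures, and the sample records are all hypothetical.

```python
from collections import defaultdict


def rider_minutes_by_line(disruptions, riders_per_train):
    """Sum delay minutes weighted by riders, per Metro line.

    disruptions: iterable of (line_code, delay_minutes) pairs.
    riders_per_train: dict of line_code -> assumed average riders per train.
    """
    totals = defaultdict(float)
    for line, delay_min in disruptions:
        # Each delayed train holds up every rider assumed to be on board.
        totals[line] += delay_min * riders_per_train[line]
    return dict(totals)


# Hypothetical sample values, for illustration only.
sample_disruptions = [("RD", 6), ("RD", 4), ("BL", 10)]
sample_riders = {"RD": 500, "BL": 300}
print(rider_minutes_by_line(sample_disruptions, sample_riders))
```

Weighting by riders rather than counting disruptions alone keeps a short delay on a crowded line from looking smaller than a long delay on an empty one.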
6. The Data
Planned Operating Schedule
Data_Source: wmata.com
Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions
Disruption Data
Data_Source: opendatadc
Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed
The Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior.
7. The Data
24,335 records between April 2012 and July 2016
All Metro lines represented in the dataset
Description of disruption cause, translated as technical or operational
Delay, in minutes
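A minimal sketch of how a free-text disruption cause could be translated into a technical or operational label is shown below. The slides do not say how the mapping was done, so the keyword list, the `Disruption` record layout, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical keywords; the real mapping rules are not given in the slides.
TECHNICAL_KEYWORDS = ("brake", "door", "signal", "track", "mechanical", "power")


@dataclass
class Disruption:
    day: date
    line: str         # line code, e.g. "RD"
    cause: str        # free-text description from the log
    delay_min: float  # delay in minutes


def classify_cause(cause: str) -> str:
    """Label a cause 'technical' if it mentions a keyword, else 'operational'."""
    text = cause.lower()
    if any(word in text for word in TECHNICAL_KEYWORDS):
        return "technical"
    return "operational"


def summarize(records):
    """Return {category: (record count, total delay minutes)}."""
    summary = {}
    for r in records:
        category = classify_cause(r.cause)
        count, total = summary.get(category, (0, 0.0))
        summary[category] = (count + 1, total + r.delay_min)
    return summary
```

Keyword matching is only one option. A hand-labeled lookup table of cause strings would be more accurate if the log uses a fixed vocabulary.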
8. Computation & Analysis: Limitations
Accuracy: ‘Garbage in, garbage out’ concept
Location: station-to-station
Completeness
Compounding delays
Opted to take a two-pronged approach:
1.) Build data product
2.) Develop simulation based on available data
9. Computation & Analysis: Methodology
1.) Calculated the number of minutes of trips per day on each line.
2.) Broke daily delays into five tiers based on severity.
[Figure: Tier 1 through Tier 5 delay breakdown for each of Scenarios 1 through 5]
3.) Built in compounding delays based on expected train departures.
4.) Injected random noise into the system.
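The methodology steps on this slide (severity tiers, compounding delays, random noise) can be sketched roughly as follows. This is not the authors' implementation: the quintile-based tier boundaries, the minimum-gap rule for propagating delays to following trains, and the Gaussian noise model are all assumptions chosen to keep the example small.

```python
import bisect
import random
import statistics


def tier_boundaries(daily_delay_min):
    """Four cut points splitting observed daily delays into five tiers (quintiles)."""
    return statistics.quantiles(daily_delay_min, n=5)


def assign_tier(delay_min, cuts):
    """Tier 1 is the mildest day, Tier 5 the most severe."""
    return bisect.bisect_right(cuts, delay_min) + 1


def propagate_delays(scheduled, own_delay, min_gap):
    """Compound delays: a train cannot depart until min_gap minutes
    after the actual departure of the train ahead of it."""
    actual = []
    for planned, delay in zip(scheduled, own_delay):
        depart = planned + delay
        if actual:
            depart = max(depart, actual[-1] + min_gap)
        actual.append(depart)
    return actual


def add_noise(delays, sigma, rng):
    """Jitter each delay with Gaussian noise, never letting it go negative."""
    return [max(0.0, d + rng.gauss(0.0, sigma)) for d in delays]


# Hypothetical inputs: departures every 6 minutes, second train held 10 minutes.
scheduled = [0, 6, 12, 18]
actual = propagate_delays(scheduled, own_delay=[0, 10, 0, 0], min_gap=2)
knock_on = [a - s for a, s in zip(actual, scheduled)]
noisy = add_noise(knock_on, sigma=1.0, rng=random.Random(42))
print(actual, knock_on, noisy)
```

In this toy run a single 10-minute hold on the second train also delays the two trains behind it, which is the compounding effect the slide refers to.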
14. Conclusions
[Figure: results for Scenario #1 through Scenario #5]
Noticeable improvements in delay time and probability were not realized until higher
scenario parameters were introduced.
Analysis of the results indicates that SafeTrack repairs
must reduce disruption severity and probability by roughly
30% to 50% for Metro riders to experience improved trip
safety and reliability.
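To make the idea of reducing both disruption probability and severity concrete, here is a small Monte Carlo sketch. It does not reproduce the 30% to 50% finding above. The exponential delay distribution, the baseline probability and mean delay, and the function names are hypothetical.

```python
import random


def expected_delay(p_disrupt, mean_delay_min, reduction):
    """Analytic mean delay per trip when probability and severity
    are both scaled by (1 - reduction)."""
    scale = 1.0 - reduction
    return (p_disrupt * scale) * (mean_delay_min * scale)


def simulate_mean_delay(p_disrupt, mean_delay_min, reduction, trips=10_000, seed=0):
    """Simulated mean delay per trip under the same scaling."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    rng = random.Random(seed)
    p = p_disrupt * (1.0 - reduction)
    mean = mean_delay_min * (1.0 - reduction)
    total = 0.0
    for _ in range(trips):
        if rng.random() < p:
            # Assumed delay model: exponential with the reduced mean.
            total += rng.expovariate(1.0 / mean)
    return total / trips


# Hypothetical baseline: 20% of trips disrupted, 8 minutes average delay.
for reduction in (0.0, 0.1, 0.3, 0.5):
    print(reduction, round(simulate_mean_delay(0.2, 8.0, reduction), 2))
```

Because the reduction applies to both probability and severity, expected delay in this sketch falls with the square of the remaining fraction, so small reductions are easy to lose in run-to-run noise.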
15. Conclusions
Improvements in Stochastic System: SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise
Biases & Assumptions: Recognizing biases and stating assumptions is key to data science
Data Quality: The importance of accurate data cannot be overstated
Springboard for Future Work: Our software can be generalized and adapted