Preteckt aims to prevent breakdowns on long-haul trucks by identifying precursor patterns to breakdowns and flagging them before they happen. This talk discusses some of the challenges faced in building this predictive system and some of the techniques we use to address them.
Rory Woods, Lead Data Scientist
3. What Does Preteckt Do?
Prevent on-the-road breakdowns by identifying them many days in advance.
10/17/2016
• Take data from truck sensors
• Analyze and compare them to other trucks
• Monitor trucks in real time to identify breakdowns in advance
4. Preteckt’s Data Science Team
Rory Woods – Lead Data Scientist. PhD in Computational Astrophysics, with experience in high-performance computing.
Bertrand Brelier – Data Scientist. Former research scientist at IBM and data scientist at Numeris. PhD in Physics.
Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions. PhD in Computational Astrophysics.
Ben Keller – PhD student in Computational Astrophysics.
Jim Reilly – Professor of ECE, with interests in signal processing and machine learning techniques.
Ken Sills – CTO. 15 years of experience in data analytics; Master of Electrical and Computer Engineering.
5. We use proprietary hardware, with a built-in microcomputer, to
gain access to the data generated on a truck.
• Use a small computer with cellular access
• Sniff the ECU bus on the truck
• Record and sync all data to servers
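The deck does not say which bus protocol the hardware sniffs, but heavy trucks commonly broadcast engine and sensor data over SAE J1939 on a CAN bus. Assuming that protocol, a minimal sketch of extracting the Parameter Group Number (PGN), which identifies what kind of message a 29-bit CAN frame carries, might look like:

```python
def j1939_pgn(can_id):
    """Extract the 18-bit PGN from a 29-bit SAE J1939 CAN identifier.

    Identifier layout (high to low bits): 3-bit priority, EDP, DP,
    8-bit PDU Format (PF), 8-bit PDU Specific (PS), 8-bit source address.
    """
    pdu_format = (can_id >> 16) & 0xFF
    data_page = (can_id >> 24) & 0x03   # EDP + DP bits
    pdu_specific = (can_id >> 8) & 0xFF
    if pdu_format < 240:
        # PDU1 (destination-specific): PS is a target address, not part of the PGN
        return (data_page << 16) | (pdu_format << 8)
    # PDU2 (broadcast): PS is the group extension and belongs to the PGN
    return (data_page << 16) | (pdu_format << 8) | pdu_specific

# EEC1 (engine speed/torque) frame from source address 0:
print(j1939_pgn(0x0CF00400))  # 61444 (0xF004)
```

This is an illustrative sketch, not Preteckt's actual decoder; the "conversion functions written by hand" mentioned later would map each PGN's payload bytes to physical units.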
7. Finding Useful Sensors
O(10⁴) – All sensors
  ↓ Drop proprietary, undocumented sensors
O(10³) – Documented sensors
  ↓ Drop unavailable sensors
O(500) – Available on any one truck
  ↓ Write conversion functions by hand; drop "bad" sensors (garbage data, constant values)
O(100) – Good sensors
  ↓ Method-specific feature selection
O(50) – Relevant sensors
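The "drop bad sensors" step in the funnel above can be sketched as a simple filter. This is a hypothetical helper (the function name and thresholds are assumptions, not Preteckt's code) that drops columns which are mostly missing or effectively constant:

```python
import numpy as np
import pandas as pd

def drop_bad_sensors(df, min_std=1e-6, max_nan_frac=0.5):
    """Drop sensor columns that are mostly missing or effectively constant."""
    keep = []
    for col in df.columns:
        series = df[col]
        if series.isna().mean() > max_nan_frac:
            continue  # mostly missing -> treat as garbage
        if series.std(skipna=True) < min_std:
            continue  # constant value carries no signal
        keep.append(col)
    return df[keep]

# Toy example: one constant sensor, one mostly-missing sensor, one good sensor
df = pd.DataFrame({
    "constant": [1.0] * 100,
    "sparse": [np.nan] * 90 + [1.0] * 10,
    "good": np.linspace(0, 10, 100),
})
print(drop_bad_sensors(df).columns.tolist())  # ['good']
```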
11. Unlabeled Data
Labeling breakdowns is currently the biggest bottleneck!
1. Create labels from sensors
   - Sensor a = 1 if part x is not functioning correctly
   - Sensor a > threshold = bad
2. Use unsupervised learning techniques
   - Clustering
   - Anomaly detection (start with this)
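The slides don't specify which anomaly-detection method is used; a minimal illustrative baseline (an assumption on my part, not Preteckt's approach) is to flag sensor readings far from the mean in z-score terms:

```python
import numpy as np

def zscore_anomalies(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

# Synthetic sensor trace with one injected spike
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
x[500] = 10.0  # injected anomaly
flags = zscore_anomalies(x)
print(flags[500])  # True
```

Real deployments would likely prefer rolling statistics or multivariate methods, since truck sensors are non-stationary, but the labeling idea is the same: anomalous points become candidate labels.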
12. Predicting Rates of Change
Goal: predict the time derivative of sensor x.
Preprocessing:
1. Apply the data cleaning described above
2. Smooth x using a rolling window
3. Take the derivative of x
4. Smooth dx/dt using a rolling window
[Plot: sensor x and its derivative dx/dt vs. time (s)]
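The preprocessing steps above can be sketched with pandas rolling windows. The sample rate, window width, and the synthetic trend are my assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor trace sampled at 1 Hz: slow upward trend + noise
t = np.arange(600)  # seconds
rng = np.random.default_rng(1)
x = pd.Series(50.0 + 0.01 * t + rng.normal(0.0, 0.5, t.size))

window = 30  # 30-second rolling window (assumed width)

# Step 2: smooth the raw sensor
x_smooth = x.rolling(window, center=True, min_periods=1).mean()
# Step 3: finite-difference derivative (1 s sample spacing)
dxdt = x_smooth.diff()
# Step 4: smooth the derivative as well
dxdt_smooth = dxdt.rolling(window, center=True, min_periods=1).mean()
```

Smoothing before and after differencing matters because differentiation amplifies high-frequency noise; the recovered mean slope here should sit near the true 0.01 units/s.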
13. Predicting Rates of Change
Method                   R score
Ordinary Least Squares   ~0.05
Lasso, Ridge, LARS       ~0.02–0.15
Partial Least Squares    ~0.2
Avoid predicting continuous variables!
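Assuming the "R score" in the table is the usual coefficient of determination (the slides don't say), it can be computed as one minus the ratio of residual to total sum of squares, so scores near 0 (as above) mean the regressions barely beat predicting the mean:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))                      # 1.0 (perfect fit)
print(r2_score(y, np.full(4, y.mean())))   # 0.0 (mean baseline)
```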
14. Predicting Events
Label "events" as points where sensor y = 1.
1. Pre-process the data (scaling, etc.)
2. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)
3. Create N label columns representing "event occurs within x hours = True"
4. Use feature selection to reduce the sensor set (PCA, mRMR)
5. Run classifiers to predict each lead time (good results with logistic regression and SVM)
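The lead-time labeling step above can be sketched as a forward-looking rolling maximum over the event column. This is an illustrative helper (the function name and sample rate are assumptions), using the reverse/rolling/reverse trick because pandas rolling windows look backward:

```python
import numpy as np
import pandas as pd

def add_lead_time_labels(df, event_col="event",
                         lead_hours=(3, 6, 12, 24, 48, 72),
                         samples_per_hour=60):
    """For each lead time h, add a column that is True when an event
    occurs within the next h hours (including the current sample)."""
    out = df.copy()
    for h in lead_hours:
        window = h * samples_per_hour
        # Forward-looking window: reverse, backward rolling max, reverse back
        fwd = out[event_col][::-1].rolling(window, min_periods=1).max()[::-1]
        out[f"event_in_{h}h"] = fwd.astype(bool)
    return out

# Toy trace sampled once per "hour", with a single event at index 100
df = pd.DataFrame({"event": np.zeros(200)})
df.loc[100, "event"] = 1
labeled = add_lead_time_labels(df, lead_hours=(3,), samples_per_hour=1)
```

Each of the N resulting boolean columns then becomes the target for its own binary classifier.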
15. Predicting Events
Lead Time (hours)   F1, R (roughly the same for all)
3                   0.96
6                   0.95
12                  0.81
24                  0.70
48                  0.70
72                  0.75
Note: the frequency of y = 1 is very roughly once every 48–72 hours.
16. Probability of y = 1 in the Next 24 Hours
[Plot: predicted P(y = 1 within 24 hr) and target vs. time (s); the predicted probability rises ahead of the point where the truck shuts down and y = 1.]
Note: the model was trained only on data where y ≠ 1.
17. Future Plans
• Identify other sensors and repeat the above process
• Once we have enough labeled breakdowns, apply the above procedure to breakdowns themselves
• Recurrent neural networks
• With a large number of labels, we can do survival analysis