Preteckt aims to prevent breakdowns on long-haul trucks by identifying precursor patterns to breakdowns and flagging them before they happen. This talk discusses some of the challenges faced in building this predictive system and some of the techniques we use to address them.
Rory Woods, Lead Data Scientist
3. What Does Preteckt Do?
Prevent on-the-road breakdowns by identifying them many days in advance.
10/17/2016
• Take data from truck sensors
• Analyze and compare them to other trucks
• Monitor trucks in real time to identify breakdowns in advance
4. Preteckt’s Data Science Team
Rory Woods – Lead Data Scientist. PhD in Computational Astrophysics, with experience in high-performance computing.
Bertrand Brelier – Data Scientist. Former research scientist at IBM and data scientist at Numeris. PhD in Physics.
Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions. PhD in Computational Astrophysics.
Ben Keller – PhD student in Computational Astrophysics.
Jim Reilly – Professor of ECE, with interests in signal processing and machine learning techniques.
Ken Sills – CTO. 15 years of experience in data analytics; Master of Electrical and Computer Engineering.
5. We use proprietary hardware, with a built-in microcomputer, to
gain access to the data generated on a truck.
• Use a small computer with cellular access
• Sniff the ECU bus on the truck
• Record and sync all data to servers
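The deck does not say which bus protocol the hardware sniffs, but heavy trucks commonly broadcast engine and sensor data over SAE J1939 on a CAN bus. Assuming that protocol, a minimal sketch of extracting the Parameter Group Number (PGN), which identifies what kind of message a 29-bit CAN frame carries, might look like:

```python
def j1939_pgn(can_id):
    """Extract the 18-bit PGN from a 29-bit SAE J1939 CAN identifier.

    Identifier layout (high to low bits): 3-bit priority, EDP, DP,
    8-bit PDU Format (PF), 8-bit PDU Specific (PS), 8-bit source address.
    """
    pdu_format = (can_id >> 16) & 0xFF
    data_page = (can_id >> 24) & 0x03   # EDP + DP bits
    pdu_specific = (can_id >> 8) & 0xFF
    if pdu_format < 240:
        # PDU1 (destination-specific): PS is a target address, not part of the PGN
        return (data_page << 16) | (pdu_format << 8)
    # PDU2 (broadcast): PS is the group extension and belongs to the PGN
    return (data_page << 16) | (pdu_format << 8) | pdu_specific

# EEC1 (engine speed/torque) frame from source address 0:
print(j1939_pgn(0x0CF00400))  # 61444 (0xF004)
```

This is an illustrative sketch, not Preteckt's actual decoder; the "conversion functions written by hand" mentioned later would map each PGN's payload bytes to physical units.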
7. Finding Useful Sensors
O(10⁴) – All sensors
  ↓ Drop proprietary, undocumented sensors
O(10³) – Documented sensors
  ↓ Drop unavailable sensors
O(500) – Available on any one truck
  ↓ Write conversion functions by hand; drop "bad" sensors (garbage data, constant values)
O(100) – Good sensors
  ↓ Method-specific feature selection
O(50) – Relevant sensors
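The "drop bad sensors" step in the funnel above can be sketched as a simple filter. This is a hypothetical helper (the function name and thresholds are assumptions, not Preteckt's code) that drops columns which are mostly missing or effectively constant:

```python
import numpy as np
import pandas as pd

def drop_bad_sensors(df, min_std=1e-6, max_nan_frac=0.5):
    """Drop sensor columns that are mostly missing or effectively constant."""
    keep = []
    for col in df.columns:
        series = df[col]
        if series.isna().mean() > max_nan_frac:
            continue  # mostly missing -> treat as garbage
        if series.std(skipna=True) < min_std:
            continue  # constant value carries no signal
        keep.append(col)
    return df[keep]

# Toy example: one constant sensor, one mostly-missing sensor, one good sensor
df = pd.DataFrame({
    "constant": [1.0] * 100,
    "sparse": [np.nan] * 90 + [1.0] * 10,
    "good": np.linspace(0, 10, 100),
})
print(drop_bad_sensors(df).columns.tolist())  # ['good']
```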
11. Unlabeled Data
Labeling breakdowns is currently the biggest bottleneck!
1. Create labels from sensors
   - Sensor a = 1 if part x is not functioning correctly
   - Sensor a > threshold = bad
2. Use unsupervised learning techniques
   - Clustering
   - Anomaly detection (start with this)
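The slides don't specify which anomaly-detection method is used; a minimal illustrative baseline (an assumption on my part, not Preteckt's approach) is to flag sensor readings far from the mean in z-score terms:

```python
import numpy as np

def zscore_anomalies(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

# Synthetic sensor trace with one injected spike
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
x[500] = 10.0  # injected anomaly
flags = zscore_anomalies(x)
print(flags[500])  # True
```

Real deployments would likely prefer rolling statistics or multivariate methods, since truck sensors are non-stationary, but the labeling idea is the same: anomalous points become candidate labels.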
12. Predicting Rates of Change
Goal: predict the time derivative of sensor x.
Preprocessing:
1. Apply the data cleaning described above
2. Smooth x using a rolling window
3. Take the derivative of x
4. Smooth dx/dt using a rolling window
[Plot: sensor x and its derivative dx/dt vs. time (s)]
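The preprocessing steps above can be sketched with pandas rolling windows. The sample rate, window width, and the synthetic trend are my assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor trace sampled at 1 Hz: slow upward trend + noise
t = np.arange(600)  # seconds
rng = np.random.default_rng(1)
x = pd.Series(50.0 + 0.01 * t + rng.normal(0.0, 0.5, t.size))

window = 30  # 30-second rolling window (assumed width)

# Step 2: smooth the raw sensor
x_smooth = x.rolling(window, center=True, min_periods=1).mean()
# Step 3: finite-difference derivative (1 s sample spacing)
dxdt = x_smooth.diff()
# Step 4: smooth the derivative as well
dxdt_smooth = dxdt.rolling(window, center=True, min_periods=1).mean()
```

Smoothing before and after differencing matters because differentiation amplifies high-frequency noise; the recovered mean slope here should sit near the true 0.01 units/s.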
13. Predicting Rates of Change
Method                   R score
Ordinary Least Squares   ~0.05
Lasso, Ridge, LARS       ~0.02–0.15
Partial Least Squares    ~0.2
Avoid predicting continuous variables!
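Assuming the "R score" in the table is the usual coefficient of determination (the slides don't say), it can be computed as one minus the ratio of residual to total sum of squares, so scores near 0 (as above) mean the regressions barely beat predicting the mean:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))                      # 1.0 (perfect fit)
print(r2_score(y, np.full(4, y.mean())))   # 0.0 (mean baseline)
```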
14. Predicting Events
Label "events" as points where sensor y = 1.
1. Pre-process the data (scaling, etc.)
2. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)
3. Create N label columns representing "event occurs within x hours = True"
4. Use feature selection to reduce the sensor set (PCA, mRMR)
5. Run classifiers to predict each lead time (good results with logistic regression and SVM)
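The lead-time labeling step above can be sketched as a forward-looking rolling maximum over the event column. This is an illustrative helper (the function name and sample rate are assumptions), using the reverse/rolling/reverse trick because pandas rolling windows look backward:

```python
import numpy as np
import pandas as pd

def add_lead_time_labels(df, event_col="event",
                         lead_hours=(3, 6, 12, 24, 48, 72),
                         samples_per_hour=60):
    """For each lead time h, add a column that is True when an event
    occurs within the next h hours (including the current sample)."""
    out = df.copy()
    for h in lead_hours:
        window = h * samples_per_hour
        # Forward-looking window: reverse, backward rolling max, reverse back
        fwd = out[event_col][::-1].rolling(window, min_periods=1).max()[::-1]
        out[f"event_in_{h}h"] = fwd.astype(bool)
    return out

# Toy trace sampled once per "hour", with a single event at index 100
df = pd.DataFrame({"event": np.zeros(200)})
df.loc[100, "event"] = 1
labeled = add_lead_time_labels(df, lead_hours=(3,), samples_per_hour=1)
```

Each of the N resulting boolean columns then becomes the target for its own binary classifier.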
15. Predicting Events
Lead Time (hours)   F1, R (roughly the same for all)
3                   0.96
6                   0.95
12                  0.81
24                  0.70
48                  0.70
72                  0.75
Note: the frequency of y = 1 is very roughly once every 48–72 hours.
16. Probability of y = 1 in the Next 24 Hours
[Plot: predicted P(y = 1 within 24 hr) and target vs. time (s); the predicted probability rises ahead of the point where the truck shuts down and y = 1.]
Note: the model was trained only on data where y ≠ 1.
17. Future Plans
• Identify other sensors and repeat the above process
• Once we have enough labeled breakdowns, apply the above procedure to breakdowns themselves
• Recurrent neural networks
• With a large number of labels, we can do survival analysis