Big Data for Big Rigs
Predicting Truck Breakdowns
Rory Woods
Lead Data Scientist
9-24-2016
Overview
10/17/2016
• What is Preteckt?
• Getting and working with the data
• Making predictions work
• Conclusion
What Does Preteckt Do?
Prevent on-the-road breakdowns by identifying them many days in advance.
• Take data from truck sensors
• Analyze and compare them to other trucks
• Monitor trucks in real time to identify breakdowns in advance
Preteckt’s Data Science Team
Rory Woods – Lead Data Scientist
PhD in Computational Astrophysics with experience in high-performance computing.
Bertrand Brelier – Data Scientist
Former research scientist at IBM and data scientist at Numeris. PhD in Physics.
Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions
PhD in Computational Astrophysics.
Ben Keller – PhD student in Computational Astrophysics.
Jim Reilly – Professor of ECE
Interests in signal processing and machine learning techniques.
Ken Sills – CTO
15 years of experience in data analytics; Master of Electrical and Computer Engineering.
We use proprietary hardware, with a built-in microcomputer, to gain access to the data generated on a truck.
• Use small computer with cellular access
• Sniff ECU bus on truck
• Record and sync all data to servers
Data Flow Within Preteckt
Finding Useful Sensors
O(10^4) – All sensors
    ↓ Drop proprietary, undocumented sensors
O(10^3) – Documented sensors
    ↓ Drop unavailable sensors
O(500) – Available on any one truck
    ↓ Write conversion functions by hand; drop “bad” sensors (garbage data, constant values)
O(100) – Good sensors
    ↓ Method-specific feature selection
O(50) – Relevant sensors
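A hand-written conversion function turns the raw bytes seen on the bus into a physical value. The sketch below assumes a simple linear encoding (raw integer times a scale, plus an offset). The `SensorSpec` type, the byte layout, and the scale and offset are hypothetical illustrations, not Preteckt's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorSpec:
    """Hypothetical description of how one sensor is encoded in a bus payload."""
    name: str
    start_byte: int   # index of the first payload byte holding the value
    length: int       # number of bytes used by the value
    scale: float      # physical units per raw count
    offset: float     # physical value when the raw count is zero
    unit: str


def convert(spec: SensorSpec, payload: bytes) -> float:
    """Decode one sensor value from a payload, assuming little-endian linear encoding."""
    raw_bytes = payload[spec.start_byte:spec.start_byte + spec.length]
    if len(raw_bytes) != spec.length:
        raise ValueError(f"payload too short for sensor {spec.name!r}")
    raw = int.from_bytes(raw_bytes, byteorder="little")
    return raw * spec.scale + spec.offset


# Hypothetical spec: a two-byte engine speed at bytes 3-4, 0.125 rpm per count.
ENGINE_SPEED = SensorSpec("engine_speed", start_byte=3, length=2,
                          scale=0.125, offset=0.0, unit="rpm")

payload = bytes([0xFF, 0xFF, 0xFF, 0x80, 0x25, 0xFF, 0xFF, 0xFF])
print(convert(ENGINE_SPEED, payload), ENGINE_SPEED.unit)  # 1200.0 rpm
```

Keeping the encoding in a small data record means each newly documented sensor needs only a new `SensorSpec`, not a new function.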
Data Attributes
Time  Voltage  Engine Speed  Fuel  Speed  Pressure
1     14.1     1200          120   90     300
2     14.0     -             -     92     300
3     14.1     -             -     512    300
4     13.9     1230          -     92     300
5     14.1     -             -     -      300
6     14.1     -             -     520    300
7     14.0     -             -     92     300
8     14.1     -             119   518    300
9     13.9     1260          -     90     300
Annotations: Irregular, High Frequency, Low Frequency, Bad Readings, Constant Readings
Typical Pre-processing
Time  Voltage  Engine Speed  Fuel  Speed     Pressure
1     14.1     1200          120   90        300
2     14.0     1210          120   92        300
3     14.1     1220          120   512 → 92  300
4     13.9     1230          120   92        300
5     14.1     1235          120   92        300
6     14.1     1240          120   520 → 92  300
7     14.0     1245          120   92        300
8     14.1     1250          119   518 → 91  300
9     13.9     1255          119   90        300
• Drop garbage
• Drop 0 variance
• Interpolate, OR forward fill
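The three pre-processing steps can be sketched in plain Python using only the standard library. This is a minimal illustration, not the pipeline used in the talk. The valid range for Speed (0 to 200) and the choice of which column to interpolate are assumptions, and the filled values will not match the illustrative numbers on the slide exactly.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Column = List[Optional[float]]


def drop_garbage(values: Column, lo: float, hi: float) -> Column:
    """Replace readings outside [lo, hi] with None (treated as missing)."""
    return [v if v is not None and lo <= v <= hi else None for v in values]


def is_constant(values: Column) -> bool:
    """True if the column has at most one distinct reading (zero variance)."""
    return len({v for v in values if v is not None}) <= 1


def forward_fill(values: Column) -> Column:
    """Carry the last known reading forward over missing entries."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def interpolate(times: Sequence[float], values: Column) -> Column:
    """Linearly interpolate between known readings; leading/trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (times[i] - times[a]) / (times[b] - times[a])
            out[i] = values[a] + frac * (values[b] - values[a])
    return out


def preprocess(times: Sequence[float], table: Dict[str, Column],
               valid_ranges: Dict[str, Tuple[float, float]],
               interpolate_cols: Iterable[str] = ()) -> Dict[str, Column]:
    """Drop garbage, drop constant columns, then interpolate or forward fill."""
    interpolate_cols = set(interpolate_cols)
    cleaned = {}
    for name, values in table.items():
        if name in valid_ranges:
            values = drop_garbage(values, *valid_ranges[name])
        if is_constant(values):
            continue  # zero-variance column carries no information
        if name in interpolate_cols:
            values = interpolate(times, values)
        else:
            values = forward_fill(values)
        cleaned[name] = values
    return cleaned


# Raw readings from the "Data Attributes" slide; None marks a missing reading.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9]
raw = {
    "voltage": [14.1, 14.0, 14.1, 13.9, 14.1, 14.1, 14.0, 14.1, 13.9],
    "engine_speed": [1200, None, None, 1230, None, None, None, None, 1260],
    "fuel": [120, None, None, None, None, None, None, 119, None],
    "speed": [90, 92, 512, 92, None, 520, 92, 518, 90],
    "pressure": [300] * 9,
}
cleaned = preprocess(times, raw, valid_ranges={"speed": (0, 200)},
                     interpolate_cols={"engine_speed"})
for name, column in cleaned.items():
    print(name, column)
```

Interpolation suits slowly varying signals, while forward fill avoids inventing intermediate values for signals that change in steps.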
Unlabeled Data
Time  Voltage  Engine Speed  Fuel  Speed  LABEL
1     14.1     1200          120   90     0
2     14.0     1210          120   92     0
3     14.1     1220          120   92     0
4     13.9     1230          120   92     0
5     14.1     1235          120   92     0
6     14.1     1240          120   92     0
7     14.0     1245          120   92     0
8     14.1     1250          119   91     1
9     13.9     1255          119   90     1
Annotations: “Truck breaks down here”; a “?” beside each of the nine rows.
Unlabeled Data
Labeling breakdowns is currently the biggest bottleneck!
1. Create labels from sensors
   - Sensor a = 1 if part x is not functioning correctly
   - Sensor a > threshold = bad
2. Use unsupervised learning techniques
   - Clustering
   - Anomaly detection
Slide callout: Start with this
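Creating labels from a sensor can be as simple as the two rules listed under option 1. The sketch below is a minimal illustration. The function names, the sample readings, and the threshold value are hypothetical.

```python
from typing import List, Optional, Sequence


def label_from_flag(readings: Sequence[Optional[int]], bad_value: int = 1) -> List[int]:
    """Label a row 1 when a status sensor reports the 'not functioning' value."""
    return [1 if r == bad_value else 0 for r in readings]


def label_from_threshold(readings: Sequence[Optional[float]], threshold: float) -> List[int]:
    """Label a row 1 when a continuous sensor exceeds the threshold; missing counts as 0."""
    return [1 if r is not None and r > threshold else 0 for r in readings]


# Hypothetical readings: a status flag and a temperature-like sensor.
status = [0, 0, 1, 0, None, 1]
temperature = [88.0, 91.5, 104.2, None, 99.9, 110.0]

print(label_from_flag(status))                   # [0, 0, 1, 0, 0, 1]
print(label_from_threshold(temperature, 100.0))  # [0, 0, 1, 0, 0, 1]
```

Missing readings are labeled 0 here; whether that is appropriate depends on how the sensor reports faults.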
Predicting Rates of Change
Goal: Predict the time derivative of sensor X
Preprocessing:
1. Apply the data cleaning described above
2. Smooth X using a rolling window
3. Take the time derivative of the smoothed X
4. Smooth dX/dt using a rolling window
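The smooth, differentiate, smooth sequence above can be sketched with the standard library alone. The centered window, the window size of 5, and the finite-difference scheme are assumptions made for illustration; the talk does not specify them.

```python
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Centered rolling mean (odd window assumed); the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def derivative(times: Sequence[float], values: Sequence[float]) -> List[float]:
    """Finite-difference derivative: central inside, one-sided at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to differentiate")
    out = []
    for i in range(n):
        a, b = max(0, i - 1), min(n - 1, i + 1)
        out.append((values[b] - values[a]) / (times[b] - times[a]))
    return out


def smoothed_rate_of_change(times: Sequence[float], x: Sequence[float],
                            window: int = 5) -> List[float]:
    """Smooth X, differentiate it, then smooth dX/dt."""
    x_smooth = rolling_mean(x, window)
    dxdt = derivative(times, x_smooth)
    return rolling_mean(dxdt, window)


# Synthetic example: a linear ramp X = 2t + 1, whose true derivative is 2.
times = list(range(20))
x = [2 * t + 1 for t in times]
rates = smoothed_rate_of_change(times, x)
print(rates[8:12])
```

Away from the edges the ramp's derivative is recovered exactly; near the edges the shrinking window biases the estimate, so edge samples may be worth discarding.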
[Figure: Sensor X and dX/dt versus Time (s)]
Predicting Rates of Change
Method                    R Score
Ordinary Least Squares    ~0.05
Lasso, Ridge, LARS        ~0.02–0.15
Partial Least Squares     ~0.2
Avoid Predicting Continuous Variables!
Predicting Events
Label “events” as points when sensor Y = 1.
1. Pre-process data (scaling, etc.)
2. Create N label columns representing “Event occurs in x hours = True”
3. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)
4. Do feature selection to reduce sensors (PCA, mRMR)
5. Run classifiers to predict lead times (good results with logistic regression and SVM)
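The label columns in step 2 can be built as sketched below, with one boolean column per lead time. The sketch reads "event occurs in x hours" as "an event occurs strictly after the sample and no more than x hours later", which is an assumption. The default lead times follow the results table in this deck (3, 6, 12, 24, 48, 72 hours), and the column naming is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def lead_time_labels(sample_times_h: Sequence[float],
                     event_times_h: Sequence[float],
                     lead_times_h: Sequence[float] = (3, 6, 12, 24, 48, 72),
                     ) -> Dict[str, List[bool]]:
    """For each lead time, flag samples that have an event within that many hours ahead."""
    events = sorted(event_times_h)
    labels: Dict[str, List[bool]] = {}
    for lead in lead_times_h:
        column = []
        for t in sample_times_h:
            i = bisect_right(events, t)  # index of the first event strictly after t
            column.append(i < len(events) and events[i] - t <= lead)
        labels[f"event_in_{lead}h"] = column
    return labels


# Hypothetical example: samples every 10 hours, one event at hour 25.
labels = lead_time_labels([0, 10, 20, 30], [25])
for name, column in labels.items():
    print(name, column)
```

Each column can then serve as the target for a separate classifier, one per lead time.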
Predicting Events
Lead Time (hours)    F1, R (roughly the same for all)
3                    0.96
6                    0.95
12                   0.81
24                   0.70
48                   0.70
72                   0.75
Note: Frequency of Y = 1 is very roughly once every 48-72 hours.
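For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from binary labels. The example labels are made up for illustration and are not Preteckt data, and the slide does not say how "R" was computed.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)       # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)   # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)   # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```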
Probability of y = 1 in the next 24 hours
[Figure: P(y=1, 24hr) versus Time (s), with Target and Predicted curves; annotation: “Truck shuts down, y = 1 here”]
Note: trained only on data where y ≠ 1
Future Plans
• Identify other sensors to repeat the above process
• Once we have enough breakdowns, apply above procedure to breakdowns
• Recurrent Neural Network
• With large number of labels, can do survival analysis

More Related Content

Viewers also liked

Presentacion induccion a TEG 2-2016
Presentacion induccion a TEG 2-2016Presentacion induccion a TEG 2-2016
Presentacion induccion a TEG 2-2016GAMD_UNEFA
 
Ixonos References 2015
Ixonos References 2015Ixonos References 2015
Ixonos References 2015Andrew Knight
 
Project Aeroplane (Short Review)
Project Aeroplane (Short Review) Project Aeroplane (Short Review)
Project Aeroplane (Short Review) Moideen Thashreef
 
Bluemix presentation IBM Cloud Briefing in San Jose
Bluemix presentation IBM Cloud Briefing in San JoseBluemix presentation IBM Cloud Briefing in San Jose
Bluemix presentation IBM Cloud Briefing in San JoseSergio Loza
 
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGenius
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGeniusHow To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGenius
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGeniusLeadGenius
 

Viewers also liked (6)

Presentacion induccion a TEG 2-2016
Presentacion induccion a TEG 2-2016Presentacion induccion a TEG 2-2016
Presentacion induccion a TEG 2-2016
 
Ixonos References 2015
Ixonos References 2015Ixonos References 2015
Ixonos References 2015
 
Project Aeroplane (Short Review)
Project Aeroplane (Short Review) Project Aeroplane (Short Review)
Project Aeroplane (Short Review)
 
Bluemix presentation IBM Cloud Briefing in San Jose
Bluemix presentation IBM Cloud Briefing in San JoseBluemix presentation IBM Cloud Briefing in San Jose
Bluemix presentation IBM Cloud Briefing in San Jose
 
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGenius
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGeniusHow To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGenius
How To Scale Outbound Sales? - LAUNCH Scale - Prayag Narula, CEO, LeadGenius
 
Mazda RX8 Tab Kit
Mazda RX8 Tab KitMazda RX8 Tab Kit
Mazda RX8 Tab Kit
 

Similar to Predicting Truck Breakdowns - Rory Woods

Urban flood prediction digital ocean august edition
Urban flood prediction   digital ocean august editionUrban flood prediction   digital ocean august edition
Urban flood prediction digital ocean august editiontransight
 
Prediction of the bike rental demand in Washington
Prediction of the bike rental demand in WashingtonPrediction of the bike rental demand in Washington
Prediction of the bike rental demand in WashingtonZilong Zhao
 
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Amazon Web Services
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Mingxuan Li
 
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...TEST Huddle
 
Francisco J. Doblas-Big Data y cambio climático
Francisco J. Doblas-Big Data y cambio climáticoFrancisco J. Doblas-Big Data y cambio climático
Francisco J. Doblas-Big Data y cambio climáticoFundación Ramón Areces
 
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発Ryo 亮 Kawahara 河原
 
Air Pollution in Nova Scotia: Analysis and Predictions
Air Pollution in Nova Scotia: Analysis and PredictionsAir Pollution in Nova Scotia: Analysis and Predictions
Air Pollution in Nova Scotia: Analysis and PredictionsCarlo Carandang
 
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...Databricks
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!Databricks
 
Full Scale Data Handling in Shipping: A Big Data Solution
Full Scale Data Handling in Shipping: A Big Data SolutionFull Scale Data Handling in Shipping: A Big Data Solution
Full Scale Data Handling in Shipping: A Big Data SolutionLokukaluge Prasad Perera
 
TidalScale Overview
TidalScale OverviewTidalScale Overview
TidalScale OverviewPete Jarvis
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globallyridhav
 
AIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfAIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfssuserb4d806
 
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout SolutionLarge GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout SolutionSafe Software
 
Health monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterHealth monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterAndrei Khurshudov
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedRevolution Analytics
 

Similar to Predicting Truck Breakdowns - Rory Woods (20)

Urban flood prediction digital ocean august edition
Urban flood prediction   digital ocean august editionUrban flood prediction   digital ocean august edition
Urban flood prediction digital ocean august edition
 
Prediction of the bike rental demand in Washington
Prediction of the bike rental demand in WashingtonPrediction of the bike rental demand in Washington
Prediction of the bike rental demand in Washington
 
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance
 
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every S...
 
Francisco J. Doblas-Big Data y cambio climático
Francisco J. Doblas-Big Data y cambio climáticoFrancisco J. Doblas-Big Data y cambio climático
Francisco J. Doblas-Big Data y cambio climático
 
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発
Apache Sparkを用いたスケーラブルな時系列データの異常検知モデル学習ソフトウェアの開発
 
Air Pollution in Nova Scotia: Analysis and Predictions
Air Pollution in Nova Scotia: Analysis and PredictionsAir Pollution in Nova Scotia: Analysis and Predictions
Air Pollution in Nova Scotia: Analysis and Predictions
 
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...
Enabling Physics and Empirical-Based Algorithms with Spark Using the Integrat...
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!
 
Full Scale Data Handling in Shipping: A Big Data Solution
Full Scale Data Handling in Shipping: A Big Data SolutionFull Scale Data Handling in Shipping: A Big Data Solution
Full Scale Data Handling in Shipping: A Big Data Solution
 
TidalScale Overview
TidalScale OverviewTidalScale Overview
TidalScale Overview
 
Nomads
NomadsNomads
Nomads
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globally
 
AIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfAIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdf
 
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout SolutionLarge GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
 
03 broderick qsts_sand2016-4697 c
03 broderick qsts_sand2016-4697 c03 broderick qsts_sand2016-4697 c
03 broderick qsts_sand2016-4697 c
 
Health monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenterHealth monitoring & predictive analytics to lower the TCO in a datacenter
Health monitoring & predictive analytics to lower the TCO in a datacenter
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
 

More from WithTheBest

Riccardo Vittoria
Riccardo VittoriaRiccardo Vittoria
Riccardo VittoriaWithTheBest
 
Recreating history in virtual reality
Recreating history in virtual realityRecreating history in virtual reality
Recreating history in virtual realityWithTheBest
 
Engaging and sharing your VR experience
Engaging and sharing your VR experienceEngaging and sharing your VR experience
Engaging and sharing your VR experienceWithTheBest
 
How to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie StudioHow to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie StudioWithTheBest
 
Mixed reality 101
Mixed reality 101 Mixed reality 101
Mixed reality 101 WithTheBest
 
Unlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive TechnologyUnlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive TechnologyWithTheBest
 
Building your own video devices
Building your own video devicesBuilding your own video devices
Building your own video devicesWithTheBest
 
Maximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unityMaximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unityWithTheBest
 
Haptics & amp; null space vr
Haptics & amp; null space vrHaptics & amp; null space vr
Haptics & amp; null space vrWithTheBest
 
How we use vr to break the laws of physics
How we use vr to break the laws of physicsHow we use vr to break the laws of physics
How we use vr to break the laws of physicsWithTheBest
 
The Virtual Self
The Virtual Self The Virtual Self
The Virtual Self WithTheBest
 
You dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helpsYou dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helpsWithTheBest
 
Omnivirt overview
Omnivirt overviewOmnivirt overview
Omnivirt overviewWithTheBest
 
VR Interactions - Jason Jerald
VR Interactions - Jason JeraldVR Interactions - Jason Jerald
VR Interactions - Jason JeraldWithTheBest
 
Japheth Funding your startup - dating the devil
Japheth  Funding your startup - dating the devilJapheth  Funding your startup - dating the devil
Japheth Funding your startup - dating the devilWithTheBest
 
Transported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estateTransported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estateWithTheBest
 
Measuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VRMeasuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VRWithTheBest
 
Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode. Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode. WithTheBest
 
VR, a new technology over 40,000 years old
VR, a new technology over 40,000 years oldVR, a new technology over 40,000 years old
VR, a new technology over 40,000 years oldWithTheBest
 

More from WithTheBest (20)

Riccardo Vittoria
Riccardo VittoriaRiccardo Vittoria
Riccardo Vittoria
 
Recreating history in virtual reality
Recreating history in virtual realityRecreating history in virtual reality
Recreating history in virtual reality
 
Engaging and sharing your VR experience
Engaging and sharing your VR experienceEngaging and sharing your VR experience
Engaging and sharing your VR experience
 
How to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie StudioHow to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie Studio
 
Mixed reality 101
Mixed reality 101 Mixed reality 101
Mixed reality 101
 
Unlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive TechnologyUnlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive Technology
 
Building your own video devices
Building your own video devicesBuilding your own video devices
Building your own video devices
 
Maximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unityMaximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unity
 
Wizdish rovr
Wizdish rovrWizdish rovr
Wizdish rovr
 
Haptics & amp; null space vr
Haptics & amp; null space vrHaptics & amp; null space vr
Haptics & amp; null space vr
 
How we use vr to break the laws of physics
How we use vr to break the laws of physicsHow we use vr to break the laws of physics
How we use vr to break the laws of physics
 
The Virtual Self
The Virtual Self The Virtual Self
The Virtual Self
 
You dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helpsYou dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helps
 
Omnivirt overview
Omnivirt overviewOmnivirt overview
Omnivirt overview
 
VR Interactions - Jason Jerald
VR Interactions - Jason JeraldVR Interactions - Jason Jerald
VR Interactions - Jason Jerald
 
Japheth Funding your startup - dating the devil
Japheth  Funding your startup - dating the devilJapheth  Funding your startup - dating the devil
Japheth Funding your startup - dating the devil
 
Transported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estateTransported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estate
 
Measuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VRMeasuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VR
 
Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode. Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode.
 
VR, a new technology over 40,000 years old
VR, a new technology over 40,000 years oldVR, a new technology over 40,000 years old
VR, a new technology over 40,000 years old
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Predicting Truck Breakdowns - Rory Woods

  • 1. Big Data for Big Rigs Predicting Truck Breakdowns Rory Woods Lead Data Scientist 9-24-2016
  • 2. Overview 10/17/20161  What is Preteckt?  Getting and working with the data  Making predictions work  Conclusion
  • 3. What Does Preteckt Do? Prevent on-the-road breakdowns by identifying breakdowns many days in advance. 10/17/20162 • Take data from truck sensors • Analyze and compare them to other trucks • Monitor trucks in real time to identify breakdowns in advance
  • 4. Preteckt’s Data Science Team
      - Rory Woods – Lead Data Scientist. PhD in Computational Astrophysics with experience in high performance computing.
      - Bertrand Brelier – Data Scientist. Former research scientist at IBM and data scientist at Numeris. PhD in Physics.
      - Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions. PhD in Computational Astrophysics.
      - Ben Keller – PhD student in Computational Astrophysics.
      - Jim Reilly – Professor of ECE. Interests in signal processing and machine learning techniques.
      - Ken Sills – CTO. 15 years' experience in data analytics; Master of Electrical and Computer Engineering.
  • 5. We use proprietary hardware, with a built-in microcomputer, to gain access to the data generated on a truck.
      - Use small computer with cellular access
      - Sniff ECU bus on truck
      - Record and sync all data to servers
  • 6. Data Flow Within Preteckt
  • 7. Finding Useful Sensors
      O(10^4) – All sensors
        ↓ Drop proprietary, undocumented
      O(10^3) – Documented sensors
        ↓ Drop unavailable sensors
      O(500) – Available on any one truck
        ↓ Write conversion functions by hand; drop “bad” sensors (garbage data, constant values)
      O(100) – Good sensors
        ↓ Method-specific feature selection
      O(50) – Relevant sensors
  • 8. Data Attributes
      Time  Voltage  Engine Speed  Fuel  Speed  Pressure
      1     14.1     1200          120   90     300
      2     14.0     -             -     92     300
      3     14.1     -             -     512    300
      4     13.9     1230          -     92     300
      5     14.1     -             -     -      300
      6     14.1     -             -     520    300
      7     14.0     -             -     92     300
      8     14.1     -             119   518    300
      9     13.9     1260          -     90     300
      Slide annotations: Irregular, High Frequency, Low Frequency, Bad Readings, Constant Readings
  • 9. Typical Pre-processing
      Time  Voltage  Engine Speed  Fuel  Speed  Pressure
      1     14.1     1200          120   90     300
      2     14.0     1210          120   92     300
      3     14.1     1220          120   512    300
      4     13.9     1230          120   92     300
      5     14.1     1235          120   92     300
      6     14.1     1240          120   520    300
      7     14.0     1245          120   92     300
      8     14.1     1250          119   518    300
      9     13.9     1255          119   90     300
      Slide annotations:
      - Drop garbage (Speed readings 512, 520, 518 replaced by 92, 92, 91)
      - Drop 0 variance
      - Interpolate, OR forward fill
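
The three pre-processing steps on this slide can be sketched with pandas as follows. This is a minimal illustration rather than Preteckt's actual pipeline: the function name `preprocess`, the column names, the valid range for `speed`, and the choice of which column to interpolate are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the slide's example; NaN marks a missing reading.
raw = pd.DataFrame(
    {
        "voltage": [14.1, 14.0, 14.1, 13.9, 14.1, 14.1, 14.0, 14.1, 13.9],
        "engine_speed": [1200, np.nan, np.nan, 1230, np.nan,
                         np.nan, np.nan, np.nan, 1260],
        "fuel": [120, np.nan, np.nan, np.nan, np.nan,
                 np.nan, np.nan, 119, np.nan],
        "speed": [90, 92, 512, 92, np.nan, 520, 92, 518, 90],
        "pressure": [300] * 9,
    },
    index=pd.Index(range(1, 10), name="time"),
)


def preprocess(df, valid_ranges, interpolate_cols=()):
    """Hypothetical helper: drop garbage, drop constant columns, fill gaps."""
    out = df.astype(float)
    # 1. Drop garbage: blank out readings outside an assumed physical range.
    for col, (lo, hi) in valid_ranges.items():
        out[col] = out[col].where(out[col].between(lo, hi))
    # 2. Drop zero-variance columns (a constant sensor carries no signal).
    constant = [c for c in out.columns if out[c].nunique(dropna=True) <= 1]
    out = out.drop(columns=constant)
    # 3. Fill gaps: linear interpolation where requested, else forward fill.
    for col in out.columns:
        if col in interpolate_cols:
            out[col] = out[col].interpolate(method="linear")
        else:
            out[col] = out[col].ffill()
    return out


clean = preprocess(
    raw,
    valid_ranges={"speed": (0, 200)},   # assumed plausible range
    interpolate_cols=("engine_speed",),
)
```

Forward fill suits slowly changing, low-frequency channels such as the fuel reading, while interpolation is a better fit for channels that vary smoothly between samples. Which sensors fall into which group is a judgment call the slide does not spell out.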
  • 10. Unlabeled Data
      Time  Voltage  Engine Speed  Fuel  Speed  LABEL
      1     14.1     1200          120   90     0
      2     14.0     1210          120   92     0
      3     14.1     1220          120   92     0
      4     13.9     1230          120   92     0
      5     14.1     1235          120   92     0
      6     14.1     1240          120   92     0
      7     14.0     1245          120   92     0
      8     14.1     1250          119   91     1
      9     13.9     1255          119   90     1
      Slide annotations: “Truck breaks down here”; “?” ×9 (one per row)
  • 11. Unlabeled Data
      Labeling breakdowns is currently the biggest bottleneck!
      1. Create labels from sensors
         - Sensor a = 1 if part x is not functioning correctly
         - Sensor a > threshold = bad
      2. Use unsupervised learning techniques
         - Clustering
         - Anomaly detection
      Slide callout: “Start with this”
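
The first option on this slide, deriving labels directly from sensor values, can be sketched in a few lines of NumPy. The function names and the threshold value used in the usage lines are hypothetical; the slide does not say which sensors or thresholds are used in practice.

```python
import numpy as np


def label_from_flag(flag_values):
    """Label = 1 wherever a fault-flag sensor reads exactly 1."""
    return (np.asarray(flag_values) == 1).astype(int)


def label_from_threshold(values, threshold):
    """Label = 1 wherever a reading exceeds the threshold.

    Missing readings (NaN) compare as False, so they are labeled 0.
    """
    return (np.asarray(values, dtype=float) > threshold).astype(int)


# Usage with made-up readings and an assumed threshold of 2.5.
flag_labels = label_from_flag([0, 1, 0])
threshold_labels = label_from_threshold([1.0, 5.0, np.nan, 3.0], threshold=2.5)
```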
  • 12. Predicting Rates of Change
      Goal: predict the time derivative of sensor X.
      Preprocessing:
      1. Use the data cleaning described above
      2. Smooth X using a rolling window
      3. Take the derivative of X
      4. Smooth dX/dt using a rolling window
      [Figure: Sensor X and dX/dt versus Time (s)]
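
Steps 2 to 4 of this slide (smooth, differentiate, smooth again) might look like the sketch below. The function name `smoothed_derivative`, the window length, and the use of a centered rolling mean are assumptions; the slide only says "rolling window".

```python
import numpy as np
import pandas as pd


def smoothed_derivative(x, window=5):
    """Estimate dX/dt for a Series indexed by time in seconds.

    Assumes the index is numeric and strictly increasing; samples may be
    unevenly spaced.
    """
    # Step 2: smooth X with a centered rolling mean.
    x_smooth = x.rolling(window, center=True, min_periods=1).mean()
    # Step 3: finite-difference derivative against the actual timestamps.
    t = x_smooth.index.to_numpy(dtype=float)
    dxdt = pd.Series(np.gradient(x_smooth.to_numpy(), t), index=x.index)
    # Step 4: smooth the derivative with the same kind of window.
    return dxdt.rolling(window, center=True, min_periods=1).mean()


# Usage: a clean linear ramp X = 2t + 1 has slope 2 away from the edges.
t = np.arange(20.0)
ramp = pd.Series(2.0 * t + 1.0, index=t)
slope = smoothed_derivative(ramp, window=5)
```

Near the start and end of the series the rolling windows are truncated, so the first and last few derivative values are biased and are usually best discarded.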
  • 13. Predicting Rates of Change
      Method                  R Score
      Ordinary Least Squares  ~0.05
      Lasso, Ridge, LARS      ~0.02-0.15
      Partial Least Squares   ~0.2
      Avoid predicting continuous variables!
  • 14. Predicting Events
      Label “events” as points when sensor Y = 1.
      1. Pre-process data (scaling, etc.)
      2. Create N label columns representing “Event occurs in x hours = True”
      3. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)
      4. Do feature selection to reduce sensors (PCA, mRMR)
      5. Run classifiers to predict lead times (good results with logistic regression and SVM)
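
Step 2, building one boolean label column per lead time, can be sketched as below. The lead times are the ones listed in the results table on the following slide (3, 6, 12, 24, 48, 72 hours). The function name, the column naming scheme, and the choice to count only events strictly after the current row are assumptions made for this example.

```python
import numpy as np
import pandas as pd

LEAD_TIMES_H = (3, 6, 12, 24, 48, 72)


def add_lead_time_labels(df, event_col="y", lead_times_h=LEAD_TIMES_H):
    """Add 'event_within_<h>h' columns to a frame with a sorted DatetimeIndex.

    A row is True for lead time h if the next event (event_col == 1)
    strictly after that row occurs within h hours.
    """
    out = df.copy()
    times = out.index.to_numpy()
    event_times = times[out[event_col].to_numpy() == 1]
    # Position of the first event strictly later than each row's timestamp.
    nxt = np.searchsorted(event_times, times, side="right")
    has_next = nxt < len(event_times)
    # Hours until the next event; infinity when no later event exists.
    wait_h = np.full(len(out), np.inf)
    wait_h[has_next] = (
        event_times[nxt[has_next]] - times[has_next]
    ) / np.timedelta64(1, "h")
    for h in lead_times_h:
        out[f"event_within_{h}h"] = wait_h <= h
    return out


# Usage: six readings, with a single event at hour 10.
hours = np.array([0, 1, 2, 5, 10, 30])
index = pd.Timestamp("2016-01-01") + pd.to_timedelta(hours, unit="h")
frame = pd.DataFrame({"y": [0, 0, 0, 0, 1, 0]}, index=index)
labeled = add_lead_time_labels(frame)
```

Each resulting column can then serve as the target for its own binary classifier, which matches the per-lead-time scores reported on the next slide.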
  • 15. Predicting Events
      Lead Time (hours)  F1, R (roughly the same for all)
      3                  0.96
      6                  0.95
      12                 0.81
      24                 0.70
      48                 0.70
      72                 0.75
      Note: Frequency of Y = 1 is very roughly once every 48-72 hours.
  • 16. Probability of y = 1 in the next 24 hours
      [Figure: P(y=1, 24hr) versus Time (s), with Target and Predicted curves; annotations: “Truck shuts down, y = 1 here” and “y = 1”]
      Note: data only trained on y ≠ 1
  • 17. Future Plans
      - Identify other sensors to repeat the above process
      - Once we have enough breakdowns, apply above procedure to breakdowns
      - Recurrent Neural Network
      - With large number of labels, can do survival analysis