© 2017 MapR Technologies, MapR Confidential
State of the Art Robot Predictive
Maintenance with Real-time
Sensor Data
Mateusz Dymczyk, Software Engineer @ h2o.ai
Mathieu Dumoulin, Data Engineer @ MapR
Strata New York 2017
State of the Art Robot Predictive
Maintenance with Real-time
Sensor Data Part 2
Mateusz Dymczyk and Mathieu Dumoulin
Mathieu Dumoulin
• Data Engineer @ MapR Technologies
• Previously data scientist, DS manager, search, NLP and ML engineering in Canada and in Japan
Mateusz Dymczyk
• Software Engineer @ H2O.ai
• Previously ML/NLP @ Fujitsu Laboratories and en-japan inc
Industry 4.0 is Now
• $907B/y investment until 2020¹
• 1.6M operational industrial robots in the world in 2015²
• 2.6M by 2020¹
Industry 4.0 systems¹:
1. Interoperable
2. Information transparency
3. Technical assistance
4. Decentralized decision making
1: What Everyone Must Know About Industry 4.0, Forbes, June 2016
2: International Federation of Robotics (IFR) study, World Robotics 2016
Source: PwC 2016 Global Industry 4.0 Survey
Predictive Maintenance for Industrial Robots
Primary goal: Reduce unplanned downtime
Robot Actuator Failure Prediction PoC
Model 6-axis industrial robot
LPMS-B2
Wireless movement sensor
PoC Goal: Predict potential actuator failure in real-time (within 3s)
Success criteria
• Detect correct robot state (Normal/Failure) within 3 s
• Recall is more important than precision
• Improve over time once an “MVP” model is working
Photo: Ambient Intelligence Blog
Need for Scale: Deploy to a Real Factory
Tesla Factory photo by Paul Sakuma/AP
Don’t Reinvent the Wheel
• We have limited time and budget for this PoC
• Tools > assembly of existing software > coding
• The state of the art is often OSS anyway!
Video of solution in action (2 min)
PoC Building Blocks
People: 2 engineers from LP-RESEARCH, an ML engineer and a data engineer
Effort: 2 months part-time
Experimental Setup
Experimental Setup: Normal State
Experimental Setup: Failure State
Anomaly Detection for Predictive
Maintenance
Machine Learning Project Flow
• Problem evaluation & definition
• Data preparation
• Explore and analyze
• Choose algorithm
• Build model
• Evaluate model
• Put into production
Problem Definition
1. Starting Point
– Classification problem
– Time series data
• Linear Acceleration X, Y, Z axis
– No labeled data at first
• Accumulate over time
2. Machine Learning goal/metrics
– Recall vs. Precision
3. Additional Requirements
– Detect state within 3 seconds
[Figure labels: Normal State (OK!), PREDICT FAILURE]
Data Source: Movement Sensor
• Real-time, on-device calculation of
linear acceleration
– Data centered around 0
– Measurements [-1,1]
• Data output rates of up to 400Hz
• Very sensitive
www.lp-research.com
LPMS-B2
Sensor Data Preparation
• Feature selection (3 of 27 features)
• Windowing
– Window size: 200 ms
– Sensor data rate: 100 Hz
[Figure: 200 ms window]
Ref: 21 Great Articles and Tutorials on Time Series
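The windowing step can be sketched as follows. This is a minimal illustration, not the code used in the PoC: it assumes non-overlapping windows (the slides do not say whether windows overlap), and the function name `make_windows` and the synthetic input are hypothetical. At 100 Hz, a 200 ms window holds 20 samples, so 3 selected features give 60 values per window.

```python
import numpy as np

def make_windows(samples, rate_hz=100, window_ms=200):
    """Split a (n_samples, n_features) array into non-overlapping windows.

    Returns an array of shape (n_windows, samples_per_window * n_features).
    Non-overlapping windows are an assumption made for this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    per_window = rate_hz * window_ms // 1000  # 100 Hz * 200 ms = 20 samples
    n_windows = len(samples) // per_window
    trimmed = samples[: n_windows * per_window]  # drop the incomplete tail
    return trimmed.reshape(n_windows, per_window * samples.shape[1])

# One second of synthetic 3-axis linear acceleration in [-1, 1]
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, size=(100, 3))
windows = make_windows(signal)
print(windows.shape)
```

Each row of `windows` is one flattened 200 ms slice that could be fed to a model as a single input vector.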
Modeling for Anomaly Detection
• Unlabeled data -> unsupervised learning
• Training data consists only of data recorded during “normal state” runs
– Only train on normal operation data
• Conclusion: anomaly detection
• Possible algorithms:
– HMM
– Autoencoders
– LSTM autoencoders
– KNN, Local Outlier Factor
Get Ted Dunning’s Anomaly Detection book
[Figure label: Anomaly!]
First Model: Autoencoders
• A kind of neural network used for unsupervised learning of efficient codings
• Requires a training pass to learn a representation of “normal” data
• Anomalous data will have a large reconstruction error compared to normal data
Längkvist, Martin, Lars Karlsson, and Amy Loutfi. "A review of unsupervised feature learning and
deep learning for time-series modeling." Pattern Recognition Letters 42 (2014): 11-24.
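The reconstruction-error idea can be illustrated with a small NumPy sketch. To stay dependency-free it uses PCA (a linear projection computed with SVD) as a stand-in for a trained neural autoencoder; this is not the model from the talk. The data is synthetic and the names `fit_linear_codec` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def fit_linear_codec(normal, n_components):
    """Learn a linear encoder/decoder (PCA) from normal-state data only."""
    mean = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Mean squared error between each row and its reconstruction."""
    centered = x - mean
    code = centered @ components.T   # encode to a low-dimensional code
    rebuilt = code @ components      # decode back to the input space
    return np.mean((centered - rebuilt) ** 2, axis=1)

rng = np.random.default_rng(42)
# Synthetic "normal" data lying close to a 2-D subspace of a 10-D space
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))
# Synthetic "anomalous" data that does not follow that structure
anomalous = rng.normal(size=(50, 10))

mean, components = fit_linear_codec(normal, n_components=2)
normal_err = reconstruction_error(normal, mean, components)
anomaly_err = reconstruction_error(anomalous, mean, components)
print(normal_err.mean(), anomaly_err.mean())
```

Because the codec only learned the structure of normal data, inputs that do not share that structure are rebuilt poorly, which is the property the anomaly detector relies on.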
Experimental Setup: Training the Model
Performance Evaluation
• Evaluation dataset
– Captured from a preprogrammed “pre-failure” operation mode
– One full movement cycle of labeled (“pre-failure”) data
• Normal 90%, Failure 10%
• Performance measures:
– MSE during training
– TPR/FPR on the test dataset
Note: For an example with code: https://machinelearningmastery.com
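As a rough sketch of the TPR/FPR measures above, the following computes both rates from labeled predictions. The labels are made up for illustration (1 = failure, 0 = normal, with a 90/10 split like the evaluation dataset); `tpr_fpr` is a hypothetical helper, not code from the project.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate); 1 = failure."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # same quantity as recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Made-up example: 18 normal windows and 2 failure windows
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 16 + [1] * 2 + [1, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```

With an imbalanced dataset like this one, reporting TPR and FPR separately avoids the misleadingly high score that plain accuracy would give a model that always predicts "normal".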
ML – Results
Note: Time window: 200ms, Threshold: 2SD
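One plausible reading of the "2SD" threshold is: flag a window as anomalous when its reconstruction error exceeds the mean error on normal data by more than two standard deviations. The sketch below assumes that interpretation; the error values and the helper names `fit_threshold` and `flag_anomalies` are hypothetical.

```python
import numpy as np

def fit_threshold(normal_errors, n_sd=2.0):
    """Threshold = mean + n_sd * standard deviation of normal-state errors."""
    errors = np.asarray(normal_errors, dtype=float)
    return errors.mean() + n_sd * errors.std()

def flag_anomalies(errors, threshold):
    """Return a boolean array: True where the error exceeds the threshold."""
    return np.asarray(errors, dtype=float) > threshold

# Made-up reconstruction errors measured on normal-state windows
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.011, 0.012]
threshold = fit_threshold(normal_errors)

# Score three new windows; only the middle one is far from normal
flags = flag_anomalies([0.011, 0.050, 0.012], threshold)
print(threshold, flags)
```

A lower multiplier raises recall at the cost of more false alarms, which matches the stated preference for recall over precision.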
Experimental Setup: Real-time Predictions
Next Step: Long Short Term Memory (LSTM)
• Deep learning architecture in the RNN family that remembers values over arbitrary intervals¹
• Overcomes known RNN issues
– limited memory
– instability
• Especially used for image, text and speech applications … and time series data
Ref: “Understanding LSTM Networks” by Christopher Olah (2015)
1: Long Short-Term Memory, Hochreiter and Schmidhuber (1997)
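To make the "memory" idea concrete, here is a single LSTM time step written in plain NumPy, following the standard gate equations. It is an illustration only: the weights are random rather than trained, the gate ordering inside the stacked matrices is a convention chosen for this sketch, and the sizes (3 inputs, 4 hidden units, 20 time steps) are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * hidden, n_in), U: (4 * hidden, hidden), b: (4 * hidden,)
    Rows are stacked as input gate, forget gate, output gate, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: what to write
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: what to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: what to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate cell values
    c = f * c_prev + i * g                 # cell state carries the memory
    h = o * np.tanh(c)                     # hidden state / output
    return h, c

rng = np.random.default_rng(0)
n_in, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# Run the cell over one synthetic window of 20 three-axis samples
h = np.zeros(hidden)
c = np.zeros(hidden)
window = rng.uniform(-1.0, 1.0, size=(20, n_in))
for x in window:
    h, c = lstm_step(x, h, c, W, U, b)
print(h, c)
```

The additive update of the cell state `c` is what lets information persist across many steps, in contrast to a plain RNN where the whole state is rewritten at every step.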
[Figure: RNN and LSTM diagrams]
Implementation: Keras with TensorFlow Backend
• Similar design to Autoencoder
• Encoder and decoder are separate
• Model implemented with Keras in Python but executed by H2O Deep Water
LSTM and H2O: Deep Water
• Keras model is trained through H2O
– Fast data ingest, missing value handling, ignoring columns, etc.; 2.5m/100 epochs
– MOJO output (binary model representation)
• Usable from any JVM language
• Just like H2O POJO!
• Prediction service infrastructure is reused
LSTM Results
[Figure: LSTM results for LinAccX, LinAccY and LinAccZ]
Conclusion
What We Didn’t Talk About (Much)
• Security: System and Data
• Reliability and Scalability
• Machine learning logistics
• Integration in a Factory
Advanced Predictive Maintenance
• Clever assembly of existing enterprise software can do it with surprisingly little time, effort and complexity
• H2O and MapR offer a fast path to value for production ML
• LSTM doesn’t easily beat Autoencoders without significant effort and expertise
• Converged platforms reduce complexity
Poster by J. Howard Miller (1943)
New: Machine Learning Logistics
Model Management in the Real World
O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
Get free pdf copy of book courtesy of MapR:
https://mapr.com/ebook/machine-learning-logistics/
Visit MapR booth for free book signings & booth theater
presentations by the authors
Wed schedule:
Book signing: afternoon break 3:35 – 4:20 pm
Booth presentation by Ted Dunning: 3:00 – 3:30 pm
Thur schedule:
Book signing: morning break 10:45 – 11:20 am
Booth presentation by Ellen Friedman: 3:00 – 3:30 pm
Q&A
ENGAGE WITH US
mateusz@h2o.ai
mathieu.dumoulin@mapr.com
PROJECT GITHUB:
github.com/mdymczyk/iot-pipeline
Our thanks to:
LP RESEARCH
www.lp-research.com
contact: Klaus Peterson
klaus@lp-research.com
Thank you to LP-RESEARCH!
Hardware design and production
Expertise in Motion sensors
Gyroscope
Accelerometer
Magnetometer
Sensor fusion algorithm development
Multi-platform application development
See all our products: https://www.lp-research.com/products/
LPMS-B2, LPMS-CU2, LPMS-CANAL2, LPMS-USBAL2
OEM also available!

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

Robot Predictive Maintenance with Real-Time Sensor Data and Machine Learning

  • 1. © 2017 MapR Technologies, MapR Confidential. State of the Art Robot Predictive Maintenance with Real-time Sensor Data. Mateusz Dymczyk, Software Engineer @ h2o.ai; Mathieu Dumoulin, Data Engineer @ MapR. Strata New York 2017.
  • 2. State of the Art Robot Predictive Maintenance with Real-time Sensor Data, Part 2. Mateusz Dymczyk, Software Engineer @ h2o.ai; Mathieu Dumoulin, Data Engineer @ MapR. Strata New York 2017.
  • 3. Mateusz Dymczyk and Mathieu Dumoulin. Mathieu Dumoulin: Data Engineer @ MapR Technologies; previously data scientist, DS manager, search, NLP and ML engineering in Canada and in Japan. Mateusz Dymczyk: Software Engineer @ H2O.ai; previously ML/NLP @ Fujitsu Laboratories and en-japan inc.
  • 4. Industry 4.0 is Now • 907B$/y investment until 2020 [1] • 1.6M operational industrial robots in the world in 2015 [2] • 2.6M by 2020 [1]. Industry 4.0 systems [1]: 1. Interoperable 2. Information transparency 3. Technical assistance 4. Decentralized decision making. Sources: [1] What Everyone Must Know About Industry 4.0, Forbes, June 2016; [2] International Federation of Robotics (IFR) study, World Robotics 2016; PwC 2016 Global Industry 4.0 Survey.
  • 5. Predictive Maintenance for Industrial Robots. Primary goal: Reduce unplanned downtime
  • 6. Robot Actuator Failure Prediction PoC. Model 6-axis industrial robot. LPMS-B2 wireless movement sensor. PoC Goal: Predict potential actuator failure in real-time (within 3s)
  • 7. Success criteria • Detect correct robot state (Normal/Failure) within 3s • Recall > precision • Improve over time once an “MVP” model is working. Photo: Ambient Intelligence Blog
  • 8. Need for Scale: Deploy to a Real Factory. Tesla Factory photo by Paul Sakuma/AP
  • 9. Don’t Reinvent the Wheel • We have limited time and budget for this PoC • Tools > assembly of existing software > coding • The state of the art is often OSS anyway!
  • 10. Video of solution in action (2m)
  • 11. PoC Building Blocks. People: 2 Engineers LP-RESEARCH, ML Engineer and Data Engineer. Effort: 2 months part-time
  • 12. Experimental Setup
  • 13. Experimental Setup: Normal State
  • 14. Experimental Setup: Failure State
  • 15. Anomaly Detection for Predictive Maintenance
  • 16. Machine Learning Project Flow: Problem evaluation & definition; Data preparation; Explore and Analyze; Choose Algorithm; Build Model; Evaluate Model; Put into production
  • 17. Problem Definition 1. Starting Point – Classification problem – Time series data • Linear Acceleration X, Y, Z axis – No labeled data at first • Accumulate over time 2. Machine Learning goal/metrics – Recall vs. Precision 3. Additional Requirements – Detect state within 3 seconds. Diagram labels: Normal State (OK!), PREDICT, FAILURE
  • 18. Data Source: Movement Sensor (LPMS-B2) • Real-time, on-device calculation of linear acceleration – Data centered around 0 – Measurements [-1,1] • Data output rates of up to 400Hz • Very sensitive. www.lp-research.com
  • 19. Sensor Data Preparation • Feature selection (3 / 27 features) • Windowing – Window size: 200ms – Sensor data rate: 100Hz. Figure label: 200ms window. Ref: 21 Great Articles and Tutorials on Time Series
  • 20. Modeling for Anomaly Detection • Unlabeled data -> unsupervised learning • Training data consists only of data during “normal state” runs – Only train on normal op. data • Conclusion: anomaly detection • Possible algorithms: – HMM – Autoencoders – LSTM autoencoders – KNN, Local Outlier Factor. Get Ted Dunning’s Anomaly Detection Book. Figure label: Anomaly!
  • 21. First Model: Autoencoders • A kind of neural network used for unsupervised learning of efficient codings • Requires a training pass to learn a representation of “normal” data • Anomalous data will have a large reconstruction error compared to normal data. Ref: Längkvist, Martin, Lars Karlsson, and Amy Loutfi. "A review of unsupervised feature learning and deep learning for time-series modeling." Pattern Recognition Letters 42 (2014): 11-24.
  • 22. Experimental Setup: Training the Model
  • 23. Performance Evaluation • Evaluation dataset – Captured from a preprogrammed “pre-failure” operation mode – 1x full movement cycle of (“pre-failure”) labeled data • Normal 90% Failure 10% • Performance measures: – MSE during training – TPR/FPR on the test dataset. Note: For an example with code: https://machinelearningmastery.com
  • 24. ML – Results. Note: Time window: 200ms, Threshold: 2SD
  • 25. Experimental Setup: Real-time Predictions
  • 26. Next Step: Long Short Term Memory (LSTM) • Deep learning architecture in the RNN family that remembers arbitrary intervals [1] • Overcomes known RNN issues – limited memory – instability • Especially used for image, text and speech applications … and time series data. Figure labels: RNN, LSTM. Refs: “Understanding LSTM Networks” by Christopher Olah (2015); [1] Long Short-Term Memory, Hochreiter and Schmidhuber (1997)
  • 27. Implementation: Keras with TensorFlow Backend • Similar design to Autoencoder • Encoder and decoder are separate • Model implemented with Keras in Python but executed by H2O Deep Water
  • 28. LSTM and H2O: Deep Water • Keras model is trained through H2O – Fast data ingest, missing value handling, ignoring columns, etc. 2.5m/100 epoch – MOJO output (binary model representation) • Usable from any JVM language • Just like H2O POJO! • Prediction service infrastructure is reused
  • 29. LSTM Results. Plot panels: LinAccX, LinAccY, LinAccZ
  • 30. Conclusion
  • 31. What We Didn’t Talk About (Much) • Security: System and Data • Reliability and Scalability • Machine learning logistics • Integration in a Factory
  • 32. Advanced Predictive Maintenance • Clever assembly of existing enterprise software can do it with surprisingly little time, effort and complexity • H2O and MapR offer a fast path to value for production ML • LSTM doesn’t easily beat Autoencoders without significant effort and expertise • Converged platforms reduce complexity. Poster by J. Howard Miller (1943)
  • 33. New: Machine Learning Logistics: Model Management in the Real World. O’Reilly book by Ellen Friedman & Ted Dunning, © Sept 2017. Get a free pdf copy of the book courtesy of MapR: https://mapr.com/ebook/machine-learning-logistics/ Visit the MapR booth for free book signings & booth theater presentations by the authors. Wed schedule: Book signing: afternoon break 3:35 – 4:20 pm; Booth presentation by Ted Dunning: 3:00 – 3:30 pm. Thur schedule: Book signing: morning break 10:45 – 11:20 am; Booth presentation by Ellen Friedman: 3:00 – 3:30 pm
  • 34. Q&A. ENGAGE WITH US: mateusz@h2o.ai, mathieu.dumoulin@mapr.com. PROJECT GITHUB: github.com/mdymczyk/iot-pipeline. Our thanks to: LP RESEARCH, www.lp-research.com, contact: Klaus Peterson, klaus@lp-research.com
  • 35. Thank you to LP-RESEARCH! Hardware design and production. Expertise in motion sensors: gyroscope, accelerometer, magnetometer. Sensor fusion algorithm development. Multi-platform application development. See all our products: https://www.lp-research.com/products/ LPMS-B2, LPMS-CU2, LPMS-CANAL2, LPMS-USBAL2. OEM also available!
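
The windowing and thresholding steps from slides 19, 21 and 24 can be sketched in a few lines of Python. This is a minimal illustration, not the code from the project repository: the 100Hz rate, 200ms window and 2SD threshold come from the slides, while the synthetic signal, the function names and the "template" reconstruction (the average normal window, standing in for a trained autoencoder) are assumptions made only so the example runs on the standard library.

```python
import math
import random
from statistics import mean, stdev

SAMPLE_RATE_HZ = 100   # sensor data rate from the slides
WINDOW_MS = 200        # window size from the slides
WINDOW_LEN = SAMPLE_RATE_HZ * WINDOW_MS // 1000  # 20 samples per window


def make_windows(samples, window_len=WINDOW_LEN):
    """Split readings into non-overlapping windows, dropping any remainder."""
    return [samples[i:i + window_len]
            for i in range(0, len(samples) - window_len + 1, window_len)]


def reconstruction_error(window, template):
    """Mean squared error between a window and its reconstruction.

    Assumption: the reconstruction is a fixed template window. A trained
    autoencoder would produce the reconstruction instead.
    """
    return mean((x - t) ** 2 for x, t in zip(window, template))


def fit_threshold(normal_errors, n_sd=2.0):
    """Threshold = mean + n_sd standard deviations of errors on normal data."""
    return mean(normal_errors) + n_sd * stdev(normal_errors)


def synthetic_signal(n, noise, seed):
    """Hypothetical one-axis acceleration: a periodic motion plus Gaussian noise."""
    rng = random.Random(seed)
    return [0.5 * math.sin(2 * math.pi * i / WINDOW_LEN) + rng.gauss(0, noise)
            for i in range(n)]


# "Train" on normal-state data only: learn the template and the threshold.
normal_train = make_windows(synthetic_signal(2000, noise=0.02, seed=0))
template = [mean(column) for column in zip(*normal_train)]
train_errors = [reconstruction_error(w, template) for w in normal_train]
threshold = fit_threshold(train_errors)

# Score windows from a much noisier, simulated failure run.
faulty = make_windows(synthetic_signal(200, noise=0.3, seed=1))
flags = [reconstruction_error(w, template) > threshold for w in faulty]
```

Each 200ms window yields one decision, so a detector of this shape has many windows to work with inside the 3s budget stated in the success criteria.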

Editor's Notes

  1. Industry 4.0 is all about digitization of the factory. Sensors everywhere. All this data opens up new opportunities for automation, cost savings, higher productivity and higher quality. What makes a system Industry 4.0: (1) Interoperability: machines, devices, sensors and people that connect and communicate with one another. (2) Information transparency: the systems create a virtual copy of the physical world through sensor data in order to contextualize information. (3) Technical assistance: both the ability of the systems to support humans in making decisions and solving problems, and the ability to assist humans with tasks that are too difficult or unsafe for humans. (4) Decentralized decision-making: the ability of cyber-physical systems to make simple decisions on their own and become as autonomous as possible. Our talk will focus on Data & Analytics for improving the efficiency of operations of factories with lots of industrial robots. We combine smart sensors, DB analytics (ML), cloud computing and AR to power a real-world, state-of-the-art predictive analytics system.
  2. Order parts predictively. Increased factory efficiency. Robots operate at peak efficiency.
  3. We have a business goal, a robot and a sensor to work with. We are gonna have to data science the shit out of this. [9] https://www.quora.com/Which-type-of-Sensors-use-in-industrial-robots Interesting: no motion sensors! That’s the justification here.
  4. Based on known real-world requirements of state-of-the-art Japanese car-parts manufacturers. Recall is more important than precision because too many false alarms will increase costs and make trusting the system very hard. Precision can initially be very low and the system can still be useful IF you can trust the predictions. The models can then be improved over time.
  5. Scale with the number of sensors, robots and factories. GBs a day quickly become many GB per hour or even per minute. This is comfortably handled on moderately sized clusters (5-25 nodes) using current big data platforms used by attendees of Strata.
  6. Working software over complex implementations that never get done
  7. When there is no anomaly, a green mark is displayed, as you have just seen.
  8. 20m mark
  9. 1) Define the problem: what do we even want?! 2) Data gathering, feature selection, extraction, engineering and data transformation 3) Pick all potential algorithms 4) Build a model using your library/tool of choice 5) Evaluate according to previously defined metrics 6) If not good enough, then try a different approach, different features or different method parameters 7) Otherwise extract the model and put it into production!
  10. Simplifications: data is centered around 0; data is scaled to [-1,1]; no missing values.
  11. Mention why we are doing it with machine learning at all! No hand-written rules: the model automatically learns the best parameters for each application without new coding, and without relying on supervised techniques. Especially good when we don’t know what we are looking for: machines can break in a variety of ways.
  12. Peeking: an ML modeling mistake where some of the data used to train a model includes information about the answer.
  13. Short mention of what a “large” reconstruction error means. Discussion of thresholds and why SD is a good choice.
  14. MSE measures the average of the squares of the errors or deviations, that is, the differences between the estimator and what is estimated. The MSE is a measure of the quality of an estimator: it is always non-negative, and values closer to zero are better. Note: using RMSE is popular too; it is the square root of MSE, which puts the error back in the units of the data, and it ranks models identically.
  15. Circle back to slide
  16. RNNs can remember their former inputs and operate over a sequence of vectors. Good for time series? Training with Back Propagation Through Time is unstable [1]. The effective limit of RNNs is 5-10 discrete time steps [2]. “Works slightly better (than RNN) in practice, owing to its more powerful update equation and some appealing backpropagation dynamics” - Andrej Karpathy. “I mentioned the remarkable results people are achieving with RNNs. Essentially all of these are achieved using LSTMs. They really work a lot better for most tasks!” - C. Olah. Text and speech: Google Translate, Apple’s Siri, Amazon’s Alexa.
  17. LSTM can learn in excess of 1000 discrete time steps. The algorithm is local in space and time. Computational complexity per time step and weight is O(1). Keras has an implementation of the LSTM layer. Lots of examples are available (TODO: 1, 2, 3)
  18. Keras and TF still need expertise unavailable to most engineers, but it’s a huge step in the right direction. Prediction throughput is slower and will need more engineering to make it work properly.
  19. Mention Convergence
  20. Have a clear plan for production. Data Science + Data Engineering = Win. Effort: data engineering > data science.
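
The evaluation measures named in the notes above (recall/TPR, FPR, precision, MSE and RMSE) are simple enough to compute by hand. The sketch below is only an illustration using the Python standard library: the function names and the toy labels are assumptions, not taken from the project, and the labels are made up to show the arithmetic on a roughly 90% normal / 10% failure split like the one on the Performance Evaluation slide.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), where True means 'failure'."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and pred:
            fp += 1
        elif truth and not pred:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Avoid ZeroDivisionError when a class is absent from the data."""
    return num / den if den else 0.0


def detection_rates(y_true, y_pred):
    """Recall (TPR), false positive rate and precision for the failure class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "recall": safe_div(tp, tp + fn),     # share of real failures caught
        "fpr": safe_div(fp, fp + tn),        # share of normal windows flagged
        "precision": safe_div(tp, tp + fp),  # share of alarms that were real
    }


def mse(actual, reconstructed):
    """Mean of the squared differences between two equal-length sequences."""
    return sum((a - r) ** 2 for a, r in zip(actual, reconstructed)) / len(actual)


def rmse(actual, reconstructed):
    """Square root of MSE, expressed in the units of the data."""
    return math.sqrt(mse(actual, reconstructed))


# Toy labels (made up): 18 normal windows, 2 failure windows.
y_true = [False] * 18 + [True] * 2
# The detector flags both failures plus one normal window (a false alarm).
y_pred = [False] * 17 + [True] * 3
rates = detection_rates(y_true, y_pred)
window_mse = mse([0.0, 0.5, -0.5], [0.0, 0.3, -0.1])
```

In this toy case both failures are caught, so recall is 1.0, while the single false alarm gives a precision of 2/3 and an FPR of 1/18.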