© 2017 MapR Technologies, MapR Confidential
Chicago Advanced Analytics Meetup:
IoT Predictive Maintenance using Recurrent Neural Networks
Justin Brandenburg
Data Scientist
10/12/17
Predictive Maintenance Importance
“The best way of thinking about predictive maintenance is by tying it
into a revenue stream. When your machines are up and running, you’re
making money. Instead of just looking at the average time between
failures, you’re looking for subtle clues within the machine itself. You’re
measuring sound, heat, vibration, tilt, acceleration, compression,
humidity, and checking to see if any of those are out of spec.”
- Greg Fell, former CIO of Terex, a heavy equipment manufacturer
Predictive Maintenance
• The idea behind predictive maintenance is that the failure patterns of various
types of equipment are predictable. If we can accurately predict when a piece of
hardware will fail, and replace that component before it fails, we
can achieve much higher levels of operational efficiency.
• As more devices include sensors and other components that
send diagnostic reports, predictive maintenance using big data becomes
increasingly accurate and effective.
Data is Generated One Event at a Time
{ "time": "6:01.103", "event": "RETWEET", "location": { "lat": 40.712784, "lon": -74.005941 } }
{ "time": "5:04.120", "severity": "CRITICAL", "msg": "Service down" }
{ "card_num": 1234, "merchant": "Apple", "amount": 50 }
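Handling data one event at a time can be sketched in Python. This is a minimal illustration, assuming the events arrive as newline-delimited JSON shaped like the three examples above; the `event_kind` rule and its `social`/`log`/`transaction` labels are hypothetical and not part of any MapR API.

```python
import json

# Hypothetical newline-delimited feed containing the three example events.
RAW_EVENTS = """\
{"time": "6:01.103", "event": "RETWEET", "location": {"lat": 40.712784, "lon": -74.005941}}
{"time": "5:04.120", "severity": "CRITICAL", "msg": "Service down"}
{"card_num": 1234, "merchant": "Apple", "amount": 50}
"""

def event_kind(event):
    """Guess an event's type from the keys it carries (illustrative rule only)."""
    if "event" in event:
        return "social"
    if "severity" in event:
        return "log"
    if "card_num" in event:
        return "transaction"
    return "unknown"

def process_stream(lines):
    """Parse and label each event as it arrives instead of waiting for a batch."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        yield event_kind(event), event

for kind, event in process_stream(RAW_EVENTS.splitlines()):
    print(kind, event)
```

Because `process_stream` is a generator, each event is handled as soon as it is read, which mirrors how a streaming consumer sees the data.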
IoT Real Time Monitoring
[Figure: Data Platform with separate streams for Welding Sensors, Press Sensors, Prod Line, and Paint, each with a topic]
• Insight on the entire process in real time
• Insight on individual components in real time
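The stream-per-source, topic-per-feed layout on this slide can be sketched with an in-memory stand-in. This is not the MapR Streams API; `InMemoryStream`, the topic names, and the readings are hypothetical. Reading one topic gives the component-level view, while inspecting all topics gives the process-level view.

```python
from collections import defaultdict, deque

class InMemoryStream:
    """Toy stand-in for a stream with named topics (hypothetical, not MapR Streams)."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, record):
        """Append a record to the end of a topic."""
        self.topics[topic].append(record)

    def poll(self, topic):
        """Return the oldest unread record on a topic, or None if it is empty."""
        queue = self.topics[topic]
        return queue.popleft() if queue else None

stream = InMemoryStream()
# Hypothetical sensor readings, each tagged with the topic it belongs to.
readings = [
    ("welding", {"sensor": "w1", "temp": 410.2}),
    ("press", {"sensor": "p7", "pressure": 88.1}),
    ("paint", {"sensor": "pt3", "humidity": 0.41}),
    ("welding", {"sensor": "w2", "temp": 398.7}),
]
for topic, record in readings:
    stream.publish(topic, record)

print(stream.poll("welding"))                             # component view: one topic
print({t: len(q) for t, q in stream.topics.items()})      # process view: all topics
```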
From Monitoring to Classifying to Predicting
• Monitoring still requires active involvement and quick response
– An oil well indicating increased temperature or volume
– Network traffic indicating botnet activity or an insider threat
• What are we monitoring?
– Is this behavior normal?
• Based on what we are monitoring, can we predict what will happen?
Classifying vs Predicting in IoT
• Classification
• Prediction – Can we predict when this non-normal behavior will occur?
Non-Normal
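The classification side can be sketched as a simple rule: learn a "normal" band from historical readings and label each new reading against it. This is a minimal sketch, assuming a mean ± k·σ band with k = 3 and hypothetical temperature readings; a real classifier would typically be learned from labeled data. Prediction, by contrast, asks whether a future reading will fall outside the band.

```python
from statistics import mean, stdev

def fit_normal_band(history, k=3.0):
    """Learn a 'normal' band as mean +/- k sample standard deviations (assumed rule)."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma

def classify(reading, band):
    """Classification: label the current reading as normal or non-normal."""
    low, high = band
    return "normal" if low <= reading <= high else "non-normal"

# Hypothetical historical temperature readings from one device.
history = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
band = fit_normal_band(history)
print(classify(20.1, band))  # normal
print(classify(25.0, band))  # non-normal
```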
Deep Learning
Deep learning is a particular subset of ML methodologies using
artificial neural networks (ANN)
• Successfully applied to many different domains (images, text, video,
speech, and vision)
• The success of DL is also due to the availability of more training data (such as
ImageNet for images), relatively low-cost storage, and increased
computational power
Deep Learning Implementations
• Convolutional Neural Networks: feature extraction and classification of images
• Deep Neural Networks: providing lift for classification and forecasting models
• Recurrent Neural Networks: for sequences of events (sentences or time series)
Recurrent Neural Network at a Glance
• A neural network that can be used when your data is treated as a
sequence, where the particular order of the data points matters
• Sometimes, the input is a sequence and the output is a single
vector, or the other way around.
Recurrent Neural Network Topology
Unrolling through time
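The recurrence can be sketched in plain Python to show what "unrolling through time" means: the same weights are applied at every step, and the hidden state carries information from earlier inputs forward. This is a minimal Elman-style RNN forward pass with scalar inputs and hand-picked, hypothetical weights; it is not TensorFlow's implementation and does no training.

```python
import math

def rnn_forward(xs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a simple (Elman) RNN over a sequence of scalar inputs.

    W_xh: input->hidden weights (length H)
    W_hh: hidden->hidden weights (H x H nested list)
    b_h:  hidden bias (length H)
    W_hy: hidden->output weights (length H)
    b_y:  output bias (scalar)
    Returns one output per time step.
    """
    H = len(b_h)
    h = [0.0] * H  # initial hidden state h_0
    ys = []
    for x in xs:  # unrolled loop: the same weights are reused at every step
        # h_t = tanh(W_xh * x_t + W_hh . h_{t-1} + b_h); the comprehension reads the old h
        h = [
            math.tanh(W_xh[i] * x + sum(W_hh[i][j] * h[j] for j in range(H)) + b_h[i])
            for i in range(H)
        ]
        ys.append(sum(W_hy[i] * h[i] for i in range(H)) + b_y)  # y_t = W_hy . h_t + b_y
    return ys

# Hypothetical weights for a 2-unit hidden layer.
params = dict(W_xh=[0.8, -0.5], W_hh=[[0.3, 0.1], [0.0, 0.4]],
              b_h=[0.0, 0.1], W_hy=[1.0, -1.0], b_y=0.0)
outputs = rnn_forward([0.5, 0.1, -0.3, 0.8], **params)
print(outputs[-1])  # many-to-one: keep only the last output
```

Keeping every element of `outputs` gives a sequence-to-sequence model; keeping only the last gives the "sequence in, single vector out" case from the previous slide.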
Other RNN Implementations
TensorFlow
TensorFlow is an open source software library for numerical
computation using data flow graphs
• Developed by Google, released to the open-source community in November 2015, and
quickly became one of the most popular deep learning frameworks
• Two months after its release, it was already the most-forked
ML repository on GitHub
• Built on C++ with a Python interface
What is a Tensor?
A tensor is an n-dimensional array
• 1D is a vector
• 2D (m x m) tensor is a square matrix of numbers (m numbers tall and
m numbers wide)
• 3D (m x m x m) tensor is a cube array (m tall, m wide, m deep)
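Rank and shape can be illustrated with nested Python lists. This is a minimal sketch; the `shape` helper is hypothetical and assumes regular (non-ragged) nesting. For context, TensorFlow's RNN functions typically take a 3D input of shape (batch, time steps, features), so a batch of sensor windows is a rank-3 tensor.

```python
def shape(tensor):
    """Return the shape of a regular nested-list tensor, e.g. (2, 3)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None  # descend along the first element
    return tuple(dims)

scalar = 5.0                                             # rank 0
vector = [1.0, 2.0, 3.0]                                 # rank 1
matrix = [[1, 2], [3, 4]]                                # rank 2 (2 x 2)
cube = [[[0] * 2 for _ in range(2)] for _ in range(2)]   # rank 3 (2 x 2 x 2)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "rank", len(shape(t)), "shape", shape(t))
```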
Why TensorFlow for this problem?
• TensorFlow has rich documentation
• Works on CPUs and GPUs (most DL frameworks can)
• Versions 1.x and above provide higher-level function
abstractions
• Once the model is trained, tested, and optimized, it can be deployed
to edge computing infrastructure or containers
What are we working with?
• Challenge: A sensor attached to an automated manufacturing
device captures position and calibration at each timestamp.
The sensor captures real-time data on the device and its current
health. The data is stored for historical analysis to identify trends
and patterns that determine whether any devices need to be taken out of
production for health checks and maintenance.
• Data: 2,013 .dat files that, when unpacked, were in XML format
Workflow
• Import Data into environment
• Perform data transformations
• Exploration of historical data
• Model Construction
• Model Testing
• Deploy Model into Streaming Consumer
• Integrate Visualization
Data Import
• MapR-FS allows a user to ingest any file type
– The filesystem offers schema-on-read rather than schema-on-write.
Schema-on-write requires knowing your schema before data is written;
when data is read, it comes back in the schema defined up front.
Schema-on-read allows data to be loaded as-is, with no preprocessing,
removing obstacles to data capture.
• Data was uploaded as a compressed file into MapR-FS via the Hue UI
– This can also be done via NFS or scp
• Files were extracted from the .zip file
• The extracted files had a .dat extension, which was changed to .xml
using standard bash commands
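The unzip-and-rename steps can be sketched with Python's standard library instead of bash. This assumes MapR-FS is reachable as an ordinary filesystem path (for example through an NFS mount); `extract_and_rename` and the paths are hypothetical. Renaming only changes the extension: the contents are already XML, which is what schema-on-read permits.

```python
import zipfile
from pathlib import Path

def extract_and_rename(zip_path, out_dir):
    """Extract zip_path into out_dir, then change every *.dat extension to *.xml."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    renamed = []
    # sorted() materializes the file list first, so renaming cannot disturb the walk.
    for dat_file in sorted(out.rglob("*.dat")):
        xml_file = dat_file.with_suffix(".xml")
        dat_file.rename(xml_file)
        renamed.append(xml_file)
    return renamed

# Hypothetical usage:
# extract_and_rename("sensor_data.zip", "sensor_xml")
```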
Data Transformation
• Use Spark to do bulk ETL from many XML files into a single CSV
• XML format
• DataFrame
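The many-XML-to-one-CSV transformation can be sketched in a single process with Python's standard library; this swaps in `xml.etree.ElementTree` and `csv` for Spark purely to show the shape of the ETL. The `<record>` tag and field names are assumptions about the XML layout, not the real sensor schema. In Spark with the spark-xml package, the `rowTag` option plays the role that `record_tag` plays here, and Spark parallelizes the work across files.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

def xml_dir_to_csv(xml_dir, csv_path, record_tag="record"):
    """Flatten every <record_tag> element in xml_dir/*.xml into rows of one CSV.

    Each child element of a record becomes a column; record_tag is an assumed
    tag name. Returns the number of rows written.
    """
    rows, fields = [], []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        for rec in root.iter(record_tag):
            row = {"source_file": xml_file.name}
            for child in rec:
                row[child.tag] = (child.text or "").strip()
            rows.append(row)
            for key in row:  # grow the header as new fields appear
                if key not in fields:
                    fields.append(key)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Hypothetical usage:
# xml_dir_to_csv("sensor_xml", "sensor_data.csv")
```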
Data Exploration
• Use Jupyter Notebook for interactive data exploration and model
development
Data Prep and Model Building
• Use Jupyter Notebook to prep data, develop the model,
and set hyperparameters
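A typical prep step for a one-step-ahead RNN is to slice the series into fixed-length input windows, each paired with the value that follows it. This is a minimal sketch, assuming a univariate series; `make_windows` is hypothetical, and `n_steps` is one of the hyperparameters set in the notebook. The resulting windows would then be reshaped to (batch, n_steps, 1) before being fed to a TensorFlow RNN.

```python
def make_windows(series, n_steps, horizon=1):
    """Turn a 1-D series into (input window, target) pairs for supervised training.

    Each input is n_steps consecutive values; the target is the value
    `horizon` steps after the window ends (horizon=1 means next period).
    """
    X, y = [], []
    for start in range(len(series) - n_steps - horizon + 1):
        end = start + n_steps
        X.append(series[start:end])
        y.append(series[end + horizon - 1])
    return X, y

series = [10, 11, 12, 13, 14, 15]
X, y = make_windows(series, n_steps=3)
# X -> [[10, 11, 12], [11, 12, 13], [12, 13, 14]], y -> [13, 14, 15]
```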
Model Training and Testing
• Use Jupyter Notebook for training and testing the model
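For time series, testing usually means holding out the most recent windows rather than a random sample, so the model is scored only on data that comes after its training data. This is a minimal sketch with hypothetical windowed data; it scores a naive "last value" baseline with RMSE, which gives a floor that a trained RNN should beat.

```python
import math

def chronological_split(X, y, test_fraction=0.2):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical windowed data: each input window is followed by its next value.
X = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
y = [4, 5, 6, 7, 8]
X_train, y_train, X_test, y_test = chronological_split(X, y)

# Naive "last value" baseline: predict that the next value repeats the last one.
baseline = [window[-1] for window in X_test]
print(rmse(y_test, baseline))
```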
Model Deployment
• The tested RNN model can be deployed on new data as
it streams from the sensor attached to the device
• The model generates an alert if the predicted metric exceeds a historically
normal threshold
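The alerting logic of a streaming consumer can be sketched as a loop that keeps a sliding window of recent readings, asks the model for the next-period value, and alerts when the forecast crosses the threshold. This is a minimal sketch under stated assumptions: the threshold rule (historical mean plus 3 standard deviations) is assumed, and `placeholder_predict` is a hypothetical stand-in for the trained RNN, not the deck's model.

```python
from collections import deque
from statistics import mean, stdev

def make_threshold(history, k=3.0):
    """Upper alert threshold: historical mean plus k standard deviations (assumed rule)."""
    return mean(history) + k * stdev(history)

def consume(readings, predict, threshold, n_steps=3):
    """Score each new reading as it arrives; yield (step, forecast) on predicted breaches."""
    window = deque(maxlen=n_steps)
    for t, value in enumerate(readings):
        window.append(value)
        if len(window) < n_steps:
            continue  # not enough history yet to make a prediction
        forecast = predict(list(window))
        if forecast > threshold:
            yield t, forecast

def placeholder_predict(window):
    """Hypothetical stand-in for the trained RNN: predicts the window mean."""
    return sum(window) / len(window)

# Hypothetical historical and live readings.
history = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
threshold = make_threshold(history)
live = [20.0, 20.1, 20.2, 23.0, 24.0, 25.0]
for t, forecast in consume(live, placeholder_predict, threshold):
    print(f"ALERT at step {t}: predicted {forecast:.2f} > {threshold:.2f}")
```

Passing the predictor in as a function keeps the consumer loop unchanged when the placeholder is swapped for the real model.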
Visualization
• A dashboard can show real-time trends and
behaviors of the sensor data along with the next-period
prediction
Improvement?
• Implement an LSTM RNN
• Change training batch sizes
• Adjust hyper-parameters
• Multi-variate inputs
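The first improvement, an LSTM cell, adds gates that decide what to forget, what to store, and what to expose, which helps the network keep information over longer sequences than a plain RNN. This is a minimal single-unit LSTM step in plain Python with hypothetical parameter names (`wi`, `ui`, `bi`, and so on); it follows the standard LSTM equations but is not TensorFlow's cell implementation and is not numerically hardened for extreme inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM with scalar input.

    p maps names to scalars: w* (input weight), u* (recurrent weight), b* (bias)
    for the input gate i, forget gate f, output gate o, and candidate g.
    """
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])     # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])     # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])     # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])   # candidate value
    c = f * c_prev + i * g   # cell state: keep part of the old, add part of the new
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c

def lstm_forward(xs, p):
    """Run the cell over a sequence, returning the hidden state at each step."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        hs.append(h)
    return hs

# Hypothetical parameters: a large forget bias and a closed input gate preserve memory.
names = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
p_keep = dict({n: 0.0 for n in names}, bf=100.0, bi=-100.0)
h, c = lstm_step(0.0, 0.0, 1.0, p_keep)
print(c)  # cell state carried forward almost unchanged
```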
Points to Remember
• TF is just one ML tool among many (but a great one)
• Choosing the right one depends on your problem
– Ex: Supervised or Unsupervised learning
• How does this model or solution scale?
• Once a model is optimized and insight gained, how can I deploy
my model to help my organization?
• Tools are never used in isolation; the platform matters!
– Support the Workflow, Not Just Modeling
Data Science In IoT…
… still evolving.
Q&A
ENGAGE WITH US
@mapr
Blog: https://mapr.com/blog/
MapR Academy
http://learn.mapr.com/
