The evolution of machine learning and IoT has made it possible for manufacturers to build more effective applications for predictive maintenance than ever before. Despite the huge potential that machine learning offers for predictive maintenance, it's challenging to build solutions that can handle the speed of IoT data streams and the massively large datasets required to train models that can forecast rare events like mechanical failures. Solving these challenges requires knowledge of state-of-the-art dataware, such as MapR, and cluster computing frameworks, such as Spark, which give developers foundational APIs for consuming and transforming data into feature tables useful for machine learning.
Other titles:
Ways and Means of Predictive Maintenance with Machine Learning
Demystifying Predictive Maintenance with Hands-On Machine Learning
https://www.meetup.com/Portland-Machine-Learning-Meetup/
And predictive maintenance.
This, from a seminal study by the DoE.
Note the date. This was way back in 2010!
The opportunities for IoT are even bigger now.
What makes ML difficult?
There’s a lot of specialized software needed to put ML into prod.
ML has a very different lifecycle too.
I’m excited to talk about this because I have pretty good ideas about how to build PdM, how to generate data, and how to play with Keras.
I work for MapR, but that’s not a big part of the story here.
My intent is to draw on personal experiences building PdM to help you learn what’s involved in building PdM regardless of your tool choice.
Advanced PdM involves not only time-series IIoT data, but also historical maintenance records, error logs, and machine and operator features.
No matter what you plan to do with the data, it must persist somewhere.
This is the point at which I need to mention MapR. MapR is dataware.
MapR facilitates data science, model dev, and ML in prod.
Scaling storage, scaling analytics, doing ML, putting analytical products into prod.
If you struggle with data storage, iterative data analysis, or using datasets for production apps, check out MapR.
Lots of tools to clean and analyze data.
But data cleansing and feature engineering require lots of trial and error.
So, any friction (e.g. data movement, schema discovery, proprietary query languages, etc) in data access is bad.
MapR reduces the barriers to saving data, analyzing it, augmenting it, and operationalizing it.
Now let’s talk about data flows. What do the processes that pull from MQTT or REST look like? Where do they run?
You can write them custom, or use a data pipeline tool like StreamSets.
“In the face of drift, in the face of change, in the face of unexpected data, changing business needs and logic, changing infrastructure, you're able to minimize the amount of downtime of the system and kind of keep it always on”
Demo script:
Now we go from talking about data collection to data transformation.
I’ll describe PdM feature engineering concepts and show how to implement them.
There may be some properties which correlate to failures.
Those properties may be calculated on-the-fly or derived by joining other datasets.
This requires data to be stored with flexible schemas.
Derived features can make analysis much easier.
Grouping sensors by system can also make analysis much easier.
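As a sketch of both ideas — a derived feature computed on-the-fly, and sensors grouped by system — here is what that can look like in plain Python. The field names (`voltage`, `current`, `power`, the system names) are hypothetical, for illustration only:

```python
from collections import defaultdict

# Hypothetical raw sensor readings; field names are made up for illustration.
readings = [
    {"system": "cooling", "sensor": "pump_1", "voltage": 11.9, "current": 2.1},
    {"system": "cooling", "sensor": "pump_2", "voltage": 12.1, "current": 1.9},
    {"system": "drive", "sensor": "motor_1", "voltage": 48.0, "current": 5.5},
]

# Derived feature: instantaneous power, computed on-the-fly from two raw fields.
for r in readings:
    r["power"] = r["voltage"] * r["current"]

# Group sensors by the system they belong to, to analyze each system as a unit.
by_system = defaultdict(list)
for r in readings:
    by_system[r["system"]].append(r["sensor"])
```

Analysis then works per system (e.g. "is total cooling power drifting up?") instead of per raw sensor stream.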
What if the data is sampled frequently?
What if failures are rare?
Here’s an example of MapR-DB being used from Spark to update lagging features.
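The MapR-DB and Spark specifics aren't reproduced here, but the core idea of a lagging feature — attaching recent history to each row — can be sketched in plain Python. The early-history `None` values mirror the nulls a Spark window function (e.g. `pyspark.sql.functions.lag`) would produce:

```python
def add_lag_features(values, lags=(1, 2)):
    """For each point in a time series, attach the values from `lags` steps back.

    Rows too early to have history get None, like the nulls produced by
    a lag() window function in Spark SQL.
    """
    rows = []
    for i, v in enumerate(values):
        row = {"value": v}
        for k in lags:
            row[f"lag_{k}"] = values[i - k] if i >= k else None
        rows.append(row)
    return rows

temps = [70.1, 70.3, 71.0, 74.2]
features = add_lag_features(temps)
```

Each output row now carries its own recent past, which is what lets a model learn "temperature rising fast" rather than just "temperature high."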
Unusual vibrations give you the first clue that a machine is nearing the end of its useful life,
so it's very important to detect those anomalies.
Vibration sensors measure the displacement or velocity of motion thousands of times per second.
Acoustic sensors work the same way.
Two things could go wrong:
Could have too much data. E.g. too many motors / data sources
Spark SQL filtering and FFT computation could be too slow.
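As a rough illustration of the FFT step (my own sketch, assuming NumPy and a synthetic signal), extracting the dominant vibration frequency from one window of samples looks like this:

```python
import numpy as np

fs = 1000                                  # sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
signal = np.sin(2 * np.pi * 50 * t)        # synthetic 50 Hz vibration

spectrum = np.abs(np.fft.rfft(signal))     # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
dominant = freqs[np.argmax(spectrum)]      # frequency carrying the most energy
```

An anomaly shows up as energy shifting to new frequencies relative to a healthy baseline. This per-window computation is cheap for one motor, but repeated across thousands of motors at thousands of samples per second it becomes exactly the filtering/FFT bottleneck described above.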
Now we go from talking about industry trends to more of a how-to guide.
There is no rule of thumb for the number of hidden nodes you should use.
It is something you have to figure out through trial and error.
Dropout forces better generalization
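To see why dropout forces generalization, here is the mechanism itself — "inverted" dropout, sketched with NumPy rather than Keras's `Dropout` layer:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of activations during training,
    scaling the survivors by 1/(1 - rate) so the expected sum is unchanged."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
acts = np.ones(10_000)
out = dropout(acts, rate=0.5, rng=rng)
```

Because every forward pass sees a different random subnetwork, no neuron can rely on any other specific neuron being present — the redundancy this forces is the generalization. In Keras this is simply a `Dropout(0.5)` layer between layers.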
We must specify a loss function and an optimizer function when compiling the model.
The loss function penalizes the model for inaccurate predictions. We use binary cross entropy because we have just two classes (1 and 0).
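Concretely, binary cross entropy for a true label y and predicted probability p is -[y·log(p) + (1-y)·log(1-p)]. A quick NumPy check (my own illustration, not Keras internals):

```python
import numpy as np

def binary_crossentropy(y_true, p_pred):
    # Clip to avoid log(0) when predictions saturate at exactly 0 or 1.
    p = np.clip(p_pred, 1e-7, 1 - 1e-7)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

confident_right = binary_crossentropy(1, 0.99)  # small loss
coin_flip = binary_crossentropy(1, 0.5)         # loss = ln 2
confident_wrong = binary_crossentropy(1, 0.01)  # large loss
```

The loss grows without bound as a prediction becomes confidently wrong, which is exactly the penalty structure we want for rare failure events.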
The optimizer defines how to adjust neuron weights in response to inaccurate predictions. The Adam optimizer makes sense, because I’ve read that Adam learns fast, is stable over a wide range of learning rates, and has comparatively low memory requirements. Keras uses a default learning rate of 0.001.
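For intuition, here is a minimal sketch of the Adam update rule — the same per-weight math Keras applies, shown minimizing a toy quadratic, with hyperparameters set to Keras's defaults:

```python
import math

def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=5000):
    """Adam: adapt each step using running moments of the gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = x^2, whose gradient is 2x; the minimum is at 0.
x_min = adam_minimize(lambda x: 2 * x, x0=3.0)
```

The per-parameter normalization by the second moment is why Adam is stable across a wide range of learning rates, and the two scalars of state per weight are why its memory overhead is modest.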
I like to think of LSTM as doing an exponential rolling average.
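The analogy: an exponential rolling average keeps a decayed memory of past inputs, loosely the way an LSTM's forget gate scales down its cell state each step. A sketch of the average (this is intuition, not the actual LSTM equations):

```python
def ema(xs, alpha=0.3):
    """Exponentially weighted rolling average: each new value blends in with
    weight alpha, and the running memory decays by (1 - alpha) per step --
    loosely analogous to a forget gate scaling the cell state."""
    avg = xs[0]
    history = [avg]
    for x in xs[1:]:
        avg = alpha * x + (1 - alpha) * avg
        history.append(avg)
    return history

smoothed = ema([0.0, 0.0, 10.0, 10.0, 10.0])
```

The key difference is that an LSTM learns its decay (and what to write into memory) from data, where the EMA's alpha is fixed.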
Training:
I fit the network with 5 epochs and a batch size of 10.
An epoch is when you go over the complete training data once.
A batch size of 10 means we expose the network to 10 input sequences before updating the weights. Batches also ensure we don’t try to load the entire training data into memory at once.
The fit function returns a history object that provides a summary of model accuracy recorded at each epoch.
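To make epochs, batches, and the history object concrete, here is the loop they describe, sketched in plain Python (Keras runs the equivalent inside `model.fit`; the loss here is a placeholder, not a real computation):

```python
def fit_sketch(data, epochs=5, batch_size=10):
    """Mimic the shape of model.fit: several full passes (epochs) over the
    data, one weight update per batch, one recorded metric per epoch."""
    history = {"loss": []}
    for epoch in range(epochs):
        epoch_loss = 0.0
        # One epoch = one complete pass over the training data, in batches,
        # so we never need the whole dataset in memory at once.
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # ... forward pass, loss, and a weight update would happen here ...
            epoch_loss += sum(batch)  # placeholder stand-in for a real loss
        history["loss"].append(epoch_loss / len(data))
    return history

hist = fit_sketch(list(range(100)), epochs=5, batch_size=10)
```

The returned `history` dict has one entry per epoch, which is what you'd plot to check whether training has converged or begun to overfit.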