
Predictive Maintenance - Portland Machine Learning Meetup


Advances in machine learning and IoT have made it possible for manufacturers to build more effective predictive maintenance applications than ever before. Despite the huge potential machine learning offers for predictive maintenance, it's challenging to build solutions that can handle the speed of IoT data streams and the massive datasets required to train models that forecast rare events like mechanical failures. Solving these challenges requires knowledge of state-of-the-art dataware, such as MapR, and cluster computing frameworks, such as Spark, which give developers foundational APIs for consuming and transforming data into feature tables useful for machine learning.

Published in: Data & Analytics


  1. 1. A Machine Learning Practitioner’s Guide to Predictive Maintenance IAN DOWNARD idownard@mapr.com
  2. 2. © 2018 MapR Technologies 2 About Me Sr. Solution Architect at MapR Email: idownard@mapr.com Blog: bigendiandata.com
  3. 3. © 2018 MapR Technologies 3 Predictive maintenance can reduce… • equipment failures by 75% • downtime by 45% • maintenance costs by 30% US Department of Energy, August 2010 https://www.energy.gov/eere/femp/downloads/operations-and-maintenance-best-practices-guide
  4. 4. © 2018 MapR Technologies 4 “Manufacturers’ adoption of ML/AI will increase 38% in the next five years.” Digital Factories 2020: Shaping the future of manufacturing PricewaterhouseCoopers, 2017 If ML is so effective, why aren’t more people using it?
  5. 5. © 2018 MapR Technologies 5 https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Hidden Technical Debt in Machine Learning Systems Google, 2015 “Only a small fraction of real-world ML systems is composed of the ML code… The required surrounding infrastructure is vast and complex.”
  6. 6. © 2018 MapR Technologies 6 Data Collection Feature Extraction This talk is about Data Collection, Feature Engineering, and time-series forecasting. ML Code
  7. 7. Data Collection
  8. 8. © 2018 MapR Technologies 8 1. Collect data wherever possible (sensors, cameras, operator logs, weather, etc.) 2. Store that data for a long time. 3. Look for patterns using tools that can spot subtle trends. 4. Deploy monitoring agents to watch for anomalies or known failure modes. Predictive Maintenance A visit to Ford in Cologne (source: Gilly on Flickr)
  9. 9. © 2018 MapR Technologies 9 Data Collection
  10. 10. © 2018 MapR Technologies 10 Data Pipelines Ingest Persist Analyze / Operationalize IDEs, notebooks, and AI platforms ? Data Flow Data Platform(s)
  11. 11. © 2018 MapR Technologies 11 Data Pipelines Ingest Persist Analyze / Operationalize Data Flow IDEs, notebooks, and AI platforms
  12. 12. © 2018 MapR Technologies 12 Industry’s leading data storage platform for Big Data and Machine Learning Data Storage
  13. 13. © 2018 MapR Technologies 13 Data Exploration, Feature Engineering, and AI IDEs, notebooks, and AI platforms Programming Libraries Dataware Files, Tables, Streams Data Flow
  14. 14. © 2018 MapR Technologies 14 Dataflow Management • IDE with integrated monitoring and debugging tools • CI/CD capabilities built-in • Containerized approach that scales elastically, e.g. in Kubernetes. • Data validation on-the-fly • Dataflows that can span edge/on-prem/cloud infrastructure with guaranteed privacy and delivery.
  15. 15. © 2018 MapR Technologies 15 StreamSets Demo Demo Steps: https://github.com/mapr-demos/predictive-maintenance#streamsets-demonstration
  16. 16. Feature Engineering
  17. 17. © 2018 MapR Technologies 17 What does Factory IoT data look like? Sensor Data Device ID time x y z 1 8:00:00 .431 .123 .145 1 8:00:01 .735 .112 .672 1 8:00:02 .932 .141 .431 1 8:00:03 .988 .241 .625 Very important machinery
  18. 18. © 2018 MapR Technologies 18 Device ID time x y z _operator _weather 1 11:59:58 .431 .123 .145 Joe sunny 1 11:59:59 .735 .112 .672 Joe rain 1 12:00:00 .932 .141 .431 Moe sunny 1 12:00:01 .988 .241 .625 Moe sunny Feature engineering helps AI find correlations. Feature tables need to have flexible schemas.
  19. 19. © 2018 MapR Technologies 19 Device ID time x y z _subsystem _weekend 1 11:59:58 .431 .123 .145 Boiler False 2 11:59:59 .735 .112 .672 Chiller False 1 12:00:00 .932 .141 .431 Boiler True 3 12:00:01 .988 .241 .625 Fuel Supply True Derived features also make analysis easier. How would you write SQL logic for _weekend?
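As a concrete answer to the question above, here is a minimal sketch of deriving `_weekend` in pandas (the demo itself does feature engineering in Spark; the `event_time` column name and sample dates are illustrative):

```python
import pandas as pd

# Feature table rows with a timestamp column (names are illustrative).
df = pd.DataFrame({
    "deviceid": [1, 2, 1, 3],
    "event_time": pd.to_datetime([
        "2018-06-01 11:59:58",   # Friday
        "2018-06-01 11:59:59",   # Friday
        "2018-06-02 12:00:00",   # Saturday
        "2018-06-03 12:00:01",   # Sunday
    ]),
})

# dayofweek numbers Monday=0 ... Sunday=6, so 5 and 6 are weekend days.
df["_weekend"] = df["event_time"].dt.dayofweek >= 5
print(df["_weekend"].tolist())  # [False, False, True, True]
```

In SQL the same logic is typically an expression over `EXTRACT(DOW FROM event_time)`, though day-numbering conventions vary by dialect.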
  20. 20. © 2018 MapR Technologies 20 Device ID time x y z _subsystem _weekend 1 11:59:58 .431 .123 .145 Boiler False 2 11:59:59 .735 .112 .672 Chiller False 1 12:00:00 .932 .141 .431 Boiler True 3 12:00:01 .988 .241 .625 Fuel Supply True Derived features also make analysis easier. How would you filter on _subsystem without a full table scan?
  21. 21. © 2018 MapR Technologies 21 Device ID time x y z Remaining Life 30s to failure 1 8:00:00 .431 .123 .145 1 8:00:01 .735 .112 .672 1 8:00:02 .932 .141 .431 1 8:00:03 .988 .241 .625 Some features may be “lagging” Values can only be calculated once a future event occurs.
  22. 22. © 2018 MapR Technologies 22 Device ID time x y z Remaining Life 30s to failure 1 8:00:00 .431 .123 .145 3 true 1 8:00:01 .735 .112 .672 2 true 1 8:00:02 .932 .141 .431 1 true 1 8:00:03 --- --- --- 0 true Lagging features work like this: When a failure happens… …then lagging features get labeled.
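The labeling step described above can be sketched in pandas (a hedged, illustrative version: the demo does this in Spark, and the 30-second window and column names here are assumptions mirroring the slide's table):

```python
import pandas as pd

# Sensor readings leading up to an observed failure (illustrative data).
df = pd.DataFrame({
    "deviceid": [1, 1, 1, 1],
    "event_time": pd.to_datetime([
        "2018-06-01 08:00:00", "2018-06-01 08:00:01",
        "2018-06-01 08:00:02", "2018-06-01 08:00:03",
    ]),
})

# Once a failure event arrives, the lagging features can be filled in.
failure_time = pd.Timestamp("2018-06-01 08:00:03")

# Remaining Useful Life: seconds until the observed failure.
df["rul"] = (failure_time - df["event_time"]).dt.total_seconds()

# "About to fail" flag: row falls inside the 30-second pre-failure window.
df["about_to_fail"] = df["rul"] <= 30
print(df[["rul", "about_to_fail"]].to_string(index=False))
```

This reproduces the slide's table: RUL counts down 3, 2, 1, 0 and every row within the window is flagged true.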
  23. 23. © 2018 MapR Technologies 23 Spark database connectors Key Features: • Operate on data in Spark without data movement. • DB pushdown → fast filtering and sorting. DB connectors enable you to use feature tables without massive ETL into Spark executors.
  24. 24. © 2018 MapR Technologies 24 Ingest Stream Feature Engineering in Spark Feature storage in NoSQL DB SQL analytics in Drill ODBC connect to data science tools. Kafka API CRUD API Failure events Sensor data Feature Engineering Demo: github.com/mapr-demos/predictive-maintenance
  25. 25. © 2018 MapR Technologies 25 Feature engineering with Spark Define case class for incoming metrics. Subscribe to stream Read from stream
  26. 26. © 2018 MapR Technologies 26 Feature engineering with Spark Create lagging variables Derive _weekend feature Save feature table to DB
  27. 27. © 2018 MapR Technologies 27 Labeling lagging features with Spark Subscribe to stream containing failure notifications When there’s a failure, open the feature table Calculate the timestamps for when we consider failure “imminent”
  28. 28. © 2018 MapR Technologies 28 Labeling lagging features with Spark Label “AboutToFail” Label “RUL” Combine the lagging feature updates into one df Save to DB
  29. 29. © 2018 MapR Technologies 29 Feature table size Number of lagging variable records to update. Alert sent to Grafana Listening to stream for failure events.
  30. 30. © 2018 MapR Technologies 30 • Continuous time signals require high speed sampling. • Full resolution is required. – Aggregation hides important things. – High fidelity makes AI more effective. • Challenges: – Stream throughput congestion? – Stream transformation backlog? Architecting for Fast Data
  31. 31. © 2018 MapR Technologies 31 • Vibrations give the first clue that a machine is failing • Vibration sensors measure physical displacement • Capturing a 10kHz vibration requires > 20k samples / second Detecting Vibrations with FFTs Detecting vibration anomalies requires continuously processing high speed data streams.
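The FFT step can be sketched with NumPy (illustrative: the 25 kHz sample rate is simply one choice that satisfies the Nyquist requirement stated above for a 10 kHz vibration):

```python
import numpy as np

# Recover a vibration frequency with an FFT. The sample rate must exceed
# twice the highest frequency of interest (Nyquist), hence > 20k samples/sec.
rng = np.random.default_rng(42)
sample_rate = 25_000                       # samples/sec, > 2 x 10 kHz
t = np.arange(0, 1.0, 1.0 / sample_rate)   # one second of samples

# Simulated sensor signal: a 10 kHz vibration plus measurement noise.
signal = np.sin(2 * np.pi * 10_000 * t) + 0.1 * rng.standard_normal(t.size)

# Real-input FFT magnitude spectrum and its frequency bins.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1.0 / sample_rate)

dominant = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {dominant:.0f} Hz")  # ~10000 Hz
```

In production this computation would run continuously over windows of the incoming stream, e.g. inside a Spark job, with anomalies flagged when the spectrum deviates from a healthy baseline.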
  32. 32. © 2018 MapR Technologies 32 Can Spark distill fast data streams? (1 record / sec) >20k samples/sec Anomaly notifications Feature Store As long as Spark can process signals fast enough, this will work. What two things could go wrong?
  33. 33. © 2018 MapR Technologies 33 Spark scales via parallel compute. Anomaly notifications Feature Store If Spark computes FFTs too slow, then just run more Spark jobs. What if there’s too much data for one stream?
  34. 34. © 2018 MapR Technologies 34 Stream bottlenecks can be avoided by distributing data across topics and/or partitions. Anomaly notifications Feature Store Tip: make sure each producer has its own topic. This can significantly improve throughput (msgs/sec). Tip: Spark consumers can subscribe to multiple topics, so subscribe to all.
  35. 35. Machine Learning
  36. 36. © 2018 MapR Technologies 36 1. Regression: Predict the Remaining Useful Life (RUL) 2. Binary classification: Predict if an asset will fail within certain time frame (e.g. 50 days). 3. Multi-class classification: Predict if an asset will fail in different time windows (e.g. tomorrow, next week…) Three Types of Models for PdM
  37. 37. © 2018 MapR Technologies 37 Terminology: “units” vs “cell” http://colah.github.io/posts/2015-08-Understanding-LSTMs/ One LSTM cell 3 LSTM cells 3 memory cells 3 hidden nodes 3 hidden layers 3 neurons 3 units 1 visible layer Determining units is a process of trial and error.
  38. 38. © 2018 MapR Technologies 38 • Input shape: [samples, time steps, features]: • Samples: all the rows in our time-series training data • Time Steps: the look-back window, or sequence length. – LSTMs work better on windows with fewer than a couple hundred time steps. • Features: the signal columns we want the model to generalize over Model Input LSTM 1 LSTM 2 100 Units 0.2 Dropout 50 Units 0.3 Dropout Dense Sigmoid 0 or 1 result Input
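The [samples, time steps, features] reshaping can be sketched in NumPy (the window size and feature count here are illustrative, not values from the talk):

```python
import numpy as np

def make_windows(series: np.ndarray, window: int) -> np.ndarray:
    """Slide a look-back window over a [rows, features] time series,
    producing LSTM input of shape [rows - window + 1, window, features]."""
    return np.stack([series[i:i + window]
                     for i in range(len(series) - window + 1)])

# 10 sensor readings with 2 features each (e.g. x and y vibration).
readings = np.arange(20, dtype=float).reshape(10, 2)

X = make_windows(readings, window=4)
print(X.shape)  # (7, 4, 2): 7 samples, 4 time steps, 2 features
```

Each training sample is therefore a short history of readings, which is what lets the LSTM learn temporal patterns rather than single-row correlations.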
  39. 39. © 2018 MapR Technologies 39 Sequential: linear stack of layers Units: number of neurons Dropout: reduces overfitting by randomly dropping neurons. Sigmoid: squashes the output into (0, 1), which is then thresholded to a 0 or 1 prediction. Dense: a fully connected output layer Binary Cross Entropy: loss function for when you have just two classes (1 and 0). Adam Optimizer: learns quickly, has low memory usage, and is stable over a wide range of learning rates Model Structure LSTM 1 LSTM 2 100 Units 0.2 Dropout 50 Units 0.3 Dropout Dense Sigmoid 0 or 1 result Input
  40. 40. © 2018 MapR Technologies 40 Training • An epoch is one complete pass over the training data. • A batch size of 10 means we expose the network to 10 input sequences before updating the weights. • Batches also ensure we don’t try to load the entire training data into memory at once.
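The bookkeeping above can be made concrete with a little arithmetic (the sample counts here are illustrative):

```python
import math

# With batch training, weights are updated once per batch, and the data
# is traversed once per epoch.
samples, batch_size, epochs = 1000, 10, 5

updates_per_epoch = math.ceil(samples / batch_size)
total_weight_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_weight_updates)  # 100 500
```

So smaller batches mean more frequent (noisier) weight updates per epoch, while larger batches smooth the gradient at the cost of memory and update frequency.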
  41. 41. © 2018 MapR Technologies 41 Generating data with log-synth https://github.com/tdunning/log-synth
  42. 42. © 2018 MapR Technologies 42 References https://github.com/mapr-demos/predictive-maintenance
  43. 43. © 2018 MapR Technologies 43 References • Awesome LSTM implementation for predictive maintenance on aircraft engines, by Fidan Boylu Uz, PhD: https://github.com/Azure/lstms_for_predictive_maintenance/blob/master/Deep%20Learning%20Basics%20for%20Predictive%20Maintenance.ipynb • Good LSTM overview, by Jason Brownlee, PhD: https://machinelearningmastery.com/5-step-life-cycle-long-short-term-memory-models-keras/
  44. 44. © 2018 MapR Technologies 44 Check out my webinars: bit.ly/iot_webinar_part1 bit.ly/iot_webinar_part2
  45. 45. © 2018 MapR Technologies 45 Questions? https://mapr.com/ebooks IAN DOWNARD idownard@mapr.com
