These are our Strata Beijing 2017 presentation slides, in which we show how to use data from a movement sensor, in real time, to do anomaly detection at scale using standard enterprise big data software.
Industry 4.0 is all about the digitization of the factory. Sensors everywhere. All of this data opens up new opportunities for automation, cost savings, higher productivity, and higher quality.
What makes a system Industry 4.0?
Interoperability — machines, devices, sensors and people that connect and communicate with one another.
Information transparency — the systems create a virtual copy of the physical world through sensor data in order to contextualize information.
Technical assistance — both the ability of the systems to support humans in making decisions and solving problems, and the ability to assist humans with tasks that are too difficult or unsafe for them.
Decentralized decision-making — the ability of cyber-physical systems to make simple decisions on their own and become as autonomous as possible.
Our talk will focus on Data & Analytics for improving the operational efficiency of factories with many industrial robots.
We combine smart sensors, big data analytics (ML), cloud computing, and AR to power a real-world, state-of-the-art predictive analytics system.
Order parts predictively
Increased factory efficiency
Robots operate at peak efficiency
We have a business goal, a robot and a sensor to work with.
We are gonna have to data science the shit out of this
https://www.quora.com/Which-type-of-Sensors-use-in-industrial-robots
Interesting: no motion sensors! That's the justification here.
Based on known real-world requirements of state-of-the-art Japanese car-parts manufacturers.
Recall is more important than precision: missing a real failure is far more expensive than a false alarm. Still, too many false alarms will increase costs and make trusting the system very hard.
Precision can be very low initially and the system can still be useful IF you can trust the predictions. The models can then be improved over time.
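As a quick illustration (toy labels, made up for this example), scikit-learn can report both metrics; note how the false alarms lower precision while recall stays perfect:

    # Toy example: recall vs. precision with scikit-learn (labels are made up).
    from sklearn.metrics import precision_score, recall_score

    y_true = [0, 0, 1, 1, 0, 1, 0, 0]  # ground truth: three real failures
    y_pred = [0, 1, 1, 1, 0, 1, 1, 0]  # alarms: all failures caught, two false alarms

    print("recall:", recall_score(y_true, y_pred))        # 1.0  (no missed failures)
    print("precision:", precision_score(y_true, y_pred))  # 0.6  (false alarms hurt)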
Scale with the number of sensors, robots, and factories: a few GB a day quickly becomes many GB per hour, or even per minute. This is comfortably handled on moderate-sized clusters (5-25 nodes) using the current big data platforms already used by attendees of Strata.
Working software over complex implementations that never get done
When there is no anomaly, a green mark is displayed, as you can see.
20-minute mark
What do we even want?!
I.e. (steps 4-7 are sketched below):
1) Data gathering
2) Feature selection, extraction, engineering, and data transformation
3) Pick candidate algorithms
4) Build a model using your library/tool of choice
5) Evaluate according to previously defined metrics
6) If not good enough, try a different approach, different features, or different method parameters
7) Otherwise, extract the model and put it into production!
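A rough sketch of the loop in steps 4-7, using scikit-learn's IsolationForest as a stand-in detector (not the approach from the talk) and made-up synthetic data:

    # Hypothetical sketch of steps 4-7: build, evaluate, iterate.
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.metrics import recall_score

    rng = np.random.default_rng(0)
    healthy = rng.normal(0.0, 0.1, size=(1000, 3))   # synthetic healthy sensor features
    failures = rng.normal(0.8, 0.3, size=(20, 3))    # a few labeled failures for evaluation

    for contamination in (0.01, 0.05, 0.10):         # step 6: try different parameters
        model = IsolationForest(contamination=contamination, random_state=0)
        model.fit(healthy)                           # step 4: train on healthy data only
        flagged = (model.predict(failures) == -1).astype(int)   # -1 means "anomaly"
        recall = recall_score(np.ones(20, dtype=int), flagged)  # step 5: our key metric
        if recall >= 0.95:                           # step 7: good enough for production
            break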
Simplifications:
Data is centered around 0
Data is scaled [-1,1]
No missing values
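A minimal numpy sketch of these three simplifications (the readings are made up, and the scaling step assumes no column is constant):

    import numpy as np

    X = np.array([[0.1, 1.2],
                  [0.5, 0.7],
                  [np.nan, 0.3],
                  [0.9, 0.4]])          # made-up sensor readings

    X = X[~np.isnan(X).any(axis=1)]     # no missing values: drop incomplete rows
    X = X - X.mean(axis=0)              # center each column around 0
    X = X / np.abs(X).max(axis=0)       # scale each column into [-1, 1]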
Mention why we are doing it with machine learning at all!
No hand-written rules: the system automatically learns the best parameters for each application, without new coding and without relying on supervised techniques.
Especially good when we don’t know what we are looking for: machines can break in a variety of ways.
Peeking: an ML modeling mistake where the data used to train a model includes information about the answer.
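A common instance is fitting a scaler on all the data before splitting; a small sketch of the safe version (synthetic data):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.random.default_rng(0).normal(size=(100, 3))
    X_train, X_test = X[:80], X[80:]

    # Peeking would be StandardScaler().fit(X): test-set statistics leak into training.
    scaler = StandardScaler().fit(X_train)   # statistics come from the training split only
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)        # test data never influences the fit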
Short mention of what a "large" reconstruction error means: a discussion of thresholds and why the standard deviation (SD) is a good choice.
MSE measures the average of the squares of the errors or deviations—that is, the difference between the estimator and what is estimated.
The MSE is a measure of the quality of an estimator—it is always non-negative, and values closer to zero are better.
Note: RMSE is popular too; it is simply the square root of MSE, so it carries the same information in the original units.
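A small numpy sketch of the thresholding idea (the reconstructions here are faked with noise just so the snippet runs; in the real system they come from the trained model):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    X_hat = X + rng.normal(0.0, 0.05, size=X.shape)   # stand-in reconstructions

    errors = ((X - X_hat) ** 2).mean(axis=1)          # MSE per sample
    threshold = errors.mean() + 3 * errors.std()      # "large" = more than 3 SD above the mean
    anomalies = np.where(errors > threshold)[0]       # samples flagged as anomalous
    # RMSE per sample would be np.sqrt(errors): same ranking, original units.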
Circle back to slide
RNNs can remember their former inputs and operate over a sequence of vectors. Good for time series?
Training with backpropagation through time is unstable [1]
The effective limit of plain RNNs is 5-10 discrete time steps [2]
“Works slightly better (than RNN) in practice, owing to its more powerful update equation and some appealing backpropagation dynamics” - Andrej Karpathy
I mentioned the remarkable results people are achieving with RNNs. Essentially all of these are achieved using LSTMs. They really work a lot better for most tasks! – C Olah
Text and speech: Google Translate, Apple's Siri, Amazon's Alexa
LSTM
Can learn across more than 1,000 discrete time steps
Algorithm is local in space and time
Computational complexity per time step/weight is O(1)
Keras has an implementation of the LSTM layer
Lots of examples are available (TODO: 1, 2, 3)
Keras and TF still need expertise unavailable to most engineers, but they are a huge step in the right direction
Prediction throughput is slower, so it will need more engineering to work properly in production
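As a hypothetical sketch of what this looks like in Keras (the layer sizes, window length, and random training data are illustrative only, not our production model):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

    timesteps, n_features = 50, 3                        # 50-step windows of 3 sensor channels

    model = Sequential([
        LSTM(32, input_shape=(timesteps, n_features)),   # encode a window into 32 dims
        RepeatVector(timesteps),                         # repeat the code for each time step
        LSTM(32, return_sequences=True),                 # decode back into a sequence
        TimeDistributed(Dense(n_features)),              # reconstruct each reading
    ])
    model.compile(optimizer="adam", loss="mse")          # reconstruction error is the MSE

    X = np.random.default_rng(0).normal(size=(256, timesteps, n_features)).astype("float32")
    model.fit(X, X, epochs=2, batch_size=32)             # autoencoder: target equals input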
Mention Convergence
Have a clear plan for production
Data Science + Data Engineering = Win
Effort: Data engineering > data science