Data Analysis in Industrial Applications:
From Predictive Maintenance
to Image Understanding
2016 Taipei Tech Workshop, Technikum Wien, 22.11.2016
DI Matthias Wastian (dwh GmbH), DI Dominik Brunmeir (dwh OG)
matthias.wastian@dwh.at, dominik.brunmeir@dwh.at
Presentation Outline
• Who We Are
• Some Definitions
– Machine Learning
– Data Mining
– Deep Learning
• Natural Language Processing
– Telecommunication Patent Classification
– Speech Analysis of Austrian Politicians
• Predictive Maintenance
– Server Outage Prediction
• Image Understanding
– Object Detection Using HOG Features
– Deep Inspection
– Automatic Optical Inspection of Humidity Sensors
dwh GmbH
• Founded 2004, GmbH since 2010
• 16 employees
• 17 master theses
• 6 finished dissertations
• 6 current dissertations
• >90 publications
• Bosses:
– Niki Popper
– Michael Landsiedl
Definitions
Machine Learning
• is a field of study that gives computers the ability to learn without being
explicitly programmed (Arthur Samuel, 1959).
• The field of machine learning is concerned with the question of how to
construct computer programs that automatically improve with experience.
• A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E (Tom Mitchell,
1997).
Data Mining
• is the analysis of (often large) observational data sets to find unsuspected
relationships and to summarize the data in novel ways that are both
understandable and useful to the data owner (David Hand, 2001).
Definitions
Deep Learning
• is learning using one of a set of algorithms that
attempt to model high-level abstractions in data
by using model architectures composed of
multiple non-linear transformations.
• One of the promises of deep learning is replacing
handcrafted features with efficient algorithms for
unsupervised or semi-supervised feature learning
and hierarchical feature extraction.
Telecommunication Patent
Classification
Example EP1696821B1
• Method and device for automatically detecting mating of animals
• Abstract: The inventive device (110, 210, 310, 510) for
automatically detecting the mating of animals is wearable by an
animal (100) and comprises means (105, 505) for fixing to an
animal, means (140) for detecting an attempt of mating a female
animal (120) by said animal, means (145, 180, 345, 580) for
identifying an electronic label which is introduced in the body of
said female animal and actuated by said detection means and/or
by the female animal identification means by processing the image
of at least one part of the female animal triggered by said detection
means. In the preferred embodiment, means for identifying said
other animal comprises means for communicating with the
electronic label (130) carried by a female animal conspecific with
the animal triggered by said detection means. In one of the
embodiments, communications means is embodied in such a way
that it reads the electronic label identifier of each female animal
which said animal attempts to mate and storing means (160)
memorises each displayed identifier. In the other embodiment,
communication means is provided with a device for storing
representative information on the attempted mating in the random
access memory of the electronic label carried by the conspecific
female animal.
Telecommunication Patent
Classification
• Several thousand classified patents were used
to derive a classification model for millions of
3GPP patents from Korea, Japan, China, the US
and Europe.
• The data used included the publication
number, the abstract, the DWPI (Derwent World
Patents Index) abstract and the claims of the patent.
Telecommunication Patent
Classification
• Language detection
• Bag of words models
• Tf-idf weighting
• Different classification models
– SVM
– Maximum entropy classifier
• Fuzzy key word comparison
Natural Language Processing
Word Counts
• Measuring similarity: scalar product
• Problem: document length, solution: normalize
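The word-count similarity can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses hypothetical example texts; neither is taken from the patent data set.

```python
import math
from collections import Counter

def word_counts(text):
    """Bag-of-words representation: lowercase tokens mapped to counts."""
    return Counter(text.lower().split())

def dot(a, b):
    """Scalar product of two sparse count vectors."""
    return sum(count * b[word] for word, count in a.items())

def cosine_similarity(a, b):
    """Scalar product after normalizing both vectors to unit length."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

# Hypothetical documents: the second one repeats the first ten times.
short_doc = word_counts("antenna signal antenna")
long_doc = word_counts("antenna signal antenna " * 10)

raw = dot(short_doc, long_doc)                       # grows with document length
normalized = cosine_similarity(short_doc, long_doc)  # independent of length
```

The raw scalar product rewards the longer document merely for being longer, while the normalized value only depends on the relative word frequencies.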
Tf-idf
• Common words (stop words: a, the, in...)
vs rare words (names, technical terms,...)
• Important words: common locally, rare globally
• Term frequency times $\log \frac{\#\text{docs}}{1 + \#\text{docs using word}}$
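A minimal sketch of this weighting, assuming whitespace tokenization and the natural logarithm; the four example documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document.

    weight = term frequency * log(#docs / (1 + #docs using word))
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append({
            word: count * math.log(n_docs / (1 + doc_freq[word]))
            for word, count in term_freq.items()
        })
    return weights

# Hypothetical mini corpus.
docs = [
    "the antenna of the handset",
    "the base station",
    "the handover of the call",
    "the signal",
]
weights = tf_idf(docs)
```

In the first document, "antenna" (rare globally) receives the largest weight and "the" (used everywhere) the smallest. Because of the 1 in the denominator, a word that occurs in every document gets a slightly negative weight.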
Speech Analysis of Austrian
Politicians
• How rude are Austrian politicians?
• Have they become ruder over time?
• Data acquisition via web scraping
• Human labelling of selected sentences
• Word2vec or similar models
Predictive Maintenance
Server Outage Prediction
• NOBODY likes server outages.
• Is there an exact
definition of the term outage?
• Is it measurable?
• Downtime minutes per user
Server Outage Prediction
• Definition (Event): An event shall be defined as an
occurrence happening at a determinable time and place
with a certain duration. It may be a part of a chain of
occurrences as an effect of a preceding occurrence and as
the cause of a succeeding occurrence. It is possible that
more than one event occurs at the same time and/or place.
• Definition (Abnormal Event): An abnormal event shall be
defined as an outlier in a chain of events, an event that
deviates so much from the other events as to arouse
suspicion that it was caused by something that does not
follow the usual behavior of the considered system and
that it could change the entire system behavior.
Server Outage Prediction
Basic Model
[Diagram: Server, Server Monitoring, Feature Selection, Prediction, Anomaly Detection, Abnormal Event]
Assumption: The predictions are accurate if the server status is OK.
Server Outage Prediction
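• ANN (artificial neural network)
• SARIMA (seasonal autoregressive integrated moving average)
• OC-SVM (one-class support vector machine)
• ABOD (angle-based outlier detection)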
Server Outage Prediction
Data:
• up to 1439 features per server, sampling interval 1 to 15 min
• historic data sets, IBM Lotus Domino Server.Load
Preprocessing:
• Reduction of data using expert knowledge
• Differencing cumulative features
• Checking for wrong or missing data
• Normalizing the data (min-max mapping)
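The preprocessing steps can be sketched as follows. The forward fill for missing samples and the example counter are assumptions for illustration; the slides do not state how missing data were handled.

```python
def fill_missing(values):
    """Replace None by the last valid value (forward fill, an assumed strategy).

    A leading None stays None because there is nothing to carry forward.
    """
    filled, last = [], None
    for value in values:
        if value is None:
            value = last
        filled.append(value)
        last = value
    return filled

def difference(values):
    """Turn a cumulative counter into per-interval increments."""
    return [b - a for a, b in zip(values, values[1:])]

def min_max(values):
    """Map values linearly onto [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical cumulative feature with one missing sample.
total_requests = [100, 130, None, 190, 250]
increments = difference(fill_missing(total_requests))  # [30, 0, 60, 60]
scaled = min_max(increments)                           # [0.5, 0.0, 1.0, 1.0]
```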
Server Outage Prediction
“I have seen the future and it is very much like the present, only longer.”
Kehlog Albran, The Profit
Server Outage Prediction
“Prediction is very difficult, especially if it's about the future.”
Niels Bohr, Nobel laureate in Physics
SARIMA, ANN
• Univariate
• Seasonality
• Cross-validation
• Errors
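How seasonality and prediction errors interact can be shown with a much simpler stand-in for SARIMA or an ANN: a seasonal-naive forecast that predicts each value by the value one season earlier. The series and the period of 4 are hypothetical.

```python
def seasonal_naive_forecast(series, period):
    """Predict each value by the value observed one season earlier."""
    return [series[t - period] for t in range(period, len(series))]

def prediction_errors(series, period):
    """Actual minus predicted value for every time step that has a forecast."""
    forecast = seasonal_naive_forecast(series, period)
    return [actual - predicted
            for actual, predicted in zip(series[period:], forecast)]

# Hypothetical RAM utilization with a season of 4 steps and one unusual value (90).
ram_util = [50, 60, 70, 60, 51, 61, 71, 61, 52, 62, 90, 62]
errors = prediction_errors(ram_util, period=4)
```

The errors stay near 1 while the series follows its seasonal pattern and jump to 19 at the unusual value, which is the kind of signal an outlier detector can pick up.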
[Figure: Platform.Memory.RAM.PctUtil over time steps 0 to 1000, values between 40 and 100; prediction shown in green]
Prediction Errors
[Figure: prediction errors for Platform.Memory.RAM.PctUtil; x-axis: time step (0 to 1000), y-axis: error (-14 to 6)]
Outlier Detection
• Threshold
• Angle-based outlier detection
• One-class support vector machine
• Why do we use these methods?
– (1 + x)-classification problem
– Unsupervised
– Range of dimensions
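The idea behind angle-based outlier detection can be sketched in a few lines. This is a simplified version that takes the plain variance of the distance-scaled angle terms and omits the distance weighting of the variance used in the original ABOD formulation; the example points are hypothetical and must be pairwise distinct.

```python
from itertools import combinations

def abof(point, others):
    """Simplified angle-based outlier factor of `point`.

    For every pair (b, c) of other points, compute <pb, pc> / (|pb|^2 * |pc|^2)
    and return the variance of these values. Small values indicate outliers:
    seen from far away, all other points lie in a similar direction.
    """
    values = []
    for b, c in combinations(others, 2):
        pb = [bi - pi for bi, pi in zip(b, point)]
        pc = [ci - pi for ci, pi in zip(c, point)]
        dot = sum(x * y for x, y in zip(pb, pc))
        norm_sq = sum(x * x for x in pb) * sum(y * y for y in pc)
        values.append(dot / norm_sq)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def abof_scores(points):
    """ABOF of every point with respect to all remaining points."""
    return [abof(p, points[:i] + points[i + 1:]) for i, p in enumerate(points)]

# Hypothetical data: five points in a small cluster and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = abof_scores(points)
most_suspicious = scores.index(min(scores))
```

The score needs no labels and relies on angles rather than distances alone, which is one reason the method is considered for high-dimensional data.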
Server Outage Prediction
• The outlier detection delivers a score that can be used to calculate a
fuzzy value of outageness.
• Thus a partition based on the relevance of outages is possible – traffic
light system
• A combination of outageness
scores delivered by various
anomaly detectors is possible.
• By saving all these scores in a
database, a classification of
outages is possible (e.g. with
an ANN or some clustering
method).
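A minimal sketch of how detector scores might be turned into a fuzzy outageness value and a traffic light. The piecewise-linear membership function, the maximum as combination rule and all thresholds are assumptions for illustration, not the values used in the project.

```python
def outageness(score, low, high):
    """Piecewise-linear fuzzy membership: 0 below `low`, 1 above `high`."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (score - low) / (high - low)))

def combine(values):
    """Fuzzy OR (assumed rule): the most alarmed detector decides."""
    return max(values)

def traffic_light(value, yellow=0.3, red=0.7):
    """Partition the outageness range into three relevance classes."""
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"

# Hypothetical scores from two anomaly detectors with different score ranges.
detector_a = outageness(2.0, low=1.0, high=5.0)    # 0.25
detector_b = outageness(9.0, low=0.0, high=10.0)   # 0.9
status = traffic_light(combine([detector_a, detector_b]))
```

Mapping every detector onto the common range [0, 1] first is what makes scores from different methods comparable and combinable.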
[Figure: ABOF score of the time points; red: start of load, green: end of load; x-axis: time (0 to 300), y-axis: F-ABOF factor (0 to 1e+06)]
Image Understanding
Applications
• Industrial image analysis
– Quality assurance
– LabVIEW, HALCON, Cognex VisionPro
• Medical image analysis
– Mostly researchers with a medical background
– Visualisation support, detection of carcinomas etc.
– ITK
• Image analysis and AI
– Facebook, Google, Baidu, Microsoft, but still rooted in universities
– Face detection, facial expression detection, scene description
– OpenCV, dlib, Theano, Keras
HOG: Gold Pad Search
Results were extremely good.
HOG: Algorithm Details
• Dalal, Triggs (2005)
• Focus on intensity gradients/edge directions
• Local contrast normalization (invariance to lighting conditions)
• Gradient orientation computed per pixel, overlapping blocks, histogram of
oriented gradients
• SVM classifier
• Open-source availability (dlib)
• Relatively few training pictures necessary
• Not a lot of parameter tuning
• Few wrong detections
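The core of the descriptor, an orientation histogram per cell followed by contrast normalization, can be sketched as below. This is a simplification: the method of Dalal and Triggs interpolates votes between bins and normalizes over overlapping blocks of several cells, and dlib's implementation differs in detail. The 4 x 4 test images are hypothetical.

```python
import math

def cell_histogram(image, bins=9):
    """Histogram of unsigned gradient orientations, weighted by magnitude."""
    hist = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # centered differences
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # 0 to 180 degrees
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist

def l2_normalize(hist, eps=1e-6):
    """Contrast normalization: scale the histogram to (almost) unit length."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]

# Hypothetical cell with a vertical edge, and the same cell twice as bright.
dark = [[0, 0, 1, 1] for _ in range(4)]
bright = [[0, 0, 2, 2] for _ in range(4)]
hist_dark = cell_histogram(dark)
hist_bright = cell_histogram(bright)
```

The raw histograms differ by a factor of two, but after `l2_normalize` they are practically identical, which is the invariance to lighting that the normalization step provides.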
HOG: Training Dataset
Deep Inspection
• Automatic optical inspection of sensors
• Sensor generations look similar, but not exactly alike
• Deep Convolutional Networks for better
generalization and no extra parameter tuning
• HOG
• Software used: Keras (Python)
Deep Inspection
• Input: pixel grey values
• Approach of Girshick et al. (2014): 227 × 227 pixel input
• Alternating convolution and max pooling, spatially
overlapping
• Hierarchical abstractions
• Sparse connections (non-linear filter)
• Shared weights to gain translation invariance and an
improved generalization ability
• MLP classifier
• Dataset augmentation: sliding window, flipping,
distortion, ...
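What shared weights and max pooling contribute can be seen in a small pure-Python sketch of one convolution and pooling stage. It is not the Keras model used in the project; the kernel and the 5 x 5 images are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation: one shared kernel slides over the whole image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def relu(fmap):
    """Non-linearity applied element-wise to the feature map."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each window; stride < size gives overlapping pooling."""
    return [
        [
            max(fmap[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, len(fmap[0]) - size + 1, stride)
        ]
        for y in range(0, len(fmap) - size + 1, stride)
    ]

# Hypothetical vertical-edge detector and two images whose edge differs by one pixel.
kernel = [[-1, 1], [-1, 1]]
edge_a = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_b = [[0, 1, 1, 1, 1] for _ in range(5)]
pooled_a = max_pool(relu(conv2d(edge_a, kernel)))
pooled_b = max_pool(relu(conv2d(edge_b, kernel)))
```

Both images yield the same pooled feature map: the same weights respond to the edge wherever it occurs, and pooling absorbs the one-pixel shift.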
Deep Inspection
• Gold pad check (scratches vs errors)
• Ohmic contact
• active area
AOI
Automatic Optical Inspection of
humidity sensors
Automatic error detection
for humidity sensors
• Multiple challenges
• High quality requirements
• Changing specifications
• Different kinds of errors
• High data volume
Image Acquisition
• 8" silicon wafers
• 90,000 sensors per wafer
• 0.7 µm/pixel resolution
• Target scan time of 30 minutes per wafer
Sensor
Cropping of images based on CAD file
Focus
• Intensity of reflection of laser beam
• Deep search for highest peak
• Keep focus with proportional controller
[Figure: plot with axis ticks from -50 to 400 and from 1 to 1008; caption and axis labels not recoverable]
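The two focus steps, a search for the reflection peak followed by proportional control, might look like the sketch below. The intensity model, the focus plane at z = 12, the gain and the availability of a signed focus error are all assumptions for illustration.

```python
FOCUS_PLANE = 12.0  # hypothetical true focus position

def intensity_at(z):
    """Hypothetical reflection intensity: highest when z is in focus."""
    return 400.0 - (z - FOCUS_PLANE) ** 2

def find_peak(intensity, z_values):
    """Coarse search: return the z position with the highest intensity."""
    return max(z_values, key=intensity)

def p_control_step(z, error, gain=0.5):
    """Proportional controller: correct z by a fraction of the focus error."""
    return z + gain * error

# Coarse search on a grid of z positions, then ten control steps.
coarse_z = find_peak(intensity_at, range(0, 31, 5))
z = coarse_z
for _ in range(10):
    z = p_control_step(z, FOCUS_PLANE - z)  # assumes a signed error is measurable
```

With a gain of 0.5 the remaining error halves in every step, so the position settles close to the focus plane without overshooting.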
Master
• A master image is created per wafer
• Perfect image
• Self-adapting to new sensors
• Simplification of image registration
Step 1: Canny edge detection
• Proven algorithm for edge detection
• Reliable
• Easy to implement
• Fast
• Skeleton (1 px edge)
• Con: threshold needed
Step 2: Filtering
• Discrepancy norm
• Robust against noise
• Con: no skeleton
• Automatic thresholding
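The discrepancy norm of a signal is the largest absolute sum over any contiguous interval, which equals the spread of its prefix sums. The sketch below only computes the norm for a 1D signal; how it is embedded in the filtering step is not stated on the slide, and the example signals are hypothetical.

```python
def discrepancy_norm(signal):
    """Maximum over all index intervals of |sum of the signal on that interval|.

    Computed in one pass as (largest prefix sum) - (smallest prefix sum),
    where the empty prefix with sum 0 takes part in both extremes.
    """
    prefix = 0.0
    highest = lowest = 0.0
    for value in signal:
        prefix += value
        highest = max(highest, prefix)
        lowest = min(lowest, prefix)
    return highest - lowest

# Both signals have the same Euclidean norm (2.0).
step = [1, 1, 1, 1]      # consistent structure, e.g. along an edge
noise = [1, -1, 1, -1]   # high-frequency oscillation
step_norm = discrepancy_norm(step)    # 4.0
noise_norm = discrepancy_norm(noise)  # 1.0
```

Oscillating noise cancels in the interval sums while a consistent structure accumulates, which is consistent with the robustness against noise claimed above.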
Final Step
• Combination of these two methods
• Comparison with Canny image of master
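One possible way to compare a sensor's edge image with the master's edge image is to flag every edge pixel that has no master edge nearby. The tolerance radius and the binary 5 x 5 images below are assumptions; the slides do not describe the actual comparison rule.

```python
def dilate(edges, radius=1):
    """Mark every pixel within `radius` (Chebyshev distance) of an edge pixel."""
    rows, cols = len(edges), len(edges[0])
    return [
        [
            any(
                edges[yy][xx]
                for yy in range(max(0, y - radius), min(rows, y + radius + 1))
                for xx in range(max(0, x - radius), min(cols, x + radius + 1))
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]

def unexpected_edges(sensor_edges, master_edges, radius=1):
    """Edge pixels of the sensor image that have no master edge nearby."""
    allowed = dilate(master_edges, radius)
    return [
        (y, x)
        for y, row in enumerate(sensor_edges)
        for x, is_edge in enumerate(row)
        if is_edge and not allowed[y][x]
    ]

# Hypothetical edge maps: the sensor edge is shifted by one pixel (tolerated)
# and there is one extra edge pixel at row 2, column 4 (a candidate defect).
master = [[0, 1, 0, 0, 0] for _ in range(5)]
sensor = [[0, 0, 1, 0, 0] for _ in range(5)]
sensor[2][4] = 1
defects = unexpected_edges(sensor, master)  # [(2, 4)]
```

This sketch only finds additional edges; detecting edges that are missing in the sensor image would need the same check with the roles of the two images swapped.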
Conclusion
• High rate of error detection
• No human intervention needed
• Fast and robust
• Based on proven algorithms
