Deep Learning for Imbalanced Data Defect Detection

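AOIEA
Deep Learning for Surface Defect Inspection:
Strategies for imbalanced data training
Prof. Du-Ming Tsai (蔡篤銘)
Department of Industrial Engineering and Management
Yuan Ze University
2018 Taiwan AOI Forum and Show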

Object surface characteristics (surface texture)
Figure labels: resolution (up to high-resolution); surface categories: uniform, low contrast, patterned, homogeneous texture, repeated patterns, heterogeneous texture
Example surfaces:
•Web material
•Paper
•Steel surface
•Mura
•Backlight panel
•Assembled PCB
•IC die
•TFT-LCD
•IC wafer
•TFT-LCD
•Color filter
•Solar cell/wafer

Inspection in manufacturing
Product & process properties in manufacturing:
• Small, subtle local defects in
− Size
− Shape
− Deformation
− Gray scale/color unevenness
• Imbalanced data
– Many positive (defect-free) samples
– Only a few, or no, negative (defective) samples
Requirements:
• High precision/accuracy
– Location
– Detection rate
• Computationally fast
(Real-time computation to meet the cycle time)

Implement the deep learning models for
• CNN regression for image positioning
• GAN-based techniques to generate synthesized defects for
the data-imbalance problem
• Autoencoding models
– Convolutional Autoencoders (CAE) with one-class SVM classification
for unsupervised anomaly detection
– Variational Autoencoders (VAE) image reconstruction for unsupervised
defect detection

Deep learning for PCB positioning

A precise positioning system can be used for
• Automated assembly
• Automated visual defect detection
Figure panels: PCB assembly; PCB inspection; bare PCB; assembled PCB

• Problem
Predict the rotation angle (θ), horizontal displacement (x) and vertical displacement (y) of a PCB with respect to the template.
Figure: template PCB board at (θ, x, y) = (0°, 0, 0), with example images at (θ, x, y) = (-20°, -20, -20), (4°, 0, 12) and (20°, 20, 20)

Deep learning regression vs. traditional ML regression
Figure panels: traditional machine learning for regression; deep learning for regression

Using Convolutional Neural Network (CNN) and Support Vector Regression (SVR) models to predict θ, x and y.
Models compared:
• DNN for regression
• CNN for regression
• CNN + SVR for regression
• CNN as feature extractor for SVR regression

Support Vector Regression (SVR)
• Linear SVR
• Nonlinear SVR (with kernel transformation)
– Polynomial kernel
– Radial Basis Function (RBF) kernel
x = input feature(s)
y = output response(s)
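For reference, the kernels named on this slide are commonly written as below. These are the standard textbook forms, not necessarily the exact parameterization shown on the original slide; the hyperparameter names \gamma, r and d are assumptions.

```latex
% Standard kernel definitions (assumed notation: \gamma, r, d are hyperparameters)
K_{\text{linear}}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}'
K_{\text{poly}}(\mathbf{x}, \mathbf{x}') = \left(\gamma\, \mathbf{x}^{\top}\mathbf{x}' + r\right)^{d}
K_{\text{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma\, \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
```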

Deep Neural Network (DNN) model
• Input the PCB image into the DNN model to predict (θ, x, y)
• Number of hidden layers: 3

Convolutional Neural Network (CNN) model
• Input the PCB image into the CNN model to predict (θ, x, y)

CNN+SVR model for regression
Note:
1. The Radial Basis Function (RBF) kernel is used.
2. The CNN model is the same as the previous one; the number of features passed to the SVR is 3.
3. Training time: 2.2 hours (for 9,261 samples)

CNN as feature extractor for regression
Note:
1. The Radial Basis Function (RBF) kernel is used.
2. The CNN model is the same as the previous one; the last layer of the CNN contains 128 nodes (i.e., the number of features passed to the SVR is 128).
3. Training time: 2.2 hours (for 9,261 samples)

The advantage of deep learning for regression:
The user needs to provide only ONE SINGLE template image.

The training samples are automatically generated by image augmentation from the template
• The augmentation creates each input image and its corresponding output (θ, x, y) for DL training.
• Reference PCB image provided by the user: 180 × 180 pixels
• Augmented images with corresponding outputs: (θ, x, y) = (-20°, -20, -20), (0°, 0, 0), (20°, 20, 20)
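The augmentation step can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the talk: the function names `transform` and `make_sample`, the nearest-neighbour sampling, the zero fill value and the rotation sign convention are all assumptions.

```python
import math


def transform(image, theta_deg, dx, dy, fill=0):
    """Rotate `image` by theta_deg about its center, then shift by (dx, dy).

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the template are set to `fill` (an assumed choice).
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the shift, then undo the rotation, to find the source pixel.
            u, v = x - dx - cx, y - dy - cy
            sx = round(cos_t * u + sin_t * v + cx)
            sy = round(-sin_t * u + cos_t * v + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def make_sample(template, theta_deg, dx, dy):
    """Return one (augmented image, label) training pair from the template."""
    return transform(template, theta_deg, dx, dy), (theta_deg, dx, dy)
```

Because the label is simply the pose that was applied, every augmented image comes with an exact ground-truth (θ, x, y) at no labeling cost.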

Range of training and testing samples
Training samples
• Image size: 180 × 180 (pixels)
• Angles: 0°, ±2°, ±4°, …, ±20° (every 2° for θ)
• Horizontal displacement: 0, ±2, ±4, …, ±20 (pixels) (every two pixels for x)
• Vertical displacement: 0, ±2, ±4, …, ±20 (pixels) (every two pixels for y)
• Number of training samples: 9,261 (21 × 21 × 21)
Testing samples
• Image size: 180 × 180 (pixels)
• Angles: 0°, ±1°, ±2°, …, ±20°
• Horizontal displacement: 0, ±1, ±2, …, ±20 (pixels)
• Vertical displacement: 0, ±1, ±2, …, ±20 (pixels)
• Number of testing samples: 68,921 (41 × 41 × 41)
Note: The images with an odd θ, x or y are unseen by the trained model
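The sample counts above follow directly from enumerating the pose grid. The snippet below is a small check of that arithmetic; the helper name `pose_grid` is hypothetical.

```python
from itertools import product


def pose_grid(limit=20, step=1):
    """All (theta, x, y) combinations from -limit to +limit in `step` increments."""
    values = range(-limit, limit + 1, step)
    return list(product(values, values, values))


train = pose_grid(step=2)        # 21 values per axis
test = pose_grid(step=1)         # 41 values per axis
unseen = set(test) - set(train)  # poses with at least one odd component

print(len(train), len(test), len(unseen))
```

Every training pose is also a testing pose, so the difference between the two counts is the number of testing poses the model never saw during training.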

Positioning accuracy
Evaluate positioning accuracy by mean error, variance and maximum error.
Model                      Angle error (degree)      Horizontal error (pixel)  Vertical error (pixel)
                           Mean   Variance  Max      Mean   Variance  Max      Mean   Variance  Max
DNN                        0.111  0.008     0.689    0.119  0.009     1.009    0.120  0.008     0.876
CNN                        0.124  0.008     0.634    0.162  0.008     0.657    0.133  0.007     0.563
CNN+SVR                    0.049  0.001     0.464    0.049  0.001     0.344    0.055  0.002     0.506
CNN as feature extractor   0.068  0.003     0.499    0.066  0.003     0.484    0.069  0.003     0.515
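The three statistics can be computed per output dimension as sketched below. This is an illustration under assumptions: the errors are taken as absolute differences, and the population variance is used, since the slides do not say which variance estimator was applied. The name `error_summary` is hypothetical.

```python
from statistics import mean, pvariance


def error_summary(predicted, actual):
    """Mean, variance and maximum of the absolute prediction errors."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return {
        "mean": mean(errors),
        "variance": pvariance(errors),  # assumed: population variance
        "max": max(errors),
    }
```

Calling it once each for θ, x and y gives one row of the accuracy table for a given model.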

Computation time of each model
Model                      Time (seconds)
DNN                        0.00123
CNN                        0.00166
CNN+SVR                    0.00230
CNN as feature extractor   0.00541
Note: Equipment
1. CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz × 8
2. GPU: GeForce GTX 1080 Ti
The most accurate model (CNN+SVR) takes about 2 milliseconds (0.00230 s).

Defect inspection by image subtraction
Figure: the aligned test image is subtracted from the template f_T(x, y) to give the difference image Δf(x, y); results are shown for (a) normal, (b) scratch, (c) extrusion and (d) intrusion samples.
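A minimal sketch of the subtraction step is given below, assuming the test image has already been aligned to the template and that both are same-sized gray-level images stored as nested lists. The function name and the `threshold` parameter are assumptions; the slides do not state how the difference image is binarized.

```python
def subtract_and_threshold(template, aligned, threshold):
    """Return a binary map: 1 where |template - aligned| exceeds `threshold`."""
    return [
        [1 if abs(t - a) > threshold else 0 for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(template, aligned)
    ]
```

For a defect-free board that is well aligned, the result should be almost entirely zeros; connected regions of ones mark candidate defects.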

Saw-mark defect detection
in heterogeneous solar wafer images using
- GAN-based training samples generation
- CNN classification

Multicrystalline solar wafer inspection
• Multicrystalline silicon wafers
– A multicrystalline solar wafer presents crystal grains of random shapes, sizes and directions on its surface, which results in a heterogeneous texture.
– A saw-mark defect is a severe flaw introduced when the silicon ingot is cut into thin wafers with multi-wire saws.
Figure panels: defect-free solar wafer image; white saw-mark defect; saw-mark defect caused by impurity; solar wafer image with a black saw-mark defect

CycleGAN model used for defect patch generation
The proposed deep learning scheme is composed of two phases:
1. Defect sample generation using the CycleGAN (Cycle-Consistent Generative Adversarial Network), and then
2. Defect detection using a CNN (Convolutional Neural Network) trained on the true defect-free samples and the synthesized defective samples.
• The CycleGAN model combines the adversarial loss (i.e. GAN) and the cycle-consistency loss.
• The GAN term measures the adversarial loss between the generated images and the target images.
• The cycle-consistency loss prevents the learned forward and backward mappings from contradicting each other.
• It is trained on unpaired datasets (no specific paired samples are required) and is therefore well suited to our application.
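For reference, the combined objective described above is usually written as follows. This is the standard CycleGAN formulation rather than a formula taken from the slides; the symbols are assumptions: X denotes defect-free patches, Y defect patches, G: X → Y and F: Y → X the two generators, D_X and D_Y the discriminators, and \lambda the weight of the cycle-consistency term.

```latex
% Standard CycleGAN objective (assumed notation)
\mathcal{L}(G, F, D_X, D_Y) =
    \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)

\mathcal{L}_{\text{cyc}}(G, F) =
    \mathbb{E}_{x \sim X}\!\left[ \lVert F(G(x)) - x \rVert_{1} \right]
  + \mathbb{E}_{y \sim Y}\!\left[ \lVert G(F(y)) - y \rVert_{1} \right]
```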

Real solar wafer surfaces
For training the CycleGAN:
• Use a small set of true defect patches (60 for black and 90 for white saw-marks) as the target dataset, and randomly collect an equally small set of defect-free patches (60 and 90) as the input dataset to the CycleGANs.
Figure panels: real defect-free samples; real black saw-mark samples; real white saw-mark samples

Using the CycleGAN model to generate
the defective samples
Synthesized defects:
• Whenever a different set of defect-free patches is input to the trained CycleGAN, a new set of defective patches is created.
Figure panels: real defect-free samples input to the trained CycleGAN; generated black saw-mark patches; generated white saw-mark patches

The CNN model for classification
• A simple CNN with 3 convolutional layers is used for the training.
• A lean CNN model gives better computational efficiency in the inspection process.
• Training information:
– For the CycleGAN models, 150 real defective patches (60 black and 90 white saw-marks) and 150 real defect-free patches (60 and 90) are used as the training samples.
– For the CNN model, a total of 4,000 real defect-free patches and 4,000 synthesized saw-mark patches are used as the training samples.
– Patch size: 50 × 50 pixels
– Training time: 3 hours for the CycleGAN and 1 hour for the CNN
Figure: CNN model used for defect detection

Postprocessing with conventional machine
vision techniques
• The saw-mark in a small windowed patch contains only subtle changes and, thus, the entire saw-mark region may not be completely detected in the full-sized solar wafer image.
• Apply the horizontal projection line by line to the resulting binary image B(x, y) to intensify the horizontal saw-mark in the image. That is,
\[ P(y) = \sum_{x} B(x, y), \quad \forall y \]
• The maximum projection value is used as the discriminant measure for saw-mark detection, i.e.
\[ P(y^{*}) = \max_{y} P(y) \]
If the horizontal projection P(y*) is large enough, a saw-mark at line y* is declared.
Note: The mean computation time is 0.004 seconds for an image patch of size 50×50 pixels on a PC with an Intel Core 2, 3.6GHz CPU and an NVIDIA GTX 1070 GPU.
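The projection test can be sketched in a few lines. This is an illustrative version that assumes the binary image is stored as a list of rows (so `binary[y][x]` is B(x, y)); the function names and the `min_count` threshold, which stands in for "large enough", are hypothetical.

```python
def horizontal_projection(binary):
    """P(y): number of foreground pixels in each row y of the binary image."""
    return [sum(row) for row in binary]


def detect_saw_mark(binary, min_count):
    """Return (is_defective, y_star, p_star) from the maximum row projection."""
    projection = horizontal_projection(binary)
    y_star = max(range(len(projection)), key=projection.__getitem__)
    p_star = projection[y_star]
    return p_star >= min_count, y_star, p_star
```

Summing along each row accumulates weak, scattered patch-level detections that lie on the same horizontal line, which is why a saw-mark that is only partly detected can still produce a clear peak in P(y).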

Detection results on sawmarks
• Detection results of defect-free solar wafer images
Figure columns: test images; detection result; projection

Detection results on sawmarks
• Detection results of defective solar wafer images
Figure columns: test images; detection result; projection P(y) plotted against y

Detection results on stains / foreign particles
Figure panels: real stain defect samples; synthesized stain defect samples

Detection results on stains / foreign particles
• Detection results of defective solar wafer images
(a) Test images
(b) Detection result

Additional test: Using the CycleGAN model
to generate bump defects
• Bump defects
Figure panels: real defect samples; real defect-free samples; generated defect samples

Autoencoders for defect detection
- Autoencoders for image reconstruction
- Autoencoders for feature extraction

Unsupervised Autoencoders for
image reconstruction
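Figure: the autoencoding model (encoder and decoder) is trained on images; the trained model maps a testing image to a reconstructed image.
Self-reference comparison: the testing image is compared with its own reconstructed image through the difference image Δ(x, y).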

Defect detection in TFT-LCD
• A Thin Film Transistor-Liquid Crystal Display (TFT-LCD) comprises vertical data lines and horizontal gate lines.
• The main types of defects are pinholes, particles and scratches.
Figure panels: defect-free LCD image; LCD image with particle defects; LCD image with pinhole defects; LCD image with scratch defects

VAE (Variational AutoEncoder) model for
image reconstruction
- The Model
• Structure of the VAE model
Figure: a defect-free input image is encoded into latent variables Z and decoded back into a defect-free image.
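For reference, a VAE of this kind is typically trained by minimizing the negative evidence lower bound shown below. This is the standard formulation and not a formula recovered from the slide; the symbols are assumptions: q_\phi(z \mid x) is the encoder, p_\theta(x \mid z) the decoder and p(z) the prior over the latent variables.

```latex
% Standard VAE training loss (assumed notation)
\mathcal{L}(\theta, \phi; x) =
  - \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[ \log p_{\theta}(x \mid z) \right]
  + D_{\mathrm{KL}}\!\left( q_{\phi}(z \mid x) \,\Vert\, p(z) \right)
```

The first term rewards faithful reconstruction of defect-free images; the second keeps the latent distribution close to the prior.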

VAE model for image reconstruction
- Detection results by image subtraction
• Defect-free image
• Defect image (true defects)
Figure columns (for each case): original image; restored image; image subtraction Δ(x, y); binarization

Use AE for feature extraction for
anomaly detection (with one-class SVM)
• The testing image is reconstructed by the trained AE model.
• The features are extracted from the last layer of the encoder and used as the input data of the one-class SVM to identify anomalies.
Figure: test image → trained AE model (encoder → Z → decoder) → feature maps → one-class SVM → anomaly detection

Use AE for feature extraction for
anomaly detection (one-class SVM)
• One-class SVM (Support Vector Machine)
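\[ \min_{R,\,a,\,\xi} \; R^{2} + C \sum_{i=1}^{N} \xi_{i} \]
\[ \text{s.t.} \quad \lVert x_{i} - a \rVert^{2} \le R^{2} + \xi_{i}, \quad i = 1, 2, \ldots, N \]
\[ \xi_{i} \ge 0, \quad i = 1, 2, \ldots, N \]
Figure labels: positive samples; support vector; outlier; center point (a); radius (R)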
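Once a center a and radius R are available, classifying a feature vector reduces to a distance test. The sketch below illustrates only that decision rule; `fit_sphere` is a crude stand-in (mean center, radius at a distance quantile) and does not solve the optimization problem above. Both function names and the `quantile` parameter are assumptions.

```python
import math


def fit_sphere(features, quantile=0.9):
    """Crude stand-in for the fitted model: mean center, quantile radius."""
    dim = len(features[0])
    center = [sum(f[k] for f in features) / len(features) for k in range(dim)]
    dists = sorted(math.dist(f, center) for f in features)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def is_outlier(x, center, radius):
    """Decision rule: x is an outlier if ||x - a||^2 > R^2."""
    return math.dist(x, center) ** 2 > radius ** 2
```

In practice the center and radius would come from a one-class SVM solver trained on the defect-free feature vectors, with C controlling how many training samples may fall outside the sphere.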

Use AE for feature extraction for anomaly
detection (one-class SVM) : The Model
• Structure of the AE model: encoder → Z → decoder, with the feature maps taken from the encoder

Use AE for feature extraction for anomaly
detection (one-class SVM) : Training samples (LCD)
• Train on positive (defect-free) samples only:
− Original 256 × 256 LCD images are rotated between 0° and 35° in 5° increments.
− 256 random image patches of size 28 × 28 are used for training.
− The AE model is used to extract the features for the one-class SVM model.
• Testing data:
− 160 positive samples
− 33 negative (true defect) samples
Computation time: training 15 minutes, testing 0.00008 seconds
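The patch sampling described above can be sketched as follows. This is a hedged illustration: the name `sample_patches`, the fixed seed and the choice of drawing all 256 patches from a single image are assumptions, and the rotation itself is not implemented here (only the list of angles is shown).

```python
import random


def sample_patches(image, n_patches=256, size=28, seed=0):
    """Randomly crop `n_patches` square patches of side `size` from a 2D image."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(n_patches):
        top = rng.randint(0, h - size)    # inclusive bounds keep the crop inside
        left = rng.randint(0, w - size)
        patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


rotation_angles = list(range(0, 36, 5))  # 0°, 5°, ..., 35°
```

Rotating the defect-free images before cropping exposes the autoencoder to the line pattern at several orientations, so a small set of source images yields a more varied training set.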

Use AE for feature extraction for anomaly
detection (one-class SVM) : Detection results (LCD)
• Feature size: 490 (10 feature maps of size 7 × 7)
• Testing data: 160 positive samples, 33 negative (true defect) samples
Actual \ Prediction   Normal   Outlier
Normal                89%      11%
Outlier               0%       100%
Type I error: 11%
Type II error: 0%
Overall recognition rate: 90%

Use AE for feature extraction for anomaly
detection (one-class SVM) : Detection results (LCD)
• Feature size: 49 (use only the best feature map, of size 7 × 7)
• Testing data: 160 positive samples, 33 negative (true defect) samples
Actual \ Prediction   Normal   Outlier
Normal                92%      8%
Outlier               0%       100%
Type I error: 8%
Type II error: 0%
Overall recognition rate: 92%

A thought on Deep Learning:
• Can Deep Learning replace Machine Vision?
• MV as preprocessing, and DL as postprocessing
– MV for defect detection, and DL for defect classification
– Or, vice versa
• MV in DL models?
– “Convolution” and “pooling” are themselves image processing operations
– Human knowledge in DL models
(e.g. embed the known defect features into the DL model)
