Lithography Hotspot Detection:
From Shallow To Deep Learning
Bei Yu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Moore’s Law to Extreme Scaling
Manufacturability Status & Challenges
Disappearance
[Figure: feature scaling trend, µm (log scale, 0.1 to 10) vs. year (1980 to 2020); Courtesy Intel]
Lithography Hotspot Detection
[Figure: pre-OPC layout → post-OPC mask → hotspot on wafer]
RET: OPC, SRAF, MPL
Still hotspots: low-fidelity patterns
Simulations: extremely CPU intensive
[Figure: ratio of lithography simulation time (normalized to the 40nm node) vs. technology node; required computational time reduction]
Layout Verification Hierarchy
[Figure: verification pyramid; Sampling, Hotspot Detection, Lithography Simulation; verification accuracy increases and (relative) CPU runtime grows at each level]
Sampling (DRC checking): scan and rule-check each region
Hotspot Detection: verify the sampled regions and report potential hotspots
Lithography Simulation: final verification on the reported hotspots
Pattern Matching based Hotspot Detection
[Figure: hotspot library + pattern matching; library hotspots are detected, while hotspots not in the library go undetected]
Fast and accurate
[Yu+,ICCAD’14] [Nosato+,JM3’14] [Su+,TCAD’15]
Fuzzy pattern matching [Wen+,TCAD’14]
Hard to detect unseen patterns
Machine Learning based Hotspot Detection
[Figure: extract layout features → hotspot detection model → classification into hotspot / non-hotspot; hard to trade off accuracy and false alarms]
Predict new patterns
Decision-tree, ANN, SVM, Boosting ...
[Drmanac+,DAC’09] [Ding+,TCAD’12] [Yu+,JM3’15] [Matsunawa+,SPIE’15] [Yu+,TCAD’15]
Hard to balance accuracy and false alarms
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Learning Models
Support Vector Machine [ASPDAC’12][TCAD’15]: classifier
Artificial Neural Network [ASPDAC’12][JOP’13]: feedforward prediction/regression
[Figure: fully connected network; inputs x1–x3, hidden units h1–h4, outputs y1–y2, weight matrices W1, W2]
Boosting [SPIE’15][ICCAD’16]: ensemble machine learning methods that build strong classifiers from a set of weak classifiers, e.g. AdaBoost with decision trees
Conventional Feature Extraction
Fragment [ASPDAC’12][JM3’15]: fragment-based features (F, F_In, F_Ex, F_ExIn, F_InEx) sampled along pattern edges within radius r
HLAC [JM3’14]: higher-order local autocorrelation masks of 0th, 1st, and 2nd order
Density [SPIE’15]: layout density matrix (a11 … a55) over a window of size w_n with sub-windows of size w_s
Feature Evaluation
[Figure: 2-D embeddings of (a) Fragment, (b) HLAC, and (c) Density features]
For hotspot feature i, calculate the normalized Mahalanobis distance [Mahalanobis, 1936]:
d_i = ( √((x_i − µ)ᵀ V⁻¹ (x_i − µ)) − d_min ) / (d_max − d_min),
which should be as large as possible.
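This distance is straightforward to compute; below is a minimal NumPy sketch, assuming feature vectors are stacked row-wise in `X` (all names are illustrative, not from the slides):

```python
import numpy as np

def normalized_mahalanobis(X):
    """Normalized Mahalanobis distance of each row of X from the sample mean,
    following d_i = (sqrt((x_i - mu)^T V^-1 (x_i - mu)) - d_min) / (d_max - d_min)."""
    mu = X.mean(axis=0)
    V_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # einsum evaluates the quadratic form row by row
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, V_inv, diff))
    return (d - d.min()) / (d.max() - d.min())
```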
Feature Engineering Example: CCAS
[Figure: training layout clips → dense circuit sampling → CCAS]
Concentric Circle Area Sampling (CCAS) [Matsunawa+,JM3’16].
Captures the effects of light diffraction.
Simple rule to select circles from dense samples.
Question:
Can we find correlation between circles and hotspots, and select circles smartly?
Smart CCAS Circle Selection [ICCAD’16]
Higher mutual information means stronger correlation between a circle and the label variable.
[Figure: mutual information vs. circle radius (1 nm to 600 nm) for three test cases]
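The per-circle mutual information I(C_i; Y) can be estimated with scikit-learn's `mutual_info_classif`; a minimal sketch, with random placeholder data standing in for real CCAS features and hotspot labels:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
circle_features = rng.random((1000, 600))   # one CCAS circle per column (placeholder data)
labels = rng.integers(0, 2, size=1000)      # 0/1 hotspot labels (placeholder data)

# v_i = I(C_i; Y) for every circle radius
v = mutual_info_classif(circle_features, labels, random_state=0)
```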
Smart CCAS Circle Selection [ICCAD’16]
Mathematical Formulation
max_w  vᵀw
s.t.  v_i = I(C_i; Y), ∀ v_i ∈ v,
      ‖w‖₀ = n_c,  w_i ∈ {0, 1}, ∀ i,
      |i − j| ≥ d,  ∀ i ≠ j with w_i = w_j = 1.
Optimally Solved by Dynamic Programming
D[i, j] = max{ v[i] + D[i − d, j − 1], D[i − 1, j] },
where D[i, j] is the optimal objective over circles {1, …, i} when j circles in total are selected.
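A minimal Python sketch of this recurrence, assuming v is 1-indexed (v[0] unused) and omitting back-pointer recovery of the selected circles:

```python
def select_circles(v, n_c, d):
    """D[i][j] = max(v[i] + D[i - d][j - 1], D[i - 1][j]):
    best total mutual information using j circles from {1, ..., i},
    any two selected circles at least d apart."""
    n = len(v) - 1
    NEG = float('-inf')
    D = [[NEG] * (n_c + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = 0.0                       # selecting zero circles
    for i in range(1, n + 1):
        for j in range(1, n_c + 1):
            skip = D[i - 1][j]              # do not select circle i
            prev = D[i - d][j - 1] if i - d >= 0 else (0.0 if j == 1 else NEG)
            D[i][j] = max(skip, v[i] + prev)
    return D[n][n_c]
```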
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Why Deep Learning?
Feature Crafting vs. Feature Learning
Although prior knowledge is incorporated during manual feature design, information loss is inevitable. Features learned from massive datasets are more reliable.
Scalability
As circuit feature sizes shrink, mask layouts become more complicated. Deep learning has the potential to handle ultra-large-scale instances, while traditional machine learning may suffer from performance degradation.
Mature Libraries
Caffe [Jia+,ACMMM’14] and TensorFlow [Martin+,TR’15]
Powerful GPUs for training
CNN Architecture Overview
Convolution Layer
Rectified Linear Unit (ReLU)
Pooling Layer
Fully Connected Layer
[Figure: … → CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → … → FC → hotspot / non-hotspot]
Convolution Layer
Convolution Operation:
I ⊗ K(x, y) = Σ_{i=1..c} Σ_{j=1..m} Σ_{k=1..m} I(i, x − j, y − k) · K(j, k)
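A direct (unoptimized) NumPy transcription of this sum, assuming a c-channel input I and one m × m kernel K shared across channels, without padding or stride:

```python
import numpy as np

def conv2d(I, K):
    """(I ⊗ K)(x, y) = sum_i sum_j sum_k I[i, x - j, y - k] * K[j, k],
    with I of shape (c, H, W) and K of shape (m, m)."""
    c, H, W = I.shape
    m = K.shape[0]
    out = np.zeros((H - m, W - m))
    for x in range(m, H):
        for y in range(m, W):
            s = 0.0
            for i in range(c):
                for j in range(1, m + 1):
                    for k in range(1, m + 1):
                        s += I[i, x - j, y - k] * K[j - 1, k - 1]
            out[x - m, y - m] = s
    return out
```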
Convolution Layer (cont.)
Effect of different convolution kernel sizes:
[Figure: convolved feature maps with (a) 7 × 7, (b) 5 × 5, (c) 3 × 3 kernels]
Kernel Size Padding Test Accuracy
7 × 7 3 87.50%
5 × 5 2 93.75%
3 × 3 1 96.25%
Rectified Linear Unit
Alleviate overfitting with sparse feature map
Avoid gradient vanishing problem
Activation Function | Expression | Validation Loss
ReLU | max{x, 0} | 0.16
Sigmoid | 1 / (1 + exp(−x)) | 87.0
TanH | (exp(2x) − 1) / (exp(2x) + 1) | 0.32
BNLL | log(1 + exp(x)) | 87.0
WOAF (without activation function) | – | 87.0
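For reference, a NumPy sketch of the activations in the table (the validation-loss numbers come from the slides, not from this code):

```python
import numpy as np

def relu(x):    return np.maximum(x, 0)                          # max{x, 0}
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)
def bnll(x):    return np.log1p(np.exp(x))                       # log(1 + exp(x))

x = np.linspace(-3, 3, 7)
print(relu(x))   # zeros for negative inputs: sparse feature maps
```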
Pooling Layer
Extracts local statistical attributes of regions in the feature map
Example: input 4 × 4 feature map
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
(a) 2 × 2 max pooling →   6  8
                         14 16
(b) 2 × 2 avg pooling →  3.5  5.5
                        11.5 13.5
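A NumPy sketch that reproduces the 2 × 2 pooling example above:

```python
import numpy as np

def pool2d(x, mode="max"):
    """2 x 2 non-overlapping pooling over a 2-D feature map."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(1, 17, dtype=float).reshape(4, 4)
print(pool2d(x, "max"))    # [[ 6.  8.] [14. 16.]]
print(pool2d(x, "mean"))   # [[ 3.5  5.5] [11.5 13.5]]
```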
Pooling Layer (cont.)
Translation invariant
Dimension reduction
Effect of pooling methods:
Pooling Method Kernel Test Accuracy
Max 2 × 2 96.25%
Ave 2 × 2 96.25%
Stochastic 2 × 2 90.00%
Architecture Summary [SPIE’17]
Total 21 layers with 13 convolution layers and 5 pooling layers.
A ReLU is applied after each convolution layer.
[Figure: network diagram; convolution stages C1-1 … C5-3 with pooling layers, feature maps shrinking from 512 × 512 × 4 to 16 × 16 × 32, followed by FC layers of 2048 and 512 nodes, ending in hotspot / non-hotspot]
Architecture Summary [SPIE’17]
Layer Kernel Size Stride Padding Output Vertexes
Conv1-1 2 × 2 × 4 2 0 512 × 512 × 4
Pool1 2 × 2 2 0 256 × 256 × 4
Conv2-1 3 × 3 × 8 1 1 256 × 256 × 8
Conv2-2 3 × 3 × 8 1 1 256 × 256 × 8
Conv2-3 3 × 3 × 8 1 1 256 × 256 × 8
Pool2 2 × 2 2 0 128 × 128 × 8
Conv3-1 3 × 3 × 16 1 1 128 × 128 × 16
Conv3-2 3 × 3 × 16 1 1 128 × 128 × 16
Conv3-3 3 × 3 × 16 1 1 128 × 128 × 16
Pool3 2 × 2 2 0 64 × 64 × 16
Conv4-1 3 × 3 × 32 1 1 64 × 64 × 32
Conv4-2 3 × 3 × 32 1 1 64 × 64 × 32
Conv4-3 3 × 3 × 32 1 1 64 × 64 × 32
Pool4 2 × 2 2 0 32 × 32 × 32
Conv5-1 3 × 3 × 32 1 1 32 × 32 × 32
Conv5-2 3 × 3 × 32 1 1 32 × 32 × 32
Conv5-3 3 × 3 × 32 1 1 32 × 32 × 32
Pool5 2 × 2 2 0 16 × 16 × 32
FC1 – – – 2048
FC2 – – – 512
FC3 – – – 2
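A sketch of this table in PyTorch (chosen for brevity; the deck itself cites Caffe and TensorFlow). The single-channel 1024 × 1024 input implied by Conv1-1's output, max pooling for the Pool layers, and ReLUs after FC1/FC2 are assumptions; the slides only state a ReLU after each convolution layer:

```python
import torch.nn as nn

def conv_stage(c_in, c_out, n):
    """n 3x3 stride-1 pad-1 convolutions, each followed by ReLU."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    return layers

spie17_net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=2, stride=2), nn.ReLU(inplace=True),  # Conv1-1
    nn.MaxPool2d(2, 2),                                               # Pool1
    *conv_stage(4, 8, 3),   nn.MaxPool2d(2, 2),                       # Conv2-1..3, Pool2
    *conv_stage(8, 16, 3),  nn.MaxPool2d(2, 2),                       # Conv3-1..3, Pool3
    *conv_stage(16, 32, 3), nn.MaxPool2d(2, 2),                       # Conv4-1..3, Pool4
    *conv_stage(32, 32, 3), nn.MaxPool2d(2, 2),                       # Conv5-1..3, Pool5
    nn.Flatten(),
    nn.Linear(16 * 16 * 32, 2048), nn.ReLU(inplace=True),             # FC1
    nn.Linear(2048, 512), nn.ReLU(inplace=True),                      # FC2
    nn.Linear(512, 2),                                                # FC3
)
```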
Overcome Large Scale Layout Clip
Clip Partition
Discrete Cosine Transform
Discarding High Frequency Components
Feature Tensor
[Figure: layout clip → division into blocks → per-block DCT → k coefficient matrices C⁽¹⁾ … C⁽ᵏ⁾, where C⁽ᵗ⁾ = (C_ij,t) for i, j = 1 … n, stacked (encoded) into an n × n × k feature tensor]
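A possible SciPy sketch of this encoding. The block size, the number k of retained coefficients, and the low-frequency selection rule (a zig-zag-like i + j ordering) are assumptions for illustration, not the published recipe:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(a):
    """Orthonormal 2-D type-II DCT of one block."""
    return dct(dct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

def feature_tensor(clip, block=32, k=32):
    """Encode a square 2-D layout clip (side divisible by `block`)
    as an n x n x k tensor of low-frequency DCT coefficients."""
    H, _ = clip.shape
    n = H // block
    order = sorted(((i, j) for i in range(block) for j in range(block)),
                   key=lambda p: (p[0] + p[1], p[0]))[:k]   # low frequencies first
    tensor = np.zeros((n, n, k))
    for bi in range(n):
        for bj in range(n):
            tile = clip[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            coeffs = dct2(tile)
            tensor[bi, bj, :] = [coeffs[i, j] for (i, j) in order]
    return tensor
```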
Simplified CNN Architecture [DAC’17]
Feature Tensor
k-channel hyper-image
Compatible with CNN
Storage and computational efficiency
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
maxpooling1 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
maxpooling2 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
[Figure: feature tensor (k channels) → convolution + ReLU layers → max pooling layers → fully connected nodes → hotspot / non-hotspot]
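A PyTorch sketch of the table above, assuming a 12 × 12 × k feature-tensor input and padding 1 for the 3 × 3 convolutions (the table does not list padding); `k = 32` is an illustrative channel count:

```python
import torch.nn as nn

k = 32  # number of DCT channels; illustrative value
dac17_net = nn.Sequential(
    nn.Conv2d(k, 16, 3, padding=1),  nn.ReLU(inplace=True),  # conv1-1
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),  # conv1-2
    nn.MaxPool2d(2, 2),                                      # maxpooling1
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),  # conv2-1
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),  # conv2-2
    nn.MaxPool2d(2, 2),                                      # maxpooling2
    nn.Flatten(),
    nn.Linear(3 * 3 * 32, 250), nn.ReLU(inplace=True),       # fc1
    nn.Linear(250, 2),                                       # fc2
)
```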
Layer Visualization
[Figure: feature-map visualizations; original input and outputs after Pool1 through Pool5]
Comparison with State-of-the-Art∗
[Figure: accuracy (%) on benchmarks ICCAD-1 through ICCAD-5 and on average (85–100% scale), comparing JM3’16, TCAD’15, ICCAD’16, and Ours]
∗JM3’16: CNN based; TCAD’15: SVM based; ICCAD’16: Boosting based.
Comparison with State-of-the-Art†
Reduces the ODST (overall detection simulation time) by at least 24.80% on average.
[Figure: ODST (s, log scale 10³–10⁵) on ICCAD-1 through ICCAD-5 and on average, comparing JM3’16, TCAD’15, ICCAD’16, and Ours]
†JM3’16: CNN based; TCAD’15: SVM based; ICCAD’16: Boosting based.
Issue: Need Pooling?
Pooling
Pooling layers suppress small variations in input images or intermediate feature maps for better generality, but at the cost of losing information about the edge corrections in post-OPC patterns.
SPIE’17 network (with pooling):
Layer Kernel Size Stride Output Vertexes
Conv1-1 2 2 512 × 512 × 4
Pool1 3 2 256 × 256 × 4
Conv2-1 3 1 256 × 256 × 8
Conv2-2 3 1 256 × 256 × 8
Conv2-3 3 1 256 × 256 × 8
Pool2 3 2 128 × 128 × 8
Conv3-1 3 1 128 × 128 × 16
Conv3-2 3 1 128 × 128 × 16
Conv3-3 3 1 128 × 128 × 16
Pool3 3 2 64 × 64 × 16
Conv4-1 3 1 64 × 64 × 32
Conv4-2 3 1 64 × 64 × 32
... ... ... ...
DAC’17 network (with pooling):
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
Pool1 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
Pool2 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
Replacing each pooling layer with a stride-2 convolution layer avoids this information loss:
SPIE’17 no-pool variant:
Layer Kernel Size Stride Output Vertexes
Conv1-1 2 2 512 × 512 × 4
Conv1-2 3 2 256 × 256 × 4
Conv2-1 3 1 256 × 256 × 8
Conv2-2 3 1 256 × 256 × 8
Conv2-3 3 1 256 × 256 × 8
Conv2-4 3 2 128 × 128 × 8
Conv3-1 3 1 128 × 128 × 16
Conv3-2 3 1 128 × 128 × 16
Conv3-3 3 1 128 × 128 × 16
Conv3-4 3 2 64 × 64 × 16
Conv4-1 3 1 64 × 64 × 32
Conv4-2 3 1 64 × 64 × 32
... ... ... ...
DAC’17 no-pool variant:
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
conv1-3 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
conv2-3 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
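The swap is shape-preserving: a kernel-2, stride-2 convolution downsamples exactly like a 2 × 2 pooling layer, but with learned weights instead of a fixed max. A small PyTorch check using the DAC’17 Pool1 / conv1-3 dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 12, 12)                     # a 12 x 12 x 16 feature map

pool = nn.MaxPool2d(kernel_size=2, stride=2)       # Pool1 in the pooled net
conv = nn.Conv2d(16, 16, kernel_size=2, stride=2)  # conv1-3 in the no-pool variant

print(pool(x).shape)   # torch.Size([1, 16, 6, 6])
print(conv(x).shape)   # torch.Size([1, 16, 6, 6]): same downsampling, learned
```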
Accuracy
[Figure: accuracy (%) on ICCAD and Industry1–Industry3 benchmarks and on average (40–100% scale), comparing SPIE’15, ICCAD’16, SPIE’17, SPIE’17_nopool, DAC’17, and DAC’17_nopool]
False Alarm
[Figure: false-alarm counts (0–9,000) on ICCAD and Industry1–Industry3 benchmarks and on average, comparing SPIE’15, ICCAD’16, SPIE’17, SPIE’17_nopool, DAC’17, and DAC’17_nopool]
Runtime
The runtime of each hotspot detector is reported in clips per second.
The additional convolution layers do not induce significant runtime penalties.
Detector | Runtime (clips/s)
SPIE’15 | 52
ICCAD’16 | 142
SPIE’17 | 90
SPIE’17_nopool | 86
DAC’17 | 106
DAC’17_nopool | 104
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Rethinking: “ImageHotspot” vs. ImageNet
Acknowledgment
CUHK colleagues:
Prof. Evangeline F. Y. Young;
Haoyu Yang, Yuzhe Ma, Hao Geng, Tinghuan Chen;
Undergraduate: Shuhe Li, Yajun Lin, Wei Li, Jiamin Chen.
Industrial Liaison:
Yi Zou, Jing Su, Chenxi Lin (ASML);
Jin Miao, Subhendu Roy, Jhih-Rong Gao (Cadence).
Academic Liaison:
David Z. Pan (UT Austin), Xuan Zeng (Fudan University);
Xin Li (Duke University), Bingqing Lin (Shenzhen University);
Shiyan Hu (Michigan Technological University);
Bing Li, Ulf Schlichtmann (TUM).
