Lithography Hotspot Detection:
From Shallow To Deep Learning
Bei Yu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Moore’s Law to Extreme Scaling
Manufacturability Status & Challenges
Disappearance
[Figure: feature scaling trend, µm (log scale, 0.1 to 10) vs. year (1980 to 2020); Courtesy Intel]
Lithography Hotspot Detection
[Figure: pre-OPC layout → post-OPC mask → hotspot on wafer]
RET: OPC, SRAF, MPL
Still hotspots: low-fidelity patterns
Simulations: extremely CPU intensive
[Figure: ratio of lithography simulation time (normalized to the 40nm node) vs. technology node; required computational time reduction]
Layout Verification Hierarchy
[Figure: verification pyramid; Sampling, Hotspot Detection, Lithography Simulation; verification accuracy increases and (relative) CPU runtime grows at each level]
Sampling (DRC checking): scan and rule-check each region
Hotspot Detection: verify the sampled regions and report potential hotspots
Lithography Simulation: final verification on the reported hotspots
Pattern Matching based Hotspot Detection
[Figure: hotspot library + pattern matching; library hotspots are detected, while hotspots not in the library go undetected]
Fast and accurate
[Yu+,ICCAD’14] [Nosato+,JM3’14] [Su+,TCAD’15]
Fuzzy pattern matching [Wen+,TCAD’14]
Hard to detect unseen patterns
Machine Learning based Hotspot Detection
[Figure: extract layout features → hotspot detection model → classification into hotspot / non-hotspot; hard to trade off accuracy and false alarms]
Predict new patterns
Decision-tree, ANN, SVM, Boosting ...
[Drmanac+,DAC’09] [Ding+,TCAD’12] [Yu+,JM3’15] [Matsunawa+,SPIE’15] [Yu+,TCAD’15]
Hard to balance accuracy and false alarms
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Learning Models
Support Vector Machine [ASPDAC’12][TCAD’15]: classifier
Artificial Neural Network [ASPDAC’12][JOP’13]: feedforward prediction/regression
[Figure: fully connected network; inputs x1–x3, hidden units h1–h4, outputs y1–y2, weight matrices W1, W2]
Boosting [SPIE’15][ICCAD’16]: ensemble machine learning methods that build strong classifiers from a set of weak classifiers, e.g. AdaBoost with decision trees
Conventional Feature Extraction
Fragment [ASPDAC’12][JM3’15]: fragment-based features (F, F_In, F_Ex, F_ExIn, F_InEx) sampled along pattern edges within radius r
HLAC [JM3’14]: higher-order local autocorrelation masks of 0th, 1st, and 2nd order
Density [SPIE’15]: layout density matrix (a11 … a55) over a window of size w_n with sub-windows of size w_s
Feature Evaluation
[Figure: 2-D embeddings of (a) Fragment, (b) HLAC, and (c) Density features]
For hotspot feature i, calculate the normalized Mahalanobis distance [Mahalanobis, 1936]:
d_i = ( √((x_i − µ)ᵀ V⁻¹ (x_i − µ)) − d_min ) / (d_max − d_min),
which should be as large as possible.
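This distance is straightforward to compute; below is a minimal NumPy sketch, assuming feature vectors are stacked row-wise in `X` (all names are illustrative, not from the slides):

```python
import numpy as np

def normalized_mahalanobis(X):
    """Normalized Mahalanobis distance of each row of X from the sample mean,
    following d_i = (sqrt((x_i - mu)^T V^-1 (x_i - mu)) - d_min) / (d_max - d_min)."""
    mu = X.mean(axis=0)
    V_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # einsum evaluates the quadratic form row by row
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, V_inv, diff))
    return (d - d.min()) / (d.max() - d.min())
```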
Feature Engineering Example: CCAS
[Figure: training layout clips → dense circuit sampling → CCAS]
Concentric Circle Area Sampling (CCAS) [Matsunawa+,JM3’16].
Captures the effects of light diffraction.
Simple rule to select circles from dense samples.
Question:
Can we find correlation between circles and hotspots, and select circles smartly?
Smart CCAS Circle Selection [ICCAD’16]
Higher mutual information means stronger correlation between a circle and the label variable.
[Figure: mutual information vs. circle radius (1 nm to 600 nm) for three test cases]
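The per-circle mutual information I(C_i; Y) can be estimated with scikit-learn's `mutual_info_classif`; a minimal sketch, with random placeholder data standing in for real CCAS features and hotspot labels:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
circle_features = rng.random((1000, 600))   # one CCAS circle per column (placeholder data)
labels = rng.integers(0, 2, size=1000)      # 0/1 hotspot labels (placeholder data)

# v_i = I(C_i; Y) for every circle radius
v = mutual_info_classif(circle_features, labels, random_state=0)
```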
Smart CCAS Circle Selection [ICCAD’16]
Mathematical Formulation
max_w  vᵀw
s.t.  v_i = I(C_i; Y), ∀ v_i ∈ v,
      ‖w‖₀ = n_c,  w_i ∈ {0, 1}, ∀ i,
      |i − j| ≥ d,  ∀ i ≠ j with w_i = w_j = 1.
Optimally Solved by Dynamic Programming
D[i, j] = max{ v[i] + D[i − d, j − 1], D[i − 1, j] },
where D[i, j] is the optimal objective over circles {1, …, i} when j circles in total are selected.
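A minimal Python sketch of this recurrence, assuming v is 1-indexed (v[0] unused) and omitting back-pointer recovery of the selected circles:

```python
def select_circles(v, n_c, d):
    """D[i][j] = max(v[i] + D[i - d][j - 1], D[i - 1][j]):
    best total mutual information using j circles from {1, ..., i},
    any two selected circles at least d apart."""
    n = len(v) - 1
    NEG = float('-inf')
    D = [[NEG] * (n_c + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = 0.0                       # selecting zero circles
    for i in range(1, n + 1):
        for j in range(1, n_c + 1):
            skip = D[i - 1][j]              # do not select circle i
            prev = D[i - d][j - 1] if i - d >= 0 else (0.0 if j == 1 else NEG)
            D[i][j] = max(skip, v[i] + prev)
    return D[n][n_c]
```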
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Why Deep Learning?
Feature Crafting vs. Feature Learning
Although prior knowledge is incorporated during manual feature design, information loss is inevitable. Features learned from massive datasets are more reliable.
Scalability
As circuit feature sizes shrink, mask layouts become more complicated. Deep learning has the potential to handle ultra-large-scale instances, while traditional machine learning may suffer from performance degradation.
Mature Libraries
Caffe [Jia+,ACMMM’14] and TensorFlow [Martin+,TR’15]
Powerful GPUs for training
CNN Architecture Overview
Convolution Layer
Rectified Linear Unit (ReLU)
Pooling Layer
Fully Connected Layer
[Figure: … → CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → … → FC → hotspot / non-hotspot]
Convolution Layer
Convolution Operation:
I ⊗ K(x, y) = Σ_{i=1..c} Σ_{j=1..m} Σ_{k=1..m} I(i, x − j, y − k) · K(j, k)
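A direct (unoptimized) NumPy transcription of this sum, assuming a c-channel input I and one m × m kernel K shared across channels, without padding or stride:

```python
import numpy as np

def conv2d(I, K):
    """(I ⊗ K)(x, y) = sum_i sum_j sum_k I[i, x - j, y - k] * K[j, k],
    with I of shape (c, H, W) and K of shape (m, m)."""
    c, H, W = I.shape
    m = K.shape[0]
    out = np.zeros((H - m, W - m))
    for x in range(m, H):
        for y in range(m, W):
            s = 0.0
            for i in range(c):
                for j in range(1, m + 1):
                    for k in range(1, m + 1):
                        s += I[i, x - j, y - k] * K[j - 1, k - 1]
            out[x - m, y - m] = s
    return out
```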
Convolution Layer (cont.)
Effect of different convolution kernel sizes:
[Figure: convolved feature maps with (a) 7 × 7, (b) 5 × 5, (c) 3 × 3 kernels]
Kernel Size Padding Test Accuracy
7 × 7 3 87.50%
5 × 5 2 93.75%
3 × 3 1 96.25%
Rectified Linear Unit
Alleviate overfitting with sparse feature map
Avoid gradient vanishing problem
Activation Function | Expression | Validation Loss
ReLU | max{x, 0} | 0.16
Sigmoid | 1 / (1 + exp(−x)) | 87.0
TanH | (exp(2x) − 1) / (exp(2x) + 1) | 0.32
BNLL | log(1 + exp(x)) | 87.0
WOAF (without activation function) | – | 87.0
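For reference, a NumPy sketch of the activations in the table (the validation-loss numbers come from the slides, not from this code):

```python
import numpy as np

def relu(x):    return np.maximum(x, 0)                          # max{x, 0}
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)
def bnll(x):    return np.log1p(np.exp(x))                       # log(1 + exp(x))

x = np.linspace(-3, 3, 7)
print(relu(x))   # zeros for negative inputs: sparse feature maps
```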
Pooling Layer
Extracts local statistical attributes of regions in the feature map
Example: input 4 × 4 feature map
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
(a) 2 × 2 max pooling →   6  8
                         14 16
(b) 2 × 2 avg pooling →  3.5  5.5
                        11.5 13.5
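A NumPy sketch that reproduces the 2 × 2 pooling example above:

```python
import numpy as np

def pool2d(x, mode="max"):
    """2 x 2 non-overlapping pooling over a 2-D feature map."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(1, 17, dtype=float).reshape(4, 4)
print(pool2d(x, "max"))    # [[ 6.  8.] [14. 16.]]
print(pool2d(x, "mean"))   # [[ 3.5  5.5] [11.5 13.5]]
```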
Pooling Layer (cont.)
Translation invariant
Dimension reduction
Effect of pooling methods:
Pooling Method Kernel Test Accuracy
Max 2 × 2 96.25%
Ave 2 × 2 96.25%
Stochastic 2 × 2 90.00%
Architecture Summary [SPIE’17]
Total 21 layers with 13 convolution layers and 5 pooling layers.
A ReLU is applied after each convolution layer.
[Figure: network diagram; convolution stages C1-1 … C5-3 with pooling layers, feature maps shrinking from 512 × 512 × 4 to 16 × 16 × 32, followed by FC layers of 2048 and 512 nodes, ending in hotspot / non-hotspot]
Architecture Summary [SPIE’17]
Layer Kernel Size Stride Padding Output Vertexes
Conv1-1 2 × 2 × 4 2 0 512 × 512 × 4
Pool1 2 × 2 2 0 256 × 256 × 4
Conv2-1 3 × 3 × 8 1 1 256 × 256 × 8
Conv2-2 3 × 3 × 8 1 1 256 × 256 × 8
Conv2-3 3 × 3 × 8 1 1 256 × 256 × 8
Pool2 2 × 2 2 0 128 × 128 × 8
Conv3-1 3 × 3 × 16 1 1 128 × 128 × 16
Conv3-2 3 × 3 × 16 1 1 128 × 128 × 16
Conv3-3 3 × 3 × 16 1 1 128 × 128 × 16
Pool3 2 × 2 2 0 64 × 64 × 16
Conv4-1 3 × 3 × 32 1 1 64 × 64 × 32
Conv4-2 3 × 3 × 32 1 1 64 × 64 × 32
Conv4-3 3 × 3 × 32 1 1 64 × 64 × 32
Pool4 2 × 2 2 0 32 × 32 × 32
Conv5-1 3 × 3 × 32 1 1 32 × 32 × 32
Conv5-2 3 × 3 × 32 1 1 32 × 32 × 32
Conv5-3 3 × 3 × 32 1 1 32 × 32 × 32
Pool5 2 × 2 2 0 16 × 16 × 32
FC1 – – – 2048
FC2 – – – 512
FC3 – – – 2
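A sketch of this table in PyTorch (chosen for brevity; the deck itself cites Caffe and TensorFlow). The single-channel 1024 × 1024 input implied by Conv1-1's output, max pooling for the Pool layers, and ReLUs after FC1/FC2 are assumptions; the slides only state a ReLU after each convolution layer:

```python
import torch.nn as nn

def conv_stage(c_in, c_out, n):
    """n 3x3 stride-1 pad-1 convolutions, each followed by ReLU."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    return layers

spie17_net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=2, stride=2), nn.ReLU(inplace=True),  # Conv1-1
    nn.MaxPool2d(2, 2),                                               # Pool1
    *conv_stage(4, 8, 3),   nn.MaxPool2d(2, 2),                       # Conv2-1..3, Pool2
    *conv_stage(8, 16, 3),  nn.MaxPool2d(2, 2),                       # Conv3-1..3, Pool3
    *conv_stage(16, 32, 3), nn.MaxPool2d(2, 2),                       # Conv4-1..3, Pool4
    *conv_stage(32, 32, 3), nn.MaxPool2d(2, 2),                       # Conv5-1..3, Pool5
    nn.Flatten(),
    nn.Linear(16 * 16 * 32, 2048), nn.ReLU(inplace=True),             # FC1
    nn.Linear(2048, 512), nn.ReLU(inplace=True),                      # FC2
    nn.Linear(512, 2),                                                # FC3
)
```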
Overcome Large Scale Layout Clip
Clip Partition
Discrete Cosine Transform
Discarding High Frequency Components
Feature Tensor
[Figure: layout clip → division into blocks → per-block DCT → k coefficient matrices C⁽¹⁾ … C⁽ᵏ⁾, where C⁽ᵗ⁾ = (C_ij,t) for i, j = 1 … n, stacked (encoded) into an n × n × k feature tensor]
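A possible SciPy sketch of this encoding. The block size, the number k of retained coefficients, and the low-frequency selection rule (a zig-zag-like i + j ordering) are assumptions for illustration, not the published recipe:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(a):
    """Orthonormal 2-D type-II DCT of one block."""
    return dct(dct(a, axis=0, norm='ortho'), axis=1, norm='ortho')

def feature_tensor(clip, block=32, k=32):
    """Encode a square 2-D layout clip (side divisible by `block`)
    as an n x n x k tensor of low-frequency DCT coefficients."""
    H, _ = clip.shape
    n = H // block
    order = sorted(((i, j) for i in range(block) for j in range(block)),
                   key=lambda p: (p[0] + p[1], p[0]))[:k]   # low frequencies first
    tensor = np.zeros((n, n, k))
    for bi in range(n):
        for bj in range(n):
            tile = clip[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            coeffs = dct2(tile)
            tensor[bi, bj, :] = [coeffs[i, j] for (i, j) in order]
    return tensor
```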
Simplified CNN Architecture [DAC’17]
Feature Tensor
k-channel hyper-image
Compatible with CNN
Storage and computational efficiency
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
maxpooling1 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
maxpooling2 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
[Figure: feature tensor (k channels) → convolution + ReLU layers → max pooling layers → fully connected nodes → hotspot / non-hotspot]
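A PyTorch sketch of the table above, assuming a 12 × 12 × k feature-tensor input and padding 1 for the 3 × 3 convolutions (the table does not list padding); `k = 32` is an illustrative channel count:

```python
import torch.nn as nn

k = 32  # number of DCT channels; illustrative value
dac17_net = nn.Sequential(
    nn.Conv2d(k, 16, 3, padding=1),  nn.ReLU(inplace=True),  # conv1-1
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),  # conv1-2
    nn.MaxPool2d(2, 2),                                      # maxpooling1
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),  # conv2-1
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),  # conv2-2
    nn.MaxPool2d(2, 2),                                      # maxpooling2
    nn.Flatten(),
    nn.Linear(3 * 3 * 32, 250), nn.ReLU(inplace=True),       # fc1
    nn.Linear(250, 2),                                       # fc2
)
```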
Layer Visualization
[Figure: feature-map visualizations; original input and outputs after Pool1 through Pool5]
Comparison with State-of-the-Art∗
[Figure: accuracy (%) on benchmarks ICCAD-1 through ICCAD-5 and on average (85–100% scale), comparing JM3’16, TCAD’15, ICCAD’16, and Ours]
∗JM3’16: CNN based; TCAD’15: SVM based; ICCAD’16: Boosting based.
Comparison with State-of-the-Art†
Reduces the ODST (overall detection simulation time) by at least 24.80% on average.
[Figure: ODST (s, log scale 10³–10⁵) on ICCAD-1 through ICCAD-5 and on average, comparing JM3’16, TCAD’15, ICCAD’16, and Ours]
†JM3’16: CNN based; TCAD’15: SVM based; ICCAD’16: Boosting based.
Issue: Need Pooling?
Pooling
Pooling layers suppress small variations in input images or intermediate feature maps for better generality, but at the cost of losing information about the edge corrections in post-OPC patterns.
SPIE’17 network (with pooling):
Layer Kernel Size Stride Output Vertexes
Conv1-1 2 2 512 × 512 × 4
Pool1 3 2 256 × 256 × 4
Conv2-1 3 1 256 × 256 × 8
Conv2-2 3 1 256 × 256 × 8
Conv2-3 3 1 256 × 256 × 8
Pool2 3 2 128 × 128 × 8
Conv3-1 3 1 128 × 128 × 16
Conv3-2 3 1 128 × 128 × 16
Conv3-3 3 1 128 × 128 × 16
Pool3 3 2 64 × 64 × 16
Conv4-1 3 1 64 × 64 × 32
Conv4-2 3 1 64 × 64 × 32
... ... ... ...
DAC’17 network (with pooling):
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
Pool1 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
Pool2 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
Replacing each pooling layer with a stride-2 convolution layer avoids this information loss:
SPIE’17 no-pool variant:
Layer Kernel Size Stride Output Vertexes
Conv1-1 2 2 512 × 512 × 4
Conv1-2 3 2 256 × 256 × 4
Conv2-1 3 1 256 × 256 × 8
Conv2-2 3 1 256 × 256 × 8
Conv2-3 3 1 256 × 256 × 8
Conv2-4 3 2 128 × 128 × 8
Conv3-1 3 1 128 × 128 × 16
Conv3-2 3 1 128 × 128 × 16
Conv3-3 3 1 128 × 128 × 16
Conv3-4 3 2 64 × 64 × 16
Conv4-1 3 1 64 × 64 × 32
Conv4-2 3 1 64 × 64 × 32
... ... ... ...
DAC’17 no-pool variant:
Layer Kernel Size Stride Output Node #
conv1-1 3 1 12 × 12 × 16
conv1-2 3 1 12 × 12 × 16
conv1-3 2 2 6 × 6 × 16
conv2-1 3 1 6 × 6 × 32
conv2-2 3 1 6 × 6 × 32
conv2-3 2 2 3 × 3 × 32
fc1 N/A N/A 250
fc2 N/A N/A 2
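The swap is shape-preserving: a kernel-2, stride-2 convolution downsamples exactly like a 2 × 2 pooling layer, but with learned weights instead of a fixed max. A small PyTorch check using the DAC’17 Pool1 / conv1-3 dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 12, 12)                     # a 12 x 12 x 16 feature map

pool = nn.MaxPool2d(kernel_size=2, stride=2)       # Pool1 in the pooled net
conv = nn.Conv2d(16, 16, kernel_size=2, stride=2)  # conv1-3 in the no-pool variant

print(pool(x).shape)   # torch.Size([1, 16, 6, 6])
print(conv(x).shape)   # torch.Size([1, 16, 6, 6]): same downsampling, learned
```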
Accuracy
[Figure: accuracy (%) on ICCAD and Industry1–Industry3 benchmarks and on average (40–100% scale), comparing SPIE’15, ICCAD’16, SPIE’17, SPIE’17_nopool, DAC’17, and DAC’17_nopool]
False Alarm
[Figure: false-alarm counts (0–9,000) on ICCAD and Industry1–Industry3 benchmarks and on average, comparing SPIE’15, ICCAD’16, SPIE’17, SPIE’17_nopool, DAC’17, and DAC’17_nopool]
Runtime
The runtime of each hotspot detector is reported in clips per second.
The additional convolution layers do not induce significant runtime penalties.
Detector | Runtime (clips/s)
SPIE’15 | 52
ICCAD’16 | 142
SPIE’17 | 90
SPIE’17_nopool | 86
DAC’17 | 106
DAC’17_nopool | 104
Outline
Shallow Learning: Feature Engineering
Deep Learning
Conclusion
Rethinking: “ImageHotspot” vs. ImageNet
Acknowledgment
CUHK colleagues:
Prof. Evangeline F. Y. Young;
Haoyu Yang, Yuzhe Ma, Hao Geng, Tinghuan Chen;
Undergraduate: Shuhe Li, Yajun Lin, Wei Li, Jiamin Chen.
Industrial Liaison:
Yi Zou, Jing Su, Chenxi Lin (ASML);
Jin Miao, Subhendu Roy, Jhih-Rong Gao (Cadence).
Academic Liaison:
David Z. Pan (UT Austin), Xuan Zeng (Fudan University);
Xin Li (Duke University), Bingqing Lin (Shenzhen University);
Shiyan Hu (Michigan Technological University);
Bing Li, Ulf Schlichtmann (TUM).
