13. Hardware vs AI Trend
Source: Ark Investments
[Figure: time-to-market comparison. Training a Transformer (213M params) with NAS takes days to months, while the hardware flow (Design → RTL/Physical → Manufacture → Test/Pkg) takes months to years.]
14. Hardware vs AI Trend
n HW improvements reduce the cost of training by 37% per year
n Model size has grown at a pace of 10X per year
n AI training cost continues to climb quickly
Source: Ark Investments
16. Carbon Footprint of AI and DL
n Eye-opening study:
q The energy of training the model for one day was measured
q This was scaled by the paper's reported number of GPU-days needed for training
q The cost was computed from the average energy cost in the US
q This is the result for a single training run
Source: https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/
BERT carbon footprint ≈ 1,400 lb of CO2, close to a round-trip trans-America flight for one person; the heaviest model in the study equals that flight for 416 people.
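The methodology above can be turned into a back-of-the-envelope sketch: measure one GPU's daily draw, scale by GPU-days, then price the energy. Every number in this example (GPU-days, power draw, energy and CO2 rates) is an illustrative assumption, not a value from the study.

```python
# Hypothetical footprint estimate following the slide's methodology.
# All constants below are illustrative assumptions, not measured values.

def training_footprint(gpu_days, watts_per_gpu, usd_per_kwh, lb_co2_per_kwh):
    """Scale one GPU's power draw to the whole run, then price it."""
    kwh = gpu_days * 24 * watts_per_gpu / 1000.0   # energy for the full run
    return {"kwh": kwh,
            "usd": kwh * usd_per_kwh,
            "lb_co2": kwh * lb_co2_per_kwh}

# Example: 1,000 GPU-days at 300 W, with assumed average US rates.
est = training_footprint(gpu_days=1000, watts_per_gpu=300,
                         usd_per_kwh=0.12, lb_co2_per_kwh=0.95)
print(round(est["kwh"]), round(est["usd"]), round(est["lb_co2"]))  # → 7200 864 6840
```

Note this covers a single run; hyperparameter search multiplies the total by the number of runs.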
17. Growth in AI Energy Requirements
n Models are becoming larger
q GPT-2 → 1.5B parameters and a few petaflop-days to train
q GPT-3 → 175B parameters
q PaLM → 540B parameters
q GPT-4 → 100s of billions of parameters!
q What is next?
Source: Compute Trends Across Three Eras of Machine Learning, Sevilla et al., arXiv 2022
18. Growth in AI Energy Requirements
n Datasets are becoming significantly larger
q 3B words (training set) → training of BERT
q 32B words → XLNet
q 40B words → GPT-2
q 500B words → GPT-3
Source: https://epochai.org/blog/trends-in-training-dataset-sizes
27. Modern VLSI Layout
n Designs are getting larger, with billions of transistors on chip
n Design flows are getting increasingly complicated
Transistor counts:
• IBM Power10: 18B
• Apple A15: 15B
• NVIDIA Ampere GA100: 54B
• Cerebras Mega: 1.2T
29. Learning Assisted Computer Aided Design
n Physical design is a very time-consuming process
q Iterative and incremental (5-18 months in industry)
q Heuristic optimization algorithms
q Human experts
n Applied learning can help to:
q Reduce the reliance on human experts
q Optimize beyond heuristics
q Reduce the design time
[Figure: design maturity vs. design time; the learning-assisted outcome reaches the goal sooner than the conventional design outcome. The heuristic search space covers only part of the design space, finding a local best rather than the global best.]
30. Problem-Solution Opportunity Matrix
n Some problems do not get old, but our solutions do!
q New Problem + New Solution = Invention: Big Potential! High Risk!
q New Problem + Old Solution = Experimentation: Some Potential! Lower Risk!
q Old Problem + New Solution = Innovation: Big Money! Moderate Risk!
q Old Problem + Old Solution = Maintenance: Waste of Time!
31. Physical Design
n Each physical design step involves two required processes:
q Optimization (usually semi-heuristic): placement, CTS, routing, chip finishing
q Quality of Result (QoR) analysis: e.g., power, area, timing, DRC violations, IR, EM, SI
n For example, during placement:
q The positions of movable objects (cells) are determined (optimization)
q The quality of the placement is measured (QoR) in terms of:
n Power, area, timing, potential routing congestion, etc.
n The (semi-heuristic) optimization solutions are designed to improve the QoR metrics.
n In ML-assisted physical design, we therefore need to develop ML solutions for both optimization and QoR analysis.
n Given the approximate nature of ML, signoff-level analysis is not possible. The next best thing is prediction (which can then be verified by EDA tools).
q If a prediction is highly accurate and far faster, it provides an advantage → it can be used in the optimization loop (a full STA may take days; an ML-based prediction may take seconds!)
n ML also allows us to forecast the outcome of future steps.
q Hence, ML can be used for both prediction and forecasting.
33. ML Framework could be used for speedup
n ML Framework for speedup: rewrite heuristic algorithms using learning framework,
formulate the optimization as a training problem, enjoy GPU scaling.
[LAPD diagram: ML Framework Speedup]
34. Example: DREAMPlace
n VLSI Placement
[Diagram: VLSI Placement takes a gate-level netlist, standard-cell library, floorplan, and constraints, and produces a legal placement]
Challenges of nonlinear placement:
• Low efficiency: >3h for a 10M-cell design, and today we target much larger placements!
• Limited acceleration and speedup
References: Lin, Yibo, et al. "DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement." Proceedings of the 56th Annual Design Automation Conference, 2019.
35. Example: DREAMPlace
n Interestingly, the objective of placement and training are very similar
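The analogy can be made concrete with a toy sketch: cell coordinates play the role of trainable parameters and a differentiable (here quadratic) wirelength plays the role of the loss, minimized by the same gradient-descent update an ML framework's optimizer applies. The 1-D netlist below is hypothetical.

```python
# Minimal sketch of the DREAMPlace analogy: treat cell coordinates as
# trainable parameters and smoothed wirelength as the loss, then run the
# plain gradient-descent "training" loop. Toy 1-D example, made-up nets.

def wirelength(x, nets):
    # Quadratic net model: sum of squared pin-to-pin spans (differentiable).
    return sum((x[a] - x[b]) ** 2 for a, b in nets)

def grad(x, nets):
    g = [0.0] * len(x)
    for a, b in nets:
        d = 2.0 * (x[a] - x[b])
        g[a] += d
        g[b] -= d
    return g

x = [0.0, 5.0, 9.0]          # movable cell positions (the "parameters")
nets = [(0, 1), (1, 2)]      # two-pin nets (the "dataset")
for _ in range(200):         # the "training" loop
    g = grad(x, nets)
    x = [xi - 0.05 * gi for xi, gi in zip(x, g)]

print(round(wirelength(x, nets), 6))  # → 0.0 (wirelength driven to zero)
```

In DREAMPlace the same formulation lets the full placement objective run on GPUs through a deep-learning toolkit's autograd and optimizers.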
37. Example: DREAMPlace Results
n Significant speedup!
n Areas for improvement:
q It is not congestion-aware (congestion is handled only indirectly by a density constraint)
q It is not timing-aware (average timing is optimized, not the worst case); good for power, though!
38. ML Framework could be used for speedup
n ML Framework for speedup: rewrite heuristic algorithms using learning framework,
formulate the optimization as a training problem, enjoy GPU scaling.
[LAPD diagram: ML Framework Speedup]
39. ML Could Be Used for QoR Prediction/Forecast
n ML framework for speedup
n Develop ML for QoR prediction
q Reduce QoR analysis time
q See many PVT corners early in design time
q Predict QoR of future steps (e.g., routing congestion at placement time)
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction]
Examples:
• PBA prediction using GBA timing analysis
• MCMM STA prediction using STA runs in a limited number (e.g., 3) of corners
• IR drop prediction
• Routing congestion prediction at synthesis
• Routing congestion prediction at placement
• DRC prediction
• Yield prediction
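The first example above can be sketched minimally, assuming (purely for illustration) that PBA slack differs from GBA slack by a learnable linear correction; real predictors use many more features and stronger models.

```python
# Sketch of the QoR-prediction idea: learn a cheap model mapping a fast,
# pessimistic analysis (GBA slack) to the expensive one (PBA slack).
# The training pairs and the linear model are illustrative stand-ins.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((xv - mx) * (yv - my) for xv, yv in zip(xs, ys)) / \
        sum((xv - mx) ** 2 for xv in xs)
    return a, my - a * mx

# Hypothetical data: PBA recovers a fixed 15 ps of GBA pessimism.
gba = [-50.0, -20.0, 0.0, 30.0, 80.0]   # ps
pba = [g + 15.0 for g in gba]

a, b = fit_line(gba, pba)

def predict(g):
    # Seconds of model evaluation instead of a full STA run.
    return a * g + b

print(round(predict(10.0), 3))  # → 25.0
```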
40. ML Could Be Used for QoR Prediction/Forecast
n ML framework for speedup
n Develop ML for QoR prediction
n Use representation learning to automate feature engineering
q Learn better features → improved accuracy!
q Remove the need for cross-domain ML and CAD experts → ease of development
q Lower adoption bar → widespread use!
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction + Auto Feature Engineering]
We will cover a case study from our group on MCMM PBA prediction using automated feature engineering
41. ML Could Be Used for Optimization
n ML framework for speedup
n Develop ML for QoR prediction
n Use representation learning to automate feature engineering
n ML (e.g., RL) for optimization
q Optimize beyond heuristic models.
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction + Auto Feature Engineering + ML for Optimization]
Examples:
• RL, CNN, GAN for macro placement
• RL for CTS
• RL, CNN, GAN for routing
• …
We will cover a case study from our group on CTS using reinforcement learning!
42. ML-Guided ML-Optimization (ML Loop)
n ML framework for speedup
n Develop ML for QoR prediction
n Use representation learning to automate feature engineering
n ML (e.g., RL) for optimization
n The ML for QoR prediction and the optimization (ML- or framework-based) can work together in a loop.
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction + Auto Feature Engineering + ML for Optimization]
43. ML-Guided ML-Optimization (ML Loop)
n ML Framework for speedup
n Develop ML for QoR prediction
n Use Representation Learning to automate feature engineering
n ML (e.g., RL) for optimization
n The ML for QoR prediction and the optimization (ML- or framework-based) can work together in a loop.
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction + Auto Feature Engineering + ML for Optimization]
Two possibilities:
1. QoR prediction nested within optimization → ML-guided ML optimization
• Reduced number of iterations
• Better optimization decisions
2. Optimization nested within QoR analysis → genetic-based optimization
• Analogous to ML replacing the physical designer: analyzing the result and re-running the flow!
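Possibility 1 can be sketched as follows. The `predicted_qor` stand-in and the random move generator are hypothetical; the point is the shape of the loop, where a fast ML predictor scores every candidate move instead of a full analysis run.

```python
# Sketch of QoR prediction nested inside the optimization loop: each
# candidate move is scored by a fast predictor rather than a full run.
# The predictor and the move generator below are illustrative stand-ins.

import random

def predicted_qor(x):
    # Stand-in ML predictor: lower is better, with a hypothetical optimum at 3.
    return (x - 3.0) ** 2

def ml_guided_optimize(x, steps=100, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        # Propose a batch of candidate moves around the current solution.
        candidates = [x + rng.uniform(-1.0, 1.0) for _ in range(8)]
        best = min(candidates, key=predicted_qor)   # predictor guides the move
        if predicted_qor(best) < predicted_qor(x):  # accept only improvements
            x = best
    return x

print(round(ml_guided_optimize(10.0), 2))
```

Because the predictor answers in microseconds, far more candidates can be evaluated per iteration than with a signoff-grade analysis in the loop.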
44. ML-Guided ML-Optimization (ML Loop)
n Using multiple ML-based QoR predictions in the optimization loop results in:
q Multi-objective optimization
q Predictive optimization
n Pros:
q Optimize for current and future QoR
q Prevent doomed runs
q Prevent QoR estimation pessimism from lowering design quality
q Faster signoff
q Reduce time to market (TTM)
q Reduce tool and licensing cost
q Reduce engineering cost
q …
[LAPD diagram: ML Framework Speedup + ML for QoR Prediction + Auto Feature Engineering + ML for Optimization]
ML-Guided ML-Optimization (ML Loop)
46. RL for Peak Current Reduction
n Problem: reduce peak current → IR drop
n Method: maximize the skew in the design to spread the clock arrival times
[Figure: demanded current vs. time]
Reference: "A Reinforced Learning Solution for Clock Skew Engineering to Reduce Peak Current and IR Drop." Proceedings of the 2021 on GLSVLSI 2021
48. RL for Peak Current Reduction
n Problem: reduce peak current → IR drop
n Method: maximize the skew in the design to spread the clock arrival times
n Solution: reinforcement learning
49. Reinforcement Learning
n Reinforcement learning (RL): how an intelligent agent should take actions and interact with its environment in order to maximize its reward
n RL combines exploitation (maximizing known rewards) with exploration (taking risks) to learn about possible future rewards
n Applicable to problems of sequential decision making
50. RL for Peak Current Reduction
n Problem: reduce peak current → IR drop
n Method: maximize the skew in the design to spread the clock arrival times
n Solution: reinforcement learning
[Figure: the RL agent]
51. RL for Peak Current Reduction
n Problem: reduce peak current → IR drop
n Method: maximize the skew in the design to spread the clock arrival times
n Solution: reinforcement learning
[Figure: the environment performs a timing check on each move]
Setup: T + t_ck-c - t_ck-l ≥ t_c-q + t_p,logic + t_su
Hold: t_hold + δ ≤ t_cd,logic + t_cd,reg
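The two timing checks above can be written as an executable sketch; the function and variable names mirror the slide's symbols, and the demo numbers (in ps) are made up.

```python
# Hedged sketch of the setup and hold checks: a skew move is legal only
# if both slacks stay non-negative. Demo values are illustrative.

def setup_slack(T, t_ck_c, t_ck_l, t_c_q, t_logic, t_su):
    # Setup: T + t_ck-capture - t_ck-launch >= t_c-q + t_p,logic + t_su
    return (T + t_ck_c - t_ck_l) - (t_c_q + t_logic + t_su)

def hold_slack(t_cd_logic, t_cd_reg, t_hold, skew):
    # Hold: t_hold + skew <= t_cd,logic + t_cd,reg
    return (t_cd_logic + t_cd_reg) - (t_hold + skew)

def move_is_legal(**kw):
    return setup_slack(kw["T"], kw["t_ck_c"], kw["t_ck_l"],
                       kw["t_c_q"], kw["t_logic"], kw["t_su"]) >= 0 and \
           hold_slack(kw["t_cd_logic"], kw["t_cd_reg"],
                      kw["t_hold"], kw["skew"]) >= 0

# Hypothetical numbers in ps: 100 ps of launch skew, 1 ns clock period.
print(move_is_legal(T=1000, t_ck_c=0, t_ck_l=100, t_c_q=80,
                    t_logic=700, t_su=50, t_cd_logic=120,
                    t_cd_reg=30, t_hold=40, skew=100))  # → True
```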
52. RL for Peak Current Reduction
n Problem: reduce peak current → IR drop
n Method: maximize the skew in the design to spread the clock arrival times
n Solution: reinforcement learning
q Positive reward if an action widens the overall CAT distribution
q Large negative reward if an action generates a timing violation
q Allow aggressive exploration with a large discount factor (allowing search for future rewards)
[Figure: RL loop with the timing engine]
Action A: insert or remove a clock buffer, and where to move it
Reward R: + delta skew; - timing violation
State S: the updated design
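A toy version of this formulation, with tabular Q-learning standing in for the paper's agent: the action inserts or removes a clock buffer on one sink, the reward is the change in clock-arrival-time (CAT) spread, and a large negative reward fires past a hypothetical timing limit. All constants are illustrative.

```python
# Toy sketch of the A/R/S formulation above. Tabular Q-learning is a
# stand-in for the paper's agent; every number here is illustrative.

import random

BUF_DELAY, MAX_SKEW = 10, 40          # ps per buffer; violation threshold
ACTIONS = (1, -1)                     # insert / remove one buffer

def step(buffers, action):
    nxt = max(0, buffers + action)
    skew = nxt * BUF_DELAY            # CAT spread grows with inserted buffers
    if skew > MAX_SKEW:
        return buffers, -100.0        # timing violation: reject, big penalty
    return nxt, (nxt - buffers) * BUF_DELAY   # reward = + delta skew

def train(episodes=300, seed=1):
    rng, Q = random.Random(seed), {}
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            if rng.random() < 0.2:                 # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q.get((s, act), 0.0))
            s2, r = step(s, a)
            best2 = max(Q.get((s2, b), 0.0) for b in ACTIONS)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + 0.5 * (r + 0.9 * best2 - old)   # discount 0.9
            s = s2
    return Q

Q = train()
# Inserting a buffer from the initial state earns a positive learned value.
print(Q.get((0, 1), 0.0) > 0.0)  # → True
```

The large discount factor (0.9) makes the agent value skew gains several moves ahead, matching the slide's intent.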
53. RL for Peak Current Reduction
n Results on the Ethernet benchmark
q Wider distribution of CAT with RL
n Measurements from Ansys RedHawk
q 36.2% reduction in peak current
q 41.4% improvement in IR drop
[Figure: skew histograms (count vs. skew) for heuristic CTS and for reinforcement learning]
55. Representation Learning for STA Prediction
n Problem:
q Path-based static timing analysis (PBA) is very expensive
q Timing checks across many corners are very expensive
q Designers resort to graph-based timing analysis (GBA) on one or a few corners to carry the physical design
q Near design maturity, designers switch to PBA mode and check the remaining corners
q Signoff requires PBA timing checks in all corners
[Figure: design maturity vs. design time; timing checks use GBA in 1 PVT corner while the design matures, then PBA in 100s of (P,V,T) corners near the goal]
Reference: "RAPTA: A hierarchical representation learning solution for real-time prediction of path-based static timing analysis." GLSVLSI 2022
56. Representation Learning for STA
n But at this point, the damage is already done!
q Design cycle: GBA is pessimistic → the tool will over-fix → increased design iterations
q PPA penalty: GBA is pessimistic → PPA is traded for timing → lost optimization opportunities
q Corner blindness: the design is tracked in only a few corners → the other corners may surprise!
57. Representation Learning for STA
n Objective: prediction of static timing analysis results (PBA prediction)
n Constraint: no manual feature engineering
n Approach: representation learning + MLP
[Figure: a learning model, trained on earlier design outcomes with PBA, predicts PBA outcomes in corners 1 through 185 from the current design outcome with GBA]
58. Comparison to Prior Art
[Diagram: prior art extracts hand-crafted features from the timing engine (STA) for training; RAPTA instead feeds timing-engine data through representation learning before training]
59. Representation Learning for STA
[Figure: each stage of a timing path is a bigram of a gate and its driven net. Gate XYZ carries ~500 properties and net YUZ ~600, so bigram XYZ-YUZ carries ~1,100 property values. The sequence of bigrams along the data path feeds a chain of 3-unit LSTM cells.]
60. Representation Learning for STA
[Figure: three representation-learning blocks, each an LSTM chain, separately encode the data path, capture path, and launch path]
62. Representation Learning for STA
[Figure: model architecture. Launch-path, capture-path, and data-path features feed three representation-learning blocks; their outputs pass through fully connected layers with dropout to the PBA slack prediction (label). Per-path sub-label predictions (launch, capture, and data delay) are used only in the training phase and removed in the test phase.]
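The architecture above can be summarized structurally in a short sketch. A toy tanh-RNN stands in for the paper's LSTM blocks and all weights are fixed illustrative values, so this shows only the shape of the model (three path encoders feeding an MLP), not trained behavior.

```python
# Structural sketch of a RAPTA-style model: three sequence encoders (one
# per launch/capture/data path) compress per-stage bigram features into
# fixed-size representations, which a small MLP maps to a predicted PBA
# slack. The RNN and all weights below are illustrative stand-ins.

import math

def rnn_encode(seq, w_in=0.5, w_rec=0.3):
    """Fold a sequence of per-stage feature values into one hidden state."""
    h = 0.0
    for xval in seq:
        h = math.tanh(w_in * xval + w_rec * h)
    return h

def mlp(inputs, w=(1.0, -1.0, 0.5), bias=0.1):
    """Fully connected layer over the concatenated path representations."""
    return sum(wi * xi for wi, xi in zip(w, inputs)) + bias

def predict_pba_slack(launch_path, capture_path, data_path):
    reps = [rnn_encode(p) for p in (launch_path, capture_path, data_path)]
    return mlp(reps)

# Hypothetical per-stage features (e.g., normalized stage delays).
slack = predict_pba_slack([0.2, 0.1], [0.3], [0.5, 0.4, 0.6])
print(isinstance(slack, float))  # → True
```

In the paper, the per-path delay sub-labels give each encoder its own training signal before the final slack head; that auxiliary supervision is omitted here.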
63. Comparison to Prior Art
[Diagram: prior art extracts hand-crafted features from the timing engine (STA) for training; RAPTA instead feeds timing-engine data through representation learning before training]
64. Results (Route PBA from GBA)
[Figure: standard deviation (ps) of RAPTA's PBA prediction across voltage corners from 0.78 V to 1.05 V, for the Ethernet and S38417 benchmarks]
RAPTA's average train and test times on GPU. The reported test time is the time needed to generate PBA predictions for 10K timing paths. Training is done only once during the design cycle.