1. High-Level Synthesis for the Design of AI Chips
Jeff Roane – Group Director Product Management
Accelerating AI with Semiconductor RTL Front-End Services
2. Topics for Discussion
• Challenging AI design environment
• High-Level Synthesis 101
• HLS – Perfect Control Point for AI Design
• AI design case studies
• Performance data
• Towards Autogenetic Design
© 2024 Cadence Design Systems, Inc. All rights reserved.
3. Challenging AI Design Environment
Moore's Law for AI – Size of language model doubles every 3.4 months
• Design cycles of 1-2 years
• Environment requires the ability to adapt to changing requirements during the design cycle
• Only the nimble survive
• Is RTL the best entry format?
Julien Simon. "Large Language Models". Hugging Face blog (2021). URL: https://huggingface.co/blog/large-language-models.
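To put the doubling rate in perspective, the growth over a single design cycle can be worked out directly. The sketch below assumes steady exponential growth at the 3.4-month doubling period quoted on this slide; the function name `growth_factor` is illustrative only.

```python
def growth_factor(months: float, doubling_period_months: float = 3.4) -> float:
    """Return how many times larger a model becomes after `months`,
    assuming it doubles every `doubling_period_months` (the slide's figure)."""
    return 2.0 ** (months / doubling_period_months)


# Growth over a 1-year and a 2-year design cycle.
for cycle_months in (12, 24):
    print(f"{cycle_months} months: ~{growth_factor(cycle_months):.0f}x larger")
```

Under that assumption, model size grows by roughly 12x over a 1-year cycle and roughly 133x over a 2-year cycle, which is why requirements fixed at the start of a design can be out of date by tape-out.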
4. Challenging AI Design Environment
The cost of design change vis-à-vis the design flow
• Changing requirements affect the entire flow
• Early-stage changes are n times costlier than late-stage changes
• More automated implementation is required to meet the challenges of Moore’s Law for AI
• HLS automates Spec-to-RTL
[Figure: design flow. Spec (C++, MATLAB®, Python) → RTL, via High-Level Synthesis, manual RTL development, or licensed RTL IP → Logic Synthesis → Netlist → Physical Implementation → GDSII. Legend: Manual, Automated. Cost of late change: $$$ vs. $.]
5. What’s Stratus HLS?
• Cadence® Stratus High-Level Synthesis (HLS)
o Input: SystemC, C++, MATLAB®
o Automated architectural exploration and optimization
o Integrated Genus Logic Synthesis and Joules RTL Power Analysis
o Output: Verilog RTL
• Dramatic improvement in PPA vs. manual RTL creation
• Engineering force multiplier
6. Benefits of Stratus HLS
• Speed and quality of implementation
o Fastest path from spec to working RTL
• Ease of code maintenance and modification
• Automates application of low-power design techniques
• Early, accurate visibility into implementation PPA
o Measure, don’t guess
• Automated exploration eases attainment of optimal PPA
o Explore a wide range of options automatically
o Optional AI/ML-driven exploration
• Enables rapid change of implementation technology
o Measure cost/benefit of alternate process technology nodes
o Simply specify the target technology library and constraints, then re-run
[Figure: automated flow. C++, MATLAB → High-Level Synthesis → RTL → Logic Synthesis → Netlist → Physical Implementation → GDSII. Legend: Manual, Automated.]
7. HLS – Under the Hood
Read Source
[Figure: HLS stages. Read Source → Elaboration and Lexical Processing → Resource Characterization and Mapping → Scheduling and Resource Sharing.]
8. HLS – Under the Hood
Elaboration and Lexical Processing
• Determines dataflow
• Identifies operations
• Computes required bit widths
[Figure: dataflow graph from inputs in1 to in5 to output out1, with edges labeled by bit width: 8 at the inputs, then 16 and 17 further along.]
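The bit-width step can be illustrated with a small sketch. It assumes unsigned operands and a hypothetical two-multiply, one-add graph (the exact graph in the slide's figure is not recoverable here), and it is not a description of how Stratus HLS computes widths internally. The `Node` class and `required_width` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One dataflow node: an input port or a binary operation."""
    op: str            # "in", "mul", or "add"
    width: int = 0     # declared width, used for inputs only
    args: tuple = ()   # the two operand nodes for "mul" / "add"


def required_width(node: Node) -> int:
    """Smallest unsigned result width that can never overflow."""
    if node.op == "in":
        return node.width
    a, b = (required_width(arg) for arg in node.args)
    if node.op == "mul":
        return a + b           # n-bit * m-bit always fits in n + m bits
    if node.op == "add":
        return max(a, b) + 1   # one extra bit for the carry
    raise ValueError(f"unknown op: {node.op}")


# Hypothetical graph: (in1 * in2) + (in3 * in4), all inputs 8 bits wide.
in1, in2, in3, in4 = (Node("in", width=8) for _ in range(4))
prod_a = Node("mul", args=(in1, in2))
prod_b = Node("mul", args=(in3, in4))
total = Node("add", args=(prod_a, prod_b))

print(required_width(prod_a), required_width(total))  # prints "16 17"
```

The same 8, 16, and 17 progression appears in the figure's edge labels: widths are derived from the operations rather than declared by hand.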
9. HLS – Under the Hood
Resource Characterization and Mapping
10. HLS – Under the Hood
Scheduling and Resource Sharing
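The slides name this stage without detailing it. As a rough illustration only, the sketch below uses resource-constrained list scheduling, a generic textbook technique that is not necessarily what Stratus HLS does internally. It assumes every operation takes one cycle, the dependency graph is acyclic, and each resource kind has at least one unit; `list_schedule` and the operation names are hypothetical.

```python
def list_schedule(ops, deps, limits):
    """Assign each operation to a clock cycle.

    ops:    {name: resource kind}, e.g. {"m1": "mul"}
    deps:   {name: [names that must finish in an earlier cycle]}
    limits: {resource kind: number of hardware units available}
    """
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        if cycle >= len(ops):
            raise ValueError("cyclic dependencies or a resource with zero units")
        used = {kind: 0 for kind in limits}
        for name, kind in ops.items():
            if name in schedule:
                continue
            # Ready only if every predecessor ran in a strictly earlier cycle.
            ready = all(schedule.get(d, cycle) < cycle for d in deps.get(name, []))
            if ready and used[kind] < limits[kind]:
                schedule[name] = cycle
                used[kind] += 1
        cycle += 1
    return schedule


# Four multiplies feeding an adder tree, with 2 multipliers and 1 adder.
ops = {"m1": "mul", "m2": "mul", "m3": "mul", "m4": "mul",
       "a1": "add", "a2": "add", "a3": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["m3", "m4"], "a3": ["a1", "a2"]}
schedule = list_schedule(ops, deps, {"mul": 2, "add": 1})
print(schedule)
```

Here four multiplications share two multipliers and three additions share one adder, so the computation takes four cycles. Raising the limits shortens the schedule at the cost of area, which is the latency versus area tradeoff that scheduling and resource sharing control.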
11. HLS – The Perfect Control Point for AI Chip Design
• HLS allows
o Rapid application of intent
o Automated exploration and optimization
• Examples
o Pipeline stalls
o Latency
o Power shutoff
o Power – sequential and combinatorial clock gating
o Data formats: flexible floating-point sizes, integer
o RTL style via STARC support
o …
[Figure: HLS as the control point. Set up tools, configure exploration, and inject changes at HLS. Automated flow execution runs from HLS to RTL and on through logic synthesis, lint, power analysis, physical analysis, and verification, alongside simulation and power estimation. Results feed a metadata database for powerful analysis cross-referenced to source.]
12. HLS for AI Chip Design
On Cadence.com
Key findings
• Design: AI Endpoint Accelerator
• 80-90% of design via HLS
• Accurate PPA estimation
• Able to make very large architectural changes very late
https://www.cadence.com/en_US/home/multimedia-secured.html/content/dam/cadence-www/global/en_US/videos/about-cadence/events/cadencelive/secured/2023/americas/digital-design/digital1-a-ppa-coherent-full-flow-from-high-level-synthesis-through-physical-implementation.mp4
https://www.cadence.com/en_US/home/multimedia.html/content/dam/cadence-www/global/en_US/videos/tools/digital_design_signoff/tensorflow-rtl-hls-chalktalk.mp4
https://www.cadence.com/en_US/home/multimedia.html/content/dam/cadence-www/global/en_US/videos/about-cadence/events/DAC/2018/syntiant-dac-2018-hosted-design.mp4
Key findings
• Design: Edge Inferencing – speech
• Unified source: co-compiled into Python for TensorFlow
• Early PPA visibility – integrated environment for simulation, synthesis, and power
• Tape-out (TO) 4 months from start
See Next Slide
13. Exploration: TensorFlow to RTL with Stratus HLS
Trivial example to demonstrate concepts
1. Design and train NN
2. Extract trained model metadata
3. Implement parameterized models
4. Load metadata
5. Check PPA-A (accuracy)
6. Change parameters to explore tradeoffs
Tools: TensorFlow; SystemC and Stratus HLS

Parameters: Bit Width, Speed Grade (Latency). Results: Power, Images/Second, Area, Accuracy.

Bit Width   Speed Grade   Power (mW)   Images/Second   Area (K µm²)   Accuracy (%)
16          FAST          63.9         5,294           103.5          96.68
16          MED           38.9         2,454           68.8           96.68
16          SLOW          11.4         234             47.6           96.68
15          FAST          94.7         5,961           113.4          96.69
15          MED           39.7         2,454           65.4           96.69
15          SLOW          12.7         234             44.3           96.69
14          FAST          80.2         5,961           100.3          96.58
14          MED           37.4         2,460           60.6           96.58
14          SLOW          11.6         234             41.4           96.58
13          FAST          76.9         6,026           93.8           96.04
13          MED           32.5         2,460           55.6           96.04
13          SLOW          10.6         234             38.9           96.04
12          FAST          62.2         5,961           81.0           91.45
12          MED           26.2         2,454           49.9           91.45
12          SLOW          9.9          234             36.2           91.45
Range of tradeoffs        9.6X         25.8X           3.1X

19% smaller and 14% less power for only a 0.64% reduction in accuracy
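The "Range of tradeoffs" row can be reproduced from the table as the ratio of the largest to the smallest value in each results column. The sketch below copies the slide's numbers; the comparison of the 16-bit and 13-bit MED configurations is an assumption, chosen because it matches the 19% area and 0.64% accuracy figures in the callout. The slide does not say which rows the 14% power figure compares, so that figure is not recomputed.

```python
# (bit width, speed grade) -> (power mW, images/second, area K um^2),
# copied from the results table on this slide.
RESULTS = {
    (16, "FAST"): (63.9, 5294, 103.5), (16, "MED"): (38.9, 2454, 68.8), (16, "SLOW"): (11.4, 234, 47.6),
    (15, "FAST"): (94.7, 5961, 113.4), (15, "MED"): (39.7, 2454, 65.4), (15, "SLOW"): (12.7, 234, 44.3),
    (14, "FAST"): (80.2, 5961, 100.3), (14, "MED"): (37.4, 2460, 60.6), (14, "SLOW"): (11.6, 234, 41.4),
    (13, "FAST"): (76.9, 6026, 93.8),  (13, "MED"): (32.5, 2460, 55.6), (13, "SLOW"): (10.6, 234, 38.9),
    (12, "FAST"): (62.2, 5961, 81.0),  (12, "MED"): (26.2, 2454, 49.9), (12, "SLOW"): (9.9, 234, 36.2),
}
ACCURACY_PCT = {16: 96.68, 15: 96.69, 14: 96.58, 13: 96.04, 12: 91.45}


def tradeoff_range(column: int) -> float:
    """Ratio of the largest to the smallest value in one results column."""
    values = [row[column] for row in RESULTS.values()]
    return max(values) / min(values)


power_x, throughput_x, area_x = (tradeoff_range(i) for i in range(3))
print(f"{power_x:.1f}X {throughput_x:.1f}X {area_x:.1f}X")  # prints "9.6X 25.8X 3.1X"

# Assumed comparison: 16-bit MED vs. 13-bit MED.
area_saving = 1 - RESULTS[(13, "MED")][2] / RESULTS[(16, "MED")][2]
accuracy_drop = ACCURACY_PCT[16] - ACCURACY_PCT[13]
print(f"{area_saving:.0%} smaller, {accuracy_drop:.2f} points less accurate")
```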
14. HLS PPA Consistently Beats Hand-Coded RTL

Area: 24% Avg Improvement
No.   Design            Reference Type   Metric   % Improve vs. Ref.
3     5G                Hand-coded RTL   Area     10%
4     5G                Hand-coded RTL   Area     7%
17    Audio             Hand-coded RTL   Area     15%
16    Camera            Hand-coded RTL   Area     13%
18    Camera            Hand-coded RTL   Area     43%
21    Camera            Hand-coded RTL   Area     43%
15    Display           Competitor       Area     9%
6     DSP               Commercial IP    Area     5%
9     DSP               Hand-coded RTL   Area     40%
12    GPU               Competitor       Area     20%
7     Image Proc        Hand-coded RTL   Area     23%
8     Image Proc        Hand-coded RTL   Area     3%
10    Image Sensor      Hand-coded RTL   Area     7%
1     Imaging           Hand-coded RTL   Area     47%
2     Imaging           Hand-coded RTL   Area     14%
5     Imaging           Hand-coded RTL   Area     43%
19    Imaging           Competitor       Area     50%
20    Imaging           Competitor       Area     69%
11    Mobile Graph      Competitor       Area     30%
13    Mobile Graph      Competitor       Area     16%
14    Mobile Graph      Competitor       Area     10%
23    Physical Sensor   Hand-coded RTL   Area     5%
25    Printer           Hand-coded RTL   Area     20%
22    Sat Comms         Hand-coded RTL   Area     28%
24    Smart TV          Hand-coded RTL   Area     20%

Design Time: 262% Avg Improvement
No.   Design            Reference Type   Metric        % Improve vs. Ref.
28    5G                Hand-coded RTL   Design Time   1000%
32    ADAS              Hand-coded RTL   Design Time   900%
26    AI                Hand-coded RTL   Design Time   80%
31    AI                Hand-coded RTL   Design Time   200%
33    Camera            Hand-coded RTL   Design Time   62%
37    Camera            Hand-coded RTL   Design Time   50%
38    Comms             Hand-coded RTL   Design Time   200%
30    DSP               Hand-coded RTL   Design Time   400%
29    Image Proc.       Hand-coded RTL   Design Time   80%
34    Sat Comms         Hand-coded RTL   Design Time   300%
27    TV display        Hand-coded RTL   Design Time   70%
35    Video Codec       Hand-coded RTL   Design Time   30%
36    Wireless Comms    Hand-coded RTL   Design Time   40%

Power: 49% Avg Improvement
No.   Design            Reference Type   Metric   % Improve vs. Ref.
39    AI                Hand-coded RTL   Power    88%
41    Audio             Hand-coded RTL   Power    50%
42    Camera            Hand-coded RTL   Power    66%
44    Gen. purpose IP   Hand-coded RTL   Power    51%
40    Image Sensor      Hand-coded RTL   Power    4%
45    Image Sensor      Hand-coded RTL   Power    10%
43    Sat Comms         Hand-coded RTL   Power    75%
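Each of the three average figures on this slide belongs to one of the three tables, and recomputing the means shows which is which. The sketch below copies the per-design percentages from the slide; the dictionary name is illustrative.

```python
from statistics import mean

# Per-design "% Improve vs. Ref." values, copied from the slide's three tables.
IMPROVEMENT_PCT = {
    "Area": [10, 7, 15, 13, 43, 43, 9, 5, 40, 20, 23, 3, 7,
             47, 14, 43, 50, 69, 30, 16, 10, 5, 20, 28, 20],
    "Design Time": [1000, 900, 80, 200, 62, 50, 200, 400, 80, 300, 70, 30, 40],
    "Power": [88, 50, 66, 51, 4, 10, 75],
}

averages = {metric: round(mean(values)) for metric, values in IMPROVEMENT_PCT.items()}
print(averages)  # prints "{'Area': 24, 'Design Time': 262, 'Power': 49}"
```

So the 24% average applies to area, 262% to design time, and 49% to power. These are unweighted means, and two designs (5G at 1000% and ADAS at 900%) pull the design-time average well above most of the individual entries.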
15. Like AI, HLS is Experiencing Rapid Advancement
Recent Stratus HLS innovations
Stratus/MATLAB® Integration: Towards Direct Spec Entry
• Automated creation of synthesizable SystemC
• Automated import and Stratus project creation
o Includes design and testbench (TB)
Benefits
• Early PPA visibility
• Productivity force multiplier
Stratus / Cadence Cerebrus Integration: Towards Autogenetic Design
• Allows HLS novices to achieve expert-quality PPA
• Efficiently prunes solution space
Benefits
• Average ~10% better PPA
• Productivity force multiplier
16. Summary
• The dynamic AI design environment demands more productivity than hand-written RTL can deliver
• HLS is a proven RTL development solution with greater productivity and better PPA than manual RTL development
• Leading semiconductor companies are employing HLS today
• The future is bright, with continued HLS R&D investments promising even greater productivity
[Figure: design flow. Spec (C++, MATLAB, Python) → RTL, via manual RTL development or High-Level Synthesis → Logic Synthesis → Netlist → Physical Implementation → GDSII. Legend: Manual, Automated. An "LLM" box and "C++" with a question mark also appear.]
17. © 2024 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at https://www.cadence.com/go/trademarks are trademarks or registered trademarks of
Cadence Design Systems, Inc. Accellera and SystemC are trademarks of Accellera Systems Initiative Inc. All Arm products are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All
MIPI specifications are registered trademarks or service marks owned by MIPI Alliance. All PCI-SIG specifications are registered trademarks or trademarks of PCI-SIG. All other trademarks are the property of their respective owners.