High-Level Synthesis for the Design of AI Chips
Jeff Roane – Group Director Product Management
Accelerating AI with Semiconductor RTL Front-End Services
© 2024 Cadence Design Systems, Inc. All rights reserved.

Topics for Discussion
• Challenging AI design environment
• High-Level Synthesis 101
• HLS – Perfect Control Point for AI Design
• AI design case studies
• Performance data
• Towards Autogenetic Design
Challenging AI Design Environment
Moore's Law for AI – language model size doubles every 3.4 months
• Design cycles: 1-2 years
• The environment requires the ability to adapt to changing requirements during the design cycle
• Only the nimble survive
• Is RTL the best entry format?
Julien Simon. “Large Language Models.” Hugging Face blog, 2021. URL: https://huggingface.co/blog/large-language-models.
Challenging AI Design Environment
The cost of design change vis-à-vis the design flow
• Changing requirements affect the entire flow
• Late-stage changes are n-times costlier than early-stage changes
• More automated implementation is required to meet the challenges of Moore’s Law for AI
• HLS automates Spec-to-RTL
[Figure: design flow from Spec (C++, MATLAB®, Python) to RTL via High-Level Synthesis, manual RTL development, or licensed RTL IP, then Logic Synthesis to Netlist and Physical Implementation to GDSII; steps are marked manual or automated, and the cost of a late change is annotated as $ vs. $$$.]
What’s Stratus HLS?
• Cadence® Stratus High-Level Synthesis (HLS)
o Input: SystemC, C++, MATLAB®
o Automated architectural exploration and optimization
o Integrated Genus Logic Synthesis and Joules RTL Power Analysis
o Output: Verilog RTL
• Dramatic improvement in power, performance, and area (PPA) vs. manual RTL creation
• Engineering force multiplier
Benefits of Stratus HLS
• Speed and quality of implementation
o Fastest path from spec to working RTL
• Ease of code maintenance and modification
• Automates application of low-power design techniques
• Early, accurate visibility into implementation PPA
o Measure, don’t guess
• Automated exploration eases attainment of optimal PPA
o Explore a wide range of options automatically
o Optional AI/ML-driven exploration
• Enables rapid change of implementation technology
o Measure the cost/benefit of alternate process technology nodes
o Specify the target technology library and constraints, then re-run
[Figure: flow from C++/MATLAB through High-Level Synthesis to RTL, Logic Synthesis to Netlist, and Physical Implementation to GDSII, with steps marked manual or automated.]
HLS – Under the Hood
HLS flow: Read Source → Elaboration and Lexical Processing → Resource Characterization and Mapping → Scheduling and Resource Sharing
HLS – Under the Hood
Elaboration and Lexical Processing
• Determines dataflow
• Identifies operations
• Computes required bit widths
[Figure: dataflow graph from inputs in1 to in5 (8 bits each) to out1, annotated with computed intermediate bit widths of 16 and 17 bits.]
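The bit-width step can be illustrated with the usual rules for unsigned integer arithmetic: an a-bit by b-bit product needs a + b bits, and an a-bit plus b-bit sum needs max(a, b) + 1 bits, which is how 8-bit inputs lead to 16-bit products and a 17-bit sum. The minimal Python sketch below assumes unsigned operands and a hypothetical expression, out1 = in1*in2 + in3*in4; the `Node` class and `required_width` function are illustrative names, not Stratus internals, and the exact graph in the original figure is not recoverable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of an unsigned-integer dataflow graph (hypothetical helper)."""
    op: str                        # "in" for a primary input, "+" or "*" for an operation
    width: int = 0                 # bit width, used only for primary inputs
    lhs: Optional["Node"] = None
    rhs: Optional["Node"] = None


def required_width(n: Node) -> int:
    """Smallest bit width that holds n's exact result for unsigned operands."""
    if n.op == "in":
        return n.width
    a, b = required_width(n.lhs), required_width(n.rhs)
    if n.op == "+":
        return max(a, b) + 1       # the carry out can need one extra bit
    if n.op == "*":
        return a + b               # an a-bit by b-bit product fits in a + b bits
    raise ValueError(f"unsupported op {n.op!r}")


# Hypothetical expression: out1 = in1*in2 + in3*in4 with 8-bit inputs.
in1, in2, in3, in4 = (Node("in", 8) for _ in range(4))
p1 = Node("*", lhs=in1, rhs=in2)
p2 = Node("*", lhs=in3, rhs=in4)
out1 = Node("+", lhs=p1, rhs=p2)
print(required_width(p1), required_width(p2), required_width(out1))  # 16 16 17
```

Computing widths this way avoids both overflow and wasted bits; a real HLS tool would also handle signedness, constants, and user-specified truncation, which this sketch ignores.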
HLS – Under the Hood
Resource Characterization and Mapping
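Resource characterization measures candidate implementations of each operator (for example, a small, slow adder and a large, fast one) for delay and area in the target technology; mapping then picks one implementation for each operation. The sketch below is a toy version under stated assumptions: the `LIBRARY` entries and their delay and area numbers are hypothetical placeholders rather than characterized data, and the selection rule (smallest area that fits the clock period, otherwise the fastest implementation spread over several cycles) is one simple policy, not a description of how Stratus HLS decides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Impl:
    name: str
    delay_ns: float
    area_um2: float


# Hypothetical characterization results for 16-bit operators (illustrative numbers only).
LIBRARY = {
    "add16": [Impl("ripple_add16", 1.2, 90.0), Impl("cla_add16", 0.5, 160.0)],
    "mul16": [Impl("array_mul16", 3.5, 1400.0), Impl("booth_mul16", 1.8, 2100.0)],
}


def map_operation(op: str, clock_ns: float) -> tuple:
    """Return (implementation, cycles) for one operation at the given clock period."""
    candidates = LIBRARY[op]
    fitting = [impl for impl in candidates if impl.delay_ns <= clock_ns]
    if fitting:
        # Several options meet timing in one cycle: take the smallest.
        return min(fitting, key=lambda impl: impl.area_um2), 1
    # Nothing fits in one cycle: use the fastest option as a multicycle operation.
    fastest = min(candidates, key=lambda impl: impl.delay_ns)
    return fastest, math.ceil(fastest.delay_ns / clock_ns)


for clock in (2.0, 1.0):
    for op in ("add16", "mul16"):
        impl, cycles = map_operation(op, clock)
        print(f"clock={clock} ns  {op}: {impl.name}, {cycles} cycle(s)")
```

Tightening the clock from 2.0 ns to 1.0 ns switches the adder to the faster, larger implementation and turns the multiplier into a two-cycle operation, which is the kind of information the scheduler consumes next.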
HLS – Under the Hood
Scheduling and Resource Sharing
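Scheduling assigns each operation to a clock cycle subject to its data dependencies, and resource sharing lets operations in different cycles reuse one hardware unit. Resource-constrained list scheduling is a classic textbook technique for this; the Python sketch below assumes single-cycle operations, an acyclic dependency graph, and a simple name-based priority, and it is not necessarily the algorithm Stratus HLS uses. It schedules the hypothetical expression out1 = in1*in2 + in3*in4 with one and with two multipliers.

```python
from collections import defaultdict


def list_schedule(ops, deps, limits):
    """Resource-constrained list scheduling with single-cycle operations.

    ops:    {op_name: resource_type}
    deps:   {op_name: [names of ops whose results it consumes]}, assumed acyclic
    limits: {resource_type: units available per cycle}
    Returns {op_name: cycle}, with cycles numbered from 0.
    """
    if any(limits.get(rtype, 0) < 1 for rtype in ops.values()):
        raise ValueError("every resource type in use needs at least one unit")
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        used = defaultdict(int)        # units of each type already claimed this cycle
        # Ready: not yet scheduled, and every predecessor finished in an earlier cycle.
        ready = [op for op in ops
                 if op not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in deps.get(op, []))]
        for op in sorted(ready):       # simple deterministic priority: by name
            rtype = ops[op]
            if used[rtype] < limits[rtype]:
                schedule[op] = cycle   # the same unit is reused (shared) in later cycles
                used[rtype] += 1
        cycle += 1
    return schedule


ops = {"m1": "mul", "m2": "mul", "a1": "add"}   # m1 = in1*in2, m2 = in3*in4, a1 = m1 + m2
deps = {"a1": ["m1", "m2"]}
shared = list_schedule(ops, deps, {"mul": 1, "add": 1})
parallel = list_schedule(ops, deps, {"mul": 2, "add": 1})
print(shared)    # {'m1': 0, 'm2': 1, 'a1': 2}
print(parallel)  # {'m1': 0, 'm2': 0, 'a1': 1}
```

With a single multiplier, both products share the same unit in cycles 0 and 1, so the result is ready one cycle later than with two multipliers; trading latency for area this way is the basic lever that scheduling and sharing expose.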
HLS – The Perfect Control Point for AI Chip Design
• HLS allows
o Rapid application of design intent
o Automated exploration and optimization
• Examples
o Pipeline stalls
o Latency
o Power shutoff
o Power – sequential and combinatorial clock gating
o Data formats: flexible floating-point sizes, integer
o RTL style via STARC support
o …
[Figure: HLS as the control point. Labels: inject changes; set up tools; configure exploration; automated flow execution on the RTL (logic synthesis, lint, power analysis, physical analysis, verification); HLS simulation and power estimation; metadata database; powerful analysis cross-referenced to source.]
HLS for AI Chip Design
On Cadence.com
Key findings
• Design: AI Endpoint Accelerator
• 80-90% of the design via HLS
• Accurate PPA estimation
• Able to make a very large architectural change very late
https://www.cadence.com/en_US/home/multimedia-secured.html/content/dam/cadence-www/global/en_US/videos/about-cadence/events/cadencelive/secured/2023/americas/digital-design/digital1-a-ppa-coherent-full-flow-from-high-level-synthesis-through-physical-implementation.mp4
https://www.cadence.com/en_US/home/multimedia.html/content/dam/cadence-www/global/en_US/videos/tools/digital_design_signoff/tensorflow-rtl-hls-chalktalk.mp4
https://www.cadence.com/en_US/home/multimedia.html/content/dam/cadence-www/global/en_US/videos/about-cadence/events/DAC/2018/syntiant-dac-2018-hosted-design.mp4
Key findings
• Design: Edge Inferencing – speech
• Unified source: co-compiled into Python for TensorFlow
• Early PPA visibility: integrated environment for simulation, synthesis, and power
• Tapeout (TO) 4 months from start
See Next Slide
Exploration: TensorFlow to RTL with Stratus HLS
Trivial example to demonstrate concepts
1. Design and train NN
2. Extract trained model metadata
3. Implement parameterized models
4. Load metadata
5. Check PPA-A (accuracy)
6. Change parameters to explore tradeoffs
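Steps 3 to 6 amount to making numeric precision a model parameter and re-measuring accuracy each time it changes. The Python sketch below shows that loop on a stand-in problem: a random 256-element dot product instead of a trained network, with signed fixed-point quantization of values in [-1, 1) to a hypothetical total bit width. The `quantize` and `dot_error` names are illustrative, and this is not the TensorFlow or Stratus HLS flow from the slide; a SystemC model would more likely use a fixed-point type such as `sc_fixed` for the same purpose.

```python
import random


def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1) to a signed fixed-point value with `bits` total bits."""
    step = 2.0 ** -(bits - 1)                  # one sign bit, bits-1 fractional bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, round(x / step))) * step   # round, then saturate


def dot_error(weights, inputs, bits: int) -> float:
    """Absolute error of a quantized dot product versus the floating-point reference."""
    ref = sum(w * x for w, x in zip(weights, inputs))
    q = sum(quantize(w, bits) * quantize(x, bits) for w, x in zip(weights, inputs))
    return abs(q - ref)


rng = random.Random(0)                         # fixed seed so runs are repeatable
weights = [rng.uniform(-1, 1) for _ in range(256)]
inputs = [rng.uniform(-1, 1) for _ in range(256)]

# Step 6: change the bit-width parameter and re-measure the error.
for bits in (16, 14, 12, 8, 4):
    print(bits, dot_error(weights, inputs, bits))
```

Each quantized value is within one step of the original, so the dot-product error is bounded by 2 × 256 × 2^-(bits-1); in a real flow the same sweep would report classification accuracy alongside the PPA numbers from HLS.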
Flow: TensorFlow → SystemC and Stratus HLS

Parameters               Results
Bit Width  Speed Grade   Power  Images/  Area      Accuracy
           (Latency)     (mW)   Second   (K µm²)   (%)
16         FAST          63.9   5,294    103.5     96.68
16         MED           38.9   2,454    68.8      96.68
16         SLOW          11.4   234      47.6      96.68
15         FAST          94.7   5,961    113.4     96.69
15         MED           39.7   2,454    65.4      96.69
15         SLOW          12.7   234      44.3      96.69
14         FAST          80.2   5,961    100.3     96.58
14         MED           37.4   2,460    60.6      96.58
14         SLOW          11.6   234      41.4      96.58
13         FAST          76.9   6,026    93.8      96.04
13         MED           32.5   2,460    55.6      96.04
13         SLOW          10.6   234      38.9      96.04
12         FAST          62.2   5,961    81.0      91.45
12         MED           26.2   2,454    49.9      91.45
12         SLOW          9.9    234      36.2      91.45
Range of tradeoffs       9.6X   25.8X    3.1X

19% smaller, 14% less power for only a 0.64 percentage-point reduction in accuracy
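The "Range of tradeoffs" row can be checked directly against the table: each figure is the ratio of the largest to the smallest value in its result column (for power, 94.7 / 9.9 ≈ 9.6). The sketch below recomputes the three ratios from the table's own data, plus the 0.64-point accuracy gap between the 16-bit and 13-bit models, which is consistent with the slide's callout (the slide does not state which rows the callout compares).

```python
# Rows copied from the table: (bit width, speed grade, power mW, images/s, area K um^2, accuracy %).
RESULTS = [
    (16, "FAST", 63.9, 5294, 103.5, 96.68),
    (16, "MED", 38.9, 2454, 68.8, 96.68),
    (16, "SLOW", 11.4, 234, 47.6, 96.68),
    (15, "FAST", 94.7, 5961, 113.4, 96.69),
    (15, "MED", 39.7, 2454, 65.4, 96.69),
    (15, "SLOW", 12.7, 234, 44.3, 96.69),
    (14, "FAST", 80.2, 5961, 100.3, 96.58),
    (14, "MED", 37.4, 2460, 60.6, 96.58),
    (14, "SLOW", 11.6, 234, 41.4, 96.58),
    (13, "FAST", 76.9, 6026, 93.8, 96.04),
    (13, "MED", 32.5, 2460, 55.6, 96.04),
    (13, "SLOW", 10.6, 234, 38.9, 96.04),
    (12, "FAST", 62.2, 5961, 81.0, 91.45),
    (12, "MED", 26.2, 2454, 49.9, 91.45),
    (12, "SLOW", 9.9, 234, 36.2, 91.45),
]
POWER, IMAGES_PER_S, AREA = 2, 3, 4            # column indices into each row


def tradeoff_range(column: int) -> float:
    """Ratio of the largest to the smallest value in one result column."""
    values = [row[column] for row in RESULTS]
    return max(values) / min(values)


accuracy = {row[0]: row[5] for row in RESULTS}  # accuracy depends only on bit width
print(f"power {tradeoff_range(POWER):.1f}X, "
      f"images/s {tradeoff_range(IMAGES_PER_S):.1f}X, "
      f"area {tradeoff_range(AREA):.1f}X")      # power 9.6X, images/s 25.8X, area 3.1X
print(f"16-bit vs 13-bit accuracy: {accuracy[16] - accuracy[13]:.2f} points")  # 0.64 points
```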
HLS PPA Consistently Beats Hand-Coded RTL

#   Design           Reference Type   Metric       % Improve vs. Ref.
3   5G               Hand-coded RTL   Area         10%
4   5G               Hand-coded RTL   Area         7%
17  Audio            Hand-coded RTL   Area         15%
16  Camera           Hand-coded RTL   Area         13%
18  Camera           Hand-coded RTL   Area         43%
21  Camera           Hand-coded RTL   Area         43%
15  Display          Competitor       Area         9%
6   DSP              Commercial IP    Area         5%
9   DSP              Hand-coded RTL   Area         40%
12  GPU              Competitor       Area         20%
7   Image Proc       Hand-coded RTL   Area         23%
8   Image Proc       Hand-coded RTL   Area         3%
10  Image Sensor     Hand-coded RTL   Area         7%
1   Imaging          Hand-coded RTL   Area         47%
2   Imaging          Hand-coded RTL   Area         14%
5   Imaging          Hand-coded RTL   Area         43%
19  Imaging          Competitor       Area         50%
20  Imaging          Competitor       Area         69%
11  Mobile Graph     Competitor       Area         30%
13  Mobile Graph     Competitor       Area         16%
14  Mobile Graph     Competitor       Area         10%
23  Physical Sensor  Hand-coded RTL   Area         5%
25  Printer          Hand-coded RTL   Area         20%
22  Sat Comms        Hand-coded RTL   Area         28%
24  Smart TV         Hand-coded RTL   Area         20%
24% Avg Improvement

#   Design           Reference Type   Metric       % Improve vs. Ref.
28  5G               Hand-coded RTL   Design Time  1000%
32  ADAS             Hand-coded RTL   Design Time  900%
26  AI               Hand-coded RTL   Design Time  80%
31  AI               Hand-coded RTL   Design Time  200%
33  Camera           Hand-coded RTL   Design Time  62%
37  Camera           Hand-coded RTL   Design Time  50%
38  Comms            Hand-coded RTL   Design Time  200%
30  DSP              Hand-coded RTL   Design Time  400%
29  Image Proc.      Hand-coded RTL   Design Time  80%
34  Sat Comms        Hand-coded RTL   Design Time  300%
27  TV display       Hand-coded RTL   Design Time  70%
35  Video Codec      Hand-coded RTL   Design Time  30%
36  Wireless Comms   Hand-coded RTL   Design Time  40%
262% Avg Improvement

#   Design           Reference Type   Metric       % Improve vs. Ref.
39  AI               Hand-coded RTL   Power        88%
41  Audio            Hand-coded RTL   Power        50%
42  Camera           Hand-coded RTL   Power        66%
44  Gen. purpose IP  Hand-coded RTL   Power        51%
40  Image Sensor     Hand-coded RTL   Power        4%
45  Image Sensor     Hand-coded RTL   Power        10%
43  Sat Comms        Hand-coded RTL   Power        75%
49% Avg Improvement
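The three averages follow from the tables as unweighted arithmetic means of each improvement column: area 590 / 25 = 23.6 ≈ 24%, design time 3412 / 13 ≈ 262%, and power 344 / 7 ≈ 49%. The sketch below recomputes them from the table rows using Python's standard `statistics` module.

```python
from statistics import mean, median

# "% Improve vs. Ref." values, copied row by row from the three tables.
AREA = [10, 7, 15, 13, 43, 43, 9, 5, 40, 20, 23, 3, 7, 47, 14, 43, 50, 69, 30, 16, 10, 5, 20, 28, 20]
DESIGN_TIME = [1000, 900, 80, 200, 62, 50, 200, 400, 80, 300, 70, 30, 40]
POWER = [88, 50, 66, 51, 4, 10, 75]

for name, values in (("Area", AREA), ("Design time", DESIGN_TIME), ("Power", POWER)):
    # Unweighted mean over designs, rounded to a whole percent as on the slide.
    print(f"{name}: {len(values)} designs, mean {round(mean(values))}%, median {median(values)}%")
```

Because these are plain means, a few large entries (such as the 1000% and 900% design-time results) pull the average up noticeably; the median, printed alongside, is one quick way to gauge how sensitive each summary figure is.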
Like AI, HLS is Experiencing Rapid Advancement
Recent Stratus HLS innovations

Stratus/MATLAB® Integration – Towards Direct Spec Entry
• Automated creation of synthesizable SystemC
• Automated import and Stratus project creation
o Includes design and testbench (TB)
Benefits
• Early PPA visibility
• Productivity force multiplier

Stratus / Cadence Cerebrus Integration – Towards Autogenetic Design
• Allows HLS novices to achieve expert-quality PPA
• Efficiently prunes the solution space
Benefits
• Average ~10% better PPA
• Productivity force multiplier
Summary
• The dynamic AI design environment requires more productivity than hand-written RTL can deliver
• HLS is a proven RTL development solution with greater productivity and superior PPA compared with manual RTL development
• Leading semiconductor companies are employing HLS today
• The future is bright, with continued HLS R&D investments promising even greater productivity
[Figure: design flow from Spec (C++, MATLAB, Python) to RTL via manual RTL development or High-Level Synthesis, then Logic Synthesis to Netlist and Physical Implementation to GDSII, with manual and automated steps marked; an additional LLM → C++ step is shown with a question mark.]
© 2024 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at https://www.cadence.com/go/trademarks are trademarks or registered trademarks of Cadence Design Systems, Inc. Accellera and SystemC are trademarks of Accellera Systems Initiative Inc. All Arm products are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All MIPI specifications are registered trademarks or service marks owned by MIPI Alliance. All PCI-SIG specifications are registered trademarks or trademarks of PCI-SIG. All other trademarks are the property of their respective owners.
