NTHU-CS VLSI/CAD LAB
Speaker: Yi-Wen Hung
2022/12/18

- A Comprehensive Survey on Hardware-Aware Neural Architecture Search
  - Submitted to Proceedings of the IEEE
- Other references are listed in the footnote of each slide

- HW-aware NAS
- Issues & Related Works in HW-NAS
  - Goal
  - NAS
  - HW Cost
- Benchmarks
- Discussions

- Long ASIC/NN design iteration
  - The ASIC-NN design iteration is manual and not a turn-key solution

- Long ASIC/NN design iteration
  - The ASIC-NN design iteration is manual and not a turn-key solution, even when using Neural Architecture Search
Ref: Lin et al., “MCUNet: Tiny Deep Learning on IoT Devices”

- Hardware-aware automatic NN architecture design
  - Goal: find the most accurate NN architecture that satisfies the HW constraints

- HW-aware NAS
- Issues & Related Works in HW-NAS
  - Goal
  - NAS
  - HW Cost
- Benchmarks
- Discussions

- Goals
- Search space
- Search strategy
- Multi-objective optimization
- Non-differentiable HW constraints
- HW cost model
- Others

- Single target: search architectures for one specific HW platform
  - Single configuration: the most accurate model that satisfies the HW constraints
  - Multiple configurations: several models at once, e.g., the most accurate model and the lowest-latency model
- Multiple targets: search architectures for several HW platforms simultaneously
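
The difference between the single-configuration and multiple-configuration goals can be illustrated with a minimal sketch, assuming each candidate already has an accuracy and a latency value. The `Candidate` class, the two helper functions, and all numbers are hypothetical and only serve to contrast the two goals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float    # e.g., top-1 accuracy; higher is better
    latency_ms: float  # measured or estimated latency; lower is better


def single_config(cands: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Single configuration: the most accurate candidate within the latency budget."""
    feasible = [c for c in cands if c.latency_ms <= max_latency_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


def multi_config(cands: List[Candidate]) -> Dict[str, Candidate]:
    """Multiple configurations: one model per objective."""
    return {
        "best_accuracy": max(cands, key=lambda c: c.accuracy),
        "best_latency": min(cands, key=lambda c: c.latency_ms),
    }


# Hypothetical candidates; the numbers are illustrative only.
pool = [
    Candidate("A", 0.76, 12.0),
    Candidate("B", 0.72, 6.5),
    Candidate("C", 0.79, 21.0),
]
print(single_config(pool, max_latency_ms=15.0).name)       # A
print({k: c.name for k, c in multi_config(pool).items()})  # {'best_accuracy': 'C', 'best_latency': 'B'}
```

A multiple-target search would repeat the same selection with one latency value per HW platform.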

- Architecture search space: the set of candidate architectures that contains the solutions
  - Hyperparameter search: #channels, stride, kernel size
Ref: Ma et al., “Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA”
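
A hyperparameter search space of this kind can be sketched as the Cartesian product of per-layer options. This is a minimal sketch under assumed option lists: the values in `CHANNELS`, `STRIDES`, and `KERNELS` are hypothetical, since the slide only names the hyperparameter kinds. It also shows why the space grows exponentially with depth.

```python
import itertools
import random
from typing import List, Tuple

# Hypothetical per-layer options; the slide only names the hyperparameter kinds.
CHANNELS = [16, 32, 64]
STRIDES = [1, 2]
KERNELS = [3, 5, 7]

LayerConfig = Tuple[int, int, int]  # (channels, stride, kernel size)


def layer_choices() -> List[LayerConfig]:
    """Every (channels, stride, kernel) combination for one layer."""
    return list(itertools.product(CHANNELS, STRIDES, KERNELS))


def search_space_size(num_layers: int) -> int:
    # Layers are configured independently, so the space grows exponentially with depth.
    return len(layer_choices()) ** num_layers


def sample_architecture(num_layers: int, rng: random.Random) -> List[LayerConfig]:
    """Uniformly sample one configuration per layer."""
    choices = layer_choices()
    return [rng.choice(choices) for _ in range(num_layers)]


print(len(layer_choices()))   # 18
print(search_space_size(4))   # 104976
print(sample_architecture(3, random.Random(0)))
```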

- Architecture search space: the set of candidate architectures that contains the solutions
  - Whole-architecture search: layer-wise, cell-based, hierarchical
[Figure: layer-wise, cell-based, and hierarchical search spaces]

- Hardware search space: parameter-based or template-based
  - Parameter-based: the search space is formalized as a set of different parameter configurations of the HW design
Ref: Jiang et al., “Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search”

- Hardware search space: parameter-based or template-based
  - Template-based: the search space is defined as a set of pre-configured HW designs
  - Categories: server, mobile, tiny
Ref: Jiang et al., “Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks”

- Goal: sample candidate architectures from the search space
  - Evolutionary algorithms, reinforcement learning, gradient-based methods, Bayesian optimization, random search
- Hybrid: combine different search strategies to speed up the search or to better balance exploration and exploitation
  - E.g., evolutionary search combined with RL, Bayesian optimization, or gradient-based methods
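
As an illustration of the evolutionary strategy, here is a minimal mutation-only evolutionary search with truncation selection (no crossover) that rejects candidates violating a HW budget. It is a sketch, not the method of any surveyed work: `latency` and `fitness` are hypothetical proxies, where a real HW-NAS flow would use a HW cost model and trained-model accuracy.

```python
import random
from typing import List

KERNELS = [3, 5, 7]  # hypothetical per-layer choice
NUM_LAYERS = 6

Arch = List[int]


def latency(arch: Arch) -> int:
    """Hypothetical HW cost proxy: larger kernels cost more."""
    return sum(k * k for k in arch)


def fitness(arch: Arch) -> float:
    """Hypothetical accuracy proxy standing in for (expensive) training and evaluation."""
    smoothness = sum((a - b) ** 2 for a, b in zip(arch, arch[1:]))
    return sum(arch) - 0.1 * smoothness


def mutate(arch: Arch, rng: random.Random, p: float = 0.3) -> Arch:
    """Resample each layer's kernel with probability p."""
    return [rng.choice(KERNELS) if rng.random() < p else k for k in arch]


def evolve(budget: int, generations: int = 20, pop_size: int = 16, seed: int = 0) -> Arch:
    if budget < min(KERNELS) ** 2 * NUM_LAYERS:
        raise ValueError("no architecture fits the latency budget")
    rng = random.Random(seed)
    population: List[Arch] = []
    while len(population) < pop_size:  # feasible random initial population
        arch = [rng.choice(KERNELS) for _ in range(NUM_LAYERS)]
        if latency(arch) <= budget:
            population.append(arch)
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children: List[Arch] = []
        while len(children) < pop_size - len(parents):
            child = mutate(rng.choice(parents), rng)
            if latency(child) <= budget:  # reject children that violate the constraint
                children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve(budget=150)
print(best, latency(best), fitness(best))
```

A hybrid strategy would replace one of these steps, for example proposing children with an RL controller or ranking them with a Bayesian surrogate instead of full evaluation.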

- Single objective
  - Two-stage: find the most accurate model first, then optimize its HW cost
  - Constrained optimization: find the most accurate model under a specific HW constraint
- Multiple objectives
  - Scalarization: optimize a weighted sum of the accuracy and HW metrics
  - NSGA-II: find the Pareto-optimal (non-dominated) solutions, i.e., solutions that no other solution dominates by being at least as good in every objective and strictly better in at least one
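
The two multi-objective formulations can be contrasted in a short sketch with two objectives (maximize accuracy, minimize latency). The weights in `scalarize` and all candidate numbers are hypothetical. `pareto_front` only computes the non-dominated set, which is the first front of NSGA-II's non-dominated sorting; the full NSGA-II also uses crowding distance and genetic operators, which are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, latency_ms): maximize accuracy, minimize latency


def scalarize(acc: float, lat_ms: float, w_acc: float = 1.0, w_lat: float = 0.01) -> float:
    """Weighted-sum objective (higher is better); the weights are hypothetical."""
    return w_acc * acc - w_lat * lat_ms


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, i.e., the first front of NSGA-II's non-dominated sorting."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


cands = [(0.76, 12.0), (0.72, 6.5), (0.79, 21.0), (0.70, 14.0), (0.75, 12.0)]  # made-up numbers
print(pareto_front(cands))                      # [(0.76, 12.0), (0.72, 6.5), (0.79, 21.0)]
print(max(cands, key=lambda p: scalarize(*p)))  # (0.72, 6.5)
```

Scalarization returns a single model whose identity depends on the chosen weights, while the Pareto front keeps every trade-off point.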

- Gumbel-Softmax: approximate the one-hot output of argmax with a temperature-controlled softmax, which is differentiable
- Estimated continuous function
- REINFORCE
Ref: Jang et al., “Categorical Reparameterization with Gumbel-Softmax”
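
The Gumbel-Softmax relaxation can be sketched in plain Python: add Gumbel noise to the logits, divide by the temperature tau, and apply a softmax. This sketch has no autograd, so it only shows the forward computation. The logits are hypothetical op preferences. Reusing the same noise across temperatures shows that lowering tau sharpens the output toward the argmax one-hot without changing which entry wins.

```python
import math
import random
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng: random.Random) -> float:
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return -math.log(-math.log(u))


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = [(z + sample_gumbel(rng)) / tau for z in logits]
    return softmax(noisy)


def hard_one_hot(probs: List[float]) -> List[float]:
    """The non-differentiable argmax one-hot that Gumbel-Softmax approximates."""
    i = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if j == i else 0.0 for j in range(len(probs))]


logits = [math.log(p) for p in (0.7, 0.15, 0.05, 0.1)]  # hypothetical op preferences
for tau in (5.0, 1.0, 0.1):
    y = gumbel_softmax(logits, tau, random.Random(42))  # same noise for each tau
    print(tau, [round(v, 3) for v in y], hard_one_hot(y))
```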

- Gumbel-Softmax
- Estimated continuous function: use a continuous variable as the activation probability of a discrete (non-continuous) choice, as in gated operations
- REINFORCE: use reinforcement learning to learn a policy, and sample choices from that policy
[Figure: example path probabilities 0.7, 0.15, 0.05, 0.1]
Ref: Cai et al., “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”
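
The REINFORCE approach can be sketched for a single layer whose candidate operations are chosen by a softmax policy. The reward only has to be evaluated, never differentiated, which is why this works with non-differentiable HW metrics. `OPS` and `REWARD` are hypothetical; in an HW-NAS setting the reward would combine accuracy with a measured or estimated HW cost.

```python
import math
import random
from typing import List

OPS = ["conv3x3", "conv5x5", "conv7x7", "skip"]  # hypothetical candidate ops
REWARD = {"conv3x3": 0.8, "conv5x5": 0.6, "conv7x7": 0.3, "skip": 0.1}  # hypothetical


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps: int = 2000, lr: float = 0.05, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    theta = [0.0] * len(OPS)  # policy logits (architecture parameters)
    baseline = 0.0            # moving-average reward baseline to reduce variance
    for _ in range(steps):
        probs = softmax(theta)
        i = rng.choices(range(len(OPS)), weights=probs)[0]  # sample a choice from the policy
        reward = REWARD[OPS[i]]  # only evaluated, never differentiated
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log pi(i) w.r.t. theta_j for a softmax policy: 1[j == i] - probs[j]
        for j in range(len(theta)):
            theta[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(theta)


final = reinforce()
print({op: round(p, 3) for op, p in zip(OPS, final)})
```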

- Method: real-time measurement, lookup table (LUT), analytical estimation, prediction model
  - Real-time measurement: the sampled model is executed on the target hardware during the search
Ref: Lin et al., “MCUNet: Tiny Deep Learning on IoT Devices”
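
A real-time measurement loop typically includes untimed warm-up runs followed by repeated timed runs. This is a minimal host-side sketch using the standard library. `toy_model` is a hypothetical stand-in, because running a sampled model on an actual device requires that device's own toolchain.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(run: Callable[[], None], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock latency of `run` in milliseconds."""
    for _ in range(warmup):  # untimed runs: warm caches and lazy initialization
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)  # the median is robust to occasional OS jitter


def toy_model() -> None:
    """Hypothetical stand-in for executing the sampled model on the target device."""
    sum(i * i for i in range(10_000))


print(f"{measure_latency_ms(toy_model):.3f} ms")
```

Because every sampled candidate must be executed, this method is accurate but slow, which motivates the cheaper cost models on the following slides.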

- Method: real-time measurement, lookup table (LUT), analytical estimation, prediction model
  - LUT: a lookup table is built beforehand with the latency of each operator on the target hardware; during the search, the overall cost is computed from the table
Ref: Wu et al., “FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search”
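
A LUT-based cost model can be sketched as a dictionary keyed by operator and input shape, summed over the layers of a candidate. The table entries are hypothetical. `expected_latency` illustrates, in the spirit of FBNet, how a probability-weighted sum over candidate ops yields a latency term that varies smoothly with the sampling probabilities.

```python
from typing import Dict, List, Tuple

LatencyKey = Tuple[str, int, int]  # (operator, input resolution, channels)

# Hypothetical per-operator latencies (ms), profiled once on the target device before the search.
LUT: Dict[LatencyKey, float] = {
    ("conv3x3", 56, 64): 0.42,
    ("conv5x5", 56, 64): 0.95,
    ("skip", 56, 64): 0.0,
    ("conv3x3", 28, 128): 0.38,
    ("conv5x5", 28, 128): 0.81,
    ("skip", 28, 128): 0.0,
}


def lut_latency(arch: List[LatencyKey]) -> float:
    """Overall latency as the sum of per-operator entries (KeyError if an op was never profiled)."""
    return sum(LUT[op] for op in arch)


def expected_latency(layers: List[Dict[LatencyKey, float]]) -> float:
    """Probability-weighted latency when each layer holds a distribution over candidate ops."""
    return sum(p * LUT[op] for layer in layers for op, p in layer.items())


arch = [("conv3x3", 56, 64), ("skip", 56, 64), ("conv5x5", 28, 128)]
print(round(lut_latency(arch), 2))  # 1.23
soft = [{("conv3x3", 56, 64): 0.5, ("conv5x5", 56, 64): 0.5}]
print(round(expected_latency(soft), 3))  # 0.685
```

The additive model assumes that operator latencies are independent, so it ignores inter-layer effects such as operator fusion or cache reuse.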

- Method: real-time measurement, lookup table (LUT), analytical estimation, prediction model
  - Analytical estimation: compute a rough estimate from the processing time, the stall time, and the starting time
Ref: Marchisio et al., “NASCaps: a framework for neural architecture search to optimize the accuracy and hardware efficiency of convolutional capsule networks”
[Figure: excerpt from the NASCaps paper describing its analytical model]
… footprint is computed as the sum of the number of weights for each layer. They are modeled for each operation in a modular way (i.e., bottom-up). First, the weights must be loaded onto the PE array, then reused as long as they need to be multiplied by other inputs. Afterward, the next group of weights is loaded until all the computations of the layers are done (see Eqs. 2-4). The model has been validated by comparing the results with the hardware implementation of the CapsAcc [15] accelerator. The adopted model parameters are the following:
• w_load_cycles: number of clock cycles required to load the weights onto the PE array,
• w_loads: number of groups of weights loaded onto the PE array,
• cycles(l): number of cycles required to execute layer l,
• ma: number of memory accesses,
• en_mem: energy consumption of a single memory access,
• pwr_PEA: power consumption of the PE array.

$$\mathrm{w\_load\_cycles} = 16 \tag{2}$$
$$\mathrm{w\_loads} = \left\lceil \frac{\mathrm{weights}}{16 \cdot \min(16,\ \mathrm{sums\_per\_out})} \right\rceil \tag{3}$$
$$\mathrm{cycles}(l) = \mathrm{w\_load\_cycles} \cdot \mathrm{w\_loads} + \mathrm{data\_per\_weight} \tag{4}$$

The overall latency is then computed as the sum of the contributions of the layers (Eq. 5).

$$\mathrm{latency} = \sum_{l \in L} \mathrm{cycles}(l) \cdot T \tag{5}$$

In Eq. 6, the number of memory accesses is computed by distinguishing whether the operation is a convolutional layer or not. Such a distinction has been implemented by analyzing the value of data_per_weight, which is greater than 1 for convolutional layers …
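
Eqs. (2) to (5) from the NASCaps excerpt can be transcribed into a small cost function. This is a minimal sketch: Eq. (4) is transcribed exactly as it appears in the excerpt, T is assumed to be the clock period (the excerpt does not define it), and the `Layer` values in the example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

W_LOAD_CYCLES = 16  # Eq. (2)


@dataclass
class Layer:
    weights: int          # number of weights in the layer
    sums_per_out: int     # as used in Eq. (3) of the excerpt
    data_per_weight: int  # > 1 for convolutional layers, per the excerpt


def w_loads(layer: Layer) -> int:
    """Eq. (3): number of weight groups loaded onto the PE array."""
    return math.ceil(layer.weights / (16 * min(16, layer.sums_per_out)))


def cycles(layer: Layer) -> int:
    """Eq. (4), transcribed as it appears in the excerpt."""
    return W_LOAD_CYCLES * w_loads(layer) + layer.data_per_weight


def latency(layers: List[Layer], clock_period_s: float) -> float:
    """Eq. (5): sum of per-layer cycles times T (assumed here to be the clock period)."""
    return sum(cycles(layer) for layer in layers) * clock_period_s


net = [
    Layer(weights=4096, sums_per_out=9, data_per_weight=100),  # hypothetical conv layer
    Layer(weights=1024, sums_per_out=32, data_per_weight=1),   # hypothetical non-conv layer
]
print([cycles(layer) for layer in net])        # [564, 65]
print(latency(net, clock_period_s=4e-9))       # total seconds at an assumed 250 MHz clock
```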

- Method: real-time measurement, lookup table (LUT), analytical estimation, prediction model
  - Prediction model: build an ML model that predicts the cost from architecture and dataset features
Ref: Cai et al., “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”
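
The prediction-model approach can be sketched with the simplest possible predictor: ordinary least squares from one architecture feature (MACs) to measured latency. The profiling data are hypothetical. The predictors described on the slide use richer architecture and dataset features and more expressive models, so this only shows the fit-once, query-cheaply workflow.

```python
from typing import List, Tuple


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical profiling data: MACs (millions) of sampled models vs. latency (ms) measured on the device.
macs_m = [100.0, 200.0, 300.0, 400.0]
latency_ms = [3.5, 5.5, 7.5, 9.5]

slope, intercept = fit_linear(macs_m, latency_ms)


def predict_latency(macs: float) -> float:
    """Cheap cost query used during the search instead of on-device measurement."""
    return slope * macs + intercept


print(round(slope, 4), round(intercept, 4), round(predict_latency(250.0), 4))  # 0.02 1.5 6.5
```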

- Method: real-time measurement, lookup table (LUT), analytical estimation, prediction model
- Metrics: FLOPs & #parameters, latency, energy consumption, area, memory footprint
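
Two of these metrics, #parameters and FLOPs, can be computed analytically for a standard (non-grouped) 2-D convolution, as sketched below. The helper is hypothetical. Note that FLOPs are often reported as 2 × MACs (one multiply plus one add), and conventions differ between papers.

```python
from typing import Tuple


def conv2d_cost(h_in: int, w_in: int, c_in: int, c_out: int, k: int,
                stride: int = 1, padding: int = 0, bias: bool = True) -> Tuple[int, int]:
    """Return (#parameters, MACs) of a standard (non-grouped) 2-D convolution."""
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    params = c_out * c_in * k * k + (c_out if bias else 0)
    macs = h_out * w_out * c_out * c_in * k * k  # one multiply-accumulate per weight per output pixel
    return params, macs


params, macs = conv2d_cost(56, 56, 64, 64, 3, padding=1, bias=False)
print(params, macs, 2 * macs)    # 36864 115605504 231211008
print(params * 1, "bytes (int8 weights),", params * 4, "bytes (fp32 weights)")
```

The weight memory footprint follows directly from #parameters times the bytes per parameter. Latency and energy, by contrast, depend on the target hardware and need one of the cost models above.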

- Speed-up: early stopping, hot start (warm-up), proxy datasets, accuracy prediction
- Quantization & pruning: automatic mixed precision, automatic pruning
- Security & reliability: adversarial attacks

- HW-aware NAS
- Issues & Related Works in HW-NAS
  - Goal
  - NAS
  - HW Cost
- Benchmarks
- Discussions

- Lack of reproducibility
  - Because works use different search spaces and training methods and require significant computational resources, their results are difficult to reproduce.
- Benchmarks for NAS: NAS-Bench-101, NAS-Bench-201, NATS-Bench, NAS-Bench-1shot1, NAS-Bench-NLP, NAS-Bench-301
- Benchmarks for HW-NAS: HW-NAS-Bench

- HW-aware NAS
- Issues & Related Works in HW-NAS
  - Goal
  - NAS
  - HW Cost
- Benchmarks
- Discussions: HW-NAS Applications

- An automatic model refinement tool for model-accelerator integration
  - Given a pretrained model and a target HW, find a refined model that satisfies the constraints on the target HW
- An HW-SW co-op model deployment tool for FPGA
  - Given a pretrained model and a target FPGA, find a refined model and a target accelerator HDL that satisfy the constraints on the target FPGA
  - Similar to MCUNet, but with a wider integration scope
More Related Content

Similar to Survey on HW-aware NAS

A Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing CenterA Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing CenterJames McGalliard
 
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudTime to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudAmazon Web Services
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...IJSTA
 
Survey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSurvey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSahil Kaw
 
Feature model based commonality and variability analysis for virtual cluster ...
Feature model based commonality and variability analysis for virtual cluster ...Feature model based commonality and variability analysis for virtual cluster ...
Feature model based commonality and variability analysis for virtual cluster ...csandit
 
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...cscpconf
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELJenny Liu
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesJames Johnson
 
IEEE 2015 Java Projects
IEEE 2015 Java ProjectsIEEE 2015 Java Projects
IEEE 2015 Java ProjectsVijay Karan
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsSasha Lazarevic
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...IJECEIAES
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland mictc
 
Cisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackCisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
 

Similar to Survey on HW-aware NAS (20)

E035425030
E035425030E035425030
E035425030
 
A Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing CenterA Queue Simulation Tool for a High Performance Scientific Computing Center
A Queue Simulation Tool for a High Performance Scientific Computing Center
 
Nimrod cloud
Nimrod cloudNimrod cloud
Nimrod cloud
 
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the CloudTime to Science, Time to Results. Accelerating Scientific research in the Cloud
Time to Science, Time to Results. Accelerating Scientific research in the Cloud
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
 
A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Survey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning AlgorithmSurvey_Report_Deep Learning Algorithm
Survey_Report_Deep Learning Algorithm
 
Feature model based commonality and variability analysis for virtual cluster ...
Feature model based commonality and variability analysis for virtual cluster ...Feature model based commonality and variability analysis for virtual cluster ...
Feature model based commonality and variability analysis for virtual cluster ...
 
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...
FEATURE-MODEL-BASED COMMONALITY AND VARIABILITY ANALYSIS FOR VIRTUAL CLUSTER ...
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
 
IEEE 2015 Java Projects
IEEE 2015 Java ProjectsIEEE 2015 Java Projects
IEEE 2015 Java Projects
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutions
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland
 
Cisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackCisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStack
 
Cisco project ideas
Cisco   project ideasCisco   project ideas
Cisco project ideas
 
Sigcomm16 sdn-nvf-topics-preview
Sigcomm16 sdn-nvf-topics-previewSigcomm16 sdn-nvf-topics-preview
Sigcomm16 sdn-nvf-topics-preview
 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 

Recently uploaded (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 

Survey on HW-aware NAS

  • 2. NTHU-CS VLSI/CAD LAB  A Comprehensive Survey on Hardware- Aware Neural Architecture Search  Submitted to Proceedings of IEEE  Listed in the footnote of each slides 2
  • 3. NTHU-CS VLSI/CAD LAB  HW-aware NAS  Issues & Related Works in HW-NAS  Goal  NAS  HW Cost  Benchmarks  Discussions 3
  • 4. NTHU-CS VLSI/CAD LAB  Long ASIC/NN design iteration  The ASIC-NN design iteration is manual and not a turn-key solution 4
  • 5. NTHU-CS VLSI/CAD LAB  Long ASIC/NN design iteration  The ASIC-NN design iteration is manual and not a turn-key solution, even using Network Architecture Search 5 Ref: Lin et. al, “MCUNet: Tiny Deep Learning on IoT Devices”
  • 6. NTHU-CS VLSI/CAD LAB  Hardware-aware auto NN architecture design  Goal: Find the best accuracy NN arch. w/ HW constraint 6
  • 7. NTHU-CS VLSI/CAD LAB  HW-aware NAS  Issues & Related Works in HW-NAS  Goal  NAS  HW Cost  Benchmarks  Discussions 7
  • 8. NTHU-CS VLSI/CAD LAB  Goals  Search space  Search strategy  Multi-Objective  Non-differentiable HW constraints  HW cost model  Others 8
  • 9. NTHU-CS VLSI/CAD LAB  Single Target Search architectures for a single specific HW  Single Config: best accuracy w/ HW constraints  E. g., single configuration goal get a best accuracy with HW constraints  Multiple Config: best accuracy, best latency  E.g., multiple configuration goal get the best accuracy model, and the best latency model  Multiple Targets Search architectures consider multiple HWs simultaneously 9
  • 10. NTHU-CS VLSI/CAD LAB  Architecture search space Architecture search space is a set of architectures contains the solutions  Hyperparameter search  #channel, stride, kernel size 10 Ref: Ma et. al, “Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA”
  • 11. NTHU-CS VLSI/CAD LAB  Architecture search space Architecture search space is a set of architectures contains the solutions  Whole architecture  Layer-wise, Cell-based, Hierarchical 11 Layer-wise Cell-based Hierarchical
  • 12. NTHU-CS VLSI/CAD LAB  Hardware search space  Parameter-based, Template-based The search space is formalized by a set of different parameter configuration to fit the HW design 12 Ref: Jiang et. al, “Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search”
  • 13. NTHU-CS VLSI/CAD LAB  Hardware search space  Parameter-based, Template-based The search space is defined as a set of pre-configured HW designs  Categories: server, mobile, tiny 13 Ref: Jiang et. al, “Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks”
  • 14. NTHU-CS VLSI/CAD LAB  Goal: sample architecture candidates from a search space  Evolutionary, Reinforcement, Gradient-based, Bayesian Optimization, Random Search  Hybrid Hybrid different search strategy for speed up or better exploration/exploitation  Evolutionary w/ RL or Bayesian or Gradient- based 14
  • 15. NTHU-CS VLSI/CAD LAB  Single  Two-stage Find the best accuracy model, then optimize HW cost  Constrained Optimization Find the best accuracy model under a specific constraint  Multiple  Scalarization Optimize the model with weighted sum of the accuracy & HW metrics  NSGA-II Find solutions that are better than all previous solutions in terms of all objectives 15
  • 16. NTHU-CS VLSI/CAD LAB  Gumbel Softmax Use softmax & temperature, which are differentiable, to approximate one-hot with argmax  Estimated Continuous Function  REINFORCE 16 Ref: Jang et. al, “Categorical Reparameterization with Gumbel-Softmax” ≈
  • 17. NTHU-CS VLSI/CAD LAB  Gumbel Softmax  Estimated Continuous Function Use continuous variable as the activation probability of non-continuous constraints, used in gated operation  REINFORCE Use reinforcement learning to learn the policy, and sample the choice from the poicy 17 0.7 0.15 0.05 0.1 Ref: Cai et. al, “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”
  • 18. NTHU-CS VLSI/CAD LAB  Method  Real-time measure, LUT, Analytical estimation, Prediction model The sampled model is executed on the hardware target while searching 18 Ref: Lin et. al, “MCUNet: Tiny Deep Learning on IoT Devices”
  • 19. NTHU-CS VLSI/CAD LAB  Method  Real-time measure, LUT, Analytical estimation, Prediction model A lookup table is created beforehand and filled with each operator latency on the targeted hardware. Once the search starts, the system will calculate the overall cost from the lookup table 19 Ref: Wu et. al, “FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search”
  • 20.  Method  Real-time measure, LUT, Analytical estimation, Prediction model Compute a rough estimate using the processing time, the stall time, and the starting time. Ref: Marchisio et al., “NASCaps: a framework for neural architecture search to optimize the accuracy and hardware efficiency of convolutional capsule networks”
Excerpt from the paper: The footprint is computed as the sum of the number of weights for each layer. They are modeled for each operation in a modular way (i.e., bottom-up). First, the weights must be loaded onto the PE array, then reused as long as they need to be multiplied by other inputs. Afterward, the next group of weights is loaded until all the computations of the layers are done (see Eqs. 2-4). The model has been validated by comparing the results with the hardware implementation of the CapsAcc [15] accelerator. The adopted model parameters are the following:
    • w_load_cycles: number of clock cycles required to load the weights onto the PE array,
    • w_loads: number of groups of weights loaded onto the PE array,
    • cycles(l): number of cycles required to execute layer l,
    • ma: number of memory accesses,
    • en_mem: energy consumption of a single memory access,
    • pwr_PEA: power consumption of the PE array.
$$ \mathrm{w\_load\_cycles} = 16 \qquad (2) $$
$$ \mathrm{w\_loads} = \left\lceil \frac{\mathrm{weights}}{16 \cdot \min(16,\ \mathrm{sums\_per\_out})} \right\rceil \qquad (3) $$
$$ \mathrm{cycles}(l) = \mathrm{w\_load\_cycles} \cdot \mathrm{w\_loads} + \mathrm{data\_per\_weight} \qquad (4) $$
The overall latency is then computed as the sum of the contributions of the layers (Eq. 5).
$$ \mathrm{latency} = \sum_{l \in L} \mathrm{cycles}(l) \cdot T \qquad (5) $$
In Eq. 6, the number of memory accesses is computed by distinguishing whether the operation is a convolutional layer or not. Such a distinction has been implemented by analyzing the value of data_per_weight, which is greater than 1 for convolutional layers.
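Eqs. 2-5 translate directly into a small analytical latency estimator. This is a sketch under stated assumptions, not the NASCaps code: each layer is described by the hypothetical tuple (weights, sums_per_out, data_per_weight), T is the clock period in whatever time unit the caller chooses, and the two example layers are made-up sizes.

```python
import math

W_LOAD_CYCLES = 16  # Eq. (2): cycles to load one group of weights onto the PE array


def w_loads(weights, sums_per_out):
    """Eq. (3): number of weight groups loaded onto the PE array."""
    return math.ceil(weights / (16 * min(16, sums_per_out)))


def layer_cycles(weights, sums_per_out, data_per_weight):
    """Eq. (4): cycles needed to execute one layer."""
    return W_LOAD_CYCLES * w_loads(weights, sums_per_out) + data_per_weight


def latency(layers, clock_period):
    """Eq. (5): sum the per-layer cycles and multiply by the clock period T."""
    return sum(layer_cycles(*layer) for layer in layers) * clock_period


# Hypothetical layers: (weights, sums_per_out, data_per_weight)
layers = [(2304, 9, 196), (512, 32, 1)]
print(latency(layers, clock_period=10))  # 4850 (time units of T, e.g. ns)
```

Unlike the LUT, nothing here is measured: the estimate comes from a closed-form model of the accelerator, so it is cheap and hardware-specific but only as accurate as the model's assumptions.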
  • 21.  Method  Real-time measure, LUT, Analytical estimation, Prediction model Build an ML model that predicts the cost from architecture and dataset features. Ref: Cai et al., “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”
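A prediction model can be as simple as a regressor fitted on a few measured architectures. The sketch below swaps in plain linear least squares for illustration; it is not ProxylessNAS's predictor, and the two features (MFLOPs, number of layers) and the training points are synthetic assumptions.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a bias column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for k in range(n):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][n] for i in range(n)]  # [bias, w_1, ..., w_d]


def predict(w, x):
    """Predicted cost (e.g. latency in ms) for one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Synthetic "measured" architectures: (MFLOPs, #layers) -> latency (ms)
X = [(100, 10), (200, 12), (300, 20), (150, 8), (250, 15)]
y = [3.5, 4.9, 7.5, 3.6, 6.0]
w = fit_linear(X, y)
print(round(predict(w, (220, 14)), 3))  # 5.5
```

Once fitted, the predictor costs an unseen candidate without running it on the hardware; in practice a richer model (e.g. a small MLP or gradient-boosted trees) and more measurements would be used.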
  • 22.  Method  Real-time measure, LUT, Analytical estimation, Prediction model  Metrics  FLOPs & #Parameters, Latency, Energy Consumption, Area, Memory Footprint
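The hardware-agnostic metrics (FLOPs, #Parameters, weight memory footprint) can be computed analytically per layer. A minimal sketch for a standard 2D convolution, assuming square kernels, FLOPs counted as 2 per multiply-accumulate (some papers report MACs instead), and bias additions excluded from FLOPs:

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Return (#parameters, FLOPs) of a k x k convolution layer."""
    weights = c_out * c_in * k * k
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out  # one MAC per weight per output pixel
    flops = 2 * macs                # one multiply + one add per MAC
    return params, flops


def weight_footprint_bytes(params, bits_per_param=8):
    """Memory needed to store the weights at a given precision."""
    return params * bits_per_param // 8


params, flops = conv2d_cost(c_in=3, c_out=16, k=3, h_out=32, w_out=32)
print(params, flops)                        # 448 884736
print(weight_footprint_bytes(params, 32))   # 1792 (fp32)
```

These proxies are cheap but device-agnostic; latency, energy, and area still need one of the methods above (measurement, LUT, analytical model, or predictor).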
  • 23.  Speed up  Early Stop, Hot Start (warm up), Proxy Datasets, Accuracy Prediction  Quantization & Pruning  Auto mixed precision, Auto pruning  Security & Reliability  Adversarial attack
  • 24.  HW-aware NAS  Issues & Related Works in HW-NAS  Goal  NAS  HW Cost  Benchmarks  Discussions
  • 25.  Lack of reproducibility Because works use different search spaces and training methods, and because the search requires significant computational resources, results are difficult to reproduce.  For NAS  NAS-Bench-101, NAS-Bench-201, NATS-Bench, NAS-Bench-1shot1, NAS-Bench-NLP, NAS-Bench-301  For HWNAS  HW-NAS-Bench
  • 26.  HW-aware NAS  Issues & Related Works in HW-NAS  Goal  NAS  HW Cost  Benchmarks  Discussions - HWNAS Applications
  • 27.  An auto model refinement tool for model-accelerator integration Given a pretrained model & a target HW, find a refined model that satisfies the constraints on the target HW  An HW-SW co-op model deployment tool for FPGA Given a pretrained model & a target FPGA, find a refined model & a target accelerator HDL that satisfy the constraints on the target FPGA  Similar to MCUNet, but with a wider integration scope
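The final selection step of such a refinement tool, keeping the most accurate candidate that fits the target hardware, can be sketched as follows. This is an assumption-laden illustration: the `Candidate` fields, the budgets, and the three refined models are hypothetical, and generating the candidates (pruning, quantization, or NAS) is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float    # validation accuracy of the refined model
    latency_ms: float  # measured or estimated on the target HW
    memory_kb: float   # memory footprint on the target HW


def select_refined_model(candidates: List[Candidate], max_latency_ms: float,
                         max_memory_kb: float) -> Optional[Candidate]:
    """Return the most accurate candidate meeting every HW constraint, or None."""
    feasible = [c for c in candidates
                if c.latency_ms <= max_latency_ms and c.memory_kb <= max_memory_kb]
    return max(feasible, key=lambda c: c.accuracy, default=None)


candidates = [
    Candidate("A", 0.76, 12.0, 900.0),
    Candidate("B", 0.74, 8.5, 600.0),
    Candidate("C", 0.71, 5.0, 400.0),
]
best = select_refined_model(candidates, max_latency_ms=10.0, max_memory_kb=700.0)
print(best.name)  # B
```

Returning `None` when no candidate is feasible makes it explicit that the constraints cannot be met and the refinement has to be pushed further.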

Editor's Notes

  1. Pros & cons
  2. Pros & cons
  3. Pros & cons
  4. Pros & cons
  5. Pros & cons
  6. Pros & cons
  7. Pros & cons
  8. ECF: updated with gradient methods Pros & cons
  9. CodeGen is a part of the issue Pros & cons
  10. CodeGen is a part of the issue Pros & cons
  11. CodeGen is a part of the issue Pros & cons
  12. CodeGen is a part of the issue Pros & cons
  13. CodeGen is a part of the issue Pros & cons
  14. Pros & cons