NTHU-CS VLSI/CAD LAB
Speaker: Yi-Wen Hung
2022/12/18
• Learning to Optimize Tensor Programs
  - Tianqi Chen et al.
  - NeurIPS 2018
• TVM: End-to-End Compilation Stack for Deep Learning
  - Tianqi Chen et al.
  - MLSys (previously SysML) 2018
• Formalize the problem of learning to optimize tensor programs and summarize its key characteristics.
• Propose a machine learning framework to solve the problem.
• Accelerate the optimization by 2x to 10x using transfer learning.
• Introduction to TVM
• Auto Tensor Optimization & Objective
• autoTVM
  - Overview
  - Statistical Cost Model
  - Training Objective Function
  - Exploration Module
  - Acceleration by Transfer Learning
• Experimental Results
• Extensions: NAS w/ autoTVM
Introduction to TVM
Ref: https://sampl.cs.washington.edu/projects/tvm.html
• Deploy deep learning workloads from high-level frameworks to diverse hardware back-ends (CPU, GPU, FPGA)
• Introduce schedule primitives to take advantage of cross-thread memory reuse, novel hardware intrinsics, and latency hiding
• Evaluate TVM on a generic FPGA-based accelerator to demonstrate targeting specialized accelerators
Ref: https://tvm.apache.org/docs/vta/index.html
Auto Tensor Optimization & Objective
• Tensor optimization is complicated
  - There are many candidate implementations to choose from, owing to differences in threading, memory reuse, pipelining, and other hardware factors.
• Hardware-framework co-optimization is a challenge
  - Even on currently supported hardware, developing DL frameworks and models is limited by the set of optimized operators in libraries, which prevents optimizations (such as operator fusion) that would produce unsupported operators.
• An automatic tensor optimization method is needed
• Given a program IR $e$, the set of possible schedules of $e$, $S_e$, a code generator $g$, and the real hardware cost function $f$:
  - Find $\arg\min_{s \in S_e} f(g(e, s))$
  - i.e., find the schedule $s$ of program IR $e$ that minimizes the hardware cost under the given code generator $g$ and real hardware environment $f$
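As a toy illustration of the objective, the sketch below brute-forces $\arg\min_{s \in S_e}$ over a tiny, hypothetical schedule space (pairs of tile factors for a 64x64 loop nest). `toy_hardware_cost` is a made-up stand-in for $f(g(e, s))$; a real $f$ would compile and time the program on the device. Exhaustive search is shown only to make the objective concrete; realistic schedule spaces are far too large for it.

```python
import itertools
from typing import Callable, Iterable, Tuple

# Hypothetical schedule: a pair of tile factors (tile_i, tile_j) for a 64x64 loop nest.
Schedule = Tuple[int, int]


def schedule_space(n: int = 64) -> Iterable[Schedule]:
    """Enumerate a toy S_e: every pair of tile factors that divides the loop extent n."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return itertools.product(divisors, divisors)


def toy_hardware_cost(s: Schedule) -> float:
    """Made-up stand-in for f(g(e, s)) that happens to prefer 16x8 tiles.
    A real f would compile g(e, s) and time it on the target device."""
    ti, tj = s
    return (ti - 16) ** 2 + (tj - 8) ** 2 + 1.0


def best_schedule(space: Iterable[Schedule],
                  cost: Callable[[Schedule], float]) -> Schedule:
    """argmin over s in S_e of cost(s), by exhaustive search."""
    return min(space, key=cost)


print(best_schedule(schedule_space(), toy_hardware_cost))  # (16, 8)
```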
• Low experiment cost
  - A single HPO trial can cost hours or days
  - Running a tensor program costs only a few seconds
• Domain-specific problem structure
  - HPO treats the problem as a black box
  - This work treats the problem as a white box
• Large quantity of similar operators
  - Tensor operators are similar to one another, so transfer learning is possible
autoTVM
• Train a cost model $\hat{f}(x)$ on a database $\mathcal{D} = \{(e_i, s_i, c_i)\}$, where $c_i = f(x_i)$ and $x_i = g(e_i, s_i)$
  - i.e., the database records the measured hardware cost of each program IR under a given schedule
• Two approaches to encoding the AST $x$
  - Gradient boosted trees (GBTs) w/ XGBoost
  - TreeGRU
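For the GBT path, the AST has to be turned into a fixed-length feature vector first. A minimal sketch, assuming a hypothetical simplified loop-nest record (`Loop` with trip count, annotation, and touched bytes) and an assumed `MAX_DEPTH`; the features autoTVM actually extracts are richer and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence

ANNOTATIONS = ("none", "unroll", "vectorize", "parallel")
MAX_DEPTH = 6  # assumed maximum loop depth; shallower nests are zero-padded


@dataclass
class Loop:
    """Hypothetical, simplified view of one loop in the lowered AST x = g(e, s)."""
    extent: int         # trip count
    annotation: str     # one of ANNOTATIONS
    touched_bytes: int  # bytes accessed inside the loop body


def loop_features(loop: Loop) -> List[float]:
    """Per-loop features: trip count, touched bytes, and a one-hot annotation."""
    one_hot = [1.0 if loop.annotation == a else 0.0 for a in ANNOTATIONS]
    return [float(loop.extent), float(loop.touched_bytes)] + one_hot


def encode(nest: Sequence[Loop]) -> List[float]:
    """Concatenate per-loop features (outermost first) and zero-pad to MAX_DEPTH,
    giving the fixed-length tabular input a gradient boosted tree expects."""
    if len(nest) > MAX_DEPTH:
        raise ValueError("loop nest is deeper than MAX_DEPTH")
    width = 2 + len(ANNOTATIONS)
    feats: List[float] = []
    for loop in nest:
        feats.extend(loop_features(loop))
    feats.extend([0.0] * (width * (MAX_DEPTH - len(nest))))
    return feats


x = encode([Loop(64, "parallel", 4096), Loop(16, "none", 256), Loop(4, "vectorize", 16)])
```

Because slot k always means "the k-th loop of this nest", vectors from operators with different loop structures are not directly comparable, which is one reason the transfer-learning slide looks for a shared embedding space.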
• Use a rank loss to train the cost model $\hat{f}(x)$ on the database $\mathcal{D} = \{(e_i, s_i, c_i)\}$:
  $$\sum_{i,j} \log\left(1 + e^{-\operatorname{sign}(c_i - c_j)\left(\hat{f}(x_i) - \hat{f}(x_j)\right)}\right)$$
• Why not an $\ell_2$ loss?
  - Only the relative order of program runtimes matters, not their absolute values
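A direct transcription of the rank loss above in plain Python, as a sketch: `costs` holds the measured $c_i$ and `preds` the cost-model outputs $\hat{f}(x_i)$. The usage lines check the point on the slide: rescaling the measured costs leaves the loss unchanged, because only $\operatorname{sign}(c_i - c_j)$ enters it.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def rank_loss(costs: Sequence[float], preds: Sequence[float]) -> float:
    """Sum over i != j of log(1 + exp(-sign(c_i - c_j) * (f_hat(x_i) - f_hat(x_j)))).

    costs: measured costs c_i; preds: cost-model outputs f_hat(x_i).
    The i == j terms are skipped; each would only add the constant log 2.
    """
    total = 0.0
    for i in range(len(costs)):
        for j in range(len(costs)):
            if i != j:
                total += softplus(-sign(costs[i] - costs[j]) * (preds[i] - preds[j]))
    return total


costs = [3.0, 1.0, 2.0]
good = rank_loss(costs, [0.9, 0.1, 0.5])                  # same order as measured costs
bad = rank_loss(costs, [0.1, 0.9, 0.5])                   # reversed order
same = rank_loss([300.0, 100.0, 200.0], [0.9, 0.1, 0.5])  # costs rescaled by 100
```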
• Given: the set of possible schedules of $e$, $S_e$
• Find: $s^* \in S_e$ that minimizes $f(g(e, s^*))$
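• Step 1: pick the next promising batch
  - Naïve approach: enumerating all $s \in S_e$ to find the top-$b$ candidates is infeasible
  - Instead, run parallel simulated annealing (SA) with $\hat{f}(g(e, s))$ as the energy function to find the top-$b$ candidates $S$
    - Objective: perturb $s$ to minimize $\hat{f}(g(e, s))$
  - Apply $\epsilon$-greedy to sample the final batch of $b$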
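Step 1 can be sketched as follows, under several assumptions: a configuration is a tuple of knob indices, the cost model is any Python callable standing in for $\hat{f}(g(e, s))$, the SA chains run one after another rather than in parallel, the cooling schedule is linear, and the $\epsilon$-greedy step fills about $\epsilon b$ slots with uniformly random unmeasured configurations. None of these names or settings are taken from autoTVM itself.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

Config = Tuple[int, ...]               # hypothetical: one choice index per tuning knob
CostModel = Callable[[Config], float]  # stands in for f_hat(g(e, s)); lower is better


def random_config(knob_sizes: Sequence[int], rng: random.Random) -> Config:
    return tuple(rng.randrange(n) for n in knob_sizes)


def neighbor(cfg: Config, knob_sizes: Sequence[int], rng: random.Random) -> Config:
    """Perturb s by re-drawing the value of one randomly chosen knob."""
    k = rng.randrange(len(cfg))
    new = list(cfg)
    new[k] = rng.randrange(knob_sizes[k])
    return tuple(new)


def annealing_candidates(model: CostModel, knob_sizes: Sequence[int], rng: random.Random,
                         n_chains: int = 8, n_steps: int = 200,
                         temp0: float = 1.0) -> Set[Config]:
    """Independent SA chains using the cost model as energy (run sequentially here;
    they could run in parallel). Returns every configuration visited."""
    visited: Set[Config] = set()
    for _ in range(n_chains):
        cur = random_config(knob_sizes, rng)
        cur_e = model(cur)
        visited.add(cur)
        for step in range(n_steps):
            temp = temp0 * (1.0 - step / n_steps) + 1e-9  # linear cooling schedule
            cand = neighbor(cur, knob_sizes, rng)
            cand_e = model(cand)
            visited.add(cand)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / temp):
                cur, cur_e = cand, cand_e
    return visited


def select_batch(model: CostModel, knob_sizes: Sequence[int], b: int, epsilon: float,
                 measured: Set[Config], rng: random.Random) -> List[Config]:
    """Pick b unmeasured configs: about (1 - epsilon) * b of the best predicted ones,
    the rest uniformly at random for exploration."""
    ranked = sorted((c for c in annealing_candidates(model, knob_sizes, rng)
                     if c not in measured), key=model)
    batch = ranked[:round((1.0 - epsilon) * b)]
    chosen = set(batch) | measured
    attempts = 0
    while len(batch) < b and attempts < 1000 * b:  # guard against a nearly exhausted space
        attempts += 1
        c = random_config(knob_sizes, rng)
        if c not in chosen:
            batch.append(c)
            chosen.add(c)
    return batch


knobs = [8, 8, 4]  # toy space with 256 configurations
toy_model = lambda c: (c[0] - 5) ** 2 + (c[1] - 2) ** 2 + c[2]
batch = select_batch(toy_model, knobs, b=8, epsilon=0.25, measured=set(), rng=random.Random(0))
```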
• Step 2: run measurements in the hardware environment
  - Run $g(e, s)$ for each $s \in S$ to obtain $c_s = f(g(e, s))$
  - Append each record $\mathcal{D}_s = (e, s, c_s)$ to $\mathcal{D}$
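A minimal sketch of Step 2, assuming the "hardware environment" is simply the local Python process: `build` is a hypothetical stand-in for $g(e, s)$ that returns a callable, and each cost is the median of a few `time.perf_counter` timings (the median damps one-off outliers). Real autoTVM measurements compile for and run on the target device, which this does not model.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence


@dataclass(frozen=True)
class Record:
    e: str         # program IR identifier (hypothetical: just a name here)
    s: Hashable    # the schedule that was measured
    c: float       # measured cost c_s, in seconds


def measure(kernel: Callable[[], object], repeat: int = 5) -> float:
    """Time kernel() `repeat` times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        kernel()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def run_batch(e: str, batch: Sequence[Hashable],
              build: Callable[[str, Hashable], Callable[[], object]],
              db: List[Record], repeat: int = 5) -> None:
    """Step 2: for each s in the batch, build g(e, s), measure c_s, append (e, s, c_s) to D."""
    for s in batch:
        kernel = build(e, s)  # hypothetical stand-in for code generation g(e, s)
        db.append(Record(e, s, measure(kernel, repeat)))


# Toy "code generator": the schedule s just scales the amount of work.
toy_build = lambda e, s: (lambda: sum(range(s * 10_000)))
database: List[Record] = []
run_batch("toy_sum", [1, 2, 4], toy_build, database)
```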
• Step 3: update the cost model
  - Update the cost model $c = \hat{f}(g(e, s))$ with $\mathcal{D} = \{(e, s, c)\}$
    - Gradient boosted trees (GBTs) w/ XGBoost
    - TreeGRU
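To keep Step 3 dependency-free, the sketch below swaps the GBT/TreeGRU models for a plain linear scorer $\hat{f}(x) = w \cdot x$ and refits it on the whole database by full-batch gradient descent on the rank loss from the training-objective slide. The linear model, learning rate, and epoch count are assumptions chosen for illustration, not what autoTVM uses.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (the derivative of softplus)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear stand-in for f_hat(x): a higher score means a higher predicted cost."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ranked_pairs(costs: Sequence[float]) -> List[Tuple[int, int, int]]:
    """Ordered pairs (i, j) with different measured costs, plus sign(c_i - c_j)."""
    n = len(costs)
    return [(i, j, (costs[i] > costs[j]) - (costs[i] < costs[j]))
            for i in range(n) for j in range(n) if costs[i] != costs[j]]


def mean_rank_loss(w: Sequence[float], X: Sequence[Sequence[float]],
                   costs: Sequence[float]) -> float:
    pairs = ranked_pairs(costs)
    return sum(softplus(-sgn * (score(w, X[i]) - score(w, X[j])))
               for i, j, sgn in pairs) / len(pairs)


def update_cost_model(w: Sequence[float], X: Sequence[Sequence[float]],
                      costs: Sequence[float], lr: float = 0.1, epochs: int = 300) -> Vector:
    """Refit the model on all of D with full-batch gradient descent on the mean rank loss."""
    pairs = ranked_pairs(costs)
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for i, j, sgn in pairs:
            margin = -sgn * (score(w, X[i]) - score(w, X[j]))
            coef = sigmoid(margin) * -sgn  # d softplus(margin) / d (score_i - score_j)
            for k in range(len(w)):
                grad[k] += coef * (X[i][k] - X[j][k]) / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # toy feature vectors x_i
costs = [2.0, 1.0, 4.0, 3.0]                          # measured costs c_i
w = update_cost_model([0.0, 0.0], X, costs)
```

Since this loss is convex in $w$ and the step size is small relative to this data, each epoch lowers the loss; passing the previous $w$ back in is a simple way to warm-start each update instead of retraining from scratch.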
• In real-world use, $\mathcal{D}$ can come from previous workloads, so $\hat{f}$ can be trained on a historical database $\mathcal{D}'$
  - This is possible because $\hat{f}$ predicts the cost $c$ from an embedding of the AST produced by the code generator $g$
  - Goal: encode different ASTs into the same embedding space
    - Gradient boosted trees w/ XGBoost
    - TreeGRU
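One simple way to put ASTs of different operators into the same space is to pool per-loop features instead of concatenating them by position. The sketch below sum- and max-pools a hypothetical per-loop record (trip count, annotation), so a 3-deep matmul nest and a 7-deep conv2d nest both become 10-dimensional vectors. This pooling scheme is an assumption for illustration, not the paper's exact encoding.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-loop description extracted from x = g(e, s): (extent, annotation).
Loop = Tuple[int, str]
ANNOTATIONS = ("none", "unroll", "vectorize", "parallel")


def loop_vec(loop: Loop) -> List[float]:
    """Per-loop features: log2 of the trip count plus a one-hot annotation."""
    extent, ann = loop
    return [math.log2(extent)] + [1.0 if ann == a else 0.0 for a in ANNOTATIONS]


def embed(nest: Sequence[Loop]) -> List[float]:
    """Sum- and max-pool per-loop features into a fixed-length vector, so loop nests
    of any depth (from any operator) land in the same embedding space."""
    vecs = [loop_vec(lp) for lp in nest]
    summed = [sum(col) for col in zip(*vecs)]
    maxed = [max(col) for col in zip(*vecs)]
    return summed + maxed


matmul = [(64, "parallel"), (64, "none"), (64, "vectorize")]
conv2d = [(1, "none"), (64, "parallel"), (56, "none"), (56, "none"),
          (64, "none"), (3, "unroll"), (3, "unroll")]
```

A cost model fit on pooled vectors from historical workloads $\mathcal{D}'$ can then score candidates of a new operator directly, since their vectors share the same dimensions and meaning.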
Experimental Results
• autoTVM tutorial
Extensions: NAS w/ autoTVM
• Train a function composition $f \circ g(e, s)$ to predict hardware cost during NAS
  - $e$ is known once an operation is selected
  - $s$ can be defined as hyperparameters of a supernet
• Characteristic of variation
  - Spatial locality of bounded PEs
  - Could possibly be modeled with $f \circ g(e, s)$
• Characteristic of sparsity
  - Three-level sub-problems: model, DRAM access, accelerator
  - Could possibly be modeled with $f \circ g(e, s)$
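To show where a learned $f \circ g(e, s)$ could plug into NAS, the sketch adds an expected-latency penalty to the search objective: each layer's candidate ops are weighted by softmax architecture probabilities, as in differentiable latency-aware NAS methods. That technique, the toy predictor, and the weight `lam` are assumptions; the slides do not specify how the predictor would be used.

```python
import math
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, Tuple[int, ...]]              # hypothetical (operator e, schedule s)
Predictor = Callable[[str, Tuple[int, ...]], float]  # stands in for a trained f∘g(e, s)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


def expected_latency(arch_logits: Sequence[Sequence[float]],
                     candidates: Sequence[Sequence[Candidate]],
                     predict: Predictor) -> float:
    """Per layer, weight each candidate's predicted latency by its selection probability."""
    total = 0.0
    for logits, ops in zip(arch_logits, candidates):
        probs = softmax(logits)
        total += sum(p * predict(e, s) for p, (e, s) in zip(probs, ops))
    return total


def nas_objective(task_loss: float, arch_logits, candidates, predict: Predictor,
                  lam: float = 0.1) -> float:
    """Task loss plus a latency penalty weighted by lam."""
    return task_loss + lam * expected_latency(arch_logits, candidates, predict)


# Toy predictor: a base latency per operator divided by a toy speed-up factor s[0].
BASE_MS = {"conv3x3": 2.0, "conv5x5": 4.0}
toy_predict = lambda e, s: BASE_MS[e] / s[0]
layers = [[("conv3x3", (1,)), ("conv5x5", (1,))]] * 2
logits = [[0.0, 0.0]] * 2
```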
• Characteristic of variation
  - RRAM cell-to-cell variation ($R_{on}$)
  - Intrinsic ADC offset (process variation)
Source: RRAMedy: Protecting ReRAM-based Neural Network from Permanent and Soft Faults During Its Lifetime
• Characteristic of variation
  - RRAM cell-to-cell variation ($R_{on}$)
  - Intrinsic ADC offset (process variation)
  - Spatial locality of bounded PEs
    - Tiling, loop unrolling, mapping to different bit-lines
  - Could possibly be modeled with $f \circ g(e, s)$
    - Input: schedule; output: noise impact score
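As a hypothetical illustration of "input: schedule; output: noise impact score", the Monte Carlo sketch below uses a toy crossbar model: each cell gets multiplicative log-normal variation (standing in for $R_{on}$ cell-to-cell variation) and every ADC conversion gets an additive offset. The schedule enters only through `rows_per_tile`, i.e. how many rows are summed per conversion. The noise model and its parameters are assumptions, not taken from the cited source.

```python
import math
import random
import statistics
from typing import Sequence


def noisy_dot(weights: Sequence[float], inputs: Sequence[float], rows_per_tile: int,
              cell_sigma: float, adc_sigma: float, rng: random.Random) -> float:
    """Dot product evaluated tile by tile on a toy crossbar model.

    cell_sigma: std of the log-normal, multiplicative cell-to-cell conductance variation.
    adc_sigma:  std of an additive offset applied to every ADC conversion (one per tile).
    """
    total = 0.0
    for start in range(0, len(weights), rows_per_tile):
        partial = 0.0
        for w, x in zip(weights[start:start + rows_per_tile],
                        inputs[start:start + rows_per_tile]):
            partial += w * math.exp(rng.gauss(0.0, cell_sigma)) * x
        total += partial + rng.gauss(0.0, adc_sigma)
    return total


def noise_impact_score(weights: Sequence[float], inputs: Sequence[float],
                       rows_per_tile: int, cell_sigma: float, adc_sigma: float,
                       trials: int = 200, seed: int = 0) -> float:
    """Mean relative error of the noisy dot product over Monte Carlo trials."""
    rng = random.Random(seed)
    ideal = sum(w * x for w, x in zip(weights, inputs))
    errors = [abs(noisy_dot(weights, inputs, rows_per_tile, cell_sigma, adc_sigma, rng) - ideal)
              for _ in range(trials)]
    return statistics.mean(errors) / max(abs(ideal), 1e-12)


weights = [1.0] * 64
inputs = [1.0] * 64
```

With only ADC offsets in play, smaller tiles mean more conversions and therefore more accumulated offset, so in this toy model the score grows as `rows_per_tile` shrinks.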
• Characteristic of sparsity
  - Model accuracy considering weight sparsity
Ref: https://tvm.apache.org/docs/vta/index.html
• Characteristic of sparsity
  - Model accuracy considering weight sparsity
  - Three-level sub-problems
    - Model level: model accuracy / weight storage
    - DRAM access level: access latency, data bandwidth
    - Accelerator level: sparse matrix operations
  - Could possibly be modeled with $f \circ g(e, s)$
    - Input: schedule, TVM IR; output: latency or storage
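For the storage half of "output: latency or storage", a back-of-the-envelope sketch compares dense storage with a CSR encoding of a weight matrix. The byte widths (int8 values, 16-bit column indices, 32-bit row pointers) are assumptions. The usage lines show that CSR only pays off once the matrix is sparse enough.

```python
from typing import List, Sequence


def dense_bytes(rows: int, cols: int, value_bytes: int = 1) -> int:
    """Storage of a dense weight matrix (int8 values assumed by default)."""
    return rows * cols * value_bytes


def csr_bytes(matrix: Sequence[Sequence[float]], value_bytes: int = 1,
              index_bytes: int = 2, ptr_bytes: int = 4) -> int:
    """Storage of a CSR encoding: one value and one column index per nonzero,
    plus rows + 1 row pointers. Byte widths are illustrative assumptions."""
    nnz = sum(1 for row in matrix for v in row if v != 0)
    return nnz * (value_bytes + index_bytes) + (len(matrix) + 1) * ptr_bytes


def cheaper_format(matrix: Sequence[Sequence[float]]) -> str:
    """Pick whichever of the two formats needs fewer bytes."""
    rows, cols = len(matrix), len(matrix[0])
    return "csr" if csr_bytes(matrix) < dense_bytes(rows, cols) else "dense"


def diagonal(n: int) -> List[List[float]]:
    """An n x n matrix with only its diagonal set (sparsity 1 - 1/n)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

For a 4x4 diagonal matrix CSR needs 32 bytes against 16 dense; for a 64x64 diagonal it needs 452 against 4096.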
Editor's Notes

  1. Loop unrolling, tiling, operations sharing
  2. Like hyper-parameter optimization, needs explanation
  3. Key points: (1) train a cost-prediction model from a number of real hardware cost measurements; (2) automatically select the best schedule for the target hardware; (3) transfer $\hat{f}$ across programs via AST embedding transfer
  4. Use the AST embedding to train $\hat{f}$
  5. Ref: Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects, IEEE Circuits and Systems Magazine, vol. 21, no. 3, third quarter 2021; Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation, arXiv preprint
  6. Ref: Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects, IEEE Circuits and Systems Magazine, vol. 21, no. 3, third quarter 2021; Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation, arXiv preprint
  7. Needs to survey
  8. Needs to survey