SlideShare a Scribd company logo
1 of 35
Download to read offline
© 2020 OctoML and University of Washington
Introduction to the TVM Open
Source Deep Learning Compiler
Stack
Luis Ceze
w/ Tianqi Chen, Thierry Moreau, Jared Roesch, Ziheng Jiang,
Lianmin Zheng, Eddie Yan, Meghan Cowan, Chien-Yu Lin,
Haichen Shen, Leyuan Wang, Yuwei Hu, Carlos Guestrin,
Arvind Krishnamurthy, Zach Tatlock, and many in the Apache
TVM community!
© 2020 OctoML and University of Washington
A perfect storm
2
Growing set of requirements: Cost, latency, power, security & privacy
Cambrian explosion of models,
workloads, and use cases CNN GAN RNN MLP DQNN
Rapidly evolving ML software
ecosystem
Silicon scaling limitations
(Dennard and Moore)
Cambrian explosion of HW backends.
Heterogeneous HW
© 2020 OctoML and University of Washington
Current Dominant Deep Learning
Systems Landscape
3
Frameworks and
Inference engines
DL Compilers
Kernel
Libraries
Hardware
Orchestrators
Azure ML GCP Datalab
cuDNN NNPack MKL-DNN
Open source, automated
end-to-end optimization
framework for deep learning
Hand optimized
© 2020 OctoML and University of Washington
Stack
4
End-to-end,
framework to metal open
stack.
Research and deployment.
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge
FPGA
Cloud
FPGA
ASIC
Open source synthesizable deep
learning accelerator design
© 2020 OctoML and University of Washington
Automated by Machine Learning
5
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge
FPGA
Cloud
FPGA
ASIC
TVM: Automated End-to-end Optimizations for Deep Learning. Chen et al. OSDI 18
ML-based
Optimization
AutoTVM
AutoVTA
Hardware Fleet
© 2020 OctoML and University of Washington
End-user perspective:
Compile & deploy
6
import tvm
from tvm import relay
graph, params =
Frontend.from_keras
(keras_resnet50)
graph, lib, params =
Relay.build(graph, target)
Compile Deploy
© 2020 OctoML and University of Washington
Open Source Community
and Impact
7
Open source: ~420+ contributors from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft,
Qualcomm, Alibaba, Intel, …
Incubated as Apache TVM. Independent governance, allowing competitors to
collaborate.
Used in production at leading companies
Deep Learning
Compiler Service
DSP/Tensor engine
for mobile
Mobile and Server
Optimizations
Cloud-side model
optimization
© 2020 OctoML and University of Washington 8
© 2020 OctoML and University of Washington
Existing Deep Learning Frameworks
9
Frameworks
Hardware
Primitive Tensor operators such as
Conv2D
High-level data flow graph
Offload to heavily optimized DNN operator
library
eg. cuDNN
© 2020 OctoML and University of Washington
Engineering costs limits progress
10
cuDNN Engineering intensive
New operator introduced by operator fusion optimization potential
benefit: 1.5x speedup
Frameworks
© 2020 OctoML and University of Washington
Our approach: Learning-based Learning System
11
Frameworks
Hardware
Directly generate optimized program
for new operator workloads and hardware
High-level data flow graph and optimizations
Machine Learning based Program Optimizer
© 2020 OctoML and University of Washington
Tensor Compilation/Optimization as a
search problem
12
Tensor Expression (Specification)
C = tvm.compute((m, n),
lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Search Space of Possible Program Optimizations
Low-level Program Variants
© 2020 OctoML and University of Washington
Search Space Example (1/3)
13
Search Space of Possible Program Optimizations
Vanilla Code
Tensor Expression (Specification)
C = tvm.compute((m, n),
lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
© 2020 OctoML and University of Washington
Search Space Example (2/3)
14
Search Space of Possible Program Optimizations
Loop Tiling for Locality
Tensor Expression (Specification)
C = tvm.compute((m, n),
lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
© 2020 OctoML and University of Washington
Search Space Example (3/3)
15
Search Space of Possible Program Optimizations
Map to Accelerators
Tensor Expression (Specification)
C = tvm.compute((m, n),
lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
© 2020 OctoML and University of Washington
Optimization space is really large…
16
Loop Transformations
Thread
Bindings
Cache
Locality
Thread Cooperation Tensorization
Latency
Hiding
Typically explored via human intuition.
How can we automate this? Auto-tuning is too slow.
Billions of possible
optimization
choices
Tensor Expression (Specification)
C = tvm.compute((m, n),
lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
© 2020 OctoML and University of Washington
Problem Formalization
17
Search Space
Expression
Objective
Code Generator
Optimization
Configuration
Cost:
Execute Time
Program
AutoOpt
© 2020 OctoML and University of Washington
Black-box Optimization
18
Challenge: Lots of experimental trials, each trial costs ~1 second
Code Generator
Try each configuration until we find a good one
Search Space
Expression AutoTVM
© 2020 OctoML and University of Washington
Cost-model Driven Approach
19
Search Space
Expression AutoOpt
Challenge: Need reliable cost model per hardware
Use cost model to pick configuration
Code Generator
Cost Model
© 2020 OctoML and University of Washington
Statistical Cost Model
20
Search Space
Expression AutoOpt Code Generator
Our approach: Use machine learning to learn a statistical cost model
Statistical
Cost Model
Learning
Training data
Benefit: Automatically adapt to hardware type Important: How to design the cost model
© 2020 OctoML and University of Washington
Search
Space
Expression
2
2 AutoTVM
Shared
Cost Model
Code
Generator
New Tasks
Historical data from related operators
(tasks)
Need task invariant
representation
Transfer learning
AutoTVM Overview
21
Conv2D
Matmul
O(microseconds) inference vs. O(seconds) execution
Search
Space
Expression AutoTVM
Code
Generator
Statistical
Cost Model
Learning
Training data
High-level
configurations
Low-level
Abstract Syntax Tree
(AST)
Benefit: Low-level AST is a common representation (General, task
invariant)
Your favourite model
Statistical features
of AST
+ +
Learning to Optimize Tensor Programs. Chen et al. NeurIPS 18
© 2020 OctoML and University of Washington
Does it work?
22
Better than hand-tuned code in a few minutes
1.50x faster than hand-tuned in steady state
AutoTVM + transferred model
3x to 10x faster tuning w/ transfer
learning
© 2020 OctoML and University of Washington
Device Fleet: Distributed Test Bed for AutoTVM
23
Resource
Allocation
Resource
Token
Resource Manager (Tracker)
Nvidia GPU Server
RPC RT CUDA
Android Phone
RPC RT OpenCL
Zynq FPGA Board
RPC RT Bitstream
AutoTVM
Experiment 1
AutoTVM
Experiment 2
Persistent Remote Session
Scale up optimization
Resource sharing
…
© 2020 OctoML and University of Washington
State-of-the-art performance
24
Nvidia Titan X ARM GPU (MALI)
ARM CPU
(Cortex-A53)
Key point: TVM offers good performance with low manual effort
© 2020 OctoML and University of Washington 25
End-to-end,
framework to metal open
stack.
Research and deployment
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge
FPGA
Cloud
FPGA
ASIC
Open source synthesizable deep
learning accelerator design
Stack
© 2020 OctoML and University of Washington
DL Accelerator Design Challenges
26
CNN
GAN
RNN
MLP
DQNN
• Keeping up with algorithmic changes
• Finding the right generality/efficiency trade-off
• Enable a “day-0” software stack on top
• (VTA: two-level ISA, templatized design)
• (VTA: templatized design + HW parameter search)
• (VTA: tight coupling with TVM)
© 2020 OctoML and University of Washington
VTA:
Open & Flexible Deep Learning Accelerator
27
Current TVM Stack
VTA Runtime & JIT Compiler
VTA MicroArchitecture VTA Simulator
VTA Hardware/Software Interface (ISA)
• Move hardware complexity to
software via a two-level ISA
• Runtime JIT-compile
accelerator micro code
• Native support in TVM
• Support heterogenous devices
(split graph)
• Support for secure execution
(soon)
© 2020 OctoML and University of Washington
VTA Open Source Deep Learning accelerator
28
• Decoupled access-execute with explicit software control
• Two-level ISA: JIT breaks multi-cycle “CISC” instructions into micro-ops
• Enables model retargeting without HW changes
• Focused on FPGA deployments so far. Exploring custom silicon
possibilities
Note: HW-SW Blueprint for Flexible Deep Learning Acceleration. Moreau et al. IEEE Micro 2019.
Template
© 2020 OctoML and University of Washington
µTVM - Bare-metal model deployment for edge
devices
29
Optimize, compile and package model for standalone bare metal
deployment
See recent demo on TVM for Azure Sphere deployment.
µTVM
ML model
Optimized
model
Optimized
operators
Standalone
runtime
Edge device board
(ARM, MIPS, RISC-
V,...)
Flash code
© 2020 OctoML and University of Washington
Coming Soon - Ultra low bit-width quantization
Automatic quantization: 5-20x
performance gains with reasonable
accuracy loss.
TVM supports flexible code
generation for a variety of data
types
Squeezenet on RaspberryPi 3
© 2020 OctoML and University of Washington
What about training?
31
• Direct support for training in Apache TVM
coming soon!
• Automatic generation of gradient programs
• Support for customized data types and training
on FPGAs
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge
FPGA
Cloud
FPGA
ASIC
Standalone training deployment
Standalone inference deployment
Gradient Program for Training
Automatic Differentiation
© 2020 OctoML and University of Washington
Other Ongoing TVM efforts
32
• Autoscheduling (Zheng et al. OSDI’20 @ UCBerkeley)
• Automatic synthesis of operator implementations (Cowan et al. CGO’20 @ UWash)
• Sparse support (NLP, graph convolutional neural networks, etc…)
• Secure enclaves
• …
• Join the community!
© 2020 OctoML and University of Washington
https://tvm.ai
33
2nd TVM conference on Dec 5, 2019. 200+ ppl last year!
• Video tutorials
• iPython notebooks tutorials
3rd TVM conference on Dec 3/4, 2020. https://tvmconf.org
© 2020 OctoML and University of Washington 34
https://octoml.ai
© 2020 OctoML and University of Washington
What I would like you to remember…
35
TVM is an emerging open source standard for ML compilation and optimization
TVM offers
• Improved time to market for ML
• Performance
• Unified support for CPU, GPU, Accelerators
• On the framework of your choice
OctoML is here to help you succeed in you ML deployment needs
End-to-end,
framework to
metal open stack.
Research and
deployment
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge
FPGA
Cloud
FPGA
ASIC

More Related Content

What's hot

[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)
[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)
[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)Deep Learning JP
 
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII
 
【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language ModelsDeep Learning JP
 
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmenta...
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware  Panoptic Segmenta...【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware  Panoptic Segmenta...
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmenta...Deep Learning JP
 
[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text GenerationDeep Learning JP
 
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築NVIDIA Japan
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究についてDeep Learning JP
 
GPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIAGPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIANVIDIA Japan
 
動画認識における代表的なモデル・データセット(メタサーベイ)
動画認識における代表的なモデル・データセット(メタサーベイ)動画認識における代表的なモデル・データセット(メタサーベイ)
動画認識における代表的なモデル・データセット(メタサーベイ)cvpaper. challenge
 
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化について
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化についてマルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化について
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化についてFixstars Corporation
 
グラフニューラルネットワーク入門
グラフニューラルネットワーク入門グラフニューラルネットワーク入門
グラフニューラルネットワーク入門ryosuke-kojima
 
semantic segmentation サーベイ
semantic segmentation サーベイsemantic segmentation サーベイ
semantic segmentation サーベイyohei okawa
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoderSho Tatsuno
 
Transformer メタサーベイ
Transformer メタサーベイTransformer メタサーベイ
Transformer メタサーベイcvpaper. challenge
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介Takeo Imai
 
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...Deep Learning JP
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Preferred Networks
 
【DL輪読会】Semi-Parametric Neural Image Synthesis
【DL輪読会】Semi-Parametric Neural Image Synthesis【DL輪読会】Semi-Parametric Neural Image Synthesis
【DL輪読会】Semi-Parametric Neural Image SynthesisDeep Learning JP
 
モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019Yusuke Uchida
 
Optimizer入門&最新動向
Optimizer入門&最新動向Optimizer入門&最新動向
Optimizer入門&最新動向Motokawa Tetsuya
 

What's hot (20)

[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)
[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)
[DL輪読会]SoftTriple Loss: Deep Metric Learning Without Triplet Sampling (ICCV2019)
 
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
 
【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models
 
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmenta...
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware  Panoptic Segmenta...【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware  Panoptic Segmenta...
【DL輪読会】“PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmenta...
 
[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation
 
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
GPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIAGPU と PYTHON と、それから最近の NVIDIA
GPU と PYTHON と、それから最近の NVIDIA
 
動画認識における代表的なモデル・データセット(メタサーベイ)
動画認識における代表的なモデル・データセット(メタサーベイ)動画認識における代表的なモデル・データセット(メタサーベイ)
動画認識における代表的なモデル・データセット(メタサーベイ)
 
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化について
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化についてマルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化について
マルチレイヤコンパイラ基盤による、エッジ向けディープラーニングの実装と最適化について
 
グラフニューラルネットワーク入門
グラフニューラルネットワーク入門グラフニューラルネットワーク入門
グラフニューラルネットワーク入門
 
semantic segmentation サーベイ
semantic segmentation サーベイsemantic segmentation サーベイ
semantic segmentation サーベイ
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder
 
Transformer メタサーベイ
Transformer メタサーベイTransformer メタサーベイ
Transformer メタサーベイ
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介
 
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...
[DL輪読会]“Spatial Attention Point Network for Deep-learning-based Robust Autono...
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
 
【DL輪読会】Semi-Parametric Neural Image Synthesis
【DL輪読会】Semi-Parametric Neural Image Synthesis【DL輪読会】Semi-Parametric Neural Image Synthesis
【DL輪読会】Semi-Parametric Neural Image Synthesis
 
モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019
 
Optimizer入門&最新動向
Optimizer入門&最新動向Optimizer入門&最新動向
Optimizer入門&最新動向
 

Similar to “Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Presentation from OctoML

Cray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge SimulationsCray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge Simulationsinside-BigData.com
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
B Kindilien-Does Manufacturing Have a Future?
B Kindilien-Does Manufacturing Have a Future?B Kindilien-Does Manufacturing Have a Future?
B Kindilien-Does Manufacturing Have a Future?jgIpotiwon
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland mictc
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeShuquan Huang
 
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Grigori Fursin
 
Highway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichHighway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichChristian Deger
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
Full resume dr_russell_john_childs_2013
Full resume dr_russell_john_childs_2013Full resume dr_russell_john_childs_2013
Full resume dr_russell_john_childs_2013Russell Childs
 
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...Alpen-Adria-Universität
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)Heiko Joerg Schick
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationVEDLIoT Project
 
Open Source Possibilities for 5G Edge Computing Deployment
Open Source Possibilities for 5G Edge Computing DeploymentOpen Source Possibilities for 5G Edge Computing Deployment
Open Source Possibilities for 5G Edge Computing DeploymentIgnacio Verona
 
Cluster Tutorial
Cluster TutorialCluster Tutorial
Cluster Tutorialcybercbm
 
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...Edge AI and Vision Alliance
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingJaime Martin Losa
 

Similar to “Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Presentation from OctoML (20)

Cray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge SimulationsCray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge Simulations
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
B Kindilien-Does Manufacturing Have a Future?
B Kindilien-Does Manufacturing Have a Future?B Kindilien-Does Manufacturing Have a Future?
B Kindilien-Does Manufacturing Have a Future?
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-time
 
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions an...
 
Highway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichHighway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup Munich
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Full resume dr_russell_john_childs_2013
Full resume dr_russell_john_childs_2013Full resume dr_russell_john_childs_2013
Full resume dr_russell_john_childs_2013
 
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...
MPEG-21-based Cross-Layer Optimization Techniques for enabling Quality of Exp...
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
 
Open Source Possibilities for 5G Edge Computing Deployment
Open Source Possibilities for 5G Edge Computing DeploymentOpen Source Possibilities for 5G Edge Computing Deployment
Open Source Possibilities for 5G Edge Computing Deployment
 
Cluster Tutorial
Cluster TutorialCluster Tutorial
Cluster Tutorial
 
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Presentation from OctoML

  • 1. © 2020 OctoML and University of Washington Introduction to the TVM Open Source Deep Learning Compiler Stack Luis Ceze w/ Tianqi Chen, Thierry Moreau, Jared Roesch, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Chien-Yu Lin, Haichen Shen, Leyuan Wang, Yuwei Hu, Carlos Guestrin, Arvind Krishnamurthy, Zach Tatlock, and many in the Apache TVM community!
  • 2. © 2020 OctoML and University of Washington A perfect storm 2 Growing set of requirements: Cost, latency, power, security & privacy Cambrian explosion of models, workloads, and use cases CNN GAN RNN MLP DQNN Rapidly evolving ML software ecosystem Silicon scaling limitations (Dennard and Moore) Cambrian explosion of HW backends. Heterogeneous HW
  • 3. © 2020 OctoML and University of Washington Current Dominant Deep Learning Systems Landscape 3 Frameworks and Inference engines DL Compilers Kernel Libraries Hardware Orchestrators Azure ML GCP Datalab cuDNN NNPack MKL-DNN Open source, automated end-to-end optimization framework for deep learning Hand optimized
  • 4. © 2020 OctoML and University of Washington Stack 4 End-to-end, framework to metal open stack. Research and deployment. High-Level Differentiable IR Tensor Expression IR LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Open source synthesizable deep learning accelerator design
  • 5. © 2020 OctoML and University of Washington Automated by Machine Learning 5 High-Level Differentiable IR Tensor Expression IR LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC TVM: Automated End-to-end Optimizations for Deep Learning. Chen et al. OSDI 18 ML-based Optimization AutoTVM AutoVTA Hardware Fleet
  • 6. © 2020 OctoML and University of Washington End-user perspective: Compile & deploy 6 import tvm from tvm import relay graph, params = Frontend.from_keras (keras_resnet50) graph, lib, params = Relay.build(graph, target) Compile Deploy
  • 7. © 2020 OctoML and University of Washington Open Source Community and Impact 7 Open source: ~420+ contributors from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft, Qualcomm, Alibaba, Intel, … Incubated as Apache TVM. Independent governance, allowing competitors to collaborate. Used in production at leading companies Deep Learning Compiler Service DSP/Tensor engine for mobile Mobile and Server Optimizations Cloud-side model optimization
  • 8. © 2020 OctoML and University of Washington 8
  • 9. © 2020 OctoML and University of Washington Existing Deep Learning Frameworks 9 Frameworks Hardware Primitive Tensor operators such as Conv2D High-level data flow graph Offload to heavily optimized DNN operator library eg. cuDNN
  • 10. © 2020 OctoML and University of Washington Engineering costs limits progress 10 cuDNN Engineering intensive New operator introduced by operator fusion optimization potential benefit: 1.5x speedup Frameworks
  • 11. © 2020 OctoML and University of Washington Our approach: Learning-based Learning System 11 Frameworks Hardware Directly generate optimized program for new operator workloads and hardware High-level data flow graph and optimizations Machine Learning based Program Optimizer
  • 12. © 2020 OctoML and University of Washington Tensor Compilation/Optimization as a search problem 12 Tensor Expression (Specification) C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k)) Search Space of Possible Program Optimizations Low-level Program Variants
  • 13. © 2020 OctoML and University of Washington Search Space Example (1/3) 13 Search Space of Possible Program Optimizations Vanilla Code Tensor Expression (Specification) C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
  • 14. © 2020 OctoML and University of Washington Search Space Example (2/3) 14 Search Space of Possible Program Optimizations Loop Tiling for Locality Tensor Expression (Specification) C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
  • 15. © 2020 OctoML and University of Washington Search Space Example (3/3) 15 Search Space of Possible Program Optimizations Map to Accelerators Tensor Expression (Specification) C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
  • 16. © 2020 OctoML and University of Washington Optimization space is really large… 16 Loop Transformations Thread Bindings Cache Locality Thread Cooperation Tensorization Latency Hiding Typically explored via human intuition. How can we automate this? Auto-tuning is too slow. Billions of possible optimization choices Tensor Expression (Specification) C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
  • 17. © 2020 OctoML and University of Washington Problem Formalization 17 Search Space Expression Objective Code Generator Optimization Configuration Cost: Execute Time Program AutoOpt
  • 18. © 2020 OctoML and University of Washington Black-box Optimization 18 Challenge: Lots of experimental trials, each trial costs ~1 second Code Generator Try each configuration until we find a good one Search Space Expression AutoTVM
  • 19. © 2020 OctoML and University of Washington Cost-model Driven Approach 19 Search Space Expression AutoOpt Challenge: Need reliable cost model per hardware Use cost model to pick configuration Code Generator Cost Model
  • 20. © 2020 OctoML and University of Washington Statistical Cost Model 20 Search Space Expression AutoOpt Code Generator Our approach: Use machine learning to learn a statistical cost model Statistical Cost Model Learning Training data Benefit: Automatically adapt to hardware type Important: How to design the cost model
  • 21. © 2020 OctoML and University of Washington Search Space Expression 2 2 AutoTVM Shared Cost Model Code Generator New Tasks Historical data from related operators (tasks) Need task invariant representation Transfer learning AutoTVM Overview 21 Conv2D Matmul O(microseconds) inference vs. O(seconds) execution Search Space Expression AutoTVM Code Generator Statistical Cost Model Learning Training data High-level configurations Low-level Abstract Syntax Tree (AST) Benefit: Low-level AST is a common representation (General, task invariant) Your favourite model Statistical features of AST + + Learning to Optimize Tensor Programs. Chen et al. NeurIPS 18
  • 22. © 2020 OctoML and University of Washington Does it work? 22 Better than hand-tuned code in a few minutes 1.50x faster than hand-tuned in steady state AutoTVM + transferred model 3x to 10x faster tuning w/ transfer learning
  • 23. © 2020 OctoML and University of Washington Device Fleet: Distributed Test Bed for AutoTVM 23 Resource Allocation Resource Token Resource Manager (Tracker) Nvidia GPU Server RPC RT CUDA Android Phone RPC RT OpenCL Zynq FPGA Board RPC RT Bitstream AutoTVM Experiment 1 AutoTVM Experiment 2 Persistent Remote Session Scale up optimization Resource sharing …
  • 24. © 2020 OctoML and University of Washington State-of-the-art performance 24 Nvidia Titan X ARM GPU (MALI) ARM CPU (Cortex-A53) Key point: TVM offers good performance with low manual effort
  • 25. © 2020 OctoML and University of Washington 25 End-to-end, framework to metal open stack. Research and deployment High-Level Differentiable IR Tensor Expression IR LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Open source synthesizable deep learning accelerator design Stack
  • 26. © 2020 OctoML and University of Washington DL Accelerator Design Challenges 26 CNN GAN RNN MLP DQNN • Keeping up with algorithmic changes • Finding the right generality/efficiency trade-off • Enable a “day-0” software stack on top • (VTA: two-level ISA, templatized design) • (VTA: templatized design + HW parameter search) • (VTA: tight coupling with TVM)
  • 27. © 2020 OctoML and University of Washington VTA: Open & Flexible Deep Learning Accelerator 27 Current TVM Stack VTA Runtime & JIT Compiler VTA MicroArchitecture VTA Simulator VTA Hardware/Software Interface (ISA) • Move hardware complexity to software via a two-level ISA • Runtime JIT-compile accelerator micro code • Native support in TVM • Support heterogenous devices (split graph) • Support for secure execution (soon)
  • 28. © 2020 OctoML and University of Washington VTA Open Source Deep Learning accelerator 28 • Decoupled access-execute with explicit software control • Two-level ISA: JIT breaks multi-cycle “CISC” instructions into micro-ops • Enables model retargeting without HW changes • Focused on FPGA deployments so far. Exploring custom silicon possibilities Note: HW-SW Blueprint for Flexible Deep Learning Acceleration. Moreau et al. IEEE Micro 2019. Template
  • 29. © 2020 OctoML and University of Washington µTVM - Bare-metal model deployment for edge devices 29 Optimize, compile and package model for standalone bare metal deployment See recent demo on TVM for Azure Sphere deployment. µTVM ML model Optimized model Optimized operators Standalone runtime Edge device board (ARM, MIPS, RISC- V,...) Flash code
  • 30. © 2020 OctoML and University of Washington Coming Soon - Ultra low bit-width quantization Automatic quantization: 5-20x performance gains with reasonable accuracy loss. TVM supports flexible code generation for a variety of data types Squeezenet on RaspberryPi 3
  • 31. © 2020 OctoML and University of Washington What about training? 31 • Direct support for training in Apache TVM coming soon! • Automatic generation of gradient programs • Support for customized data types and training on FPGAs High-Level Differentiable IR Tensor Expression IR LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Standalone training deployment Standalone inference deployment Gradient Program for Training Automatic Differentiation
  • 32. © 2020 OctoML and University of Washington Other Ongoing TVM efforts 32 • Autoscheduling (Zheng et al. OSDI’20 @ UCBerkeley) • Automatic synthesis of operator implementations (Cowan et al. CGO’20 @ UWash) • Sparse support (NLP, graph convolutional neural networks, etc…) • Secure enclaves • … • Join the community!
  • 33. © 2020 OctoML and University of Washington https://tvm.ai 33 2nd TVM conference on Dec 5, 2019. 200+ ppl last year! • Video tutorials • iPython notebooks tutorials 3rd TVM conference on Dec 3/4, 2020. https://tvmconf.org
  • 34. © 2020 OctoML and University of Washington 34 https://octoml.ai
  • 35. © 2020 OctoML and University of Washington What I would like you to remember… 35 TVM is an emerging open source standard for ML compilation and optimization TVM offers • Improved time to market for ML • Performance • Unified support for CPU, GPU, Accelerators • On the framework of your choice OctoML is here to help you succeed in you ML deployment needs End-to-end, framework to metal open stack. Research and deployment High-Level Differentiable IR Tensor Expression IR LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC