Glow user review @96cfc41
Kuan Hsu Chen (Zakk)
Goal
1. Introduce Compilation Flow
2. Introduce IR Design
3. Pros and Cons
4. TODO Features
5. Make you want to do something on Glow!?
https://www.books.com.tw/products/0010750924
Compilation Flow
[Figure: compilation pipeline; labels recovered from the diagram]
1. Importer (Caffe2/ONNX/ONNXIFI) -> builtin HIR
2. HIR optimizer -> builtin HIR
3. Backend Pre-Lowering -> builtin or custom HIR
4. Node Lowering -> builtin or custom HIR
5. Backend Post-Lowering -> builtin or custom HIR
6. GenLIR, Scheduler, Optimize -> builtin or custom LIR
glow::generateAndOptimizeIR(Function *F, bool shouldShareBuffers)
7. Backend save -> Standalone Executable Bundle
8. Backend compile -> glow::CompiledFunction (backend code)
3 Level IR
High Level IR (HIR)
1. A dataflow, node-based graph representation.
2. There are two kinds of variable: placeholder and constant.
All nodes can access all variables in the Module.
Glow currently cannot tell whether a placeholder is an input or
an output, but the importer follows a naming convention when
it creates output placeholders, so we check for the "save_"
prefix in the name to work around this.
3. Glow uses NHWC order (Caffe2/ONNX use NCHW).
If the backend only supports NCHW, remember to insert
transpose nodes in pre-lowering.
4. Some model formats only support the float type, but the user may
want to quantize the model. After optimization, the graph has
quantize/dequantize nodes inserted at the input/output placeholders.
In some cases, however, you would rather perform
quantize/dequantize on the user program side.
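Items 2 and 3 above can be sketched in a few lines. This is a minimal illustration, not Glow code: `isOutputPlaceholderName`, `nhwcToNchwShape` and `transposeNhwcToNchw` are hypothetical helpers, and the flat `std::vector<float>` buffer stands in for a real tensor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Glow API): treat a placeholder as an output when
// its name starts with the importer's "save_" prefix.
bool isOutputPlaceholderName(const std::string &name) {
  const std::string prefix = "save_";
  return name.compare(0, prefix.size(), prefix) == 0;
}

using Shape4 = std::array<std::size_t, 4>;

// Reorder dimensions from NHWC to NCHW, i.e. apply the permutation {0, 3, 1, 2}.
Shape4 nhwcToNchwShape(const Shape4 &nhwc) {
  return Shape4{{nhwc[0], nhwc[3], nhwc[1], nhwc[2]}};
}

// Copy a flat NHWC buffer into NCHW order. This is the data movement that a
// transpose node inserted in pre-lowering would have to perform.
std::vector<float> transposeNhwcToNchw(const std::vector<float> &src,
                                       const Shape4 &nhwc) {
  const std::size_t N = nhwc[0], H = nhwc[1], W = nhwc[2], C = nhwc[3];
  assert(src.size() == N * H * W * C && "buffer size must match the shape");
  std::vector<float> dst(src.size());
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t h = 0; h < H; ++h)
      for (std::size_t w = 0; w < W; ++w)
        for (std::size_t c = 0; c < C; ++c)
          dst[((n * C + c) * H + h) * W + w] = src[((n * H + h) * W + w) * C + c];
  return dst;
}
```

The prefix check is only as reliable as the importer's naming convention, which is why the slide calls it a workaround.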
Low Level IR (LIR)
1. Instruction-based representation. LIR allows multiple
outputs (e.g. loss function).
2. Operands carry qualifiers: @in, @out,
@inout. So an instruction in a value's users list
may be writing that value rather than reading it.
3. All LIR instructions can be built-in ops.
4. LIR provides the allocactivation/dealloc instructions to
mark the live range of each activation. The LIR optimizer
sinks the allocs and hoists the deallocs to reduce memory
pressure. In some backends, however, we insert extra
allocactivation instructions because the weights also occupy
memory.
5. Glow provides a simple static (compile-time) first-fit
memory allocator for backends to use.
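To make the first-fit idea in item 5 concrete, here is a minimal sketch. It is not Glow's allocator: `FirstFitAllocator`, `kAllocFailed` and the method names are assumptions made for illustration. An `allocate` call corresponds to reaching an allocactivation, and a `deallocate` call to reaching the matching dealloc.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Sentinel returned when no gap is large enough (illustrative, not a Glow name).
const std::uint64_t kAllocFailed = ~static_cast<std::uint64_t>(0);

class FirstFitAllocator {
public:
  explicit FirstFitAllocator(std::uint64_t capacity)
      : capacity_(capacity), peak_(0) {}

  // Return the lowest offset whose gap can hold `size` bytes.
  std::uint64_t allocate(std::uint64_t size) {
    if (size == 0)
      return kAllocFailed;
    std::uint64_t candidate = 0;
    // live_ is ordered by offset, so gaps are visited from low to high.
    for (std::map<std::uint64_t, std::uint64_t>::const_iterator it = live_.begin();
         it != live_.end(); ++it) {
      if (it->first - candidate >= size)
        break; // The gap before this block is big enough.
      candidate = it->first + it->second;
    }
    if (capacity_ - candidate < size)
      return kAllocFailed;
    live_[candidate] = size;
    peak_ = std::max(peak_, candidate + size);
    return candidate;
  }

  // Release the block that starts at `offset`; returns false if unknown.
  bool deallocate(std::uint64_t offset) { return live_.erase(offset) > 0; }

  // Highest address ever used, i.e. the activation memory the backend needs.
  std::uint64_t peak() const { return peak_; }

private:
  std::uint64_t capacity_;
  std::uint64_t peak_;
  std::map<std::uint64_t, std::uint64_t> live_; // offset -> size
};
```

Because the instruction order is fixed at compile time, replaying the alloc/dealloc sequence once through such an allocator yields a static offset for every activation. Shorter live ranges free gaps earlier, which is why sinking allocs and hoisting deallocs lowers the peak.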
Backend in Glow
1. tools/ClassGen/Backends/: define backend-specific Nodes and Instructions for the backend (similar to
LLVM's TableGen).
2. Node attributes:
● .addOverwrittenInput("Output")
● .setHasSideEffects(true)
3. Instruction attributes:
● .autoIRGen(): the framework generates the HIR->LIR translation code for the backend.
● .inplaceOperand({"Dest", "Batch"})
● .dataParallel()
Backend in Glow
2. lib/Backends/: implement derived classes for Backend and CompiledFunction.
a. Backend abstract class
i. bool transformPreLowering/transformPostLowering
ii. bool shouldLower(const Node *N) const;
iii. bool shouldShareBuffers() const;
iv. compile/save
v. isOpSupported(Kinded::Kind opKind, ElemKind elementTy) const;
b. CompiledFunction
i. execute() = 0;
ii. setupRuns(), beforeRun(), afterRun(), tearDownRuns();
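The shape of a derived backend can be sketched as follows. The `Backend`, `Node` and `OpKind` types here are simplified stand-ins written for this example, not the real Glow declarations, and `ToyAsicBackend` is a hypothetical accelerator; only the hook names follow the list above.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; the real Glow classes are richer.
enum class OpKind { Convolution, FullyConnected, Relu, SoftMax };
enum class ElemKind { FloatTy, Int8QTy };

struct Node {
  OpKind kind;
};

class Backend {
public:
  virtual ~Backend() = default;
  virtual bool isOpSupported(OpKind opKind, ElemKind elementTy) const = 0;
  virtual bool shouldLower(const Node *) const { return true; }
  virtual bool shouldShareBuffers() const { return true; }
};

// Hypothetical accelerator that only runs a few int8 operators natively.
class ToyAsicBackend : public Backend {
public:
  bool isOpSupported(OpKind opKind, ElemKind elementTy) const override {
    if (elementTy != ElemKind::Int8QTy)
      return false;
    return opKind == OpKind::Convolution ||
           opKind == OpKind::FullyConnected || opKind == OpKind::Relu;
  }

  // Keep FullyConnected as one node so it can map to a native kernel instead
  // of being lowered into simpler nodes.
  bool shouldLower(const Node *N) const override {
    assert(N && "node must not be null");
    return N->kind != OpKind::FullyConnected;
  }

  // Opt out of buffer sharing, e.g. when the hardware has several memories.
  bool shouldShareBuffers() const override { return false; }
};
```

The design point is that the framework asks the backend questions through these hooks, so a backend changes lowering and buffer sharing without touching the shared passes.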
Pros and Cons
Pros:
1. Supports training and inference compilation
2. Supports quantization
3. Supports many HIR and LIR optimizations, which also work on custom nodes/instructions.
4. Supports "dump DAG"
5. Provides ASIC-friendly IR and helper functions
6. more...
Cons:
1. No Python interface, but users can go through ONNXIFI instead.
2. No ASIC backend exists for reference.
3. Some builtin operators are missing
4. more...
Quantization feature
1. Quantization nodes in HIR.
a. QuantizationProfile
b. Quantize/Dequantize/RescaleQuantized
c. IntLookupTable
d. RowwiseQuantizedFullyConnected
2. Support related optimizations.
a. Quantize(Dequantize(X)) -> RescaleQuantized(X)
b. Dequantize(Quantize(X)) -> X
c. Quantize(Constant) -> Constant
d. PoolingNode(Rescale(X)) -> Rescale(PoolingNode(X)).
e. more...
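The rewrites in 2a and 2b follow from the arithmetic of the nodes. The sketch below assumes the usual affine int8 scheme, value = scale * (q - offset); `QuantParams`, `quantize`, `dequantize` and `rescale` are illustrative names rather than Glow functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed affine scheme: real value = scale * (q - offset).
struct QuantParams {
  float scale;
  std::int32_t offset;
};

// Float -> int8, rounding to nearest and saturating to [-128, 127].
std::int8_t quantize(float x, QuantParams p) {
  assert(p.scale > 0.0f && "scale must be positive");
  float q = std::round(x / p.scale) + static_cast<float>(p.offset);
  q = std::min(127.0f, std::max(-128.0f, q));
  return static_cast<std::int8_t>(q);
}

// int8 -> float.
float dequantize(std::int8_t q, QuantParams p) {
  return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - p.offset);
}

// Re-express q under new parameters. This matches Quantize(Dequantize(q)),
// which is why that pair can be folded into a single rescale node.
std::int8_t rescale(std::int8_t q, QuantParams from, QuantParams to) {
  return quantize(dequantize(q, from), to);
}
```

A real rescale would normally use integer arithmetic instead of going through float; the float round trip here only shows that the two forms compute the same mapping.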
Optimizations
1. Graph optimizer (HIR)
a. DCE, CSE.
b. Node-specific optimizations.
i. Concat(Slice(X, 0..10), Slice(X, 10..20)) -> X
ii. Merge Transpose into MatMul
iii. MaxPool(Relu(X)) -> Relu(MaxPool(X)), so that Relu runs on the smaller tensor
iv. Merge batch normalization operations (inference only)
v. more...
2. IR optimizer (LIR)
a. Reduce memory usage
i. sinkAllocas/hoistDealloc/sinkTensorViews
ii. eliminate copy instruction
b. Eliminate redundant instructions
c. Peephole optimizations
d. more...
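The Relu/MaxPool swap in the graph optimizer is legal because the two operations commute: Relu is monotonic, so taking the maximum before or after clamping gives the same result. A small check, using a hypothetical 1-D pooling helper with stride equal to the window size:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Elementwise max(0, x).
std::vector<float> relu(const std::vector<float> &x) {
  std::vector<float> out;
  out.reserve(x.size());
  for (float v : x)
    out.push_back(std::max(0.0f, v));
  return out;
}

// 1-D max pooling over non-overlapping windows.
std::vector<float> maxPool1D(const std::vector<float> &x, std::size_t window) {
  assert(window > 0 && x.size() % window == 0 && "size must divide evenly");
  std::vector<float> out;
  for (std::size_t i = 0; i < x.size(); i += window) {
    std::vector<float>::const_iterator first =
        x.begin() + static_cast<std::ptrdiff_t>(i);
    std::vector<float>::const_iterator last =
        first + static_cast<std::ptrdiff_t>(window);
    out.push_back(*std::max_element(first, last));
  }
  return out;
}
```

For an input of six elements and a window of two, Relu after pooling touches three elements instead of six, which is the whole benefit of the rewrite.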
Support ASIC-friendly IR and helper function
1. Slice/InsertTensor/Tile/Gather/Scatter (HIR)
2. TensorView (LIR): a view of an existing tensor; it does not allocate any new memory
3. Tensor class: represents a contiguous n-dimensional array (copyRawFrom/copySlice/Transpose)
4. Handles: an easy way to access and operate on a Tensor
/// Create a tensor of type Float, of the shape {4 x 2}.
Tensor inputs(ElemKind::FloatTy, {4, 2});
/// Create a handle to the tensor.
auto I = inputs.getHandle<float>();
/// Store an element to the tensor at index {0, 0}.
I.at({0, 0}) = 13.1;
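The TensorView idea from item 2 (a view that allocates nothing) can be illustrated outside Glow with a non-owning wrapper. `Matrix` and `RowsView` below are hypothetical types written for this sketch; they are not the Glow Tensor or TensorView classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Owning, row-major 2-D tensor.
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Non-owning view of a contiguous range of rows. It stores a reference and two
// indices, so creating it allocates no tensor memory.
class RowsView {
public:
  RowsView(Matrix &base, std::size_t firstRow, std::size_t numRows)
      : base_(base), first_(firstRow), rows_(numRows) {
    assert(firstRow + numRows <= base.rows && "view must stay inside the base");
  }
  std::size_t rows() const { return rows_; }
  // Accesses are forwarded to the base tensor with the row offset applied.
  float &at(std::size_t r, std::size_t c) {
    assert(r < rows_ && "row out of range");
    return base_.at(first_ + r, c);
  }

private:
  Matrix &base_;
  std::size_t first_;
  std::size_t rows_;
};
```

Writes through the view land in the base tensor, which is what lets a backend slice a buffer without a copy.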
Cons (?)
1. The ShareBuffers flag is the only switch for enabling/disabling optimizations.
2. One LIR function has only one memory space. If your backend has two memory spaces within
one LIR function, some ShareBuffers optimizations will produce unwanted results.
[Figure labels: IRFunction, placeholder, weight]
Cons (?)
3. We do not see any advanced optimizations compared with TVM or in-house compilers,
e.g. activation/weight partitioning when memory is insufficient, reusing activations to avoid memory
movement, overlapping computation with data movement, and more.
Make you want to do something on Glow!?
You can try to
1. Add a real ASIC backend
2. Add more advanced optimizations
3. Offload subgraphs to different backends
a. How to cooperate with the CPU?
4. Improve JIT performance
a. How to support dynamic input shapes?
b. How to support the ROI pooling layer? (because the layer parameters are runtime information)
5. How to debug an optimized model
6. Advanced scheduler
7. Advanced memory allocator
8. more..
e.g. helper functions for advanced optimization
https://sampl.cs.washington.edu/tvmconf/slides/Thierry-Moreau-VTA.pdf
Reference.
1. https://github.com/pytorch/glow
2. https://sophon-edge.gitbook.io/project/getting-started/bmnnsdk-framework
3. https://devblogs.nvidia.com/production-deep-learning-nvidia-gpu-inference-engine/
4. Sophon backend
5. DAG graph sponsor
Q&A
