Reducing Deep Learning Integration Costs and Maximising Compute Efficiency for Multiple AI Hardware
Jianhui Li
Principal Engineer, Intel
Deep Learning Trends
Diverse and rapidly evolving:
• Deep Learning Steps: Training, Inference
• Data Precision: FP32, BFloat16, INT8
• Topologies, Computer Vision: ResNet-50, SqueezeNet, MobileNet
• Topologies, Natural Language Processing: GNMT, BERT
• Topologies, Recommendation Systems: NCF, Wide & Deep
• Topologies, Reinforcement Learning: MiniGo
• Frameworks
The driving forces of AI Optimization
• Diversifying AI applications: Computer Vision, Natural Language Processing, and Recommendation Engine, each drawn with a conv block (conv: General Matrix Multiply)
• Hardware Acceleration for AI: CPU + DL Acceleration, GPU + DL Acceleration, Accelerators
Deep learning workload time breakdown
• Accelerating matrix multiplication alone doesn’t solve the problem
• Conv and Matmul operations are less dominant beyond computer vision applications
• Low precision introduces memory-bound quantize operations
• Amdahl's law: the operations that are not accelerated limit the overall speedup
• Aggressive fusion is needed
*Profiling data collected from an internal performance study
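The Amdahl's law point can be made concrete with a short calculation. The sketch below is illustrative only: the 60% matmul share and the 10x kernel speedup are made-up inputs, not figures from the profiling study.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime gets `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction  # time share that is left untouched
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical workload: 60% of the time is spent in matmul/conv kernels.
matmul_share = 0.6
print(round(amdahl_speedup(matmul_share, 10.0), 2))          # 2.17
print(round(amdahl_speedup(matmul_share, float("inf")), 2))  # 2.5
```

Even an infinitely fast matmul caps this hypothetical workload at 2.5x, which is why the remaining operations have to be fused or accelerated as well.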
Accelerating Matrix Multiplication
[Figure: Dot product. Matrix A (M × K) multiplied by Matrix B (K × N) gives Matrix C (M × N).]
[Figure: Dot product with matrix operation. The same multiplication, annotated with a potential fusion function.]
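A minimal pure-Python sketch of the idea in the two figures: the dot product produces each element of C, and a fusion function is applied to that element before it is stored, so no second pass over C is needed. The names `matmul_fused`, `epilogue`, and `relu` are illustrative choices, not identifiers from the slides or from oneDNN.

```python
from typing import Callable, List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matmul_fused(a: Matrix, b: Matrix,
                 epilogue: Callable[[float], float] = lambda x: x) -> Matrix:
    """C = epilogue(A @ B), with the epilogue applied while each element is produced."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions (K) must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):           # dot product of row i of A and column j of B
                acc += a[i][p] * b[p][j]
            c[i][j] = epilogue(acc)      # fused function, applied before the store
    return c


a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [2.0, 1.0]]
print(matmul_fused(a, b))        # [[-3.0, -2.0], [11.0, 4.0]]
print(matmul_fused(a, b, relu))  # [[0.0, 0.0], [11.0, 4.0]]
```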
Performance Library Integration
• The performance library implements DNN ops and fused ops and exposes them through function APIs. The function API is extended to support fusion: alongside Matmul, Activation, Norm, and RNN it offers fused ops such as Matmul+Relu and Conv+Relu.
• The framework pattern matcher is enhanced, and the graph rewriter replaces each matched subgraph with one fused op backed by library functions.
• Fused ops are dispatched to registered library functions at framework runtime through a kernel wrapper.
[Figure: a framework graph with nodes 1 to 4 passes through the Pattern Matcher and Graph Rewriter to the Framework Runtime, which reaches the performance library's Function API through a kernel wrapper.]
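The integration flow described on this slide (library exposes fused ops, the framework matches and rewrites patterns, the runtime dispatches to registered functions) can be sketched as a toy. Everything here is hypothetical: the graph is a flat list of op names, and `scale` is a scalar stand-in for a real matmul kernel.

```python
from typing import Callable, Dict, List, Tuple

# Toy "performance library": plain and fused ops registered by name.
LIBRARY: Dict[str, Callable[[float], float]] = {
    "scale": lambda x: 2.0 * x,                  # scalar stand-in for matmul
    "relu": lambda x: max(x, 0.0),
    "scale+relu": lambda x: max(2.0 * x, 0.0),   # fused op
}

# Patterns known to the framework's matcher, mapped to the fused op name.
PATTERNS: Dict[Tuple[str, ...], str] = {("scale", "relu"): "scale+relu"}


def rewrite(graph: List[str]) -> List[str]:
    """Replace every matched run of ops with its fused op."""
    out: List[str] = []
    i = 0
    while i < len(graph):
        for pattern, fused in PATTERNS.items():
            if tuple(graph[i:i + len(pattern)]) == pattern:
                out.append(fused)
                i += len(pattern)
                break
        else:                       # no pattern starts at position i
            out.append(graph[i])
            i += 1
    return out


def run(graph: List[str], x: float) -> float:
    """Framework runtime: dispatch each op to its registered library function."""
    for op in graph:
        x = LIBRARY[op](x)
    return x


graph = ["scale", "relu", "scale", "relu", "scale"]
fused = rewrite(graph)
print(fused)                                 # ['scale+relu', 'scale+relu', 'scale']
print(run(graph, -1.5) == run(fused, -1.5))  # True
```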
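Limitation of Pattern Match
• Pattern too rigid to match the input graphs: the same Gelu can reach the matcher as different framework graph representations, and a pattern written for one representation does not match the other.
[Figure: a Framework Graph Representation for Gelu and Another Framework Graph Representation for Gelu, each passed as a graph to the Gelu pattern.]
• Small patterns miss optimizations for large graphs: when each conv + relu pair is matched on its own, every output stays in NHWC, whereas optimizing the whole chain lets the intermediate outputs stay in a blocked layout.
[Figure: three conv + relu pairs. Matched pair by pair: Input NHWC, Output0 NHWC, Output1 NHWC, Output2 NHWC. Optimized as one graph: Input NHWC, Output0 Blocked Layout, Output1 Blocked Layout, Output2 NHWC.]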
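The layout part of the slide can be illustrated by counting layout conversions. The sketch assumes, as a simplification not stated on the slide, that each conv kernel works internally in a blocked layout and that a conversion is needed whenever a tensor at the kernel boundary is in a different layout. `count_reorders` is a hypothetical helper.

```python
from typing import List


def count_reorders(boundary_layouts: List[str], kernel_layout: str = "blocked") -> int:
    """Count layout conversions for a chain of ops.

    boundary_layouts[i] is the layout of the tensor entering op i and
    boundary_layouts[i + 1] the layout of the tensor leaving it.
    """
    reorders = 0
    for before, after in zip(boundary_layouts, boundary_layouts[1:]):
        if before != kernel_layout:
            reorders += 1   # convert the input into the kernel's layout
        if after != kernel_layout:
            reorders += 1   # convert the output back for the consumer
    return reorders


# Three conv + relu pairs, as in the figure.
per_pattern = ["NHWC", "NHWC", "NHWC", "NHWC"]        # each pair matched alone
whole_graph = ["NHWC", "blocked", "blocked", "NHWC"]  # chain optimized together
print(count_reorders(per_pattern))  # 6
print(count_reorders(whole_graph))  # 2
```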
oneDNN is evolving…
• The Graph API allows the HW backend to maximize performance
• Same integration for multiple AI HW: CPU, GPU, and accelerators
[Figure, Today: Deep Learning frameworks, Primitives API, oneDNN. Hardware shown: CPU + DL Acceleration, GPU + DL Acceleration, HW Accel.]
[Figure, Future: Deep Learning frameworks, Primitives API + Graph API, oneDNN. Hardware shown: CPU + DL Acceleration, GPU + DL Acceleration, HW Accel.]
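oneDNN Graph API
• Forming graph: the DL framework passes its graph to the oneDNN Graph API with add_op()
• Backend decides partition: the framework calls get_partitions() and rewrites its graph around the result
• Backend compiles partition: compile()
• Backend executes compiled partition: execute()
[Figure: DL Framework (Framework Graph, Graph Rewrite, Framework Runtime Context), oneDNN Graph API, and oneDNN Graph Backend. Of graph nodes 1 to 4, nodes 4 and 2 are shown in the partition that is compiled and executed.]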
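The four steps on this slide (forming the graph, deciding partitions, compiling, executing) can be mimicked with a toy model. This is not the oneDNN Graph API: only the method names `add_op`, `get_partitions`, `compile`, and `execute` are borrowed from the slide, while the classes, the scalar kernels, and the grouping policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Scalar stand-ins for real kernels.
KERNELS: Dict[str, Callable[[float], float]] = {
    "scale": lambda x: 3.0 * x,
    "relu": lambda x: max(x, 0.0),
}


@dataclass
class Op:
    op_id: int
    kind: str


@dataclass
class CompiledPartition:
    fn: Callable[[float], float]

    def execute(self, x: float) -> float:
        return self.fn(x)


@dataclass
class Partition:
    ops: List[Op]
    supported: bool

    def compile(self) -> CompiledPartition:
        if not self.supported:
            raise ValueError("unsupported partition: the framework runs it itself")
        kinds = [op.kind for op in self.ops]

        def fused(x: float) -> float:   # one callable for the whole partition
            for kind in kinds:
                x = KERNELS[kind](x)
            return x

        return CompiledPartition(fused)


class ToyGraph:
    def __init__(self, supported_kinds: Set[str]) -> None:
        self.ops: List[Op] = []
        self.supported_kinds = supported_kinds

    def add_op(self, op: Op) -> None:
        self.ops.append(op)

    def get_partitions(self) -> List[Partition]:
        """Toy backend policy: group consecutive supported ops together."""
        partitions: List[Partition] = []
        for op in self.ops:
            ok = op.kind in self.supported_kinds
            if ok and partitions and partitions[-1].supported:
                partitions[-1].ops.append(op)
            else:
                partitions.append(Partition([op], ok))
        return partitions


g = ToyGraph(supported_kinds={"scale", "relu"})
for i, kind in enumerate(["scale", "relu", "custom", "scale"], start=1):
    g.add_op(Op(i, kind))

parts = g.get_partitions()
print([[op.op_id for op in p.ops] for p in parts])  # [[1, 2], [3], [4]]
compiled = parts[0].compile()
print(compiled.execute(-2.0))                       # 0.0
```

In this toy the backend, not the framework, decides which ops are grouped, which is the property the slide attributes to the Graph API.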
oneDNN Graph API Usage
Unified API for DL acceleration libraries targeting AI HWs
• Leverage oneDNN based framework integration and the oneDNN implementation: CPU (Intel®, ARM) and GPU (Intel®, NVIDIA GPU)
• Leverage oneDNN based framework integration and bring your own implementation based on the backend API: other implementations for accelerators, using oneDNN w/ Graph backend API
[Figure: in both cases the DL Framework (Framework Graph, Graph Rewrite, Framework Runtime Context) passes its graph to the oneDNN Graph API.]
* Other names and brands may be claimed as the property of others.
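A rough sketch of the "bring your own implementation" idea: the framework-side code stays the same while different backends plug in behind one interface. The `Backend` interface, the two backend classes, and `framework_run` are hypothetical illustrations and do not reflect the actual oneDNN backend API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Backend(ABC):
    """Hypothetical backend interface, for illustration only."""

    @abstractmethod
    def supports(self, op_kind: str) -> bool: ...

    @abstractmethod
    def compile(self, op_kinds: List[str]) -> Callable[[float], float]: ...


class CpuBackend(Backend):
    """Stand-in for a library-provided implementation: one kernel per op."""

    _kernels: Dict[str, Callable[[float], float]] = {
        "scale": lambda x: 2.0 * x,
        "relu": lambda x: max(x, 0.0),
    }

    def supports(self, op_kind: str) -> bool:
        return op_kind in self._kernels

    def compile(self, op_kinds: List[str]) -> Callable[[float], float]:
        fns = [self._kernels[kind] for kind in op_kinds]

        def run(x: float) -> float:
            for fn in fns:
                x = fn(x)
            return x

        return run


class AcceleratorBackend(Backend):
    """Stand-in for a vendor implementation with a single fused kernel."""

    def supports(self, op_kind: str) -> bool:
        return op_kind in ("scale", "relu")

    def compile(self, op_kinds: List[str]) -> Callable[[float], float]:
        if op_kinds == ["scale", "relu"]:
            return lambda x: max(2.0 * x, 0.0)
        raise NotImplementedError("this toy backend only fuses scale + relu")


BACKENDS: Dict[str, Backend] = {"cpu": CpuBackend(), "accelerator": AcceleratorBackend()}


def framework_run(device: str, op_kinds: List[str], x: float) -> float:
    """Framework integration code: identical for every device."""
    backend = BACKENDS[device]
    if not all(backend.supports(kind) for kind in op_kinds):
        raise ValueError(f"{device} backend cannot run {op_kinds}")
    return backend.compile(op_kinds)(x)


for device in ("cpu", "accelerator"):
    print(device, framework_run(device, ["scale", "relu"], 1.5))  # 3.0 on both
```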
Industry Momentum
• oneDNN implementation ported to the A64FX CPU used in Fugaku
• Optimized for the Armv8-A and SVE instruction set
• 9.3x speedup for TensorFlow ResNet-50 training and 7.8x for inference on A64FX
https://github.com/oneapi-src/oneDNN
https://blog.fltech.dev/entry/2020/11/19/fugaku-onednn-deep-dive-en
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Call to action
• Join us on this journey
• Hardware developers: read, provide feedback, and adopt oneDNN Graph for XPU computing!
https://spec.oneapi.com/onednn-graph/latest/
https://github.com/oneapi-src/oneDNN/tree/dev-graph
• Check out www.oneAPI.com for the oneAPI specification
• Software developers: try out oneAPI in the Intel DevCloud
https://software.intel.com/content/www/us/en/develop/tools/devcloud.html
Preview
Notices and Disclaimers
• Intel technologies may require enabled hardware, software or service activation.
• No product or component can be absolutely secure.
• Your costs and results may vary.
• © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
oneCCL
Specification
Thank You!
http://oneapi.com

Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| Software for AI Optimization Summit 2021 Technical Session
