© The Khronos® Group Inc. 2021 - Page 1
This work is licensed under a Creative Commons Attribution 4.0 International License
OpenCL Overview
Heterogeneous Parallel Computation
Neil Trevett
Khronos President and OpenCL Chair
VP Developer Ecosystems, NVIDIA
ntrevett@nvidia.com | @neilt3d
January 2021
Khronos Compute Acceleration Standards
[Diagram: Khronos compute standards arranged by level of abstraction]
• Higher-level Languages and APIs: streamlined development and performance portability
- Single source C++ programming with compute acceleration
- Graph-based vision and inferencing acceleration
• Lower-level APIs: direct hardware control
- GPU rendering + compute acceleration
- Heterogeneous compute acceleration
• Intermediate Representation (IR) supporting parallel execution and graphics
• Target hardware shown: GPU, CPU, FPGA, DSP, AI/Tensor HW, Custom Hardware
Increasing industry interest in parallel compute acceleration to combat the ‘End of Moore’s Law’
SYCL and SPIR were originally OpenCL Subgroups
OpenCL – Low-level Parallel Programming
Complements GPU-only APIs
Simpler programming model
Relatively lightweight run-time
More language flexibility, e.g., pointers
Rigorously defined numeric precision
[Diagram: a host CPU uses the runtime OpenCL API to compile, load and execute OpenCL C kernel code across OpenCL devices (GPU, DSP, CPU, FPGA, NN HW)]
Programming and Runtime Framework
for Application Acceleration
Offload compute-intensive kernels onto parallel
heterogeneous processors
CPUs, GPUs, DSPs, FPGAs, Tensor Processors
OpenCL C or C++ kernel languages
Platform Layer API
Query, select and initialize compute devices
Runtime API
Build and execute kernel programs on multiple devices
Explicit Application Control
Which programs execute on what device
Where data is stored in memories in the system
When programs are run, and what operations are
dependent on earlier operations
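The offload model above can be illustrated with a minimal pure-Python sketch. It is only an emulation of the idea that a runtime invokes one kernel instance per work-item; `vector_add` and `run_ndrange` are hypothetical names chosen for illustration and are not OpenCL API calls. A real application would write the kernel in OpenCL C or C++ for OpenCL and enqueue it through the Runtime API.

```python
# Illustrative model only: emulates how an OpenCL-style runtime invokes a
# kernel once per work-item. Names here are hypothetical, not OpenCL API.

def vector_add(global_id, a, b, out):
    """Kernel body: each work-item handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def run_ndrange(kernel, global_size, *args):
    """Sequentially emulate a 1-D launch of `kernel` over `global_size` items.

    A real device would run these work-items in parallel; the loop order
    here is an artifact of the emulation.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
run_ndrange(vector_add, len(a), a, b, out)
print(out)
```

Because the kernel only touches the element selected by its own work-item index, the work-items are independent, which is what lets a device execute them in any order or all at once.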
OpenCL is Widely Deployed and Used
The industry’s most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming
[Diagram: logos of accelerated implementations and of software that uses OpenCL; the text labels that survive are listed below]
• Desktop Creative Apps: Modo, Vegas Pro
• Linear Algebra Libraries: CLBlast, SYCL-BLAS
• Machine Learning Libraries and Frameworks: Arm Compute Library, SYCL-DNN, TI DL Library (TIDL), clDNN, Synopsys MetaWare EV, NNAPI
• Molecular Modelling Libraries: ForceBalance
• Other categories shown: Parallel Languages, Math and Physics Libraries, Vision, Imaging and Video Libraries, Machine Learning Compilers
• Vendor labels shown: VeriSilicon, Xiaomi, Intel
https://en.wikipedia.org/wiki/List_of_OpenCL_applications
OpenCL Open-Source Ecosystem Momentum
Tripling in under four years
April 2020, Folding@Home hit a new record of 2.4 exaflops,
faster than the top 500 traditional supercomputers combined,
thanks to almost 1 million new members of the network.
Folding@Home uses OpenCL to offload computations onto the
GPUs contained in the networked home PCs
October 2020, SARS-CoV-2 Simulations Go Exascale to Capture Spike Opening
with over a million citizen scientists banding together through the
Folding@home distributed computing project to create the first Exascale
computer and simulate an unprecedented 0.1 seconds of the viral proteome
OpenCL 3.0
[Diagram: C++ for OpenCL combines OpenCL C (kernels, address spaces, special types, ...) with most of C++17 (inheritance, templates, type deduction, ...)]
Increased Ecosystem Flexibility
All functionality beyond OpenCL 1.2 queryable plus
macros for optional OpenCL C language features
New extensions that become widely adopted will be
integrated into new OpenCL core specifications
C++ for OpenCL
Open-source C++ for OpenCL front end compiler
combines OpenCL C and C++17 replacing
OpenCL C++ language specification
Unified Specification
All versions of OpenCL in one specification for easier
maintenance, evolution and accessibility
Source on Khronos GitHub for community feedback,
functionality requests and bug fixes
Moving Applications to OpenCL 3.0
OpenCL 1.2 applications – no change
OpenCL 2.X applications - no code changes if all used
functionality is present
Queries recommended for future portability
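The portability rule above (a 2.X application runs unchanged only if every optional feature it uses is present) can be sketched as a simple set comparison. This is an illustrative Python model, not host code: the two device feature sets are made up, and `missing_features` is a hypothetical helper. The feature names follow the OpenCL C 3.0 feature-macro naming, for example `__opencl_c_pipes`.

```python
# Illustrative model only: the device feature sets below are made up, and
# `missing_features` is a hypothetical helper, not part of any OpenCL binding.

def missing_features(required, reported):
    """Return the required optional features a device does not report, sorted."""
    return sorted(set(required) - set(reported))


# Optional features a hypothetical OpenCL 2.x application relies on.
required = {"__opencl_c_generic_address_space", "__opencl_c_pipes"}

# What two hypothetical OpenCL 3.0 devices report when queried.
device_a = {"__opencl_c_generic_address_space", "__opencl_c_pipes", "__opencl_c_fp64"}
device_b = {"__opencl_c_generic_address_space"}

for name, reported in (("device_a", device_a), ("device_b", device_b)):
    gaps = missing_features(required, reported)
    status = "runs unchanged" if not gaps else "needs a fallback for " + ", ".join(gaps)
    print(name, status)
```

Running the check at start-up, rather than assuming a feature exists, is the kind of query the slide recommends for future portability.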
C++ for OpenCL
Supported by Clang and uses the LLVM
compiler infrastructure
OpenCL C code is valid and fully compatible
Supports most C++17 features
Generates SPIR-V kernels
Asynchronous DMA Extensions
OpenCL embraces a new class of Embedded Processors
Many DSP-like devices have Direct Memory Access hardware
Transfer data between global and local memories via DMA transactions
Transactions run asynchronously, in parallel with device compute; a kernel can wait for transactions to complete
Multiple transactions can be queued to run concurrently or in order via fences
OpenCL abstracts DMA capabilities via extended asynchronous workgroup copy built-ins
(New!) 2- and 3-dimensional async workgroup copy extensions support complex memory transfers
(New!) async workgroup fence built-in controls execution order of dependent transactions
New extensions complement the existing 1-dimensional async workgroup copy built-ins
Async Fence controls order of dependent transactions
All transactions prior to async_fence must complete
before any new transaction starts, without a
synchronous wait
[Diagram: async_copy1 and async_copy2 are queued, then async_fence, then async_copy3]
Async 3D-3D Copy Transaction
[Diagram: a copy transaction between a global volume and a local volume; reshaping is possible, with Vglobal = Vlocal]
The first of significant upcoming advances in OpenCL to
enhance support for embedded processors
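The fence ordering rule on this slide can be modelled in a few lines of Python. This is a sketch of the ordering semantics only, not of the extension's built-ins: `FENCE` and `batches_between_fences` are hypothetical names, and the queue is represented as a plain list of labels matching the slide's diagram.

```python
# Illustrative model only: groups queued DMA transactions into batches that
# may overlap. `FENCE` and `batches_between_fences` are hypothetical names.

FENCE = "async_fence"


def batches_between_fences(queue):
    """Split a transaction queue into batches separated by fences.

    Transactions inside one batch may run concurrently; every transaction
    in a batch must complete before any transaction in the next batch starts.
    """
    batches, current = [], []
    for op in queue:
        if op == FENCE:
            if current:
                batches.append(current)
                current = []
        else:
            current.append(op)
    if current:
        batches.append(current)
    return batches


queue = ["async_copy1", "async_copy2", FENCE, "async_copy3"]
print(batches_between_fences(queue))
```

In this model async_copy1 and async_copy2 may overlap, while async_copy3 starts only after both have completed, and nothing in the model blocks the issuing code, which mirrors the "without a synchronous wait" wording above.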
Roadmap: External Memory Sharing
• Generic extension to import external memory and semaphores exported by other APIs
- Explicitly hand-off memory ownership with OpenCL
- Wait and signal imported external semaphores
• Layer with API-specific interop extensions
- Vulkan interop first
- DX12 and other APIs in the future
• Improved flexibility over previous interop APIs using implicit resources
- As were used for DX9-11 and OpenGL
[Diagram: Vulkan and OpenCL interop. OpenCL imports handles to memory and semaphores, which are then used to synchronize memory access and ownership]
Google Ports TensorFlow Lite to OpenCL
OpenCL provides ~2x inferencing speedup over OpenGL ES acceleration
TensorFlow Lite uses OpenGL ES as a backup if OpenCL is not available …
… but most mobile GPU vendors provide OpenCL drivers, even if they are not exposed directly to Android developers
OpenCL is increasingly used as an acceleration target for higher-level frameworks and compilers
ML Compiler Steps
1. Import trained network description
2. Apply graph-level optimizations, e.g., node fusion, node lowering and memory tiling
3. Decompose to primitive instructions and emit programs for accelerated run-times
Consistent Steps
Fast progress, but still an area of intense research
If compiler optimizations are effective - hardware accelerator APIs can stay ‘simple’ and
won’t need complex metacommands (e.g., combined primitive commands like DirectML)
Embedded NN Compilers
CEVA Deep Neural Network (CDNN)
Cadence Xtensa Neural Network Compiler (XNNC)
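Node fusion, one of the graph-level optimizations named in step 2, can be sketched on a toy network. This Python example makes strong simplifying assumptions: the network is a linear list of op names rather than a graph, and the single fusion rule (a `conv` followed by a `relu`) is an assumed example, not a rule taken from any of the compilers mentioned on this slide.

```python
# Illustrative model only: a network is a linear list of op names, and the
# fusion rule (conv followed by relu) is an assumed example, not a rule taken
# from any particular compiler.

FUSABLE = {("conv", "relu"): "conv_relu"}


def fuse_nodes(ops):
    """Greedily fuse adjacent op pairs listed in FUSABLE, left to right."""
    fused, i = [], 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSABLE:
            fused.append(FUSABLE[pair])
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused


network = ["conv", "relu", "pool", "conv", "relu", "softmax"]
print(fuse_nodes(network))
```

Each fused node stands for one program emitted in step 3 instead of two, which is why effective fusion in the compiler can keep the accelerator API itself simple.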
SPIR-V Language Ecosystem
[Diagram: SPIR-V language ecosystem; the labels that survive are grouped below]
• Source languages: OpenCL C, C++ for OpenCL, SYCL, GLSL, HLSL
• Compilers and front ends: Clang and LLVM with the SPIR-V LLVM IR Translator, clspv, triSYCL, Intel DPC++, Codeplay ComputeCpp, glslang, DXC (HLSL, DXIL), IREE
• SPIR-V Tools: (Dis)Assembler, Validator, Optimize/Remap, Fuzzer, Reducer
• SPIRV-Cross: GLSL, HLSL, Metal Shading Language
• Environment Specs: OpenCL (with OpenCL C online compilation), Vulkan
• OpenCLOn12, incl. Mesa SPIR-V to DXIL
• Legend: Khronos Open Source, 3rd Party Open Source, Language Definitions, Closed Source
SPIR-V enables a rich ecosystem of languages and compilers to
target low-level APIs such as Vulkan and OpenCL, including
deployment flexibility: e.g., running OpenCL C kernels on Vulkan
Layered OpenCL over Vulkan
• Clspv – Google’s open-source OpenCL kernel to Vulkan SPIR-V compiler
- Tracks top-of-tree LLVM and Clang, not a fork
• Clvk – prototype open-source OpenCL to Vulkan run-time API translator
• Used for shipping production apps and engines on Android
- Adobe Premiere Rush video editor – 200K lines of OpenCL C kernel code
- Butterfly Network iQ Ultrasound on Android
- Experimenting with Xiaomi MACE inferencing engine
[Diagram: OpenCL C or C++ for OpenCL kernel sources are compiled by the Clang+Clspv compiler; OpenCL application host code calls the Clvk run-time API translator; both target the Vulkan runtime]
https://github.com/kpet/clvk
https://github.com/google/clspv
Layered OpenCL over DirectX12
• GPU-accelerated OpenCL on any system with DX12
- PC (x86 or Arm) and Cloud
• OpenCLOn12 - Microsoft and Collabora leveraging Clang/LLVM and Mesa
- OpenCL 1.2 over DX12 is in development
- Also, OpenGLOn12 – OpenGL 3.3 over DX12
- https://devblogs.microsoft.com/directx/in-the-works-opencl-and-opengl-mapping-layers-to-directx/
[Diagram: OpenCL C or C++ for OpenCL kernel sources pass through Clang+LLVM+SPIR-V LLVM and then Mesa SPIR-V to DXIL, which translates through Mesa’s NIR Intermediate Representation to produce DXIL; OpenCL application host code calls the CLOn12 run-time API translator; both target the DX12 runtime]
Get Involved!
• OpenCL 3.0 increases deployment flexibility and
sets the stage for raising the bar on pervasively available functionality
- https://www.khronos.org/registry/OpenCL/
• OpenCL specification feedback on GitHub
- https://github.com/KhronosGroup/OpenCL-Docs/issues
• We want to know what you need next from OpenCL on the Khronos Forums!
- https://community.khronos.org/c/opencl
• Engage with Khronos and help OpenCL evolve
- Join as a Khronos member for a voice and a vote in any Khronos standard
- Or request an invite to the OpenCL Advisory Panel
- https://www.khronos.org/members/
• Neil Trevett
- ntrevett@nvidia.com
- @neilt3d

More Related Content

What's hot

ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...Kuralamudhan Ramakrishnan
 
Construire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeConstruire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeOpen Source Experience
 
Container Native Development Tools - Talk by Mickey Boxell
Container Native Development Tools - Talk by Mickey BoxellContainer Native Development Tools - Talk by Mickey Boxell
Container Native Development Tools - Talk by Mickey BoxellOracle Developers
 
Apache DeviceMap - ApacheCon Europe 2014
Apache DeviceMap - ApacheCon Europe 2014Apache DeviceMap - ApacheCon Europe 2014
Apache DeviceMap - ApacheCon Europe 2014Werner Keil
 
Cloud native buildpacks_collabnix
Cloud native buildpacks_collabnixCloud native buildpacks_collabnix
Cloud native buildpacks_collabnixSuman Chakraborty
 
P to V to C: The Value of Bringing “Everything” to Containers
P to V to C: The Value of Bringing “Everything” to ContainersP to V to C: The Value of Bringing “Everything” to Containers
P to V to C: The Value of Bringing “Everything” to ContainersVMware Tanzu
 
Cloud native buildpacks-cncf
Cloud native buildpacks-cncfCloud native buildpacks-cncf
Cloud native buildpacks-cncfSuman Chakraborty
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
PKS is Not JAK8sP (Just Another Kubernetes Platform)
PKS is Not JAK8sP (Just Another Kubernetes Platform)PKS is Not JAK8sP (Just Another Kubernetes Platform)
PKS is Not JAK8sP (Just Another Kubernetes Platform)VMware Tanzu
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
cncf overview and building edge computing using kubernetes
cncf overview and building edge computing using kubernetescncf overview and building edge computing using kubernetes
cncf overview and building edge computing using kubernetesKrishna-Kumar
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...mfrancis
 

What's hot (20)

NFV features in kubernetes
NFV features in kubernetesNFV features in kubernetes
NFV features in kubernetes
 
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
ONS 2018 LA - Intel Tutorial: Cloud Native to NFV - Alon Bernstein, Cisco & K...
 
Construire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeConstruire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edge
 
Shree duth awasthi_cv
Shree duth awasthi_cvShree duth awasthi_cv
Shree duth awasthi_cv
 
Shree_Duth_Awasthi_Resume
Shree_Duth_Awasthi_ResumeShree_Duth_Awasthi_Resume
Shree_Duth_Awasthi_Resume
 
Container Native Development Tools - Talk by Mickey Boxell
Container Native Development Tools - Talk by Mickey BoxellContainer Native Development Tools - Talk by Mickey Boxell
Container Native Development Tools - Talk by Mickey Boxell
 
Apache DeviceMap - ApacheCon Europe 2014
Apache DeviceMap - ApacheCon Europe 2014Apache DeviceMap - ApacheCon Europe 2014
Apache DeviceMap - ApacheCon Europe 2014
 
Cloud native buildpacks_collabnix
Cloud native buildpacks_collabnixCloud native buildpacks_collabnix
Cloud native buildpacks_collabnix
 
P to V to C: The Value of Bringing “Everything” to Containers
P to V to C: The Value of Bringing “Everything” to ContainersP to V to C: The Value of Bringing “Everything” to Containers
P to V to C: The Value of Bringing “Everything” to Containers
 
Cloud native buildpacks-cncf
Cloud native buildpacks-cncfCloud native buildpacks-cncf
Cloud native buildpacks-cncf
 
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
 
Considering Bare Metal
Considering Bare MetalConsidering Bare Metal
Considering Bare Metal
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
PKS is Not JAK8sP (Just Another Kubernetes Platform)
PKS is Not JAK8sP (Just Another Kubernetes Platform)PKS is Not JAK8sP (Just Another Kubernetes Platform)
PKS is Not JAK8sP (Just Another Kubernetes Platform)
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
cncf overview and building edge computing using kubernetes
cncf overview and building edge computing using kubernetescncf overview and building edge computing using kubernetes
cncf overview and building edge computing using kubernetes
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...
Building Server-Side Eclipse based Web applications - Jochen Hiller, Principa...
 

Similar to OpenCL Overview Japan Virtual Open House Feb 2021

“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...Edge AI and Vision Alliance
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...Edge AI and Vision Alliance
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...Edge AI and Vision Alliance
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...Edge AI and Vision Alliance
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...Edge AI and Vision Alliance
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
oneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductoneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductTyrone Systems
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...Edge AI and Vision Alliance
 
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation..."Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...Edge AI and Vision Alliance
 
UniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtimeUniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtimeLee Calcote
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summits
 
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati..."The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...Edge AI and Vision Alliance
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVGhodhbane Mohamed Amine
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0Marcel Mitran
 
Docker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersDocker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersRyan Hodgin
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareMark Hinkle
 
ERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projectsERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projectsChristian Charreyre
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureBruno Cornec
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Pradeep Singh
 

Similar to OpenCL Overview Japan Virtual Open House Feb 2021 (20)

“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
oneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductoneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel Product
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
 
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation..."Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
 
UniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtimeUniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtime
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
 
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati..."The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFV
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0
 
Docker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersDocker Overview - Rise of the Containers
Docker Overview - Rise of the Containers
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source Software
 
ERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projectsERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projects
 
Linux internals v4
Linux internals v4Linux internals v4
Linux internals v4
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
 

More from The Khronos Group Inc.

Vulkan Ray Tracing Update JP Translation
Vulkan Ray Tracing Update JP TranslationVulkan Ray Tracing Update JP Translation
Vulkan Ray Tracing Update JP TranslationThe Khronos Group Inc.
 

More from The Khronos Group Inc. (20)

OpenXR 1.0 Reference Guide
OpenXR 1.0 Reference GuideOpenXR 1.0 Reference Guide
OpenXR 1.0 Reference Guide
 
Vulkan Ray Tracing Update JP Translation
Vulkan Ray Tracing Update JP TranslationVulkan Ray Tracing Update JP Translation
Vulkan Ray Tracing Update JP Translation
 
Vulkan ML JP Translation
Vulkan ML JP TranslationVulkan ML JP Translation
Vulkan ML JP Translation
 
OpenCL Overview JP Translation
OpenCL Overview JP TranslationOpenCL Overview JP Translation
OpenCL Overview JP Translation
 
glTF overview JP Translation
glTF overview JP TranslationglTF overview JP Translation
glTF overview JP Translation
 
Khronos Overview JP Translation
Khronos Overview JP TranslationKhronos Overview JP Translation
Khronos Overview JP Translation
 
OpenCL 3.0 Reference Guide
OpenCL 3.0 Reference GuideOpenCL 3.0 Reference Guide
OpenCL 3.0 Reference Guide
 
OpenVX 1.3 Reference Guide
OpenVX 1.3 Reference GuideOpenVX 1.3 Reference Guide
OpenVX 1.3 Reference Guide
 
OpenXR 0.90 Overview Guide
OpenXR 0.90 Overview GuideOpenXR 0.90 Overview Guide
OpenXR 0.90 Overview Guide
 
Vulkan 1.1 Reference Guide
Vulkan 1.1 Reference GuideVulkan 1.1 Reference Guide
Vulkan 1.1 Reference Guide
 
SYCL 1.2.1 Reference Card
SYCL 1.2.1 Reference CardSYCL 1.2.1 Reference Card
SYCL 1.2.1 Reference Card
 
OpenCL 2.2 Reference Guide
OpenCL 2.2 Reference GuideOpenCL 2.2 Reference Guide
OpenCL 2.2 Reference Guide
 
OpenGL 4.6 Reference Guide
OpenGL 4.6 Reference GuideOpenGL 4.6 Reference Guide
OpenGL 4.6 Reference Guide
 
glTF 2.0 Reference Guide
glTF 2.0 Reference GuideglTF 2.0 Reference Guide
glTF 2.0 Reference Guide
 
OpenVX 1.2 Reference Guide
OpenVX 1.2 Reference GuideOpenVX 1.2 Reference Guide
OpenVX 1.2 Reference Guide
 
WebGL 2.0 Reference Guide
WebGL 2.0 Reference GuideWebGL 2.0 Reference Guide
WebGL 2.0 Reference Guide
 
OpenGL SC 2.0 Quick Reference
OpenGL SC 2.0 Quick ReferenceOpenGL SC 2.0 Quick Reference
OpenGL SC 2.0 Quick Reference
 
OpenVX 1.1 Reference Guide
OpenVX 1.1 Reference GuideOpenVX 1.1 Reference Guide
OpenVX 1.1 Reference Guide
 
Vulkan 1.0 Quick Reference
Vulkan 1.0 Quick ReferenceVulkan 1.0 Quick Reference
Vulkan 1.0 Quick Reference
 
OpenCL 2.1 Reference Guide
OpenCL 2.1 Reference GuideOpenCL 2.1 Reference Guide
OpenCL 2.1 Reference Guide
 

OpenCL Overview Japan Virtual Open House Feb 2021

  • 1. © The Khronos® Group Inc. 2021 - Page 1 This work is licensed under a Creative Commons Attribution 4.0 International License OpenCL Overview Heterogeneous Parallel Computation Neil Trevett Khronos President and OpenCL Chair VP Developer Ecosystems, NVIDIA ntrevett@nvidia.com|@neilt3d January 2021
  • 2. Khronos Compute Acceleration Standards
      • Lower-level APIs: direct hardware control
          - GPU rendering + compute acceleration
          - Heterogeneous compute acceleration
      • Higher-level languages and APIs: streamlined development and performance portability
          - Single source C++ programming with compute acceleration
          - Graph-based vision and inferencing acceleration
      • Intermediate Representation (IR) supporting parallel execution and graphics
      • Target hardware shown: GPU, CPU, FPGA, DSP, AI/Tensor HW, custom hardware
      • Increasing industry interest in parallel compute acceleration to combat the ‘End of Moore’s Law’
      • SYCL and SPIR were originally OpenCL subgroups
  • 3. OpenCL – Low-level Parallel Programming
      • Complements GPU-only APIs
          - Simpler programming model
          - Relatively lightweight run-time
          - More language flexibility, e.g., pointers
          - Rigorously defined numeric precision
      • Programming and runtime framework for application acceleration
          - Offload compute-intensive kernels onto parallel heterogeneous processors: CPUs, GPUs, DSPs, FPGAs, Tensor Processors
          - OpenCL C or C++ kernel languages
      • Platform Layer API: query, select and initialize compute devices
      • Runtime API: build and execute kernel programs on multiple devices
      • Explicit application control
          - Which programs execute on which device
          - Where data is stored in memories in the system
          - When programs are run, and which operations depend on earlier operations
      • [Diagram: a host CPU uses the runtime OpenCL API to compile, load and execute OpenCL C kernel code across OpenCL devices (GPU, DSP, CPU, FPGA, NN HW)]
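The last point on slide 3, that the application states which operations depend on earlier ones, can be pictured as a small dependency graph. The Python sketch below is only a model of that idea, not OpenCL host code: the command names, the `after` field, and the `execution_order` helper are all hypothetical, and a real application would express the same relationships through the OpenCL runtime API.

```python
from graphlib import TopologicalSorter

# Hypothetical commands: each names the device it targets and the commands it waits on.
commands = {
    "write_input": {"device": "GPU", "after": []},
    "run_kernel": {"device": "GPU", "after": ["write_input"]},
    "read_output": {"device": "GPU", "after": ["run_kernel"]},
}

def execution_order(cmds):
    """Return one order in which the commands can run while respecting dependencies."""
    graph = {name: set(info["after"]) for name, info in cmds.items()}
    return list(TopologicalSorter(graph).static_order())

print(execution_order(commands))  # ['write_input', 'run_kernel', 'read_output']
```

Because the three commands form a single chain, only one valid order exists; commands without a dependency between them could be scheduled in either order.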
  • 4. OpenCL is Widely Deployed and Used
      • The industry’s most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming
      • Categories shown: accelerated implementations; desktop creative apps; linear algebra libraries; parallel languages; math and physics libraries; vision, imaging and video libraries; machine learning libraries and frameworks; machine learning compilers; molecular modelling libraries
      • Names appearing as text: Modo, Vegas Pro, CLBlast, SYCL-BLAS, Arm Compute Library, SYCL-DNN, TI DL Library (TIDL), VeriSilicon, Xiaomi, clDNN, Intel, Synopsys MetaWare EV, NNAPI, ForceBalance
      • https://en.wikipedia.org/wiki/List_of_OpenCL_applications
  • 5. OpenCL Open-Source Ecosystem Momentum
      • Tripling in under four years
      • April 2020: Folding@Home hit a new record of 2.4 exaflops, faster than the top 500 traditional supercomputers combined, thanks to almost 1 million new members of the network. Folding@Home uses OpenCL to offload computations onto the GPUs contained in the networked home PCs.
      • October 2020: "SARS-CoV-2 Simulations Go Exascale to Capture Spike Opening", with over a million citizen scientists banding together through the Folding@home distributed computing project to create the first Exascale computer and simulate an unprecedented 0.1 seconds of the viral proteome.
  • 6. OpenCL 3.0
      • Increased ecosystem flexibility
          - All functionality beyond OpenCL 1.2 queryable, plus macros for optional OpenCL C language features
          - New extensions that become widely adopted will be integrated into new OpenCL core specifications
      • C++ for OpenCL
          - Open-source C++ for OpenCL front end compiler combines OpenCL C and C++17, replacing the OpenCL C++ language specification
          - Supported by Clang and uses the LLVM compiler infrastructure
          - OpenCL C code is valid and fully compatible
          - Supports most C++17 features
          - Generates SPIR-V kernels
      • Unified specification
          - All versions of OpenCL in one specification for easier maintenance, evolution and accessibility
          - Source on Khronos GitHub for community feedback, functionality requests and bug fixes
      • Moving applications to OpenCL 3.0
          - OpenCL 1.2 applications: no change
          - OpenCL 2.X applications: no code changes if all used functionality is present
          - Queries recommended for future portability
      • [Diagram: OpenCL C (kernels, address spaces, special types, ...) combined with most of C++17 (inheritance, templates, type deduction, ...) gives C++ for OpenCL]
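Slide 6 recommends querying for optional functionality rather than assuming it is present. As a rough illustration of that decision (a model, not OpenCL API code), the Python sketch below treats a device as a set of reported feature names and checks whether everything an application relies on is there. The `missing_features` helper and both example device sets are hypothetical; the strings follow the `__opencl_c_*` naming used for OpenCL 3.0 optional-feature macros, and a real application would obtain them from the device at run time.

```python
def missing_features(required, reported):
    """Return the required features that the device does not report, sorted by name."""
    return sorted(set(required) - set(reported))

# Hypothetical devices: one reports everything the application uses, the other a minimal set.
full_device = {
    "__opencl_c_generic_address_space",
    "__opencl_c_pipes",
    "__opencl_c_fp64",
    "__opencl_c_images",
}
minimal_device = {"__opencl_c_images"}

# Features a hypothetical OpenCL 2.x style application depends on.
app_needs = ["__opencl_c_generic_address_space", "__opencl_c_fp64"]

for name, device in [("full", full_device), ("minimal", minimal_device)]:
    gaps = missing_features(app_needs, device)
    if gaps:
        print(f"{name}: needs a fallback path for {gaps}")
    else:
        print(f"{name}: runs unchanged")
```

An empty result corresponds to the "no code changes if all used functionality is present" case on the slide; a non-empty one tells the application which code paths need an alternative.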
  • 7. Asynchronous DMA Extensions
      • OpenCL embraces a new class of embedded processors
          - Many DSP-like devices have Direct Memory Access hardware
          - Transfer data between global and local memories via DMA transactions
          - Transactions run asynchronously, in parallel with device compute, and the kernel can wait for transactions to complete
          - Multiple transactions can be queued to run concurrently, or in order via fences
      • OpenCL abstracts DMA capabilities via extended asynchronous workgroup copy built-ins
          - (New!) 2- and 3-dimensional async workgroup copy extensions support complex memory transfers
          - (New!) async workgroup fence built-in controls execution order of dependent transactions
          - New extensions complement the existing 1-dimensional async workgroup copy built-ins
      • Async fence controls the order of dependent transactions: all transactions prior to async_fence must complete before any new transaction starts, without a synchronous wait
      • [Diagram: async_copy1 and async_copy2, then async_fence, then async_copy3]
      • [Diagram: async 3D-3D copy transaction between a global volume and a local volume; reshaping is possible when Vglobal = Vlocal]
      • The first of significant upcoming advances in OpenCL to enhance support for embedded processors
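The fence rule on slide 7 can be modelled as splitting a queue of transactions into batches: everything issued before a fence must finish before anything issued after it starts, while transactions inside one batch may overlap. The Python sketch below only models that ordering rule and is not device code; `schedule_batches` and the use of the string `"async_fence"` as a marker are made up for the illustration.

```python
FENCE = "async_fence"  # marker standing in for a fence in the queue (illustrative only)

def schedule_batches(ops):
    """Group queued transactions into batches separated by fences.

    Transactions in the same batch may run concurrently; a batch starts
    only after every transaction in the previous batch has completed.
    """
    batches, current = [], []
    for op in ops:
        if op == FENCE:
            if current:  # ignore a fence with nothing queued before it
                batches.append(current)
                current = []
        else:
            current.append(op)
    if current:
        batches.append(current)
    return batches

queue = ["async_copy1", "async_copy2", FENCE, "async_copy3"]
print(schedule_batches(queue))  # [['async_copy1', 'async_copy2'], ['async_copy3']]
```

This mirrors the slide's diagram: the first two copies may overlap, and the third starts only once both have completed.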
  • 8. Roadmap: External Memory Sharing
      • Generic extension to import external memory and semaphores exported by other APIs
          - Explicitly hand off memory ownership with OpenCL
          - Wait and signal imported external semaphores
      • Layer with API-specific interop extensions
          - Vulkan interop first
          - DX12 and other APIs in the future
      • Improved flexibility over previous interop APIs using implicit resources
          - As were used for DX9-11 and OpenGL
      • [Diagram: Vulkan and OpenCL interop: import handles to memory and semaphores; synchronize memory access and ownership]
  • 9. Google Ports TensorFlow Lite to OpenCL
      • OpenCL providing ~2x inferencing speedup over OpenGL ES acceleration
      • TensorFlow Lite uses OpenGL ES as a backup if OpenCL is not available...
      • ...but most mobile GPU vendors provide OpenCL drivers, even if not exposed directly to Android developers
      • OpenCL is increasingly used as an acceleration target for higher-level frameworks and compilers
  • 10. ML Compiler Steps
      • Consistent steps
          1. Import trained network description
          2. Apply graph-level optimizations, e.g., node fusion, node lowering and memory tiling
          3. Decompose to primitive instructions and emit programs for accelerated run-times
      • Fast progress but still an area of intense research
      • If compiler optimizations are effective, hardware accelerator APIs can stay ‘simple’ and won’t need complex metacommands (e.g., combined primitive commands like DirectML)
      • Embedded NN compilers: CEVA Deep Neural Network (CDNN), Cadence Xtensa Neural Network Compiler (XNNC)
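Step 2 on slide 10 names node fusion as a graph-level optimization. As a minimal sketch of the idea, assuming a network that is just a linear chain of named operations, the Python below merges each `conv` that is immediately followed by `relu` into a single fused node, so that a backend could emit one kernel for the pair. The operation names and the `fuse_pairs` helper are hypothetical and are not taken from any particular compiler.

```python
def fuse_pairs(ops, first="conv", second="relu"):
    """Replace each adjacent (first, second) pair in a linear op chain with one fused op."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == first and ops[i + 1] == second:
            fused.append(f"{first}+{second}")
            i += 2  # both ops are consumed by the fused node
        else:
            fused.append(ops[i])
            i += 1
    return fused

network = ["conv", "relu", "pool", "conv", "relu", "dense"]
print(fuse_pairs(network))  # ['conv+relu', 'pool', 'conv+relu', 'dense']
```

Real compilers work on general graphs rather than chains and must check that the intermediate result has no other consumers before fusing; the chain form keeps the sketch short.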
  • 11. SPIR-V Language Ecosystem
      • SPIR-V enables a rich ecosystem of languages and compilers to target low-level APIs such as Vulkan and OpenCL, including deployment flexibility: e.g., running OpenCL C kernels on Vulkan
      • [Diagram legend: Khronos open source; 3rd party open source; language definitions; closed source]
      • [Diagram elements: OpenCL C, C++ for OpenCL, SYCL, GLSL, HLSL, Metal Shading Language; LLVM, Clang, clspv, glslang, DXC, triSYCL, Intel DPC++, Codeplay ComputeCpp, SPIR-V LLVM IR Translator, SPIRV-Cross, Mesa SPIR-V to DXIL, OpenCLOn12, IREE; SPIR-V Tools ((dis)assembler, validator, optimize/remap, fuzzer, reducer); environment specs for OpenCL and Vulkan; OpenCL C online compilation; DXIL]
  • 12. Layered OpenCL over Vulkan
      • Clspv – Google’s open-source OpenCL kernel to Vulkan SPIR-V compiler
          - Tracks top-of-tree LLVM and Clang, not a fork
          - https://github.com/google/clspv
      • Clvk – prototype open-source OpenCL to Vulkan run-time API translator
          - https://github.com/kpet/clvk
      • Used for shipping production apps and engines on Android
          - Adobe Premiere Rush video editor – 200K lines of OpenCL C kernel code
          - Butterfly Network iQ Ultrasound on Android
          - Experimenting with Xiaomi MACE inferencing engine
      • [Diagram: OpenCL C or C++ for OpenCL kernel sources go through the Clang+Clspv compiler; OpenCL application host code goes through the clvk run-time API translator; both target the Vulkan runtime]
  • 13. Layered OpenCL over DirectX12
      • GPU-accelerated OpenCL on any system with DX12
          - PC (x86 or Arm) and Cloud
      • OpenCLOn12
          - Microsoft and Collabora leveraging Clang/LLVM and Mesa
          - OpenCL 1.2 over DX12 is in development
          - Also, OpenGLOn12 – OpenGL 3.3 over DX12
          - https://devblogs.microsoft.com/directx/in-the-works-opencl-and-opengl-mapping-layers-to-directx/
      • [Diagram: OpenCL C or C++ for OpenCL kernel sources go through Clang+LLVM+SPIR-V LLVM and then Mesa SPIR-V to DXIL, which translates through Mesa’s NIR intermediate representation to produce DXIL; OpenCL application host code goes through the CLOn12 run-time API translator; both target the DX12 runtime]
  • 14. Get Involved!
      • OpenCL 3.0 increases deployment flexibility and sets the stage for raising the bar on pervasively available functionality
          - https://www.khronos.org/registry/OpenCL/
      • OpenCL specification feedback on GitHub
          - https://github.com/KhronosGroup/OpenCL-Docs/issues
      • We want to know what you need next from OpenCL on the Khronos Forums!
          - https://community.khronos.org/c/opencl
      • Engage with Khronos and help OpenCL evolve
          - Join as a Khronos member for a voice and a vote in any Khronos standard
          - Or request an invite to the OpenCL Advisory Panel
          - https://www.khronos.org/members/
      • Neil Trevett
          - ntrevett@nvidia.com
          - @neilt3d