"Update on Khronos Standards for Vision and Machine Learning," a Presentation from the Khronos Group

For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sept-2018-alliance-vitf-khronos

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Neil Trevett, President of the Khronos Group, delivers the presentation "Update on Khronos Standards for Vision and Machine Learning" at the Embedded Vision Alliance's September 2018 Vision Industry and Technology Forum. Trevett shares updates on recent, current and planned Khronos standardization activities aimed at streamlining the deployment of embedded vision and AI.

  1. Vision and Inferencing Update, September 2018
     Neil Trevett | Khronos President | NVIDIA VP Developer Ecosystem
     ntrevett@nvidia.com | @neilt3d | www.khronos.org
     Copyright © Khronos® Group Inc. 2018
  2. Khronos Mission
     Khronos is an international industry consortium creating royalty-free, open standards to enable software to access hardware acceleration for 3D graphics, virtual and augmented reality, parallel computing, neural networks and vision processing.
     Diagram labels: Software; Silicon
  3. Khronos Primary Standards
     Diagram labels: 3D Graphics; VR and AR; Heterogeneous Compute (Parallel Processing); Vision and Inferencing; APIs; File Formats
  4. Topics for Today
     1. Landscape of vision and inferencing acceleration
     2. Where Khronos open standards provide non-proprietary ecosystem choices
     3. New updates on Khronos vision and inferencing standards
  5. Machine Learning Acceleration
     Diagram labels: Training on Desktop and Cloud; Training Data Sets OR Live Data; Neural Net Training Frameworks; Desktop and Cloud GPU/TPU Acceleration; Trained Networks; Optimization; Deployment on Embedded Devices; Vision and Inferencing Applications; Vision and Inferencing Runtimes; Diverse Inferencing Acceleration Hardware
  6. Machine Learning Training
     GPUs have well-established APIs and libraries for compute acceleration.
     Diagram labels: Neural Net Training Frameworks; Authoring Interchange; cuDNN; MIOpen; clDNN; TPU; Desktop and Cloud Hardware
  7. Machine Learning Acceleration (same diagram as slide 5)
  8. NNEF - Neural Network Exchange Format
     Before NNEF: every tool needs an exporter to every accelerator (NN Authoring Frameworks 1, 2 and 3 each export separately to Inference Engines 1, 2 and 3).
     With NNEF: NN Authoring Frameworks 1, 2 and 3 and Inference Engines 1, 2 and 3 connect through NNEF.
     Diagram also shows: Optimization and processing tools
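The "before and after" comparison on the NNEF slide comes down to simple counting. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and the counts assume one translator per connection.

```python
def direct_exporters(num_frameworks: int, num_engines: int) -> int:
    """Without a shared format, each framework needs an exporter per engine."""
    return num_frameworks * num_engines


def exchange_format_translators(num_frameworks: int, num_engines: int) -> int:
    """With a shared format, each framework exports once and each engine imports once."""
    return num_frameworks + num_engines


if __name__ == "__main__":
    # The slide's diagram uses three frameworks and three engines.
    for count in (3, 10):
        print(count,
              direct_exporters(count, count),
              exchange_format_translators(count, count))
```

The gap widens as the ecosystem grows, since the first count is a product and the second is a sum.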
  9. NNEF Captures a Neural Network Description
     Network Structure File
     • Distilled, platform-independent network description
     • Human readable, syntactical elements from Python
     Standardized Operations
     • Rigorously defined semantics
     • Linear, convolution, pooling, normalization, activation, unary/binary
     • Supports fully connected, convolutional, recurrent architectures
     Two Levels of Expressiveness
     • Flat
       - Basic transfer of computation graphs with standardized operations
       - Simple to parse and translate to vendor-specific formats
     • Compositional
       - Define custom compound operations
       - Higher-level graph descriptions
       - More complex to parse but offers more optimization hints
     Network Data File
     • Binary format contains parameter tensors
     • Supports float and quantized (integer) data
     • Flexible bit widths and quantization algorithms
     • Quantization algorithms expressed as extensible compound operations
     • Quantization info provided as hints for execution
     Split Structure and Data Files
     • Easy independent access to network structure or individual parameter data
     • Set of files can use a container such as tar or zip with optional compression and encryption
     • Can associate multiple Data Files with one Network Structure File, e.g. the same data in multiple formats
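To make "quantized (integer) data" with "flexible bit widths" concrete, here is a minimal sketch of generic linear quantization. It is an assumption-laden illustration, not the NNEF specification's definition: the function names, the clamp-then-round scheme and the `lo`/`hi` range parameters are chosen for this example only.

```python
def linear_quantize(values, lo, hi, bits):
    """Map floats in [lo, hi] to integer codes in [0, 2**bits - 1].

    Generic linear scheme (an assumption for illustration): values outside
    the range are clamped, then rounded to the nearest code.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = (1 << bits) - 1
    codes = []
    for value in values:
        clamped = min(max(value, lo), hi)
        codes.append(round((clamped - lo) / (hi - lo) * levels))
    return codes


def dequantize(codes, lo, hi, bits):
    """Recover approximate floats from integer codes."""
    levels = (1 << bits) - 1
    return [lo + code / levels * (hi - lo) for code in codes]


if __name__ == "__main__":
    weights = [-0.9, -0.3, 0.1, 0.77]
    for bits in (4, 8):
        codes = linear_quantize(weights, -1.0, 1.0, bits)
        print(bits, codes, dequantize(codes, -1.0, 1.0, bits))
```

Running it with 4 and 8 bits shows the trade-off the slide hints at: fewer bits mean smaller parameter data and a coarser approximation of the original floats.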
  10. NNEF 1.0
      • NNEF V1.0 released in August 2018, after positive industry feedback on the Provisional specification released in December 2017
      • NNEF open source projects hosted on Khronos NNEF GitHub repository, Apache 2.0 license: https://github.com/KhronosGroup/NNEF-Tools
      • Tools around NNEF 1.0 Files: Syntax Parser/Validator; TensorFlow and Caffe Exporters; TensorFlow and Caffe2 Importer/Exporters; Google NNAPI Converter; OpenVX Ingestion & Execution
      • Status legend: Live; Imminent
      • NNEF Working Group Participants
  11. NNEF and ONNX
      Comparing Neural Network Exchange Industry Initiatives
      NNEF                                | ONNX
      Embedded Inferencing Import         | Training Interchange
      Defined Specification               | Open Source Project
      Stability for hardware deployment   | Software stack flexibility
      Multi-company Governance            | Initiated by Facebook
      Flexible Precision / Quantization   | 32-bit Floating Point only
      ONNX and NNEF are Complementary
      - ONNX will HAVE to move fast to track authoring framework interchange
      - NNEF provides a stable bridge from training into edge inferencing engines
      - Initiating open source bidirectional translator
      Same Industry Dynamics as LLVM and SPIR-V
      - Khronos tried to use LLVM as a hardware IR
      - BUT LLVM evolves without needing to preserve backwards compatibility. Fine for software compilers, very difficult to manage for hardware toolchains and roadmaps
      - SO Khronos created hardware-oriented SPIR-V with bidirectional translation to LLVM
      - Bidirectional translator in open source
  12. Machine Learning Acceleration (same diagram as slide 5)
  13. Three Broad Inferencing Choices
      • Run in training framework
        - Acceleration needs high-level programming tools, typically large C++ applications
        - E.g. TensorFlow or TensorFlow Lite
      • Export to inference runtime
        - The most popular industry choice today. Runtime often uses underlying acceleration APIs
        - E.g. TensorRT, CoreML
      • Compile to optimized code
        - Often used to merge custom or vision code alongside inferencing runtime; generates LLVM for CPU + accelerated API code
  14. SYCL Single Source C++ Parallel Programming
      • SYCL 1.2.1 Adopters Program released in July 2018 with open source conformance tests
        - https://www.khronos.org/news/press/khronos-releases-conformance-test-suite-for-sycl-1.2.1
      • Multiple implementations shipping: triSYCL, ComputeCpp
        - http://sycl.tech
      • Multiple SYCL libraries for vision and inferencing
        - SYCL-BLAS, SYCL-DNN, SYCL-Eigen
      C++ kernel fusion can give better performance on complex apps and libs than hand-coding
      Diagram labels: Single application source file using STANDARD C++; C++ templates and lambda functions separate host & device code; Accelerated code passed into device OpenCL compilers
  15. TensorFlow on SYCL / OpenCL
      State-of-the-art C++ compilers can fuse nodes in vision and neural network graphs to provide optimized performance, often faster than hand-coded applications.
      Diagram labels: Python Client; C++ Client; Optional C API; TensorFlow tensor Kernels (> 800 kernels); Convolutions; Matrix multiply; Eigen Tensors; SYCL-BLAS Library; SYCL-DNN Library
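The fusion claim is easier to picture with a toy example. The following sketch is conceptual only: it is plain Python rather than SYCL, the helper names are invented for illustration, and it models fusion simply as composing element-wise stages so the data is traversed once instead of once per stage.

```python
def run_unfused(data, stages):
    """Apply each stage as a separate pass, building an intermediate list each time."""
    passes = 0
    for stage in stages:
        data = [stage(x) for x in data]
        passes += 1
    return data, passes


def fuse(stages):
    """Combine element-wise stages into one function applied per element."""
    def fused(x):
        for stage in stages:
            x = stage(x)
        return x
    return fused


def run_fused(data, stages):
    """Apply all stages in a single pass with no intermediate lists."""
    kernel = fuse(stages)
    return [kernel(x) for x in data], 1


if __name__ == "__main__":
    # Hypothetical three-stage pipeline: scale, add bias, ReLU.
    pipeline = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
    print(run_unfused([-2.0, 0.0, 3.0], pipeline))
    print(run_fused([-2.0, 0.0, 3.0], pipeline))
```

Both paths compute the same result; the fused one avoids the intermediate buffers, which on real accelerators typically translates into less memory traffic.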
  16. Three Broad Inferencing Choices (same content as slide 13)
  17. Platform Inferencing Stacks
      • Microsoft Windows: Windows Machine Learning (WinML)
        - https://docs.microsoft.com/en-us/windows/uwp/machine-learning/
      • Google Android: Neural Network API (NNAPI)
        - https://developer.android.com/ndk/guides/neuralnetworks/
      • Apple macOS and iOS: CoreML (Core ML Model)
        - https://developer.apple.com/documentation/coreml
      Consistent Three Steps
      1. Import trained NN model file
      2. Build optimized version of graph
      3. Run graph on accelerated runtime using underlying low-level API
  18. NNVM - Open Compiler for AI Inferencing
      Paul G. Allen School of Computer Science & Engineering, University of Washington
      http://www.tvmlang.org/2017/08/17/tvm-release-announcement.html
      1. Import Trained Network Description
      2. Graph-level Optimizations
      3. Decompose to primitive instructions and emit programs for accelerated run-times
      Diagram labels: LLVM IR for CPUs; SPIR-V IR for parallel accelerators; Backend in development
      Facebook Glow Compiler (Graph Lowering Optimizations): https://facebook.ai/developers/tools/glow
  19. OpenVX
      Wide range of vision hardware architectures
      OpenVX provides a high-level Graph-based abstraction
      -> Enables Graph-level optimizations!
      Can be implemented on almost any hardware or processor!
      -> Portable, Efficient Vision Processing!
      Chart labels: Power Efficiency (X1, X10, X100); Computation Flexibility; Dedicated Hardware; Vision DSPs; GPU Compute; Multi-core CPU
      Diagram labels: Vision Processing Graph; Vision Node; Shipping Implementations
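A graph-based abstraction lets the runtime see every node before anything executes, which is what makes graph-level optimization possible. The sketch below is a toy model in Python, not the OpenVX C API: the `ToyGraph` class and its method names are hypothetical, and it only loosely mirrors the verify-then-process pattern that OpenVX exposes through `vxVerifyGraph` and `vxProcessGraph`.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class ToyGraph:
    """Hypothetical stand-in for a vision processing graph."""

    def __init__(self):
        self._nodes = {}  # output name -> (function, input names)

    def add_node(self, output, func, inputs):
        if output in self._nodes:
            raise ValueError(f"duplicate output: {output}")
        self._nodes[output] = (func, tuple(inputs))

    def verify(self, external_inputs):
        """Check connectivity and return an execution order.

        A real runtime could also fuse or reorder nodes at this point.
        """
        dependencies = {}
        for output, (_, inputs) in self._nodes.items():
            for name in inputs:
                if name not in self._nodes and name not in external_inputs:
                    raise ValueError(f"unconnected input: {name}")
            dependencies[output] = {n for n in inputs if n in self._nodes}
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        return list(TopologicalSorter(dependencies).static_order())

    def process(self, **external_inputs):
        """Run every node once, in dependency order."""
        values = dict(external_inputs)
        for output in self.verify(external_inputs):
            func, inputs = self._nodes[output]
            values[output] = func(*(values[name] for name in inputs))
        return values


if __name__ == "__main__":
    graph = ToyGraph()
    # Nodes are declared out of order; verify() sorts them.
    graph.add_node("mask", lambda img: [1 if p > 4 else 0 for p in img], ["scaled"])
    graph.add_node("scaled", lambda img: [p * 2 for p in img], ["image"])
    print(graph.process(image=[1, 2, 3]))
```

Because the application only declares nodes and edges, the same description could be scheduled differently on a CPU, GPU or DSP without changing application code.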
  20. Extending OpenVX for Inferencing #1
      Neural Network Extension
      • OpenVX Nodes to represent common NN Layers
      • 1D-4D Tensors to connect layers and common Tensor Ops
      • INT16, INT7.8, INT8, and U8
      Layer functions: vxActivationLayer, vxConvolutionLayer, vxDeconvolutionLayer, vxFullyConnectedLayer, vxNormalizationLayer, vxPoolingLayer, vxSoftmaxLayer, vxROIPoolingLayer, …
      Diagram: an OpenVX graph mixing CNN nodes with traditional vision nodes (Native Camera Control; Vision Nodes; CNN Nodes; Downstream Application Processing)
      Importing NNEF Neural Network Descriptions
      NNEF Translator converts NNEF representation into OpenVX Node Graphs
      • Ingests NNEF File and builds OpenVX node graph
      • Open source project in progress
  21. Extending OpenVX for Inferencing #2
      OpenVX/OpenCL Interop
      • Provisional Extension
      • Enables custom OpenCL acceleration to be invoked from OpenVX User Kernels
      • Memory objects can be mapped or copied
      Kernel/Graph Import
      • Provisional Extension
      • Defines container for executable or IR code
      • Enables arbitrary code to be inserted as an OpenVX Node in a graph
      OpenVX/OpenCL Interop diagram (Creating Custom User Nodes)
      • Application; OpenVX data objects; cl_mem buffers; OpenCL Command Queue; Runtime
      • Map or copy OpenVX data objects into cl_mem buffers
      • Copy or export cl_mem buffers into OpenVX data objects
      • OpenVX user-kernels can access command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution
      • Fully asynchronous host-device operations during data exchange
  22. NNEF and OpenVX for Inferencing
      To mix inferencing with vision and other custom processing
      Many inferencing stacks end up using OpenCL for hardware acceleration
      Diagram labels: Ingestion; Compilation; Kernel Import; Translator; NN Extension; Vision Nodes; User Nodes; OpenVX Runtime; Proprietary Runtimes; Acceleration APIs; Executable Code; Compile to executable code; Compile to IR/Binary; Execute accelerated
  23. OpenCL - Unique Heterogeneous Runtime
      OpenCL is the only industry standard for low-level heterogeneous compute
      Portable control over memory and parallel task execution
      "The closest you can be to your embedded accelerator and still be portable"
      Growing number of optimized OpenCL vision and inferencing libraries
      • Vision: OpenCV, Halide, VisionCpp
      • Machine Learning: Xiaomi MACE, Arm Compute Library
      • Linear Algebra: clDNN, CLBlast, ViennaCL
      Diagram labels: Application or Inferencing Run-time; CPU; GPU; FPGA; DSP; Custom Hardware; Fragmented GPU API Landscape
  24. OpenCL Ecosystem Roadmap
      OpenCL has an active three track roadmap
      Timeline
      • 2011: OpenCL 1.2 (OpenCL C Kernel Language)
      • 2015: OpenCL 2.1 (SPIR-V in Core); SYCL 1.2 (C++11 single source programming)
      • 2017: OpenCL 2.2 (C++ Kernel Language); SYCL 1.2.1 (C++11 single source programming)
      Tracks
      • Bringing Heterogeneous compute to standard ISO C++: Khronos hosting C++17 Parallel STL; C++20 Parallel STL with Ranges Proposal
      • Processor Deployment Flexibility: parallel computation across diverse processor architectures
      • Kernel Deployment Flexibility: execute OpenCL C kernels on Vulkan GPU runtimes
  25. OpenCL Next - Feature Set Flexibility
      • Defining OpenCL features that become optional for enhanced deployment flexibility
        - API and language features, e.g. floating point precisions
      • Feature Sets avoid fragmentation
        - Defined to suit specific markets, e.g. desktop, embedded vision and inferencing
      • Implementations are conformant if they fully support feature set functionality
      Diagram labels: OpenCL 2.2 Functionality; legend symbol = queryable, optional feature; Khronos-defined OpenCL 2.2 Full Profile Feature Set; Khronos-defined OpenCL 1.2 Full Profile Feature Set; Industry-defined Feature Set, e.g. Embedded Vision and Inferencing
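The conformance rule on this slide (an implementation is conformant if it fully supports a feature set) can be read as a subset check. The sketch below is purely illustrative: the feature names and feature-set contents are hypothetical placeholders, not taken from any OpenCL specification.

```python
# Hypothetical feature sets; the names are placeholders for illustration only.
FEATURE_SETS = {
    "embedded_vision_inferencing": frozenset({"fp16", "images", "int8_math"}),
    "desktop_full": frozenset({"fp16", "fp32", "fp64", "images", "int8_math"}),
}


def missing_features(supported, feature_set_name):
    """Return the features of a feature set that the implementation lacks."""
    return FEATURE_SETS[feature_set_name] - set(supported)


def is_conformant(supported, feature_set_name):
    """Conformant means every feature in the chosen feature set is supported."""
    return not missing_features(supported, feature_set_name)


if __name__ == "__main__":
    device = {"fp16", "fp32", "images", "int8_math"}  # hypothetical device
    for name in FEATURE_SETS:
        print(name, is_conformant(device, name), sorted(missing_features(device, name)))
```

Under this reading, an embedded device can be conformant to a market-specific feature set without implementing everything a desktop profile requires.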
  26. Universal Deployment Flexibility
      Open source tools enable OpenCL and Vulkan apps to be increasingly deployed on any platform
      • Clspv and clvk open source tools
      • Open source SPIRV-Cross converts SPIR-V to MSL or HLSL
      • Open source shims convert Vulkan to Metal or D3D API calls
      Diagram labels: OpenCL Programs; Native Vulkan Drivers; UWP and D3D based Consoles
