Vulkan ML Japan Virtual Open House Feb 2021

  1. Machine Learning in Khronos | Vulkan
     Pierre Boudier, NVIDIA, January 2021
     © The Khronos® Group Inc. 2020. This work is licensed under a Creative Commons Attribution 4.0 International License.
  2. Objectives of Vulkan Machine Learning (ML)
     • Enable native Vulkan applications to use ML with low latency and overhead
       - Avoid interop, or embedding very large third-party frameworks (Python)
     • There are many common building blocks in neural nets:
       - Matrix multiplication
       - Convolution
       - Tanh / Sigmoid / ReLU
       - …
     • But the ecosystem is diverse, and new mathematical approaches are introduced every week:
       - Locality-sensitive hashing, binary weights, sparse matrices, …
     • Vulkan should not limit itself to current architectures
       - Luckily, Vulkan already has a compute shader abstraction (see the sketch below)
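To make the last point concrete, here is a minimal, hypothetical sketch (not from the slides): an elementwise ReLU written as an ordinary Vulkan GLSL compute shader, held in a C++ string ready to be compiled to SPIR-V. The constant name, binding layout, and workgroup size are illustrative choices, and the shader assumes the buffer length is a multiple of the workgroup size.

```cpp
// Hypothetical sketch: a ReLU layer needs nothing beyond Vulkan's existing
// compute shader abstraction. Compile this GLSL to SPIR-V (e.g. with
// glslangValidator) and dispatch it like any other compute workload.
static const char* kReluGLSL = R"(
#version 450
layout(local_size_x = 64) in;

layout(std430, set = 0, binding = 0) readonly  buffer InBuf  { float x[]; };
layout(std430, set = 0, binding = 1) writeonly buffer OutBuf { float y[]; };

void main() {
    // Assumes the dispatch covers exactly the buffer length.
    uint i = gl_GlobalInvocationID.x;
    y[i] = max(x[i], 0.0);  // ReLU: clamp negative activations to zero
}
)";
```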
  3. [Diagram: ML deployment flow]
     Training Data → Neural Network Training → Trained Networks (NNEF/ONNX)
     → Inferencing Acceleration: GPU compiled code / hardware acceleration APIs → Hardware acceleration (GPUs)
     Networks are trained on high-end desktop and cloud systems; applications link to compiled inferencing code or generate it dynamically (see the sketch below).
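As a hedged illustration of the "compiled code → hardware acceleration API" step above, the sketch below shows how an application might wrap SPIR-V emitted by an ML compiler in a VkShaderModule, from which a compute pipeline can then be built. The function name LoadInferencingKernel is hypothetical; the Vulkan calls are standard.

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Hypothetical sketch: wrap ML-compiler-generated SPIR-V in a shader module.
// Error handling is omitted for brevity.
VkShaderModule LoadInferencingKernel(VkDevice device,
                                     const std::vector<uint32_t>& spirv) {
    VkShaderModuleCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = spirv.size() * sizeof(uint32_t);  // size in bytes
    info.pCode = spirv.data();

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;  // feed into a VkComputePipelineCreateInfo stage
}
```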
  4. Data Type Extensions
     • By default, 32-bit floating-point precision is used for both training and inferencing, which are essentially just runs of a computational graph:
       - Training runs a forward pass, and usually a backward pass to propagate the gradient back
       - Inferencing only runs the forward pass
     • But both can be done in lower-precision types for faster compute and reduced data storage
       - The following extensions are available in Vulkan 1.2 (see the query sketch below):
         - VK_KHR_shader_float16_int8
         - VK_KHR_8bit_storage / VK_KHR_16bit_storage
       - 8-bit integer data types are used for quantized neural nets
       - FP16 data types can be used for faster math, with gradient rescaling in training
       - Upcoming: SPV_KHR_integer_dot_product
         - Takes advantage of fast integer math for quantized neural networks, without relying on implicit compiler peephole optimization
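A minimal sketch of how an application might check for these types before choosing a quantized (int8) or half-precision (fp16) kernel variant. The helper name is hypothetical; the feature structure and query call are standard Vulkan 1.1+/1.2.

```cpp
#include <vulkan/vulkan.h>

// Minimal sketch: query whether the device supports the shader types
// exposed by VK_KHR_shader_float16_int8 (core in Vulkan 1.2).
bool SupportsLowPrecisionShaders(VkPhysicalDevice physicalDevice) {
    VkPhysicalDeviceShaderFloat16Int8Features f16i8{};
    f16i8.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_FLOAT16_INT8_FEATURES;

    VkPhysicalDeviceFeatures2 features2{};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &f16i8;  // chain the extension feature struct

    vkGetPhysicalDeviceFeatures2(physicalDevice, &features2);

    // shaderFloat16: 16-bit floats in shader arithmetic;
    // shaderInt8:    8-bit integers for quantized networks.
    return f16i8.shaderFloat16 == VK_TRUE && f16i8.shaderInt8 == VK_TRUE;
}
```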
  5. Improved Compute Shaders
     • New extensions are being devised to improve efficiency
       - Upcoming:
         - VK_KHR_workgroup_memory_explicit_layout
           - Allows more efficient data loading into shared memory, for further use by efficient matrix multiplication operations
         - VK_EXT_ML_primitives
           - Exposes the basic primitives of mainstream neural nets as optimized building blocks
       - Available:
         - VK_NV_cooperative_matrix (NVIDIA)
           - Exposes high-throughput matrix/vector multiplication hardware units
           - Typically used by convolution / matmul layers in FP16 formats (see the query sketch below)
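A hedged sketch, assuming a device that exposes VK_NV_cooperative_matrix: enumerate the matrix tile shapes and element types the hardware can multiply cooperatively across a subgroup (e.g. 16x16x16 FP16 tiles). The function name QueryCooperativeMatrixShapes is illustrative; the extension entry point is real but must be fetched at runtime.

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Sketch: list supported cooperative matrix configurations. A production
// version would check that fp is non-null and that the extension is enabled.
std::vector<VkCooperativeMatrixPropertiesNV>
QueryCooperativeMatrixShapes(VkInstance instance, VkPhysicalDevice gpu) {
    auto fp = reinterpret_cast<PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesNV>(
        vkGetInstanceProcAddr(
            instance, "vkGetPhysicalDeviceCooperativeMatrixPropertiesNV"));

    uint32_t count = 0;
    fp(gpu, &count, nullptr);  // first call: how many configurations exist

    VkCooperativeMatrixPropertiesNV init{};
    init.sType = VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_NV;
    std::vector<VkCooperativeMatrixPropertiesNV> props(count, init);
    fp(gpu, &count, props.data());  // fill MSize/NSize/KSize and A/B/C/D types
    return props;
}
```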
  6. Primary Machine Learning Compilers
     • TVM: imports Caffe, Keras, MXNet, ONNX; front-end / IR: NNVM / Relay IR; output: SPIR-V
     • XLA: imports TensorFlow Graph, PyTorch, ONNX; front-end / IR: XLA HLO; output: SPIR-V / Vulkan API
  7. Google MLIR and IREE Compilers
     • MLIR (Multi-Level Intermediate Representation)
       - A format and library of compiler utilities that sits between the trained model representation and the low-level compilers/executors that generate hardware-specific code
     • IREE (Intermediate Representation Execution Environment)
       - Lowers and optimizes ML models for real-time accelerated inferencing on mobile/edge heterogeneous hardware
       - Contains scheduling logic to communicate data dependencies to low-level parallel pipelined hardware/APIs like Vulkan, and execution logic to encode dense computation in the form of hardware/API-specific binaries like SPIR-V
     • IREE is a research project today. Google is working with Khronos working groups to explore how SPIR-V code can provide effective inferencing acceleration on APIs such as Vulkan
     [Diagram: Trained Models → Optimizes and Lowers for Acceleration → Generate Hardware-Specific Binaries]
  8. Third-Party Frameworks
     • MNN (Alibaba) - https://github.com/alibaba/MNN
       - Lightweight framework for inferencing and training on mobile platforms
     • NCNN (Tencent) - https://github.com/Tencent/ncnn
       - Cross-platform optimized inferencing framework
     • Ailia (Ax Inc.) - https://axinc.jp/en/
       - SDK for optimized execution of many popular neural networks on multiple platforms
  9. Non-Neural-Net Machine Learning
     • VkFFT (Jülich) - https://github.com/DTolm/VkFFT
       - Accelerated fast Fourier transform library, usable for image resampling, image registration, signal processing, …
     • Kompute (Ethical ML) - https://github.com/EthicalML/vulkan-kompute
       - General-purpose GPU compute framework that works across graphics cards
     • ML-Agents (Unity) - https://github.com/Unity-Technologies/ml-agents
       - Open-source project that enables simulation for training intelligent agents via the Unity engine
  10. Call to Action
     • The Vulkan Machine Learning Subgroup welcomes feedback:
       - What functionality would you like to see in Vulkan to accelerate your ML needs?
         - Neural nets
         - Other algorithms: random forests, logistic regression, k-nearest neighbors, t-SNE, …
       - Do you have an existing product using Vulkan for ML?
         - Do you want to get in touch with hardware vendors?
         - Do you have suggestions on how to improve your pain points?
     • Contact us at: pboudier@nvidia.com
     • Help us by joining the Vulkan Advisory Panel, or become a Khronos member

Editor's Notes

  • 1. Hi everyone – I am Neil Trevett - I work on developer ecosystems at NVIDIA and I am President of the Khronos Group. I am going to give the latest updates on Khronos’ open standards for vision and inferencing acceleration.
  • 5. Here we see how Khronos standards can be used in a typical mobile or embedded application needing vision or inferencing acceleration. Neural network training typically occurs on high-powered desktops or servers at high precision, using large training data sets. The resulting trained network, encapsulated in an NNEF file, can be imported into the target system in two ways: either ingested into an inferencing engine such as OpenVX, or passed into a machine learning compiler to generate code that executes inferencing operations. Either the compiled code or the inferencing engine API is linked into the application, which may also use the SYCL compiler to generate additional parallel code if the application is written in C++. Code generated by all these methods can target the lower level OpenCL API to portably reach into diverse hardware processors.
     
    Let’s look at each of these Khronos standards in a little more detail – starting with NNEF.
  • 16. In addition to inferencing engines, there is active research into machine learning compilers from companies such as Amazon with TVM, Facebook with Glow and Google with XLA.
