SlideShare a Scribd company logo
1 of 38
Download to read offline
Inference
Accelerators
CELLSTRAT AI LAB
DARSHAN C G
DNN Pipeline Overview
5 critical factors that are used to measure APIs for Inference:
1.Throughput:
The volume of output within a given period. Often measured in inferences/second or samples/second.
2.Efficiency: Amount of throughput delivered per unit-power, often expressed as performance/watt.
3.Latency: Time to execute an inference, usually measured in milliseconds. Low latency is critical to
delivering rapidly growing, real-time inference-based services.
4.Accuracy: A trained neural network’s ability to deliver the correct answer.
5.Memory usage: The host and device memory that need to be reserved to do inference on a network
depends on the algorithms used. This constrains what networks and what combinations of networks can
run on a given inference platform. This is particularly important for systems where multiple networks are
needed and memory resources are limited.
NVIDIA TENSORRT
Programmable Inference Accelerator
FRAMWORKS GPU Platforms
https://developer.nvidia.com/tensorrt
TENSORRT
Performance
https://developer.nvidia.com/tensorrt
https://developer.nvidia.com/tensorrt
TENSORRT DEPLOYMENT WORKFLOW
Faster TensorFlow Inference and Volta Support
TENSORRT OPTIMIZATION
➢ Optimizations are completely
automatic.
➢ Performed with a single function call.
LAYER & TENSOR FUSION
Unoptimized Network Optimized Network
https://developer.nvidia.com/tensorrt
1. Vertical Fusion.
2. Horizontal Fusion.
3. Layer Elimination.
Network Layers Before Layers After
VGG-19 43 27
Incepton V3 309 113
ResNet-152 670 159
Note: All results the regarding Performance is taken Official Tensorrt official Blog
Precision calibration
Automatic weights calibration is
performed, if INT8 is used.
Performance boost with
reduced precision
https://developer.nvidia.com/tensorrt
Dynamic Tensor Memory
Kernel Auto-Tuning
• Reduces memory footprint
and improves memory re-use.
• Manages memory allocation
for each tensor only for the
duration of its usage.
Tesla V100 Jetson TX2 Drive PX2
KERNEL AUTO-TUNING and DYNAMIC TENSOR MEMORY
TensorRT will pick the implementation from a library
of kernels that delivers the best performance for the
target GPU, input data size, filter size, tensor layout,
batch size and other parameters.
EXAMPLE: DEPLOYING TENSORFLOW MODELS WITH
TENSORRT
Trained Neural
Network
TensorRT Optimizer
Inference Results
Optimized Runtime
Engine
Import, optimize and deploy
TensorFlow models using
TensorRT python API
1.Start with a frozen TensorFlow
model
2.Create a model parser
3.Optimize model and create a
runtime engine
4.Perform inference using the
optimized runtime engine
Img Source: https://www.tensorflow.org/lite/convert/index?hl=RU
Features
1. Light Weight.
2. Low-Latency.
3. Privacy.
4. Improves Power Consumption.
5. Efficient Model Format.
6. Pre- trained models.
Converter Interpreter
Transforms TF models into a form
efficient for reading by Interpreter.
Diverse Platform Support(Android, ios,
Embedded Linux and Microcontrollers).
Introduces optimizations to improve
binary model Performance and/or
reduce model Size.
Platform APIs for accelerated Inference.
Components of Tflite
Acceleration Available
Software NNAPI
(also as a delegate)
Hardware EDGE TPU
GPU
CPU Optimizations
Performance
->GPU Delegate and NNAPI.
-> Not all operations are supported by GPU backend.
Optimization Techniques
1. Quantization.
2. Weight Pruning.
3. Model Topology Transforms.
Why Quantization:
1. All Available CPU platforms are supported.
2. Reducing Latency and inference Cost.
3.Low Memory Footprint.
4. Allow execution on fixed point operations.
5. Optimised models for Special Purpose HW
Accelerators(TPU).
Tensorflow->Saved model-> TFLite Converter-> TFLite Model
Workflow
Saved Models, Frozen Graphs, HDF5 models, tf.session(Python API Only)
https://www.tensorflow.org/lite/convert/index?hl=RU
Parameters for Conversion
Saved Model -> Tf.lite.TFLiteConverter.from_saved_model(…)
Keras Model-> Tf.lite.TFLiteConverter.from_keras_model(…)
Concrete Functions-> Tf.lite.TFLiteConverter.from_from_concrete_functions(…)
Output of the Conversion is Tflite Model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
import tensorflow as tf
# Create a simple Keras model.
x = [-1, 0, 1, 2, 3, 4]
y = [-3, -1, 1, 3, 5, 7]
model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd',loss='mean_squared_error')
model.fit(x, y, epochs=50)
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
import numpy as np
import tensorflow as tf
# Load the MobileNet tf.keras model.
model = tf.keras.applications.MobileNetV2(
weights="imagenet", input_shape=(224, 224, 3))
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
End-to-end MobileNet conversion
# Test the TensorFlow Lite model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
tflite_results = interpreter.get_tensor(output_details[0]['index'])
# Test the TensorFlow model on random input data.
tf_results = model(tf.constant(input_data))
# Compare the result.
for tf_result, tflite_result in zip(tf_results, tflite_results):
np.testing.assert_almost_equal(tf_result, tflite_result, decimal=5)
#Saving from a SavedModel
tflite_convert --output_file=model.tflite --saved_model_dir=/tmp/saved_model
Command Line
#Saving from a Keras Model
tflite_convert --output_file=model.tflite --Keras_model_file=saved_model.h5
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
Quantization
DEFAULT: OPTIMIZE_FOR_SPEED/SIZE
Inference at Edge
Factors driving this trend:
1. Smaller, faster models.
2. on Device Accelerators.
3. Demands move ML Capability from loud to On-device.
4. Smart device Growth demands bring ML to Edge
Benefits of On-deice ML
1. High Performance.
2. Local Data Acceseibility.
3. Better Privacy.
4.Works Offline.
Coral
Mendel OS.
Edge TPU Compile.
Mendel Development Tool
https://www.seeedstudio.com/Coral-Dev-Board-p-2900.html
Raspberry Pi
Small Sized, Low cost,
Just Like a Computer,
Raspbian OS.
https://robu.in/product/raspberry-pi-4-model-b-with-4-gb-ram/
Options
1. Compile Tensorflow from Source.
2. Install Tensorflow from Pip.
3. Use Interpreter Directly.
CODE DEMO

More Related Content

What's hot

Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachjemin lee
 
1 introduction to dsp processor 20140919
1 introduction to dsp processor 201409191 introduction to dsp processor 20140919
1 introduction to dsp processor 20140919Hans Kuo
 
customization of a deep learning accelerator, based on NVDLA
customization of a deep learning accelerator, based on NVDLAcustomization of a deep learning accelerator, based on NVDLA
customization of a deep learning accelerator, based on NVDLAShien-Chun Luo
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 艾鍗科技
 
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by ...
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by  ...WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by  ...
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by ...AMD Developer Central
 
Multicore programmingandtpl
Multicore programmingandtplMulticore programmingandtpl
Multicore programmingandtplYan Drugalya
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
Multicore programmingandtpl(.net day)
Multicore programmingandtpl(.net day)Multicore programmingandtpl(.net day)
Multicore programmingandtpl(.net day)Yan Drugalya
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelAlexey Smirnov
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learningAmgad Muhammad
 

What's hot (18)

Scope Stack Allocation
Scope Stack AllocationScope Stack Allocation
Scope Stack Allocation
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
 
Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approach
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
FIR filter on GPU
FIR filter on GPUFIR filter on GPU
FIR filter on GPU
 
1 introduction to dsp processor 20140919
1 introduction to dsp processor 201409191 introduction to dsp processor 20140919
1 introduction to dsp processor 20140919
 
customization of a deep learning accelerator, based on NVDLA
customization of a deep learning accelerator, based on NVDLAcustomization of a deep learning accelerator, based on NVDLA
customization of a deep learning accelerator, based on NVDLA
 
2020 icldla-updated
2020 icldla-updated2020 icldla-updated
2020 icldla-updated
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron) 5.MLP(Multi-Layer Perceptron)
5.MLP(Multi-Layer Perceptron)
 
Gv2512441247
Gv2512441247Gv2512441247
Gv2512441247
 
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by ...
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by  ...WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by  ...
WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by ...
 
Multicore programmingandtpl
Multicore programmingandtplMulticore programmingandtpl
Multicore programmingandtpl
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Multicore programmingandtpl(.net day)
Multicore programmingandtpl(.net day)Multicore programmingandtpl(.net day)
Multicore programmingandtpl(.net day)
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into Kernel
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 

Similar to Inference accelerators

Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 
Bring your neural networks to the browser with TF.js - Simone Scardapane
Bring your neural networks to the browser with TF.js - Simone ScardapaneBring your neural networks to the browser with TF.js - Simone Scardapane
Bring your neural networks to the browser with TF.js - Simone ScardapaneMeetupDataScienceRoma
 
IJCER (www.ijceronline.com) International Journal of computational Engineeri...
 IJCER (www.ijceronline.com) International Journal of computational Engineeri... IJCER (www.ijceronline.com) International Journal of computational Engineeri...
IJCER (www.ijceronline.com) International Journal of computational Engineeri...ijceronline
 
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...William Nadolski
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in ProductionMatthias Feys
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...Codemotion
 
IEC 60870-5 101 Protocol Server Simulator User manual
IEC 60870-5 101 Protocol Server Simulator User manualIEC 60870-5 101 Protocol Server Simulator User manual
IEC 60870-5 101 Protocol Server Simulator User manualFreyrSCADA Embedded Solution
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
intel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceintel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceDESMOND YUEN
 
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...Databricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
 

Similar to Inference accelerators (20)

Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
TensorRT survey
TensorRT surveyTensorRT survey
TensorRT survey
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 
Bring your neural networks to the browser with TF.js - Simone Scardapane
Bring your neural networks to the browser with TF.js - Simone ScardapaneBring your neural networks to the browser with TF.js - Simone Scardapane
Bring your neural networks to the browser with TF.js - Simone Scardapane
 
IJCER (www.ijceronline.com) International Journal of computational Engineeri...
 IJCER (www.ijceronline.com) International Journal of computational Engineeri... IJCER (www.ijceronline.com) International Journal of computational Engineeri...
IJCER (www.ijceronline.com) International Journal of computational Engineeri...
 
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...
Simone Scardapane - Bring your neural networks to the browser with TF.js! - C...
 
Keras and TensorFlow
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlow
 
IEC 60870-5 101 Protocol Server Simulator User manual
IEC 60870-5 101 Protocol Server Simulator User manualIEC 60870-5 101 Protocol Server Simulator User manual
IEC 60870-5 101 Protocol Server Simulator User manual
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
H344250
H344250H344250
H344250
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
 
intel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performanceintel speed-select-technology-base-frequency-enhancing-performance
intel speed-select-technology-base-frequency-enhancing-performance
 
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on G...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 

Recently uploaded

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 

Inference accelerators

  • 3. 5 critical factors that are used to measure APIs for Inference: 1.Throughput: The volume of output within a given period. Often measured in inferences/second or samples/second. 2.Efficiency: Amount of throughput delivered per unit-power, often expressed as performance/watt. 3.Latency: Time to execute an inference, usually measured in milliseconds. Low latency is critical to delivering rapidly growing, real-time inference-based services. 4.Accuracy: A trained neural network’s ability to deliver the correct answer. 5.Memory usage: The host and device memory that need to be reserved to do inference on a network depends on the algorithms used. This constrains what networks and what combinations of networks can run on a given inference platform. This is particularly important for systems where multiple networks are needed and memory resources are limited.
  • 4. NVIDIA TENSORRT Programmable Inference Accelerator FRAMWORKS GPU Platforms https://developer.nvidia.com/tensorrt
  • 5.
  • 9. Faster TensorFlow Inference and Volta Support
  • 10. TENSORRT OPTIMIZATION ➢ Optimizations are completely automatic. ➢ Performed with a single function call.
  • 11. LAYER & TENSOR FUSION Unoptimized Network Optimized Network https://developer.nvidia.com/tensorrt
  • 12. 1. Vertical Fusion. 2. Horizontal Fusion. 3. Layer Elimination. Network Layers Before Layers After VGG-19 43 27 Incepton V3 309 113 ResNet-152 670 159 Note: All results the regarding Performance is taken Official Tensorrt official Blog
  • 13.
  • 14. Precision calibration Automatic weights calibration is performed, if INT8 is used. Performance boost with reduced precision https://developer.nvidia.com/tensorrt
  • 15. Dynamic Tensor Memory Kernel Auto-Tuning • Reduces memory footprint and improves memory re-use. • Manages memory allocation for each tensor only for the duration of its usage. Tesla V100 Jetson TX2 Drive PX2 KERNEL AUTO-TUNING and DYNAMIC TENSOR MEMORY TensorRT will pick the implementation from a library of kernels that delivers the best performance for the target GPU, input data size, filter size, tensor layout, batch size and other parameters.
  • 16.
  • 17. EXAMPLE: DEPLOYING TENSORFLOW MODELS WITH TENSORRT Trained Neural Network TensorRT Optimizer Inference Results Optimized Runtime Engine Import, optimize and deploy TensorFlow models using TensorRT python API 1.Start with a frozen TensorFlow model 2.Create a model parser 3.Optimize model and create a runtime engine 4.Perform inference using the optimized runtime engine
  • 18.
  • 19.
  • 21. Features 1. Light Weight. 2. Low-Latency. 3. Privacy. 4. Improves Power Consumption. 5. Efficient Model Format. 6. Pre- trained models.
  • 22. Converter Interpreter Transforms TF models into a form efficient for reading by Interpreter. Diverse Platform Support(Android, ios, Embedded Linux and Microcontrollers). Introduces optimizations to improve binary model Performance and/or reduce model Size. Platform APIs for accelerated Inference. Components of Tflite
  • 23.
  • 24. Acceleration Available Software NNAPI (also as a delegate) Hardware EDGE TPU GPU CPU Optimizations Performance ->GPU Delegate and NNAPI. -> Not all operations are supported by GPU backend.
  • 25. Optimization Techniques 1. Quantization. 2. Weight Pruning. 3. Model Topology Transforms. Why Quantization: 1. All Available CPU platforms are supported. 2. Reducing Latency and inference Cost. 3.Low Memory Footprint. 4. Allow execution on fixed point operations. 5. Optimised models for Special Purpose HW Accelerators(TPU).
  • 26. Tensorflow->Saved model-> TFLite Converter-> TFLite Model
  • 27. Workflow Saved Models, Frozen Graphs, HDF5 models, tf.session(Python API Only) https://www.tensorflow.org/lite/convert/index?hl=RU
  • 28. Parameters for Conversion Saved Model -> Tf.lite.TFLiteConverter.from_saved_model(…) Keras Model-> Tf.lite.TFLiteConverter.from_keras_model(…) Concrete Functions-> Tf.lite.TFLiteConverter.from_from_concrete_functions(…) Output of the Conversion is Tflite Model converter = tf.lite.TFLiteConverter.from_keras_model(model) tflite_model = converter.convert()
  • 29. import tensorflow as tf # Create a simple Keras model. x = [-1, 0, 1, 2, 3, 4] y = [-3, -1, 1, 3, 5, 7] model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])]) model.compile(optimizer='sgd',loss='mean_squared_error') model.fit(x, y, epochs=50) # Convert the model. converter = tf.lite.TFLiteConverter.from_keras_model(model) tflite_model = converter.convert()
  • 30. import numpy as np import tensorflow as tf # Load the MobileNet tf.keras model. model = tf.keras.applications.MobileNetV2( weights="imagenet", input_shape=(224, 224, 3)) # Convert the model. converter = tf.lite.TFLiteConverter.from_keras_model(model) tflite_model = converter.convert() # Load TFLite model and allocate tensors. interpreter = tf.lite.Interpreter(model_content=tflite_model) interpreter.allocate_tensors() # Get input and output tensors. input_details = interpreter.get_input_details() output_details = interpreter.get_output_details() End-to-end MobileNet conversion
  • 31. # Test the TensorFlow Lite model on random input data. input_shape = input_details[0]['shape'] input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32) interpreter.set_tensor(input_details[0]['index'], input_data) interpreter.invoke() # The function `get_tensor()` returns a copy of the tensor data. # Use `tensor()` in order to get a pointer to the tensor. tflite_results = interpreter.get_tensor(output_details[0]['index']) # Test the TensorFlow model on random input data. tf_results = model(tf.constant(input_data)) # Compare the result. for tf_result, tflite_result in zip(tf_results, tflite_results): np.testing.assert_almost_equal(tf_result, tflite_result, decimal=5)
  • 32. #Saving from a SavedModel tflite_convert --output_file=model.tflite --saved_model_dir=/tmp/saved_model Command Line #Saving from a Keras Model tflite_convert --output_file=model.tflite --Keras_model_file=saved_model.h5
  • 33. import tensorflow as tf converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) converter.optimizations = [tf.lite.Optimize.DEFAULT] tflite_quant_model = converter.convert() Quantization DEFAULT: OPTIMIZE_FOR_SPEED/SIZE
  • 34. Inference at Edge Factors driving this trend: 1. Smaller, faster models. 2. on Device Accelerators. 3. Demands move ML Capability from loud to On-device. 4. Smart device Growth demands bring ML to Edge Benefits of On-deice ML 1. High Performance. 2. Local Data Acceseibility. 3. Better Privacy. 4.Works Offline.
  • 35. Coral Mendel OS. Edge TPU Compile. Mendel Development Tool https://www.seeedstudio.com/Coral-Dev-Board-p-2900.html
  • 36. Raspberry Pi Small Sized, Low cost, Just Like a Computer, Raspbian OS. https://robu.in/product/raspberry-pi-4-model-b-with-4-gb-ram/
  • 37. Options 1. Compile Tensorflow from Source. 2. Install Tensorflow from Pip. 3. Use Interpreter Directly.