NVIDIA’s Pascal GPUs give developers a platform for both training and deploying neural networks. In deployment, GPUs enable lower latencies and let large inference workloads be served by a smaller set of accelerated nodes. One advanced technique for optimizing throughput is to leverage the Pascal GPU family’s reduced-precision instructions. I’ll show how you can start with a network trained in FP32 and deploy that same network with 16-bit or even 8-bit weights and activations using TensorRT. I’ll discuss in some detail the mechanics of converting a neural network and the kinds of performance and accuracy we are seeing on ImageNet-style networks.
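To make the reduced-precision idea concrete, here is a minimal, self-contained sketch of the two conversions involved: rounding FP32 values through IEEE 754 half precision (FP16), and symmetric linear quantization of FP32 weights to INT8 using a calibration range. This is an illustrative simplification, not TensorRT's actual implementation (TensorRT chooses its INT8 scales via a calibration process over representative input data); the function names here are hypothetical.

```python
import struct

def to_fp16(w):
    """Round-trip a float through IEEE 754 half precision
    (the 'e' struct format), simulating FP16 storage."""
    return struct.unpack('e', struct.pack('e', w))[0]

def quantize_int8(weights, calib_max=None):
    """Symmetric linear quantization of FP32 weights to INT8.

    A single scale factor maps the calibration range
    [-calib_max, calib_max] onto the integer range [-127, 127].
    If no calibration maximum is given, use the absolute max
    of the weights (a naive stand-in for real calibration).
    """
    if calib_max is None:
        calib_max = max(abs(w) for w in weights)
    scale = 127.0 / calib_max
    quantized = [max(-127, min(127, round(w * scale))) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate FP32 values from INT8 weights."""
    return [q / scale for q in quantized]

weights = [0.02, -1.5, 0.75, 3.14159]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

The worst-case error of this scheme is half a quantization step (0.5 / scale), which is why a well-chosen calibration range matters: clipping rare outliers tightens the range and reduces error on the bulk of the weights.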
I’ll end with a quick overview of how developers can deploy these deep learning networks as microservices using the GPU REST Engine.
Thanks to Chris Gottbrath from the NVIDIA TensorRT team!