High Performance Deep Learning on Edge Devices With Apache MXNet:
Deep network based models are marked by an asymmetry between the large amount of compute power needed to train a model, and the relatively small amount of compute power needed to deploy a trained model for inference. This is particularly true in computer vision tasks such as object detection or image classification, where millions of labeled images and large numbers of GPUs are needed to produce an accurate model that can be deployed for inference on low powered devices with a single CPU. The challenge when deploying vision models on these low powered devices though, is getting inference to run efficiently enough to allow for near real time processing of a video stream. Fortunately Apache MXNet provides the tools to solve this issues, allowing users to create highly performant models with tools like separable convolutions, quantized weights and sparsity exploitation as well as providing custom hardware kernels to ensure inference calculations are accelerated to the maximum amount allowed by the hardware the model is being deployed on. This is demonstrated though a state of the art MXNet based vision network running in near real time on a low powered Raspberry Pi device. We finally discuss how running inference at the edge as well as leveraging MXNet’s efficient modeling tools can be used to massively drive down compute costs for deploying deep networks in a production system at scale.