The field of deep learning has seen remarkable progress in the last decade. Across a wide range of problems, deep learning models have matched and even surpassed human performance. However, this success comes with trade-offs: to achieve "superhuman" performance, deep learning models need powerful hardware and vast amounts of memory. For this reason, most deep learning models are deployed in large computing centers. Toward the end of the 2010s, deep learning architectures began to be deployed on embedded devices, edge devices, and mobile phones, and a series of successful mobile architectures has been proposed since. These architectures significantly improve inference speed or memory footprint compared with state-of-the-art models. In this presentation, we compare several optimization modules against naïve implementations with respect to predefined metrics.
2. OVERVIEW
➢ Limited Resource Environments
➢ Training Improvements
➢ Self-Adversarial Training
➢ Architectural Improvements
➢ Model Quantization
➢ Depthwise Separable Convolutions
➢ References
3. LIMITED RESOURCE ENVIRONMENTS
➢ In reality, the supply of any resource is limited at any point in time.
➢ Virtually unlimited resources: on-demand extension is available. Training
environments mostly have virtually unlimited resources ( e.g. data centers,
cloud services ).
➢ Limited resources: not extendable ( e.g. Perseverance (Mars rover),
embedded devices, mobile phones ).
5. FIRE DETECTION DATASET
• It is a benchmark dataset for our model experiments.
• It will be released publicly in the coming months.
• 4,200 training images, 672 validation images
7. SELF-ADVERSARIAL TRAINING
• Adding small but intentional worst-case perturbations to an input causes the
model to output an incorrect answer with high confidence. [1]
• Even though deep learning models have complex non-linear computational
graphs, they can be deceived by a simple linear method called the Fast
Gradient Sign Method (FGSM).
• In our experiments, we used the Fast Gradient Sign Method.
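FGSM perturbs an input by a small step epsilon in the sign direction of the loss gradient, x' = x + ε · sign(∇ₓL). A minimal NumPy sketch of this step is below; the logistic-regression loss is a toy stand-in for illustration, not the fire-detection model used in our experiments:

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon=0.1):
    """Fast Gradient Sign Method: shift each input feature by
    epsilon in the direction that increases the loss."""
    return x + epsilon * np.sign(grad)

# Toy example: logistic loss L = log(1 + exp(-y * w.x)).
# The gradient of L w.r.t. the input x is -y * sigmoid(-y * w.x) * w.
w = np.array([2.0, -1.0])   # fixed model weights
x = np.array([0.5, 0.5])    # clean input
y = 1.0                     # true label in {-1, +1}

margin = y * w.dot(x)
grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w  # dL/dx

x_adv = fgsm_perturb(x, grad_x, epsilon=0.1)
```

Because the perturbation follows the loss gradient of the current model, the adversarial example is recomputed as the model evolves, which is what makes FGSM usable as an on-the-fly augmentation during training.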
10. CONCLUSION
• FGSM is a valid data augmentation strategy. It improved performance with a
comparatively small training-time overhead.
• One advantage of FGSM is that its perturbation vector depends strictly on the
current state of the trained model, making it a self-evolving data
augmentation strategy.
12. MODEL QUANTIZATION
• Quantization converts a real value to an integer value; the reverse process is
called dequantization.
• In general, quantization converts 32-bit floating-point values to 1-byte
integers, a 4× memory saving!
Typical quantization scheme [2]: r = S(q − Z)
S is called the scale and Z the zero-point. Together, they
define an affine transformation between real values
and integer values.
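A minimal sketch of this affine scheme in the style of Jacob et al. [2] — the helper names and the example weight tensor are illustrative, not our actual implementation:

```python
import numpy as np

def choose_qparams(r_min, r_max, q_min=0, q_max=255):
    """Pick scale S and zero-point Z so [r_min, r_max] maps onto [q_min, q_max]."""
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)  # range must contain 0
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = int(round(q_min - r_min / scale))
    return scale, zero_point

def quantize(r, scale, zero_point, q_min=0, q_max=255):
    """Real -> integer: q = round(r / S) + Z, clipped to the integer range."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, q_min, q_max).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Integer -> real: r = S * (q - Z)."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
s, z = choose_qparams(float(weights.min()), float(weights.max()))
q = quantize(weights, s, z)      # stored as uint8: 4x smaller than float32
r = dequantize(q, s, z)          # recovered reals, within half a step of original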
14. QUANTIZATION AWARE TRAINING
• This technique snaps the floating-point weights to the nearest quantization
level after every training step, within the given quantization interval [a, b].
(Figure: quantization step and quantization levels on the weight axis.)
The clamp function maps the input domain onto the quantization interval.
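The clamp-then-snap step can be sketched as follows. This is a simplified illustration with hypothetical values; a full QAT setup would also keep full-precision shadow weights and use a straight-through estimator for the gradients:

```python
import numpy as np

def fake_quantize(w, a, b, num_levels=256):
    """Clamp weights to [a, b], then snap each to the nearest of
    num_levels evenly spaced quantization levels."""
    step = (b - a) / (num_levels - 1)          # quantization step
    w_clamped = np.clip(w, a, b)               # clamp to the interval
    levels = np.round((w_clamped - a) / step)  # nearest level index
    return a + levels * step

w = np.array([-1.7, -0.2, 0.03, 0.9], dtype=np.float32)
w_q = fake_quantize(w, a=-1.0, b=1.0)  # every value now sits on a level in [-1, 1]
```

Applying this after every training step keeps the weights compatible with the integer grid the deployed model will actually use, so the network learns to tolerate the rounding error.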
17. CONCLUSION
• In single-batch inference, quantized inference is roughly twice as fast.
In general, however, quantized inference times are inconsistent.
• Comparing overall inference behavior, standard FP32 inference still gives
better results: it has a higher average inference time but less deviation.
20. USE CASE
• Unsupervised anomaly detection on real-time streams requires continuous
training of the deep learning model.
• To increase inference speed and reduce memory usage, we propose using
depthwise separable convolutions. [3]
• An hourglass model is trained on normal video frames and then tested on
anomalous video frames.
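The parameter saving behind depthwise separable convolutions comes from factoring one k×k convolution into a per-channel depthwise k×k convolution plus a 1×1 pointwise convolution [3]. The arithmetic can be sketched directly; the layer sizes below are a generic illustration, not the exact hourglass model:

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable convolution:
    depthwise k x k filter per input channel + 1x1 pointwise mixing."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example layer: 128 input channels, 256 output channels, 3x3 kernel.
standard = conv_params(128, 256, 3)      # 294,912 parameters
separable = ds_conv_params(128, 256, 3)  # 33,920 parameters
```

For a 3×3 kernel the saving approaches a factor of k² = 9 as the output channel count grows, which matches the roughly 537K → 93.8K reduction reported in the results.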
22. RESULTS
Naïve convolution – 537K parameters
Average inference time: 0.106 s
DS convolution – 93.8K parameters
Average inference time: 0.144 s
23. CONCLUSION
• When replacing naïve convolutions with depthwise separable convolutions,
2 extra layers were added. This may be why inference became slower even
though the DS convolution has fewer parameters.
• Real-time anomaly detection with self-trained models is still an active
research field.
25. REFERENCES
1. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy.
"Explaining and harnessing adversarial examples." arXiv preprint
arXiv:1412.6572 (2014).
2. Jacob, Benoit, et al. "Quantization and training of neural networks for
efficient integer-arithmetic-only inference." Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. 2018.
3. Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural
networks for mobile vision applications." arXiv preprint
arXiv:1704.04861 (2017).