Fast Algorithms for Quantized Convolutional Neural Networks
1. 1
Alessandro Pappalardo alessandro1.pappalardo@mail.polimi.it
NECSTLab, Politecnico di Milano
Lawrence Berkeley National Laboratory
06/05/2017
2. 2
Introduction
Convolutional neural networks are at the forefront of big data processing.
Embedded devices and smartphones are at the heart of big data.
Image: Dumoulin, Vincent, and Francesco Visin. "A guide to convolution
arithmetic for deep learning." arXiv preprint arXiv:1603.07285 (2016).
4. 4
Quantized Convnets
• Precision of convolution reduced to b bits.
• Arrays of unsigned integers in $[0, 2^b - 1]$.
• Quantization scheme: map the minimum to 0 and the maximum to $2^b - 1$.
• Memory savings.
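The min-max mapping listed on this slide can be sketched in C as follows. This is a minimal illustration, not the author's code: the function name `quantize_minmax`, the `float` input type, the `uint16_t` output type (which limits b to 16) and the round-to-nearest policy are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the slides): linearly map x[0..n-1] onto
 * unsigned integers in [0, 2^b - 1], minimum -> 0, maximum -> 2^b - 1. */
void quantize_minmax(const float *x, size_t n, unsigned b, uint16_t *q)
{
    assert(n > 0 && b >= 1 && b <= 16);

    float lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }

    const float levels = (float)((1u << b) - 1u);
    /* A constant input has no range: every element then maps to 0. */
    const float scale = (hi > lo) ? levels / (hi - lo) : 0.0f;

    for (size_t i = 0; i < n; i++) {
        /* Adding 0.5 before the truncating cast rounds to the nearest level. */
        float v = (x[i] - lo) * scale + 0.5f;
        if (v > levels) v = levels; /* guard against floating-point overshoot */
        q[i] = (uint16_t)v;
    }
}
```

With b = 2, for example, the inputs {-1, 0, 1, 3} land on the four levels 0, 1, 2 and 3.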
5. 5
Question
Can we take advantage of the predetermined finiteness of the values
that the convolution operands can assume to gain
computational savings at the algorithmic level?
6. 6
Solution
• Number Theoretic Transforms (NTTs) are DFTs defined on a finite field.
• NTTs satisfy the Circular Convolution Property (CCP) and can be
computed through FFT-like fast algorithms.
Reason in terms of finite fields.
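To make the circular convolution property concrete, here is a small C sketch that convolves two length-8 sequences through a naive O(N^2) NTT and compares the result with a direct circular convolution. The parameters (p = 257, N = 8, root w = 4) and all function names are assumptions chosen for illustration, and the naive transform stands in for the FFT-like algorithms the slide mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed toy parameters: p = 257 is prime and w = 4 has order 8 mod 257. */
#define NTT_P 257u
#define NTT_N 8u
#define NTT_W 4u

/* Modular exponentiation by squaring. */
uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t r = 1;
    base %= NTT_P;
    while (exp) {
        if (exp & 1u) r = r * base % NTT_P;
        base = base * base % NTT_P;
        exp >>= 1;
    }
    return r;
}

/* Naive O(N^2) transform: X[k] = sum_n x[n] * w^(n*k) mod p. */
void ntt_naive(const uint32_t *x, uint32_t *X, uint32_t w)
{
    for (uint32_t k = 0; k < NTT_N; k++) {
        uint32_t acc = 0;
        for (uint32_t n = 0; n < NTT_N; n++) {
            assert(x[n] < NTT_P);
            acc = (acc + x[n] * pow_mod(w, n * k)) % NTT_P;
        }
        X[k] = acc;
    }
}

/* CCP: transform, multiply pointwise, transform back, scale by 1/N. */
void circular_conv_ntt(const uint32_t *a, const uint32_t *b, uint32_t *out)
{
    uint32_t A[NTT_N], B[NTT_N], C[NTT_N];
    ntt_naive(a, A, NTT_W);
    ntt_naive(b, B, NTT_W);
    for (uint32_t k = 0; k < NTT_N; k++)
        C[k] = A[k] * B[k] % NTT_P;
    /* Inverses via Fermat's little theorem: x^(p-2) = x^(-1) mod p. */
    ntt_naive(C, out, pow_mod(NTT_W, NTT_P - 2u));
    const uint32_t n_inv = pow_mod(NTT_N, NTT_P - 2u);
    for (uint32_t i = 0; i < NTT_N; i++)
        out[i] = out[i] * n_inv % NTT_P;
}

/* Reference: direct circular convolution, reduced mod p. */
void circular_conv_direct(const uint32_t *a, const uint32_t *b, uint32_t *out)
{
    for (uint32_t i = 0; i < NTT_N; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < NTT_N; j++)
            acc += a[j] * b[(i + NTT_N - j) % NTT_N];
        out[i] = acc % NTT_P;
    }
}
```

The two routines agree modulo p. The result equals the ordinary integer convolution only while every true output value stays below p, which is why the bounded range of quantized operands matters.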
10. 10
Benchmarks setting
• Qconv: a C99-compliant and portable NTT implementation,
compared against a naïve convolution implementation.
• Platform of choice: Raspberry Pi Zero.
14. 14
Fermat Number Transforms
FNTs are NTTs mod $p = 2^{2^t} + 1$.
1. Support FFT algorithms.
2. For a length N up to $2^{t+1}$, the forward and inverse
transforms require only modular adds and shifts.
3. Reduction mod a Fermat prime p:
logical AND, unsigned right shift, modular subtraction.
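The reduction in point 3 and the shift-only twiddle multiplication in point 2 can be sketched in C for $F_4 = 65537$ (t = 4). The function names are hypothetical and the code is only an illustration under the assumption of 32-bit unsigned inputs; it relies on $2^{16} \equiv -1 \pmod{F_4}$.

```c
#include <assert.h>
#include <stdint.h>

#define F4 65537u /* 2^16 + 1 */

/* Reduce any 32-bit x mod F4. Writing x = hi * 2^16 + lo and using
 * 2^16 = -1 (mod F4) gives x = lo - hi (mod F4). */
uint32_t fermat_reduce(uint32_t x)
{
    uint32_t lo = x & 0xFFFFu; /* logical AND */
    uint32_t hi = x >> 16;     /* unsigned right shift */
    return (lo >= hi) ? lo - hi : lo + F4 - hi; /* modular subtraction */
}

/* Multiply a residue x (0 <= x < F4) by 2^s mod F4 using shifts only. */
uint32_t fermat_mul_pow2(uint32_t x, unsigned s)
{
    assert(x < F4);
    s &= 31u; /* 2 has order 32 mod F4, so exponents wrap at 32 */
    if (s >= 16u) {
        /* 2^s = -2^(s-16): shift by the remainder, then negate. */
        uint32_t r = fermat_reduce(x << (s - 16u));
        return r ? F4 - r : 0u;
    }
    return fermat_reduce(x << s); /* x << s is at most 2^31, no overflow */
}
```

A general product of two residues needs more care than this sketch provides, because 65536 * 65536 equals 2^32 and does not fit in 32 bits.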
15. 15
Methodology
FNT mod $F_4 = 65537$ with blocks:
• 8x8
• 16x16
• 32x32
RNS of $F_3$ and $F_4$ to increase the maximum output bitwidth to:
$\log_2(F_3 \cdot F_4) \approx 24$ bits
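As an illustration of the residue number system (RNS) step, the sketch below recombines a residue mod 257 and a residue mod 65537 into a single value mod 257 * 65537 = 16843009, just above 2^24. It uses the standard Chinese Remainder Theorem construction. The function name and the choice of this particular formula are assumptions; the slides do not say how the recombination is implemented.

```c
#include <assert.h>
#include <stdint.h>

#define F3 257u
#define F4 65537u

/* Hypothetical helper: given r3 = x mod F3 and r4 = x mod F4, return the
 * unique x in [0, F3 * F4) with those residues. */
uint32_t crt_f3_f4(uint32_t r3, uint32_t r4)
{
    assert(r3 < F3 && r4 < F4);
    /* F4 mod F3 = 2, and 2 * 129 = 258 = 1 (mod 257). */
    const uint32_t inv_f4_mod_f3 = 129u;
    /* Solve r4 + F4 * t = r3 (mod F3) for t in [0, F3). */
    uint32_t diff = (r3 + F3 - r4 % F3) % F3;
    uint32_t t = diff * inv_f4_mod_f3 % F3;
    return r4 + F4 * t;
}
```

Running the same convolution once mod $F_3$ and once mod $F_4$ and recombining each output element this way recovers outputs up to 16843008 exactly.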
Overlap-and-save algorithm
FNT mod $F_3 = 257$ with blocks:
• 8x8
• 16x16
16. 16
Optimizations
• Forward transforms use a DIF FFT, inverse transforms a DIT FFT.
• Precomputed power-of-two twiddle factors.
• Switch the order of the two innermost loops of the FFT.
• Avoid useless transpositions.
• Normalize only the non-discarded output.
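The first two items can be illustrated with a one-dimensional C sketch mod $F_3 = 257$ with length 16 and root 2, so every twiddle factor is a power of two. A DIF forward transform leaves its output in bit-reversed order and a DIT inverse accepts that order, so the pointwise product needs no reordering pass. That motivation is an assumption, since the slides do not state it. All names are hypothetical, and twiddles are computed by shifts here rather than looked up from a precomputed table.

```c
#include <assert.h>
#include <stdint.h>

#define FNT_P 257u /* F_3 */
#define FNT_N 16u  /* 2 has order 16 mod 257 */

/* Multiply x (< p) by 2^e mod p with a shift; 2^16 = 1 (mod 257). */
uint32_t mul_pow2(uint32_t x, uint32_t e)
{
    assert(x < FNT_P);
    return (x << (e & 15u)) % FNT_P;
}

/* Forward transform, decimation in frequency:
 * natural-order input, bit-reversed output. */
void fnt_dif(uint32_t *a)
{
    for (uint32_t len = FNT_N / 2u; len >= 1u; len >>= 1) {
        uint32_t step = FNT_N / (2u * len); /* twiddle exponent stride */
        for (uint32_t s = 0; s < FNT_N; s += 2u * len)
            for (uint32_t j = 0; j < len; j++) {
                uint32_t u = a[s + j], v = a[s + j + len];
                a[s + j] = (u + v) % FNT_P;
                a[s + j + len] = mul_pow2((u + FNT_P - v) % FNT_P, j * step);
            }
    }
}

/* Inverse transform, decimation in time:
 * bit-reversed input, natural-order output, scaled by 1/N. */
void fnt_dit_inverse(uint32_t *a)
{
    for (uint32_t len = 1u; len < FNT_N; len <<= 1) {
        uint32_t step = FNT_N / (2u * len);
        for (uint32_t s = 0; s < FNT_N; s += 2u * len)
            for (uint32_t j = 0; j < len; j++) {
                uint32_t u = a[s + j];
                /* Inverse twiddle: 2^(-e) = 2^(16 - e) (mod 257). */
                uint32_t v = mul_pow2(a[s + j + len], 16u - j * step);
                a[s + j] = (u + v) % FNT_P;
                a[s + j + len] = (u + FNT_P - v) % FNT_P;
            }
    }
    /* 1/16 = 2^(-4) = 2^12 (mod 257), so normalization is also a shift. */
    for (uint32_t i = 0; i < FNT_N; i++)
        a[i] = mul_pow2(a[i], 12u);
}

/* Length-16 circular convolution mod 257; the result overwrites a,
 * and b is left in its transformed state. */
void circular_conv_fnt(uint32_t *a, uint32_t *b)
{
    fnt_dif(a);
    fnt_dif(b);
    /* Both spectra share the same bit-reversed order. */
    for (uint32_t i = 0; i < FNT_N; i++)
        a[i] = a[i] * b[i] % FNT_P;
    fnt_dit_inverse(a);
}
```

Inputs must already be reduced to the range [0, 256]. The final loop normalizes all 16 outputs; in an overlap-and-save setting only the retained outputs would need it, as the last bullet suggests.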