Fast Algorithms for Quantized Convolutional Neural Networks
Slide 1
Fast Algorithms for Quantized Convolutional Neural Networks
Alessandro Pappalardo alessandro1.pappalardo@mail.polimi.it
NECSTLab, Politecnico di Milano
Oracle
06/07/2017
Slide 2
Introduction
Convolutional neural networks are at the forefront of big data processing.
Embedded devices and smartphones are at the heart of big data.
Image: Dumoulin, Vincent, and Francesco Visin. "A guide to convolution
arithmetic for deep learning." arXiv preprint arXiv:1603.07285 (2016).
Slide 4
💽 ➡️ Quantized Convnets
• Precision of convolution reduced to b bits.
• Arrays of unsigned integers in [0, 2^b − 1].
• Quantization scheme: map the minimum to 0 and the maximum to 2^b − 1.
• Memory savings.
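The min/max scheme above can be sketched as follows. This is a minimal illustration of affine min/max quantization, not the Qconv code; the function names are hypothetical.

```python
import numpy as np

def quantize(x, b):
    """Affine min/max quantization to b-bit unsigned integers.

    Maps min(x) -> 0 and max(x) -> 2**b - 1, as described above.
    Returns the integer array plus the (scale, offset) needed to
    approximately recover the original values.
    """
    levels = (1 << b) - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    dtype = np.uint8 if b <= 8 else np.uint16
    q = np.round((x - lo) / scale).astype(dtype)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Approximate inverse: recover floats from the integer codes."""
    return q.astype(np.float64) * scale + lo
```

The memory savings follow directly: b = 8 stores one byte per value instead of four for float32, and the reconstruction error per element is at most half a quantization step (scale / 2).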
Slide 5
🖥 ➡️ Question
Can we take advantage of the predetermined finiteness of the possible values assumed by the convolution operands to gain computational savings at an algorithmic level?
Slide 6
Solution
• Number Theoretic Transforms (NTTs) are DFTs defined over a finite field.
• NTTs satisfy the Circular Convolution Property (CCP) and can be computed with FFT-like fast algorithms.
Reason in terms of finite fields.
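The CCP can be demonstrated with a toy transform. The sketch below uses a naive O(N²) NTT modulo the Fermat prime 257 with root 4 (which has multiplicative order 8 mod 257); it is an illustration of the property, not a fast algorithm.

```python
P = 257      # the Fermat prime F3 = 2^8 + 1
OMEGA = 4    # 4 has multiplicative order 8 mod 257, so N = 8
N = 8

def ntt(a, root):
    # Naive O(N^2) number theoretic transform over Z_P.
    return [sum(a[n] * pow(root, k * n, P) for n in range(N)) % P
            for k in range(N)]

def intt(A):
    # Inverse transform: use the inverse root and scale by N^{-1} mod P.
    inv_root = pow(OMEGA, -1, P)
    inv_n = pow(N, -1, P)
    return [(x * inv_n) % P for x in ntt(A, inv_root)]

def circ_conv(a, b):
    # Circular Convolution Property: pointwise products in the
    # transform domain equal circular convolution in the input domain.
    A, B = ntt(a, OMEGA), ntt(b, OMEGA)
    return intt([(x * y) % P for x, y in zip(A, B)])
```

Because every intermediate value lives in Z_257, the result is exact (no floating-point rounding), which is the advantage of reasoning in terms of finite fields.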
Slide 10
Benchmarks setting
• Qconv: a C99-compliant, portable NTT-based convolution implementation, benchmarked against a naïve convolution implementation.
• Platform of choice: Raspberry Pi Zero.
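For reference, a direct "valid" 2-D convolution baseline of the kind Qconv is compared against can be sketched as below. This is a hypothetical illustration of the naïve O(H·W·kh·kw) approach, not the actual benchmark code.

```python
def naive_conv2d(img, ker):
    """Direct 'valid' 2-D convolution (kernel flipped in both axes),
    the straightforward baseline a fast algorithm is measured against."""
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    return [[sum(img[i + u][j + v] * ker[kh - 1 - u][kw - 1 - v]
                 for u in range(kh) for v in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]
```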
Slide 12
Future work
• Implement SIMD subroutines for forward and inverse
transforms.
• Map element-wise products to finite field matrix
multiplication in FFLAS.
• FPGA hardware implementation.
• Optimize for 3x3 filters.
Slide 14
Fermat Number Transforms
FNTs are NTTs mod p = 2^(2^t) + 1.
1. They support FFT algorithms.
2. For a length N up to 2^(t+1), the forward and inverse transforms require only modular adds and shifts.
3. Reduction mod a Fermat prime p takes only a logical AND, an unsigned right shift, and a modular subtraction.
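Point 3 rests on the identity 2^m ≡ −1 (mod 2^m + 1): splitting x into high and low m-bit halves gives x ≡ lo − hi (mod p). A minimal sketch, assuming the input is at most (p − 1)², which covers products of two reduced operands:

```python
def fermat_reduce(x, m):
    """Reduce x modulo the Fermat number p = 2**m + 1 using only a
    logical AND, an unsigned right shift, and a modular subtraction.
    Since 2**m == -1 (mod p), writing x = hi * 2**m + lo gives
    x == lo - hi (mod p). Assumes 0 <= x <= (p - 1)**2."""
    p = (1 << m) + 1
    while x >= p:
        lo = x & ((1 << m) - 1)   # logical AND with the low-bit mask
        hi = x >> m               # unsigned right shift
        x = lo - hi               # the high half counts negatively
        if x < 0:
            x += p                # modular subtraction wrap-around
    return x
```

In C this replaces an integer division by a mask, a shift, and a conditional add, which is why FNT butterflies stay cheap on integer hardware.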
Slide 15
Methodology
• FNT mod F3 = 257 with blocks: 8×8, 16×16
• FNT mod F4 = 65537 with blocks: 8×8, 16×16, 32×32
• RNS of F3 and F4 to increase the maximum output bitwidth to log2(F3 · F4) ≅ 24 bits
• Overlap-and-save algorithm
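The RNS step computes each convolution independently mod F3 and mod F4, then recombines the residues via the Chinese Remainder Theorem into a result mod F3·F4 (≅ 24 bits of dynamic range). A sketch of the recombination, using Garner's formula; the function name is hypothetical:

```python
F3, F4 = 257, 65537
M = F3 * F4   # the RNS dynamic range, log2(M) ~= 24 bits

def rns_combine(r3, r4):
    """CRT recombination: recover x mod F3*F4 from (x mod F3, x mod F4).

    Garner's formula: x = r4 + F4 * t, where t is chosen so that
    the result is also congruent to r3 mod F3."""
    inv = pow(F4, -1, F3)          # F4^{-1} mod F3, a small constant
    t = ((r3 - r4) * inv) % F3
    return (r4 + F4 * t) % M
```

Since F3 and F4 are coprime, the pair of residues determines the output uniquely as long as the true convolution result fits in 24 bits.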
Slide 16
Optimizations
• Forward transform: decimation-in-frequency (DIF) FFT; inverse transform: decimation-in-time (DIT) FFT.
• Precomputed power-of-two twiddle factors.
• Switch the order of the two innermost loops of the FFT.
• Avoid useless transpositions.
• Normalize only the non-discarded outputs.
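Because every twiddle factor of an FNT is a power of two, multiplication by a twiddle is a left shift followed by a Fermat reduction. A toy illustration mod F3 = 257 with root 4 (twiddle_mul is a hypothetical helper, not Qconv code):

```python
P = 257   # F3; the element 2 has multiplicative order 16 mod 257

def twiddle_mul(x, k):
    """Multiply x by the twiddle 4**k mod 257.

    4**k = 2**(2k), and since 2 has order 16 mod 257 the exponent can
    be reduced mod 16, so the product is a left shift plus a reduction."""
    shifted = x << ((2 * k) % 16)
    return shifted % P
```

Precomputing the (small) table of such exponents removes all general multiplications from the butterfly stages, leaving only shifts, adds, and reductions.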