Thanks for your attention
Questions?
marco.bacis@mail.polimi.it
stefania.deligia@mail.polimi.it
serena.farina@mail.polimi.it
Please follow us:
https://twitter.com/nneuramaatnecst
https://www.facebook.com/pg/NNEURaMAatNECST/
https://www.slideshare.net/NNEURaMAProject
Editor's Notes
Hello from the NNEURaMA project. In today's video we will explain the reasons that led us to use an FPGA chip in our project.
First, let's have a brief recap of our project: NNEURaMA aims at recognizing the presence of microaneurysms in retinal images.
To perform the classification, we chose Convolutional Neural Networks, as they are the kind of algorithm best suited for image recognition and classification.
In particular, a CNN is composed of different layers, which extract different features from a picture and combine them. All these operations are carried out using a set of trained weights.
The most compute-intensive operations in CNNs are convolutions and pooling, so we designed our architecture around accelerating them.
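As a rough illustration of the two operations just mentioned, here is a minimal pure-Python sketch of a convolution and a 2x2 max-pooling step. This is only a reference model of the math, not our FPGA implementation; the function names and sizes are our own choices for the example.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image,
    computing a multiply-and-accumulate (MAC) at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for r in range(kh):
                for c in range(kw):
                    acc += image[i + r][j + c] * kernel[r][c]
            out[i][j] = acc
    return out

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value
    in each 2x2 window of the feature map."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```

Every output pixel of `conv2d` is one MAC chain, which is exactly the loop nest the hardware accelerates.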
FPGAs are not the only architecture available for implementing our application: standard CPUs, GPUs and Application-Specific Integrated Circuits (ASICs) are also options. We will now illustrate the characteristics that led us away from these competitors.
The main operation in a CNN is the MAC (Multiply and Accumulate). A CPU usually performs these operations sequentially, which leads to low performance when the number of operations is high, as in a CNN. With a custom hardware design, higher parallelization is possible: many operations can be performed in parallel, provided they are independent. This is the case for convolution, as the MAC operations within the same layer depend only on the previous layer, not on one another. In this way, the hardware is used more efficiently, yielding higher performance.
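The independence argument above can be sketched in a few lines: each output of a layer is one MAC over the previous layer's values, and since no output depends on a sibling output, they can all be computed concurrently. Here a thread pool merely stands in for the hardware parallelism of an FPGA; the names are illustrative, not from our design.

```python
from concurrent.futures import ThreadPoolExecutor

def mac(inputs, weights):
    """One multiply-and-accumulate: the core CNN operation."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc

def layer_parallel(prev_layer, weight_rows):
    """Each output's MAC reads only prev_layer, never a sibling
    output, so all MACs can safely run at the same time."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: mac(prev_layer, w), weight_rows))
```

On a CPU the MACs in `pool.map` still time-share a few cores; on custom hardware each one can become its own multiply-add unit, which is where the speedup comes from.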
Given that a custom hardware design seems to be the best option for implementing our algorithm, we face the choice of implementing it on an FPGA or designing a custom digital circuit (an ASIC: Application-Specific Integrated Circuit). ASICs deliver the best achievable performance for a given application, but they come with high costs and a fixed design. In the case of neural networks (and CNNs), the parameters and topology of the network are improved iteratively, meaning that the design is continuously updated. The network keeps changing even over long periods, as new and better methods and techniques appear. This prevents us from choosing a fixed circuit design for our application, as the performance does not justify the high cost and low adaptability. FPGAs, by contrast, offer a better tradeoff between cost, performance and flexibility, thanks to their reconfigurability.
Apart from video processing, GPUs can be used to accelerate algorithms through their large set of parallel processors. Unfortunately, they require high power to operate, and even in large datacenters power consumption has become an issue, in addition to heat dissipation.
FPGAs, on the other hand, have lower power consumption and, under certain conditions, can reach the same performance as GPUs.
Based on the previous explanations, we can draw our conclusions: CPUs lack the required throughput, ASICs perform well but are not reconfigurable, and GPUs are too power-hungry.
This has led us to choose FPGAs for the implementation of our CNN-based algorithm.
Thanks for watching! Like and share this video. You can also find us on Facebook and Twitter, and if you have any questions, feel free to contact us!