Deep Learning with FPGA
Drive Towards Dedicated Hardware for Efficient Learning
Ayush Singh
College of Computer and Information Sciences
Northeastern University
Intro
Deep Learning is a revolutionary machine learning technique
Deep Learning requires massive amounts of computation to reach acceptable accuracy
Modern models are highly complex (e.g. 11.2B connections, 5M parameters)
Traditionally, industry used the processing power of CPU infrastructure
Enter GPUs running largely the same code, with ASICs and FPGAs on the horizon
Evolution: CPU → GPU → FPGA ↔ ASIC
As data and throughput demands increased, the industry started looking for alternatives
GPUs became the heroes: good at parallel computation and able to run the same code
GPUs are power hogs, with low precision and high cache-miss rates (TLB, IRO)
Deep Learning models are maturing and becoming increasingly network-specific
Drive to get faster results on embedded devices with limited resources
Field-Programmable Gate Arrays (FPGAs)
Hardware implementation of algorithms, which is typically faster than software
Latency orders of magnitude lower (nanoseconds vs microseconds on GPU)
Roughly an order of magnitude lower power consumption (20 W FPGA vs 200 W GPU)
Lower clock speeds (500 MHz vs 1348 MHz)
Programmed using hardware description languages (Verilog/HDL) rather than C++/CUDA
Current State
Successful demonstrations of throughput and efficiency on custom chips
Gaining traction in industry (Baidu, Microsoft, Google, etc.)
Receives only a fraction of the research attention that GPUs do
Still at least 2x behind GPUs in raw throughput (20 TFLOPS vs 55 TFLOPS)
Energy efficiency 50-80x and throughput 20-40x better than CPU (see the rough per-watt estimate below)
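A rough back-of-the-envelope Python sketch of performance per watt, using the figures from this deck; the TFLOPS and wattage numbers come from the slides above and are not tied to specific parts or a common benchmark, so this is illustrative only:

fpga_tflops, fpga_watts = 20.0, 20.0   # figures from the slides (illustrative)
gpu_tflops, gpu_watts = 55.0, 200.0

fpga_eff = fpga_tflops / fpga_watts    # ~1.00 TFLOPS per watt
gpu_eff = gpu_tflops / gpu_watts       # ~0.28 TFLOPS per watt

print(f"FPGA: {fpga_eff:.2f} TFLOPS/W, GPU: {gpu_eff:.2f} TFLOPS/W")
print(f"FPGA efficiency advantage: {fpga_eff / gpu_eff:.1f}x")

Even with lower raw throughput, the per-watt figure favors the FPGA, which is the efficiency argument the deck is making.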
Limitations
● Longer development time compared to GPUs (a month vs a day)
● Limited block RAM and dynamic RAM
● Not cost-effective when production volume is low
● Shortage of domain-specific talent (imagine a hardware engineer who is also a CNN expert)
● Speedup often hinges on using fixed-point instead of floating-point precision (see the quantization sketch after this list)
● Low memory bandwidth compared to GPUs (roughly 20 GB/s vs 780 GB/s)
● Porting designs to each new generation of chips is painful
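A minimal Python sketch of the fixed-point trade-off above: float32 weights are quantized to 8-bit integers with a single scale factor, which is what typically lets an FPGA keep the arithmetic in its integer DSP blocks. The function names and the 8-bit width are illustrative assumptions, not a specific toolchain's API.

import numpy as np

def quantize_to_int8(weights):
    """Symmetric 8-bit fixed-point quantization of a float weight tensor."""
    scale = np.max(np.abs(weights)) / 127.0           # one scale per tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_to_int8(w)
print("max abs quantization error:", np.max(np.abs(w - dequantize(q, scale))))

The quantization error is bounded by half the scale step, which is usually acceptable for inference but is one reason FPGA speedups do not translate directly to training.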
Future
● Brings the power of Deep Learning to embedded systems and compute farms
● Flexibility for creative applications at the chip, server and warehouse level
● Opens doors to research in compressing and optimizing ML techniques (see the pruning sketch after this list)
● Eventual transition to ASICs, as happened in the Bitcoin mining era
● Emergence of new development platforms, e.g. OpenCL and DeepCL vs DeepCompute and CUDA
● Hybrid architectures: GPU as the main accelerator, FPGA as auxiliary, with the CPU coordinating
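A minimal Python sketch of magnitude-based weight pruning, one direction in the compression research referenced above (e.g. Deep Compression, arXiv:1510.00149, listed in the references). The 90% sparsity target and the function name are illustrative assumptions.

import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.randn(256, 256).astype(np.float32)
pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print("fraction of weights kept:", mask.mean())   # ~0.1 after pruning

Sparse, quantized models of this kind are a natural fit for the limited on-chip RAM of FPGAs.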
Companies
Many recent startups have taken it upon themselves to address this gap
● DeePhi - FPGA
● Microsoft - FPGA
● Falcon Computing - FPGA
● Nervana - ASIC
● Wave Computing - ASIC
● Cognimem - ASIC
● Xilinx - Supplier
● Kintex (Xilinx FPGA family) - Supplier
● NVIDIA - GPU King
● INTEL with Nervana and Altera
● IBM with Xilinx
● Baidu with Nervana
References
https://www.nextplatform.com/2016/08/23/fpga-based-deep-learning-accelerators-take-asics/
https://www.nextplatform.com/2016/08/08/deep-learning-chip-upstart-set-take-gpus-task/
http://cadlab.cs.ucla.edu/~cong/slides/HALO15_keynote.pdf
http://on-demand.gputechconf.com/supercomputing/2014/presentation/SC424-deep-learning-gpu-clusters.pdf
https://arxiv.org/abs/1602.04283
https://www.nextplatform.com/2016/02/29/broader-paths-etched-into-fpga-datacenter-roadmap/
https://arxiv.org/pdf/1504.04788.pdf
https://arxiv.org/abs/1510.00149
Thank You
