This document presents a methodology for accelerating convolutional neural networks (CNNs) on FPGAs using a dataflow approach. The key aspects of the methodology are:
1. Exploiting the dataflow pattern of CNN operations using independent modules with parametric levels of parallelism.
2. Adopting a streaming dataflow computational paradigm, with efficient memory access and full buffering, to improve performance and scalability.
3. Providing modular implementations of the convolution and fully-connected layers, together with a network design approach, to improve memory bandwidth utilization and to scale well under limited FPGA resources.
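
The streaming idea in the points above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the implementation described in this document: the names `pixel_stream`, `conv_stream`, and `relu_stream` are hypothetical, each Python generator stands in for one independent hardware module, and a bounded deque stands in for an on-chip line buffer. It handles a single channel and a single kernel, and it does not model parallelism levels or FPGA resources.

```python
from collections import deque
from typing import Iterable, Iterator, List


def pixel_stream(image: List[List[float]]) -> Iterator[float]:
    """Source module: emit pixels one at a time in raster order."""
    for row in image:
        for px in row:
            yield px


def conv_stream(pixels: Iterable[float], width: int,
                kernel: List[List[float]]) -> Iterator[float]:
    """Convolution module (hypothetical): 'valid' sliding-window product,
    stride 1, using the CNN convention (no kernel flip).

    Only the last (k - 1) * width + k pixels are kept, which mimics a
    line buffer: each input pixel is read from the stream exactly once.
    """
    k = len(kernel)
    span = (k - 1) * width + k
    buf = deque(maxlen=span)  # oldest pixel is dropped automatically
    for idx, px in enumerate(pixels):
        buf.append(px)
        row, col = divmod(idx, width)
        # A full k x k window ends at (row, col) once both are >= k - 1.
        if row >= k - 1 and col >= k - 1:
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # buf[0] is the top-left pixel of the current window.
                    acc += kernel[i][j] * buf[i * width + j]
            yield acc


def relu_stream(values: Iterable[float]) -> Iterator[float]:
    """Activation module: consumes each value as soon as it is produced."""
    for v in values:
        yield max(0.0, v)


# Chain the modules; data flows through them one element at a time.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = list(relu_stream(conv_stream(pixel_stream(image), 3, kernel)))
print(result)  # [6.0, 8.0, 12.0, 14.0]
```

Because each stage pulls from the previous one lazily, no stage ever holds a full feature map; the only storage is the bounded buffer inside `conv_stream`. That is the property the streaming paradigm relies on to reduce memory traffic.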