This document proposes two new methods for implementing 2D convolution for convolutional neural networks (CNNs) in an energy efficient manner on FPGAs.
Method 1 uses a FIFO-based approach to reduce data transfer by reading the input feature map only once. It implements the convolution using processing elements that store partial sums in FIFOs to accumulate the outputs.
Method 2 called single partial product 2D convolution groups input pixels into complementary sets to keep all multipliers occupied in parallel, reducing computation time. It analyzes patterns in how frequently different input pixels are used to optimize the input stream. Both methods aim to improve efficiency over existing CNN implementations on FPGAs.