This document describes a parallel hardware implementation of convolution on an FPGA using Vedic mathematics. To speed up convolution, the design uses 16 parallel 4x4-bit Vedic multipliers, based on the Urdhva Tiryagbhyam algorithm, together with optimized adders. It achieves a delay of 17.996 ns, which the document reports as significantly faster than prior work that processed the products serially with a single multiplier. The Vedic multiplier serves as the core computation unit, and its efficient multiplication scheme is what enables the parallel implementation and the faster convolution. The design was coded in VHDL and synthesized on a Xilinx FPGA for verification.
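To make the arithmetic concrete, the following is a minimal behavioral sketch in Python, not the VHDL design itself. It assumes that the 16 multipliers correspond to the 16 pairwise products of two sequences of four unsigned 4-bit samples; that mapping is an assumption, as are the function names `vedic_mul_4x4` and `convolve_4x4`. The multiplier follows the "vertically and crosswise" idea of Urdhva Tiryagbhyam: each output column is the sum of all bit products whose positions add up to that column, plus the carry from the previous column.

```python
def vedic_mul_4x4(a: int, b: int) -> int:
    """Multiply two unsigned 4-bit values by summing crosswise bit products per column."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be unsigned 4-bit values (0..15)")

    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]

    product = 0
    carry = 0
    # Columns 0..6 cover every pair (i, j) with i + j = k.
    for k in range(7):
        column = carry
        for i in range(4):
            j = k - i
            if 0 <= j < 4:
                column += a_bits[i] & b_bits[j]  # one AND gate per bit pair
        product |= (column & 1) << k  # low bit of the column sum is the result bit
        carry = column >> 1           # remaining bits carry into the next column
    product |= carry << 7             # final carry becomes the top result bit
    return product


def convolve_4x4(x, h):
    """Linear convolution of two 4-sample sequences of unsigned 4-bit values.

    Assumption: each of the 16 products would map to its own multiplier
    in hardware. Here they are simply computed one after another.
    """
    if len(x) != 4 or len(h) != 4:
        raise ValueError("both sequences must contain exactly 4 samples")

    # Stage 1: 16 mutually independent products (the parallel part in hardware).
    products = [[vedic_mul_4x4(x[i], h[j]) for j in range(4)] for i in range(4)]

    # Stage 2: add the products that share an output index (the adder part).
    y = [0] * 7
    for i in range(4):
        for j in range(4):
            y[i + j] += products[i][j]
    return y


if __name__ == "__main__":
    print(convolve_4x4([1, 2, 3, 4], [1, 1, 1, 1]))  # [1, 3, 6, 10, 9, 7, 4]
```

Because none of the 16 products depends on another, they can all be computed at the same time in hardware, leaving only the adder stage on the critical path after the multipliers. The sketch models function only and says nothing about the reported 17.996 ns delay.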