This document discusses the implementation of an optimal step size normalized least mean square (NLMS) adaptive filtering algorithm on an FPGA. It begins with background on LMS and NLMS algorithms. It then derives the optimal step size NLMS algorithm and discusses its theoretical performance advantages over fixed step size LMS. The paper presents the FPGA implementation of the optimal step size NLMS algorithm for audio noise cancellation. Simulation results show the proposed algorithm has faster convergence, lower error rates, and superior noise cancellation performance compared to LMS and variable step size LMS algorithms. Area utilization and maximum operating frequency are also compared for the different implementations on an Altera Cyclone III FPGA.