This document summarizes an audio enhancement project that removes a specific unwanted noise from an audio recording using computer vision techniques. It treats the spectrogram of the audio signal as an image and applies object detection to locate the noise in it. The user first mimics the noise, and the spectrogram of that imitation serves as a noise template. The algorithm then scans the spectrogram of the recording, extracts histogram of oriented gradients (HOG) features from each patch, and flags the patches whose features are similar to those of the template. The flagged regions are removed from the spectrogram, and a cleaned audio signal is synthesized from the result. The key steps are therefore generating the spectrogram, extracting the noise template, scanning and classifying patches, and synthesizing the cleaned audio.
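The four steps can be sketched in a few dozen lines of Python with NumPy. This is only an illustrative sketch, not the project's implementation: the function names, the window and hop sizes, the simplified HOG descriptor (per-cell orientation histograms without block normalization), the cosine-similarity threshold used as the classifier, and the choice to zero out matching cells are all assumptions made for the example.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Complex spectrogram, shape (n_fft // 2 + 1, n_frames), Hann-windowed."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(Z, n_fft=256, hop=64):
    """Weighted overlap-add inverse of stft() above."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(Z.T, n=n_fft, axis=1) * win
    out = np.zeros(hop * (Z.shape[1] - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def to_db(Z):
    """Log-magnitude 'image' of a complex spectrogram."""
    return 20.0 * np.log10(np.abs(Z) + 1e-6)

def hog_features(patch, cell=8, bins=9):
    """Simplified HOG (assumption): magnitude-weighted histograms of unsigned
    gradient orientation per cell, L2-normalized over the whole patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation folded into [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    feats = []
    for r in range(0, patch.shape[0] - cell + 1, cell):
        for c in range(0, patch.shape[1] - cell + 1, cell):
            feats.append(np.bincount(idx[r:r + cell, c:c + cell].ravel(),
                                     weights=mag[r:r + cell, c:c + cell].ravel(),
                                     minlength=bins))
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-8)

def find_noise(spec_db, template_db, threshold=0.8, stride=4):
    """Slide a template-sized window over the spectrogram and mark every patch
    whose HOG cosine similarity to the template reaches the threshold."""
    th, tw = template_db.shape
    t_feat = hog_features(template_db)
    mask = np.zeros(spec_db.shape, dtype=bool)
    for r in range(0, spec_db.shape[0] - th + 1, stride):
        for c in range(0, spec_db.shape[1] - tw + 1, stride):
            if hog_features(spec_db[r:r + th, c:c + tw]) @ t_feat >= threshold:
                mask[r:r + th, c:c + tw] = True
    return mask

def remove_noise(audio, template_db, threshold=0.8):
    """Spectrogram -> detect template-like patches -> zero them -> resynthesize."""
    Z = stft(audio)
    mask = find_noise(to_db(Z), template_db, threshold)
    return istft(np.where(mask, 0, Z)), mask
```

In this sketch `template_db` would be a patch cut from `to_db(stft(mimic))`, where `mimic` is the user's recorded imitation of the noise; how that patch is cropped is left open here. The template must be at least one cell (8 by 8 bins) in each dimension, and the threshold would need tuning on real recordings. Zeroing the flagged cells is the bluntest way to remove them; attenuating or interpolating over them would be a gentler alternative.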