This document reports on a term project to design and implement serial and parallel solutions for an image processing algorithm. Experiments were run on 16x16 and 256x256 pixel images, with implementations in C++ and CUDA. The parallel CPU solution ran nearly 3 times faster than the GPU solution because transferring data between the CPU and GPU took a significant share of the GPU run time. Although GPUs can provide massive parallelism, for this algorithm the transfer overhead negated the performance gains.
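The trade-off described above can be sketched as a small cost model in C++. This is an illustration only, not the project's measurement code: the function names, the single-channel 8-bit image layout, and the latency-plus-bandwidth transfer model are all assumptions introduced here, and any timings fed into it would have to be measured on the target machine.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in a square, single-channel, 8-bit image.
// The pixel format is an assumption; the report does not state it.
constexpr std::size_t image_bytes(std::size_t side) { return side * side; }

// Hypothetical model of one host<->device copy:
// a fixed latency plus the payload divided by the bandwidth.
double transfer_ms(std::size_t bytes, double latency_ms, double bytes_per_ms) {
    assert(bytes_per_ms > 0.0);  // bandwidth must be positive
    return latency_ms + static_cast<double>(bytes) / bytes_per_ms;
}

// End-to-end GPU time: upload the input, run the kernel, download the result.
double gpu_total_ms(double upload_ms, double kernel_ms, double download_ms) {
    return upload_ms + kernel_ms + download_ms;
}

// The GPU path is only worthwhile if it beats the CPU path end to end,
// not merely in kernel time.
bool gpu_wins(double cpu_ms, double gpu_ms) { return gpu_ms < cpu_ms; }
```

Under this model, a 256x256 image is only 65,536 bytes, so the fixed per-copy latency can dominate the GPU total even when the kernel itself is fast. That is one plausible reading of why a parallel CPU solution, which needs no copies, could come out ahead on images this small.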