CPU architectures are optimized for serial programs but hide memory latency and switch threads more slowly than GPUs. GPU architectures devote most of their transistors to processing elements, enabling parallel computation over large data sets, with fast shared memory available to threads within a block. CUDA extends C with constructs for general-purpose GPU programming, organizing threads into a hierarchy of blocks and grids. Matrix addition can be accelerated on a GPU with CUDA by assigning each thread to compute one element of the result. Results show that GPU acceleration yields speedups over CPUs for tasks in radio astronomy imaging pipelines.
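The one-thread-per-element scheme described above can be sketched in CUDA C as follows. This is a minimal illustration, not code from the source: the kernel name `matAdd`, the matrix size `n`, and the 16×16 block shape are assumed example choices.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread computes one element of C = A + B for n x n matrices
// stored in row-major order.
__global__ void matAdd(const float *A, const float *B, float *C, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        int i = row * n + col;
        C[i] = A[i] + B[i];
    }
}

int main(void)
{
    const int n = 256;                 // example size (assumption)
    size_t bytes = (size_t)n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);      // unified memory for brevity
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    // 16x16 threads per block; enough blocks in the grid to cover the matrix.
    dim3 threads(16, 16);
    dim3 blocks((n + threads.x - 1) / threads.x,
                (n + threads.y - 1) / threads.y);
    matAdd<<<blocks, threads>>>(A, B, C, n);
    cudaDeviceSynchronize();

    printf("C[0] = %f\n", C[0]);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The grid/block dimensions mirror the block-and-grid thread hierarchy the text describes: the bounds check inside the kernel lets the grid over-cover matrices whose sides are not multiples of the block size.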