The document provides an introduction to GPU programming using CUDA. It outlines GPU and CPU architectures, the CUDA programming model involving threads, blocks and grids, and CUDA C language extensions. It also discusses compilation with NVCC, memory hierarchies, profiling code with Valgrind/Callgrind, and Amdahl's law in the context of parallelization. A simple CUDA program example is provided to demonstrate basic concepts like kernel launches and data transfers between host and device memory.