This paper presents an implementation of the 8x8 block discrete cosine transform (DCT) using NVIDIA CUDA, covering both sequential and parallel processing approaches. The authors demonstrate that GPU acceleration substantially improves performance over CPU implementations, particularly for data-intensive image and video coding applications. The findings indicate that a GPU-optimized DCT can meet real-time digital signal processing requirements while preserving the quality of the results.
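
For readers unfamiliar with the transform itself, the following is a minimal CPU reference sketch of an 8x8 block DCT, written in plain Python. It is not the authors' CUDA implementation. The function names (`dct_matrix`, `dct8x8`, `idct8x8`) are hypothetical, and the orthonormal scaling of the DCT-II basis is an assumption, since the summary does not state which normalization the paper uses. The sketch computes the 2D transform separably as C · X · Cᵀ, where C is the 8x8 DCT-II basis matrix.

```python
import math

N = 8  # block size named in the text


def dct_matrix(n=N):
    """Build the DCT-II basis matrix C (orthonormal scaling is an assumption).

    C[k][i] = a(k) * cos((2i + 1) * k * pi / (2n)),
    with a(0) = sqrt(1/n) and a(k) = sqrt(2/n) for k > 0.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct8x8(block):
    """Forward 2D DCT of one 8x8 block, computed separably as C * X * C^T."""
    return matmul(matmul(C, block), CT)


def idct8x8(coeffs):
    """Inverse 2D DCT of one 8x8 block: C^T * Y * C (valid because C is orthonormal)."""
    return matmul(matmul(CT, coeffs), C)


if __name__ == "__main__":
    # A constant block has all of its energy in the DC coefficient.
    flat = dct8x8([[1.0] * N for _ in range(N)])
    print(round(flat[0][0], 6))
```

Each 8x8 block is transformed independently of every other block, which is the property that makes block DCT a natural fit for data-parallel hardware such as a GPU.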