This document describes a proposed VLSI implementation of a high-speed DCT architecture for H.264 video codec design. It presents a Booth radix-8 multiplier-based multiply-accumulate (MAC) unit to improve throughput and minimize area complexity for 8x8 2D DCT computation. The proposed MAC architecture achieves a maximum operating frequency of 129.18MHz while reducing area by 64% compared to a regular merged MAC unit with a conventional multiplier. FPGA implementation and performance analysis demonstrate the suitability of the proposed DCT architecture for applications in HDTV systems.