This document describes a low power pipelined FFT processor architecture based on the Radix-4 single delay commutator (R4SDC) algorithm. It implements and compares 16, 64, and 256-point FFT architectures using conventional R4SDC, a complex multiplier, and a multiplier-less architecture based on common subexpression technique. For the 16-point FFT, an ordered R4SDC architecture is proposed that reorders coefficients and inputs to minimize switching activity and reduce power consumption compared to the conventional design. Simulation results show the area and power requirements of the different architectural implementations.