This paper proposes an efficient Montgomery multiplication algorithm for modular multiplication that uses a single-level carry-save adder to avoid carry propagation and reduce hardware costs. A configurable carry-save adder is also introduced to decrease the extra clock cycles needed for operand precomputation and format conversion by half. Additionally, a mechanism is developed to detect and skip unnecessary carry-save addition operations to maintain short critical path delays while hiding the extra clock cycles. Experimental results showed that the proposed Montgomery modular multiplier achieves higher performance and lower area-time costs compared to previous designs.