The document describes a Montgomery modular multiplication architecture built around a modified carry-save adder (MCSA). Compared with a configurable carry-save adder, the MCSA speeds up the pre-computation and format-conversion steps of the Montgomery algorithm. The proposed SCS-MCSA-based Montgomery modular multiplier was implemented on a Xilinx FPGA, where it achieved lower hardware cost and a shorter critical path delay than previous designs. Simulation results show that it requires fewer logic resources and completes a modular multiplication in 8.203 ns.
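
To make the role of carry-save addition and format conversion concrete, the following is a minimal software sketch of a generic radix-2 Montgomery multiplication that keeps its running total as a (sum, carry) pair. It is not the SCS-MCSA architecture summarized above; the function names `carry_save_add` and `montgomery_multiply`, the radix-2 formulation, and the parameter `k` (the bit width of the modulus) are assumptions chosen for illustration.

```python
def carry_save_add(x, y, z):
    """3:2 compression of non-negative integers.

    Returns (s, c) such that s + c == x + y + z, with no carry
    propagation across bit positions.
    """
    s = x ^ y ^ z                              # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1     # bitwise majority, shifted left
    return s, c


def montgomery_multiply(a, b, n, k):
    """Radix-2 Montgomery multiplication (illustrative sketch).

    Assumes n is odd, n < 2**k, and 0 <= a, b < n.
    Returns a * b * 2**(-k) mod n.
    """
    ss, sc = 0, 0  # running total held in carry-save form
    for i in range(k):
        a_i = (a >> i) & 1
        # Accumulate a_i * b without propagating carries.
        ss, sc = carry_save_add(ss, sc, b if a_i else 0)
        # q is the parity of the running total; adding q * n makes it even.
        q = (ss ^ sc) & 1
        ss, sc = carry_save_add(ss, sc, n if q else 0)
        # Both words are now even, so halving each word halves the total.
        ss >>= 1
        sc >>= 1
    # Format conversion: one carry-propagate addition back to binary.
    s = ss + sc
    # The total is below 2n here, so a single conditional subtraction suffices.
    return s - n if s >= n else s
```

In this sketch the only carry-propagating addition is the final `ss + sc`, which is the format-conversion step; in hardware that step and the pre-computation of operands are the parts the summary says the MCSA accelerates.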