The document proposes a parallel-parallel input single output (PPI-SO) design for matrix multiplication that reduces hardware resources compared to existing designs. It uses fewer multipliers and registers than existing designs, trading off increased completion time. Simulation results show the PPI-SO design uses 30% less energy and involves 70% less area-delay product than other designs.