This document describes how to parallelize matrix multiplication with OpenMP. It starts from the traditional sequential algorithm, then parallelizes its loops with OpenMP pragmas. Two further optimizations follow: caching the matrices in stack memory so that threads can access them more efficiently, and dividing the workload evenly among the threads. Measurements show that the parallel implementation performs significantly better than the sequential one.
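As a rough illustration of the first two steps, the sketch below places a sequential multiplication next to a version whose outer loop is parallelized with an OpenMP pragma. It is a minimal example under stated assumptions, not the implementation measured in this document: the function names `matmul_seq` and `matmul_omp`, the flat row-major `double` layout, and the `schedule(static)` clause (one common way to give each thread an equal share of rows) are all illustrative choices. The stack-memory caching step is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential baseline: c = a * b for n x n matrices stored row-major.
 * (Hypothetical helper name, chosen for this sketch.) */
void matmul_seq(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Parallel variant: the rows of c are distributed across threads.
 * schedule(static) splits the row range into near-equal contiguous
 * chunks, one per thread (an assumption about how "even" division
 * could be expressed, not necessarily the document's approach). */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        /* j, k and sum are declared inside the loop body, so each
         * thread gets its own private copies. */
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

Each thread writes to a disjoint set of rows of `c`, so no synchronization is needed inside the loop. With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and `matmul_omp` simply runs sequentially, which makes it easy to compare the two results for correctness.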