This document discusses parallelizing matrix multiplication, a common case study in compiler design and optimization. It begins by defining matrix multiplication and generating random square matrices as test input. It then describes the traditional sequential triple-nested-loop algorithm and introduces parallelizing the multiplication with parallel for loops. The key steps of an optimized parallel implementation are discussed: hoisting common calculations out of the inner loops, choosing a cache-friendly loop order, and using stack versus heap memory appropriately for large matrices. The resulting program uses OpenMP to distribute the nested for loops of the multiplication across multiple threads.