This document summarizes several dense matrix algorithms for operations like matrix-vector multiplication, matrix-matrix multiplication, and solving systems of linear equations.
For matrix-vector multiplication, it describes row-wise 1D partitioning and 2D partitioning. With one matrix element per process, the 2D approach has a lower parallel runtime of O(log n) but requires n² processes. A modified 2D approach uses block partitioning with p < n² processes and has a parallel runtime of O(n²/p + (n/√p) log p), where the first term is local computation and the second is communication.
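The block-partitioned 2D scheme can be illustrated with a minimal serial sketch. It only simulates the data layout on a √p × √p process grid and does no real message passing. The function name `matvec_2d_blocks` and the restriction that p is a perfect square whose root divides n are assumptions made for this illustration, not details taken from the summarized document.

```python
import math


def matvec_2d_blocks(A, x, p):
    """Simulate y = A x with A split into a sqrt(p) x sqrt(p) grid of blocks.

    Hypothetical helper for illustration; a real implementation would
    distribute the blocks over p processes (for example with MPI).
    """
    n = len(A)
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    b = n // q  # block edge length: each simulated process owns a b x b block

    # Step 1 (local work): process (i, j) multiplies its block by the
    # matching slice of x, about n^2 / p multiply-adds per process.
    partial = [[None] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            xs = x[j * b:(j + 1) * b]
            partial[i][j] = [
                sum(A[i * b + r][j * b + c] * xs[c] for c in range(b))
                for r in range(b)
            ]

    # Step 2 (reduction): sum the partial vectors along each process row.
    # In a parallel run this is the step that costs the log p factor.
    y = []
    for i in range(q):
        for r in range(b):
            y.append(sum(partial[i][j][r] for j in range(q)))
    return y


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
x = [1, 0, -1, 2]
y = matvec_2d_blocks(A, x, p=4)
```

Setting p = n² gives 1 × 1 blocks, which corresponds to the one-element-per-process case, while p = 1 reduces to the ordinary serial product.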
For matrix-matrix multiplication, a simple parallel algorithm and Cannon's algorithm are described. Cannon's algorithm is memory-efficient: by rotating matrix blocks among processes, it keeps total memory usage at O(n²). A DNS algorithm achieves optimal O(log n