This document discusses techniques for optimizing parallel programs for locality. It covers several key topics:
- Symmetric multiprocessors are a popular parallel architecture where all processors can access shared memory uniformly. Cache coherence protocols allow caches to share data while reading.
- Parallel programs must have good data locality to perform well. Techniques like grouping related operations on the same processor and executing code in succession on processors can improve locality.
- Loop-level parallelism is a common target for parallelization. Dividing loop iterations evenly across processors can achieve good parallelism if iterations are independent.