This document discusses the optimization techniques used in an auto-tuning framework for parallel stencil computations on multicore processors. Loop unrolling, cache blocking, and arithmetic simplification are implemented as transformations of the abstract syntax tree (AST). Cache blocking restructures the loop nest so that each block of data fits in cache, which exposes temporal locality and increases cache reuse. Beyond these serial optimizations, the framework applies parallelization strategies by modifying the AST to reflect the chosen strategy before code generation.
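
To make the cache blocking idea concrete, the following is a minimal sketch of the loop restructuring applied to a 2D 5-point stencil. It is not the framework's generated code: the function names, the averaging kernel, and the `block_i` and `block_j` parameters are assumptions chosen for illustration. In an auto-tuning setting the block sizes would be among the parameters searched over. Python is used here only to show the transformation; any cache benefit would appear in compiled code.

```python
def stencil_naive(grid):
    """Apply one 5-point averaging sweep to the interior of a 2D grid."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def stencil_blocked(grid, block_i=8, block_j=8):
    """Same sweep, with the interior traversed in block_i x block_j tiles.

    block_i and block_j are hypothetical tuning parameters; a tile is
    meant to be small enough that the rows it touches stay in cache.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    # Outer loops step over tiles of the interior region.
    for ii in range(1, n - 1, block_i):
        i_end = min(ii + block_i, n - 1)  # clip the last tile at the boundary
        for jj in range(1, m - 1, block_j):
            j_end = min(jj + block_j, m - 1)
            # Inner loops visit every point inside the current tile.
            for i in range(ii, i_end):
                for j in range(jj, j_end):
                    out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                                       + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Both functions evaluate the same expression at every interior point, so they return identical results. Only the traversal order differs, which is what makes blocking a safe transformation for a framework to apply automatically.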