This document summarizes research on optimizing an explicit finite-difference scheme for fluid dynamics simulations to run efficiently on many-core systems such as the PEZY-SC2 processor. The researchers developed a code generation framework that applies temporal blocking to compensate for low memory bandwidth. On a PEZY-SC2 system with 16 million cores, they achieved 4.78 PFlops at 21.5% efficiency, comparable to results reported on machines with higher memory bandwidth. Temporal blocking reduced the required memory bandwidth and enabled good weak scaling to larger core counts.
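
To make the idea of temporal blocking concrete, the following is a minimal sketch in Python. It is not the authors' framework or their generated code. It assumes a 1D three-point diffusion stencil with fixed end cells and uses overlapped tiling, which is only one of several temporal blocking schemes. The names `step`, `naive`, `temporally_blocked`, `block`, and `tb` are hypothetical and chosen for illustration.

```python
def step(u, c=0.25):
    """One explicit 3-point diffusion update; the two end cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, block=8, tb=4):
    """Advance up to `tb` time steps per pass over the array.

    Each spatial block is loaded together with a halo of k cells on each
    side, advanced k steps locally, and only its interior is written back.
    Errors from the artificial tile edges travel one cell per step, so
    after k steps they have not yet reached the block interior.
    """
    n = len(u)
    done = 0
    while done < steps:
        k = min(tb, steps - done)      # steps fused in this pass
        out = u[:]
        for s in range(0, n, block):
            e = min(s + block, n)
            lo = max(s - k, 0)         # halo, clipped at the true boundary
            hi = min(e + k, n)
            tile = u[lo:hi]            # one read from "main memory"
            for _ in range(k):
                tile = step(tile)      # k updates on the local copy
            out[s:e] = tile[s - lo:e - lo]
        u = out
        done += k
    return u


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(37)]
    a = naive(u0, 10)
    b = temporally_blocked(u0, 10, block=8, tb=4)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

In this sketch each tile is read once and then updated `k` times, so traffic to the full array drops by roughly a factor of `k` per time step, at the cost of recomputing the halo cells. That trade of extra arithmetic for less memory traffic is what makes the technique attractive on hardware where bandwidth, not compute, is the bottleneck.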