This document describes a hardware-accelerated approach to N-body simulation on an FPGA. It presents a semi-dataflow architecture and a tiling technique that compute pairwise forces efficiently while reducing data transfers. In the evaluation, the FPGA implementation reaches 4400 million particle-pairs per second, outperforming a CPU and delivering high performance per watt relative to the other platforms compared. Future work includes connecting the accelerator to the host over PCIe and further improving performance per watt.
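The general idea behind tiling pairwise force computation can be illustrated with a small software model. This is a sketch under stated assumptions, not the authors' FPGA design. The particle layout `(x, y, z, mass)` with G = 1, the softening constant `EPS2`, the tile size, and the function names `pair_accel`, `direct_accelerations` and `tiled_accelerations` are all hypothetical. The `loads` counter only tallies particle fetches as a stand-in for off-chip transfers. The point it shows is that once a tile of i-particles sits in on-chip memory, each streamed j-particle is reused by the whole tile rather than fetched again for every i.

```python
import math

# Assumed particle layout for this sketch: (x, y, z, mass), with G = 1.
EPS2 = 1e-4  # hypothetical softening term; it also makes the i == j term a safe zero


def pair_accel(pi, pj, eps2=EPS2):
    """Softened gravitational acceleration on particle pi due to particle pj."""
    dx, dy, dz = pj[0] - pi[0], pj[1] - pi[1], pj[2] - pi[2]
    r2 = dx * dx + dy * dy + dz * dz + eps2
    s = pj[3] / (r2 * math.sqrt(r2))
    return dx * s, dy * s, dz * s


def _accumulate(a, pi, pj):
    """Add the contribution of pj to the running acceleration a of pi."""
    ax, ay, az = pair_accel(pi, pj)
    a[0] += ax
    a[1] += ay
    a[2] += az


def direct_accelerations(particles):
    """Untiled baseline: every i re-fetches all n particles (n + n*n loads)."""
    acc, loads = [], 0
    for pi in particles:
        loads += 1  # fetch particle i
        a = [0.0, 0.0, 0.0]
        for pj in particles:
            loads += 1  # fetch particle j again for every i
            _accumulate(a, pi, pj)
        acc.append(a)
    return acc, loads


def tiled_accelerations(particles, tile):
    """Tiled model: each fetched j-tile is reused by a whole resident i-tile."""
    n = len(particles)
    acc, loads = [], 0
    for i0 in range(0, n, tile):
        i_tile = particles[i0:i0 + tile]  # i-tile held in a (modelled) on-chip buffer
        loads += len(i_tile)
        local = [[0.0, 0.0, 0.0] for _ in i_tile]
        for j0 in range(0, n, tile):
            j_tile = particles[j0:j0 + tile]  # one fetch per j-particle per i-tile
            loads += len(j_tile)
            # In hardware, these inner loops would map to parallel pipelines.
            for a, pi in zip(local, i_tile):
                for pj in j_tile:
                    _accumulate(a, pi, pj)
        acc.extend(local)
    return acc, loads
```

Both functions evaluate the same n² particle pairs and, because the j-order is preserved, they produce identical accelerations. Only the fetch count differs. With n = 64 and a tile of 16, the baseline performs 4160 fetches and the tiled loop performs 320 (n + (n/T)·n), so off-chip traffic shrinks by roughly a factor of the tile size. For scale, counting every ordered pair, one all-pairs step with n = 65,536 involves about 4.29 × 10⁹ pairs. At the reported 4400 million pairs per second, that step would take roughly one second.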