This document discusses refactoring the RAMSES code to improve performance on both CPUs and GPUs. The goals are to identify the GPU-friendly parts of the code, minimize data transfers between the CPU and GPU, and implement GPU-to-GPU communication. The communication infrastructure was redesigned to use collective communication routines instead of point-to-point messages, which gives better GPU support. The Poisson solver was ported to the GPU, and its communicator structures were replaced with regular arrays to improve data locality. Initial results show that the GPU version can run more than 1.5 times faster than the CPU version on 8 cores. Ongoing work includes optimizing data transfers and porting the parts that do not run on the GPU to OpenMP.
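The switch from per-neighbor point-to-point messages to a collective, together with the move from communicator structures to regular arrays, can be illustrated with a small sketch. It assumes an MPI_Alltoallv-style exchange. The summary does not say which collective RAMSES actually uses. In this layout, each rank flattens its per-destination lists into one contiguous buffer plus `counts` and `displs` arrays. RAMSES itself is written in Fortran, and the function names below (`pack_for_alltoallv`, `simulate_alltoallv`) are hypothetical. The exchange is simulated serially rather than run through MPI.

```python
from itertools import accumulate


def pack_for_alltoallv(per_rank_lists):
    """Flatten one list of values per destination rank into a contiguous buffer.

    Hypothetical helper. Returns (buffer, counts, displs) in the layout an
    MPI_Alltoallv-style collective expects: the data for rank r occupies
    buffer[displs[r]:displs[r] + counts[r]].
    """
    counts = [len(items) for items in per_rank_lists]
    # Exclusive prefix sum of counts: starting offset of each rank's segment.
    displs = list(accumulate(counts, initial=0))[:-1]
    # One flat array instead of a list of per-neighbor structures.
    buffer = [x for items in per_rank_lists for x in items]
    return buffer, counts, displs


def simulate_alltoallv(send):
    """Serial stand-in for the collective: send[src] = (buffer, counts, displs).

    For each destination rank, concatenates the segments every source rank
    addressed to it, ordered by source rank (as MPI_Alltoallv does).
    """
    nranks = len(send)
    recv = []
    for dst in range(nranks):
        chunk = []
        for src in range(nranks):
            buf, counts, displs = send[src]
            chunk.extend(buf[displs[dst]:displs[dst] + counts[dst]])
        recv.append(chunk)
    return recv


# Usage: three ranks, each with a list of values destined for ranks 0, 1, 2.
outgoing = [
    [[], [1, 2], [3]],        # rank 0
    [[10], [], [11, 12]],     # rank 1
    [[20, 21], [22], []],     # rank 2
]
packed = [pack_for_alltoallv(lists) for lists in outgoing]
received = simulate_alltoallv(packed)
print(received)
```

Packing into flat arrays with counts and offsets lets a single collective call replace a loop of matched sends and receives. The same contiguous layout is also what a GPU kernel, or a GPU-aware MPI library, can operate on directly.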