HybridMPI (HMPI) is an MPI implementation optimized for shared-memory communication on multi-core systems. HMPI uses a process-based design rather than a thread-based one to avoid locking overhead, and it allocates message buffers from a shared-memory heap so that processes on the same node can exchange data without copying. For small messages, HMPI uses an immediate transfer protocol that avoids extra cache misses; for large messages, it uses a synergistic transfer protocol in which sender and receiver cooperate to achieve high bandwidth. Evaluation shows that HMPI incurs fewer cache misses and outperforms traditional MPI implementations in applications such as MiniMD and LULESH.
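The size-based switch between the two transfer paths can be sketched as below. This is a minimal illustration under stated assumptions, not HMPI's actual API: the `msg_slot` struct, the function names, and the 512-byte cutoff are all hypothetical, and the large-message path is reduced to a pointer handoff (HMPI's synergistic protocol additionally has both ranks copy the data cooperatively).

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch of a size-based protocol switch; not HMPI's API. */
#define SMALL_MSG_LIMIT 512  /* assumed small-message cutoff, in bytes */

typedef struct {
    size_t      len;
    const void *payload;                     /* large path: pointer into shared heap */
    char        inline_buf[SMALL_MSG_LIMIT]; /* small path: immediate copy target */
    int         is_inline;
} msg_slot;

/* Immediate protocol: the sender copies the data straight into the
 * receiver's slot, so the receiver touches only one small, likely
 * cache-resident buffer. */
static void send_immediate(msg_slot *slot, const void *data, size_t len) {
    memcpy(slot->inline_buf, data, len);
    slot->len = len;
    slot->is_inline = 1;
}

/* Shared-heap handoff: because sender and receiver share the same heap,
 * a large message can be published by pointer with no memcpy at all. */
static void send_shared_heap(msg_slot *slot, const void *data, size_t len) {
    slot->payload = data;
    slot->len = len;
    slot->is_inline = 0;
}

/* Receiver side: pick up the message from whichever path was used. */
static const void *recv_msg(const msg_slot *slot) {
    return slot->is_inline ? (const void *)slot->inline_buf : slot->payload;
}

/* Sender side: choose the protocol based on message size. */
static void post_send(msg_slot *slot, const void *data, size_t len) {
    if (len <= SMALL_MSG_LIMIT)
        send_immediate(slot, data, len);
    else
        send_shared_heap(slot, data, len);
}
```

The design choice this sketch mirrors is that copying pays off only while the message fits comfortably in cache; beyond that, avoiding the copy (or splitting it across both ranks) wins on bandwidth.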