This document summarizes a parallel GPU implementation of a solver for the traveling salesman problem (TSP). The solver uses iterative hill climbing: each climber starts from a random initial tour and refines it with 2-opt moves until it reaches a local minimum, that is, until no further move shortens the tour. The algorithm was optimized for GPUs by distributing independent climbers across threads and by minimizing memory accesses through techniques such as caching the distance matrix in shared memory. In an evaluation on a Tesla GPU, the implementation was 7.8x faster than an 8-core CPU implementation and produced optimal tours in 4 out of 5 test cases using 100,000 to 200,000 climbers.
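
The search described above can be illustrated with a minimal sequential sketch in Python. It is not the GPU code from the summarized work: the function names (`distance_matrix`, `tour_length`, `two_opt_climb`, `iterative_hill_climbing`) are hypothetical, and the first-improvement move selection is an assumption, since the summary does not say how moves are chosen. Each pass of the outer loop in `iterative_hill_climbing` is one independent climber, which is the unit of work the summary says is assigned to a GPU thread.

```python
import math
import random


def distance_matrix(points):
    """Precompute Euclidean distances between all pairs of cities."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def two_opt_climb(tour, dist):
    """Apply improving 2-opt moves in place until none remains.

    Assumption: first-improvement selection (take any move that helps).
    """
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city; no valid move
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d).
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    # Reversing the segment between the edges performs the swap.
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def iterative_hill_climbing(dist, climbers, seed=0):
    """Run independent climbers from random tours and keep the best result."""
    rng = random.Random(seed)
    n = len(dist)
    best_tour, best_len = None, math.inf
    for _ in range(climbers):  # each iteration is one independent climber
        tour = list(range(n))
        rng.shuffle(tour)
        two_opt_climb(tour, dist)
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

Because the climbers share no state apart from the read-only distance matrix, they can run in any order or concurrently, which is what makes the approach a natural fit for one climber per thread.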