The document discusses improving the performance of the Tabu Search algorithm for the Permutation Flowshop Scheduling Problem (PFSP) on CUDA GPUs by avoiding duplicated computation among threads. It first provides background on GPU architecture, the PFSP, and related parallelization methods. It then observes that if two permutations share a prefix, the leading columns of their completion time tables, as many columns as the prefix is long, are identical, so computing them separately for each permutation duplicates work. The document proposes an approach in which each thread is assigned a permutation and allocated shared memory to store and compute the completion time table in parallel, avoiding this duplicated work by leveraging the shared prefix property. Experimental results show the new approach runs up to 1.5 times faster than an existing
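
The shared prefix property described above can be illustrated with a small sequential sketch. It is plain Python rather than CUDA and is not the authors' implementation. It uses the standard PFSP recurrence, in which the completion time of the k-th scheduled job on machine i is the larger of the completion times on the previous machine and of the previous job, plus the job's processing time. The names `completion_table` and `shared_prefix_len`, the `reuse`/`start` parameters, and the processing times in `proc` are assumptions made up for illustration.

```python
def completion_table(perm, proc, reuse=None, start=0):
    """Build the completion time table for one permutation.

    proc[i][j] is the processing time of job j on machine i.
    C[i][k] is the completion time on machine i of the k-th job in perm.
    If reuse is given, its first `start` columns are copied instead of
    recomputed (hypothetical parameters, for illustration only).
    """
    m, n = len(proc), len(perm)
    C = [[0] * n for _ in range(m)]
    if reuse is not None:
        for i in range(m):
            C[i][:start] = reuse[i][:start]
    for k in range(start, n):
        job = perm[k]
        for i in range(m):
            prev_machine = C[i - 1][k] if i > 0 else 0
            prev_job = C[i][k - 1] if k > 0 else 0
            C[i][k] = max(prev_machine, prev_job) + proc[i][job]
    return C


def shared_prefix_len(a, b):
    """Number of leading positions at which two permutations agree."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


# Made-up instance: 3 machines, 4 jobs.
proc = [
    [3, 2, 4, 1],
    [2, 5, 1, 3],
    [4, 1, 3, 2],
]
p = [0, 1, 2, 3]
q = [0, 1, 3, 2]  # shares the prefix [0, 1] with p

Cp = completion_table(p, proc)
Cq_full = completion_table(q, proc)

# Reuse the shared columns of Cp and compute only the remaining ones.
k = shared_prefix_len(p, q)
Cq_reused = completion_table(q, proc, reuse=Cp, start=k)
```

Here `Cq_reused` equals `Cq_full`, but only the last two columns were computed for `q`. The makespan of each permutation is the bottom-right entry of its table.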