The document proposes FAVOR, a method to improve the efficiency of attention in Transformers. FAVOR reduces the computational complexity of attention from O(L^2 d) to O(Lrd) by mapping queries and keys to an r-dimensional feature space before attention, so the full L x L attention matrix never has to be formed. It replaces the softmax with a kernel-based approximation that is more numerically stable during training. Experiments show that FAVOR achieves faster inference and higher accuracy than baseline Transformers.
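As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the NumPy code below maps queries and keys to r positive random features whose dot products approximate the softmax kernel, then reorders the matrix products so the cost is O(Lrd) rather than O(L^2 d). The function names, the feature construction, and the choice of num_features are assumptions made for this example.

```python
import numpy as np

def softmax_kernel_features(x, projection, eps=1e-6):
    """Map (L, d) inputs to r positive random features whose dot products
    approximate the exp(q.k / sqrt(d)) softmax kernel (illustrative form)."""
    d = x.shape[-1]
    x = x / d ** 0.25                                      # split the 1/sqrt(d) scaling
    wx = x @ projection.T                                  # (L, r)
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    r = projection.shape[0]
    return np.exp(wx - sq_norm) / np.sqrt(r) + eps         # positive features

def favor_style_attention(q, k, v, num_features=256, seed=0):
    """Linear-time attention sketch: cost O(L * r * d) instead of O(L^2 * d)."""
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((num_features, d))    # random projection matrix
    q_prime = softmax_kernel_features(q, projection)       # (L, r)
    k_prime = softmax_kernel_features(k, projection)       # (L, r)
    # Reorder the products: Q'(K'^T V) avoids the L x L attention matrix.
    kv = k_prime.T @ v                                     # (r, d)
    normalizer = q_prime @ k_prime.sum(axis=0)             # (L,)
    return (q_prime @ kv) / normalizer[:, None]            # (L, d)

# Example usage on random data:
L, d = 512, 64
q, k, v = (np.random.randn(L, d) for _ in range(3))
out = favor_style_attention(q, k, v)
print(out.shape)  # (512, 64)
```

The key design point illustrated here is that once queries and keys are expressed as r-dimensional features, attention reduces to two small matrix products, which is where the reduction from quadratic to linear dependence on sequence length L comes from.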