This document surveys techniques for adapting transformer models to long input sequences. It discusses the quadratic cost of full self-attention that limits standard transformers and presents approaches that mitigate it, including Sparse Transformers, Transformer-XL, Reformer, Routing Transformer, and Longformer, each with its own algorithmic complexity and strategy for restructuring the attention mechanism. The document also includes references to the foundational research and implementation resources for these transformer variants.
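
To make the bottleneck concrete, here is a minimal sketch (not taken from any of the surveyed papers) of standard single-head scaled dot-product attention in NumPy. The key point is that it materializes an n x n score matrix, so memory and compute grow with the square of the sequence length n; the sizes used below are illustrative assumptions.

```python
import numpy as np

def full_attention(q, k, v):
    """Single-head scaled dot-product attention; materializes an (n, n) score matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (n, n) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                  # (n, d) output

# Small concrete run (illustrative sizes).
n, d = 512, 64
q = k = v = np.random.randn(n, d).astype(np.float32)
print(full_attention(q, k, v).shape)                    # (512, 64)

# Memory of the float32 score matrix alone, per head and per layer, as n grows.
for n in (1_024, 4_096, 16_384, 65_536):
    print(f"n={n:>6}: score matrix ~{n * n * 4 / 1e9:.2f} GB")
```

The variants covered in this survey avoid building that dense n x n matrix, for example by attending only over sparse or local patterns, or by grouping similar queries and keys.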