The document introduces the H-Transformer-1D model for fast, one-dimensional hierarchical attention on sequences. It begins by discussing the self-attention mechanism in Transformers and how it has achieved state-of-the-art results across many tasks. However, self-attention has a computational complexity of O(n²), because it computes attention scores between every pair of tokens and therefore materializes an n × n matrix, which becomes a bottleneck for long sequences. The document then reviews related work that aims to reduce this complexity through techniques such as sparse attention. It proposes borrowing the H-matrix and multigrid methods from numerical analysis to hierarchically decompose the attention matrix, so that most of it can be treated at coarse resolution and the effective structure becomes sparse. The following sections explain how this idea is applied in H-Transformer-1D and how it can be implemented.
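To make the quadratic bottleneck concrete, here is a minimal NumPy sketch of vanilla softmax attention (single head, no masking or batching, names chosen for illustration). It is not the H-Transformer-1D algorithm; it simply shows that standard attention forms an n × n score matrix, which is the O(n²) cost the paper sets out to avoid.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard softmax attention: forms the full n x n score matrix,
    so both time and memory scale as O(n^2) in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n, n) -- the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d) output

# Illustrative sizes only: a 1024-token sequence already needs a 1024 x 1024 matrix.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
```

Hierarchical approaches like the one described later avoid ever building this full matrix by approximating the far-off-diagonal interactions at progressively coarser resolution.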