- The document proposes a novel method to compute relevancy for Transformer networks by propagating relevance through attention maps in a layer-wise manner, rather than averaging across layers or relying on simplistic assumptions.
- It introduces a non-parametric relevance propagation method based on Taylor decomposition to backpropagate relevance through attention, skip connections, and other operations in each layer separately.
- Experiments show that the proposed layer-wise relevance propagation method outperforms existing methods at identifying relevant regions in images and is more robust to input perturbations.
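To illustrate the general idea of layer-wise relevance propagation, here is a minimal sketch of the classic epsilon-stabilized LRP rule for a single linear layer, which distributes output relevance back to the inputs in proportion to their Taylor-style contributions. This is a generic textbook rule, not the paper's exact propagation scheme for attention and skip connections; the function name `lrp_linear` and all values are illustrative.

```python
import numpy as np

def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Redistribute relevance through y = x @ w with the epsilon-rule.

    Each input x_i receives relevance proportional to its contribution
    z_ij = x_i * w_ij to each output y_j (a first-order Taylor view).
    The eps term stabilizes division when y_j is close to zero.
    """
    z = x[:, None] * w                          # contributions z_ij, shape (in, out)
    y = z.sum(axis=0)                           # pre-activations y_j
    s = relevance_out / (y + eps * np.sign(y))  # stabilized per-output ratio
    return z @ s                                # relevance assigned to each x_i
```

A key property of such rules is (approximate) conservation: the total relevance flowing into a layer equals the total flowing out, so relevance is redistributed rather than created or destroyed as it moves toward the input.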