TransPose proposes a Transformer-based model for human pose estimation that aims to improve explainability. It applies a Transformer encoder to feature maps from an image to estimate keypoint heatmaps. Self-attention can visualize relationships between pixels. The model achieves comparable accuracy to CNN models but with 73% fewer parameters and faster speed. Heatmap visualizations show which locations influence each joint the most.