This document summarizes the paper "Attention Is All You Need," which introduces the Transformer, a model architecture built entirely on attention mechanisms, with no recurrent or convolutional layers. The architecture combines multi-head attention with position-wise feed-forward networks and applies regularization techniques such as dropout and label smoothing. Evaluated on the WMT 2014 English-to-German and English-to-French machine translation tasks, the Transformer outperforms previously reported sequence-transduction models.
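At the core of the attention mechanism described here is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The sketch below is a minimal NumPy illustration of that formula, assuming the standard single-head, unbatched case; the function and variable names are illustrative, not taken from any official codebase:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 queries and keys of dimension 4, values of dimension 2
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 2))
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 2)
```

Multi-head attention, as described in the paper, runs several such attention functions in parallel on learned linear projections of Q, K, and V, then concatenates and projects the results.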