The document describes a deep recurrent neural network model with multi-head attention mechanisms for punctuation restoration. The model stacks multiple bidirectional recurrent layers to encode context and applies multi-head attention to the outputs of each recurrent layer, capturing hierarchical features at different depths. Evaluated on an English speech transcription dataset, the proposed model outperforms previous methods that rely on convolutional or recurrent neural networks alone or augment them with only single-head attention.
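To make the described architecture concrete, the following is a minimal PyTorch sketch of the general idea: stacked bidirectional recurrent layers, each followed by multi-head self-attention over that layer's outputs, with a per-token classifier for punctuation labels. The class name, the choice of GRU cells, the residual combination of recurrent and attention outputs, and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StackedBiRNNAttnPunctuator(nn.Module):
    """Sketch: stacked bidirectional GRUs, each with multi-head self-attention,
    for token-level punctuation restoration (illustrative, not the paper's model)."""

    def __init__(self, vocab_size, num_classes, emb_dim=256, hidden=256,
                 num_layers=3, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnns = nn.ModuleList()
        self.attns = nn.ModuleList()
        in_dim = emb_dim
        for _ in range(num_layers):
            # Bidirectional GRU: output dimension is 2 * hidden
            self.rnns.append(nn.GRU(in_dim, hidden, batch_first=True,
                                    bidirectional=True))
            # Multi-head self-attention applied to this layer's outputs
            self.attns.append(nn.MultiheadAttention(2 * hidden, num_heads,
                                                    batch_first=True))
            in_dim = 2 * hidden
        # Per-token classifier over the top layer's features
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (batch, seq_len, emb_dim)
        for rnn, attn in zip(self.rnns, self.attns):
            h, _ = rnn(x)                          # (batch, seq_len, 2 * hidden)
            a, _ = attn(h, h, h)                   # self-attention on layer output
            x = h + a                              # residual combination (assumption)
        return self.classifier(x)                  # punctuation logits per token
```

A usage example under the same assumptions, with four output classes (e.g. no punctuation, comma, period, question mark):

```python
model = StackedBiRNNAttnPunctuator(vocab_size=30000, num_classes=4)
logits = model(torch.randint(0, 30000, (2, 50)))   # shape: (2, 50, 4)
```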