This document summarizes work on automatic video captioning carried out as part of the TRECVid video-to-text task. The task is motivated by the exponential growth of multimedia content, which makes manual annotation impractical. The dataset consisted of over 50k Vine videos, each manually captioned. Participants explored several approaches to caption generation, including keyframe extraction and spatial cropping of the input frames. Generated captions were evaluated with BLEU, METEOR, CIDEr, and semantic textual similarity (STS) metrics. Results showed that although automatic systems perform well, human-written captions remain clearly superior. Future work may incorporate the audio track and generate descriptions that take temporal context into account.
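
Keyframe extraction typically reduces each clip to one or a few representative frames that are then passed to an image-captioning model. As a rough illustration only (the document does not specify how participants selected keyframes), the sketch below grabs the middle frame of a video with OpenCV; the file paths and the helper name are placeholders.

```python
import cv2  # OpenCV; used here purely for illustration

def extract_middle_keyframe(video_path: str, out_path: str) -> bool:
    """Save the middle frame of a video as a single representative keyframe."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        return False
    # Seek to the middle frame; a real system might instead sample several
    # frames or pick them via shot-boundary detection.
    cap.set(cv2.CAP_PROP_POS_FRAMES, total // 2)
    ok, frame = cap.read()
    cap.release()
    if ok:
        cv2.imwrite(out_path, frame)
    return ok

# Hypothetical usage on one Vine clip:
# extract_middle_keyframe("clip_0001.mp4", "clip_0001_key.jpg")
```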
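
The n-gram metrics compare a generated caption against one or more human references. As a minimal sketch (not the official TRECVid scoring pipeline), sentence-level BLEU can be computed with NLTK as follows; the example captions are invented for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example captions; the real task scores system output against
# the manually written Vine captions.
reference = "a man rides a skateboard down a crowded street".split()
candidate = "a man is skateboarding on a street".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short captions.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```

METEOR and CIDEr complement BLEU by rewarding stem/synonym matches and consensus across multiple references, respectively, while semantic textual similarity compares sentence meaning rather than surface n-gram overlap.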