This document discusses using automatic speech recognition (ASR) to generate transcripts for MOOC videos. It notes that manually transcribing a video takes 10 times as long as the video's running time. The document then explains how ASR works, how accurate it is, and how the University of Valencia used ASR to produce transcripts for 30 MOOC courses in 70% less time than manual transcription. It also describes how to improve ASR quality by training models on related transcribed audio and text corpora. Overall, the document argues that ASR is mature enough to help with captioning, provided the transcripts are then reviewed.
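
As a rough illustration of the time figures quoted in the summary, the Python sketch below estimates transcription effort for a given amount of video. Only the 10x manual factor and the 70% saving come from the summary. The function names, the default parameters, and the 2-hour example are assumptions made for illustration, and the sketch assumes the 70% saving applies to the total manual effort, review included.

```python
def manual_hours(video_hours: float, manual_factor: float = 10.0) -> float:
    """Estimated effort to transcribe by hand (10x the running time, per the summary)."""
    return video_hours * manual_factor


def asr_review_hours(
    video_hours: float,
    manual_factor: float = 10.0,
    saving: float = 0.70,
) -> float:
    """Estimated effort when ASR drafts the transcript and a person reviews it.

    Assumption: the 70% saving is measured against the manual estimate.
    """
    return manual_hours(video_hours, manual_factor) * (1.0 - saving)


if __name__ == "__main__":
    video = 2.0  # hypothetical 2-hour lecture series
    print(f"manual:       {manual_hours(video):.1f} h")
    print(f"ASR + review: {asr_review_hours(video):.1f} h")
```

Under these assumptions, reviewing an ASR draft costs roughly three times the running time of the video rather than ten.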