This document gives an overview of building an automatic speech recognition (ASR) engine. It presents speech as a natural, high-throughput input modality, and notes that any system built on it must account for errors in the ASR output. It describes the components of a dialogue system: the ASR, natural language understanding, a dialogue manager, and text-to-speech. It then looks inside the recognizer: feature extraction with mel-frequency cepstral coefficients (MFCCs), the acoustic model, the language model, the lexicon and how it is built, and decoding with techniques such as beam search. It also covers deep learning approaches to ASR.
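
As a rough illustration of how the recognizer's pieces fit together at decoding time, the sketch below runs a toy beam search that adds acoustic-model and language-model log-probabilities. Everything in it is an assumption made for illustration: the two-symbol vocabulary, the scores in `ACOUSTIC` and `BIGRAM`, the `beam_search` function and its `beam_width` and `lm_weight` parameters are hypothetical and do not come from the document. It also emits exactly one symbol per step, which a real decoder working over frames, HMM states, and a lexicon does not.

```python
import math

# Hypothetical acoustic scores: log P(observation at step t | symbol).
ACOUSTIC = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"a": math.log(0.3), "b": math.log(0.7)},
    {"a": math.log(0.5), "b": math.log(0.5)},
]

# Hypothetical bigram language model P(next | previous); "<s>" marks the start.
BIGRAM = {
    ("<s>", "a"): 0.5, ("<s>", "b"): 0.5,
    ("a", "a"): 0.2, ("a", "b"): 0.8,
    ("b", "a"): 0.9, ("b", "b"): 0.1,
}


def beam_search(acoustic, bigram, beam_width=2, lm_weight=1.0):
    """Return the best symbol sequence and its combined log score."""
    # Each hypothesis is (accumulated log score, symbol history).
    beams = [(0.0, ["<s>"])]
    for step_scores in acoustic:
        candidates = []
        for score, history in beams:
            for symbol, acoustic_logp in step_scores.items():
                lm_logp = math.log(bigram[(history[-1], symbol)])
                new_score = score + acoustic_logp + lm_weight * lm_logp
                candidates.append((new_score, history + [symbol]))
        # Prune: keep only the beam_width highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_history = beams[0]
    return best_history[1:], best_score  # drop the "<s>" marker


best_seq, best_score = beam_search(ACOUSTIC, BIGRAM)
print(best_seq, round(math.exp(best_score), 4))
```

With these made-up scores the language model overrides the acoustic model at the second step's neighbors: the search prefers alternating symbols because the bigram table favors them. Narrowing `beam_width` trades accuracy for speed, since a hypothesis pruned early cannot be recovered later.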