Automatic speech recognition (ASR) is the technology that converts spoken language into written text. There are two main approaches: statistical systems that apply separate acoustic, pronunciation, and language models in sequence; and end-to-end systems in which deep neural networks handle feature extraction, acoustic modeling, and language modeling jointly. Challenges for ASR systems include background noise, variation in speakers' accents and ages, transfer learning across dialects, and running locally on devices without an internet connection.
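To make the first approach concrete, here is a minimal sketch of how separate acoustic, pronunciation, and language models can be combined to pick a transcript. It is a toy illustration, not a real recognizer: the lexicon, the probability tables, and the helper names (`LEXICON`, `ACOUSTIC_LOGP`, `LM_LOGP`, `phonemes`, `score`, `decode`) are all hypothetical, and the numbers are made up. It assumes the common formulation in which the chosen word sequence maximizes the sum of an acoustic log-score and a language-model log-score.

```python
import math

# Pronunciation model: maps each word to a phoneme sequence.
# Hypothetical toy lexicon; "two" and "too" are homophones.
LEXICON = {
    "two": ("t", "uw"),
    "too": ("t", "uw"),
    "cats": ("k", "ae", "t", "s"),
    "caps": ("k", "ae", "p", "s"),
}

# Acoustic model stand-in: log P(audio | phonemes) for one fixed utterance.
# Made-up values; a real system computes these from audio features.
ACOUSTIC_LOGP = {
    ("t", "uw", "k", "ae", "t", "s"): math.log(0.5),
    ("t", "uw", "k", "ae", "p", "s"): math.log(0.4),
}

# Language model stand-in: log P(word sequence). Made-up values.
LM_LOGP = {
    ("two", "cats"): math.log(0.05),
    ("too", "cats"): math.log(0.001),
    ("two", "caps"): math.log(0.01),
    ("too", "caps"): math.log(0.001),
}


def phonemes(words):
    """Expand a word sequence into phonemes using the lexicon."""
    out = []
    for word in words:
        out.extend(LEXICON[word])
    return tuple(out)


def score(words):
    """Combine acoustic and language-model log-scores for one candidate."""
    return ACOUSTIC_LOGP[phonemes(words)] + LM_LOGP[tuple(words)]


def decode(candidates):
    """Return the candidate word sequence with the highest combined score."""
    return max(candidates, key=score)


best = decode(list(LM_LOGP))
print(" ".join(best))
```

In this toy setup the acoustic model cannot tell "two" from "too" because they share a pronunciation, so the language model is what settles the choice. An end-to-end system would instead learn a direct mapping from audio to text without these hand-separated tables.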