This document reviews modern text-to-speech (TTS) systems, tracing advances across models such as WaveNet, Deep Voice, and Tacotron, and describing how they are evaluated with metrics such as the Mean Opinion Score (MOS). It covers model architectures, training methods, and improvements in synthesis quality over time, with emphasis on recent innovations such as transformers and unsupervised style modeling. It also introduces the Ruslan corpus, a significant resource for Russian speech synthesis. Taken together, these topics give an overview of how TTS technology has evolved.