The document describes the components and methods of text-to-speech (TTS) synthesis systems. A typical TTS system has a high-level component that converts input text to a phonetic representation and a low-level component that generates the audio output from that representation. The low-level component can use either formant synthesis, which generates parameters for filters that model the vocal tract, or waveform concatenation, which joins prerecorded speech units. The document also examines the choice of size for the stored speech units in concatenation systems, such as words, phones, or diphones.
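
The two-stage split and the diphone unit choice can be illustrated with a minimal sketch. This is not the method of any particular system: the lexicon entries, the phone symbols, the silence marker, and the function names `text_to_phones` and `phones_to_diphones` are all hypothetical, and no audio is produced. The sketch assumes the usual definition of a diphone as the transition from one phone to the next, and only shows which stored units a concatenation back end would need to look up.

```python
# Hypothetical toy lexicon; a real front end would also use
# letter-to-sound rules and text normalization.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

SILENCE = "_"  # assumed marker for the silence before and after the utterance


def text_to_phones(text):
    """High-level stage: map each word to its phones by lexicon lookup."""
    phones = []
    for word in text.lower().split():
        phones.extend(LEXICON[word])  # raises KeyError for unknown words
    return phones


def phones_to_diphones(phones):
    """Low-level stage, unit selection only: name the phone-to-phone
    transitions (diphones) that would be fetched and joined."""
    padded = [SILENCE] + list(phones) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    phones = text_to_phones("hello world")
    print(phones)
    print(phones_to_diphones(phones))
```

Under these assumptions, an utterance of n phones needs n + 1 diphone units once it is padded with silence at both ends. The trade-off in unit size is between inventory and coverage: a language with N phones has at most N² diphones, a far smaller and closed inventory than a word-level store, while each diphone still captures the transition between neighboring sounds.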