This document covers sample-based generative models for speech synthesis: how waveforms are split into frames, how features are extracted, and how audio is generated. It focuses on two models, WaveNet and SampleRNN, and compares their trade-offs in output quality, generation speed, and memory usage. It also cites research papers on these synthesis methods.
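
The frame-splitting step mentioned above can be sketched as follows. This is only a minimal illustration, not the procedure of WaveNet, SampleRNN, or any cited paper: the function name `split_into_frames`, the parameters `frame_length` and `hop_length`, and the choice to zero-pad frames that run past the end of the signal are all assumptions made for the example.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float],
    frame_length: int,
    hop_length: int,
) -> List[List[float]]:
    """Split a 1-D waveform into fixed-length frames.

    Frames start every `hop_length` samples, so they overlap when
    hop_length < frame_length. Frames that run past the end of the
    signal are zero-padded to `frame_length` (an assumed convention).
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")

    frames: List[List[float]] = []
    start = 0
    while start < len(samples):
        # Slice may be shorter than frame_length near the end of the signal.
        frame = list(samples[start:start + frame_length])
        # Pad the short frame with zeros so every frame has equal length.
        frame.extend([0.0] * (frame_length - len(frame)))
        frames.append(frame)
        start += hop_length
    return frames


if __name__ == "__main__":
    # Toy waveform of five samples, frames of 4 samples with a hop of 2.
    waveform = [1.0, 2.0, 3.0, 4.0, 5.0]
    for frame in split_into_frames(waveform, frame_length=4, hop_length=2):
        print(frame)
```

Each resulting frame can then be passed to a feature extractor or used as a conditioning unit for a sample-level model. In practice the frame and hop lengths are chosen from the sampling rate, for example a fixed number of milliseconds per frame.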