Frank Nack discusses audio and emotion recognition from speech. The document covers listening to sounds, producing sounds, and interpreting emotions from acoustic variables in speech like pitch, intensity, speech rate, and voice quality. It also discusses challenges with speech data collection and segmentation, feature extraction, and using classifiers like SVM, neural networks, decision trees, and HMM for emotion recognition. Measurement and benchmarking of audio emotion recognition systems is difficult due to varying conditions and datasets.