1) The document describes the development of a speaker-dependent speech recognition system using MATLAB. It uses Gaussian mixture models for acoustic modeling and mel-frequency cepstral coefficients for feature extraction.
2) The system is designed to recognize the isolated digits 0-9. Voice activity detection is used to locate the speech segments in each recording. Several window functions are evaluated for their ability to reduce spectral leakage during feature extraction.
3) Thirteen static features (MFCCs together with an energy term) are extracted from each 30 ms frame, with a 10 ms frame shift. Their first and second time derivatives are also computed to capture dynamic information, giving a 39-dimensional feature vector (13 static, 13 delta, and 13 delta-delta values).
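
The framing and derivative steps in item 3 can be illustrated with a minimal sketch. It is written in Python with NumPy rather than MATLAB, and it is not the document's implementation. The 16 kHz sample rate, the Hamming window, the regression width of two frames, and the helper names frame_signal, delta, and stack_dynamic are all assumptions made for illustration. The static features are random placeholders standing in for real MFCC and energy values.

```python
import numpy as np


def frame_signal(x, fs, frame_ms=30, shift_ms=10):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(fs * frame_ms / 1000)   # samples per frame
    shift = int(fs * shift_ms / 1000)       # samples between frame starts
    n_frames = 1 + (len(x) - frame_len) // shift
    # Row t holds the sample indices of frame t.
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)   # window choice is an assumption


def delta(feat, N=2):
    """Regression-based time derivative over +/-N frames (edges repeated)."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def stack_dynamic(static):
    """Append first and second derivatives to the static features."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])


fs = 16000                                           # assumed sample rate
x = np.random.default_rng(0).standard_normal(fs)     # 1 s placeholder signal
frames = frame_signal(x, fs)

# Placeholder for the 13 static features per frame (not real MFCCs).
static = np.random.default_rng(1).standard_normal((frames.shape[0], 13))
features = stack_dynamic(static)

print(frames.shape, features.shape)
```

Under these assumptions, one second of audio yields 98 frames of 480 samples each, and every frame ends up with a 39-dimensional feature vector. A real front end would replace the placeholder array with MFCC and energy values computed from the windowed frames.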