This document summarizes research on using acoustic features to predict topic changes in instructional videos. The researchers analyzed over 120 hours of educational content from YouTube, extracting acoustic features every 50 ms. They used MFCCs and the INTERSPEECH 2009 feature set, with topic changes manually labeled as ground truth. Dimensionality reduction produced a manageable dataset for training classifiers such as SVM and Random Forest, which achieved 78% accuracy. Future work involves combining visual and acoustic cues to generate accurate tables of contents for educational videos.
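The frame-level setup described above can be sketched in a few lines. This is a minimal illustration, not the researchers' actual pipeline: it only shows how analysis frames spaced every 50 ms would be laid out over an audio signal, with all names and parameters being assumptions for demonstration.

```python
def frame_starts(num_samples, sample_rate, hop_ms=50):
    """Start indices (in samples) of analysis frames taken every hop_ms milliseconds.

    Hypothetical helper: the paper's exact framing tool is not specified.
    """
    hop = int(sample_rate * hop_ms / 1000)  # 50 ms at 16 kHz -> 800 samples
    return list(range(0, num_samples, hop))

# Example: 2 seconds of 16 kHz audio framed every 50 ms yields 40 frames.
starts = frame_starts(num_samples=32000, sample_rate=16000)
```

In a full pipeline, each frame would then be passed to a feature extractor (e.g., MFCC computation) and paired with a manual topic-change label before classifier training.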