This document presents a framework for detecting annotation errors in text-to-speech (TTS) corpora. The authors use a range of machine learning classifiers and features to label each word and each utterance as correctly or incorrectly annotated. Their best models achieve around 90% accuracy at the word level and 97% accuracy at the utterance level. Future work could apply these methods to more data covering other languages and speech styles.
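
To make the distinction between word-level and utterance-level accuracy concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that an utterance counts as incorrectly annotated when at least one of its words is flagged, an aggregation rule that the summary does not state. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Convention used in this sketch: True means "incorrectly annotated".


def accuracy(predicted: List[bool], gold: List[bool]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def utterance_labels(word_labels: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Assumed aggregation rule: an utterance is flagged if any word is flagged."""
    return {utt: any(flags) for utt, flags in word_labels.items()}


def flatten(word_labels: Dict[str, List[bool]], order: List[str]) -> List[bool]:
    """Concatenate the per-word labels of all utterances in a fixed order."""
    return [flag for utt in order for flag in word_labels[utt]]


# Hypothetical toy data: three utterances with per-word labels.
gold_words = {
    "utt1": [False, False, True],
    "utt2": [False, False],
    "utt3": [False, True, False, False],
}
pred_words = {
    "utt1": [False, False, True],
    "utt2": [False, True],
    "utt3": [False, False, False, False],
}

order = sorted(gold_words)

# Word-level accuracy: compare every word label individually.
word_acc = accuracy(flatten(pred_words, order), flatten(gold_words, order))

# Utterance-level accuracy: aggregate first, then compare one label per utterance.
gold_utts = utterance_labels(gold_words)
pred_utts = utterance_labels(pred_words)
utt_acc = accuracy([pred_utts[u] for u in order], [gold_utts[u] for u in order])

print(f"word-level accuracy: {word_acc:.3f}")
print(f"utterance-level accuracy: {utt_acc:.3f}")
```

The two figures are computed over different units, so they are not directly comparable: a single misclassified word can flip the label of an entire utterance under this aggregation rule, and the class balance differs between the two levels.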