This document discusses research on text mining, focusing on the classification of textual data from three classic literature books using two algorithms: k-nearest neighbors (KNN) and bigram-based maximum likelihood. The study highlights the challenges of categorizing unstructured data and evaluates the performance of both algorithms, concluding that the bigram approach yields better results. It details the data collection methodology, algorithmic processes, and various evaluations performed to establish the accuracy and effectiveness of the categorization techniques.