The document discusses using n-grams for content-based file classification and information retrieval. It proposes representing documents as n-gram profiles that serve as input to machine learning algorithms such as k-nearest neighbors (kNN) and support vector machines (SVM). SQL procedures are provided to generate n-gram data in both horizontal and vertical database formats for efficient storage and querying. Preliminary results show that multigrams (n-grams of mixed lengths) can improve classification performance over n-grams of a single fixed length.
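
To make the idea of an n-gram profile and the two storage layouts concrete, here is a minimal Python sketch. It is not the document's SQL procedures, which are not reproduced here. The function names (`ngram_profile`, `vertical_rows`, `horizontal_row`) are hypothetical, and the sketch assumes byte-level n-grams. It also assumes that "vertical" means one row per (document, n-gram) pair and "horizontal" means one row per document with one column per n-gram.

```python
from collections import Counter


def ngram_profile(data: bytes, n: int) -> Counter:
    """Count every overlapping byte n-gram of length n in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def vertical_rows(doc_id: str, profile: Counter) -> list:
    """Assumed vertical layout: one (doc_id, ngram, count) row per distinct n-gram."""
    return [(doc_id, gram, count) for gram, count in sorted(profile.items())]


def horizontal_row(doc_id: str, profile: Counter, vocabulary: list) -> tuple:
    """Assumed horizontal layout: one row per document, one column per vocabulary n-gram."""
    return (doc_id, *[profile.get(gram, 0) for gram in vocabulary])


# Profile of a toy "file" using 2-grams only.
bigrams = ngram_profile(b"abab", 2)

# A multigram profile, taken here to mean the merged counts of several n-gram lengths.
multigrams = ngram_profile(b"abab", 1) + bigrams

rows = vertical_rows("doc1", bigrams)
wide = horizontal_row("doc1", bigrams, [b"ab", b"ba", b"bb"])
```

Under these assumptions, the vertical layout stores only the n-grams that actually occur in a document, while the horizontal layout yields the fixed-length feature vector that classifiers such as kNN or SVM typically expect.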