This document describes a hybrid approach that combines supervised and unsupervised learning to discover high-level categories of documents on a statistical website. Classifiers were trained on labeled documents and then applied to unlabeled documents; the documents that received low classification probabilities were clustered. The supervised models were more accurate than the unsupervised ones. Clustering uncovered potential new categories beyond the original 13. Further evaluation will compare the automatically generated categories with manually generated ones.
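The pipeline summarized above (classify, then cluster whatever the classifier is unsure about) can be illustrated with a minimal sketch. The summary does not name the classifier or the clustering algorithm, so everything here is an assumption chosen for brevity: a small multinomial Naive Bayes classifier stands in for the supervised models, a greedy single-pass grouping by Jaccard similarity stands in for the clustering step, and the names `hybrid_categorize`, `threshold`, and `min_sim`, along with the toy documents and category labels, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train_nb(labeled_docs):
    """Fit a tiny multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict_proba(model, text):
    """Return {label: posterior probability}, using Laplace smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:  # out-of-vocabulary tokens are ignored
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        log_scores[label] = score
    # Normalize in log space to avoid underflow.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: e / norm for label, e in exps.items()}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0


def greedy_cluster(texts, min_sim=0.2):
    """Assumed stand-in for clustering: join the first similar-enough group."""
    clusters = []
    for text in texts:
        tokens = set(tokenize(text))
        for cluster in clusters:
            if jaccard(tokens, cluster["tokens"]) >= min_sim:
                cluster["docs"].append(text)
                cluster["tokens"] |= tokens
                break
        else:
            clusters.append({"tokens": tokens, "docs": [text]})
    return [c["docs"] for c in clusters]


def hybrid_categorize(labeled_docs, unlabeled_docs, threshold=0.8):
    """Keep confident predictions; cluster the low-probability remainder."""
    model = train_nb(labeled_docs)
    assigned, uncertain = [], []
    for text in unlabeled_docs:
        probs = predict_proba(model, text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            assigned.append((text, label))
        else:
            uncertain.append(text)
    return assigned, greedy_cluster(uncertain)


# Toy data, invented for illustration only.
labeled = [
    ("population census household count", "demography"),
    ("birth rate death rate population", "demography"),
    ("gdp inflation consumer prices", "economy"),
    ("exports imports trade balance gdp", "economy"),
]
unlabeled = [
    "population census results",
    "inflation and consumer prices",
    "crop yield wheat harvest",
    "wheat harvest livestock",
]

assigned, clusters = hybrid_categorize(labeled, unlabeled)
print(assigned)
print(clusters)
```

In this toy run the two documents that share vocabulary with the labeled set are assigned to existing categories, while the two agriculture-themed documents fall below the threshold and end up grouped together, which is the kind of signal that could suggest a candidate new category. The threshold and similarity cutoff would need tuning on real data.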