The document proposes a new similarity measure for text classification and clustering. For each feature, the measure distinguishes three cases: the feature appears in both documents, in only one of them, or in neither. The measure is evaluated on real-world data sets, where it is found to perform better than other measures. The document also describes an existing document clustering system whose disadvantages include dependence on the initial random assignment and convergence to a local rather than a global minimum of variance. The proposed system instead develops a hierarchical clustering algorithm, built on this novel way of evaluating similarity between documents, with the aim of more efficient and better-performing document clustering.
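To make the three cases concrete, the following is a minimal Python sketch, not the document's actual formula, which this summary does not state. It assumes documents are non-negative feature vectors (for example term counts). The function names `three_case_similarity` and `average_linkage_clusters`, the parameters `penalty` and `scale`, the Gaussian-shaped reward, and the choice of average linkage are all illustrative assumptions.

```python
from math import exp
from typing import List, Sequence


def three_case_similarity(a: Sequence[float], b: Sequence[float],
                          penalty: float = 1.0, scale: float = 1.0) -> float:
    """Illustrative similarity in [0, 1]; the scoring rule is an assumption."""
    if len(a) != len(b):
        raise ValueError("documents must have the same number of features")
    if penalty < 0 or scale <= 0:
        raise ValueError("penalty must be >= 0 and scale must be > 0")
    score = 0.0
    counted = 0
    for x, y in zip(a, b):
        if x > 0 and y > 0:
            # Case 1: feature in both documents. Reward in (0.5, 1],
            # shrinking as the two feature values move apart.
            score += 0.5 * (1.0 + exp(-((x - y) / scale) ** 2))
            counted += 1
        elif x > 0 or y > 0:
            # Case 2: feature in only one document. Fixed penalty.
            score -= penalty
            counted += 1
        # Case 3: feature in neither document. It contributes nothing.
    if counted == 0:
        return 0.0  # assumed convention when no feature is present at all
    raw = score / counted                      # lies in [-penalty, 1]
    return (raw + penalty) / (1.0 + penalty)   # rescaled to [0, 1]


def average_linkage_clusters(docs: Sequence[Sequence[float]],
                             k: int) -> List[List[int]]:
    """Merge the most similar pair of clusters until only k remain."""
    if not 1 <= k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                avg = sum(three_case_similarity(docs[p], docs[q])
                          for p, q in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up toy data: two documents about one topic, two about another.
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]]
print(average_linkage_clusters(docs, k=2))  # prints [[0, 1], [2, 3]]
```

Because this bottom-up procedure starts from single documents and always merges the most similar pair, it involves no random initial assignment, so repeated runs on the same data give the same clusters. This naive version recomputes every pairwise similarity at each step, so it is suitable only for small illustrative inputs.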