This document discusses two natural language processing techniques: universal topic classification and named entity disambiguation.
For universal topic classification, it proposes using the MoreLikeThis query in Apache Lucene/Solr to find Wikipedia articles related to a document based on the document's terms, and then categorizing the document using the topics of those related articles. It also discusses using Wikipedia categories to organize the resulting topics into a hierarchy.
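The classification idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach's actual implementation: the `ARTICLES` dictionary is a hypothetical four-entry stand-in for a Wikipedia index, and `more_like_this` is a simplified tf-idf overlap score written for this example rather than a call to Lucene/Solr's MoreLikeThis. The helper names (`tokenize`, `idf_table`, `classify`) are likewise invented here.

```python
import math
from collections import Counter

# Hypothetical miniature "Wikipedia" index: title -> (article text, topics).
ARTICLES = {
    "Python (programming language)": (
        "python programming language interpreter code software", ["Computing"]),
    "Java (programming language)": (
        "java programming language compiler virtual machine software", ["Computing"]),
    "Association football": (
        "football team goal league match player", ["Sports"]),
    "Tennis": (
        "tennis player match racket court tournament", ["Sports"]),
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def idf_table(token_lists):
    """Inverse document frequency for every term in the toy index."""
    token_lists = list(token_lists)
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def more_like_this(query_text, articles, top_k=2):
    """Simplified stand-in for a MoreLikeThis query: rank articles by
    tf-idf weighted term overlap with the input document."""
    tokenized = {title: tokenize(text) for title, (text, _) in articles.items()}
    idf = idf_table(tokenized.values())
    query_tf = Counter(tokenize(query_text))
    scores = {}
    for title, tokens in tokenized.items():
        doc_tf = Counter(tokens)
        scores[title] = sum(
            query_tf[t] * doc_tf[t] * idf[t] ** 2 for t in query_tf if t in doc_tf
        )
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(title, score) for title, score in ranked[:top_k] if score > 0]


def classify(query_text, articles, top_k=2):
    """Let each related article vote for its topics, weighted by its score."""
    votes = Counter()
    for title, score in more_like_this(query_text, articles, top_k):
        for topic in articles[title][1]:
            votes[topic] += score
    return votes.most_common()


if __name__ == "__main__":
    print(classify("the player won the tennis match at the tournament", ARTICLES))
```

Weighting each vote by the similarity score means a single closely matching article can outweigh several weak matches. A real index would also need stop-word handling and minimum term-frequency cutoffs, which this sketch leaves out.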
For named entity disambiguation, it suggests using MoreLikeThis with the text surrounding a mention as context to decide which entity is meant (e.g., whether "George Bush" refers to George H. W. Bush or George W. Bush). The document outlines work in progress to integrate these techniques into the Apache Stanbol semantic framework.
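Context-based disambiguation can be illustrated with a similar toy example. This is a sketch under assumptions, not the Stanbol integration itself: `CANDIDATES` holds hand-written keyword summaries standing in for indexed Wikipedia articles, and `disambiguate` is a hypothetical helper that scores each candidate by how many of its terms appear in the mention's surrounding text, giving more weight to terms that occur in fewer candidates.

```python
from collections import Counter

# Hypothetical keyword summaries standing in for indexed Wikipedia articles.
CANDIDATES = {
    "George H. W. Bush": (
        "41st president united states 1989 1993 gulf war vice reagan cia director"
    ),
    "George W. Bush": (
        "43rd president united states 2001 2009 iraq war september 11 governor texas"
    ),
}


def disambiguate(mention_context, candidates):
    """Return (best_candidate, scores); best_candidate is None when no
    candidate shares any term with the context."""
    context_tf = Counter(mention_context.lower().split())

    # Number of candidates containing each term: shared terms such as
    # "president" discriminate less than unique ones such as "texas".
    df = Counter()
    for text in candidates.values():
        df.update(set(text.split()))

    scores = {}
    for name, text in candidates.items():
        terms = set(text.split())
        scores[name] = sum(context_tf[t] / df[t] for t in terms if t in context_tf)

    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores
    return best, scores


if __name__ == "__main__":
    context = "George Bush was governor of Texas before the Iraq war"
    print(disambiguate(context, CANDIDATES)[0])
```

In a Lucene/Solr setup, the surrounding sentence or paragraph would presumably be submitted as the MoreLikeThis input and the result set restricted to the candidate entities; the dictionary lookup here only imitates that ranking step.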