This paper describes BABAR, a knowledge extraction and representation system, completely implemented in CLOS, that is primarily geared towards organizing and reasoning about knowledge extracted from the Wikipedia Website. The system combines natural language processing techniques, knowledge representation paradigms and machine learning algorithms. BABAR is currently an ongoing independent research project that when sufficiently mature, may provide various commercial opportunities.
BABAR uses natural language processing to parse both page name and page contents. It automatically generates Wikipedia topic taxonomies thus providing a model for organizing the approximately 4,000,000 existing Wikipedia pages. It uses similarity metrics to establish concept relevancy and clustering algorithms to group topics based on semantic relevancy. Novel algorithms are presented that combine approaches from the areas of machine learning and recommender systems. The system also generates a knowledge hypergraph which will ultimately be used in conjunction with an automated reasoner to answer questions about particular topics.