ICWSM 2011 Tutorial
Lyle Ungar and Ronen Feldman
The proliferation of documents available on the Web and on corporate intranets is driving a new wave of text mining research and application. Earlier research addressed extraction of information from relatively small collections of well-structured documents such as newswire or scientific publications. Text mining from the other corpora such as the web requires new techniques drawn from data mining, machine learning, NLP and IR. Text mining requires preprocessing document collections (text categorization, information extraction, term extraction), storage of the intermediate representations, analysis of these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and visualization of the results. In this tutorial we will present the algorithms and methods used to build text mining systems. The tutorial will cover the state of the art in this rapidly growing area of research, including recent advances in unsupervised methods for extracting facts from text and methods used for web-scale mining. We will also present several real world applications of text mining. Special emphasis will be given to lessons learned from years of experience in developing real world text mining systems, including recent advances in sentiment analysis and how to handle user generated text such as blogs and user reviews.
Lyle H. Ungar is an Associate Professor of Computer and Information Science (CIS) at the University of Pennsylvania. He also holds appointments in several other departments at Penn in the Schools of Engineering and Applied Science, Business (Wharton), and Medicine. Dr. Ungar received a B.S. from Stanford University and a Ph.D. from M.I.T. He directed Penn's Executive Masters of Technology Management (EMTM) Program for a decade, and is currently Associate Director of the Penn Center for BioInformatics (PCBI). He has published over 100 articles and holds eight patents. His current research focuses on developing scalable machine learning methods for data mining and text mining.
Ronen Feldman is an Associate Professor of Information Systems at the Business School of the Hebrew University in Jerusalem. He received his B.Sc. in Math, Physics and Computer Science from the Hebrew University and his Ph.D. in Computer Science from Cornell University in NY. He is the author of the book "The Text Mining Handbook" published by Cambridge University Press in 2007.