Peter Norvig, the Director of Research at Google, said in the Amazon book review for the book “Statistical Natural Language Processing” “If someone told me I had to make a million bucks in one year, and I could only refer to one book to do it, I`d grab a copy of this book and start a web text-processing company.” Despite this vast availability of free-form text, without a scalable way to process this data we have missed an opportunity. Apache Mahout is an open source machine learning library built atop Hadoop. We will cover how to use it to do some interesting, scalable machine learning (classification, clustering, etc.) on free-form text. We will learn about classification using Random Forests, clustering beyond simple K-Means using Latent Dirichlet Allocation. We will investigate the limitations and abilities of this platform together as it relates to natural language. 1. http://www.amazon.com/review/R3GSYXSKRU8V17
Clipping is a handy way to collect important slides you want to go back to later.