SlideShare is now on Android. 15 million presentations at your fingertips.  Get the app

×
  • Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

Text Classification Powered by Apache Mahout and Lucene

by on Dec 05, 2013

  • 1,416 views

Presented by Isabel Drost-Fromm, Software Developer, Apache Software Foundation/Nokia Gate 5 GmbH at Lucene/Solr Revolution 2013 Dublin ...

Presented by Isabel Drost-Fromm, Software Developer, Apache Software Foundation/Nokia Gate 5 GmbH at Lucene/Solr Revolution 2013 Dublin

Text classification automates the task of filing documents into pre-defined categories based on a set of example documents. The first step in automating classification is to transform the documents to feature vectors. Though this step is highly domain specific Apache Mahout provides you with a lot of easy to use tooling to help you get started, most of which relies heavily on Apache Lucene for analysis, tokenisation and filtering. This session shows how to use facetting to quickly get an understanding of the fields in your document. It will walk you through the steps necessary to convert your text documents into feature vectors that Mahout classifiers can use including a few anecdotes on drafting domain specific features.
Configure


Statistics

Views

Total Views
1,416
Views on SlideShare
1,258
Embed Views
158

Actions

Likes
2
Downloads
44
Comments
0

2 Embeds 158

http://www.lucenerevolution.org 157
http://lucenerevolution.org 1

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
Post Comment
Edit your comment

Text Classification Powered by Apache Mahout and Lucene Text Classification Powered by Apache Mahout and Lucene Presentation Transcript