Configuring Mahout Clustering Jobs - Frank Scholten

Uploaded on Oct 21, 2011

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

For more than a decade, internet search engines have helped users find the documents they are looking for. But what if users aren't looking for anything specific, and instead want a summary of a large document collection and want to be surprised? One solution to this problem is document clustering. Clustering algorithms group documents that have similar content. Real-life examples of clustering are the clustered search results of Google News, or tag clouds, which group documents under a shared label. Apache Mahout is a framework for scalable machine learning on top of Apache Hadoop and can be used for large-scale document clustering. This talk introduces clustering in general and shows you step by step how to configure Mahout clustering jobs to create a tag cloud from a document collection. It is suitable for people who have some experience with Hadoop and perhaps Mahout. Knowledge of clustering is not required.

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Configuring Mahout Clustering Jobs - Frank Scholten: Presentation Transcript