Introduction to Apache Mahout

A presentation given in the final semester of BSc IT.

Published in: Technology, Education

  1. Name: Aman Adhikari
     Email: adhikariaman01@gmail.com
  2. Machine learning, a branch of AI, is about the construction and study of systems that can learn from existing data. It is used in fields such as:
     • Information retrieval
     • Identifying key topics in large collections of text
     • Biology
     • Linear algebra, etc.
  3. Apache Mahout is an Apache Software Foundation project to create scalable machine learning libraries under the Apache Software License.
     Why Mahout? Many open source machine learning libraries either:
     • Lack community
     • Lack documentation and examples
     • Lack scalability
     • Lack the Apache License
     • Or are research-oriented
  4. Mahout began life in 2008 as a subproject of Apache Lucene (a search and text-mining API).
     The Lucene committers felt it should be a separate project, and Mahout absorbed the Taste collaborative filtering project.
     In April 2010, Mahout became a top-level Apache project.
  5. Google News sees about 3.5 million new news articles per day and clusters them with related articles within minutes so that they can be delivered in a timely way. Another example is Picasa.
     Mahout makes use of Hadoop.
     Some algorithms won't scale to massive machine clusters on their own, but a MapReduce framework like Apache Hadoop does.
     Mahout converts algorithms to work at scale on top of Hadoop.
  6. Recommender engines (collaborative filtering)
     Clustering
     Classification
  7. Extensive framework for collaborative filtering.
     Recommenders:
     -- User based
     -- Item based
     Online and offline support:
     -- Offline can utilize Hadoop
     Used by Amazon, Facebook, etc.
  8. Clustering techniques attempt to group a large number of things together into clusters that share some similarity.
     K-means, fuzzy k-means.
     The Summly app likewise summarizes similar stories from different news sites and presents a brief version of the news (the same concept as Google News).
  9. Classification techniques decide how much a thing is or isn't part of some type or category, or how much it does or doesn't have some attribute.
     Examples:
     -- Yahoo Mail spam checker
     -- Facebook face detection
  10. Mahout is a young, open source, scalable machine learning library from Apache.
      Its techniques are no longer just theory; they are deployed to solve real-world problems in areas such as e-commerce, video, and pictures.
      Scalability is the major issue, and Hadoop comes to the rescue.
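
Slide 5 says that algorithms are converted to run on top of Hadoop's MapReduce model. The following is a minimal single-machine sketch of that programming model in plain Python, assuming a simple word-count task. It does not use Hadoop or Mahout, and the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here it is a loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["Mahout runs on Hadoop", "Hadoop scales out"])
print(counts)
```

Because each map call looks at one document and each reduce call looks at one key, both steps can in principle be spread across many machines, which is the property the slide relies on.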
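
The user-based recommender named on slide 7 can be sketched as follows. This is plain Python, not Mahout's API; the ratings data, the choice of cosine similarity, and the names cosine_similarity and recommend are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 3.0, "C": 5.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=1):
    """Predict ratings for unseen items as a similarity-weighted average."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("alice", ratings))
```

An item-based recommender follows the same pattern but compares items by the users who rated them, rather than comparing users by the items they rated.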
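
For the k-means algorithm mentioned on slide 8, a minimal single-machine sketch might look like this. It is plain Python rather than Mahout's implementation; the sample points, the fixed iteration count, and the choice of the first k points as starting centroids are assumptions made to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid_of(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=10):
    # Assumption: start from the first k points instead of random seeds.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            centroid_of(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Fuzzy k-means differs in that each point belongs to every cluster with some degree of membership instead of being assigned to exactly one.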
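
The spam-checker example on slide 9 can be illustrated with a tiny text classifier. The slides do not name an algorithm, so this sketch assumes naive Bayes with add-one smoothing; the training messages and the names train and classify are made up for the example and are not Mahout's API.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """Count words per label and messages per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability score."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Prior: how common this label is in the training data.
        score = log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(examples)
print(classify("free money", word_counts, label_counts))
print(classify("monday meeting", word_counts, label_counts))
```

The per-label scores reflect the "how much a thing is or isn't part of a category" idea from the slide: the classifier picks the label whose score is highest.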
