Machine Learning , a branch of AI, is aboutconstruction and study of system that canlearn from existing data.It is used in field like:Information retrievalIdentify key topics in large collections of textBiologyLinear Algebra etc.
An Apache Software Foundation project tocreate scalable machine learning librariesunder the Apache Software License.WHY MAHOUT ?Many Open Source Machine Learning libraries either: Lack Community Lack Documentation and Examples Lack Scalability Lack the Apache License Or are not research-oriented
Began life at 2008 as sub project of ApacheLucene (search, text mining- API). Lucene commiter felt it to include asseparate project and mahout absorbed Tastecollaborative filtering project. At April 2010, Mahout became top levelapache project
Google News sees about 3.5 million newnews articles per day and clustered withother articles in minutes to deliver timely.Other eg. Picasa. Mahout makes use of hadoop. Some algorithms won’t scale to massive machineclusters but map-reduce framework like apachehadoop do. Mahout convert algorithm to work at scale on topof Hadoop.
Extensive framework for collaborativefiltering. Recommenders:-- User Based-- Item Based Online and Offline support-- Offline can utilize hadoop Used by Amazon , Facebook etc.
Clustering techniques attempt to group alarge number of things together into clustersthat share some similarity. K-means , Fuzzy K-means Summly app also summarize similar storiesfrom different news site and gives a briefnews on that app.(concept of Google news)
Classification techniques decide how much athing is or isn’t part of some type orcategory, or how much it does or doesn’thave some attribute. Example:-- Yahoo Mail spam checker-- Facebook face detection
Mahout is young ,open source , scalablemachine learning library from apache Its technique are no longer theory insteaddeployed to solve in real world like e-commerce, video , picture etc. Scalability being the major issue Hadoop ison rescue.