By: Puneet Gupta
M.Tech (Future Studies and Planning)
• A mahout is a person who drives an elephant as its master.
• The name reflects Mahout's close association with Apache Hadoop, which uses an elephant as its logo.
• Apache Mahout started as a sub-project of Apache Lucene in 2008. In 2010, Mahout became a top-level Apache project.
What is Apache Mahout?
• Apache Mahout is an open source project.
• Mahout is a Java library implementing machine learning techniques for:
– Recommendation
– Clustering
– Classification
What can we do with Mahout?
• Mahout currently supports three main use cases:
– Recommendation: takes users' behavior and, from it, tries to find items those users might like.
– Clustering: takes, e.g., text documents and groups them into sets of topically related documents.
– Classification: learns from existing categorized documents what documents of a specific category look like, and assigns unlabeled documents to the (hopefully) correct category.
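The recommendation use case can be illustrated with user-based collaborative filtering, the first recommendation algorithm listed under Algorithms. The Java sketch below is a hypothetical, self-contained illustration, not Mahout's API: the class `UserBasedSketch`, its methods, and the choice of cosine similarity computed over co-rated items are all assumptions. It predicts a user's rating for an unseen item as the similarity-weighted average of the ratings other users gave that item.

```java
import java.util.Map;

/**
 * Hypothetical user-based collaborative filtering sketch (not Mahout's API).
 * Ratings are stored as user -> (item -> rating).
 */
class UserBasedSketch {

    /** Cosine similarity between two users, computed only over items both have rated. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) continue; // skip items the other user has not rated
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /**
     * Predicts how `user` would rate `item` as the similarity-weighted average
     * of the ratings given to `item` by positively similar users.
     * Returns NaN when no such neighbor exists.
     */
    static double predict(Map<String, Map<String, Double>> ratings, String user, String item) {
        Map<String, Double> mine = ratings.get(user);
        if (mine == null) return Double.NaN;
        double weighted = 0, totalSim = 0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            Map<String, Double> theirs = e.getValue();
            if (e.getKey().equals(user) || !theirs.containsKey(item)) continue;
            double sim = cosine(mine, theirs);
            if (sim <= 0) continue; // ignore users with no overlap
            weighted += sim * theirs.get(item);
            totalSim += sim;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }
}
```

For example, if `bob` rated the shared items exactly like `alice` and also rated item `c`, his rating of `c` carries the largest weight in the prediction for `alice`. A real recommender would also limit the neighborhood to the most similar users rather than averaging over everyone.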
Why Mahout?
• Mahout is not the only machine learning framework:
– Weka
– R
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation
– Scalable
• Based on Hadoop (not mandatory!)
Why do we need a scalable framework?
Algorithms
• Recommendation
– User-based Collaborative Filtering
– Item-based Collaborative Filtering
– Slope One Recommenders
– Singular Value Decomposition
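Slope One, listed above, predicts a rating from the average difference between two items' ratings among users who rated both. The following is a minimal weighted Slope One sketch; `SlopeOneSketch` and its `predict` method are hypothetical names, not Mahout classes, and for clarity it recomputes the item deviations on every call instead of precomputing them as a production recommender would.

```java
import java.util.Map;

/** Hypothetical weighted Slope One sketch (not Mahout's API). Ratings: user -> (item -> rating). */
class SlopeOneSketch {

    /**
     * Predicts `user`'s rating for `target`, an item the user has not rated.
     * For each item i the user did rate, dev(target, i) is the average of
     * (rating of target - rating of i) over users who rated both; the prediction
     * is the average of (dev + user's rating of i), weighted by how many users
     * contributed to each deviation.
     */
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        Map<String, Double> mine = ratings.get(user);
        if (mine == null) return Double.NaN;
        double numerator = 0;
        int denominator = 0;
        for (Map.Entry<String, Double> rated : mine.entrySet()) {
            String other = rated.getKey();
            if (other.equals(target)) continue;
            double diffSum = 0;
            int count = 0;
            for (Map<String, Double> r : ratings.values()) {
                if (r.containsKey(target) && r.containsKey(other)) {
                    diffSum += r.get(target) - r.get(other);
                    count++;
                }
            }
            if (count == 0) continue; // no user rated both items
            double deviation = diffSum / count;
            numerator += (deviation + rated.getValue()) * count;
            denominator += count;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }
}
```

With user A rating item I as 1 and item J as 1.5, and user B rating only item I as 2, the deviation of J over I is 0.5, so B's predicted rating for J is 2 + 0.5 = 2.5.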
Algorithms
• Clustering
- Canopy
- K-Means
- Fuzzy K-Means
- Latent Dirichlet Allocation (LDA)
- MinHash Clustering
- Hierarchical Clustering
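K-Means, listed above, alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below shows that loop on a single machine; `KMeansSketch.cluster` is a hypothetical helper, not Mahout's API, and it assumes the caller supplies the initial centroids (Canopy clustering, also listed above, is commonly used to choose them).

```java
import java.util.Arrays;

/** Hypothetical single-machine k-means (Lloyd's algorithm) sketch, not Mahout's API. */
class KMeansSketch {

    /** Returns, for each point, the index of the cluster it ends up in. */
    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length, dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initialCentroids[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[p][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed) break; // converged

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int j = 0; j < dim; j++) sums[assignment[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's centroid in place
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }
}
```

Even from a poor start such as centroids (0, 0) and (0, 1), the points (0, 0), (0, 1), (10, 10), (10, 11) settle into the two obvious groups after a few iterations. Mahout's value lies in distributing these same two steps over large data; this sketch does not attempt that.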
Algorithms
• Classification
- Logistic Regression
- Bayes
- Random Forests
- Hidden Markov Models
- Support Vector Machines
- Neural Networks
- Restricted Boltzmann Machines
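Bayes appears in the classification list above. As a hedged illustration of the idea, here is a textbook multinomial naive Bayes text classifier with add-one (Laplace) smoothing; `NaiveBayesSketch` and its `train`/`classify` methods are hypothetical names, not Mahout's implementation. Scores are summed in log space to avoid underflow when many small word probabilities are multiplied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical multinomial naive Bayes sketch for tokenized text (not Mahout's API). */
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labeled training document, already split into tokens. */
    void train(String label, String[] tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the label with the highest log-posterior for the given tokens. */
    String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // log P(label) from the fraction of training documents with this label
            double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String t : tokens) {
                // log P(token | label) with add-one smoothing over the vocabulary
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }
}
```

After training on a couple of "sports" and "tech" documents, a query such as "goal team" is assigned to whichever label used those words more often, which matches the description of classification given earlier in the deck.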
Mahout vs. R
• Mahout is a Java library, whereas R is an expression language with a very simple syntax.
• Mahout is used for big data, while R is used for prototyping.