Practical Machine Learning
  A Tutorial on Apache Mahout


               Biju B
         NLP R&D Division
         365Media Pvt. Ltd.
         bijub@365Media.in

             FOSSMEET NITC,
                 Calicut


          4-6 February 2011




   Biju B & Jaganadh G   Practical Machine Learning
nlp r d $ whoweare




     Working in Natural Language Processing (NLP), Machine Learning, and
     Data Mining
     Passionate about Free and Open Source :-)
     Teaches Python in free time, blogs at
     http://jaganadhg.freeflux.net/blog, and contributes to
     OpenStreetMap
     Works for 365Media Pvt. Ltd., Coimbatore, India
     Twitter handles: @jaganadhg, @bijub




Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI) concerned with
  algorithms that allow computers to learn.

      This talk is not intended as an introduction to Machine Learning
      Don't expect heavy math or equations here




Machine Learning and Our Life



     Do you think Machine Learning has any impact on our lives?
     Yes
     In day-to-day life we use many Machine Learning-powered tools:
     Recommendation engines
     Clustering
     Classification (e.g. spam filtering)
     Sentiment analysis
     Fraud detection




Mahout



  Mahout
  An open source project of the Apache Software Foundation
  The goal of the project is to build scalable machine learning libraries




Mahout




  Mahout
  Mahout: a person who drives an elephant ;-)
  The name comes from the project's use of Apache Hadoop, whose logo is
  an elephant.




Why a new library ?



  There are more than 30 Java libraries and tools available for Machine
  Learning: Weka, MALLET, Classifier4J, RapidMiner, ...
      Processing large amounts of data is not an easy task
      Machine Learning tools are expected to produce results quickly
      If the data set is too large, a single machine cannot process it
      easily, even a powerful one
      Mahout is scalable: its core algorithms are implemented on top of
      Apache Hadoop using the map/reduce paradigm
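
  To make the map/reduce idea concrete, here is a minimal in-memory sketch
  in plain Java. It does not use Hadoop or Mahout; the class name
  MapReduceSketch and its methods are hypothetical names chosen for
  illustration. It only shows the three phases (map, shuffle, reduce) on a
  word count, the usual introductory example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of the map/reduce pattern; it does not use Hadoop. */
final class MapReduceSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values by their key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs the three phases in sequence on a list of input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            // On a real cluster, each map call could run on a different machine.
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("mahout rides hadoop", "hadoop scales")));
    }
}
```

  Because each map call looks at one line only and each reduce call looks
  at one key only, both phases can be spread over many machines, which is
  the property the slide refers to as scalability.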




Algorithms in Apache Mahout



     Collaborative filtering
     User- and item-based recommenders
     K-Means and Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining
     Complementary Naive Bayes classifier
     Random forest (decision-tree-based) classifier




Recommendation




    Filters information based on user preferences
    Searches a large set of people to find a smaller set with tastes
    similar to yours
    e.g. Amazon's book recommendations, Netflix's movie
    recommendations
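
    The idea of finding people with similar tastes can be sketched in a
    few lines of plain Java. This is not Mahout's API: the class
    UserBasedRecommenderSketch, the cosine similarity over co-rated items,
    and the similarity-weighted scoring are all assumptions chosen to keep
    the example small.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy user-based recommender; an illustration only, not Mahout's implementation. */
final class UserBasedRecommenderSketch {

    /** Cosine similarity computed over the items that both users have rated. */
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> rating : a.entrySet()) {
            Double other = b.get(rating.getKey());
            if (other != null) {
                dot += rating.getValue() * other;
                normA += rating.getValue() * rating.getValue();
                normB += other * other;
            }
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Ranks items the user has not rated by similarity-weighted average rating. */
    static List<String> recommend(String user, Map<String, Map<String, Double>> ratings, int howMany) {
        Map<String, Double> mine = ratings.get(user);
        if (mine == null) {
            return new ArrayList<>();
        }
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double sim = similarity(mine, other.getValue());
            if (sim <= 0) {
                continue; // ignore users with no overlap or no similarity
            }
            for (Map.Entry<String, Double> rating : other.getValue().entrySet()) {
                if (mine.containsKey(rating.getKey())) {
                    continue; // only recommend unseen items
                }
                weightedSum.merge(rating.getKey(), sim * rating.getValue(), Double::sum);
                weightTotal.merge(rating.getKey(), sim, Double::sum);
            }
        }
        Map<String, Double> score = new HashMap<>();
        for (String item : weightedSum.keySet()) {
            score.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        List<String> items = new ArrayList<>(score.keySet());
        items.sort((x, y) -> {
            int byScore = Double.compare(score.get(y), score.get(x)); // highest score first
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

    A user whose ratings match the target user closely contributes more
    to each item's score than a user with opposite tastes, so the items
    liked by similar users rise to the top of the list.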




Document Classification




     Classify documents based on their content
     e.g. spam filtering, priority inbox
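
     As a rough illustration of content-based classification, the sketch
     below implements a small multinomial Naive Bayes classifier with
     add-one smoothing in plain Java. It is not Mahout's Complementary
     Naive Bayes implementation; the class NaiveBayesSketch, its
     tokenizer, and the labels used in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy multinomial Naive Bayes text classifier; an illustration only. */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on non-word characters. */
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\W+");
    }

    /** Records the word counts of one labeled training document. */
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the label with the highest log-probability, or null if untrained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Start from the log prior of the label.
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = counts.getOrDefault(word, 0);
                // Add-one (Laplace) smoothing avoids zero probabilities for unseen words.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

     Typical use is to call train once per labeled document, for example
     with the labels "spam" and "ham", and then call classify on new
     text. Working with log-probabilities keeps the product of many small
     word probabilities from underflowing.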




Demo


       Building recommendation engines with Mahout
       Document classification with Mahout




Reference


     Mahout in Action, by Sean Owen and Robin Anil, Manning
     Publications.
     Taming Text, by Grant Ingersoll and Tom Morton, Manning
     Publications.
     Introducing Apache Mahout, by Grant Ingersoll. An introduction to
     Apache Mahout focused on clustering, classification and
     collaborative filtering.
     https://www.ibm.com/developerworks/java/library/j-mahout/index.html
     Programming Collective Intelligence: Building Smart Web 2.0
     Applications, by Toby Segaran, O'Reilly Media.
     http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325




Useful Resources




     Apache Mahout site: http://mahout.apache.org/
     Apache Mahout mailing list: user@mahout.apache.org
     The code used for the Mahout demo is available at
     http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
     Twenty Newsgroups data set:
     http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz




Questions ??




Acknowledgments



  Thanks to:
      Manning Publications for a review copy of the book "Mahout in
      Action"
      Apache Mahout mailing list members
      Ted Dunning and Robin Anil for suggestions
      @chelakkandupoda for review and criticism
      Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and
      encouragement



