Mahout Tutorial FOSSMEET NITC

Biju B and Jaganadh G's presentation on Mahout at FOSSMEET-NITC

  1. Practical Machine Learning: A Tutorial on Apache Mahout. Biju B, NLP R&D Division, 365Media Pvt. Ltd., bijub@365Media.in. FOSSMEET NITC, Calicut, 4-6 February 2011. Biju B & Jaganadh G Practical Machine Learning
  2. nlp r d $ whoweare: Working in Natural Language Processing (NLP), Machine Learning and Data Mining. Passionate about Free and Open Source :-) In free time, teaches Python, blogs at http://jaganadhg.freeflux.net/blog and contributes to OpenStreetMap. Works for 365Media Pvt. Ltd., Coimbatore, India. Twitter handles: @jaganadhg, @bijub
  6. Machine Learning: Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. This talk is not intended as an introduction to Machine Learning, so don't expect mathematical equations here.
  14. Machine Learning and Our Life: Does Machine Learning have any impact on our lives? Yes. In day-to-day life we use many tools powered by Machine Learning: recommendation engines, clustering, classification (e.g. spam filtering), sentiment analysis and fraud detection.
  15. Mahout: an open source project of the Apache Software Foundation. Its goal is to build scalable machine learning libraries.
  16. Mahout: a mahout is a person who drives an elephant ;-) The name comes from the project's use of Apache Hadoop, whose mascot is an elephant.
  17. Why a new library? There are more than 30 Java libraries and tools for Machine Learning: Weka, Mallet, Classifier4j, RapidMiner, and so on. Processing large amounts of data is not easy, yet Machine Learning tools are expected to produce results quickly. When the data is too large, it is hard to process on a single machine, however powerful. Mahout is scalable: its core algorithms are implemented on top of Apache Hadoop using the map/reduce paradigm, so the work can be spread across a cluster.
  28. Algorithms in Apache Mahout: Collaborative Filtering; User- and Item-based recommenders; K-Means and Fuzzy K-Means clustering; Mean Shift clustering; Dirichlet process clustering; Latent Dirichlet Allocation; Singular Value Decomposition; Parallel Frequent Pattern mining; Complementary Naive Bayes classifier; Random forest decision-tree-based classifier
  29. Recommendation: Filter information based on user preferences by searching a large set of people and finding a smaller set with tastes similar to yours. E.g. Amazon's book recommendations, Netflix movie recommendations
  30. Document Classification: Classify documents based on their content. E.g. spam filtering, priority inbox
  31. Demo: Building recommendation engines with Mahout; Document classification with Mahout
  33. References: Mahout in Action, by Sean Owen and Robin Anil, Manning Publications. Taming Text, by Grant Ingersoll and Tom Morton, Manning Publications. Introducing Apache Mahout, by Grant Ingersoll: an introduction to Apache Mahout focused on clustering, classification and collaborative filtering, https://www.ibm.com/developerworks/java/library/j-mahout/index.html. Programming Collective Intelligence: Building Smart Web 2.0 Applications, by Toby Segaran, O'Reilly Media, http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
  34. Useful Resources: Apache Mahout site: http://mahout.apache.org/ ; Apache Mahout mailing list: user@mahout.apache.org ; the code used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/ ; Twenty Newsgroups data set: http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
  35. Questions?
  36. Acknowledgments: Thanks to Manning Publications for a review copy of the book "Mahout in Action"; Apache Mahout mailing list members Ted Dunning and Robin Anil for suggestions; @chelakkandupoda for review and criticism; Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement
  37. Finally
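
Slide 17 says Mahout's core algorithms are built on Apache Hadoop using the map/reduce paradigm. As a rough illustration of that paradigm only, here is a single-machine Java sketch of the classic word-count job. It does not use the Hadoop API; the names `MapReduceSketch`, `map`, `reduce` and `wordCount` are hypothetical, and the shuffle step that Hadoop performs across machines is simulated with an in-memory `TreeMap`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine simulation of the map/reduce paradigm (word count).
// NOT the Hadoop API: it only mirrors the map, shuffle (group by key) and reduce phases.
class MapReduceSketch {

    // Map phase: one input record (a line of text) becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: all values emitted for one key are combined into one result.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle phase: group mapper output by key (sorted, as Hadoop does before reducing).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Mahout drives the elephant", "the elephant is Hadoop");
        System.out.println(wordCount(lines));
        // prints {drives=1, elephant=2, hadoop=1, is=1, mahout=1, the=2}
    }
}
```

On a real Hadoop cluster, many mapper and reducer instances run in parallel on different nodes, which is what lets the same two small functions scale past a single machine.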
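
K-Means, listed on slide 28, alternates between assigning every point to its nearest centroid and moving each centroid to the mean of its points. The sketch below shows those two steps on 2-D points in plain Java. It is not Mahout's implementation (Mahout runs each iteration as a map/reduce job over vectors); the class name `KMeansSketch` and the choice to seed centroids with the first k points are assumptions made to keep the example small and deterministic.

```java
import java.util.Arrays;

// Minimal single-machine k-means (Lloyd's algorithm) on 2-D points.
// Illustrative only; not Mahout's distributed implementation.
class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Returns the index of the cluster assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        // Assumption: seed the centroids with the first k points.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach every point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c]) < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assignment[i]][0] += points[i][0];
                sums[assignment[i]][1] += points[i][1];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old centroid
                    centroids[c][0] = sums[c][0] / counts[c];
                    centroids[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {9, 9}, {1.5, 2}, {8, 8.5}, {2, 1}, {9, 8} };
        System.out.println(Arrays.toString(cluster(points, 2, 20)));
        // prints [0, 1, 0, 1, 0, 1]
    }
}
```

Fuzzy K-Means, also on slide 28, differs mainly in the assignment step: each point gets a degree of membership in every cluster instead of a single hard label.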
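
Slide 29 describes recommendation as finding a smaller set of people with tastes similar to yours. A minimal user-based sketch of that idea: compute cosine similarity between users' rating vectors, keep the n most similar users as the neighborhood, and predict each unseen item as a similarity-weighted average of the neighbors' ratings. This is plain Java with hypothetical names (`UserBasedSketch`, `similarity`, `recommend`) and made-up sample ratings, not Mahout's recommender API; Mahout ships its own similarity and neighborhood implementations, and cosine is only one of several possible similarity measures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal user-based collaborative filtering over an in-memory rating table.
// Hypothetical helper, not Mahout's API: similarity -> nearest-N neighborhood -> weighted average.
class UserBasedSketch {

    // Cosine similarity of two rating vectors (unrated items count as 0).
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = 0, normB = 0;
        for (double v : a.values()) normA += v * v;
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Predicted rating for every item the user has not rated yet.
    static Map<String, Double> recommend(Map<String, Map<String, Double>> ratings, String user, int neighborhoodSize) {
        Map<String, Double> mine = ratings.get(user);
        List<String> others = new ArrayList<>(ratings.keySet());
        others.remove(user);
        // Nearest-N neighborhood: sort the other users by descending similarity.
        others.sort((u, v) -> Double.compare(similarity(mine, ratings.get(v)), similarity(mine, ratings.get(u))));
        List<String> neighbors = others.subList(0, Math.min(neighborhoodSize, others.size()));

        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (String n : neighbors) {
            double sim = similarity(mine, ratings.get(n));
            if (sim <= 0) continue; // ignore users with no overlap in taste
            for (Map.Entry<String, Double> e : ratings.get(n).entrySet()) {
                if (!mine.containsKey(e.getKey())) {
                    weightedSum.merge(e.getKey(), sim * e.getValue(), Double::sum);
                    weightTotal.merge(e.getKey(), sim, Double::sum);
                }
            }
        }
        Map<String, Double> predictions = new HashMap<>();
        for (String item : weightedSum.keySet()) {
            predictions.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        return predictions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("book1", 5.0, "book2", 3.0, "book3", 4.0));
        ratings.put("bob", Map.of("book1", 5.0, "book2", 3.0, "book3", 4.0, "book4", 5.0));
        ratings.put("carol", Map.of("book1", 1.0, "book5", 2.0));
        System.out.println(recommend(ratings, "alice", 1));
    }
}
```

With a neighborhood of one, only the closest user (bob) contributes, so alice receives a prediction for `book4` alone; widening the neighborhood to two also brings in `book5` from carol.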
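
Slide 30 frames document classification as labelling documents by their content, and slide 28 lists Mahout's Complementary Naive Bayes classifier. The sketch below is the plain multinomial Naive Bayes form with add-one (Laplace) smoothing, not the complementary variant, which instead estimates each class's word weights from the documents of all other classes. The class name `NaiveBayesSketch` and the toy spam/ham training sentences are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal multinomial Naive Bayes text classifier with add-one smoothing.
// Not Mahout's (Complementary) Naive Bayes; it shows the basic probabilistic idea.
class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();             // label -> number of tokens
    private final Map<String, Integer> docCounts = new HashMap<>();              // label -> number of documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Picks the label maximising log P(label) + sum over tokens of log P(token | label).
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int denominator = totalWords.getOrDefault(label, 0) + vocabulary.size();
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / denominator); // add-one smoothing
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("spam", "win money now, claim your free prize");
        nb.train("spam", "free money offer, click now");
        nb.train("ham", "meeting agenda for the mahout tutorial");
        nb.train("ham", "slides for the fossmeet talk");
        System.out.println(nb.classify("claim free money")); // prints spam
        System.out.println(nb.classify("tutorial slides"));  // prints ham
    }
}
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.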
