Intro to Mahout

A short introduction to Mahout, given at the SCISR meetup.

http://bit.ly/scisr

  1. Ofer Vugman, May 2012
  2. Agenda and such: What is ML (Machine Learning); ML Common Use Cases; Mahout Overview; Algorithms in Mahout; Mahout Commercial Use; Mahout Summary
  3. What is ML? “Machine Learning is programming computers to optimize a performance criterion using example data or past experience.” (E. Alpaydin, Introduction to Machine Learning)
  4. 4. ML Common Use Cases Recommendation
  5. 5. ML Common Use Cases Classification
  6. 6. ML Common Use Cases Clustering
  7. 7. ML Common Libraries
  8. 8. Mahout Overview – What ?A mahout is a person who keeps and drives an elephant
  9. 9. Mahout Overview – What ? A scalable machine learning library
  10. 10. Mahout Overview – What ? Began life at 2008 as a subproject of Apache’s Lucene project On 2010 Mahout became a top-level Apache project in its own right Implemented in Java Built upon Apache’s Hadoop (Look ! An Elephant !)
  11. 11. Mahout Overview – Why ? Many open source ML libraries either:  Lack community  Lack documentation and examples  Lack scalability  Lack the Apache license  Are research oriented  Not well tested  Not built over existing production quality libraries
  12. 12. Mahout Overview – Why ? Scalability  Scalable to reasonably large datasets (core algorithms implemented in Map/Reduce, runnable on Hadoop)  Scalable to support your business case (Apache License)  Scalable community
  13. 13. Mahout Overview – Why ? Built over existing production quality libraries
  14. Mahout Overview – Use Cases: Mahout currently supports four main use cases: (1) recommendation, (2) clustering, (3) classification, (4) frequent itemset mining.
  15. Mahout Overview – Technical System Requirements: Linux (or Cygwin on Windows); Java 1.6.x or greater; Maven 2.0.11 or greater to build the source code; Hadoop 0.2 or greater*. (*Not all algorithms are implemented to run on Hadoop clusters.)
  16. Algorithms in Mahout: we’ll focus on one example, collaborative filtering (recommenders), yet there are many (many!) more; you can find them all at https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
  17. 17. Algorithms Examples –Recommendation Help users find items they might like based on historical preferences Based on example by Sebastian Schelter in “Distributed Itembased Collaborative Filtering with Apache Mahout”
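  18. Algorithms Examples – Recommendation: the user-item rating matrix:

      | User  | Matrix | Alien | Inception |
      |-------|--------|-------|-----------|
      | Alice | 5      | 1     | 4         |
      | Bob   | ?      | 2     | 5         |
      | Peter | 4      | 3     | 2         |
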
  19. 19. Algorithms Examples –Recommendation Algorithm  Neighborhood-based approach  Works by finding similarly rated items in the user-item-matrix (e.g. cosine, Pearson- Correlation, Tanimoto Coefficient)  Estimates a users preference towards an item by looking at his/her preferences towards similar items
  20. 20. Algorithms Examples –Recommendation Prediction: Estimate Bobs preference towards “The Matrix” 1. Look at all items that  a) are similar to “The Matrix“  b) have been rated by Bob => “Alien“, “Inception“ 2. Estimate the unknown preference with a weighted sum
  21. 21. Algorithms Examples –Recommendation MapReduce phase 1  Map – Make user the key (Alice, Matrix, 5) Alice (Matrix, 5) (Alice, Alien, 1) Alice (Alien, 1) (Alice, Inception, 4) Alice (Inception, 4) (Bob, Alien, 2) Bob (Alien, 2) (Bob, Inception, 5) Bob (Inception, 5) (Peter, Matrix, 4) Peter (Matrix, 4) (Peter, Alien, 3) Peter (Alien, 3) (Peter, Inception, 2) Peter (Inception, 2)
  22. 22. Algorithms Examples –Recommendation MapReduce phase 1  Reduce – Create inverted index Alice (Matrix, 5) Alice (Alien, 1) Alice (Inception, 4) Alice (Matrix, 5) (Alien, 1) (Inception, 4) Bob (Alien, 2) Bob (Alien, 2) (Inception, 5) Bob (Inception, 5) Peter(Matrix, 4) (Alien, 3) (Inception, 2) Peter (Matrix, 4) Peter (Alien, 3) Peter (Inception, 2)
  23. 23. Algorithms Examples –Recommendation MapReduce phase 2  Map – Isolate all co-occurred ratings (all cases where a user rated both items) Matrix, Alien (5,1) Matrix, Alien (4,3)Alice (Matrix, 5) (Alien, 1) (Inception, 4) Alien, Inception (1,4)Bob (Alien, 2) (Inception, 5) Alien, Inception (2,5)Peter(Matrix, 4) (Alien, 3) (Inception, 2) Alien, Inception (3,2) Matrix, Inception (4,2) Matrix, Inception (5,4)
  24. 24. Algorithms Examples –Recommendation MapReduce phase 2  Reduce – Compute similarities Matrix, Alien (5,1) Matrix, Alien (4,3) Alien, Inception (1,4) Matrix, Alien (-0.47) Alien, Inception (2,5) Matrix, Inception (0.47) Alien, Inception (3,2) Alien, Inception(-0.63) Matrix, Inception (4,2) Matrix, Inception (5,4)
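  25. Algorithms Examples – Recommendation: the rating matrix with Bob’s predicted rating for “The Matrix” filled in:

      | User  | Matrix | Alien | Inception |
      |-------|--------|-------|-----------|
      | Alice | 5      | 1     | 4         |
      | Bob   | 1.5    | 2     | 5         |
      | Peter | 4      | 3     | 2         |
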
  26. 26. Mahout Commercial Use Commercial use
  27. 27. Mahout Resources Mahout website - http://mahout.apache.org/ Introducing Apache Mahout – http://www.ibm.com/developerworks/java/lib rary/j-mahout/ “Mahout In Action” by Sean Owen and Robin Anil
  28. 28. Mahout Summary ML is all over the web today Mahout is about scalable machine learning Mahout has functionality for many of today’s common machine learning tasks MapReduce magic in action
  29. 29. Mahout Summary Thank you and good night
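
Slides 21 and 22 describe the first MapReduce phase. Below is a minimal single-machine sketch of it, with plain Python standing in for Hadoop. The function names `map_phase1`, `shuffle` and `reduce_phase1` are hypothetical and are not Mahout's API. The map step re-keys each (user, item, rating) triple by user. The reduce step collects each user's ratings into one vector.

```python
from collections import defaultdict

# (user, item, rating) triples from the rating matrix on slide 18.
RATINGS = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]

# Hypothetical helper names for illustration; not Mahout's API.


def map_phase1(record):
    """Map: make the user the key -> (user, (item, rating))."""
    user, item, rating = record
    yield user, (item, rating)


def shuffle(pairs):
    """Stand-in for Hadoop's shuffle: group all mapped values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase1(user, item_ratings):
    """Reduce: collect one user's ratings into a single {item: rating} vector."""
    return user, dict(item_ratings)


mapped = [kv for record in RATINGS for kv in map_phase1(record)]
user_vectors = dict(reduce_phase1(u, v) for u, v in shuffle(mapped).items())
print(user_vectors)
```

`user_vectors` ends up holding the same three per-user records shown on slide 22. On a real cluster, Hadoop performs the shuffle between the map and reduce tasks, so user code does not.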
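
Slide 23's map step can be sketched the same way. The sketch makes two assumptions. First, its input is the per-user vectors from slide 22. Second, items follow a fixed order, so each pair is always keyed identically. Using a hypothetical `map_phase2` helper, the mapper emits one record for every pair of items a user rated.

```python
from itertools import combinations

# Per-user rating vectors, i.e. the output of phase 1 (slide 22).
USER_VECTORS = {
    "Alice": {"Matrix": 5, "Alien": 1, "Inception": 4},
    "Bob": {"Alien": 2, "Inception": 5},
    "Peter": {"Matrix": 4, "Alien": 3, "Inception": 2},
}

# Assumption: a fixed item order, so a pair is always keyed as
# ("Matrix", "Alien") and never as ("Alien", "Matrix").
ITEM_ORDER = ["Matrix", "Alien", "Inception"]


def map_phase2(user, vector):
    """Hypothetical mapper: for one user's vector, emit
    ((item_a, item_b), (rating_a, rating_b)) for every co-rated pair.
    The user key itself is not needed in the output."""
    rated = [item for item in ITEM_ORDER if item in vector]
    for a, b in combinations(rated, 2):
        yield (a, b), (vector[a], vector[b])


co_ratings = [kv for user, vec in USER_VECTORS.items() for kv in map_phase2(user, vec)]
for pair, ratings in co_ratings:
    print(pair, ratings)
```

This yields the seven co-rating records listed on slide 23. Bob contributes only one record because he rated only two items.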
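
The slides do not say which similarity measure produced slide 24's values of -0.47, 0.47 and -0.63. The sketch below therefore makes its own choice, using hypothetical helper names. It groups the co-ratings by item pair and scores each group with one of two measures named on slide 19: Pearson correlation (the default here) or cosine similarity. It is an illustration, not a reproduction of the slide. Its Pearson values are -1.0, 1.0 and about -0.65, which match the slide's signs but not its magnitudes.

```python
import math
from collections import defaultdict

# Co-rating records from the phase-2 map step (slide 23):
# ((item_a, item_b), (rating_a, rating_b)), one per user who rated both items.
CO_RATINGS = [
    (("Matrix", "Alien"), (5, 1)), (("Matrix", "Inception"), (5, 4)),
    (("Alien", "Inception"), (1, 4)), (("Alien", "Inception"), (2, 5)),
    (("Matrix", "Alien"), (4, 3)), (("Matrix", "Inception"), (4, 2)),
    (("Alien", "Inception"), (3, 2)),
]


def pearson(pairs):
    """Pearson correlation of the two rating columns, over co-rating users only."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def cosine(pairs):
    """Cosine of the angle between the two rating columns."""
    dot = sum(x * y for x, y in pairs)
    norm = math.sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))
    return dot / norm if norm else 0.0


def reduce_phase2(co_ratings, similarity=pearson):
    """Hypothetical shuffle + reduce: group co-ratings by item pair, then score
    each pair. Assumption: Pearson is the default; the slides name no measure."""
    groups = defaultdict(list)
    for pair, ratings in co_ratings:
        groups[pair].append(ratings)
    return {pair: similarity(ratings) for pair, ratings in groups.items()}


for pair, sim in reduce_phase2(CO_RATINGS).items():
    print(pair, round(sim, 2))
```

When only two users co-rate a pair, Pearson correlation can only come out as -1 or +1 (or undefined, reported here as 0.0, when one item's two ratings are equal). For that reason it is common to require a minimum number of co-ratings or to damp such scores.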
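
Slides 20 and 25 estimate Bob's rating for “The Matrix” with a weighted sum, but the slides do not spell out the normalization. This sketch, using a hypothetical `predict` helper, assumes the weighted sum is divided by the sum of the absolute similarities. That choice reproduces slide 25's 1.5 from slide 24's similarities: (-0.47·2 + 0.47·5) / (0.47 + 0.47) = 1.41 / 0.94 = 1.5.

```python
def predict(similarities, user_ratings):
    """Hypothetical helper: estimate a user's rating for a target item as a
    similarity-weighted average of the user's ratings for neighboring items.

    similarities: {neighbor_item: similarity to the target item}
    user_ratings: {item: rating} for the user we predict for
    Assumption: weights are normalized by the sum of |similarity|.
    """
    neighbors = [i for i in similarities if i in user_ratings]
    weight_sum = sum(abs(similarities[i]) for i in neighbors)
    if weight_sum == 0:
        return None  # no usable neighbors, so no estimate
    return sum(similarities[i] * user_ratings[i] for i in neighbors) / weight_sum


# Similarities of "Matrix" to the items Bob rated, as listed on slide 24.
matrix_sims = {"Alien": -0.47, "Inception": 0.47}
bob = {"Alien": 2, "Inception": 5}
# (-0.47*2 + 0.47*5) / (0.47 + 0.47) = 1.41 / 0.94, i.e. 1.5
print(round(predict(matrix_sims, bob), 2))
```

The two similarities have equal magnitude s and opposite sign, so the estimate is (5s - 2s) / 2s = 1.5 whatever s is.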
