Mahout
Algorithms
Mahmut Karakaya
Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo
What mahout means
Elephant rider in Hindi
What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Mahout In Action
- user@mahout.apache.org
- 0.9 (1-Feb-2014)
Who uses Mahout
Mahout in Apache Foundation
overstock.com saves $2m a year
Judd Bagley Saum Noursalehi
Others
- Weka (Machine Learning Library)
- Lenskit (Grouplens)
- EasyRec (RestAPI)
- Write yourself:)
Need to know ML?
Need to know ML?
hadoop jar mahout-core-0.8-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
-Dmapred.input.dir=input/input.txt
-Dmapred.output.dir=output
--usersFile input/users.txt
--booleanData
Data Model (u,i,r)
Similarity
Cosine Similarity
Cosine Similarity
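
A minimal sketch of cosine similarity between two rating vectors, in plain Python rather than Mahout's own Java classes. The vectors and the choice to return 0.0 for an all-zero vector are assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention assumed here for an all-zero vector
    return dot / (norm_a * norm_b)

# Hypothetical rating columns for two items, one entry per user (0 = no rating).
item_a = [5, 3, 0, 4]
item_b = [4, 0, 0, 5]
similarity = cosine_similarity(item_a, item_b)
print(round(similarity, 3))
```

Vectors pointing in the same direction score 1, vectors with no users in common score 0.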
Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
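
As a rough illustration of the (userId, itemId, rating) format, the sketch below parses such lines into a nested dictionary. It is plain Python, not Mahout's data model API; the function name and the sample lines are hypothetical.

```python
from collections import defaultdict

def load_preferences(lines):
    """Parse 'userId,itemId,rating' lines into {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, rating = (field.strip() for field in line.split(","))
        prefs[int(user_id)][int(item_id)] = float(rating)
    return dict(prefs)

# Hypothetical input: two users, two items.
sample = ["1,101,5.0", "1,102,3.0", "2,101,2.0"]
model = load_preferences(sample)
print(model)
```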
Item Based - Similarity Matrix (Item-Item)
Item Based - Predict
- Weighted Sum:
r^(3,1) = 2 * 0.91 + ...
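
The weighted sum can be sketched as below. This assumes the common form in which the sum of rating times similarity is normalised by the sum of the similarities; the slide shows only the first term, so every value other than that first pair is hypothetical.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings a user has already given.

    user_ratings: {item_id: rating} for items the user rated.
    similarities: {item_id: similarity of that item to the target item}.
    """
    numerator = 0.0
    denominator = 0.0
    for item_id, rating in user_ratings.items():
        sim = similarities.get(item_id, 0.0)
        numerator += rating * sim
        denominator += abs(sim)
    if denominator == 0:
        return None  # no rated neighbour items, so no prediction
    return numerator / denominator

# Hypothetical data: the user rated items 2 and 3; item 1 is the target.
ratings_of_user = {2: 2.0, 3: 4.0}
sim_to_item_1 = {2: 0.91, 3: 0.50}
prediction = predict_rating(ratings_of_user, sim_to_item_1)
print(prediction)
```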
Item Based
Item Based.. Why in Mahout
- Generic recommender like User Based
- User Based similarity matrix is heavier
Singular Value Decomposition (SVD)
SVDRecommender
Factorization
Factorizer
Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD)
m * n → m * k + n * k
Let's say: m = 10K, n = 1K, k = 10
10M → 100K + 10K
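
The storage arithmetic on this slide can be checked with a few lines of Python. The function names are illustrative only; the numbers are the ones given above.

```python
def dense_cells(m, n):
    """Cells in the full m-by-n user-item matrix."""
    return m * n

def factor_cells(m, n, k):
    """Cells in an m-by-k user factor matrix plus an n-by-k item factor matrix."""
    return m * k + n * k

m, n, k = 10_000, 1_000, 10
full = dense_cells(m, n)
factors = factor_cells(m, n, k)
print(full, m * k, n * k, factors)
```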
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=10
SVD.. Why in Mahout
- Won Netflix Prize
- Parallelizable by row, column
Map / Reduce Mapper
1.txt 2.txt
Hello Hello
Hello
Map / Reduce Mapper
Map / Reduce Mapper
Map1 Map2
Hello,1 Hello,1
Hello,1
Map / Reduce Reducer
Map / Reduce Reducer
Hello,3
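
The word count above can be mimicked in a single process as a sketch of the map, shuffle and reduce steps. This is plain Python, not Hadoop code, and it assumes 1.txt holds "Hello" twice and 2.txt holds it once, which is how the slide layout reads.

```python
from collections import defaultdict

def map_phase(document_text):
    """Emit a (word, 1) pair for every word in one input file."""
    return [(word, 1) for word in document_text.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

# Assumed file contents, one mapper per file.
files = {"1.txt": "Hello Hello", "2.txt": "Hello"}
mapped = [pair for text in files.values() for pair in map_phase(text)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```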
Map / Reduce ItemBased
Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
-Dmapred.input.dir=input/input.txt
-Dmapred.output.dir=output
--usersFile input/users.txt
--booleanData
Map / Reduce ItemBased
Map / Reduce ItemBased
Map / Reduce ItemBased
Map 1
Map / Reduce ItemBased
Reduce 1
Map / Reduce ItemBased
Reduce 1
Map / Reduce ItemBased
Map 2
Map / Reduce ItemBased
Reduce 2
Map / Reduce ItemBased
Map / Reduce.. Why in Mahout
Clustering
- KMeans Clustering (SM,MR)
- Fuzzy kMeans (SM,MR)
- Canopy Clustering (SM,MR)
- Dirichlet (SM,MR)
Kmeans
Kmeans
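
For readers who have not seen k-means, here is a small sequential sketch of the usual assign-then-recompute loop (Lloyd's algorithm) on 2-D points. It is not Mahout's implementation; the points and starting centroids are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2-D points, starting from caller-supplied centroids (list of tuples)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(points, [(0, 0), (10, 10)])
print(final_centroids)
```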
Clustering Evaluation
Clustering Intra Distance
Clustering Inter Distance
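
One common way to put numbers on these two ideas is sketched below: intra distance as the mean distance of points to their own centroid, inter distance as the distance between two centroids. These definitions are an assumption for illustration and may differ from what the slides or Mahout's evaluators compute. The sketch needs Python 3.8 or later for math.dist.

```python
import math

def centroid(cluster):
    """Mean point of a non-empty cluster of equal-length tuples."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))

def intra_distance(cluster):
    """Mean distance from each point to its own cluster centroid (lower is tighter)."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def inter_distance(cluster_a, cluster_b):
    """Distance between the centroids of two clusters (higher is better separated)."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

# Hypothetical clusters.
left = [(1.0, 1.0), (1.0, 3.0)]
right = [(9.0, 1.0), (9.0, 3.0)]
print(intra_distance(left), inter_distance(left, right))
```

A good clustering tends to show small intra distances and large inter distances.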
Clustering.. Why in Mahout
- Sparsity
- ~10m of 11m users registered 1 Sony product
Clustering.. Why in Mahout
- Group Recommendation
- Cluster Based Recommendation
Create WishList Experience
- Mahout (SVD)
- Play
- Heroku
- MongoLab
- Rest
http://recommenderplaybbs.herokuapp.com/
Thank you

Apache Mahout Algorithms

Published in: Software, Technology, Education

  1. Mahout Algorithms Mahmut Karakaya
  2. Agenda - Introduction - Collaborative Filtering - Map/Reduce - Clustering - Demo
  3. What mahout means Elephant rider in Hindi
  4. What Apache Mahout is - Java, Hadoop - Collaborative Filtering - Mahout In Action - user@mahout.apache.org - 0.9 (1-Feb-2014)
  5. Who uses Mahout
  6. Mahout in Apache Foundation
  7. overstock.com saves $2m a year Judd Bagley Saum Noursalehi
  8. Others - Weka (Machine Learning Library) - Lenskit (Grouplens) - EasyRec (RestAPI) - Write yourself:)
  9. Need to know ML?
  10. Need to know ML? hadoop jar mahout-core-0.8-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
  11. Data Model (u,i,r)
  12. Similarity
  13. Cosine Similarity
  14. Cosine Similarity
  15. Collaborative Filtering - Data format = userId, itemId, rating - Create Model + Predict
  16. Item Based - Similarity Matrix (Item-Item)
  17. Item Based - Predict - Weighted Sum: r^(3,1) = 2 * 0.91 + ...
  18. Item Based
  19. Item Based.. Why in Mahout - Generic recommender like User Based - User Based similarity matrix is heavier
  20. Singular Value Decomposition (SVD)
  21. SVDRecommender
  22. Factorization
  23. Factorizer
  24. Singular Value Decomposition (SVD)
  25. Singular Value Decomposition (SVD): m * n → m * k + n * k. Let's say m = 10K, n = 1K, k = 10: 10M → 100K + 10K
  26. SVD k=3 λ=0.1 a=40 c.a=1
  27. SVD k=3 λ=0.1 a=40 c.a=1
  28. SVD k=3 λ=0.1 a=40 c.a=10
  29. SVD.. Why in Mahout - Won Netflix Prize - Parallelizable by row, column
  30. Map / Reduce Mapper 1.txt 2.txt Hello Hello Hello
  31. Map / Reduce Mapper
  32. Map / Reduce Mapper Map1 Map2 Hello,1 Hello,1 Hello,1
  33. Map / Reduce Reducer
  34. Map / Reduce Reducer Hello,3
  35. Map / Reduce ItemBased
  36. Map / Reduce ItemBased hadoop jar mahout-core-0.8-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
  37. Map / Reduce ItemBased
  38. Map / Reduce ItemBased
  39. Map / Reduce ItemBased Map 1
  40. Map / Reduce ItemBased Reduce 1
  41. Map / Reduce ItemBased Reduce 1
  42. Map / Reduce ItemBased Map 2
  43. Map / Reduce ItemBased Reduce 2
  44. Map / Reduce ItemBased
  45. Map / Reduce.. Why in Mahout
  46. Clustering - KMeans Clustering (SM,MR) - Fuzzy kMeans (SM,MR) - Canopy Clustering (SM,MR) - Dirichlet (SM,MR)
  47. Kmeans
  48. Kmeans
  49. Clustering Evaluation
  50. Clustering Intra Distance
  51. Clustering Inter Distance
  52. Clustering.. Why in Mahout - Sparsity - ~10m of 11m users registered 1 Sony product
  53. Clustering.. Why in Mahout - Group Recommendation - Cluster Based Recommendation
  54. Create WishList Experience - Mahout (SVD) - Play - Heroku - MongoLab - Rest http://recommenderplaybbs.herokuapp.com/
  55. Thank you