Apache Mahout Algorithms

Published in: Software, Technology, Education

Transcript

  • 1. Mahout Algorithms - Mahmut Karakaya
  • 2. Agenda - Introduction - Collaborative Filtering - Map/Reduce - Clustering - Demo
  • 3. What "mahout" means - Elephant rider in Hindi
  • 4. What Apache Mahout is - Java, Hadoop - Collaborative Filtering - Mahout In Action - user@mahout.apache.org - 0.9 (1-Feb-2014)
  • 5. Who uses Mahout
  • 6. Mahout in Apache Foundation
  • 7. Overstock.com saves $2m a year - Judd Bagley - Saum Noursalehi
  • 8. Others - Weka (Machine Learning Library) - LensKit (GroupLens) - EasyRec (REST API) - Write it yourself :)
  • 9. Need to know ML?
  • 10. Need to know ML? - hadoop jar mahout-core-0.8-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
  • 11. Data Model (u,i,r)
  • 12. Similarity
  • 13. Cosine Similarity
  • 14. Cosine Similarity
  • 15. Collaborative Filtering - Data format = userId, itemId, rating - Create Model + Predict
  • 16. Item Based - Similarity Matrix (Item-Item)
  • 17. Item Based - Predict - Weighted Sum: r^(3,1) = 2 * 0.91 + ...
  • 18. Item Based
  • 19. Item Based.. Why in Mahout - Generic recommender like User Based - User Based similarity matrix is heavier
  • 20. Singular Value Decomposition (SVD)
  • 21. SVDRecommender
  • 22. Factorization
  • 23. Factorizer
  • 24. Singular Value Decomposition (SVD)
  • 25. Singular Value Decomposition (SVD) - m × n → m × k + n × k - Let's say m = 10K, n = 1K, k = 10: 10M → 100K + 10K
  • 26. SVD k=3 λ=0.1 a=40 c.a=1
  • 27. SVD k=3 λ=0.1 a=40 c.a=1
  • 28. SVD k=3 λ=0.1 a=40 c.a=10
  • 29. SVD.. Why in Mahout - Won Netflix Prize - Parallelizable by row, column
  • 30. Map / Reduce Mapper 1.txt 2.txt Hello Hello Hello
  • 31. Map / Reduce Mapper
  • 32. Map / Reduce Mapper Map1 Map2 Hello,1 Hello,1 Hello,1
  • 33. Map / Reduce Reducer
  • 34. Map / Reduce Reducer Hello,3
  • 35. Map / Reduce ItemBased
  • 36. Map / Reduce ItemBased - hadoop jar mahout-core-0.8-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
  • 37. Map / Reduce ItemBased
  • 38. Map / Reduce ItemBased
  • 39. Map / Reduce ItemBased Map 1
  • 40. Map / Reduce ItemBased Reduce 1
  • 41. Map / Reduce ItemBased Reduce 1
  • 42. Map / Reduce ItemBased Map 2
  • 43. Map / Reduce ItemBased Reduce 2
  • 44. Map / Reduce ItemBased
  • 45. Map / Reduce.. Why in Mahout
  • 46. Clustering - KMeans Clustering (SM,MR) - Fuzzy kMeans (SM,MR) - Canopy Clustering (SM,MR) - Dirichlet (SM,MR)
  • 47. Kmeans
  • 48. Kmeans
  • 49. Clustering Evaluation
  • 50. Clustering Intra Distance
  • 51. Clustering Inter Distance
  • 52. Clustering.. Why in Mahout - Sparsity - ~10m of 11m users registered 1 Sony product
  • 53. Clustering.. Why in Mahout - Group Recommendation - Cluster Based Recommendation
  • 54. Create WishList Experience - Mahout (SVD) - Play - Heroku - MongoLab - Rest http://recommenderplaybbs.herokuapp.com/
  • 55. Thank you
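
Slides 13 to 17 name cosine similarity, an item-item similarity matrix, and a weighted-sum prediction, but the transcript keeps only the slide titles. The sketch below shows one common way these pieces fit together. It is plain Python, not Mahout code. The toy ratings, the function names, and the choice to normalize the weighted sum by the sum of absolute similarities are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data in the (userId, itemId, rating) format from slide 15.
RATINGS = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "B", 2.0), (2, "C", 5.0),
    (3, "B", 4.0), (3, "C", 1.0),
]


def item_vectors(ratings):
    """Group ratings into one sparse vector per item: item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, item, rating in ratings:
        vectors[item][user] = rating
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, ratings):
    """Weighted sum of the user's own ratings, weighted by item similarity.

    Dividing by the sum of |similarity| is an assumed normalization; the
    slide only shows the start of the unnormalized sum.
    """
    vectors = item_vectors(ratings)
    target = vectors.get(item, {})
    numerator = denominator = 0.0
    for other, vector in vectors.items():
        if other == item or user not in vector:
            continue
        similarity = cosine(target, vector)
        numerator += similarity * vector[user]
        denominator += abs(similarity)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # User 3 has not rated item "A"; estimate it from items "B" and "C".
    print(round(predict(3, "A", RATINGS), 3))
```

Because the prediction is a weighted average of the user's existing ratings, it always falls between the lowest and highest rating that user has given to similar items.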
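
Slide 25 gives the storage argument for factorization: an m × n rating matrix is replaced by an m × k and an n × k factor matrix. The sketch below checks that arithmetic and shows a small factorizer in plain Python. It uses stochastic gradient descent with L2 regularization, which is an assumed stand-in and not a description of any Mahout Factorizer. Only k = 3 and λ = 0.1 come from the slides; the learning rate, epoch count, seed, and toy ratings are hypothetical.

```python
import random


def storage_cells(m, n, k):
    """Cells in the full m x n matrix vs. the two factor matrices."""
    return m * n, m * k + n * k


def factorize(ratings, m, n, k=3, lam=0.1, lr=0.01, epochs=200, seed=0):
    """Learn user factors U (m x k) and item factors V (n x k) by SGD.

    ratings is a list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error plus L2 penalty.
                U[u][f] += lr * (err * vf - lam * uf)
                V[i][f] += lr * (err * uf - lam * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted rating is the dot product of the two factor rows."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def squared_error(U, V, ratings):
    return sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)


if __name__ == "__main__":
    print(storage_cells(10_000, 1_000, 10))  # (10000000, 110000)

    # Hypothetical toy ratings: 3 users, 3 items, indices start at 0.
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 5.0), (2, 1, 4.0), (2, 2, 1.0)]
    U, V = factorize(toy, m=3, n=3)
    print(round(squared_error(U, V, toy), 3))
```

With m = 10K, n = 1K, and k = 10, the two factor matrices hold 110K cells in total, which matches the 100K + 10K on the slide.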
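
Slides 30 to 34 walk through the word-count example: two input files, mappers that emit Hello,1 for each word, and a reducer that outputs Hello,3. The sketch below simulates those phases in a single Python process to make the data flow visible. It does not use Hadoop. How the three "Hello" words are split between 1.txt and 2.txt is an assumption, since the transcript does not say.

```python
from collections import defaultdict

# Assumed split of the three words across the two files named on slide 30.
FILES = {"1.txt": "Hello Hello", "2.txt": "Hello"}


def map_phase(text):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts for one key."""
    return key, sum(values)


def word_count(files):
    emitted = []
    for text in files.values():  # one mapper per file
        emitted.extend(map_phase(text))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(FILES))  # {'Hello': 3}
```

In a real cluster the mappers run in parallel on separate machines and the shuffle moves data over the network, but the contract of each phase is the same as in this sketch.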
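
Slides 47 to 51 cover k-means and evaluating clusters by intra-cluster and inter-cluster distance. The sketch below is a minimal single-machine k-means in plain Python with both measures. It is not Mahout's implementation. Taking the first k points as initial centroids, and defining intra distance as the mean point-to-centroid distance and inter distance as the mean distance between centroid pairs, are assumptions; other definitions exist. It requires Python 3.8 or later for math.dist.

```python
import math
from itertools import combinations


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with a naive initialization (first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[index]
            for index, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def intra_distance(centroids, clusters):
    """Mean distance from each point to its own centroid (lower is tighter)."""
    distances = [
        math.dist(p, centroid)
        for centroid, cluster in zip(centroids, clusters)
        for p in cluster
    ]
    return sum(distances) / len(distances)


def inter_distance(centroids):
    """Mean distance between all pairs of centroids (higher is better separated)."""
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 2-D points forming two obvious groups.
    points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
    centroids, clusters = kmeans(points, k=2)
    print(centroids)
    print(intra_distance(centroids, clusters), inter_distance(centroids))
```

A good clustering under these definitions has a small intra distance relative to its inter distance, which is the comparison the evaluation slides point at.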
