Indic threads pune12-recommenders-apache-mahout

The 7th Annual IndicThreads Pune Conference was held on 14-15 December 2012. http://pune12.indicthreads.com/


Transcript

  • 1. How to Build a Recommendation Engine Using Apache Mahout
      Viraj Paripatyadar, GS Lab
  • 2. Contents
      • A recommendation problem
      • What is a recommender
      • Building a recommender using Mahout
          • Tips and tweaks
      • Recommender considerations
  • 3. A book store
      • Sells books:
          • By various authors
          • Of various categories
          • On different subjects
          • From various publishers
      • Readers/buyers are asked to rate
      • Readers/buyers can provide reviews
      You walk into the store to buy something for a friend.
  • 4. The store owner
      • Asks you what:
          • your friend reads (already owns)
          • your friend usually likes more
      • Has data on what:
          • his customers buy
          • his customers rate and review
      • Uses a few strategies
  • 5. 1 - Find similar books
      Depending on which books your friend has, pick books:
      • by the same author
      • on the same or a similar subject
      • in the same category
      • from the same publisher (those with the highest sales numbers)
  • 6. 2 - Find books with similar readership
      • Define some similarity
          • e.g. the similarity of two books is the number of readers who rated both of them
      • Define some limit of relevance
          • e.g. only consider pairs of books that share more than 4 readers
      • Look for all books which are similar to books your friend owns
      Pick books from this set that your friend doesn't own.
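
      The readership strategy on this slide can be sketched in a few lines of plain Python. This is an illustration only, not Mahout code: the function names are hypothetical, the rating triples are the ones shown on the example data slide of this talk, and the relevance limit is set to 2 shared readers (an assumption, since that data set has only five readers).

```python
from collections import defaultdict
from itertools import combinations

# userID,itemID,rating triples from the talk's example data slide.
RATINGS = """
1,101,5.0 1,102,3.0 1,103,2.5
2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0
3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0
4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0
5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0
"""


def load(text):
    """Parse whitespace-separated triples into {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for triple in text.split():
        user, item, rating = triple.split(",")
        prefs[int(user)][int(item)] = float(rating)
    return prefs


def co_rating_counts(prefs):
    """Similarity of two books = number of readers who rated both."""
    counts = defaultdict(int)
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def similar_unowned(prefs, user, min_readers):
    """Books the user does not own, scored by shared readership with owned books.

    Pairs sharing fewer than min_readers readers are ignored
    (the "limit of relevance"; the value is a free parameter).
    """
    owned = set(prefs[user])
    scores = defaultdict(int)
    for (a, b), n in co_rating_counts(prefs).items():
        if a in owned and b not in owned and n >= min_readers:
            scores[b] += n
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(similar_unowned(load(RATINGS), user=1, min_readers=2))
# -> [(104, 9), (106, 4), (105, 2)]
```

      Book 107 never appears because it shares only one reader with any book the friend owns, which is below the chosen limit.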
  • 7. 3 - Find people with similar tastes
      • Define some similarity
          • e.g. two people are as similar as the number of books they like from the same category
      • Define some limit of relevance
          • e.g. only consider the 3 top people when ordered according to how similar they are to your friend
      • Look for users similar to your friend and see what they read
      Pick books which these people like and your friend doesn't own.
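
      A minimal sketch of the similar-tastes strategy, again in plain Python rather than Mahout. The example data has no categories, so similarity here is simply the number of books both people like, and "like" is taken to mean a rating of 3.0 or more; both choices, and all function names, are assumptions made for illustration.

```python
from collections import defaultdict

# userID,itemID,rating triples from the talk's example data slide.
RATINGS = """
1,101,5.0 1,102,3.0 1,103,2.5
2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0
3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0
4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0
5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0
"""

LIKE_THRESHOLD = 3.0  # assumption: a rating at or above this counts as a "like"


def load(text):
    """Parse whitespace-separated triples into {user: {item: rating}}."""
    prefs = defaultdict(dict)
    for triple in text.split():
        user, item, rating = triple.split(",")
        prefs[int(user)][int(item)] = float(rating)
    return prefs


def liked(prefs, user):
    """Set of books the user rated at or above the like threshold."""
    return {item for item, r in prefs[user].items() if r >= LIKE_THRESHOLD}


def top_neighbors(prefs, user, n):
    """The n most similar people, dropping anyone with no liked book in common."""
    mine = liked(prefs, user)
    sims = {o: len(mine & liked(prefs, o)) for o in prefs if o != user}
    ranked = sorted(sims.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(other, s) for other, s in ranked[:n] if s > 0]


def recommend(prefs, user, n=3):
    """Books the neighbors like and the user does not own, best first."""
    owned = set(prefs[user])
    scores = defaultdict(int)
    for other, sim in top_neighbors(prefs, user, n):
        for item in liked(prefs, other) - owned:
            scores[item] += sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


prefs = load(RATINGS)
print(top_neighbors(prefs, 1, 3))  # -> [(5, 2), (4, 1)]
print(recommend(prefs, 1))         # -> [(104, 3), (106, 3), (105, 2)]
```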
  • 8. Example data
      1,101,5.0  1,102,3.0  1,103,2.5
      2,101,2.0  2,102,2.5  2,103,5.0  2,104,2.0
      3,101,2.5  3,104,4.0  3,105,4.5  3,107,5.0
      4,101,5.0  4,103,3.0  4,104,4.5  4,106,4.0
      5,101,4.0  5,102,3.0  5,103,2.0  5,104,4.0  5,105,3.5  5,106,4.0
      • Your friend owns three books:
          • Gave 5 stars to book 101 (likes it hugely and talks about it all the time)
          • Gave 3 stars to book 102 (has shown some liking for it)
          • Gave 2.5 stars to book 103 (has read it, but didn't say bad things about it)
      Now we need to recommend books your friend hasn't seen.
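
      Each triple on this slide reads as user ID, book ID, rating, and the friend is user 1. Mahout's `FileDataModel` accepts comma-separated lines of this shape; the sketch below does not use Mahout and only shows, in plain Python with hypothetical function names, how the triples can be loaded and which books are candidates for the friend.

```python
import csv
import io

# The slide's data, one userID,itemID,rating triple per line.
CSV_TEXT = """\
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
"""


def load_preferences(text):
    """Build {user: {item: rating}} from CSV text."""
    prefs = {}
    for user, item, rating in csv.reader(io.StringIO(text)):
        prefs.setdefault(int(user), {})[int(item)] = float(rating)
    return prefs


def unseen_items(prefs, user):
    """Items rated by anyone else but not by this user."""
    all_items = {item for ratings in prefs.values() for item in ratings}
    return sorted(all_items - set(prefs[user]))


prefs = load_preferences(CSV_TEXT)
print(prefs[1])                # -> {101: 5.0, 102: 3.0, 103: 2.5}
print(unseen_items(prefs, 1))  # -> [104, 105, 106, 107]
```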
  • 9. A pictorial representation
      [Figure: diagram of users 1-5 and books 101-107]
  • 10. Visualize
      [Figure: diagram of users 1-5 and books 101-107]
  • 11. A (slightly) bigger example
      1,101,5.0  1,102,3.0  1,103,2.5  1,109,3.5  1,112,4.0
      2,101,2.0  2,102,2.5  2,103,5.0  2,104,2.0  2,107,4.5  2,113,3.5
      3,101,2.5  3,104,4.0  3,105,4.5  3,107,5.0  3,115,4.0  3,111,2.5
      4,101,5.0  4,103,3.0  4,104,4.5  4,106,4.0  4,109,2.0  4,111,2.5
      5,101,4.0  5,102,3.0  5,103,2.0  5,104,4.0  5,105,3.5  5,106,4.0  5,109,3.0  5,112,4.0
      6,101,4.5  6,103,2.0  6,106,4.0  6,113,3.0  6,115,5.0
      7,103,4.5  7,104,2.5  7,108,4.0  7,109,3.5  7,110,3.5  7,112,2.5
      8,101,2.0  8,105,4.0  8,106,4.5  8,110,3.0  8,114,5.0  8,115,3.5
  • 12. A pictorial representation
      [Figure: diagram of users 1-8 and books 101-115]
      Clearly, not a viable option
  • 13. Mahout to the rescue
  • 14. What is Apache Mahout
      • Apache Mahout
          • A machine learning library
          • Works with Apache Hadoop
      • Use cases:
          • Recommenders
          • Clustering
          • Classification
  • 15. Recommenders in Mahout
      • Recommenders use data culled from user behavior
      • Recommending using Mahout
          • Similarity between users or items
              • Expressed as a number between 0 and 1
          • Neighborhood of users/items
          • Recommendation using this info and an algorithm
              • Generic
              • Specialized
  • 16. Similarity
      • Various algorithms:
          • Euclidean distance
          • Pearson correlation
          • Cosine measure
          • Spearman correlation
          • Tanimoto coefficient
          • Log-likelihood
      • Effectiveness depends on the input data
      • Influences running time and memory
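
      Three of the measures listed above, written out as a plain-Python sketch for two users from the talk's example data (users 1 and 5). These are textbook formulations with hypothetical function names; Mahout's own implementations may normalize differently. Note that Pearson correlation, as usually defined, ranges from -1 to 1, while the distance-based and Tanimoto measures shown here fall between 0 and 1.

```python
import math


def common_items(a, b):
    """Items rated by both users, in a stable order."""
    return sorted(set(a) & set(b))


def pearson(a, b):
    """Pearson correlation over co-rated items; None when it is undefined."""
    items = common_items(a, b)
    if len(items) < 2:
        return None
    xs = [a[i] for i in items]
    ys = [b[i] for i in items]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else None


def euclidean_similarity(a, b):
    """Map Euclidean distance d over co-rated items to 1 / (1 + d)."""
    items = common_items(a, b)
    if not items:
        return 0.0
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in items))
    return 1.0 / (1.0 + d)


def tanimoto(a, b):
    """Overlap of the rated-item sets, ignoring the rating values."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


user1 = {101: 5.0, 102: 3.0, 103: 2.5}
user5 = {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0}

print(round(pearson(user1, user5), 4))               # -> 0.9449
print(round(euclidean_similarity(user1, user5), 4))  # -> 0.4721
print(tanimoto(user1, user5))                        # -> 0.5
```

      The same pair of users scores very differently under each measure, which is one way to read the slide's point that effectiveness depends on the input data.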
  • 17. Neighborhood
      • Nearest N neighborhood (say, 4)
          [Figure: user U surrounded by users 1-5]
      • Threshold neighborhood (say, > 0.8)
          [Figure: user U surrounded by users 1-5]
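
      The two neighborhood styles differ only in how they cut a list of similarities. A minimal sketch in plain Python, using the slide's sample parameters (N = 4, threshold 0.8); the similarity values and function names are made up for illustration.

```python
def nearest_n(similarities, n):
    """The n users most similar to U, regardless of how similar they are."""
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, _ in ranked[:n]]


def threshold(similarities, minimum):
    """Every user whose similarity to U exceeds the minimum, however many."""
    return sorted(user for user, s in similarities.items() if s > minimum)


# Hypothetical similarities of users 1-5 to the target user U.
SIMS_TO_U = {1: 0.95, 2: 0.85, 3: 0.60, 4: 0.82, 5: 0.30}

print(nearest_n(SIMS_TO_U, 4))    # -> [1, 2, 4, 3]
print(threshold(SIMS_TO_U, 0.8))  # -> [1, 2, 4]
```

      Nearest-N always returns a fixed number of neighbors, even weak ones such as user 3 here; a threshold keeps quality constant but may return very few or very many users.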
  • 18. Recommender
      • Recommenders
          • Generic recommender
              • User based
              • Item based
          • Slope-one recommender
          • Singular Value Decomposition based
          • Linear interpolation based
          • Cluster-based
      • Recommender rescorer
      • Recommender evaluator
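
      To show how similarity, neighborhood, and recommender fit together, here is a plain-Python sketch of a user-based recommender over the talk's example data. It is written in the spirit of the generic user-based approach, not copied from Mahout: the function names are hypothetical, the neighborhood is nearest-2, only positively correlated users are kept (an assumption), and a book's estimated rating is the similarity-weighted average of the neighbors' ratings.

```python
import math

# userID,itemID,rating triples from the talk's example data slide.
RATINGS = """
1,101,5.0 1,102,3.0 1,103,2.5
2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0
3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0
4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0
5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0
"""


def load(text):
    """Parse whitespace-separated triples into {user: {item: rating}}."""
    prefs = {}
    for triple in text.split():
        user, item, rating = triple.split(",")
        prefs.setdefault(int(user), {})[int(item)] = float(rating)
    return prefs


def pearson(a, b):
    """Pearson correlation over co-rated items; None when it is undefined."""
    items = sorted(set(a) & set(b))
    if len(items) < 2:
        return None
    xs = [a[i] for i in items]
    ys = [b[i] for i in items]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else None


def neighborhood(prefs, user, n):
    """The n most similar users, keeping only positive correlations."""
    sims = []
    for other in prefs:
        if other == user:
            continue
        s = pearson(prefs[user], prefs[other])
        if s is not None and s > 0:
            sims.append((s, other))
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return sims[:n]


def recommend(prefs, user, n_neighbors=2):
    """Estimate ratings for unseen items as a similarity-weighted average."""
    totals, weights = {}, {}
    for sim, other in neighborhood(prefs, user, n_neighbors):
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    estimates = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(estimates, key=lambda e: (-e[1], e[0]))


for item, estimate in recommend(load(RATINGS), user=1):
    print(item, round(estimate, 2))
```

      For user 1 the neighbors are users 4 and 5, and book 104 comes out on top with an estimated rating of about 4.26. Swapping the similarity function or the neighborhood rule changes the result, which is where the tweaking mentioned in the talk comes in.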
  • 19. A real-life Web application
      • News aggregator-cum-reader
          • Fetches news from a news service
          • Shows the news in a uniform UI
          • Lets readers read, like/dislike and comment on news
          • Link social networks and share
      • Make this a personalized newspaper
          • Track user actions
          • Derive and store preferences
          • Generate recommendations
          • Leverage social accounts, etc.
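
      The talk does not say how tracked actions become preferences, so the following is only one possible sketch in plain Python. Every number in it (the action weights, the neutral starting point, the 1-5 range) and every name is a hypothetical choice; the output uses the same user,item,value layout as the example data earlier in the talk.

```python
# Hypothetical weights: how much each tracked action shifts a preference.
ACTION_WEIGHTS = {
    "read": 1.0,
    "comment": 1.5,
    "like": 2.0,
    "share": 2.5,
    "dislike": -2.0,
}
NEUTRAL = 2.5          # assumed starting preference for a first interaction
LOW, HIGH = 1.0, 5.0   # assumed preference scale


def derive_preferences(events):
    """Fold (user, article, action) events into one clamped score per pair."""
    scores = {}
    for user, article, action in events:
        key = (user, article)
        scores[key] = scores.get(key, NEUTRAL) + ACTION_WEIGHTS[action]
    return {key: min(HIGH, max(LOW, s)) for key, s in scores.items()}


def to_csv_lines(preferences):
    """Render preferences as user,article,value lines for a recommender input."""
    return [f"{u},{a},{p:.1f}" for (u, a), p in sorted(preferences.items())]


# Made-up tracked actions: reader 7 liked article 301 and disliked 302.
EVENTS = [
    (7, 301, "read"),
    (7, 301, "like"),
    (7, 302, "read"),
    (7, 302, "dislike"),
    (8, 301, "read"),
]

print(to_csv_lines(derive_preferences(EVENTS)))
# -> ['7,301,5.0', '7,302,1.5', '8,301,3.5']
```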
  • 20. Overall design
      [Figure: architecture diagram. Components shown: third-party applications, phone/tablet applications, and a web application; a controller API (REST); user and application data (MySQL); news aggregation and storage (HBase); preferences and recommender (Mahout). The connections are labeled REST.]
  • 21. Recommender REST service
      [Figure: a recommender (offline, run periodically) and a REST layer (Grizzly, Tomcat) used to fetch recommendations and to input user actions; the recommender's input is a database table dump from MySQL.]
  • 22. How to extract data – one dimension
      [Chart: news article readership (log scale) against number of news articles. Data labels for 1 to 9 articles: 4299, 511, 128, 51, 13, 4, 4, 2, 1.]
  • 23. How to extract data – add dimensions
      [Chart: news article readership and topic readership (log scale) against number of news articles / topics.]
  • 24. How more data helps
      [Chart: number of readers with x articles each and number of readers with x topics each, against number of news articles/topics.]
  • 25. How more data helps
      [Chart: number of readers with x articles each and number of readers with x topics each, against number of news articles/topics.]
  • 26. How more data helps
      [Chart: number of readers with x articles each and number of readers with x topics each, against number of news articles/topics.]
  • 27. Learnings
      • Know thy user
          • Frequency of visits
          • Preference logic with respect to the user
      • Know thy items
          • Should have enough items per user
          • Maximize items per action
          • Should have enough intersections
          • Should not be transient
      • Use tweaking abilities
      • Sharpen the saw
  • 28. Questions?
  • 29. Thank you
      viraj@gslab.com
      viraj.paripatyadar@gmail.com
