Indic threads pune12-recommenders-apache-mahout

The 7th Annual IndicThreads Pune Conference was held on 14-15 December 2012. http://pune12.indicthreads.com/

    Presentation Transcript

    • How to Build a Recommendation Engine Using Apache Mahout
        Viraj Paripatyadar, GS Lab
    • Contents
        • A recommendation problem
        • What is a recommender
        • Building a recommender using Mahout
        • Tips and tweaks
        • Recommender considerations
    • A book store
        • Sells books:
            • By various authors
            • Of various categories
            • On different subjects
            • From various publishers
        • Readers/buyers are asked to rate
        • Readers/buyers can provide reviews
        • You walk into the store (to buy something for a friend)
    • The store owner
        • Asks you what:
            • your friend reads (already owns)
            • your friend usually likes more
        • Has data on what:
            • his customers buy
            • his customers rate and review
        • Uses a few strategies
    • 1 - Find similar books
        Depending on which books your friend has, pick books:
        • by the same author
        • on the same or similar subjects
        • in the same category
        • from the same publisher (those with the highest sales numbers)
    • 2 - Find books with similar readership
        • Define some similarity
            • e.g. two books are as similar as the number of readers who rated both of them
        • Define some limit of relevance
            • e.g. only consider books that share more than 4 readers
        • Look for all books which are similar to books your friend owns
        • Pick books from this set that your friend doesn't own
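
The readership strategy above can be sketched in a few lines of plain Python. This is only an illustration, not Mahout code: it uses the ratings from the example data slide, assumes the friend is user 1 (whose three books match the ones described there), and lowers the relevance limit from 4 readers to 2 because the sample has only five readers. The function names are made up for this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Ratings from the example data slide, as (user, book, rating) triples.
RATINGS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5), (3, 107, 5.0),
    (4, 101, 5.0), (4, 103, 3.0), (4, 104, 4.5), (4, 106, 4.0),
    (5, 101, 4.0), (5, 102, 3.0), (5, 103, 2.0), (5, 104, 4.0),
    (5, 105, 3.5), (5, 106, 4.0),
]


def co_rating_counts(ratings):
    """Count, for every ordered pair of books, how many readers rated both."""
    books_by_user = defaultdict(set)
    for user, book, _ in ratings:
        books_by_user[user].add(book)
    counts = defaultdict(int)
    for books in books_by_user.values():
        for a, b in combinations(sorted(books), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def similar_unowned_books(ratings, owned, min_readers):
    """Books that share more than `min_readers` readers with an owned book."""
    candidates = defaultdict(int)
    for (a, b), shared in co_rating_counts(ratings).items():
        if a in owned and b not in owned and shared > min_readers:
            # Keep the strongest link to any of the owned books.
            candidates[b] = max(candidates[b], shared)
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))


friend_books = {101, 102, 103}  # assumption: the friend is user 1
result = similar_unowned_books(RATINGS, friend_books, min_readers=2)
print(result)  # [(104, 4)]
```

With the slide's original limit of "more than 4 readers" this tiny data set yields nothing, which is a reminder that the relevance limit has to be chosen with the data volume in mind.
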
    • 3 - Find people with similar tastes
        • Define some similarity
            • e.g. two people are as similar as the number of books they like from the same category
        • Define some limit of relevance
            • e.g. only consider the top 3 people when ordered according to how similar they are to your friend
        • Look for users similar to your friend and see what they read
        • Pick books which these people like and your friend doesn't own
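
A rough Python sketch of the similar-tastes strategy follows. Several things here are assumptions made for illustration: the example data has no categories, so all books are treated as one category; a rating of 3 stars or more counts as "liking" a book; the friend is user 1; and the function names are invented. It is not Mahout code.

```python
from collections import defaultdict

# Ratings from the example data slide, as (user, book, rating) triples.
RATINGS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5), (3, 107, 5.0),
    (4, 101, 5.0), (4, 103, 3.0), (4, 104, 4.5), (4, 106, 4.0),
    (5, 101, 4.0), (5, 102, 3.0), (5, 103, 2.0), (5, 104, 4.0),
    (5, 105, 3.5), (5, 106, 4.0),
]
LIKE_THRESHOLD = 3.0  # assumption: 3 stars or more means the reader likes the book


def liked_books(ratings):
    """Map each user to the set of books rated at or above the threshold."""
    liked = defaultdict(set)
    for user, book, rating in ratings:
        if rating >= LIKE_THRESHOLD:
            liked[user].add(book)
    return dict(liked)


def top_similar_users(ratings, target, n):
    """The n users sharing the most liked books with `target` (zero overlap dropped)."""
    liked = liked_books(ratings)
    target_likes = liked.get(target, set())
    scores = []
    for user, books in liked.items():
        shared = len(books & target_likes)
        if user != target and shared > 0:
            scores.append((user, shared))
    scores.sort(key=lambda us: (-us[1], us[0]))
    return scores[:n]


def recommend_from_neighbours(ratings, target, n):
    """Books the similar users like and `target` has not rated, weighted by overlap."""
    liked = liked_books(ratings)
    owned = {book for user, book, _ in ratings if user == target}
    votes = defaultdict(int)
    for user, shared in top_similar_users(ratings, target, n):
        for book in liked[user] - owned:
            votes[book] += shared
    return sorted(votes.items(), key=lambda bv: (-bv[1], bv[0]))


print(top_similar_users(RATINGS, 1, 3))          # [(5, 2), (4, 1)]
print(recommend_from_neighbours(RATINGS, 1, 3))  # [(104, 3), (106, 3), (105, 2)]
```
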
    • Example data (user, book, rating)
        1,101,5.0    3,101,2.5    4,106,4.0
        1,102,3.0    3,104,4.0    5,101,4.0
        1,103,2.5    3,105,4.5    5,102,3.0
        2,101,2.0    3,107,5.0    5,103,2.0
        2,102,2.5    4,101,5.0    5,104,4.0
        2,103,5.0    4,103,3.0    5,105,3.5
        2,104,2.0    4,104,4.5    5,106,4.0
        • Your friend owns three books:
            • Gave 5 stars to book 101 (likes hugely and talks about it all the time)
            • Gave 3 stars to book 102 (has shown some liking to it)
            • Gave 2.5 stars to book 103 (has read it, but didn't say bad things about it)
        • Now we need to recommend books your friend hasn't seen
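
Before any recommending can happen, the triples have to be loaded into a structure that answers "what has this user rated?". The sketch below is one possible way to do that in plain Python, assuming each line is `user,book,rating` and that the friend is user 1; `load_preferences` and `unseen_books` are names invented for this example.

```python
import csv
import io
from collections import defaultdict

# The example data from the slide, one "user,book,rating" triple per line.
EXAMPLE_CSV = """\
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
"""


def load_preferences(text):
    """Parse the triples into {user: {book: rating}}."""
    prefs = defaultdict(dict)
    for row in csv.reader(io.StringIO(text)):
        if not row:  # tolerate blank lines
            continue
        user, book, rating = row
        prefs[int(user)][int(book)] = float(rating)
    return dict(prefs)


def unseen_books(prefs, user):
    """Books rated by somebody, but not by `user`: the candidate recommendations."""
    all_books = {book for ratings in prefs.values() for book in ratings}
    return sorted(all_books - set(prefs[user]))


prefs = load_preferences(EXAMPLE_CSV)
print(prefs[1])                # {101: 5.0, 102: 3.0, 103: 2.5}
print(unseen_books(prefs, 1))  # [104, 105, 106, 107]
```
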
    • A pictorial representation [figure: users 1-5 linked to the books 101-107 they rated]
    • Visualize [figure: the same graph of users 1-5 and books 101-107]
    • A (slightly) bigger example
        1,101,5.0    3,111,2.5    6,103,2.0
        1,102,3.0    4,101,5.0    6,106,4.0
        1,103,2.5    4,103,3.0    6,113,3.0
        1,109,3.5    4,104,4.5    6,115,5.0
        1,112,4.0    4,106,4.0    7,103,4.5
        2,101,2.0    4,109,2.0    7,104,2.5
        2,102,2.5    4,111,2.5    7,108,4.0
        2,103,5.0    5,101,4.0    7,109,3.5
        2,104,2.0    5,102,3.0    7,110,3.5
        2,107,4.5    5,103,2.0    7,112,2.5
        2,113,3.5    5,104,4.0    8,101,2.0
        3,101,2.5    5,105,3.5    8,105,4.0
        3,104,4.0    5,106,4.0    8,106,4.5
        3,105,4.5    5,109,3.0    8,110,3.0
        3,107,5.0    5,112,4.0    8,114,5.0
        3,115,4.0    6,101,4.5    8,115,3.5
    • A pictorial representation [figure: users 1-8 linked to the books 101-115 they rated]
        Clearly, not a viable option
    • Mahout to the rescue
    • What is Apache Mahout
        • Apache Mahout
            • A machine learning library
            • Works with Apache Hadoop
        • Use cases:
            • Recommenders
            • Clustering
            • Classification
    • Recommenders in Mahout
        • Recommenders use data culled from user behavior
        • Recommending using Mahout
            • Similarity between users or items
                • Expressed as a number between 0 and 1
            • Neighborhood of users/items
            • Recommendation using this info and an algorithm
                • Generic
                • Specialized
    • Similarity
        • Various algorithms:
            • Euclidean distance
            • Pearson correlation
            • Cosine measure
            • Spearman correlation
            • Tanimoto coefficient
            • Log-likelihood
        • Effectiveness dependent on the input data
        • Influences running time and memory
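
To make the list above more concrete, here are textbook versions of four of these measures in plain Python, applied to users 1 and 5 from the example data. These are generic definitions written for illustration; Mahout's own implementations may scale or normalize differently, and the Spearman and log-likelihood measures are left out to keep the sketch short. Note that the Pearson correlation as defined here can also be negative.

```python
import math

# Ratings of users 1 and 5 from the example data slide: {book: rating}.
USER_1 = {101: 5.0, 102: 3.0, 103: 2.5}
USER_5 = {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0}


def common_items(a, b):
    """Books rated by both users."""
    return sorted(set(a) & set(b))


def euclidean_similarity(a, b):
    """1 / (1 + distance) over co-rated books, so identical ratings give 1.0."""
    items = common_items(a, b)
    if not items:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in items))
    return 1.0 / (1.0 + distance)


def pearson_correlation(a, b):
    """Pearson correlation over co-rated books; 0.0 when it is undefined."""
    items = common_items(a, b)
    n = len(items)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the two rating vectors over co-rated books."""
    items = common_items(a, b)
    dot = sum(a[i] * b[i] for i in items)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in items))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in items))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tanimoto_coefficient(a, b):
    """Shared books divided by all books either user rated; ignores rating values."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


print(round(euclidean_similarity(USER_1, USER_5), 3))  # 0.472
print(round(pearson_correlation(USER_1, USER_5), 3))   # 0.945
print(round(cosine_similarity(USER_1, USER_5), 3))     # 0.995
print(tanimoto_coefficient(USER_1, USER_5))            # 0.5
```

The four numbers differ noticeably for the same pair of users, which is why the choice of measure depends on the input data.
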
    • Neighborhood
        • Nearest N neighborhood (say, 4) [figure: users 1-5 around the target user U]
        • Threshold neighborhood (say, > 0.8) [figure: users 1-5 around the target user U]
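
The difference between the two neighborhood types is easy to see in code. In this plain-Python sketch the similarity values of users 1-5 to the target user U are invented for illustration (the slide's figures do not give numbers), and the function names are hypothetical.

```python
# Hypothetical similarity of each user to the target user U.
SIMILARITY_TO_U = {1: 0.95, 2: 0.85, 3: 0.60, 4: 0.82, 5: 0.30}


def nearest_n_neighbourhood(similarities, n):
    """The n most similar users, however similar or dissimilar they are."""
    ranked = sorted(similarities.items(), key=lambda us: (-us[1], us[0]))
    return [user for user, _ in ranked[:n]]


def threshold_neighbourhood(similarities, threshold):
    """Every user whose similarity exceeds the threshold, however many there are."""
    return sorted(user for user, sim in similarities.items() if sim > threshold)


print(nearest_n_neighbourhood(SIMILARITY_TO_U, 4))    # [1, 2, 4, 3]
print(threshold_neighbourhood(SIMILARITY_TO_U, 0.8))  # [1, 2, 4]
```

Nearest-N always returns a fixed-size neighborhood, even if it has to include a weakly similar user (user 3 here); the threshold variant guarantees quality but may return very few users, or none.
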
    • Recommender
        • Recommenders
            • Generic recommender
                • User based
                • Item based
            • Slope-one recommender
            • Singular Value Decomposition based
            • Linear interpolation based
            • Cluster-based
        • Recommender rescorer
        • Recommender evaluator
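
As an illustration of how a generic user-based recommender combines the pieces, the sketch below chains a Pearson similarity, a nearest-2 neighborhood and a similarity-weighted average of the neighbors' ratings, using the example data with user 1 as the friend. It is a plain-Python approximation of the idea, not Mahout's API, and the choices of 2 neighbors and of ignoring non-positive similarities are assumptions.

```python
import math
from collections import defaultdict

# Example data from the slides as {user: {book: rating}}.
PREFS = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def pearson(a, b):
    """Pearson correlation over co-rated books; 0.0 when it is undefined."""
    items = sorted(set(a) & set(b))
    n = len(items)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def recommend(prefs, target, neighbours=2, how_many=3):
    """Estimate ratings for books `target` has not rated, best first."""
    sims = {u: pearson(prefs[target], prefs[u]) for u in prefs if u != target}
    # Nearest-N neighbourhood, restricted to positively correlated users.
    nearest = sorted((u for u in sims if sims[u] > 0),
                     key=lambda u: (-sims[u], u))[:neighbours]
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for u in nearest:
        for book, rating in prefs[u].items():
            if book not in prefs[target]:
                weighted_sum[book] += sims[u] * rating
                weight_total[book] += sims[u]
    estimates = {b: weighted_sum[b] / weight_total[b] for b in weighted_sum}
    return sorted(estimates.items(), key=lambda br: (-br[1], br[0]))[:how_many]


recommendations = recommend(PREFS, target=1)
for book, estimate in recommendations:
    print(book, round(estimate, 2))
# 104 4.26
# 106 4.0
# 105 3.5
```

Users 4 and 5 end up as the neighborhood, and book 104, which both of them rated highly, comes out on top.
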
    • A real-life Web application
        • News aggregator-cum-reader
            • Fetches news from a news service
            • Shows the news in a uniform UI
            • Lets readers read, like/dislike and comment on news
            • Link social networks and share
        • Make this a personalized newspaper
            • Track user actions
            • Derive and store preferences
            • Generate recommendations
            • Leverage social accounts, etc.
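
One possible way to "derive and store preferences" from tracked actions is to map each action type to a score and keep one score per reader and article. The sketch below is purely illustrative: the action names, the weights, the article IDs and the rule that a dislike overrides everything else are all assumptions, not details from the talk.

```python
# Hypothetical mapping from a tracked action to a preference score.
ACTION_WEIGHTS = {"read": 3.0, "comment": 4.0, "like": 5.0, "share": 5.0, "dislike": 1.0}


def derive_preferences(actions):
    """Collapse (user, article, action) events into {(user, article): score}.

    The strongest positive signal wins, except that any dislike
    overrides every other action on the same article.
    """
    prefs = {}
    disliked = set()
    for user, article, action in actions:
        key = (user, article)
        if action == "dislike":
            disliked.add(key)
        prefs[key] = max(prefs.get(key, 0.0), ACTION_WEIGHTS[action])
    for key in disliked:
        prefs[key] = ACTION_WEIGHTS["dislike"]
    return prefs


def to_csv_lines(prefs):
    """Render preferences as user,item,preference lines, like the example data."""
    return [f"{user},{article},{score}" for (user, article), score in sorted(prefs.items())]


events = [
    (1, 501, "read"), (1, 501, "like"), (1, 502, "read"),
    (2, 501, "read"), (2, 501, "dislike"),
]
lines = to_csv_lines(derive_preferences(events))
print(lines)  # ['1,501,5.0', '1,502,3.0', '2,501,1.0']
```
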
    • Overall design [architecture diagram]
        • Clients: third-party applications, phone/tablet applications, Web application
        • Controller API (REST)
        • Services reached over REST:
            • User, application data (MySQL)
            • News aggregation, storage (HBase)
            • Preferences, Recommender (Mahout)
    • Recommender REST service [diagram]
        • REST layer (Grizzly, Tomcat): fetch recommendations, input user actions
        • Recommender (offline, run periodically)
        • Input: database table dump (MySQL)
    • How to extract data – one dimension [chart: news article readership (log scale) against number of news articles]
    • How to extract data – add dimensions [chart: news article readership and topic readership (log scale) against number of news articles / topics]
    • How more data helps [chart: no. of readers with x articles each and no. of readers with x topics each, against number of news articles/topics, range 0-800]
    • How more data helps [chart: the same two series, range 5-85]
    • How more data helps [chart: the same two series, range 95-395]
    • Learnings
        • Know thy user
            • Frequency of visits
            • Preference logic with respect to the user
        • Know thy items
            • Should have enough items per user
            • Maximize items per action
            • Should have enough intersections
            • Should not be transient
        • Use tweaking abilities
        • Sharpen the saw
    • Questions?
    • Thank you
        viraj@gslab.com
        viraj.paripatyadar@gmail.com