Recommendation engines matching items to users

Harnessing Hadoop for Big Data, Series II

Presentation Transcript

  • Jobin Wilson, jobin.wilson@flytxt.com. Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011
  • Who am I?
    • Architect @ Flytxt (Big Data Analytics & Automation)
    • Passionate about data, distributed computing, and machine learning
    • Previously:
      • Virtualization & Cloud Lifecycle Management (BMC): designed and implemented the Cloud Life Cycle Management interface
      • Large-Scale Data Centre Automation (AOL): implemented a centralized data center management framework
      • Workflow Systems & Automation (Accenture): implemented a service management suite for various customers
  • Session Agenda
    • Recommendation Engines: What's the big deal?
    • Conceptual Overview
    • Collaborative Filtering
    • Engineering Challenges
    • Apache Mahout
    • Getting your recommender to production
    • Q&A
  • What's the big deal?
  • Ooh Ads too!
  • Big deal? [Diagram; recoverable labels: Advertisers, Ads, Content Publishers, Content, Users, Ad Network, Recommend Best Ads, ML Algorithms, User Behavior Modelling, Maximization Criteria]
  • BTW, what was the challenge?
    • User base: 2 billion+ users worldwide
    • Content base: 12.51 billion+ indexed pages
    • Advertiser base: millions of active advertisers
    • Real-time nature: responses in < 200 ms
    • Multi-objective optimization problem
    • Noisy data
  • Recommendation Engines: Overview
    • A specific type of information filtering technique that attempts to recommend information items or social elements likely to be of interest to the user.
    • Technologies that help us sift through all the available information to predict products or services that could interest us.
    • The application of knowledge discovery techniques to the problem of making personalized recommendations for information, products, or services, usually during a live interaction.
  • Do we need a crystal ball to predict?
    • We all have opinions and tastes, which we express as likes or dislikes.
    • Our tastes follow patterns.
    • We tend to like things that are similar to things we already like (e.g. songs).
    • We tend to like things that are liked by people similar to us (e.g. movies).
    • From fancy research to mainstream.
  • Collaborative Filtering
    • Problem: the system has U users and I items. A user Uk needs to be recommended a set of m items that they have not yet picked and might be interested in.
    • Solution:
      • Maintain a database of users' ratings of a variety of items.
      • For a given user, find other similar users whose ratings strongly correlate with the current user's (the user neighborhood).
      • Recommend items rated highly by these similar users but not rated by the current user.
    • E.g. Amazon, Flipkart, etc.
  • Utility Matrix Matrix of values representing each user’s level of affinity to each item. Sparse matrix Recommendation engine needs to predict the values for the empty cells based on available cell values Denser the matrix, better the quality of recommendation User | Item i1 i2 i3 i4 i5 u1 r12 r14 r15 u2 r21 r22 r25 u3 r32 r34 u4 r43 r45
  • Engineering Challenges Massive Data Volume : how do I deal with TBs of raw data to build my recommendations? Hadoop and Map-Reduce shines! How can I make it work in ‘Real-Time’ ? Batch pre-compute and store in HBase could help! Will my solution scale? soon my user base is going to double!. Sure, you can make it scale!
  • Engineering Challenges Do I need a cloud based infrastructure? Depends! Hadoop compatible Machine Learning library? Mahout would help! How can I represent/transform my input data appropriately? Pig/Hive might help!, if not ,map-reduce is always there!
  • Apache Mahout Overview
    • Scalable machine learning library
    • Core algorithms for clustering, classification, and batch-based collaborative filtering, implemented over Hadoop
    • A few popular algorithms: K-Means, fuzzy K-Means, Canopy clustering, LDA, etc.
    • Vibrant community support
    • Used by Adobe, Yahoo!, Amazon, AOL, Flytxt, and others
    • mahout-dev-subscribe@apache.org
  • Taking Recommendation Engines to Production
    • Analyzing the input data: what kind of information can I collect from users?
    • Selecting the appropriate recommender (e.g. user-based, item-based)
    • Strategy for recommending to anonymous (or first-time) users
    • Strategy for distributed computing: modeling the problem as MapReduce
    • Choosing the deployment model
    • Monitoring the system
  • Conclusion Very popular field of research and implementation More and more products and services are leveraging the concept From fancy research to live production systems at scale Making peoples lives easier by assisting in making decisions
  • Some more concepts
    • Concept of similarity: distance measures, etc.
    • Pearson correlation
    • User neighborhood computation
  • THANK YOU. Contact: jobin.wilson@flytxt.com, http://www.flytxt.com/community/
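
The Collaborative Filtering slide describes finding a user neighborhood from correlated ratings and recommending items those neighbors rated highly, and the closing slides name Pearson correlation as the similarity measure. The steps can be sketched in plain Python as below. This is an illustration only, not the speaker's implementation and not Mahout's API: the rating values, the function names (`pearson`, `neighborhood`, `recommend`), the choice to compute means over co-rated items only, and the similarity-weighted average used for scoring are all assumptions made for the example. Only the u1..u4 / i1..i5 naming follows the utility matrix slide.

```python
from math import sqrt

# Hypothetical 1-5 ratings; the values are invented for illustration.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 5, "i2": 3, "i3": 4, "i4": 5, "i5": 2},
    "u3": {"i1": 1, "i2": 5, "i3": 2, "i4": 1, "i5": 5},
    "u4": {"i1": 4, "i2": 2, "i3": 4, "i4": 4},
}


def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0  # too little overlap to measure correlation
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / sqrt(var_a * var_b)


def neighborhood(ratings, user, k):
    """Up to k other users with the highest positive correlation to `user`."""
    scored = [
        (pearson(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user
    ]
    positive = [(sim, other) for sim, other in scored if sim > 0]
    return sorted(positive, reverse=True)[:k]


def recommend(ratings, user, k=2, m=3):
    """Top-m items the user has not rated, scored by a similarity-weighted
    average of the neighbors' ratings."""
    totals, weights = {}, {}
    for sim, other in neighborhood(ratings, user, k):
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only fill the empty cells of the utility matrix
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:m]


if __name__ == "__main__":
    print(neighborhood(ratings, "u1", k=2))
    print(recommend(ratings, "u1"))
```

With this toy data, u2 and u4 form u1's neighborhood (u3 correlates negatively and is excluded), and i4 ranks ahead of i5 for u1. At the data volumes the Engineering Challenges slides describe, the pairwise similarity step would be the part to pre-compute in batch rather than on each request.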