Recommendation Engines: Matching Items to Users

Jobin Wilson
jobin.wilson@flytxt.com

Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011

Who am I?
• Architect @ Flytxt (Big Data Analytics & Automation)

• Passionate about data, distributed computing and machine learning

• Previously

• Virtualization & Cloud Lifecycle Management (BMC)

• Designed and implemented the cloud lifecycle management interface at BMC

• Large-Scale Data Center Automation (AOL)

• Implemented a centralized data center management framework for AOL

• Workflow Systems & Automation (Accenture)

• Implemented a service management suite for various customers

Session Agenda

• Recommendation Engines – What's the big deal?

• Conceptual Overview

• Collaborative Filtering

• Engineering Challenges

• Apache Mahout

• Getting your recommender to production

• Q&A


Big deal?

[Figure: Recommending the best ads. An ad network connects advertisers (ads), content publishers (content) and users, and relies on ML algorithms, user behavior modelling and maximization criteria.]

BTW, what was the challenge?
User base: 2 billion+ users worldwide

Content base: 12.51 billion+ indexed pages

Advertiser base: millions of active advertisers

Real-time nature: responses in < 200 ms

Multi-objective optimization problem

Noisy data

Recommendation Engines: Overview
A specific type of information filtering technique
that attempts to recommend information items or
social elements that are likely to be of interest
to the user.

Technologies that can help us sift through all the
available information to predict products or services
that could be interesting to us.

Applying knowledge discovery techniques to the
problem of making personalized recommendations for
information, products or services, usually during a live
interaction.

Do we need a crystal ball to predict?
We all have opinions/tastes which we express as our likes or dislikes.

Our tastes follow some patterns.

We tend to like things that are similar to things we already
like (e.g. songs)

We tend to like things that are liked by people who are similar to
us (e.g. movies)
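The first pattern can be sketched in a few lines. This is a minimal content-based sketch, assuming each item is described by a set of hypothetical tags and using Jaccard similarity between tag sets; neither the data nor the similarity measure comes from this talk.

```python
# Content-based sketch: rank unseen items by similarity to items the user likes.
# Assumptions: items carry hypothetical tag sets, and Jaccard similarity is an
# illustrative choice of item-item similarity measure.


def jaccard(a: set, b: set) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked: list, catalog: dict, m: int) -> list:
    """Score each item the user has not liked by its best match among liked items."""
    if not liked:
        return []
    scores = {
        item: max(jaccard(tags, catalog[other]) for other in liked)
        for item, tags in catalog.items()
        if item not in liked
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:m]


# Hypothetical song catalog.
songs = {
    "song_a": {"rock", "guitar", "90s"},
    "song_b": {"rock", "guitar", "2000s"},
    "song_c": {"jazz", "piano"},
    "song_d": {"rock", "guitar", "90s", "live"},
}
print(recommend_similar(["song_a"], songs, m=2))  # ['song_d', 'song_b']
```

song_d shares three of four tags with song_a (0.75), song_b two of four (0.5), so song_d ranks first.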

From fancy research to mainstream

Collaborative Filtering
Problem: we have U users and I items in the system. A user u_k needs
to be recommended a set of m items that they have not yet picked but
are likely to be interested in.

Solution :

Maintain a database of users’ ratings of a variety of items.

For a given user, find other similar users whose ratings strongly
correlate with those of the current user: the user neighborhood.

Recommend items rated highly by these similar users, but not rated by
the current user.

E.g. Amazon, Flipkart, etc.
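The three steps above can be sketched as a small in-memory user-based recommender. Pearson correlation over co-rated items measures how strongly two users' ratings correlate. The neighborhood size `k`, keeping only positively correlated neighbors, and the similarity-weighted average used to score items are illustrative assumptions, not necessarily what any production system does.

```python
# User-based collaborative filtering sketch:
# 1) keep a ratings database, 2) find the user neighborhood via Pearson
# correlation, 3) recommend items neighbors rated highly that the user hasn't rated.
# Assumptions: ratings fit in memory; k, m and the weighted average are illustrative.
from math import sqrt


def pearson(r: dict, u: str, v: str) -> float:
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = r[u].keys() & r[v].keys()
    if len(common) < 2:
        return 0.0
    xs = [r[u][i] for i in common]
    ys = [r[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def recommend(r: dict, user: str, k: int = 2, m: int = 2) -> list:
    # User neighborhood: the k most positively correlated other users.
    sims = {v: pearson(r, user, v) for v in r if v != user}
    neighbors = sorted((v for v in sims if sims[v] > 0), key=lambda v: -sims[v])[:k]
    # Score each item the user hasn't rated by a similarity-weighted average.
    num, den = {}, {}
    for v in neighbors:
        for item, rating in r[v].items():
            if item not in r[user]:
                num[item] = num.get(item, 0.0) + sims[v] * rating
                den[item] = den.get(item, 0.0) + sims[v]
    scores = {i: num[i] / den[i] for i in num}
    return sorted(scores, key=lambda i: (-scores[i], i))[:m]


# Hypothetical ratings database (user -> {item: rating}).
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 5, "i2": 2, "i3": 4, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i5": 5},
    "dave": {"i1": 4, "i2": 3, "i3": 5, "i4": 3, "i5": 1},
}
print(recommend(ratings, "alice"))  # ['i4', 'i5']
```

Here carol's ratings are negatively correlated with alice's, so she is left out of the neighborhood; bob and dave form it, and i4 (rated highly by both) comes first.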

Utility Matrix
Matrix of values representing each user's level of affinity for each item.
It is a sparse matrix: most users have expressed a preference for only a few items.

The recommendation engine needs to predict the values of the empty cells
based on the available cell values.

The denser the matrix, the better the quality of recommendations.

User \ Item | i1  | i2  | i3  | i4  | i5
u1          |     | r12 |     | r14 | r15
u2          | r21 | r22 |     |     | r25
u3          |     | r32 |     | r34 |
u4          |     |     | r43 |     | r45
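A sparse matrix like this one is usually stored as only its observed cells. The sketch below assumes hypothetical 1 to 5 ratings in place of the r placeholders, then measures how dense the matrix is and lists the cells that remain to be predicted.

```python
# Sketch: the utility matrix stored sparsely (observed cells only).
# Assumption: the r12, r14, ... placeholders are replaced by hypothetical ratings.
utility = {
    "u1": {"i2": 4, "i4": 2, "i5": 5},
    "u2": {"i1": 3, "i2": 4, "i5": 1},
    "u3": {"i2": 5, "i4": 3},
    "u4": {"i3": 2, "i5": 4},
}
items = ["i1", "i2", "i3", "i4", "i5"]


def density(matrix: dict, items: list) -> float:
    """Fraction of user-item cells that hold a known value."""
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * len(items))


def cells_to_predict(matrix: dict, items: list) -> list:
    """The empty (user, item) cells the recommender must estimate."""
    return [(u, i) for u, row in matrix.items() for i in items if i not in row]


print(density(utility, items))                # 0.5 (10 of 20 cells filled)
print(len(cells_to_predict(utility, items)))  # 10
```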

Engineering Challenges
Massive data volume: how do I deal with terabytes of raw data to build my
recommendations?

Hadoop and MapReduce shine!

How can I make it work in real time?

Batch pre-computation, with the results stored in HBase, could help!

Will my solution scale? Soon my user base is going to double!

Sure, you can make it scale!
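Much recommendation preprocessing fits the MapReduce model. As an illustration, this sketch counts how often two items are picked by the same user (item co-occurrence, a common building block for item-based recommenders) with separate map and reduce functions. The shuffle is simulated in memory, so it shows the programming model only, not Hadoop's actual API.

```python
# Sketch: item co-occurrence counting as map and reduce phases, run in one process.
# On a cluster the framework would distribute map/reduce and perform the shuffle;
# here the shuffle is an in-memory sort + group. Input: one (user, items) record per user.
from itertools import combinations, groupby
from operator import itemgetter


def map_phase(user, items):
    """Emit ((item_a, item_b), 1) for every pair of items the user picked."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1


def reduce_phase(pair, counts):
    """Sum the partial counts for one item pair."""
    return pair, sum(counts)


def run_job(records):
    # Map: apply map_phase to every input record.
    emitted = [kv for user, items in records for kv in map_phase(user, items)]
    # Shuffle: sort by key so all values for a key are adjacent, then group them.
    emitted.sort(key=itemgetter(0))
    return dict(
        reduce_phase(pair, (count for _, count in group))
        for pair, group in groupby(emitted, key=itemgetter(0))
    )


histories = [
    ("u1", ["i2", "i4", "i5"]),
    ("u2", ["i1", "i2", "i5"]),
    ("u3", ["i2", "i4"]),
]
print(run_job(histories)[("i2", "i4")])  # 2
```

Because each map call only sees one user's record and each reduce call only sees one pair's counts, both phases can be spread across machines as the data grows.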

Engineering Challenges

Do I need a cloud based infrastructure?

Depends!

Hadoop-compatible machine learning library?

Mahout would help!

How can I represent/transform my input data appropriately?

Pig/Hive might help! If not, MapReduce is always there!
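When Pig or Hive is not a good fit, the transformation can be written directly. This sketch turns raw event logs into `userID,itemID,preference` lines, the simple comma-separated input layout that Mahout's file-based recommender input reads. The tab-separated log format, the event names and their weights are all hypothetical.

```python
# Sketch: turn raw event logs into userID,itemID,preference lines.
# Assumptions: the tab-separated log layout (timestamp, user, event, item),
# the event names and their weights are all hypothetical.
import csv
import io
from collections import defaultdict

EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}  # hypothetical
MAX_PREF = 5.0  # cap so heavy activity does not dominate


def logs_to_preferences(lines):
    """Sum weighted events per (user, item), skipping noisy lines, then cap."""
    prefs = defaultdict(float)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4 or parts[2] not in EVENT_WEIGHTS:
            continue  # malformed line or unknown event type
        _timestamp, user, event, item = parts
        try:
            key = (int(user), int(item))  # numeric IDs
        except ValueError:
            continue  # non-numeric IDs
        prefs[key] += EVENT_WEIGHTS[event]
    return {key: min(total, MAX_PREF) for key, total in prefs.items()}


def to_csv(prefs):
    """Render one 'userID,itemID,preference' row per pair, sorted by key."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (user, item), pref in sorted(prefs.items()):
        writer.writerow([user, item, pref])
    return out.getvalue()


raw = [
    "1315900000\t101\tview\t7\n",
    "1315900005\t101\tpurchase\t7\n",
    "1315900010\t102\tview\t7\n",
    "1315900020\t102\tbogus_event\t9\n",
    "garbage line\n",
]
print(to_csv(logs_to_preferences(raw)), end="")
```

User 101's view plus purchase adds up to 6.0 and is capped at 5.0; the unknown event and the garbage line are dropped.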

Apache Mahout Overview
Scalable machine learning library

Core algorithms for clustering, classification and batch-based
collaborative filtering, implemented on top of Hadoop

Popular algorithms include k-means, fuzzy k-means, canopy clustering, LDA,
etc.

Vibrant community support

Used by: Adobe, Yahoo!, Amazon, AOL, Flytxt… (the list goes on)

Developer mailing list (subscribe): mahout-dev-subscribe@apache.org

Taking Recommendation Engines to production

Analyzing the input data: what kind of information can I collect from users?

Selecting the appropriate recommender (e.g. user-based, item-based)

Strategy for recommending to anonymous (or first-time) users

Strategy for distributed computing: modeling the problem as MapReduce
jobs

Choosing the deployment model

Monitoring the system
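Two of these concerns, serving in real time and handling anonymous users, can be combined in a simple serving layer: recommendations are pre-computed in batch and looked up by key at request time, with a popularity list as the fallback. In this sketch a Python dict stands in for a key-value store such as HBase, and the most-popular-items fallback is only one possible cold-start strategy.

```python
# Serving sketch: pre-computed recommendations plus a cold-start fallback.
# Assumptions: a plain dict stands in for a key-value store such as HBase,
# and "most popular items" is one illustrative strategy for anonymous users.
from collections import Counter


def popular_items(interactions: dict, top_n: int = 3) -> list:
    """Batch step: items picked by the most users, ties broken by name."""
    counts = Counter(item for items in interactions.values() for item in set(items))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


def serve(store: dict, popular: list, user_id=None, m: int = 3) -> list:
    """Request step: one key lookup; unknown or anonymous users get popular items."""
    recs = store.get(user_id) if user_id is not None else None
    return (recs or popular)[:m]


# Hypothetical batch output and interaction history.
precomputed = {"u1": ["i1", "i3"]}
history = {"u1": ["i2", "i4", "i5"], "u2": ["i1", "i2", "i5"], "u3": ["i2", "i4"]}
popular = popular_items(history)

print(serve(precomputed, popular, "u1"))  # ['i1', 'i3']
print(serve(precomputed, popular))        # ['i2', 'i4', 'i5']
```

Keeping the expensive work in the batch step leaves the request path as a single lookup, which is what makes tight latency budgets reachable.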

Conclusion

Very popular field of research and implementation

More and more products and services are leveraging the concept

From fancy research to live production systems at scale

Making people's lives easier by helping them make decisions

Some more concepts…

Concept of similarity: distance measures, etc.

Pearson Correlation

User neighborhood computation
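For reference, one common formulation of these concepts, writing $r_{ui}$ for user $u$'s rating of item $i$ (as in the utility matrix) and $I_{uv}$ for the set of items rated by both $u$ and $v$. Other conventions exist, for example computing the user means over all of a user's ratings rather than only the co-rated items, or keeping a fixed number $k$ of most similar users instead of applying a threshold $\tau$.

```latex
% Pearson correlation between users u and v over co-rated items I_{uv}
\operatorname{sim}_{P}(u,v) =
  \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2}\,
        \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}},
\qquad \bar{r}_u = \frac{1}{|I_{uv}|} \sum_{i \in I_{uv}} r_{ui}

% Euclidean distance over co-rated items, mapped to a similarity in (0, 1]
d(u,v) = \sqrt{\sum_{i \in I_{uv}} (r_{ui} - r_{vi})^2},
\qquad \operatorname{sim}_{E}(u,v) = \frac{1}{1 + d(u,v)}

% Threshold-based user neighborhood of u
N_{\tau}(u) = \{\, v \neq u : \operatorname{sim}(u,v) \geq \tau \,\}
```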
