SlideShare a Scribd company logo
1 of 47
1
Recommender Systems
Collaborative Filtering &
Content-Based Recommending
2
Recommender Systems
• Systems for recommending items (e.g. books,
movies, CD’s, web pages, newsgroup messages)
to users based on examples of their preferences.
• Many websites provide recommendations (e.g.
Amazon, NetFlix, Pandora).
• Recommenders have been shown to substantially
increase sales at on-line stores.
• There are two basic approaches to recommending:
– Collaborative Filtering (a.k.a. social filtering)
– Content-based
3
Book Recommender
Red
Mars
Juras-
sic
Park
Lost
World
2001
Found
ation
Differ-
ence
Engine
Machine
Learning
User
Profile
Neuro-
mancer
2010
4
Personalization
• Recommenders are instances of personalization
software.
• Personalization concerns adapting to the individual
needs, interests, and preferences of each user.
• Includes:
– Recommending
– Filtering
– Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as part of
Customer Relationship Management (CRM).
5
Machine Learning and Personalization
• Machine Learning can allow learning a user
model or profile of a particular user based
on:
– Sample interaction
– Rated examples
• This model or profile can then be used to:
– Recommend items
– Filter information
– Predict behavior
6
Collaborative Filtering
• Maintain a database of many users’ ratings of a
variety of items.
• For a given user, find other similar users whose
ratings strongly correlate with the current user.
• Recommend items rated highly by these similar
users, but not rated by the current user.
• Almost all existing commercial recommenders use
this approach (e.g. Amazon).
7
Collaborative Filtering
A 9
B 3
C
: :
Z 5
A
B
C 9
: :
Z 10
A 5
B 3
C
: :
Z 7
A
B
C 8
: :
Z
A 6
B 4
C
: :
Z
A 10
B 4
C 8
. .
Z 1
User
Database
Active
User
Correlation
Match
A 9
B 3
C
. .
Z 5
A 9
B 3
C
: :
Z 5
A 10
B 4
C 8
. .
Z 1
Extract
Recommendations
C
8
Collaborative Filtering Method
• Weight all users with respect to similarity
with the active user.
• Select a subset of the users (neighbors) to
use as predictors.
• Normalize ratings and compute a prediction
from a weighted combination of the
selected neighbors’ ratings.
• Present items with highest predicted ratings
as recommendations.
9
Similarity Weighting
• Typically use Pearson correlation coefficient between
ratings for active user, a, and another user, u.
u
a r
r
u
a
u
a
r
r
c


)
,
(
covar
, 
ra and ru are the ratings vectors for the m items rated by
both a and u
ri,j is user i’s rating for item j
10
Covariance and Standard Deviation
• Covariance:
• Standard Deviation:
m
r
r
r
r
r
r
m
i
u
i
u
a
i
a
u
a




 1
,
, )
)(
(
)
,
(
covar
m
r
r
m
i
i
x
x


 1
,
m
r
r
m
i
x
i
x
rx



 1
2
, )
(

11
Significance Weighting
• Important not to trust correlations based on
very few co-rated items.
• Include significance weights, sa,u, based on
number of co-rated items, m.
u
a
u
a
u
a c
s
w ,
,
, 













50
if
50
50
if
1
, m
m
m
s u
a
12
Neighbor Selection
• For a given active user, a, select correlated
users to serve as source of predictions.
• Standard approach is to use the most similar
n users, u, based on similarity weights, wa,u
• Alternate approach is to include all users
whose similarity weight is above a given
threshold.
13
Rating Prediction
• Predict a rating, pa,i, for each item i, for active user, a,
by using the n selected neighbor users, u  {1,2,…n}.
• To account for users different ratings levels, base
predictions on differences from a user’s average rating.
• Weight users’ ratings contribution by their similarity to
the active user.






 n
u
u
a
n
u
u
i
u
u
a
a
i
a
w
r
r
w
r
p
1
,
1
,
,
,
)
(
14
Problems with Collaborative Filtering
• Cold Start: There needs to be enough other users
already in the system to find a match.
• Sparsity: If there are many items to be
recommended, even if there are many users, the
user/ratings matrix is sparse, and it is hard to find
users that have rated the same items.
• First Rater: Cannot recommend an item that has
not been previously rated.
– New items
– Esoteric items
• Popularity Bias: Cannot recommend items to
someone with unique tastes.
– Tends to recommend popular items.
15
Content-Based Recommending
• Recommendations are based on information on the
content of items rather than on other users’
opinions.
• Uses a machine learning algorithm to induce a
profile of the users preferences from examples
based on a featural description of content.
• Some previous applications:
– Newsweeder (Lang, 1995)
– Syskill and Webert (Pazzani et al., 1996)
16
Advantages of Content-Based Approach
• No need for data on other users.
– No cold-start or sparsity problems.
• Able to recommend to users with unique tastes.
• Able to recommend new and unpopular items
– No first-rater problem.
• Can provide explanations of recommended
items by listing content-features that caused an
item to be recommended.
17
Disadvantages of Content-Based Method
• Requires content that can be encoded as
meaningful features.
• Users’ tastes must be represented as a
learnable function of these content features.
• Unable to exploit quality judgments of other
users.
– Unless these are somehow included in the
content features.
18
LIBRA
Learning Intelligent Book Recommending Agent
• Content-based recommender for books using
information about titles extracted from Amazon.
• Uses information extraction from the web to
organize text into fields:
– Author
– Title
– Editorial Reviews
– Customer Comments
– Subject terms
– Related authors
– Related titles
19
LIBRA System
Amazon Pages
Rated
Examples
User Profile
Machine Learning
Learner
Information
Extraction
LIBRA
Database
Recommendations
1.~~~~~~
2.~~~~~~~
3.~~~~~
:
:
: Predictor
20
Sample Amazon Page
Age of Spiritual Machines
21
Sample Extracted Information
Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence>
Author: <Ray Kurzweil>
Price: <11.96>
Publication Date: <January 2000>
ISBN: <0140282025>
Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind>
Author: <Hans Moravec> >
…
Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> >
…
Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> >
…
Related Authors: <Hans P. Moravec> <K. Eric Drexler>…
Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
22
Libra Content Information
• Libra uses this extracted information to
form “bags of words” for the following
slots:
– Author
– Title
– Description (reviews and comments)
– Subjects
– Related Titles
– Related Authors
23
Libra Overview
• User rates selected titles on a 1 to 10 scale.
• Libra uses a naïve Bayesian text-categorization
algorithm to learn a profile from these rated
examples.
– Rating 6–10: Positive
– Rating 1–5: Negative
• The learned profile is used to rank all other books as
recommendations based on the computed posterior
probability that they are positive.
• User can also provide explicit positive/negative
keywords, which are used as priors to bias the role
of these features in categorization.
24
Bayesian Categorization in LIBRA
• Model is generalized to generate a vector of bags
of words (one bag for each slot).
– Instances of the same word in different slots are treated
as separate features:
• “Chrichton” in author vs. “Chrichton” in description
• Training examples are treated as weighted positive
or negative examples when estimating conditional
probability parameters:
– An example with rating 1  r  10 is given:
positive probability: (r – 1)/9
negative probability: (10 – r)/9
25
Implementation
• Stopwords removed from all bags.
• A book’s title and author are added to its own
related title and related author slots.
• All probabilities are smoothed using Laplace
estimation to account for small sample size.
• Lisp implementation is quite efficient:
– Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs
– Test: 200 books per second
26
Explanations of Profiles and
Recommendations
• Feature strength of word wk appearing in a
slot sj :
)
s
negative,
|
(
)
,
positive
|
(
log
)
,
(
strength
j
k
j
k
j
k
w
P
s
w
P
s
w 
27
Libra Demo
http://www.cs.utexas.edu/users/libra
28
Experimental Data
• Amazon searches were used to find books
in various genres.
• Titles that have at least one review or
comment were kept.
• Data sets:
– Literature fiction: 3,061 titles
– Mystery: 7,285 titles
– Science: 3,813 titles
– Science Fiction: 3.813 titles
29
Rated Data
• 4 users rated random examples within a
genre by reviewing the Amazon pages
about the title:
– LIT1 936 titles
– LIT2 935 titles
– MYST 500 titles
– SCI 500 titles
– SF 500 titles
30
Experimental Method
• 10-fold cross-validation to generate learning curves.
• Measured several metrics on independent test data:
– Precision at top 3: % of the top 3 that are positive
– Rating of top 3: Average rating assigned to top 3
– Rank Correlation: Spearman’s, rs, between system’s and
user’s complete rankings.
• Test ablation of related author and related title slots
(LIBRA-NR).
– Test influence of information generated by Amazon’s
collaborative approach.
31
Experimental Result Summary
• Precision at top 3 is fairly consistently in the
90’s% after only 20 examples.
• Rating of top 3 is fairly consistently above 8 after
only 20 examples.
• All results are always significantly better than
random chance after only 5 examples.
• Rank correlation is generally above 0.3 (moderate)
after only 10 examples.
• Rank correlation is generally above 0.6 (high)
after 40 examples.
32
Precision at Top 3 for Science
33
Rating of Top 3 for Science
34
Rank Correlation for Science
35
User Studies
• Subjects asked to use Libra and get
recommendations.
• Encouraged several rounds of feedback.
• Rated all books in final list of
recommendations.
• Selected two books for purchase.
• Returned reviews after reading selections.
• Completed questionnaire about the system.
36
Combining Content and Collaboration
• Content-based and collaborative methods have
complementary strengths and weaknesses.
• Combine methods to obtain the best of both.
• Various hybrid approaches:
– Apply both methods and combine recommendations.
– Use collaborative data as content.
– Use content-based predictor as another collaborator.
– Use content-based predictor to complete
collaborative data.
37
Movie Domain
• EachMovie Dataset [Compaq Research Labs]
– Contains user ratings for movies on a 0–5 scale.
– 72,916 users (avg. 39 ratings each).
– 1,628 movies.
– Sparse user-ratings matrix – (2.6% full).
• Crawled Internet Movie Database (IMDb)
– Extracted content for titles in EachMovie.
• Basic movie information:
– Title, Director, Cast, Genre, etc.
• Popular opinions:
– User comments, Newspaper and Newsgroup reviews, etc.
38
Content-Boosted Collaborative Filtering
IMDb
EachMovie Web Crawler
Movie
Content
Database
Full User
Ratings Matrix
Collaborative
Filtering
Active
User Ratings
User Ratings
Matrix (Sparse)
Content-based
Predictor
Recommendations
39
Content-Boosted CF - I
Content-Based
Predictor
Training Examples
Pseudo User-ratings Vector
Items with Predicted Ratings
User-ratings Vector
User-rated Items
Unrated Items
40
Content-Boosted CF - II
• Compute pseudo user ratings matrix
– Full matrix – approximates actual full user ratings matrix
• Perform CF
– Using Pearson corr. between pseudo user-rating vectors
User Ratings
Matrix
Pseudo User
Ratings Matrix
Content-Based
Predictor
41
Experimental Method
• Used subset of EachMovie (7,893 users; 299,997
ratings)
• Test set: 10% of the users selected at random.
– Test users that rated at least 40 movies.
– Train on the remainder sets.
• Hold-out set: 25% items for each test user.
– Predict rating of each item in the hold-out set.
• Compared CBCF to other prediction approaches:
– Pure CF
– Pure Content-based
– Naïve hybrid (averages CF and content-based
predictions)
42
Metrics
• Mean Absolute Error (MAE)
– Compares numerical predictions with user ratings
• ROC sensitivity [Herlocker 99]
– How well predictions help users select high-quality
items
– Ratings  4 considered “good”; < 4 considered “bad”
• Paired t-test for statistical significance
43
Results - I
0.9
0.92
0.94
0.96
0.98
1
1.02
1.04
1.06
MAE
Algorithm
MAE
CF
Content
Naïve
CBCF
CBCF is significantly better (4% over CF) at (p < 0.001)
44
Results - II
0.58
0.6
0.62
0.64
0.66
0.68
ROC-4
Algorithm
ROC Sensitivity
CF
Content
Naïve
CBCF
CBCF outperforms rest (5% improvement over CF)
45
Active Learning
(Sample Section, Learning with Queries)
• Used to reduce the number of training
examples required.
• System requests ratings for specific items
from which it would learn the most.
• Several existing methods:
– Uncertainty sampling
– Committee-based sampling
46
Semi-Supervised Learning
(Weakly Supervised, Bootstrapping)
• Use wealth of unlabeled examples to aid
learning from a small amount of labeled data.
• Several recent methods developed:
– Semi-supervised EM (Expectation Maximization)
– Co-training
– Transductive SVM’s
47
Conclusions
• Recommending and personalization are
important approaches to combating
information over-load.
• Machine Learning is an important part of
systems for these tasks.
• Collaborative filtering has problems.
• Content-based methods address these
problems (but have problems of their own).
• Integrating both is best.

More Related Content

Similar to Recommenders.ppt

User personalization
User personalizationUser personalization
User personalizationSandeep Kath
 
Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation EngineShravaniBheema
 
Tag based recommender system
Tag based recommender systemTag based recommender system
Tag based recommender systemKaren Li
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Олександр Обєдніков “Рекомендательные системы”
Олександр Обєдніков “Рекомендательные системы”Олександр Обєдніков “Рекомендательные системы”
Олександр Обєдніков “Рекомендательные системы”Dakiry
 
Lecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionLecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionPerumalPitchandi
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems - Yousef Fadila
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Introduction to recommendation system
Introduction to recommendation systemIntroduction to recommendation system
Introduction to recommendation systemAravindharamanan S
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Ernesto Mislej
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label ClassificationYONG ZHENG
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfAlphaIssaghaDiallo
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyKris Jack
 
Chapter 02 collaborative recommendation
Chapter 02   collaborative recommendationChapter 02   collaborative recommendation
Chapter 02 collaborative recommendationAravindharamanan S
 
Chapter 02 collaborative recommendation
Chapter 02   collaborative recommendationChapter 02   collaborative recommendation
Chapter 02 collaborative recommendationAravindharamanan S
 

Similar to Recommenders.ppt (20)

User personalization
User personalizationUser personalization
User personalization
 
Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation Engine
 
Tag based recommender system
Tag based recommender systemTag based recommender system
Tag based recommender system
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Олександр Обєдніков “Рекомендательные системы”
Олександр Обєдніков “Рекомендательные системы”Олександр Обєдніков “Рекомендательные системы”
Олександр Обєдніков “Рекомендательные системы”
 
Lecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionLecture Notes on Recommender System Introduction
Lecture Notes on Recommender System Introduction
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
 
Lec7 collaborative filtering
Lec7 collaborative filteringLec7 collaborative filtering
Lec7 collaborative filtering
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Introduction to recommendation system
Introduction to recommendation systemIntroduction to recommendation system
Introduction to recommendation system
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
WORD
WORDWORD
WORD
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdf
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
 
Chapter 02 collaborative recommendation
Chapter 02   collaborative recommendationChapter 02   collaborative recommendation
Chapter 02 collaborative recommendation
 
Chapter 02 collaborative recommendation
Chapter 02   collaborative recommendationChapter 02   collaborative recommendation
Chapter 02 collaborative recommendation
 

More from Aravind Reddy

Patient’s Condition Classification Using Drug Reviews.pptx
Patient’s Condition Classification Using Drug Reviews.pptxPatient’s Condition Classification Using Drug Reviews.pptx
Patient’s Condition Classification Using Drug Reviews.pptxAravind Reddy
 
Natural Language Processing for development
Natural Language Processing for developmentNatural Language Processing for development
Natural Language Processing for developmentAravind Reddy
 
Tech Jobs Green Jobs- Deck . 24-11.pptx
Tech Jobs  Green Jobs- Deck . 24-11.pptxTech Jobs  Green Jobs- Deck . 24-11.pptx
Tech Jobs Green Jobs- Deck . 24-11.pptxAravind Reddy
 
The Normal Distribution.ppt
The Normal Distribution.pptThe Normal Distribution.ppt
The Normal Distribution.pptAravind Reddy
 
Princuiples of pimary data.ppt
Princuiples of pimary data.pptPrincuiples of pimary data.ppt
Princuiples of pimary data.pptAravind Reddy
 
Types of Primary and Secondary Sources.ppt
Types of Primary and Secondary Sources.pptTypes of Primary and Secondary Sources.ppt
Types of Primary and Secondary Sources.pptAravind Reddy
 
FILE MANAGEMENT1.ppt
FILE MANAGEMENT1.pptFILE MANAGEMENT1.ppt
FILE MANAGEMENT1.pptAravind Reddy
 
adminsitarative data data-57511556 (1).pptx
adminsitarative data data-57511556 (1).pptxadminsitarative data data-57511556 (1).pptx
adminsitarative data data-57511556 (1).pptxAravind Reddy
 
3 Missing data12256429.ppt
3 Missing data12256429.ppt3 Missing data12256429.ppt
3 Missing data12256429.pptAravind Reddy
 
Introduction to Data Science 5-13.pptx
Introduction to Data Science 5-13.pptxIntroduction to Data Science 5-13.pptx
Introduction to Data Science 5-13.pptxAravind Reddy
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptAravind Reddy
 

More from Aravind Reddy (14)

Patient’s Condition Classification Using Drug Reviews.pptx
Patient’s Condition Classification Using Drug Reviews.pptxPatient’s Condition Classification Using Drug Reviews.pptx
Patient’s Condition Classification Using Drug Reviews.pptx
 
Natural Language Processing for development
Natural Language Processing for developmentNatural Language Processing for development
Natural Language Processing for development
 
Final ppt.pptx
Final ppt.pptxFinal ppt.pptx
Final ppt.pptx
 
Tech Jobs Green Jobs- Deck . 24-11.pptx
Tech Jobs  Green Jobs- Deck . 24-11.pptxTech Jobs  Green Jobs- Deck . 24-11.pptx
Tech Jobs Green Jobs- Deck . 24-11.pptx
 
The Normal Distribution.ppt
The Normal Distribution.pptThe Normal Distribution.ppt
The Normal Distribution.ppt
 
Princuiples of pimary data.ppt
Princuiples of pimary data.pptPrincuiples of pimary data.ppt
Princuiples of pimary data.ppt
 
Culbert.ppt
Culbert.pptCulbert.ppt
Culbert.ppt
 
Types of Primary and Secondary Sources.ppt
Types of Primary and Secondary Sources.pptTypes of Primary and Secondary Sources.ppt
Types of Primary and Secondary Sources.ppt
 
FILE MANAGEMENT1.ppt
FILE MANAGEMENT1.pptFILE MANAGEMENT1.ppt
FILE MANAGEMENT1.ppt
 
adminsitarative data data-57511556 (1).pptx
adminsitarative data data-57511556 (1).pptxadminsitarative data data-57511556 (1).pptx
adminsitarative data data-57511556 (1).pptx
 
3 Missing data12256429.ppt
3 Missing data12256429.ppt3 Missing data12256429.ppt
3 Missing data12256429.ppt
 
Introduction to Data Science 5-13.pptx
Introduction to Data Science 5-13.pptxIntroduction to Data Science 5-13.pptx
Introduction to Data Science 5-13.pptx
 
loan.docx
loan.docxloan.docx
loan.docx
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 

Recently uploaded

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 

Recently uploaded (20)

9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 

Recommenders.ppt

  • 1. 1 Recommender Systems Collaborative Filtering & Content-Based Recommending
  • 2. 2 Recommender Systems • Systems for recommending items (e.g. books, movies, CD’s, web pages, newsgroup messages) to users based on examples of their preferences. • Many websites provide recommendations (e.g. Amazon, NetFlix, Pandora). • Recommenders have been shown to substantially increase sales at on-line stores. • There are two basic approaches to recommending: – Collaborative Filtering (a.k.a. social filtering) – Content-based
  • 4. 4 Personalization • Recommenders are instances of personalization software. • Personalization concerns adapting to the individual needs, interests, and preferences of each user. • Includes: – Recommending – Filtering – Predicting (e.g. form or calendar appt. completion) • From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
  • 5. 5 Machine Learning and Personalization • Machine Learning can allow learning a user model or profile of a particular user based on: – Sample interaction – Rated examples • This model or profile can then be used to: – Recommend items – Filter information – Predict behavior
  • 6. 6 Collaborative Filtering • Maintain a database of many users’ ratings of a variety of items. • For a given user, find other similar users whose ratings strongly correlate with the current user. • Recommend items rated highly by these similar users, but not rated by the current user. • Almost all existing commercial recommenders use this approach (e.g. Amazon).
  • 7. 7 Collaborative Filtering A 9 B 3 C : : Z 5 A B C 9 : : Z 10 A 5 B 3 C : : Z 7 A B C 8 : : Z A 6 B 4 C : : Z A 10 B 4 C 8 . . Z 1 User Database Active User Correlation Match A 9 B 3 C . . Z 5 A 9 B 3 C : : Z 5 A 10 B 4 C 8 . . Z 1 Extract Recommendations C
  • 8. 8 Collaborative Filtering Method • Weight all users with respect to similarity with the active user. • Select a subset of the users (neighbors) to use as predictors. • Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings. • Present items with highest predicted ratings as recommendations.
  • 9. 9 Similarity Weighting • Typically use Pearson correlation coefficient between ratings for active user, a, and another user, u. u a r r u a u a r r c   ) , ( covar ,  ra and ru are the ratings vectors for the m items rated by both a and u ri,j is user i’s rating for item j
  • 10. 10 Covariance and Standard Deviation • Covariance: • Standard Deviation: m r r r r r r m i u i u a i a u a      1 , , ) )( ( ) , ( covar m r r m i i x x    1 , m r r m i x i x rx     1 2 , ) ( 
  • 11. 11 Significance Weighting • Important not to trust correlations based on very few co-rated items. • Include significance weights, sa,u, based on number of co-rated items, m. u a u a u a c s w , , ,               50 if 50 50 if 1 , m m m s u a
  • 12. 12 Neighbor Selection • For a given active user, a, select correlated users to serve as source of predictions. • Standard approach is to use the most similar n users, u, based on similarity weights, wa,u • Alternate approach is to include all users whose similarity weight is above a given threshold.
  • 13. 13 Rating Prediction • Predict a rating, pa,i, for each item i, for active user, a, by using the n selected neighbor users, u  {1,2,…n}. • To account for users different ratings levels, base predictions on differences from a user’s average rating. • Weight users’ ratings contribution by their similarity to the active user.        n u u a n u u i u u a a i a w r r w r p 1 , 1 , , , ) (
  • 14. 14 Problems with Collaborative Filtering • Cold Start: There needs to be enough other users already in the system to find a match. • Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items. • First Rater: Cannot recommend an item that has not been previously rated. – New items – Esoteric items • Popularity Bias: Cannot recommend items to someone with unique tastes. – Tends to recommend popular items.
  • 15. 15 Content-Based Recommending • Recommendations are based on information on the content of items rather than on other users’ opinions. • Uses a machine learning algorithm to induce a profile of the users preferences from examples based on a featural description of content. • Some previous applications: – Newsweeder (Lang, 1995) – Syskill and Webert (Pazzani et al., 1996)
  • 16. 16 Advantages of Content-Based Approach • No need for data on other users. – No cold-start or sparsity problems. • Able to recommend to users with unique tastes. • Able to recommend new and unpopular items – No first-rater problem. • Can provide explanations of recommended items by listing content-features that caused an item to be recommended.
  • 17. 17 Disadvantages of Content-Based Method • Requires content that can be encoded as meaningful features. • Users’ tastes must be represented as a learnable function of these content features. • Unable to exploit quality judgments of other users. – Unless these are somehow included in the content features.
  • 18. 18 LIBRA Learning Intelligent Book Recommending Agent • Content-based recommender for books using information about titles extracted from Amazon. • Uses information extraction from the web to organize text into fields: – Author – Title – Editorial Reviews – Customer Comments – Subject terms – Related authors – Related titles
  • 19. 19 LIBRA System Amazon Pages Rated Examples User Profile Machine Learning Learner Information Extraction LIBRA Database Recommendations 1.~~~~~~ 2.~~~~~~~ 3.~~~~~ : : : Predictor
  • 20. 20 Sample Amazon Page Age of Spiritual Machines
  • 21. 21 Sample Extracted Information Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence> Author: <Ray Kurzweil> Price: <11.96> Publication Date: <January 2000> ISBN: <0140282025> Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> Author: <Hans Moravec> > … Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > … Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > … Related Authors: <Hans P. Moravec> <K. Eric Drexler>… Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
  • 22. 22 Libra Content Information • Libra uses this extracted information to form “bags of words” for the following slots: – Author – Title – Description (reviews and comments) – Subjects – Related Titles – Related Authors
  • 23. 23 Libra Overview • User rates selected titles on a 1 to 10 scale. • Libra uses a naïve Bayesian text-categorization algorithm to learn a profile from these rated examples. – Rating 6–10: Positive – Rating 1–5: Negative • The learned profile is used to rank all other books as recommendations based on the computed posterior probability that they are positive. • User can also provide explicit positive/negative keywords, which are used as priors to bias the role of these features in categorization.
  • 24. 24 Bayesian Categorization in LIBRA • Model is generalized to generate a vector of bags of words (one bag for each slot). – Instances of the same word in different slots are treated as separate features: • “Chrichton” in author vs. “Chrichton” in description • Training examples are treated as weighted positive or negative examples when estimating conditional probability parameters: – An example with rating 1  r  10 is given: positive probability: (r – 1)/9 negative probability: (10 – r)/9
  • 25. 25 Implementation • Stopwords removed from all bags. • A book’s title and author are added to its own related title and related author slots. • All probabilities are smoothed using Laplace estimation to account for small sample size. • Lisp implementation is quite efficient: – Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs – Test: 200 books per second
  • 26. 26 Explanations of Profiles and Recommendations • Feature strength of word wk appearing in a slot sj : ) s negative, | ( ) , positive | ( log ) , ( strength j k j k j k w P s w P s w 
  • 28. 28 Experimental Data • Amazon searches were used to find books in various genres. • Titles that have at least one review or comment were kept. • Data sets: – Literature fiction: 3,061 titles – Mystery: 7,285 titles – Science: 3,813 titles – Science Fiction: 3.813 titles
  • 29. 29 Rated Data • 4 users rated random examples within a genre by reviewing the Amazon pages about the title: – LIT1 936 titles – LIT2 935 titles – MYST 500 titles – SCI 500 titles – SF 500 titles
  • 30. 30 Experimental Method • 10-fold cross-validation to generate learning curves. • Measured several metrics on independent test data: – Precision at top 3: % of the top 3 that are positive – Rating of top 3: Average rating assigned to top 3 – Rank Correlation: Spearman’s, rs, between system’s and user’s complete rankings. • Test ablation of related author and related title slots (LIBRA-NR). – Test influence of information generated by Amazon’s collaborative approach.
  • 31. 31 Experimental Result Summary • Precision at top 3 is fairly consistently in the 90’s% after only 20 examples. • Rating of top 3 is fairly consistently above 8 after only 20 examples. • All results are always significantly better than random chance after only 5 examples. • Rank correlation is generally above 0.3 (moderate) after only 10 examples. • Rank correlation is generally above 0.6 (high) after 40 examples.
  • 32. 32 Precision at Top 3 for Science
  • 33. 33 Rating of Top 3 for Science
  • 35. 35 User Studies • Subjects asked to use Libra and get recommendations. • Encouraged several rounds of feedback. • Rated all books in final list of recommendations. • Selected two books for purchase. • Returned reviews after reading selections. • Completed questionnaire about the system.
  • 36. 36 Combining Content and Collaboration • Content-based and collaborative methods have complementary strengths and weaknesses. • Combine methods to obtain the best of both. • Various hybrid approaches: – Apply both methods and combine recommendations. – Use collaborative data as content. – Use content-based predictor as another collaborator. – Use content-based predictor to complete collaborative data.
  • 37. 37 Movie Domain • EachMovie Dataset [Compaq Research Labs] – Contains user ratings for movies on a 0–5 scale. – 72,916 users (avg. 39 ratings each). – 1,628 movies. – Sparse user-ratings matrix – (2.6% full). • Crawled Internet Movie Database (IMDb) – Extracted content for titles in EachMovie. • Basic movie information: – Title, Director, Cast, Genre, etc. • Popular opinions: – User comments, Newspaper and Newsgroup reviews, etc.
  • 38. 38 Content-Boosted Collaborative Filtering IMDb EachMovie Web Crawler Movie Content Database Full User Ratings Matrix Collaborative Filtering Active User Ratings User Ratings Matrix (Sparse) Content-based Predictor Recommendations
  • 39. 39 Content-Boosted CF - I Content-Based Predictor Training Examples Pseudo User-ratings Vector Items with Predicted Ratings User-ratings Vector User-rated Items Unrated Items
  • 40. 40 Content-Boosted CF - II • Compute pseudo user ratings matrix – Full matrix – approximates actual full user ratings matrix • Perform CF – Using Pearson corr. between pseudo user-rating vectors User Ratings Matrix Pseudo User Ratings Matrix Content-Based Predictor
  • 41. 41 Experimental Method • Used subset of EachMovie (7,893 users; 299,997 ratings) • Test set: 10% of the users selected at random. – Test users that rated at least 40 movies. – Train on the remainder sets. • Hold-out set: 25% items for each test user. – Predict rating of each item in the hold-out set. • Compared CBCF to other prediction approaches: – Pure CF – Pure Content-based – Naïve hybrid (averages CF and content-based predictions)
  • 42. 42 Metrics • Mean Absolute Error (MAE) – Compares numerical predictions with user ratings • ROC sensitivity [Herlocker 99] – How well predictions help users select high-quality items – Ratings  4 considered “good”; < 4 considered “bad” • Paired t-test for statistical significance
  • 44. 44 Results - II 0.58 0.6 0.62 0.64 0.66 0.68 ROC-4 Algorithm ROC Sensitivity CF Content Naïve CBCF CBCF outperforms rest (5% improvement over CF)
  • 45. 45 Active Learning (Sample Section, Learning with Queries) • Used to reduce the number of training examples required. • System requests ratings for specific items from which it would learn the most. • Several existing methods: – Uncertainty sampling – Committee-based sampling
  • 46. 46 Semi-Supervised Learning (Weakly Supervised, Bootstrapping) • Use wealth of unlabeled examples to aid learning from a small amount of labeled data. • Several recent methods developed: – Semi-supervised EM (Expectation Maximization) – Co-training – Transductive SVM’s
  • 47. 47 Conclusions • Recommending and personalization are important approaches to combating information over-load. • Machine Learning is an important part of systems for these tasks. • Collaborative filtering has problems. • Content-based methods address these problems (but have problems of their own). • Integrating both is best.