Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 07 June 2016 v1.1
1. Recommender Systems
meet Linked Open Data
Tommaso Di Noia
16th International Conference on Web Engineering June 7th, 2016
tommaso.dinoia@poliba.it
@TommasoDiNoia
2. Agenda
• Linked Open Data
• What is a Recommender System and how
does it work?
• Evaluating a Recommender System
• Recommender Systems and Linked Open Data
6. Web vs Linked Data
                 Web           Linked Data
Analogy          File System   Database
Designed for     Men           Machines (Software Agents)
Main elements    Documents     Things
Links between    Documents     Things
Semantics        Implicit      Explicit
Courtesy of Prof. Enrico Motta, The Open University, Milton Keynes – UK – Semantic Web: Technologies and Applications.
29. The rating matrix
            The Matrix  Titanic  I love shopping  Argo  Love Actually  The hangover
Tommaso     5           1        2                4     3              ?
Francesco   2           4        5                3     5              2
Vittoria    4           3        2                4     1              3
Jessica     3           5        1                5     2              4
Paolo       4           4        5                3     5              2
30. The rating matrix
(in the real world)
            The Matrix  Titanic  I love shopping  Argo  Love Actually  The hangover
Tommaso     5           ?        ?                4     3              ?
Francesco   2           4        5                ?     5              ?
Vittoria    ?           3        ?                4     ?              3
Jessica     3           5        ?                5     2              ?
Paolo       4           ?        5                ?     5              2
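The sparse matrix above can be stored by keeping only the known ratings. The following is a minimal Python sketch, assuming that rows correspond to users and columns to movies in the order listed on the slide; the helper name `sparsity` is illustrative and not part of the tutorial.

```python
ITEMS = ["The Matrix", "Titanic", "I love shopping",
         "Argo", "Love Actually", "The hangover"]
USERS = ["Tommaso", "Francesco", "Vittoria", "Jessica", "Paolo"]

# None marks an unknown rating ("?" on the slide).
# Assumption: row i belongs to USERS[i], column j to ITEMS[j].
ROWS = [
    [5, None, None, 4, 3, None],
    [2, 4, 5, None, 5, None],
    [None, 3, None, 4, None, 3],
    [3, 5, None, 5, 2, None],
    [4, None, 5, None, 5, 2],
]

# Dictionary of dictionaries: only observed ratings are stored.
ratings = {
    user: {item: r for item, r in zip(ITEMS, row) if r is not None}
    for user, row in zip(USERS, ROWS)
}


def sparsity(ratings, n_items):
    """Fraction of user-item pairs with no known rating."""
    known = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - known / (len(ratings) * n_items)
```

On this toy matrix 18 of the 30 cells are known, so `sparsity(ratings, len(ITEMS))` evaluates to 0.4. Real-world rating matrices are typically far sparser.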
38. Collaborative Filtering
• Memory-based
– Mainly based on k-NN
– Does not require any preliminary model building
phase
• Model-based
– Learns a predictive model before computing
recommendations
40. k-Nearest Neighbors
[Figure: a neighborhood N of the k = 5 nearest neighbors]
A neighborhood of 20 to 50 neighbors is a reasonable choice
[Herlocker et al. An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms, Information
Retrieval 5 (2002), no. 4, 287–310.]
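A memory-based prediction with k-NN can be sketched as follows. This is one common user-based variant (cosine similarity over co-rated items, similarity-weighted average of the neighbors' ratings) and not necessarily the exact formulation used in the tutorial; the function names and the toy data are assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbor rated the item
    return sum(sim * r for sim, r in neighbors) / total


# Hypothetical toy data: users a, b, c and items x, y, z.
toy = {
    "a": {"x": 5, "y": 1},
    "b": {"x": 5, "y": 1, "z": 4},
    "c": {"x": 1, "y": 5, "z": 2},
}
```

With `k=1` the prediction for user `a` on item `z` is simply the rating given by `b`, the most similar user. The toy data is too small for the 20 to 50 neighbors suggested above; on a real dataset `k` would be set in that range.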
43. Content-Based RS
• Items are described in terms of
attributes/features
• A finite set of values is associated to each
feature
• Item representation is a (Boolean) vector
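The Boolean item representation can be illustrated with a short Python sketch. The feature values below are made up for illustration (they are not taken from the tutorial or from any dataset), and cosine similarity is used only as one possible way to compare two item vectors.

```python
from math import sqrt

# Hypothetical attribute values for three items.
item_features = {
    "The Matrix": {"genre:sci-fi", "genre:action"},
    "Argo": {"genre:thriller", "genre:drama"},
    "Titanic": {"genre:drama", "genre:romance"},
}

# The finite set of feature values, in a fixed order.
vocabulary = sorted(set().union(*item_features.values()))


def to_boolean_vector(features, vocabulary):
    """1 if the item has the feature value, 0 otherwise."""
    return [1 if f in features else 0 for f in vocabulary]


def cosine(a, b):
    """Cosine similarity; for 0/1 vectors the squared norm equals the sum."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(a) * sum(b))
    return dot / norm if norm else 0.0


vectors = {item: to_boolean_vector(f, vocabulary) for item, f in item_features.items()}
```

Here `Argo` and `Titanic` share one of their two feature values, giving a similarity of 0.5, while `The Matrix` and `Titanic` share none.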
47. Hybrid recommender systems
[Robin D. Burke. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact., 12(4):331–370, 2002.]
Weighted
The scores (or votes) of several recommendation
techniques are combined together to produce a single
recommendation.
Switching
The system switches between recommendation
techniques depending on the current situation.
Mixed
Recommendations from several different
recommenders are presented at the same time
Feature combination
Features from different recommendation data sources
are thrown together into a single recommendation
algorithm.
Cascade
One recommender refines the recommendations
given by another.
Feature augmentation
Output from one technique is used as an input feature
to another.
Meta-level
The model learned by one recommender is used as
input to another.
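As an illustration of the Weighted strategy, the sketch below linearly combines the scores of several recommenders. It assumes that all scores are on a comparable scale and that an item missing from one recommender contributes 0 for that recommender; the function name, the weights, and the scores are hypothetical.

```python
def weighted_hybrid(score_lists, weights):
    """Rank items by the weighted sum of the scores from each recommender."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores from a collaborative and a content-based recommender.
cf_scores = {"Argo": 0.9, "Titanic": 0.4}
cb_scores = {"Argo": 0.2, "Titanic": 0.8, "The Matrix": 0.5}
```

With equal weights `Titanic` ranks first; shifting the weights to 0.7 and 0.3 in favor of the collaborative scores moves `Argo` to the top.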
50. Protocols
• Rated test-items
• All unrated items: compute a score for every
item not rated by the user (also items not
appearing in the user test set)
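The two protocols differ only in the candidate set that gets scored and ranked. A minimal sketch follows, assuming that "not rated by the user" means not rated in the training split; the function names and the toy split are illustrative.

```python
def rated_test_items(test_ratings, user):
    """Candidates are only the items the user rated in the test set."""
    return set(test_ratings.get(user, {}))


def all_unrated_items(catalog, train_ratings, user):
    """Candidates are all catalog items without a training rating by the user,
    whether or not they appear in the user's test set."""
    return set(catalog) - set(train_ratings.get(user, {}))


# Hypothetical train/test split for one user.
catalog = ["The Matrix", "Titanic", "I love shopping",
           "Argo", "Love Actually", "The hangover"]
train = {"Tommaso": {"The Matrix": 5, "Argo": 4}}
test = {"Tommaso": {"Love Actually": 3}}
```

In this example the first protocol ranks a single item, while the second ranks the four items Tommaso has not rated in training.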
52. MAE and RMSE drawback
• Not very suitable for top-N recommendation
– Errors at the top of the recommendation
list are weighted the same as errors at
the bottom
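The drawback can be seen on a small example. The sketch below uses the standard definitions of MAE and RMSE; the rating values are invented for illustration. One predictor makes its only error on the user's best item, the other on the worst item, yet both obtain the same scores.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical true ratings, ordered from the best to the worst item.
actual = [5, 4, 2, 1]
error_on_top = [4, 4, 2, 1]     # mistake on the item the user likes most
error_on_bottom = [5, 4, 2, 2]  # mistake on the item the user likes least
```

Both predictors have MAE 0.25 and RMSE 0.5, although the first one is more harmful for a top-N list.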
53. Accuracy metrics for top-N
recommendation
Precision @ N
$$P_u@N = \frac{|L_u(N) \cap TS_u^+|}{N}$$
Recall @ N
$$R_u@N = \frac{|L_u(N) \cap TS_u^+|}{|TS_u^+|}$$
normalized Discounted Cumulative Gain @ N
$$nDCG_u@N = \frac{1}{IDCG@N} \sum_{k=1}^{N} \frac{2^{r_{u,k}} - 1}{\log_2(1 + k)}$$
$L_u(N)$ is the recommendation list up to the N-th element
$TS_u^+$ is the set of relevant test items for $u$
$IDCG@N$ indicates the score obtained by an ideal ranking of $L_u(N)$
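The three metrics above can be computed for a single user as in the following sketch. It assumes that the gain in the nDCG exponent is the user's test rating of the item at position k (0 when the item has no test rating), and that the ideal ranking is the same top-N list sorted by decreasing gain; function names and data are illustrative.

```python
from math import log2


def precision_at_n(ranked, relevant, n):
    """Share of the top-n recommended items that are relevant."""
    return len(set(ranked[:n]) & relevant) / n


def recall_at_n(ranked, relevant, n):
    """Share of the relevant items that appear in the top-n list."""
    return len(set(ranked[:n]) & relevant) / len(relevant) if relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain with positions starting at k = 1."""
    return sum((2 ** g - 1) / log2(1 + k) for k, g in enumerate(gains, start=1))


def ndcg_at_n(ranked, test_ratings, n):
    """DCG of the top-n list divided by the DCG of its ideal reordering."""
    gains = [test_ratings.get(item, 0) for item in ranked[:n]]
    idcg = dcg(sorted(gains, reverse=True))
    return dcg(gains) / idcg if idcg else 0.0


# Hypothetical recommendation list and test ratings for one user.
ranked = ["a", "b", "c", "d"]
test_ratings = {"a": 3, "c": 5, "e": 4}
relevant = {"c", "e"}  # assumption: test items rated 4 or higher are relevant
```

For N = 3 only `c` is a relevant hit, so precision is 1/3 and recall is 1/2. The nDCG is below 1 because `c`, the item with the highest rating, is ranked third instead of first.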
55. Is it all about precision?
• Novelty
– Recommend items in the long tail
• Diversity
– Avoid recommending only items from a small subset
of the catalog
– Suggest diverse items in the recommendation list
• Serendipity
– Suggest unexpected but interesting items
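Two of these dimensions can be given a rough numeric form. The sketch below is only one of several possible operationalizations and is not taken from the tutorial: catalog-level diversity is measured as the fraction of catalog items recommended to at least one user, and novelty as the share of a list that falls outside the most popular items. Function names, popularity counts, and the head size are assumptions.

```python
def aggregate_diversity(rec_lists, catalog):
    """Fraction of the catalog recommended to at least one user."""
    recommended = set().union(*rec_lists.values())
    return len(recommended & set(catalog)) / len(catalog)


def long_tail_share(rec_list, popularity, head_size):
    """Fraction of a recommendation list outside the `head_size` most popular items."""
    head = set(sorted(popularity, key=popularity.get, reverse=True)[:head_size])
    return sum(1 for item in rec_list if item not in head) / len(rec_list)


# Hypothetical catalog, popularity counts, and recommendation lists.
catalog = ["The Matrix", "Titanic", "I love shopping",
           "Argo", "Love Actually", "The hangover"]
popularity = {"Titanic": 100, "The Matrix": 80, "Love Actually": 30,
              "The hangover": 20, "Argo": 10, "I love shopping": 5}
rec_lists = {
    "Tommaso": ["Titanic", "Argo"],
    "Paolo": ["Titanic", "The Matrix"],
}
```

Here only three of the six catalog items are ever recommended (aggregate diversity 0.5), and half of Tommaso's list comes from the long tail when the two most popular movies are treated as the head.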
62. Traditional Content-based RSs
• Based on keyword/attribute-based item
representations
• Rely on the quality of the content-analyzer to
extract expressive item features
• Lack of knowledge about the items
129. Explicit computation of the feature map
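h-hop Item Entity-based Neighborhood Graph Kernel
$$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$$
$$\phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \ldots, w_{x_i,e_m}, \ldots, w_{x_i,e_t})$$
$$w_{x_i,e_m} = \sum_{l=1}^{h} \alpha_l \cdot c_{\hat{P}^l(x_i),e_m}$$
$c_{\hat{P}^l(x_i),e_m}$: # edges involving $e_m$ at $l$ hops from $x_i$, a.k.a. frequency of the entity in the item neighborhood graph
$\alpha_l$: factor taking into account at which hop the entity appears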
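The explicit feature map can be sketched in a few lines of Python. This is a simplified illustration, not the implementation behind the tutorial: it assumes a directed graph stored as adjacency lists, treats an edge as being "at hop l" when it leaves a node first reached after l - 1 hops, and credits each edge to the entity it points to. The graph, the entity names, and the decay factors are hypothetical.

```python
from collections import Counter


def feature_map(graph, item, alphas):
    """Return {entity: weight} with weight = sum over hops of alpha_l * edge count.

    graph:  dict mapping a node to the list of nodes it links to (one entry per edge)
    alphas: [alpha_1, ..., alpha_h], one decay factor per hop
    """
    weights = Counter()
    frontier, visited = {item}, {item}
    for alpha in alphas:  # hop l = 1 .. h
        reached = set()
        for node in frontier:
            for entity in graph.get(node, []):
                if entity == item:
                    continue  # ignore edges pointing back to the item itself
                weights[entity] += alpha  # one more edge involving `entity` at this hop
                reached.add(entity)
        frontier = reached - visited  # expand only newly discovered nodes
        visited |= frontier
    return weights


def kernel(phi_i, phi_j):
    """Dot product of two sparse feature maps."""
    return sum(w * phi_j.get(entity, 0.0) for entity, w in phi_i.items())


# Hypothetical graph: items x1, x2 and entities e1..e4.
graph = {
    "x1": ["e1", "e2"],
    "x2": ["e2", "e3"],
    "e1": ["e4"],
    "e2": ["e4"],
    "e3": ["e4"],
}
alphas = [1.0, 0.5]  # assumed decay: second-hop edges count half
```

With these values `x1` and `x2` share entity `e2` at the first hop and entity `e4` at the second hop (reached by two edges each), so `kernel(feature_map(graph, "x1", alphas), feature_map(graph, "x2", alphas))` evaluates to 2.0.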
144. Implementation
• LODreclib – a Java library to build a LOD-based
recommender system
https://github.com/sisinflab/lodreclib
• Cinemappy (currently for iOS only) – a
context-aware mobile recommender system
https://itunes.apple.com/it/app/cinemappy/id681762350?mt=8
148. Open issues
• Generalize to graph pattern extraction to represent
features
• Automatically select the triples related to the domain
of interest
• Automatically select meaningful properties to
represent items
• Analysis with respect to «knowledge coverage» of the
dataset
– What is the best approach?
• Cross-domain recommendation
• More graph-based similarity/relatedness metrics
149. Does the LOD dataset selection
matter?
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study
in the music domain. The 14th International Semantic Web Conference - ISWC 2015
150. Conclusions
• Linked Open Data to enrich the content descriptions of
items
• Exploit different characteristics of the semantic network
to represent/learn features
• Improved accuracy
• Improved novelty
• Improved Aggregate Diversity
• Entity linking for a better exploitation of text-based data
• Select the right approach, dataset, set of properties to
build your RS
151. Not covered here
• User profile
• Preferences
• Context-aware
• Knowledge-based approaches
• Cross-domain
• Feature selection
• …