Movie Recommendation with DBpedia
Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
3rd Italian Information Retrieval Workshop (IIR 2012) - Bari
January 26, 2012
In this paper we present MORE (acronym of MORE than MOvie REcommendation), a Facebook application that semantically recommends movies to the user leveraging the knowledge within Linked Data and the information elicited from her profile. MORE exploits the power of social knowledge bases (e.g. DBpedia) to detect semantic similarities among movies. These similarities are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets. Precision and recall experiments prove the validity of our approach for movie recommendation. MORE is freely available as a Facebook application.
1. MOVIE RECOMMENDATION WITH DBPEDIA
Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
mirizzi@deemail.poliba.it, t.dinoia@poliba.it, azzurra.ragone@exprivia.it, ostuni@deemail.poliba.it, disciascio@poliba.it
Politecnico di Bari
Via Orabona, 4
70125 Bari (ITALY)
2. Outline
DBpedia: a nucleus for a Web of Open Data
Social knowledge bases for similarity detection
Semantic Vector Space Model
Vector Space Model adapted to RDF graphs
MORE: More than Movie Recommendation
Content-based recommendation in action
Evaluation
Precision and Recall experiments with MovieLens
Conclusion
3. What is Linked Data?
Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”
[www.linkeddata.org]
4. DBpedia: a Nucleus for a Web of Data (i)
5. DBpedia: a Nucleus for a Web of Data (ii)
The DBpedia knowledge base currently describes more than 3.64 million things, highly interconnected in the RDF graph.
Let’s use all this knowledge to build smarter content-based recommender systems.
6. Social KBs for similarity detection
[Figure: RDF graph fragment around the movies Ocean’s Eleven and Ocean’s Twelve, with the nodes George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, Crime, 2000s crime films, American criminal comedy films, and Crime films.]
7. Semantic Vector Space Model (i)
Quick recap on Vector Space Model
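The Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index terms, with weights $w_{t,d}$ that are positive and non-binary.
$$v_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$$
$$w_{t,d} = tf_{t,d} \cdot idf_t$$
$$tf_{t,d} \cdot idf_t = \frac{n_{t,d}}{\sum_k n_{k,d}} \cdot \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$$
[Figure: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
$$sim(d_j, q) = \frac{d_j \cdot q}{\|d_j\| \, \|q\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$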
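The recap above can be made concrete with a minimal Python sketch, assuming whitespace-tokenized documents and the plain tf-idf weighting given in the formulas. The function names `tf_idf_vectors` and `cosine` and the three toy documents are illustrative assumptions, not part of the original slides.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append({
            # tf = n_{t,d} / sum_k n_{k,d};  idf = log(|D| / df_t)
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents (hypothetical, for illustration only).
docs = [
    "heist crew casino vegas".split(),
    "heist crew europe museum".split(),
    "romance paris museum".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares "heist" and "crew": positive
print(cosine(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Sparse dictionaries are used here only to keep the sketch short; a dense matrix representation would work equally well.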
8. Semantic Vector Space Model (ii)
Vector Space Model applied to RDF graphs
Each resource (movie) is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …).
[Figure: tensor with one slice per property (starring, director, genre, subject/broader) over the resources Ocean’s Eleven, Ocean’s Twelve, George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, Crime, Crime films, 2000s crime films, American criminal…]
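One way to read the per-property view of the RDF graph is sketched below in Python. It is a sketch under stated assumptions, not the authors' implementation: the triples are hand-written toy data rather than DBpedia output, tf is assumed to be 1 whenever a link exists, idf is assumed to be the log of the number of movies over the number of movies linked to the resource, and the names `property_vectors` and `sim_p` are hypothetical.

```python
import math
from collections import defaultdict

# Hand-written toy triples (subject, property, object); not queried from DBpedia.
TRIPLES = [
    ("Ocean's Eleven", "starring", "George Clooney"),
    ("Ocean's Eleven", "starring", "Brad Pitt"),
    ("Ocean's Eleven", "director", "Steven Soderbergh"),
    ("Ocean's Twelve", "starring", "George Clooney"),
    ("Ocean's Twelve", "starring", "Brad Pitt"),
    ("Ocean's Twelve", "starring", "Catherine Zeta-Jones"),
    ("Ocean's Twelve", "director", "Steven Soderbergh"),
    ("Se7en", "starring", "Brad Pitt"),
    ("Se7en", "starring", "Morgan Freeman"),
    ("Se7en", "director", "David Fincher"),
]


def property_vectors(triples, prop):
    """Build one sparse vector per movie for a single property slice."""
    movies = {s for s, _, _ in triples}
    linked = defaultdict(set)  # object resource -> movies linked to it via prop
    for s, p, o in triples:
        if p == prop:
            linked[o].add(s)
    vectors = {m: {} for m in movies}
    for resource, subjects in linked.items():
        # Assumed weighting: tf = 1 for an existing link, idf = log(M / a_r).
        idf = math.log(len(movies) / len(subjects))
        for m in subjects:
            vectors[m][resource] = idf
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sim_p(triples, prop, m_i, m_j):
    """Similarity of two movies with respect to one property."""
    vecs = property_vectors(triples, prop)
    return cosine(vecs[m_i], vecs[m_j])


print(sim_p(TRIPLES, "starring", "Ocean's Eleven", "Ocean's Twelve"))
print(sim_p(TRIPLES, "director", "Ocean's Eleven", "Se7en"))
```

Note that under this weighting a resource linked to every movie (Brad Pitt in the toy data) gets weight zero, so it does not contribute to any similarity.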
11. MORE: More than Movie Recommendation
MORE is a Facebook application that semantically recommends movies to the user leveraging the knowledge within DBpedia.
MORE supports the user in exploratory browsing tasks by guiding their search through a semantic knowledge space.
Similarities between movies are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets.
http://apps.facebook.com/movie-recommendation/
12. Semantic Content-based Recommender
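Given a user profile, defined as:
$$profile(u) = \{ m_j \mid u \text{ likes } m_j \}$$
we compute a similarity between $m_i$ and the information encoded in $profile(u)$:
$$r(u, m_i) = \frac{\sum_{m_j \in profile(u)} \frac{1}{P} \sum_{p} \alpha_p \cdot sim^p(m_j, m_i)}{|profile(u)|}$$
where $\alpha_p$ is the weight associated with property $p$ and $P$ is the number of properties considered.
If this similarity is greater than or equal to 0.5, we suggest the movie $m_i$ to the user $u$.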
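The scoring rule can be sketched in Python as follows: for each liked movie, take the weighted mean of the per-property similarities to the candidate, then average over the profile and compare against the 0.5 threshold. The similarity values, the property weights, the movie identifiers, and the function names `score` and `recommend` are all hypothetical and chosen only for illustration.

```python
def score(profile, candidate, sims, alphas):
    """Average over liked movies of the weighted mean per-property similarity."""
    if not profile:
        return 0.0
    n_props = len(alphas)
    total = 0.0
    for liked in profile:
        weighted = sum(
            alpha * sims[prop].get((liked, candidate), 0.0)
            for prop, alpha in alphas.items()
        )
        total += weighted / n_props
    return total / len(profile)


def recommend(profile, candidates, sims, alphas, threshold=0.5):
    """Return the candidates not already liked whose score reaches the threshold."""
    return [
        m for m in candidates
        if m not in profile and score(profile, m, sims, alphas) >= threshold
    ]


# Hypothetical per-property similarities, keyed by (liked movie, candidate).
sims = {
    "starring": {("m1", "m3"): 0.8, ("m2", "m3"): 0.4, ("m1", "m4"): 0.1},
    "director": {("m1", "m3"): 1.0, ("m2", "m3"): 0.6},
}
alphas = {"starring": 1.0, "director": 1.0}  # hypothetical property weights
profile = ["m1", "m2"]

print(score(profile, "m3", sims, alphas))
print(recommend(profile, ["m3", "m4"], sims, alphas))
```

Missing similarity entries are treated as 0 in this sketch, which is a simplifying assumption.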
13. Training the system
In order to identify the best possible values for the coefficients $\alpha_p$ (i.e., the weights associated with each property), we train the system via a genetic algorithm adopting an N-fold cross-validation approach (with N = 5) on the 100k MovieLens dataset.
At the end we obtain a set $A_p = \{\alpha_p^1, \ldots, \alpha_p^5\}$ of 5 different values for each $\alpha_p$, e.g.:
[Figure: example values of $A_p$.]
Then, we evaluate the performance with standard precision and recall tests, when $\alpha_p$ is one of the following:
$$\min(A_p), \quad \max(A_p), \quad avg(A_p), \quad median(A_p), \quad lowestError(A_p)$$
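The five candidate choices for a single coefficient can be sketched in Python as below. The genetic algorithm itself is not shown; the sketch assumes that each of the 5 folds has already produced one value for the coefficient together with a misclassification error, and it interprets lowestError as "the value from the fold with the lowest error", which is an assumption. The numbers and the name `candidate_coefficients` are hypothetical.

```python
from statistics import mean, median


def candidate_coefficients(fold_values, fold_errors):
    """Summarize the per-fold values of one coefficient in five ways."""
    # Index of the fold with the lowest misclassification error (assumed meaning
    # of lowestError).
    best_fold = min(range(len(fold_values)), key=lambda i: fold_errors[i])
    return {
        "min": min(fold_values),
        "max": max(fold_values),
        "avg": mean(fold_values),
        "median": median(fold_values),
        "lowestError": fold_values[best_fold],
    }


# Hypothetical per-fold values and errors for one property weight.
fold_values = [0.2, 0.5, 0.4, 0.9, 0.5]
fold_errors = [0.30, 0.25, 0.22, 0.35, 0.28]

print(candidate_coefficients(fold_values, fold_errors))
```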
14. Evaluation: Precision & Recall
$$P@N = \frac{|Rec@N \cap TestSet|}{N} \qquad R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$$
$$N \in \{3, 4, 5, 6, 7\}$$
The figure shows high values of Precision and Recall. The best values are obtained by choosing, for the coefficients $\alpha_p$, the values in $A_p$ with the lowest misclassification error.
We also evaluated the importance of the subject/broader property. The information carried by this property is peculiar to ontological datasets. As shown in the figure, performance decreases drastically if we do not consider this property.
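Precision and recall at N, as used in this evaluation, can be computed with a short Python sketch: count how many of the top-N recommended movies appear in the test set, then divide by N for precision and by the test-set size for recall. The ranked list, the test set, and the function name `precision_recall_at_n` are hypothetical illustrations, not MovieLens data.

```python
def precision_recall_at_n(ranked, test_set, n):
    """Return (P@N, R@N) for a ranked recommendation list and a test set."""
    top_n = ranked[:n]
    hits = sum(1 for movie in top_n if movie in test_set)
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


# Hypothetical ranked recommendations and held-out liked movies.
ranked = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
test_set = {"m2", "m3", "m6", "m9"}

for n in (3, 4, 5, 6, 7):
    p, r = precision_recall_at_n(ranked, test_set, n)
    print(n, round(p, 3), round(r, 3))
```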
15. Conclusion & Future directions
The huge amount of data available on DBpedia can be successfully exploited to build content-based recommender systems.
We have presented MORE, a Facebook application that leverages the knowledge within DBpedia to produce movie recommendations by means of a semantic version of the classical vector space model (sVSM).
Evaluation against historical datasets and high values of precision and recall prove the validity of our approach.
We are currently working on:
Testing the approach with different domains
Improving the recommendation with a hybrid approach (content-based and collaborative filtering)
We acknowledge partial support of HP IRP 2011. Grant CW267313.
16. Q? A!