Linked Open Data to Support Content-based Recommender Systems
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker

I-SEMANTICS 2012, 8th Int. Conf. on Semantic Systems, Sept. 5-7, 2012, Graz, Austria.

The World Wide Web is moving from a Web of hyperlinked documents to a Web of linked data. Thanks to the spread of the Semantic Web and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data has been published in freely accessible datasets. These datasets are connected with each other to form the so-called Linked Open Data cloud. A large amount of RDF data is available in the Web of Data today, but only a few applications really exploit its potential. In this paper we show how these data can be used to develop a recommender system (RS) that relies exclusively on the information encoded in the Web of Data. We implemented a content-based RS that leverages the data available within Linked Open Data datasets (in particular DBpedia, Freebase and LinkedMDB) to recommend movies to end users. We extensively evaluated the approach and validated the effectiveness of the algorithms by experimentally measuring their accuracy with precision and recall metrics.

Presentation Transcript

    • LINKED OPEN DATA TO SUPPORT CONTENT-BASED RECOMMENDER SYSTEMS
      Tommaso Di Noia (1), Roberto Mirizzi (2), Vito Claudio Ostuni (1), Davide Romito (1), Markus Zanker (3)
      t.dinoia@poliba.it, roberto.mirizzi@hp.com, ostuni@deemail.poliba.it, romito@deemail.poliba.it, markus.zanker@uni-klu.ac.at
      (1) Politecnico di Bari, Via Orabona, 4, 70125 Bari (ITALY)
      (2) HP Labs, 1501 Page Mill Road, Palo Alto, CA (US) 94304
      (3) Alpen-Adria-Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria
      I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems, September 5-7, 2012, Graz, Austria
    • Outline What are (Content-based) Recommender Systems?  The main drawback: limited content analysis Vector Space Model for Linked Open Data (LOD)  Vector Space Model adapted to RDF graphs A Semantic Content-based Recommender System  A Memory-based algorithm which uses a LOD-based item similarity measure Evaluation  Precision and Recall experiments with MovieLens Conclusion I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems September 5-7, 2012 Graz, Austria
    • Recommender Systems: a definition
      Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
      Input data:
        • A set of users U = {u_1, …, u_N}
        • A set of items I = {i_1, …, i_M}
        • The rating matrix R = [r_{u,i}]
      Problem definition: given a user u and a target item i, predict the rating r_{u,i}.
    • Content-based Recommender Systems
      CB-RSs recommend items to a user based on their description and on the profile of the user's interests. (*)
      [Figure: a user profile of rated items (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the items' descriptions (Item1, Item2, …, Item100) feed the recommender system, which returns the Top-N recommendations (Item7, Item15, Item11, …).]
      (*) Pazzani, M. J., & Billsus, D. Content-Based Recommendation Systems. The Adaptive Web. Lecture Notes in Computer Science vol. 4321, 325-341, 2007.
    • Main CB RS Drawback: Limited Content Analysis
      No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like. (*)
      The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.
      Domain knowledge is needed: we need rich descriptions of the items!
      (*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners.
    • A Linked Data based Solution
      • Use Linked Data to mitigate the limited content analysis issue
      • Plenty of structured data available
      • No Content Analyzer required
    • Linked Data as a structured information source for item descriptions
      • Rich item descriptions
      • Let's use all this ontological knowledge to build smarter CB RSs
    • Computing similarity in LOD datasets
    • Vector Space Model for LOD (i): quick recap on the Vector Space Model
      The Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index terms, with weights w_{t,d} that are positive and non-binary:
      \[ \vec{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T \]
      \[ w_{t,d} = tf_{t,d} \cdot idf_t, \qquad tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}, \qquad idf_t = \log \frac{|D|}{|\{d \in D : t \in d\}|} \]
      \[ sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}} \]
      [Figure: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
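The recap on this slide can be illustrated with a minimal Python sketch of tf-idf weighting and cosine similarity. The three toy "documents" and the function names `tf_idf_vectors` and `cosine` are assumptions made for illustration; they do not come from the presentation.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, with w = tf * idf."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in counts.items()}
        )
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

# Toy corpus (hypothetical): each document is a list of index terms.
docs = [
    "crime drama heist".split(),
    "crime drama serial killer".split(),
    "romantic comedy".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # some overlap, strictly between 0 and 1
print(cosine(vecs[0], vecs[2]))  # no shared terms, so 0.0
```

A term that occurs in every document gets idf = log(1) = 0 and therefore contributes nothing to the similarity, which is the intended effect of the idf factor.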
    • Vector Space Model for LOD (ii)
      [Figure: RDF graph fragment for the movies Righteous Kill and Heat, linked through the properties starring, director, genre and subject/broader to the resources Al Pacino, Robert De Niro, Brian Dennehy, John Avnet, Drama, Crime films, Heist films and Serial killer films.]
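One way to prepare such a graph for a per-property vector space is to group the RDF triples by property. The sketch below is only an illustration: the triples are a hand-written toy subset (an assumption, not a DBpedia extract), and `group_by_property` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative (subject, property, object) triples; assumed toy data.
triples = [
    ("Righteous Kill", "starring", "Al Pacino"),
    ("Righteous Kill", "starring", "Robert De Niro"),
    ("Righteous Kill", "starring", "Brian Dennehy"),
    ("Righteous Kill", "director", "John Avnet"),
    ("Heat", "starring", "Al Pacino"),
    ("Heat", "starring", "Robert De Niro"),
]

def group_by_property(triples):
    """Map property -> movie -> set of resources linked through that property."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        index[prop][subject].add(obj)
    return index

index = group_by_property(triples)
shared_actors = index["starring"]["Righteous Kill"] & index["starring"]["Heat"]
print(sorted(shared_actors))
```

Each property then yields its own movie-by-resource matrix, so similarities can be computed separately for every property.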
    • Vector Space Model for LOD (iii)
      Property starring:
                                Al Pacino (a1)   Robert De Niro (a2)   Brian Dennehy (a3)
        Righteous Kill (m1)     ✓                ✓                     ✓
        Heat (m2)               ✓                ✓
      \[ w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x} \]
                                a1               a2                    a3
        Righteous Kill (m1)     w_{a1,m1}        w_{a2,m1}             w_{a3,m1}
        Heat (m2)               w_{a1,m2}        w_{a2,m2}             0
    • Vector Space Model for LOD (iv)
      \[ sim_{starring}(m_1, m_2) = \frac{w_{a_1,m_1} w_{a_1,m_2} + w_{a_2,m_1} w_{a_2,m_2} + w_{a_3,m_1} w_{a_3,m_2}}{\sqrt{w_{a_1,m_1}^2 + w_{a_2,m_1}^2 + w_{a_3,m_1}^2} \cdot \sqrt{w_{a_1,m_2}^2 + w_{a_2,m_2}^2 + w_{a_3,m_2}^2}} \]
      \[ sim(m_1, m_2) = \alpha_{starring} \cdot sim_{starring}(m_1, m_2) + \alpha_{director} \cdot sim_{director}(m_1, m_2) + \alpha_{subject} \cdot sim_{subject}(m_1, m_2) + \ldots \]
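The per-property cosine and its weighted combination can be sketched in Python as follows. Several things here are assumptions for illustration: tf is taken as 1 for every linked resource, the four-movie catalogue (including the placeholder movies, actors and directors) is toy data, and the alpha values are arbitrary. The names `property_similarity` and `similarity` are hypothetical.

```python
import math

def property_similarity(m1, m2, links):
    """Cosine similarity of two movies over a single property.

    links maps movie -> set of linked resources for that property.
    Assumption: tf = 1 for a linked resource, so the weight is just the idf.
    """
    n_movies = len(links)

    def idf(resource):
        df = sum(1 for linked in links.values() if resource in linked)
        return math.log(n_movies / df)

    w1 = {r: idf(r) for r in links.get(m1, set())}
    w2 = {r: idf(r) for r in links.get(m2, set())}
    dot = sum(w * w2.get(r, 0.0) for r, w in w1.items())
    norm1 = math.sqrt(sum(w * w for w in w1.values()))
    norm2 = math.sqrt(sum(w * w for w in w2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def similarity(m1, m2, graph, alpha):
    """Linear combination of the per-property similarities."""
    return sum(alpha[p] * property_similarity(m1, m2, graph[p]) for p in alpha)

# Toy catalogue (assumed data; "Movie C/D", "Actor X/Y", "Director X/Y" are placeholders).
graph = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
        "Movie C": {"Actor X"},
        "Movie D": {"Actor Y"},
    },
    "director": {
        "Righteous Kill": {"John Avnet"},
        "Heat": {"Michael Mann"},
        "Movie C": {"Director X"},
        "Movie D": {"Director Y"},
    },
}
alpha = {"starring": 0.6, "director": 0.4}  # illustrative weights

sim_starring = property_similarity("Righteous Kill", "Heat", graph["starring"])
sim_total = similarity("Righteous Kill", "Heat", graph, alpha)
print(sim_starring, sim_total)
```

With only two movies in the catalogue, every shared actor would have idf = log(2/2) = 0; the placeholder movies are there so that the shared actors keep a non-zero weight.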
    • Semantic Content-based Recommender
      Given a user profile, defined as:
      \[ profile(u) = \{ \langle m_j, v_j \rangle \mid v_j = 1 \text{ if } u \text{ likes } m_j, \; v_j = -1 \text{ otherwise} \} \]
      we predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities:
      \[ \tilde{r}(u, m_i) = \frac{\sum_{m_j \in profile(u)} v_j \cdot \frac{\sum_{p} \alpha_p \cdot sim_p(m_j, m_i)}{P}}{|profile(u)|} \]
      where P is the number of properties considered.
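A minimal sketch of this prediction step, assuming the combined similarity is averaged over the number of properties P and then over the profile size. The similarity values, alpha weights and the placeholder "Movie B" are made up for illustration, and `predict_rating` is a hypothetical name.

```python
def predict_rating(profile, target, sims, alpha):
    """Score a target movie against a user profile.

    profile: {movie: +1 (liked) or -1 (not liked)}
    sims:    {property: {(profile_movie, target): similarity}}
    alpha:   {property: weight}
    """
    if not profile:
        raise ValueError("empty profile: no rating can be predicted")
    n_properties = len(alpha)
    total = 0.0
    for movie, vote in profile.items():
        combined = sum(alpha[p] * sims[p].get((movie, target), 0.0) for p in alpha)
        total += vote * combined / n_properties
    return total / len(profile)

# Illustrative inputs (assumed values, not results from the paper).
profile = {"Heat": 1, "Movie B": -1}
sims = {
    "starring": {("Heat", "Righteous Kill"): 0.5, ("Movie B", "Righteous Kill"): 0.0},
    "director": {("Heat", "Righteous Kill"): 0.0, ("Movie B", "Righteous Kill"): 0.5},
}
alpha = {"starring": 0.8, "director": 0.2}

score = predict_rating(profile, "Righteous Kill", sims, alpha)
print(score)
```

A positive score means the target is, on balance, closer to the liked movies than to the disliked ones; ranking all candidate movies by this score gives a Top-N list.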
    • Training the system (i)
      In order to identify the best possible values for the coefficients α_p (i.e., the weights associated with the properties), we train the system via a genetic algorithm.
      Fitness function: minimize the number of misclassification errors e_i on the training data (the user profile):
      \[ \min_{\alpha_{p_1}, \alpha_{p_2}, \alpha_{p_3}, \ldots} \; \sum_{i=1}^{|profile(u)|} e_i \]
      [Figure: the profile of user u (Item1, 1; Item2, -1; Item5, 1; …) is the training data of the optimization, which outputs the optimal values (α_{p1}, α_{p2}, α_{p3}, …).]
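The slide does not describe the genetic algorithm itself, so the following is only a generic sketch of how such a search could look, not the authors' implementation. Assumed design choices: leave-one-out classification of each profile item, elitist selection of the better half, uniform crossover, and Gaussian mutation clamped to [0, 1]. All names, the toy profile and the similarity values are hypothetical.

```python
import random

def misclassifications(alpha, profile, sims):
    """Count profile items whose leave-one-out predicted sign differs from the vote.

    sims[p][frozenset((a, b))] is the similarity of movies a and b on property p.
    Normalisation constants are omitted because they do not change the sign.
    """
    errors = 0
    for target, vote in profile.items():
        score = 0.0
        for other, other_vote in profile.items():
            if other == target:
                continue
            pair = frozenset((other, target))
            score += other_vote * sum(a * sims[p].get(pair, 0.0) for p, a in alpha.items())
        predicted = 1 if score > 0 else -1
        errors += int(predicted != vote)
    return errors

def train_alpha(profile, sims, properties, generations=30, pop_size=20, seed=0):
    """Search for alpha weights that minimise the misclassification count."""
    rng = random.Random(seed)

    def fitness(individual):
        return misclassifications(individual, profile, sims)

    population = [{p: rng.random() for p in properties} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = {p: rng.choice((mum[p], dad[p])) for p in properties}
            # Mutation: perturb one weight and clamp it to [0, 1].
            gene = rng.choice(properties)
            child[gene] = min(1.0, max(0.0, child[gene] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

# Toy profile: "starring" separates liked from disliked movies, "director" misleads.
profile = {"A": 1, "B": 1, "C": -1, "D": -1}
sims = {
    "starring": {
        frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.9,
        frozenset(("A", "C")): 0.1, frozenset(("A", "D")): 0.1,
        frozenset(("B", "C")): 0.1, frozenset(("B", "D")): 0.1,
    },
    "director": {frozenset(("A", "C")): 0.8, frozenset(("B", "D")): 0.8},
}
best = train_alpha(profile, sims, ["starring", "director"])
print(best, misclassifications(best, profile, sims))
```

Because the better half of the population is carried over unchanged, the best fitness never gets worse from one generation to the next.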
    • Training the system (ii)
      In some cases (e.g. the new user problem) the user may not have rated any item yet. The user profile is empty, so we cannot learn the α_p coefficients!
      Look at Amazon.com: use Amazon's collaborative results to capture movie similarities.
        • We collected a set of 1000 movies from Amazon.
        • For each of these movies we look at the corresponding recommendation list.
        • We increment the weights α_p associated with the properties that the movie and its first suggestion have in common.
      E.g., for Righteous Kill the first suggestion is Heat. The two movies have actors in common but no directors, hence we can increase the weight of the property starring.
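The increment rule for the cold-start case can be sketched as below. The slide does not say by how much a weight is incremented or whether the weights are normalised, so adding 1 per movie pair and normalising to a sum of 1 are both assumptions; `alpha_from_recommendations` and the toy data are hypothetical.

```python
def alpha_from_recommendations(pairs, graph):
    """Derive property weights from (movie, first_suggestion) pairs.

    graph: {property: {movie: set(resources)}}.
    A property's counter grows by 1 whenever the two movies share a resource
    through it (assumed increment); counters are then normalised (assumption).
    """
    counts = {prop: 0 for prop in graph}
    for movie, suggested in pairs:
        for prop, links in graph.items():
            if links.get(movie, set()) & links.get(suggested, set()):
                counts[prop] += 1
    total = sum(counts.values())
    return {prop: (c / total if total else 0.0) for prop, c in counts.items()}

# Toy data mirroring the example on the slide (the director of Heat is an added fact).
graph = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
    },
    "director": {
        "Righteous Kill": {"John Avnet"},
        "Heat": {"Michael Mann"},
    },
}
pairs = [("Righteous Kill", "Heat")]
alpha = alpha_from_recommendations(pairs, graph)
print(alpha)
```

With more pairs, properties that frequently co-occur between a movie and its first suggestion end up with larger weights.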
    • Experiment settings (i)
      • MovieLens 1M dataset
      • One-to-one mapping between MovieLens and DBpedia, using SPARQL queries and Levenshtein distance
      • 3,654 matched movies out of 3,952
      • Binarization of the 1-5 rating scale:
        \[ profile(u) = \{ \langle m_j, v_j \rangle \mid v_j = 1 \text{ if } r(u, m_j) \ge r_u, \; v_j = -1 \text{ otherwise} \} \]
      • Evaluation goal: Top-N recommendations
      • Metrics: Precision@N and Recall@N, for N = 1, 2, …, 20:
        \[ P@N = \frac{|Rec@N \cap TestSet|}{N}, \qquad R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|} \]
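The binarization and the two metrics can be sketched in Python as follows. Taking the threshold r_u to be the user's mean rating, with "greater than or equal" as the comparison, is an assumption; the ratings, the ranked list and the placeholder movie names are toy data.

```python
def binarize(ratings):
    """Turn one user's 1-5 ratings into +1 / -1 votes.

    Assumption: the threshold r_u is the user's mean rating.
    """
    threshold = sum(ratings.values()) / len(ratings)
    return {movie: (1 if r >= threshold else -1) for movie, r in ratings.items()}

def precision_recall_at_n(ranked, test_set, n):
    """Precision@N and Recall@N of a ranked recommendation list."""
    top_n = set(ranked[:n])
    hits = len(top_n & test_set)
    return hits / n, hits / len(test_set)

# Toy data (hypothetical).
ratings = {"Heat": 5, "Righteous Kill": 3, "Movie C": 1}
votes = binarize(ratings)

ranked = ["Heat", "Movie D", "Righteous Kill", "Movie E"]
test_set = {"Heat", "Righteous Kill"}
for n in (2, 3):
    print(n, precision_recall_at_n(ranked, test_set, n))
```

Precision usually falls and recall rises as N grows, which is why both are reported over the whole range N = 1, …, 20.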
    • Experiment settings (ii)
      Properties: dcterms:subject + skos:broader + DBpedia Ontology + Freebase + LinkedMDB genres
      Extracted graph:
        • 53,840 actors, 18,149 directors, 29,352 distinct writers and 27,035 categories from DBpedia
        • 667 genres from Freebase
        • 26 genres from LinkedMDB
    • Alpha-coefficients evaluation
      The α-coefficients obtained with the genetic algorithm give us the best performance.
    • Property subset evaluation
      • The subject+broader solution is better than subject alone or subject plus further levels of broader; too many broader levels introduce noise.
      • The best solution is achieved with subject+broader+genres.
    • Evaluation against other approaches
      Our solution outperforms a Linked Data approach (LDSD) and other content-based approaches that do not leverage LOD.
    • Conclusion & Future directions
      • The huge amount of data available in Linked Data datasets can be successfully exploited to overcome limited content analysis.
      • We have presented a semantic version of the classical vector space model to compute item similarities.
      • Evaluation against historical datasets, with high values of precision and recall, supports the validity of our approach.
      We are currently working on:
      • Testing the approach in different domains
      • Improving the recommendations with a hybrid approach (content-based and collaborative filtering)
    • Q&A
      We acknowledge partial support of HP IRP 2011. Grant CW267313.