Linked Open Data to support content based Recommender Systems

Transcript

  • 1. LINKED OPEN DATA TO SUPPORT CONTENT-BASED RECOMMENDER SYSTEMS
    Tommaso Di Noia (1), Roberto Mirizzi (2), Vito Claudio Ostuni (1), Davide Romito (1), Markus Zanker (3)
    t.dinoia@poliba.it, roberto.mirizzi@hp.com, ostuni@deemail.poliba.it, romito@deemail.poliba.it, markus.zanker@uni-klu.ac.at
    (1) Politecnico di Bari, Via Orabona, 4, 70125 Bari (ITALY)
    (2) HP Labs, 1501 Page Mill Road, Palo Alto, CA (US) 94304
    (3) Alpen-Adria-Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria
    I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems, September 5-7, 2012, Graz, Austria
  • 2. Outline What are (Content-based) Recommender Systems?  The main drawback: limited content analysis Vector Space Model for Linked Open Data (LOD)  Vector Space Model adapted to RDF graphs A Semantic Content-based Recommender System  A Memory-based algorithm which uses a LOD-based item similarity measure Evaluation  Precision and Recall experiments with MovieLens Conclusion I-SEMANTICS 2012 – 8th Int. Conference on Semantic Systems September 5-7, 2012 Graz, Austria
  • 3. Recommender Systems: a definition
    "Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user." [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
    Input data: a set of users $U = \{u_1, \ldots, u_N\}$, a set of items $I = \{i_1, \ldots, i_M\}$, and the rating matrix $R = [r_{u,i}]$.
    Problem definition: given a user $u$ and a target item $i$, predict the rating $r_{u,i}$.
  • 4. Content-based Recommender Systems
    CB-RSs recommend items to a user based on their description and on the profile of the user's interests (*).
    [Diagram: the user profile (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the items with their descriptions (Item1, Item2, …, Item100) feed the Recommender System, which returns the Top-N recommendations (Item7, Item15, Item11, …).]
    (*) Pazzani, M. J., & Billsus, D. Content-Based Recommendation Systems. The Adaptive Web. Lecture Notes in Computer Science vol. 4321, 325-341, 2007.
  • 5. Main CB RS Drawback: Limited Content Analysis
    No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like (*).
    The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.
    Domain knowledge is needed: we need rich descriptions of the items!
    (*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners.
  • 6. A Linked Data based Solution
    Use Linked Data to mitigate the limited content analysis issue.
    Plenty of structured data is available.
    No Content Analyzer required.
  • 7. LINKED DATA as a structured information source for item descriptions
    Let's use all this ontological knowledge to build smarter CB RSs.
    Rich item descriptions.
  • 8. Computing similarity in LOD datasets
  • 9. Vector Space Model for LOD (i): quick recap on the Vector Space Model
    The Vector Space Model is an algebraic model that represents both text documents and queries as vectors of index-term weights $w_{t,d}$, which are positive and non-binary:
    $\vec{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$
    $w_{t,d} = tf_{t,d} \cdot idf_t$, with $tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$ and $idf_t = \log \frac{|D|}{|\{d \in D : t \in d\}|}$
    $sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \cdot w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \cdot \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$
    [Figure: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
  • 10. Vector Space Model for LOD (ii)
    [Figure: RDF graph fragment around the movies Righteous Kill and Heat. The properties starring, director, genre and subject/broader link the two movies to resources such as Al Pacino, Robert De Niro, Brian Dennehy, John Avnet, Drama, Crime films, Heist films and Serial killer films.]
  • 11. Vector Space Model for LOD (iii)
    For the property STARRING, each movie becomes a vector over the actors Al Pacino ($a_1$), Robert De Niro ($a_2$) and Brian Dennehy ($a_3$). Righteous Kill ($m_1$) is linked to all three actors; Heat ($m_2$) is linked to $a_1$ and $a_2$ only.
    $w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$
    Righteous Kill ($m_1$): $(w_{a_1,m_1}, w_{a_2,m_1}, w_{a_3,m_1})$
    Heat ($m_2$): $(w_{a_1,m_2}, w_{a_2,m_2}, 0)$
  • 12. Vector Space Model for LOD (iv)
    $sim_{starring}(m_1, m_2) = \frac{w_{a_1,m_1} \cdot w_{a_1,m_2} + w_{a_2,m_1} \cdot w_{a_2,m_2} + w_{a_3,m_1} \cdot w_{a_3,m_2}}{\sqrt{w_{a_1,m_1}^2 + w_{a_2,m_1}^2 + w_{a_3,m_1}^2} \cdot \sqrt{w_{a_1,m_2}^2 + w_{a_2,m_2}^2 + w_{a_3,m_2}^2}}$
    $\alpha_{starring} \cdot sim_{starring}(m_1, m_2) + \alpha_{director} \cdot sim_{director}(m_1, m_2) + \alpha_{subject} \cdot sim_{subject}(m_1, m_2) + \ldots = sim(m_1, m_2)$
  • 13. Semantic Content-based Recommender
    Given a user profile, defined as:
    $profile(u) = \{\langle m_j, v_j \rangle \mid v_j = 1 \text{ if } u \text{ likes } m_j,\ v_j = -1 \text{ otherwise}\}$
    we predict the rating using a Nearest Neighbor Classifier in which the similarity measure is a linear combination of local property similarities:
    $\tilde{r}(u, m_i) = \frac{\sum_{m_j \in profile(u)} v_j \cdot \frac{\sum_{p} \alpha_p \cdot sim_p(m_j, m_i)}{P}}{|profile(u)|}$
    where $P$ is the number of properties considered.
  • 14. Training the system (i)
    In order to identify the best possible values for the coefficients $\alpha_p$ (i.e., the weights associated with the properties), we train the system via a genetic algorithm.
    Fitness function: minimize the number of misclassification errors $e_i$ on the training data (the user profile): $\min \sum_{i=1}^{|profile(u)|} e_i$
    [Diagram: the profile of user $u$ (Item1, 1; Item2, -1; Item5, 1; …) is the training data; the optimization returns the optimal values $(\alpha_{p_1}, \alpha_{p_2}, \alpha_{p_3}, \ldots)$.]
  • 15. Training the system (ii)
    In some cases (e.g. the new user problem) the user may not have rated any item yet. The user profile is empty, so we cannot learn the $\alpha_p$ coefficients!
    Look at Amazon.com: use Amazon's collaborative results to capture movie similarities.
    We collected a set of 1000 movies from Amazon. For each of these movies we look at the corresponding recommendation list and, for the first suggestion, increment the weights $\alpha_p$ associated with the properties that the two movies have in common.
    E.g. Righteous Kill and its first suggestion, Heat, have actors in common but no directors. Hence we can increase the weight of the property starring.
  • 16. Experiment settings (i)
    MovieLens 1M dataset.
    One-to-one mapping between MovieLens and DBpedia, using SPARQL queries and Levenshtein distance: 3,654 matched movies out of 3,952.
    Binarization of the 1-5 rating scale: $profile(u) = \{\langle m_j, v_j \rangle \mid v_j = 1 \text{ if } r(u, m_j) \geq r_u,\ v_j = -1 \text{ otherwise}\}$, where $r_u$ is a rating threshold specific to user $u$.
    Evaluation goal: Top-N recommendations.
    Metrics: Precision@N and Recall@N, for $N = 1, 2, \ldots, 20$:
    $P@N = \frac{|Rec@N \cap TestSet|}{N}$, $R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$
  • 17. Experiment settings (ii)
    Properties: dcterms:subject + skos:broader + DBpedia Ontology + Freebase + LinkedMDB genres.
    Extracted graph: 53,840 actors, 18,149 directors, 29,352 distinct writers and 27,035 categories from DBpedia; 667 genres from Freebase; 26 genres from LinkedMDB.
  • 18. Alpha-coefficients evaluation
    The α-coefficients obtained with the genetic algorithm give us the best performance.
  • 19. Property subset evaluation
    The subject+broader solution is better than only subject or subject+more broaders. Too many broaders introduce noise. The best solution is achieved with subject+broader+genres.
  • 20. Evaluation against other approaches
    Our solution outperforms a Linked Data approach (LDSD) and other content-based approaches that do not leverage LOD.
  • 21. Conclusion & Future directions
    The huge amount of data available in Linked Data datasets can be successfully exploited to overcome limited content analysis.
    We have presented a semantic version of the classical vector space model to compute item similarities.
    Evaluation against historical datasets, with high values of precision and recall, demonstrates the validity of our approach.
    We are currently working on: testing the approach with different domains; improving the recommendation with a hybrid approach (content-based and collaborative filtering).
  • 22. Q&A
    We acknowledge partial support of HP IRP 2011, Grant CW267313.
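
The similarity and prediction formulas of slides 9 to 13 can be illustrated with a small Python sketch. It is not the authors' implementation: the toy graph, the entries "Movie X", "Actor X", "Director Y" and "Director Z", the α values, and the use of a binary term frequency (a resource is either linked to a movie or not) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Toy data (hypothetical): property -> movie -> set of linked resources.
# "Movie X", "Actor X", "Director Y" and "Director Z" are placeholders.
GRAPH = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
        "Movie X": {"Actor X"},
    },
    "director": {
        "Righteous Kill": {"John Avnet"},
        "Heat": {"Director Y"},
        "Movie X": {"Director Z"},
    },
}

# Assumed weights; the slides learn them with a genetic algorithm.
ALPHAS = {"starring": 0.7, "director": 0.3}


def tfidf_vectors(movie_to_resources):
    """Build movie -> {resource: weight}, with binary tf and idf = log(M / df)."""
    n_movies = len(movie_to_resources)
    df = defaultdict(int)
    for resources in movie_to_resources.values():
        for resource in resources:
            df[resource] += 1
    return {
        movie: {r: math.log(n_movies / df[r]) for r in resources}
        for movie, resources in movie_to_resources.items()
    }


def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2[r] for r, w in v1.items() if r in v2)
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def property_similarities(graph, m1, m2):
    """Compute sim_p(m1, m2) separately for every property p."""
    sims = {}
    for prop, movie_to_resources in graph.items():
        vectors = tfidf_vectors(movie_to_resources)
        sims[prop] = cosine(vectors.get(m1, {}), vectors.get(m2, {}))
    return sims


def combined_similarity(graph, alphas, m1, m2):
    """Linear combination sum_p alpha_p * sim_p(m1, m2)."""
    sims = property_similarities(graph, m1, m2)
    return sum(alphas[prop] * s for prop, s in sims.items())


def predict(graph, alphas, profile, target):
    """Predicted score for `target`; `profile` maps movie -> +1 (liked) or -1."""
    if not profile:
        return 0.0
    n_props = len(graph)
    total = sum(
        v * combined_similarity(graph, alphas, movie, target) / n_props
        for movie, v in profile.items()
    )
    return total / len(profile)


if __name__ == "__main__":
    print(property_similarities(GRAPH, "Righteous Kill", "Heat"))
    print(predict(GRAPH, ALPHAS, {"Heat": 1}, "Righteous Kill"))
```

Keeping one vector space per property, rather than a single bag of resources, is what lets the α weights express that sharing an actor may matter more or less than sharing a director.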
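
The cold-start heuristic of slide 15 (increment the weight of every property that a movie shares with its first suggested movie) might be sketched as follows. The function name, the toy graph and the final normalization of the counts into weights that sum to 1 are assumptions; the slides only state that the weights are incremented.

```python
# Toy data (hypothetical): property -> movie -> set of linked resources.
# "Director Y" is a placeholder.
TOY_GRAPH = {
    "starring": {
        "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
        "Heat": {"Al Pacino", "Robert De Niro"},
    },
    "director": {
        "Righteous Kill": {"John Avnet"},
        "Heat": {"Director Y"},
    },
}


def cold_start_alphas(pairs, graph):
    """Estimate property weights from (movie, first suggestion) pairs.

    A property's counter is incremented once for every pair whose two movies
    share at least one resource through that property. The counters are then
    normalized to sum to 1 (an assumption, not stated in the slides).
    """
    counts = {prop: 0 for prop in graph}
    for movie, suggestion in pairs:
        for prop, movie_to_resources in graph.items():
            shared = (
                movie_to_resources.get(movie, set())
                & movie_to_resources.get(suggestion, set())
            )
            if shared:
                counts[prop] += 1
    total = sum(counts.values())
    if total == 0:
        # No evidence at all: fall back to uniform weights.
        return {prop: 1.0 / len(graph) for prop in graph}
    return {prop: count / total for prop, count in counts.items()}


if __name__ == "__main__":
    print(cold_start_alphas([("Righteous Kill", "Heat")], TOY_GRAPH))
```

With the single pair from the slide, only starring is shared, so all the weight goes to that property; a realistic run would aggregate over the whole collected movie set.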
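
The Precision@N and Recall@N metrics of slide 16 can be computed as in the sketch below. The function name and the item identifiers are hypothetical, and TestSet is assumed here to hold the user's relevant held-out items.

```python
def precision_recall_at_n(ranked_items, test_set, n):
    """Return (P@N, R@N) for one user.

    ranked_items: recommended items, best first (Rec@N is its first n entries).
    test_set: items of the user's test set, assumed here to be the relevant ones.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = set(ranked_items[:n])
    hits = len(top_n & set(test_set))
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


if __name__ == "__main__":
    ranked = ["Item7", "Item15", "Item11", "Item3"]
    held_out = {"Item7", "Item11", "Item42"}
    for n in (1, 2, 3):
        print(n, precision_recall_at_n(ranked, held_out, n))
```

Dividing precision by N rather than by the number of items actually returned follows the formula on the slide; it penalizes a recommender that returns fewer than N items.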