MOVIE RECOMMENDATION WITH DBPEDIA

Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
     mirizzi@deemail.poliba.it, t.dinoia@poliba.it , azzurra.ragone@exprivia.it, ostuni@deemail.poliba.it, disciascio@poliba.it




                                                              Politecnico di Bari
                                                              Via Orabona, 4
                                                              70125 Bari (ITALY)




                                    3rd Italian Information Retrieval Workshop (IIR 2012) – Bari
                                                          January 26, 2012
Outline
 DBpedia: a nucleus for a Web of Open Data
    Social knowledge bases for similarity detection


 Semantic Vector Space Model
    Vector Space Model adapted to RDF graphs


 MORE: More than Movie Recommendation
    Content-based recommendation in action


 Evaluation
    Precision and Recall experiments with MovieLens


 Conclusion

What is Linked Data?

                                                               Linked Data is about using
                                                               the Web to connect related
                                                               data that wasn't previously
                                                               linked, or using the Web to
                                                               lower the barriers to linking
                                                               data currently linked using
                                                               other methods. More
                                                               specifically, Wikipedia defines
                                                               Linked Data as “a term used
                                                               to describe a recommended
                                                               best practice for exposing,
                                                               sharing, and connecting
                                                               pieces of data, information,
                                                               and knowledge on the
                                                               Semantic Web using URIs and
                                                               RDF.”
                                                               [www.linkeddata.org]




DBpedia: a Nucleus for a Web of Data (i)




DBpedia: a Nucleus for a Web of Data (ii)
                                                                         The DBpedia
                                                                         knowledge base
                                                                         currently
                                                                         describes more
                                                                         than 3.64 million
                                                                         things, highly
                                                                         interconnected
                                                                         in the RDF graph.



                                                                         Let’s use all this
                                                                          knowledge to
                                                                           build smarter
                                                                          content-based
                                                                          recommender
                                                                             systems


Social KBs for similarity detection
[Figure: RDF graph around the movies Ocean’s Eleven and Ocean’s Twelve. Other nodes shown: George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, Crime, 2000s crime films, American criminal comedy films, Crime films.]



Semantic Vector Space Model (i)

Quick recap on Vector Space Model

Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index terms $w_{t,d}$ that are positive and non-binary.

\[ \mathbf{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T \]

\[ w_{t,d} = tf_{t,d} \cdot idf_t \]

\[ tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}} \qquad idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|} \]

\[ sim(d_j, q) = \frac{\mathbf{d}_j \cdot \mathbf{q}}{\|\mathbf{d}_j\| \, \|\mathbf{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \cdot w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \cdot \sqrt{\sum_{i=1}^{N} w_{i,q}^2}} \]

[Figure: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
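The recap above can be sketched in a few lines of Python. This is only an illustration of tf-idf weighting and cosine similarity on made-up token lists; the function names and the toy documents are assumptions, and the base-10 logarithm is chosen because it matches the worked numbers later in the deck.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append({
            term: (n / total) * math.log10(n_docs / df[term])
            for term, n in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents (hypothetical, not from the slides).
docs = [
    ["crime", "heist", "casino"],
    ["crime", "heist", "europe"],
    ["romance", "paris"],
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # documents sharing two terms
sim_02 = cosine(vecs[0], vecs[2])  # documents sharing no term
```

Documents that share terms get a similarity strictly between 0 and 1, while documents with no term in common get 0.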


Semantic Vector Space Model (ii)
Vector Space Model applied to RDF graphs

Each resource (movie) is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …).

[Figure: the RDF graph drawn as a tensor. Rows and columns list the resources (Ocean’s Eleven, Ocean’s Twelve, George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal…, Crime); each slice is one property (starring, director, subject/broader, genre). The starring slice is pulled out, with the movies Ocean’s Eleven and Ocean’s Twelve on one axis and the actors Brad Pitt, George Clooney, and Catherine Zeta-Jones on the other.]
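One way to picture the per-property slices is to group RDF triples by property. The sketch below is an assumption about a convenient in-memory layout, not the authors' implementation; the triples are a small hand-written sample and `build_slices` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample triples (subject, property, object) for illustration.
triples = [
    ("Ocean's Eleven", "starring", "George Clooney"),
    ("Ocean's Eleven", "starring", "Brad Pitt"),
    ("Ocean's Twelve", "starring", "George Clooney"),
    ("Ocean's Twelve", "starring", "Brad Pitt"),
    ("Ocean's Twelve", "starring", "Catherine Zeta-Jones"),
    ("Ocean's Eleven", "director", "Steven Soderbergh"),
    ("Ocean's Twelve", "director", "Steven Soderbergh"),
]


def build_slices(triples):
    """Group triples into one sparse slice per property.

    Returns {property: {subject: set(objects)}}, i.e. each property
    gets its own movie-by-resource adjacency structure.
    """
    slices = defaultdict(lambda: defaultdict(set))
    for subj, prop, obj in triples:
        slices[prop][subj].add(obj)
    return slices


slices = build_slices(triples)
# Resources the two movies share in the "starring" slice.
shared = slices["starring"]["Ocean's Eleven"] & slices["starring"]["Ocean's Twelve"]
```

Each slice can then be weighted and compared on its own, which is what the per-property similarities in the following slides rely on.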
Semantic Vector Space Model (iii)
STARRING                            George Clooney [gc]   Catherine Zeta-Jones [czj]   Brad Pitt [bp]
                                    (38 movies)           (22 movies)                  (35 movies)
Ocean’s Eleven [o11] (13 actors)    x                     -                            x
Ocean’s Twelve [o12] (15 actors)    x                     x                            x

\[ w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x} \]

\[ sim_{starring}(o12, o11) = \frac{w_{gc,o12} \cdot w_{gc,o11} + w_{czj,o12} \cdot w_{czj,o11} + w_{bp,o12} \cdot w_{bp,o11}}{\sqrt{w_{gc,o12}^2 + w_{czj,o12}^2 + w_{bp,o12}^2} \cdot \sqrt{w_{gc,o11}^2 + w_{bp,o11}^2}} \]




Semantic Vector Space Model (iv)

\[ w_{gc,o12} = tf_{gc,o12} \cdot idf_{gc} = \frac{1}{15} \cdot \log \frac{49184}{38} = 0.207 \]

\[ w_{gc,o11} = tf_{gc,o11} \cdot idf_{gc} = \frac{1}{13} \cdot \log \frac{49184}{38} = 0.239 \]

\[ w_{czj,o12} = tf_{czj,o12} \cdot idf_{czj} = \frac{1}{15} \cdot \log \frac{49184}{22} = 0.223 \]

\[ w_{czj,o11} = tf_{czj,o11} \cdot idf_{czj} = 0 \cdot \log \frac{49184}{22} = 0 \]

\[ w_{bp,o12} = tf_{bp,o12} \cdot idf_{bp} = \frac{1}{15} \cdot \log \frac{49184}{35} = 0.210 \]

\[ w_{bp,o11} = tf_{bp,o11} \cdot idf_{bp} = \frac{1}{13} \cdot \log \frac{49184}{35} = 0.242 \]

\[ sim(o12, o11) = \alpha_{starring} \cdot sim_{starring}(o12, o11) + \alpha_{genre} \cdot sim_{genre}(o12, o11) + \alpha_{subject} \cdot sim_{subject}(o12, o11) + \ldots \]
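The weights above can be checked numerically. The following sketch reproduces them under two assumptions: the logarithm is base 10 (the choice that matches the figures on the slide), and 49184 is the total number of movies in the collection. As on the slide, only the three actors of the example are used as vector components; names such as `cast` and `weight` are illustrative.

```python
import math

TOTAL_MOVIES = 49184  # numerator of the idf ratio on the slide
movies_per_actor = {"gc": 38, "czj": 22, "bp": 35}
cast = {
    "o11": {"actors": {"gc", "bp"}, "cast_size": 13},
    "o12": {"actors": {"gc", "czj", "bp"}, "cast_size": 15},
}


def weight(actor, movie):
    """tf-idf weight of an actor for a movie in the starring slice."""
    in_cast = actor in cast[movie]["actors"]
    tf = 1 / cast[movie]["cast_size"] if in_cast else 0.0
    idf = math.log10(TOTAL_MOVIES / movies_per_actor[actor])
    return tf * idf


def sim_starring(movie_a, movie_b):
    """Cosine similarity of two movies restricted to the example actors."""
    actors = sorted(movies_per_actor)
    va = [weight(a, movie_a) for a in actors]
    vb = [weight(a, movie_b) for a in actors]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


for actor in ("gc", "czj", "bp"):
    for movie in ("o12", "o11"):
        print(f"w_{actor},{movie} = {weight(actor, movie):.3f}")
print(f"sim_starring(o12, o11) = {sim_starring('o12', 'o11'):.3f}")
```

With these inputs the starring similarity comes out at roughly 0.80; the overall sim(o12, o11) would additionally need the other per-property similarities and the coefficients, which the slide does not list.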


MORE: More than Movie Recommendation
                                                               MORE is a Facebook application
                                                               that semantically recommends
                                                               movies to the user leveraging
                                                               the knowledge within DBpedia.

                                                               MORE supports the user in
                                                               exploratory browsing tasks by
                                                               guiding their search through a
                                                               semantic knowledge space.

                                                               Similarities between movies are
                                                               computed by a Semantic
                                                               version of the classical Vector
                                                               Space Model (sVSM), applied to
                                                               semantic datasets.




    http://apps.facebook.com/movie-recommendation/
Semantic Content-based Recommender
Given a user profile, defined as:

\[ profile(u) = \{ m_j \mid u \text{ likes } m_j \} \]

we compute a similarity between $m_i$ and the information encoded in $profile(u)$:

\[ r(u, m_i) = \frac{\sum_{m_j \in profile(u)} \frac{1}{P} \sum_{p} \alpha_p \cdot sim_p(m_j, m_i)}{|profile(u)|} \]

where $P$ is the number of properties $p$ considered.

If this similarity is greater than or equal to 0.5, we suggest the movie $m_i$ to the user $u$.
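A minimal sketch of this scoring rule follows. It assumes the per-property similarities are available as callables and uses made-up similarity values and coefficients; `score`, `recommend`, and the lookup table are hypothetical names, not part of MORE.

```python
def score(profile, candidate, sims, alphas):
    """r(u, m_i): mean over liked movies of the alpha-weighted,
    property-averaged similarity to the candidate movie."""
    if not profile:
        return 0.0
    n_props = len(alphas)
    total = 0.0
    for liked in profile:
        weighted = sum(alphas[p] * sims[p](liked, candidate) for p in alphas)
        total += weighted / n_props
    return total / len(profile)


def recommend(profile, candidates, sims, alphas, threshold=0.5):
    """Suggest every unseen candidate whose score reaches the threshold."""
    return [
        m for m in candidates
        if m not in profile and score(profile, m, sims, alphas) >= threshold
    ]


# Made-up per-property similarities for illustration only.
table = {
    ("o12", "o11"): {"starring": 0.8, "director": 1.0},
    ("o12", "other"): {"starring": 0.1, "director": 0.0},
}
sims = {
    p: (lambda a, b, p=p: table[(a, b)][p])
    for p in ("starring", "director")
}
alphas = {"starring": 1.0, "director": 1.0}  # hypothetical coefficients

profile = {"o12"}
r_o11 = score(profile, "o11", sims, alphas)
suggested = recommend(profile, ["o11", "other"], sims, alphas)
```

With these toy numbers only `o11` clears the 0.5 threshold.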




Training the system
To identify the best values for the coefficients $\alpha_p$ (i.e., the weights associated with each property), we train the system with a genetic algorithm, adopting N-fold cross-validation (with N = 5) on the MovieLens 100k dataset.

At the end we obtain, for each $\alpha_p$, a set $A_p = \{\alpha_p^1, \ldots, \alpha_p^5\}$ of 5 different values.

[Figure: example values of $A_p$]

Then we evaluate performance with standard precision and recall tests, with $\alpha_p$ set to one of the following:
min($A_p$)       max($A_p$)       avg($A_p$)       median($A_p$)       lowestError($A_p$)
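The five candidate choices can be sketched as below. The fold values and errors are made-up numbers, and reading lowestError as "the value learned in the fold with the lowest misclassification error" is an assumption about the slide's wording.

```python
import statistics


def candidate_values(fold_values, fold_errors):
    """Derive the five candidate settings for one coefficient alpha_p.

    fold_values[i] is the value learned in fold i and fold_errors[i]
    the misclassification error observed in that fold.
    """
    best_fold = min(range(len(fold_values)), key=lambda i: fold_errors[i])
    return {
        "min": min(fold_values),
        "max": max(fold_values),
        "avg": statistics.mean(fold_values),
        "median": statistics.median(fold_values),
        "lowestError": fold_values[best_fold],
    }


# Hypothetical outcome of a 5-fold run for a single property.
fold_values = [0.2, 0.5, 0.3, 0.4, 0.6]
fold_errors = [0.30, 0.25, 0.20, 0.35, 0.28]
candidates = candidate_values(fold_values, fold_errors)
```

Each candidate would then be plugged into the scoring formula in turn and compared on precision and recall.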

Evaluation: Precision & Recall
\[ P@N = \frac{|Rec@N \cap TestSet|}{N} \qquad R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|} \qquad N = 3, 4, 5, 6, 7 \]
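As a small illustration of the two measures, the sketch below computes P@N and R@N for a ranked recommendation list against a test set. The item identifiers are placeholders and the function name is an assumption.

```python
def precision_recall_at_n(ranked, test_set, n):
    """Return (P@N, R@N) for the top-n items of a ranked list."""
    top_n = ranked[:n]
    hits = len(set(top_n) & set(test_set))
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


# Placeholder data: a ranked list and the movies held out for testing.
ranked = ["a", "b", "c", "d", "e", "f", "g"]
test_set = {"a", "c", "x"}

results = {n: precision_recall_at_n(ranked, test_set, n) for n in (3, 4, 5, 6, 7)}
```

In this toy case two of the three test items appear in the top 3, so both measures are 2/3 at N = 3; precision then falls as N grows while recall stays the same.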

                                       The figure shows high values of Precision and Recall.
                                       The best values are obtained by choosing, for the coefficients $\alpha_p$,
                                       the values in $A_p$ with the lowest misclassification error.




We also evaluated the importance of the
subject/broader property. The information carried by this
property is peculiar to ontological datasets.

As shown in the figure, performance drops drastically
if we do not consider this property.


Conclusion & Future directions
- The huge amount of data available on DBpedia can be successfully exploited to
  build content-based recommender systems.

- We have presented MORE, a Facebook application that leverages the knowledge
  within DBpedia to produce movie recommendations by means of a semantic
  version of the classical vector space model (sVSM).

- Evaluation against historical datasets and high values of precision and recall prove
  the validity of our approach.

- We are currently working on:
     - Testing the approach with different domains
     - Improving the recommendation with a hybrid approach (content-based and collaborative filtering)


- We acknowledge partial support of HP IRP 2011. Grant CW267313.



Q?                                           A!




     3rd Italian Information Retrieval Workshop (IIR 2012) – Bari
                           January 26, 2012

More Related Content

Similar to Movie Recommendation with DBpedia - IIR 2012

Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?LIBER Europe
 
Linked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, MuseumsLinked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, Museumsljsmart
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRobert H. McDonald
 
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip finalDeborah McGuinness
 
Semantic web assignment1
Semantic web assignment1Semantic web assignment1
Semantic web assignment1BarryK88
 
Environmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonEnvironmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonAlex Coley
 
What's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsWhat's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsStefan Dietze
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Alexandre Passant
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record LinkagePriya Pandian
 
ChemConnect: Poster for European Combustion Meeting 2017
ChemConnect: Poster for European Combustion Meeting 2017ChemConnect: Poster for European Combustion Meeting 2017
ChemConnect: Poster for European Combustion Meeting 2017Edward Blurock
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...BO TRUE ACTIVITIES SL
 
Linked data and the future of scientific publishing
Linked data and the future of scientific publishingLinked data and the future of scientific publishing
Linked data and the future of scientific publishingBradley Allen
 
Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1Kai Eckert
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU LIBER Europe
 
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebFabrizio Orlandi
 
How Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentHow Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentNikos Loutas
 

Similar to Movie Recommendation with DBpedia - IIR 2012 (20)

Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?Where is the opportunity for libraries in the collaborative data infrastructure?
Where is the opportunity for libraries in the collaborative data infrastructure?
 
Linked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, MuseumsLinked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, Museums
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data Interoperability
 
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
 
Semantic web assignment1
Semantic web assignment1Semantic web assignment1
Semantic web assignment1
 
Data Ownership: Who Owns 'My Data'?
Data Ownership: Who Owns 'My Data'?Data Ownership: Who Owns 'My Data'?
Data Ownership: Who Owns 'My Data'?
 
Environmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonEnvironmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz London
 
090626cc tech-summit
090626cc tech-summit090626cc tech-summit
090626cc tech-summit
 
What's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsWhat's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked Datasets
 
Hello Open World - Semtech 2009
Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record Linkage
 
ChemConnect: Poster for European Combustion Meeting 2017
ChemConnect: Poster for European Combustion Meeting 2017ChemConnect: Poster for European Combustion Meeting 2017
ChemConnect: Poster for European Combustion Meeting 2017
 
Lgd 2
Lgd 2Lgd 2
Lgd 2
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
 
Data 2.0|
Data 2.0|Data 2.0|
Data 2.0|
 
Linked data and the future of scientific publishing
Linked data and the future of scientific publishingLinked data and the future of scientific publishing
Linked data and the future of scientific publishing
 
Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1Metadata Provenance Tutorial at SWIB 13, Part 1
Metadata Provenance Tutorial at SWIB 13, Part 1
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
 
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
 
How Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentHow Linked Data is transforming eGovernment
How Linked Data is transforming eGovernment
 

More from Roku

Recommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRecommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRoku
 
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Roku
 
From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010Roku
 
Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010Roku
 
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010Roku
 
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09Roku
 
Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...Roku
 

More from Roku (7)

Recommender Systems in the Linked Data era
Recommender Systems in the Linked Data eraRecommender Systems in the Linked Data era
Recommender Systems in the Linked Data era
 
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
 
From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010
 
Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010
 
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
 
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
 
Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...
 


Movie Recommendation with DBpedia - IIR 2012

  • 1. MOVIE RECOMMENDATION WITH DBPEDIA Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio mirizzi@deemail.poliba.it, t.dinoia@poliba.it , azzurra.ragone@exprivia.it, ostuni@deemail.poliba.it, disciascio@poliba.it Politecnico di Bari Via Orabona, 4 70125 Bari (ITALY) 3rd Italian Information Retrieval Workshop (IIR 2012) – Bari January 26, 2012
  • 2. Outline: DBpedia: a nucleus for a Web of Open Data (social knowledge bases for similarity detection); Semantic Vector Space Model (Vector Space Model adapted to RDF graphs); MORE: More than Movie Recommendation (content-based recommendation in action); Evaluation (Precision and Recall experiments with MovieLens); Conclusion.
  • 3. What is Linked Data? Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” [www.linkeddata.org]
  • 4. DBpedia: a Nucleus for a Web of Data (i)
  • 5. DBpedia: a Nucleus for a Web of Data (ii) The DBpedia knowledge base currently describes more than 3.64 million things, highly interconnected in the RDF graph. Let’s use all this knowledge to build smarter content-based recommender systems.
  • 6. Social KBs for similarity detection [Figure: RDF graph around the movies Ocean’s Eleven and Ocean’s Twelve, with the resources George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, Crime, Crime films, 2000s crime films, and American criminal comedy films.]
  • 7. Semantic Vector Space Model (i): Quick recap on the Vector Space Model. The Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index-term weights $w_{t,d}$ that are positive and non-binary. [Figure: http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
      $\vec{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$
      $w_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}_t$
      $\mathrm{tf}_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$
      $\mathrm{idf}_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$
      $\mathrm{sim}(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \cdot w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \cdot \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$
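The recap on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `tf_idf_vectors` and `cosine`, the toy documents, and the choice of a base-10 logarithm are assumptions (the slide does not state the base, although the worked numbers on slide 10 are consistent with base 10).

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each document name to a sparse {term: tf * idf} vector.

    docs: dict of document name -> list of terms (toy input, assumed).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[name] = {
            t: (c / total) * math.log10(n_docs / df[t])
            for t, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, only for illustration.
docs = {
    "d1": ["crime", "heist", "casino"],
    "d2": ["crime", "heist", "europe"],
    "d3": ["romance", "paris"],
}
vecs = tf_idf_vectors(docs)
sim_12 = cosine(vecs["d1"], vecs["d2"])  # documents sharing two terms
sim_13 = cosine(vecs["d1"], vecs["d3"])  # documents sharing no term
```

Documents with no term in common get similarity 0, and a document compared with itself gets similarity 1, which is the behaviour the cosine formula above implies.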
  • 8. Semantic Vector Space Model (ii): Vector Space Model applied to RDF graphs. Each resource (movie) is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, director, subject/broader, genre, …). [Figure: the movies Ocean’s Eleven and Ocean’s Twelve; the resources George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, Crime, Crime films, 2000s crime films, American criminal…; and the properties starring, director, subject/broader, genre.]
  • 9. Semantic Vector Space Model (iii)
      STARRING                            George Clooney [gc]   Catherine Z. Jones [czj]   Brad Pitt [bp]
                                          (38 movies)           (22 movies)                (35 movies)
      Ocean’s Eleven [o11] (13 actors)    ✓                                                ✓
      Ocean’s Twelve [o12] (15 actors)    ✓                     ✓                          ✓
      $w_{actor_x, movie_y} = \mathrm{tf}_{actor_x, movie_y} \cdot \mathrm{idf}_{actor_x}$
      $\mathrm{sim}_{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} \cdot w_{gc,o_{11}} + w_{czj,o_{12}} \cdot w_{czj,o_{11}} + w_{bp,o_{12}} \cdot w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \cdot \sqrt{w_{gc,o_{11}}^2 + w_{bp,o_{11}}^2}}$
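  • 10. Semantic Vector Space Model (iv)
      $w_{gc,o_{12}} = \mathrm{tf}_{gc,o_{12}} \cdot \mathrm{idf}_{gc} = \frac{1}{15} \cdot \log \frac{49184}{38} = 0.207$
      $w_{gc,o_{11}} = \mathrm{tf}_{gc,o_{11}} \cdot \mathrm{idf}_{gc} = \frac{1}{13} \cdot \log \frac{49184}{38} = 0.239$
      $w_{czj,o_{12}} = \mathrm{tf}_{czj,o_{12}} \cdot \mathrm{idf}_{czj} = \frac{1}{15} \cdot \log \frac{49184}{22} = 0.223$
      $w_{czj,o_{11}} = \mathrm{tf}_{czj,o_{11}} \cdot \mathrm{idf}_{czj} = 0 \cdot \log \frac{49184}{22} = 0$
      $w_{bp,o_{12}} = \mathrm{tf}_{bp,o_{12}} \cdot \mathrm{idf}_{bp} = \frac{1}{15} \cdot \log \frac{49184}{35} = 0.210$
      $w_{bp,o_{11}} = \mathrm{tf}_{bp,o_{11}} \cdot \mathrm{idf}_{bp} = \frac{1}{13} \cdot \log \frac{49184}{35} = 0.242$
      $\alpha_{starring} \cdot \mathrm{sim}_{starring}(o_{12}, o_{11}) + \alpha_{genre} \cdot \mathrm{sim}_{genre}(o_{12}, o_{11}) + \alpha_{subject} \cdot \mathrm{sim}_{subject}(o_{12}, o_{11}) + \ldots = \mathrm{sim}(o_{12}, o_{11})$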
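The weights on this slide can be re-derived with a short script. It is a sketch under stated assumptions: the logarithm is taken in base 10 (which reproduces the slide's rounded values), the variable names are made up for illustration, and the similarity is restricted to the three actors shown on the slides rather than the full casts, so the resulting `sim_starring` is only illustrative.

```python
import math

TOTAL_MOVIES = 49184  # corpus size appearing in the slide's idf terms
movies_per_actor = {"gc": 38, "czj": 22, "bp": 35}
cast_size = {"o11": 13, "o12": 15}
# Which of the three actors star in each movie (czj has tf = 0 for o11).
starring = {"o11": {"gc", "bp"}, "o12": {"gc", "czj", "bp"}}


def weight(actor, movie):
    """tf * idf weight of an actor for a movie, as defined on the slides."""
    tf = 1 / cast_size[movie] if actor in starring[movie] else 0.0
    idf = math.log10(TOTAL_MOVIES / movies_per_actor[actor])  # base 10 assumed
    return tf * idf


w = {(a, m): weight(a, m) for a in movies_per_actor for m in cast_size}

# Cosine similarity over the "starring" property, three actors only.
dot = sum(w[a, "o12"] * w[a, "o11"] for a in movies_per_actor)
norm_o12 = math.sqrt(sum(w[a, "o12"] ** 2 for a in movies_per_actor))
norm_o11 = math.sqrt(sum(w[a, "o11"] ** 2 for a in movies_per_actor))
sim_starring = dot / (norm_o12 * norm_o11)

for key in sorted(w):
    print(key, round(w[key], 3))
```

Rounded to three decimals, the six weights match the values listed on the slide.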
  • 11. MORE: More than Movie Recommendation MORE is a Facebook application that semantically recommends movies to the user leveraging the knowledge within DBpedia. MORE supports the user in exploratory browsing tasks by guiding their search through a semantic knowledge space. Similarities between movies are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets. http://apps.facebook.com/movie-recommendation/
  • 12. Semantic Content-based Recommender: Given a user profile, defined as $profile(u) = \{ m_j \mid u \text{ likes } m_j \}$, we compute a similarity between $m_i$ and the information encoded in $profile(u)$, with $P$ the number of properties considered:
      $r(u, m_i) = \frac{\sum_{m_j \in profile(u)} \frac{1}{P} \sum_{p} \alpha_p \cdot \mathrm{sim}_p(m_j, m_i)}{|profile(u)|}$
    If this similarity is greater than or equal to 0.5, we suggest the movie $m_i$ to the user $u$.
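The recommendation rule on this slide can be sketched as follows. The function names, the lookup-table similarities, and the movie identifier `x` are hypothetical and only illustrate the formula; in the real system the per-property similarities would come from the sVSM and the weights $\alpha_p$ from training.

```python
def predicted_rating(profile, candidate, sims, alphas):
    """r(u, m_i): mean over liked movies of the alpha-weighted average
    of the per-property similarities to the candidate movie."""
    if not profile:
        return 0.0
    n_props = len(alphas)
    total = 0.0
    for liked in profile:
        total += sum(alphas[p] * sims[p](liked, candidate) for p in alphas) / n_props
    return total / len(profile)


def recommend(profile, candidates, sims, alphas, threshold=0.5):
    """Suggest every candidate not already liked whose score reaches the threshold."""
    return [
        m for m in candidates
        if m not in profile
        and predicted_rating(profile, m, sims, alphas) >= threshold
    ]


def lookup(table):
    """Turn a table keyed by unordered movie pairs into a symmetric similarity."""
    return lambda a, b: table.get(frozenset({a, b}), 0.0)


# Hypothetical similarity values, not taken from the slides.
toy = {
    "starring": {frozenset({"o11", "o12"}): 0.8, frozenset({"o11", "x"}): 0.1},
    "subject": {frozenset({"o11", "o12"}): 0.9, frozenset({"o11", "x"}): 0.2},
}
sims = {p: lookup(t) for p, t in toy.items()}
alphas = {"starring": 1.0, "subject": 1.0}

recs = recommend({"o11"}, ["o12", "x"], sims, alphas)
```

With these toy numbers only `o12` clears the 0.5 threshold for a user who likes `o11`.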
  • 13. Training the system: In order to identify the best possible values for the coefficients $\alpha_p$ (i.e., the weights associated with each property), we train the system via a genetic algorithm, adopting an N-fold cross-validation approach (with N = 5) on the 100k MovieLens dataset. At the end we obtain a set $A_p = \{\alpha_p^{(1)}, \ldots, \alpha_p^{(5)}\}$ of 5 different values for each $\alpha_p$, e.g.: [figure]. Then we evaluate performance with standard precision and recall tests, when $\alpha_p$ is one of the following: $\min(A_p)$, $\max(A_p)$, $\mathrm{avg}(A_p)$, $\mathrm{median}(A_p)$, $\mathrm{lowestError}(A_p)$.
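The five ways of collapsing the per-fold coefficients into a single value can be illustrated with a small sketch. The per-fold values and errors below are invented for the example, and reading `lowestError` as "the coefficient from the fold with the lowest misclassification error" is an assumption about what the slide means.

```python
from statistics import mean, median

# Hypothetical results for one property p: (alpha from fold k, misclassification error).
fold_results = [(0.42, 0.31), (0.55, 0.27), (0.48, 0.29), (0.61, 0.33), (0.50, 0.30)]
alphas = [alpha for alpha, _ in fold_results]

candidates = {
    "min": min(alphas),
    "max": max(alphas),
    "avg": mean(alphas),
    "median": median(alphas),
    # Assumed reading: keep the alpha of the fold with the smallest error.
    "lowestError": min(fold_results, key=lambda r: r[1])[0],
}
```

Each entry of `candidates` is one alternative setting of $\alpha_p$ to be evaluated with precision and recall.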
  • 14. Evaluation: Precision & Recall
      $P@N = \frac{|Rec@N \cap TestSet|}{N}$, $R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$, $N = 3, 4, 5, 6, 7$
    The figure shows high values of Precision and Recall. The best values are obtained by choosing, for the coefficients $\alpha_p$, the values in $A_p$ with the lowest misclassification error. We also evaluated the importance of the subject/broader property. The information carried by this property is specific to ontological datasets. As shown in the figure, performance drops drastically if we do not consider this property.
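The two metrics above translate directly into code. This is a minimal sketch with an assumed function name and made-up item identifiers; `ranked` stands for the recommendation list ordered by score and `test_set` for the held-out movies the user actually liked.

```python
def precision_recall_at_n(ranked, test_set, n):
    """Return (P@N, R@N) for a ranked recommendation list."""
    top_n = ranked[:n]                        # Rec@N
    hits = len(set(top_n) & set(test_set))    # |Rec@N ∩ TestSet|
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


# Hypothetical ranking and test set.
ranked = ["a", "b", "c", "d", "e"]
test_set = {"a", "c", "f"}

p3, r3 = precision_recall_at_n(ranked, test_set, 3)
p5, r5 = precision_recall_at_n(ranked, test_set, 5)
```

Dividing by `n` rather than by the length of the returned list follows the slide's definition, so a list shorter than N is penalised in precision.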
  • 15. Conclusion & Future directions: The huge amount of data available on DBpedia can be successfully exploited to build content-based recommender systems. We have presented MORE, a Facebook application that leverages the knowledge within DBpedia to produce movie recommendations by means of a semantic version of the classical vector space model (sVSM). Evaluation against historical datasets and high values of precision and recall prove the validity of our approach. We are currently working on: testing the approach with different domains; improving the recommendation with a hybrid approach (content-based and collaborative filtering). We acknowledge partial support of HP IRP 2011, Grant CW267313.
  • 16. Q? A!