Dbrec - Music recommendations using DBpedia


Slides of my ISWC2010 talk on dbrec. Shanghai, November 2010.

Published in: Technology


  1. dbrec: Music recommendations using DBpedia. Alexandre Passant - DERI, NUI Galway. In-Use Track @ ISWC2010, 11th November 2010, Shanghai, China
  2. Good news, it doesn't fit anymore in a slide! Many producers, only a few consumers (besides search engines): BBC, Drupal, ...
  3. Agenda
      • Semantic Distance over Linked Data
      • dbrec - architecture, dataset and UI
      • Evaluation
      • Lessons learnt
      • Next steps and conclusion
  4. Semantic Distance
  5. Semantic Distance over Linked Data
      • Relying only on links
      • Relying only on instance data
      • Using dereferenceable URIs
        • And using resources following the LD principles
  6. Linked Data
  7. Linked Data
      [Diagram: example graph with resources e:r1 to e:r4 connected by links e:l1 to e:l3]
  8. G = (R, L, I)
      • R = {r_1, r_2, ..., r_n}
      • L = {l_1, l_2, ..., l_n}
      • I = {i_1, i_2, ..., i_n}
      [Diagram: example graph with resources e:r1 to e:r4 connected by links e:l1 to e:l3]
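
As a rough illustration of slide 8, the sketch below stores a dataset as a set of (resource, link, resource) triples and derives a distance from the number of direct links between two resources. Reading R, L and I as resources, link types and link instances, the edges of the toy graph, and the formula 1 / (1 + links in both directions) are all assumptions made for this example; the slides do not spell out the LDSD formula.

```python
from typing import Set, Tuple

# One link instance: (subject resource, link type, object resource).
Triple = Tuple[str, str, str]

# Toy graph reusing the slide's e:r* / e:l* labels; the edges are invented.
TRIPLES: Set[Triple] = {
    ("e:r1", "e:l1", "e:r2"),
    ("e:r2", "e:l2", "e:r1"),
    ("e:r1", "e:l3", "e:r3"),
    ("e:r4", "e:l3", "e:r3"),
}


def direct_links(triples: Set[Triple], a: str, b: str) -> int:
    """Count link instances going directly from a to b, whatever their type."""
    return sum(1 for s, _, o in triples if s == a and o == b)


def direct_distance(triples: Set[Triple], a: str, b: str) -> float:
    """Assumed distance: 1 / (1 + links a->b + links b->a).

    Two unlinked resources get 1.0; more direct links push the value towards 0.
    """
    return 1.0 / (1 + direct_links(triples, a, b) + direct_links(triples, b, a))
```

With this definition a smaller value means a closer pair, which matches the ordering of the distance table shown later in the deck.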
  9. [Diagram: the example graph (e:r1 to e:r4, e:l1 to e:l3)]
  10. [Diagram: two copies of the example graph]
  11. [Diagram: three copies of the example graph]
  12. [Diagram: four copies of the example graph]
  13. [Diagram: the example graph (e:r1 to e:r4, e:l1 to e:l3)]
  14. LDSD
  15. The LDSD ontology: our own ontology, but it could be mapped to MuSim in the future
  16. dbrec
  17. At a glance
      • A system providing recommendations for all DBpedia bands and artists (±40K) using LDSD
        • And explaining its recommendations
        • Both using Linked Data and Semantic Web standards (RDF, SPARQL)
      • Integrating related Web data for an improved user-experience
  18. Architecture
      [Diagram: (1) Dataset identification, (2) Dataset reducing, (3) LDSD computation, (4) User interface; two "RDF Data" stores]
  19. Dataset
      • Retrieving all artists and bands in DBpedia (±40K)
        • Including incoming / outgoing links
        • Approximately 3M triples
      • Removing datatype properties
        • 2.2M (75%)
      • Merging /ontology and /property
        • 1.7M (55%)
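
The two reduction steps on slide 19 could look roughly like the sketch below. It assumes triples arrive as `(subject, property, object, object_is_literal)` tuples and that "merging /ontology and /property" means rewriting a `/property/` URI to the `/ontology/` URI with the same local name; both the tuple layout and that merge rule are illustrative guesses, not the procedure dbrec actually used.

```python
from typing import Iterable, Set, Tuple

ONTOLOGY_NS = "http://dbpedia.org/ontology/"
PROPERTY_NS = "http://dbpedia.org/property/"

# (subject, property, object, object_is_literal): hypothetical input layout.
RawTriple = Tuple[str, str, str, bool]


def reduce_dataset(triples: Iterable[RawTriple]) -> Set[Tuple[str, str, str]]:
    """Drop literal-valued triples, then fold /property/ URIs into /ontology/."""
    reduced = set()
    for s, p, o, o_is_literal in triples:
        if o_is_literal:
            continue  # datatype property: no link to another resource
        if p.startswith(PROPERTY_NS):
            # Assumed merge rule: same local name under the ontology namespace.
            p = ONTOLOGY_NS + p[len(PROPERTY_NS):]
        reduced.add((s, p, o))  # the set removes duplicates created by the merge
    return reduced
```

Because the result is a set, a link stated once with each namespace collapses into a single triple, which is one plausible reason the triple count drops at the merge step.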
  20. Distribution: 20K+ artists (50%) are not linked to any other artist
  21. Curation
      • 118 properties linking artists together
        • 18 misused, 35 wrongly defined (e.g. dbprop:klfsgProperty)
      • 578 properties linking artists to resources
        • 183 used only once, 36 wrongly defined
      • 767 properties linking resources to artists
        • 336 used only once, 115 wrongly defined
      • Dataset reduced to 1M triples
  22. Computing distance
      • 9,797 minutes
        • Done for all artists in DBpedia
        • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
      • 50M triples
        • Modelled using the LDSD ontology

      Artist            Time (sec.)
      Ramones           25.20
      Johnny Cash       61.16
      U2                50.06
      The Clash         43.34
      Bad Religion      34.98
      The Aggrolites    7.35
      Janis Joplin      23.12
  23. Artist               Distance
      Elvis Presley        0.0978
      June Carter Cash     0.1056
      Willie Nelson        0.1322
      Kris Kristofferson   0.1407
      Bob Dylan            0.1466
      Marty Robbins        0.1673
      Rosanne Cash         0.1782
      Charlie McCoy        0.1836
      Gene Autry           0.1910
      Carl Smith           0.1980
  24. User interface
  25. Sorry, SlideShare people, that's a movie so you won't be able to see it!
  26. Evaluation
  27. Evaluation settings
      • Off-line and on-line user evaluation
        • Using common RecSys metrics
      • 10 subjects
        • 2 women, 8 men
        • 24 to 34 years old
        • 35 to 55 minutes per interview, F2F
  28. Metrics
      • Off-line evaluation - comparison with last.fm
        • 5 artists / bands
        • 2 blind lists, 10 ranked recommendations per list
        • Marks from 1 to 5
      • On-line recommendation - dbrec only
        • 5 artists / bands
        • Browsing recommendations using dbrec
        • Marks from 1 to 5, plus observations and interviews
  29. dbrec vs last.fm
      • Average mark of recommendations
        • 3.37 (±1.19)
        • 3.44 (±1.25) w/ on-line
        • 3.69 (±1.01) for last.fm
  30. Results for the precision (t=X means items are relevant if ranked X or more). Cannot compute recall (implies users know all bands in the system).

      Precision   dbrec (off-line)   dbrec (off+on-line)   last.fm
      t=2         92.05              90.59                 98.32
      t=3         76.63              77.72                 87.91
      t=4         49.06              51.23                 58.05
      t=5         20.09              25                    25.165
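
A minimal sketch of the thresholded precision on slide 30, assuming it is the percentage of evaluated recommendations whose 1-to-5 mark is at least t. The function name and the sample marks are made up for illustration and are not the study's data.

```python
from typing import Sequence


def precision_at_threshold(marks: Sequence[int], t: int) -> float:
    """Percentage of marks that are >= t (an item counts as relevant from t up)."""
    if not marks:
        raise ValueError("at least one mark is required")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Invented marks for ten recommendations, on the slide's 1-to-5 scale.
SAMPLE_MARKS = [5, 4, 3, 2, 1, 4, 3, 5, 2, 3]
```

Raising t can only shrink the set of relevant items, so precision is non-increasing in t, which is the pattern visible in every column of the table.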
  31. Novel recommendations
      • Lots of unknown recommendations
        • 62% for dbrec (59.6% w/ on-line)
        • 40.4% for last.fm
        • But that's good news!
      • Evaluated 274 of them on dbrec
        • 3.05 (±1.09)
  32. Observations
      • Explanations for unknown bands
        • Checked for 198 / 310
      • But also for known ones
        • 24 / 190
      • Helped to understand the recommendation
        • Even if they already knew the band
  33. Interviews

                     User-interface   Explanations
      Enjoyable      9                7
      Useful         9                9
      Enriching      8                10
      Easy to use    10               9
      Confusing      0                2
      Complicated    0                2
      Too geeky      1                6
  34. Lessons learnt
  35. Data quality
      • Issues with DBpedia properties
        • Misused: dbprop:notableInstruments
        • Wrongly defined: dbprop:klfsgProperty
        • Duplicates: /ontology versus /property
      • Requires data curation!
        • Automated and manual
  36. Use, but replicate
      • More and more public SPARQL endpoints
        • Often limited to X max results
        • 5,000 on DBpedia
      • Difficult to use in production
        • Requires local replica
        • But implies synchronisation!
      (Side note: that's fair enough. Hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.)
  37. Use, but replicate

      SELECT ?label
      WHERE {
          ?x rdfs:label ?label .
          { ?x a dbpedia:MusicArtist }
          UNION
          { ?x a dbpedia:Band }
      }
  38. Use, but replicate
      • Names of all DBpedia artists
        • Get number of results w/ COUNT
        • Run n/5000 queries (LIMIT + OFFSET)
        • Recompose results
      • Network errors, etc.
      (Side note: the query had more than 40K results, since most artists have names in several languages. So, many more than 8 queries.)
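
The COUNT, then LIMIT + OFFSET, then recompose recipe can be sketched as follows. The helper only builds the query strings and never contacts an endpoint. The query body is the one from slide 37 with prefix declarations still omitted; the `ORDER BY` clause is an addition assumed here so that pages stay stable between requests, and `paged_queries` is a hypothetical name.

```python
import math
from typing import List

PAGE_SIZE = 5000  # per-query result cap quoted for DBpedia on slide 36

BASE_QUERY = """SELECT ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbpedia:MusicArtist }
    UNION
    { ?x a dbpedia:Band }
}
ORDER BY ?label"""


def paged_queries(total: int, page_size: int = PAGE_SIZE,
                  base: str = BASE_QUERY) -> List[str]:
    """Build one query per page; `total` is the row count from a prior COUNT."""
    pages = math.ceil(total / page_size)
    return [
        f"{base}\nLIMIT {page_size} OFFSET {i * page_size}"
        for i in range(pages)
    ]
```

For exactly 40,000 rows this yields 8 queries; because each artist can carry several labels, the real row count, and therefore the number of pages, is higher. Retrying failed pages (the "network errors" bullet) is left out of the sketch.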
  39. SPARQL, Be quick or be neat
      • "List all artists / bands sharing common property-values with the current one"
        • Fits in a single SPARQL query
        • But does not scale
        • "Optimisation" has to be done manually by splitting the query and recomposing results using an external script
  40. SPARQL, Be quick or be neat
      Tests done in the local RDF store. 1: full query; 2: split by property; 3: split by property-object. Up to 75% faster.

                        Direct SPARQL      Property-slicing    Complete-slicing
      Artist            Queries  Time      Queries  Time       Queries  Time
      Ramones           1        139.97    20       109.51     66       37.84
      Johnny Cash       1        257.81    30       152.60     135      75.35
      U2                1        155.53    22       122.91     70       44.03
      The Clash         1        146.43    20       110.84     79       42.61
      Bad Religion      1        104.08    23       86.49      97       47.35
      The Aggrolites    1        145.92    13       114.52     28       28.33
      Janis Joplin      1        230.88    27       151.00     98       62.81
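
One way to picture "property-slicing" is the sketch below: instead of one query over every property, it emits one small query per property and merges the results in the calling script. The query shape, the function names and the `run_query` callback (any function that takes a query string and returns an iterable of URIs) are assumptions for illustration, not the script used for the measurements above.

```python
from typing import Callable, Iterable, List, Set


def slice_by_property(artist_uri: str, properties: Iterable[str]) -> List[str]:
    """Build one query per property: who shares a value of it with the artist?"""
    return [
        f"SELECT DISTINCT ?other WHERE {{ "
        f"<{artist_uri}> <{p}> ?v . ?other <{p}> ?v . "
        f"FILTER(?other != <{artist_uri}>) }}"
        for p in properties
    ]


def run_sliced(queries: Iterable[str],
               run_query: Callable[[str], Iterable[str]]) -> Set[str]:
    """Run each slice and recompose the results into a single set of URIs."""
    results: Set[str] = set()
    for q in queries:
        results.update(run_query(q))
    return results
```

"Complete-slicing" would go one step further and emit one query per property and object value pair, trading many more round trips for much simpler queries.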
  41. Next steps
  42. Next steps
      • Other data sources
        • FreeBase, MusicBrainz, etc.
      • Distance improvement
        • Propagation, feature selection, etc.
      • User Interface
        • User-friendly explanations
      • LOD-compliance
        • Mapping with other ontologies, SPARQL endpoint
  43. Conclusion
      • Defined and applied a Semantic Distance measure to Linked Data
      • Used it to build an end-user music recommender system, with ±40K artists
      • Evaluated it using RecSys metrics
      • Learnt several domain-independent lessons regarding LOD consumption
  44. Questions?
  45. Contact: alexandre.passant@deri.org - http://apassant.net - @terraces
      Acknowledgements: Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)
      References:
      • AAAI Spring Symposium 2010 - LinkedAI Symposium
      • ESWC2010 - Demo Track
      • ISWC2010 - In-Use Track
  46. Picture credits
      • http://flickr.com/photos/yumlog2/20896759/ by yuki*
      • http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch
      • http://flickr.com/photos/loungerie/2196866243/ by loungerie
      • http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor
      • http://flickr.com/photos/homer4k/461407380/ by homer4k
      • http://flickr.com/photos/jpellgen/2390204986/ by jpellgen
      • http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee
      • http://flickr.com/photos/28509009@N03/2668650475/ by marcreis
      • http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone
