Dbrec - Music recommendations using DBpedia

Slides of my ISWC2010 talk on dbrec. Shanghai, November 2010.

Presentation Transcript

  • dbrec: Music recommendations using DBpedia
    Alexandre Passant, DERI, NUI Galway
    In-Use Track @ ISWC2010, 11th November 2010, Shanghai, China
  • Good news: the Linked Open Data cloud doesn't fit in a slide anymore! Many producers, but only a few consumers (besides search engines): BBC, Drupal, ...
  • Agenda
    • Semantic Distance over Linked Data
    • dbrec: architecture, dataset and UI
    • Evaluation
    • Lessons learnt
    • Next steps and conclusion
  • Semantic Distance
  • Semantic Distance over Linked Data
    • Relying only on links
    • Relying only on instance data
    • Using dereferenceable URIs
      • And using resources following the Linked Data principles
  • Linked Data
  • Linked Data as a graph [diagram: example resources e:r1, e:r2, e:r3 and e:r4 connected by typed links e:l1, e:l2 and e:l3]
  • $G = (R, L, I)$, where $R = \{r_1, r_2, \dots, r_n\}$ is the set of resources, $L = \{l_1, l_2, \dots, l_n\}$ the set of typed links, and $I = \{i_1, i_2, \dots, i_n\}$ the set of link instances between resources
  • [Diagram sequence: the same example graph repeated across several animation frames]
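The slides do not spell out the LDSD formula. As a rough illustration only, the sketch below computes a simplified, unweighted link-based distance in the same spirit: it counts direct links between two resources plus the typed links they share with third resources, and maps the total into (0, 1] as 1 / (1 + count). The counting rule and the name `link_distance` are assumptions, not the published LDSD definition, and the toy graph reuses the e:r / e:l labels of the diagram with made-up edges.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, link, object), e.g. ("e:r1", "e:l1", "e:r2").
Triple = Tuple[str, str, str]


def link_distance(triples: Iterable[Triple], ra: str, rb: str) -> float:
    """Simplified, unweighted link-based distance (an assumption, not LDSD itself).

    Returns 1 / (1 + c), where c counts:
      - direct links between ra and rb, in either direction;
      - (link, target) pairs shared by ra and rb as subjects;
      - (source, link) pairs pointing to both ra and rb.
    More shared links give a smaller distance; unrelated resources get 1.0.
    """
    graph: Set[Triple] = set(triples)
    direct = sum(1 for s, _, o in graph if (s, o) in ((ra, rb), (rb, ra)))
    outgoing_a = {(p, o) for s, p, o in graph if s == ra}
    outgoing_b = {(p, o) for s, p, o in graph if s == rb}
    incoming_a = {(s, p) for s, p, o in graph if o == ra}
    incoming_b = {(s, p) for s, p, o in graph if o == rb}
    shared = len(outgoing_a & outgoing_b) + len(incoming_a & incoming_b)
    return 1.0 / (1 + direct + shared)


# Toy graph: labels from the diagram, edges invented for illustration.
example = [
    ("e:r1", "e:l1", "e:r2"),
    ("e:r1", "e:l2", "e:r3"),
    ("e:r2", "e:l2", "e:r3"),
    ("e:r4", "e:l3", "e:r1"),
    ("e:r4", "e:l3", "e:r2"),
]
print(link_distance(example, "e:r1", "e:r2"))  # 1 direct + 1 shared out + 1 shared in -> 0.25
```

Resources with no direct or shared links end up at the maximum distance of 1.0, which matches the intuition that unconnected resources are unrelated.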
  • LDSD
  • The LDSD ontology: our own ontology, but it could be mapped to MuSim in the future
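The slide does not show the LDSD ontology's terms. Purely as a sketch of how a computed distance could be published as RDF, the function below emits N-Triples for one distance, using a blank node typed as a distance with `from`, `to` and `value` properties. The namespace `http://example.org/ldsd#` and every term in it are hypothetical placeholders, not the actual LDSD ontology.

```python
from typing import List

# Hypothetical namespace and terms: placeholders, NOT the real LDSD ontology URIs.
LDSD_NS = "http://example.org/ldsd#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def distance_to_ntriples(node_id: str, ra: str, rb: str, value: float) -> List[str]:
    """Serialise one distance between resources ra and rb as N-Triples lines."""
    node = f"_:{node_id}"  # blank node standing for the distance itself
    return [
        f"{node} <{RDF_TYPE}> <{LDSD_NS}Distance> .",
        f"{node} <{LDSD_NS}from> <{ra}> .",
        f"{node} <{LDSD_NS}to> <{rb}> .",
        f'{node} <{LDSD_NS}value> "{value}"^^<{XSD_DOUBLE}> .',
    ]


for line in distance_to_ntriples("d1", "http://example.org/a", "http://example.org/b", 0.25):
    print(line)
```

Giving each distance its own node is one way to attach a numeric value to a pair of resources; a mapping to another vocabulary such as MuSim would only change the property URIs.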
  • dbrec
  • At a glance
    • A system providing recommendations for all DBpedia bands and artists (±40K) using LDSD
      • And explaining its recommendations
      • Both using Linked Data and Semantic Web standards (RDF, SPARQL)
    • Integrating related Web data for an improved user experience
  • Architecture: (1) Dataset identification → (2) Dataset reducing → (3) LDSD computation → (4) User interface, with RDF data passed between the stages
  • Dataset
    • Retrieving all artists and bands in DBpedia (±40K)
      • Including incoming / outgoing links
      • Approximately 3M triples
    • Removing datatype properties
      • 2.2M triples (75%)
    • Merging /ontology and /property
      • 1.7M triples (55%)
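A minimal sketch of the two reduction steps on this slide, assuming triples are plain (subject, predicate, object) string tuples in which literals start with a double quote, as in N-Triples. Merging /ontology and /property is done naively here by mapping each http://dbpedia.org/property/ predicate to the http://dbpedia.org/ontology/ predicate with the same local name; the mapping actually used for dbrec is not described on the slide, and local names do not always match between the two namespaces.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

ONTOLOGY_NS = "http://dbpedia.org/ontology/"
PROPERTY_NS = "http://dbpedia.org/property/"


def is_literal(term: str) -> bool:
    """Assumption: literals are written N-Triples style, starting with a quote."""
    return term.startswith('"')


def reduce_dataset(triples: Iterable[Triple]) -> Set[Triple]:
    reduced: Set[Triple] = set()
    for s, p, o in triples:
        # Step 1: drop datatype-property statements (objects that are literals).
        if is_literal(o):
            continue
        # Step 2: fold /property predicates into /ontology by local name (naive).
        if p.startswith(PROPERTY_NS):
            p = ONTOLOGY_NS + p[len(PROPERTY_NS):]
        # The set collapses statements that became identical after merging.
        reduced.add((s, p, o))
    return reduced
```

Because the result is a set, a fact stated under both namespaces is kept only once, which is where the merging step saves triples.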
  • Distribution: 20K+ artists (50%) are not linked to any other artist
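The 50% figure is the share of artists with no artist-to-artist link at all. A sketch of that count, assuming the artist set and the triples are already loaded in memory; the function name `isolated_share` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def isolated_share(artists: Set[str], triples: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of artists that neither link to nor are linked from another artist."""
    if not artists:
        raise ValueError("empty artist set")
    linked: Set[str] = set()
    for s, _, o in triples:
        # Only links between two distinct artists count; other resources are ignored.
        if s in artists and o in artists and s != o:
            linked.update((s, o))
    return (len(artists) - len(linked)) / len(artists)
```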
  • Curation
    • 118 properties linking artists together
      • 18 misused, 35 wrongly defined (e.g. dbprop:klfsgProperty)
    • 578 properties linking artists to resources
      • 183 used only once, 36 wrongly defined
    • 767 properties linking resources to artists
      • 336 used only once, 115 wrongly defined
    • Dataset reduced to 1M triples
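The automated side of this curation can be sketched as a usage count per property: properties used only once are flagged for manual review, and a manually maintained blacklist of misused or wrongly defined properties (such as dbprop:klfsgProperty) is then filtered out. Both function names and the blacklist set are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def rarely_used(triples: Iterable[Triple], min_uses: int = 2) -> List[str]:
    """Properties used fewer than `min_uses` times, sorted, as candidates for review."""
    usage = Counter(p for _, p, _ in triples)
    return sorted(p for p, n in usage.items() if n < min_uses)


def drop_properties(triples: Iterable[Triple], blacklist: Set[str]) -> List[Triple]:
    """Remove every statement whose property was rejected during manual review."""
    return [t for t in triples if t[1] not in blacklist]
```

Keeping the two steps separate mirrors the "automated and manual" split: the counts only suggest candidates, and a person decides what goes on the blacklist.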
  • Computing distance
    • 9,797 minutes
      • Done for all artists in DBpedia
      • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
    • 50M triples
      • Modelled using the LDSD ontology
    Artist            Time (sec.)
    Ramones                 25.20
    Johnny Cash             61.16
    U2                      50.06
    The Clash               43.34
    Bad Religion            34.98
    The Aggrolites           7.35
    Janis Joplin            23.12
  • Artist                Distance
    Elvis Presley         0.0978
    June Carter Cash      0.1056
    Willie Nelson         0.1322
    Kris Kristofferson    0.1407
    Bob Dylan             0.1466
    Marty Robbins         0.1673
    Rosanne Cash          0.1782
    Charlie McCoy         0.1836
    Gene Autry            0.1910
    Carl Smith            0.1980
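The list is ordered by increasing distance, so the closest artists come first. Producing such a list from precomputed distances is a sort; the sketch below uses the values from the table and breaks ties by name. The function name `top_k` is an assumption.

```python
from typing import Dict, List


def top_k(distances: Dict[str, float], k: int) -> List[str]:
    """Return the k candidates with the smallest distance (ties broken by name)."""
    return sorted(distances, key=lambda artist: (distances[artist], artist))[:k]


distances = {
    "Bob Dylan": 0.1466,
    "Carl Smith": 0.1980,
    "Charlie McCoy": 0.1836,
    "Elvis Presley": 0.0978,
    "Gene Autry": 0.1910,
    "June Carter Cash": 0.1056,
    "Kris Kristofferson": 0.1407,
    "Marty Robbins": 0.1673,
    "Rosanne Cash": 0.1782,
    "Willie Nelson": 0.1322,
}
print(top_k(distances, 3))  # ['Elvis Presley', 'June Carter Cash', 'Willie Nelson']
```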
  • User interface
  • Sorry, SlideShare people: that's a movie, so you won't be able to see it!
  • Evaluation
  • Evaluation settings
    • Off-line and on-line user evaluation
      • Using common RecSys metrics
    • 10 subjects
      • 2 women, 8 men
      • 24 to 34 years old
      • 35 to 55 minutes per interview, face-to-face
  • Metrics
    • Off-line evaluation: comparison with last.fm
      • 5 artists / bands
      • 2 blind lists, 10 ranked recommendations per list
      • Marks from 1 to 5
    • On-line evaluation: dbrec only
      • 5 artists / bands
      • Browsing recommendations using dbrec
      • Marks from 1 to 5, plus observations and interviews
  • dbrec vs last.fm
    • Average mark of recommendations
      • 3.37 (±1.19) for dbrec (off-line)
      • 3.44 (±1.25) for dbrec (off-line + on-line)
      • 3.69 (±1.01) for last.fm
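Each figure is an average mark with its spread. A sketch of how such a summary can be computed from raw 1 to 5 marks; the slide does not say whether the spread is a sample or a population standard deviation, so using the sample version (`statistics.stdev`) here is an assumption, as is the function name `summarise`.

```python
from statistics import mean, stdev
from typing import Sequence


def summarise(marks: Sequence[int]) -> str:
    """Format marks as 'mean (±sd)' with two decimals; needs at least two marks."""
    if len(marks) < 2:
        raise ValueError("need at least two marks for a standard deviation")
    return f"{mean(marks):.2f} (±{stdev(marks):.2f})"


print(summarise([1, 3, 5]))  # 3.00 (±2.00)
```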
  • Results for the precision (t=X means items are relevant if marked X or more)
    • Cannot compute recall (it implies users know all bands in the system)
    Precision (%)   dbrec (off-line)   dbrec (off+on-line)   last.fm
    t=2             92.05              90.59                 98.32
    t=3             76.63              77.72                 87.91
    t=4             49.06              51.23                 58.05
    t=5             20.09              25                    25.165
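Precision at threshold t is the share of recommended items whose mark is at least t. A minimal sketch (the function name `precision_at` and the sample marks are hypothetical); recall is left out because, as the slide notes, it would need the full set of relevant artists in the system, which the subjects cannot be expected to know.

```python
from typing import Sequence


def precision_at(marks: Sequence[int], t: int) -> float:
    """Percentage of recommendations marked t or higher (1 to 5 scale)."""
    if not marks:
        raise ValueError("no marks")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


marks = [5, 4, 4, 3, 2, 1, 5, 3, 2, 4]  # illustrative marks for one ranked list of 10
print([precision_at(marks, t) for t in (2, 3, 4, 5)])  # [90.0, 70.0, 50.0, 20.0]
```

Raising t can only keep or lower the precision, which is why each column of the table decreases from t=2 to t=5.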
  • Novel recommendations
    • Lots of unknown recommendations
      • 62% for dbrec (59.6% with on-line)
      • 40.4% for last.fm
      • But that's good news!
    • Evaluated 274 of them on dbrec
      • Average mark: 3.05 (±1.09)
  • Observations
    • Explanations checked for unknown bands
      • 198 of 310
    • But also for known ones
      • 24 of 190
    • Explanations helped to understand the recommendation
      • Even if the subjects already knew the band
  • Interviews (number of subjects, out of 10)
                    User interface   Explanations
    Enjoyable       9                7
    Useful          9                9
    Enriching       8                10
    Easy to use     10               9
    Confusing       0                2
    Complicated     0                2
    Too geeky       1                6
  • Lessons learnt
  • Data quality
    • Issues with DBpedia properties
      • Misused: dbprop:notableInstruments
      • Wrongly defined: dbprop:klfsgProperty
      • Duplicates: /ontology versus /property
    • Requires data curation!
      • Automated and manual
  • Use, but replicate
    • More and more public SPARQL endpoints
      • Often limited to a maximum number of results (5,000 on DBpedia)
      • But that's fair enough: hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.
    • Difficult to use in production
      • Requires a local replica
      • But implies synchronisation!
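  • Use, but replicate
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbpedia: <http://dbpedia.org/ontology/>
    # Labels of all DBpedia artists and bands (DBpedia ontology classes)
    SELECT ?label
    WHERE {
      ?x rdfs:label ?label .
      { ?x a dbpedia:MusicalArtist } UNION { ?x a dbpedia:Band }
    }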
  • Use, but replicate
    • Names of all DBpedia artists
      • Get the number of results with COUNT
      • Run n/5000 queries (LIMIT + OFFSET)
      • Recompose the results
    • Network errors, etc.
    • The query had more than 40K results, since most artists have their names in several languages, so it took many more than the 8 queries (40K / 5,000) one might expect
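The paging procedure on this slide can be sketched as follows. `run` stands for whatever function sends a query string to the endpoint and returns one page of labels; it is a hypothetical parameter, as are the retry policy and the class URIs (assumed here from the DBpedia ontology namespace). An ORDER BY clause is added because the SPARQL specification only makes LIMIT/OFFSET slices predictable over an ordered result.

```python
import math
from typing import Callable, List

PAGE_SIZE = 5000  # result cap of the DBpedia endpoint, as quoted on the slides

# Class URIs assumed from the DBpedia ontology namespace.
BASE_QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?label WHERE {
  ?x rdfs:label ?label .
  { ?x a dbo:MusicalArtist } UNION { ?x a dbo:Band }
}
ORDER BY ?label"""


def page_queries(total: int, page_size: int = PAGE_SIZE) -> List[str]:
    """One LIMIT/OFFSET query per page: ceil(total / page_size) queries in all."""
    pages = math.ceil(total / page_size)
    return [f"{BASE_QUERY}\nLIMIT {page_size} OFFSET {i * page_size}" for i in range(pages)]


def fetch_all(total: int, run: Callable[[str], List[str]], retries: int = 3) -> List[str]:
    """Run every page query, retrying on network errors, and concatenate the pages."""
    results: List[str] = []
    for query in page_queries(total):
        for attempt in range(retries):
            try:
                results.extend(run(query))
                break
            except OSError:  # ConnectionError and TimeoutError are OSError subclasses
                if attempt == retries - 1:
                    raise
    return results
```

Here `total` is the figure returned by the preliminary COUNT query; retrying a single page rather than the whole download keeps a transient network error from discarding pages already fetched.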
  • SPARQL: be quick or be neat
    • “List all artists / bands sharing common property-values with the current one”
      • Fits in a single SPARQL query
      • But does not scale
    • “Optimisation” has to be done manually, by splitting the query and recomposing the results with an external script
  • SPARQL: be quick or be neat
    • Tests done in the local RDF store
      • 1: full query (Direct SPARQL)
      • 2: split by property (Property-slicing)
      • 3: split by property-object (Complete-slicing)
    • Up to 75% faster
                      Direct SPARQL     Property-slicing   Complete-slicing
    Artist            Queries  Time     Queries  Time      Queries  Time
    Ramones           1        139.97   20       109.51    66       37.84
    Johnny Cash       1        257.81   30       152.60    135      75.35
    U2                1        155.53   22       122.91    70       44.03
    The Clash         1        146.43   20       110.84    79       42.61
    Bad Religion      1        104.08   23       86.49     97       47.35
    The Aggrolites    1        145.92   13       114.52    28       28.33
    Janis Joplin      1        230.88   27       151.00    98       62.81
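The complete-slicing strategy (one small query per property-value pair of the current artist, then recomposition in an external script) can be sketched as below. `run` is a hypothetical stand-in for the client of the local RDF store, the query template is illustrative, and only outgoing property-values of the current artist are considered here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def slice_query(prop: str, obj: str) -> str:
    """One small SPARQL query for a single (property, value) pair of the current artist."""
    return f"SELECT DISTINCT ?artist WHERE {{ ?artist <{prop}> <{obj}> . }}"


def shared_values(
    seed: str,
    seed_pairs: Iterable[Tuple[str, str]],
    run: Callable[[str], Iterable[str]],
) -> List[Tuple[str, int]]:
    """Recompose slice results: artists ranked by how many property-values they share with seed."""
    counts: Counter = Counter()
    for prop, obj in seed_pairs:
        for artist in run(slice_query(prop, obj)):
            if artist != seed:  # the seed trivially matches its own values
                counts[artist] += 1
    return counts.most_common()
```

Each slice is a cheap lookup for the store, and the Counter performs the grouping that the single large query would otherwise do inside the store.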
  • Next steps
  • Next steps
    • Other data sources
      • Freebase, MusicBrainz, etc.
    • Distance improvement
      • Propagation, feature selection, etc.
    • User interface
      • User-friendly explanations
    • LOD compliance
      • Mapping with other ontologies, SPARQL endpoint
  • Conclusion
    • Defined and applied a Semantic Distance measure to Linked Data
    • Used it to build an end-user music recommender system, with ±40K artists
    • Evaluated it using RecSys metrics
    • Learnt several domain-independent lessons regarding LOD consumption
  • Questions ?
  • Contact: alexandre.passant@deri.org - http://apassant.net - @terraces
    Acknowledgements: Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)
    References:
      • AAAI Spring Symposium 2010 - LinkedAI Symposium
      • ESWC2010 - Demo Track
      • ISWC2010 - In-Use Track
  • Picture credits
    • http://flickr.com/photos/yumlog2/20896759/ by yuki*
    • http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch
    • http://flickr.com/photos/loungerie/2196866243/ by loungerie
    • http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor
    • http://flickr.com/photos/homer4k/461407380/ by homer4k
    • http://flickr.com/photos/jpellgen/2390204986/ by jpellgen
    • http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee
    • http://flickr.com/photos/28509009@N03/2668650475/ by marcreis
    • http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone