dbrec
Music recommendations using DBpedia
         Alexandre Passant - DERI, NUI Galway
                  In-Use Track @ ISWC2010
             11th November 2010, Shanghai, China
Good news, it doesn’t fit on a slide anymore!

Many producers, only a few consumers (besides search engines): BBC, Drupal, ...
Agenda

• Semantic Distance over Linked Data
• dbrec - architecture, dataset and UI
• Evaluation
• Lessons learnt
• Next steps and conclusion
Semantic Distance
Semantic Distance over
    Linked Data
• Relying only on links
• Relying only on instance data
• Using dereferenceable URIs
 • And using resources following the LD
    principles
Linked Data
Linked Data
[Figure: example Linked Data graph with four resources (e:r1, e:r2, e:r3, e:r4) connected by typed links (e:l1, e:l2, e:l3)]
G = (R, L, I)
• $R = \{r_1, r_2, \ldots, r_n\}$
• $L = \{l_1, l_2, \ldots, l_n\}$
• $I = \{i_1, i_2, \ldots, i_n\}$
[Figure: the example graph with resources e:r1 to e:r4 and links e:l1 to e:l3]
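To make the G = (R, L, I) notation concrete, here is a minimal Python sketch. The triples, the (link, source, target) layout of a link instance, and the `direct_distance` function are illustrative assumptions: the slides do not list the edges of the example graph, and the function is a simplified stand-in that only counts direct links in both directions, not necessarily the LDSD measure used by dbrec.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    """One element of I: a typed link between two resources (assumed layout)."""
    link: str
    source: str
    target: str


# Hypothetical link instances; the real edges of the slide's graph are not given.
I = {
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
    LinkInstance("e:l2", "e:r2", "e:r1"),
}

# R and L are derived from I: every resource and link type used by an instance.
R = {i.source for i in I} | {i.target for i in I}
L = {i.link for i in I}


def count_direct_links(instances, a, b):
    """Number of link instances going from resource a to resource b."""
    return sum(1 for i in instances if i.source == a and i.target == b)


def direct_distance(instances, a, b):
    """Simplified distance in (0, 1]: more direct links in either direction means closer."""
    links = count_direct_links(instances, a, b) + count_direct_links(instances, b, a)
    return 1.0 / (1.0 + links)
```

With this toy data, `direct_distance(I, "e:r1", "e:r2")` is smaller than `direct_distance(I, "e:r3", "e:r4")`, because the first pair is directly linked and the second is not.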
LDSD
The LDSD ontology

Our own ontology, but could map with MuSim in the future
dbrec
At a glance
• A system providing recommendations for all
  DBpedia bands and artists (±40K) using LDSD
    • And explaining its recommendations
    • Both using Linked Data and Semantic
      Web standards (RDF, SPARQL)
• Integrating related Web data for an improved
  user-experience
Architecture
(1) Dataset identification
(2) Dataset reducing
(3) LDSD computation
(4) User interface
[Diagram: the four stages form a pipeline, with "RDF Data" stores shown between stages]
Dataset
•   Retrieving all artists and bands in DBpedia (±40K)
    •   Including incoming / outgoing links
    •   Approximately 3M triples
•   Removing datatype properties
    •   2.2M (75%)
•   Merging /ontology and /property
    •   1.7M (55%)
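The two reduction steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: triples are plain tuples, literal objects are wrapped in a hypothetical `Literal` type, the sample triples are made up, and the merge direction (rewriting /property/ into /ontology/) is a guess, since the slides only say the two namespaces were merged.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal object value; objects that are plain str are treated as URIs."""
    value: str


PROPERTY_NS = "http://dbpedia.org/property/"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def drop_datatype_triples(triples):
    """Keep only triples whose object is a resource, i.e. links between resources."""
    return [t for t in triples if not isinstance(t[2], Literal)]


def merge_namespaces(triples):
    """Rewrite /property/ predicates to /ontology/ and drop the duplicates this creates.

    The merge direction is an assumption made for this sketch.
    """
    merged = set()
    for s, p, o in triples:
        if p.startswith(PROPERTY_NS):
            p = ONTOLOGY_NS + p[len(PROPERTY_NS):]
        merged.add((s, p, o))
    return merged


# Hypothetical sample data, for illustration only.
triples = [
    ("http://dbpedia.org/resource/Ramones", PROPERTY_NS + "genre",
     "http://dbpedia.org/resource/Punk_rock"),
    ("http://dbpedia.org/resource/Ramones", ONTOLOGY_NS + "genre",
     "http://dbpedia.org/resource/Punk_rock"),
    ("http://dbpedia.org/resource/Ramones", PROPERTY_NS + "name", Literal("Ramones")),
]
reduced = merge_namespaces(drop_datatype_triples(triples))
```

Here the literal-valued triple is dropped first, and the two `genre` statements collapse into one after the namespace merge.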
Distribution

20K+ artists (50%) are not linked to any other artist
Curation
• 118 properties linking artists together
  • 18 mis-used, 35 wrongly defined (e.g.
    dbprop:klfsgProperty)
• 578 properties linking artist to resources
  • 183 used only once, 36 wrongly defined
• 767 properties linking resources to artists
  • 336 used only once, 115 wrongly defined
• Dataset reduced to 1M triples
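Counts such as "used only once" can be obtained by tallying how often each property appears. The Python sketch below shows one way to do that; the triples and the `ex:` names are hypothetical, and whether dbrec dropped such properties automatically or after manual review is not stated in the slides.

```python
from collections import Counter


def properties_used_once(triples):
    """Return the set of properties that appear in exactly one triple."""
    usage = Counter(p for _, p, _ in triples)
    return {p for p, n in usage.items() if n == 1}


def remove_properties(triples, unwanted):
    """Drop every triple whose property is in the unwanted set."""
    return [t for t in triples if t[1] not in unwanted]


# Hypothetical triples (subject, property, object), for illustration only.
triples = [
    ("ex:artistA", "ex:genre", "ex:punk"),
    ("ex:artistB", "ex:genre", "ex:punk"),
    ("ex:artistA", "ex:oddProperty", "ex:something"),
]
rare = properties_used_once(triples)
curated = remove_properties(triples, rare)
```

The same `remove_properties` helper could take a hand-curated list of misused or wrongly defined properties instead of the automatically detected set.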
Computing distance
• 9,797 minutes (done for all artists in DBpedia)
  • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
• 50M triples
  • Modelled using the LDSD ontology

Artist            Time (sec.)
Ramones           25.20
Johnny Cash       61.16
U2                50.06
The Clash         43.34
Bad Religion      34.98
The Aggrolites     7.35
Janis Joplin      23.12
Artist                Distance
Elvis Presley         0.0978
June Carter Cash      0.1056
Willie Nelson         0.1322
Kris Kristofferson    0.1407
Bob Dylan             0.1466
Marty Robbins         0.1673
Rosanne Cash          0.1782
Charlie McCoy         0.1836
Gene Autry            0.1910
Carl Smith            0.1980
User interface
Sorry, slideshare people,
that’s a movie so you
won’t be able to see it !
Evaluation
Evaluation settings
• Off-line and on-line user evaluation
 • Using common RecSys metrics
• 10 subjects
 • 2 women, 8 men
 • 24 to 34 years old
 • 35 to 55 minutes per interview, F2F
Metrics
•   Off-line evaluation - comparison with last.fm
    •   5 artists / bands
    •   2 blind lists, 10 ranked recommendations per list
    •   Marks from 1 to 5
•   On-line recommendation - dbrec only
    •   5 artists / bands
    •   Browsing recommendations using dbrec
    •   Marks from 1 to 5, plus observations and interviews
dbrec vs last.fm

• Average mark of recommendations
 • 3.37(±1.19)
 • 3.44(±1.25) w/ on-line
 • 3.69(±1.01) for last.fm
Precision
Results for the precision (t=X means items are relevant if marked X or more).
Cannot compute recall (implies users know all bands in the system).

        dbrec (off-line)   dbrec (off+on-line)   last.fm
t=2     92.05              90.59                 98.32
t=3     76.63              77.72                 87.91
t=4     49.06              51.23                 58.05
t=5     20.09              25                    25.165
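The threshold-based precision above can be sketched as the share of recommendations whose mark reaches t. The Python snippet below is a minimal illustration; the marks are hypothetical and are not the study's data.

```python
def precision_at_threshold(marks, t):
    """Percentage of recommendations whose mark is at least t."""
    if not marks:
        raise ValueError("marks must not be empty")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Hypothetical marks (1 to 5) given by one subject to ten recommendations.
marks = [5, 4, 4, 3, 3, 3, 2, 2, 1, 5]
for t in (2, 3, 4, 5):
    print(f"t={t}: {precision_at_threshold(marks, t):.2f}")
```

Raising t can only keep or lower the value, which matches the decreasing rows of the table.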
Novel recommendations
• Lots of unknown recommendations
 • 62% for dbrec (59.6% w/ on-line)
 • 40.4% for last.fm
 • But that’s good news!
• Evaluated 274 of them on dbrec
 • 3.05(± 1.09)
Observations
• Explanations for unknown bands
 • Checked for 198 / 310
• But also for known ones
 • 24 / 190
• Helped to understand the recommendation
 • Even if they already knew the band
Interviews
              User-interface   Explanations
Enjoyable     9                7
Useful        9                9
Enriching     8                10
Easy to use   10               9
Confusing     0                2
Complicated   0                2
Too geeky     1                6
Lessons learnt
Data quality
• Issues with DBpedia properties
  • Misused : dbprop:notableInstruments
  • Wrongly defined : dbprop:klfsgProperty
  • Duplicates : /ontology versus /property
• Requires data curation !
  • Automated and manual
Use, but replicate
• More and more public SPARQL endpoints
 • Often limited to X max results
 • 5,000 on DBpedia
• Difficult to use in production
 • Requires local replica
 • But implies synchronisation!

But, that’s fair enough. Hosting a SPARQL endpoint is costly and opening it up fully to anyone would require lots of maintenance, etc.
Use, but replicate
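PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# The dbpedia: namespace below is an assumption; the slide does not declare it
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbpedia:MusicArtist }
    UNION
    { ?x a dbpedia:Band }
}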
Use, but replicate

• Names of all DBpedia artists
 • Get number of results w/ COUNT
 • Run n/5000 queries (LIMIT + OFFSET)
 • Recompose results
• Network errors, etc.

The query had more than 40K results, since most artists have their names in several languages. So it took many more than 8 queries.
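The COUNT, then LIMIT + OFFSET, then recompose procedure can be sketched in Python as below. This is an illustration only: `run_query` is a hypothetical caller-supplied function that sends a query string to an endpoint and returns a list of rows, and the 5,000 page size is the DBpedia limit quoted on the slide.

```python
import math


def paged_queries(base_query, total, page_size=5000):
    """Yield one query per page, appending LIMIT and OFFSET to the base query."""
    pages = math.ceil(total / page_size)
    for page in range(pages):
        yield f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"


def fetch_all(run_query, base_query, total, page_size=5000):
    """Run every page through run_query (hypothetical) and recompose the results."""
    results = []
    for query in paged_queries(base_query, total, page_size):
        results.extend(run_query(query))
    return results
```

For exactly 40,000 results this produces 8 queries, and one more page for every further 5,000 rows. SPARQL does not guarantee a result order without ORDER BY, so the base query would normally include one to keep the pages consistent; retries for network errors are left out of the sketch.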
SPARQL, Be quick or be neat
   • “List all artists / bands sharing common
     property-values with the current one”
     • Fits in a single SPARQL query
     • But does not scale
   • “Optimisation” has to be done manually by
     splitting the query and recomposing results
     using an external script
SPARQL, Be quick or be neat
Tests done in the local RDF store
1: full-query
2: split by property
3: split by property-object
Up to 75% faster

                 Direct SPARQL       Property-slicing    Complete-slicing
                 Queries   Time      Queries   Time      Queries   Time
Ramones          1         139.97    20        109.51    66        37.84
Johnny Cash      1         257.81    30        152.60    135       75.35
U2               1         155.53    22        122.91    70        44.03
The Clash        1         146.43    20        110.84    79        42.61
Bad Religion     1         104.08    23         86.49    97        47.35
The Aggrolites   1         145.92    13        114.52    28        28.33
Janis Joplin     1         230.88    27        151.00    98        62.81
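The three strategies compared above can be illustrated with a small in-memory Python sketch. It is not dbrec's code: `match` is a hypothetical stand-in for a triple-pattern query against the store, the `ex:` triples are made up, and how the real queries were split is not shown in the slides. The point is only that the sliced variants issue several narrow lookups and merge the results in the script, yet return the same artists as the single query.

```python
def match(triples, s=None, p=None, o=None):
    """Tiny stand-in for a triple-pattern query: None acts as a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def direct(triples, seed):
    """Strategy 1: one pass joining the seed's property-values against all triples."""
    pairs = {(p, o) for _, p, o in match(triples, s=seed)}
    return {s for s, p, o in triples if (p, o) in pairs and s != seed}


def property_slicing(triples, seed):
    """Strategy 2: one sub-query per property of the seed, merged in the script."""
    found = set()
    for prop in {p for _, p, _ in match(triples, s=seed)}:
        values = {o for _, _, o in match(triples, s=seed, p=prop)}
        found |= {s for s, _, o in match(triples, p=prop) if o in values}
    found.discard(seed)
    return found


def complete_slicing(triples, seed):
    """Strategy 3: one sub-query per (property, object) pair of the seed."""
    found = set()
    for _, prop, obj in match(triples, s=seed):
        found |= {s for s, _, _ in match(triples, p=prop, o=obj)}
    found.discard(seed)
    return found


# Hypothetical data, for illustration only.
triples = [
    ("ex:a", "ex:genre", "ex:punk"),
    ("ex:a", "ex:label", "ex:sire"),
    ("ex:b", "ex:genre", "ex:punk"),
    ("ex:c", "ex:label", "ex:sire"),
    ("ex:d", "ex:genre", "ex:folk"),
    ("ex:d", "ex:label", "ex:punk"),
]
```

In this sketch the number of sub-queries grows from one, to one per property, to one per property-object pair, which mirrors the "Queries" columns of the table.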
Next steps
Next steps
•   Other data sources
    •   FreeBase, MusicBrainz, etc.
•   Distance improvement
    •   Propagation, feature selection, etc.
•   User Interface
    •   User-friendly explanations
•   LOD-compliance
    •   Mapping with other ontologies, SPARQL endpoint
Conclusion
• Defined and applied a Semantic Distance
  measure to Linked Data
• Used it to build an end-user music
  recommender system, with ±40K artists
• Evaluated it using RecSys metrics
• Learnt several domain-independent lessons
  regarding LOD consumption
Questions ?
Contact:
alexandre.passant@deri.org - http://apassant.net - @terraces

                   Acknowledgements:
   Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)

                       References:
    AAAI Spring Symposium 2010 - LinkedAI Symposium
                 ESWC2010 - Demo Track
                  ISWC2010 - In-Use Track
Pictures credits
•   http://flickr.com/photos/yumlog2/20896759/ by yuki*

•   http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch

•   http://flickr.com/photos/loungerie/2196866243/ by loungerie

•   http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor

•   http://flickr.com/photos/homer4k/461407380/ by homer4k

•   http://flickr.com/photos/jpellgen/2390204986/ by jpellgen

•   http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee

•   http://flickr.com/photos/28509009@N03/2668650475/ by marcreis

•   http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone

"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 

Dbrec - Music recommendations using DBpedia

  • 1. dbrec Music recommendations using DBpedia Alexandre Passant - DERI, NUI Galway In-Use Track @ ISWC2010 11th November 2010, Shanghai, China
  • 2. Good news, it no longer fits on a slide! Many producers, only a few consumers (besides search engines): BBC, Drupal, ...
  • 4. Agenda • Semantic Distance over Linked Data • dbrec - architecture, dataset and UI • Evaluation • Lessons learnt • Next steps and conclusion
  • 6. Semantic Distance over Linked Data • Relying only on links • Relying only on instance data • Using dereferenceable URIs • And using resources following the LD principles
  • 8. Linked Data [Figure: example graph of resources e:r1 to e:r4 connected by links e:l1, e:l2 and e:l3]
  • 9. $G = (R, L, I)$ • $R = \{r_1, r_2, \ldots, r_n\}$ • $L = \{l_1, l_2, \ldots, l_n\}$ • $I = \{i_1, i_2, \ldots, i_n\}$ [Figure: example graph of resources e:r1 to e:r4 connected by links e:l1, e:l2 and e:l3]
  • 10. [Figure: the example graph of resources e:r1 to e:r4 and links e:l1 to e:l3]
  • 11. [Figure: two copies of the example graph]
  • 12. [Figure: three copies of the example graph]
  • 13. [Figure: four copies of the example graph]
  • 14. [Figure: the example graph]
  • 15. LDSD
  • 16. The LDSD ontology • Our own ontology, but it could be mapped to MuSim in the future
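As a rough illustration of the graph model G = (R, L, I) from slide 9, the sketch below reads R as resources, L as typed links and I as link instances (an interpretation; the slide only lists the three sets) and stores I as (link, source, target) records. The distance function is an assumption: it uses a simple unweighted form, 1 / (1 + number of direct links in either direction), which is not spelled out on these slides. The example records are made up and are not the exact graph of the figures.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    link: str    # typed link, an element of L
    source: str  # resource the link starts from, an element of R
    target: str  # resource the link points to, an element of R


# Hypothetical link instances (the set I); not read from the slide figures.
INSTANCES = {
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
    LinkInstance("e:l3", "e:r3", "e:r4"),
}


def direct_links(instances, a, b):
    """Count link instances going from resource a to resource b."""
    return sum(1 for i in instances if i.source == a and i.target == b)


def direct_distance(instances, a, b):
    """Assumed unweighted direct distance: 1 / (1 + links a->b + links b->a)."""
    n = direct_links(instances, a, b) + direct_links(instances, b, a)
    return 1.0 / (1 + n)
```

With this form, two resources with no direct link are at distance 1, and every additional direct link brings them closer to 0. Indirect links and link weighting are left out of the sketch.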
  • 17. dbrec
  • 18. At a glance • A system providing recommendations for all DBpedia bands and artists (±40K) using LDSD • And explaining its recommendations • Both using Linked Data and Semantic Web standards (RDF, SPARQL) • Integrating related Web data for an improved user-experience
  • 19. Architecture • (1) Dataset identification • (2) Dataset reducing • (3) LDSD computation • (4) User interface [Diagram: the four steps in sequence, with "RDF Data" labels between them]
  • 20. Dataset • Retrieving all artists and bands in DBpedia (±40K) • Including incoming / outgoing links • Approximately 3M triples • Removing datatype properties • 2.2M (75%) • Merging /ontology and /property • 1.7M (55%)
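The two reduction steps on this slide (removing datatype properties, merging /ontology and /property) can be sketched as a filter over (subject, predicate, object) triples. This is a minimal sketch under stated assumptions, not the pipeline used for dbrec: literals are detected simply as objects that are not http(s) URIs, and "merging" is interpreted as rewriting a /property predicate to the /ontology predicate with the same local name.

```python
ONTOLOGY_NS = "http://dbpedia.org/ontology/"
PROPERTY_NS = "http://dbpedia.org/property/"


def is_resource(term):
    """Assumption: http(s) URIs are resources, anything else is a literal."""
    return term.startswith("http://") or term.startswith("https://")


def reduce_dataset(triples):
    """Drop triples with literal objects and fold /property into /ontology."""
    reduced = set()
    for subject, predicate, obj in triples:
        if not is_resource(obj):
            continue  # datatype property: the object is a literal value
        if predicate.startswith(PROPERTY_NS):
            # Assumed merge rule: same local name means same property.
            predicate = ONTOLOGY_NS + predicate[len(PROPERTY_NS):]
        reduced.add((subject, predicate, obj))
    return reduced
```

Storing the result in a set removes the duplicates that appear once both namespaces use the same predicate URI.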
  • 21. Distribution 20K+ artists (50%) are not linked to any other artist
  • 22. Curation • 118 properties linking artists together • 18 misused, 35 wrongly defined (e.g. dbprop:klfsgProperty) • 578 properties linking artists to resources • 183 used only once, 36 wrongly defined • 767 properties linking resources to artists • 336 used only once, 115 wrongly defined • Dataset reduced to 1M triples
  • 23. Computing distance • 9,797 minutes (done for all artists in DBpedia) • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10 • 50M triples • Modelled using the LDSD ontology
      Artist            Time (sec.)
      Ramones           25.20
      Johnny Cash       61.16
      U2                50.06
      The Clash         43.34
      Bad Religion      34.98
      The Aggrolites     7.35
      Janis Joplin      23.12
  • 24.
      Artist               Distance
      Elvis Presley        0.0978
      June Carter Cash     0.1056
      Willie Nelson        0.1322
      Kris Kristofferson   0.1407
      Bob Dylan            0.1466
      Marty Robbins        0.1673
      Rosanne Cash         0.1782
      Charlie McCoy        0.1836
      Gene Autry           0.1910
      Carl Smith           0.1980
  • 26. Sorry, SlideShare people, that’s a movie so you won’t be able to see it!
  • 28. Evaluation settings • Off-line and on-line user evaluation • Using common RecSys metrics • 10 subjects • 2 women, 8 men • 24 to 34 years old • 35 to 55 minutes per interview, face to face
  • 29. Metrics • Off-line evaluation - comparison with last.fm • 5 artists / bands • 2 blind lists, 10 ranked recommendations per list • Marks from 1 to 5 • On-line recommendation - dbrec only • 5 artists / bands • Browsing recommendations using dbrec • Marks from 1 to 5, plus observations and interviews
  • 30. dbrec vs last.fm • Average mark of recommendations • 3.37 (±1.19) for dbrec • 3.44 (±1.25) w/ on-line • 3.69 (±1.01) for last.fm
  • 31. Precision • Results for the precision (t=X means items are relevant if rated X or more) • Cannot compute recall (implies users know all bands in the system)
      Threshold   dbrec (off-line)   dbrec (off+on-line)   last.fm
      t=2         92.05              90.59                 98.32
      t=3         76.63              77.72                 87.91
      t=4         49.06              51.23                 58.05
      t=5         20.09              25                    25.165
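The thresholded precision described on this slide can be computed directly from the marks: a recommendation counts as relevant when its mark is at least t. The sketch below assumes that reading; the function name and the sample marks are hypothetical and are not data from the evaluation.

```python
def precision_at_threshold(marks, t):
    """Share (in %) of recommendations whose mark is at least t."""
    if not marks:
        raise ValueError("no marks given")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Hypothetical marks (1 to 5) given to ten recommendations.
marks = [5, 4, 4, 3, 3, 3, 2, 2, 1, 5]
for t in (2, 3, 4, 5):
    print(f"t={t}: {precision_at_threshold(marks, t):.2f}")
```

Precision can only decrease as t grows, which matches the shape of the table above. Recall is not computed, for the reason given on the slide.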
  • 32. Novel recommendations • Lots of unknown recommendations • 62% for dbrec (59.6% w/ on-line) • 40.4% for last.fm • But that’s good news! • Evaluated 274 of them on dbrec • 3.05 (±1.09)
  • 33. Observations • Explanations for unknown bands • Checked for 198 / 310 • But also for known ones • 24 / 190 • Helped to understand the recommendation • Even if they already knew the band
  • 34. Interviews
                    User-interface   Explanations
      Enjoyable     9                7
      Useful        9                9
      Enriching     8                10
      Easy to use   10               9
      Confusing     0                2
      Complicated   0                2
      Too geeky     1                6
  • 36. Data quality • Issues with DBpedia properties • Misused: dbprop:notableInstruments • Wrongly defined: dbprop:klfsgProperty • Duplicates: /ontology versus /property • Requires data curation! • Automated and manual
  • 37. Use, but replicate • More and more public SPARQL endpoints • Often limited to X max results • 5,000 on DBpedia • Difficult to use in production • Requires local replica • But implies synchronisation! [Side note: But that’s fair enough. Hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.]
  • 38. Use, but replicate
      SELECT ?label WHERE {
        ?x rdfs:label ?label .
        { ?x a dbpedia:MusicArtist } UNION { ?x a dbpedia:Band }
      }
  • 39. Use, but replicate • Names of all DBpedia artists • Get number of results w/ COUNT • Run n/5000 queries (LIMIT + OFFSET) • Recompose results • Network errors, etc. [Side note: The query had more than 40K results, since most artists have their names in several languages. So many more than 8 queries were needed.]
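The paging procedure on this slide (count the results, run n/5000 queries with LIMIT and OFFSET, recompose) can be sketched as follows. The query text is the one from slide 38; `run_query` is a hypothetical callable standing in for whatever SPARQL client is used, and retries for the network errors mentioned above are left out.

```python
import math

PAGE_SIZE = 5000  # result cap mentioned for the DBpedia endpoint

BASE_QUERY = """SELECT ?label WHERE {
  ?x rdfs:label ?label .
  { ?x a dbpedia:MusicArtist } UNION { ?x a dbpedia:Band }
}"""


def paged_queries(base_query, total, page_size=PAGE_SIZE):
    """Build one LIMIT/OFFSET query per page needed to cover `total` rows."""
    pages = math.ceil(total / page_size)
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"
        for page in range(pages)
    ]


def fetch_all(run_query, base_query, total, page_size=PAGE_SIZE):
    """Run every page through `run_query` (hypothetical) and merge the rows."""
    rows = []
    for query in paged_queries(base_query, total, page_size):
        rows.extend(run_query(query))
    return rows
```

For exactly 40,000 results this yields 8 queries, and more once labels in several languages push the count up. SPARQL does not guarantee a stable order without ORDER BY, so a production version would likely add one to keep pages from overlapping.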
  • 40. SPARQL, Be quick or be neat • “List all artists / bands sharing common property-values with the current one” • Fits in a single SPARQL query • But does not scale • “Optimisation” has to be done manually by splitting the query and recomposing results using an external script
  • 41. SPARQL, Be quick or be neat • Tests done in the local RDF store • 1: full query (Direct SPARQL) • 2: split by property (Property-slicing) • 3: split by property-object (Complete-slicing) • Up to 75% faster
      Artist           Direct SPARQL      Property-slicing   Complete-slicing
                       Queries  Time      Queries  Time      Queries  Time
      Ramones          1        139.97    20       109.51    66       37.84
      Johnny Cash      1        257.81    30       152.60    135      75.35
      U2               1        155.53    22       122.91    70       44.03
      The Clash        1        146.43    20       110.84    79       42.61
      Bad Religion     1        104.08    23       86.49     97       47.35
      The Aggrolites   1        145.92    13       114.52    28       28.33
      Janis Joplin     1        230.88    27       151.00    98       62.81
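The three strategies compared above can be sketched as query builders. The query shapes are assumptions made for illustration (the slides do not show the actual queries), they only cover outgoing property-values, and `pairs` is a hypothetical list of (property URI, object URI) pairs already known for the current artist.

```python
def full_query(artist):
    """Strategy 1: one query for every artist sharing any property-value."""
    return (
        "SELECT DISTINCT ?other ?p ?o WHERE { "
        f"<{artist}> ?p ?o . ?other ?p ?o . "
        f"FILTER (?other != <{artist}>) }}"
    )


def property_queries(artist, pairs):
    """Strategy 2: one query per distinct property of the artist."""
    properties = sorted({p for p, _ in pairs})
    return [
        "SELECT DISTINCT ?other ?o WHERE { "
        f"<{artist}> <{p}> ?o . ?other <{p}> ?o . "
        f"FILTER (?other != <{artist}>) }}"
        for p in properties
    ]


def property_object_queries(artist, pairs):
    """Strategy 3: one query per distinct (property, object) pair."""
    return [
        "SELECT DISTINCT ?other WHERE { "
        f"?other <{p}> <{o}> . FILTER (?other != <{artist}>) }}"
        for p, o in sorted(set(pairs))
    ]
```

Each step trades one complex query for many simpler ones, and the results then have to be recomposed by an external script, as slide 40 notes.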
  • 43. Next steps • Other data sources • Freebase, MusicBrainz, etc. • Distance improvement • Propagation, feature selection, etc. • User Interface • User-friendly explanations • LOD-compliance • Mapping with other ontologies, SPARQL endpoint
  • 44. Conclusion • Defined and applied a Semantic Distance measure to Linked Data • Used it to build an end-user music recommender system, with ±40K artists • Evaluated it using RecSys metrics • Learnt several domain-independent lessons regarding LOD consumption
  • 46. Contact: alexandre.passant@deri.org - http://apassant.net - @terraces • Acknowledgements: Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2) • References: AAAI Spring Symposium 2010 - LinkedAI Symposium • ESWC2010 - Demo Track • ISWC2010 - In-Use Track
  • 47. Pictures credits • http://flickr.com/photos/yumlog2/20896759/ by yuki* • http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch • http://flickr.com/photos/loungerie/2196866243/ by loungerie • http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor • http://flickr.com/photos/homer4k/461407380/ by homer4k • http://flickr.com/photos/jpellgen/2390204986/ by jpellgen • http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee • http://flickr.com/photos/28509009@N03/2668650475/ by marcreis • http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone