Mapping, Interlinking and Exposing MusicBrainz as Linked Data


Slides from my keynote at the 1st International Workshop on Semantic Music and Media (SMAM2013)

Published in: Technology, Education

  1. Mapping, Interlinking and Exposing MusicBrainz as Linked Data
     1st International Workshop on Semantic Music and Media (SMAM2013)
     Sydney, Oct 21, 2013
     Peter Haase
  2. What this talk is about
     A Linked Data Perspective
     Graph relations shown: worksOn, publishedTo, affiliation, affiliation (previous), isAbout, builtWith, participatesIn
  3. EUCLID: EdUcational Curriculum for the usage of Linked Data
     Course, eBook
     Other channels: @euclid_project, euclidproject, euclidproject
  4. EUCLID Scenario
     Layers: Data acquisition, LD Dataset, Access, Application
     Sources: Streaming providers, Downloads, Musical Content, Metadata, Other content
     Components: Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, Integrated Dataset, Vocabulary Mapping, Interlinking, Cleansing, Publishing, SPARQL Endpoint, RDFa, Analysis & Mining Module, Visualization Module
  5. MusicBrainz
     • MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
     • MusicBrainz aims to be:
       • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
       • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.
     • Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone, including you, to participate and contribute.
     • MusicBrainz is operated by the MetaBrainz Foundation, dedicated to keeping MusicBrainz free and open source.
  6. Publishing Relational Databases as RDF: W3C RDB2RDF
     Task: Publish data from a relational DBMS as Linked Data
     Approach: Map from the relational schema to a semantic vocabulary with R2RML
     Publishing: two alternatives
     • Translate SPARQL into SQL on the fly
     • Batch transform data into RDF, infer, index, integrate and provide SPARQL access in a triplestore
     Diagram components: Data acquisition, Relational DBMS, R2RML Engine, LD Dataset, Integrated Data in Triplestore, Vocabulary Mapping, Interlinking, Cleansing, Access, Publishing, SPARQL Endpoint
  7. Publishing MusicBrainz
     MusicBrainz DB (https://…tion_Schema)
     Music Ontology (http://…)
     R2RML
     Concrete example mapping: Table Recording(gid, length), R2RML Mapping, Ontology concept mo:recording
  8. MusicBrainz Next Gen Schema
     artist: as pre-NGS, but with further attributes
     artist_credit: allows joint credit
     release_group: cf. 'album', versus:
     • work
     • release
     • track
     • medium
     • tracklist
     • recording
  9. Music Ontology
     OWL ontology with the following core concepts (classes) and relationships (properties):
  10. R2RML Class Mapping
      Mapping tables to classes is 'easy':

      lb:Artist a rr:TriplesMap ;
        rr:logicalTable [rr:tableName "artist"] ;
        rr:subjectMap
          [rr:class mo:MusicArtist ;
           rr:template "{gid}#_"] ;
        rr:predicateObjectMap
          [rr:predicate mo:musicbrainz_guid ;
           rr:objectMap [rr:column "gid" ;
                         rr:datatype xsd:string]] .
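
To make the effect of such a class mapping concrete, here is a minimal Python sketch of the batch-transformation route, not the R2RML engine used in the talk. It reads rows from an in-memory SQLite table named `artist` and emits the two triples per row that the mapping describes. The base IRI `http://example.org/artist/` and the sample `gid` value are assumptions for illustration only, since the IRI prefix of the slide's `rr:template` did not survive in the transcript.

```python
import sqlite3

# Hypothetical base IRI: the real prefix of the slide's rr:template is not in the transcript.
BASE = "http://example.org/artist/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def artist_triples(conn):
    """Yield N-Triples lines for every row of the artist table."""
    for (gid,) in conn.execute("SELECT gid FROM artist ORDER BY gid"):
        # rr:template: substitute the gid column into the subject IRI
        subject = f"<{BASE}{gid}#_>"
        # rr:class: every row becomes an instance of mo:MusicArtist
        yield f"{subject} <{RDF_TYPE}> <{MO}MusicArtist> ."
        # rr:column + rr:datatype: the gid as a typed string literal
        yield f'{subject} <{MO}musicbrainz_guid> "{gid}"^^<{XSD_STRING}> .'


# Sample data (made up) standing in for the MusicBrainz artist table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (gid TEXT)")
conn.execute("INSERT INTO artist VALUES ('0000-aaaa')")

triples = list(artist_triples(conn))
for line in triples:
    print(line)
```

Each table row yields one `rdf:type` triple and one literal triple, which is why a plain table-to-class mapping stays simple compared with the join-based mappings on later slides.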
  11. R2RML Property Mapping
      Mapping columns to properties can be easy:

      lb:artist_name a rr:TriplesMap ;
        rr:logicalTable [rr:sqlQuery
          """SELECT artist.gid,
             FROM artist
             INNER JOIN artist_name ON ="""] ;
        rr:subjectMap [rr:template "{gid}#_"] ;
        rr:predicateObjectMap
          [rr:predicate foaf:name ;
           rr:objectMap [rr:column "name"]] .
  12. NGS Advanced Relations
      Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
      Each pairing of instances refers to a Link
      Links have types (cf. RDF properties) and attributes
  13. R2RML Mapping Editor
      R2RML: expose data from a relational DBMS as RDF / via a SPARQL endpoint
      Problem: R2RML mappings are hard to create
      R2RML Editing Made Easy!
      • Hides vocabulary intricacies from the end user
      • Access to metadata about relational databases
      • Preview of generated triples and SQL queries
      • Very expressive (supports most of R2RML)
      Diagram components: Relational Database, R2RML Mappings, R2RML Engine, SPARQL Endpoint
      See our R2RML Mapping Editor in the ISWC Demo Session on Wednesday!
  14. Scale
      MusicBrainz RDF derived via R2RML: 150M Triples

      lb:artist_member a rr:TriplesMap ;
        rr:logicalTable [rr:sqlQuery
          """SELECT a1.gid, a2.gid AS band
             FROM artist a1
               INNER JOIN l_artist_artist ON = l_artist_artist.entity0
               INNER JOIN link ON =
               INNER JOIN link_type ON link_type =
               INNER JOIN artist a2 on l_artist_artist.entity1 =
             WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
        rr:subjectMap [rr:template "{gid}#_"] ;
        rr:predicateObjectMap
          [rr:predicate mo:member_of ;
           rr:objectMap [rr:template "{band}#_" ;
                         rr:termType rr:IRI]] .
  15. Some Statistics – RDF Dump

      (Lead) Table      Triples        Time (s)
      area              59798          2
      artist            36868228       423
      dbpedia           172017         13
      label             201832         3
      medium            18069143       163
      recording         11400354       209
      release_group     3050818        31
      release           9764887        151
      track             75506495       794
      work              1728955        20
      Total             156822527      1809
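
As a quick cross-check of the table, the Python sketch below recomputes the totals and derives an overall throughput figure. The numbers are copied from the slide. The throughput calculation is an added illustration rather than something reported in the talk, and it assumes the per-table times add up to the full dump time.

```python
# (lead table) -> (triples, time in seconds), copied from the slide
stats = {
    "area": (59798, 2),
    "artist": (36868228, 423),
    "dbpedia": (172017, 13),
    "label": (201832, 3),
    "medium": (18069143, 163),
    "recording": (11400354, 209),
    "release_group": (3050818, 31),
    "release": (9764887, 151),
    "track": (75506495, 794),
    "work": (1728955, 20),
}

total_triples = sum(triples for triples, _ in stats.values())
total_seconds = sum(seconds for _, seconds in stats.values())

# Average number of triples generated per second over the whole dump
throughput = total_triples / total_seconds

# Lead table that takes longest to transform
slowest = max(stats, key=lambda name: stats[name][1])

print(f"total triples: {total_triples}")
print(f"total time:    {total_seconds} s")
print(f"throughput:    {throughput:.0f} triples/s")
print(f"slowest table: {slowest}")
```

The recomputed sums match the last row of the table, and the `track` table accounts for both the most triples and the most time.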
  16. Information Workbench
      Platform for Linked Data Applications
      § Semantics- & Linked Data-based integration of private and public data sources based on data providers
        • Generic and specific providers for various data formats and sources
        • Supports established mapping frameworks (e.g. R2RML, SILK, …)
        • Named graphs for managing contexts and provenance
      § Intelligent Data Access and Analytics
        • Flexible self-service UI
        • Visualization, exploration, dashboarding and reporting
        • Semantic search
      § Collaboration and knowledge management
        • Curation & authoring
        • Collaborative workflows
      § Open standards and technologies
        • Semantic Wiki based frontend (using SMW syntax)
        • Supporting W3C standards (OWL, RDF, SPARQL, …)
        • Community Edition (Open Source) + Enterprise Edition (Commercial)
  17. Realization within the Information Workbench Architecture
      • Customized application solutions
      • Reusable UI and data integration components
      • Data storage and management platform
      • External resources to reuse data and create mashups
  18. The "MusicBrainz Explorer" Application
      Building blocks shown: Music Ontology, Ontology, Data, R2RML, Data Providers, Templates, Widgets
  19. Ontology as a "Structural Backbone"
      UI templates (defining UI structure): Template: …, Template:mo:Track, Template:mo:Artist
      Ontology (RDFS/OWL) (defining data structure): mo:Track, mo:Artist
      RDF Data Graph: Yesterday, The_Beatles, linked to the ontology via rdf:type
      Each resource gets a resource page
  20. Information Workbench: Browsing a Music Artist
  21. Information Workbench: Visualization techniques
  22. Navigation Through the Data
  23. SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      SPARQL Query

      SELECT ?release
             ((SUM(xsd:double(?duration/60000))) AS ?avg)
      WHERE {
        <>
            foaf:made ?release .
        ?release mo:record ?record .
        ?record mo:track ?track .
        ?track mo:duration ?duration .}
      GROUP BY ?release
      ORDER BY DESC(?avg)
      LIMIT 10

      Result set
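
The aggregation in this query can be mirrored in a few lines of Python, shown here as a sketch over made-up data. Each input row stands for one `?release` / `?duration` binding with the duration in milliseconds, and the release names are hypothetical. Note that the query's `?avg` variable actually holds a sum, as the slide title says, so the sketch names it accordingly.

```python
from collections import defaultdict

# Hypothetical (release, track duration in ms) bindings; not real MusicBrainz data
rows = [
    ("Release A", 120000),
    ("Release A", 180000),
    ("Release B", 60000),
]


def top_releases(rows, limit=10):
    """Sum track durations per release in minutes and return the longest first."""
    total_minutes = defaultdict(float)
    for release, duration_ms in rows:
        # Same unit conversion as xsd:double(?duration/60000)
        total_minutes[release] += duration_ms / 60000
    # ORDER BY DESC(...) followed by LIMIT
    ranked = sorted(total_minutes.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


result = top_releases(rows)
print(result)
```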
  24. SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      Widget

      {{#widget: BarChart |
      query ='SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE {
        <-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release.
        ?Release rdf:type mo:Release .
        ?Release dc:title ?label .}
      GROUP BY ?label
      ORDER BY DESC(?COUNT)
      LIMIT 20'
      | settings = 'Settings:barvertical_mb'
      | asynch = 'true'
      | input = 'label'
      | output = 'COUNT'
      | height = '300'}}

      Visualization: Bar chart
  25. Information Workbench: SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      Other visualizations of the same result set: Line chart, Pie chart
  26. Automated Widget Suggestion
      1. Suggested visualizations: Table, Pivot view, Bar chart, Line chart, Pie chart
      2. Select a suggested visualization
      3. Visualization automatically built
  27. Try it out!
      R2RML Mappings
      • https://…-R2RML
      MusicBrainz RDF Dump
      • http://…
      MusicBrainz Linked Data Demo system
      • http://…
      Information Workbench
      • http://…tion-workbench/
      Euclid Project
      • http://euclid-…
  28. Acknowledgements
      The Euclid Project
      Barry Norton
      Michael Meier
      Andriy Nikolov
      Yves Raimond
      Kurt Jacobson
      Thomas Gaengler
      Juan Sequeda
      Simon Dixon
      (in no particular order)
  29. Thank you!
      Contact
      Peter Haase
      fluid Operations AG
      Altrottstr. 31
      Walldorf
      Germany
      +49 (0) 6227 358087-0