Linked Data Publication of Live Music Archives

A talk given at DMRN+7, QMUL, December 2012


  1. Hello Cleveland! Linked Data Publication of Live Music Archives
     Sean Bechhofer*, Kevin Page+, David De Roure+
     *School of Computer Science, University of Manchester
     +Oxford eResearch Centre, University of Oxford
     @seanbechhofer
     DMRN+7, QMUL, December 2012
  2. The Proposition
     ๏ Publication of structured metadata describing an audio collection
     ๏ Links to external resources provide additional context and information
     ๏ Rich query to allow the extraction of “interesting” subcollections
  3. The Players
     • The Internet Archive Live Music Archive
       ✦ Community contributed live audio recordings
     • Semantic Technologies
       ✦ RDF, Ontologies, SPARQL and Linked Data
     • Additional resources
       ✦ Artist DBs, Geographical Information, Venue information, etc.
     • Some Ruby scripts...
  4. The etree Collection
     • Internet Archive Live Music Archive
     • Community contributed live performance recordings
       ✦ “Legal bootlegs”
     • Approx 4,000 artists
       ✦ 100,000 performances
     • Why is it interesting?
       ✦ Audio available in various formats
         ✤ mp3, ogg, shn, flac...
       ✦ Multiple performances by artists
       ✦ Cover versions
  5. Semantic Technologies
     • Semantic Technologies aim to provide structured, machine-readable representations of content
       ✦ Unified frameworks for (meta)data
     • RDF: Resource Description Framework
       ✦ Triple-based representation of information
     • OWL/SKOS: Ontologies & Vocabularies for content description
       ✦ Shared vocabularies plus definitional capabilities
     • SPARQL
       ✦ A query language for RDF data
       ✦ A generic API
  6. Semantic Technologies
     RDF
     • Triple-Based Representation
     • Common Data Model
     • Identification via URIs
     • Easy Integration
       ✦ Graph Merging
     • Query via SPARQL
       ✦ A flexible, generic API
     OWL/SKOS
     • Shared Vocabularies for content description
       ✦ Facilitating interoperation and exchange
       ✦ Everybody talks the same language
     • OWL allows for rich expressions and definitions
     • SKOS supports simpler thesauri/controlled vocabularies
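The RDF points on this slide (triples, graph merging, pattern-based query) can be illustrated with a minimal sketch. It is not the talk's tooling: the `http://example.org/` URIs, the property names and the `match` helper are all hypothetical, and a real deployment would use an RDF store queried with SPARQL rather than Python sets.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = Tuple[str, str, str]

# Hypothetical namespace and identifiers, for illustration only.
EX = "http://example.org/"

etree_graph: Set[Triple] = {
    (EX + "performance/1", EX + "performer", EX + "artist/42"),
    (EX + "artist/42", EX + "name", "Example Band"),
}

external_graph: Set[Triple] = {
    (EX + "artist/42", EX + "basedNear", EX + "place/cleveland"),
}

# Graph merging: because both graphs identify the artist with the same URI,
# integrating them is a plain set union.
merged: Set[Triple] = etree_graph | external_graph


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Everything known about the artist after merging both sources.
    for triple in match(merged, s=EX + "artist/42"):
        print(triple)
```

The `match` function plays the role of a single SPARQL triple pattern; joins over several patterns are what a real query engine adds on top.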
  7. Linked Data
     • A set of common principles for data publication
       1. Use URIs for identification
       2. Use HTTP URIs (that will dereference)
       3. Return useful information when dereferenced
       4. Include links in that information
     • Common infrastructure facilitates construction of applications.
     • Use of content negotiation to supply “appropriate” representations
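Content negotiation, as mentioned on this slide, means choosing a representation from the client's `Accept` header. The sketch below is a simplified illustration, not the behaviour of any particular server: the list of available media types, the default and the function names are assumptions, and it ignores parts of the HTTP rules (for example specificity ordering and media type parameters other than `q`).

```python
from typing import List, Sequence, Tuple

# Representations this hypothetical server can produce for a resource.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pair: -pair[1])


def negotiate(accept: str,
              available: Sequence[str] = AVAILABLE,
              default: str = "text/html") -> str:
    """Pick the representation to serve for a given Accept header."""
    for media, q in parse_accept(accept):
        if q <= 0:
            continue
        if media in available:
            return media
        if media == "*/*":
            return default
        if media.endswith("/*"):
            prefix = media[:-1]  # "text/*" becomes "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return default


if __name__ == "__main__":
    # A browser-like header and an RDF client header.
    print(negotiate("text/html,application/xhtml+xml"))
    print(negotiate("text/turtle, text/html;q=0.5"))
```

With this scheme a browser receives an HTML page while an RDF-aware client requesting the same URI receives Turtle or RDF/XML.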
  8. Linked Data Resources
     • MusicBrainz
       ✦ RDF conversions of MusicBrainz data
     • Geonames
       ✦ Information about locations
     • DBpedia
       ✦ Structured representation of Wikipedia content
     • BBC
       ✦ Programme information, artist information
  9. Data mangling
     • Download of etree metadata files
     • Simple data conversion
       ✦ XML to RDF
       ✦ etree data model
     • Alignments
       ✦ String matching plus bespoke methods for locations
       ✦ Explicit capture of alignments
     • Publication Infrastructure
       ✦ Fuseki server + Pubby front end
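The "XML to RDF" step can be sketched as follows. This is an illustration only: the talk's conversion was done with Ruby scripts that are not shown, and the XML element names, the base URI and the vocabulary URI used here are hypothetical stand-ins rather than the actual etree data model. Output is serialised as N-Triples lines.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote

# Hypothetical metadata record; element names are assumptions.
SAMPLE_XML = """<metadata>
  <identifier>example2012-12-18</identifier>
  <creator>Example Band</creator>
  <venue>Example Hall</venue>
  <date>2012-12-18</date>
</metadata>"""

# Hypothetical namespaces, not the published etree vocabulary.
BASE = "http://example.org/performance/"
VOCAB = "http://example.org/vocab#"


def escape_literal(text: str) -> str:
    """Escape characters that are special inside an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def xml_to_ntriples(xml_text: str) -> List[str]:
    """Turn each child element of a record into one triple about the record."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext("identifier")
    if not identifier:
        raise ValueError("record has no <identifier>")
    # Percent-encode the identifier so the subject is a valid URI.
    subject = f"<{BASE}{quote(identifier.strip(), safe='')}>"
    lines: List[str] = []
    for child in root:
        value = (child.text or "").strip()
        if child.tag == "identifier" or not value:
            continue
        lines.append(
            f'{subject} <{VOCAB}{child.tag}> "{escape_literal(value)}" .'
        )
    return lines


if __name__ == "__main__":
    for line in xml_to_ntriples(SAMPLE_XML):
        print(line)
```

A fuller conversion would map elements onto established vocabulary terms (the deck names the Music Ontology and Event Ontology) instead of minting one property per XML tag.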
  10. Modelling: Music Ontology, Event Ontology
  11. Data Alignment
     • MusicBrainz
       ✦ Artist alignment via simple name queries
     • Geographical Locations
       ✦ Query against Geonames
       ✦ Query against last.fm
       ✦ Combination of string matching and lat/long
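One way to combine string matching with lat/long for location alignment is sketched below. The slide does not say how the two signals were combined, so the thresholds, the candidate records and the function names are all assumptions; the approach shown (a name similarity cut-off followed by a great-circle distance cut-off) is just one plausible reading.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def best_match(name: str, lat: float, lon: float,
               candidates: List[Dict],
               min_similarity: float = 0.8,
               max_km: float = 50.0) -> Optional[Dict]:
    """Return the most similar candidate that is also geographically close."""
    best: Optional[Dict] = None
    best_score = 0.0
    for cand in candidates:
        score = name_similarity(name, cand["name"])
        if score < min_similarity:
            continue
        if haversine_km(lat, lon, cand["lat"], cand["lon"]) > max_km:
            continue
        if best is None or score > best_score:
            best, best_score = cand, score
    return best


# Hypothetical gazetteer entries: same name, different places.
CANDIDATES = [
    {"id": "cleveland-tn", "name": "Cleveland", "lat": 35.1595, "lon": -84.8766},
    {"id": "cleveland-oh", "name": "Cleveland", "lat": 41.4993, "lon": -81.6944},
]

if __name__ == "__main__":
    print(best_match("cleveland", 41.5, -81.7, CANDIDATES))
```

The coordinates disambiguate names that string matching alone cannot separate, which is where a purely name-based query would go wrong.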
  12. Layering
     • Alignments are captured in an additional layer of data on top of the underlying source facts
     • Preserving original metadata
       ✦ Allows clients to make their own judgements
       ✦ Preserves subjectivity
     • Explicitly exposing the source of the mappings
       ✦ Use of Provenance vocabularies
     [Diagram label: sameAs]
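The layering idea can be sketched as two separate collections: untouched source facts, and alignments that each carry a record of how they were produced. Everything named here is hypothetical (the URIs, the `similarTo` property, the method labels and the `confidence` field, which the slides do not mention); the published dataset expresses this with RDF and provenance vocabularies rather than Python objects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace


@dataclass(frozen=True)
class Alignment:
    """A proposed link plus a minimal provenance record."""
    subject: str
    relation: str
    target: str
    method: str        # how the link was produced (assumed labels)
    confidence: float  # assumed score, not taken from the talk


# Layer 1: facts exactly as they came from the source metadata.
SOURCE_FACTS: FrozenSet[Triple] = frozenset({
    (EX + "etree/artist/42", EX + "name", "Example Band"),
})

# Layer 2: alignments, kept apart from the source facts.
ALIGNMENTS = [
    Alignment(EX + "etree/artist/42", EX + "similarTo",
              EX + "musicbrainz/artist/abc", "name-query", 0.9),
    Alignment(EX + "etree/artist/42", EX + "similarTo",
              EX + "musicbrainz/artist/xyz", "fuzzy-name-query", 0.4),
]


def view(source: Iterable[Triple],
         alignments: Iterable[Alignment],
         trusted_methods: Set[str],
         min_confidence: float = 0.0) -> Set[Triple]:
    """Build a client-specific graph: source facts plus accepted alignments."""
    accepted = {
        (a.subject, a.relation, a.target)
        for a in alignments
        if a.method in trusted_methods and a.confidence >= min_confidence
    }
    return set(source) | accepted


if __name__ == "__main__":
    cautious = view(SOURCE_FACTS, ALIGNMENTS, {"name-query"}, 0.8)
    permissive = view(SOURCE_FACTS, ALIGNMENTS, {"name-query", "fuzzy-name-query"})
    print(len(cautious), len(permissive))
```

Because the source layer is never modified, each client can apply its own trust policy to the alignment layer, which is the sense in which the design lets clients make their own judgements.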
  13. Modelling: Similarity Ontology
  14-27. [Slides 14 to 27: no text in the transcript]
  28. Big Picture
  29. Discussion
     • So far entirely metadata based
       ✦ No processing of underlying audio
     • Alignment is a little messy
       ✦ But has to be automated
     • Dataset itself is an interesting artefact
       ✦ Contrasts with some other LD activities.
     • Is this actually useful?
       ✦ Do artists really get a better reception when they play in their home town?
  30. The Future
     • Better alignment
       ✦ Beyond simple string queries
     • More alignment
       ✦ Adding in, e.g. MusicBrainz track/work resources
       ✦ Other collections?
       ✦ Modelling questions
     • Characterising Alignments
     • Audio Fingerprinting
       ✦ Identifying further track level matches
     • Crowdsourcing corrections
     • Extracting subcollections
       ✦ What would you want?
  31. Thanks! You’ve been a great audience!
     http://etree.linkedmusic.org
