Hello Cleveland!



Linked Data Publication of Live Music Archives
      Sean Bechhofer*, Kevin Page+, David De Roure+
    *School of Computer Science, University of Manchester
       +Oxford eResearch Centre, University of Oxford


                      @seanbechhofer

             DMRN+7, QMUL, December 2012
The Proposition
๏ Publication of structured metadata describing an audio
  collection

๏ Links to external resources provide additional context
  and information

๏ Rich query to allow the extraction of “interesting”
  subcollections




The Players
• The Internet Archive Live Music Archive
  ✦ Community-contributed live audio recordings
• Semantic Technologies
  ✦ RDF, Ontologies, SPARQL and Linked Data
• Additional resources
  ✦ Artist DBs, Geographical Information, Venue information, etc.
• Some Ruby scripts...


The etree Collection
• Internet Archive Live Music Archive
• Community-contributed live performance recordings
  ✦ “Legal bootlegs”
• Approx. 4,000 artists
  ✦ 100,000 performances
• Why is it interesting?
  ✦ Audio available in various formats
    ✤ mp3, ogg, shn, flac...
  ✦ Multiple performances by artists
  ✦ Cover versions


Semantic Technologies
• Semantic Technologies aim to provide structured,
  machine-readable representations of content
  ✦ Unified frameworks for (meta)data
• RDF: Resource Description Framework
  ✦ Triple-based representation of information
• OWL/SKOS: Ontologies & Vocabularies for content description
  ✦ Shared vocabularies plus definitional capabilities
• SPARQL
  ✦ A query language for RDF data
  ✦ A generic API
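To make "triple-based representation" and "query as a generic API" concrete, here is a toy sketch in plain Python. It is not the project's code: the `ex:` identifiers are invented for illustration, and the `match` helper only mimics a single SPARQL triple pattern, where a real setup would use an RDF store and a SPARQL engine.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data: the "ex:" names are assumptions, not the etree vocabulary.
TRIPLES = {
    ("ex:performance1", "ex:performer", "ex:artistA"),
    ("ex:performance2", "ex:performer", "ex:artistA"),
    ("ex:performance3", "ex:performer", "ex:artistB"),
    ("ex:artistA", "ex:name", "Artist A"),
}


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly: SELECT ?perf WHERE { ?perf ex:performer ex:artistA }
performances_by_a = [subj for subj, _, _ in
                     match(TRIPLES, p="ex:performer", o="ex:artistA")]
print(performances_by_a)
```

The same `match` call shape works for any predicate, which is the sense in which a query language over one common data model can behave like a generic API.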

Semantic Technologies
RDF
• Triple Based Representation
• Common Data Model
• Identification via URIs
• Easy Integration
  ✦ Graph Merging
• Query via SPARQL
  ✦ A flexible, generic API

OWL/SKOS
• Shared Vocabularies for content description
  ✦ Facilitating interoperation and exchange
  ✦ Everybody talks the same language
• OWL allows for rich expressions and definitions
• SKOS supports simpler thesauri/controlled vocabularies
Linked Data
• A set of common principles for data publication

    1.   Use URIs for identification
    2.   Use HTTP URIs (that will dereference)
    3.   Return useful information when dereferenced
    4.   Include links in that information

• Common infrastructure facilitates construction of applications.
• Use of content negotiation to supply “appropriate”
  representations
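As a rough illustration of content negotiation, the sketch below picks a representation for a resource from an HTTP `Accept` header. It is a simplified assumption, not how any particular server does it: the `AVAILABLE` list and the HTML default are made up, and partial wildcards such as `text/*` are ignored.

```python
from typing import List, Sequence, Tuple

# Hypothetical set of representations a server could offer for one resource.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header: str,
              available: Sequence[str] = AVAILABLE,
              default: str = "text/html") -> str:
    """Choose the representation to return for a dereferenced URI."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return default
        if media_type in available:
            return media_type
    return default


print(negotiate("text/turtle"))                     # an RDF-aware client
print(negotiate("text/html,application/xml;q=0.9"))  # a browser-like client
```

With this shape, the same URI can serve a human-readable page to a browser and RDF to a data client.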

Linked Data Resources
• MusicBrainz
  ✦ RDF conversions of MusicBrainz data
• Geonames
  ✦ Information about locations
• DBpedia
  ✦ Structured representation of Wikipedia content
• BBC
  ✦ Programme information, artist information




Data mangling
• Download of etree metadata files
• Simple data conversion
  ✦ XML to RDF
  ✦ etree data model
• Alignments
  ✦ String matching plus bespoke methods for locations
  ✦ Explicit capture of alignments
• Publication Infrastructure
  ✦ Fuseki server + Pubby front end
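The XML-to-RDF step might look something like the sketch below, written in Python for illustration (the slides mention Ruby scripts, which are not shown). Everything specific here is an assumption: the sample XML, its element names, the `http://example.org/` base URI and the one-predicate-per-element mapping are invented and are not the etree data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; element names are assumptions for illustration.
SAMPLE_XML = """<metadata>
  <identifier>example2012-12-18</identifier>
  <creator>Example Band</creator>
  <venue>Example Hall</venue>
  <date>2012-12-18</date>
</metadata>"""

BASE = "http://example.org/"  # placeholder namespace, not the real dataset's


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def xml_to_ntriples(xml_text: str) -> list:
    """Turn each child element of the record into one N-Triples statement.

    Assumes the identifier and tag names contain only IRI-safe characters.
    """
    root = ET.fromstring(xml_text)
    identifier = root.findtext("identifier")
    if identifier is None:
        raise ValueError("metadata has no <identifier>")
    subject = f"<{BASE}performance/{identifier.strip()}>"
    lines = []
    for child in root:
        if child.tag == "identifier" or not (child.text and child.text.strip()):
            continue
        predicate = f"<{BASE}vocab/{child.tag}>"
        lines.append(f"{subject} {predicate} {literal(child.text.strip())} .")
    return lines


for line in xml_to_ntriples(SAMPLE_XML):
    print(line)
```

A real conversion would map elements onto shared vocabularies rather than minting one predicate per tag; the sketch only shows the mechanical shape of the step.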



Modelling




Music Ontology
Event Ontology
Data Alignment
• MusicBrainz
  ✦ Artist alignment via simple name queries
• Geographical Locations
  ✦ Query against Geonames
  ✦ Query against last.fm
  ✦ Combination of string matching and lat/long
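One way to combine string matching with lat/long is sketched below. It is an assumption-laden illustration rather than the method used for the dataset: the candidate records, their identifiers, the approximate coordinates and both thresholds (`min_sim`, `max_km`) are invented, and no external service is queried.

```python
import math
from difflib import SequenceMatcher

# Hypothetical candidates, as a gazetteer lookup might return them.
# Coordinates are approximate; identifiers are made up.
CANDIDATES = [
    {"id": "candidate:cleveland-oh", "name": "Cleveland", "lat": 41.4995, "lon": -81.6954},
    {"id": "candidate:cleveland-tn", "name": "Cleveland", "lat": 35.1595, "lon": -84.8766},
]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib's ratio."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def best_match(name, lat, lon, candidates, min_sim=0.8, max_km=50.0):
    """Return (id, similarity, distance_km) of the best candidate, or None.

    A candidate must pass both the name threshold and the distance threshold.
    """
    best = None
    for cand in candidates:
        sim = name_similarity(name, cand["name"])
        dist = haversine_km(lat, lon, cand["lat"], cand["lon"])
        if sim >= min_sim and dist <= max_km:
            if best is None or sim > best[1]:
                best = (cand["id"], sim, dist)
    return best


# A venue record that says "cleveland" and carries coordinates near Ohio.
print(best_match("cleveland", 41.5, -81.7, CANDIDATES))
```

The name alone cannot separate the two Clevelands; the coordinates do, which is why the two signals are combined.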




Layering
• Alignments are captured in an additional layer of data on top of
  the underlying source facts
• Preserving original metadata
  ✦ Allows clients to make their own judgements
  ✦ Preserves subjectivity
• Explicitly exposing the source of the mappings
  ✦ Use of Provenance vocabularies
[Figure label: sameAs]
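The layering idea can be sketched as two separate collections that a client merges on its own terms. This is a toy model under stated assumptions: the `ex:` and `mb:` identifiers, the `method` field and the numeric `confidence` score are invented for illustration and are not taken from the published dataset.

```python
# Layer 1: source facts, kept exactly as converted from the original metadata.
SOURCE = {
    ("ex:artist/1", "ex:name", "Example Band"),
}

# Layer 2: alignments, each recorded with where it came from (hypothetical fields).
ALIGNMENTS = [
    {"triple": ("ex:artist/1", "owl:sameAs", "mb:artist/abc"),
     "method": "name-query", "confidence": 0.9},
    {"triple": ("ex:artist/1", "owl:sameAs", "mb:artist/xyz"),
     "method": "name-query", "confidence": 0.4},
]


def merged_view(source, alignments, min_confidence):
    """Merge the source graph with only the alignments a client trusts.

    The source layer is never modified, so a different client can apply a
    different threshold to the same published data.
    """
    accepted = {a["triple"] for a in alignments
                if a["confidence"] >= min_confidence}
    return source | accepted  # graph merge as set union


cautious = merged_view(SOURCE, ALIGNMENTS, min_confidence=0.8)
trusting = merged_view(SOURCE, ALIGNMENTS, min_confidence=0.0)
print(len(cautious), len(trusting))
```

Because the mappings carry their own provenance, rejecting a dubious match never means losing or rewriting an original fact.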




Modelling



Similarity Ontology




Big Picture




Discussion
• So far entirely metadata based
  ✦ No processing of underlying audio
• Alignment is a little messy
  ✦ But has to be automated
• Dataset itself is an interesting artefact
  ✦ Contrasts with some other LD activities.
• Is this actually useful?


             Do artists really get a better reception when
                    they play in their home town?

The Future
• Better alignment
  ✦ Beyond simple string queries
• More alignment
  ✦ Adding in, e.g., MusicBrainz track/work resources
  ✦ Other collections?
  ✦ Modelling questions
• Characterising Alignments
• Audio Fingerprinting
  ✦ Identifying further track-level matches
• Crowdsourcing corrections
• Extracting subcollections
  ✦ What would you want?
Thanks! You’ve been a
   great audience!




http://etree.linkedmusic.org