In this talk, Albert Meroño Peñuela will summarize the ongoing efforts to bridge this gap by means of knowledge representations used in the Semantic Web (RDF and ontologies). In particular, he will describe recent research at the Vrije Universiteit Amsterdam on applying semantic models to the popular digital music format MIDI, and its implications for a future Web capable of providing a universal interface to musical knowledge.
One Score To Rule Them All: Semantics in Music Notation
1. ONE SCORE TO RULE THEM ALL: SEMANTICS IN MUSIC NOTATION
Albert Meroño-Peñuela, et al.
DHDK seminar, University of Bologna,
13/02/2018
2. REFINING STATISTICAL DATA ON THE WEB
4. Vrije Universiteit Amsterdam
ME
• Postdoc researcher at VU University Amsterdam,
Knowledge Representation & Reasoning
• Computer Science!
• Interfaces between the Digital Humanities and the
Semantic Web
• Representation of and access to cultural knowledge,
such as contained in historical objects, music sheets,
and statistical registers
• Ontologies, Linked Data, Semantic Music, APIs,
reproducibility, provenance
“Refining Statistical Data on the Web”
5. OUTLINE
• Digital Data-driven Humanities
> The human-machine spectrum of DH
> Beyond text processing
• Enabling a Global & Repeatable Social History
> Data preparation
> Data integration
> Reusing and publishing schemas
> Accessing the data OR Asking the same questions to different datasets
• One score to rule them all
> Music on the Web
> The MIDI Linked Data Cloud
> Creative applications
• This slide deck at http://tinyurl.com/semanticmusic
6. WHAT IS DIGITAL HUMANITIES?
“to study human culture in a more scientific way”
“to compute data from the humanities”
• Albert: “doing humanities is exactly equal to doing science”
> Repeatability
> Hypothesis testing
> Pragmatic, clean, idealized
• Jacky: “doing humanities is completely different to doing science”
> Interpretative approach, relativistic
> Give value to argumentation and vagueness instead of truth
> Focus on the questions we do ask
• https://storify.com/ingorohlfing/overly-honest-methods-in-science
11. ENABLING SOCIAL HISTORY ON THE WEB
12. WHAT IS SOCIAL HISTORY?
Contrasted with political history, intellectual history and the history of great men
Explains history from the perspective of ordinary people (demography, work, family, migration)
Uses (to a great degree) social science methods: data science!
14. DATA PREPARATION
Present data = high volume
Historical data = high variety
Multiple legacy (tabular) formats
Diverse identity, unity, rigidity and dependence
Preparing them to gain knowledge is expensive
Manual data munging
Hardly reproducible
23. LINKED DATA – THE RDF GRAPH DATA MODEL
The Divine Comedy was written by Dante
Subject            Predicate   Object
dbr:Divine_Comedy  dbp:author  dbr:Dante_Alighieri .
dbr: <http://dbpedia.org/resource/...>
dbp: <http://dbpedia.org/property/...>
dbr:Divine_Comedy rdf:type owl:Thing , dbo:Poem .
dbr:Divine_Comedy :completed "1320" .
…
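The prefix notation on this slide can be illustrated with a minimal Python sketch. It expands the prefixed names to full URIs and holds one statement as a plain (subject, predicate, object) tuple. The names `PREFIXES` and `expand` are illustrative only and do not come from any RDF library; only the two prefixes shown on the slide are used.

```python
# Prefix table copied from the slide; helper names are illustrative only.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbp": "http://dbpedia.org/property/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbr:Dante_Alighieri' to a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


# One RDF statement is a (subject, predicate, object) triple.
triple = tuple(
    expand(term)
    for term in ("dbr:Divine_Comedy", "dbp:author", "dbr:Dante_Alighieri")
)

if __name__ == "__main__":
    # Print the triple in an N-Triples-like form.
    print(" ".join(f"<{term}>" for term in triple) + " .")
```

A real application would normally rely on an RDF library rather than string concatenation; the sketch only shows what the prefixes stand for.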
25. GENERATING LINKED DATA FROM CSV
Semi-automatic
Generic
Domain independent
Microdata = CSVW [COW]
Macrodata = RDF Data Cube [QBer] [TabLinker]
Credits to Rinke Hoekstra
26. LSD DIMENSIONS – FINDING THE VERB
http://lsd-dimensions.org/
Index of statistical dimensions and associated concept schemes on the Web
30. New code lists
• HISCO
http://historyofwork.iisg.nl/ Credits to Richard Zijdeman
31. New code lists
• Gemeentegeschiedenis.nl
http://www.gemeentegeschiedenis.nl/ Credits to Ivo Zandhuis
32. New code lists
http://licr.io/ Credits to Ashkan Ashkpour
34. http://nlgis.nl/ Credits to Richard Zijdeman
36. GITHUB AS A HUB OF SPARQL QUERIES
One .rq file per SPARQL query
Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push
Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)
37. http://grlc.io/
38. THE GRLC SERVICE
Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,
> http://:host/:owner/:repo/spec returns the JSON swagger spec
> http://:host/:owner/:repo/api-docs returns the swagger UI
> http://:host/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
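The URL patterns above can be sketched as small Python helpers. This is only an illustration of the path layout described on the slide, not grlc's own code. The function names and the example host, owner, repository and operation used in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def grlc_urls(host: str, owner: str, repo: str) -> dict:
    """Build the spec and UI URLs for a repo served by a grlc instance."""
    base = f"http://{host}/{owner}/{repo}"
    return {"spec": f"{base}/spec", "api_docs": f"{base}/api-docs"}


def grlc_call(host: str, owner: str, repo: str, operation: str, **params) -> str:
    """Build the URL that calls one operation with the given parameter values."""
    url = f"http://{host}/{owner}/{repo}/{operation}"
    return f"{url}?{urlencode(params)}" if params else url


def raw_query_url(owner: str, repo: str, filename: str) -> str:
    """Build the raw GitHub URL from which a .rq query file is dereferenced."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/master/{filename}"
```

For example, with the hypothetical values `grlc_call("grlc.example.org", "alice", "queries", "people_by_year", year=1891)`, the helper yields `http://grlc.example.org/alice/queries/people_by_year?year=1891`.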
41. EVALUATION – USE CASES
CEDAR: Access to census data for historians
> Hides SPARQL
> Allows them to fill query parameters through forms
> Co-existence of SPARQL and non-SPARQL clients
CLARIAH - Born Under a Bad Sign: Do prenatal and early-life conditions have an impact on socioeconomic and health outcomes later in life? (uses 1891 Canada and Sweden Linked Census Data)
> Reduction of coupling between SPARQL libs and R
> Shorter R code – input stream as CSV
42. QUALITATIVE EVALUATION
> “multiple copies of the same queries in different places (…) was problematic. grlc allows queries to be maintained in a single location”
> “with grlc the R code becomes clearer due to the decoupling with SPARQL; and shorter, since a curl suffices to retrieve the data”
> “it allows us to manage SPARQL queries separate from the rest of the API – this enables, for instance, to have different queries without having to deploy a new version of the API”
> “we use grlc to provide FAQ for those who would prefer REST over SPARQL, but also to explore the data”
> “we use grlc to expose the ECAI conference proceedings not only as Linked Data that can be used by Semantic Web practitioners, but also as a Web API that web developers can consume”
> “grlc helps to share, extend and repurpose queries by providing a URI for the resulted queries and by supporting collaborative update of those queries”
47. REPRESENTING MUSIC IN RDF?
The jam became global (i.e. de-referenceable URIs from anywhere) rather than local
> But any video stream would have been more accurate (for humans)
The jam became machine readable
> But not all of it
Digital music as Linked Data?
But why?
49. LINKED MUSIC ON THE WEB
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Etree
See Daquino et al. 2017 (WHiSe II), “Characterizing the Landscape of Musical Data on the Web: state of the art and challenges”
50. COOL, BUT…
Symbolic music databases (MusicXML, MIDI, NIFF, MEI) are non-interoperable
From Daquino et al. (WHiSe 2017):
“Repositories and digital libraries are the most representative resources collecting musical data. They mainly offer digitisations of scores and lyrics (77%), published as PDF and/or JPG (40%)”
“The more the scale of repositories increases, the less structured formats for representing symbolic notation seem to be used and the less depth of analysis is provided”
“Larger collections are more likely to feature melody”
Can we find ways of increasing the level of structure of musical data without compromising its scalability?
51. MIDI
MIDI: Digital music representation protocol
> (i.e. leaving nothing to analog signals or actual instruments)
Popular/abundant, production, standard
Musical Instrument Digital Interface (1983)
> Universal synthesizer interface
> Roland (I. Kakehashi), Yamaha, Korg, Kawai (1981)
> Digital, fine-grained representation of musical tracks and events
> Wide range of controllers and instruments
53. BUT WHAT IS MIDI?
[ 144, 60, 100 ]
[ 128, 60, 64 ]
Thanks @rumyra! https://www.youtube.com/watch?v=khsBjXKJOPs
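The two byte triples above can be read with a short Python sketch. It assumes the standard MIDI channel-message layout: the high nibble of the first byte is the message type, the low nibble is the channel, and the two data bytes are note number and velocity. The `decode` helper is illustrative, and the octave label assumes the common convention in which note 60 (middle C) is written C4; some vendors number it differently.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MESSAGE_TYPES = {0x80: "note_off", 0x90: "note_on"}


def decode(message):
    """Decode a 3-byte MIDI channel message [status, note, velocity]."""
    status, note, velocity = message
    kind = MESSAGE_TYPES.get(status & 0xF0, "other")  # high nibble: type
    channel = status & 0x0F                           # low nibble: channel 0-15
    # Assumption: octave numbering where note 60 (middle C) is C4.
    name = f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"
    return {
        "type": kind,
        "channel": channel,
        "note": note,
        "name": name,
        "velocity": velocity,
    }


if __name__ == "__main__":
    for msg in ([144, 60, 100], [128, 60, 64]):
        print(decode(msg))
```

Under these assumptions, `[144, 60, 100]` is a note-on for middle C at velocity 100 on the first channel, and `[128, 60, 64]` is the matching note-off.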
54. MIDI2RDF & RDF2MIDI
https://midi-ld.github.io/
midi2rdf: lossless conversion of MIDI to RDF (and back)
Albert Meroño-Peñuela, Rinke Hoekstra. “The Song Remains the Same: Lossless Conversion and Streaming of MIDI to RDF and Back”. In: 13th Extended Semantic Web Conference (ESWC 2016), posters and demos track. May 29th – June 2nd, Heraklion, Crete, Greece (2016).
rdf2midi, direct stream mapping
55. WEEKEND EXPERIMENT
Music representation format which is
> 100% digital (i.e. leaving nothing to analog signals)
MIDI (Musical Instrument Digital Interface)
> Universal synthesizer interface
> Roland (I. Kakehashi), Yamaha, Korg, Kawai (1981)
> Digital, fine-grained representation of musical events
> Wide range of controllers and instruments
58. MIDI LINKED DATA RESOURCES
MIDI Pieces http://purl.org/midi-ld/piece/
> Access to MIDI level triples
> Cryptographic hash for unique MIDI content
http://purl.org/midi-ld/pattern/87dd99fb346cd4c7934cb36a00868cbe
MIDI Notes http://purl.org/midi-ld/notes/
> Type, label, octave, pitch value
MIDI Programs http://purl.org/midi-ld/programs/
> All instruments linked to DBpedia
MIDI Chords http://purl.org/midi-ld/chords/
> Label, quality, number of pitch classes, intervals
Enrichments
> Provenance
> Integrated lyrics (mostly from karaoke data)
> Key (Krumhansl-Schmuckler), scale degree, metric accents
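The idea of a content hash as a unique identifier can be sketched in Python as follows. The slide does not name the hash function; MD5 is assumed here only because the example identifier has 32 hexadecimal digits. The `content_uri` helper is hypothetical, and the base URL is taken from the example URI on the slide.

```python
import hashlib

# Namespace taken from the example URI on the slide.
BASE = "http://purl.org/midi-ld/pattern/"


def content_uri(midi_bytes: bytes, base: str = BASE) -> str:
    """Mint a URI from the bytes of a MIDI file.

    Assumption: MD5 is used as the hash; the source does not say which
    function the MIDI Linked Data Cloud actually applies.
    """
    digest = hashlib.md5(midi_bytes).hexdigest()
    return base + digest
```

Because the identifier depends only on the bytes, two byte-identical files map to the same URI regardless of their filenames, which is what makes the hash usable for spotting duplicate MIDI content.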
59. MIDI LINKED DATA RESOURCES
Current collections
The largest MIDI collection on the Internet (thanks @midi_man)
Lakh MIDI dataset (thanks @colinraffel)
MySongBook MIDI
Yours! https://midi-ld.github.com
308,443 interconnected MIDI files
10,215,557,355 triples
Full dump, SPARQL endpoint, RESTful API
60. ENABLING SEMANTIC WEB RESEARCH
Data integration
> Further format interoperability: MIDI, MusicXML, NIFF, MEI
> Integration with formats of other arts: LabanXML
Entity linking
> Audio (Spotify URIs), symbolic notation (MIDI), metadata (MusicBrainz)
> High heterogeneity, low overlap
> Challenge to entity linking algorithms
Semantics and ontologies
> Music Ontology, Chord Ontology, Timeline Ontology
> Underspecification of musical concepts
> Reasoning
> Challenge for ontology alignment
61. ENABLING MUSICOLOGY RESEARCH
Analysis of chords, patterns and melodies at Web scale
> Integrating knowledge from external databases
> Historical, geographical, cultural, economic, stylistic contexts
Everything has a URI
> Annotation tasks, workflow descriptions
Establishing standard Web vocabularies
> Chords (iReal Pro), melodies, metadata
Recommender systems
> Collaborative filtering, content-based feature extraction, hybrid
> Notation-based support for abstract representation of musical concepts
Machine learning (multimodal training data, convincing samples)
Audiolisation
65. RDF PI
https://github.com/midi-ld/Web-MIDI-API
Live coding music directly in RDF (MIDI)
Everything happens in your browser (RDF parsing, Web MIDI API)
66. THE MUSIC SEMANTIC GAP
• MIR tasks have a performance ceiling of 65% accuracy, independently of the method
• Cause: semantic gap
• The closer to the gap, the harder the task
Some ontologies in place, BUT:
• Metadata
• Audio features
• Ignore notation
67. THE MUSIC SEMANTIC GAP
What knowledge representations and algorithms are needed to generalize music symbolic notation and include it into the existing music retrieval formalisms, in order to reduce the semantic gap?
• A knowledge graph of symbolic notation
• Data and methods
Challenges:
1. KR for notation (horizontal gap) ← machine learning, ontology engineering
2. Bridging notation and humans (vertical gap) ← ontology matching
3. Multimodal entity linking (inter-dataset gap) ← hybrid FT, DTW + LIMES
68. Music and Knowledge Representation
"Music impregnates every person’s memory, reasoning, and language. And yet, we lack a global view of all of humankind’s musical knowledge, telling us precisely what music we know, how much there is, and how it differs across societies."
69. CONCLUSIONS (I)
Semantic Web and Digital Humanities: to science, or not to science?
Data preparation = 80% of work
> We throw it away after use!
Linked Data based solutions
> Use RDF to make research repeatable – but more intuitive tools needed
> Statistical dimensions & codelists – but hard to find, might be missing
> GitHub for queries as Linked Data APIs – enables reproducibility, you need an expert JUST ONCE
70. CONCLUSIONS (AND II)
One score to rule them all
> General knowledge representation language (RDF) for music (MIDI)
> Mappings for MusicXML, MEI, NIFF, and others
> The spectrum of symbolic music vs low level audio signal
Quality (& automatic) links to external Linked Datasets
> MusicBrainz, DBpedia, etc.
> Hybrid approaches (metadata, lyrics, incipits, MIR algorithms)
Tools
> (Contextual) querying
> Annotation (every note has a URL!)
> Workflow recording
Your ideas & contributions most welcome! https://midi-ld.github.io/
71. PUBLICATIONS
> Albert Meroño-Peñuela. “Humanists And Scientists: More Alike Than Different”. eHumanities Magazine, number 7, February 2016.
> Albert Meroño-Peñuela, Rinke Hoekstra. “grlc Makes GitHub Taste Like Linked Data APIs”. SALAD 2016 – Services and Applications over Linked Data APIs and Data. International workshop, ESWC 2016, May 29th, Heraklion, Crete, Greece (2016).
> Rinke Hoekstra, Albert Meroño-Peñuela, Kathrin Dentler, Auke Rijpma, Richard Zijdeman, Ivo Zandhuis. “An Ecosystem for Linked Humanities Data”. In: Proceedings of the 1st Workshop on Humanities in the SEmantic web (WHiSE 2016). ESWC 2016, May 29th, Heraklion, Crete, Greece (2016).
> Albert Meroño-Peñuela, Rinke Hoekstra. “The Song Remains the Same: Lossless Conversion and Streaming of MIDI to RDF and Back”. In: 13th Extended Semantic Web Conference (ESWC 2016), posters and demos track. May 29th – June 2nd, Heraklion, Crete, Greece (2016).
> Albert Meroño-Peñuela. “Refining Statistical Data on the Web”. Vrije Universiteit Amsterdam (2016).
> Albert Meroño-Peñuela, Christophe Guéret, Stefan Schlobach. “Linked Edit Rules: A Web Friendly Way of Checking Quality of RDF Data Cubes”. Proceedings of the 3rd International Workshop on Semantic Statistics (SemStats 2015), ISWC 2015, Bethlehem, PA, USA (2015).
> Bas Stringer, Albert Meroño-Peñuela, Antonis Loizou, Sanne Abeln, Jaap Heringa. “To SCRY Linked Data: Extending SPARQL the Easy Way”. Diversity++ workshop, ISWC 2015, Bethlehem, PA, USA (2015).
> Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen. “Semantic Technologies for Historical Research: A Survey”. Semantic Web – Interoperability, Usability, Applicability, 6(6), pp. 539–564. IOS Press (2015).
> Albert Meroño-Peñuela, Ashkan Ashkpour, Christophe Guéret, Stefan Schlobach. “CEDAR: The Dutch Historical Censuses as Linked Open Data”. Semantic Web – Interoperability, Usability, Applicability, 8(2), pp. 297–310. IOS Press (2015).
72. THANK YOU!
@albertmeronyo
DATALEGEND.NET
CLARIAH.NL
73. A BASIC WEB SYSTEMS COMMUNICATION TOOLKIT
1. Endpoint location is volatile
Names encapsulate semantics of operations → Should be meaningless, just as email addresses
HTTP : http://example.org/canihasdata
2. Consensus on data semantics is necessary
Simple object exchange format + 15 years of Web ontology development to semantically describe data
JSON+LD : [{ "@id": "eg:Albert",
"rdf:type": [{ "@id": "foaf:Person" }]}]
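The JSON-LD fragment above can be read as triples with a minimal Python sketch. This is not a JSON-LD processor: it performs no `@context` handling or prefix expansion and only walks the flat node structure shown on the slide. The `triples` helper is an illustrative name.

```python
import json

# The fragment from the slide, parsed with the standard json module.
DOC = json.loads('[{ "@id": "eg:Albert", "rdf:type": [{ "@id": "foaf:Person" }]}]')


def triples(nodes):
    """Yield (subject, predicate, object) tuples from a flat list of nodes."""
    for node in nodes:
        subject = node["@id"]
        for predicate, values in node.items():
            if predicate.startswith("@"):
                continue  # skip JSON-LD keywords such as @id
            for value in values:
                # Node references carry "@id"; anything else is kept as-is.
                obj = value["@id"] if isinstance(value, dict) and "@id" in value else value
                yield (subject, predicate, obj)


if __name__ == "__main__":
    for triple in triples(DOC):
        print(triple)
```

The point of the sketch is that the same bytes serve two audiences: a web developer sees ordinary JSON, while a Semantic Web client can recover the statement that eg:Albert is a foaf:Person.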