1. DOREMUS: a Graph of Interlinked Musical Works
Pasquale Lisena
EURECOM, France
@pasqLisena
M. Achichi, P. Lisena, K. Todorov, R. Troncy, J. Delahousse
2.
Which works did Mozart compose before the age of 10?
How many works have been composed and performed for the first time in the same city?
Which composers had the chance to conduct their own work in a performance during the last decade?
3.
Music knowledge graph: metadata about artists, works, performances, scores
Tools for converting and interlinking: used for building the knowledge graph, open-source, reusable
5.
M. Lasar (2011). Digging into Pandora’s Music Genome with musicologist Nolan Gasser.
https://arstechnica.com/tech-policy/2011/01/digging-into-pandoras-music-genome-with-musicologist-nolan-gasser/
POP VS CLASSICAL
“For pop music the experience of the music is really defined by the recording.”
“When it comes to classical music, on the other hand, it's much more about the composition itself, because even though the interpretation can vary in various subtle ways […]”
6.
POP VS CLASSICAL
● Track-based vs work-based
● 60 years of history vs a thousand years, from Gregorian chant to a work written last Tuesday
● Songs vs multi-movement works
● Major, minor vs polyphonic, homophonic, monophonic
8.
Music archives have very detailed knowledge
PROBLEMS
● Multiple formats
● No interoperability
● Need to discover overlapping knowledge
● Information codified as free text
● Not always publicly accessible
APPROACH
Semantic Web!
9.
Improve music description to foster music exchange and reuse
Travel to the heart of the musical archives in France’s greatest institutions
Connect sources, multiply usage, enrich user experience
10.
Building the DOREMUS graph
● DATA MODELING
● DATA CONVERSION: marc2rdf, string2vocabulary, ...custom converters
● DATA LINKING: legato
● LINK VALIDATION
11. DATA CONVERSION DATA LINKING LINK VALIDATION
The DOREMUS Model
- Music-specific extension of FRBRoo
- Dynamic: it is made up of autonomous modules that can be combined
- Relies on Linked Data principles (everything is a URI; RDF model)
[Figure: FRBRoo, which combines FRBR (bibliographic records) with CIDOC CRM (museum information)]
DATA MODELING
Choffé, Pierre, and Françoise Leresche. DOREMUS: connecting sources, enriching catalogues and user experience. In 24th IFLA World Library and Information Congress. 2016.
13.
[Figure: the DOREMUS model applied to Beethoven’s “Sonate pour violoncelle et piano no 1”@fr (also “Sonates”, “Sonata in F”). The F22 Expression has a title (P102), an M2 Opus Statement (U17) with opus number 5 and subnumber 1, the genre Sonata (U12; sonata@it, sonate@fr, klaviersonate@de), the key F Major (U11; F Dur@de, Fa majeur@fr, Fa maggiore@it, Fa mayor@es) and an M6 Casting (U13) whose M23 Casting Details (U2 foresees mop, U30 quantity) are 1 piano (Pianoforte@it, Fortepian@pl) and 1 cello (Violoncello@it, Violoncelle@fr). It realizes the F14 Work (R3 is realized in); an F15 Complex Work is linked through R10 has member. The F28 Expression Creation (R17 created, R19 created a realization of), with time-span 1796 (P4), consists of (P9) an E7 Activity carried out by (P14) Ludwig van Beethoven (also “Ludwig von Beethoven”; born 1770, died 1827) with the function composer (U31; compositeur@fr, compositore@it). The premiere (U5; M42 Performed Expression Creation, M43 Performed Expression, M44 Performed Work) took place in Berlin in 1796; the princeps publication (U4; F30 Publication Event, F24 Publication Expression, F19 Publication Work) took place in Vienna in 1797.]
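To make the diagram concrete, here is a minimal Python sketch that stores part of the example as plain (subject, predicate, object) tuples and follows property paths through it. The `ex:` identifiers, the prefixes and the spelled-out property names (e.g. `mus:U11_has_key`) are assumptions modelled on the class and property codes in the figure, not copied from the published ontology; the helper functions are hypothetical.

```python
from collections import defaultdict

# Part of the example as (subject, predicate, object) tuples.
# Identifiers and property spellings are illustrative assumptions.
TRIPLES = [
    ("ex:work", "efrbroo:R3_is_realised_in", "ex:expression"),
    ("ex:expression", "ecrm:P102_has_title", "Sonate pour violoncelle et piano no 1@fr"),
    ("ex:expression", "mus:U12_has_genre", "ex:genre/sonata"),
    ("ex:expression", "mus:U11_has_key", "ex:key/f_major"),
    ("ex:expression", "mus:U17_has_opus_statement", "ex:opus_statement"),
    ("ex:opus_statement", "mus:U42_has_opus_number", "5"),
    ("ex:opus_statement", "mus:U43_has_opus_subnumber", "1"),
    ("ex:creation", "efrbroo:R17_created", "ex:expression"),
    ("ex:creation", "efrbroo:R19_created_a_realisation_of", "ex:work"),
    ("ex:creation", "ecrm:P4_has_time-span", "1796"),
    ("ex:creation", "ecrm:P9_consists_of", "ex:activity"),
    ("ex:activity", "ecrm:P14_carried_out_by", "ex:beethoven"),
    ("ex:activity", "mus:U31_had_function_of_type", "ex:function/composer"),
]


def build_index(triples):
    """Map (subject, predicate) to the list of objects."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def follow(index, start, *path):
    """Follow a chain of predicates from `start` and return every node reached."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in index.get((n, predicate), [])]
    return nodes


def subjects(triples, predicate, obj):
    """Inverse lookup: all subjects linked to `obj` through `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


index = build_index(TRIPLES)
# Who composed the expression? expression <- creation -> activity -> actor
creation = subjects(TRIPLES, "efrbroo:R17_created", "ex:expression")[0]
composers = follow(index, creation, "ecrm:P9_consists_of", "ecrm:P14_carried_out_by")
opus = follow(index, "ex:expression", "mus:U17_has_opus_statement", "mus:U42_has_opus_number")
print(composers, opus)  # ['ex:beethoven'] ['5']
```

The composer is reached through an activity node rather than a direct property; that intermediate node is what lets the model attach a role (here `ex:function/composer`) to each participant.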
15. Controlled Vocabularies for Music Metadata
● GENRES: Diabolo, IAML, Itema3, Redomi, RAMEAU (interlinked)
● Medium of performance: MIMO, Itema3, IAML, Diabolo, RAMEAU, Redomi (interlinked)
● Musical keys, Modes, Catalogues, Derivation types, Functions
More available at http://data.doremus.org/vocabularies
23 families of vocabularies · 11,000+ concepts · 610 links between terms
Published at ISMIR 2018
16.
Dealing with different formats
● Works: INTERMARC · Scores: INTERMARC · Discs: INTERMARC
● Works: UNIMARC · Scores: INTERMARC · Performances: XML
● Works, Recordings, Scores: 3 different XML sources, a pre-digital archive format in Radio France
17. Source datasets
● Works: 62,550 (XML) · Scores: 9,154 (XML) · Concerts: 340,609 (XML) · Discs: 9,500 (XML)
● Works: 6,846 (UNIMARC) · Scores: 30,319 (UNIMARC) · Concerts: 5,164 (XML) · Discs: 8,602 (XML)
● Works: 135,940 (INTERMARC) · Scores: 89,184 (INTERMARC)
19.
001 FRBNF139081882FR
100 $313891295$w.0..b.....$aBeethoven$mLudwig van$d1770-1827
144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur
LANG TITLE MOP OPUS KEY
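The labels above map onto the subfields of field 144. A minimal sketch of that mapping, assuming the human-readable `$`-delimited display shown on the slide (binary ISO 2709 records use a different subfield delimiter) and a hypothetical heuristic for reading the language out of the coded `$w` data:

```python
import re

# Subfield code -> label, following the order of labels on the slide.
SUBFIELD_LABELS = {"a": "TITLE", "b": "MOP", "p": "OPUS", "t": "KEY"}


def parse_field(line):
    """Split 'TAG $xvalue$yvalue...' into (tag, [(code, value), ...])."""
    tag, _, body = line.partition(" ")
    # Naive split: assumes no '$' occurs inside a value.
    chunks = body.split("$")[1:]
    return tag, [(chunk[0], chunk[1:]) for chunk in chunks if chunk]


def interpret_144(line):
    """Return the LANG/TITLE/MOP/OPUS/KEY values of an INTERMARC 144 field."""
    tag, subfields = parse_field(line)
    if tag != "144":
        raise ValueError(f"expected field 144, got {tag}")
    info = {}
    for code, value in subfields:
        if code == "w":
            # Hypothetical heuristic: the language code is the only run of
            # three lowercase letters in $w (e.g. '....b.fre.' -> 'fre').
            match = re.search(r"[a-z]{3}", value)
            if match:
                info["LANG"] = match.group()
        elif code in SUBFIELD_LABELS:
            info[SUBFIELD_LABELS[code]] = value
    return info


field = "144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur"
print(interpret_144(field))
# {'LANG': 'fre', 'TITLE': 'Sonates', 'MOP': 'Piano', 'OPUS': 'Op. 27, no 2', 'KEY': 'Do dièse mineur'}
```

The values are still free text at this point ("Piano", "Do dièse mineur"); turning them into URIs is a separate step.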
MARC FILE
“MARC must die” (Roy Tennant, 2002)
http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die
20.
marc2rdf
MARC PARSER
● Parsing of the file
● Interpretation of the fields
● Graph generation
Input: MARC files + mapping rules
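For the graph generation step, note that the expression URIs shown on slide 26 end in UUIDs whose third group starts with `3` (`3ba6`, `3edb`), which is the version marker of name-based UUIDs (version 3, MD5). A minimal sketch of deterministic URI minting under that reading; the seed format and the `mint_uri` helper are hypothetical:

```python
import uuid

BASE = "http://data.doremus.org/"


def mint_uri(entity_type, seed):
    """Return a name-based (version 3) URI: the same seed always gives the same URI.

    `seed` is a hypothetical stable key, e.g. source + record id + entity role.
    """
    return f"{BASE}{entity_type}/{uuid.uuid3(uuid.NAMESPACE_URL, seed)}"


first = mint_uri("expression", "bnf/FRBNF139081882FR/expression")
again = mint_uri("expression", "bnf/FRBNF139081882FR/expression")
assert first == again  # re-running the conversion keeps the URI stable
print(first)
```

With stable URIs, links produced in later stages (sameAs links, validation decisions) survive when the conversion is re-run.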
21.
MAPPING RULES
144 $w....b.fre.$aSonates$bPiano$pOp. 27, no 2$tDo dièse mineur
UNIT OF INFORMATION: F22 Expression: Opus Number
PATH: F22 Self-Contained Expression → U17 has opus statement → M2 Opus Statement → [U42 has opus number → M12 Opus Number] + [U43 has opus subnumber → M13 Opus Subnumber]
INTERMARC BNF: TUM : 144 $p, chain of digits
TRANSFER RULE: TUM : 144 $p, chain of digits before the comma; remove the abbreviation “Op.” before the number
EXAMPLE: 144 $pOp. 352 --> M12 = 352; 144 $pOp. 27, no 2 --> M12 = 27, M13 = 2
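The transfer rule can be sketched as a small function over the text of `$p`. This is a minimal reading of the rule, not marc2rdf's code; taking the subnumber as the first digits after the comma is an assumption that matches the `Op. 27, no 2` example:

```python
import re


def opus_from_144p(value):
    """Return (opus_number, opus_subnumber) for the text of subfield 144 $p.

    The subnumber is None when there is no comma part; both are None when
    no opus number can be found.
    """
    # Remove the abbreviation "Op." before the number.
    text = re.sub(r"^\s*Op\.\s*", "", value)
    # M12 Opus Number: chain of digits before the comma.
    before, _, after = text.partition(",")
    number = re.match(r"\d+", before.strip())
    if number is None:
        return None, None
    # M13 Opus Subnumber (assumption): first chain of digits after the comma.
    sub = re.search(r"\d+", after)
    return number.group(), sub.group() if sub else None


print(opus_from_144p("Op. 352"))       # ('352', None)
print(opus_from_144p("Op. 27, no 2"))  # ('27', '2')
```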
23.
marc2rdf
MARC PARSER → FREE TEXT INTERPRETER → STRING 2 VOCABULARY
● Replace labels with URIs from controlled vocabularies
Input: MARC files, vocabularies
“Violoncelle”@fr → <http://www.mimo-db.eu/InstrumentsKeywords/3582>
24.
STRING 2 VOCABULARY
● Match against a family of vocabularies
[Figure: “Soprano”@it looked up in MIMO, IAML, DIABOLO, ITEMA3, REDOMI, RAMEAU; “C Major”@en looked up in the GENRE and KEY families and matched to vocabulary:key/c]
https://github.com/DOREMUS-ANR/string2vocabulary
● 2 passes
○ Exact label + language
○ Exact label, any language
● Correction of editorial mistakes
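The two passes can be sketched as follows. This is not the string2vocabulary code: the vocabulary content is a hypothetical excerpt (apart from the MIMO URI and `vocabulary:key/c`, taken from the slides), `string2uri` is a hypothetical name, and comparing labels after case-folding and whitespace normalization is an assumption where the slide only says "exact label".

```python
# Hypothetical excerpt of two vocabulary families: concept URI -> {lang: label}.
VOCABULARIES = {
    "mop": {
        "http://www.mimo-db.eu/InstrumentsKeywords/3582": {
            "fr": "Violoncelle", "it": "Violoncello", "en": "Cello",
        },
    },
    "key": {
        "vocabulary:key/c": {"en": "C Major", "fr": "Do majeur"},
    },
}


def normalize(label):
    """Case-fold and collapse whitespace (an assumption, see lead-in)."""
    return " ".join(label.casefold().split())


def string2uri(label, lang, family):
    """Replace a literal with a concept URI from one vocabulary family."""
    target = normalize(label)
    concepts = VOCABULARIES.get(family, {})
    # Pass 1: exact label with the same language tag.
    for uri, labels in concepts.items():
        if lang in labels and normalize(labels[lang]) == target:
            return uri
    # Pass 2: exact label in any language.
    for uri, labels in concepts.items():
        if any(normalize(text) == target for text in labels.values()):
            return uri
    return None  # no match: the literal is kept as it is


print(string2uri("Violoncelle", "fr", "mop"))  # matched in pass 1
print(string2uri("C major", "it", "key"))      # matched in pass 2: vocabulary:key/c
```

Restricting each lookup to one family (here `mop` or `key`) keeps a label from being matched to a concept of the wrong kind.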
26.
GRAPH BNF
http://data.doremus.org/expression/d72301f0-0aba-3ba6-93e5-c4efbee9c6ea
“Quasi una fantasia”
COMPOSER Beethoven
ORDER NUM 14
OPUS 27, n 2
GENRE sonata
CASTING piano
KEY C sharp minor
1st PUB ?
PREMIERE ?
GRAPH PHILHARMONIE
http://data.doremus.org/expression/37932fbc-fef3-3edb-9fae-1eec9b4be01d
“Sonata quasi una fantasia”
COMPOSER Beethoven
ORDER NUM 14
OPUS 27, n 2
GENRE sonata, romantic music
CASTING piano (1)
KEY C sharp minor
1st PUB 1802, Vienna
PREMIERE ?
sameAs
27.
Challenges
● Not all works have values for all properties (lack of attributes)
● Similar values do not necessarily imply a match, e.g. Beethoven’s Sonata n. 1, Sonata n. 2, Sonata n. 3
● Lexical, semantic, transliteration and orthographic mismatches
On the left: Beethoven.
On the right: (the same) Beethoven.
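The second challenge is easy to reproduce with a plain string-similarity score. A minimal sketch using Python's `difflib`; the 0.8 threshold and the numeric-token guard are illustrative assumptions, not Legato's actual rules:

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1] after case-folding."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def naive_match(a, b, threshold=0.8):
    return similarity(a, b) >= threshold


def guarded_match(a, b, threshold=0.8):
    """Also require the same numbers, e.g. the sonata number."""
    return naive_match(a, b, threshold) and re.findall(r"\d+", a) == re.findall(r"\d+", b)


a, b = "Sonata n. 1", "Sonata n. 2"
print(round(similarity(a, b), 2))  # 0.91: almost identical strings
print(naive_match(a, b))           # True: a false positive
print(guarded_match(a, b))         # False
```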
28.
First Linking: Composer + Catalogue
Wolfgang Amadeus Mozart, Eine kleine Nachtmusik K 525
sameAs
Wolfgang Amadeus Mozart, Serenade No. 13 in G major KV 525
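The first linking step can be sketched as an exact join on (composer, catalogue, number), once catalogue prefixes are normalized so that `K 525` and `KV 525` agree. The record layout, identifiers and alias table are hypothetical, and a real pipeline would compare composer URIs rather than name strings:

```python
import re

# Hypothetical aliases: spellings of the same thematic catalogue (Köchel).
CATALOGUE_ALIASES = {"k": "kv", "kv": "kv"}


def catalogue_key(record):
    """(composer, catalogue, number) key, or None if the catalogue is unparsable."""
    match = re.fullmatch(r"([A-Za-z]+)\.?\s*(\d+\w*)", record["catalogue"].strip())
    if match is None:
        return None
    prefix = match.group(1).casefold()
    return (record["composer"].casefold(), CATALOGUE_ALIASES.get(prefix, prefix), match.group(2))


def link_by_catalogue(left, right):
    """Emit a sameAs link for every pair of records sharing the same key."""
    index = {}
    for record in right:
        key = catalogue_key(record)
        if key is not None:
            index.setdefault(key, []).append(record["id"])
    links = []
    for record in left:
        key = catalogue_key(record)
        if key is None:
            continue
        for other in index.get(key, []):
            links.append((record["id"], "owl:sameAs", other))
    return links


left = [{"id": "a:1", "composer": "Wolfgang Amadeus Mozart",
         "title": "Eine kleine Nachtmusik", "catalogue": "K 525"}]
right = [{"id": "b:1", "composer": "Wolfgang Amadeus Mozart",
          "title": "Serenade No. 13 in G major", "catalogue": "KV 525"}]
print(link_by_catalogue(left, right))  # [('a:1', 'owl:sameAs', 'b:1')]
```

The titles play no part here: when both records carry a catalogue number, two very different titles can still be linked safely.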
29.
Legato
New linking system
Existing data linking systems were not satisfactory
30.
* Works to be compared are grouped by composer
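Grouping by composer acts as a blocking step: it reduces the number of pairs to compare. A minimal sketch, assuming each record carries a normalized composer key and a text profile; the Jaccard token overlap and the 0.5 threshold are stand-ins chosen for brevity, not the similarity measure Legato actually uses:

```python
from collections import defaultdict
from itertools import product


def tokens(text):
    return set(text.casefold().replace(",", " ").split())


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def candidate_pairs(left, right):
    """Blocking: only records with the same composer are paired."""
    blocks = defaultdict(lambda: ([], []))
    for record in left:
        blocks[record["composer"]][0].append(record)
    for record in right:
        blocks[record["composer"]][1].append(record)
    for lefts, rights in blocks.values():
        yield from product(lefts, rights)


def match(left, right, threshold=0.5):
    return [(a["id"], b["id"]) for a, b in candidate_pairs(left, right)
            if jaccard(a["text"], b["text"]) >= threshold]


left = [{"id": "A1", "composer": "beethoven", "text": "sonata quasi una fantasia piano 27, 2"}]
right = [
    {"id": "B1", "composer": "beethoven", "text": "sonata quasi una fantasia op 27 2 piano"},
    {"id": "B2", "composer": "mozart", "text": "sonata piano"},
]
print(len(list(candidate_pairs(left, right))))  # 1 pair instead of 2
print(match(left, right))                        # [('A1', 'B1')]
```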
32.
[Chart: Legato performances at the OAEI 2017 campaign on the DOREMUS tasks (Heterogeneities Task, False Positive Trap) and the SPIMBENCH tasks (sandbox, mainbox)]
33.
● SINGLE LINK: confidence score + experts’ validation
● TRIANGLE: certain links
● MISSING LINK (?): inference, if confirmed by experts’ validation
● CONFLICT: remove after experts’ check
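The four patterns can be detected automatically before the experts step in. A minimal sketch, assuming entities are identified as `dataset:id` strings (a hypothetical convention, as are the `bnf`, `pp` and `rf` prefixes) and that a triangle needs three distinct datasets; it only flags links and leaves every accept or remove decision to the experts:

```python
from collections import defaultdict
from itertools import combinations


def dataset(entity):
    """Dataset prefix of an identifier such as 'bnf:42' (hypothetical scheme)."""
    return entity.split(":", 1)[0]


def classify_links(links):
    """Sort sameAs links (pairs of distinct entities) into the four patterns."""
    edges = {frozenset(pair) for pair in links}
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)

    certain, conflict, missing = set(), set(), set()
    # TRIANGLE: three entities from three datasets, all pairwise linked.
    for edge in edges:
        a, b = tuple(edge)
        for c in neighbours[a] & neighbours[b]:
            if len({dataset(a), dataset(b), dataset(c)}) == 3:
                certain |= {edge, frozenset((a, c)), frozenset((b, c))}
    # CONFLICT: one entity linked to two entities of the same dataset.
    for entity, linked in neighbours.items():
        by_dataset = defaultdict(set)
        for other in linked:
            by_dataset[dataset(other)].add(other)
        for group in by_dataset.values():
            if len(group) > 1:
                conflict |= {frozenset((entity, other)) for other in group}
    # MISSING LINK: a-b and b-c exist but a-c (different datasets) does not.
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            if dataset(a) != dataset(c) and frozenset((a, c)) not in edges:
                missing.add(frozenset((a, c)))
    # SINGLE LINK: neither confirmed by a triangle nor involved in a conflict.
    single = edges - certain - conflict

    def as_sorted(pairs):
        return sorted(tuple(sorted(pair)) for pair in pairs)

    return {
        "certain": as_sorted(certain),    # keep
        "single": as_sorted(single),      # confidence score + experts' validation
        "missing": as_sorted(missing),    # infer only if experts validate
        "conflict": as_sorted(conflict),  # remove after experts' check
    }


links = [
    ("bnf:1", "pp:1"), ("pp:1", "rf:1"), ("bnf:1", "rf:1"),  # triangle
    ("bnf:2", "pp:2"), ("pp:2", "rf:2"),                      # bnf:2-rf:2 missing
    ("bnf:3", "pp:3"), ("bnf:3", "pp:4"),                     # conflict
    ("bnf:5", "rf:5"),                                        # single link
]
for pattern, pairs in classify_links(links).items():
    print(pattern, pairs)
```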
34.
What is in the Knowledge Graph?
● 89,872 persons (composers, performers, …)
● 18,075 corporate bodies (orchestras, choruses, publishers, …)
● 357,451 musical works (16k components, 4k derived works)
● 193,412 concerts and studio recordings
● 469,131 performed works
● 3,833 foreseen concerts
● 31,296 publications
● 48,006 scores
35.
Future Work
● More interlinking with MusicBrainz
● Internal interlinking of performances
● Create bridges with other communities (musicologists, streaming services, …)
Applications
● Explorative Search Engine
● KG-Based Recommender System
http://overture.doremus.org/
DOREMUS CHATBOT
https://chatbot.doremus.org/
36. GitHub page
converters, interlinking tool, data dumps, ...
github.com/DOREMUS-ANR/
OVERTURE
discover DOREMUS data
overture.doremus.org
DOREMUS website
www.doremus.org
CHATBOT
q&a system for classical music
chatbot.doremus.org
THIS PRESENTATION
https://goo.gl/1UmKnV
pasquale.lisena@eurecom.fr
@pasqLisena
37.
Persons
291,421 in the whole graph
89,872 active*
* with 1 or more compositions, performances, dedications, ...
By source: 9,269 euterpe · 1,503 diabolo · 9,040 itema3 · 8,419 philharmonie · 19,881 bnf · 54,675 bnf bib
By role: 21,626 composers · 7,830 conductors · 3,583 performers · 13,242 text authors · 1,479 dedicatees · 529 subjects
38.
Corporate Bodies
45,743 in the whole graph
18,075 active*
* with 1 or more compositions, performances, dedications, ...
By source: 1,001 euterpe · 0 diabolo · 39 itema3 · 1,603 philharmonie · 855 bnf · 14,657 bnf bib
By role: 517 orchestras + ensembles · 192 choruses · 6,099 publishers · 2,194 producers · 6 dedicatees · 7 subjects