Integration of a Unique Multimedia Collection
into Public Linked Open Data Repositories
@peterbroadwell, @mart1nkle1n – #OR2016
Peter M. Broadwell
@peterbroadwell
broadwell@library.ucla.edu
Martin Klein
@mart1nkle1n
martinklein@library.ucla.edu
Let the Music Live/
que viva la música
Techniques for Managed Integration of a
Unique Multimedia Collection into Public
Linked Open Data Repositories
The collection
http://frontera.library.ucla.edu
• 116,000 songs digitized and made available as audio
files to date, out of an estimated 160,000 in total
• Originally recorded from 1905 to the 1990s on ~2,000
commercial record labels
• Storage footprint of streaming MP3s: 460 GB
Format                          Number of songs
33 RPM (1955-1990)              14,741
45 RPM (1955-1990)              51,220
78 RPM (1905-1955)              33,191
Cassette tape (1955-1990)       7,879
Reel-to-reel tape (1955-1990)   368
• ~300,000 album images (covers and media)
• 13,752 unique artists or groups on album covers
• 7,035 unique names from album sleeves
• 24,221 unique composers
• 2,000-2,500 labeled song types/genres
Record label       # of songs
Victor             8,591
Columbia           8,196
Ideal              4,819
Falcon             4,532
Peerless           3,336
Bego               2,411
Vocalion           2,164
Del Valle          2,145

Song type          # of songs
ranchera           21,947
bolero             10,522
corrido            7,393
canción            5,410
polka              4,742
canción ranchera   2,736
cumbia             2,055
vals               1,399
• ~700 unique song tags/keywords (prior to translation)
• All songs tagged with 1-20 keywords (avg ~4.5)
Chris Strachwitz
Arhoolie Records
Supporters of the collection
Research using the Frontera
collection as a primary source
A “multimedia encyclopedia”
More metadata, more problems
• No authority values employed for person and group
names; “name hacking” used to approximate
uniqueness
• Relationship between song, album, and “release” is not
consistent
• Authority data for song entities is better: matrix numbers
and catalog numbers are available
• Collection is entirely “siloed” on its current site, largely due
to its homegrown metadata scheme
Goal: incorporate Frontera into
the broader semantic web
• Adopt metadata structures of open online music
encyclopedias (MusicBrainz)
• Use unique IDs from linked open data knowledge
bases to identify people, groups, companies,
songs, albums, etc.
• Adopting IDs from external LOD sites lets us link
out to these related records
• When records are missing from external LOD
knowledge bases, add them to those sites
automatically
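As a minimal sketch of the ID-adoption idea, the snippet below stores external identifiers on a local record and derives outbound links from them. The record layout and field names are hypothetical (not Frontera's actual schema), the IDs are placeholders, and the URL patterns are the public page forms for each site as commonly seen, so they should be checked before use.

```python
# Hypothetical record layout; field names are illustrative only.
# URL patterns are assumptions based on each site's public page URLs.
URL_PATTERNS = {
    "musicbrainz_artist": "https://musicbrainz.org/artist/{id}",
    "wikidata": "https://www.wikidata.org/wiki/{id}",
    "viaf": "https://viaf.org/viaf/{id}",
    "discogs_artist": "https://www.discogs.com/artist/{id}",
}


def outbound_links(record):
    """Return {source: url} for every known external ID on the record."""
    links = {}
    for source, ext_id in record.get("external_ids", {}).items():
        pattern = URL_PATTERNS.get(source)
        if pattern and ext_id:
            links[source] = pattern.format(id=ext_id)
    return links


def missing_sources(record):
    """List knowledge bases for which the record has no ID yet."""
    ids = record.get("external_ids", {})
    return sorted(s for s in URL_PATTERNS if not ids.get(s))


if __name__ == "__main__":
    # Placeholder IDs, not real identifiers.
    artist = {
        "name": "Example Artist",
        "external_ids": {"wikidata": "Q0", "viaf": ""},
    }
    print(outbound_links(artist))
    print(missing_sources(artist))
```

The `missing_sources` list is where the last bullet would come into play: entities with no ID in a given knowledge base are the candidates for contributing new records.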
LOD records and relations
Inspiration: Linked Jazz, NYPL
Labs’ ECCO, LD4L
LOD integration: phase 1
Initial metadata cleaning and preparation
• Identify likely unique entities (names, etc.) via “fuzzy
matching,” e.g., MD5 hash comparisons
• Challenge: finding methods that scale to >100,000 rows
(many approaches must be scripted)
• May necessitate creation of Yet Another Database
• Generate audio fingerprints of music files
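One way the hash-based matching above might work is sketched below, under assumptions: each name is reduced to a canonical form (accents stripped, lowercased, punctuation dropped, tokens sorted) and the MD5 digest of that form is used as a grouping key. The slides do not spell out the normalization steps, so these are a guess, and the sample names are illustrative rather than taken from the database.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict


def name_key(name):
    """MD5 digest of a normalized form of the name (assumed normalization)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, keep only alphanumeric tokens, and ignore token order.
    tokens = re.findall(r"[a-z0-9]+", stripped.lower())
    canonical = " ".join(sorted(tokens))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def group_variants(names):
    """Return groups of names that share a key, i.e. likely the same entity."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = ["Flaco Jiménez", "JIMENEZ, FLACO", "Lydia Mendoza"]
    print(group_variants(sample))
```

Because the key is a single dictionary lookup per row, this scales comfortably past 100,000 rows. The trade-off is that it only catches variants that normalize to exactly the same string; misspellings would need an edit-distance pass on top, and sorting tokens can conflate names that differ only in word order.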
LOD integration: phase 2
Discovery and linking of existing records
• Entity lookup in LOD knowledge bases
  • Audio fingerprint lookups in AcoustID database, which
    links to MusicBrainz
  • Search for artist, group, and composer names in service
    APIs (note: these work better with English than Spanish)
    • DBpedia Spotlight
    • MusicBrainz
    • Discogs
    • VIAF, LCNAF (worth a try)
• Combination of automated and crowd-sourced verification
of links, integration into site
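The name-lookup step could look roughly like the sketch below for MusicBrainz, whose search web service accepts a Lucene-style `query` parameter and returns candidates with a relevance `score`. The 90-point threshold, the function names, and the sample response are assumptions for illustration; the response shown is made up, not real MusicBrainz data.

```python
from urllib.parse import urlencode

MB_ARTIST_SEARCH = "https://musicbrainz.org/ws/2/artist/"


def artist_search_url(name, limit=5):
    """Build a MusicBrainz artist search URL for an exact-phrase name query."""
    query = 'artist:"{}"'.format(name.replace('"', '\\"'))
    params = {"query": query, "fmt": "json", "limit": limit}
    return MB_ARTIST_SEARCH + "?" + urlencode(params)


def confident_matches(response, min_score=90):
    """Keep (id, name) pairs whose search score meets the threshold.

    The threshold is an arbitrary assumption; lower-scoring candidates
    would go to the human verification step instead.
    """
    return [
        (artist["id"], artist["name"])
        for artist in response.get("artists", [])
        if int(artist.get("score", 0)) >= min_score
    ]


if __name__ == "__main__":
    print(artist_search_url("Lydia Mendoza"))
    # Made-up response in the general shape of a search result.
    fake = {"artists": [
        {"id": "placeholder-1", "name": "Candidate A", "score": 100},
        {"id": "placeholder-2", "name": "Candidate B", "score": 42},
    ]}
    print(confident_matches(fake))
```

A real client would also need to send a descriptive User-Agent header and throttle its requests, which MusicBrainz asks of API users; both are left out here to keep the sketch free of network calls.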
LOD integration: phase 3
Contributing/creating new records
• Unsolicited bulk record generation may be seen as linked
data spam and rejected (“notability” problem)
• Direct communication and participation in knowledge
base’s community is the most promising approach
• Case study: discussion with MusicBrainz community
• Voting/editorial review system can be incompatible with
bulk updates, but the community may be willing to
accommodate
• Data records should be well formed and clean; upload
methods must be tested and the upload coordinated
with LOD admins
LOD integration: the “bot” option
LOD integration: crosswalks
between repositories and records
Progress to date
• Used metadata cleaning approaches to identify most likely
unique names in the DB
• Applied acoustic fingerprinting to all 116,000 audio files
  • Matched 1,313 songs
  • Following the AcoustID links to MusicBrainz positively
    identifies ~287 artists with their records in MusicBrainz
    (as well as Discogs and DBpedia)
• Ran DBpedia Spotlight on all artist and composer names and
correlated matched entities with MusicBrainz and Wikidata IDs
• Searched for artist and composer names via MusicBrainz,
Discogs, and VIAF APIs
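The fingerprinting step above might be wired together as in the following sketch, which assumes fingerprints come from Chromaprint's `fpcalc` tool in its default `KEY=VALUE` output and are looked up through the AcoustID web service. The client key, the 0.8 score cut-off, and the sample data are placeholders; the slides do not say which tooling or thresholds were actually used.

```python
from urllib.parse import urlencode

ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"


def parse_fpcalc(output):
    """Extract (duration_seconds, fingerprint) from fpcalc KEY=VALUE text."""
    fields = dict(
        line.split("=", 1) for line in output.splitlines() if "=" in line
    )
    return int(float(fields["DURATION"])), fields["FINGERPRINT"]


def lookup_url(client_key, duration, fingerprint):
    """Build an AcoustID lookup URL that also requests linked recordings."""
    params = {
        "client": client_key,  # application key issued by AcoustID
        "duration": duration,
        "fingerprint": fingerprint,
        "meta": "recordings",
    }
    return ACOUSTID_LOOKUP + "?" + urlencode(params)


def recording_ids(response, min_score=0.8):
    """Collect MusicBrainz recording IDs from sufficiently strong results."""
    ids = []
    for result in response.get("results", []):
        if result.get("score", 0) >= min_score:
            ids.extend(rec["id"] for rec in result.get("recordings", []))
    return ids


if __name__ == "__main__":
    # Placeholder fpcalc output and client key, not real values.
    duration, fingerprint = parse_fpcalc("DURATION=187\nFINGERPRINT=AQAAexample")
    print(lookup_url("YOUR_CLIENT_KEY", duration, fingerprint))
```

The recording IDs returned this way are the bridge to artist records: each MusicBrainz recording carries artist credits, which is how a song-level match can identify an artist.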
Entity matching to LOD sites
Method                    Artists on label   Artists on sleeve   Composers
                          (out of 13,752)    (out of 7,035)      (out of 24,211)
Acoustic fingerprinting   287 (for all names)
DBpedia Spotlight         272                27                  72
MusicBrainz lookup        620                434                 1,151
Discogs search API        4,929              3,502               9,423
VIAF search API           3,707              3,057               8,889
*These are likely in order of decreasing accuracy!
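To make the table easier to compare across columns, the counts can be turned into match rates. The short sketch below only does that arithmetic on the figures as printed; the dictionary keys are made-up labels, and a high rate says nothing about accuracy, as the footnote warns.

```python
# Totals as printed in the table's column headers.
TOTALS = {"label": 13752, "sleeve": 7035, "composers": 24211}

# Match counts copied from the table rows.
MATCHES = {
    "DBpedia Spotlight": {"label": 272, "sleeve": 27, "composers": 72},
    "MusicBrainz lookup": {"label": 620, "sleeve": 434, "composers": 1151},
    "Discogs search API": {"label": 4929, "sleeve": 3502, "composers": 9423},
    "VIAF search API": {"label": 3707, "sleeve": 3057, "composers": 8889},
}


def match_rates(matches=MATCHES, totals=TOTALS):
    """Return the fraction of names matched, per method and column."""
    return {
        method: {col: count / totals[col] for col, count in cols.items()}
        for method, cols in matches.items()
    }


if __name__ == "__main__":
    for method, cols in match_rates().items():
        cells = "  ".join("{}: {:.1%}".format(c, r) for c, r in cols.items())
        print("{:<20} {}".format(method, cells))
```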
Concerns/next steps
• Scalable approaches for Q/A of data (new and old)
• Discoverability and usability for humans and machines
(APIs)
• Repository integration: adopting a linked data model will
help
• Trusted channels for upload to existing knowledge bases:
design a formal model?
• Work with specialized sub-collections of knowledge bases
(topics, regions)?
• Test DBpedia Spotlight w/Spanish data pack
• Does using links to existing LOD entries just reinforce
inequality of artist exposure (“rich get richer”/LOD “echo
chamber”)?
Thanks!
UCLA Digital Library
• Lisa McAulay
• Kristian Allen
• T-Kay Sangwand
• …everyone else (past and present)
Arhoolie Foundation
• Tom Diamant
• Chris Strachwitz (obviously)

More Related Content

Similar to Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories

Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...
Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...
Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...WGBH Media Library and Archives
 
IAML Future of music in public libraries
IAML Future of music in public librariesIAML Future of music in public libraries
IAML Future of music in public librariesJohan Mijs
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Martin Klein
 
Finding a goldmine of natural history illustrations within BHL texts: the Ar...
Finding a goldmine of natural history illustrations within BHL texts:  the Ar...Finding a goldmine of natural history illustrations within BHL texts:  the Ar...
Finding a goldmine of natural history illustrations within BHL texts: the Ar...Trish Rose-Sandler
 
Linked data radical change
Linked data   radical changeLinked data   radical change
Linked data radical changeRichard Wallis
 
R&D at Sound and Vision
R&D at Sound and VisionR&D at Sound and Vision
R&D at Sound and VisionBouke Huurnink
 
Accessibility of the American Archive of Public Broadcasting in Academic Libr...
Accessibility of the American Archive of Public Broadcasting in Academic Libr...Accessibility of the American Archive of Public Broadcasting in Academic Libr...
Accessibility of the American Archive of Public Broadcasting in Academic Libr...WGBH Media Library and Archives
 
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...Ryn Marchese
 
Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022AndriaLesane
 
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10TechSoup
 
Digital Library Project Proposal
Digital Library Project ProposalDigital Library Project Proposal
Digital Library Project ProposalMicah Vandegrift
 
Entification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataEntification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataRichard Wallis
 
How Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarHow Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarCrossref
 
FindStream investor deck
FindStream investor deckFindStream investor deck
FindStream investor deckFindStream
 
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.Mike Mertens
 
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...Marcus Smith
 
The Listening Experience Database
The Listening Experience DatabaseThe Listening Experience Database
The Listening Experience DatabaseAlessandro Adamou
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Alessandro Adamou
 
Linked data - A radical change?
Linked data - A radical change?Linked data - A radical change?
Linked data - A radical change?Richard Wallis
 

Similar to Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories (20)

Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...
Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...
Improving Access to Historic Public Broadcasting through Speech-to-Text, Crow...
 
IAML Future of music in public libraries
IAML Future of music in public librariesIAML Future of music in public libraries
IAML Future of music in public libraries
 
Music brainz and library music collections
Music brainz and library music collectionsMusic brainz and library music collections
Music brainz and library music collections
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
 
Finding a goldmine of natural history illustrations within BHL texts: the Ar...
Finding a goldmine of natural history illustrations within BHL texts:  the Ar...Finding a goldmine of natural history illustrations within BHL texts:  the Ar...
Finding a goldmine of natural history illustrations within BHL texts: the Ar...
 
Linked data radical change
Linked data   radical changeLinked data   radical change
Linked data radical change
 
R&D at Sound and Vision
R&D at Sound and VisionR&D at Sound and Vision
R&D at Sound and Vision
 
Accessibility of the American Archive of Public Broadcasting in Academic Libr...
Accessibility of the American Archive of Public Broadcasting in Academic Libr...Accessibility of the American Archive of Public Broadcasting in Academic Libr...
Accessibility of the American Archive of Public Broadcasting in Academic Libr...
 
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...
Boston Library Consortium Webinar Part 1, Accessibility of AAPB for Academic ...
 
Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022Back to the Future: Evolution of Music Moods From 1992 to 2022
Back to the Future: Evolution of Music Moods From 1992 to 2022
 
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
Webinar: SOS Save Our Site! Archiving Web Content-2017-08-10
 
Digital Library Project Proposal
Digital Library Project ProposalDigital Library Project Proposal
Digital Library Project Proposal
 
Entification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library DataEntification: The Route to 'Useful' Library Data
Entification: The Route to 'Useful' Library Data
 
How Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarHow Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community Webinar
 
FindStream investor deck
FindStream investor deckFindStream investor deck
FindStream investor deck
 
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.
Liber 2014 - Chain Reactions: TEL & RLUK on their Linked Open data.
 
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...
The Semantic Web and the Digital Archaeological Workflow: A Case Study from S...
 
The Listening Experience Database
The Listening Experience DatabaseThe Listening Experience Database
The Listening Experience Database
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
 
Linked data - A radical change?
Linked data - A radical change?Linked data - A radical change?
Linked data - A radical change?
 

Recently uploaded

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 

Recently uploaded (20)

Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 

Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories

  • 1. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 1 Peter M. Broadwell @peterbroadwell broadwell@library.ucla.edu Martin Klein @mart1nkle1n martinklein@library.ucla.edu Let the Music Live/ que viva la música Techniques for Managed Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories
  • 2. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 2 The collection http://frontera.library.ucla.edu
  • 3. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 3 The collection • 116,000 songs digitized and made available as audio files to date, out of an estimated 160,000 in total • Originally recorded from 1905 to the 1990s on ~2,000 commercial record labels • Storage footprint of streaming MP3s: 460 GB Format Number of songs 33 RPM (1955-1990) 14,741 45 RPM (1955-1990) 51,220 78 RPM (1905-1955) 33,191 Cassette tape (1955-1990) 7,879 Reel-to-reel tape *1955-1990) 368 • ~300,000 album images (covers and media)
  • 4. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 4 The collection • 13,752 unique artists or groups on album covers • 7,035 unique names from album sleeves • 24,221 unique composers • 2,000-2,500 labeled song types/genres Record label # of songs Victor 8,591 Columbia 8,196 Ideal 4,819 Falcon 4,532 Peerless 3,336 Bego 2,411 Vocalion 2,164 Del Valle 2,145 Song type # of songs ranchera 21,947 bolero 10,522 corrido 7,393 canción 5,410 polka 4,742 canción ranchera 2,736 cumbia 2,055 vals 1,399
  • 5. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 5 The collection • ~700 unique song tags/keywords (prior to translation) • All songs tagged with 1-20 keywords (avg ~4.5)
  • 6. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 6
  • 7. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 7 Chris Strachwitz
  • 8. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 8 Arhoolie Records
  • 9. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 9
  • 10. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 10
  • 11. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 11
  • 12. Supporters of the collection
  • 13. Research using the Frontera collection as a primary source
  • 14. A “multimedia encyclopedia”
  • 15. More metadata, more problems • No authority values employed for person and group names; “name hacking” used to approximate uniqueness • Relationship between song, album, and “release” is not consistent • Authority data for song entities is better: matrix numbers and catalog numbers are available • Collection is entirely “siloed” on its current site, largely due to its homegrown metadata scheme
  • 16. Goal: incorporate Frontera into the broader semantic web • Adopt metadata structures of open online music encyclopedias (MusicBrainz) • Use unique IDs from linked open data knowledge bases to identify people, groups, companies, songs, albums, etc. • Adopting IDs from external LOD sites lets us link out to these related records • When records are missing from external LOD knowledge bases, add them to those sites automatically
  • 17. LOD records and relations
  • 18. LOD records and relations
  • 19. LOD records and relations
  • 20. Inspiration: Linked Jazz, NYPL Labs’ ECCO, LD4L
  • 21. Inspiration: Linked Jazz, NYPL Labs’ ECCO, LD4L
  • 22. Inspiration: Linked Jazz, NYPL Labs’ ECCO, LD4L
  • 23. LOD integration: phase 1 Initial metadata cleaning and preparation • Identify likely unique entities (names, etc.) via “fuzzy matching,” e.g., MD5 hash comparisons • Challenge: finding methods that scale to >100,000 rows (many approaches must be scripted) • May necessitate creation of Yet Another Database • Generate audio fingerprints of music files
  • 24. LOD integration: phase 2 Discovery and linking of existing records • Entity lookup in LOD knowledge bases • Audio fingerprint lookups in AcoustID database, which links to MusicBrainz • Search for artist, group, and composer names in service APIs (note: these work better with English than Spanish): DBpedia Spotlight; MusicBrainz; Discogs; VIAF, LCNAF (worth a try) • Combination of automated and crowd-sourced verification of links, integration into site
  • 25. LOD integration: phase 3 Contributing/creating new records • Unsolicited bulk record generation may be seen as linked data spam and rejected (“notability” problem) • Direct communication and participation in knowledge base’s community is the most promising approach • Case study: discussion with MusicBrainz community • Voting/editorial review system can be incompatible with bulk updates, but the community may be willing to accommodate • Data records should be well formed and clean; upload methods must be tested and the upload coordinated with LOD admins
  • 26. Integration of a Unique Multimedia Collection into Public Linked Open Data Repositories @peterbroadwell, @mart1nkle1n – #OR2016 26
  • 27. LOD integration: the “bot” option
  • 28. LOD integration: crosswalks between repositories and records
  • 29. Progress to date • Used metadata cleaning approaches to identify most likely unique names in the DB
  • 30. Progress to date • Used metadata cleaning approaches to identify most likely unique names in the DB • Applied acoustic fingerprinting to all 116,000 audio files: matched 1,313 songs; following the AcoustID links to MusicBrainz positively identifies ~287 artists with their records in MusicBrainz (as well as Discogs and DBpedia)
  • 31. Progress to date • Used metadata cleaning approaches to identify most likely unique names in the DB • Applied acoustic fingerprinting to all 116,000 audio files: matched 1,313 songs; following the AcoustID links to MusicBrainz positively identifies ~287 artists with their records in MusicBrainz (as well as Discogs and DBpedia) • Ran DBpedia Spotlight on all artist and composer names, correlated matched entities with MusicBrainz, Wikidata IDs
  • 32. Progress to date • Used metadata cleaning approaches to identify most likely unique names in the DB • Applied acoustic fingerprinting to all 116,000 audio files: matched 1,313 songs; following the AcoustID links to MusicBrainz positively identifies ~287 artists with their records in MusicBrainz (as well as Discogs and DBpedia) • Ran DBpedia Spotlight on all artist and composer names, correlated matched entities with MusicBrainz, Wikidata IDs • Searched for artist and composer names via MusicBrainz, Discogs, and VIAF APIs
  • 33. Entity matching to LOD sites (matches among artists on label, out of 13,752 / artists on sleeve, out of 7,035 / composers, out of 24,211) • Acoustic fingerprinting: 287 (for all names) • DBpedia Spotlight: 272 / 27 / 72 • MusicBrainz lookup: 620 / 434 / 1,151 • Discogs search API: 4,929 / 3,502 / 9,423 • VIAF search API: 3,707 / 3,057 / 8,889 • Note: these are likely in order of decreasing accuracy!
  • 34. Concerns/next steps • Scalable approaches for Q/A of data (new and old) • Discoverability and usability for humans and machines (APIs) • Repository integration: adopting a linked data model will help • Trusted channels for upload to existing knowledge bases: design a formal model? • Work with specialized sub-collections of knowledge bases (topics, regions)? • Test DBpedia Spotlight w/Spanish data pack • Does using links to existing LOD entries just reinforce inequality of artist exposure (“rich get richer”/LOD “echo chamber”)?
  • 35. Thanks! • UCLA Digital Library: Lisa McAulay, Kristian Allen, T-Kay Sangwand, …everyone else (past and present) • Arhoolie Foundation: Tom Diamant, Chris Strachwitz (obviously)
  • 36. Thanks!
  • 37. Peter M. Broadwell @peterbroadwell broadwell@library.ucla.edu Martin Klein @mart1nkle1n martinklein@library.ucla.edu Let the Music Live/ que viva la música
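
The phase 1 step on slide 23 (finding likely unique names by comparing MD5 hashes) could look roughly like the sketch below. This is not the presenters' actual script: the normalization rules (dropping accents, case, punctuation, and extra spaces) and the function names `name_key` and `group_variants` are assumptions made for illustration, and the sample spellings are invented variants of names that appear in the talk.

```python
import hashlib
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Return an MD5 digest of a normalized form of the name (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks (ñ -> n).
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, turn punctuation into spaces, and collapse runs of whitespace.
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    normalized = " ".join(cleaned.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def group_variants(names):
    """Group spellings whose normalized forms hash to the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return list(groups.values())


# Invented spelling variants, for illustration only.
sample = [
    "Los Hermanos Bañuelos",
    "los hermanos banuelos",
    "Los  Hermanos Bañuelos.",
    "Chris Strachwitz",
]
clusters = group_variants(sample)
for cluster in clusters:
    print(cluster)
```

A hash only collapses spellings that normalize to exactly the same string, so this catches accent, case, and punctuation differences but not misspellings or reordered names. It is a single pass over the data, which is why it scales comfortably to more than 100,000 rows.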

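The phase 2 lookups on slide 24 come down to sending names and audio fingerprints to public web services. As a hedged sketch, the code below only builds request URLs for the MusicBrainz artist search and the AcoustID lookup endpoints and does not send anything. The helper names, the placeholder client key, and the placeholder fingerprint are assumptions; a real run would also need an AcoustID API key, a fingerprint generated by a tool such as Chromaprint, a descriptive User-Agent header, and rate limiting.

```python
from urllib.parse import urlencode

MUSICBRAINZ_ARTIST_SEARCH = "https://musicbrainz.org/ws/2/artist"
ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"


def musicbrainz_artist_url(name: str, limit: int = 5) -> str:
    """Build a MusicBrainz artist search URL (hypothetical helper)."""
    # Quote the name so a multi-word name is searched as a phrase.
    escaped = name.replace('"', '\\"')
    params = {"query": f'artist:"{escaped}"', "fmt": "json", "limit": limit}
    return f"{MUSICBRAINZ_ARTIST_SEARCH}?{urlencode(params)}"


def acoustid_lookup_url(client_key: str, fingerprint: str, duration_seconds: float) -> str:
    """Build an AcoustID lookup URL that asks for linked recordings (hypothetical helper)."""
    params = {
        "client": client_key,
        "duration": int(duration_seconds),
        "fingerprint": fingerprint,
        "meta": "recordings",
    }
    return f"{ACOUSTID_LOOKUP}?{urlencode(params)}"


artist_url = musicbrainz_artist_url("Los Hermanos Bañuelos")
# Placeholder key and fingerprint; real values come from AcoustID and Chromaprint.
fingerprint_url = acoustid_lookup_url("YOUR_CLIENT_KEY", "PLACEHOLDER_FINGERPRINT", 187.4)
print(artist_url)
print(fingerprint_url)
```

Keeping URL construction separate from the network call makes it easy to log and review every query before running a batch of tens of thousands of names.
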
Editor's Notes

  1. Thanks! Title is kind of a mouthful – at least it’s bilingual, like the collection I’ll be talking about. I know this is a conference about repositories rather than collections, but as you can see from the title, I’ll be talking about some exploratory work we’ve done to investigate how to integrate this collection and others like it into the broader world of linked data, and in this case specifically linked open data. We have a collection that’s an especially good candidate for this treatment: a multimedia collection of unique cultural materials that is quite extensive and also about 15 years old now, but so far has remained largely isolated online. We’ve recently been researching ways to make it part of the web, not merely on the web.
  2. Here’s the current site for the collection. More statistics are in the next slide, but first, given that the headline of the talk is “let the music live” (and the title of this entire session is “let the content sing”), I figure we should listen to a bit of it. The song is a bitingly satirical two-part corrido, “El Lavaplatos” (the dishwasher), recorded by Los Hermanos Bañuelos in LA in the 1920s. It tells the first-person story of a Mexican immigrant who seeks success in Hollywood but finds only menial labor and dashed dreams, and returns to Mexico more broke than before.
  3. Vital statistics about the materials in the collection. Digitization began in 2001 and has continued to… today. Most of the recordings are technically still in copyright, but many would qualify as “orphan works” (the original label is long since defunct, and ownership of its remaining assets is unclear). So the sound content of the collection is not truly “open” – users outside UCLA are limited to a 90-second snippet. We’d love to provide more, and maybe we will some day.
  4. More statistics about the metadata – the concept of “unique names” is a bit problematic for us, as I’ll discuss soon. The music is actually quite a range of genres, from all parts of Mexico and the southern US. Most recognizable genres would be mariachi ensembles and two-person ballads with voice and guitar. The subject matter of the songs is quite varied as well, but fortunately we have metadata about this, too. Not lyrics in most cases, but tags.
  5. Given this impressive number of song tags, I saw an opportunity to create a visual overview of the human-assigned descriptors of the songs. Maybe you’ve seen “genre maps” like this for LastFM or Spotify. Those usually take a few days on a high-performance computing cluster running deep learning self-organizing map algorithms. All I had was my laptop, so I just used Gephi. If two tags were used for the same song, I put a line between them, then generated a network layout. Note that there are 2 types of nodes – song genres (mostly in Spanish) and tags (in English). It’s pretty revealing to see how they are positioned relative to each other. Ex: corrido.
  6. Given this impressive number of song tags, I saw an opportunity to create a visual overview of the human-assigned descriptors of the songs. Maybe you’ve seen “genre maps” like this for LastFM or Spotify. Those usually take a few days on a high-performance computing cluster running deep learning self-organizing map algorithms. All I had was my laptop, so I just used Gephi. If two tags were used for the same song, I put a line between them, then generated a network layout. Note that there are 2 types of nodes – song genres (mostly in Spanish) and tags (in English). It’s pretty revealing to see how they are positioned relative to each other. Ex: corrido.
  7. A little more historical background on the collection: it was begun by Chris Strachwitz, a descendant of aristocracy in what is now Poland. He eventually came to the US as a refugee after WWII, got really into US popular music, moved to California, went to Pomona and then Berkeley, and became an avid record collector – the more obscure the better. He is friends with Les Blank and had a radio show on KPFK. He founded Arhoolie Records in 1960 and started collecting rare 78s of Mexican and Mexican American music around then.
  8. Here’s Arhoolie Records and its retail outlet, the Down Home Record Store, on San Pablo Ave in El Cerrito, just north of Berkeley.
  9. The Arhoolie Foundation’s website was blocked by the Web filter at the UCLA Conference Center for being an “advocacy organization.” Maybe Donald Trump is in charge of the firewall???
  10. There have now been several iterations of the Frontera collection site; this is the most recent, built in Drupal using the multi-lingual interface module.
  11. So far we’ve largely avoided the problems with a lack of multi-lingual support in institutional repositories by not using one – the metadata is ONLY in Drupal, and the music content is stored on an Isilon file system connected to a streaming server.
  12. The collection has received quite a bit of support from various sources over the past 15 years, for digitization and access. This has resulted in a large collection but with its share of growing pains, as I’ll discuss in a minute.
  13. At least one book has already been written using the collection as a primary source (based primarily on the digitization of the first 45,000 or early 78s)
  14. It has also been described as a multimedia encyclopedia, since it’s not just music – it’s images and official metadata, plus user-submitted metadata including lyrics. This is the part we’d really like to push now – augmenting participation in the collection by scholars and enthusiasts, and improving the visibility of the materials on the wider web. It turns out these goals are interrelated, and they might also help to address some of the metadata issues that have developed over the past decade as more and more content has been layered onto the site, often without the benefit of very much advance planning.
  15. These are some of the challenges we now need to overcome to integrate the collection into the wider web; conversely, the integration process will actually help us to resolve some of these issues. Song, album, release: for 78s they’re all the same thing, more or less, but for LPs and cassette tapes, the paradigm breaks down.
  16. These are the goals for our “linked data outreach” project. This is something that we’ve wanted to do with the Frontera collection for a long time, but haven’t had the time to consider until now.
  17. Some of the linked open data repositories we’d like the Frontera records to link out to. National Library of Wales is doing some interesting work that involves uploading records to Wikidata. This could work as well for music data, but if possible we’d prefer to work with MusicBrainz, which seeks to be the central source of song, artist, album, etc. authority IDs on the web.
  18. DBpedia is another model.
  19. Discogs is semi-commercial, but has a pretty active and committed user base, mostly of electronic music DJs, that is potentially larger than MusicBrainz’s (note that MB is technically nonprofit, despite providing most of the Google Knowledge Graph data for music and musicians).
  20. Projects that have already prototyped LOD integration for music-related archives: Linked Jazz. We can use the NYPL Labs’ crowd-sourcing interfaces to add community-submitted data to Frontera, and to verify inter-knowledge base links.
  21. This is a demo of one of the NYPL Labs’ very addictive crowdsourcing interfaces; this one involves selecting the proper DBpedia entity (Wikipedia article) match for a name.
  22. It’s usually the one with a picture.
  23. Note: package solutions like MusicBrainz Picard (basically a non-commercial iTunes on steroids) are great, but probably don’t scale to 100k nodes. Audio fingerprints as the best authority source for music. Mention Shazam, maybe Hatto case?
  24. More experimental/challenging (this is harder)
  25. This is a discussion I kicked off a few weeks ago on the MusicBrainz forums about what software would be necessary, and what the policy implications would be, of a bulk upload of the majority non-overlapping records from a cultural heritage data set like Frontera to MusicBrainz.
  26. Posted to the forum discussion: what happens when upload bots are allowed. Note that this is not necessarily a bad thing, though in this case it was a Japanese EDM music bot that was submitting a lot of bad data, so it wasn’t necessarily a net win.
  27. MusicBrainz has decided it has no choice but to trust Discogs. So if there’s an entity in Discogs, there exist scripting tools to fashion a new MusicBrainz entity to submit.
  28. Note: the fact that so few (usually at most 30%) of entities in the Frontera data set are matched in any of these repositories is potentially a good outcome; it means that most of the entities in Frontera are not attested anywhere else, and devising a workflow in which we auto-generate LOD records for these new entities that are then uploaded to an archive like MusicBrainz will play a huge role in increasing the visibility of these musicians, who otherwise would eventually have been forgotten.
  29. Check Spanish spelling/accents!
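
The network described in the notes for slide 5 (a line between any two labels used on the same song) can be sketched as a co-occurrence count. The code below is an illustration rather than the script used for the talk: the function name `cooccurrence_edges` and the sample songs are made up, and the tags "immigration", "work", and "love" are hypothetical. The Source/Target/Weight column names are the ones Gephi's spreadsheet import recognizes for a weighted edge list.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(song_labels):
    """Count how often each pair of labels (genres or tags) appears on the same song."""
    edges = Counter()
    for labels in song_labels:
        # Sort and de-duplicate so each pair is counted once per song, in a stable order.
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return edges


def edges_to_csv(edges) -> str:
    """Serialize the edge counts as a weighted edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


# Made-up songs: each list holds one song's genre plus its tags.
songs = [
    ["corrido", "immigration", "work"],
    ["corrido", "immigration"],
    ["bolero", "love"],
]
edges = cooccurrence_edges(songs)
print(edges_to_csv(edges))
```

Because genres and tags go into the same label list, both node types end up in one graph, which is what lets the layout show where a genre such as corrido sits relative to the English tags.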