Berner Fachhochschule | Haute école spécialisée bernoise | Bern University of Applied Sciences
Panel on Authority Files and Controlled Vocabularies
Welcome & Introduction
Swiss Open Cultural Data Hackathon, 3 June 2020
Beat Estermann
▶ Bern University of Applied Sciences, Institute for Public Sector Transformation
Image (cropped): http://lod-cloud.net/, 2017, CC BY
The slides are provided under the Creative Commons BY 4.0 License
Today’s Panel
▶ Nicola Carboni, University of Zurich, Swiss Art Research Infrastructure
• Resources and methodologies gathered by the Swiss Art Research Infrastructure (SARI) for managing, organising and semantifying reference information
▶ Sarah Amsler, University of Zurich, Swiss Art Research Infrastructure
• Translation of the Art & Architecture Thesaurus (AAT)
▶ Beat Estermann, Bern University of Applied Sciences
• Swiss GLAM Inventory on Wikidata
• Authorities in the Context of the LOD Ecosystem for the Performing Arts
Key to Linked Open Data: A Shared Language to Describe the World
Conceptual Models (Ontologies)
Controlled Vocabularies
?
«Named Entities»
Base Registers
Authority Files
LOD Publication of Named Entities and Controlled Vocabularies
Study funded by E-Government Switzerland, 2019
▶ Great demand for base registers relating to territorial divisions, geographical location of objects, identification of organizations.
▶ High demand for data from the Federal Statistical Office and Swisstopo; the latter is doing a great job in publishing LOD, the former less so.
▶ Most data publishers require additional support in view of LOD publication of base registers (tools, guidelines, consulting).
Potential for Data Usage
Feasibility of Data Publication
Readiness of Data Providers
Goal: All Important Base Registers Published as LOD
Prio 1
Source: Estermann, B., Gschwend, A., Haller, S., Parrales, E. (2020): Basisregister und kontrollierte Vokabulare als Wegbereiter für Linked Open Data in der Schweiz. Innovationsprojekt von E-Government Schweiz im Auftrag des Schweizerischen Bundesarchivs. Berner Fachhochschule, Institut Public Sector Transformation.
Authority Files and Controlled Vocabularies Specific to the Heritage Sector and the Digital Humanities
Authority File | Data Publisher | CH / Int
Gemeinsame Normdatei (GND) | German National Library | Int
Virtual International Authority File (VIAF) | Online Computer Library Center (OCLC) | Int
Inventories of Swiss Heritage Institutions | Swiss National Library, Swissbib/UB Basel, Swiss Museums Association, Association of Swiss Archivists | CH
Inventories of Cultural Properties in Switzerland | Swiss Federal Office for Civil Protection, cantonal and municipal offices for monuments protection | CH
Swiss Photography Metadata | Foto CH | CH
Authority files on Swiss history | histHub | CH
Metadata of the Swiss Historical Dictionary | Historisches Lexikon Schweiz (HLS) | CH
Biographical catalogue of the SNL | Swiss National Library | CH
Metagrid (CH-related historical authority files) | SAGW / Dodis | CH
Union List of Artist Names (ULAN) | Getty Research Institute | Int
Inventory of CH-related artists | SIK ISEA | CH
Dewey Decimal Classification | Online Computer Library Center (OCLC) | Int
Art and Architecture Thesaurus (AAT) | Getty Research Institute | Int
Photo: Joop Haitsma (?), 1943, Wikimedia Commons, CC BY-SA 3.0
Swiss GLAM Inventory on Wikidata
Wikidata Ingest as a By-product of an Online Survey
▶ Combined data sources for the OpenGLAM Benchmark Survey in 2014:
• Museums Inventory of the Swiss Museums Association
• ISIL Inventory maintained by the Swiss National Library
• Archives Database of the Association of Swiss Archivists
• Inventory of Cultural Properties by the Federal Office for Civil Protection
• etc.
▶ Published the dataset as open data in 2015
▶ Ingested the data into Wikidata in 2016
Motivation
▶ Gain practical experience with linked data publication via Wikidata
▶ Provide the data to the community; discontinue maintenance of the dataset on my side
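Once an inventory of this kind lives on Wikidata, it can be read back through the public SPARQL endpoint. The following is a minimal sketch of such a query, not the query used in the project. It assumes the institutions are modelled with the common Wikidata properties `P31` (instance of), `P279` (subclass of), `P17` (country) and `P791` (ISIL); the function names and the `GLAM_CLASSES` mapping are illustrative. The code only builds the query and the request URL; it does not send anything.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata classes for three institution types (museum, library, archive).
GLAM_CLASSES = {"museum": "Q33506", "library": "Q7075", "archive": "Q166118"}


def build_glam_query(institution_type: str, limit: int = 100) -> str:
    """Return a SPARQL query listing Swiss institutions of one type, with ISIL if present."""
    class_qid = GLAM_CLASSES[institution_type]
    return f"""
SELECT ?item ?itemLabel ?isil WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;   # instance of (a subclass of) the class
        wdt:P17 wd:Q39 .                     # country: Switzerland
  OPTIONAL {{ ?item wdt:P791 ?isil . }}      # ISIL identifier, if recorded
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,de,fr,it" . }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL for the public endpoint (no request is sent here)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Calling `build_request_url(build_glam_query("library"))` yields a URL that can be opened with any HTTP client. How complete the result is depends on how consistently the items have been modelled, which is one of the harmonization issues raised later in the talk.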
Pilot Project «Sum of All Swiss GLAMs»
OpenGLAM CH and partners, 2019/2020
Pilot Project «Referencing Archival Fonds on Wikidata»
Swiss National Library, ETH Library, Zurich Central Library, OpenGLAM CH
Further Information: This Month in GLAM, January 2020 edition
Referencing Archival Fonds on Wikidata feeds into Graph-based Query Expansion on Library Discovery Systems
Pilot Project «Sum of All Swiss GLAMs» – Example Infoboxes
English Wikipedia: “Zentralbibliothek Zürich”
French Wikipedia: “Lavoslav Ružička”
Crowdsourcing Campaigns (e.g. International Museum Day)
Current Challenges / Future Plans
▶ Integration / Synchronization with Institutional Databases
• Evolution of the complementarity of Wikidata and institutional databases
• Who will have «authority» over which aspects of the data?
• Who will maintain / supplement which data on which platform?
▶ Data Modelling Issues
• Representing organizational history (mergers, acquisitions, etc.)
• Harmonizing data modelling practices
▶ Crowdsourcing / Expert Sourcing Campaigns
▶ Develop Further Use Cases
▶ Internationalization
Authorities in the Context of the Linked Open Data Ecosystem for the Performing Arts (LODEPA)
LODEPA – Many Stakeholders, One Knowledge Base
Distributed
Architecture
Key Stakeholders
Usage Scenarios
Source: https://linkeddigitalfuture.ca/
Relevant Base Registers / Authority Files
Databases
• ISNI
• VIAF
• MusicBrainz
• Discogs
• IMDb
• Songkick
• Wikidata
Entities
• Works (literary, musical, choreographic)
• Editions/Translations of Works
• Character Roles
• Performing Arts Buildings
• Humans (writers, composers, performing arts professionals)
• Organizations (presenting organizations, musical ensembles, theatre troupes, dance troupes)
Base registers and authority files play a key role in interlinking datasets from various sources.
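The interlinking role of shared identifiers can be illustrated with a small sketch. It assumes two datasets whose records each carry an identifier from the same authority file (for example a VIAF or ISNI number) under a common field name. The function name, field names and sample values in the usage note are made up for illustration.

```python
from collections import defaultdict


def interlink(records_a, records_b, id_field):
    """Pair records from two datasets that carry the same identifier value."""
    # Index the second dataset by identifier so each lookup is cheap.
    index = defaultdict(list)
    for rec in records_b:
        key = rec.get(id_field)
        if key:
            index[key].append(rec)

    links = []
    for rec in records_a:
        key = rec.get(id_field)
        # Records without the identifier simply produce no link.
        for match in index.get(key, []):
            links.append((rec, match))
    return links
```

For example, `interlink([{"name": "A", "viaf": "x1"}], [{"label": "A2", "viaf": "x1"}], "viaf")` returns one pair. Records that lack the identifier stay unlinked, which is why coverage of the authority files matters as much as their existence.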
Some statistics (Wikidata, May 2020)
• 470’000 musical works
• 23’000 plays
• 900 choreographic works
• 24’000 character roles
• 22’000 performing arts buildings
• 280’000 musicians
• 280’000 actors/actresses
• 96’000 musical ensembles
• 5’600 theatre troupes
• 400 dance troupes
and steadily growing...
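The slides do not say how these figures were obtained. One plausible way to produce counts of this kind is a SPARQL aggregate over a Wikidata class and its subclasses, sketched below. The class identifier is passed in by the caller and has to be looked up on Wikidata; the function names are illustrative, and results will differ from the May 2020 figures as the data grows. Broad classes may also exceed the time limit of the public query service.

```python
import re


def build_count_query(class_qid: str) -> str:
    """Return a SPARQL query counting items that are instances of a class or its subclasses."""
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item identifier: {class_qid!r}")
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
}}
""".strip()


def parse_count(response: dict) -> int:
    """Extract the count from a SPARQL JSON results document."""
    return int(response["results"]["bindings"][0]["count"]["value"])
```

`parse_count` expects the standard SPARQL JSON results structure, in which each binding maps a variable name to an object with a `value` string.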
Use Case: Ehrenreich Collection
Photos: Beat Estermann (CC BY 4.0)
Roy Ehrenreich in 1981 during a trip in Nepal.
Unidentified photographer. All rights reserved.
Ehrenreich Collection Database
▶ “Artists” Catalogue:
• Originally: a word-processing document of 11’581 pages
• Transformed into 115’000 lines in Excel
• Currently: being transferred into a NoSQL database
▶ Challenges:
• Data is not normalized
• Spelling mistakes
• Data required for copyright assessment is missing (e.g. composers’ dates of death)
• For use in apps, the data should be enriched with data from other sources
• Data quality is insufficient for a direct ingest into Wikidata
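Two of the challenges listed for this database, unnormalized names and missing dates of death, can be made concrete with a short sketch. It is not the project's actual cleaning code. It assumes names may appear as "Family, Given" with irregular spacing, and it assumes a protection term of 70 years after the author's death, counted to the end of the calendar year. The term is a parameter because the applicable rule depends on jurisdiction and type of work.

```python
import unicodedata
from datetime import date
from typing import Optional


def normalize_name(raw: str) -> str:
    """Collapse whitespace, apply Unicode NFC, and turn 'Family, Given' into 'Given Family'."""
    text = unicodedata.normalize("NFC", " ".join(raw.split()))
    if text.count(",") == 1:
        family, given = (part.strip() for part in text.split(","))
        if family and given:
            text = f"{given} {family}"
    return text


def is_public_domain(
    death_year: Optional[int],
    term_years: int = 70,          # assumed term, adjust per jurisdiction
    today: Optional[date] = None,
) -> Optional[bool]:
    """Return None when the death year is unknown instead of guessing."""
    if death_year is None:
        return None
    current_year = (today or date.today()).year
    # Protection is assumed to run until the end of year death_year + term_years.
    return current_year > death_year + term_years
```

Returning `None` for an unknown death year keeps the gap visible: such records need enrichment from an external source before any rights statement can be made.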
Enhancement of the Ehrenreich Collection Database
Ehrenreich Collection Database
Performance history of opera-related performances
Performances
Artists
Art Songs
Composers
Authors
Operas / Arias
Operatic Characters
Voice Types (controlled vocabulary)
Challenges
▶ Ideally, we would link to well-documented productions/performances on Wikidata, but we are still far from being able to do so.
▶ Ingesting data is a tedious task
▶ Data holders are not always cooperative, though some are
▶ Data matching is challenging if entities are «underspecified» (e.g. only the first name and family name of a person in the source and/or the target database)
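The matching problem with «underspecified» entities can be sketched as follows. This is an illustrative approach, not the method used in the project: names are compared with Python's `difflib`, and a birth year is used as disambiguating evidence only when both sides have one. The record layout (`name`, `birth_year`), the threshold and the status labels are assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def match_candidates(source: dict, targets: list, threshold: float = 0.85) -> list:
    """Return (target, score, status) tuples for plausible matches, best score first."""
    results = []
    for target in targets:
        score = name_similarity(source["name"], target["name"])
        if score < threshold:
            continue
        s_year = source.get("birth_year")
        t_year = target.get("birth_year")
        if s_year is not None and t_year is not None:
            if s_year != t_year:
                continue  # conflicting evidence, not the same person
            status = "consistent"
        else:
            # Only the name is available on at least one side: underspecified.
            status = "needs_review"
        results.append((target, score, status))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

With name-only records, every same-named candidate comes back as `needs_review`, which is the situation described above: without a further attribute such as a date or an authority identifier, the match cannot be decided automatically.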
However...
▶ Little strokes fell big oaks
▶ Data ingests on Wikidata are cumulative and enable many different use cases
▶ Missing data on Wikidata can be provided through crowdsourcing / expert sourcing
▶ In 2020, a student at BFH started adding data from Wikidata to the Ehrenreich Collection Database on a large scale
Get in Touch!
Bern University of Applied Sciences
Institute for Public Sector Transformation
Prof. Beat Estermann
beat.estermann@bfh.ch
Thank You For Your Attention!
