Wikidata Introductory Workshop
Friends of OpenGLAM
Beat Estermann, Bern, 13 May 2019
Unless otherwise noted, the content of this presentation is made available under the CC BY 4.0 license.
• Short introduction to Wikidata
• What is its purpose?
• How does it work?
• Wikidata + GLAM
• Aim & vision
• Where do we stand?
• Zooming in on data related to heritage institutions
• Where would you / your institution fit in?
• Let’s Practice!
• Querying Wikidata
• Editing Wikidata
On the Programme Today
If you want to build a ship, don’t drum up people together to collect wood and don’t
assign them tasks and work, but rather teach them to long for the endless immensity
of the sea. – Antoine de Saint-Exupéry
Course page on Wikidata:
https://tinyurl.com/WD-intro-2019
Short Introduction to Wikidata
• What is its purpose?
• How does it work?
Imagine a world in which every single human being can freely
share in the sum of all knowledge. That's our commitment.
Structured
Commons
Purpose of Wikidata
• Centralized Interwiki-Links [Example: Bern]
• Centralized Data Management for Infoboxes [Example: Ferdinand de Saussure]
• Centralized Data Management for Lists [Example: Lista de pinturas de A. Norfini]
• Possibility of Querying the Data in a Standardized Format
[Example Queries / External Applications]
« The Sum of All Human Knowledge» as Linked Open Data
• Multilingual
• With Sourced Statements
• Freely usable by anyone (CC Zero)
Structure of Wikidata – RDF Triples
Subject | Predicate | Object
Bern (instance) | is the capital of (property) | Switzerland (instance)
Switzerland | capital | Bern
Switzerland (instance) | is a (property) | Country (class)
Switzerland (instance) | GDP (property) | $518 billion (value)
Qualifier: point in time | 2015 (value)
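The triples on this slide can be sketched in a few lines of Python. This is only an illustration of the subject / predicate / object idea and of a qualifier attached to a statement. The class names `Triple` and `Statement` and the helper `objects_of` are made up for this example and are not part of any Wikidata tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Triple:
    """One subject / predicate / object statement (hypothetical helper class)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Statement:
    """A triple plus optional qualifiers that put the value into context."""
    triple: Triple
    qualifiers: Dict[str, str] = field(default_factory=dict)


# The three plain triples from the slide.
triples: List[Triple] = [
    Triple("Bern", "is the capital of", "Switzerland"),
    Triple("Switzerland", "capital", "Bern"),
    Triple("Switzerland", "is a", "Country"),
]

# The GDP statement carries a qualifier saying when the value applied.
gdp = Statement(
    Triple("Switzerland", "GDP", "518 billion $"),
    qualifiers={"point in time": "2015"},
)


def objects_of(subject: str, predicate: str, data: List[Triple]) -> List[str]:
    """Return every object stored for the given subject and predicate."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    print(objects_of("Switzerland", "capital", triples))
    print(gdp.qualifiers)
```

Keeping the qualifier outside the triple mirrors the slide: the core statement stays a simple triple, and the context (here the year) is attached to it.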
Structure of Wikidata – Linked Data
Subject | Predicate | Object
Bern (Q70) | is a (P31 - instance of) | municipality of Switzerland (Q70208)
Bern (Q70) | is the capital of (P1376 - is the capital of) | Switzerland (Q39)
Berlin (Q64) | is a (P31 - instance of) | municipality of Germany (Q262166)
Berlin (Q64) | is the capital of (P1376 - is the capital of) | Germany (Q183)
Switzerland (Q39) | is a (P31 - instance of) | country (Q6256)
Germany (Q183) | is a (P31 - instance of) | country (Q6256)
municipality of Switzerland (Q70208) | is a subclass of (P279 - subclass of) | municipality (Q15284)
municipality of Germany (Q262166) | is a subclass of (P279 - subclass of) | municipality (Q15284)
Each part of a triple is identified by a URI:
Bern (URI) | is the capital of (URI) | Switzerland (URI)
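To see why the identifiers matter, the table above can be loaded into a small Python list and traversed. The sketch below assumes the usual Wikidata URI patterns (`http://www.wikidata.org/entity/` for items and `http://www.wikidata.org/prop/direct/` for direct property claims). The function names are invented for this example, and the data is limited to the eight triples on the slide.

```python
from typing import List, Set, Tuple

ENTITY_URI = "http://www.wikidata.org/entity/"
PROPERTY_URI = "http://www.wikidata.org/prop/direct/"

# (subject, predicate, object) using the identifiers from the slide.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Q70", "P31", "Q70208"),      # Bern: instance of municipality of Switzerland
    ("Q70", "P1376", "Q39"),       # Bern: capital of Switzerland
    ("Q64", "P31", "Q262166"),     # Berlin: instance of municipality of Germany
    ("Q64", "P1376", "Q183"),      # Berlin: capital of Germany
    ("Q39", "P31", "Q6256"),       # Switzerland: instance of country
    ("Q183", "P31", "Q6256"),      # Germany: instance of country
    ("Q70208", "P279", "Q15284"),  # municipality of Switzerland: subclass of municipality
    ("Q262166", "P279", "Q15284"), # municipality of Germany: subclass of municipality
]


def as_uris(triple: Tuple[str, str, str]) -> Tuple[str, str, str]:
    """Expand the short identifiers of one triple into full URIs."""
    s, p, o = triple
    return (ENTITY_URI + s, PROPERTY_URI + p, ENTITY_URI + o)


def superclasses(cls: str) -> Set[str]:
    """Return cls and every class reachable from it via P279 (subclass of)."""
    seen = {cls}
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "P279" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def instances_of(cls: str) -> List[str]:
    """Items whose P31 class is cls or one of its subclasses."""
    return sorted(s for s, p, o in TRIPLES if p == "P31" and cls in superclasses(o))


if __name__ == "__main__":
    print(as_uris(TRIPLES[0]))
    print(instances_of("Q15284"))  # items that are some kind of municipality
```

Neither Bern nor Berlin is stated to be a municipality (Q15284) directly; the link only appears once the subclass statements are followed, which is the point of linking the data.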
Structure of a Wikidata Entry
Subject | Predicate | Object
Douglas Adams | spouse | Jane Belson
start / end time: 25 Nov. 1991 – 11 May
Ref.
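A statement with its qualifiers and references can also be retrieved with SPARQL. The sketch below only builds the query text, which can be pasted into the Wikidata Query Service. It assumes the identifiers Q42 (Douglas Adams), P26 (spouse), P580 (start time) and P582 (end time) and relies on the prefixes that the query service predefines. The function name `statement_query` is made up for this example.

```python
def statement_query(item: str, prop: str, language: str = "en") -> str:
    """Build a SPARQL query listing values, time qualifiers and references.

    p:  links the item to the statement node,
    ps: links the statement node to its value,
    pq: links the statement node to a qualifier value.
    """
    return f"""
SELECT ?value ?valueLabel ?start ?end ?reference WHERE {{
  wd:{item} p:{prop} ?statement .
  ?statement ps:{prop} ?value .
  OPTIONAL {{ ?statement pq:P580 ?start . }}
  OPTIONAL {{ ?statement pq:P582 ?end . }}
  OPTIONAL {{ ?statement prov:wasDerivedFrom ?reference . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
""".strip()


if __name__ == "__main__":
    # Spouse statements of Douglas Adams, with start / end time and references.
    print(statement_query("Q42", "P26"))
```

The qualifiers and the reference are wrapped in OPTIONAL so that statements lacking them still show up, which makes the gaps visible.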
It’s Your Turn!
• Does the WD item of your place of residence / provenance have a statement for the mayor? Is it up to date?
• Does the Wikipedia page contain this information?
• How about the Catalan Wikipedia?
Find the information on the Internet and add it to Wikidata!
Add it also to Wikipedia! (in your language and in Catalan!)
Image: US Department of Commerce, Bureau of the Census, Public Information Office, around 1940. NARA. Public Domain.
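The first check in this exercise can also be done with a query instead of browsing the item page. The sketch below assumes that the mayor is recorded with P6 (head of government), which is the property commonly used for this on Wikidata, and that the query service web interface accepts a URL-encoded query after the `#` in its address. The function names are invented for this example, and nothing is sent over the network.

```python
import re
from urllib.parse import quote

QUERY_UI = "https://query.wikidata.org/#"


def head_of_government_query(place_qid: str, language: str = "en") -> str:
    """Build a SPARQL query for the current P6 value(s) of a place."""
    if not re.fullmatch(r"Q[1-9]\d*", place_qid):
        raise ValueError(f"not a Wikidata item ID: {place_qid!r}")
    return (
        "SELECT ?mayor ?mayorLabel WHERE {\n"
        f"  wd:{place_qid} wdt:P6 ?mayor .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )


def query_link(sparql: str) -> str:
    """Return a link that opens the query in the query service web interface."""
    return QUERY_UI + quote(sparql, safe="")


if __name__ == "__main__":
    query = head_of_government_query("Q70")  # Q70 is Bern
    print(query)
    print(query_link(query))
```

An empty result means the item has no such statement yet; a result that names a former mayor means the statement needs updating.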
Wikidata + GLAM
• Aims & vision
• Where do we stand?
• Zooming in on data related to heritage institutions
• The aim of this project is to coordinate, facilitate and promote
the ingestion of cultural heritage related data into
Wikidata, to facilitate the cleansing and enhancement of this
data and to promote its use across Wikipedia, its sister
projects and beyond.
• It is our vision to establish Wikidata as a central hub for data
integration, data enhancement, and data management in
the heritage domain.
Aim and Vision (WikiProject Cultural Heritage)
• Establish Wikidata as a database that covers the entire world’s
cultural heritage.
• Establish Wikidata as a central hub that interlinks GLAM collections
around the world and provides links to bibliographic, genealogical,
scientific and other collections of information; create the ultimate
authority file.
• Foster truly multilingual and global collaboration among people
from various backgrounds.
• Leverage synergies between institutions, reduce duplicate work.
• Encourage debate in the community by highlighting and
interrogating differences in perspective.
• Provide a single source of data for some of the most popular web
sites and apps, including Wikipedia infoboxes and lists.
Vision (Blog posts: Stinson et al. 2016; Thornton / Cochrane 2016; Poulter 2017)
Thematic Projects
https://www.wikidata.org/wiki/Wikidata:WikiProject_Cultural_heritage [Example]
Current Trends in the Heritage Sector (1/2)
Source: OpenGLAM Benchmark Survey
N = 1560
Wikidata
Current Trends in the Heritage Sector (2/2)
[Diagram: Services for Metadata Extraction & Enhancement, with the components Knowledge Graph, Entity Extraction, Inter-linking, Machine Learning, and Human in the Loop]
Source: Bern University of Applied Sciences
Core Aspects of Linked Data Publication
Source: eCH-0205 – Linked Open Data
• http://make.opendata.ch/wiki/data:glam_ch
• Personnalités Vaudoises (BCUL)
• Swiss Photography Metadata (Büro für Fotografiegeschichte)
• Artist data from the SIKART Lexicon on art in Switzerland (SIK-ISEA)
• Metadata of the Historical Dictionary of Switzerland (HLS)
• PCP Inventory (Federal Office for Civil Protection)
• Inventory of Historical Monuments (Canton of Zurich)
• Inventory of Historical Monuments (City of Zurich)
• Inventory of classified Gardens and Parks (City of Zurich)
• Art in the Urban Space (City of Zurich)
• Swiss GLAM Inventory (OpenGLAM)
• Inventory of Research Libraries in Switzerland (Swissbib)
• ISplus Swiss (G)LAM Inventory (Swiss National Library)
• Schauspielhaus Zürich Repertoire of Theatre and other Productions, 1938–1968
• Swiss Theatre Metadata (Swiss Theatre Collection)
• Plazi TreatmentBank (repository of the world's species) (Plazi.org)
• Historical Statistics of Switzerland (University of Zurich)
Data Provision – Which Datasets are Useful?
Challenges Related to Ontology Development (1/2)
All rights reserved.
• Coping with the Bazaar:
• Sometimes changes to property definitions are too easily made by
volunteers.
• There is a rigorous process for creating new properties, but not for
changing definitions of properties or creating new classes.
• No master language; how to keep translations of definitions in sync?
• Sometimes, different approaches are used to model the same thing.
• What are good design principles?
• Aim for re-usability of properties across various domains.
• Select high-priority areas first, do not try to solve everything overnight for
the entire cultural heritage domain.
• Use existing databases in a given domain as a starting point to drive
ontology development.
• …
• Finding a balance between:
• The expressive power of an ontology
• Its practicability when it comes to large scale use by many people
• Its queryability (usability from the perspective of data users)
Challenges Related to Ontology Development (2/2)
• Mapping Between Data Models
• Getting an overview of appropriate properties and classes can be a
time-consuming exercise.
• Creating new properties requires community agreement and may involve
lengthy discussions and compromises.
• There is still a lot of work to be done in the area of typologies and
thesauri (= controlled vocabularies) [Example]
• Matching Items / Disambiguation
• There are tools like Mix’n’Match and OpenRefine to support this, but it
remains a major challenge, esp. with datasets which haven’t resolved this
issue internally.
• Incorrect / Incoherent Data on Wikidata
• Many data ingestion projects require cleaning up existing data.
• Repeated Ingestion / Updates
• How to approach the historicization of data?
• How to set up processes to regularly update data?
Challenges Related to Data Ingestion
N.B.: We are not filling a void or starting from scratch, but contributing to an
existing ecosystem of data, data models, and community members!
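The matching problem mentioned above can be illustrated with a deliberately naive sketch: compare an institution name from a local dataset with labels already present on Wikidata and keep the close candidates for human review. This is not how Mix’n’Match or OpenRefine work internally. The function names, the similarity threshold and the placeholder identifiers (`X1` to `X3`, not real item IDs) are assumptions made for this example.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalise(name: str) -> str:
    """Lower-case, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def candidates(
    name: str, labels: Dict[str, str], threshold: float = 0.85
) -> List[Tuple[str, float]]:
    """Return (identifier, score) pairs whose label is similar to name.

    The result is sorted by descending score and is meant as a list of
    suggestions for a human to confirm, not as an automatic match.
    """
    target = normalise(name)
    scored = [
        (item_id, SequenceMatcher(None, target, normalise(label)).ratio())
        for item_id, label in labels.items()
    ]
    return sorted(
        [(item_id, score) for item_id, score in scored if score >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder identifiers; real work would use labels fetched from Wikidata.
    existing = {
        "X1": "Kunstmuseum Bern",
        "X2": "Kunsthaus Zürich",
        "X3": "Kunstmuseum Basel",
    }
    print(candidates("Kunsthaus  Zurich", existing))
```

Names alone are rarely enough for disambiguation; in practice further attributes such as location or existing external identifiers help to separate institutions with similar names.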
Example: Data Cleansing
• Establishing and Documenting Data Quality
• Getting rid of duplicates
• Dealing with incorrect and inconsistent data
• How to monitor data quality and data completeness?
• Building a Network of Trust
• Linking all statements to a reliable source
• In the future: “Signed Statements” 
• Data Exchange Between Wikidata and Primary Databases
• Data synchronization: How to keep data mutually up to date?
• How to make it easier for GLAM employees to follow
changes/improvements to their data on Wikidata?
Challenges Related to Data Maintenance
• Chicken-or-Egg Problem:
• Data usage drives data quality & completeness
• Data quality & completeness are prerequisites of data use
See also: How Wikidata is Solving its Chicken-or-Egg Problem in the Field of Cultural Heritage
Challenges Related to Data Use
How about data related to heritage
institutions?
Some Ideas…
• Create an international database of all heritage
institutions on the basis of Wikidata.
• Use this database to populate infoboxes and lists on
Wikipedia.
• Call upon heritage institutions to complement their
Wikidata entries and to keep them up-to-date.
• Call upon Wikipedians to create/enhance Wikipedia
articles about the institutions and their holdings.
• Get heritage institutions to enhance the inter-linking
between their own databases and Wikidata.
• Get heritage institutions to make their content available
through Wikidata & Wikimedia Commons.
• Encourage the development of third-party applications
making use of the worldwide inventory of heritage
institutions.
Vision
• WikiProject “Heritage Institutions”:
• Aim and scope (may require updating)
• Data structure
• Typology (controlled vocabularies; may need expanding)
• Overview of data sources to import data from
• Use cases (requires updating)
• Sample queries / maintenance queries
• So far, information about 40,000 museums, 25,000 libraries, and
4,000 archives has been ingested into Wikidata. The quality and
completeness of the entries vary considerably.
• For some countries, there is virtually full coverage of all existing
institutions (e.g. Switzerland, Ukraine, Brazil). For some countries,
national inventories exist, but have not been ingested yet (e.g.
Portugal, Russia, Spain). And in many further countries, there are no
or only very fragmentary inventories. At the moment, there is no
systematic overview of the current coverage per country.
Status Quo – Wikidata
There are currently two projects aiming at the implementation of a worldwide database of
heritage institutions on Wikidata:
- FindingGLAMs (project run by Wikimedia Sweden, UNESCO, and the Wikimedia Foundation)
- Sum of All GLAMs (project run by the Wiki Movement Brazil)
• A few Wikipedias already use Wikidata-driven infobox templates for
museums, libraries, and/or archives
(a central overview is lacking)
• The Portuguese Wikipedia uses Wikidata-driven lists in its main
namespace.
• In some Wikipedias, the use of Wikidata-driven infobox templates is
still a disputed practice; this is even more true for Wikidata-driven
lists.
• A few Wikipedias have a Mbabel template that can be used to create
draft articles on museums based on data from Wikidata.
• There are many missing articles about heritage institutions in
Wikipedia... ;-)
Status Quo – Wikipedia
• Wikimedia CH campaign on the occasion of International
Museum Day 2018 (creation and improvement of Wikipedia articles)
• Swiss Open Cultural Data Hackathon (data ingests and creation of
applications)
• Wiki Loves Monuments (international photo contest)
Status Quo – Campaigns & Community (Switzerland)
Discussion
• Where would you / your institution fit in?
Let’s Practice
• Querying Wikidata
• Editing Wikidata
Querying & Editing Wikidata
Querying & Editing Wikidata
• Schauspielhaus productions without a “based on” statement
• Swiss heritage institutions without a “director” statement
• Swiss heritage institutions without a German/French label
• Items about Swiss museums with statements that are not properly sourced
Querying Wikidata & Editing Wikipedia
• The performers with the most appearances in plays at Schauspielhaus Zürich but
without a Wikipedia article in German
Exploring Ontologies & Editing Wikidata
• Typology of heritage institutions
• Typology of concerts, recordings, etc.
The task descriptions can be found on the course page:
https://tinyurl.com/WD-intro-2019
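Two of the maintenance tasks listed above follow the same pattern: select a set of items, then keep only those that lack something. The sketch below builds such queries as text for the Wikidata Query Service. It is not taken from the course page, whose actual queries may differ. It assumes the identifiers Q33506 (museum), Q39 (Switzerland), P17 (country) and P1037 (director / manager), and the function names are made up for this example.

```python
def missing_statement_query(
    class_qid: str, country_qid: str, missing_prop: str, limit: int = 100
) -> str:
    """Items of a class (or its subclasses) in a country that lack a property."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P17 wd:{country_qid} .
  FILTER NOT EXISTS {{ ?item wdt:{missing_prop} [] . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def missing_label_query(
    class_qid: str, country_qid: str, language: str, limit: int = 100
) -> str:
    """Items of a class (or its subclasses) in a country without a label in a language."""
    return f"""
SELECT ?item WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P17 wd:{country_qid} .
  FILTER NOT EXISTS {{
    ?item rdfs:label ?label .
    FILTER(LANG(?label) = "{language}")
  }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Swiss museums without a "director / manager" statement.
    print(missing_statement_query("Q33506", "Q39", "P1037"))
    print()
    # Swiss museums without a German label.
    print(missing_label_query("Q33506", "Q39", "de"))
```

The path `wdt:P31/wdt:P279*` follows "instance of" and then any number of "subclass of" links, so specialised museum types are included. Both queries only select museums, so libraries and archives would need their own class identifiers.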
Thank You for Your Attention!
I Hope You Will Enjoy Wikidata… ;-)
Contact
Beat Estermann
Bern University of Applied Sciences
beat.estermann@bfh.ch
+41 31 848 34 38
