Wikidata
and
Cultural
Heritage
robert.
sanderson
@yale.edu
@azaroth42
A Perspective on Wikidata:
Ecosystems, Trust and
Usability
Robert Sanderson
Director for Cultural Heritage Metadata
Yale University
robert.sanderson@yale.edu
PCC Forum, 12th November 2021
Overview
Ecosystems: Standards, Systems, Community
“single, centralized platform” (3 minutes)
Trust: Accuracy, Model, Norms
“crowd-sourced” (4 minutes)
Usability: LOUD, Structure, Use?
“integration into workflows” (3 minutes)
Ecosystems - Standards
🙂 😃
😍
🙂 😊
🙂
😫
😖
😡
Ecosystems - Systems
Wikidata is a single, centralized system:
• Technology dependent (wikibase, “unique” snak format)
• Ontology is only internal (Pxxx properties)
• Vocabulary is only internal (Q instances)
• Doesn’t try to be part of the web, only on the web
• Does provide identifiers for the entity in existing systems
Wikidata is the Facebook of Metadata
Ecosystems - Community
Is there concern about wikidata establishing a “monopoly on
open data”?
• You can’t compete with free; first to market advantage
• Concerted branding/differentiation effort from
Wikimedia to establish “wikimedians” rather than
“contributors”
• Is there oxygen left for discussions about perhaps better
ways to do this?
Trust - Accuracy
Using wikidata is not an issue of control, it’s an issue of trust.
Accuracy – Does the data appropriately describe reality?
Trust – Will it continue to do so in the future?
Trust is hard given the data is open and constantly changing.
How will the reputation of your organization be affected
(positively or negatively) by using constantly changing data?
Trust – Lack of Model
The data model is incoherent due to constant change and
crowd sourcing. E.g. Yale Art Gallery (Q1568434) is a:
• Place
• Organization
• Building
• Collection
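One way a data consumer might notice this kind of conflation is to group the claimed types into broad categories and flag any item that spans more than one. The sketch below is only an illustration: the category names and the grouping (for example, treating Place and Building as the same kind of thing) are assumptions made for this example, not part of Wikidata or of the talk.

```python
# Assumption: a consumer-defined grouping of type labels into broad,
# mutually exclusive categories. The grouping itself is a modelling choice.
DISJOINT_CATEGORIES = {
    "place": {"Place", "Building"},
    "group": {"Organization"},
    "set": {"Collection"},
}


def categories_claimed(type_labels):
    """Return the sorted broad categories touched by an item's type labels."""
    labels = set(type_labels)
    return sorted(
        category
        for category, members in DISJOINT_CATEGORIES.items()
        if members & labels
    )


def is_conflated(type_labels):
    """True when the type labels span more than one broad category."""
    return len(categories_claimed(type_labels)) > 1


# The four types listed on the slide for Q1568434.
slide_types = ["Place", "Organization", "Building", "Collection"]
print(categories_claimed(slide_types))
print(is_conflated(slide_types))
```

A check like this only reports the problem. Deciding which of the conflated entities a record should describe still needs human judgement.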
Trust – Norms
What should be described in wikidata, by whom?
• “Notability” requirement
• Wikipedia derived rules: e.g. can’t describe yourself
• Extremely inconsistent enforcement
Usability – LOUD
Usability – Ontology and Data
Some challenges, beyond the “unique” format:
• Properties are just identifiers (“P31”) not named (“type”)
• Records are flat … apart from qualifiers and references
• Which solve two issues, but not all
• Constantly changing records and ontology
• Without an effective synchronization mechanism
• Non-traditional conflation of ontology and vocabulary
together as instance data
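To make the first two challenges concrete, here is a minimal sketch of what a developer has to do before a statement is readable. The `statement` dictionary is hand-written in the general shape of Wikidata's entity JSON (a main snak plus qualifiers); the item ID `Q_EXAMPLE_CLASS` and the date are placeholders, and `PROPERTY_LABELS` is an assumed lookup table that the consumer would have to maintain or fetch separately.

```python
# Assumption: a small, consumer-maintained table of English property labels.
PROPERTY_LABELS = {"P31": "instance of", "P580": "start time"}

# Hand-written statement shaped like Wikidata's entity JSON.
# "Q_EXAMPLE_CLASS" and the date are placeholders, not real data.
statement = {
    "type": "statement",
    "rank": "normal",
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datatype": "wikibase-item",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": "Q_EXAMPLE_CLASS"},
        },
    },
    "qualifiers": {
        "P580": [
            {
                "snaktype": "value",
                "property": "P580",
                "datatype": "time",
                "datavalue": {
                    "type": "time",
                    "value": {"time": "+2000-01-01T00:00:00Z", "precision": 9},
                },
            }
        ]
    },
}


def property_name(pid, labels=PROPERTY_LABELS):
    """Return a readable name for a property ID, or the ID itself if unknown."""
    return labels.get(pid, pid)


def snak_value(snak):
    """Return the raw value of a snak, or None when it carries no datavalue."""
    return snak.get("datavalue", {}).get("value")


def readable_statement(stmt, labels=PROPERTY_LABELS):
    """Flatten one statement into named keys, keeping its qualifiers."""
    main = stmt["mainsnak"]
    qualifiers = {
        property_name(pid, labels): [snak_value(s) for s in snaks]
        for pid, snaks in stmt.get("qualifiers", {}).items()
    }
    return {
        "property": property_name(main["property"], labels),
        "value": snak_value(main),
        "qualifiers": qualifiers,
    }


result = readable_statement(statement)
print(result["property"], result["value"]["id"], result["qualifiers"])
```

The lookup table is the weak point of this sketch: because properties and their labels can change, it would need to be refreshed rather than hard-coded.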
Usability – Use?
What can we use wikidata for?
• Source of external identifiers!
• Source of names in different languages
• More specific information, e.g. date not year
• Relationships to other entities
Wikidata is great for augmenting cultural knowledge
with details and relationships beyond traditional catalogs
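As a rough illustration of the first two uses, the sketch below pulls external identifiers and multilingual names out of a single entity record. The `entity` dictionary is hand-written in the general shape of Wikidata's entity JSON; its ID, labels and identifier values are placeholders invented for the example. The helper names are hypothetical, and real code would fetch the record from Wikidata instead of defining it inline.

```python
# Hand-written record shaped like Wikidata's entity JSON.
# The ID, labels and identifier values are placeholders, not real data.
entity = {
    "id": "Q_EXAMPLE",
    "labels": {
        "en": {"language": "en", "value": "Example Gallery"},
        "fr": {"language": "fr", "value": "Galerie Exemple"},
    },
    "claims": {
        "P244": [  # Library of Congress authority ID
            {
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P244",
                    "datatype": "external-id",
                    "datavalue": {"type": "string", "value": "n00000000"},
                },
            }
        ],
        "P214": [  # VIAF ID
            {
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P214",
                    "datatype": "external-id",
                    "datavalue": {"type": "string", "value": "000000000"},
                },
            }
        ],
        "P31": [  # not an external identifier, so it is skipped below
            {
                "rank": "normal",
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datatype": "wikibase-item",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q_EXAMPLE_CLASS"},
                    },
                },
            }
        ],
    },
}


def external_identifiers(record):
    """Map property ID -> list of identifier strings for external-id claims."""
    found = {}
    for pid, statements in record.get("claims", {}).items():
        for stmt in statements:
            snak = stmt["mainsnak"]
            is_external = snak.get("datatype") == "external-id"
            has_value = snak.get("snaktype") == "value"
            if is_external and has_value:
                found.setdefault(pid, []).append(snak["datavalue"]["value"])
    return found


def labels_by_language(record):
    """Map language code -> label text."""
    return {lang: item["value"] for lang, item in record.get("labels", {}).items()}


print(external_identifiers(entity))
print(labels_by_language(entity))
```

Filtering on the `external-id` datatype keeps the example independent of any particular list of identifier properties.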
Thank You!

A Perspective on Wikidata: Ecosystems, Trust, and Usability

Editor's Notes

  • #4 B for trying.
  • #5 Or is that the Meta of Metadata? I’m just as confused as you.
  • #7 BBC example. Impossible to know what changes will come. Today inaccuracy is mostly well-intentioned robots making poorly coded judgements. Matt Miller adding LC identifiers for places
  • #8 Duplicates, Permanent Duplicates, no separation between classes and instances. How can we expect data of sufficient quality for use, when there’s no distinction between these different entities?
  • #9 And it did not end there. I don’t trust a system where the norms say I cannot describe myself. This is an innocuous case, but you can imagine many similar scenarios where having information available, or not available, would be very damaging.
  • #10 Usability of the data is determined by the audience, and for data that is the software developer.