Classification schemes, thesauri and other
Knowledge Organization Systems
- a Linked Data perspective
Antoine Isaac
Pelagios: Linked Pasts
London, July 20-21, 2015
Classification schemes?
Scope: knowledge organization systems (KOS) such as
classification systems, thesauri, gazetteers, subject
heading lists…
(last-minute addition: also time periods, cf. PeriodO)
Simple Knowledge Organization System
SKOS is for exchanging KOSs as Linked Data (in RDF)
• Better than semi-structured data (CSV)
• Still relatively simple
A SKOS graph
animals
cats
UF domestic cats
RT wildcats
BT animals
SN used only for domestic cats
domestic cats
USE cats
wildcats
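The thesaurus entries above can be read as a small SKOS graph. A minimal sketch in Python, assuming a hypothetical `http://example.org/kos/` namespace and the conventional correspondence between thesaurus codes and SKOS properties (the "domestic cats USE cats" entry is the reciprocal of UF, so it adds no concept of its own):

```python
# Minimal sketch: the thesaurus entries from the slide as SKOS-style triples.
# The EX namespace and the concept identifiers are hypothetical.
EX = "http://example.org/kos/"

# Conventional correspondence between thesaurus codes and SKOS properties.
CODE_TO_SKOS = {
    "UF": "skos:altLabel",   # "used for": a non-preferred label
    "RT": "skos:related",    # related term
    "BT": "skos:broader",    # broader term
    "SN": "skos:scopeNote",  # scope note
}

entries = {
    "animals": [],
    "cats": [
        ("UF", "domestic cats"),
        ("RT", "wildcats"),
        ("BT", "animals"),
        ("SN", "used only for domestic cats"),
    ],
    "wildcats": [],
}


def uri(term):
    """Build a (hypothetical) concept URI from a preferred term."""
    return EX + term.replace(" ", "_")


def to_triples(entries):
    """Turn thesaurus entries into (subject, property, object) triples."""
    triples = []
    for term, relations in entries.items():
        triples.append((uri(term), "rdf:type", "skos:Concept"))
        triples.append((uri(term), "skos:prefLabel", term))
        for code, value in relations:
            prop = CODE_TO_SKOS[code]
            # BT and RT point at other concepts; UF and SN carry plain text.
            obj = uri(value) if code in ("BT", "RT") else value
            triples.append((uri(term), prop, obj))
    return triples


triples = to_triples(entries)
```

The triples are plain tuples here to keep the sketch dependency-free; an RDF library would be the natural choice for real data.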
Representing semantics
The formal way: OWL, the Semantic Web ontology language
Used for ontologies that enable machine reasoning
Mother is a class
Parent is the class of entities of type Person that are related to
at least one other resource of type Person using the child
property
…
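To illustrate what such a class definition lets a machine conclude, here is a toy sketch in Python. It is not OWL and not a reasoner: it evaluates the Parent definition over a closed set of facts, whereas OWL reasoning works under the open-world assumption. All resource names are hypothetical.

```python
# Toy facts; all names are hypothetical.
types = {"alice": "Person", "bob": "Person", "rex": "Dog"}

# (subject, object) pairs of the child property.
child = {("alice", "bob"), ("bob", "rex")}


def infer_parents(types, child):
    """Return Persons related to at least one other Person via the child property."""
    return {
        s
        for (s, o) in child
        if types.get(s) == "Person" and types.get(o) == "Person"
    }


parents = infer_parents(types, child)
```

In this toy data only `alice` qualifies: `bob` has a child, but that child is not of type Person.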
Do we want to represent every vocabulary
as a formal ontology?
It is possible, but not easy
• KOS are large
• KOS have softer “semantics”
Parent RelatedTerm Child
• KOS have a focus on terminological information
Child UsedFor Offspring
Softer semantics can be useful for many applications!
Europeana and knowledge organisation systems
• Create a “semantic layer” on top of cultural heritage objects
From: Stefan Gradmann
Using KOS in the Europeana Data Model
Enhanced descriptive metadata
Using KOS Linked Data
<skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2251">
<skos:prefLabel xml:lang="">Harpsichord</skos:prefLabel>
<skos:prefLabel xml:lang="de">Cembalo</skos:prefLabel>
<skos:prefLabel xml:lang="sv">Cembalo</skos:prefLabel>
<skos:prefLabel xml:lang="fr">Clavecin</skos:prefLabel>
<skos:prefLabel xml:lang="it">Clavicembalo</skos:prefLabel>
<skos:prefLabel xml:lang="en">Harpsichord</skos:prefLabel>
<skos:prefLabel xml:lang="nl">Klavecimbel</skos:prefLabel>
<skos:broader>
<skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2239">
<skos:prefLabel>Harpsichords</skos:prefLabel>
</skos:Concept>
</skos:broader>
</skos:Concept>
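A consuming application could read such a record with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the excerpt; the `rdf:RDF` wrapper with its namespace declarations is an assumption added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# ElementTree exposes namespaced attributes in {namespace}name form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Shortened copy of the slide's excerpt; the rdf:RDF wrapper is assumed.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2251">
    <skos:prefLabel xml:lang="de">Cembalo</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Clavecin</skos:prefLabel>
    <skos:prefLabel xml:lang="en">Harpsichord</skos:prefLabel>
    <skos:broader>
      <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2239">
        <skos:prefLabel>Harpsichords</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
  </skos:Concept>
</rdf:RDF>"""

root = ET.fromstring(DOC)
concept = root.find("skos:Concept", NS)

# Direct children only, so the label of the nested broader concept is not mixed in.
labels = {
    el.get(XML_LANG, ""): el.text
    for el in concept.findall("skos:prefLabel", NS)
}

# URIs of the broader concepts nested inside skos:broader.
broader = [
    el.get(RDF_ABOUT)
    for el in concept.findall("skos:broader/skos:Concept", NS)
]
```

Parsing RDF/XML as plain XML only works when the serialisation is as regular as this one; an RDF parser is the safer option for data from several providers.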
Other types of contextual resources
<gn:Feature rdf:about="http://sws.geonames.org/3176959/">
<gn:name>Florence</gn:name>
<gn:alternateName xml:lang="ko"> 피렌체 </gn:alternateName>
<gn:alternateName xml:lang="ja"> フィレンツェ </gn:alternateName>
<gn:alternateName xml:lang="th">ฟลอเรนซ์</gn:alternateName>
<gn:alternateName xml:lang="bo">ཧྥུ་ལོ་རོན་ཟིའུ་ཡ།</gn:alternateName>
<gn:alternateName xml:lang="cy">Fflorens</gn:alternateName>
<gn:alternateName xml:lang="bs">Firenca</gn:alternateName>
<gn:alternateName xml:lang="hbs">Firenca</gn:alternateName>
<gn:alternateName xml:lang="hr">Firenca</gn:alternateName>
<gn:alternateName xml:lang="sq">Firenca</gn:alternateName>
<gn:alternateName xml:lang="pl">Firence</gn:alternateName>
<gn:alternateName xml:lang="sl">Firence</gn:alternateName>
<gn:alternateName xml:lang="lij">Firense</gn:alternateName>
<gn:population>371517</gn:population>
<wgs84_pos:lat>43.76667</wgs84_pos:lat>
<wgs84_pos:long>11.25</wgs84_pos:long>
</gn:Feature>
http://blogs.getty.edu/iris/art-architecture-thesaurus-now-available-as-linked-open-data/
Multilingual search
'uurglazen' in Italy
http://europeana.eu/portal/search.html?query=uurglazen&rows=96&qf=COUNTRY%3Aitaly
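Why a Dutch query can find Italian objects: once an object is linked to a concept, every label of that concept becomes a search key. A minimal sketch in Python, assuming a hypothetical in-memory index; the concept identifier is the AAT subject id from the editor's notes, and the object records are invented for illustration.

```python
# Labels of one concept, keyed by language. Only labels mentioned in the talk
# are used; the "aat:" key format and the object records are hypothetical.
CONCEPT_LABELS = {
    "aat:300198626": {"en": "hourglasses", "nl": "uurglazen"},
}

# Objects as (id, country, concept the metadata was linked to).
OBJECTS = [
    ("object-1", "italy", "aat:300198626"),
    ("object-2", "italy", None),
    ("object-3", "france", "aat:300198626"),
]


def search(query, country):
    """Return ids of objects in `country` linked to a concept labelled `query`."""
    q = query.casefold()
    matching = {
        concept
        for concept, labels in CONCEPT_LABELS.items()
        if q in (label.casefold() for label in labels.values())
    }
    return [
        oid
        for oid, obj_country, concept in OBJECTS
        if obj_country == country and concept in matching
    ]


hits = search("uurglazen", "italy")
```

The original metadata of `object-1` never needs to contain the Dutch word; the label comes from the vocabulary.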
Vocabularies currently provided to Europeana
Europeana metadata enrichment
Enrichment types and vocabularies
Enrichment type | Target vocabulary | Source metadata fields                            | Number of enriched objects
Places          | GeoNames          | dcterms:spatial, dc:coverage                      | 7M
Concepts        | GEMET, DBpedia    | dc:subject, dc:type                               | 9.2M
Agents          | DBpedia           | dc:creator, dc:contributor                        | 144K
Time            | Semium Time       | dc:date, dc:coverage, dcterms:temporal, edm:year  | 10.2M
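The table can be read as a set of rules: for each enrichment type, look at certain metadata fields and try to link their values to a target vocabulary. The Python sketch below assumes exact, case-insensitive label matching and a hypothetical one-entry label index; it does not describe how Europeana's enrichment is actually implemented.

```python
# Which source fields feed which enrichment type, as listed in the table.
FIELD_RULES = {
    "Places": ["dcterms:spatial", "dc:coverage"],
    "Concepts": ["dc:subject", "dc:type"],
    "Agents": ["dc:creator", "dc:contributor"],
    "Time": ["dc:date", "dc:coverage", "dcterms:temporal", "edm:year"],
}

# Hypothetical label index: lower-cased label -> vocabulary URI.
LABEL_INDEX = {
    "Places": {"florence": "http://sws.geonames.org/3176959/"},
}


def enrich(record):
    """Return (enrichment type, source field, target URI) links for a record."""
    links = []
    for kind, fields in FIELD_RULES.items():
        index = LABEL_INDEX.get(kind, {})
        for field in fields:
            for value in record.get(field, []):
                target = index.get(value.strip().lower())
                if target:
                    links.append((kind, field, target))
    return links


record = {"dcterms:spatial": ["Florence"], "dc:subject": ["Sundials"]}
links = enrich(record)
```

Values without a match, such as the subject in this record, are simply left unenriched.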
Work in progress
Entity-based search and browsing
Annotation
Pundit @ DM2E project http://dm2e.eu
Europeana Channels
Semantic auto-completion
Not only end-user facing functions
Data must be accessible
(Unified) APIs, Linked Data
Data re-users should be able to provide enhanced services
to their audience easily, especially in digital humanities
Specific collection and application needs cannot rely on a handful of
generic vocabularies
Work needed
Vocabulary management and publication
• Europeana developed its own WWI vocabulary based on a
subset of LCSH
Terms translated into 10 languages and linked to id.loc.gov
Vocabulary services
http://data.europeana.eu/concept/loc/sh85148236
Representing finer-grained semantics
• More precise relationships and formal semantics
• For query expansion or data validation
• E.g. ISO 25964 and Getty SKOS extensions
Representing finer-grained semantics
Depth level, concept associations
XKOS
Pre-coordinated strings
MADS/RDF
Representing finer-grained semantics?
Finer-grained semantics can be useful, but core models
are key
They are what most people will start using
The need for alignment / co-reference /
reconciliation
KOS 1:
animals
cats
wildcats
KOS 2:
animal
human
object
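A first pass at aligning the two example KOS is often label-based. The Python sketch below assumes hypothetical concept identifiers and a deliberately crude normalisation (lower-casing and stripping a trailing "s"); it proposes candidates only, and the choice of skos:closeMatch over skos:exactMatch is an assumption that a human reviewer would need to confirm.

```python
# The two example KOS; the concept identifiers are hypothetical.
KOS1 = {"k1:animals": "animals", "k1:cats": "cats", "k1:wildcats": "wildcats"}
KOS2 = {"k2:animal": "animal", "k2:human": "human", "k2:object": "object"}


def normalise(label):
    """Lower-case and strip a trailing plural 's' (a deliberately crude heuristic)."""
    label = label.strip().lower()
    return label[:-1] if label.endswith("s") else label


def candidate_matches(kos_a, kos_b):
    """Propose mapping candidates between concepts whose normalised labels agree."""
    index = {}
    for concept, label in kos_b.items():
        index.setdefault(normalise(label), []).append(concept)
    return [
        (a, "skos:closeMatch", b)
        for a, label in kos_a.items()
        for b in index.get(normalise(label), [])
    ]


matches = candidate_matches(KOS1, KOS2)
```

String matching finds "animals" and "animal" but says nothing about whether the two vocabularies mean the same thing by them.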
A lot of work (being) done
• A long line of work in the KOS community: DESIRE,
CARMEN, Renardus, LIMBER, HILT, MSAC, MACS,
Crisscross, KoMoHe, FAO…
• Continued in Linked Data context: Pleiades, Wikidata…
MACS: 120K links between Library of Congress Subject Headings (LCSH),
RAMEAU, Schlagwortnormdatei (SWD)
Semantic mismatches
Irish vocabulary
From: Runar Bergheim
Norwegian
vocabulary
skos:exactMatch
Requires flexible approaches
• AMALGAME/CultuurLink:
http://semanticweb.cs.vu.nl/amalgame/
http://cultuurlink.beeldengeluid.nl/
Finding and re-using vocabularies
Well-known or new vocabularies
Wikidata, VIAF, Geonames, Pleiades, DBpedia, LCSH…
Data repositories and inventories
The Data Hub
Vocabulary selection criteria
• Available in a technically appropriate way
• Well-maintained
• Documented (including metadata)
• Well-connected, e.g. equivalent elements in other
vocabularies are indicated
• Multilingual
• Open
  • license stacking hampers re-use
Quality assessment?
Cf. Data on the Web Best Practices
http://www.w3.org/TR/2015/WD-dwbp-20150625/#dataVocabularies
Take-home messages
Efforts across the whole ecosystem
Publishers of vocabularies, Providers of object data, Application
developers, Researchers…
Requires getting very different steps right
Implementing standards for data exchange
Designing consuming applications
Not only technical: encouraging open data!
Thank you!
Antoine Isaac
antoine.isaac@europeana.eu

Editor's Notes

  • #4 http://www.mimo-db.eu/InstrumentsKeywords/2251
  • #5 http://www.getty.edu/vow/AATFullDisplay?find=hourglasses&logic=AND&note=&english=N&prev_page=1&subjectid=300198626
  • #6 http://sws.geonames.org/3176959/
  • #7 http://www.w3.org/2004/02/skos/
  • #12 View the object at: http://www.europeana.eu/portal/record/09102/_CM_0161930.html
  • #17 The labels are also stored in our database for better search. -> Search 'uurglazen' in Italy http://europeana.eu/portal/search.html?query=uurglazen&rows=96&qf=COUNTRY%3Aitaly -> back to http://europeana.eu/portal/record/02301/urn_imss_instrument_401058.html Show that the Dutch label for Hourglasses was not in the original data. But it's in the auto-generated tags below.
  • #20 Two categories: global; produced by projects. See list on the wiki.
  • #26 People may ask why we've not just re-used the LCSH URIs and added translation data to them. Response will be "so as to obey the principle of not re-defining others' data"
  • #32 Note: the last line is redundant with previous slides
  • #33 Alignment: 2 vocabularies describing the same concept can be aligned via the concept...
  • #36 In the linked environment, enrichment often refers to adding new information at the semantic level to the data about certain resources. It is the creation of new links between the enriched resources and another data resource, such as controlled vocabularies and authority files. The goal is contextualization of metadata and embedding the resources in context outside the scope of the platform.