1. eContentplus
Roxanne Wyns
Royal Museums of Art and History
Thesauri and the Semantic Web
Brussels
16th of December 2009
Brussels, 16/12/2009 1
2. Thesauri and the Semantic Web
The digitalisation of cultural heritage collections is a priority task these
days. It has become an important part of the core business of collection
management and helps to achieve the primary and secondary goals of a
cultural institution:
- To register its collections (inventory)
- To collect and provide scientific and documentary information on its collections
- To provide access to its collections for scientific research and for the general
public
More and more institutions provide access to their growing digital
collections in an online environment:
- Through their own web portal
- Through national partnership portals (E.g.: Vlaamse Kunstcollectie, ErfgoedPlus)
- Through the EUROPEANA portal
3. Thesauri and the Semantic Web
Does this mean that someone interested in these digital collections can
now find all information easily on the World Wide Web?
Problems
- A search for information on the web often requires some knowledge of the
subject and some interpretation of the search results to reach further results
- A full-text search does not take into account different spellings, synonyms, etc.
- Sometimes it is impossible to know which term an author used to describe the
object(s) you are searching for, or even whether the author used a term with the
same meaning as a colleague did
- But the biggest problem when searching for meaningful results on the Web might
be the multilingual world we live in…
4. Thesauri and the Semantic Web
When you take all of these problems into account, it becomes almost
impossible to find meaningful, correct or complete results for your search.
5. Thesauri and the Semantic Web
Perhaps a better example
When you search on Google for
• Painter Domenikos Theotocopoulos = “El Greco” (nickname)
• Some indexers use “El Greco”, others “D. Theotocopoulos”
• Searching for “El Greco” does not give all results
6. Thesauri and the Semantic Web
7. Thesauri and the Semantic Web
8. Thesauri and the Semantic Web
9. Thesauri and the Semantic Web
10. Thesauri and the Semantic Web
Solution
Providing semantic relations between concepts with different lexical labels
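As a minimal sketch of this idea, the Python below attaches several lexical labels to one concept and searches through the concept instead of the literal string. The concept identifier `ex:el-greco` and the record identifiers are hypothetical and serve only as an illustration.

```python
# Hypothetical concept with several lexical labels (illustration only).
LABELS = {
    "ex:el-greco": ["El Greco", "Domenikos Theotocopoulos", "D. Theotocopoulos"],
}

# Hypothetical records, each indexed with whatever label the indexer chose.
RECORDS = {
    "record-1": "El Greco",
    "record-2": "D. Theotocopoulos",
    "record-3": "Domenikos Theotocopoulos",
}


def concept_for(label):
    """Return the concept whose labels include `label` (case-insensitive)."""
    wanted = label.casefold()
    for concept, labels in LABELS.items():
        if wanted in (item.casefold() for item in labels):
            return concept
    return None


def plain_search(query):
    """Full-text style search: only records using the exact query string."""
    wanted = query.casefold()
    return sorted(r for r, label in RECORDS.items() if label.casefold() == wanted)


def semantic_search(query):
    """Search via the concept: any label of the same concept counts as a hit."""
    concept = concept_for(query)
    if concept is None:
        return plain_search(query)
    return sorted(r for r, label in RECORDS.items() if concept_for(label) == concept)
```

With this toy data, `plain_search("El Greco")` finds only the record indexed under that exact label, while `semantic_search("El Greco")` also returns the records indexed under the painter's other names.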
11. Thesauri and the Semantic Web
The Semantic Web: The solution for sharing and retrieving relevant data
on the Web
Searching for information often requires combining data on the Web (e.g.
searches in different digital libraries)
Humans see the context of the data and are able to combine
information easily, even if different terminologies are used
However: machines are ignorant
- partial information is unusable
- difficult to make sense of, e.g., an image
- difficult to combine information
Only if we formulate the conceptual meaning of the data explicitly is a
machine able to read and interpret it.
12. Thesauri and the Semantic Web
So to support the exchange of data on the web, we need a simple language for
expressing information in a machine-understandable way
To combine different datasets:
- of different origin somewhere on the web
- of different formats (MySQL, Excel sheet, XHTML, etc.)
- with different names for relations (e.g., multilingual)
The principle of the semantic web is the use of ontologies.
An ontology is a formal representation of a set of concepts within a domain
and the relationships between those concepts. It is used to reason about
the properties of that domain, and may be used to define the domain.
An ontology aims to capture consensual knowledge, to reuse and share
across software applications and by groups of people.
13. Thesauri and the Semantic Web
The World Wide Web Consortium (W3C) provides technologies to make data
integration possible
In short:
The Semantic Web “layer cake”
Semantic Web is ...
a metadata based infrastructure for
reasoning on the Web
an extension, not a replacement of the
current web
Metadata
“machine-understandable” information
shared vocabularies (ontologies)
a shared data model
Technological standards
RDF, OWL, SKOS,…
…just a technical aspect
14. Thesauri and the Semantic Web
A real Semantic Web like the so-called Linking Open Data cloud (LOD –
http://linkeddata.org/), where all data on the web would be linked with
each other, is still far away.
15. Thesauri and the Semantic Web
But on a smaller scale, there are some interesting examples which show
the potential of semantic web technologies to enrich cultural
heritage data
Semantics in Europeana v1.0
Europeana Thought lab = Task of EuropeanaConnect Work Package 1 & 2
Goals:
- Make Europeana a network of interoperating and aggregated surrogates that
enables semantics-based object discovery and use
- Make Europeana talk European:
• Multilingual search and multilingual browsing
• Core language set: English, French, German, Italian, Spanish
• Secondary language set: Dutch, Hungarian, Polish, Portuguese, Swedish
Europeana Thought lab online: http://europeana.eu/portal/thought-lab.html
Contains data of: Rijksmuseum Amsterdam, Musée du Louvre, Rijksbureau voor
Kunsthistorische Documentatie
16. Thesauri and the Semantic Web
Europeana Thought lab online:
http://europeana.eu/portal/thought-lab.html
17. Thesauri and the Semantic Web
Semantic auto-completion
18. Thesauri and the Semantic Web
Clustering of results
19. Thesauri and the Semantic Web
Matching concepts’ labels
20. Thesauri and the Semantic Web
A concept more specific than Egypte
21. Thesauri and the Semantic Web
A concept more specific than Egypte
22. Thesauri and the Semantic Web
Following other relations - creator
23. Thesauri and the Semantic Web
Following other relations – creator death place
24. Thesauri and the Semantic Web
Following other relations – creator death place
25. Thesauri and the Semantic Web
Enabling technologies (developed by the W3C) to achieve this semantic
interoperability are:
• RDF
RDF is a universal language to describe the characteristics of resources on the web
using a Subject-Predicate-Object structure (s-p-o triples). RDF triples provide a
labelled connection between resources, using URIs to make it possible to link (via
properties) data with one another.
An example of “subject”, “predicate”, “object” (s-p-o) triples:
Subject     Predicate    Object
Leonardo    authorOf     Gioconda
Cimabue     masterOf     Giotto
In this way a machine is able to find the semantic relations between data. As a
result, new relations can be found and retrieved when searching a semantic web
database.
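The two triples above can be sketched in plain Python to show how a machine follows such statements. This is only an illustration: real RDF identifies each part with a URI, and the `match` helper is a hypothetical name, not part of any RDF library.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
# In real RDF each part would be a URI; plain strings keep the sketch short.
TRIPLES = [
    ("Leonardo", "authorOf", "Gioconda"),
    ("Cimabue", "masterOf", "Giotto"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]
```

For example, `match(predicate="authorOf")` asks "who is the author of what?" without knowing either name in advance.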
26. Thesauri and the Semantic Web
• OWL (Web Ontology Language) provides a more expressive language to enhance
the exchange of information
An example in OWL:
The statement
The painting of the Sistine Chapel was carried out by Michelangelo Buonarroti
Abstracting from the statement
The painting of the Sistine Chapel (the subject) is an (instance of) activity
“carried out by” is a predicate
Michelangelo Buonarroti is an (instance of) Person
In OWL (conceptually)
the paintingOfSistineChapel (E7.Activity) was carried_out_by (P14F)
MichelangeloBuonarroti (E21.Person)
In OWL (graphically)
paintingOfSistineChapel --carried_out_by--> MichelangeloBuonarroti
But for the semantic representation of taxonomies, thesauri and conceptual
schemas, a simpler formal language will do…
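To make the extra expressiveness concrete, the sketch below types the two instances from the Sistine Chapel statement and checks a statement against the classes its property expects. It is plain Python, not OWL, and the domain and range given for `carried_out_by` are an assumption read off the slide's statement, not taken from an ontology file.

```python
# Class membership of each instance, as stated on the slide.
TYPES = {
    "paintingOfSistineChapel": "E7.Activity",
    "MichelangeloBuonarroti": "E21.Person",
}

# Assumed constraint for the sketch: the property links an activity to a person.
PROPERTIES = {
    "carried_out_by": {"domain": "E7.Activity", "range": "E21.Person"},
}


def is_valid(subject, predicate, obj):
    """Check a statement against the declared domain and range of its property."""
    rule = PROPERTIES.get(predicate)
    if rule is None:
        return False
    return TYPES.get(subject) == rule["domain"] and TYPES.get(obj) == rule["range"]
```

The statement from the slide passes the check, while the same statement with subject and object swapped does not, which is the kind of consistency test that class and property definitions make possible.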
27. Thesauri and the Semantic Web
All of them play their role, but SKOS might be the most understandable
and the most useful technology for semantic alignment and
correspondences between large vocabularies in a multilingual context.
• SKOS stands for Simple Knowledge Organisation System
– it provides properties for semantic mappings between concepts of different
controlled vocabularies
– it’s an application of RDF
28. Thesauri and the Semantic Web
A short introduction to SKOS
SKOS is a family of formal languages designed for representation of thesauri,
taxonomies, subject-heading systems, or any other type of structured controlled
vocabulary.
Its main objective is to enable easy publication and connecting of controlled
structured vocabularies for the Semantic Web.
It’s important to know that SKOS only provides the structure and the technology to
connect data coming from different sources. Defining the semantic relations
between data is still manual work that often requires a degree of expertise in
the domain of the terminology.
The process of semantically connecting data coming from different authority files
like thesauri is called ‘mapping’.
29. Thesauri and the Semantic Web
SKOS Core
It defines the classes and properties sufficient to represent the common features
found in a standard thesaurus. It is based on a concept-centric view of the
vocabulary, where primitive objects are not terms, but abstract concepts
represented by terms.
Components
• Concepts: Concepts can be organized in hierarchies using broader-narrower
relationships, or linked by non-hierarchical (associative) relationships.
• Identified by URIs
• Labelled with lexical strings in one or more natural languages (for creating
multi-lingual thesauri)
• Documented with various types of note
• Semantically related to each other in informal hierarchies and association
networks ( -> Semantic Web)
• Aggregated into concept schemes = A set of concepts, optionally including
statements about semantic relationships between those concepts.
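The components listed above can be sketched as a small Python data model. This is an illustration under stated assumptions, not the SKOS vocabulary itself: the class and field names, the `example.org` URIs and the French label are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                         # identifies the concept
    pref_labels: dict = field(default_factory=dict)  # language code -> label
    notes: list = field(default_factory=list)        # documentation notes
    broader: list = field(default_factory=list)      # URIs of broader concepts
    related: list = field(default_factory=list)      # URIs of related concepts


@dataclass
class ConceptScheme:
    uri: str
    concepts: dict = field(default_factory=dict)     # URI -> Concept

    def add(self, concept):
        self.concepts[concept.uri] = concept

    def label(self, uri, lang):
        """Return the label of a concept in the requested language, if any."""
        return self.concepts[uri].pref_labels.get(lang)


# Hypothetical URIs and labels, for illustration only.
scheme = ConceptScheme("http://example.org/objects")
scheme.add(Concept("http://example.org/objects/vessel", {"en": "Vessel", "fr": "Récipient"}))
scheme.add(Concept(
    "http://example.org/objects/pot",
    {"en": "Pot", "fr": "Pot"},
    broader=["http://example.org/objects/vessel"],
))
```

Because labels hang off the concept and not the other way round, adding another language is just one more entry in `pref_labels`.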
30. Thesauri and the Semantic Web
Semantic relations within a monolingual thesaurus
Relationship    Abbreviation    English
Hierarchical    BT              Broader Term
                NT              Narrower Term
Associative     RT              Related Term
Equivalence     USE             Use (Preferred Term)
                UF              Used For (Non-Preferred Term)
Definition      SN              Scope Note
31. Thesauri and the Semantic Web
Hierarchical relations between terms
- BT: Broader Term
- NT: Narrower Term
- TT: Top Term
Example:
- Container (TT)
> Barrel (NT of Container)
> Coffin (NT of Container)
> Vessel (NT of Container – BT of Bucket, Pot,…)
>> Bucket (NT of Vessel)
>> Pot (NT of Vessel – BT of Chamber pot)
>>> Chamber pot (NT of Pot)
Some terms can logically belong to more than one broader category. If the
thesaurus allows a term to have more than one broader term, it is said to be
polyhierarchical, e.g. Organ: BT keyboard instrument; wind instrument
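The hierarchy above can be sketched in Python by storing only the BT links and deriving NT and TT from them. The function names are hypothetical; the terms come from the examples on this slide.

```python
# Direct broader terms (BT) taken from the examples above.
BROADER = {
    "Barrel": ["Container"],
    "Coffin": ["Container"],
    "Vessel": ["Container"],
    "Bucket": ["Vessel"],
    "Pot": ["Vessel"],
    "Chamber pot": ["Pot"],
    # Polyhierarchy: one term with two broader terms.
    "Organ": ["keyboard instrument", "wind instrument"],
}


def narrower(term):
    """Direct narrower terms (NT): the inverse of the BT relation."""
    return sorted(t for t, parents in BROADER.items() if term in parents)


def all_broader(term):
    """All broader terms reachable by following BT links upwards."""
    found = []
    queue = list(BROADER.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(BROADER.get(current, []))
    return found


def top_terms(term):
    """Top terms (TT): broader terms that have no broader term themselves."""
    return [t for t in all_broader(term) if t not in BROADER]
```

Storing one direction and deriving the other keeps BT and NT consistent by construction, so a search for "Container" can be widened to everything beneath it.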
32. Thesauri and the Semantic Web
British museum object names thesaurus:
http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
>> Barrel and Vessel are NTs of Container
33. Thesauri and the Semantic Web
British museum object names thesaurus:
http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
>> Vessel is the NT of Container
>> Container is the BT of Vessel
>> Pot is the NT of Vessel
34. Thesauri and the Semantic Web
British museum object names thesaurus:
http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
>> Chamber-Pot is the NT of Pot
>> Pot is the NT of Vessel
35. Thesauri and the Semantic Web
Associative relations between terms
- RT: Related term
Example:
- Chamber pot (NT of Pot)
RT: • Bed pan
• Latrine
• Urinal
…
The associative relationship provides a way of linking terms which do not have a
genuine hierarchical connection and consequently fail to qualify as
broader/narrower terms
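A short Python sketch of the RT relation, assuming (as is usual thesaurus practice) that a related-term link is recorded in both directions. The helper name `add_related` is hypothetical.

```python
from collections import defaultdict

# Term -> set of related terms (RT).
RELATED = defaultdict(set)


def add_related(a, b):
    """Record an RT link in both directions."""
    RELATED[a].add(b)
    RELATED[b].add(a)


# RT links from the example above.
for other in ("Bed pan", "Latrine", "Urinal"):
    add_related("Chamber pot", other)
```

After these calls, looking up "Latrine" also leads back to "Chamber pot", even though only one direction was entered.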
36. Thesauri and the Semantic Web
>> Bed-Pan is an RT of Chamber-Pot
37. Thesauri and the Semantic Web
Equivalence terms
- USE: Use or PT (Preferred term) = used as an index heading
- UF: Used For or NP (Non-Preferred Term) = a cross reference to the
equivalent preferred term
Preferred term = Standard / Indexing term
Non-Preferred term = synonyms and different spellings that help find the
preferred term
There should be sufficient entry terms to ensure that the user will be
quickly directed to the correct preferred term whichever word they think
of initially
Example:
- Food-vessel (NP) USE Vessel (PT)
- Figurine USE Statuette
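The USE / UF pair can be sketched in Python as a lookup from entry term to preferred term, with the UF list derived as its inverse. The function names are hypothetical; the term pairs are the examples given above.

```python
# USE references from the examples: non-preferred term -> preferred term.
USE = {
    "Food-vessel": "Vessel",
    "Figurine": "Statuette",
}


def preferred(term):
    """Resolve an entry term to its preferred (indexing) term."""
    return USE.get(term, term)


def used_for(term):
    """UF list of a preferred term: the inverse of the USE references."""
    return sorted(np for np, pt in USE.items() if pt == term)
```

An indexer or a search form can call `preferred` on whatever word the user typed, so that "Figurine" and "Statuette" end up under the same heading.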
38. Thesauri and the Semantic Web
>> Food-Vessel USE Vessel
>> Vessel UF Food-Vessel
39. Thesauri and the Semantic Web
1. Equivalence
The diagram implies equivalent sets: circles A and B overlap (A = B).
Example:
ancient monuments (A) USE monuments (B)
monuments (B) UF ancient monuments (A)
2. Hierarchical
The diagram implies class inclusion: A lies inside B.
Example:
mammals (B) NT dogs (A)
3. Associative
The diagram implies semantic overlap,
i.e. there is an element of meaning
common to both terms A and B.
Example:
gold RT money
40. Thesauri and the Semantic Web
Scope notes: SN
Sometimes the meaning of a term is not obvious. That’s where the
importance of a Scope Note comes in:
A scope note:
- gives a definition or explanation about the meaning of a term
- gives an indication of what the term covers
- refers to related terms, synonyms,…
- must be relevant as an indexing/search term
Example:
Shoe:
SN: Outer foot covering not reaching above the ankle. Includes additional footwear
worn over normal outer foot covering such as overshoes. For devices to raise the
foot clear of the mud, etc. see 'patten'.
41. Thesauri and the Semantic Web
TGN: http://www.getty.edu/research/conducting_research/vocabularies/tgn/
42. Thesauri and the Semantic Web
Why Scope Notes are so important
- Homographs: words that are spelled the same yet have different meanings.
> Example: The French term ‘Bois’ has two meanings, both ‘Wood’ and ‘Antlers’
- Appearance of the same term more than once in the thesaurus.
Example:
Animal > Antler > Antelope
Animal > Skin > Antelope
(French) Animal > Bois
Fossile > Bois (could be either fossil wood or fossil antlers)
Végétal > Bois
Although the place of the term in the thesaurus indicates its meaning most of the
time, it is best to provide a Scope Note, especially when the thesaurus is being
used in a multilingual environment. There is for example also a Brussels in
Wisconsin (USA).
43. Thesauri and the Semantic Web
A good thesaurus should use Scope Notes to define its terms to prevent wrong
interpretation and use of a term.
These principles can be useful in any thesaurus, whether it is a SKOSified
thesaurus or not. They make it possible to structure your data and, by doing so,
to get better results when searching your database for relevant information.
44. Thesauri and the Semantic Web
Conceptually interconnecting multiple authority files and the creation of
multilingual thesauri
SKOS provides the possibility to connect different thesauri in an online
environment. It is the perfect tool for the creation of multilingual thesauri.
When semantically connecting different multilingual terminologies to each other,
it is of even greater importance to know the exact meaning and coverage of each
term. So when creating a multilingual thesaurus, the Scope Notes should be
translated as well as the terms!
Another necessity is to define the degree of the match of the term to its
equivalent in another language. Two concepts are equivalent if we can fit them in
the same place of a semantic network, but an exact match isn’t always possible.
45. Thesauri and the Semantic Web
Multilingual equivalencies (source language to target language)
1 - Exact Equivalence (=)
Where the target language contains a term which is:
a) identical in meaning and scope to the term in the source language
b) capable of functioning as a preferred term
Example: administration = administración
2 - Inexact Equivalence ( ≅ )
A term in the target language expresses the same general concept as the source language
term, although the meanings of these terms are not precisely identical
Example: crown property ≅ patrimonio nacional
3 - Single to Multiple (A=B+C)
The term in the source language cannot be matched by an exactly equivalent term in the
target language, but the concept to which the source language term refers can be expressed
by a combination of two or more existing preferred terms in the target language.
Example: listed building (source) = édifice inscrit + édifice classé (target)
4 - Non-equivalence
The target language does not contain a term which corresponds in meaning, either partially
or inexactly, to the source language term. In this case the term from the source language
can be:
a) taken as a loan term: Example: affectataires_FR (source) affectataires_EN (target) OR
b) translated from the original language: Example: patrimoine pariétal (source) parietal
heritage (target)
46. Thesauri and the Semantic Web
Mapping to SKOS
• skos:broadMatch and skos:narrowMatch are used to state a hierarchical
mapping link between two concepts.
• skos:relatedMatch is used to state an associative mapping link between
two concepts.
• skos:closeMatch and skos:exactMatch are used to assert that two
concepts have a similar meaning:
– skos:closeMatch links two concepts that are sufficiently similar to be
used interchangeably in some applications
– skos:exactMatch links two concepts that can be used interchangeably
with a high degree of confidence
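As a rough sketch, the Python below writes one such mapping link as an N-Triples line using the SKOS namespace. The helper `mapping_triple` and the two `example.org` concept URIs are hypothetical; only the namespace and the property names come from SKOS.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Mapping properties named on this slide.
MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def mapping_triple(source_uri, relation, target_uri):
    """Return one mapping statement serialised as an N-Triples line."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{source_uri}> <{SKOS}{relation}> <{target_uri}> ."


# Hypothetical concept URIs in two thesauri (illustration only).
line = mapping_triple(
    "http://example.org/en/administration",
    "exactMatch",
    "http://example.org/es/administracion",
)
```

Which property to use for a given pair of terms remains an expert decision; the code only records that decision in a form other systems can read.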
47. Thesauri and the Semantic Web
49. Thesauri and the Semantic Web
Some well-known SKOSified thesauri
• ICONCLASS (iconographic description)
http://www.iconclass.org/
• Getty Art & Architecture Thesaurus (AAT)
http://www.getty.edu/research/conducting_research/vocabularies/aat/
• Getty Union List of Artist Names (ULAN)
http://www.getty.edu/research/conducting_research/vocabularies/ulan/
• Getty Thesaurus of Geographic Names (TGN)
http://www.getty.edu/research/conducting_research/vocabularies/tgn/
• The UNESCO thesaurus
http://www2.ulcc.ac.uk/unesco/
• Library of Congress Subject Headings (LCSH)
50. Thesauri and the Semantic Web
Conclusion
However powerful the software, it can only be as good as the underlying
metadata and thesaurus structure. Computers can take a lot of effort out
of compiling, maintaining and using the database, but they cannot make
the intellectual decisions which are needed for it to function effectively.
Without standardisation in your own collection management database,
this next step of making digital cultural heritage information on the web
more accessible will never be reached.
And remember that by improving your collection management database,
you also improve your own search results;-)
Thank you for your attention
For more information: r.wyns@kmkg.be
51. Thesauri and the Semantic Web
Documentation
W3C Semantic Web activity on SKOS: http://www.w3.org/2004/02/skos/
Athena website – WP4 SKOS workshop Rome 16-07-2009:
http://www.athenaeurope.org/
Collections Trust: Guidelines for Constructing a Museum Object Name Thesaurus
http://www.collectionstrust.org.uk/spectrum-terminology/holm#What
Introductory Tutorial on Thesaurus Construction: Univ. of Western Ontario
http://publish.uwo.ca/~craven/677/thesaur/main00.htm
Standard guide to establishment and development of monolingual thesauri (BS
5723) (British Standards Institution, 1987) and the virtually identical ISO 2788.
Wikipedia:
http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System
http://en.wikipedia.org/wiki/Ontology_(information_science)