Thesauri and the Semantic Web




Roxanne Wyns

Europeana Meeting, Belgium
16 December 2009



    • eContentplus: Roxanne Wyns, Royal Museums of Art and History, "Thesauri and the Semantic Web", Brussels, 16 December 2009
    • The digitisation of cultural heritage collections is a priority these days. It has become an important part of the core business of collection management and helps a cultural institution achieve its primary and secondary goals:
      - To register its collections (inventory)
      - To collect and provide scientific and documentary information on its collections
      - To provide access to its collections for scientific research and for the general public
      More and more institutions provide access to their growing digital collections online:
      - Through their own web portal
      - Through national partnership portals (e.g. Vlaamse Kunstcollectie, ErfgoedPlus)
      - Through the Europeana portal
    • Does this mean that someone interested in these digital collections can now easily find all the information on the World Wide Web?
      Problems:
      - A search for information on the Web often requires some knowledge of the subject, and an interpretation of the search results, before it leads to further results
      - A full-text search does not take different spellings, synonyms, etc. into account
      - Sometimes it is impossible to know which term an author used to describe the object(s) you are searching for, or even whether they used a term with the same meaning as their colleagues
      - But the biggest problem when searching for meaningful results on the Web might be the multilingual world we live in…
    • When you take all of these problems into account, it becomes almost impossible to get meaningful, correct or complete results from a search.
    • Perhaps a better example: searching on Google for a painter
      - The painter Domenikos Theotocopoulos is known as "El Greco" (nickname)
      - Some indexers use "El Greco", others "D. Theotocopoulos"
      - Searching for "El Greco" therefore does not return all results
    • Solution: providing semantic relations between concepts that have different lexical labels
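As a minimal Python sketch of this solution (assuming a toy record list and a hypothetical `Concept` class, not any Europeana API), the painter becomes one concept that carries several lexical labels, and a search expands the query to every label of the matching concept. Plain string matching only finds the records that used the exact label typed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept record: one identifier, several lexical labels."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)

    def labels(self) -> set[str]:
        # Case-folded so that "el greco" and "El Greco" compare equal.
        return {self.pref_label.casefold(), *(a.casefold() for a in self.alt_labels)}


# Toy data: two indexers used different labels for the same painter.
EL_GRECO = Concept(
    uri="http://example.org/artist/theotocopoulos",  # hypothetical URI
    pref_label="El Greco",
    alt_labels=["Domenikos Theotocopoulos", "D. Theotocopoulos"],
)
RECORDS = [
    ("rec-1", "El Greco"),
    ("rec-2", "D. Theotocopoulos"),
    ("rec-3", "Titian"),
]


def string_search(query: str) -> list[str]:
    """Plain string matching: only records using the exact label are found."""
    return [rid for rid, creator in RECORDS if creator.casefold() == query.casefold()]


def concept_search(query: str, concepts: list[Concept]) -> list[str]:
    """Map the query to a concept, then match records against all of its labels."""
    wanted = {query.casefold()}
    for concept in concepts:
        if query.casefold() in concept.labels():
            wanted |= concept.labels()
    return [rid for rid, creator in RECORDS if creator.casefold() in wanted]


print(string_search("El Greco"))               # ['rec-1']
print(concept_search("El Greco", [EL_GRECO]))  # ['rec-1', 'rec-2']
```

Searching with either label now returns both records, because the match is made on the concept rather than on the string.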
    • The Semantic Web: the solution for sharing and retrieving relevant data on the Web
      Searching for information often requires combining data on the Web (e.g. searching in different digital libraries).
      Humans see the context of the data and can combine information easily, even when different terminologies are used.
      Machines, however, are ignorant:
      - partial information is unusable
      - it is difficult to make sense of, e.g., an image
      - it is difficult to combine information
      A machine can read and interpret data only if we formulate the conceptual meaning of that data explicitly.
    • So, to support the exchange of data on the Web, we need a simple language for expressing information in a machine-understandable way, one that can combine datasets:
      - of different origin, anywhere on the Web
      - in different formats (MySQL, Excel sheets, XHTML, etc.)
      - with different names for relations (e.g. multilingual)
      The principle of the Semantic Web is the use of ontologies. An ontology is a formal representation of a set of concepts within a domain and of the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain. An ontology aims to capture consensual knowledge that can be reused and shared across software applications and by groups of people.
    • The W3C (World Wide Web Consortium) provides the technologies that make data integration possible. In short, the Semantic Web "layer cake":
      - The Semantic Web is a metadata-based infrastructure for reasoning on the Web: an extension of the current web, not a replacement
      - Metadata: "machine-understandable" information, shared vocabularies (ontologies), a shared data model
      - Technological standards: RDF, OWL, SKOS, … which are just the technical aspect
    • A real Semantic Web, such as the so-called Linking Open Data cloud (LOD, http://linkeddata.org/) in which all data on the Web would be linked together, is still far away.
    • On a smaller scale, however, there are some interesting examples that show how Semantic Web technologies can enrich cultural heritage data.
      Semantics in Europeana v1.0: the Europeana Thought lab, a task of EuropeanaConnect Work Packages 1 & 2
      Goals:
      - Making Europeana a network of interoperating and aggregated surrogates that enables semantics-based object discovery and use
      - Making Europeana talk European:
        • Multilingual search and multilingual browsing
        • Core language set: English, French, German, Italian, Spanish
        • Secondary language set: Dutch, Hungarian, Polish, Portuguese, Swedish
      Europeana Thought lab online: http://europeana.eu/portal/thought-lab.html
      Contains data from: Rijksmuseum Amsterdam, Musée du Louvre, Rijksbureau voor Kunsthistorische Documentatie
    • Europeana Thought lab online: http://europeana.eu/portal/thought-lab.html
    • Semantic auto-completion
    • Clustering of results
    • Matching concepts’ labels
    • A concept more specific than Egypte
    • Following other relations: creator
    • Following other relations: the creator's death place
    • Enabling technologies (developed by the W3C) for achieving this semantic interoperability are:
      • RDF
      RDF is a universal language for describing the characteristics of resources on the Web using a subject-predicate-object structure (s-p-o triples). An RDF triple provides a labelled connection between resources, using URIs so that data can be linked to one another via properties. Examples of subject-predicate-object triples:

        Subject    Predicate   Object
        Leonardo   authorOf    Gioconda
        Cimabue    masterOf    Giotto

      In this way a machine is able to find the semantic relations between data. As a result, new relations can be found and retrieved when searching a semantic web database.
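The triple model can be sketched in a few lines of Python. This is a toy stand-in for an RDF store, assuming plain strings instead of URIs; the inverse predicates `pupilOf` and `hasAuthor` are hypothetical and only illustrate how a simple rule lets a machine derive relations that were never stated explicitly.

```python
# A minimal in-memory triple store: each fact is a (subject, predicate, object) tuple.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("Leonardo", "authorOf", "Gioconda"),
    ("Cimabue", "masterOf", "Giotto"),
}

# Hypothetical inverse predicates: "s p o" implies "o INVERSES[p] s".
INVERSES = {"masterOf": "pupilOf", "authorOf": "hasAuthor"}


def match(triples, s=None, p=None, o=None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_inverses(triples: set[Triple]) -> set[Triple]:
    """Add the inverse of every triple whose predicate has a declared inverse."""
    inferred = {(o, INVERSES[p], s) for s, p, o in triples if p in INVERSES}
    return triples | inferred


kb = infer_inverses(TRIPLES)
print(match(kb, s="Giotto"))    # [('Giotto', 'pupilOf', 'Cimabue')]
print(match(kb, p="authorOf"))  # [('Leonardo', 'authorOf', 'Gioconda')]
```

The query about Giotto succeeds even though no triple with Giotto as subject was ever asserted.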
    • OWL (Web Ontology Language) provides a more expressive language to enhance the exchange of information. An example in OWL:
      - The statement: "The painting of the Sistine Chapel was carried out by Michelangelo Buonarroti"
      - Abstracting from the statement: the painting of the Sistine Chapel (the subject) is an instance of Activity, "carried out by" is a predicate, and Michelangelo Buonarroti is an instance of Person
      - In OWL (conceptually), using CIDOC CRM identifiers: paintingOfSistineChapel (E7.Activity) carried_out_by (P14F) MichelangeloBuonarroti (E21.Person)
      - In OWL (graphically): paintingOfSistineChapel → carried_out_by → MichelangeloBuonarroti
      But for the semantic representation of taxonomies, thesauri and conceptual schemas, a simpler formal language will do…
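To illustrate what the extra expressiveness buys, here is a simplified Python sketch of RDFS-style reasoning, a small subset of what an OWL reasoner does. It assumes the CIDOC CRM declarations that P14 carried out by has domain E7 Activity and range E39 Actor, that E21 Person is a subclass of E39 Actor, and that E7 Activity is a subclass of E5 Event; the string identifiers follow the slide's style and are illustrative only. From the single statement, the types of both resources are inferred.

```python
# Assumed schema declarations (CIDOC CRM), in simplified string form.
DOMAIN = {"P14F.carried_out_by": "E7.Activity"}
RANGE = {"P14F.carried_out_by": "E39.Actor"}
SUBCLASS_OF = {"E21.Person": "E39.Actor", "E7.Activity": "E5.Event"}

facts = {
    ("paintingOfSistineChapel", "P14F.carried_out_by", "MichelangeloBuonarroti"),
    ("MichelangeloBuonarroti", "rdf:type", "E21.Person"),
}


def infer_types(triples):
    """Apply the domain, range and subclass rules until nothing new is derived."""
    kb = set(triples)
    while True:
        new = set()
        for s, p, o in kb:
            if p in DOMAIN:                      # subject gets the domain class
                new.add((s, "rdf:type", DOMAIN[p]))
            if p in RANGE:                       # object gets the range class
                new.add((o, "rdf:type", RANGE[p]))
            if p == "rdf:type" and o in SUBCLASS_OF:
                new.add((s, "rdf:type", SUBCLASS_OF[o]))
        if new <= kb:                            # fixpoint reached
            return kb
        kb |= new


def types_of(kb, resource):
    return sorted(o for s, p, o in kb if s == resource and p == "rdf:type")


kb = infer_types(facts)
print(types_of(kb, "paintingOfSistineChapel"))  # ['E5.Event', 'E7.Activity']
print(types_of(kb, "MichelangeloBuonarroti"))   # ['E21.Person', 'E39.Actor']
```

In OWL, domain and range declarations are used to infer types like this, not to reject data that lacks them.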
    • All of these technologies play their role, but SKOS may be the most understandable and the most useful one for semantic alignment and correspondences between large vocabularies in a multilingual context.
      • SKOS stands for Simple Knowledge Organization System:
        - it provides properties for semantic mappings between concepts of different controlled vocabularies
        - it is an application of RDF
    • A short introduction to SKOS
      SKOS is a family of formal languages designed for representing thesauri, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. Its main objective is to enable the easy publication and linking of structured controlled vocabularies for the Semantic Web.
      It is important to know that SKOS only provides the structure and the technology to connect data coming from different sources. Defining the semantic relations between data is still manual work that often requires expertise in the domain of the terminology. The process of semantically connecting data from different authority files, such as thesauri, is called ‘mapping’.
    • SKOS Core
      SKOS Core defines the classes and properties sufficient to represent the common features found in a standard thesaurus. It is based on a concept-centric view of the vocabulary, in which the primitive objects are not terms but abstract concepts represented by terms.
      Components:
      • Concepts, which:
        - can be organized in hierarchies using broader-narrower relationships, or linked by non-hierarchical (associative) relationships
        - are identified by URIs
        - are labelled with lexical strings in one or more natural languages (for creating multilingual thesauri)
        - are documented with various types of notes
        - are semantically related to each other in informal hierarchies and association networks (-> Semantic Web)
      • Concept schemes, into which concepts are aggregated: a set of concepts, optionally including statements about the semantic relationships between those concepts
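The concept-centric model can be sketched as a small Python data class that writes one concept out as SKOS in Turtle syntax. The URIs and the French label are hypothetical; the mapping used here (preferred term to `skos:prefLabel`, non-preferred term to `skos:altLabel`, BT to `skos:broader`, scope note to `skos:scopeNote`) is one common way of expressing thesaurus relations in SKOS, not the only one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkosConcept:
    """Concept-centric record: the concept has a URI; terms are only labels."""
    uri: str
    pref_labels: dict[str, str]                                      # lang -> label
    alt_labels: list[tuple[str, str]] = field(default_factory=list)  # (label, lang)
    scope_note: tuple[str, str] | None = None                        # (text, lang)
    broader: list[str] = field(default_factory=list)                 # broader URIs
    in_scheme: str | None = None

    def to_turtle(self) -> str:
        def lit(text: str, lang: str) -> str:
            # Escape backslashes and quotes inside a Turtle string literal.
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"@{lang}'

        lines = [f"<{self.uri}> a skos:Concept"]
        lines += [f"    skos:prefLabel {lit(v, k)}" for k, v in sorted(self.pref_labels.items())]
        lines += [f"    skos:altLabel {lit(v, k)}" for v, k in self.alt_labels]
        if self.scope_note:
            lines.append(f"    skos:scopeNote {lit(*self.scope_note)}")
        lines += [f"    skos:broader <{b}>" for b in self.broader]
        if self.in_scheme:
            lines.append(f"    skos:inScheme <{self.in_scheme}>")
        prefix = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        return prefix + " ;\n".join(lines) + " ."


vessel = SkosConcept(
    uri="http://example.org/objects/vessel",              # hypothetical URIs
    pref_labels={"en": "Vessel", "fr": "Récipient"},      # hypothetical French label
    alt_labels=[("Food-vessel", "en")],
    broader=["http://example.org/objects/container"],
    in_scheme="http://example.org/objects",
)
print(vessel.to_turtle())
```

The English and French labels are two strings attached to the same resource; only the concept itself carries an identifier.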
    • Semantic relations within a monolingual thesaurus

        Relationship   Abbreviation   Full term
        Hierarchical   BT             Broader Term
                       NT             Narrower Term
        Associative    RT             Related Term
        Equivalence    USE            Use (Preferred Term)
                       UF             Used For (Non-Preferred Term)
        Definition     SN             Scope Note
    • Hierarchical relations between terms
      - BT: Broader Term
      - NT: Narrower Term
      - TT: Top Term
      Example:
        Container (TT)
        > Barrel (NT of Container)
        > Coffin (NT of Container)
        > Vessel (NT of Container; BT of Bucket, Pot, …)
          >> Bucket (NT of Vessel)
          >> Pot (NT of Vessel; BT of Chamber pot)
            >>> Chamber pot (NT of Pot)
      Some terms logically belong to more than one broader category. If the thesaurus allows a term to have more than one broader term, it is said to be polyhierarchical, e.g. Organ: BT keyboard instrument; wind instrument.
    • British Museum Object Names Thesaurus: http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
      >> Barrel and Vessel are NTs of Container
    • British Museum Object Names Thesaurus: http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
      >> Vessel is an NT of Container
      >> Container is the BT of Vessel
      >> Pot is an NT of Vessel
    • British Museum Object Names Thesaurus: http://www.collectionstrust.org.uk/bmobj/Obthesm3.htm
      >> Chamber-Pot is an NT of Pot
      >> Pot is an NT of Vessel
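The Container example can be held as broader-term links only, with narrower terms derived by inverting them. A minimal Python sketch, assuming a plain dictionary in which a term may list several broader terms (as Organ does, to show polyhierarchy):

```python
# Broader-term (BT) links from the Container example; Organ is polyhierarchical.
BT = {
    "Barrel": ["Container"],
    "Coffin": ["Container"],
    "Vessel": ["Container"],
    "Bucket": ["Vessel"],
    "Pot": ["Vessel"],
    "Chamber pot": ["Pot"],
    "Organ": ["keyboard instrument", "wind instrument"],
}


def narrower(term):
    """Direct narrower terms (NT), derived by inverting the BT links."""
    return sorted(t for t, parents in BT.items() if term in parents)


def all_narrower(term):
    """Transitive closure: every term anywhere below `term` in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        for child in narrower(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


def top_terms():
    """Top terms (TT): used as a broader term, but with no broader term of their own."""
    parents = {p for ps in BT.values() for p in ps}
    return sorted(p for p in parents if p not in BT)


print(narrower("Container"))   # ['Barrel', 'Coffin', 'Vessel']
print(all_narrower("Vessel"))  # ['Bucket', 'Chamber pot', 'Pot']
print(top_terms())             # ['Container', 'keyboard instrument', 'wind instrument']
```

Storing one direction and deriving the other keeps BT and NT from drifting out of step; a search for Vessel can then be expanded to all of its narrower terms.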
    • Associative relations between terms
      - RT: Related Term
      Example: Chamber pot (NT of Pot) RT:
        • Bed pan
        • Latrine
        • Urinal
        …
      The associative relationship provides a way of linking terms that do not have a genuine hierarchical connection and consequently do not qualify as broader/narrower terms.
    • >> Bed-Pan is an RT of Chamber-Pot
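Because RT is reciprocal and is reserved for terms without a hierarchical connection, both rules can be checked mechanically. A minimal Python sketch, assuming the BT links from the Container example (one broader term per term, for brevity) and adding one deliberately wrong RT pair to show what the check reports:

```python
from collections import defaultdict

# Broader-term links from the Container example (single parent per term, for brevity).
BT = {"Vessel": "Container", "Pot": "Vessel", "Chamber pot": "Pot"}

# RT pairs from the slide, plus one deliberately wrong pair.
RT_PAIRS = [
    ("Chamber pot", "Bed pan"),
    ("Chamber pot", "Latrine"),
    ("Chamber pot", "Urinal"),
    ("Chamber pot", "Pot"),  # wrong: Pot is already a broader term of Chamber pot
]


def related_index(pairs):
    """RT is reciprocal: record every pair in both directions."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def ancestors(term):
    """All broader terms of `term`, following the BT links upwards."""
    chain = []
    while term in BT:
        term = BT[term]
        chain.append(term)
    return chain


def hierarchical_rt(pairs):
    """RT pairs whose terms are already linked hierarchically (should be BT/NT)."""
    return [(a, b) for a, b in pairs if b in ancestors(a) or a in ancestors(b)]


rt = related_index(RT_PAIRS)
print(sorted(rt["Bed pan"]))      # ['Chamber pot']
print(hierarchical_rt(RT_PAIRS))  # [('Chamber pot', 'Pot')]
```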
    • Equivalence relations between terms
      - USE: Use, or PT (Preferred Term) = used as an index heading
      - UF: Used For, or NP (Non-Preferred Term) = a cross-reference to the equivalent preferred term
      Preferred term = standard / indexing term
      Non-preferred term = synonyms, different spellings, … that help the user find the preferred term
      There should be enough entry terms to ensure that users are quickly directed to the correct preferred term, whichever word they think of first.
      Examples:
      - Food-vessel (NP) USE Vessel (PT)
      - Figurine USE Statuette
    • >> Food-Vessel USE Vessel
      >> Vessel UF Food-Vessel
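The USE/UF pair is a redirect plus its reciprocal. A minimal Python sketch, assuming a dictionary of USE entries taken from the Food-vessel and Figurine examples; the case-insensitive lookup is an illustrative design choice, so that users reach the preferred term whichever form of the word they type:

```python
# USE entries: non-preferred (entry) term -> preferred term.
USE = {"Food-vessel": "Vessel", "Figurine": "Statuette"}

# Case-insensitive lookup table built from the USE entries.
_LOOKUP = {entry.casefold(): preferred for entry, preferred in USE.items()}


def index_term(entry_term: str) -> str:
    """Return the preferred term for an entry term; preferred terms map to themselves."""
    return _LOOKUP.get(entry_term.casefold(), entry_term)


def used_for(use_map: dict[str, str]) -> dict[str, list[str]]:
    """Derive the reciprocal UF entries: preferred term -> its non-preferred terms."""
    uf: dict[str, list[str]] = {}
    for non_preferred, preferred in use_map.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return {pt: sorted(terms) for pt, terms in uf.items()}


print(index_term("figurine"))  # Statuette
print(index_term("Vessel"))    # Vessel
print(used_for(USE))           # {'Vessel': ['Food-vessel'], 'Statuette': ['Figurine']}
```

Deriving UF from USE, rather than maintaining both by hand, guarantees that every cross-reference has its reciprocal.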
    • 1. Equivalence
      The diagram implies equivalent sets: circles A and B coincide (A = B).
      Example: ancient monuments (A) USE monuments (B); monuments (B) UF ancient monuments (A)
      2. Hierarchical
      The diagram implies class inclusion: circle A lies inside circle B.
      Example: mammals (B) NT dogs (A)
      3. Associative
      The diagram implies semantic overlap, i.e. there is an element of meaning common to both terms: circles A and B partly overlap.
      Example: gold RT money
    • Scope notes: SN
      Sometimes the meaning of a term is not obvious. This is where a Scope Note becomes important. A scope note:
      - gives a definition or explanation of the meaning of a term
      - indicates what the term covers
      - refers to related terms, synonyms, …
      - must be relevant as an indexing/search term
      Example: Shoe: SN: Outer foot covering not reaching above the ankle. Includes additional footwear worn over normal outer foot covering such as overshoes. For devices to raise the foot clear of the mud, etc. see patten.
    • TGN: http://www.getty.edu/research/conducting_research/vocabularies/tgn/
    • Why Scope Notes are so important
      - Homographs are words that are spelled the same but have different meanings.
        Example: the French term ‘Bois’ has two meanings, ‘Wood’ and ‘Antlers’
      - The same term can appear more than once in the thesaurus. Example:
        Animal > Antler > Antelope
        Animal > Skin > Antelope
        (French) Animal > Bois
        Fossile > Bois (could be either fossil wood or fossil antlers)
        Végétal > Bois
      Although the place of a term in the thesaurus indicates its meaning most of the time, it is best to provide a Scope Note, especially when the thesaurus is used in a multilingual environment. For example, there is also a Brussels in Wisconsin (USA).
    • A good thesaurus should use Scope Notes to define its terms, so that they are not misinterpreted or misused. These principles are useful in any thesaurus, whether it is SKOSified or not. They make it possible to structure your data and, by doing so, to get better results when searching your database for relevant information.
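Finding the terms that need a Scope Note can be partly automated: any label that occurs at more than one place in the hierarchy is a candidate. A minimal Python sketch, assuming the hierarchy is stored as full paths (taken from the Bois and Antelope examples) and using one hypothetical scope note:

```python
from collections import defaultdict

# Full hierarchy paths; the last element of each path is the term itself.
PATHS = [
    ("Animal", "Bois"),
    ("Fossile", "Bois"),
    ("Végétal", "Bois"),
    ("Animal", "Antler", "Antelope"),
    ("Animal", "Skin", "Antelope"),
]

# Hypothetical scope note, keyed by the full path of the term it documents.
SCOPE_NOTES = {("Végétal", "Bois"): "Wood, as plant material."}


def homographs(paths):
    """Group paths by their final label; keep labels that occur more than once."""
    by_label = defaultdict(list)
    for path in paths:
        by_label[path[-1]].append(path)
    return {label: ps for label, ps in by_label.items() if len(ps) > 1}


def missing_scope_notes(paths, notes):
    """Every occurrence of an ambiguous label that has no scope note yet."""
    return [" > ".join(p) for ps in homographs(paths).values() for p in ps if p not in notes]


print(sorted(homographs(PATHS)))  # ['Antelope', 'Bois']
for line in missing_scope_notes(PATHS, SCOPE_NOTES):
    print(line)
```

Such a report only points at candidates; writing the scope note itself remains the intellectual work of the thesaurus editor.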
    • Conceptually interconnecting multiple authority files and creating multilingual thesauri
      SKOS makes it possible to connect different thesauri in an online environment. It is the perfect tool for creating multilingual thesauri.
      When semantically connecting different multilingual terminologies to each other, it is even more important to know the exact meaning and scope of each term. So when creating a multilingual thesaurus, the Scope Notes should be translated as well as the terms!
      Another necessity is to define the degree to which a term matches its equivalent in another language. Two concepts are equivalent if we can fit them in the same place in a semantic network, but an exact match isn't always possible.
    • Multilingual equivalencies (source language → target language)
      1. Exact equivalence (=)
         The target language contains a term which is:
         a) identical in meaning and scope to the term in the source language
         b) capable of functioning as a preferred term
         Example: administration = administración
      2. Inexact equivalence (≅)
         A term in the target language expresses the same general concept as the source-language term, although the meanings of these terms are not precisely identical.
         Example: crown property ≅ patrimonio nacional
      3. Single to multiple (A = B + C)
         The term in the source language cannot be matched by an exactly equivalent term in the target language, but the concept to which the source-language term refers can be expressed by a combination of two or more existing preferred terms in the target language.
         Example: listed building (source) = édifice inscrit + édifice classé (target)
      4. Non-equivalence
         The target language does not contain a term which corresponds in meaning, either partially or inexactly, to the source-language term. In this case the term from the source language can be:
         a) taken as a loan term. Example: affectataires_FR (source) → affectataires_EN (target), OR
         b) translated from the original language. Example: patrimoine pariétal (source) → parietal heritage (target)
    • Mapping to SKOS
      • skos:broadMatch and skos:narrowMatch are used to state a hierarchical mapping link between two concepts.
      • skos:relatedMatch is used to state an associative mapping link between two concepts.
      • skos:closeMatch and skos:exactMatch are used to assert that two concepts have a similar meaning:
        - skos:closeMatch links two concepts that are sufficiently similar to be used interchangeably in some information retrieval applications
        - skos:exactMatch indicates a high degree of confidence that the two concepts can be used interchangeably across a wide range of information retrieval applications
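The SKOS Reference also defines characteristics of these mapping properties: skos:broadMatch and skos:narrowMatch are inverses of each other; skos:relatedMatch, skos:closeMatch and skos:exactMatch are symmetric; and skos:exactMatch is transitive, while skos:closeMatch deliberately is not. A minimal Python sketch of the links a consumer could infer from a few asserted mappings; the concept identifiers are hypothetical and partly reuse the multilingual equivalence examples:

```python
INVERSE = {"broadMatch": "narrowMatch", "narrowMatch": "broadMatch"}
SYMMETRIC = {"relatedMatch", "closeMatch", "exactMatch"}
TRANSITIVE = {"exactMatch"}  # closeMatch is intentionally not transitive

# Asserted mappings between hypothetical concepts in three vocabularies.
mappings = {
    ("en:administration", "exactMatch", "es:administración"),
    ("es:administración", "exactMatch", "fr:administration"),
    ("en:crown_property", "closeMatch", "es:patrimonio_nacional"),
    ("en:vessel", "narrowMatch", "fr:pot"),
}


def closure(asserted):
    """Add every link implied by the inverse, symmetric and transitive rules."""
    links = set(asserted)
    while True:
        new = set()
        for a, prop, b in links:
            if prop in INVERSE:
                new.add((b, INVERSE[prop], a))
            if prop in SYMMETRIC:
                new.add((b, prop, a))
            if prop in TRANSITIVE:
                # Chain a->b and b->d into a->d, skipping trivial self-links.
                new |= {(a, prop, d) for c, p2, d in links if p2 == prop and c == b and d != a}
        if new <= links:
            return links
        links |= new


def matches(links, concept):
    return sorted((p, b) for a, p, b in links if a == concept)


kb = closure(mappings)
print(matches(kb, "fr:administration"))
print(matches(kb, "fr:pot"))  # [('broadMatch', 'en:vessel')]
```

Self-links such as en:administration exactMatch en:administration, which follow trivially from symmetry and transitivity, are left out.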
    • A concept from the UNESCO thesaurus, expressed in SKOS (RDF/XML):

        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:skos="http://www.w3.org/2004/02/skos/core#">
          <rdf:Description rdf:about="http://iaaa.cps.unizar.es/thesaurus/HYDROBIOLOGY">
            <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
            <skos:inScheme rdf:resource="http://iaaa.unizar.es/thesaurus/UNESCO"/>
            <skos:prefLabel xml:lang="en">HYDROBIOLOGY</skos:prefLabel>
            <skos:prefLabel xml:lang="fr">HYDROBIOLOGIE</skos:prefLabel>
            <skos:prefLabel xml:lang="es">HIDROBIOLOGÍA</skos:prefLabel>
            <skos:broader rdf:resource="http://iaaa.cps.unizar.es/thesaurus/BIOLOGY"/>
            <skos:narrower rdf:resource="http://iaaa.cps.unizar.es/thesaurus/LIMNOLOGY"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUACULTURE"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/ENVIRONMENTAL SCIENCES"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUATIC PLANTS"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/VIRUSES"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/MARINE BIOLOGY"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUATIC ENVIRONMENT"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUATIC ANIMALS"/>
            <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUATIC ECOSYSTEMS"/>
          </rdf:Description>
        </rdf:RDF>
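To read such a description programmatically, a dedicated RDF library (for example rdflib in Python) is the safer choice, since RDF/XML allows many abbreviated forms. For the flat `rdf:Description` pattern of the HYDROBIOLOGY example, though, the standard library is enough. A minimal sketch using `xml.etree.ElementTree` on a trimmed copy of that example:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://iaaa.cps.unizar.es/thesaurus/HYDROBIOLOGY">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="en">HYDROBIOLOGY</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">HYDROBIOLOGIE</skos:prefLabel>
    <skos:prefLabel xml:lang="es">HIDROBIOLOGÍA</skos:prefLabel>
    <skos:broader rdf:resource="http://iaaa.cps.unizar.es/thesaurus/BIOLOGY"/>
    <skos:narrower rdf:resource="http://iaaa.cps.unizar.es/thesaurus/LIMNOLOGY"/>
    <skos:related rdf:resource="http://iaaa.cps.unizar.es/thesaurus/AQUACULTURE"/>
  </rdf:Description>
</rdf:RDF>"""

LINKS = ("broader", "narrower", "related")


def read_concepts(xml_text):
    """Return {concept URI: {"prefLabel": {lang: label}, "broader": [...], ...}}."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        info = {"prefLabel": {}, **{k: [] for k in LINKS}}
        for child in desc:
            if not child.tag.startswith(f"{{{SKOS}}}"):
                continue  # ignore non-SKOS properties such as rdf:type
            prop = child.tag[len(SKOS) + 2:]  # strip the "{namespace}" prefix
            if prop == "prefLabel":
                info["prefLabel"][child.get(XML_LANG)] = child.text
            elif prop in LINKS:
                info[prop].append(child.get(f"{{{RDF}}}resource"))
        concepts[desc.get(f"{{{RDF}}}about")] = info
    return concepts


concepts = read_concepts(DOC)
hydro = concepts["http://iaaa.cps.unizar.es/thesaurus/HYDROBIOLOGY"]
print(hydro["prefLabel"]["es"])  # HIDROBIOLOGÍA
print(hydro["broader"])          # ['http://iaaa.cps.unizar.es/thesaurus/BIOLOGY']
```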
    • Some well-known SKOSified thesauri
      • ICONCLASS (iconographic description): http://www.iconclass.org/
      • Getty Art & Architecture Thesaurus (AAT): http://www.getty.edu/research/conducting_research/vocabularies/aat/
      • Getty Union List of Artist Names (ULAN): http://www.getty.edu/research/conducting_research/vocabularies/ulan/
      • Getty Thesaurus of Geographic Names (TGN): http://www.getty.edu/research/conducting_research/vocabularies/tgn/
      • The UNESCO thesaurus: http://www2.ulcc.ac.uk/unesco/
      • Library of Congress Subject Headings (LCSH)
    • Conclusion
      However powerful the software, it can only be as good as the underlying metadata and thesaurus structure. Computers can take a lot of the effort out of compiling, maintaining and using the database, but they cannot make the intellectual decisions needed for it to function effectively.
      Without standardisation in your own collection management database, the next step, making digital cultural heritage information on the Web more accessible, will never be reached. And remember that by improving your collection management database, you also improve your own search results ;-)
      Thank you for your attention
      For more information: r.wyns@kmkg.be
    • Documentation
      - W3C Semantic Web activity on SKOS: http://www.w3.org/2004/02/skos/
      - Athena website, WP4 SKOS workshop, Rome, 16-07-2009: http://www.athenaeurope.org/
      - Collections Trust, Guidelines for Constructing a Museum Object Name Thesaurus: http://www.collectionstrust.org.uk/spectrum-terminology/holm#What
      - Introductory Tutorial on Thesaurus Construction, Univ. of Western Ontario: http://publish.uwo.ca/~craven/677/thesaur/main00.htm
      - British Standard BS 5723, Guide to establishment and development of monolingual thesauri (British Standards Institution, 1987), virtually identical to ISO 2788
      - Wikipedia: http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System and http://en.wikipedia.org/wiki/Ontology_(information_science)