DE conferentie 2008 - Isaac en Wartena
Presentation Transcript

  • Semantic Web Do it Yourself. DE Conferentie, Rotterdam, 11 December 2008. Antoine Isaac and Christian Wartena
  • Preamble: who are we? Christian Wartena (Telematica Institute), working on: MyMedia, Cultuur in Context, CATCH research programme (CHOICE). Antoine Isaac, working on: CATCH research programme (STITCH), European Library-related projects (TELplus), SKOS @ W3C
  • Preamble: who are you? Domain Archive? Library? Museum? Experience with SW New to the stuff? Basic knowledge? Advanced knowledge / already implemented something? Motivation Thinking about using it? Just to learn about it or what you can do with it?
  • Topics of this workshop Semantic Web in a nutshell (20’) Do it yourself (40’) Short break – Answer questions (15’) Examples from practice & Discussion (45’)
  • Topics Semantic Web in a nutshell A Web of data Smart data Do it yourself Short break – Answer questions Examples from practice & Discussion
  • The Web for humans A city The city’s location Hyperlinks anchored to words Meaning
  • SW problem: the Web for computers? Where is meaning ?
  • The Semantic Web vision: a web of (smart) data [Diagram: a graph connecting the nodes Article, Document, The_Netherlands, Amsterdam, City, file1 and par3 through the links subClassOf, type, hasCapital, partOf and defines]
  • A Web of resources: theirVoc:Article, myVoc:Amsterdam. Web-enabled identifiers (URIs), coming from different spaces. myVoc: =
  • Data in an RDF “graph”. Resource Description Framework: structured data as triple statements (theirVoc:Article, rdf:type, myVoc:Amsterdam). Links coming from different spaces
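A minimal sketch of the triple idea in plain Python, not tied to any RDF library. The namespace URIs and the placement of file1 in the myVoc space are assumptions made for illustration; the slides do not give them.

```python
# Hypothetical namespace URIs (assumptions, not taken from the slides).
MY_VOC = "http://example.org/myVoc#"
THEIR_VOC = "http://example.org/theirVoc#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An RDF graph is a set of (subject, predicate, object) statements.
graph = {
    (MY_VOC + "file1", RDF + "type", THEIR_VOC + "Article"),
    (MY_VOC + "file1", THEIR_VOC + "subject", MY_VOC + "Amsterdam"),
    (MY_VOC + "Amsterdam", RDF + "type", MY_VOC + "City"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


file1_types = objects(graph, MY_VOC + "file1", RDF + "type")
```

Because every identifier is a full URI, statements that use theirVoc and myVoc can sit in the same graph even though the two vocabularies have different owners.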
  • More than traditional metadata (1) Web-based resources allow distribution/sharing/linking of documents description vocabularies (meta)data (file1, subject, Amsterdam) different owners & locations
  • Web of data example: Linking Open Data community project
  • CH case: Libris Swedish Union Catalogue as linked data
  • Linked descriptions of resources in Libris Martin Malmsten, Dublin Core 2008
  • External links in Libris: Library of Congress Subject Headings (Ed Summers et al., Dublin Core 2008)
  • Searching using multiple vocabularies
  • Agenda Semantic Web in a nutshell A Web of data Smart data: the "semantics"
  • Creating vocabularies of “building blocks” for RDF graphs (theirVoc:Article, rdf:type, myVoc:Amsterdam)
  • Ontologies Ontologies specify description vocabularies which can be shared subject, Article Give formal definition to vocabulary elements Every Article is a Document
  • Machine-readable definitions. Ontology axiom: theirVoc:Article rdfs:subClassOf theirVoc:Document. Deduction of new facts (rdf:type) and control of existing facts by reasoning engines
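The deduction step can be sketched with a tiny forward-chaining loop. This is an illustrative toy under stated assumptions, not a real reasoning engine: it implements only the rule "if x has type C and C is a subclass of D, then x has type D", and the resource myVoc:file1 is a hypothetical example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Add every rdf:type statement implied by rdfs:subClassOf axioms."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new fact appears
        changed = False
        subclass = {(s, o) for (s, p, o) in inferred if p == SUBCLASS_OF}
        for (s, p, o) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


facts = {
    ("theirVoc:Article", SUBCLASS_OF, "theirVoc:Document"),  # the axiom
    ("myVoc:file1", RDF_TYPE, "theirVoc:Article"),           # hypothetical data
}
closure = infer_types(facts)
```

After the call, the closure also states that myVoc:file1 is a theirVoc:Document, a fact that was never entered by hand.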
  • CH case: eCulture/Europeana
  • CH case study: eCulture/Europeana
  • Case study: Europeana. The query: ?y vra:subject ?x; ?x skos:broader rma:Egypt. The existing description: rma:gezicht_in_cairo rma:depicts rma:Cairo; rma:Cairo skos:broader rma:Egypt. Why is there a match? For the Europeana ontology, every rma:depicts statement implies a vra:subject statement
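A small sketch of why the query matches. It assumes that the implication from rma:depicts to vra:subject is stated as an rdfs:subPropertyOf axiom (the slides only say "implies"), and it reads the query as "find ?y with a vra:subject ?x whose skos:broader is rma:Egypt".

```python
data = {
    ("rma:gezicht_in_cairo", "rma:depicts", "rma:Cairo"),
    ("rma:Cairo", "skos:broader", "rma:Egypt"),
    # Assumed encoding of "rma:depicts implies vra:subject".
    ("rma:depicts", "rdfs:subPropertyOf", "vra:subject"),
}


def apply_subproperties(triples):
    """Copy each statement up to its super-properties until nothing changes."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for (s, p, o) in result if p == "rdfs:subPropertyOf"}
        for (s, p, o) in list(result):
            for (sub, sup) in sub_props:
                if p == sub and (s, sup, o) not in result:
                    result.add((s, sup, o))
                    changed = True
    return result


def query(triples, place):
    """?y vra:subject ?x ; ?x skos:broader <place>"""
    return {
        y
        for (y, p, x) in triples
        if p == "vra:subject" and (x, "skos:broader", place) in triples
    }


without_reasoning = query(data, "rma:Egypt")
with_reasoning = query(apply_subproperties(data), "rma:Egypt")
```

Without the axiom the query finds nothing, because the description never uses vra:subject; with it, rma:gezicht_in_cairo is returned.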
  • More than traditional metadata (2) Flexible reasoning: the same base can easily be extended with new descriptions using different ontologies (den08:shows_DEN_Participant). Requirement: semantically connect these ontologies (den08:shows_DEN_Participant "implies" vra:subject). SW principle: meaning is accessible with the data, not encoded in external programs
  • Semantic Web in a nutshell. A web of (meta)data: descriptions of resources, easy to share and interconnect. Naar buiten! (Reach out!) Smart data: machine-readable definitions for the data. Relies on open standards: URI, XML, and W3C's RDF, OWL, SPARQL, SKOS…
  • Topics Semantic Web in a nutshell Do it yourself Short break – Answer questions Examples from practice & Discussion
  • Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  • Porting CH data to the Semantic Web Typical CH data: Metadata on objects and documents, using specific description structures and possibly controlled vocabularies
  • Representing semantics of your data Semantics is about relations! Relations between words and things in the real world. Relations between concepts. Concepts are defined by their relations to other concepts. E.g.: a theatre is a building; every building is located in a location; every building is built in a period of time; a building is designed by an architect
  • Semantic Relations Objects are defined by their relations to other entities. Just a thesaurus is not enough: having only the ‘broader/narrower’ relation is still very poor. E.g. the Rotterdamse Schouwburg is not only defined by saying that it is a theatre, but also by: the Rot. Schouwb. is located at Schouwburgplein 25; the Rot. Schouwb. was built in 1988; the Rot. Schouwb. was designed by Wim Gerhard Quist. Relations enable reasoning about entities.
  • Relations and Interoperability Two collections generally don’t consist of the same type of objects (except for painting galleries…). Nevertheless, in many cases, there are many relations between the collections. E.g. a collection of architecture photographs and a collection about theatre productions. Relations between these collections can only be found if we specify the relation between a picture and the building in the picture, and between a theatre production and the buildings it was performed in.
  • How to find a suitable ontology? The set of concepts and relations is called an ontology. Which ontology should we use? Find a suitable ontology: makes your data directly interoperable with other institutions using that ontology, but for many domains no ontologies are available. Build an ontology yourself: yields the perfectly suited ontology, but is a lot of work.
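As a rough illustration of linking two collections through explicit relations, the sketch below joins photographs and theatre productions on the building they share. All identifiers and property names (photo:001, ex:depicts, ex:performedIn, and so on) are hypothetical.

```python
# Collection 1: architecture photographs, each related to the building shown.
photos = [
    ("photo:001", "ex:depicts", "ex:Rotterdamse_Schouwburg"),
]

# Collection 2: theatre productions, each related to a building it played in.
productions = [
    ("prod:A", "ex:performedIn", "ex:Rotterdamse_Schouwburg"),
    ("prod:B", "ex:performedIn", "ex:Amphion"),
]


def link_collections(photos, productions):
    """Pair every photo with the productions performed in the depicted building."""
    photos_by_building = {}
    for photo, _, building in photos:
        photos_by_building.setdefault(building, []).append(photo)
    return [
        (photo, production, building)
        for production, _, building in productions
        for photo in photos_by_building.get(building, [])
    ]


links = link_collections(photos, productions)
```

The join only works because both collections name the building with the same identifier; without the two explicit relations there is nothing to join on.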
  • How to design an Ontology Ontology should define relations between used concepts Nothing more. No judgments whether a concept is important, peripheral, etc. Not a general world view But: possibilities for extension No solutions for standard problems Representation of NAL data
  • But what are our concepts? Defining an ontology forces you to make clear what concepts you use That is a great value by itself. This is a real challenge if you are working with people from different disciplines or institutes. You should answer questions like what is the relation between A theatre building and the theatre genre A performance and a theatre production A location and an address (that might change when a street is renamed or renumbered) Etc. Solutions should be consistent
  • How to write it down Use the standard semantic web languages, like RDF and OWL. These languages have some restrictions. You have to learn how to deal with them You can’t write larger ontologies without using specialized tooling.
  • OWL for domain experts? What is a class, what is an individual? When do we use data(type) properties, when object properties? And what are annotation properties? How can we use subproperties? Language restrictions: only binary relations You cannot say something like: Jan is director of (Amphion, 1984-1992) This is not equivalent to Jan is director of Amphion Jan is director during 1984-1992
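One common workaround for the binary-only restriction is to introduce an intermediate resource that stands for the relation itself. The sketch below assumes hypothetical names (ex:Directorship, ex:director, ex:theatre, ex:startYear, ex:endYear); it is one possible modelling, not the one used by the authors.

```python
def directorship(node_id, person, theatre, start_year, end_year):
    """Represent 'person is director of (theatre, start-end)' as binary triples.

    All statements hang off one intermediate node, so the period stays
    attached to this particular directorship and not to the person in general.
    """
    return {
        (node_id, "rdf:type", "ex:Directorship"),
        (node_id, "ex:director", person),
        (node_id, "ex:theatre", theatre),
        (node_id, "ex:startYear", start_year),
        (node_id, "ex:endYear", end_year),
    }


triples = directorship("ex:directorship1", "ex:Jan", "ex:Amphion", 1984, 1992)
```

If Jan later directs another theatre, a second intermediate node keeps the two periods apart, which the pair of flat statements "Jan is director of Amphion" and "Jan is director during 1984-1992" cannot do.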
  • Tooling (1) Most tools seem to require a PhD in Semantic Web Science Tools are Protégé OntoStudio Etc.
  • Tooling (2) Cooperate with semantic web experts Use simple representations to talk about the ontology. Graphical representations are impressive but don’t scale to more than a handful of concepts. Proven ways to exchange information about the ontology Face to face discussions E-mail Wikis OwlDoc
  • Example
  • But… Ontologies are fine-grained models for the data. Creating (and using) them is labor-intensive. Do we need them for all kinds of CH data? Consider thesauri and other controlled vocabularies: (tens of) thousands of concepts (AAT); loose semantics (Car wheel BT Car). Still useful for applications! Search, annotation
  • Porting controlled vocabularies to the Semantic Web
  • SKOS. Observation: there are many models/formats for controlled vocabularies: thesauri, classification schemes, etc… But also common features, used by typical applications: lexical information, semantic links. SKOS (Simple Knowledge Organization System): model to represent KOSs on the Semantic Web in a simple way. Comparable to Dublin Core, for conceptual vocabularies
  • Concepts and labels
  • (Multilingual) labels
  • Semantic relations
  • Putting it together: a SKOS graph. Thesaurus entries: animals; cats (UF domestic cats, RT wildcats, BT animals); domestic cats (USE cats); wildcats
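The thesaurus entries on this slide can be turned into SKOS statements along the following lines. This is a sketch under stated assumptions: the ex: URIs are hypothetical, and the mapping used is the conventional one (UF to skos:altLabel, BT to skos:broader, RT to skos:related), with the USE entry folded into the alternative label of its target concept.

```python
# The slide's thesaurus: "domestic cats USE cats" is the inverse of the UF line,
# so it does not become a concept of its own.
thesaurus = {
    "animals": {},
    "cats": {"UF": ["domestic cats"], "RT": ["wildcats"], "BT": ["animals"]},
    "wildcats": {},
}


def concept_uri(label):
    """Hypothetical URI scheme: one ex: identifier per preferred term."""
    return "ex:" + label.replace(" ", "_")


def to_skos(thesaurus):
    triples = set()
    for term, relations in thesaurus.items():
        concept = concept_uri(term)
        triples.add((concept, "rdf:type", "skos:Concept"))
        triples.add((concept, "skos:prefLabel", term))
        for alt in relations.get("UF", []):   # non-preferred terms become labels
            triples.add((concept, "skos:altLabel", alt))
        for broader in relations.get("BT", []):
            triples.add((concept, "skos:broader", concept_uri(broader)))
            triples.add((concept_uri(broader), "skos:narrower", concept))
        for related in relations.get("RT", []):  # skos:related is symmetric
            triples.add((concept, "skos:related", concept_uri(related)))
            triples.add((concept_uri(related), "skos:related", concept))
    return triples


skos_graph = to_skos(thesaurus)
```

The result has three concepts, not four: "domestic cats" survives only as a label on ex:cats, which is how SKOS separates concepts from the words used to name them.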
  • Networking controlled vocabularies
  • Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  • MultimediaN eCulture Annotation tool
  • Benefiting from the availability of different vocabularies
  • Direct access to the context of annotations
  • CHOICE annotation tool: benefiting from information extraction technology
  • Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  • Interoperability Levels of Interoperability Creating Interoperability
  • Levels of interoperability (information interoperability). Syntactic interoperability: format heterogeneity; many commercial solutions. Structural interoperability: structural/schematic heterogeneity; some commercial solutions, mainly based on manually provided mapping rules. Semantic interoperability: challenge!
  • Definition Semantic Interoperability is the ability of two or more computer systems to exchange information and have the meaning of that information automatically interpreted by the receiving system accurately enough to produce useful results.
  • Structural Interoperability
  • Structural Interoperability (NPO)
  • Sem. Interoperability ‘missie congo’
  • Sem. Interoperability ‘missie congo’
  • Sem. Interoperability ‘missie congo’
  • Creating Interoperability Using one standard model By hand The wisdom of the crowds? Automatic alignment
  • Standardization Not realistic for a lot of data: legacy data; who determines the standard?; talking about standards hinders annotating the collections. Useful for common and clearly restricted domains (name, address, location: NAL / NAW), common annotation types (Dublin Core) and the meta level (SKOS)
  • Integrate by hand Lots of work Best results
  • Example
  • Interoperability by Using the Wisdom of the Crowds Use web2.0 to realize the semantic web “Defining new mappings interactively. As a user browses an ontology in a repository, he may come across a concept for which he knows there is a similar concept in another ontology. The user can create the mapping on-the-fly, linking the two concepts.” (Noy et al. 2008) Use tags as a common annotation level?
  • Example from Stanford University
  • Automatic alignment techniques. Lexical: labels of entities and textual definitions. Structural: structure of the vocabularies. Background knowledge: using a shared conceptual reference to find links. Extensional: object information (e.g. book indexing). [Figure labels: Long, brain, tumor, Long tumor] (Frank van Harmelen, AIME05)
  • Semantic interoperability by using synonyms and related terms For many tasks we don’t have to know what the exact relation between two entities is. It is sufficient to know that terms represent the same concept (‘synonyms’) or related concepts. Usage: intelligent query expansion; automatically finding relations between collections. Relatedness of terms can in some cases be derived from actual metadata!
  • Finding related terms by using co-occurrence data. Statistics from usage in annotated data: co-occurrence of metadata is the key. [Figure: keyword sets of annotated items, with keywords such as Missie Congo, Gaza, Veiligheidsraad, Blauwhelm, VN, Bush and New-York] High co-occurrence of VN and veiligheidsraad
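Query expansion with such a table of synonyms and related terms can be sketched as follows. The table contents are hypothetical sample data; in practice they would come from a thesaurus or from the co-occurrence statistics discussed in this talk.

```python
# Hypothetical relatedness table: term -> synonyms or related terms.
related_terms = {
    "VN": {"veiligheidsraad", "blauwhelm"},
}


def expand_query(terms, related):
    """Return the original terms plus their related terms, without duplicates.

    The exact relation (synonym, broader, related) is deliberately ignored:
    for search it is enough to know that the terms belong together.
    """
    expanded = []
    for term in terms:
        for candidate in [term] + sorted(related.get(term, ())):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded = expand_query(["VN", "Congo"], related_terms)
# expanded == ["VN", "blauwhelm", "veiligheidsraad", "Congo"]
```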
  • New measure for keyword similarity Keywords have similar usage if they co-occur with similar frequency with all other keywords. In other words: terms are similar if they have similar co-occurrence patterns. Works very well for social tags
  • 2 Experiments. Mapping between Teleblik keywords and user tags: 100 videos; 12,414 tags; 4,348 different tags; 269 different keywords. Mapping between tags and Wikipedia categories: 58,345 articles; 500,618 tags and category annotations; 42,425 different categories; 49,603 different tags. Mapping computed for the 4,182 most frequent tags/categories. (Involving a transformation on a 92082 x 92082 matrix of floats!)
  • and Wikipedia. Social bookmarking site: bookmarks can in most cases be interpreted as labels or tags for the bookmarked URL. Many Wikipedia articles are tagged by users. Wikipedia: articles are labeled with one or more categories by the article authors. Categories are organized hierarchically and consciously, like in a thesaurus
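The idea of comparing co-occurrence patterns can be sketched as below. Cosine similarity over raw co-occurrence counts is used here as a simple stand-in; the slides do not specify the actual measure, so this should not be read as the authors' formula. The annotation data is invented for the example.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cooccurrence_vectors(annotations):
    """For each keyword, count how often it appears together with each other keyword."""
    vectors = {}
    for keywords in annotations:
        for a, b in permutations(set(keywords), 2):
            vectors.setdefault(a, Counter())[b] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity(vectors, a, b):
    return cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))


# Invented sample: each inner list is the keyword set of one annotated item.
annotations = [
    ["humor", "comedy", "film"],
    ["humour", "comedy", "film"],
    ["recipes", "food"],
    ["cookbook", "food"],
]
vectors = cooccurrence_vectors(annotations)
same_usage = similarity(vectors, "humor", "humour")
different_usage = similarity(vectors, "humor", "recipes")
```

In this sample "humor" and "humour" never appear on the same item, yet they score close to 1 because they keep the same company, while "humor" and "recipes" share no neighbours and score 0.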
  • economist American_economist Economists economics people philosophy history ecommerce Electronic_commerce credit business web2.0 American_poets American_novelists poetry literature Living_people art beauty actress American_television_actors cinema interesting people Classification_systems taxonomy classification library folksonomy tagging biology psychology
  • Results of instance-based mapping, categories to tags [Bar chart: share of mappings (y-axis from 0 to 0.45) per relation type: identical, synonym, broader, narrower, related, unrelated]
  • Results of instance-based mapping, tags to categories [Bar chart: share of mappings (y-axis from 0 to 0.3) per relation type: identical, synonym, broader, narrower, related, unrelated]
  • hilarious funny humor humour comedy fun / cooking cookbook recipes cookbooks cookery food / diary diaries journal teenage family girls / unemployment drunk employment middle_class jobs class / homer odysseus greek_poetry trojan_war troy iliad / shakespeare Elizabethan_drama william_shakespeare british_drama plays tragedies
  • Case study: re-indexing at KB? KB has a depot collection. Some books are indexed beforehand at public libraries. The two collections use different thesauri. Biblion (public libraries): 650K books, indexed with LTR. Depot (KB): 1M books, indexed with Brinkman
  • The re-indexing application: propose Brinkman indexing when KB receives a Biblion book [Diagram: a Biblion book indexed with LTR concepts enters the Depot, where the corresponding Brinkman concepts are still unknown]
  • Techniques used. Lexical: comparison of concept labels. Extensional: based on overlap of books indexed with LTR concepts and books indexed with Brinkman concepts [Diagram: the LTR concept "Dutch Literature" and the Brinkman concept "Dutch", linked through a collection of books]
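The extensional technique can be sketched as an overlap computation between the sets of books indexed with each concept. The Jaccard coefficient and the 0.5 threshold are assumptions chosen for illustration, as are the concept names and book identifiers; the slides only say that the method is based on overlap.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_mappings(ltr_index, brinkman_index, threshold=0.5):
    """Propose (LTR concept, Brinkman concept, score) for well-overlapping pairs."""
    proposals = []
    for ltr_concept, ltr_books in ltr_index.items():
        for brinkman_concept, brinkman_books in brinkman_index.items():
            score = jaccard(ltr_books, brinkman_books)
            if score >= threshold:
                proposals.append((ltr_concept, brinkman_concept, score))
    return sorted(proposals, key=lambda m: -m[2])


# Hypothetical indexes: concept -> set of dually indexed books.
ltr_index = {
    "ltr:Dutch_literature": {"b1", "b2", "b3"},
    "ltr:Cooking": {"b4"},
}
brinkman_index = {
    "brinkman:Nederlandse_letterkunde": {"b1", "b2", "b3", "b5"},
    "brinkman:Kookboeken": {"b4", "b6", "b7"},
}
proposals = propose_mappings(ltr_index, brinkman_index)
```

Only books that carry both an LTR and a Brinkman indexing contribute evidence, so the quality of the proposals depends on the size of that dually indexed set.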
  • Result
  • Feasibility? Quality of annotations: level comparable to Christian's experiment. First experiments: optimum around 60% precision, 50% recall. Automatic alignment has flaws, but it could help already! Assisting users, not replacing them. Note: the application is difficult (variability)
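For readers less familiar with the two figures, the sketch below shows how precision and recall are computed for a set of proposed mappings against a hand-made reference. The mapping identifiers are invented, and the toy numbers are chosen only to reproduce the 60% / 50% order of magnitude mentioned on the slide.

```python
def precision_recall(proposed, reference):
    """Precision: share of proposals that are correct.
    Recall: share of the reference mappings that were found."""
    correct = proposed & reference
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Toy data: 5 proposals, 3 of them correct, out of 6 reference mappings.
reference = {"m1", "m2", "m3", "m4", "m5", "m6"}
proposed = {"m1", "m2", "m3", "x1", "x2"}
precision, recall = precision_recall(proposed, reference)
```

With these figures, two out of five suggestions shown to an indexer would be wrong and half of the good ones would be missed, which is why the tool is positioned as an assistant and not a replacement.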
  • Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting legacy data to the Semantic Web Representing semantics of your data Controlled vocabularies – SKOS Annotation Creating new semantic CH descriptions Creating interoperability at the semantic level Connecting (to standard) description models Using the wisdom of the crowds? Automatically aligning existing description vocabularies
  • Take-home message: benefits Performing over the web: Knowledge re-use & sharing Libris Knowledge integration CiC, KB re-indexing Data enrichment Enhanced collection access eCulture/Europeana semantic search It can really help open up CH data!
  • Take-home message: costs Steps to be taken Porting legacy data Semantic alignment Annotation
  • Topics Semantic Web in a nutshell Do it yourself Short break – Formulate Questions Examples from practice
  • Examples from practice Do you have experiences or plans to Semantically annotate? Using standard vocabularies? Model your domain; write or extend an ontology Connect to or integrate with other collections? Inside or outside your institute Make data available on the internet? Tell us (after the break)! Some issues to tell us about: (next slide)
  • Why do you think the semantic web could be helpful to your organization? Where? Front-office: search/browse/recommendation/personalization; make data available to end-users for reuse (mash-ups, virtual (user) collections, …). Back-office: provide data to third parties; inference of new knowledge/consistency checking. What type of data? Porting existing data; creating new data (automatic data extraction / experts / end users). What new (semantic) links arise? New relations to which collections/vocabularies? Within a collection / with other collections / with other institutes. How? Automatic alignment / experts / end users?