DE conferentie 2008 - Isaac en Wartena


  1. 1. Semantic Web Do it Yourself. DE Conferentie, Rotterdam, 11 December 2008. Antoine Isaac and Christian Wartena
  2. 2. Preamble: who are we? Christian Wartena Telematica Institute http://www.telin.nl/ Working on: MyMedia (http://www.mymedia-project.org/) Cultuur in Context CATCH research programme (CHOICE) Christian.Wartena@telin.nl Antoine Isaac Working on: CATCH research programme (STITCH) European Library-related projects (TELplus) SKOS @ W3C http://www.few.vu.nl/~aisaac/ aisaac@few.vu.nl
  3. 3. Preamble: who are you? Domain Archive? Library? Museum? Experience with SW New to the stuff? Basic knowledge? Advanced knowledge / already implemented something? Motivation Thinking about using it? Just to learn about it or what you can do with it?
  4. 4. Topics of this workshop Semantic Web in a nutshell (20’) Do it yourself (40’) Short break – Answer questions (15’) Examples from practice & Discussion (45’)
  5. 5. Topics Semantic Web in a nutshell A Web of data Smart data Do it yourself Short break – Answer questions Examples from practice & Discussion
  6. 6. The Web for humans A city The city’s location Hyperlinks anchored to words Meaning
  7. 7. SW problem: the Web for computers? Where is meaning ?
  8. 8. The Semantic Web vision: a web of (smart) data [Diagram: a graph with the nodes Article, Document, The_Netherlands, file1, par3, Amsterdam and City, connected by the links subClassOf, type, hasCapital, partOf and defines]
  9. 9. A Web of resources: theirVoc:Article, http://ex.org/files/file1, myVoc:Amsterdam. Web-enabled identifiers (URIs), coming from different spaces: myVoc: = http://example.org/myVocabulary/
  10. 10. Data in an RDF “graph”. Resource Description Framework: structured data as triple statements. http://ex.org/files/file1 rdf:type theirVoc:Article; http://ex.org/files/file1 theirVoc:subject myVoc:Amsterdam. Links coming from different spaces
  11. 11. More than traditional metadata (1) Web-based resources allow distribution/sharing/linking of documents description vocabularies http://geo.org/voc/ (meta)data (file1, subject, Amsterdam) http://www.kb.nl/eDepot http://ex.org/files/file1 different owners & locations
  12. 12. Web of data example: Linking Open Data community project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
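
The two statements on this slide can be sketched in a few lines of code. This is only an illustration, assuming plain Python tuples stand in for RDF triples; the `theirVoc:` namespace URI is a made-up placeholder, since the slides only expand `myVoc:`.

```python
# Prefix table. Only myVoc: is expanded in the slides;
# the theirVoc: URI is a hypothetical placeholder.
PREFIXES = {
    "myVoc": "http://example.org/myVocabulary/",
    "theirVoc": "http://example.com/theirVocabulary/",  # assumption
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def expand(term):
    """Turn a prefixed name such as 'myVoc:Amsterdam' into a full URI."""
    if term.startswith("http://"):
        return term  # already a full URI
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local

# Each statement is a (subject, predicate, object) triple.
graph = {
    ("http://ex.org/files/file1", "rdf:type", "theirVoc:Article"),
    ("http://ex.org/files/file1", "theirVoc:subject", "myVoc:Amsterdam"),
}

# The same graph with every term written as a full URI.
expanded = {tuple(expand(term) for term in triple) for triple in graph}
```

Because every term is a URI, the subject, the predicate and the object of one statement can each live on a different server, which is what "links coming from different spaces" refers to.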
  13. 13. CH case: Libris http://libris.kb.se/ Swedish Union Catalogue as linked data
  14. 14. Linked descriptions of resources in Libris Martin Malmsten, Dublin Core 2008 http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf
  15. 15. External links in Libris: Library of Congress Subject Headings. Ed Summers et al., Dublin Core 2008 http://dc2008.de/wp-content/uploads/2008/09/summers-isaac-redding-krech.pdf
  16. 16. Searching using multiple vocabularies
  17. 17. Agenda Semantic Web in a nutshell A Web of data Smart data: the "semantics"
  18. 18. Creating vocabularies of “building blocks” for RDF graphs: theirVoc:Article, rdf:type, http://ex.org/files/file1, theirVoc:subject, myVoc:Amsterdam
  19. 19. Ontologies Ontologies specify description vocabularies which can be shared subject, Article Give formal definition to vocabulary elements Every Article is a Document
  20. 20. Machine-readable definitions. Ontology axiom: theirVoc:Article rdfs:subClassOf theirVoc:Document. http://ex.org/files/file1 rdf:type theirVoc:Article, and therefore also rdf:type theirVoc:Document. Allows deduction of new facts & control of existing facts by reasoning engines
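
The deduction on this slide can be sketched as a tiny forward-chaining loop. This is a toy illustration of the RDFS subclass rule over Python tuples, not the reasoning engine used in any of the projects mentioned.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def infer_types(triples):
    """Apply 'x type C' and 'C subClassOf D' implies 'x type D' until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (sub, p1, sup) in list(facts):
            if p1 != SUBCLASS:
                continue
            for (x, p2, cls) in list(facts):
                if p2 == TYPE and cls == sub and (x, TYPE, sup) not in facts:
                    facts.add((x, TYPE, sup))
                    changed = True
    return facts

facts = infer_types({
    ("theirVoc:Article", SUBCLASS, "theirVoc:Document"),   # ontology axiom
    ("http://ex.org/files/file1", TYPE, "theirVoc:Article"),  # asserted fact
})
```

After the call, `facts` also contains the statement that `file1` is a `theirVoc:Document`, although nobody asserted it explicitly.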
  21. 21. CH case: eCulture/Europeana
  22. 22. CH case study: eCulture/Europeana
  23. 23. Case study: Europeana. The query: ?y vra:subject ?x . ?x skos:broader rma:Egypt. The existing description: rma:gezicht_in_cairo rma:depicts rma:Cairo . rma:Cairo skos:broader rma:Egypt. Why is there a match? For the Europeana ontology, every rma:depicts statement implies a vra:subject statement
  24. 24. More than traditional metadata (2) Flexible reasoning: the same base can easily be extended with new descriptions that use different ontologies, e.g. den08:shows_DEN_Participant. Requirement: semantically connect these ontologies: den08:shows_DEN_Participant "implies" vra:subject. SW principle: meaning is accessible with the data, not encoded in external programs
  25. 25. Semantic Web in a nutshell. A web of (meta)data: descriptions of resources; easy to share and interconnect. Reaching outward! Smart data: machine-readable definitions for the data. Relies on open standards: W3C's URI, XML, RDF, OWL, SPARQL, SKOS… http://www.w3.org/2001/sw/
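
Why the query matches can be traced in code. The sketch below assumes that "every rma:depicts statement implies a vra:subject statement" is expressed as an `rdfs:subPropertyOf` link, and it applies that rule in a single pass over toy tuples; it is not the Europeana implementation.

```python
data = {
    ("rma:gezicht_in_cairo", "rma:depicts", "rma:Cairo"),
    ("rma:Cairo", "skos:broader", "rma:Egypt"),
    # Ontology axiom; modelling it as a subproperty link is an assumption.
    ("rma:depicts", "rdfs:subPropertyOf", "vra:subject"),
}

def with_subproperties(triples):
    """Add (s, q, o) for every (s, p, o) where p rdfs:subPropertyOf q (one pass, no chains)."""
    sub = {(p, q) for (p, r, q) in triples if r == "rdfs:subPropertyOf"}
    inferred = {(s, q, o) for (s, p, o) in triples for (p2, q) in sub if p == p2}
    return triples | inferred

def objects_about(triples, place):
    """Answer the query: ?y vra:subject ?x . ?x skos:broader <place>."""
    narrower = {s for (s, p, o) in triples if p == "skos:broader" and o == place}
    return {s for (s, p, o) in triples if p == "vra:subject" and o in narrower}

results = objects_about(with_subproperties(data), "rma:Egypt")
no_reasoning = objects_about(data, "rma:Egypt")
```

Without the inferred `vra:subject` statement the query finds nothing; with it, `rma:gezicht_in_cairo` is returned.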
  26. 26. Topics Semantic Web in a nutshell Do it yourself Short break – Answer questions Examples from practice & Discussion
  27. 27. Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  28. 28. Porting CH data to the Semantic Web Typical CH data: Metadata on objects and documents, using specific description structures and possibly controlled vocabularies
  29. 29. Representing semantics of your data. Semantics is about relations! Relations between words and things in the real world. Relations between concepts. Concepts are defined by their relations to other concepts. E.g.: a theatre is a building; every building is located in a location; every building is built in a period of time; a building is designed by an architect
  30. 30. Semantic Relations. Objects are defined by their relations to other entities. Just a thesaurus is not enough: having only the ‘broader/narrower’ relation is still very poor. E.g. the Rotterdamse Schouwburg is not only defined by saying that it is a theatre, but also by: the Rot. Schouwb. is located at Schouwburgplein 25; the Rot. Schouwb. was built in 1988; the Rot. Schouwb. was designed by Wim Gerhard Quist. Relations enable reasoning about entities.
  31. 31. Relations and Interoperability. Two collections generally don’t consist of the same type of objects (except for painting galleries…). Nevertheless, in many cases there are many relations between the collections, e.g. between a collection of architecture photographs and a collection about theatre productions. Relations between these collections can only be found if we specify the relation between: a picture and the building in the picture; a theatre production and the buildings it was performed in.
  32. 32. How to find a suitable ontology? The set of concepts and relations is called an ontology. Which ontology should we use? Option 1: find a suitable ontology. Makes your data directly interoperable with other institutions using that ontology. For many domains no ontologies are available. Option 2: build an ontology yourself. Yields the perfectly suited ontology. A lot of work.
  33. 33. How to design an Ontology Ontology should define relations between used concepts Nothing more. No judgments whether a concept is important, peripheral, etc. Not a general world view But: possibilities for extension No solutions for standard problems Representation of NAL data
  34. 34. But what are our concepts? Defining an ontology forces you to make clear what concepts you use. That is of great value in itself. This is a real challenge if you are working with people from different disciplines or institutes. You should answer questions like: what is the relation between a theatre building and the theatre genre; a performance and a theatre production; a location and an address (that might change when a street is renamed or renumbered); etc. Solutions should be consistent
  35. 35. How to write it down Use the standard semantic web languages, like RDF and OWL. These languages have some restrictions. You have to learn how to deal with them You can’t write larger ontologies without using specialized tooling.
  36. 36. OWL for domain experts? What is a class, what is an individual? When do we use data(type) properties, when object properties? And what are annotation properties? How can we use subproperties? Language restrictions: only binary relations You cannot say something like: Jan is director of (Amphion, 1984-1992) This is not equivalent to Jan is director of Amphion Jan is director during 1984-1992
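
A common workaround for the binary-relation restriction is to introduce an intermediate node that carries all parts of the fact. The sketch below illustrates that pattern with toy tuples; every `ex:` name is hypothetical and not taken from the ontology discussed in the workshop.

```python
def directorship(person, theatre, start, end, node_id):
    """Express one ternary fact as binary triples hanging off one intermediate node."""
    node = f"ex:directorship_{node_id}"
    return {
        (node, "rdf:type", "ex:Directorship"),
        (node, "ex:director", person),
        (node, "ex:theatre", theatre),
        (node, "ex:startYear", start),
        (node, "ex:endYear", end),
    }

def directors_in(triples, theatre, year):
    """Find who directed the given theatre in the given year."""
    by_node = {}
    for s, p, o in triples:
        by_node.setdefault(s, {})[p] = o
    return {
        props["ex:director"]
        for props in by_node.values()
        if props.get("ex:theatre") == theatre
        # Missing years default to values that make the test fail.
        and props.get("ex:startYear", year + 1) <= year <= props.get("ex:endYear", year - 1)
    }

triples = directorship("ex:Jan", "ex:Amphion", 1984, 1992, 1)
```

Here `directors_in(triples, "ex:Amphion", 1990)` finds `ex:Jan`, while a query for 1995 finds nobody. The period stays attached to this particular directorship, which is exactly what is lost when the fact is split into two independent binary statements.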
  37. 37. Tooling (1) Most tools seem to require a PhD in Semantic Web Science. Tools include Protégé, OntoStudio, etc.
  38. 38. Tooling (2) Cooperate with semantic web experts Use simple representations to talk about the ontology. Graphical representations are impressive but don’t scale to more than a handful of concepts. Proven ways to exchange information about the ontology Face to face discussions E-mail Wikis OwlDoc
  39. 39. Example
  40. 40. But… Ontologies are fine-grained models for the data. Creating (and using) them is labor-intensive. Do we need them for all kinds of CH data? Consider thesauri and other controlled vocabularies: (dozens of) thousands of concepts (AAT); loose semantics (Car wheel BT Car). Still useful for applications! Search, annotation
  41. 41. Porting controlled vocabularies to the Semantic Web
  42. 42. SKOS. Observation: there are many models/formats for controlled vocabularies: thesauri, classification schemes, etc… But also common features, used by typical applications: lexical information, semantic links. SKOS (Simple Knowledge Organization System): a model to represent KOSs on the Semantic Web in a simple way. Comparable to Dublin Core, for conceptual vocabularies
  43. 43. Concepts and labels
  44. 44. (Multilingual) labels
  45. 45. Semantic relations
  46. 46. Putting it together: a SKOS graph. Source thesaurus: animals; cats (UF domestic cats, RT wildcats, BT animals); domestic cats (USE cats); wildcats
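
The conversion of this small thesaurus into a SKOS graph can be sketched as follows. The mapping of UF, BT and RT to `skos:altLabel`, `skos:broader` and `skos:related` is the usual one; the `ex:` concept identifiers are hypothetical, and triples are again plain tuples.

```python
# The thesaurus from the slide. The entry 'domestic cats USE cats' is the
# inverse of 'cats UF domestic cats', so it does not become a concept.
thesaurus = {
    "animals": {},
    "cats": {"UF": ["domestic cats"], "RT": ["wildcats"], "BT": ["animals"]},
    "wildcats": {},
}

def uri(term):
    """Mint a (hypothetical) concept identifier from a preferred term."""
    return "ex:" + term.replace(" ", "_")

def to_skos(thesaurus):
    triples = set()
    for term, relations in thesaurus.items():
        concept = uri(term)
        triples.add((concept, "rdf:type", "skos:Concept"))
        triples.add((concept, "skos:prefLabel", term))
        for alt in relations.get("UF", []):   # non-preferred terms become labels
            triples.add((concept, "skos:altLabel", alt))
        for broader in relations.get("BT", []):
            triples.add((concept, "skos:broader", uri(broader)))
        for related in relations.get("RT", []):
            triples.add((concept, "skos:related", uri(related)))
    return triples

skos_graph = to_skos(thesaurus)
```

Terms become concepts with labels, and the thesaurus relations become links between concepts rather than between strings.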
  47. 47. Networking controlled vocabularies
  48. 48. Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  49. 49. MultimediaN eCulture Annotation tool
  50. 50. Benefiting from the availability of different vocabularies
  51. 51. Direct access to the context of annotations
  52. 52. CHOICE annotation tool: benefiting from information extraction technology
  53. 53. Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting existing data to the Semantic Web Annotation Creating interoperability at the semantic level
  54. 54. Interoperability Levels of Interoperability Creating Interoperability
  55. 55. Levels of interoperability Information Interoperability Syntactic Interoperability format heterogeneity many commercial solutions Structural Interoperability structural/Schematic heterogeneity some commercial solutions mainly based on manually provided mapping rules Semantic Interoperability challenge!
  56. 56. Definition Semantic Interoperability is the ability of two or more computer systems to exchange information and have the meaning of that information automatically interpreted by the receiving system accurately enough to produce useful results.
  57. 57. Structural Interoperability
  58. 58. Structural Interoperability (NPO)
  59. 59. Sem. Interoperability ‘missie congo’
  60. 60. Sem. Interoperability ‘missie congo’
  61. 61. Sem. Interoperability ‘missie congo’
  62. 62. Creating Interoperability Using one standard model By hand The wisdom of the crowds? Automatic alignment
  63. 63. Standardization. Not realistic for a lot of data: legacy data; who determines the standard? Talking about standards hinders annotating the collections. Useful for: common and clearly restricted domains, e.g. name, address, location (NAL / NAW); common annotation types, e.g. Dublin Core; the metalevel, e.g. SKOS
  64. 64. Integrate by hand Lots of work Best results
  65. 65. Example
  66. 66. Interoperability by Using the Wisdom of the Crowds. Use web2.0 to realize the semantic web. “Defining new mappings interactively. As a user browses an ontology in a repository, he may come across a concept for which he knows there is a similar concept in another ontology. The user can create the mapping on-the-fly, linking the two concepts.” (Noy et al. 2008) Use tags as a common annotation level?
  67. 67. Example from Stanford University
  68. 68. Automatic alignment techniques Long brain tumor Long tumor. Lexical: labels of entities and textual definitions. Structural: structure of the vocabularies. Background knowledge: using a shared conceptual reference to find links. Extensional: object information (e.g. book indexing). Frank van Harmelen, AIME05 http://www.cs.vu.nl/~frankh/presentations/AIME05.ppt
  69. 69. Semantic interoperability by using synonyms and related terms. For many tasks we don’t have to know what the exact relation between two entities is. It is sufficient to know that terms represent the same concept (‘synonyms’) or related concepts. Usage: intelligent query expansion; automatically finding relations between collections. Relatedness of terms can in some cases be derived from actual metadata!
  70. 70. Finding related terms by using co-occurrence data. Statistics from usage in annotated data. Co-occurrence of metadata is the key. [Figure: items annotated with keywords such as Missie Congo, Gaza, Veiligheidsraad, VN, Blauwhelm, Bush and New-York] High co-occurrence of VN and Veiligheidsraad
  71. 71. New measure for keyword similarity. Keywords have similar usage if they co-occur with similar frequency with all other keywords. In other words: terms are similar if they have similar co-occurrence patterns. Works very well for social tags
  72. 72. 2 Experiments. Mapping between Teleblik keywords and user tags: 100 videos; 12,414 tags; 4,348 different tags; 269 different keywords. Mapping between del.icio.us tags and Wikipedia categories: 58,345 articles; 500,618 tags and category annotations; 42,425 different categories; 49,603 different tags. Mapping computed for the 4,182 most frequent tags/cat. (Involving a transformation on a 92082 x 92082 matrix of floats!)
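
The lexical technique can be sketched as label matching after a light normalisation. The normalisation below (lower-casing, dropping punctuation, stripping a trailing plural "s") is a deliberately crude assumption, and the two toy vocabularies are invented for illustration.

```python
import re

def normalise(label):
    """Lower-case, drop punctuation and strip a trailing plural 's' (a crude heuristic)."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return tuple(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)

def lexical_matches(vocab_a, vocab_b):
    """Pair concepts from two vocabularies that share a normalised label."""
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(concept)
    return {
        (a, b)
        for a, labels in vocab_a.items()
        for label in labels
        for b in index.get(normalise(label), ())
    }

# Invented toy vocabularies: concept identifier -> list of labels.
vocab_a = {"a:1": ["Brain tumors"], "a:2": ["Theatres"]}
vocab_b = {"b:9": ["brain tumor", "cerebral tumour"], "b:7": ["Museums"]}

matches = lexical_matches(vocab_a, vocab_b)
```

Real alignment tools use far richer normalisation (stemming, translation, synonym lists), but the principle of comparing labels rather than structure or instances is the same.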
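
The idea of comparing co-occurrence patterns can be sketched as follows. The slides do not give the exact formula of the new measure, so cosine similarity is used here as a stand-in, and the four annotated items are invented toy data that reuse keywords from the previous slide.

```python
from collections import Counter
from itertools import permutations
from math import sqrt

# Toy annotations (invented for illustration): one keyword set per item.
items = [
    {"Missie Congo", "Veiligheidsraad", "VN"},
    {"Gaza", "Veiligheidsraad", "VN"},
    {"Blauwhelm", "VN", "Missie Congo"},
    {"Bush", "New-York"},
]

def cooccurrence_vectors(items):
    """For each keyword, count how often it co-occurs with every other keyword."""
    vectors = {}
    for keywords in items:
        for a, b in permutations(keywords, 2):
            vectors.setdefault(a, Counter())[b] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = cooccurrence_vectors(items)
sim_related = cosine(vectors["Veiligheidsraad"], vectors["Blauwhelm"])
sim_unrelated = cosine(vectors["Veiligheidsraad"], vectors["Bush"])
```

In this toy data Veiligheidsraad and Blauwhelm never annotate the same item, yet they score high because both co-occur with VN and Missie Congo. That is the difference between similar co-occurrence patterns and plain co-occurrence.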
  73. 73. del.icio.us and Wikipedia Del.icio.us Social book marking site Bookmarks in most cases can be interpreted as labels or tags for the bookmarked URL. Many Wikipedia articles are tagged by del.icio.us users Wikipedia Articles are labeled with one or more categories by the article authors. Categories are organized hierarchically. Categories are organized consciously like in a thesaurus
  74. 74. economist American_economist Economists economics people philosophy history ecommerce Electronic_commerce credit business web2.0 American_poets American_novelists poetry literature Living_people art beauty actress American_television_actors cinema interesting people Classification_systems taxonomy classification library folksonomy tagging biology psychology
  75. 75. Results of instance based mapping categories to tags [Bar chart, vertical axis from 0 to 0.45; bars for: identical, synonym, broader, narrower, related, unrelated]
  76. 76. Results of instance based mapping tags to categories [Bar chart, vertical axis from 0 to 0.3; bars for: identical, synonym, broader, narrower, related, unrelated]
  77. 77. hilarious, funny, humor, humour, comedy, fun; cooking, cookbook, recipes, cookbooks, cookery, food; diary, diaries, journal, teenage, family, girls; unemployment, drunk, employment, middle_class, jobs, class; homer, odysseus, greek_poetry, trojan_war, troy, iliad; shakespeare, Elizabethan_drama, william_shakespeare, british_drama, plays, tragedies
  78. 78. Case study: re-indexing at KB? KB has a depot collection. Some books are indexed beforehand at public libraries. The two collections use different thesauri. Biblion (public libraries): 650K books, indexed with LTR. Depot (KB): 1M books, indexed with Brinkman
  79. 79. The re-indexing application. Propose Brinkman indexing when KB receives a Biblion book. [Diagram: a Biblion book with LTR indexing arrives at the Depot; its Brinkman indexing is still unknown (? ? ?)]
  80. 80. Techniques used. Lexical: comparison of concept labels. Extensional: based on overlap of books indexed with LTR concepts and books indexed with Brinkman concepts. [Diagram labels: Dutch Literature, LTR, Brinkman, Dutch, Collection of books]
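
The extensional technique can be sketched as an overlap measure between the sets of books indexed with each concept. Jaccard overlap and the 0.5 threshold are assumptions chosen for illustration, and the concept identifiers and book sets are invented; the slides do not state which measure the KB experiment used.

```python
def jaccard(a, b):
    """Share of books that carry both concepts, among books that carry either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def extensional_matches(source, target, threshold=0.5):
    """Propose (source concept, target concept, score) pairs, best first."""
    matches = []
    for s, s_books in source.items():
        for t, t_books in target.items():
            score = jaccard(s_books, t_books)
            if score >= threshold:
                matches.append((s, t, score))
    return sorted(matches, key=lambda m: -m[2])

# Invented toy data: concept -> identifiers of the books indexed with it.
ltr = {
    "ltr:dutch_literature": {"b1", "b2", "b3", "b4"},
    "ltr:gardening": {"b5"},
}
brinkman = {
    "brinkman:dutch_literature": {"b2", "b3", "b4", "b6"},
    "brinkman:horticulture": {"b5", "b7"},
}

proposals = extensional_matches(ltr, brinkman)
```

Note that the second proposal pairs two concepts whose labels have nothing in common; only the shared books reveal the link, which is why lexical and extensional techniques complement each other.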
  81. 81. Result
  82. 82. Feasibility? Quality of annotations: level comparable to Christian's experiment. First experiments: optimum around 60% precision, 50% recall. Automatic alignment has flaws, but it could help already! Assisting users, not replacing them. Note: the application is difficult (variability)
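
For readers less familiar with the two evaluation figures, precision and recall can be computed as below. The proposed and correct concept sets are invented toy data, chosen only so that the outcome lands on the 60% and 50% mentioned on the slide.

```python
def precision_recall(proposed, correct):
    """Precision: share of proposals that are right. Recall: share of right answers that were proposed."""
    hits = len(proposed & correct)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall

# Invented example: 5 Brinkman concepts proposed for a book, 3 of them among
# the 6 concepts a human indexer would have assigned.
proposed = {"c1", "c2", "c3", "c4", "c5"}
correct = {"c1", "c2", "c3", "c6", "c7", "c8"}

precision, recall = precision_recall(proposed, correct)
```

At this level a tool cannot index on its own, but it can offer an indexer a shortlist in which more than half of the suggestions are usable.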
  83. 83. Doing it yourself? What can or should a CH do to publish data on the Semantic Web and/or benefit from SW techniques? Porting legacy data to the Semantic Web Representing semantics of your data Controlled vocabularies – SKOS Annotation Creating new semantic CH descriptions Creating interoperability at the semantic level Connecting (to standard) description models Using the wisdom of the crowds? Automatically aligning existing description vocabularies
  84. 84. Take-home message: benefits. Performing over the web: knowledge re-use & sharing (Libris); knowledge integration (CiC, KB re-indexing); data enrichment; enhanced collection access (eCulture/Europeana semantic search). It can really help open up CH data!
  85. 85. Take-home message: costs Steps to be taken Porting legacy data Semantic alignment Annotation
  86. 86. Topics Semantic Web in a nutshell Do it yourself Short break – Formulate Questions Examples from practice
  87. 87. Examples from practice Do you have experiences or plans to Semantically annotate? Using standard vocabularies? Model your domain; write or extend an ontology Connect to or integrate with other collections? Inside or outside your institute Make data available on the internet? Tell us (after the break)! Some issues to tell us about: (next slide)
  88. 88. Why do you think semantic web could be helpful to your organization? Where? Front-office: search/browse/recommendation/personalization; make data available to end-users for reuse (mash-ups, virtual (user) collections, …). Back-office: provide data to third parties; inference of new knowledge/consistency checking. What type of data? Porting existing data; creating new data (automatic data extraction / experts / end users). What new (semantic) links arise? New relations to which collection/vocabularies? Within a collection / with other collections / with other institutes. How? Automatic alignment / experts / end users?