Linking data for Europeana


Presentation at Dublin Core 2009

Published in: Education
  • Hello everyone. My name is Antoine Isaac. I work for the Free University in Amsterdam, for the EuropeanaConnect project. Today, I'm going to talk about plans to use formalized data to enhance search processes in Europeana.
  • Europeana is rooted in the European Commission's initiative on the information society. It aims at better connecting citizens to their cultural heritage, as curated by libraries, museums, archives, and so on. It is about giving access to digitized content and to the rich knowledge associated with it, as produced by these institutions. It is a large, pan-European effort: dozens of institutions are now involved. From 2 million objects in the first beta release, Europeana is expected to give access to 10 million objects next year.
  • What does it look like? So if you go to, you will currently find yourself on that page.
  • There is a search box where you can type the query you are interested in. For instance, Da Vinci.
  • For that query, you get a bunch of documents that match it. This is a pretty simple text search, based on the metadata associated with the objects collected by Europeana. What you get is a raw set of documents.
  • But you can refine this result, first by using some facets like country, date, etc.
  • For example, we can select country.
  • We have a list of countries that appear in the metadata for the current results.
  • We can, for example, click on UK, and we get a smaller set of results.
  • These results are actually stuff that is coming from UK institutions, not stuff for example, *about* UK.
  • It is possible to use advanced search to somehow solve this issue. We can ask for instance for works by Da Vinci about UK. And naturally we'll find no results.
  • All this is great for a start. But many people in the project share the feeling that there can be more to it, especially with respect to advanced search and to helping users orient themselves in such a big information space. One of the main options foreseen is the use of more semantics in the search process. The idea is to enhance access to Europeana content using query expansion mechanisms or clustering of results: things that have already been tested in a number of projects. Ideally, a better search would be able to exploit different types of relations between the entities appearing in the information space: distinguishing links such as located in, is born in, or created can be very useful for search. Some inference can also be beneficial here: for instance, if one queries for UK, it could be handy to find items that are related only to London in their description. The point is that we know there are already quite rich descriptions available in the metadata. What is required is to make that information properly machine-accessible, and to design the tools to exploit it.
  • And that's where formal, linked data can come into play:
  • By allowing us to build and exploit a kind of semantic layer on top of the items collected by Europeana. That semantic layer (a concept introduced in the context of the Europeana v1.0 project) would serve as an interface between the users' query needs and the item descriptions.
  • This has actually already been investigated in the context of the Europeana Thought Lab prototype, which I'm going to demonstrate now. The Thought Lab has been developed at the Free University and the CWI in Amsterdam. It is a kind of mini-Europeana, as it works on just three collections. But it relies on formalized data that has been semantically aligned, which allows us to experiment with some new features. The Thought Lab starts just like the normal portal: we have a search box, in which we can enter textual queries.
  • The first difference is the autocompletion that is activated while typing in the search box. The tool returns the elements that are known in the information space and match the query. For example, if I type Egypt, I get a number of artefact names, location names and other concepts.
  • If I select the location Egypte, I get a number of results, but they are clustered: the first two are items that show Egypte.
  • The reason for which they are here can be seen in their metadata:
  • They have a matching subject. But it is important to notice that this is a true matching of URIs, not simple string matching. In fact, Egypte (with an e) is the label of the concept of Egypt that comes from a controlled vocabulary of the Rijksmuseum in Amsterdam. We can see above that when we selected Egypt in the first query step, we selected a resource with a URI. This actually explains the second cluster, about Egypt without an e: here are the works that are described using a concept for Egypt from another vocabulary, which we know to be equivalent to the Rijksmuseum one used in our query.
  • The third cluster seems a bit stranger: a more specific Egypt?
  • In fact these places, as can be seen from their metadata,
  • Have a subject which is a specific place in Egypt
  • And the system knows, from the description of that place in the Rijksmuseum vocabularies
  • That it is a more specific concept than Egypt. Hence its appearing in that cluster.
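The clustering just described (same concept, equivalent concept from another vocabulary, more specific concept) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Thought Lab's actual implementation: every URI, item identifier, and the dictionaries `EXACT_MATCH`, `BROADER` and `ITEM_SUBJECTS` are invented for the example.

```python
from collections import defaultdict

# Hypothetical URIs and data, invented for illustration only.
RMA_EGYPTE = "http://example.org/rma/place/Egypte"
OTHER_EGYPT = "http://example.org/other/place/Egypt"
RMA_SPECIFIC = "http://example.org/rma/place/SpecificPlaceInEgypt"

# Equivalence links between vocabularies, stored in both directions.
EXACT_MATCH = {RMA_EGYPTE: {OTHER_EGYPT}, OTHER_EGYPT: {RMA_EGYPTE}}

# Hierarchy: concept -> its directly broader concepts.
BROADER = {RMA_SPECIFIC: {RMA_EGYPTE}}

# Item -> URIs of the concepts used as its subject.
ITEM_SUBJECTS = {
    "item-1": {RMA_EGYPTE},
    "item-2": {OTHER_EGYPT},
    "item-3": {RMA_SPECIFIC},
    "item-4": {"http://example.org/rma/place/Elsewhere"},
}


def ancestors(concept):
    """Return every concept reachable by following broader links."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def cluster_results(query_uri):
    """Group items by how their subject relates to the query concept."""
    equivalents = EXACT_MATCH.get(query_uri, set())
    targets = {query_uri} | equivalents
    clusters = defaultdict(list)
    for item, subjects in sorted(ITEM_SUBJECTS.items()):
        if query_uri in subjects:
            clusters["same concept"].append(item)
        elif subjects & equivalents:
            clusters["equivalent concept"].append(item)
        elif any(ancestors(s) & targets for s in subjects):
            clusters["more specific concept"].append(item)
    return dict(clusters)


print(cluster_results(RMA_EGYPTE))
```

Note that the comparison is on URIs throughout: the labels "Egypte" and "Egypt" never appear in the matching logic, which is the point made in the notes above.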
  • Other clusters show how different paths in the information space can be followed to lead to even less anticipated results. For example, works created by persons who died in Egypt.
  • Here are items that, as seen from their metadata,
  • were created by someone (here, a French photographer) who died in Egypt. It is interesting to note that here, if I click on the creator (Gustave Le Gray)
  • The information on the death place is not found in the original resource's description
  • Actually, this knowledge comes from the fact that the resource standing for Le Gray in the first vocabulary has been linked (we say "matched") to the same person as represented in another vocabulary
  • Where he is described
  • As having died in a place
  • For which we also have information
  • And which is in fact Cairo, which is more specific than Egypt. Quite a long path, but it brings some serendipity, which can be beneficial for addressing more complex user needs.
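The path just walked through (item, creator, match, death place, more specific than Egypt) amounts to a search over a graph of labelled links. Below is a minimal sketch using a breadth-first search over a handful of triples. The prefixes, identifiers and property names are placeholders made up for the example, and the match link is followed in one direction only for simplicity; the real prototype's data and traversal strategy are not described in these notes.

```python
from collections import deque

# Hypothetical triples (subject, property, object); all names are placeholders.
TRIPLES = [
    ("item:photo", "creator", "vocabA:LeGray"),
    ("vocabA:LeGray", "match", "vocabB:LeGray"),
    ("vocabB:LeGray", "deathPlace", "place:Cairo"),
    ("place:Cairo", "broader", "place:Egypt"),
]


def find_path(start, goal, triples):
    """Breadth-first search; returns the list of triples linking start to goal, or None."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in outgoing.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


path = find_path("item:photo", "place:Egypt", TRIPLES)
for s, p, o in path:
    print(f"{s} --{p}--> {o}")
```

Returning the whole path, rather than just a yes/no answer, is what would let an interface explain why a result appears in a given cluster.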
  • Technically, this prototype relies on [Read]
  • Linking data for Europeana

    1. 1. Linking data for Europeana Dublin Core Conference October 13, 2009 Antoine Isaac [email_address]
    2. 2. What is Europeana?
        • EU's i2010 information society initiative
        • Connecting to European cultural heritage
            • Libraries, museums, archives…
            • Access to digital content and descriptions
        • Large effort
            • Early 2009 Beta version: 2 million objects
            • Now: 4.6 million
            • Target for Version 1.0, 2010: 10 million
    3. 3. The current portal
    4. 4. The current portal
    5. 7. Towards semantic search: facets
    6. 8. Towards semantic search: facets
    7. 9. Leonardo's stuff on UK??
        • No, just stuff held in UK, as given by simple metadata look-up
    8. 10. Leonardo's stuff on UK??
        • No, just stuff held in UK, as given by simple metadata look-up
    9. 11. Advanced search is possible
    10. 12. Towards semantics-enabled search
        • Enhance access to Europeana content by semantics
            • Query expansion
            • Clustering of results
        • Exploiting different types of relations
            • locatedIn, isBornIn, created…
        • Making use of inference
            • Finding work showing London for a query on UK
        • Rich descriptions are already there, in metadata!
        • Requires to make it properly machine-accessible
    11. 13. Goal: semantics in Europeana v1.0
        • Building a semantic layer to help accessing content
    12. 14. Goal: semantics in Europeana v1.0
        • Building a semantic layer to help accessing content
    13. 15. Prototype: Europeana Thought Lab
    14. 16. Semantic autocompletion
    15. 17. Clustering of results
    16. 18. Baseline: matching concepts' label
    17. 19. Baseline: matching concepts' label
    18. 20. A "more specific Egypt"??
    19. 21. A "more specific Egypt"?
    20. 22. A "more specific Egypt"?
    21. 23. A concept more specific than the Egypt one
    22. 24. A concept more specific than the Egypt one
    23. 25. Following other relations
    24. 26. Following other relations - creator
    25. 27. Following other relations - creator
    26. 28. Following other relations - match
    27. 29. Following other relations - match
    28. 30. Following other relations – death place
    29. 31. Following other relations – death place
    30. 32. Following other relations – death place
    31. 33. Following other relations – death place
    32. 34. Enabling Technologies
        • RDF
            • Uniform format for data
            • Amenable to sharing and linking
        • OWL
            • Representation of metadata structures
            • Amenable to inference
        • SKOS
            • Representation of controlled vocabulary
            • Allows exploitation of legacy knowledge organization
                • Simple, but precious!
    33. 35. Where are the challenges?
        • Semantic conversion of data
            • Using appropriate data models
            • Enriching legacy metadata
    34. 36. Where are the challenges?
        • Semantic conversion of data
            • Using appropriate data models
            • Enriching legacy metadata
        • Semantic alignments
            • Between description ontologies
                • vra:depicts rdfs:subPropertyOf dc:subject
            • Between concepts in controlled vocabularies
                • iconclass:bird skos:closeMatch ddc:bird
    35. 37. Alignment of semantic references
    36. 38. Where are the challenges?
        • Semantic alignment (cont'd)
            • Find correspondences between large vocabularies
            • In a multilingual context
    37. 39. Where are the challenges?
        • Semantic alignment (cont'd)
            • Find correspondences between large vocabularies
            • In a multilingual context
        • Scalability
            • Plugging the semantic features into the Europeana production environment
    38. 40. Thanks!
        • Main Europeana portal:
        • Thought Lab:
        • Working on Europeana
            • Europeana v1.0:
            • EuropeanaConnect:
    39. 41. Hands-on task
        • Search for something in the main Europeana portal
        • Search for the same thing in the Thought Lab
        • Why are those results returned?
            • Start exploring the graph used by the engine
            • Try to spot elements from controlled vocabularies
            • And links between different vocabularies
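As a complement to the hands-on task, here is a rough sketch of how the two alignment statements from slide 36 (`vra:depicts rdfs:subPropertyOf dc:subject` and `iconclass:bird skos:closeMatch ddc:bird`) could be exploited when searching. The two item triples and the helper functions are assumptions made for illustration. A real system would more likely use an RDF store with a reasoner than hand-written loops.

```python
# The two alignment statements come from the slides; the item triples are hypothetical.
TRIPLES = {
    ("vra:depicts", "rdfs:subPropertyOf", "dc:subject"),
    ("iconclass:bird", "skos:closeMatch", "ddc:bird"),
    ("item:painting", "vra:depicts", "iconclass:bird"),  # hypothetical item
    ("item:book", "dc:subject", "ddc:bird"),             # hypothetical item
}


def super_properties(prop, triples):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    result = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subPropertyOf" and s == current and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def close_matches(concept, triples):
    """Return concept plus concepts linked to it by skos:closeMatch, in either direction."""
    result = {concept}
    for s, p, o in triples:
        if p == "skos:closeMatch":
            if s == concept:
                result.add(o)
            elif o == concept:
                result.add(s)
    return result


def items_about(concept, triples):
    """Items whose subject (dc:subject or a sub-property) is the concept or a close match."""
    targets = close_matches(concept, triples)
    return sorted(
        s for s, p, o in triples
        if o in targets and "dc:subject" in super_properties(p, triples)
    )


print(items_about("iconclass:bird", TRIPLES))
```

With both alignments in place, a query on either bird concept retrieves both items, even though they were described with different properties and different vocabularies. The close-match link is followed for one step only, since skos:closeMatch is symmetric but not transitive.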