WP3 Further specification of Functionality and Interoperability - Gradmann

Published in: Education, Technology

  1. 1. WP3 Further specification of Functionality and Interoperability Work Group 3.2 Semantic and Multilingual Aspects
  2. 2. Issues for Work Group WG3.2: Some Principles <ul><li>Europeana surrogates need rich semantic context in (at least) </li></ul><ul><ul><li>Place, Time, Persons, Abstract Concepts </li></ul></ul><ul><li>The graphs linking surrogates and semantic nodes need to be typed </li></ul><ul><li>We will use linked data wherever possible instead of creating our own semantic nodes </li></ul><ul><li>Source data and their context will be in all European languages (and potentially more!) </li></ul><ul><li>Europeana users will wish to use all European languages (and potentially more!) </li></ul>
  3. 3. WG3.2: Semantic Contextualisation and Multilingual Issues
  4. 4. Issues for Work Group WG3.2: Semantic Contextualisation (1) <ul><li>What kind of functionality based on semantic technology do we actually want to enable (have a look at the ThoughtLab and develop ideas from there)? Do we want to enable logical inferencing, for instance? </li></ul><ul><li>What source data do we actually have (subject headings, classifications, thesauri) and how well are objects contextualised in source data? </li></ul><ul><li>What kinds of semantic elements will we be able to produce from these via SKOSification or other automated procedures? </li></ul><ul><li>Which linked data resources will we be using? </li></ul>
  5. 5. Issues for Work Group WG3.2: Semantic Contextualisation (2) <ul><li>To what extent will we be able to automatically contextualise surrogates by linking them to semantic nodes? </li></ul><ul><li>What types of links between surrogates and nodes do we distinguish? </li></ul><ul><li>What may providers expect to get back from us? </li></ul><ul><li>What technology do we need for all this? </li></ul><ul><ul><li>RDF? SKOS?? OWL??? </li></ul></ul><ul><li>What input does Europeana.Connect (EuCo) WP1 expect from us and when? </li></ul><ul><li>What do we expect back from EuCo WP1 and when? </li></ul><ul><li>Any related projects? Results we can reuse?? </li></ul>
  6. 6. Issues for Work Group WG3.2: Multilingual Issues <ul><li>What is a realistic scope for multilingual functionality: </li></ul><ul><ul><li>Query translation? </li></ul></ul><ul><ul><li>Result set translation?? </li></ul></ul><ul><ul><li>More??? </li></ul></ul><ul><li>Which languages will Europeana 1.0 support? </li></ul><ul><li>What input does EuCo WP2 expect from us and when? </li></ul><ul><li>What do we expect back from EuCo WP2 and when do we expect this? </li></ul><ul><li>Any related projects? Results we can reuse?? </li></ul>
  7. 7. WG3.2: Semantic and multilingual aspects <ul><ul><li>Marco Berni </li></ul></ul><ul><ul><li>Tobias Blanke </li></ul></ul><ul><ul><li>Giuliana de Francesco </li></ul></ul><ul><ul><li>Milena Dobreva </li></ul></ul><ul><ul><li>Martin Doerr </li></ul></ul><ul><ul><li>Zeki Mustafa Dogan </li></ul></ul><ul><ul><li>Nicola Ferro </li></ul></ul><ul><ul><li>Stefan Gradmann </li></ul></ul><ul><ul><li>Antoine Isaac </li></ul></ul><ul><ul><li>Walter Koch </li></ul></ul><ul><ul><li>Stefanos Kollias </li></ul></ul><ul><ul><li>Allison Kupietzky </li></ul></ul><ul><ul><li>Dan Matei </li></ul></ul><ul><ul><li>Hans Nederbragt </li></ul></ul><ul><ul><li>Vivien Petras </li></ul></ul><ul><ul><li>Anne Schiller </li></ul></ul><ul><ul><li>Douglas Tudhope </li></ul></ul><ul><ul><li>Vassilis Tzouvaras </li></ul></ul><ul><ul><li>Dov Wiener </li></ul></ul><ul><li>Issues: </li></ul><ul><ul><li>intended functionality </li></ul></ul><ul><ul><li>quality and semantic contextualization of object data </li></ul></ul><ul><ul><li>subject headings, thesauri, classification data available </li></ul></ul><ul><ul><li>which technologies to use </li></ul></ul><ul><ul><li>realistic scope for multilingual operations </li></ul></ul><ul><ul><li>related projects in area of multilinguality </li></ul></ul><ul><li>Office: </li></ul><ul><ul><li>Sjoerd Siebinga </li></ul></ul><ul><ul><li>Go Sugimoto </li></ul></ul>
  8. 8. Today (02 April) <ul><li>Contextualisation of existing source data </li></ul><ul><li>Contextual data available </li></ul><ul><li>Functional Scope </li></ul><ul><li>Linked data at our disposal </li></ul>
  9. 9. WG3.2 02 April - 1 <ul><li>Contextual data available </li></ul><ul><ul><li>List of 84 different vocabularies </li></ul></ul><ul><ul><li>Some prominent ones such as LCSH, some of them in VIAF </li></ul></ul><ul><ul><li>Semantic areas: subjects, names, persons, material </li></ul></ul><ul><ul><li>Various delivery formats </li></ul></ul>
  10. 10. WG3.2 02 April - 2 <ul><li>Contextualisation of existing source data </li></ul><ul><ul><li>Geographic names used 50% -> 90% </li></ul></ul><ul><ul><li>Coordinates 6% -> 8% </li></ul></ul><ul><ul><li>Time </li></ul></ul><ul><ul><li>Subjects </li></ul></ul><ul><ul><li>Persons </li></ul></ul><ul><ul><li>Organisations </li></ul></ul>
  11. 11. WG3.2 02 April - 3 <ul><li>Questions / suggestions: </li></ul><ul><ul><li>Which resources are cross-domain? </li></ul></ul><ul><ul><li>Which ESE element is to be used? </li></ul></ul><ul><ul><li>Who will clean the metadata? </li></ul></ul><ul><ul><li>Why not store the metadata received as objects in their own right? </li></ul></ul><ul><ul><li>Minerva list of thesauri to be considered </li></ul></ul><ul><ul><li>Distinguish subject terms and classification of objects </li></ul></ul><ul><ul><li>Restrict structured operations to high-level thesauri and do the rest based on lexical associations and the like </li></ul></ul><ul><ul><li>Ask providers to make their internal authorities available rather than trying to map them </li></ul></ul>
  12. 12. WG3.2 02 April - 4 <ul><li>Functional Scope (1) </li></ul><ul><ul><li>Surrogate model as presented in D2.5 doesn’t distinguish different types of relationships such as ‘about’ and ‘was present at’. </li></ul></ul><ul><ul><ul><li>The point is valid for data organisation and for searching </li></ul></ul></ul><ul><ul><ul><li>Is a better model realistic for 1.0? </li></ul></ul></ul><ul><ul><ul><li>Can relation types be derived from the original attributes’ semantics? </li></ul></ul></ul><ul><ul><li>Contextualisation pertaining to surrogate vs. context data pertaining to originating context </li></ul></ul><ul><ul><li>Granularity: complex objects </li></ul></ul><ul><ul><li>We need examples! -> Don Undeen: The Semantic Web in Practice </li></ul></ul><ul><ul><li>Separation of digital object and conceptual object (FRBRize the model) </li></ul></ul><ul><ul><li>Annotation: part of surrogate? When are these objects in their own right? </li></ul></ul>
  13. 13. WG3.2 02 April - 5 <ul><li>Functional Scope (2) </li></ul><ul><ul><li>Provenance! Diachronic dimensions should be better represented </li></ul></ul><ul><ul><li>Geographic data: DigMap (input from Milena) </li></ul></ul><ul><ul><li>Target audience is critical! User modelling!! </li></ul></ul><ul><ul><li>Reasoning: indirectly connected things </li></ul></ul><ul><ul><li>Related terms + related (functional) context </li></ul></ul><ul><ul><li>Flexibility of modelling is a requirement </li></ul></ul><ul><ul><li>-> inferencing: some kind of reasoning is needed, even if only for machine processing </li></ul></ul><ul><ul><li>Cost of processing time may be a critical design issue! </li></ul></ul><ul><ul><li>How to generalise properties to a small set of super-properties? </li></ul></ul>
  14. 14. WG3.2 03 April - 5 <ul><li>Functional Scope (3) </li></ul><ul><ul><li>Access by super-properties based on appropriate generalisations, follow data paths </li></ul></ul><ul><ul><li>Rosetta stone metaphor: Rosetta navigation </li></ul></ul><ul><ul><li>Domain specific ontologies mapped (or pruned) to more generic Europeana ontologies as part of OurEuropeana </li></ul></ul><ul><ul><li>Higher level terms (Europeana) + more granular terminology (user) </li></ul></ul><ul><ul><li>Generalisation, query expansion </li></ul></ul><ul><ul><li>Characterisation of collections (do we want these?) – or rather fonds (in archival speak), contextual groupings </li></ul></ul><ul><ul><ul><li>Distinguish curatorial environments (with metadata pertaining to these) and virtual ‘collections’ </li></ul></ul></ul><ul><ul><li>Tree structure in archives: can we represent this in the surrogate structure, or do we model it in semantic contextualisation? </li></ul></ul>
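One way to picture "access by super-properties" is a lookup that folds specific relation types into a small set of broader ones before indexing. The sketch below is only an illustration under stated assumptions: the relation names, super-property names, and node identifiers are all hypothetical and do not come from the Europeana surrogate model or from D2.5.

```python
from collections import defaultdict

# Hypothetical relation types mapped to hypothetical super-properties.
SUPER_PROPERTY = {
    "is_about": "has_subject",
    "depicts": "has_subject",
    "was_present_at": "has_event",
    "was_created_during": "has_event",
    "was_found_in": "has_place",
}

# Typed links as (surrogate id, relation type, semantic node id); invented data.
LINKS = [
    ("obj1", "depicts", "node:napoleon"),
    ("obj2", "is_about", "node:napoleon"),
    ("obj3", "was_present_at", "node:waterloo"),
    ("obj4", "was_found_in", "node:rosetta"),
]


def index_by_super_property(links):
    """Group surrogates under (super-property, node) keys."""
    index = defaultdict(set)
    for surrogate, relation, node in links:
        # Keep the specific relation when no generalisation is known.
        super_prop = SUPER_PROPERTY.get(relation, relation)
        index[(super_prop, node)].add(surrogate)
    return index


def find(index, super_prop, node):
    """Return the sorted surrogate ids linked to a node via a super-property."""
    return sorted(index.get((super_prop, node), set()))


INDEX = index_by_super_property(LINKS)
print(find(INDEX, "has_subject", "node:napoleon"))
```

The specific relation type is still present in the original links, so a finer-grained query remains possible; the super-property index only adds a coarser entry point.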
  15. 15. WG3.2 03 April - 5 <ul><li>Functional Scope (4) </li></ul><ul><ul><li>(Collections contd): provider vs. user generated groupings </li></ul></ul><ul><ul><li>All ‘collections’ can be reduced to conceptual context (including ‘events’) </li></ul></ul><ul><ul><li>Questions – answers? Or just surrogate retrieval?? And if we provide answers: multilingually?? </li></ul></ul><ul><li>Multilingual issues </li></ul><ul><ul><li>Linguistic info pertaining to each attribute is a basic requirement – possible? </li></ul></ul><ul><ul><li>Query expansion + translation as scope + query formulation aids </li></ul></ul><ul><ul><li>Surrogate model doesn’t account for language, also regarding diachronic aspects </li></ul></ul>
  16. 16. WG3.2 03 April - 5 <ul><li>[Multilingual issues] </li></ul><ul><ul><li>Architecture: language manager indicates query translation focus, but multilingual approach should be much more transversal </li></ul></ul><ul><ul><li>Check against lexica at ingest stage and normalise / enrich </li></ul></ul><ul><ul><li>Use of an interlingua of controlled terms – but consider out-of-vocabulary terms! </li></ul></ul><ul><ul><li>Use CACAO results: make recommendations rather than try to impose … </li></ul></ul><ul><ul><li>Resources in different languages (FRBRizing) </li></ul></ul><ul><ul><li>Use payloading in search context </li></ul></ul><ul><ul><li>Who will provide named entity resources, and which standards will we use in this respect? </li></ul></ul>
  17. 17. WG3.2 03 April - 5 <ul><li>[Multilingual issues] </li></ul><ul><ul><li>Distinguish properties that are important for multilingual operations from those that are not </li></ul></ul><ul><ul><li>WordNet use in ThoughtLab with English as pivotal language providing quick wins </li></ul></ul><ul><ul><li>Freely available resources are rare! UNESCO thesaurus available in some languages: CACAO list, TrebleCLEF, Placenames (European resource) </li></ul></ul><ul><ul><li>IMPACT uses lexica, some of which may be freely available -> Max Kaiser! </li></ul></ul><ul><ul><li>Political issues: who are the semantic/linguistic resource providers? Political authorities?? </li></ul></ul><ul><ul><li>Last FP7 call (DL) … </li></ul></ul>
  18. 18. WG3.2 03 April - 5 <ul><li>[Multilingual issues] </li></ul><ul><ul><li>CEN INNN </li></ul></ul><ul><ul><li>Talk to CLARIN for multilingual services </li></ul></ul><ul><ul><li>Contact FlareNET project </li></ul></ul><ul><ul><li>Eurovoc mapping involving Gemnet and others (Doug) </li></ul></ul><ul><ul><li>Aligning all these resources may be a non-trivial issue </li></ul></ul><ul><ul><ul><li>Organise a seminar joining all projects </li></ul></ul></ul><ul><ul><li>Whitepaper on multilingual issues as a starting point (Milena, Martin, CACAO, </li></ul></ul><ul><ul><li>CERL has produced a thesaurus </li></ul></ul><ul><ul><li>Subject terms and concepts are harder than place names and the like </li></ul></ul><ul><ul><li>Problem of differing standards </li></ul></ul>
  19. 19. WG3.2 03 April - 5 <ul><li>[Multilingual issues] </li></ul><ul><ul><li>Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki </li></ul></ul><ul><ul><li>-> Seminar adjacent to the September meeting </li></ul></ul><ul><ul><li>Technology watch </li></ul></ul>
  20. 20. WG3.2 03 April - 6 <ul><li>Linked data at our disposal (quite restricted) </li></ul><ul><ul><li>Link at ingestion and updating time rather than dynamically in query context (-> use a Europeana cache for pointers -> surrogate model?) </li></ul></ul><ul><ul><li>DBPedia (pivotal resource for multilingual operations!) </li></ul></ul><ul><ul><li>Language repository </li></ul></ul><ul><ul><li>Geonames </li></ul></ul><ul><ul><li>LCSH </li></ul></ul><ul><ul><li>Rameau (use MACS and CrissCross to provide mappings) </li></ul></ul><ul><ul><li>VIAF </li></ul></ul><ul><ul><li>ETB </li></ul></ul><ul><ul><li>But: Metadata provided will contain links to other resources, and typically not URIs </li></ul></ul>
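The suggestion to link at ingestion time and keep pointers in a Europeana cache could look roughly like the following. This is a minimal sketch, not a description of any actual Europeana component: the `LinkCache` class, the `ingest` function, the record layout, and the `example.org` URIs are all assumptions made for illustration, and the dictionary merely stands in for a real authority such as Geonames or VIAF.

```python
# Hypothetical authority table; the URIs are placeholders, not real identifiers.
AUTHORITY = {
    "berlin": "http://example.org/place/berlin",
    "den haag": "http://example.org/place/the-hague",
    "the hague": "http://example.org/place/the-hague",
}


class LinkCache:
    """Stores pointers resolved at ingest time so queries need no lookups."""

    def __init__(self, authority):
        self._authority = authority
        self._cache = {}   # normalised label -> URI, or None if unresolved
        self.lookups = 0   # number of times the authority was consulted

    def resolve(self, label):
        # Normalise case and whitespace, since providers send plain strings.
        key = " ".join(label.lower().split())
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._authority.get(key)
        return self._cache[key]


def ingest(records, cache):
    """Attach a resolved URI (or None) to each record's place label."""
    return [
        {**record, "place_uri": cache.resolve(record["place"])}
        for record in records
    ]


cache = LinkCache(AUTHORITY)
enriched = ingest(
    [
        {"id": "obj1", "place": "Den Haag"},
        {"id": "obj2", "place": "  den  haag "},
        {"id": "obj3", "place": "Atlantis"},
    ],
    cache,
)
for record in enriched:
    print(record["id"], record["place_uri"])
```

Unresolved labels are cached as `None` as well, so a label that has no match is not looked up again on every update; whether that is desirable depends on how often the underlying resource changes.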
  21. 21. WG3.2 03 April - 7 <ul><li>Typing relations ...! Including language tags again </li></ul>
  22. 22. WG3.2 03 April – Conclusion (1) <ul><li>Semantics: Rosetta Stone metaphor with two types of functionality </li></ul><ul><ul><li>Context of surrogates </li></ul></ul><ul><ul><li>Contextual groupings </li></ul></ul><ul><ul><li>Open: typing relations </li></ul></ul>
  23. 23. WG3.2 03 April – Conclusion (2) <ul><li>Multilingual Issues </li></ul><ul><ul><li>Linguistic info pertaining to each attribute is a basic requirement – possible? </li></ul></ul><ul><ul><li>Surrogate model doesn’t account for language, also regarding diachronic aspects </li></ul></ul><ul><ul><li>Scope: Query expansion + translation + query formulation aids </li></ul></ul><ul><ul><li>Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki </li></ul></ul><ul><ul><li>-> Seminar bringing together all initiatives and projects adjacent to the September meeting </li></ul></ul>
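The scope named above (query expansion plus translation, via an interlingua of controlled terms, with out-of-vocabulary terms kept in mind) can be sketched as follows. Everything here is hypothetical: the concept identifiers, the label sets, and the `expand_query` function are invented for illustration and do not reflect any vocabulary or service discussed by the work group.

```python
# Hypothetical controlled concepts with labels per language (illustrative only).
CONCEPTS = {
    "concept:ship": {
        "en": ["ship", "vessel"],
        "de": ["Schiff"],
        "fr": ["navire", "bateau"],
    },
    "concept:map": {
        "en": ["map"],
        "de": ["Karte", "Landkarte"],
        "fr": ["carte"],
    },
}


def build_label_index(concepts):
    """Map each lower-cased label, in any language, to its concept ids."""
    index = {}
    for concept_id, labels in concepts.items():
        for terms in labels.values():
            for term in terms:
                index.setdefault(term.lower(), set()).add(concept_id)
    return index


LABEL_INDEX = build_label_index(CONCEPTS)


def expand_query(query, target_languages):
    """Return one set of alternative terms per query word."""
    expanded = []
    for word in query.split():
        concept_ids = LABEL_INDEX.get(word.lower())
        if not concept_ids:
            # Out-of-vocabulary term: pass it through unchanged.
            expanded.append({word})
            continue
        variants = {word}
        for concept_id in concept_ids:
            for lang in target_languages:
                variants.update(CONCEPTS[concept_id].get(lang, []))
        expanded.append(variants)
    return expanded


print(expand_query("Schiff 1805", ["en", "fr"]))
```

The concept identifier acts as the pivot, so adding a language means adding labels rather than pairwise translation tables. Ambiguous labels would map to several concepts and inflate the expansion, which is one reason query formulation aids are listed alongside translation.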