090925 Data Transformation



  1. Guidelines for Interoperability in Tourism
     Data Transformation
     Wolfram Höpken [email_address]
     NoFrills Travel & Technology Expo, Bergamo, 25.09.2009
  2. Data transformation – structured data mapping
     • Structured data mapping
       • Schema mapping: establishing mappings between local data sources (or database schemas)
       • Data-source-to-ontology mapping: establishing mappings between a data source and an ontology
     • Mapping languages
       • Should be fully declarative in order to
         • define and describe mappings efficiently
         • discover inconsistencies and ambiguities in mappings
       • Examples: XSLT, D2R MAP, R2O
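To illustrate why a declarative mapping is easy to describe and to check, here is a minimal Python sketch. It is not XSLT, D2R MAP or R2O; the field names, the `HOTEL_MAPPING` table and both helper functions are hypothetical and only show that a mapping expressed as data can be inspected for ambiguities before it is applied.

```python
from collections import Counter

# Hypothetical declarative mapping: source field -> target field.
HOTEL_MAPPING = {
    "hotel_name": "name",
    "town": "city",
    "zip": "postal_code",
}

def find_ambiguities(mapping):
    """Return target fields that more than one source field maps to."""
    counts = Counter(mapping.values())
    return sorted(target for target, n in counts.items() if n > 1)

def apply_mapping(record, mapping):
    """Rename the fields of `record` according to `mapping`.

    Source fields without a mapping entry are dropped.
    """
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }

source_record = {"hotel_name": "Albergo Sole", "town": "Bergamo", "zip": "24121"}
target_record = apply_mapping(source_record, HOTEL_MAPPING)
```

Because the mapping is plain data rather than procedural code, the same table drives both the transformation and the consistency check.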
  3. Data transformation – structured data mapping
     • Types of clashes between data sources
       • Different naming: equivalent concepts have different names in different data sources (fully mappable)
       • Different position: equivalent concepts have different positions within the structure of the data source (fully mappable)
       • Different scope of concepts: concepts containing the same piece of information in different data sources have different scopes, i.e., the same piece of information might be represented as a single concept or as part of several concepts (fully mappable)
       • Different abstraction levels: the same information is represented on different levels of abstraction (partially mappable)
       • Different granularity: the same information is represented on different levels of granularity (partially mappable)
       • Missing concept: a concept in one data source has no counterpart in the other data source (not mappable)
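The fully mappable clashes (naming, position, scope) and the unmappable one (missing concept) can be sketched in a few lines of Python. The record layout, the `RULES` table and the helper names below are assumptions made for illustration; abstraction and granularity clashes are left out because they can only be mapped partially.

```python
def get_path(record, path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value

# Hypothetical source record.
source = {
    "hotelName": "Albergo Sole",
    "address": {"city": "Bergamo", "street": "Via Roma", "number": "5"},
}

# Hypothetical rules: target field -> function deriving it from the source.
RULES = {
    # Different naming: 'hotelName' in the source, 'name' in the target.
    "name": lambda r: get_path(r, "hotelName"),
    # Different position: nested in the source, top-level in the target.
    "city": lambda r: get_path(r, "address.city"),
    # Different scope: two source concepts form one target concept.
    "street_line": lambda r: f'{get_path(r, "address.street")} {get_path(r, "address.number")}',
    # Missing concept: the source has no 'pool' field at all.
    "pool": lambda r: get_path(r, "pool"),
}

def transform(record, rules):
    """Apply every rule; collect target fields that cannot be derived."""
    target, unmapped = {}, []
    for field, rule in rules.items():
        try:
            target[field] = rule(record)
        except KeyError:
            unmapped.append(field)
    return target, unmapped

target, unmapped = transform(source, RULES)
```

Here `unmapped` ends up containing only `pool`, the one target field with no counterpart in the source.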
  4. Data transformation – structured data mapping
     • Short-term recommendations (1–3 years)
       • Use (graphical) mediation tools that automatically map two different data structures
       • Introduce reasoning capabilities within resource mediation tools to automatically flag inconsistencies
     • Long-term recommendations (3–10 years)
       • Use semantic web technologies (e.g. based on RDF) to name and represent (data) resources on the Web so that mapping can be undertaken automatically
       • Foster high-level general ontologies to describe particular domains of interest so that lower-level, more concrete ontologies can later be linked or merged within the (general) structure
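The long-term recommendation rests on the idea that shared resource names make mappings derivable. A minimal sketch of that idea in plain Python follows, with no RDF library involved; the `http://example.org/tourism#` URIs and the field names of both sources are hypothetical.

```python
# Hypothetical: each source declares which shared-vocabulary URI
# a local field denotes.
SOURCE_A = {
    "hotel_name": "http://example.org/tourism#name",
    "town": "http://example.org/tourism#city",
}
SOURCE_B = {
    "title": "http://example.org/tourism#name",
    "locality": "http://example.org/tourism#city",
    "stars": "http://example.org/tourism#rating",
}

def derive_mapping(a, b):
    """Map each field of `a` to the field of `b` that denotes the same URI."""
    by_uri = {uri: field for field, uri in b.items()}
    return {field: by_uri[uri] for field, uri in a.items() if uri in by_uri}

mapping = derive_mapping(SOURCE_A, SOURCE_B)
```

Neither source needs to know the other's field names; agreeing on the shared URIs is enough to compute the field-to-field mapping.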
  5. Data transformation – semantic annotation
     • Semantic annotation
       • Adding meaning to unstructured, semi-structured or structured content (HTML documents, Word documents, video or audio content, etc.)
       • Based on ontologies as the reference semantics
     • Tagging
       • User-generated semantic annotation
       • Often based on taxonomies
     • Folksonomies
       • Community-generated taxonomies
       • Especially used for annotation of user-generated content
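How individual tagging turns into a community vocabulary can be sketched as follows. The users, tags and the `min_users` threshold are hypothetical, and the result is a flat shared tag set rather than a full taxonomy with hierarchy.

```python
from collections import Counter

# Hypothetical tags assigned by individual users to one piece of content.
user_tags = {
    "anna": ["Beach", "family", "hotel"],
    "marco": ["beach", "Hotel"],
    "li": ["beach", "spa"],
}

def build_folksonomy(tags_by_user, min_users=2):
    """Keep the tags that at least `min_users` distinct users agree on."""
    counts = Counter()
    for tags in tags_by_user.values():
        # Normalise case and count each user at most once per tag.
        counts.update({tag.strip().lower() for tag in tags})
    return {tag: n for tag, n in counts.items() if n >= min_users}

folksonomy = build_folksonomy(user_tags)
```

Only `beach` and `hotel` survive the threshold here, since `family` and `spa` were each used by a single user.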
  6. Data transformation – semantic annotation
     • Short-term recommendations (1–3 years)
       • Build graphical tools for manual annotation that make semantic annotation transparent and automatically generate the corresponding source code
     • Long-term recommendations (3–10 years)
       • Support annotation techniques based on natural language processing
  7. Data transformation – automatic information extraction
     • Information extraction
       • Structuring unstructured data so that it can be automatically analysed, queried and integrated with structured data sources
       • Automatic identification of selected types of entities, relations, or events in free text
     • Named entity recognition
       • Making references to organisations, institutions, facilities, places, etc. explicit
       • Machine learning techniques such as maximum entropy or hidden Markov models
       • Current approaches reach up to 90% precision
     • Event extraction
       • Normally template-based extraction of information, built on top of named entity recognition approaches
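The layering of template-based event extraction on top of named entity recognition can be sketched in Python. Note the substitution: instead of a maximum entropy or hidden Markov model, this sketch uses a hypothetical gazetteer lookup plus a date regex as the recogniser, so it shows only the pipeline shape, not a realistic recogniser.

```python
import re

# Hypothetical gazetteer standing in for a trained named entity recogniser.
GAZETTEER = {
    "Bergamo": "PLACE",
    "NoFrills Travel & Technology Expo": "EVENT",
}
# Dates written as DD.MM.YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

def recognise_entities(text):
    """Return (surface form, entity type) pairs found in `text`."""
    entities = [(name, kind) for name, kind in GAZETTEER.items() if name in text]
    entities += [(match, "DATE") for match in DATE_PATTERN.findall(text)]
    return entities

def fill_event_template(text):
    """Fill a fixed event template with the first entity of each type."""
    template = {"EVENT": None, "PLACE": None, "DATE": None}
    for surface, kind in recognise_entities(text):
        if template[kind] is None:
            template[kind] = surface
    return template

sentence = "The NoFrills Travel & Technology Expo takes place in Bergamo on 25.09.2009."
event = fill_event_template(sentence)
```

Slots for which no entity is found stay `None`, which makes incomplete extractions easy to spot downstream.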
  8. Data transformation – automatic information extraction
     • Short-term recommendations (1–3 years)
       • Foster the use of semantic web technologies to describe unstructured data on the Web by means of resources, making the data machine-processable
     • Long-term recommendations (3–10 years)
       • Agree on the labels that particular tourism content ought to have (preferably with the involvement of a recognized body such as the W3C), so that the content becomes visible to search engines
       • Develop software that enables (semi-)automatic annotation of information according to the previous recommendation