Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization


Published on

The emergence in the last years of initiatives like the Linked Open Data (LOD) has led to a significant increase of the amount of structured semantic data on the Web. Nevertheless, the wider reuse of such public semantic data is inhibited by the difficulty for users to decide whether a given dataset is actually suitable for their needs. This is because semantic datasets typically cover diverse domains, do not follow a unified way of organizing the knowledge and may differ in a number of dimensions. With that in mind, in this paper, we report our work in progress on a goal-driven dataset summarization approach that may facilitate better understanding and reuse-oriented evaluation of available semantic data.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

  1. 1. Towards Purposeful Reuse of Semantic Datasets via Goal-Driven Data Summarization Panos Alexopoulos, Jose Manuel Gomez Perez 6th International Conference on Advances in Semantic Processing Porto, Portugal, October 3rd, 2013
  2. 2. Introduction The Linked Data Use Challenge Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. 2
  3. 3. Introduction Motivating Scenario ●Assume that some entity (individual or organization) wants to reuse public semantic datasets from the Web to: ● Enrich with them its own data. ● Use the data to provide added-value services to its users/clients. ●These organizations can be: ● Technology providers (e.g. iSOCO) ● Information providers (e.g. publishers, media, etc.) ● Knowledge-driven and knowledge-intensive organisations 3
  4. 4. Data Enrichment Why Data Reuse? ●The problem with semantic data is the high amount of time and effort required to construct and maintain it. ●The reuse of existing public semantic data can (partially) alleviate this problem: ● Their volume and diversity are increasing at high rates. ● Their maintenance and evolution is the responsibility of their publishers, reducing the required efforts and costs for this task in the organization's side 4
  5. 5. Data Enrichment Example ●A news organization wants to create and maintain a knowledge base about European Football. ●The pace at which this knowledge changes is quite fast meaning that the organization needs to constantly monitor these changes and update the data. ●Much of this information is already available as public semantic data (e.g. DBPedia). ●Thus it could be better for the organization to reuse this public data instead of creating them from scratch. 5
  6. 6. Data Enrichment Barriers to Data Reuse ● Difficulty for knowledge engineers to decide whether a given dataset is actually suitable for their needs. ● Semantic datasets typically cover diverse domains ● They do not follow a unified way of organizing the knowledge ● Differ in a number of features including size, coverage, granularity and descriptiveness ● This makes difficult the following tasks: ● Assessing whether a dataset satisfies particular requirements ● Comparing different datasets to select which one is more suitable for a given purpose. 6
  7. 7. Data Reuse Our Approach ●We suggest the provision of the ability to data consumers to derive semantic data summaries. ●Existing summarization approaches treat the summarization task in an application and user independent way. ●By contrast, we are interested in facilitating the generation of requirements-oriented and goal-driven summaries that may be significantly more helpful to users. 7
  8. 8. Goal-Driven Semantic Data Summarization Problem Description ●Key question: “Given an application scenario where semantic data is required, how suitable is a given existing dataset for the purposes of this scenario?” ●To answer this, users normally need to be able to: 1. Explicitly express the requirements that a dataset needs to satisfy for a given task or goal. 2. Automatically measure/assess the extent to which a dataset satisfies each of these requirements and compile a summary report. 8
  9. 9. Goal-Driven Semantic Data Summarization Approach ●To implement these two capabilities we follow a checklist-based approach. ●Checklists are practically lists of action items arranged in a systematic manner that allow users to record the completion of each of them. ●They are widely applied across multiple industries, like healthcare or aviation, to ensure reliable and consistent execution of complex operations. ●In our case we apply checklists to define and execute custom dataset summarization tasks in the form of lists of goal-specific requirements and associated summarization processes. 9
  10. 10. Goal-Driven Semantic Data Summarization Summarization Task Representation ●To represent custom summarization tasks according to the checklist paradigm we have adopted the Minim model. ●This defines the following information: ● The Goals the dataset summarization task is designed to serve ● The Requirements against which the summarization task evaluates the dataset. ● The Data Analysis Operations that the summarization task employs in order to assess the satisfaction of its requirements 10
  11. 11. Goal-Driven Semantic Data Summarization Example Goals ●Decide if a dataset is appropriate for a Semantic Annotation scenario. ●Decide if a dataset is appropriate for a Question Answering scenario ●Determine which of two or more similar datasets best represent a given corpus. ●Detect arising inconsistencies or other quality problems. ●… 11
  12. 12. Goal-Driven Semantic Data Summarization Example Requirements ●Evaluate the dataset’s coverage of a particular domain/topic: Aims to measure the extent to which a dataset describes a given domain or topic. ●Evaluate the dataset’s labeling adequacy and richness: Aims to measure the extent to which the dataset’s elements (concepts, instances, relations etc.) are accompanied by representative and comprehensible labels, in one or more languages. ●Evaluate Connectivity: This requirement checks the existence of paths between concepts or entities, i.e. whether it is possible to go from a given concept to another on the graph and in what ways. 12
  13. 13. Goal-Driven Semantic Data Summarization Example Data Operations ●Check the existence of a particular element (concept, relation, attribute, instance, axiom) in the dataset. ●Check the dataset’s consistency (e.g. by running a reasoner). ●Measure the number of ambiguous entities in the dataset. ●Measure the number of labeled entities. 13
  14. 14. Goal-Driven Semantic Data Summarization Application Example ●We applied our framework to assess the suitability of public datasets for the purposes of reusing to semantically annotate texts describing football matches from the Spanish League. ●For that, we wanted the dataset to be reused to ● Contain information about all the current teams of the Spanish football league. ● All its entities to have at least one associated label and ● To relate teams with the players that current play in them. 14
  15. 15. Goal-Driven Semantic Data Summarization Defined Summarization Task 15
  16. 16. Goal-Driven Semantic Data Summarization Resulting Summary ● We executed this task against DBPedia and Freebase, automatically producing the following summary report ● The system provides a yes/no answer as to whether each dataset satisfies each requirement but also additional information on why this may or may not be the case. ● This is important because: ● A requirement might not be satisfied because of a high threshold ● A requirement might seem to be satisfied, yet that might not be actually true. 16
  17. 17. Ongoing Work Summary Generation Tool ● We are currently developing a summarization tool that enables the definition manipulation and execution of summarization tasks as well as the dashboard-like visualization of their output 17
  18. 18. Thank you! Questions? Dr. Panos Alexopoulos Semantic Applications Research Manager Quieres innovar? (t) +34 913 349 797 iSOCO Barcelona iSOCO Madrid iSOCO Pamplona iSOCO Valencia iSOCO Colombia Av. Torre Blanca, 57 Edificio ESADE CREAPOLIS Oficina 3C 15 08172 Sant Cugat del Vallès Barcelona, España (t) +34 935 677 200 Av. del Partenón, 16-18, 1º7ª Campo de las Naciones 28042 Madrid España (t) +34 913 349 797 Parque Tomás Caballero, 2, 6º4ª 31006 Pamplona España (t) +34 948 102 408 C/ Prof. Beltrán Báguena, 4 Oficina 107 46009 Valencia España (t) +34 963 467 143 Complejo Ruta N Calle 67, 52-20 Piso 3, Torre A Medellín Colombia (t) +57 516 7770 ext. 1132 Key Vendor Virtual Assistant 2013 18