A Description Method for Scientific Data Based on KOS
Presentation held by Wei Sun and Xuefu Zhang at the Agricultural Ontology Service (AOS) Workshop 2012 in Kuching, Sarawak, Malaysia, from September 3-4, 2012

Presentation Transcript

  • A DESCRIPTION METHOD FOR SCIENTIFIC DATA BASED ON KOS. Wei Sun, Xuefu Zhang (AII, CAAS)
  • OUTLINE: Background; Research Objectives; Description Schema for Scientific Data; Process for Establishing the Scientific Data Description Schema; Key Technology for Constructing the Scientific Data Description Scheme; Empirical Analysis; Conclusion
  • BACKGROUND: Current research does not describe scientific data in sufficient depth, which limits the efficiency of integrated knowledge discovery based on scientific data alone or on scientific data combined with other resources. Resource description is an important part of knowledge organization. In general, a knowledge organization system (KOS) can improve the semantic organization and description granularity of scientific data.
  • RESEARCH OBJECTIVES: Based on these considerations, the paper proposes an integrated description method for scientific data based on the knowledge organization system (KOS), drawing on faceted classification and ontology. Taking agricultural scientific data as an example, the paper verifies the feasibility of the method, laying a resource base for further improving the efficiency of integrated knowledge discovery based on scientific data.
  • DESCRIPTION SCHEMA FOR SCIENTIFICDATA Faceted Classification Description for Scientific Data Concept Mapping for Scientific Data Entity
  • DESCRIPTION SCHEMA FOR SCIENTIFIC DATA: Fig. 1. Description schema for scientific data: (a) faceted classification description structure; (b) domain ontology structure. [Figure labels: dummy root node F; facets mF1, sF1, sF2; terms T1 to T5; term space of sF2; concept mapping.]
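The structure named in Fig. 1 can be pictured as a small tree plus a lookup table. The following is a minimal sketch only, not the authors' implementation: the `Facet` class, the assignment of terms to sub-facets, and the `onto:` concept identifiers are all assumptions made for illustration; only the node labels come from the figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Facet:
    """A node in the faceted classification tree (hypothetical class)."""
    name: str
    sub_facets: list[Facet] = field(default_factory=list)
    terms: list[str] = field(default_factory=list)  # terms attached to this facet

    def term_space(self) -> list[str]:
        """Collect the terms of this facet and of all its sub-facets."""
        collected = list(self.terms)
        for sub in self.sub_facets:
            collected.extend(sub.term_space())
        return collected


# Labels follow Fig. 1; which terms hang under which sub-facet is assumed.
sf1 = Facet("sF1", terms=["T1", "T2"])
sf2 = Facet("sF2", terms=["T3", "T4", "T5"])
mf1 = Facet("mF1", sub_facets=[sf1, sf2])
root = Facet("F", sub_facets=[mf1])  # dummy root node

# Concept mapping: description term -> domain ontology concept.
# The concept identifiers are placeholders, not real ontology IRIs.
concept_mapping = {"T3": "onto:ConceptA", "T4": "onto:ConceptB"}

print(root.term_space())
print(sf2.term_space())
```

Keeping the concept mapping separate from the tree means the faceted structure can be maintained without touching the ontology side, which is one plausible reading of the two-part schema.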
  • PROCESS FOR ESTABLISHING THE SCIENTIFIC DATA DESCRIPTION SCHEMA: Seven steps: (1) confirming the scope of the description; (2) making clear the purpose of the description schema; (3) selection and construction of the facets; (4) term extraction and construction of the term space; (5) indexing of facets, terms, and scientific data entities; (6) concept mapping from the description structure for scientific data to the domain ontology; (7) maintenance of the scientific data description based on the faceted classification.
  • KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅰ) Selection and Construction of the Facets: Facet Analysis; Facet Reduction.
  • KEY TECHNOLOGY FOR CONSTRUCTINGTHE SCIENTIFIC DATA DESCRIPTIONSCHEME (Ⅱ) Term Extraction  Term Extraction Based on Numeric Attributive Variable.  Term Extraction Based on Text-type Attribute Variable.  Term Extraction Based on Mixed-type Attribute Variable.
  • KEY TECHNOLOGY FOR CONSTRUCTINGTHE SCIENTIFIC DATA DESCRIPTIONSCHEME (Ⅱ) Construction of the Term Space (Linguistic Value Mapping of the Term Space) Attribute Variable Numeric Attribute Mixed-type Attribute Text-type Attribute Numeric Text-type Attribute Attribute Conceptual Continous Discrete Numeric Attribute Numeric Attribute Attribute Wide Threshold Narrow Threshold Attribute Attribute Mapping After Discretization Mapping DirectlyFig. 2. Attribute division mode and linguistic value mapping for different attributes
  • KEY TECHNOLOGY FOR CONSTRUCTINGTHE SCIENTIFIC DATA DESCRIPTIONSCHEME (Ⅲ) Index for Scientific Data Entity Rule 1. If the attribute variable extracted from any field is the concept attribute or narrow threshold attribute, and then accurate retrieval can be conducted on concept attribute (or narrow threshold attribute) or the combination of the concept attribute (or narrow threshold attribute) and its facet, then the retrieval result will be regarded as index words for the scientific data entity d.
  • KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ) Index for Scientific Data Entity: Rule 2. If the attribute variable extracted from a field is a wide threshold attribute or a non-interval continuous attribute, then the attribute is mapped to an interval value of the faceted classification structure, and the mapped interval value is taken as an index for the scientific data entity d.
  • KEY TECHNOLOGY FOR CONSTRUCTINGTHE SCIENTIFIC DATA DESCRIPTIONSCHEME (Ⅲ) Index for Scientific Data Entity Rule 3. If the attribute variable extracted from any field is continuous interval attribute, and then middle-point value of the interval attribute should be mapped with the interval value in the facet classification structure. As a result, the interval value mapped in the faceted classification structure will be regarded as an index for the scientific data entity d.
  • KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅳ) Concept Mapping for Scientific Data Entity: Upward Matching Principle for Scientific Data Description Terms; Downward Matching Principle for Terms in the Domain Ontology.
  • EMPIRICAL ANALYSIS Data Resource Table 1. Name list of the scientific data resources Name of the sub-database Obtained No. Data class Name of the database affiliated to the database time Agricultural resource and Database for leaf vegetable Sub-database of the database 1 2009.10 environmental pests in China for national agricultural pests science Agricultural Sub-database of the database resource and Database for the water 2 for national irrigation 2009.10 environmental demands of the reference crop experiment science Database for the dynamic Database for the dynamic Agricultural 3 development of China’s development of agricultural 2009.10 science base agricultural science science
  • EMPIRICAL ANALYSIS Result Demonstration for Method Separator between Each Level of Facets A Facet Identifier of Facet’s Specific Location Fig. 3. Example of faceted logic structure description document for scientific data
  • EMPIRICAL ANALYSIS Result Demonstration for Method Separator between Each Level of Terms Identifier of Term’s Specific Location Term Fig. 4. Term logic structure description document after linguistic value mapping
  • EMPIRICAL ANALYSIS Result Demonstration for Method The Name of a Scientific Data Entity Descriptor A Term Fig. 5. Screenshot for the index result of a scientific data entity
  • EMPIRICAL ANALYSIS: Table 2. Statistical table for the concept mapping among different types of scientific data entity
    Mapping match classification based on the conceptual level | Mapping match classification based on the matching degree | Mapping intensity | Matching number
    Neighbor matching | Complete matching | Strong | 2473
    Neighbor matching | Incomplete matching | Relatively strong | 3050
    Distant matching | Complete matching | Relatively weak | 231
    Distant matching | Incomplete matching | Weak | 788
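To put the four matching numbers of Table 2 in proportion, the short calculation below sums them and derives each category's share. The counts are copied from the table; the variable names and the idea of reporting shares are additions for illustration, not part of the presentation.

```python
# Matching numbers from Table 2, keyed by mapping intensity.
counts = {
    "strong": 2473,
    "relatively strong": 3050,
    "relatively weak": 231,
    "weak": 788,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

print(f"total matches: {total}")
for label, n in counts.items():
    print(f"{label}: {n} ({shares[label]:.1%})")
```

The total is 6542 matches, of which the two neighbor-matching categories (strong and relatively strong) together account for 5523.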
  • CONCLUSION: The integrated description method for scientific data proposed in the paper improves description efficiency, in terms of both description granularity and the affinity of the semantic relations among entities. Limited by time, length, and experimental conditions, the research still has shortcomings. (i) The method has only been verified with agricultural scientific data; it should be further improved and verified on other fields or wider data sources. (ii) Semantic web technology has not yet been adopted to standardize the description result. In the next step, ontology will be introduced and the description result will be standardized using RDF.
  • Thanks!