Mapping FAO’s AGROVOC Thesaurus
 

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

  • To draw equivalences between corresponding concepts within the two agricultural terminologies, in order to enhance their interoperability and that of the applications that use them. To enrich and structurally improve both sources by adding missing terms and translations and correcting erroneous structures.
  • Multilinguality: AGROVOC spans not only the five official languages but also others in which considerable work has already been done or is under way. CAT provides an extensive Chinese vocabulary, over 2.5 times larger than the set of Chinese terms in AGROVOC. Domain coverage: AGROVOC offers breadth in its coverage of the agricultural subdomains. Although CAT deals with a narrower range of agricultural subdomains than AGROVOC does, it does so in greater depth, extending to seven levels of granularity, whereas AGROVOC covers up to four levels. Interoperability: any information system (IS) that uses one terminology can be made accessible to the other terminology.
  • What is being mapped: CAT Chinese to AGROVOC English?

Presentation Transcript

  • Mapping FAO’s AGROVOC Thesaurus and the Chinese Agricultural Thesaurus (CAT) Anita Liang July 26, 2005 Sixth AOS Workshop Vila Real, Portugal
  • Outline
    • The goals
    • Benefits
    • Project outputs
    • Characteristics of the terminologies
    • Definitions
    • Guidelines
    • Mapping
    • Outputs
    • Issues
  • The goals
    • To draw equivalences between corresponding concepts within the two agricultural terminologies
    • To enrich and structurally improve both sources
    • [Figure: CAT (Chinese world view) linked by MAPPING to AGROVOC (English world view)]
  • Benefits
    • Multilinguality : improved language coverage
    • Domain coverage : improved domain coverage
    • Interoperability : IS, applications
  • Project outputs
      • a mapping file that links corresponding concepts from CAT to AGROVOC
      • a list of modifications to be applied to AGROVOC that serve both to improve its content and to provide a valid mapping
  • Comparison
    • AGROVOC:
      • 27736 English terms: 16769 descriptors, 10967 non-descriptors
      • 25060 Chinese terms: 16628 descriptors, 8432 non-descriptors
      • It is hierarchically structured with BT/NT relations. It has associative relations RT and UF/USE, as well as UF+.
    • CAT:
      • Chinese terms: 64638 (51614 descriptors, 13024 non-descriptors)
      • English terms: 90% of descriptors have an English translation (the remaining 10% have both English and Latin translations or no translation); there are no English non-descriptors
      • It is hierarchically structured with BT/NT relations and contains associative relations RT, UF/USE.
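The totals implied by these counts, and the size ratio between the two Chinese vocabularies, can be checked with a few lines of arithmetic. This is only a sanity check on the figures quoted on the slide; the variable names are illustrative.

```python
# Term counts as given on the slide: (descriptors, non-descriptors).
agrovoc_en = (16769, 10967)
agrovoc_zh = (16628, 8432)
cat_zh = (51614, 13024)

agrovoc_en_total = sum(agrovoc_en)
agrovoc_zh_total = sum(agrovoc_zh)
cat_zh_total = sum(cat_zh)

# Ratio of CAT Chinese terms to AGROVOC Chinese terms.
zh_ratio = cat_zh_total / agrovoc_zh_total

print(agrovoc_en_total, agrovoc_zh_total, cat_zh_total, round(zh_ratio, 2))
# prints: 27736 25060 64638 2.58
```

The ratio of roughly 2.58 is consistent with the statement that CAT's Chinese vocabulary is over 2.5 times larger than AGROVOC's.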
  • Definitions: Mapping v. integration
    • Integration: different sources are merged into a single unified thesaurus. This may involve complete restructuring of both sources; recoverability and integrity of the original sources are less of a priority than the overall logical consistency of the integrated product.
    • Mapping: one source is mapped to the other. The sources are revised, but each retains its original structure; mutual consistency is desirable but is less of a priority than establishing approximate equivalences.
  • Definitions (cont’d)
    • The source vocabulary is CAT; the target vocabulary is AGROVOC.
    • Mapping means linking an entry in the source vocabulary to an entry in the target vocabulary.
    • A term is a lexical representation of a concept.
    • An entry in CAT consists of the Chinese term and any English translation(s), along with its relations to other entries. An entry in AGROVOC consists of at least one English or Chinese term and its translations, along with its relations to other entries.
    • ==> entry = concept
  • Mapping between entries/concepts
    • [Figure: a CAT entry (zh term, en term; CAT_ID = 123, the CAT termcode) linked by a mapping to an AGROVOC entry (zh term, en term, fr term, es term; AGROVOC_ID = 345, the AGROVOC termcode)]
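One way to picture "entry = concept" is as a record that groups all terms for one concept under a single ID. The sketch below is an assumption about how such a record could look, not the actual database schema; the class and field names are hypothetical, the IDs 123 and 345 are the ones shown on the slide, and the labels attached to them are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One thesaurus entry (= concept): an ID plus its terms per language."""
    entry_id: int        # termcode of the entry
    vocabulary: str      # "CAT" or "AGROVOC"
    terms: dict[str, list[str]] = field(default_factory=dict)      # language -> terms
    relations: dict[str, list[int]] = field(default_factory=dict)  # "BT"/"NT"/"RT" -> entry IDs

# Illustrative entries; the labels are not taken from either database.
cat_entry = Entry(123, "CAT", {"zh": ["中国"], "en": ["China"]})
agrovoc_entry = Entry(
    345, "AGROVOC",
    {"zh": ["中国"], "en": ["China"], "fr": ["Chine"], "es": ["China"]},
)

# The mapping links the two entry IDs, independent of any single term.
link = (cat_entry.entry_id, agrovoc_entry.entry_id)
print(link)
```

Because the link is between entry IDs, adding or correcting a translation on either side does not invalidate the mapping.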
  • Working formats
    • What we have: RDBMS
      • AGROVOC scheme (MySQL)
      • CAT scheme (MySQL)
    • What we need: RDF(S)-based
      • SKOS?
      • OWL Lite
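As a rough illustration of the step from relational rows to an RDF-based format, the sketch below turns a few rows into SKOS-style N-Triples lines. It assumes SKOS is the chosen target (the slide leaves this open), and the row layout, base URI and sample rows are hypothetical, not the real AGROVOC or CAT schemas.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/cat/"  # hypothetical base URI for CAT entries

# Hypothetical rows: (termcode, language, term, broader termcode or None).
rows = [
    (1, "zh", "数学", None),
    (1, "en", "Mathematics", None),
    (2, "en", "Algebra", 1),
]

def row_to_triples(row):
    """Return N-Triples lines for one row (no escaping of quotes in terms)."""
    code, lang, term, broader = row
    subject = f"<{BASE}{code}>"
    triples = [f'{subject} <{SKOS}prefLabel> "{term}"@{lang} .']
    if broader is not None:
        triples.append(f"{subject} <{SKOS}broader> <{BASE}{broader}> .")
    return triples

for row in rows:
    for triple in row_to_triples(row):
        print(triple)
```

A real conversion would also need to handle non-descriptors (for example as alternative labels) and to escape special characters in literals.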
  • Guidelines: General (1/4)
    • Entries should be mapped irrespective of their status as descriptors or non-descriptors
    • Mappings should be between entry IDs, not term IDs.
    • Many to one: multiple CAT entries can be mapped to the same entry in the target vocabulary
    • One to many: an entry in CAT can be mapped to one or more entries in the target vocabulary
    • Mapping relations are based on SKOS Mapping relations and should include only the following:
      • Exact
      • Broader/Narrower (subsumption)
      • AND, OR, NOT
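The general guidelines above could be captured in a small mapping table keyed by entry ID. This is a minimal sketch under stated assumptions: the class name and the entry IDs are hypothetical, only the three match relations are modeled, and the AND, OR and NOT combinators are left out.

```python
from collections import defaultdict

# Relation names as used on the mapping slides; AND/OR/NOT are not modeled here.
ALLOWED_RELATIONS = {"exactMatch", "broaderMatch", "narrowMatch"}

class MappingTable:
    """Links CAT entry IDs to AGROVOC entry IDs (entry IDs, not term IDs)."""

    def __init__(self):
        self._by_source = defaultdict(list)
        self._by_target = defaultdict(list)

    def add(self, cat_id, relation, agrovoc_id):
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unsupported mapping relation: {relation}")
        self._by_source[cat_id].append((relation, agrovoc_id))
        self._by_target[agrovoc_id].append((relation, cat_id))

    def targets(self, cat_id):
        return list(self._by_source.get(cat_id, []))

    def sources(self, agrovoc_id):
        return list(self._by_target.get(agrovoc_id, []))

table = MappingTable()
table.add(101, "exactMatch", 345)    # many to one: two CAT entries ...
table.add(102, "broaderMatch", 345)  # ... mapped to the same AGROVOC entry
table.add(103, "narrowMatch", 346)   # one to many: one CAT entry ...
table.add(103, "narrowMatch", 347)   # ... mapped to two AGROVOC entries

print(table.sources(345))
print(table.targets(103))
```

Nothing in the table refers to descriptor status, so descriptors and non-descriptors are treated alike, as the first guideline requires.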
  • Guidelines: Source/Target Modifications (2/4)
    • When a gap occurs in either vocabulary because the corresponding term is missing, the term should be added to the appropriate vocabulary.
    • When a gap occurs in the target vocabulary because the concept does not exist:
      • If there is no parent in the target vocabulary to which it could be matched, then add the concept to the target vocabulary. Add the Chinese even if the English does not exist. Try to put relations where possible. Then do an exact mapping.
  • Guidelines: Source/Target Modifications (3/4)
    • Wrong translations should be fixed in both sources.
    • Inconsistencies should be fixed within the terminologies
        • cat_zh1 BT cat_zh2
        • agr_zh1 UF agr_zh2
    • Conflicting semantics should be fixed within the terminologies
        • cat_zh1 BT cat_zh2 → cat_en1 BT cat_en2
        • agr_zh1 NT agr_zh2 → agr_en1 NT agr_en2
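Cases like these could be flagged automatically before an editor reviews them. The sketch below is one possible reading of the guideline: it compares the relation between two CAT entries with the relation between their AGROVOC counterparts and reports any disagreement. The function names and data layout are assumptions, and the sample identifiers are the placeholders from the slide.

```python
# Inverse of each thesaurus relation, used to normalise direction.
INVERSE = {"BT": "NT", "NT": "BT", "UF": "USE", "USE": "UF", "RT": "RT"}

def relation_between(relations, first, second):
    """Relation from first to second, looking at both stored directions."""
    if (first, second) in relations:
        return relations[(first, second)]
    if (second, first) in relations:
        return INVERSE[relations[(second, first)]]
    return None

def find_conflicts(cat_relations, agrovoc_relations, exact_map):
    """List CAT pairs whose AGROVOC counterparts are related differently."""
    conflicts = []
    for (c1, c2), cat_rel in cat_relations.items():
        if c1 not in exact_map or c2 not in exact_map:
            continue  # only pairs with counterparts on both sides are comparable
        agr_rel = relation_between(agrovoc_relations, exact_map[c1], exact_map[c2])
        if agr_rel is not None and agr_rel != cat_rel:
            conflicts.append((c1, c2, cat_rel, agr_rel))
    return conflicts

cat_relations = {("cat_zh1", "cat_zh2"): "BT"}
agrovoc_relations = {("agr_zh1", "agr_zh2"): "UF"}
exact_map = {"cat_zh1": "agr_zh1", "cat_zh2": "agr_zh2"}

print(find_conflicts(cat_relations, agrovoc_relations, exact_map))
# prints: [('cat_zh1', 'cat_zh2', 'BT', 'UF')]
```

Deciding which side is wrong still needs a domain expert; the check only narrows down where to look.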
  • Guidelines: Source/Target Modifications (4/4)
    • If two source entries need to be added to target vocabulary (but they have the same English translation), put a scope note or a definition to explain the difference.
  • Mapping: exactMatch
    • If CAT entry A and AGROVOC entry B mean the same thing, i.e., are synonymous, they should be exact matches.
    • e.g., zh1 and zh2 are synonyms:
    • cat_zh1 / cat_en1 exactMatch agr_zh2 / agr_en1
  • Mapping: broaderMatch
    • B is a concept that exists in CAT but not in AGROVOC
    • solution 1) broaderMatch
    • solution 2) add the concept in the target (only in the original language) and do an exactMatch
    • [Figure: nodes a_A, c_A, c_B; links labeled broaderMatch and exactMatch]
  • Mapping: narrowMatch
    • A is a concept that exists in CAT but not in AGROVOC
    • solution 1) narrowMatch
    • solution 2) add the concept a_B in the target (only in the original language) and do an exactMatch
    • [Figure: nodes a_A, c_A, a_B; links labeled narrowMatch and exactMatch]
  • Problem?
    • CAT has concept { Mathematics } containing nearly 200 narrower terms
    • AGROVOC has concept { Mathematics } with no narrower terms
    • ==> Map all 200 CAT terms as broaderMatch to ag_Mathematics?
  • Mapping: inheritance (1/3)
    • Map every source entry at the most general level in the target vocabulary.
    • Map c_A to a_A
    • Descendants of c_A are by inheritance mapped as descendants of a_A
    • If there are corresponding descendants of c_A and a_A, they should be mapped.
    • [Figure: two diagrams showing CAT entries (c_A, c_B, c_C, c_D) and AGROVOC entries (a_A, a_B, a_C, a_D), with mappings numbered 1 to 3]
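Inheritance means that most descendants never need an explicit mapping: their target can be found by walking up the source hierarchy to the nearest mapped ancestor. The sketch below assumes each CAT entry has a single parent, which may not hold in the real thesaurus; the function name and the sample hierarchy are hypothetical, using the node labels from the slide.

```python
def effective_target(cat_id, parent_of, explicit_map):
    """Return (target, inherited) for a CAT entry.

    Walks up the CAT hierarchy until an explicitly mapped entry is found.
    `inherited` is True when the target comes from an ancestor.
    """
    current = cat_id
    seen = set()
    while current is not None and current not in seen:
        if current in explicit_map:
            return explicit_map[current], current != cat_id
        seen.add(current)  # guards against cycles in faulty data
        current = parent_of.get(current)
    return None, False

# Assumed hierarchy: c_A is the parent of c_B and c_D; c_B is the parent of c_C.
parent_of = {"c_B": "c_A", "c_C": "c_B", "c_D": "c_A"}
explicit_map = {"c_A": "a_A", "c_B": "a_B"}

print(effective_target("c_A", parent_of, explicit_map))  # ('a_A', False)
print(effective_target("c_C", parent_of, explicit_map))  # ('a_B', True)
print(effective_target("c_D", parent_of, explicit_map))  # ('a_A', True)
```

Under this scheme, a large subtree such as the narrower terms of { Mathematics } in CAT would need only the single mapping at its root.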
  • Mapping process: inheritance (2/3)
    • Another type of inheritance:
    • Map c_A to a_A1 with exactMatch
    • If there are corresponding descendants of c_A and a_A, they should be mapped (c_B with a_B2).
    • Descendants of c_B are by inheritance mapped as descendants of a_B2
    • [Figure: CAT entries (c_A, c_B, c_C, c_D) and AGROVOC entries (a_A1, a_A2, a_B1, a_B2); mapping 1 is labeled exactMatch, followed by mappings 2 and 3]
  • Mapping: inheritance (3/3)
    • In case of partial inheritance, do not map single children (fig. 1), but map the parent and use NOT to exclude the entries that should not be mapped (fig. 2).
    • [Figures 1 and 2: nodes c_fo, a_fo, a_B, a_C, a_D; fig. 2 includes a NOT label]
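Partial inheritance can be thought of as "everything under the mapped parent, minus the excluded entries". The sketch below assumes that a_B, a_C and a_D are children of a_fo and that excluding an entry also excludes everything beneath it; neither assumption is confirmed by the slide, and the function name is hypothetical.

```python
def inherited_entries(mapped_parent, children_of, excluded):
    """Entries covered by mapping the parent, honouring NOT exclusions."""
    covered = []
    stack = list(children_of.get(mapped_parent, []))
    while stack:
        node = stack.pop()
        if node in excluded:
            continue  # NOT: skip this entry and, by assumption, its subtree
        covered.append(node)
        stack.extend(children_of.get(node, []))
    return sorted(covered)

# Assumed target hierarchy and an arbitrarily chosen exclusion.
children_of = {"a_fo": ["a_B", "a_C", "a_D"]}
excluded = {"a_D"}

print(inherited_entries("a_fo", children_of, excluded))  # ['a_B', 'a_C']
```

One mapping plus a short exclusion list replaces a separate mapping for every child that should be covered.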
  • The output (1/2)
      <c:Concept ID="uri">
        <prefLabel lang="zh">中国</prefLabel>
        <map:exactMatch>
          <a:Concept ID="uri">
            <prefLabel lang="en">China</prefLabel>
          </a:Concept>
        </map:exactMatch>
      </c:Concept>
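Output of this shape could be generated with Python's standard `xml.etree.ElementTree` module. The sketch below is only an approximation of the format on the slide: the namespace URIs behind the `c`, `a` and `map` prefixes are not given in the presentation, so the ones used here are placeholders, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones are not stated on the slide.
CAT_NS = "http://example.org/cat#"
AGROVOC_NS = "http://example.org/agrovoc#"
MAP_NS = "http://example.org/mapping#"

ET.register_namespace("c", CAT_NS)
ET.register_namespace("a", AGROVOC_NS)
ET.register_namespace("map", MAP_NS)

def concept(namespace, uri, lang, label):
    """Build a Concept element with one preferred label."""
    element = ET.Element(f"{{{namespace}}}Concept", {"ID": uri})
    pref = ET.SubElement(element, "prefLabel", {"lang": lang})
    pref.text = label
    return element

def exact_match(source, target):
    """Nest the target concept inside an exactMatch under the source."""
    link = ET.SubElement(source, f"{{{MAP_NS}}}exactMatch")
    link.append(target)
    return source

source = concept(CAT_NS, "uri", "zh", "中国")
target = concept(AGROVOC_NS, "uri", "en", "China")
xml_text = ET.tostring(exact_match(source, target), encoding="unicode")
print(xml_text)
```

The serializer adds the namespace declarations to the root element, which the slide omits for brevity.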
  • Application
    • [Figure: architecture diagram. Labels: JSP page; search terms (example: "cow"); search; RDF mapping; AGROVOC RDF; CAT RDF; FAOBIB; AGRIS (Chinese); CAAS bibliographic DB; search records; results]
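One plausible use of the mapping in such an application is query expansion: a search term is resolved to an AGROVOC concept, the mapping is followed to the corresponding CAT entries, and their Chinese terms are added to the query sent to the Chinese bibliographic database. The sketch below is an assumption about that flow, not a description of the actual JSP application; all names, IDs and sample data are hypothetical.

```python
def expand_query(term, agrovoc_index, agrovoc_to_cat, cat_labels):
    """Return the original term plus the labels of mapped CAT entries."""
    expanded = [term]
    concept_id = agrovoc_index.get(term.lower())
    if concept_id is None:
        return expanded  # unknown term: search with it unchanged
    for cat_id in agrovoc_to_cat.get(concept_id, []):
        for label in cat_labels.get(cat_id, []):
            if label not in expanded:
                expanded.append(label)
    return expanded

# Illustrative data only; the mapping file is read here in the
# AGROVOC-to-CAT direction, i.e. as a reverse index.
agrovoc_index = {"cow": "a_1"}
agrovoc_to_cat = {"a_1": ["c_9"]}
cat_labels = {"c_9": ["母牛"]}

print(expand_query("Cow", agrovoc_index, agrovoc_to_cat, cat_labels))
# prints: ['Cow', '母牛']
```

The same lookup run in the other direction would let a Chinese query reach records indexed with AGROVOC terms.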
  • Issues
    • SKOS mapping
      • “interlingua”: language independence - mapping is oriented towards source terminology
      • Set theory metaphor: Difficult to put into practice
    • Both terminologies are multilingual in overlapping languages - what is being mapped?
  • Thank you.