Mapping FAO’s AGROVOC Thesaurus and the Chinese Agricultural Thesaurus (CAT)
Anita Liang
July 26, 2005
Sixth AOS Workshop, Vila Real, Portugal
Characteristics of the terminologies
The goals
To draw equivalences between corresponding concepts within the two agricultural terminologies
To enrich and structurally improve both sources
[Figure: CAT (Chinese world view) and AGROVOC (English world view), connected by MAPPING]
Multilinguality: improved language coverage
Domain coverage: improved domain coverage
Interoperability: information systems (IS), applications
a mapping file that links corresponding concepts from CAT to AGROVOC
a list of modifications to be applied to AGROVOC that serve both to improve its content and to provide a valid mapping
27736 English terms: 16769 descriptors, 10967 non-descriptors
25060 Chinese terms: 16628 descriptors, 8432 non-descriptors
It is hierarchically structured with BT/NT relations. It has associative relations RT and UF/USE, as well as UF+.
Chinese terms: 64638 (51614 descriptors, 13024 non-descriptors)
English terms: 90% descriptors (10% have both English and Latin translations or no translations), no non-descriptors
It is hierarchically structured with BT/NT relations and contains associative relations RT, UF/USE.
Definitions: Mapping v. integration
Integration: different sources are merged into a single unified thesaurus. This may involve complete restructuring of both sources; recoverability and integrity of the original sources are less of a priority than the overall logical consistency of the integrated product.
Mapping: one source is mapped to the other. The sources are revised, but each retains its original structure; mutual consistency is desirable but less of a priority than establishing approximate equivalences.
The source vocabulary is CAT; the target vocabulary is AGROVOC.
Mapping means linking an entry in the source vocabulary to an entry in the target vocabulary.
A term is a lexical representation of a concept.
An entry in CAT consists of the Chinese term and any English translation(s), along with its relations to other entries. An entry in AGROVOC consists of at least one English or Chinese term and its translations, along with its relations to other entries.
==> entry = concept
Mapping between entries/concepts
[Figure: a CAT entry (CAT_ID = 123, CAT termcode) is linked by a mapping to an AGROVOC entry (AGROVOC_ID = 345, AGROVOC termcode); each entry groups its terms by language (zh, en, fr, es)]
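The idea that an entry is a concept, and that mappings link entry IDs rather than individual terms, can be sketched as a small data structure. This is only an illustration: the class and field names (`Entry`, `Mapping`, `labels`) are hypothetical, the IDs 123 and 345 are the ones shown on the slide, and the labels are sample values rather than the actual content of those entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One thesaurus entry (= one concept) with its terms grouped by language."""
    vocab: str                      # "CAT" or "AGROVOC"
    termcode: int                   # entry ID within that vocabulary
    labels: Dict[str, List[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Mapping:
    """A link between entry IDs, never between individual term IDs."""
    cat_id: int
    agrovoc_id: int
    relation: str = "exactMatch"


# Sample labels only; the slide does not say what entries 123 and 345 contain.
cat_entry = Entry("CAT", 123, {"zh": ["中国"], "en": ["China"]})
agr_entry = Entry("AGROVOC", 345, {"en": ["China"], "zh": ["中国"],
                                   "fr": ["Chine"], "es": ["China"]})

link = Mapping(cat_id=cat_entry.termcode, agrovoc_id=agr_entry.termcode)
```

Because the link carries only the two entry IDs, adding or correcting a translation on either side does not invalidate it.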
What we have: RDBMS
AGROVOC schema (MySQL)
CAT schema (MySQL)
What we need: RDF(S)-based
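One way to picture the step from relational tables to an RDF-based representation is the sketch below. It makes several assumptions that are not in the slides: an in-memory SQLite database stands in for MySQL, the table `term(termcode, lang, label)` and the base URI `http://example.org/cat/` are hypothetical, and SKOS `prefLabel` is used as the labelling property.

```python
import sqlite3

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/cat/"  # hypothetical base URI for CAT entries


def rows_to_ntriples(conn):
    """Turn each (termcode, lang, label) row into one N-Triples line."""
    lines = []
    query = "SELECT termcode, lang, label FROM term ORDER BY termcode, lang"
    for termcode, lang, label in conn.execute(query):
        # Escape backslashes and quotes so the literal stays valid.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{BASE}{termcode}> <{SKOS}prefLabel> "{escaped}"@{lang} .')
    return lines


# Hypothetical stand-in for the relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (termcode INTEGER, lang TEXT, label TEXT)")
conn.executemany("INSERT INTO term VALUES (?, ?, ?)",
                 [(123, "zh", "中国"), (123, "en", "China")])

triples = rows_to_ntriples(conn)
```

Each database row becomes one statement about the entry URI, so the entry (not the term) is the subject of every triple.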
Guidelines: General (1/4)
Entries should be mapped irrespective of their status as descriptors or non-descriptors
Mappings should be between entry IDs, not term IDs.
Many to one: multiple CAT entries can be mapped to the same entry in the target vocabulary
One to many: an entry in CAT can be mapped to one or more entries in the target vocabulary
Mapping relations are based on SKOS Mapping relations and should include only the following:
AND, OR, NOT
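The general guidelines (mappings between entry IDs, many-to-one and one-to-many allowed, a closed set of relations) can be sketched as a small mapping table. The class `MappingTable` and its methods are hypothetical. The relation names are the three that appear on later slides, and the AND/OR/NOT combinators are not modelled here.

```python
from collections import defaultdict

# Relation names taken from the later slides of this deck (an assumption
# about what the closed set contains); AND/OR/NOT are left out of this sketch.
RELATIONS = {"exactMatch", "broaderMatch", "narrowMatch"}


class MappingTable:
    """Stores links from CAT entry IDs to AGROVOC entry IDs."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, cat_id, relation, agrovoc_id):
        if relation not in RELATIONS:
            raise ValueError(f"unsupported mapping relation: {relation}")
        self._by_source[cat_id].append((relation, agrovoc_id))

    def targets(self, cat_id):
        """One to many: every AGROVOC entry linked from one CAT entry."""
        return list(self._by_source.get(cat_id, []))

    def sources(self, agrovoc_id):
        """Many to one: every CAT entry linked to one AGROVOC entry."""
        return sorted(cat_id for cat_id, links in self._by_source.items()
                      if any(target == agrovoc_id for _, target in links))


table = MappingTable()
table.add(1, "exactMatch", 10)     # two CAT entries ...
table.add(2, "broaderMatch", 10)   # ... mapped to the same AGROVOC entry
table.add(3, "exactMatch", 20)     # one CAT entry ...
table.add(3, "exactMatch", 21)     # ... mapped to two AGROVOC entries
```

Descriptor or non-descriptor status plays no role in the table, which matches the first guideline.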
Guidelines: Source/Target Modifications (2/4)
When a gap occurs in either vocabulary because the corresponding term is missing, the term should be added to the appropriate vocabulary.
When a gap occurs in the target vocabulary because the concept does not exist:
If there is no parent in the target vocabulary to which it could be matched, then add the concept to the target vocabulary. Add the Chinese term even if the English term does not exist. Add relations where possible. Then create an exact mapping.
Guidelines: Source/Target Modifications (3/4)
Wrong translations should be fixed in both sources.
Inconsistencies should be fixed within the terminologies:
cat_zh1 BT cat_zh2
agr_zh1 UF agr_zh2
Conflicting semantics should be fixed within the terminologies:
cat_zh1 BT cat_zh2; cat_en1 BT cat_en2
agr_zh1 NT agr_zh2; agr_en1 NT agr_en2
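The two kinds of problem on this slide could be detected automatically once a draft mapping exists. The sketch below is a hypothetical helper, not part of the project: relations are modelled as `(source, relation, destination)` triples, and it only compares relations stored in the same direction in both vocabularies.

```python
INVERSE = {"BT": "NT", "NT": "BT"}


def find_conflicts(cat_rels, agr_rels, mapping):
    """Compare the relation between two CAT entries with the relation
    between the AGROVOC entries they are mapped to.

    cat_rels, agr_rels: sets of (source, relation, destination) triples.
    mapping: dict from CAT entry ID to AGROVOC entry ID.
    """
    agr_index = {(src, dst): rel for src, rel, dst in agr_rels}
    problems = []
    for src, rel, dst in sorted(cat_rels):
        if src not in mapping or dst not in mapping:
            continue  # nothing to compare against
        other = agr_index.get((mapping[src], mapping[dst]))
        if other is None or other == rel:
            continue  # no relation on the AGROVOC side, or both agree
        # BT against NT is a reversed hierarchy; anything else is a mismatch.
        kind = "conflicting semantics" if INVERSE.get(rel) == other else "inconsistency"
        problems.append((src, rel, dst, other, kind))
    return problems


cat_rels = {("cat_1", "BT", "cat_2"), ("cat_3", "BT", "cat_4")}
agr_rels = {("agr_1", "UF", "agr_2"), ("agr_3", "NT", "agr_4")}
mapping = {"cat_1": "agr_1", "cat_2": "agr_2", "cat_3": "agr_3", "cat_4": "agr_4"}

problems = find_conflicts(cat_rels, agr_rels, mapping)
```

The first pair reproduces the BT versus UF case and the second the BT versus NT case; deciding which vocabulary is right remains an editorial decision.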
Guidelines: Source/Target Modifications (4/4)
If two source entries need to be added to the target vocabulary but have the same English translation, add a scope note or a definition to explain the difference.
If CAT entry A and AGROVOC entry B mean the same thing, i.e., are synonymous, they should be exact matches.
e.g., zh1 and zh2 are synonyms:
cat_zh1 / cat_en1 exactMatch agr_zh2 / agr_en1
Mapping: broaderMatch
B is a concept that exists in CAT but not in AGROVOC
solution 1) broaderMatch
solution 2) add the concept in the target (only in the original language) and do an exactMatch
[Figure: CAT concepts c_A and c_B and AGROVOC concept a_A; links labelled exactMatch and broaderMatch]
Mapping: narrowMatch
A is a concept that exists in CAT but not in AGROVOC
solution 1) narrowMatch
solution 2) add the concept a_B in the target (only in the original language) and do an exactMatch
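The two solutions offered for a gap in the target can be sketched as one function with a switch. All names here are hypothetical, including the way a new AGROVOC ID is allocated and the sample Chinese label.

```python
def resolve_gap(cat_id, zh_label, neighbour_id, agrovoc, relation, add_to_target):
    """Return a (cat_id, relation, agrovoc_id) link for a CAT concept that
    has no counterpart in AGROVOC.

    neighbour_id: the closest existing AGROVOC entry.
    relation: "broaderMatch" or "narrowMatch", used by solution 1.
    agrovoc: dict from entry ID to a record; modified in place by solution 2.
    """
    if not add_to_target:
        # Solution 1: link to the nearest existing entry.
        return (cat_id, relation, neighbour_id)
    # Solution 2: add the concept to the target, in the original language
    # only, then link with exactMatch. The ID allocation is a placeholder.
    new_id = max(agrovoc, default=0) + 1
    agrovoc[new_id] = {"labels": {"zh": zh_label}, "related_to": neighbour_id}
    return (cat_id, "exactMatch", new_id)


agrovoc = {345: {"labels": {"en": "China", "zh": "中国"}}}

link_1 = resolve_gap(7, "示例概念", 345, agrovoc, "broaderMatch", add_to_target=False)
link_2 = resolve_gap(7, "示例概念", 345, agrovoc, "broaderMatch", add_to_target=True)
```

Solution 1 leaves AGROVOC untouched, while solution 2 grows it; this is the trade-off the two slides present.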
Mapping: inheritance (3/3)
In case of partial inheritance, do not map single children (fig. 1); instead, map the parent and use NOT to exclude the entries that should not be mapped (fig. 2).
[Fig. 1 and fig. 2: CAT concept c_fo and AGROVOC concepts a_fo, a_B, a_C and a_D; fig. 2 marks excluded entries with NOT]
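The "map the parent, exclude with NOT" rule can be read as a tree walk that skips excluded entries. The sketch below assumes that a_B, a_C and a_D are children of a_fo and that a_C is the excluded entry; the figure residue does not confirm either point, and the function name is hypothetical.

```python
def inherited_targets(parent_id, children, excluded):
    """List the target entries covered by a mapping to parent_id.

    children: dict from an entry ID to the list of its narrower entries.
    excluded: set of entry IDs marked with NOT; an excluded entry is
    skipped together with everything below it.
    """
    covered = []
    stack = [parent_id]
    while stack:
        node = stack.pop()
        if node in excluded:
            continue
        covered.append(node)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(children.get(node, [])))
    return covered


children = {"a_fo": ["a_B", "a_C", "a_D"]}   # assumed hierarchy
excluded = {"a_C"}                           # assumed NOT entry

covered = inherited_targets("a_fo", children, excluded)
```

A single mapping from c_fo to a_fo plus one NOT replaces a separate mapping for each remaining child.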
The output (1/2)
<c:Concept ID="uri">
  <prefLabel lang="zh">中国</prefLabel>
  <map:exactMatch>
    <a:Concept ID="uri">
      <prefLabel lang="en">China</prefLabel>
    </a:Concept>
  </map:exactMatch>
</c:Concept>
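An output record of this shape could be produced with Python's standard `xml.etree.ElementTree`. The slide does not declare the `c`, `a` and `map` namespaces and shows only "uri" as a placeholder ID, so the namespace URIs and concept URIs below are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the slide does not declare them.
NS = {
    "c": "http://example.org/cat#",
    "a": "http://example.org/agrovoc#",
    "map": "http://example.org/mapping#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Build the {namespace}name form that ElementTree expects."""
    return f"{{{NS[prefix]}}}{name}"


def exact_match(cat_uri, zh_label, agrovoc_uri, en_label):
    """Build one CAT concept element with a nested exactMatch to AGROVOC."""
    source = ET.Element(q("c", "Concept"), {"ID": cat_uri})
    ET.SubElement(source, "prefLabel", {"lang": "zh"}).text = zh_label
    match = ET.SubElement(source, q("map", "exactMatch"))
    target = ET.SubElement(match, q("a", "Concept"), {"ID": agrovoc_uri})
    ET.SubElement(target, "prefLabel", {"lang": "en"}).text = en_label
    return source


record = exact_match("http://example.org/cat#123", "中国",
                     "http://example.org/agrovoc#345", "China")
xml_text = ET.tostring(record, encoding="unicode")
```

The serialized string also carries the `xmlns` declarations that the slide omits.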
Application
[Figure: a JSP page takes search terms (e.g. "cow"), uses the RDF mapping between AGROVOC RDF and CAT RDF, searches records in FAOBIB / AGRIS (Chinese) and in the CAAS bibliographic DB, and returns the results]
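The application can be thought of as query expansion across the mapping: a term typed in one language is resolved to an entry, and the mapped entry supplies terms for searching the other database. The sketch below is a guess at that flow, not the JSP implementation; the function, the dictionaries and the sample entries are hypothetical.

```python
def expand_query(term, agrovoc_labels, cat_labels, agrovoc_to_cat):
    """Return the original term plus the Chinese labels of mapped CAT entries.

    agrovoc_labels: dict from AGROVOC entry ID to {language: label}.
    cat_labels: dict from CAT entry ID to {language: label}.
    agrovoc_to_cat: dict from AGROVOC entry ID to a list of CAT entry IDs
    (the mapping file read in the reverse direction).
    """
    queries = {term}
    wanted = term.lower()
    for agrovoc_id, labels in agrovoc_labels.items():
        if wanted not in (label.lower() for label in labels.values()):
            continue
        for cat_id in agrovoc_to_cat.get(agrovoc_id, []):
            zh_label = cat_labels[cat_id].get("zh")
            if zh_label:
                queries.add(zh_label)
    return queries


# Sample data for illustration only.
agrovoc_labels = {345: {"en": "China"}}
cat_labels = {123: {"zh": "中国", "en": "China"}}
agrovoc_to_cat = {345: [123]}

queries = expand_query("china", agrovoc_labels, cat_labels, agrovoc_to_cat)
```

The expanded set can then be sent to both bibliographic databases, so an English query also retrieves records indexed with Chinese terms.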
“interlingua”: language independence - mapping is oriented towards source terminology
Set theory metaphor: Difficult to put into practice
Both terminologies are multilingual in overlapping languages - what is being mapped?