Is ISO 639 enough for a multilingual thesaurus? The AGROVOC case
1. AIMS
Is ISO 639 enough for a multilingual
thesaurus?
The AGROVOC case
Caterina Caracciolo, Gudrun Johannsen, Lavanya Kiran,
Johannes Keizer
Food and Agriculture Organization of the UN
AOS 2012
Sept 4, 2012 - Kuching (MY)
2. Background
• AGROVOC is published in 21 languages, with others
under development
• Multilinguality has always been an issue
• Since the beginning, multilinguality was
interpreted as "translation":
- One hierarchy of terms (one structure),
translations in various languages
• This organization was retained in the move
from a term-centered to a concept-centered
resource
9/19/2012 2
3. AGROVOC as object-centered
resource…
• Being mainly a resource for document
indexing in the area of agriculture, it contains
a large number of words referring to plants,
animals, and food in general
4. # of concepts below top concepts
[Bar chart: number of concepts below each top concept; horizontal axis from 0 to 25,000. Top concepts shown: organism, substances, entities, phenomena, activities, products, methods, properties, features, objects, resources, subjects, systems, locations, groups, measures, state, stages, technology, processes, factors, time, events, site, strategies.]
10. Requirements for rendering
multilinguality in AGROVOC
1. Unambiguously express the geographic area
where a given word is used
- Specification of the area of use of a given word
should be optional.
2. No limitations on the type of area allowed
- Countries, groups of countries, geographical or
administrative regions should be equally available
for specification.
9/19/2012 KISAF, Rome 10
11. AGROVOC as a SKOS resource
• A skos:Concept indicates a group of words in
various languages, to be considered translations of
one another
• URIs are kept "abstract" to emphasize independence
of the concept from language
- E.g. http://aims.fao.org/aos/agrovoc/c_12332
• The grouped words are then labels of the given
concept
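The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not AGROVOC's actual data model: the `Concept` class and its `label` method are hypothetical, and the labels attached to the URI are illustrative values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A language-independent concept, identified only by an abstract URI."""

    uri: str
    # Maps a language tag (e.g. "en") to the preferred label in that language.
    pref_labels: Dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the label for `lang`, or the fallback language's label."""
        return self.pref_labels.get(lang, self.pref_labels.get(fallback))


# Illustrative labels only; they are assumptions made for this sketch.
concept = Concept(
    uri="http://aims.fao.org/aos/agrovoc/c_12332",
    pref_labels={"en": "maize", "es": "maíz", "fr": "maïs"},
)

print(concept.label("es"))  # label in Spanish
print(concept.label("de"))  # no German label, so the English one is returned
```

Because the URI carries no word from any language, adding or changing a label never changes the identity of the concept.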
12. SKOS properties to express terms
• skos:prefLabel, skos:altLabel
- Take plain literals as values
- With an optional language tag expressed by the XML
attribute xml:lang
• skosxl:prefLabel, skosxl:altLabel
- Take entities with URIs, so extra information can be
attached to labels
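The difference between the two families of properties can be sketched as plain Python tuples standing in for RDF triples. This is a minimal illustration, not a real RDF library: the helper functions, the label URI, and the `usedInRegion` predicate under `example.org` are all hypothetical names introduced for the example. Only the SKOS and SKOS-XL namespaces and property names are taken from the standards.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"


def plain_label(concept_uri, text, lang):
    """skos:prefLabel: the value is a plain literal with a language tag."""
    return [(concept_uri, SKOS + "prefLabel", (text, lang))]


def xl_label(concept_uri, label_uri, text, lang, extra=None):
    """skosxl:prefLabel: the value is a label resource with its own URI.

    Because the label has a URI, further statements (`extra`) can be
    attached to the label itself rather than to the concept.
    """
    triples = [
        (concept_uri, SKOSXL + "prefLabel", label_uri),
        (label_uri, SKOSXL + "literalForm", (text, lang)),
    ]
    for predicate, value in (extra or {}).items():
        triples.append((label_uri, predicate, value))
    return triples


concept_uri = "http://aims.fao.org/aos/agrovoc/c_12332"
label_uri = "http://example.org/label/xl_es_1"  # hypothetical label URI

plain = plain_label(concept_uri, "maíz", "es")  # illustrative label text
extended = xl_label(
    concept_uri,
    label_uri,
    "maíz",
    "es",
    # Hypothetical predicate and value, used only to show extra information.
    extra={"http://example.org/vocab#usedInRegion": "http://example.org/region/1"},
)

for triple in plain + extended:
    print(triple)
```

With the plain property there is nowhere to hang a statement about the word itself; with the SKOS-XL form, the third triple is a statement about the label.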
13. AGROVOC uses ISO 639 two-letter codes
to tag languages in xml:lang
• ISO 639 provides codes for languages
independently of
- The country where they are spoken:
• Spanish, Basque (same country, both official languages)
• Dutch, Flemish (different countries, similar enough
languages…)
- Their status: French and Breton (same
country, Breton has no official status)
• Only one code for English, Spanish…
• Limitations shown in the previous examples
15. Are ISO 639 three-letter codes an option?
• More languages are included
- More contemporary languages
• Bemba language
- "Old" languages (no longer spoken)
• Old French (ca. 842-1400)
- Groups of languages
• Caucasian languages
- Artificial languages
• Same approach as the two-letter version
16. Is IETF an option?
• Internet Engineering Task Force (IETF)
• RFC 5646, Tags for Identifying Languages
- Based on ISO 639 for languages
- Subtags from ISO 3166 for countries and ISO 15924 for
scripts
• Examples:
- tr-CY = Turkish as used in Cyprus
- zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
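To make the structure of such tags concrete, the following Python sketch splits a tag into its language, script, and region subtags. It is a deliberate simplification: it handles only the language[-script][-region] shape used in the examples above, not the full RFC 5646 grammar (no variants, extensions, or private-use subtags), and it does not check subtags against the registry. The function name `parse_tag` is a hypothetical helper.

```python
import re

# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional region given as 2 letters or 3 digits.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag):
    """Split a language tag into language, script, and region subtags."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    parts = match.groupdict()
    return {
        # Conventional casing: language lower, script Title, region UPPER.
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


print(parse_tag("tr-CY"))
print(parse_tag("zh-Hant-HK"))
print(parse_tag("es"))
```

Note that the region subtag can only name areas that have a code, which is worth keeping in mind against the requirement that any type of area be allowed.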
17. Is a relational approach an option?
• Keep the tagging approach to mark the language
- Use ISO 639 or IETF
• And introduce a relational notion of "where a
given word is used"
• Link together a concept representing a
geographic area, and the object to name
- E.g., Kiwicha isNameUsedInRegion Cusco
• Aim at "standard" relations…
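A relational approach of this kind could be sketched as follows. The `UsageGraph` class is a hypothetical in-memory stand-in for the isNameUsedInRegion relation; in a real SKOS resource both the name and the region would presumably be identified by URIs rather than plain strings. The sketch reflects the two requirements stated earlier: a name may have no region at all, and a region can be any kind of area.

```python
from collections import defaultdict


class UsageGraph:
    """Stores (name, isNameUsedInRegion, region) statements."""

    def __init__(self):
        self._regions_by_name = defaultdict(set)

    def add(self, name, region):
        """Record that `name` is used in `region` (any type of area)."""
        self._regions_by_name[name].add(region)

    def regions_for(self, name):
        """Regions where `name` is used; empty if none were specified."""
        return sorted(self._regions_by_name.get(name, set()))

    def names_in(self, region):
        """All names recorded as used in `region`."""
        return sorted(
            name
            for name, regions in self._regions_by_name.items()
            if region in regions
        )


graph = UsageGraph()
graph.add("Kiwicha", "Cusco")  # the example from the slide

print(graph.regions_for("Kiwicha"))
print(graph.names_in("Cusco"))
print(graph.regions_for("some other name"))  # the region is optional
```

Unlike a region subtag inside the language tag, the region here is an ordinary concept, so a city, an administrative region, or a group of countries can all be used in the same way.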
18. Conclusions?
• This is work in progress
• We continue working out use cases, especially
from Spanish and Portuguese
• We will assess the alternatives