Caracciolo et al_2012_aos_agrovoc_multilinguality
  • DCMI initiative!
  • Transcript

    • 1. AIMS: Is ISO 639 enough for a multilingual thesaurus? The AGROVOC case
        • Caterina Caracciolo, Gudrun Johannsen, Lavanya Kiran, Johannes Keizer
        • Food and Agriculture Organization of the UN
        • AOS 2012, Sept 4, 2012, Kuching (MY)
    • 2. Background
        • AGROVOC is published in 21 languages, with others under development
        • Multilinguality has always been an issue
        • Since the beginning, multilinguality was interpreted as “translation”:
            – One hierarchy of terms (one structure), with translations in various languages
        • This organization was retained in the move from a term-centered to a concept-centered resource
        • Footer: 9/5/2012
    • 3. AGROVOC as an object-centered resource…
        • Being mainly a resource for document indexing in the area of agriculture, it contains a large number of words referring to plants, animals, and food in general
    • 4. [Bar chart: number of concepts below each top concept (organism, substances, entities, phenomena, activities, products, methods, properties, features, objects, resources, subjects, systems, locations, groups, measures, state, stages, technology, processes, factors, time, events, site, strategies); horizontal axis from 0 to 25,000]
    • 5. Differentiating languages
        • Salmon (en)
        • Salmón (es)
        • лососи (ru)
    • 6. But distribution of languages may be wide…
    • 7. … and names of food tend to vary…
        • Aguacate, Palta (both Spanish words for avocado)
    • 8. … and names of food tend to vary…
        • Ataco morado, sangorache, sergorache, hawarcha, Achis, Coyos (Cajamarca), Achita (Ayacucho), Coime, coimi, Kiwicha (Cusco), cuimi, millmi
    • 9. Not only food names vary
    • 10. Requirements for rendering multilinguality in AGROVOC
        1. Unambiguously express the geographic area where a given word is used
            – Specification of the area of use of a given word should be optional.
        2. No limitations on the type of area allowed
            – Countries, groups of countries, geographical or administrative regions should be equally available for specification.
        • Footer: KISAF, Rome
    • 11. AGROVOC as a SKOS resource
        • skos:Concept is used to indicate a group of words in various languages that are considered translations of one another
        • URIs are kept “abstract” to emphasize the independence of the concept from language
            – E.g. http://aims.fao.org/aos/agrovoc/c_12332
        • The grouped words are then labels of the given concept
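The idea of a language-independent concept with per-language labels can be sketched in a few lines of Python. This is only an illustration, not AGROVOC's actual data model: the `Concept` class, its `label` method, and the `example.org` URI are hypothetical, and the labels are the salmon examples from slide 5. The dictionary keyed by language tag mirrors the SKOS rule that a concept has at most one preferred label per language tag.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A language-independent concept identified only by its URI."""
    uri: str
    # Language tag -> preferred label; one entry per tag, as in SKOS.
    pref_labels: Dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the label for `lang`, or the fallback-language label if missing."""
        return self.pref_labels.get(lang, self.pref_labels.get(fallback))


# Hypothetical URI; the labels come from the "Differentiating languages" slide.
salmon = Concept(
    uri="http://example.org/concept/c_0001",
    pref_labels={"en": "Salmon", "es": "Salmón", "ru": "лососи"},
)
```

With a single `"es"` key there is room for only one preferred Spanish label, which is where regional variants such as Aguacate and Palta become a problem.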
    • 12. SKOS properties to express terms
        • skos:prefLabel, skos:altLabel
            – Take plain literals as values
            – And an optional language tag expressed by the XML attribute xml:lang
        • skosxl:prefLabel, skosxl:altLabel
            – Take entities with URIs, so extra information can be attached to labels
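The difference between the two families of properties can be sketched with triples written as plain Python tuples. This is a minimal sketch under stated assumptions: the concept and label URIs, the `ex:editorialNote` property, and the choice of Kiwicha as an alternative label are hypothetical. Only the `skos:` and `skosxl:` property names come from the vocabularies themselves.

```python
from typing import List, Tuple, Union

Literal = Tuple[str, str]            # (text, language tag)
Node = Union[str, Literal]           # a URI or a literal
Triple = Tuple[str, str, Node]       # (subject, predicate, object)

CONCEPT = "http://example.org/concept/amaranth"  # hypothetical URI
LABEL = "http://example.org/label/kiwicha"       # hypothetical URI

# Plain SKOS: the label is a literal, so nothing more can be said about it.
plain: List[Triple] = [
    (CONCEPT, "skos:altLabel", ("Kiwicha", "es")),
]

# SKOS-XL: the label is a resource with its own URI and can carry extra statements.
xl: List[Triple] = [
    (CONCEPT, "skosxl:altLabel", LABEL),
    (LABEL, "rdf:type", "skosxl:Label"),
    (LABEL, "skosxl:literalForm", ("Kiwicha", "es")),
    (LABEL, "ex:editorialNote", ("regional name", "en")),  # hypothetical property
]


def statements_about(subject: Node, triples: List[Triple]) -> List[Tuple[str, Node]]:
    """Return every (predicate, object) pair whose subject is `subject`."""
    return [(p, o) for s, p, o in triples if s == subject]


def alt_labels(concept: str, triples: List[Triple]) -> List[Node]:
    """Return the literal form of every alternative label, plain or XL."""
    found: List[Node] = []
    for s, p, o in triples:
        if s != concept:
            continue
        if p == "skos:altLabel":
            found.append(o)
        elif p == "skosxl:altLabel":
            found.extend(lit for lp, lit in statements_about(o, triples)
                         if lp == "skosxl:literalForm")
    return found
```

Both encodings yield the same label text, but only in the SKOS-XL version does `statements_about(LABEL, xl)` return anything, because only there does the label exist as a node of its own.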
    • 13. AGROVOC uses 2-letter ISO 639 codes to tag languages in xml:lang
        • ISO 639 provides codes for languages independently of
            – The country where they are spoken:
                • Spanish, Basque (same country, both official languages)
                • Dutch, Flemish (different countries, similar enough languages…)
            – And their status: French and Breton (same country, Breton has no status)
        • Only one code for English, Spanish…
        • Limitations shown by the previous examples
    • 14. [Diagram: Multilinguality; ISO 639; Language codes]
    • 15. Are 3-letter ISO 639 codes an option?
        • More languages are included
            – More contemporary languages
                • Bemba language
            – “Old” languages (no longer spoken)
                • Old French (ca. 842-1400)
            – Groups of languages
                • Caucasian languages
            – Artificial languages
        • Same approach as the 2-letter version
    • 16. Is IETF an option?
        • Internet Engineering Task Force (IETF)
        • IETF RFC 5646, Tags for Identifying Languages
            – Basis is ISO for languages (639)
            – Subtags from ISO for countries (3166), ISO for scripts (15924)
        • Examples:
            – tr-CY = Turkish from Cyprus
            – zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
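How such tags break down into subtags can be sketched with a deliberately simplified parser. It is an assumption-laden illustration, not a full RFC 5646 implementation: it handles only the shape language[-script][-region] and ignores extended language subtags, variants, extensions, and private-use subtags. The names `LanguageTag` and `parse_tag` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str            # ISO 639 code, e.g. "zh"
    script: Optional[str]    # ISO 15924 code, e.g. "Hant"
    region: Optional[str]    # ISO 3166 country code or 3-digit area code


# Simplified grammar: 2-3 letters, optional 4-letter script,
# optional 2-letter or 3-digit region.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag into subtags, normalizing case by the usual convention."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return LanguageTag(
        language=match.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )
```

For instance, `parse_tag("zh-Hant-HK")` separates the language, script, and region subtags of the second example above. In this simplified grammar the region slot holds a country or numeric area code, so a sub-national area such as Cusco has no place in the tag.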
    • 17. Is a relational approach an option?
        • Keep the tagging approach to mark the language
            – Use ISO 639 or IETF
        • And introduce a relational notion of “where a given word is used”
        • Link together a concept representing a geographic area and the object to name
            – E.g., Kiwicha isNameUsedInRegion Cusco
        • Aim at “standard” relations…
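The relational approach can be sketched as a small set of triples plus a lookup. This is a hedged illustration only: `isNameUsedInRegion` is the relation name from the slide's example rather than a standard property, the name-to-area pairs are those given in parentheses on slide 8, and the `PART_OF` containment table and both helper functions are assumptions added here to show that areas of different types (regions, countries) can be treated uniformly.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (name, relation, area)

# Relation name from the slide's example; not a standard property.
USED_IN = "isNameUsedInRegion"

TRIPLES: List[Triple] = [
    ("Kiwicha", USED_IN, "Cusco"),
    ("Achita", USED_IN, "Ayacucho"),
    ("Coyos", USED_IN, "Cajamarca"),
]

# Assumed containment between areas, so that a query can use any area type.
PART_OF: Dict[str, str] = {
    "Cusco": "Peru",
    "Ayacucho": "Peru",
    "Cajamarca": "Peru",
}


def enclosing_areas(area: str) -> List[str]:
    """Return the area itself followed by every area that contains it."""
    result = [area]
    while area in PART_OF:
        area = PART_OF[area]
        result.append(area)
    return result


def names_used_in(area: str) -> Set[str]:
    """Return all names whose area of use is `area` or lies inside it."""
    return {
        name
        for name, relation, used_in in TRIPLES
        if relation == USED_IN and area in enclosing_areas(used_in)
    }
```

A name with no `isNameUsedInRegion` triple simply never appears in any result, which keeps the area of use optional, as the first requirement asks.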
    • 18. Conclusions?
        • This is work in progress
        • We continue working out use cases, especially from Spanish and Portuguese
        • Assess alternatives