Human Language Technologies for Ethiopian Languages: Challenges and Future Directions

© Solomon Teferra Abate, Binyam Ephrem, Enchalew Yifru, Kassa Tilahun, Lemlem Hagos, Mohammed-hussen Abubeker and Taye Girma

Transcript

  • 1. Human Language Technologies for Ethiopian Languages: Challenges and Future Directions
    Solomon Teferra Abate, Binyam Ephrem, Enchalew Yifru, Kassa Tilahun, Lemlem Hagos, Mohammed-hussen Abubeker and Taye Girma
    LIG, Université Joseph Fourier (UJF)
    ITPhD Program, Addis Ababa University
    solomon_teferra_7@yahoo.com
    AGIS11 Conference, Addis Ababa
  • 2. Outline
    ● Ethiopian Languages
    ● Human Language Technology (HLT)
      – Role in Development
      – HLT in the World
    ● HLT for Ethiopian Languages
      – Language and Technology Coverage
      – Challenges and Limitations
      – Future Directions and Strategies
  • 3. Ethiopian Languages
    ● There are about 90 languages
    ● Most belong to the Afro-Asiatic language family
    ● Amharic, Afan Oromo and Tigrigna are the 3 most widely spoken
    ● Amharic is the federal working language
      – Regions have their own working languages
      – The language policy states that everyone has the right to learn in his/her mother tongue
      – More than 20 languages are media of instruction (MOI) in primary school (cycles I and II)
  • 4. Human Language Technology
    ● Is an interdisciplinary field that encompasses most sub-disciplines of linguistics, computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics and statistics
    ● Covers areas like:
      ✔ ASR (automatic speech recognition)
      ✔ MT (machine translation)
      ✔ TTS (text-to-speech)
      ✔ OCR (optical character recognition)
      ✔ Spelling and grammar checking
      ✔ Parsing
      ✔ Morphological analysis/synthesis
      ✔ Stemming
      ✔ Information extraction
      ✔ Text/document categorization
      ✔ POS (part-of-speech) tagging
      ✔ etc.
  • 5. Human Language Technology - Role
    ● Enables ICT products to have knowledge of human language
      ● Increases the acceptance of the technology and the productivity of its users in the information age
    ● Helps people collaborate, conduct business, share knowledge and participate in social and political debates regardless of language barriers or computer skills
    ● Gives the disadvantaged access to information:
      ✔ the illiterate
      ✔ the physically impaired
      ✔ the rural poor
  • 6. HLT in the World
    ● Well developed for only a few of the world's languages, such as English
    ● IBM Watson computer
      ● Passed its first test by winning a QA competition worth $1 M
      ● Its design goal is an intelligent computer that can interact in a natural language
        ✔ Understand any question asked in natural speech
        ✔ Answer questions as humans do
      ● Uses a number of HLT modules such as ASR, QA and TTS
      ✗ Requires a lot of expensive servers (about $1 billion in total)
  • 7. HLT in the World
    ● Siri is a simple iPhone-based system that:
      ● Receives commands in natural speech
      ● Sends messages
      ● Schedules meetings
      ● Places phone calls
    ● Siri has been claimed to:
      ● understand what you say
      ● know what you mean
      ● speak back in natural speech
  • 8. HLT in the World: Europe
    ● Europe is a continent united into a single multilingual economic union with 23 official languages
    ● To enable the European languages technologically, the European Union:
      ✔ Invested over €130 M in 2009-2011 to promote language technologies and language resource infrastructures
      ✔ Allocated €35 M for SME action on Digital Content and Languages and €50 M for Language Technologies in its Work Program 2011-2012
      ✔ Proposed a simple platform that makes any online content and service available in all European languages
  • 9. HLT in the World: South Africa
    ● The South African government has identified HLT as a priority area to enable (technologically) its 11 official languages
    ➢ Various R&D projects and initiatives have been funded by the government through:
      ● Department of Arts and Culture (DAC),
      ● Department of Science and Technology (DST), and
      ● National Research Foundation (NRF)
    ● The key challenge is fragmentation of R&D activities in HLT
      ● Addressed by the South African HLT Audit (SAHLTA)
  • 10. HLT for Ethiopian Languages
    ● Research on HLT for Ethiopian languages started in the 1990s
    ✔ There are now many (>200) encouraging and valuable works on:
      ➢ Thesaurus construction
      ➢ ASR
      ➢ Stemming
      ➢ Text classification
      ➢ MT
      ➢ Parsing
      ➢ Text categorization
      ➢ Text-to-speech
      ➢ POS tagging
      ➢ Morphological analysis
      ➢ OCR
      ➢ Spell checking
      ➢ Information extraction
    ✗ Most of them are based on language resources (LRs) developed for the experiment
  • 11. HLT for Ethiopian Languages
    ✗ HLT research covers a limited number of Ethiopian languages
    [Figure: "HLT for Ethiopian Languages (Masters theses)". Bar chart with languages on the horizontal axis (Amharic, Afan Oromo, Tigrigna, Welayta, Geez, Sidama), a vertical scale from 0 to 25, and series by research area (NLP, Speech Processing, OCR, CSE).]
  • 12. Challenges and Limitations
    ● Challenges that hinder Ethiopian HLT include:
      – Lack of language resources: speech and text corpora
      – Lack of standardized evaluation corpora and platforms
      – Lack of expertise in both language and technology
      – Time shortage
        ● Research is done only for academic achievement within the given time
      – Absence of a national HLT research plan (HLT road-map)
        ● Research is based only on individuals' interests
      – Lack of sustainable and coordinated research funding
  • 13. Challenges and Limitations
    ➔ Existing HLT works have limitations:
      – Use of insufficient and low-quality language resources
        ➢ Research results are not conclusive
      – Research results are not well evaluated, analyzed and documented
        ➢ Their achievements and gaps are vague
      – Research attempts in HLT are fragmented
        ➢ Lack of integration, consolidation and continuity
    [Diagram: Tokenizer, POS, Parser, LA, ASR/MT]
  • 14. Future Directions and Strategies
    ● Is there any way to escape or cover the cost of the language barrier in the information age without HLT? NO!!!
    ● Are we rich enough to continue spending on purely academic exercises? NO!!!
      – 6 months of work by at least 10 research students every year, each doing a thesis in one of the HLT areas, plus their supervisors' time
      – 3 years of work by at least 6 PhD research students (admitted every year), plus their research supervisors' time
      – The time of academic researchers doing research for publication (for academic promotion)
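The student effort listed on slide 14 can be totalled with a short calculation. The following is only a rough sketch: it uses the minimum figures quoted on the slide, assumes a steady state in which three PhD cohorts overlap because one cohort is admitted every year, and leaves out supervisor and researcher time, for which the slide gives no numbers. The function and constant names are hypothetical.

```python
# Rough estimate of student research time spent on HLT per year,
# using only the minimum figures quoted on slide 14.
MASTERS_STUDENTS_PER_YEAR = 10  # "at least 10 research students"
MASTERS_THESIS_MONTHS = 6       # "6 months"
PHD_STUDENTS_PER_COHORT = 6     # "at least 6 PhD research students"
PHD_DURATION_YEARS = 3          # "3 years"


def yearly_student_months() -> dict:
    """Return student-months spent per year, by degree level and in total."""
    masters = MASTERS_STUDENTS_PER_YEAR * MASTERS_THESIS_MONTHS
    # Assumption: a new cohort is admitted every year, so in steady state
    # PHD_DURATION_YEARS cohorts are active at once, each for 12 months.
    phd = PHD_STUDENTS_PER_COHORT * PHD_DURATION_YEARS * 12
    return {"masters": masters, "phd": phd, "total": masters + phd}


if __name__ == "__main__":
    effort = yearly_student_months()
    print(effort)                # student-months per year
    print(effort["total"] / 12)  # the same total in student-years
```

Under these assumptions the lower bound is 276 student-months, or 23 student-years, of research effort every year before any supervisor time is counted.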
  • 15. Future Directions and Strategies
    ● Give emphasis and recognition to R&D activities in HLT
    ● Develop a national HLT road-map (HLT Audit)
      – Shows research priorities
      – Avoids duplication (even across languages)
      – Reduces R&D cost
      – Provides a means of evaluation/assessment
      – Enforces consolidation, integration and continuity
      – Inspires researchers and developers
      – Shows the benefit areas for the HLT industry
  • 16. Future Directions and Strategies
    ● Establish institutional/national R&D units
      – Fund, coordinate and evaluate R&D projects
      – Store, maintain and distribute language resources and R&D outputs
      – Promote the use of R&D outputs
      – Coordinate and support private industries
      – Coordinate cooperation between academia and industry
      – Promote/attract international investment in HLT industries
  • 17. Conclusion
    ● We have 85 living languages
    ● All have speakers who need information and have the right to get it in a language and in a way they understand
      – HLT is the way to realize it
    ● We need a strategy to put it in place
      – Cooperation across:
        ● Time: past->present->future
        ● Language
        ● Research area
        ● Sector: academic<->industry
  • 18. We can make it BY