IFE-MT: An English-to-Yorùbá Machine Translation System

© Eludiora, S. I., Salawu, S. A., Odejobi, O. A. & Agbeyangi, A.O.

  1. IFE-MT: An English-to-Yorùbá Machine Translation System
     *Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and *Agbeyangi, A. O.
     *Department of Computer Science & Engineering, +Department of Linguistics & African Languages
     Obafemi Awolowo University, Ile-Ife, Nigeria
     AGIS11 UNECA CONFERENCE 1-2 DEC. 2011
  2. In this Presentation
     1) Introduction
     2) Theoretical issues
        a) Features of the English & Yorùbá languages
        b) Machine translation process
     3) Practical issues
        a) Data acquisition
        b) System design
        c) Software development
        d) System implementation
  3. Introduction
     Machine translation (MT) is the application of computers to the task of translating texts or speech from one natural language to another (Blank, 1998). An English-to-Yorùbá (E-Y) MT system translates English text into Yorùbá text.
  4. MT Conceptualisation
  5. MT Paradigm
     1) Text → Text
     2) Speech → Speech
     3) Text → Speech
     4) Speech → Text
  6. Research Theory
     Theories/assumptions:
     a) Yorùbá expression moves from the concrete to the abstract, whereas English expression moves from the abstract to the concrete.
     b) A natural language has at most 400 active words.
     c) Turing test theory for evaluation (the Turing test assesses a machine's ability to exhibit intelligent behavior), applied using mean opinion score (MOS).
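The evaluation assumption above relies on mean opinion score. MOS is the arithmetic mean of the ratings that human judges give each output. The slides do not state the rating scale or how scores are aggregated. The sketch below therefore assumes a 1-5 scale, uses made-up ratings, and reports both a per-sentence MOS and a system-level MOS taken over every individual rating.

```python
from statistics import fmean

# Assumed rating scale (not given in the slides): 1 = worst, 5 = best.
LOW, HIGH = 1, 5


def mean_opinion_score(scores: list[int]) -> float:
    """Return the arithmetic mean of opinion scores on the LOW..HIGH scale."""
    if not scores:
        raise ValueError("at least one score is required")
    for score in scores:
        if not LOW <= score <= HIGH:
            raise ValueError(f"score {score} is outside {LOW}-{HIGH}")
    return fmean(scores)


# Hypothetical ratings: four evaluators each score two translated sentences.
ratings = {
    "sentence_1": [5, 4, 4, 5],
    "sentence_2": [3, 3, 4, 2],
}

# MOS per sentence, then one system-level MOS over every individual rating.
per_sentence = {sid: mean_opinion_score(s) for sid, s in ratings.items()}
system_mos = mean_opinion_score([s for scores in ratings.values() for s in scores])

print(per_sentence)  # {'sentence_1': 4.5, 'sentence_2': 3.0}
print(system_mos)    # 3.75
```

Averaging per sentence first and then averaging those means gives the same result here, because every sentence has the same number of raters. With unequal rater counts the two aggregations would differ.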
  7. Features of English & Yorùbá languages
     Feature | English | Yorùbá
     Prosody | Stress language, e.g. record (N) vs record (V), commit (N) vs commit (V), read (present) vs read (past) | Tone language, e.g. Agba, gba mọ
     Rhythm | Stress-timed (intonation), e.g. "He found it on the street?", "How did you ever escape?" | Syllable-timed, e.g. Baba
     Orthography | Non-phonetic, e.g. enough, fish | Almost phonetic, e.g. gba, Ẹdẹ
     Resources | High-resource language | Low-resource language
     Morphology | Inflectional, e.g. wait / waits / waited / waiting; go / goes / went / gone / going | Non-inflectional, e.g. o ro / ti ro / ro; o lọ / ti lọ / lọ
     Grammatical structure | Subject-Verb-Object (SVO), e.g. The boy ...; old man ... | Subject-Verb-Object (SVO), e.g. ... nrin ao ...; ... lagba
  8. English-to-Yorùbá Machine Translation System Challenges
     1) The translation process: although both languages are SVO, translation between them is not straightforward (culturally bound words and concepts)
     2) Domain selection problem
     3) Lack of language resources
     4) Orthography typesetting problem
  9. Language resources challenge
     Source | Correct orthography | Parallel corpora/quality | Digital | Domain | Annotated | Size | Form
     Resources on the Internet | Not fully dialectically marked and punctuated | Available/poor quality, e.g. the Jehovah's Witnesses | Available | General (not domain specific) | Not annotated | Large enough | Text form
     Religious textbooks or documents | Divergent | Contextually deficient, e.g. the Jehovah's Witnesses | Mostly hardcopy | Specific (religious) | Not annotated | Large | Mostly
     Nigerian newspapers | Poor | Not available | Not all are digitized | Not domain specific | Not annotated | Small | All are in text form
     Radio & TV (media) | Not in text form | Speech/poor translation | Available on magnetic disc | General | Not applicable | Large enough | Non-textual
     Government documents | Mostly in English | Not available | Available | Multiple domains | Not annotated | Sizeable volume | Text form
     Textbooks/manuals/reports | Mostly English | Not available | Not all are digitized | Specific | Not POS annotated | Sizeable | Text form
  10. Database Design (cont.)
     Data 1: sentences systematically collected using home-environment terminology (domain)
     Data 2: lexical items extracted from Data 1
     Data 3: Data 1 and Data 2 annotated with POS tags
     Data 4: Data 3 represented in the format designed for the MT database
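The four data stages above can be sketched as a small pipeline that runs from collected sentences to lexicon records. This is a minimal illustration under stated assumptions, not the IFE-MT database format.

- The only sentence pair used is the one from the parser slide.
- The POS tags follow that slide's tag names (N, V, P, Det).
- The English-Yorùbá word pairing (`ALIGN`) is a hypothetical hand-made alignment. The slides do not say how words are paired.
- The names `LexiconEntry`, `extract_lexical_items` and `build_lexicon` are illustrative only.

```python
from dataclasses import dataclass

# Data 1 (stand-in): one sentence pair taken from the parser slide.
DATA_1 = [("Ade sat on the chair", "Ade jokoo sori aga naa")]

# Data 3 (assumed): POS tags for English words, using the parser slide's tags.
POS = {"Ade": "N", "sat": "V", "on": "P", "the": "Det", "chair": "N"}

# Hypothetical word alignment pairing each English word with a Yorùbá word.
ALIGN = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


@dataclass(frozen=True)
class LexiconEntry:
    """One Data 4 record (hypothetical layout): word, POS tag, Yorùbá word."""
    english: str
    pos: str
    yoruba: str


def extract_lexical_items(sentences):
    """Data 2: unique English word forms, in order of first appearance."""
    seen = []
    for english, _yoruba in sentences:
        for word in english.split():
            if word not in seen:
                seen.append(word)
    return seen


def build_lexicon(words, pos_tags, alignment):
    """Data 4: merge lexical items, POS tags and alignments into records."""
    return {w: LexiconEntry(w, pos_tags[w], alignment[w]) for w in words}


lexicon = build_lexicon(extract_lexical_items(DATA_1), POS, ALIGN)
for entry in lexicon.values():
    print(entry.english, entry.pos, entry.yoruba)
```

The lexicon is keyed by the English word form. This keying suits the English-to-Yorùbá direction, where the translator looks up each English leaf of a parse tree.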
  11. Lexicon database
  12. Database Design (cont.)
  17. Software Development and Implementation Process
     Software tools:
     a) Python
     b) PyQt
     c) NLTK
  18. Parser: Natural Language Toolkit (NLTK)
     English: Ade sat on the chair
     (S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))
     Yorùbá: Ade jokoo sori aga naa
     (S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
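The two trees above differ in lexical choice and in one word-order change: "the chair" becomes "aga naa", with the determiner following the noun. The sketch below is a small rule-based transfer step that reproduces this single example. It reads the bracketed English tree, swaps each English word for its Yorùbá equivalent, and moves a Det that directly precedes an N inside an NP.

Several parts are assumptions rather than the IFE-MT implementation:

- The lexicon is a hypothetical one built from this sentence pair.
- The reordering rule is inferred from this one example.
- The sketch does not reproduce the PP node in the slide's Yorùbá tree.

NLTK's `Tree.fromstring` can read the same bracketed notation. A small stdlib reader is used here so the sketch runs without NLTK.

```python
import re

# Bracketed parse of the English example, copied from the parser slide.
ENGLISH = "(S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))"

# Hypothetical bilingual lexicon built from the slide's sentence pair.
LEXICON = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


def parse(text):
    """Read a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def move_det_after_noun(children):
    """Assumed rule: inside an NP, a Det directly before an N moves after it."""
    out = list(children)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == "Det" and b[0] == "N":
            out[i], out[i + 1] = b, a
            i += 2
        else:
            i += 1
    return out


def transfer(tree):
    """Translate leaf words via LEXICON and apply the NP reordering rule."""
    label, children = tree
    new = [transfer(c) if isinstance(c, tuple) else LEXICON.get(c, c) for c in children]
    if label == "NP":
        new = move_det_after_noun(new)
    return (label, new)


def to_str(tree):
    """Write a (label, children) tree back in bracketed notation."""
    label, children = tree
    parts = [to_str(c) if isinstance(c, tuple) else c for c in children]
    return f"({label} {' '.join(parts)})"


def leaves(tree):
    """Collect the leaf words of a tree from left to right."""
    words = []
    for c in tree[1]:
        words.extend(leaves(c) if isinstance(c, tuple) else [c])
    return words


yoruba = transfer(parse(ENGLISH))
print(to_str(yoruba))
print(" ".join(leaves(yoruba)))  # Ade jokoo sori aga naa
```

Words missing from the lexicon pass through unchanged (`LEXICON.get(c, c)`). A fuller system would need richer rules for the qualified-subject/object and modified-verb sentence types listed in the demonstration slide.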
  19. Program Coding
     Software modules:
     a) Library
     b) Parser
     c) GUI
  20. Software Demonstration
     a) Basic SVO sentences
     b) SVO sentences with a qualified subject/object
     c) SVO sentences with a modified verb
  21. Software Demonstration: http://www.ifecisrg.org/IfeMT
  22. Conclusion
     In this presentation, I have discussed:
     a) theoretical and practical issues relating to the development of IFE-MT;
     b) database design and library design;
     c) the software development process and program coding.
     The IFE-MT software was demonstrated. We are now updating the database and evaluating the MT system using mean opinion score.
  23. Some Related Work
     Shquier, M. A. and AL-Nabhan, M. (2010), “Rule-based approach to tackle agreement and word-ordering in English-Arabic machine translation”, http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf
     Anand, K. M., Dhanalakshmi, V., Soman, K. P. and Rajendran, S. (2010), “A Sequence Labeling Approach to Morphological Analyzer for Tamil Language”, International Journal on Computer Science and Engineering, Vol. 02, No. 06, pp. 1944-1951
     Barkade, V. M. and Devale, P. R. (2010), “English to Sanskrit machine Translation semantic mapper”, International Journal of Engineering Science and Technology, Vol. 2(10), pp. 5313-5318
  24. Related Work (cont.)
     Batra, K. K. and Lehal, G. S. (2010), “Rule Based Machine Translation of Noun Phrases from Punjabi to English”, International Journal of Computer Science Issues, Vol. 7, Issue 5, September, ISSN (Online): 1694-0814
     Tyers, F. M. and Nordfalk, J. (2009), “Shallow transfer rule-based machine translation for Swedish to Danish”, In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pp. 27-33, Alicante.
     Tyers, F. M. (2010), “Rule-based Breton to French machine translation”, European Association for Machine Translation (EAMT), May 2010, St Raphael, France, http://www.mt-archive.info/EAMT-2010-Tyers.pdf
  25. References
     Blank, D. (1998), “Definition of Machine Translation”, http://www.macalester.edu/courses/russ65/definiti.htm [Accessed 02/10/2010]
  26. Thank you for listening
