IFE-MT: An English-to-Yorùbá Machine Translation System

© Eludiora, S. I., Salawu, S. A., Odejobi, O. A. & Agbeyangi, A.O.

Transcript

  • 1. IFE-MT: An English-to-Yorùbá Machine Translation System
    *Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and *Agbeyangi, A. O.
    *Department of Computer Science & Engineering; +Department of Linguistics & African Languages; Obafemi Awolowo University, Ile-Ife, Nigeria
    AGIS11 UNECA Conference, 1-2 Dec. 2011
  • 2. In this presentation: 1) Introduction; 2) Theoretical issues: a) features of the English and Yorùbá languages, b) the machine translation process; 3) Practical issues: a) data acquisition, b) system design, c) software development, d) system implementation.
  • 3. Introduction. Machine translation (MT) is the application of computers to the task of translating texts or speech from one natural language to another (Blank, 1998). An English-to-Yorùbá (E-Y) MT system translates English text into Yorùbá text.
  • 4. MT Conceptualisation
  • 5. MT paradigms: 1) text → text; 2) speech → speech; 3) text → speech; 4) speech → text.
  • 6. Research theory. Theories/assumptions: a) Yorùbá expression moves from the concrete to the abstract, whereas English expression moves from the abstract to the concrete. b) A natural language has at most 400 active words. c) Turing test theory for evaluation (a test of a machine's ability to exhibit intelligent behaviour), using mean opinion score.
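The slide names mean opinion score (MOS) as the evaluation measure but does not state the rating scale or how ratings are aggregated. A minimal sketch, assuming a 1-5 rating scale and MOS taken as the arithmetic mean of human ratings, both per translated sentence and pooled over the whole test set; the function name `mean_opinion_score` and the input layout are hypothetical.

```python
from statistics import mean

# Assumed rating scale (a common MOS convention; the slides do not give one).
MIN_SCORE, MAX_SCORE = 1, 5


def mean_opinion_score(
    ratings: dict[str, list[int]],
) -> tuple[dict[str, float], float]:
    """Return (per-sentence MOS, overall MOS).

    `ratings` maps a (hypothetical) sentence ID to the scores the human
    evaluators gave its machine translation.
    """
    per_sentence: dict[str, float] = {}
    for sentence_id, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {sentence_id!r}")
        for score in scores:
            if not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(f"rating {score} outside {MIN_SCORE}-{MAX_SCORE}")
        per_sentence[sentence_id] = mean(scores)
    # Overall MOS pools every individual rating (not a mean of means).
    overall = mean(s for scores in ratings.values() for s in scores)
    return per_sentence, overall
```

Pooling all ratings weights sentences by how many evaluators rated them; averaging the per-sentence values instead would weight every sentence equally. Which one IFE-MT uses is not stated.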
  • 7. Features of the English and Yorùbá languages
    English | Yorùbá
    Stress language: record (N) vs record (V); commit (N) vs commit (V); read (present) vs read (past) | Tone language: Agba; gba mọ
    Intonation-timed: "He found it on the street?"; "How did you ever escape?" | Syllable-timed: Baba
    Orthography non-phonetic: enough, fish | Orthography almost phonetic: gba, Ẹdẹ
    Resource-rich language | Low-resource language
    Inflectional: wait, waits, waited, waiting; go, goes, went, gone, going | Non-inflectional: o ro, ti ro ro; o lọ, ti lọ lọ
    Grammatical structure Subject-Verb-Object (SVO): The boy; old man | Grammatical structure Subject-Verb-Object (SVO): nrin; ao lagba
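Because Yorùbá verbs do not inflect, a translator has to recover tense/aspect from the English word form and express it as a separate word before the unchanged Yorùbá verb. A minimal sketch, assuming a hand-built table of English forms (the names `ENGLISH_FORMS`, `VERB_LEMMAS`, `ASPECT_MARKERS` and `translate_verb` are hypothetical), the slide's pair go → lọ, the perfect marker `ti` shown on the slide, and the Yorùbá progressive marker `ń`; how IFE-MT itself handles this is not described on the slides.

```python
# Hypothetical lexicon: English verb form -> (lemma, aspect).
ENGLISH_FORMS: dict[str, tuple[str, str]] = {
    "go": ("go", "simple"),
    "went": ("go", "simple"),
    "gone": ("go", "perfect"),
    "going": ("go", "progressive"),
}

# English lemma -> uninflected Yorùbá verb (from the slide: go -> lọ).
VERB_LEMMAS: dict[str, str] = {"go": "lọ"}

# Aspect is a separate pre-verbal word in Yorùbá, not a verb ending.
ASPECT_MARKERS: dict[str, str] = {
    "simple": "",        # bare verb
    "perfect": "ti",     # e.g. ti lọ
    "progressive": "ń",  # e.g. ń lọ
}


def translate_verb(english_form: str) -> str:
    """Map an inflected English verb to [marker] + uninflected Yorùbá verb."""
    lemma, aspect = ENGLISH_FORMS[english_form.lower()]
    verb = VERB_LEMMAS[lemma]
    marker = ASPECT_MARKERS[aspect]
    return f"{marker} {verb}" if marker else verb


if __name__ == "__main__":
    for form in ("went", "gone", "going"):
        print(form, "->", translate_verb(form))
```

The sketch ignores the subject pronoun (such as ó) and forms like "goes", whose Yorùbá rendering depends on context.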
  • 8. English-to-Yorùbá machine translation system challenges: 1) although both languages are SVO, translation between them is not straightforward (culture-bound words and concepts); 2) the domain selection problem; 3) lack of language resources; 4) the orthography typesetting problem.
  • 9. Language resources challenge
    Source | Correct orthography | Parallel corpora/quality | Digital | Domain-specific | Annotated | Size | Textual
    Resources on the Internet | Not fully dialectically marked and punctuated | Available/poor quality, e.g. The Jehovah Witness | Available | General (not domain-specific) | Not annotated | Large enough | Text form
    Religious books/documents | Divergent or deficient | Contextual, e.g. The Jehovah Witness | Mostly hardcopy | Specific (religious) | Not annotated | Large | Mostly text
    Nigerian newspapers | Poor | Not available | Not all are digitised | Not domain-specific | Not annotated | Small | All are in text form
    The radio & TV (media) | Not in text form | Speech/poor translation | Available on magnetic disc | General | Not applicable | Large enough | Non-textual
    Government documents | Mostly English | Not available in English | Available | Multiple domains | Not annotated | Sizeable volume | Text form
    Textbooks/manuals/reports | Mostly English | Not available | Not all are digitised | Specific | Not POS-annotated | Sizeable | Text form
  • 10. Database design (cont.). Data 1: sentences systematically collected using home-environment terminology (the domain). Data 2: lexical items extracted from Data 1. Data 3: Data 1 and Data 2 annotated with part-of-speech (POS) tags. Data 4: Data 3 represented in the format designed for the MT database.
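The four data stages can be pictured as a small pipeline from collected sentences to database records. A minimal sketch, assuming a tab-separated line per lexical item as the "MT database format" (the slides do not show the real format) and hand-supplied POS tags and glosses standing in for the annotators; `LexiconEntry`, `extract_lexical_items` and `annotate` are hypothetical names, and the sample words and tags come from the parser slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One annotated lexical item (Data 3)."""
    english: str
    pos: str
    yoruba: str

    def to_record(self) -> str:
        # Data 4: one tab-separated line per entry (assumed format).
        return "\t".join((self.english, self.pos, self.yoruba))


def extract_lexical_items(sentences: list[str]) -> list[str]:
    """Data 2: unique word types from the Data 1 sentences, first-seen order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for sentence in sentences:
        for token in sentence.split():
            word = token.strip(".,?!")
            if word:
                seen.setdefault(word, None)
    return list(seen)


def annotate(words: list[str], pos_tags: dict[str, str],
             glosses: dict[str, str]) -> list[LexiconEntry]:
    """Data 3: attach a POS tag and a Yorùbá gloss to each lexical item."""
    return [LexiconEntry(w, pos_tags[w], glosses[w]) for w in words]


if __name__ == "__main__":
    data1 = ["Ade sat on the chair."]
    data2 = extract_lexical_items(data1)
    tags = {"Ade": "N", "sat": "V", "on": "P", "the": "Det", "chair": "N"}
    glosses = {"Ade": "Ade", "sat": "jokoo", "on": "sori",
               "the": "naa", "chair": "aga"}
    for entry in annotate(data2, tags, glosses):
        print(entry.to_record())
```

Word types are kept case-sensitive so that proper nouns such as Ade survive; a real collection would also need to merge sentence-initial capitals like "The" with "the".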
  • 11. Lexicon database
  • 12. Database design (cont.)
  • 17. Software development and implementation process. Software tools: a) Python, b) PyQt, c) NLTK.
  • 18. Parser: Natural Language Toolkit (NLTK)
    English: Ade sat on the chair
    (S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))
    Yorùbá: Ade jokoo sori aga naa
    (S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
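The two trees differ in three ways: each word is replaced by its Yorùbá equivalent, the determiner moves after the noun, and the preposition sits in its own PP node. A minimal sketch of that transfer step in plain Python, assuming trees are nested tuples `(label, child, ...)` with string leaves (NLTK's `Tree` class could play the same role) and a hypothetical four-word lexicon taken from the slide; it covers only the rules visible in this one example, not IFE-MT's full rule set.

```python
# Bilingual lexicon from the slide's example; unknown words (e.g. the
# name Ade) pass through unchanged.
LEXICON = {"sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


def transfer(tree: tuple) -> tuple:
    """Map an English parse tree to a Yorùbá one."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        return (label, LEXICON.get(word, word))  # lexical transfer
    new_children = [transfer(child) for child in children]
    if label == "NP":
        # Structural transfer: Yorùbá determiners follow the noun ...
        dets = [c for c in new_children if c[0] == "Det"]
        rest = [c for c in new_children if c[0] != "Det"]
        # ... and a bare preposition is wrapped in a PP node.
        rest = [("PP", c) if c[0] == "P" else c for c in rest]
        new_children = rest + dets
    return (label, *new_children)


def to_bracketed(tree: tuple) -> str:
    label, *children = tree
    parts = [c if isinstance(c, str) else to_bracketed(c) for c in children]
    return f"({label} {' '.join(parts)})"


def leaves(tree: tuple) -> list[str]:
    _, *children = tree
    words: list[str] = []
    for c in children:
        words.extend([c] if isinstance(c, str) else leaves(c))
    return words


if __name__ == "__main__":
    english = ("S", ("NP", ("N", "Ade")),
               ("VP", ("V", "sat"),
                ("NP", ("P", "on"), ("Det", "the"), ("N", "chair"))))
    yoruba = transfer(english)
    print(to_bracketed(yoruba))
    # (S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
    print(" ".join(leaves(yoruba)))  # Ade jokoo sori aga naa
```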
  • 19. Program coding. Software modules: a) Library, b) Parser, c) GUI.
  • 20. Software demonstration: a) basic SVO sentences, b) qualified-subject/object SVO sentences, c) modified-verb SVO sentences.
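For qualified subjects and objects, the main reordering happens inside the noun phrase: English puts determiners and adjectives before the noun, whereas Yorùbá puts them after it (noun, then adjective, then determiner). A minimal sketch over POS-tagged tokens, assuming the tags are already assigned and using a hypothetical mini-lexicon (`aga` and `naa` from the parser slide, plus `nla` for "big", written without tone marks like the slides' examples); the rules IFE-MT actually applies are not shown on the slides.

```python
# Hypothetical mini-lexicon: English word -> Yorùbá word (tones omitted).
LEXICON = {"the": "naa", "big": "nla", "chair": "aga"}


def reorder_np(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Reorder a simple NP (determiners, adjectives, one noun) into the
    Yorùbá order N Adj* Det*, keeping the relative order within each group."""
    nouns = [t for t in tagged if t[1] == "N"]
    adjs = [t for t in tagged if t[1] == "Adj"]
    dets = [t for t in tagged if t[1] == "Det"]
    if len(nouns) != 1 or len(nouns) + len(adjs) + len(dets) != len(tagged):
        raise ValueError("sketch only handles Det/Adj/N phrases with one noun")
    return nouns + adjs + dets


def translate_np(tagged: list[tuple[str, str]]) -> str:
    """Reorder, then substitute words from the lexicon."""
    return " ".join(LEXICON.get(word, word) for word, _ in reorder_np(tagged))


if __name__ == "__main__":
    print(translate_np([("the", "Det"), ("big", "Adj"), ("chair", "N")]))
```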
  • 21. Software demonstration: http://www.ifecisrg.org/IfeMT
  • 22. Conclusion. In this presentation, I have discussed: theoretical and practical issues relating to the development of IFE-MT; database design and library design; the software development process and program coding. The IFE-MT software was demonstrated. We are now updating the database and evaluating the MT system using mean opinion score.
  • 23. Some related work
    Shquier, M. A. and AL-Nabhan, M. (2010), "Rule-based approach to tackle agreement and word-ordering in English-Arabic machine translation", http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf
    Anand, K. M., Dhanalakshmi, V., Soman, K. P. and Rajendran, S. (2010), "A Sequence Labeling Approach to Morphological Analyzer for Tamil Language", International Journal on Computer Science and Engineering, Vol. 02, No. 06, pp. 1944-1951.
    Barkade, V. M. and Devale, P. R. (2010), "English to Sanskrit Machine Translation Semantic Mapper", International Journal of Engineering Science and Technology, Vol. 2(10), pp. 5313-5318.
  • 24. Related work (cont.)
    Batra, K. K. and Lehal, G. S. (2010), "Rule Based Machine Translation of Noun Phrases from Punjabi to English", International Journal of Computer Science Issues, Vol. 7, Issue 5, September, ISSN (Online): 1694-0814.
    Tyers, F. M. and Nordfalk, J. (2009), "Shallow-transfer rule-based machine translation for Swedish to Danish", in Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pp. 27-33, Alicante.
    Tyers, F. M. (2010), "Rule-based Breton to French machine translation", European Association for Machine Translation (EAMT), May 2010, St Raphael, France, http://www.mt-archive.info/EAMT-2010-Tyers.pdf
  • 25. References
    Blank, D. (1998), Definition of Machine Translation, http://www.macalester.edu/courses/russ65/definiti.htm [Accessed 02/10/2010]
  • 26. Thank you for listening.