IFE-MT: An English-to-Yorùbá
       Machine Translation System

*Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and
                 *Agbeyangi, A.O.




      *Department of Computer Science & Engineering

         +Dept. of Linguistics & African Languages

               Obafemi Awolowo University,
                     Ile-Ife, Nigeria



                 AGIS'11 UNECA CONFERENCE 1-2 DEC. 2011   1
In this Presentation..
1) Introduction
2) Theoretical Issues
   a) Features of English & Yorùbá languages
   b) Machine translation process
3) Practical issues
   a) Data acquisition
   b) System design
   c) Software development
   d) System implementation

Introduction


Machine translation (MT) is the application of
computers to the task of translating text or speech
from one natural language to another (Blank, 1998).



An English-to-Yorùbá (E-Y) MT system translates
English text to Yorùbá text.


MT Conceptualisation




MT Paradigm

                         1)Text → Text
                         2)Speech → Speech
                         3)Text → Speech
                         4)Speech → Text




Research Theory
Theories/Assumptions

a) Yorùbá expression moves from concrete to
     abstract, but English expression moves from
     abstract to concrete.

b) Natural language has at most 400 active words.

c) Turing test theory for evaluation (a test of a
   machine’s ability to exhibit intelligent behavior),
   using mean opinion score
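
A mean opinion score is the arithmetic mean of the ratings that human evaluators give to the system's output. The sketch below shows that calculation only. The 1-5 rating scale, the sample ratings and the function name are assumptions for illustration; the slides do not state the scale or the evaluation protocol.

```python
from statistics import mean

# Hypothetical ratings: each evaluator scores one translated sentence
# from 1 (bad) to 5 (excellent). The 1-5 scale is an assumption.
ratings = {
    "Ade sat on the chair": [4, 5, 3, 4],
}


def mean_opinion_score(scores):
    """Return the arithmetic mean of the evaluators' scores."""
    if not scores:
        raise ValueError("at least one rating is required")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("ratings must lie on the assumed 1-5 scale")
    return mean(scores)


# One score per sentence, then an overall score for the system.
per_sentence = {s: mean_opinion_score(r) for s, r in ratings.items()}
system_mos = mean(list(per_sentence.values()))
print(per_sentence, system_mos)
```

Averaging per sentence first keeps a sentence with many raters from dominating the overall figure.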


Features of English & Yorùbá languages
ENGLISH                                              YORÙBÁ
Stressed                                             Tone language
    Record (N) | record (V)                              Agba
    Commit (N) | commit (V)                              gba
    Read (present) | read (past)                         mọ

Intonation time                                      Syllable timed
    He found it on the street?                           Baba
    How did you ever escape?

Orthography: non-phonetic                            Orthography: almost phonetic
    enough                                               gba
    fish                                                 Ẹdẹ

Large-resource language                              Low-resource language

Inflectional                                         Non-inflectional
    wait | waits | waited | waiting                      o   ro | ti  ro   ro
    go | goes | went | gone | going                      o lọ | ti lọ  lọ

Grammatical structure                                Grammatical structure
Subject Verb Object (SVO)                            Subject Verb Object (SVO)
    The boy                                              nrin     a
    old man                                              lagba

English-to-Yorùbá Machine
    Translation System Challenges
1) The translation process
   Both languages are SVO, but translation is not straightforward
   (culture-bound words and concepts)

2) Domain selection problem

3) Lack of language resources

4) Orthography typesetting problem
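
One plausible facet of the typesetting problem, assumed here since the slide gives no detail, is that a Yorùbá letter with a tone mark or under-dot can be stored either as one precomposed character or as a base letter plus combining marks. The two forms look identical but compare as different strings. A minimal sketch using Python's standard `unicodedata` module; the helper name `normalize_yoruba` is hypothetical.

```python
import unicodedata


def normalize_yoruba(text):
    """Put text into Unicode NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# "Yorùbá" typed with combining grave and acute accents ...
decomposed = "Yoru\u0300ba\u0301"
# ... and the same word typed with precomposed letters.
composed = "Yor\u00f9b\u00e1"

print(decomposed == composed)                    # False: different code points
print(normalize_yoruba(decomposed) == composed)  # True after normalisation

# An o with both an under-dot and a high tone has no single precomposed
# character, so NFC leaves it as two code points.
dotted_high_o = normalize_yoruba("o\u0301\u0323")
print(len(dotted_high_o))
```

Normalising every string before it enters the lexicon would let dictionary lookups succeed regardless of how the text was typed.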

Language resources challenge
Sources | Correct orthography | Parallel corpora/quality | Digital | Domain specific | Annotated | Size | Textual
Resources on the Internet | Not fully diacritically marked and punctuated | Available/poor quality, e.g. the Jehovah's Witnesses | Available | General (not domain specific) | Not annotated | Large enough | Text form
Religious books or documents | Divergent | Contextually deficient, e.g. the Jehovah's Witnesses | Mostly hardcopy | Specific (religious) | Not annotated | Large | Mostly text
Nigerian newspapers | Poor | Not available | Not all are digitized | Not domain specific | Not annotated | Small | All are in text form
The radio & TV (media) | Not in text form | Speech/poor translation | Available in magnetic disc | General | Not applicable | Large enough | Non-textual
Government documents | Mostly English | Not available | Available in English | Multiple domains | Not annotated | Sizeable volume | Text form
Textbooks/manuals/reports | Mostly English | Not available | Not all are digitized | Specific | Not POS annotated | Sizeable | Text form




Database Design Cont.
Data 1: Sentences systematically collected using
  home environment terminologies (domain)

Data 2: Lexical items extracted from Data 1

Data 3: Data 1 and Data 2 annotated with POS tags

Data 4: Data 3 represented in the format
  designed for the MT database
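
The four data stages above can be pictured as a small record structure. This is a sketch under stated assumptions, not the IFE-MT schema: the class names, field names and the `extract_lexicon` helper are hypothetical, and the sample sentence and POS tags are taken from the parser slide of this presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalItem:
    """One POS-tagged word pair (Data 2 plus its Data 3 annotation)."""
    english: str
    yoruba: str
    pos: str


@dataclass
class SentencePair:
    """One collected sentence with its translation (Data 1)."""
    english: str
    yoruba: str
    domain: str = "home"
    items: List[LexicalItem] = field(default_factory=list)


def extract_lexicon(pairs: List[SentencePair]) -> Dict[str, LexicalItem]:
    """Collect unique lexical items from the sentences, keyed by English word."""
    lexicon: Dict[str, LexicalItem] = {}
    for pair in pairs:
        for item in pair.items:
            # Keep the first entry seen for each English word.
            lexicon.setdefault(item.english.lower(), item)
    return lexicon


pair = SentencePair(
    english="Ade sat on the chair",
    yoruba="Ade jokoo sori aga naa",
    items=[
        LexicalItem("Ade", "Ade", "N"),
        LexicalItem("sat", "jokoo", "V"),
        LexicalItem("on", "sori", "P"),
        LexicalItem("the", "naa", "Det"),
        LexicalItem("chair", "aga", "N"),
    ],
)
lexicon = extract_lexicon([pair])
print(lexicon["sat"].yoruba, lexicon["sat"].pos)
```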
Lexicon database




Database Design Cont.




Software Development and
 Implementation Process

Software tools:
  a) Python

  b) PyQt

  c) NLTK



Parser
                       Natural Language Toolkit (NLTK)
English: Ade sat on the chair
(S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))

Yorùbá: Ade jokoo sori aga naa
(S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
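
The step from the English tree to the Yorùbá tree can be sketched as a tree-to-tree transfer: translate each word, then move the determiner after the noun inside a noun phrase. The code below is a standard-library stand-in, not the NLTK-based IFE-MT parser. The function names, the five-word lexicon and the single reordering rule are assumptions for illustration, and the sketch does not reproduce the extra PP node shown in the Yorùbá tree.

```python
def parse_tree(text):
    """Read a bracketed tree such as "(NP (Det the) (N chair))" into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = [tokens[pos]]          # first token after "(" is the label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())   # nested subtree
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # skip the closing ")"
        return node

    return read()


# Hypothetical lexicon covering only the example sentence.
LEXICON = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


def transfer(node):
    """Translate the words and move Det to the end of each NP."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [label, LEXICON.get(children[0], children[0])]
    moved = [transfer(child) for child in children]
    if label == "NP":
        moved = ([c for c in moved if c[0] != "Det"]
                 + [c for c in moved if c[0] == "Det"])
    return [label] + moved


def leaves(node):
    """Return the words of a tree from left to right."""
    words = []
    for child in node[1:]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


english = "(S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))"
yoruba_tree = transfer(parse_tree(english))
print(" ".join(leaves(yoruba_tree)))
```

For the example sentence this prints the Yorùbá word order given on the slide, `Ade jokoo sori aga naa`.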




Program Coding

Software Modules:

 a) Library
 b) Parser
 c) GUI




Software Demonstration

a) basic SVO sentences

b) qualified subject/object SVO sentences

c) modified verb SVO sentences




Software Demonstration
http://www.ifecisrg.org/IfeMT




Conclusion
In this presentation, we have discussed:

Theoretical and practical issues relating to our IFE-MT
  development

Database design, Library design

Software development process, and Program coding

The IFE-MT software was demonstrated

We are now updating the database and evaluating the MT system
  using mean opinion score.


Some Related Work
Shquier, M. A. and AL-Nabhan, M. (2010), “Rule-based approach to
  tackle agreement and word-ordering in English-Arabic machine
  translation”,
  http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf

Anand, K. M., Dhanalakshmi, V., Soman, K. P. and
  Rajendran, S. (2010), “A Sequence Labeling Approach to
  Morphological Analyzer for Tamil Language”, International Journal
  on Computer Science and Engineering, Vol. 02, No. 06, pp.
  1944-1951

Barkade, V. M. and Devale, P. R. (2010), “English to Sanskrit machine
  translation semantic mapper”, International Journal of Engineering
  Science and Technology, Vol. 2(10), pp. 5313-5318



Related Work Cont.
Batra, K. K. and Lehal, G. S. (2010), “Rule Based Machine Translation of
  Noun Phrases from Punjabi to English”, International Journal of
  Computer Science Issues, Vol. 7, Issue 5, September, ISSN
  (Online): 1694-0814

Tyers, F. M. and Nordfalk, J. (2009), “Shallow transfer rule-based
  machine translation for Swedish to Danish”, In Proceedings of the First
  International Workshop on Free/Open-Source Rule-Based Machine
  Translation, pages 27–33, Alicante.

Tyers, F. M. (2010), “Rule-based Breton to French machine
  translation”, European Association for Machine Translation, EAMT May
  2010, St Raphael, France
  (http://www.mt-archive.info/EAMT-2010-Tyers.pdf)



References
Blank, D. (1998), Definition of Machine Translation,
  http://www.macalester.edu/courses/russ65/definiti.htm
  [Accessed 02/10/2010]




Thank you for listening





More Related Content

What's hot

Onward presentation.en
Onward presentation.enOnward presentation.en
Onward presentation.enClarkTony
 
Language Identification: A neural network approach
Language Identification: A neural network approachLanguage Identification: A neural network approach
Language Identification: A neural network approachAlberto Simões
 
Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Biswajit Biswas
 
Internationalization & localization testing
Internationalization & localization testingInternationalization & localization testing
Internationalization & localization testingRobin0590
 
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...ESEM 2014
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionArif A.
 

What's hot (7)

Onward presentation.en
Onward presentation.enOnward presentation.en
Onward presentation.en
 
Language Identification: A neural network approach
Language Identification: A neural network approachLanguage Identification: A neural network approach
Language Identification: A neural network approach
 
Antlr rafaelpsouza
Antlr rafaelpsouzaAntlr rafaelpsouza
Antlr rafaelpsouza
 
Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)Deep Learning in NLP (BERT, ERNIE and REFORMER)
Deep Learning in NLP (BERT, ERNIE and REFORMER)
 
Internationalization & localization testing
Internationalization & localization testingInternationalization & localization testing
Internationalization & localization testing
 
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 

Similar to IFE-MT: An English-to-Yorùbá Machine Translation System

A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingGuy De Pauw
 
Bilingual typography: Hong Kong case studies
Bilingual typography: Hong Kong case studiesBilingual typography: Hong Kong case studies
Bilingual typography: Hong Kong case studiesKeith Tam
 
Role of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languagesRole of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languagesDr. Amit Kumar Jha
 
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...Association for Computational Linguistics
 
Development of text to speech system for yoruba language
Development of text to speech system for yoruba languageDevelopment of text to speech system for yoruba language
Development of text to speech system for yoruba languageAlexander Decker
 
Resources for linguistically motivated Multilingual Anaphora Resolution
Resources for linguistically motivated Multilingual Anaphora ResolutionResources for linguistically motivated Multilingual Anaphora Resolution
Resources for linguistically motivated Multilingual Anaphora ResolutionKepa J. Rodriguez
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageijnlc
 
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...The Simple Life: Using Plain and Controlled Language to Improve Translation Q...
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...Erin Lyons
 
Linguistic localization framework for Ooo
Linguistic localization framework for OooLinguistic localization framework for Ooo
Linguistic localization framework for OooJaganadh Gopinadhan
 
5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf  5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf SVTaylor123
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
B047006011
B047006011B047006011
B047006011inventy
 
B047006011
B047006011B047006011
B047006011inventy
 
Corpus study design
Corpus study designCorpus study design
Corpus study designbikashtaly
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...ijnlc
 
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...Normunds Grūzītis
 
CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningKv Sagar
 

Similar to IFE-MT: An English-to-Yorùbá Machine Translation System (20)

A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
 
Bilingual typography: Hong Kong case studies
Bilingual typography: Hong Kong case studiesBilingual typography: Hong Kong case studies
Bilingual typography: Hong Kong case studies
 
Role of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languagesRole of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languages
 
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...
Alp Öktem - 2017 - Automatic Extraction of Parallel Speech Corpora from Dubbe...
 
Development of text to speech system for yoruba language
Development of text to speech system for yoruba languageDevelopment of text to speech system for yoruba language
Development of text to speech system for yoruba language
 
Resources for linguistically motivated Multilingual Anaphora Resolution
Resources for linguistically motivated Multilingual Anaphora ResolutionResources for linguistically motivated Multilingual Anaphora Resolution
Resources for linguistically motivated Multilingual Anaphora Resolution
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian language
 
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...The Simple Life: Using Plain and Controlled Language to Improve Translation Q...
The Simple Life: Using Plain and Controlled Language to Improve Translation Q...
 
Linguistic localization framework for Ooo
Linguistic localization framework for OooLinguistic localization framework for Ooo
Linguistic localization framework for Ooo
 
Su2012 ss lg week one full pp
Su2012 ss lg week one full ppSu2012 ss lg week one full pp
Su2012 ss lg week one full pp
 
5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf  5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf
 
SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-...
SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-...SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-...
SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-...
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
B047006011
B047006011B047006011
B047006011
 
B047006011
B047006011B047006011
B047006011
 
Corpus study design
Corpus study designCorpus study design
Corpus study design
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
 
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra...
 
CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learning
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguistics
 

More from Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Guy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Guy De Pauw
 

More from Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
 

IFE-MT: An English-to-Yorùbá Machine Translation System

  • 1. IFE-MT: An English-to-Yorùbá Machine Translation System. *Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and *Agbeyangi, A. O. *Department of Computer Science & Engineering; +Dept. of Linguistics & African Languages; Obafemi Awolowo University, Ile-Ife, Nigeria. AGIS'11 UNECA CONFERENCE 1-2 DEC. 2011
  • 2. In this Presentation: 1) Introduction 2) Theoretical issues: a) Features of English & Yorùbá languages b) Machine translation process 3) Practical issues: a) Data acquisition b) System design c) Software development d) System implementation
  • 3. Introduction: Machine translation (MT) is the application of computers to the task of translating text or speech from one natural language to another (Blank, 1998). An English-to-Yorùbá (E-Y) MT system translates English text to Yorùbá text.
  • 4. MT Conceptualisation
  • 5. MT Paradigm: 1) Text → Text 2) Speech → Speech 3) Text → Speech 4) Speech → Text
  • 6. Research Theory: Theories/Assumptions: a) Yorùbá expression moves from concrete to abstract, whereas English expression moves from abstract to concrete. b) Natural language has at most 400 active words. c) Turing test theory for evaluation (a test of a machine’s ability to exhibit intelligent behavior), using mean opinion score.
  • 7. Features of English & Yorùbá languages
      English: Stressed, e.g. Record(N) Record(V), Commit(N) commit(V), Read(pr) read(past)
      Yorùbá: Tone language, e.g. Agba, gba mọ
      English: Intonation time, e.g. "He found it on the street?", "How did you ever escape?"
      Yorùbá: Syllable timed, e.g. Baba
      English: Orthography non-phonetic, e.g. enough, Fish
      Yorùbá: Orthography almost phonetic, e.g. o, gba, Ẹdẹ
      English: Large resources language
      Yorùbá: Low resources language
      English: Inflectional, e.g. Wait | Waits | waited | waiting; Go | Goes | Went | Gone | going
      Yorùbá: Non-inflectional, e.g. o ro | ti ro ro; o lọ | ti lọ lọ
      English: Grammatical structure Subject Verb Object (SVO), e.g. The boy, old man
      Yorùbá: Grammatical structure Subject Verb Object (SVO), e.g. nrin a o, lagba
  • 8. English-to-Yorùbá Machine Translation System Challenges: 1) The translation process: both languages are SVO, but translation is not straightforward (culturally bound words and concepts) 2) Domain selection problem 3) Lack of language resources 4) Orthography typesetting problem
  • 9. Language resources challenge
      Resources on the Internet: correct orthography: not fully dialectically marked and punctuated; parallel corpora/quality: available/poor quality, e.g. The Jehovah Witness; digital: available; domain specific: general (not domain specific); annotated: not annotated; size: large enough; textual: text form
      Religious books or documents: correct orthography: divergent; parallel corpora/quality: contextually deficient, e.g. The Jehovah Witness; digital: mostly hardcopy; domain specific: specific (religious); annotated: not annotated; size: large; textual: mostly text
      Nigerian newspapers: correct orthography: poor; parallel corpora/quality: not available; digital: not all are digitalized; domain specific: not domain specific; annotated: not annotated; size: small; textual: all are in text form
      The radio & TV (Media): correct orthography: not in text form; parallel corpora/quality: speech/poor translation; digital: available in magnetic disc; domain specific: general; annotated: not applicable; size: large enough; textual: non-textual
      Government documents: correct orthography: mostly English; parallel corpora/quality: not available; digital: available in English; domain specific: multiple domains; annotated: not annotated; size: sizeable volume; textual: text form
      Textbooks/manuals/reports: correct orthography: mostly English; parallel corpora/quality: not available; digital: not all are digitized; domain specific: specific; annotated: not POS annotated; size: sizeable; textual: text form
  • 10. Database Design Cont. Data 1: Sentences are systematically collected using home-environment terminology (domain). Data 2: Lexical items extracted from Data 1. Data 3: Annotations of Data 1 and Data 2 (POS tags). Data 4: Data 3 represented in the format designed for the MT database.
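The first three data stages on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentence is borrowed from the parser slide, the POS table is hand-written, and the function names `extract_lexical_items` and `annotate` are hypothetical. The slides do not describe the actual storage format (Data 4), so it is not modelled here.

```python
# Data 1: sentences collected for the chosen domain (one sample sentence only).
SENTENCES = ["Ade sat on the chair"]

# Hand-written POS table for the sample; a real lexicon would be far larger.
POS_TAGS = {"Ade": "N", "sat": "V", "on": "P", "the": "Det", "chair": "N"}


def extract_lexical_items(sentences):
    """Data 2: collect each distinct word once, in first-seen order."""
    items = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in items:
                items.append(word)
    return items


def annotate(items, tags):
    """Data 3: pair every lexical item with its POS tag ('UNK' if missing)."""
    return [(word, tags.get(word, "UNK")) for word in items]


lexical_items = extract_lexical_items(SENTENCES)
for word, tag in annotate(lexical_items, POS_TAGS):
    print(word, tag)
```

Keeping the tag table separate from the sentences means a word that has not yet been annotated surfaces as `UNK` instead of stopping the run.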
  • 11. Lexicon database
  • 12. Database Design Cont.
  • 17. Software Development and Implementation Process. Software tools: a) Python b) PyQt c) NLTK
  • 18. Parser: Natural Language Toolkit (NLTK)
      English: Ade sat on the chair
        (S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))
      Yorùbá: Ade jokoo sori aga naa
        (S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
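The sentence pair on the parser slide differs in word order only at the determiner: English "the chair" corresponds to Yorùbá "aga naa". A minimal sketch of that one lexical-transfer and reordering step follows. It is not the IFE-MT implementation and does not use NLTK; the five-entry `LEXICON` and the `translate` function are hypothetical, built only from the words on the slide.

```python
# Toy English-to-Yorùbá transfer for the single example on the parser slide.
# The lexicon and the reordering rule are illustrative assumptions only.

# Hypothetical lexicon: English word -> (POS tag, Yorùbá word).
LEXICON = {
    "Ade": ("N", "Ade"),
    "sat": ("V", "jokoo"),
    "on": ("P", "sori"),
    "the": ("Det", "naa"),
    "chair": ("N", "aga"),
}


def translate(sentence):
    """Look up each word, then place every determiner after its noun."""
    tagged = [LEXICON[word] for word in sentence.split()]
    output = []
    i = 0
    while i < len(tagged):
        tag, word = tagged[i]
        # Rule: English "Det N" becomes Yorùbá "N Det" (the chair -> aga naa).
        if tag == "Det" and i + 1 < len(tagged) and tagged[i + 1][0] == "N":
            output.append(tagged[i + 1][1])
            output.append(word)
            i += 2
        else:
            output.append(word)
            i += 1
    return " ".join(output)


print(translate("Ade sat on the chair"))  # Ade jokoo sori aga naa
```

A word missing from the lexicon raises `KeyError` here; a fuller system would need a fallback for unknown words and rules operating on the parse tree instead of the flat word list.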
  • 19. Program Coding. Software modules: a) Library b) Parser c) GUI
  • 20. Software Demonstration: a) basic SVO sentences b) qualified subject/object SVO sentences c) modified verb SVO sentences
  • 21. Software Demonstration: http://www.ifecisrg.org/IfeMT
  • 22. Conclusion: In this presentation, I have discussed theoretical and practical issues relating to our IFE-MT development; database design and library design; the software development process; and program coding. The IFE-MT software was demonstrated. We are now updating the database and evaluating the MT system using mean opinion score.
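A mean opinion score is the arithmetic mean of the ratings that human judges give to a system's output. The sketch below assumes the common 1 to 5 rating scale, which the slides do not specify; the function name `mean_opinion_score` and the sample ratings are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from four judges for one translated sentence.
sample_ratings = [4, 5, 3, 4]
print(mean_opinion_score(sample_ratings))
```

Scores for individual sentences can then be averaged again to give one figure for the whole test set.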
  • 23. Some Related Work
      Shquier, M. A. and AL-Nabhan, M. (2010), “Rule-based approach to tackle agreement and word-ordering in English-Arabic machine translation”, http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf
      Anand, K. M., Dhanalakshmi, V., Soman, K. P. and Rajendran, S. (2010), “A Sequence Labeling Approach to Morphological Analyzer for Tamil Language”, International Journal on Computer Science and Engineering, Vol. 02, No. 06, pp. 1944-1951
      Barkade, V. M. and Devale, P. R. (2010), “English to Sanskrit machine Translation semantic mapper”, International Journal of Engineering Science and Technology, Vol. 2(10), pp. 5313-5318
  • 24. Related Work Cont.
      Batra, K. K. and Lehal, G. S. (2010), “Rule Based Machine Translation of Noun Phrases from Punjabi to English”, International Journal of Computer Science Issues, Vol. 7, Issue 5, September, ISSN (Online): 1694-0814
      Tyers, F. M. and Nordfalk, J. (2009), “Shallow transfer rule-based machine translation for Swedish to Danish”, In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pages 27–33, Alicante.
      Tyers, F. M. (2010), “Rule-based Breton to French machine translation”, European Association for Machine Translation, EAMT May 2010, St Raphael, France (http://www.mt-archive.info/EAMT-2010-Tyers.pdf)
  • 25. References: Blank, D. (1998), Definition of Machine Translation, http://www.macalester.edu/courses/russ65/definiti.htm [Accessed 02/10/2010]
  • 26. Thank you for listening