A Number to Yorùbá Text Transcription System




                         Akinadé Olugbénga Ọláwálé
                                    and
                          Ọdẹ́jọbí Ọdẹ́túnjí Àjàdí

                         Computer Sci. & Engr. Department
                         Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀


                         AGIS 2011 Conference, Addis
                               Ababa, Ethiopia
                             01 December, 2011
Presentation Outline
  Introduction
          Numbers and Numerals
          Normalisation of Numbers for TTS
  Objectives
  The Yorùbá numeral system
          The Yorùbá numeral generation
  Methodology
  Software Implementation
  Evaluation
  Overview



Numbers and Numerals


  A number is an abstract concept that is represented by symbols
  within a numeral system.
  A numeral system is concerned with the written representation of
  spoken positive whole numbers.
  The Hindu-Arabic numeral system has been adopted as a universally
  accepted representation for numbers and mathematical
  expressions.
  The Hindu-Arabic numeral system has ten symbols: 0, 1, 2, 3, 4, 5, 6,
  7, 8 and 9.



Numbers and Numerals


  Numbers can take different formats, which include:
          Cardinal numbers: 3, 10, Ẹ̀ta, Ẹ̀wá
          Ordinal numbers: 3rd, 10th, Ìkẹta, Ìkẹwàá
          Monetary values: $300, Naira Mẹ́wàá
          Phone numbers: 234802234****
          Percentages: 10%, Ìdá mẹ́ẹ̀wá nínú ọgọ́run
  Yorùbá has many dialects with considerable linguistic variation
  (Fabunmi, 2010), but their speakers can all communicate using
  Standard Yorùbá (SY).
  Our focus is therefore to develop a system that converts cardinal
  numbers to their SY lexical forms.


Normalisation of Numbers for TTS




  Text normalisation is one of the important steps in high-level
  speech synthesis.
  Numbers, abbreviations, symbols, etc. are converted into their
  lexical (textual) equivalents.
  Normalising numbers is therefore an important part of this step.
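Number normalisation in a TTS front end can be pictured as a substitution pass over the input text. The Python sketch below assumes a hypothetical `verbalise` callback (for example, a number-to-SY converter) and handles only plain runs of digits; formats such as 12,000, phone numbers or percentages would need rules of their own.

```python
import re
from typing import Callable

# Plain runs of ASCII digits, e.g. "67"; [0-9] avoids matching other Unicode digits.
DIGIT_RUN = re.compile(r"[0-9]+")


def normalise_numbers(text: str, verbalise: Callable[[int], str]) -> str:
    """Replace every digit run in `text` with the lexical form from `verbalise`.

    `verbalise` is a hypothetical hook: any function that maps an int to its
    spoken (textual) form, such as a number-to-SY converter.
    """
    return DIGIT_RUN.sub(lambda m: verbalise(int(m.group())), text)


if __name__ == "__main__":
    toy = {3: "ẹ̀ta", 10: "ẹ̀wá"}  # two cardinal forms quoted on the slides
    print(normalise_numbers("3 and 10", toy.__getitem__))
```

Keeping the verbaliser as a parameter separates finding numbers in text from converting them, which is the part the rest of this work addresses.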




Objectives



  To specify and design a computational model that captures the
  conversion of numbers to their Standard Yorùbá (SY) lexical
  equivalents
  To implement software for the number-to-SY text transcription
  model designed above
  To evaluate the implemented system




The Yorùbá numeral system


  The Yorùbá numeral system is vigesimal, i.e. based on 20 (Ekundayo, 1975),
  (Zaslavsky, 2000), (Longe, 2009).
  Yorùbá has 16 lexical items that serve as the basic building
  blocks (Ekundayo, 1975):
  1=ọ̀kan, 2=èjì, 3=ẹ̀ta, 4=ẹ̀rin, 5=àrún, 6=ẹ̀fà, 7=èje, 8=ẹ̀jọ,
  9=esan, 10=ewa, 20=ogún, 30=ogbon, 200=igba, 300=odunrun,
  400=irínwó, 20,000=oke
  Special positional words exist for addition (lé), subtraction (dín)
  and multiplication (ọ̀nà).
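The sixteen basic items and the three operation words are naturally stored as lookup tables, and the first decision in the software design ("is the number in the basic numerals?") then becomes a membership test. This Python sketch copies the spellings from the list above; the names `BASIC_LEXICON`, `OPERATION_WORDS` and `basic_word` are illustrative and not taken from the authors' implementation.

```python
from typing import Optional

# The 16 basic lexical items of Yorùbá numerals (spellings as listed on the slide).
BASIC_LEXICON = {
    1: "ọ̀kan", 2: "èjì", 3: "ẹ̀ta", 4: "ẹ̀rin", 5: "àrún", 6: "ẹ̀fà",
    7: "èje", 8: "ẹ̀jọ", 9: "esan", 10: "ewa", 20: "ogún", 30: "ogbon",
    200: "igba", 300: "odunrun", 400: "irínwó", 20000: "oke",
}

# Positional words for the three operations used in derivations.
OPERATION_WORDS = {"+": "lé", "-": "dín", "*": "ọ̀nà"}


def basic_word(n: int) -> Optional[str]:
    """Return the single lexical item for n, or None if n has to be derived."""
    return BASIC_LEXICON.get(n)
```

For example, `basic_word(400)` returns the stored form for 400, while `basic_word(40)` returns None, so 40 must go through decomposition.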



What a Yorùbá Speaker/Hearer Knows about Yorùbá Numerals

  All numbers can be represented within the Yorùbá numeral system.
  There are subgroups within the Yorùbá numerals based on their
  syntactic derivational rules.
  Linguistic skills (contraction, vowel harmony, elision and euphonic
  assimilation) are required to represent some numerals.
  Numerals with a low functional load have multiple representations.
  The largest number expressed by a single lexical item is 20,000 (ọ̀kẹ́).
  Higher numbers are derived from 20,000 (ọ̀kẹ́).
  Subtraction carries a heavier functional load than addition.

Subgroups of Yorùbá Numerals


   1 - 14: Behave as decimal.
   15 - 199: Derived with 20 as the multiplicative base.
   200 - 1999: Derived with 200 as the multiplicative base.
   2000 - 19999: Derived with 2000 as the multiplicative base.
   20000 and above: Derived with 20000 as the multiplicative base.
So the multiplicative bases of Yorùbá numerals are
20, 200, 2000 and 20000,
i.e. 2×10¹, 2×10², 2×10³ and 2×10⁴.
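The subgroup boundaries above translate directly into a range lookup. This Python sketch uses a hypothetical `multiplicative_base` helper; returning None for 1 - 14, the group that behaves as decimal, is an illustrative choice.

```python
from typing import Optional

# (exclusive upper bound, multiplicative base) for each subgroup on the slide.
SUBGROUPS = [(15, None), (200, 20), (2000, 200), (20000, 2000)]


def multiplicative_base(n: int) -> Optional[int]:
    """Multiplicative base used to derive n; None for 1-14 (decimal behaviour)."""
    if n < 1:
        raise ValueError("only positive whole numbers are covered")
    for upper, base in SUBGROUPS:
        if n < upper:
            return base
    return 20000


# The four bases are 2 x 10^k for k = 1..4.
assert [2 * 10**k for k in range(1, 5)] == [20, 200, 2000, 20000]
```

For instance, 19669 falls in the 2000 - 19999 subgroup, so the helper returns 2000 for it.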



SY Numerals Derivation

   Three of the four basic arithmetic operations (addition,
   subtraction and multiplication) are used to derive an
   infinite set of SY numerals from the sixteen vocabulary items
   (Ekundayo, 1977), (Zaslavsky, 1999).
   The derivation of a single number can involve multiple subtractions:

    Number    Yorùbá                                         Derivation
    15        ẹ̀ẹ̀dógún                                       20-5
    65        àádọ́rin                                       (4*20)-10-5
    565       àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta                      (3*200)-100+(4*20)-10-5
    17,565    àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta lé lẹ̀ẹ́dẹ́gbàsán     (9*2000)-1,000+(3*200)-100+(4*20)-10-5
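Each derivation in the table is ordinary arithmetic, so it can be checked mechanically. This Python sketch parses a derivation with the standard `ast` module instead of calling `eval()`, accepting only integer literals, +, -, * and parentheses; the function name `evaluate_derivation` is illustrative.

```python
import ast
import operator

# Only the three operations the derivations use.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate_derivation(expr: str) -> int:
    """Evaluate a derivation such as '(4*20)-10-5' without eval().

    Thousands separators ("1,000") are stripped first; anything other than
    integer literals, +, -, * and parentheses is rejected.
    """
    tree = ast.parse(expr.replace(",", ""), mode="eval")

    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        raise ValueError(f"unsupported element in derivation: {ast.dump(node)}")

    return walk(tree)


# The four rows of the table above.
TABLE = {15: "20-5", 65: "(4*20)-10-5", 565: "(3*200)-100+(4*20)-10-5",
         17565: "(9*2000)-1,000+(3*200)-100+(4*20)-10-5"}

for number, derivation in TABLE.items():
    assert evaluate_derivation(derivation) == number, (number, derivation)
```

Restricting the accepted node types keeps the checker safe to run on arbitrary strings, unlike a bare `eval()`.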
Special Type of Subtraction


   A special type of subtraction exists when 5, 10, 100 or 1000 is
   subtracted.
   This gives rise to the eedin phenomenon.

   For eedin(A), we assume an implied subtraction from A of the
   following kinds:
   1.      5 iff A = 20 or 30
   2.      10 iff A = 60, 80, 100, ..., 200
   3.      100 iff A = 600, 800, 1000, ..., 2000
   4.      1000 iff A = 4000, 6000, 8000, ..., 20000
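The eedin rule and the 15 - 199 subgroup can be combined into a small decomposition routine. In this Python sketch, `eedin` follows the four cases above (reading the elided lists as stepping by 20, 200 and 2000, as their first terms indicate). `decompose` is a hypothetical helper that reproduces the derivations used on these slides, such as 67 as 4*20-10-3 and 65 as 4*20-10-5; its anchoring heuristic (add 1-4 to a multiple of ten, or subtract 1-5 from the next one) is an assumption inferred from those examples, not the authors' algorithm.

```python
def eedin(a: int) -> int:
    """eedin(A): A minus its implied subtrahend of 5, 10, 100 or 1000."""
    if a in (20, 30):
        return a - 5
    # (smallest A, largest A, step between listed values, implied subtrahend)
    for first, last, step, sub in ((60, 200, 20, 10),
                                   (600, 2000, 200, 100),
                                   (4000, 20000, 2000, 1000)):
        if first <= a <= last and a % step == 0:
            return a - sub
    raise ValueError(f"eedin is not defined for A = {a}")


def decompose(n: int) -> str:
    """Hypothetical vigesimal derivation for 15 <= n <= 199, e.g. 67 -> '4*20-10-3'."""
    if not 15 <= n <= 199:
        raise ValueError("this sketch only covers the 15-199 subgroup")
    units = n % 10
    # Anchor on a multiple of ten: add 1-4 to it, or subtract 1-5 from the next one.
    if units <= 4:
        anchor, tail = n - units, (f"+{units}" if units else "")
    else:
        anchor, tail = n + 10 - units, f"-{10 - units}"
    if anchor in (30, 200):          # single lexical items
        head = str(anchor)
    elif anchor % 20 == 0:           # a whole number of twenties
        k = anchor // 20
        head = "20" if k == 1 else f"{k}*20"
    else:                            # 50, 70, ..., 190: eedin of the next multiple of 20
        base = anchor + 10
        head = "200" if base == 200 else f"{base // 20}*20"
        head += f"-{base - eedin(base)}"
    return head + tail
```

Because the output strings are plain arithmetic, every result can be checked against the number it came from, which is a quick way to test a decomposition heuristic over a whole range.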


Methodology
   The theory, processes and computation underlying SY numerals were
   reviewed by consulting the relevant literature.
   A computational model was formulated using an automata-theoretic
   approach. The proposed model was captured as a set of
   Push-Down Automata (PDA) using the Java Formal Languages and
   Automata Package (JFLAP), a simulation tool for experimenting
   with formal languages and automata theory.
   The model was implemented in the Python programming
   language.
   The system was evaluated using the Mean Opinion Score (MOS), in
   which human judges rate the output, in the manner of a Turing test.
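Conventionally (for example in ITU-T Recommendation P.800), listeners rate each sample on a five-point scale from 1 (bad) to 5 (excellent), and the MOS is the arithmetic mean of those ratings. A minimal Python sketch; the ratings in it are placeholders, not results from this study.

```python
from statistics import fmean
from typing import Sequence


def mean_opinion_score(ratings: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Arithmetic mean of listener ratings, each on the low..high scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    out_of_range = [r for r in ratings if not low <= r <= high]
    if out_of_range:
        raise ValueError(f"ratings outside {low}..{high}: {out_of_range}")
    return fmean(ratings)


if __name__ == "__main__":
    placeholder_ratings = [5, 4, 4, 3]  # hypothetical, for illustration only
    print(mean_opinion_score(placeholder_ratings))  # 4.0
```

Rejecting out-of-range ratings up front catches data-entry errors that would otherwise silently shift the mean.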



Software Design


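[Flowchart] Start → Arabic number → Is the number in the basic numerals?
  Yes → Translation of number to Yorùbá
  No → Decomposition of number into basic units → Translation of number to Yorùbá
Translation of number to Yorùbá → Morphological analysis → Yorùbá numeral → Stop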
Push Down Automata
A non-deterministic PDA is defined as a 7-tuple

                           (Q, Σ, Γ, q0, F, δ, z0), where


   Q is a finite set of states
   Σ is a finite input alphabet
   Γ is a finite stack alphabet
   z0 ∈ Γ is the initial symbol on top of the stack
   q0 ∈ Q is the initial state
   F ⊆ Q is the set of final states
   δ, the set of transitions, is a finite subset of
   (Q × (Σ ∪ {ε, #}) × Γ) × (Q × (Γ ∪ {ε})), where # marks the end of the input
Processing of 67

Generate magnitude
60 + 7#

Decompose to vigesimal
4*20-10-3#

PDA processing of the string
ẹta dín aadín ogún ẹrin

Apply linguistic skills
ẹta dín aadín ọgọrin
ẹta dín àádọrin
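The PDA stage in this trace reverses the order of the terms: the last subtrahend is spoken first. This is not the authors' PDA but a Python sketch that uses an explicit stack to reproduce that reordering (terms are pushed left to right and popped at the end marker #), followed by plain string rewriting with only the two contractions visible here. The word forms are copied from the trace; rendering a subtracted 10 as aadín and an added term as "<word> lé" are assumptions, and only the numbers 3, 4 and 20 are covered.

```python
import re
import unicodedata
from typing import List, Optional, Tuple

# Word forms as they appear in the "Processing of 67" trace (only what 67 needs).
WORDS = {3: "ẹta", 4: "ẹrin", 20: "ogún"}
SUB, ADD = "dín", "lé"  # subtraction and addition words

# Two contractions read off this trace; a real morphological stage needs many more.
CONTRACTIONS = [
    ("ogún ẹrin", "ọgọrin"),      # 20 x 4 -> 80
    ("aadín ọgọrin", "àádọrin"),  # 80 - 10 -> 70
]

# A signed term: optional +/-, a number, optionally "*<base>".
TERM = re.compile(r"([+-]?)(\d+)(?:\*(\d+))?")


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def render(sign: str, a: str, base: Optional[str]) -> str:
    """Words for one term; multiplication is rendered base first ("ogún ẹrin")."""
    if base is not None:
        return f"{WORDS[int(base)]} {WORDS[int(a)]}"
    n = int(a)
    if sign == "-" and n == 10:
        return "aa" + SUB            # implied subtraction of 10, as in the trace
    if sign == "-":
        return f"{WORDS[n]} {SUB}"
    if sign == "+":
        return f"{WORDS[n]} {ADD}"   # assumed form for an added term
    return WORDS[n]


def pda_order(expr: str) -> str:
    """Push the terms of e.g. '4*20-10-3#' and pop them all at the end marker."""
    if not expr.endswith("#"):
        raise ValueError("input must end with the end marker '#'")
    body, pos = expr[:-1], 0
    stack: List[Tuple[str, str, Optional[str]]] = []
    while pos < len(body):
        m = TERM.match(body, pos)
        if m is None:
            raise ValueError(f"cannot read a term at position {pos}: {body[pos:]!r}")
        stack.append(m.groups())
        pos = m.end()
    words = []
    while stack:                     # popping reverses the order of the terms
        words.append(render(*stack.pop()))
    return nfc(" ".join(words))


def contract(phrase: str) -> str:
    """Apply the contraction rules in order (plain string rewriting)."""
    for old, new in CONTRACTIONS:
        phrase = phrase.replace(nfc(old), nfc(new))
    return phrase


if __name__ == "__main__":
    raw = pda_order("4*20-10-3#")
    print(raw)            # ẹta dín aadín ogún ẹrin
    print(contract(raw))  # ẹta dín àádọrin
```

The second rule rewrites aadín ọgọrin (80 - 10 = 70) to àádọrin, the form for 70; 50 would be àádọta. The rules are order-sensitive, since the second only matches after the first has produced ọgọrin.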

Example: Processing of 19669
Ekundayo (1977) presented 7 canonical representations of 19669, and we
were able to produce 3 more. The ten forms are:
   ẹẹdẹgbààwá ó lé ọta-lé-lẹgbẹta ó lé mẹsán
   ẹẹdẹgbààwá ó lé ọrin-é-lẹgbẹta dín mọkanlàá
   ẹẹdẹgbààwá ó lé ójì-lé-lẹgbẹta ó lé mọkan-dín-lọgbọn
   ẹẹdẹgbààwá ó lé ẹgbẹta ó lé mọkan-dín-laadọrin
   ẹẹdẹgbààwá ó lé ẹẹdẹgbẹrin ó dín mọkan-lé-lọgbọn
   ọkẹ ó dín irínwó ó lé mọkan-dín-laadọrin
   ọkẹ ó dín ọdúnrún ó dín mọkan-lé-lọgbọn
   ọkẹ ó dín ojì-lé-lọdúnrún ó lé mẹsán
   ẹ́gbèjì-dín-lọgọrún ó lé mọkan-dín-laadọrin
   ẹẹdégbọkan-dín-lọgọrún ó dín mọkan-lé-lọgbọn
Deep-structure forms of 19669

   19000 + 660 + 9
   19000 + 680 - 11
   19000 + 640 + 29
   19000 + 600 + 69
   19000 + 700 - 31
   20000 - 400 + 69
   20000 - 300 - 31
   20000 - 340 + 9
   19600 + 69
   19700 - 31

Surface-structure forms of 19669

   1000- 2000*10 + 200*3 + 20*3 + 9
   1000- 2000*10 + 200*3 + 20*4 - 1 + 10
   1000- 2000*10 + 200*3 + 20*2 + 30 - 1
   1000- 2000*10 + 200*3 + 20*3 + 9
   1000- 2000*10 + 200*4-100 - 30 + 1
   20000 - 200*2 + 20*4 - 10 - 1
   20000 - 300 - 30 + 1
   20000 - 40 + 300 + 9
   200*98 + 20*4 - 10 - 1
   200*99 - 100 - 30 + 1

Grammar for Yorùbá Numerals
This is a slight modification of the grammar discussed by Hurford (2006),
extended to accommodate subtraction:

       NUMBER →  PHRASE (NUMBER)        #addition
       PHRASE →  { DIGIT, REDUCE }
       PHRASE →  SUB PHRASE             #subtraction
       PHRASE →  M PHRASE               #multiplication

Curly braces mark alternatives (either option may be used), and parentheses
mark content that may be left out.


Parse tree for 19669
19669 = 1000- 2000* 10 200* 3 10- 20* 4 - 1




Parse tree for 19669
19669 = 1000- 2000* 10 40 200* 3 1 - 30




Ongoing Work
   Evaluation of the software is ongoing.
   Work is under way to develop the software for handheld devices.




References

     Ekundayo, S. A. (1977).
     Vigesimal numeral derivational morphology: Yorùbá grammatical
     competence epitomized.
     Anthropological Linguistics, 19(9):436–453.

     Fabùnmi, F. A. (2010).
     Vigesimal numerals on Ifẹ̀ (Togo) and Ifẹ̀ (Nigeria) dialects of Yorùbá.
     Linguistik online, 43:pages.

     Longe, O. (2009).
     A Yorùbá Decimal Number System.
     Bookbuilders, Ibadan.



More Related Content

More from Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
Guy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
Guy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
Guy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
Guy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
Guy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
Guy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
Guy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Guy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
Guy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
Guy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
Guy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
Guy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
Guy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
Guy De Pauw
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Guy De Pauw
 

More from Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
 

Recently uploaded

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 

Recently uploaded (20)

How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
A Number to Yorùbá Text Transcription System

  • 1. A Number to Yorùbá Text Transcription System. Akinade Olugbenga Olawale and Odejobi Odetunji Ajadi, Computer Science & Engineering Department, Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀. AGIS 2011 Conference, Addis Ababa, Ethiopia, 01 December 2011
  • 2. Presentation Outline: Introduction (Numbers and Numerals; Normalisation of Numbers for TTS); Objectives; The Yorùbá Numeral System (The Yorùbá Numeral Generation); Methodology; Software Implementation; Evaluation; Overview
  • 3. Numbers and Numerals: A number is an abstract concept that is represented by symbols within a numeral system. A numeral system is concerned with the written representation of spoken positive whole numbers. The Hindu-Arabic numeral system has been adopted as the universally accepted representation for numbers and mathematical expressions. It has ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
  • 4. Numbers and Numerals: Numbers can take different formats, including cardinal numbers (3, 10; Ẹ̀ta, Ẹ̀wá), ordinal numbers (3rd, 10th; Ìkẹta, Ìkẹwàá), monetary values ($300; Naira Mẹ́wàá), phone numbers (234802234****) and percentages (10%; Ìdá mẹ́ẹ̀wá nínú ọgọ́run). Yorùbá has many dialects with considerable linguistic variation (Fabunmi, 2010), but their speakers can all communicate using Standard Yorùbá (SY). Our focus is therefore to develop a system that converts cardinal numbers to their SY lexical forms.
  • 5. Normalisation of Numbers for TTS: Text normalisation is one of the important steps in high-level speech synthesis: numbers, abbreviations, symbols, etc. are converted into their lexical (textual) equivalents. Normalising numbers is therefore an important part of this step.
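As a rough illustration of where number normalisation sits in a TTS front end, the Python sketch below finds standalone cardinal numbers in a text and replaces them with Standard Yorùbá words. It is not the authors' implementation: `normalise_numbers` and `BASIC_SY` are hypothetical names, and the table covers only 1 to 10 (the basic forms listed on slide 9); a full system would call the complete number-to-SY converter at that point.

```python
import re

# Hypothetical lookup table covering only the basic numerals 1-10.
# A complete system would call a full number-to-SY converter instead.
BASIC_SY = {
    1: "ọ̀kan", 2: "èjì", 3: "ẹ̀ta", 4: "ẹ̀rin", 5: "àrún",
    6: "ẹ̀fà", 7: "èje", 8: "ẹ̀jọ", 9: "ẹ̀sán", 10: "ẹ̀wá",
}

# A run of digits with word boundaries on both sides: "3" matches, but the
# digit in "3rd" does not, so ordinals are left to other normalisation rules.
CARDINAL = re.compile(r"\b\d+\b")


def normalise_numbers(text):
    """Replace standalone cardinal numbers with SY words where known."""
    def replace(match):
        value = int(match.group())
        # Numbers outside the table are left untouched rather than guessed.
        return BASIC_SY.get(value, match.group())

    return CARDINAL.sub(replace, text)


print(normalise_numbers("3 and 10, but not 3rd or 25"))
```

Restricting the pattern to cardinals mirrors the slides' focus; ordinals, currency and percentages would each need their own rule.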
  • 8. Objectives: (1) to specify and design a computational model that captures the conversion of numbers to their Standard Yorùbá (SY) lexical equivalents; (2) to implement software for the number-to-SY text transcription model designed above; (3) to evaluate the implemented system.
  • 9. The Yorùbá Numeral System: Yorùbá numerals are vigesimal, i.e. based on 20 (Ekundayo, 1975; Zaslavsky, 2000; Longe, 2009). Yorùbá has 16 lexical items that serve as the basic building blocks (Ekundayo, 1975): 1 = ọ̀kan, 2 = èjì, 3 = ẹ̀ta, 4 = ẹ̀rin, 5 = àrún, 6 = ẹ̀fà, 7 = èje, 8 = ẹ̀jọ, 9 = ẹ̀sán, 10 = ẹ̀wá, 20 = ogún, 30 = ọgbọ̀n, 200 = igba, 300 = ọ̀ọ́dúnrún, 400 = irinwó, 20,000 = ọ̀kẹ́. Special positional words exist for addition (lé), subtraction (dín) and multiplication (ọ̀nà).
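A lookup table is enough for these sixteen items; every other number has to be derived with the three operations. The Python sketch below is illustrative only: the names `BASIC_NUMERALS`, `OPERATOR_WORDS` and `basic_word` are hypothetical, and the spellings follow the list above.

```python
# The 16 basic lexical items of Yorùbá numerals, as listed on the slide.
BASIC_NUMERALS = {
    1: "ọ̀kan", 2: "èjì", 3: "ẹ̀ta", 4: "ẹ̀rin", 5: "àrún",
    6: "ẹ̀fà", 7: "èje", 8: "ẹ̀jọ", 9: "ẹ̀sán", 10: "ẹ̀wá",
    20: "ogún", 30: "ọgbọ̀n", 200: "igba", 300: "ọ̀ọ́dúnrún",
    400: "irinwó", 20000: "ọ̀kẹ́",
}

# Positional words for the three arithmetic operations used in derivations.
OPERATOR_WORDS = {"+": "lé", "-": "dín", "*": "ọ̀nà"}


def basic_word(n):
    """Return the basic lexical item for n, or None if n must be derived."""
    return BASIC_NUMERALS.get(n)


print(len(BASIC_NUMERALS), "basic items; 15 is basic:", basic_word(15) is not None)
```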
  • 10. What a Yorùbá Speaker/Hearer Knows about Yorùbá Numerals: All numbers can be represented within the Yorùbá numeral system. Yorùbá numerals fall into subgroups based on their syntactic derivational rules. Linguistic skills (contraction, vowel harmony, elision and euphonic assimilation) are required to form some numerals. Numerals with a low functional load have multiple representations. The largest number with a single lexical form is 20,000 (ọ̀kẹ́), and higher numbers are derived from it. Subtraction has a heavier functional load than addition.
  • 11. Subgroups of Yorùbá Numerals: 1–14 behave as decimal; 15–199 are derived with 20 as the multiplicative base; 200–1,999 with 200; 2,000–19,999 with 2,000; and 20,000 and above with 20,000. The multiplicative bases of Yorùbá numerals are therefore 20, 200, 2,000 and 20,000, i.e. $2 \times 10^{1}$, $2 \times 10^{2}$, $2 \times 10^{3}$ and $2 \times 10^{4}$.
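The subgroup boundaries can be expressed as a small function. In this Python sketch `multiplicative_base` is a hypothetical name; it returns `None` for 1 to 14 (the decimal-like subgroup) and otherwise the base 2 × 10**k of the subgroup containing the number, trying the largest base first.

```python
from typing import Optional


def multiplicative_base(n: int) -> Optional[int]:
    """Multiplicative base of the subgroup that n falls in (None for 1-14).

    Subgroups as listed on the slide: 15-199 -> 20, 200-1999 -> 200,
    2000-19999 -> 2000, 20000 and above -> 20000.
    """
    if n < 1:
        raise ValueError("only positive whole numbers are covered")
    if n <= 14:
        return None  # 1-14 behave as decimal
    # Each base is 2 * 10**k; check k = 4, 3, 2, 1 from the largest down.
    for k in (4, 3, 2, 1):
        base = 2 * 10**k
        lower_bound = 15 if k == 1 else base
        if n >= lower_bound:
            return base
    raise AssertionError("unreachable for n >= 15")


for n in (14, 65, 565, 17565, 20000):
    print(n, multiplicative_base(n))
```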
  • 12. SY Numeral Derivation: Three of the four basic arithmetic operations (addition, subtraction and multiplication) are used to derive an infinite set of SY numerals from the sixteen vocabulary items (Ekundayo, 1977; Zaslavsky, 1999). A single number can be generated through multiple subtractions. Examples (number: Yorùbá form = derivation): 15: ẹ̀ẹ̀dógún = 20-5; 65: àrúndínláàádọ́rin = (4*20)-10-5; 565: àrúndínláàádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta = (3*200)-100+(4*20)-10-5; 17,565: àrúndínláàádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta lé lẹ̀ẹ́dẹ́gbàasán = (9*2000)-1000+(3*200)-100+(4*20)-10-5.
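The arithmetic in these examples can be checked mechanically. In the Python sketch below, each derivation is transcribed by hand into signed (multiplier × base) terms; `DERIVATIONS`, `evaluate` and `subtraction_count` are hypothetical names. Counting the negative terms also shows how the number of subtractions grows with each added level of the vigesimal structure.

```python
# Each derivation from the slide, written as (sign, multiplier, base) terms,
# e.g. (4*20)-10-5 becomes [(+1, 4, 20), (-1, 1, 10), (-1, 1, 5)].
DERIVATIONS = {
    15: [(+1, 1, 20), (-1, 1, 5)],
    65: [(+1, 4, 20), (-1, 1, 10), (-1, 1, 5)],
    565: [(+1, 3, 200), (-1, 1, 100), (+1, 4, 20), (-1, 1, 10), (-1, 1, 5)],
    17565: [(+1, 9, 2000), (-1, 1, 1000), (+1, 3, 200), (-1, 1, 100),
            (+1, 4, 20), (-1, 1, 10), (-1, 1, 5)],
}


def evaluate(terms):
    """Sum the signed products sign * multiplier * base."""
    return sum(sign * multiplier * base for sign, multiplier, base in terms)


def subtraction_count(terms):
    """Number of subtractions a derivation uses."""
    return sum(1 for sign, _, _ in terms if sign < 0)


for number, terms in DERIVATIONS.items():
    assert evaluate(terms) == number, number
    print(number, "uses", subtraction_count(terms), "subtractions")
```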
  • 13. Special Type of Subtraction: A special type of subtraction occurs when 5, 10, 100 or 1000 is subtracted. This gives rise to the eedin phenomenon: for eedin(A), we assume an implied subtraction from A as follows: (1) 5 iff A = 20 or 30; (2) 10 iff A = 60, 80, 100, …, 200; (3) 100 iff A = 600, 800, 1000, …, 2000; (4) 1000 iff A = 4000, 6000, 8000, …, 20000.
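The four cases translate directly into a lookup rule. In the Python sketch below, `eedin_subtrahend` and `eedin` are hypothetical names; the ranges and step sizes (20, 200 and 2000) are read off the four cases above, and any other A is treated as outside the rule.

```python
from typing import Optional


def eedin_subtrahend(a: int) -> Optional[int]:
    """Implied subtrahend of eedin(A), or None when the rule does not apply."""
    if a in (20, 30):
        return 5
    if 60 <= a <= 200 and a % 20 == 0:        # 60, 80, 100, ..., 200
        return 10
    if 600 <= a <= 2000 and a % 200 == 0:     # 600, 800, 1000, ..., 2000
        return 100
    if 4000 <= a <= 20000 and a % 2000 == 0:  # 4000, 6000, ..., 20000
        return 1000
    return None


def eedin(a: int) -> int:
    """Value denoted by eedin(A): A minus its implied subtrahend."""
    subtrahend = eedin_subtrahend(a)
    if subtrahend is None:
        raise ValueError(f"eedin is not defined for {a}")
    return a - subtrahend


for a in (20, 30, 80, 600, 20000):
    print(f"eedin({a}) = {a} - {eedin_subtrahend(a)} = {eedin(a)}")
```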
  • 14. Methodology: The theory, processes and computation underlying SY numerals were reviewed using the relevant literature. A computational model was formulated using an automata-theoretic approach. The model was captured as a set of pushdown automata (PDA) in JFLAP (Java Formal Languages and Automata Package), a simulation tool for experimenting with formal languages and automata theory. The model was implemented in the Python programming language. The system was evaluated using the Mean Opinion Score (MOS), in which human judges rate the system's output, as a Turing-test-style assessment of its computational intelligence.
  • 15. Software Design (flowchart). Boxes: Start; Arabic Number; decision "Is number in basic numerals?" (Yes/No); Decomposition of Number to Basic Units; Translation of Number to Yorùbá; Yorùbá Morphological Analysis; Yorùbá Numeral; Stop.
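The decision and decomposition boxes of the flowchart can be sketched as follows for numbers below 200. This is not the authors' implementation: the function names are hypothetical, translation and morphological analysis are not modelled, and the rule for choosing between addition and subtraction (units 1 to 4 are added to the lower ten, units 5 to 9 are subtracted from the next ten, and odd tens such as 50 or 70 are 10 less than a multiple of 20) is an assumption consistent with the eedin cases and the worked example for 67.

```python
# Basic numerals below 200 (slide 9): these need no decomposition.
BASIC_BELOW_200 = set(range(1, 11)) | {20, 30}


def tens_expression(ten: int) -> str:
    """Arithmetic form of a multiple of ten up to 200, with 20 as the base."""
    if ten in (10, 20, 30):
        return str(ten)
    if ten % 20 == 0:
        return f"{ten // 20}*20"
    # Odd tens (50, 70, ..., 190) are 10 less than the next multiple of 20.
    return f"{(ten + 10) // 20}*20-10"


def vigesimal_deep_structure(n: int) -> str:
    """Decompose 1 <= n <= 199 into a vigesimal arithmetic string."""
    if not 1 <= n <= 199:
        raise ValueError("this sketch only covers 1..199")
    if n in BASIC_BELOW_200:          # "Is number in basic numerals?" -> Yes
        return str(n)
    units = n % 10                    # "No" -> decompose into basic units
    if units == 0:
        return tens_expression(n)
    if units <= 4:
        return f"{tens_expression(n - units)}+{units}"
    return f"{tens_expression(n + 10 - units)}-{10 - units}"


print(vigesimal_deep_structure(67) + "#")  # '#' end marker as in the 67 example
```

For 67 this yields 4*20-10-3, the same decomposition shown on the "Processing of 67" slide.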
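  • 16. Pushdown Automata: A non-deterministic PDA is defined as a 7-tuple $(Q, \Sigma, \Gamma, q_0, F, \delta, z_0)$, where $Q$ is a finite set of states; $\Sigma$ is a finite input alphabet; $\Gamma$ is a finite stack alphabet; $z_0 \in \Gamma$ is the initial symbol on top of the stack; $q_0 \in Q$ is the initial state; $F \subseteq Q$ is the set of final states; and $\delta \subseteq \big(Q \times (\Sigma \cup \{\varepsilon, \#\}) \times \Gamma\big) \times \big(Q \times (\Gamma \cup \{\varepsilon\})\big)$ is the finite set of transitions, with $\#$ marking the end of the input string (as in 60 + 7#).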
  • 18. Processing of 67: Generate magnitude: 60 + 7#. Decompose to vigesimal: 4*20-10-3#. PDA processing of the string: ẹta dín aadín ogún ẹrin. Apply linguistic skills: ẹta dín aadín ọgọrin → ẹta dín àádọrin.
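The stack is what puts the words into Yorùbá order: the deep structure is read left to right, but the spoken form starts from the last subtrahend. The Python sketch below is a simple stack-based reordering in the spirit of the PDA step, not a full PDA or the authors' JFLAP model; `pda_reorder` and `LEXICON` are hypothetical, the lexicon holds only the three words needed for 67, and rendering a subtracted 10 as the fused form "aadín" is an assumption read off this one example. The linguistic-skills step (contraction to àádọrin) is not modelled.

```python
import re

# Hypothetical lexicon: only the entries needed for the 67 example.
LEXICON = {3: "ẹta", 4: "ẹrin", 20: "ogún"}


def pda_reorder(expr: str) -> str:
    """Push a '#'-terminated deep structure onto a stack, then pop it into words."""
    if not expr.endswith("#"):
        raise ValueError("expected the end-of-input marker '#'")
    stack = []
    for token in re.findall(r"\d+|[-+*]", expr[:-1]):
        stack.append(token)                  # push phase: input order
    words = []
    while stack:                             # '#' reached: pop in reverse order
        value = int(stack.pop())
        op = stack.pop() if stack else None  # operator that preceded this operand
        if op == "-":
            # A subtracted 10 surfaces as the fused form "aadín" (assumption).
            words.append("aadín" if value == 10 else f"{LEXICON[value]} dín")
        elif op == "+":
            words.append(f"{LEXICON[value]} lé")
        else:                                # multiplication or first operand
            words.append(LEXICON[value])
    return " ".join(words)


print(pda_reorder("4*20-10-3#"))
```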
  • 19. Example: Processing of 19669: Ekundayo (1977) presented 7 canonical representations of 19669, and we were able to produce 3 more. The forms are: (1) eedegbaawa o le ota-le-legbeta o le mesan; (2) eedegbaawa o le orin-e-legbeta din mokanlaa; (3) eedegbaawa o le oji-le-legbeta o le mokan-din-logbon; (4) eedegbaawa o le egbeta o le mokan-din-laadorin; (5) eedegbaawa o le eedegberin o din mokan-le-logbon; (6) oke o din irinwo o le mokan-din-laadorin; (7) oke o din odunrun o din mokan-le-logbon; (8) oke o din oji-le-lodunrun o le mesan; (9) egbeji-din-logorun o le mokan-din-laadorin; (10) eedegbokan-din-logorun o din mokan-le-logbon.
  • 20. Deep structure forms for 19669: (1) 19000 + 660 + 9; (2) 19000 + 680 - 11; (3) 19000 + 640 + 29; (4) 19000 + 600 + 69; (5) 19000 + 700 - 31; (6) 20000 - 400 + 69; (7) 20000 - 300 - 31; (8) 20000 - 340 + 9; (9) 19600 + 69; (10) 19700 - 31
  • 21. Surface structure forms for 19669 (a prefix such as "1000-" reads "1000 less than"): (1) 1000- 2000*10 + 200*3 + 20*3 + 9; (2) 1000- 2000*10 + 200*3 + 20*4 - (1 + 10); (3) 1000- 2000*10 + 200*3 + 20*2 + 30 - 1; (4) 1000- 2000*10 + 200*3 + 20*4 - 10 - 1; (5) 1000- 2000*10 + 200*4 - 100 - (30 + 1); (6) 20000 - 200*2 + 20*4 - 10 - 1; (7) 20000 - 300 - (30 + 1); (8) 20000 - (40 + 300) + 9; (9) 200*98 + 20*4 - 10 - 1; (10) 200*99 - 100 - (30 + 1)
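Both lists can be checked mechanically. The Python sketch below evaluates each form with a small arithmetic-only evaluator built on the standard `ast` module (used instead of `eval` so that only +, - and * on integers are accepted); `evaluate`, `DEEP` and `SURFACE` are hypothetical names. The surface forms are rewritten in ordinary infix order (the prefix "1000- 2000*10" becomes "2000*10 - 1000"), subtractions of compound amounts such as 31 = 30 + 1 are parenthesised, and item 4 is written as the expansion of the fourth deep structure, 19000 + 600 + 69.

```python
import ast
import operator

# Only these three operations occur in SY numeral derivations.
OPERATIONS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(expr: str) -> int:
    """Evaluate an integer +, -, * expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATIONS:
            return OPERATIONS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        raise ValueError(f"unsupported syntax in {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


DEEP = [
    "19000 + 660 + 9", "19000 + 680 - 11", "19000 + 640 + 29",
    "19000 + 600 + 69", "19000 + 700 - 31", "20000 - 400 + 69",
    "20000 - 300 - 31", "20000 - 340 + 9", "19600 + 69", "19700 - 31",
]

SURFACE = [
    "2000*10 - 1000 + 200*3 + 20*3 + 9",
    "2000*10 - 1000 + 200*3 + 20*4 - (1 + 10)",
    "2000*10 - 1000 + 200*3 + 20*2 + 30 - 1",
    "2000*10 - 1000 + 200*3 + 20*4 - 10 - 1",
    "2000*10 - 1000 + 200*4 - 100 - (30 + 1)",
    "20000 - 200*2 + 20*4 - 10 - 1",
    "20000 - 300 - (30 + 1)",
    "20000 - (40 + 300) + 9",
    "200*98 + 20*4 - 10 - 1",
    "200*99 - 100 - (30 + 1)",
]

for deep, surface in zip(DEEP, SURFACE):
    assert evaluate(deep) == evaluate(surface) == 19669, (deep, surface)
print("all", len(DEEP), "deep/surface pairs evaluate to 19669")
```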
  • 22. Grammar for Yorùbá Numerals: This is a slight modification of the grammar discussed by Hurford (2006), extended to accommodate subtraction: NUMBER → PHRASE (NUMBER) #addition; PHRASE → DIGIT REDUCE; PHRASE → PHRASE SUB #subtraction; PHRASE → M PHRASE #multiplication. In these rules, curly braces mean that any one of the enclosed options can be used, and parentheses mean that their content can be left out.
  • 23. Parse tree for 19669 (figure). Recoverable node labels: 1000-, 2000*10, 200*3, 10-, 20*4, -1.
  • 24. Parse tree for 19669 (figure). Recoverable node labels: 1000-, 2000*10, 40, 200*3, 1-, 30.
  • 25. Ongoing Work: Evaluation of the software is ongoing, and effort is being made to develop the software for handheld devices.
  • 26. References: Ekundayo, S. A. (1977). Vigesimal numeral derivational morphology: Yorùbá grammatical competence epitomized. Anthropological Linguistics, 19(9):436–453. Fabùnmi, F. A. (2010). Vigesimal numerals on Ifẹ̀ (Togo) and Ifẹ̀ (Nigeria) dialects of Yorùbá. Linguistik online, 43. Longe, O. (2009). A Yorùbá Decimal Number System. Bookbuilders, Ibadan.