A Number to Yorùbá Text Transcription System


Akinadé Olugbẹ́nga Oláwálé
and
Ọdẹ́jọbí Ọdẹ́túnjí Ajàdí

Computer Sci. & Engr. Department
Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀

AGIS 2011 Conference, Addis
Ababa, Ethiopia
01 December, 2011

Presentation Outline
Introduction
Numbers and Numerals
Normalisation of Numbers for TTS
Objectives
The Yorùbá numeral system
The Yorùbá numeral generation methodology
Software Implementation
Evaluation
Overview



A number is an abstract concept that is represented by symbols
within a numeral system.
A numeral system is concerned with the written representation of
spoken positive whole numbers.
The Hindu-Arabic numeral system has been adopted as a universally
accepted representation for numbers and mathematical
expressions.
The Hindu-Arabic numeral system has ten symbols: 0, 1, 2, 3, 4, 5, 6,
7, 8 and 9.



Numbers can take different formats, which include:
Cardinal Numbers: 3, 10, Ẹ̀ta, Ẹ̀wá
Ordinal Numbers: 3rd, 10th, Ìkẹta, Ìkẹwàá
Monetary Value: $300, Naira Mẹ́wàá
Phone Numbers: 234802234****
Percentage: 10%, Ìdá mẹ́ẹ̀wá nínú ọgọ́run
Yorùbá has many dialects that differ linguistically
(Fabunmi, 2010), but their speakers can all communicate in
Standard Yorùbá (SY).
Our focus is therefore to develop a system that converts cardinal
numbers to their SY lexical forms.


Normalisation of Numbers for TTS

Text normalisation is one of the important steps in high-level
speech synthesis.
Numbers, abbreviations, symbols, etc. are converted into their
lexical (textual) equivalents.
Normalisation of numbers is thus an important stage in this step.
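
As a rough illustration of where number normalisation sits in a TTS front end, the sketch below replaces every run of digits in a string with whatever a verbaliser function returns. The names normalise_numbers, toy_verbalise and TOY are hypothetical, and the toy verbaliser only knows 3 and 10; it is not the transcription model presented in these slides.

```python
import re

def normalise_numbers(text, verbalise):
    """Replace each run of digits in `text` with verbalise(int(run))."""
    return re.sub(r"\d+", lambda match: verbalise(int(match.group())), text)

# Toy lookup (assumption for illustration): only 3 and 10 are covered.
TOY = {3: "ẹ̀ta", 10: "ẹ̀wá"}

def toy_verbalise(n):
    """Return the toy lexical form, or the digits unchanged if unknown."""
    return TOY.get(n, str(n))

print(normalise_numbers("3 and 10 and 7", toy_verbalise))
```

A full system would pass a complete number-to-SY converter as the verbalise argument.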


Objectives

To specify and design a computational model to capture the
conversion of numbers to their Standard Yorùbá (SY) lexical
equivalents
To implement software for the number to SY text transcription
model designed above
To evaluate the system implemented

The Yorùbá numeral system

Yorùbá numerals are vigesimal, i.e. based on 20 (Ekundayo, 1977;
Zaslavsky, 2000; Longe, 2009).
Yorùbá has 16 basic lexical items that serve as the building
blocks (Ekundayo, 1977):
1=ọ̀kan, 2=èjì, 3=ẹ̀ta, 4=ẹ̀rin, 5=àrún, 6=ẹ̀fà, 7=èje, 8=ẹ̀jọ,
9=ẹ̀sán, 10=ẹ̀wá, 20=ogún, 30=ọgbọ̀n, 200=igba, 300=ọ̀ọ́dúnrún,
400=irínwó, 20,000=ọ̀kẹ́
Special positional words exist for addition (lé), subtraction (dín)
and multiplication (ọ̀nà).
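
One simple way to hold the sixteen basic items and the three operator words in a program is a pair of lookup tables. The sketch below is only an illustration: the names BASIC_NUMERALS, OPERATOR_WORDS and is_basic are hypothetical, and the spellings follow the list on this slide.

```python
# The sixteen basic lexical items from the slide, keyed by numeric value.
BASIC_NUMERALS = {
    1: "ọ̀kan", 2: "èjì", 3: "ẹ̀ta", 4: "ẹ̀rin", 5: "àrún",
    6: "ẹ̀fà", 7: "èje", 8: "ẹ̀jọ", 9: "ẹ̀sán", 10: "ẹ̀wá",
    20: "ogún", 30: "ọgbọ̀n", 200: "igba", 300: "ọ̀ọ́dúnrún",
    400: "irínwó", 20000: "ọ̀kẹ́",
}

# Positional words for the three arithmetic operations used in derivation.
OPERATOR_WORDS = {"+": "lé", "-": "dín", "*": "ọ̀nà"}

def is_basic(n):
    """True if n has its own lexical item and needs no derivation."""
    return n in BASIC_NUMERALS

print(len(BASIC_NUMERALS), is_basic(400), is_basic(15))
```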


What a Yorùbá Speaker/Hearer knows about Yorùbá
numerals

All numbers can be represented within the Yorùbá numeral system.
There are subgroups within the Yorùbá numerals, based on their
syntactic derivational rules.
Linguistic skills (contraction, vowel harmony, elision and euphonic
assimilation) are required to represent some numerals.
Multiple representations exist for numerals with a low
functional load.
The largest number with a single lexical item is 20,000 (ọ̀kẹ́).
Higher numbers are derived from 20,000 (ọ̀kẹ́).
Subtraction carries a heavier functional load than addition.


Subgroups of Yorùbá Numerals

1 - 14: Behave as decimal.
15 - 199: Derived with 20 as the multiplicative base.
2000 and above: Derived with 20000 as the multiplicative base.
So, from the above, the multiplicative bases of Yorùbá numerals are:
20, 200, 2000, 20000
which can be represented as: 2 × 10¹, 2 × 10², 2 × 10³, 2 × 10⁴


SY Numerals Derivation

Three of the four basic arithmetical operations (addition,
subtraction and multiplication) are employed to derive an
infinite set of SY numerals from the sixteen vocabulary items
(Ekundayo, 1977; Zaslavsky, 1999).
A single number can be generated from multiple subtractions.
Number   Yorùbá                                     Derivation
15       ẹ̀ẹ́dógún                                    20-5
65       àádọ́rin                                    (4*20)-10-5
565      àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta                     (3*200)-100+(4*20)-10-5
17,565   àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta lé lẹ̀ẹ́dẹ́gbàásán     (9*2000)-1,000+(3*200)-100+(4*20)-10-5
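
The arithmetic in the Derivation column can be checked mechanically. The sketch below writes each derivation as a list of signed terms (a representation chosen here for illustration; DERIVATIONS and count_subtractions are hypothetical names) and confirms that the terms sum to the number, while also counting how many subtractions each form uses.

```python
# Each derivation from the table as signed terms: products are added,
# subtracted amounts appear as negative numbers.
DERIVATIONS = {
    15: [20, -5],
    65: [4 * 20, -10, -5],
    565: [3 * 200, -100, 4 * 20, -10, -5],
    17565: [9 * 2000, -1000, 3 * 200, -100, 4 * 20, -10, -5],
}

def count_subtractions(terms):
    """Number of subtracted terms in a derivation."""
    return sum(1 for term in terms if term < 0)

for number, terms in DERIVATIONS.items():
    print(number, sum(terms) == number, count_subtractions(terms))
```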

Special Type of Subtraction

A special type of subtraction exists when 5, 10, 100
or 1000 is subtracted.
This brings about the eedin phenomenon.

eedin(A)
We assume an implied subtraction from A of the following
amounts:
1. 5 iff A = 20 or 30
2. 10 iff A = 60, 80, 100, ..., 200
3. 100 iff A = 600, 800, 1000, ..., 2000
4. 1000 iff A = 4000, 6000, 8000, ..., 20000
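
The four cases of eedin(A) translate directly into a small function. This is a minimal sketch of the rule as stated on the slide, reading each "..." as steps of 20, 200 and 2000 respectively (an assumption); raising ValueError for any other A is also a choice made here, not something the slide specifies.

```python
def eedin(a):
    """Return A minus the amount implied by the eedin rule for A."""
    if a in (20, 30):
        return a - 5
    if 60 <= a <= 200 and a % 20 == 0:
        return a - 10
    if 600 <= a <= 2000 and a % 200 == 0:
        return a - 100
    if 4000 <= a <= 20000 and a % 2000 == 0:
        return a - 1000
    raise ValueError(f"eedin is not defined for {a}")

print(eedin(20), eedin(80), eedin(600), eedin(20000))
```

For example, eedin(20) gives 15 and eedin(20000) gives 19000, which match the 20-5 and 2000*10 - 1000 components used elsewhere in the slides.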


Methodology
A review of the theory, process and computation underlying the SY
numerals was conducted by consulting appropriate literature.
A computational model was formulated using an automata-theory-based
approach. The proposed model was captured with a set of
Push-Down Automata (PDA) using the Java Formal Languages and
Automata Package (JFLAP), a simulation tool for experimenting
with formal languages and automata theory.
The model was implemented in the Python programming
language.
Evaluation of the system was carried out using the Mean Opinion
Score (MOS), a subjective measure in which human judges rate the
system's output.


Software Design

[Flowchart]
Start
→ Arabic Number
→ Is number in basic numerals?
   Yes → Translation: Number to Yoruba
   No → Decomposition of Number to Basic Units → Translation: Number to Yoruba
→ Morphological Analysis
→ Yoruba Numeral
→ Stop


Push Down Automata
A non-deterministic PDA is defined as a 7-tuple

(Q, Σ, Γ, q₀, F, δ, z₀)

where
Q is a finite set of states
Σ is a finite input alphabet
Γ is a finite stack alphabet
z₀ ∈ Γ is the initial symbol on top of the stack
q₀ ∈ Q is the initial state
F ⊆ Q is the set of final states
δ is the set of transitions, a finite subset of
(Q × (Σ ∪ {ε, #}) × Γ) × (Q × (Γ ∪ {ε}))
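
To make the definition concrete, here is a minimal sketch of a non-deterministic PDA simulator that explores configurations breadth-first. It is not the JFLAP model used in this work. Two assumptions go beyond the slide: a transition may replace the stack top with a string of stack symbols (the usual textbook form), and the example language, a^n b^n terminated by '#', is a hypothetical stand-in chosen only because it is small.

```python
from collections import deque

def accepts(transitions, start, start_stack, finals, word):
    """Breadth-first search over configurations (state, position, stack).

    transitions maps (state, symbol, stack_top) to a list of
    (next_state, push) pairs; symbol "" is an epsilon move and push is a
    string whose last character becomes the new stack top.
    Note: epsilon loops that grow the stack would not terminate.
    """
    seen = set()
    queue = deque([(start, 0, (start_stack,))])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(word) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[-1], stack[:-1]
        # Try an epsilon move and, if input remains, the next input symbol.
        symbols = [""] + ([word[pos]] if pos < len(word) else [])
        for sym in symbols:
            for nxt, push in transitions.get((state, sym, top), ()):
                queue.append((nxt, pos + len(sym), rest + tuple(push)))
    return False

# Hypothetical example: strings a^n b^n followed by '#', with n >= 1.
ANBN = {
    ("q0", "a", "Z"): [("q0", "ZA")],
    ("q0", "a", "A"): [("q0", "AA")],
    ("q0", "b", "A"): [("q1", "")],
    ("q1", "b", "A"): [("q1", "")],
    ("q1", "#", "Z"): [("qf", "Z")],
}

def demo(word):
    return accepts(ANBN, "q0", "Z", {"qf"}, word)

print(demo("aabb#"), demo("aab#"))
```

The '#' end marker plays the same role here as in the input strings shown for the worked example (for instance 60 + 7#).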

Processing of 67

Generate Magnitude
60 + 7#

Decompose to Vigesimal
4*20-10-3#

PDA processing of string

ẹta dín aadín ogún ẹrin

Apply linguistic skills
ẹta dín aadín ọgọrin
ẹta dín àádọrin
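
The first two steps for 67 can be sketched in code. This is an assumption-laden illustration, not the authors' algorithm: generate_magnitude and decompose_vigesimal are hypothetical names, the rule that units 1 to 4 are added while units 5 to 9 are subtracted from the next decade is inferred from the examples in these slides (15, 65, 67, 69), and the range is restricted to 40 up to 194 so that only the base 20 is involved.

```python
def generate_magnitude(n):
    """Split n into its non-zero decimal place values, largest first."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - i - 1)
            for i, d in enumerate(digits) if d != "0"]

def decompose_vigesimal(n):
    """Express n as k*20, optionally minus 10, plus or minus a unit."""
    if not 40 <= n < 195:
        raise ValueError("sketch only covers 40 to 194")
    units = n % 10
    decade = n - units
    offset = units
    if units >= 5:
        # Units 5 to 9 are counted down from the next decade.
        decade += 10
        offset = units - 10
    k, remainder = divmod(decade, 20)
    if remainder:
        # Odd decades (50, 70, ...) are the next multiple of 20 minus 10.
        k += 1
    expr = f"{k}*20"
    if remainder:
        expr += "-10"
    if offset:
        expr += f"{offset:+d}"
    return expr

print(generate_magnitude(67), decompose_vigesimal(67))
```

For 67 this yields the magnitudes 60 and 7 and the expression 4*20-10-3, the same strings shown on the slide without the '#' end marker.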


Example: Processing of 19669
Ekundayo (1977) presented 7 canonical representations for 19669;
we were able to produce 3 more. The ten forms are:
ẹẹdẹgbààwá ó lé ọta-lé-lẹgbẹta ó lé mẹsán
ẹẹdẹgbààwá ó lé ọrin-lé-lẹgbẹta ó dín mọkanlàá
ẹẹdẹgbààwá ó lé ojì-lé-lẹgbẹta ó lé mọkan-dín-lọgbọn
ẹẹdẹgbààwá ó lé ẹgbẹta ó lé mọkan-dín-laadọrin
ẹẹdẹgbààwá ó lé ẹẹdẹgbẹrin ó dín mọkan-lé-lọgbọn
ọkẹ ó dín irínwó ó lé mọkan-dín-laadọrin
ọkẹ ó dín ọdúnrún ó dín mọkan-lé-lọgbọn
ọkẹ ó dín ojì-lé-lọdúnrún ó lé mẹsán
ẹgbẹjì-dín-lọgọrún ó lé mọkan-dín-laadọrin
ẹẹdẹgbọkan-dín-lọgọrún ó dín mọkan-lé-lọgbọn

Deep structure forms for 19669

19000 + 660 + 9
19000 + 680 - 11
19000 + 640 + 29
19000 + 600 + 69
19000 + 700 - 31
20000 - 400 + 69
20000 - 300 - 31
20000 - 340 + 9
19600 + 69
19700 - 31
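
A quick check confirms that all ten deep structures evaluate to the same number. In the sketch below each form is written as a tuple of signed terms; DEEP_FORMS and check_forms are hypothetical names used only for this illustration.

```python
TARGET = 19669

# The ten deep structure forms from the slide, as signed terms.
DEEP_FORMS = [
    (19000, 660, 9),
    (19000, 680, -11),
    (19000, 640, 29),
    (19000, 600, 69),
    (19000, 700, -31),
    (20000, -400, 69),
    (20000, -300, -31),
    (20000, -340, 9),
    (19600, 69),
    (19700, -31),
]

def check_forms(forms, target):
    """Return one boolean per form: does it sum to the target?"""
    return [sum(form) == target for form in forms]

print(all(check_forms(DEEP_FORMS, TARGET)))
```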


Surface structure forms for 19669

(2000*10 - 1000) + 200*3 + 20*3 + 9
(2000*10 - 1000) + 200*3 + 20*4 - (1 + 10)
(2000*10 - 1000) + 200*3 + 20*2 + (30 - 1)
(2000*10 - 1000) + 200*3 + 20*3 + 9
(2000*10 - 1000) + (200*4 - 100) - (30 + 1)
20000 - 200*2 + (20*4 - 10 - 1)
20000 - 300 - (30 + 1)
20000 - (40 + 300) + 9
200*98 + (20*4 - 10 - 1)
(200*99 - 100) - (30 + 1)


Grammar for Yorùbá Numerals
This is a slight modification of the grammar discussed by Hurford (2006),
to accommodate subtraction:

NUMBER → PHRASE (NUMBER)         #Addition
PHRASE → DIGIT
PHRASE → {REDUCE | SUB} PHRASE   #Subtraction
PHRASE → M PHRASE                #Multiplication

Curly braces indicate that either of the enclosed options
can be used, and parentheses indicate that their content
can be left out.


Parse tree for 19669
19669 = 1000- 2000* 10 200* 3 10- 20* 4 - 1


Parse tree for 19669
19669 = 1000- 2000* 10 40 200* 3 1 - 30


Ongoing Work
Evaluation of the software is ongoing.
Effort is being made to develop the software for handheld devices.


References

Ekundayo, S. A. (1977).
Vigesimal numeral derivational morphology: Yorùbá grammatical
competence epitomized.
Anthropological Linguistics, 19(9):436–453.

Fabùnmi, F. A. (2010).
Vigesimal numerals on Ifẹ̀ (Togo) and Ifẹ̀ (Nigeria) dialects of Yorùbá.
Linguistik online, 43.

Longe, O. (2009).
A Yorùbá Decimal Number System.
Bookbuilders, Ibadan.
