Capturing Word-level Dependencies in Morpheme-based Language Modeling
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs in Ikota
1. LABORATOIRE
D'INFORMATIQUE Describing Morphologically-rich Languages using
Metagrammars: a Look at Verbs in Ikota
1 2 1
Denys Duchier , Brunelle Magnana Ekoukou , Yannick Parmentier , Simon
FONDAMENTALE
D'ORLEANS
1 2
Petitjean , Emmanuel Schang
1
LIFO, Université d’Orléans - 6, rue Léonard de Vinci 45067 Orléans Cedex 2 – France
{denys.duchier,yannick.parmentier,simon.petitjean}@univ-orleans.fr
2
LLL, Université d’Orléans - 10, rue de Tours 45067 Orléans Cedex 2 – France
{brunelle.magnana-ekoukou,emmanuel.schang}@univ-orleans.fr
Purpose
• Provide a formal description of the morphology of verbs in Ikota
• Automatically derive from this description a lexicon of inflected forms
• Lexicalized wide-coverage tree-grammars for natural languages are very large and extremely resource intensive to develop and maintain
• They can be automatically produced by software from a highly modular formal description called a metagrammar, introduced by [1]
• Metagrammars are much easier to develop and to maintain
• We propose to adopt a similar strategy to capture morphological generalizations over verbs in Ikota
Ikota language eXtensible MetaGrammar
Ikota is a Bantu language of Gabon and the Democratic XMG is normally used to describe lexicalized tree grammars. In other words, an XMG specification is a declarative description
Republic of Congo. It is threatened with extinction in of the tree-structures composing a grammar. This description relies on four main concepts:
Gabon, mainly because of its abandon for French. It • abstraction: the ability to associate a content with a name
shares many grammatical features with the Bantu lan- • contribution: the ability to accumulate information in any level of linguistic description
guages. Ikota is a tonal language with two registers (High • conjunction: the ability to combine pieces of information
and Low). It has ten noun classes and has a widespread • disjunction: the ability to non-deterministically select pieces of information
agreement in the NP.
Table: Ikota’s noun classes
Formally, one can define an XMG specification as follows:
Noun class prefix allomorphs
CL 1 mò-, Ø- `
mw-, n- Rule := Name → Content
CL 2 bà- b- Content := Contribution | Name | Content ∨ Content | Content ∧ Content
CL 3 mò-, Ø- `
mw-, n-
CL 4 mè-
CL 5 ì-, Ã- dy-
Metagrammar of Ikota verbal morphology
CL 6 mà- m-
CL 7 è-
CL 8 bè- Subj → 1 ← m ∨ 1 ← o ∨ ...
`
CL 9 Ø- p=1 p=2
CL 14 ò-, bò- bw- n = sg n = sg
Tense → 2 ← ´e ∨ 2 ← ´e ∨ 2 ← a
` ∨ 2 ← a` ∨ 2 ← ab´
´ ı
Verbs in Ikota tense = past tense = future tense = present tense = past tense = future
proxi = near proxi = ¬near proxi = imminent
Verbs consist of a lexical root (VR) and several affixes Active → 5 ← À ∨ 5 ← Á ∨ 4 ← ´bw` e E
distributed on each side of the VR. active = + active = + active = -
prog = - prog = +
Table: Verb formation 4 ← ÀK
Aspect → ∨
Subj- Tense- VR -(Aspect) -Active -(Proximal) tense = future tense = ¬future
prog = - prog = +
Table: Verbal forms of bòÃákà "to eat" Proximal → 6 ← nÁ ∨ 6 ← sÁ ∨ ∨
Subj. Tense VR Aspect Active Prox. Value proxi = day proxi = far proxi = none ∨ near proxi = imminent
tense = future
m- à- Ã -á present To-Eat → 3 ← Ã
m- à- Ã -á -ná past, yesterday vclass = g1
m- à- Ã -á -sá distant past To-Give → 3 ← w
m- é- Ã -á recent past vclass = g2
m- é- Ã -àk -à medium future VR → To-Eat ∨ To-Give
m- é- Ã -àk -à -ná future, tomorrow Verb → Subj ∧ Tense ∧ VR ∧ Aspect ∧ Active ∧ Proximal
m- é- Ã -àk -à -sá distant future
m- ábí- Ã -àk -à imminent future
Figure: A successful derivation for : o+´+Ã+ÀK+À+nÁ → o+´+Ã+ak+a+n´ → oÃ`k`n´
` e ` e ` ` a ´ a a a
The infinitival suffix aka at the lexical level can be re-
alized at the surface level by ákà, ´Ù`, ´k`, depending on
E E O O
Verb → Subj ∧ Tense ∧ VR ∧ Aspect ∧ Active ∧ Proximal
the verb class.
→ 1 ← o ∧` 2 ← ´
e ∧ 3 ← Ã ∧ 4 ← ÀK ∧ 5 ← À ∧ 6 ← nÁ
p=2 tense = future vclass = g1 tense = future active = + proxi = day
References n = sg prog = - prog = -
→ 1 ← o 2 ← ´ 3 ← Ã 4 ← ÀK 5 ← À 6 ← nÁ
` e
[1] Marie Candito, A Principle-Based Hierarchical Representation of p=2 prog = - tense = future vclass = g1
LTAGs, Proceedings of the 16th International Conference on Com- n = sg active = + proxi = day
putational Linguistics (COLING’96)
[2] Magnana Ekoukou, Brunelle, Morphologie nominale de l’ikota Figure: A failed derivation: clashes on tense and on prog
(B25): inventaire des classes nominales Mémoire de Master 2,
Université d’Orléans,2010
Verb → Subj ∧ Tense ∧ VR ∧ Aspect ∧ Active ∧ Proximal
[3] Piron, Pascale, Éléments de description du kota, langue bantoue
du Gabon, mémoire de licence spéciale africaine, Université Libre → 1 ← o ∧
` 2 ← ´e ∧ 3 ← Ã ∧ ∧ 5 ← À ∧ 6 ← nÁ
de Bruxelles, 1990 p=2 tense = future vclass = g1 tense = ¬future active = + proxi = day
n = sg prog = + prog = -
→ failure!
Conclusion
• We proposed a formal, albeit preliminary, declarative description of verbal morphology in Ikota, an arguably minority African language
• From this formal description, using XMG, we are able to automatically produce a lexicon of fully inflected verb forms with morphosyntactic features.
• Methodology : XMG helps to express ideas, test them quickly, and then validate the results against the available data
• Using the same tool, we will be able to also describe the syntax of the language using e.g. tree-adjoining grammars (the topic of an ongoing PhD thesis)
D. Duchier, B. Magnana Ekoukou, Y. Parmentier, S. Petitjean, E. Schang Describing Morphologically-rich Languages using Metagrammars: a Look at Verbs in Ikota. In: LREC-2012: SALTMIL-AfLaT Workshop on “Language technology for normalisation of less-resourced languages”