Paper introduction (2016/07/22)
Nagaoka University of Technology, Natural Language Processing Laboratory
Presenter: LY NAM PHONG
Finite-State Description of Vietnamese Reduplication
Proceedings of the 7th Workshop on Asian Language Resources, ACL-IJCNLP 2009, pages 63–69, Suntec, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP
Abstract
● Presents the first computational model of reduplication in Vietnamese.
● Reduplication is a common phenomenon in Vietnamese.
● Examples of reduplicative words: hao hao (a little similar), sừng sững (high and majestic), ...
Introduction
● Finite-state technology has been applied successfully to describing the morphological processes of many natural languages.
● However, it is less suitable for the non-concatenative phenomena found in some languages.
● Reduplication is a common phenomenon in many Asian languages.
Vietnamese lexicon
● Language type:
– Vietnamese is an isolating language, characterized by the following properties:
● It is a monosyllabic language.
● Word forms never change.
● Vocabulary:
– Simple words, which are monosyllabic.
– Compound words, formed by semantic coordination or by semantic subordination.
– Reduplicative words, formed by phonetic reduplication.
– Complex words, phonetically transcribed from foreign languages.
Length    # words    %
1         6303       15.69
2         28416      70.72
3         2259       5.62
4         2784       6.93
>=5       419        1.04
Total     40181      100
Table 1: Length of words measured in syllables
(The Vietnamese lexicon edited by Vietlex)
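Each percentage in Table 1 is a word count divided by the 40181 entries of the lexicon. The short Python check below recomputes the column from the counts on the slide; the variable names are illustrative only.

```python
# Word counts by length in syllables, copied from Table 1 (Vietlex lexicon).
counts = {"1": 6303, "2": 28416, "3": 2259, "4": 2784, ">=5": 419}

total = sum(counts.values())  # should equal the table's total of 40181
shares = {length: round(100 * n / total, 2) for length, n in counts.items()}

print(total)   # 40181
print(shares)  # {'1': 15.69, '2': 70.72, '3': 5.62, '4': 6.93, '>=5': 1.04}
```

The recomputed shares match the table, and two-syllable words make up about 71% of the lexicon.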
Vietnamese lexicon
● Syllables
– In addition to being monosyllabic, Vietnamese is a tonal language.
– Table 2: Vietnamese tones
No.   Tone             Notation
1.    Low falling      à
2.    Creaky rising    ã
3.    Creaky falling   ạ
4.    Mid level        a
5.    Dipping          ả
6.    High rising      á
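The tone marks in Table 2 correspond to Unicode combining characters, so a syllable's tone can be read off after canonical decomposition (NFD). The Python sketch below assumes syllables are written with standard Unicode diacritics; the function name tone_of is hypothetical.

```python
import unicodedata

# Combining tone marks mapped to the tone names of Table 2
# (Vietnamese tone names in the comments).
TONE_MARKS = {
    "\u0300": "Low falling",     # grave, huyền: à
    "\u0303": "Creaky rising",   # tilde, ngã: ã
    "\u0323": "Creaky falling",  # dot below, nặng: ạ
    "\u0309": "Dipping",         # hook above, hỏi: ả
    "\u0301": "High rising",     # acute, sắc: á
}

def tone_of(syllable: str) -> str:
    """Return the Table 2 tone name of one Vietnamese syllable."""
    # NFD splits letters such as "ữ" into base + combining marks, so the
    # tone mark is found even when the vowel also has a horn or circumflex.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "Mid level"  # no tone mark: level tone (ngang)

for s in ["sừng", "sững", "đỏ", "lăm"]:
    print(s, tone_of(s))
```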
Reduplication in Vietnamese
● The study recognizes three types of reduplication: full reduplication, reduplication with a tone change, and reduplication with a final-consonant change.
– Full reduplication: đỏ đỏ, lăm lăm, ...
– Tone change: sừng sững, đo đỏ, hơ hớ, ...
– Final-consonant change: cầm cập, thiêm thiếp, ...
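The three types can be told apart by comparing the two syllables of a word. The Python sketch below treats the second syllable as the root, as in the examples above, and assumes two rules read off those examples rather than taken from the paper: a root with sắc or hỏi takes a level-tone reduplicant, a root with ngã or nặng takes a huyền reduplicant, and a stop-final root (p, t, c, ch) pairs with a nasal-final reduplicant (m, n, ng, nh). All names are hypothetical.

```python
import unicodedata

# Assumed tone rule: root tone mark -> reduplicant tone mark ("" = level tone).
TONE_RULE = {"\u0301": "", "\u0309": "", "\u0303": "\u0300", "\u0323": "\u0300"}
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}
# Assumed final alternation: root final stop -> reduplicant final nasal.
FINALS = {"p": "m", "t": "n", "c": "ng", "ch": "nh"}

def split_tone(syllable):
    """Return (syllable without its tone mark, tone mark or '')."""
    chars = unicodedata.normalize("NFD", syllable)
    tone = next((c for c in chars if c in TONE_MARKS), "")
    bare = "".join(c for c in chars if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", bare), tone

def reduplication_type(reduplicant, root):
    """Classify 'reduplicant root' as type 1, 2 or 3, or None if no rule fits."""
    if reduplicant == root:
        return 1                          # full reduplication
    red_bare, red_tone = split_tone(reduplicant)
    root_bare, root_tone = split_tone(root)
    if root_tone not in TONE_RULE or TONE_RULE[root_tone] != red_tone:
        return None                       # tones do not follow the assumed rule
    if red_bare == root_bare:
        return 2                          # same segments, tone changed
    for stop, nasal in FINALS.items():
        if root_bare.endswith(stop) and red_bare == root_bare[: -len(stop)] + nasal:
            return 3                      # final stop -> nasal, tone changed
    return None

for word in ["đỏ đỏ", "sừng sững", "đo đỏ", "cầm cập", "thiêm thiếp", "đi học"]:
    print(word, reduplication_type(*word.split()))
```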
Implementation
● They implemented finite-state sequential transducers (FSTs) that can recognize and generate each type of reduplicative word.
● To obtain a sequential FST that recognizes all three types of reduplication above, they constructed three transducers, one per type.
First type transducer
● They construct a deterministic FST f1 that produces reduplicants from their roots; the output label on each arc is identical to its input character.
● Example: a minimal FST that recognizes and generates three words: luôn luôn (always), lừ lừ (silently), khàn khàn (raucous).
● The f1 that recognizes all 274 reduplicative words of the first type has 90 states.
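A minimal Python sketch of the idea behind f1, assuming the transducer is built as a trie over a list of roots; the names build_f1 and apply_f1 are hypothetical. Each arc copies its input character, so the reduplicant equals the root. The trie is deterministic but not minimized: the paper's 90-state f1 presumably comes from merging equivalent states, which this sketch omits.

```python
import unicodedata

def build_f1(roots):
    """Build a trie-shaped deterministic FST whose arcs copy their input.

    Transitions: (state, input_char) -> (next_state, output_char).
    State 0 is the start state.
    """
    delta, finals, n_states = {}, set(), 1
    for root in roots:
        state = 0
        for ch in unicodedata.normalize("NFC", root):
            if (state, ch) not in delta:
                delta[(state, ch)] = (n_states, ch)  # output label == input label
                n_states += 1
            state = delta[(state, ch)][0]
        finals.add(state)
    return delta, finals, n_states

def apply_f1(fst, root):
    """Return 'reduplicant root' if the root is accepted, else None."""
    delta, finals, _ = fst
    root = unicodedata.normalize("NFC", root)
    state, output = 0, []
    for ch in root:
        if (state, ch) not in delta:
            return None        # no arc: not a known root
        state, out = delta[(state, ch)]
        output.append(out)
    if state not in finals:
        return None            # only a prefix of a known root
    return "".join(output) + " " + root

f1 = build_f1(["luôn", "lừ", "khàn"])
print(f1[2])                   # 10 states in the unminimized trie
print(apply_f1(f1, "lừ"))      # lừ lừ
print(apply_f1(f1, "lu"))      # None
```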
Second type transducer
● Similarly, an FST f2 generates three reduplicative words: giông giống (a little similar), đằng đẵng (interminable) and đăm đắm (fixedly).
● The f2 that recognizes all 307 reduplicative words of the second type has 93 states.
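f2 can be sketched the same way, except that the arc for the tone-bearing vowel outputs that vowel with a different tone. The Python below assumes the tone rule that the slide's examples follow (sắc or hỏi root → level-tone reduplicant; ngã or nặng root → huyền reduplicant); the rule, the trie construction and the names retone, build_f2 and apply_f2 are assumptions for illustration, not the paper's definitions.

```python
import unicodedata

# Assumed tone rule (inferred from the examples, not quoted from the paper):
# sắc (acute) / hỏi (hook above) -> no mark; ngã (tilde) / nặng (dot below) -> huyền (grave).
TONE_RULE = {"\u0301": "", "\u0309": "", "\u0303": "\u0300", "\u0323": "\u0300"}

def retone(ch):
    """Output label for one input character: its tone changed by TONE_RULE."""
    parts = list(unicodedata.normalize("NFD", ch))
    for i, mark in enumerate(parts):
        if mark in TONE_RULE:
            del parts[i]
            end = i
            # Place the new mark after any remaining marks (e.g. a circumflex
            # that followed a dot below) so that NFC recomposes it correctly.
            while end < len(parts) and unicodedata.combining(parts[end]):
                end += 1
            if TONE_RULE[mark]:
                parts.insert(end, TONE_RULE[mark])
            return unicodedata.normalize("NFC", "".join(parts))
    return ch  # characters without a tone mark are copied unchanged

def build_f2(roots):
    """Trie-shaped deterministic FST: (state, char) -> (next_state, retone(char))."""
    delta, finals, n_states = {}, set(), 1
    for root in roots:
        state = 0
        for ch in unicodedata.normalize("NFC", root):
            if (state, ch) not in delta:
                delta[(state, ch)] = (n_states, retone(ch))
                n_states += 1
            state = delta[(state, ch)][0]
        finals.add(state)
    return delta, finals

def apply_f2(fst, root):
    """Return 'reduplicant root' if the root is accepted, else None."""
    delta, finals = fst
    root = unicodedata.normalize("NFC", root)
    state, output = 0, []
    for ch in root:
        if (state, ch) not in delta:
            return None
        state, out = delta[(state, ch)]
        output.append(out)
    return "".join(output) + " " + root if state in finals else None

f2 = build_f2(["giống", "đẵng", "đắm"])
for root in ["giống", "đẵng", "đắm"]:
    print(apply_f2(f2, root))  # giông giống / đằng đẵng / đăm đắm
```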
Third type transducer
● Example: an FST f3 recognizing four reduplicative words: biêng biếc (bluish green), biền biệt (leave behind no traces whatsoever), bình bịch (a series of thudding blows), bôm bốp (pop pop).
● The f3 that recognizes all 232 reduplicative words of the third type has 59 states.
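In the third type, the root's final stop becomes the nasal at the same place of articulation (p→m, t→n, c→ng, ch→nh in the examples above), and the tone changes as in the second type. Vietnamese syllables ending in a stop carry only sắc or nặng, so only those two tones need a rule. Rather than a full FST, the Python sketch below applies the same mapping as direct string rewriting; the alternation table is inferred from the examples and the function name is hypothetical.

```python
import unicodedata

# Root final stop -> reduplicant final nasal (inferred from the examples).
FINAL_ALTERNATION = {"ch": "nh", "p": "m", "t": "n", "c": "ng"}
ACUTE, DOT_BELOW, GRAVE = "\u0301", "\u0323", "\u0300"

def reduplicate_type3(root):
    """Return 'reduplicant root' for a stop-final root, or None if no rule applies."""
    root = unicodedata.normalize("NFC", root)
    for stop, nasal in FINAL_ALTERNATION.items():
        if root.endswith(stop):
            stem = root[: -len(stop)] + nasal
            break
    else:
        return None                  # root does not end in p, t, c or ch
    chars = list(unicodedata.normalize("NFD", stem))
    if ACUTE in chars:               # sắc -> level tone: drop the mark
        chars.remove(ACUTE)
    elif DOT_BELOW in chars:         # nặng -> huyền: replace dot below with grave
        i = chars.index(DOT_BELOW)
        del chars[i]
        while i < len(chars) and unicodedata.combining(chars[i]):
            i += 1                   # keep the grave after e.g. a circumflex
        chars.insert(i, GRAVE)
    else:
        return None                  # stop-final syllables carry only sắc or nặng
    return unicodedata.normalize("NFC", "".join(chars)) + " " + root

for root in ["cập", "thiếp", "biếc", "biệt", "bịch", "bốp"]:
    print(reduplicate_type3(root))
```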
A Software Package
● They developed a Java package named vnReduplicator that can recognize a substantial number of the bisyllabic reduplicative words found in Vietnamese.
● Example input text: Anh đi biền biệt. Cô vẫn chờ anh hơn 20 năm đằng đẵng. (He went away without a trace. She has still been waiting for him for more than 20 interminable years.)
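The slide does not say how vnReduplicator scans running text. As an illustration only, the Python sketch below checks adjacent syllable pairs within each sentence against a set of known reduplicative words standing in for the output of f1, f2 and f3; the lexicon and the function name are hypothetical.

```python
import re
import unicodedata

# Hypothetical lexicon standing in for the words accepted by f1, f2 and f3.
LEXICON = {unicodedata.normalize("NFC", w) for w in
           ["luôn luôn", "lừ lừ", "đằng đẵng", "giông giống", "biền biệt", "cầm cập"]}

def find_reduplicatives(text, lexicon=LEXICON):
    """Return known bisyllabic reduplicative words found in text, left to right."""
    # NFC keeps each letter and its diacritics as one \w character.
    text = unicodedata.normalize("NFC", text)
    found = []
    for sentence in re.split(r"[.!?]", text):   # never pair across sentences
        syllables = re.findall(r"\w+", sentence.lower())
        i = 0
        while i < len(syllables) - 1:
            pair = syllables[i] + " " + syllables[i + 1]
            if pair in lexicon:
                found.append(pair)
                i += 2                          # skip both syllables of the match
            else:
                i += 1
    return found

print(find_reduplicatives("Anh đi biền biệt. Cô vẫn chờ anh hơn 20 năm đằng đẵng."))
# ['biền biệt', 'đằng đẵng']
```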
Conclusion
● Presented, for the first time, a computational model of Vietnamese reduplication.
● The current work does not handle partial reduplication, in which either the onset is repeated or the rhyme and the tone of the syllables are repeated, for example bồng bềnh (bob), chúm chím (open one's lips slightly)…
