Korea - SSML_workshop.ppt
1. 1
W3C Workshop on Internationalizing SSML
SSML Extension for Korean
Workshop: 2005/11/02 (Wed)
Sang-Jin Kim
sangjin@icu.ac.kr
2. 2
Contents
Characteristics of Korean
SSML Extension for Chinese Characters in Korean
SSML Extension for Homograph Words in Korean
Conclusion
3. 3
Characteristics of Korean
Hangul, the Korean alphabet
Consists of forty letters: 21 vowels (including 13 diphthongs) and 19 consonants
Syllable structure: V, CV, VC, or CVC (C: consonant, V: vowel)
Eojeol, the space-delimited word phrase, is different from a phrase in English
Completely different from Japanese except for grammatical structure
Completely different from Chinese, although Korean has borrowed many Chinese words and some Chinese characters
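The syllable shapes above map directly onto how Hangul is encoded in Unicode: every precomposed syllable block is computed from one of 19 initial consonants (the same 19 consonant letters listed above), one of the 21 vowels, and an optional final consonant (27 final forms, including clusters such as ㄳ). The Python sketch below follows the Unicode Standard's Hangul syllable arithmetic to split a syllable into its letters and label it V, CV, VC, or CVC. Treating an initial ㅇ as a silent placeholder is what makes 아 a bare V syllable; the helper names are illustrative only.

```python
# Hangul syllable arithmetic as defined in the Unicode Standard (Conjoining Jamo Behavior).
S_BASE = 0xAC00                        # first precomposed syllable, 가
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # initials, vowels, finals (final index 0 = no final)
N_COUNT = V_COUNT * T_COUNT             # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT             # 11172 precomposed syllables in total

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
SILENT_INITIAL = INITIALS[11]           # ㅇ in initial position is not pronounced


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, vowel, final) letters."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    l_index, rest = divmod(index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return INITIALS[l_index], VOWELS[v_index], FINALS[t_index]


def syllable_shape(syllable: str) -> str:
    """Classify a syllable as V, CV, VC, or CVC; a silent initial ㅇ counts as no consonant."""
    initial, _vowel, final = decompose(syllable)
    onset = "" if initial == SILENT_INITIAL else "C"
    coda = "C" if final else ""
    return onset + "V" + coda


for s in "아가안한":
    print(s, decompose(s), syllable_shape(s))
```

For example, 한 splits into ㅎ, ㅏ, ㄴ and is classified as CVC, while 가 has no final and is CV.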
4. 4
Vowels in Hangul, the Korean alphabet
Monophthong vowels classified according to tongue position and height
5. 5
Consonants in Hangul, the Korean alphabet
Consonants classified according to place and manner of articulation
6. 6
SSML Extension for Chinese Characters in Korean
Chinese Characters in Korean
Present-day Korean and Japanese use many Chinese characters
However, the characters are pronounced differently in each language
The same character may also be written in a different form depending on the country
Simplified character forms are not used in Korea, which keeps the traditional forms
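To make the point concrete, the sketch below records three characters in the form and reading used in Korea, Japan, and mainland China, then folds a Japanese or simplified form back to the traditional form used in Korea before looking up its Korean reading. The table is a small hand-picked sample, and the folding step is an assumption about how a Korean TTS front end might treat foreign variant forms; the slides do not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    form: str     # glyph as written in that country
    reading: str  # pronunciation there (Hangul, kana, or pinyin)


# Hand-picked sample (not an official table), keyed by the traditional form used in Korea.
CHARACTERS = {
    "氣": {"ko": Variant("氣", "기"), "ja": Variant("気", "き"), "zh": Variant("气", "qì")},
    "學": {"ko": Variant("學", "학"), "ja": Variant("学", "がく"), "zh": Variant("学", "xué")},
    "國": {"ko": Variant("國", "국"), "ja": Variant("国", "こく"), "zh": Variant("国", "guó")},
}

# Reverse index: any national form -> the form used in Korea.
TO_KOREAN_FORM = {
    variant.form: korean_form
    for korean_form, variants in CHARACTERS.items()
    for variant in variants.values()
}


def korean_reading(char: str) -> str:
    """Fold a Japanese or simplified Chinese form to the Korean form and return its Korean reading."""
    korean_form = TO_KOREAN_FORM.get(char)
    if korean_form is None:
        raise KeyError(f"{char!r} is not in the sample table")
    return CHARACTERS[korean_form]["ko"].reading


for char in "気气学國":
    print(char, "->", korean_reading(char))
```

The same glyph 学 is shared by Japanese and simplified Chinese, yet it is read がく in Japan, xué in China, and 학 in Korea (written 學).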
7. 7
Chinese Characters in Korean
Text can be written entirely in Korean characters (Hangul)
It is also not unusual to mix in Chinese characters (Hanja)
The two spellings are pronounced exactly the same
8. 8
Chinese Characters in Korean TTS
The input text for a text-to-speech (TTS) system has to be converted into a sequence of phonetic symbols
If Chinese characters are mixed with Korean characters, they have to be replaced with their Korean (Hangul) readings
Korean does not use all Chinese characters; instead, the Korean government recommends a list of frequently used Chinese characters, which contains 2,000 characters
A Korean TTS system needs to use this list and the Korean pronunciations of its characters, since they differ from the Chinese and Japanese pronunciations
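A minimal sketch of the substitution step, assuming a dictionary built from the government-recommended list. Only six entries are shown as a hypothetical excerpt, and `HANJA_READINGS` and `hanja_to_hangul` are illustrative names. Chinese characters are detected by the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); anything missing from the dictionary is passed through unchanged so that a later stage could flag it.

```python
import re

# Hypothetical excerpt of the recommended frequently-used-character list, with Korean readings.
HANJA_READINGS = {"大": "대", "韓": "한", "民": "민", "國": "국", "學": "학", "校": "교"}

# Basic CJK Unified Ideographs block; Hangul syllables (U+AC00 to U+D7A3) fall outside it.
CJK_IDEOGRAPH = re.compile(r"[\u4e00-\u9fff]")


def hanja_to_hangul(text: str) -> str:
    """Replace each known Chinese character with its Korean reading; leave unknown ones as-is."""
    return CJK_IDEOGRAPH.sub(lambda m: HANJA_READINGS.get(m.group(0), m.group(0)), text)


print(hanja_to_hangul("大韓民國 만세"))
print(hanja_to_hangul("學校에 간다"))
```

One reading per character is a simplification: some characters change reading with position, for example 女 is read 녀 in 男女 (남녀) but 여 in 女子 (여자) under the initial-sound rule, so a real system would also need word-level entries.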
9. 9
SSML Extension for Chinese Characters in Korean
The same Chinese characters are pronounced differently depending on the country, so each lexicon is labeled with an xml:lang value
<lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
<lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
<lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
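One way a synthesizer could consume the proposed markup is to group the declared lexicons by their xml:lang value, so that, for instance, Chinese characters in Korean text are looked up only in the `ko-CN` lexicons. The sketch below uses Python's standard `xml.etree.ElementTree` to do that grouping; how each group is then consulted is left open, as in the proposal. The `xml:lang` attribute on `<lexicon>` is the proposed extension, not part of SSML 1.0, and `cn-KR` is kept as proposed even though the BCP 47 primary subtag for Chinese is `zh`.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # ElementTree's name for xml:lang

# The proposed markup: xml:lang on <lexicon> is the extension under discussion.
DOCUMENT = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ko">
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
  <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
  <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
  大韓民國
</speak>"""


def lexicons_by_language(ssml: str) -> dict[str, list[str]]:
    """Group lexicon URIs by xml:lang; a lexicon without one inherits the <speak> language."""
    root = ET.fromstring(ssml)
    default_lang = root.get(XML_LANG)
    groups: dict[str, list[str]] = {}
    for lexicon in root.iter(f"{{{SSML_NS}}}lexicon"):
        lang = lexicon.get(XML_LANG, default_lang)
        groups.setdefault(lang, []).append(lexicon.get("uri"))
    return groups


for lang, uris in lexicons_by_language(DOCUMENT).items():
    print(lang, uris)
```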
10. 10
SSML Extension for Homograph Words in Korean
Homograph Words in Korean
Same spelling, different pronunciation, different meaning
The difference is in duration (vowel length)
11. 11
SSML Extension for Homograph Words in Korean
The only difference between these words is the duration of their pronunciation
It is therefore necessary to give duration information to a TTS system for such words
The SSML recommendation provides the “say-as” and “sub” elements, but neither can handle this problem successfully
12. 12
SSML Extension for Homograph Words in Korean
We suggest a “tone” element for this problem
The values ‘long’, ‘short’, and ‘default’ for its type attribute would be enough for Korean
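The proposed markup could be produced and read back as in the Python sketch below, which wraps each homograph in a `<tone>` element with a `type` attribute and then extracts the annotations, as a synthesizer front end might before adjusting vowel duration. The element and its values come from this proposal and are not part of SSML 1.0; the helper names `tone` and `vowel_lengths` are hypothetical. The example uses 눈, which means "snow" when its vowel is long and "eye" when it is short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"
TONE_TYPES = ("long", "short", "default")  # values proposed for Korean


def tone(word: str, length: str) -> str:
    """Wrap one homograph in the proposed <tone> element (hypothetical, not SSML 1.0)."""
    if length not in TONE_TYPES:
        raise ValueError(f"unsupported tone type: {length!r}")
    return f'<tone type="{length}">{escape(word)}</tone>'


def vowel_lengths(ssml: str) -> list[tuple[str, str]]:
    """Return (word, type) for every <tone> element in document order; a missing type means default."""
    root = ET.fromstring(ssml)
    return [(el.text or "", el.get("type", "default")) for el in root.iter(f"{{{SSML_NS}}}tone")]


# "눈이 눈에 들어갔다" (snow got into my eye): the first 눈 is long (snow), the second short (eye).
doc = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ko">'
    f'{tone("눈", "long")}이 {tone("눈", "short")}에 들어갔다.'
    "</speak>"
)
print(vowel_lengths(doc))
```

Because the spelling is identical, only the type attribute tells the synthesizer which duration to use.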
13. 13
Conclusion
SSML Extension for Chinese Characters in Korean
The lexicon element does not support the “xml:lang” attribute
We suggest the values xml:lang=“ko”, xml:lang=“ko-CN”, xml:lang=“ja-KR”, and xml:lang=“cn-KR”
SSML Extension for Homograph Words in Korean
The “say-as” and “sub” elements cannot handle the homograph problem successfully
We suggest a “tone” element
Its type attribute values, type=“long”, type=“short”, and type=“default”, would be enough for Korean