Poster presentation for ICPhS 2019
"Prosody-semantic interface in Seoul Korean: Corpus for a Disambiguation of Wh- Intervention"
Presented by Jeonghwa Cho @Melbourne, Australia
Prosody-Semantic Interface in Seoul Korean:
Corpus for a Disambiguation of Wh- Intervention
Won Ik Cho¹*, Jeonghwa Cho²*, Jeemin Kang², Nam Soo Kim¹
Department of Electrical and Computer Engineering and INMC¹,
Department of English Language and Literature²,
Seoul National University, Seoul, Korea
E-mail: wicho@hi.snu.ac.kr, {jeong9793, bling1104, nkim}@snu.ac.kr
Introduction
- Korean as a head-final and wh-in-situ language
• Sentence type is decided by sentence-final intonation for
some sentence enders (declarative vs. interrogative)
• Accent (pitch and duration) around wh-particles sometimes
makes them in-situ (as an existential quantifier)
• Overall prosody determines the presence of rhetoricalness
for some questions/commands
- ...And so far in corpus-based approaches?
• Few studies have comprehensively analyzed ambiguity
across various intention types (other than questions)
• Few corpora are dedicated to handling ambiguous
utterances
• ...Thus, a rising demand for a corpus on sentence
ambiguity in the areas of prosody-semantics, L2 acquisition
& computational linguistics
- How about constructing a corpus that contains ONLY the
utterances whose syntactic ambiguity can be resolved
by introducing prosody?
Corpus Generation
- Wh- particles
• 누구 (nwukwu, who), 뭐 (mwe, what),
어디 (eti, where), 언제 (encey, when),
어떻게 (ettehkey, how), 몇 (myech, how much)
• 왜 (way, why) was not utilized because it is not used as an
existential quantifier
- Predicates
• Depend on the wh- particle being adopted
• Chosen from 5,800 frequently used lexical items
• Pronouns and polarity items were added in some cases
- Reportive particles
• Added to form an evidential mood
• Induce rhetoricalness for some questions
- Sentence enders (SEs)
• SEs with an unfixed role (underspecified SE)
• e.g. -래 (ray), -어 (e), -지 (ci)
• SEs with a fixed role
• e.g. -까 (kka: interrogative)
- Politeness suffix
• Attached at the end of a sentence to assign politeness
• Restricts rhetoricalness under some circumstances
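The components listed above can be gathered into a small inventory. The following is a minimal Python sketch, not the authors' generation tool: the names `WH_PARTICLES`, `EXCLUDED_WH`, `SENTENCE_ENDERS`, and `is_underspecified` are hypothetical, and only the items named on the poster are included.

```python
# Illustrative inventory of the generation components listed above.
# All identifiers are hypothetical; entries are limited to those on the poster.

# wh- particle -> (romanization, English gloss)
WH_PARTICLES = {
    "누구": ("nwukwu", "who"),
    "뭐": ("mwe", "what"),
    "어디": ("eti", "where"),
    "언제": ("encey", "when"),
    "어떻게": ("ettehkey", "how"),
    "몇": ("myech", "how much"),
}

# 왜 (way, why) is left out because it is not used as an existential quantifier.
EXCLUDED_WH = {"왜"}

# sentence ender -> fixed role, or None when the role is underspecified
SENTENCE_ENDERS = {
    "래": None,             # -래 (ray)
    "어": None,             # -어 (e)
    "지": None,             # -지 (ci)
    "까": "interrogative",  # -까 (kka)
}


def is_underspecified(ender: str) -> bool:
    """Return True if the sentence ender has no fixed sentence-type role."""
    return SENTENCE_ENDERS[ender] is None
```

Underspecified enders are the ones for which prosody has to carry the sentence type, which is why they matter for the corpus.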
Tagging Intention
- Intention as an extended discourse component:
generation/tagging done by 3 Korean native speakers (by
unanimous consent)
• Statement (e.g., Somebody is going to bring it)
• Yes/no question (e.g., Did somebody tell you to bring it?)
• Wh- question (e.g., Who's going to bring it?)
• Rhetorical question (e.g., Who says I'll bring it?)
• Command (e.g., Somebody told you to bring it)
• Request (e.g., Can somebody bring it?)
• Rhetorical command (e.g., See who brings it)
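One way to picture tagging by unanimous consent is to keep, for each sentence, only the intention readings that every annotator accepts. The sketch below assumes this set-intersection reading of "unanimous consent"; the poster does not spell out the procedure, and `Intention` and `agreed_intentions` are hypothetical names.

```python
from enum import Enum
from typing import List, Set


class Intention(Enum):
    """The seven intention types listed above."""
    STATEMENT = "statement"
    YES_NO_QUESTION = "yes/no question"
    WH_QUESTION = "wh- question"
    RHETORICAL_QUESTION = "rhetorical question"
    COMMAND = "command"
    REQUEST = "request"
    RHETORICAL_COMMAND = "rhetorical command"


def agreed_intentions(annotations: List[Set[Intention]]) -> Set[Intention]:
    """Keep only the readings accepted by every annotator (assumed rule)."""
    if not annotations:
        return set()
    return set.intersection(*annotations)


# Three hypothetical annotators judging one ambiguous sentence.
votes = [
    {Intention.STATEMENT, Intention.YES_NO_QUESTION, Intention.WH_QUESTION},
    {Intention.STATEMENT, Intention.YES_NO_QUESTION, Intention.WH_QUESTION,
     Intention.RHETORICAL_QUESTION},
    {Intention.STATEMENT, Intention.YES_NO_QUESTION, Intention.WH_QUESTION},
]
kept = agreed_intentions(votes)
```

Here the rhetorical-question reading is dropped because only one annotator proposed it.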
Corpus Description
- Specification
• 1,292 sentences
• 3,552 scripts tagged with intention
• Along with English translation
• Arranged (1) by wh- particle, and then (2) by the
lexicographical order of the predicates
• Evidentiality and politeness were tagged
• Affirmative questions were noted
• RQ/Cs were annotated with implicit meaning
• Recordings of male/female voice (7,104 in total)
- Statistics (in volume)
• Who > What > How much > Where > When > How
• The most frequent type: [S, YN, WH] (424/1,292)
• The least frequent type: [WH, R] (22/1,292)
• Wh- intervention occurs largely among "how much" sentences
• Rhetoricalness is most frequent with "when", and least
frequent with "how much"
• Commands with "how" are scarce, since they result in unclear
instructions
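The counts reported above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch with hypothetical variable names; the observation that the recording count is twice the script count is consistent with one male and one female recording per script, which the poster implies but does not state outright.

```python
# Figures taken from the specification and statistics above.
SENTENCES = 1292
SCRIPTS = 3552
RECORDINGS = 7104

# Recordings per script: consistent with two voices (male/female) per script.
recordings_per_script = RECORDINGS / SCRIPTS

# Average number of intention-tagged scripts per sentence.
scripts_per_sentence = SCRIPTS / SENTENCES

# Share of sentences with the most and least frequent intention sets.
most_frequent_share = 424 / SENTENCES   # [S, YN, WH]
least_frequent_share = 22 / SENTENCES   # [WH, R]

print(f"recordings per script: {recordings_per_script:.1f}")
print(f"scripts per sentence:  {scripts_per_sentence:.2f}")
print(f"[S, YN, WH] share:     {most_frequent_share:.1%}")
print(f"[WH, R] share:         {least_frequent_share:.1%}")
```

On average each sentence carries about 2.75 intention readings, and the most frequent intention set covers roughly a third of the sentences.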
Conclusion
- A new type of corpus, which consists of syntactically
ambiguous sentences that are disambiguated by introducing
prosody, is provided along with its generation scheme
- It will be useful in qualitative studies on the
phonetics-syntax-semantics interface and L2 acquisition, and
in quantitative studies regarding computational linguistics