Portrait poster for
"Text matters but speech influences: A computational analysis of syntactic ambiguity resolution"
at CogSci 2020
Paper available at:
https://cognitivesciencesociety.org/cogsci20/papers/0448/index.html
1. Text Matters but Speech Influences:
A Computational Analysis of Syntactic Ambiguity Resolution
Won Ik Cho¹, Jeonghwa Cho², Woo Hyun Kang¹, Nam Soo Kim¹
Department of Electrical and Computer Engineering and INMC, Seoul National University¹
Department of Linguistics, University of Michigan, Ann Arbor²
E-mail: wicho@hi.snu.ac.kr, jeonghwa@umich.edu, whkang@hi.snu.ac.kr, nkim@snu.ac.kr
Introduction
- Syntactic ambiguity as a core task in SLU
• Understanding intention is prosody-sensitive
• Disambiguation from text alone is challenging in some
wh-in-situ languages, such as Korean and Japanese (ko/ja)
- Prosody-syntax-semantics interface?
• Textual information is important, of course
• However, intention is not fully determinable with text alone
• Speech can help disambiguation!
- Research question:
• How do audio and text interact with each other during the
disambiguation of syntactically ambiguous utterances?
• Can simulation-based results match neurological findings?
Related Work
- Work on intonation depending on sentence form
• Declarative questions (Gunlogson, 2002)
• Intonation-dependent utterances (Yun, 2019; Cho et al., 2018)
- Works on syntactic ambiguity in Korean
• Datives (Hwang & Schafer, 2009)
• Comparatives (Kim & Sells, 2010)
• Attachment regarding genitives (Baek & Yun, 2018)
• Intention (Cho et al., 2019)
• Statement (e.g., Somebody is going to bring it)
• Yes/no question (e.g., Did somebody tell you to bring it?)
• Wh- question (e.g., Who's going to bring it?)
• Rhetorical question (e.g., Who says I'll bring it?)
• Command (e.g., Somebody told you to bring it)
• Request (e.g., Can somebody bring it?)
• Rhetorical command (e.g., See who brings it)
• 1,292 sentences
• 3,552 utterances
Models
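The models panel of the original poster is a figure and is not reproduced here. As a rough sketch only (illustrative shapes and names, not the authors' exact architecture), the cross-attention (CA) idea can be shown with scaled dot-product attention in which text-encoder states query audio-encoder states:

```python
# Hypothetical sketch of audio-text cross-attention: each text step
# attends over all audio frames, yielding a prosody-informed fusion.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_h, audio_h):
    """text_h: (T_text, d) text-encoder states used as queries;
    audio_h: (T_audio, d) audio-encoder states used as keys/values."""
    d = text_h.shape[-1]
    scores = text_h @ audio_h.T / np.sqrt(d)   # (T_text, T_audio)
    weights = softmax(scores, axis=-1)         # attention over audio frames
    fused = weights @ audio_h                  # (T_text, d) fused representation
    return fused, weights

rng = np.random.default_rng(0)
text_h = rng.normal(size=(5, 16))    # e.g., BiRNN states over 5 tokens
audio_h = rng.normal(size=(40, 16))  # e.g., BiRNN states over 40 acoustic frames
fused, w = cross_attention(text_h, audio_h)
print(fused.shape, w.shape)  # (5, 16) (5, 40)
```

In a full model, `fused` would be pooled and fed to an intention classifier; the parallel (PBRE) variant would instead pool each modality separately and concatenate.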
Results
- Train : test = 9 : 1 (best scores)
- Attention matters (1, 2)
- Co-utilization of audio and text matters (2, 3)
- Over-stacking causes collapse of attention (4, 5)
- Text matters but speech influences! (3, 4a/b, 5)
- Parallel models are better at capturing nuance or rhetoricalness;
cross-attention models at identifying directives and wh-intervention
- ASR output also works, suggesting real-world applicability
Conclusion
- We applied a parallel bidirectional recurrent encoder (PBRE),
multi-hop attention (MHA), and cross-attention (CA) to
identify the speech intention of syntactically ambiguous
utterances and scrutinized them from a cross-modal perspective.
- We analyzed the results from the viewpoint of experimental
linguistics, linking the co-attention frameworks to neurological findings.
Explanation
- If a sentence is syntactically ambiguous, its genuine intention
cannot be inferred unless both the audio and the text are given
- Phonological and lexical processing are not merely sequential;
the related brain regions also exchange information with each
other via various language pathways
- Possibly doing so more intensively when faced with an
ambiguous utterance
- The information flow is not restricted to phonological
information but can extend to emotional information, as
required for understanding rhetorical utterances