Warnikchow - Psybus keynote - 3i4k

Human Interface Laboratory
Lifeless Poor Grad students Annotate Furiously
Subtitle: A speech researcher's trial-and-error journey into natural language processing
2018. 12. 01
Won Ik Cho

Contents
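• About the speaker
• Wait, natural language processing...?
• What is intention?
• The language barrier
• In Korean, you have to listen to the end to know
• Lifeless Poor Grad students Annotate Furiously
• There is always a better solution
• Archiving and data distribution
• Takeaways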

About the speaker
• Won Ik Cho (조원익)
 B.S. in EE/Mathematics (SNU, ’10~’14)
 Ph.D. student (SNU INMC, ’14~)
• Academic background
 Interested in mathematics >> EE!
 Double major?
• Math is very difficult
• Circuits do not suit me
 Early years in the speech processing lab
• Source separation
• Voice activity & endpoint detection
• Automatic music composition
– Move on to language modeling?
https://github.com/warnikchow

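Wait, natural language processing...?
• New task?
 Development of free-running speech recognition technologies for embedded robot system (funded by MOTIE)
 Korean project title: 로봇용 free-running 임베디드 자연어 대화음성인식을 위한 원천 기술 개발
• In other words:
 Non-wake-up-word-based speech understanding system
 ...?
• Example utterances (from the slide illustration):
– 오늘 또 떨어졌네 ("It dropped again today")
– 이게 대체 며칠째 파란불이냐 ("How many days in a row has it been in the blue?")
– 지금 손실이 얼마지 ("How much have I lost so far?")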

What is intention?
• The subtle difference between intention and intent
 Intent understanding and slot-filling
• Used more in domain-specific tasks
– e.g., Liu and Lane, 2016

• The subtle difference between intention and intent
 Intention understanding: more related to sentence semantics
• e.g., speech intention understanding (Gu et al., 2017)

• Intention understanding: how?
 At a glance: by sentence types
• This is how many systems for Korean are still built (and what many people still use!)
• -하다 declarative
• -하니 interrogative
• -해(줘)라 imperative
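The sentence-type shortcut can be sketched as a plain suffix lookup. This is only an illustration: the ender-to-type pairs are the three on the slide, while the function name `guess_sentence_type`, the "unknown" fallback, and the use of raw string matching instead of morphological analysis are assumptions.

```python
# Naive sentence-type guess from the sentence ender (illustrative only).
# The ender/type pairs come from the slide; everything else is assumed.
ENDER_TO_TYPE = [
    ("해줘라", "imperative"),
    ("해라", "imperative"),
    ("하다", "declarative"),
    ("하니", "interrogative"),
]


def guess_sentence_type(utterance: str) -> str:
    """Return a sentence type based only on the final characters."""
    text = utterance.strip().rstrip(".?!")
    for ender, sentence_type in ENDER_TO_TYPE:
        if text.endswith(ender):
            return sentence_type
    return "unknown"  # hypothetical fallback label


if __name__ == "__main__":
    for example in ["공부하다", "공부하니", "공부해라", "천천히 가고 있어"]:
        print(example, "->", guess_sentence_type(example))
```

The last example ends in -어, which none of the three rules covers, so a lookup like this has nothing to say about it.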

 What is KEY in understanding sentence forms?
• Discourse component (Sadock and Zwicky, 1985; Portner, 2004)

 Studies on dialog acts (Stolcke et al., 2000)
• About 40 acts are tagged for 200,000 utterances
• Actually, we only need to detect the directives!

The language barrier
• How can we extend this cross-linguistically?
 Manual tagging on Cornell movie corpus, by commands and non-commands (Nov ’17)
• Aren't commands and non-commands alone insufficient?
 Elaborate tagging with questions and rhetorical directives (Mar ’18)
• Inter-annotator agreement is essential
 Manual tagging by three other English L1 (or bilingual) speakers (May ’18)
• The class names are unintuitive
 Renaming the classes (Aug ’18)
• It is unclear what this is for
 Emphasize the utility for the free-running conversation-style dialog (.......)
• ... (the paper grew too long, so it was resubmitted to a journal)

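The language barrier
 Our approach (for Korean) (English version is under journal review)
[Figure: annotation decision flowchart. Target: a single sentence without context or punctuation.]
• Decision nodes (each answered Yes/No):
– Is it a single sentence?
– Does it contain a full clause?
– Is intonation information needed?
– Can it be decided with intonation information?
– Is there a question set, and does it require an answer from the hearer?
– Is an effective to-do list imposed on the hearer?
• Outcomes:
– Fragments (FR)
– Intonation-dependent (ID)
– Context-dependent (CD)
– Questions
– Rhetorical questions (RQ)
– Commands
– Rhetorical commands (RC)
– Statements (otherwise)
• Branch labels: Questions / Embedded form; Requirements / Prohibitions
• Compound sentences: focus on the speech act with the stronger force (different sentences on the same topic are regarded as one sentence)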

The language barrier
This study is highly methodological rather than theoretical, and
may depend on the annotator/reader’s linguistic intuition!

In Korean, you have to listen to the end to know
• Widely used, but a difficult language
 What is a word in Korean?
• Alphabet (Jaso) (ㄱ ㄴ ㄷ ...)
• Character (Morpho-syllabic block) (Korean: {Syllable:CV(C)})
• Morpheme
– Some morphological analyzers do not decompose characters (e.g., Twitter analyzer)
• Words (Eojeol) (the unit of segmentation)
– In Korean, the term ‘spacing’ is more frequently used
• Phrases
– Unlike English, the head of each phrase comes in the final place (Josa)
(Choi and Palmer, 2011)
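Two of these levels can be made concrete with a few lines of Python. The sketch below splits a sentence into eojeols on whitespace and decomposes a precomposed Hangul syllable into jaso using the standard Unicode arithmetic (syllables start at U+AC00, with 19 initials, 21 vowels and 28 final slots). The function names `to_jaso` and `split_eojeols` are hypothetical, and morpheme-level analysis is out of scope here.

```python
# Decompose Korean text at two of the levels listed above:
# eojeol (space-separated unit) and jaso (alphabet letters).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
# 28 final slots; index 0 means "no final consonant"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00       # code point of the first syllable block, 가
SYLLABLE_COUNT = 19 * 21 * 28


def to_jaso(char: str) -> list:
    """Split one syllable block into its letters; pass other characters through."""
    index = ord(char) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        return [char]
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    jaso = [CHOSEONG[initial], JUNGSEONG[medial]]
    if final:
        jaso.append(JONGSEONG[final])
    return jaso


def split_eojeols(sentence: str) -> list:
    """Eojeols are the whitespace-delimited units of written Korean."""
    return sentence.split()


if __name__ == "__main__":
    print(split_eojeols("상쾌한 아침"))
    print([to_jaso(c) for c in "한국어"])
```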

• What kind of utterances should each class include?
 Five clear-cut cases (CCs)
• Statements
• Questions
• Commands
• Rhetorical questions
• Rhetorical commands
 How about the underspecified or ambiguous cases?
• Fragments (FRs)
• Intonation-dependent utterances (IUs)

• Fragments
 Single or compound noun
• e.g., 페이스북 (Facebook), 국어사전 (Korean dictionary), 발효 음식 (fermented food)
• Utilized if the topic is relevant to the user
 Single noun phrase (possibly with josa dropped)
• e.g., 상쾌한 아침 (a refreshing morning), 청담동 가게 (a shop in Cheongdam-dong)
• Ones that may be meaningful as a greeting, but not as a question or command
 Phrases without specific intention
• e.g., 우리나라도 (our country, too), 무료로 열리는 (held free of charge)
 Unfinished sentences
• Mostly those under 2 eojeols were counted
• Ones with underspecified sentence enders that might have a clear intention were NOT considered fragments
– 우리회사 저번 회식일이 언제인데 (roughly, "when was our company's last team dinner")
– 너희 은행 강도 들었다며 (roughly, "I heard your bank was robbed")

• Intonation-dependent utterances
 How can we tell whether an utterance is intonation-dependent?
• Utterance: 천천히 가고 있어! (roughly, "going slowly")
• Transcript: 천천히 가고 있어
• Reading of the transcript alone: question, statement, or command?

• Intonation-dependent utterances
 Underspecified sentence enders
• -어, -지, -대, -해, -라고, -다며, etc.
• The sentence type is determined by the sentence-final intonation, which is assigned according to the speech act
 Conversational maxims (Levinson, 2000)
• Informativeness principle (simplified version)
– Speaker: do not say more than is required (bearing the Q-principle in mind)
– Hearer: what is generally said is stereotypically and specifically exemplified
 Wh-intervention
• 뭐 먹고 싶어
– Does 뭐 mean "what" or "something"?
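A first-pass filter for candidate intonation-dependent utterances can be sketched as a check on the final characters of the transcript. The ender list is the one on the slide (without the "etc."); the function name `may_be_intonation_dependent` is hypothetical, and matching on raw suffixes is an assumption that over-generates (any word that happens to end in 지 or 대 also matches), so this is a way to shortlist transcripts for annotation, not a classifier.

```python
# Flag transcripts that end in one of the underspecified sentence enders
# listed on the slide. Suffix matching on raw text is a crude assumption.
UNDERSPECIFIED_ENDERS = ("어", "지", "대", "해", "라고", "다며")


def may_be_intonation_dependent(transcript: str) -> bool:
    """True if the transcript ends in an underspecified ender."""
    text = transcript.strip().rstrip(".?!~")
    return text.endswith(UNDERSPECIFIED_ENDERS)


if __name__ == "__main__":
    for example in ["천천히 가고 있어", "뭐 먹고 싶어", "너희 은행 강도 들었다며", "공부하다"]:
        print(example, "->", may_be_intonation_dependent(example))
```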

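In Korean, you have to listen to the end to know
• Introducing phonetic features: Intonation-dependency
 Annotating proper intention for possible cases of intonation
• Sentence-final intonation is considered by default (about five types)
• If several intonations are possible for one intention, all of them are allowed in tagging (but a case where one intonation allows several intentions is regarded as ambiguous)
• From the viewpoint of the maxim of manner, readings that sound awkward are excluded (with respect to adverbs, number agreement, etc.). For a similar reason, considering the maxims of quality and quantity, utterances carrying too much information are not judged to be questions
• Care is taken with cases where wh-particles do not function as interrogatives (this can decide between Q and S. In some cases yes/no and wh- questions can be told apart; these are Q for now, but are classified separately and will be marked later)
• When the subject is dropped, as in many Korean sentences, and first-, second-, or third-person readings are all possible, each is substituted in turn and the judgment is based on the ones that do not sound awkward
• Attention is paid to the presence or absence of a vocative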

• Corpus labeling
 Checking the inter-annotator agreement
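• Fleiss’ Kappa (Fleiss, 1971)
– $N = 10$, $n = 14$, $k = 5$
– $p_j = \frac{1}{Nn} \sum_{i} n_{ij}$
– $P_i = \frac{1}{n(n-1)} \sum_{j} \left( n_{ij}^2 - n_{ij} \right)$
– $\bar{P}_e = \sum_{j} p_j^2$
– $\bar{P} = \frac{1}{N} \sum_{i} P_i$
– $K = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e} = \frac{0.378 - 0.213}{1 - 0.213} = 0.210$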
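The agreement computation can be sketched in plain Python. The function follows the standard Fleiss (1971) definition; the ratings table is an illustrative one with the same N = 10 subjects, n = 14 raters and k = 5 categories, and is an assumption rather than data from the slides. The names `fleiss_kappa` and `EXAMPLE_COUNTS` are hypothetical.

```python
# Fleiss' kappa for N subjects, n raters per subject, k categories.
# counts[i][j] = number of raters who put subject i into category j.
def fleiss_kappa(counts):
    """Return (P_bar, P_e, kappa) for a complete N x k count table."""
    N = len(counts)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    k = len(counts[0])
    # Share of all assignments that went to category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-subject agreement: sum_j (n_ij^2 - n_ij) / (n (n - 1))
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    P_e = sum(pj * pj for pj in p)
    return P_bar, P_e, (P_bar - P_e) / (1 - P_e)


# Illustrative table (assumed, not from the slides): 10 subjects, 14 raters, 5 categories
EXAMPLE_COUNTS = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]

if __name__ == "__main__":
    P_bar, P_e, kappa = fleiss_kappa(EXAMPLE_COUNTS)
    print(f"P_bar={P_bar:.3f}  P_e={P_e:.3f}  kappa={kappa:.3f}")
```

For this table the three values round to 0.378, 0.213 and 0.210.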

• Corpus labeling
 IAA: 0.85 (Fleiss’ Kappa) with three Seoul Korean native annotators
• Manual tagging on Corpus 1 for checking IAA

• Approach in the paper: two-stage analysis
 Classify the sentence-final intonation into five types
• Only the intonation of IP-final syllables
• Using LMH% and grouping the conventional 9-class approach (Jun, 2000)
 Train an additional network with two inputs: intonation & text

 Intonation classifier
• Manual tagging on 7,000 utterances

There is always a better solution
 Problems in: Wh-intervention?
• Needs disambiguation (in progress)
• e.g., 몇 개 가져오래 can be read as:
– Should I bring some?
– How many should I bring?
– They told you to bring some?

There is always a better solution
• IU module
 Multimodal analysis approach (Gu et al., 2017)!
• Why didn't we use this?
– Because we didn't think of it early enough...
• Of course, it is not necessarily a better solution
– People's prosody is not always similar (anomalous usage)

Archiving and data distribution
• System overview

• FCI module as a text classifier

• Data and model distribution (with tutorial)
 https://github.com/warnikchow/3i4k & https://github.com/warnikchow/dlk2nlp

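Takeaways
• Precise problem definition and building a good dataset account for more than 70% of the work
• Tasks that rely on linguistic intuition are hard to do in a language that is not your mother tongue
 Both the process of writing the annotation guideline and the IAA check matter
 So unless you are bilingual, don't carelessly mess with English semantics!
• What is the use of data left sitting around... Let's release it and contribute (?)
 In fact, this was only possible because it was not an industry-funded project (sigh)
• Korean NLP has its own charm! Let's keep going even when no answer is in sight
 Wishing for BTS to skyrocket and for the 80 million speakers of a unified Korea...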

Reference (order of appearance)
• Liu, Bing, and Ian Lane. "Attention-based recurrent neural network models for joint intent
detection and slot filling." arXiv preprint arXiv:1609.01454 (2016).
• Gu, Yue, et al. "Speech intention classification with multimodal deep learning." Canadian
Conference on Artificial Intelligence. Springer, Cham, 2017.
• Sadock, Jerrold M., and Arnold M. Zwicky. "Speech act distinctions in syntax." Language
typology and syntactic description 1 (1985): 155-196.
• Portner, Paul. "The semantics of imperatives within a theory of clause types." Semantics and
linguistic theory. Vol. 14. 2004.
• Stolcke, Andreas, et al. "Dialogue act modeling for automatic tagging and recognition of
conversational speech." Computational linguistics 26.3 (2000): 339-373.
• Choi, Jinho D., and Martha Palmer. "Statistical dependency parsing in Korean: From corpus
generation to automatic parsing." Proceedings of the Second Workshop on Statistical Parsing of
Morphologically Rich Languages. Association for Computational Linguistics, 2011.
• Levinson, Stephen C. Presumptive meanings: The theory of generalized conversational
implicature. MIT Press, 2000.
• Fleiss, Joseph L. "Measuring nominal scale agreement among many raters." Psychological
bulletin 76.5 (1971): 378.
• Jun, Sun-Ah. "K-ToBI (Korean ToBI) labelling conventions (version 3.1, October 2000)." UCLA
working papers in phonetics (2000): 149-173.

Thank you!
End_of_presentation
