Human Interface Laboratory
Towards Speech Intention Understanding
in Korean
2018. 12. 06
Won Ik Cho
Contents
• Speaker introduction
• Introduction and overview of the task
• Problem definition of intention identification
• Annotation guideline and corpus construction
• Intonation-aided intention identification (for Korean)
• Demonstration
• Applications and summary
• Q&A
1
Speaker introduction
• Won Ik Cho
 B.S. in EE/Mathematics (SNU, ’10~’14)
 Ph.D. student (SNU INMC, ‘14~)
• Academic background
 Interested in mathematics >> EE!
 Double major?
• Math is very difficult
• Circuits do not suit me
 Early years in Speech processing lab
• Source separation
• Voice activity & endpoint detection
• Automatic music composition
– Move on to language modeling?
2
https://github.com/warnikchow
Introduction
• New task?
 Development of free-running speech recognition technologies for
embedded robot system (funded by MOTIE)
 로봇용 free-running 임베디드 자연어 대화음성인식을 위한 원천 기술 개발
• In other words:
 Non wake-up-word based speech understanding system
 ...?
3
[Example utterances overheard without a wake-up word: "오늘 또 떨어졌네" (It dropped again today) / "이게 대체 며칠째 파란불이냐" (How many days has it been in the blue?) / "지금 손실이 얼마지" (How much am I losing now?)]
Introduction
• How?
 Related to many aspects of (speaker-dependent) speech recognition
• Speaker-dependency (in terms of a personal assistant)
• Noisy far-talk recognition and beamforming
• Speech intention understanding
– To which utterances should the AI react?
4
Introduction
• Speech intention understanding
 Defining what ‘intention’ is
• Discourse components
• Speech act
• Rhetoricalness
 Drawing up the annotation guideline
 Introducing phonetic features
• Intonation-dependency
• Sentence-final intonations
• Multimodal approaches
5
Problem definition of intention identification
• The subtle difference between 'intention' and 'intent'
 Intent understanding and slot-filling
• More often used in domain-specific tasks
– e.g., Liu and Lane (2016)
6
Problem definition of intention identification
• The subtle difference between 'intention' and 'intent'
 Intention understanding – more related to sentence semantics
• e.g., speech intention understanding (Gu et al., 2017)
7
Problem definition of intention identification
• Intention understanding – how?
 At a glance: by sentence types
• The way many systems for Korean are still built (and that many people use!)
• -하다 declarative
• -하니 interrogative
• -해(줘)라 imperative
8
Problem definition of intention identification
• Intention understanding – how?
 What is KEY in understanding sentence forms?
• Discourse component (Sadock and Zwicky, 1985; Portner, 2004)
9
Problem definition of intention identification
• Intention understanding – how?
 The discourse component does not exactly match the speech act!
• What is a speech act?
– Locutionary act
– Illocutionary act
– Perlocutionary act
 Typology by Searle (1976)
• Representatives
• Directives: acts by which the speaker intends to get the listener to do something
• Commissives
• Expressives
• Declarations (or declaratives)
 When the sentence form determines the act above, we speak of a direct speech
act; but there are also cases where it does not (indirect speech acts)
10
Problem definition of intention identification
• Intention understanding – how?
 Studies on dialog acts (Stolcke et al., 2000)
• About 40 acts are tagged for 200,000 utterances
11
Problem definition of intention identification
• Intention understanding – how?
 Studies on communicative functions (Bunt et al., 2010)
12
Problem definition of intention identification
• Intention understanding – how?
 Very elaborate ... but for Korean?
• It depends on the paper
(Lee, 1997; Kim, 2008)
13
Problem definition of intention identification
• Intention understanding – how?
 Dialog acts for Korean still require additional information
• Context is indispensable ...
– but how can we extract the core role of a single utterance?
14
(Kim, 1999)
(Lee, 1998)
Problem definition of intention identification
• Intention understanding – how?
 Searle & Discourse component revisited
• Recent tagging methodologies
– Situation entity types, Tweet acts
15
(Friedrich et al., 2016; Vosoughi and Roy, 2016)
Problem definition of intention identification
• Intention understanding – how?
 Our approach (for Korean) (English version is under journal review)
16
[Annotation flowchart. Target: a single sentence, without context or punctuation. Compound sentences: focus on the speech act with the stronger force (distinct sentences on the same topic are regarded as one sentence). Decision nodes:
• Is it a single sentence, and does it contain a full clause? If not → Fragments (FR)
• Can the intention be decided from the text alone? If intonation information is needed → Intonation-dependent (ID); if it cannot be decided even with intonation → Context-dependent (CD)
• Is there a question set requiring the listener's answer? If yes → Questions (including embedded forms); if the question is rhetorical → Rhetorical questions (RQ)
• Is an effective to-do list imposed on the listener? If yes → Commands (requirements/prohibitions); if non-mandatory → Rhetorical commands (RC)
• Otherwise → Statements]
Annotation guideline and corpus construction
17
This study is highly methodological rather than theoretical, and
may depend on the annotator/reader’s linguistic intuition!
Annotation guideline and corpus construction
• What kind of utterances should each class include?
 Five clear-cut cases (CCs)
• Statements
• Questions
• Commands
• Rhetorical questions
• Rhetorical commands
 How about the underspecified or ambiguous cases?
• Fragments (FRs)
• Intonation-dependent utterances (IUs)
18
Annotation guideline and corpus construction
• Fragments
 Single or compound noun
• ex) 페이스북, 국어사전, 발효 음식
• Utilized if the topic is relevant to the user
 Single noun phrase (possibly with josa dropped)
• ex) 상쾌한 아침, 청담동 가게
• Ones that can be meaningful as a greeting, but not as a question/command
 Phrases without specific intention
• ex) 우리나라도, 무료로 열리는
 Unfinished sentences
• Mostly those under two eojeols were counted
• Ones with underspecified sentence enders that might have a clear intention
were considered NOT as fragments
– 우리회사 저번 회식일이 언제인데
– 너희 은행 강도 들었다며
19
Annotation guideline and corpus construction
• Questions
 Assume sentences with a non-rhetorical question set (QS)
 Basic concepts include yes/no questions, ones with wh- particles, and
embedded forms within the structure of declaratives
• If audio processing is applied later, these can be divided into yes/no and wh-questions
• Alternative questions also exist, but their portion is not large
– ex) 왼쪽으로 갈까 오른쪽으로 갈까
• ex) 수술 후유증으로 입원하셨던 거 잘 치료 되셨나요, 어떤 종류의 카메라를 샀는
데요, 이 편지는 저한테 온 건가요, 경부 고속도로 지체구간은 어디지
20
Annotation guideline and corpus construction
• Commands
 Assume sentences with a non-rhetorical to-do list (TDL)
 Include orders (-해라, -해줘), requests (-해줄래), and exhortatives (-하자, -해보자)
• What about cases where the speaker directly imposes an obligation on the listener,
as in '너가 공부해야 된다고 생각해' or '너보고 공부하라고 했다'?
• What about cases like '엄마가 너보고 공부하라고 했다', where the power relation
between the agent and the listener can be inferred?
 Requests that are relevant to questions were classified as commands
• ex) 세탁기 잘 돌아가는지 알려줘
 Negative commands (prohibitions) are also taken into account
 Exhortatives which are relevant to statements were not considered as
commands
• ex) 나도 일등 좀 해보자
21
Annotation guideline and corpus construction
• Commands (cont'd)
 Imperatives in conditional conjunctions are included depending on the content
• ex) 당장 그 손을 떼지 않으면 죽음을 면치 못할 것이다
• For conditionalized imperatives, those whose conditional clause does not nullify the
to-do list (e.g., '두시 되면 나 좀 깨워줘') are included as commands, unlike those
where it does ('쏠 테면 쏴봐' = '넌 날 쏘지 못할 것이다')
– Such sentences may also be read as permissions, e.g., '피곤하면 집에 가'
– Cases interpreted as permission are classified as rhetorical commands
 Fragments that can conventionally be used as commands are also included
• ex) 확인하시고 공구 할인에 즉시 참가해 보세요, 좀 조용히 좀 하지 전화하는데,
오늘 온 메일 모두 지워줄 수 있니, 엘루이 호텔 특실 예약 부탁해, 입출금이 자유
로운 통장도 하나 있으면 좋아, 다음 신호등 받고 유턴
 Imperative sentences carrying the intention of a question (~알려줘, 찾아줘,
검색해줘, 말해줘, etc.) are included in questions
22
Annotation guideline and corpus construction
• Rhetorical questions
 Sentences representing the user’s point (astonishment, refusal,
disappointment, etc.) by suggesting a QS that does not require an answer
• Mainly expressing astonishment, anger, reproach, etc.
• ex) 하루가 멀다 하고 왜 이러니, 뭐 하다가 이제야 연락하니
 Possibly including tag questions
• ex) 정말 아름답지 그렇지 않니
 Considered to be semantically similar to statements
 Ambiguous cases
• -겠지: self-addressed questions with self-confidence (RQ)
• -다며/-겠죠: asking the listener for confirmation (Q)
• -지요/-건데: closer to a statement/plan than a self-addressed question (S)
23
Annotation guideline and corpus construction
• Rhetorical commands
 Sentences with idiomatic expressions such as wish or regret, including the
ones that draw attention or show exclamation, by suggesting a non-
mandatory TDL
• ex) 더운데 건강 조심하세요, 그러던지 말던지 네 마음대로 해, 내 정신 좀 봐
 Considered to be semantically similar to statements
 The boundary between courtesy expressions and genuine requests can be ambiguous
• If what to watch out for or when to meet is stated explicitly, it is judged as a command
• ex) 빙판길 조심하세요, 있다가 여섯 시에 뵙겠습니다
24
Annotation guideline and corpus construction
• Statements
 Sentences with no intention of question or command,
 that are not rhetorical questions/commands,
 that do not depend on intonation,
 and that are neither fragments nor ambiguous
• Takes up a considerable portion of the corpus
• In compound sentences, a statement co-occurring with a question/command is treated
as having no force (the meaning differs depending on which one it co-occurs with, and in what order)
– ex) 너무 추운데 문좀 열어줘
25
Annotation guideline and corpus construction
• Intonation-dependent utterances
 How do we figure out whether an utterance is intonation-dependent?
26
[Example: the transcript "천천히 가고 있어" can be a question, a statement, or a command depending on the utterance's final intonation]
Annotation guideline and corpus construction
• Intonation-dependent utterances
 Underspecified sentence enders
• -어, -지, -대, -해, -라고, -다며, etc.
• Sentence type is determined based upon the sentence-final intonations that are
assigned considering the speech act
 Conversational maxims (Levinson, 2000)
• Informativeness-principle (simplified version)
– Speaker: Do not say more than is required (bearing the Q-principle in mind)
– Listener: What is generally said is stereotypically and specifically exemplified
 Wh-intervention
• 뭐 먹고 싶어
– '뭐' as what, or as something?
27
Annotation guideline and corpus construction
• Corpus labeling
 Checking the inter-annotator agreement
• Fleiss' Kappa (Fleiss, 1971)
– N = 10 (items), n = 14 (raters), k = 5 (categories)
– p_j = (Σ_i n_ij) / (N·n)
– P_i = (Σ_j n_ij² − n) / (n(n − 1))
– P̄_e = Σ_j p_j²
– P̄ = (Σ_i P_i) / N
– κ = (P̄ − P̄_e) / (1 − P̄_e) = (0.378 − 0.213) / (1 − 0.213) = 0.210
28
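The kappa above can be computed in a few lines. The rating matrix below is assumed to be the classic worked example from Fleiss (1971), since it matches the N = 10, n = 14, k = 5 setup and the 0.210 result on the slide:

```python
import numpy as np

# ratings[i, j] = number of raters assigning item i to category j
# (assumed: the worked example from Fleiss, 1971)
ratings = np.array([
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],  [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],  [0, 2, 2, 3, 7],
])

def fleiss_kappa(ratings):
    N, k = ratings.shape
    n = ratings[0].sum()                                     # raters per item
    p_j = ratings.sum(axis=0) / (N * n)                      # category proportions
    P_i = ((ratings ** 2).sum(axis=1) - n) / (n * (n - 1))   # per-item agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()                # mean vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)

kappa = fleiss_kappa(ratings)   # ≈ 0.210
```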
Annotation guideline and corpus construction
• Corpus labeling
 IAA: 0.85 (Fleiss’ Kappa) with three Seoul Korean native annotators
• Manual tagging on Corpus 1 for checking IAA
29
Annotation guideline and corpus construction
• Introducing phonetic features: Intonation-dependency
 Annotating the proper intention for each possible intonation
• Basically, the sentence-final intonation is considered (about five types)
• If several intonations are possible for one intention, all are allowed in tagging (but
one intonation allowing several intentions is regarded as ambiguous)
• From the viewpoint of the maxim of manner, readings that would be awkward
(regarding adverbs, number agreement, etc.) are excluded; similarly, considering the
maxims of quality and quantity, utterances carrying too much information are not
judged as questions
• Beware of cases where wh-particles do not function as interrogatives (this can
distinguish Q from S; where yes/no and wh-readings are both possible, the utterance
is tagged Q for now and marked separately for later)
• As in many Korean sentences, when the subject is dropped and 1st/2nd/3rd-person
readings are possible, each is substituted and the non-awkward ones are kept
• Beware of the presence or absence of vocatives
30
Annotation guideline and corpus construction
• Approach in the paper: two-stage analysis
 Classify the sentence-final intonation into five types
• Only the intonation for IP-final syllables
• Using LMH% and grouping the conventional 9-class approach (Jun, 2000)
 Train an additional network with two inputs: intonation & text
31
Annotation guideline and corpus construction
• Approach in the paper: two-stage analysis
 Classify the sentence-final intonation into five types
• Target input-output with selected sentences
32
Annotation guideline and corpus construction
• Recent approaches in multimodal analysis (Gu et al., 2017)
 Utilizes a concatenated structure of text and audio featurization
33
Intonation-aided intention identification
• System overview
34
Intonation-aided intention identification
• FCI module
 Basically a text classification (7-class)
• Text input, label output
 Assumes a perfect ASR transcript
• Does not require the audio feature yet
 Utilizes traditional NLP approaches
• What will be an effective feature for intention identification?
35
Intonation-aided intention identification
• Basic NLP techniques:
Embedding words and sentences to numeric values
 Sparse and dense word vectors
• One-hot encoding
• Term frequency-Inverse document frequency (TF-IDF)
• Distributional semantics
– Word2Vec (possibly GloVe, fastText…)
» Bag-of-Means
 Linear and non-linear classifiers
• Support vector machine (SVM)
• Logistic regression (LR)
• Neural network classifiers (NN)
36
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Sparse sentence vectors
• One-hot encoding
– Sparse vector with dictionary-size dimension
– Binarized term occurrence (0,1) depending on the presence
– All words are equidistant, so normalization is extra-important
– Different from frequency vector
37
(https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html)
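A minimal sketch of binarized one-hot sentence vectors; the toy corpus and naive whitespace tokenization are my own assumptions (a real pipeline would tokenize into morphemes or characters):

```python
# Toy corpus; tokens come from naive whitespace splitting.
corpus = ["문 좀 열어 줘", "문 열어", "지금 몇 시 야"]
vocab = sorted({tok for sent in corpus for tok in sent.split()})

def one_hot(sent):
    """Binary occurrence vector: 1 iff the vocabulary word is present."""
    toks = set(sent.split())
    return [1 if w in toks else 0 for w in vocab]

vec = one_hot("문 열어")   # two 1s, the rest 0s
```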
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Sparse sentence vectors
• TF-IDF: Multiplication of term frequency and inverse document frequency
– TF: the frequency of the term within each document
– IDF: how much information the word provides; or if the term is common or rare
across all documents (inverse fraction of the documents containing the term)
– Logarithmic function applied to prevent explosion
38
(https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html)
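The TF·IDF product can be sketched directly. The toy documents are made up, and smoothing conventions differ across libraries (e.g. scikit-learn adds 1 inside the logarithm), so this is the textbook form only:

```python
import math

# Toy tokenized documents (assumed for illustration)
docs = [["문", "열어", "줘"], ["문", "닫아", "줘"], ["지금", "몇", "시", "야"]]
N = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)           # term frequency within this document
    df = sum(term in d for d in docs)         # documents containing the term
    idf = math.log(N / df) if df else 0.0     # inverse document frequency (log to prevent explosion)
    return tf * idf
```

Note how "줘", which appears in two of the three documents, scores lower than the rarer "열어" within the same document.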
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Dense word and sentence vectors: Word2vec (Mikolov, 2013)
• “You shall know a word by the company it keeps” (J. R. Firth 1957:11, from CS224n)
• Basic idea
– Define a model that assigns a probability to the context given a center word w_t: P(context | w_t)
– Loss function J = 1 − P(w_{−t} | w_t), where w_{−t} denotes the context words of w_t
– Keep adjusting the vector representation of words to minimize the loss
– Skip-grams
» Training objective: to learn word vector
representations that are good at
predicting the nearby words
39
[Figure: the skip-gram objective — maximize the average log probability (1/T) Σ_t Σ_{−c≤j≤c, j≠0} log p(w_{t+j} | w_t), with p(w_O | w_I) given by a softmax over word vectors]
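A toy skip-gram update with a full softmax can make the objective concrete. The corpus, dimensions, and learning rate below are made up, and real word2vec uses negative sampling or hierarchical softmax instead of the full softmax:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "나 는 밥 을 먹 었 다".split()          # toy corpus (assumed)
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8
W_in = rng.normal(0, 0.1, (V, D))    # center-word ("input") vectors
W_out = rng.normal(0, 0.1, (V, D))   # context-word ("output") vectors

def sgd_step(center, context, lr=0.1):
    """One cross-entropy gradient step on P(context | center); returns
    the model's probability of the true context word before the update."""
    c, o = idx[center], idx[context]
    v = W_in[c].copy()
    scores = W_out @ v
    p = np.exp(scores - scores.max())
    p /= p.sum()                          # softmax P(* | center)
    grad = p.copy()
    grad[o] -= 1.0                        # d(loss)/d(scores)
    W_in[c] -= lr * (W_out.T @ grad)      # adjust the center vector
    W_out[:] -= lr * np.outer(grad, v)    # adjust all output vectors
    return p[o]

before = sgd_step("밥", "을")
for _ in range(50):
    after = sgd_step("밥", "을")          # probability of the true context rises
```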
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Dense word and sentence vectors: Word2vec to Bag-of-Means
• Sentence vector = Averaging the word vectors?
• Disadvantage: distributional/sequential information can be omitted
40
(Le and Mikolov, 2014)
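Bag-of-Means is a one-liner; the permutation check below demonstrates the order-information loss noted above (random toy embeddings stand in for pretrained Word2vec vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for pretrained word vectors (assumed)
emb = {w: rng.normal(size=4) for w in ["문", "좀", "열어", "줘"]}

def bag_of_means(tokens):
    """Sentence vector = average of its word vectors."""
    return np.mean([emb[t] for t in tokens], axis=0)

# Word order is lost: any permutation yields the same sentence vector.
same = np.allclose(bag_of_means(["문", "열어", "줘"]),
                   bag_of_means(["줘", "문", "열어"]))
```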
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Linear and non-linear classifiers
• Support vector machine (SVM)
– Separating the input data
by using kernel transformation
– Deterministic
– Appropriate for small # of features, small dataset
– Computation issue for large dataset
• Logistic regression (LR)
– Maximize the posterior class probability
– Probabilistic
– Appropriate for large # of features, large dataset
• Neural network (NN)
– Takes advantage of both approaches
» Utilizing sigmoid/softmax activation
respectively to binary/multi-classification
– Slower to train
41
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Dense word and sentence vectors: Word2vec to CNN summarizer
• Convolutional neural network (CNN)
– Summarizer of distributional information
– Primarily suggested for image classification
– Skims local information by striding windows, extracts important feature, and
aggregates them into a final abstract summarizing layer
– Similarly applied to a sentence (padded vector sequence)
– Can be applied to either word level or character level
42
(Kim, 2014)
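The convolution-and-pooling summarization can be sketched in plain numpy. All dimensions and the random filters are toy assumptions; a real sentence CNN such as Kim (2014) uses several window sizes and learned filters:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, F, w = 10, 8, 4, 3            # sequence length, embedding dim, filters, window size
X = rng.normal(size=(T, D))         # padded sentence: one word vector per row

filters = rng.normal(size=(F, w, D))
# Slide each filter over the sentence: one activation per window position
feature_maps = np.array([[np.sum(filters[f] * X[t:t + w])
                          for t in range(T - w + 1)]
                         for f in range(F)])
pooled = feature_maps.max(axis=1)   # max-over-time pooling: one value per filter
```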
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Dense word and sentence vectors: Word2vec to RNN summarizer
• Recurrent neural networks (RNN)
– Summarizer of sequential information
– Appropriate for handling time-series data
– Captures non-consecutive components
– High performance, but
also highly complex
– LSTM used to prevent
vanishing gradient
– Bidirectional structures are
popularly used (BiLSTM)
– Final hidden state output and
weighted sum of hidden states
both utilized for summarization
43
(Cornegruta et. al, 2016)
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Structured self-attentive BiLSTM (Lin et al., 2017)
44
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 Korean NLP >> What is a 'word'?
• Alphabet (Jaso) (ㄱ ㄴ ㄷ ...)
• Character (Morpho-syllabic block) (Korean: {Syllable:CV(C)})
• Morpheme
– Some morphological analyzers do not decompose characters (e.g. Twitter analyzer)
• Words (Eojeol) (the unit of segmentation)
– In Korean, ‘spacing’ is more frequently used
• Phrases
– Unlike English, the head of each phrase comes in the final place (Josa)
45
(Choi and Palmer, 2011)
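Alphabet (jaso)-level units can be recovered from syllable blocks with Unicode arithmetic, since modern Hangul syllables occupy a contiguous range: code point = 0xAC00 + (cho × 21 + jung) × 28 + jong. A minimal sketch:

```python
# Jamo inventories in Unicode order
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(ch):
    """Decompose one Hangul syllable block into its jamo."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 11172:           # 19 * 21 * 28 precomposed syllables
        return [ch]                      # not a Hangul syllable block
    cho, rest = divmod(code, 21 * 28)
    jung, jong = divmod(rest, 28)
    return [CHO[cho], JUNG[jung]] + ([JONG[jong]] if jong else [])
```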
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 How can we compensate for errors?
• ASR errors
– Current devices perform well, but what about noisy environments?
• Out-of-vocabulary (OOV) words
– Which much spoken language contains
• Incorrectness of morphological analyzers
– Although recent modules score high accuracy, the result may not be reliable when
spoken language (which involves scrambling) and many OOVs are engaged
 Character embedding?
• Robust to errors at the word (morpheme) level
– Might be a better solution for noisy text
46
Intonation-aided intention identification
• Embedding words and sentences to numeric values
 How can we conduct character embedding?
• Hangul blocks >> Word piece model?
– Not only by decomposing into morphemes;
– But also the blocks themselves can be morphemes or even words!
• fastText (Bojanowski et al., 2016) – subword n-grams are utilized
– Dictionary: fastText skip-gram 100dim on Drama script corpus (of size 2M)
– Model released
» https://github.com/warnikchow/raws
» Contains about 160K words (or subwords)
» Contains about 2,500 characters (syllables)
• BiLSTM-self attention on right-arranged character sequence (acc: 88.82%, F1:
0.8021)
– Model released
» https://github.com/warnikchow/3i4k
47
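The subword units fastText relies on can be sketched as boundary-marked character n-grams. Here n is fixed to 3 for brevity; the real model uses a range of n (typically 3 to 6) plus the whole word itself:

```python
def char_ngrams(word, n=3):
    """Character n-grams with '<' and '>' as word-boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

# For Korean, the n-grams run over syllable blocks (or jamo, if decomposed first)
grams = char_ngrams("먹었다")
```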
Intonation-aided intention identification
• IU module: Two-stage approach:
 Intonation classifier - Acoustic features for speech analysis?
• Traditional approaches: GMM-HMM, power, on/offset pitch
• Recent DL-based approaches: f0 contour + NNs
• Suitability for Korean
– stress vs. syllable-timedness >> augment RMSE?
48
Intonation-aided intention identification
• IU module: Two-stage approach:
 Intonation classifier
• Manual tagging on 7,000 utterances
49
Intonation-aided intention identification
• IU module: Two-stage approach:
 Intonation classifier: https://github.com/warnikchow/korinto
• Concatenation of CNN and BiLSTM Self-attention
50
Intonation-aided intention identification
• IU module: Two-stage approach:
 Intonation-aided intention identifier
• One-hot encoded Intonation label as an attention source
51
Intonation-aided intention identification
• Approach in the paper: two-stage analysis
 Remaining problem: wh-intervention
• Needs disambiguation (in progress)
52
몇 개 가져오래
Should I bring some?
How many should I bring?
They told you to bring some?
Intonation-aided intention identification
• IU module
 Multimodal analysis revisited!
• Why didn't we use this?
– Because we did not think of it early enough...
• Of course, it is not necessarily a better solution
– People's prosody is not always similar (anomalous usage)
53
Demonstration
• FCI module as a text classifier
54
Demonstration
• Data and model distribution (with tutorial)
 https://github.com/warnikchow/3i4k & https://github.com/warnikchow/dlk2nlp
55
Applications
• As a real-time intention identifier of the utterances
• As a corpus auto-labeler (semi-supervised learning?)
• As a new annotating scheme
56
Applications
• Points for improvement?
 First of all, multimodal approach can be adopted
 Elaborate classification on question and command (under progress)
• Yes/no and wh- questions
• Orders and requests (command in interrogative form)
 Current system targets Seoul Korean; how about dialects?
57
Summary
• For a task called 'intention identification', the problem definition matters
• Sentence form alone cannot determine the intention
• Both drawing up the annotation guideline and checking the IAA are important
• To compensate for speech recognition errors, character-level and even
alphabet-level processing can be effective
• When dealing with Korean, both intonation-dependency and wh-intervention
must be considered
58
Reference (order of appearance)
• Liu, Bing, and Ian Lane. "Attention-based recurrent neural network models for joint intent detection and slot
filling." arXiv preprint arXiv:1609.01454 (2016).
• Gu, Yue, et al. "Speech intention classification with multimodal deep learning." Canadian Conference on Artificial
Intelligence. Springer, Cham, 2017.
• Sadock, Jerrold M., and Arnold M. Zwicky. "Speech act distinctions in syntax." Language typology and syntactic
description 1 (1985): 155-196.
• Portner, Paul. "The semantics of imperatives within a theory of clause types." Semantics and linguistic theory.
Vol. 14. 2004.
• Searle, John R. "A classification of illocutionary acts." Language in society 5.1 (1976): 1-23.
• Stolcke, Andreas, et al. "Dialogue act modeling for automatic tagging and recognition of conversational
speech." Computational linguistics 26.3 (2000): 339-373.
• Bunt, Harry, et al. "Towards an ISO standard for dialogue act annotation." Seventh conference on International
Language Resources and Evaluation (LREC'10). 2010.
• 이현정, 서정연. "한국어 대화체 문장의 화행 분석." 한국정보과학회 학술발표논문집 24.2Ⅱ (1997): 259-262.
• 김세종, 이용훈, 이종혁. "이전 문장 자질과 다음 발화의 후보 화행을 이용한 한국어 화행 분석." 정보과학회논문지: 소프
트웨어 및 응용 35.6 (2008): 374-385.
• 이현정, 이재원, 서정연. "자동통역을 위한 한국어 대화 문장의 화행 분석 모델." 정보과학회논문지 (B) 25.10 (1998):
1443-1452.
• 이성욱, 서정연. "결정트리를 이용한 한국어 화행 분석." 한국정보과학회 언어공학연구회 학술발표 논문집 (1999): 377-
381.
• Friedrich, Annemarie, Alexis Palmer, and Manfred Pinkal. "Situation entity types: automatic classification of
clause-level aspect." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers). Vol. 1. 2016.
59
Reference (order of appearance)
• Vosoughi, Soroush, and Deb Roy. "Tweet Acts: A Speech Act Classifier for Twitter." ICWSM. 2016.
• Levinson, Stephen C. Presumptive meanings: The theory of generalized conversational implicature. MIT press,
2000.
• Fleiss, Joseph L. "Measuring nominal scale agreement among many raters." Psychological bulletin 76.5 (1971):
378.
• Jun, Sun-Ah. "K-ToBI (Korean ToBI) labelling conventions (version 3.1, October 2000)." UCLA working papers in
phonetics (2000): 149-173.
• Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances
in neural information processing systems. 2013.
• Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International
Conference on Machine Learning. 2014.
• Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).
• Cornegruta, Savelie, et al. "Modelling radiological language with bidirectional long short-term memory
networks." arXiv preprint arXiv:1609.08409 (2016).
• Lin, Zhouhan, et al. "A structured self-attentive sentence embedding." arXiv preprint arXiv:1703.03130 (2017).
• Choi, Jinho D., and Martha Palmer. "Statistical dependency parsing in Korean: From corpus generation to
automatic parsing." Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich
Languages. Association for Computational Linguistics, 2011.
• Sennrich, Rico, Barry Haddow, and Alexandra Birch. "Neural machine translation of rare words with subword
units." arXiv preprint arXiv:1508.07909 (2015).
• Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv preprint arXiv:1607.04606
(2016).
60
Thank you!
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
rawankhanlove256
 
readers writers Problem in operating system
readers writers Problem in operating systemreaders writers Problem in operating system
readers writers Problem in operating system
VADAPALLYPRAVEENKUMA1
 
printing of ic circuits.pdf
printing       of        ic     circuits.pdfprinting       of        ic     circuits.pdf
printing of ic circuits.pdf
chidambaramnatarajar
 
RECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptxRECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptx
peacesoul123
 
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmtlecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
RAtna29
 
Top EPC companies in India - Best EPC Contractor
Top EPC companies in India - Best EPC  ContractorTop EPC companies in India - Best EPC  Contractor
Top EPC companies in India - Best EPC Contractor
MangeshK6
 
Metrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical EngineeringMetrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical Engineering
leakingvideo
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
Kamal Acharya
 
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdfRed Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
mdfkobir
 

Recently uploaded (20)

DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptxDBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
 
Rockets and missiles notes engineering ppt
Rockets and missiles notes engineering pptRockets and missiles notes engineering ppt
Rockets and missiles notes engineering ppt
 
Online fraud prediction and prevention.pptx
Online fraud prediction and prevention.pptxOnline fraud prediction and prevention.pptx
Online fraud prediction and prevention.pptx
 
Business Development_ Identifying and Seizing Market Opportunities with Skyle...
Business Development_ Identifying and Seizing Market Opportunities with Skyle...Business Development_ Identifying and Seizing Market Opportunities with Skyle...
Business Development_ Identifying and Seizing Market Opportunities with Skyle...
 
Vernier Caliper and How to use Vernier Caliper.ppsx
Vernier Caliper and How to use Vernier Caliper.ppsxVernier Caliper and How to use Vernier Caliper.ppsx
Vernier Caliper and How to use Vernier Caliper.ppsx
 
Quadcopter Dynamics, Stability and Control
Quadcopter Dynamics, Stability and ControlQuadcopter Dynamics, Stability and Control
Quadcopter Dynamics, Stability and Control
 
EAAP2023 : Durabilité et services écosystémiques de l'élevage ovin de montagne
EAAP2023 : Durabilité et services écosystémiques de l'élevage ovin de montagneEAAP2023 : Durabilité et services écosystémiques de l'élevage ovin de montagne
EAAP2023 : Durabilité et services écosystémiques de l'élevage ovin de montagne
 
Jet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdfJet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdf
 
Introduction to IP address concept - Computer Networking
Introduction to IP address concept - Computer NetworkingIntroduction to IP address concept - Computer Networking
Introduction to IP address concept - Computer Networking
 
Adv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdfAdv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdf
 
The world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptxThe world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptx
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
readers writers Problem in operating system
readers writers Problem in operating systemreaders writers Problem in operating system
readers writers Problem in operating system
 
printing of ic circuits.pdf
printing       of        ic     circuits.pdfprinting       of        ic     circuits.pdf
printing of ic circuits.pdf
 
RECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptxRECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptx
 
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmtlecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
lecture10-efficient-scoring.ppmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmt
 
Top EPC companies in India - Best EPC Contractor
Top EPC companies in India - Best EPC  ContractorTop EPC companies in India - Best EPC  Contractor
Top EPC companies in India - Best EPC Contractor
 
Metrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical EngineeringMetrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical Engineering
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
 
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdfRed Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
 

Warnikchow - Naver Tech Talk - 3i4k

  • 1. Human Interface Laboratory Towards Speech Intention Understanding in Korean 2018. 12. 06 Won Ik Cho
  • 2. Contents • Speaker introduction • Introduction and task overview • Problem definition of intention understanding • Annotation guideline and corpus construction • Intonation-aided intention identification (for Korean) • Demonstration • Applications and summary • Q&A 1
  • 3. Speaker introduction • Won Ik Cho  B.S. in EE/Mathematics (SNU, ’10~’14)  Ph.D. student (SNU INMC, ‘14~) • Academic background  Interested in mathematics >> EE!  Double major? • Math is very difficult • Circuits do not fit me  Early years in the Speech Processing Lab • Source separation • Voice activity & endpoint detection • Automatic music composition – Move onto language modeling? 2 https://github.com/warnikchow
  • 4. Introduction • New task?  Development of free-running speech recognition technologies for embedded robot system (funded by MOTIE)  로봇용 free-running 임베디드 자연어 대화음성인식을 위한 원천 기술 개발 • In other words:  Non wake-up-word based speech understanding system  ...? 3 오늘 또 떨어졌네 이게 대체 며칠째 파란불이냐 지금 손실이 얼마지
  • 5. Introduction • How?  Related to many aspects of (speaker-dependent) speech recognition • Speaker-dependency (in terms of a personal assistant) • Noisy far-talk recognition and beamforming • Speech intention understanding – To which utterances should AI react? 4 오늘 또 떨어졌네 이게 대체 며칠째 파란불이냐 지금 손실이 얼마지
  • 6. Introduction • Speech intention understanding  Defining what ‘intention’ is • Discourse components • Speech act • Rhetoricalness  Making up annotation guideline  Introducing phonetic features • Intonation-dependency • Sentence-final intonations • Multimodal approaches 5
  • 7. Problem definition of intention understanding • The subtle difference between ‘intention’ and ‘intent’  Intent understanding and slot-filling • More used in domain-specific tasks – e.g.) Liu and Lane, 2016 6
  • 8. Problem definition of intention understanding • The subtle difference between ‘intention’ and ‘intent’  Intention understanding – more related to sentence semantics • e.g.) Speech intention understanding (in Gu et al., 2017) 7
  • 9. Problem definition of intention understanding • Intention understanding – how?  At a glance: by sentence types • The way many systems for Korean are still built (and the way many people use them!) • -하다 declarative • -하니 interrogative • -해(줘)라 imperative 8
  • 10. Problem definition of intention understanding • Intention understanding – how?  What is KEY in understanding sentence forms? • Discourse component (Sadock and Zwicky, 1985; Portner, 2004) 9
  • 11. Problem definition of intention understanding • Intention understanding – how?  Discourse component does not exactly match speech act! • What is a speech act? – Locutionary act – Illocutionary act – Perlocutionary act  Typology by Searle (1975) • Representatives (대언 행위) • Directives (지시 행위): acts in which the speaker intends the listener to do something • Commissives (위임 행위) • Expressives (표출 행위) • Declarations (선언 행위)  When the sentence form determines the act above, it is generally called a direct speech act; when it does not, it is an indirect speech act 10
  • 12. Problem definition of intention understanding • Intention understanding – how?  The studies on dialog acts (Stolcke, 2000) • About 40 acts are tagged for 200,000 utterances 11
  • 13. Problem definition of intention understanding • Intention understanding – how?  The studies on communicative functions (Bunt, 2010) 12
  • 14. Problem definition of intention understanding • Intention understanding – how?  Very elaborate ... but for Korean? • Depends on the paper (Lee, 1997; Kim, 2008) 13
  • 15. Problem definition of intention understanding • Intention understanding – how?  DAs for Korean still require additional information • Context is indispensable ... – but how can we extract the core role of a single utterance? 14 (Kim, 1999) (Lee, 1998)
  • 16. Problem definition of intention understanding • Intention understanding – how?  Searle & Discourse component revisited • Recent tagging methodologies – Situation entity types, Tweet acts 15 (Friedrich et al., 2016; Vosoughi and Roy, 2015)
  • 17. Problem definition of intention understanding • Intention understanding – how?  Our approach (for Korean) (English version is under journal review) 16 • Target: a single sentence without context or punctuation • Decision flow (figure): Does it contain a full clause? Can it be determined with intonation information? Is there a question set requiring the listener’s answer? Is an effective to-do list assigned to the listener? • Resulting classes: Statements, Questions, Commands, Rhetorical questions (RQ), Rhetorical commands (RC), Fragments (FR), Intonation-dependent (ID), Context-dependent (CD) • Compound sentences: the speech act with stronger force takes precedence (different sentences on the same topic are treated as one sentence)
  • 18. Annotation guideline and corpus construction 17 This study is highly methodological rather than theoretical, and may depend on the annotator/reader’s linguistic intuition!
  • 19. Annotation guideline and corpus construction • What kind of utterances should each class include?  Five clear-cut cases (CCs) • Statements • Questions • Commands • Rhetorical questions • Rhetorical commands  How about the underspecified or ambiguous cases? • Fragments (FRs) • Intonation-dependent utterances (IUs) 18
  • 20. Annotation guideline and corpus construction • Fragments  Single or compound noun • ex) 페이스북, 국어사전, 발효 음식 • Utilized if the topic is relevant to the user  Single noun phrase (possibly with drops of josa) • ex) 상쾌한 아침, 청담동 가게 • Ones that can be meaningful as greetings, but not as questions/commands  Phrases without specific intention • ex) 우리나라도, 무료로 열리는  Unfinished sentences • Mostly utterances under 2 eojeols were counted • Ones with underspecified sentence enders that might have a clear intention were NOT considered fragments – 우리회사 저번 회식일이 언제인데 – 너희 은행 강도 들었다며 19
  • 21. Annotation guideline and corpus construction • Questions  Assume sentences with a non-rhetorical QS  Basic concepts include yes/no questions, the ones with wh-particles, and the embedded form within the structure of declaratives • If audio is processed later, these can be split into yes/no and wh-questions • Alternative questions also exist, but their portion is not large – ex) 왼쪽으로 갈까 오른쪽으로 갈까 • ex) 수술 후유증으로 입원하셨던 거 잘 치료 되셨나요, 어떤 종류의 카메라를 샀는데요, 이 편지는 저한테 온 건가요, 경부 고속도로 지체구간은 어디지 20
  • 22. Annotation guideline and corpus construction • Commands  Assume sentences with a non-rhetorical TDL  Include orders (-해라, -해줘), requests (-해줄래), exhortatives (-하자, -해보자) • What about cases where the speaker directly imposes an obligation on the listener, as in ‘너가 공부해야 된다고 생각해’ or ‘너보고 공부하라고 했다’? • Or cases where the power relation between the agent and the listener can be identified, as in ‘엄마가 너보고 공부하라고 했다’?  Requests that are relevant to questions were classified as commands • ex) 세탁기 잘 돌아가는지 알려줘  Negative commands (prohibitions) are also taken into account  Exhortatives which are relevant to statements were not considered commands • ex) 나도 일등 좀 해보자 21
  • 23. Annotation guideline and corpus construction • Commands (cont’d)  Imperatives in conditional conjunction are included regarding the content • ex) 당장 그 손을 떼지 않으면 죽음을 면치 못할 것이다 • Conditionalized imperatives count as commands unless the conditional clause cancels the to-do list (‘쏠 테면 쏴봐’ = ‘넌 날 쏘지 못할 것이다’); cases like ‘두시 되면 나 좀 깨워줘’ are included – Some of these may be read as permissions, e.g., ‘피곤하면 집에 가’ – cases read as permissions are classified as rhetorical commands  Fragments that are conventionally used as commands are also included • ex) 확인하시고 공구 할인에 즉시 참가해 보세요, 좀 조용히 좀 하지 전화하는데, 오늘 온 메일 모두 지워줄 수 있니, 엘루이 호텔 특실 예약 부탁해, 입출금이 자유로운 통장도 하나 있으면 좋아, 다음 신호등 받고 유턴  Imperatives carrying the intention of a question (~알려줘, 찾아줘, 검색해줘, 말해줘, etc.) are classified as questions 22
  • 24. Annotation guideline and corpus construction • Rhetorical questions  Sentences representing the user’s point (astonishment, refusal, disappointment, etc.) by suggesting a QS that does not require an answer • Mainly express surprise, anger, reproach, etc. • ex) 하루가 멀다 하고 왜 이러니, 뭐 하다가 이제야 연락하니  Possibly including tag questions • ex) 정말 아름답지 그렇지 않니  Considered semantically similar to statements  Ambiguous cases • -겠지: self-addressed questions with self-assurance (RQ) • -다며/-겠죠: asking the listener for confirmation (Q) • -지요/-건데: closer to statements/plans than to self-addressed questions (S) 23
  • 25. Annotation guideline and corpus construction • Rhetorical commands  Sentences with idiomatic expressions such as wish or regret, including the ones that draw attention or show exclamation, by suggesting a non-mandatory TDL • ex) 더운데 건강 조심하세요, 그러던지 말던지 네 마음대로 해, 내 정신 좀 봐  Considered semantically similar to statements  The boundary between courtesy expressions and genuine requests can be ambiguous • If what to watch out for or when to meet is stated explicitly, the sentence is judged a command • ex) 빙판길 조심하세요, 있다가 여섯 시에 뵙겠습니다 24
  • 26. Annotation guideline and corpus construction • Statements  Sentences that have no intention of question or command,  are not rhetorical questions/commands,  do not depend on intonation,  and are neither fragments nor ambiguous • They take up a considerable portion of the corpus • In compound sentences, a statement clause is treated as having no force when it co-occurs with a question/command (the meaning differs depending on which it co-occurs with and in which order) – ex) 너무 추운데 문좀 열어줘 25
  • 27. Annotation guideline and corpus construction • Intonation-dependent utterances  How to figure out if an utterance is intonation-dependent? 26 천천히 가고 있어! (utterance) 천천 히 가 고 있 어 (transcript) question statement command ?
  • 28. Annotation guideline and corpus construction • Intonation-dependent utterances  Underspecified sentence enders • -어, -지, -대, -해, -라고, -다며, etc. • Sentence type is determined based upon the sentence-final intonations that are assigned considering the speech act  Conversation maxim (Levinson, 2000) • Informativeness-principle (simplified version) – Speaker: Do not say more than is required (bearing the Q-principle in mind) – Listener: What is generally said is stereotypically and specifically exemplified.  Wh-intervention • 뭐 먹고 싶어 – What or something? 27
  • 29. Annotation guideline and corpus construction • Corpus labeling  Checking the inter-annotator agreement • Fleiss’ Kappa (Fleiss, 1971) – N = 10 items, n = 14 raters, k = 5 categories – p_j = Σ_i n_ij / (N·n) – P_i = (Σ_j n_ij² − n) / (n(n − 1)) – P̄_e = Σ_j p_j² – P̄ = Σ_i P_i / N – κ = (P̄ − P̄_e) / (1 − P̄_e) = (0.378 − 0.213) / (1 − 0.213) = 0.210 28
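The formulas above can be sketched in a few lines of Python. This is a hypothetical helper written for illustration (not the code used in the study), computing Fleiss' kappa from a ratings matrix where entry (i, j) counts the raters who assigned item i to category j:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a ratings matrix.

    ratings[i][j] = number of raters who assigned item i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)            # number of items
    n = sum(ratings[0])         # raters per item
    k = len(ratings[0])         # number of categories

    # p_j: proportion of all assignments made to category j
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]

    # P_i: observed agreement for item i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    P_bar = sum(P) / N                  # mean observed agreement
    P_e = sum(pj * pj for pj in p)      # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With perfect agreement the function returns 1.0; when every category split is even, agreement falls below chance and kappa goes negative.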
  • 30. Annotation guideline and corpus construction • Corpus labeling  IAA: 0.85 (Fleiss’ Kappa) with three Seoul Korean native annotators • Manual tagging on Corpus 1 for checking IAA 29
  • 31. Annotation guideline and corpus construction • Introducing phonetic features: Intonation-dependency  Annotating the proper intention for possible cases of intonation • Sentence-final intonation is considered by default (about 5 types) • If multiple intonations are possible for a single intention, all of them are allowed in tagging (but if multiple intentions are possible for a single intonation, the utterance is regarded as ambiguous) • In view of the maxim of manner, interpretations that would sound awkward (regarding adverbs, number agreement, etc.) are excluded; similarly, considering the maxims of quality and quantity, sentences carrying too much information are not judged as questions • Care is taken with cases where wh-particles do not function as interrogatives (this can distinguish Q from S; when yes/no and wh- readings are both possible, the utterance is labeled Q and marked separately for later annotation) • When the subject is dropped, as in many Korean sentences, and the sentence can be read in 1st/2nd/3rd person, each reading is substituted and only the non-awkward ones are accepted • Attention is paid to the presence of vocatives 30
  • 32. Annotation guideline and corpus construction • Approach in the paper: two-stage analysis  Classify the sentence-final intonation into five types • Only the intonation of IP-final syllables • Using LMH% and grouping the conventional 9-class approach (Jun, 2000)  Train an additional network with two inputs: intonation & text 31
  • 33. Annotation guideline and corpus construction • Approach in the paper: two-stage analysis  Classify the sentence-final intonation into five types • Target input-output with selected sentences 32
  • 34. Annotation guideline and corpus construction • Recent approaches in multimodal analysis (Gu et al., 2017)  Utilizes a concatenated structure of text and audio featurization 33
  • 36. Intonation-aided intention identification • FCI module  Basically a text classification task (7-class) • Text input, label output  Assumes a perfect ASR transcript • Does not require the audio feature yet  Utilizes traditional NLP approaches • What will be an effective feature for intention identification? 35
  • 37. Intonation-aided intention identification • Basic NLP techniques: Embedding words and sentences to numeric values  Sparse and dense word vectors • One-hot encoding • Term frequency-Inverse document frequency (TF-IDF) • Distributional semantics – Word2Vec (possibly GloVe, fastText…) » Bag-of-Means  Linear and non-linear classifiers • Support vector machine (SVM) • Logistic regression (LR) • Neural network classifiers (NN) 36
  • 38. Intonation-aided intention identification • Embedding words and sentences to numeric values  Sparse sentence vectors • One-hot encoding – Sparse vector with dictionary-size dimension – Binarized term occurrence (0,1) depending on the presence – All words are equidistant, so normalization is extra-important – Different from a frequency vector 37 (https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html)
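The binarized term-occurrence encoding above can be sketched as follows. This is a minimal illustration assuming whitespace tokenization (in practice, Korean would be tokenized into morphemes or characters first):

```python
def one_hot_sentences(sentences):
    # Build a vocabulary over whitespace tokens
    vocab = sorted({tok for s in sentences for tok in s.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    # Binarized term occurrence: 1 if the token is present, else 0
    vectors = []
    for s in sentences:
        v = [0] * len(vocab)
        for tok in s.split():
            v[index[tok]] = 1
        vectors.append(v)
    return vocab, vectors
```

Note that repeated tokens still yield a 1, which is exactly how this encoding differs from a frequency vector.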
  • 39. Intonation-aided intention identification • Embedding words and sentences to numeric values  Sparse sentence vectors • TF-IDF: multiplication of term frequency and inverse document frequency – TF: the frequency of the term within a document – IDF: how much information the word provides; i.e., whether the term is common or rare across all documents (inverse fraction of the documents containing the term) – A logarithmic function is applied to prevent explosion 38 (https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html)
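A toy version of the TF-IDF weighting described above (one of several common variants; here TF is normalized by document length and IDF uses a plain logarithm without smoothing):

```python
import math

def tfidf(docs):
    """Toy TF-IDF. docs is a list of token lists; returns (vocab, matrix)."""
    N = len(docs)
    vocab = sorted({t for d in docs for t in d})
    # IDF: log of the inverse fraction of documents containing the term
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(N / df[t]) for t in vocab}
    # TF: term frequency within each document, then multiply by IDF
    out = []
    for d in docs:
        tf = {t: d.count(t) / len(d) for t in set(d)}
        out.append([tf.get(t, 0.0) * idf[t] for t in vocab])
    return vocab, out
```

A term that occurs in every document gets IDF = log(1) = 0, so its weight vanishes regardless of its frequency.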
  • 40. Intonation-aided intention identification • Embedding words and sentences to numeric values  Dense word and sentence vectors: Word2vec (Mikolov, 2013) • “You shall know a word by the company it keeps” (J. R. Firth 1957:11, from CS224n) • Basic idea – Define a model that assigns a probability to the context given a center word w_t: P(context|w_t) – Loss function J = 1 − P(w_{−t}|w_t) – Keep adjusting the vector representations of words to minimize the loss – Skip-grams » Training objective: to learn word vector representations that are good at predicting the nearby words 39
  • 41. Intonation-aided intention identification • Embedding words and sentences to numeric values  Dense word and sentence vectors: Word2vec to Bag-of-Means • Sentence vector = Averaging the word vectors? • Disadvantage: distributional/sequential information can be omitted 40 (Le and Mikolov, 2014)
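The Bag-of-Means idea above is just averaging: a sketch, assuming pre-trained word vectors are available as a token-to-array dictionary (OOV tokens are simply skipped here, which is one of its weaknesses):

```python
import numpy as np

def bag_of_means(tokens, word_vectors, dim):
    """Sentence vector = mean of the available word vectors (Bag-of-Means).

    word_vectors: dict mapping token -> np.ndarray of length dim.
    OOV tokens are skipped; an all-OOV sentence maps to the zero vector.
    """
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

Averaging discards word order entirely, which is exactly the loss of distributional/sequential information noted on the slide.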
  • 42. Intonation-aided intention identification • Embedding words and sentences to numeric values  Linear and non-linear classifiers • Support vector machine (SVM) – Separating the input data by using kernel transformation – Deterministic – Appropriate for small # of features, small dataset – Computation issue for large dataset • Logistic regression (LR) – Maximize the posterior class probability – Probabilistic – Appropriate for large # of features, large dataset • Neural network (NN) – Takes advantage of both approaches » Utilizing sigmoid/softmax activation respectively to binary/multi-classification – Slower to train 41
  • 43. Intonation-aided intention identification • Embedding words and sentences to numeric values  Dense word and sentence vectors: Word2vec to CNN summarizer • Convolutional neural network (CNN) – Summarizer of distributional information – Primarily suggested for image classification – Skims local information by striding windows, extracts important feature, and aggregates them into a final abstract summarizing layer – Similarly applied to a sentence (padded vector sequence) – Can be applied to either word level or character level 42 (Kim, 2014)
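The striding-window-then-aggregate behavior of the CNN summarizer can be sketched in NumPy. This is a simplified, single-filter-width illustration of max-over-time pooling in the style of Kim (2014) — linear activation, no bias, not a trained model:

```python
import numpy as np

def cnn_summarize(sent, filters, window):
    """Max-over-time pooling of a 1-D convolution over a word-vector sequence.

    sent:    (T, d) padded sequence of word vectors
    filters: (F, window * d) filter bank (linear, no bias, for illustration)
    Returns a length-F summary vector.
    """
    T, d = sent.shape
    # Slide a window over the sequence and flatten each local patch
    patches = np.stack([sent[t:t + window].reshape(-1)
                        for t in range(T - window + 1)])  # (T-w+1, w*d)
    feature_maps = patches @ filters.T                    # (T-w+1, F)
    return feature_maps.max(axis=0)                       # max pooling
```

Each filter skims local windows and keeps only its strongest response, which is the "extract important feature and aggregate" step described above.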
  • 44. Intonation-aided intention identification • Embedding words and sentences to numeric values  Dense word and sentence vectors: Word2vec to RNN summarizer • Recurrent neural networks (RNN) – Summarizer of sequential information – Appropriate for handling time-series data – Captures non-consequent components – High performance, but also highly complex – LSTM is used to prevent vanishing gradients – Bidirectional structures are popularly used (BiLSTM) – The final hidden state output and a weighted sum of hidden states are both utilized for summarization 43 (Cornegruta et. al, 2016)
  • 45. Intonation-aided intention identification • Embedding words and sentences to numeric values  Structured self-attentive BiLSTM (Lin et al., 2017) 44
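The attention-weighted summarization in Lin et al. (2017) reduces, for a single attention head, to a softmax over scores followed by a weighted sum of hidden states — a NumPy sketch with untrained (here, zero) parameters for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attentive_summary(H, w_s1, w_s2):
    """Single-head version of the structured self-attention.

    H: (T, 2u) BiLSTM hidden states over T time steps.
    Attention weights a = softmax(w_s2 . tanh(W_s1 . H^T)); the summary
    is the attention-weighted sum of the hidden states.
    """
    a = softmax(w_s2 @ np.tanh(w_s1 @ H.T))  # (T,)
    return a @ H                              # (2u,)
```

With zero parameters the attention is uniform and the summary degenerates to the mean of the hidden states; training sharpens it toward the informative steps.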
  • 46. Intonation-aided intention identification • Embedding words and sentences to numeric values  Korean NLP >> What is word? • Alphabet (Jaso) (ㄱ ㄴ ㄷ ...) • Character (Morpho-syllabic block) (Korean: {Syllable:CV(C)}) • Morpheme – Some morphological analyzers do not decompose characters (e.g. Twitter analyzer) • Words (Eojeol) (the unit of segmentation) – In Korean, ‘spacing’ is more frequently used • Phrases – Unlike English, the head of each phrase comes in the final place (Josa) 45 (Choi and Palmer, 2011)
  • 47. Intonation-aided intention identification • Embedding words and sentences to numeric values  How can we compensate for errors? • ASR errors – Current devices perform well, but what about noisy environments? • Out-of-vocabulary (OOV) – which much spoken language contains • Incorrectness of morphological analyzers – Although recent modules score high accuracy, the result may not be reliable when spoken language (which incorporates scrambling) and many OOVs are involved  Character embedding? • Robust to the errors regarding words (morphemes) – Might be a better solution for noisy text 46
  • 48. Intonation-aided intention identification • Embedding words and sentences to numeric values  How can we conduct character embedding? • Hangul blocks >> Word piece model? – Not only by decomposing into morphemes; – But also the blocks themselves can be morphemes or even words! • fastText (Bojanowski et al., 2016) – subword n-grams are utilized – Dictionary: fastText skip-gram 100dim on Drama script corpus (of size 2M) – Model released » https://github.com/warnikchow/raws » Contains about 160K words (or subwords) » Contains about 2,500 characters (syllables) • BiLSTM-self attention on right-arranged character sequence (acc: 88.82%, F1: 0.8021) – Model released » https://github.com/warnikchow/3i4k 47
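The subword n-gram idea behind fastText can be sketched as follows — a simplified illustration of n-gram extraction with boundary markers (the actual fastText training additionally sums hashed n-gram vectors; that part is omitted):

```python
def subword_ngrams(word, n_min=2, n_max=4):
    """fastText-style character n-grams with boundary markers.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    (e.g. josa attached at the end of a Korean eojeol) get distinct units.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams
```

Since Python strings are Unicode, this works unchanged on Hangul blocks, where individual syllables can themselves be morphemes or words as noted above.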
  • 49. Intonation-aided intention identification • IU module: two-stage approach  Intonation classifier – Acoustic features for speech analysis? • Traditional approaches: GMM-HMM, power, on/offset pitch • Recent DL-based approaches: f0 contour + NNs • Suitability for Korean – stress vs. syllable-timedness >> augment RMSE? 48
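The RMSE feature mentioned above is frame-wise root-mean-square energy of the waveform; a minimal sketch (frame length and hop in samples, no windowing — real pipelines would typically apply a window function):

```python
import numpy as np

def rms_energy(signal, frame_len, hop):
    """Frame-wise root-mean-square energy of a 1-D waveform."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
```

Such an energy contour can be stacked with the f0 contour as an input sequence to the intonation classifier.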
  • 50. Intonation-aided intention identification • IU module: Two-stage approach:  Intonation classifier • Manual tagging on 7,000 utterances 49
  • 51. Intonation-aided intention identification • IU module: Two-stage approach:  Intonation classifier: https://github.com/warnikchow/korinto • Concatenation of CNN and BiLSTM Self-attention 50
  • 52. Intonation-aided intention identification • IU module: Two-stage approach:  Intonation-aided intention identifier • One-hot encoded Intonation label as an attention source 51
  • 53. Intonation-aided intention identification • Approach in the paper: two-stage analysis  Problems: Wh-intervention • Needs disambiguation (in progress) 52 몇 개 가져오래 Should I bring some? How many should I bring? They told you to bring some?
  • 54. Intonation-aided intention identification • IU module  Multimodal analysis revisited! • Why was this not used? – Because it was not thought of early enough... • Of course, it is not necessarily a better solution – People’s prosody is not always similar (anomalous usage) 53
  • 55. Demonstration • FCI module as a text classifier 54
  • 56. Demonstration • Data and model distribution (with tutorial)  https://github.com/warnikchow/3i4k & https://github.com/warnikchow/dlk2nlp 55
  • 57. Applications • As a real-time intention identifier of utterances • As a corpus auto-labeler (semi-supervised learning?) • As a new annotating scheme 56
  • 58. Applications • Room for improvement?  First of all, a multimodal approach can be adopted  Elaborate classification of questions and commands (in progress) • Yes/no and wh-questions • Orders and requests (commands in interrogative form)  The current system targets Seoul Korean; how about dialects? 57
  • 59. Summary • For the task of ‘intention understanding’, the problem definition matters • Sentence form alone cannot determine the intention • Both building the annotation guideline and checking the IAA are important • Character- or even alphabet(Jaso)-level processing can be effective for compensating ASR errors • When dealing with Korean, both intonation-dependency and wh-intervention must be considered 58
  • 60. Reference (order of appearance) • Liu, Bing, and Ian Lane. "Attention-based recurrent neural network models for joint intent detection and slot filling." arXiv preprint arXiv:1609.01454 (2016). • Gu, Yue, et al. "Speech intention classification with multimodal deep learning." Canadian Conference on Artificial Intelligence. Springer, Cham, 2017. • Sadock, Jerrold M., and Arnold M. Zwicky. "Speech act distinctions in syntax." Language typology and syntactic description 1 (1985): 155-196. • Portner, Paul. "The semantics of imperatives within a theory of clause types." Semantics and linguistic theory. Vol. 14. 2004. • Searle, John R. "A classification of illocutionary acts." Language in society 5.1 (1976): 1-23. • Stolcke, Andreas, et al. "Dialogue act modeling for automatic tagging and recognition of conversational speech." Computational linguistics 26.3 (2000): 339-373. • Bunt, Harry, et al. "Towards an ISO standard for dialogue act annotation." Seventh conference on International Language Resources and Evaluation (LREC'10). 2010. • 이현정, 서정연. "한국어 대화체 문장의 화행 분석 [Speech act analysis of Korean dialogue sentences]." 한국정보과학회 학술발표논문집 24.2Ⅱ (1997): 259-262. • 김세종, 이용훈, 이종혁. "이전 문장 자질과 다음 발화의 후보 화행을 이용한 한국어 화행 분석 [Korean speech act analysis using previous-sentence features and candidate speech acts of the next utterance]." 정보과학회논문지: 소프트웨어 및 응용 35.6 (2008): 374-385. • 이현정, 이재원, 서정연. "자동통역을 위한 한국어 대화 문장의 화행 분석 모델 [A speech act analysis model of Korean dialogue sentences for automatic interpretation]." 정보과학회논문지 (B) 25.10 (1998): 1443-1452. • 이성욱, 서정연. "결정트리를 이용한 한국어 화행 분석 [Korean speech act analysis using decision trees]." 한국정보과학회 언어공학연구회 학술발표 논문집 (1999): 377-381. • Friedrich, Annemarie, Alexis Palmer, and Manfred Pinkal. "Situation entity types: automatic classification of clause-level aspect." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2016. 59
  • 61. Reference (order of appearance) • Vosoughi, Soroush, and Deb Roy. "Tweet Acts: A Speech Act Classifier for Twitter." ICWSM. 2016. • Levinson, Stephen C. Presumptive meanings: The theory of generalized conversational implicature. MIT press, 2000. • Fleiss, Joseph L. "Measuring nominal scale agreement among many raters." Psychological bulletin 76.5 (1971): 378. • Jun, Sun-Ah. "K-ToBI (Korean ToBI) labelling conventions (version 3.1, October 2000)." UCLA working papers in phonetics (2000): 149-173. • Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013. • Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International Conference on Machine Learning. 2014. • Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014). • Cornegruta, Savelie, et al. "Modelling radiological language with bidirectional long short-term memory networks." arXiv preprint arXiv:1609.08409 (2016). • Lin, Zhouhan, et al. "A structured self-attentive sentence embedding." arXiv preprint arXiv:1703.03130 (2017). • Choi, Jinho D., and Martha Palmer. "Statistical dependency parsing in Korean: From corpus generation to automatic parsing." Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages. Association for Computational Linguistics, 2011. • Sennrich, Rico, Barry Haddow, and Alexandra Birch. "Neural machine translation of rare words with subword units." arXiv preprint arXiv:1508.07909 (2015). • Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv preprint arXiv:1607.04606 (2016). 60
