Human Interface Laboratory
Keyphrase Extraction from Directive Utterances Using Discourse Components:
Korean Parallel Corpus Construction and a Data Augmentation Methodology
2019. 10. 12 @HCLT 2019
조원익, 문영기, 김종인, 김남수
Contents
• Introduction
 What is a keyphrase? Keyphrase vs. Summary
 What is a keyphrase for directives?
• Related work
 Keyphrase extraction, sentence generation, and paraphrasing
 SQL, bilingual pivoting (BP), and discourse component (DC)
• Corpus construction
• Dataset augmentation
• Summary
 Application
 Future work
Introduction
• What is a keyphrase?
 Keyphrase as a set of words that stands for a document
• e.g., Keywords (topic words) for an abstract
– Can be combined into some phrases
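» 담화성분 기반의 키프레이즈 추출, 패러프레이징을 위한 한국어 병렬 코퍼스 ("discourse-component-based keyphrase extraction", "Korean parallel corpus for paraphrasing")
• But remember: keyphrases are also 'phrases'!
– Do they stand only for a document, or also for shorter units (sentences)?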
Introduction
• What is a keyphrase?
 Keyphrase as a phrase that summarizes a sentence
• e.g., Extractive summarization that sometimes accompanies paraphrasing
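– 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다. ("To answer what many of you have been asking: this year the HCLT conference will be held at KAIST from October 12 to 13.")
→ 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최 ("HCLT conference held at KAIST, October 12 to 13 this year")
– 오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다. ("The club after-party is scheduled for eight this evening at 풍경소리 near Seoul Nat'l Univ. Station.")
→ 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정 ("club after-party planned, 20:00 today, 풍경소리, Seoul Nat'l Univ. Station")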
• Remember paraphrasing is like monolingual translation (no exact answer!)
 Keyphrase candidates are expected to make up a smaller space than the
original sentences do!
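• 오늘 아침에 사고났대. ("I heard there was an accident this morning.")
• 오늘 아침에 사고났다던데. ("They say there was an accident this morning.")
• 그거 알아? 오늘 아침 사고난거. ("Did you know? There was an accident this morning.")
• 사고 났다더라구 오늘 아침에. ("There was an accident, I heard, this morning.")
→ 오늘 아침 사고 발생 (사고 남) ("accident occurred this morning")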
Introduction
• Keyphrase vs. Summary
 Summarization of a document can be either (conventionally):
• Extractive [Cheng and Lapata, 2016]
– Documents have several sentence candidates
• Abstractive [Rush et al., 2015]
– Documents without a representative sentence can be abstractively summarized
• Hybrid methodologies are in progress [Bae et al., 2019]
 In keyphrase extraction from the sentences:
• Both extractive and abstractive approaches can be utilized
– Extractive: for the keywords
– Abstractive: for the plausible expression (sentence style, word-level paraphrasing)
• Example (keywords kept, 저녁 여덟 시 rephrased as 이십 시):
– 오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다.
→ 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정
Introduction
• Keyphrase for directives (question/command)?
 What should the keyphrases be?
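• for questions: something that the speaker asks for
– 내일 서울에 비 얼마나 올지 좀 검색해봐. ("Look up how much it will rain in Seoul tomorrow.")
→ 질문: 내일 서울 강수량 (question: tomorrow's precipitation in Seoul)
• for commands: something that the speaker requests
– 물이 끓으면 불을 제일 약한 걸로 돌려줘 ("When the water boils, turn the heat down to the lowest setting.")
→ 요구: 물이 끓으면 불을 제일 약한 것으로 하기 (request: setting the heat to the lowest when the water boils)
• A simplified but representative nominalized version of the core content
• Sometimes keyphrases are longer than the original sentence
→ this is why the process differs from summarization
• Discourse component revisited!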
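The labeled keyphrases on this slide (질문: ... for questions, 요구: ... for commands) can be held in a small record type. The following is a minimal sketch, not the authors' annotation format: the class name `DirectiveKeyphrase`, the function `parse_annotation`, and the English act names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Labels as they appear on the slide: 질문 = question, 요구 = command (request).
# The English names on the right are illustrative choices, not from the corpus.
LABELS = {"질문": "question", "요구": "command"}


@dataclass(frozen=True)
class DirectiveKeyphrase:
    act: str        # "question" or "command"
    keyphrase: str  # nominalized core content of the utterance


def parse_annotation(text: str) -> DirectiveKeyphrase:
    """Split a 'label: keyphrase' string such as '질문: 내일 서울 강수량'."""
    label, _, phrase = text.partition(":")
    label, phrase = label.strip(), phrase.strip()
    if label not in LABELS or not phrase:
        raise ValueError(f"unrecognized annotation: {text!r}")
    return DirectiveKeyphrase(LABELS[label], phrase)


q = parse_annotation("질문: 내일 서울 강수량")
c = parse_annotation("요구: 물이 끓으면 불을 제일 약한 것으로 하기")
print(q)
print(c)
```

Keeping the act label separate from the phrase means the same keyphrase text can later be paired with either a question or a command realization.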
Introduction
• Research questions
 How does the discourse component (DC) compare with structured query language (SQL) and bilingual pivoting (BP) from the viewpoint of paraphrasing?
 How can we extract the keyphrase from a directive utterance in the form of a DC?
 How can the DC be utilized in making up paraphrases of questions and commands?
Related work
• Keyphrase extraction, sentence generation, and paraphrasing
[Figure: Original sentence → Core content (SQL or keyphrase) via Seq2SQL / keyphrase extraction; Core content → Paraphrase via rule-based / learning-based / human generation; Original sentence → Paraphrase directly via bilingual pivoting / word swapping / human paraphrase]
Related work
• 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다.
– How can we obtain the core content for paraphrasing (possibly by a human)?
• Structured query language (SQL) [Zhong et al., 2017]
 {기간 (period): 올해 시월 십이일부터 십삼일, 장소 (place): 카이스트, 이벤트 (event): 한글 및 한국어 정보처리 학술대회}
• A kind of semantic parsing
• Structured extraction of information is available
• Human-friendly data generation is not guaranteed
• Categorization can be limited
• Bilingual pivoting (BP) [Mallinson et al., 2017]
 “As many of you may have waited for, we hold HCLT conference at KAIST from twelfth to thirteenth upcoming October.”
• Back-translation using other languages may give various expressions
• 1-1 correspondence doesn’t help extract the core content of the sentence
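To make the contrast with the SQL-style representation concrete, the slot structure quoted under the SQL bullet can be written as a dictionary and turned back into text in two ways. This is only a sketch under assumptions: the function names and the hand-written template are hypothetical and are not part of Seq2SQL or of the authors' pipeline.

```python
# Slot structure copied from the SQL bullet: period, place, event.
slots = {
    "기간": "올해 시월 십이일부터 십삼일",        # period
    "장소": "카이스트",                           # place
    "이벤트": "한글 및 한국어 정보처리 학술대회",  # event
}


def naive_linearize(s: dict) -> str:
    """Join slot values in order; particles and the predicate are lost."""
    return " ".join(s.values())


def template_linearize(s: dict) -> str:
    """Hypothetical hand-written template for this one slot schema."""
    return f"{s['기간']}까지 {s['장소']}에서 {s['이벤트']} 개최"


print(naive_linearize(slots))
print(template_linearize(slots))
```

The naive join drops 까지, 에서 and 개최, so the result is not a natural phrase, while the template only works for utterances that fit these three slots. Both effects line up with the bullets above on human-friendliness and limited categorization.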
Related work
• 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다.
– How can we obtain the core content for paraphrasing (possibly by a human)?
• Discourse component [Portner, 2004]
 This approach incorporates human generation, but can be efficient
• E.g., the following can be the discourse component for a declarative:
– 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최 (Common Ground)
• Core content information in monolingual natural language format
Corpus construction
• Annotating keyphrases on a Korean corpus regarding speech act
– How can it be utilized?
[Figure: 방카슈랑스란 무엇입니까 ("What is bancassurance?") → Intention identification → Question? → Keyphrase extraction → 방카슈랑스의 의미 ("the meaning of bancassurance")]
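The flow on this slide, intention identification followed by keyphrase extraction for utterances judged to be directives, can be sketched as a two-stage function. Everything below is a hypothetical stand-in: the talk states that a baseline extraction system is yet to be developed, so the two "models" here are toy rules and a lookup table used only to show the control flow.

```python
from typing import Callable, Optional, Tuple

# Only directives (questions and commands) receive a keyphrase in this sketch.
DIRECTIVE_ACTS = {"question", "command"}


def keyphrase_for(
    utterance: str,
    identify_intention: Callable[[str], str],
    extract_keyphrase: Callable[[str, str], str],
) -> Optional[Tuple[str, str]]:
    """Run intention identification, then extraction if the act is a directive."""
    act = identify_intention(utterance)
    if act not in DIRECTIVE_ACTS:
        return None
    return act, extract_keyphrase(utterance, act)


def toy_intention(utterance: str) -> str:
    # Toy rule (assumption): a sentence ending in 까 is treated as a question.
    return "question" if utterance.rstrip("?").endswith("까") else "statement"


# Toy lookup standing in for a trained extractor; the pair is the slide's example.
TOY_KEYPHRASES = {"방카슈랑스란 무엇입니까": "방카슈랑스의 의미"}


def toy_extract(utterance: str, act: str) -> str:
    return TOY_KEYPHRASES[utterance]


result = keyphrase_for("방카슈랑스란 무엇입니까", toy_intention, toy_extract)
print(result)
```

Passing the two stages in as callables keeps the routing logic unchanged when real classifiers replace the toy functions.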
Corpus construction
• Annotating keyphrases on a Korean corpus regarding speech act
 Corpus: Intention identification for Korean (3i4K) [Cho et al., 2018]
 Composition
• Question
• Command
• Rhetorical question
• Rhetorical command
• Statement
• Intonation-dependent utterances
• Fragments
• Includes only utterances whose determination of speech act was not affected by the sentence form
• Utterances are non-canonical and colloquial
• Includes various topics and situations
Corpus construction
• Annotating keyphrases on a Korean corpus regarding speech act
Data augmentation
• Generating questions and commands from keyphrases
 Prototype model [Cho et al., 2018] lacks alternative Qs, prohibitions, and strong REQs
 These are scarce within the corpus but frequently used in real life
• Augmentation is required! But how?
Data augmentation
• Generating questions and commands from keyphrases
 For a discourse component (keyphrase) of a statement, we can think of:
• 오늘 아침 사고 발생 (사고 남) ("accident occurred this morning") >>
– 오늘 아침에 사고났대.
– 오늘 아침에 사고났다던데.
– 그거 알아? 오늘 아침 사고난거.
– 사고 났다더라구 오늘 아침에.
 Similarly for questions and commands:
• Question set >> Question?
• To-do list >> Command!
• Generating questions/commands differs from expressing a thought in interrogative/imperative sentence form
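The one-keyphrase-to-many-sentences relation in the accident example on this slide is what makes the resource usable as a parallel corpus. As a rough sketch (the dictionary layout and the function `paraphrase_pairs` are assumptions, not the released data format), sentences that share a keyphrase can be enumerated as paraphrase pairs:

```python
from itertools import combinations

# Keyphrase -> surface realizations, copied from the accident example.
parallel = {
    "오늘 아침 사고 발생": [
        "오늘 아침에 사고났대.",
        "오늘 아침에 사고났다던데.",
        "그거 알아? 오늘 아침 사고난거.",
        "사고 났다더라구 오늘 아침에.",
    ],
}


def paraphrase_pairs(corpus: dict):
    """Yield every unordered pair of sentences that share a keyphrase."""
    for keyphrase, sentences in corpus.items():
        for a, b in combinations(sentences, 2):
            yield keyphrase, a, b


pairs = list(paraphrase_pairs(parallel))
print(len(pairs))  # 4 sentences give C(4, 2) = 6 pairs
```

With ten sentences per keyphrase, the same enumeration would give C(10, 2) = 45 pairs per keyphrase.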
Data augmentation
• Generating questions and commands from keyphrases
 Question/command types in need:
• Alternative Q, prohibition, strong requirement (deficient in the existing corpus)
• Wh-question (more required for practical usage)
 Phrases that are prepared:
• Total phrase #: 2,000
– 400 for alternative Q
– 800 for wh-Q
– 400 for prohibition
– 400 for strong requirement
• Sentences to be generated per phrase: 10
• Topics:
– 1,000 phrases for free topic
– 250 phrases each for mail, house control, schedule, and weather
 Leaves only the utterances with the consensus of more than 3 native speakers
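The counts on this slide can be checked with a few lines of arithmetic. The sketch below also includes a consensus filter; the function `keep_by_consensus`, its threshold reading ("more than 3" taken literally as strictly greater than 3), and the toy vote counts are assumptions for illustration, not the authors' procedure or data.

```python
# Phrase counts as listed on the slide.
PHRASES_BY_TYPE = {
    "alternative Q": 400,
    "wh-Q": 800,
    "prohibition": 400,
    "strong requirement": 400,
}
PHRASES_BY_TOPIC = {
    "free": 1000,
    "mail": 250,
    "house control": 250,
    "schedule": 250,
    "weather": 250,
}
SENTENCES_PER_PHRASE = 10

total_phrases = sum(PHRASES_BY_TYPE.values())
# Both breakdowns should add up to the stated total of 2,000 phrases.
assert total_phrases == sum(PHRASES_BY_TOPIC.values()) == 2000

# Upper bound on generated sentences, before any consensus filtering.
max_sentences = total_phrases * SENTENCES_PER_PHRASE
print(max_sentences)  # 20000


def keep_by_consensus(candidates, threshold=3):
    """Keep sentences approved by more than `threshold` annotators (assumed reading)."""
    return [sentence for sentence, approvals in candidates if approvals > threshold]


# Toy placeholders, not corpus data.
toy = [("sentence A", 5), ("sentence B", 3)]
print(keep_by_consensus(toy))  # ['sentence A']
```

The 20,000 figure is only a ceiling; how many sentences survive the consensus step is not stated on the slide.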
Data augmentation
• Generating questions and commands from keyphrases
 Guideline for the participants
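• Write the ten sentences in styles as different from one another as possible. Style here covers honorific use, tone, and so on.
• The words in the keyphrase need not be repeated; other words, phrases, or predicates that fit the situation may be used. Expressions should be suitable for spoken utterance.
• Pursuing variety in sentence form through inversion is also encouraged.
• A wh-word is mandatory in wh-questions and may, depending on the case, also appear in alternative questions. Neither sentence type has to be written as an interrogative.
• A prohibition must stop the addressee from performing some action they could perform, and must be more binding than saying that it is fine not to do it. If prohibiting the action is practically equivalent to requiring another action, substituting that expression is not a major problem.
• Neither prohibitions nor strong requirements have to be imperatives, but they must aim to block or compel the addressee's action. Strong suggestions are also acceptable.
• For keyphrases that include the speaker/addressee, use the corresponding pronoun expressions. This yields both a corpus that contains speaker/addressee expressions and one that does not.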
Data augmentation
• Generating questions and commands from keyphrases
Data augmentation
• Generating questions and commands from keyphrases
 Will be distributed via https://github.com/warnikchow/sae4k
 The baseline system for automatic extraction is yet to be developed!
Summary
• Application of the concept “keyphrase”
 Analysis of questions and commands in human-friendly conversation
• Classification of non-canonical directive utterances
• Pre-processing for the semantic parsing of non-canonical utterances
• Making up an answer that continues the dialog
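– e.g., 오늘 비 언제까지 온대냐? ("How long is it supposed to rain today?") >> 오늘 비 오는 시간대가 궁금하신가요? ("Are you asking about the hours when it will rain today?")
– (If inferred correctly...)
 As the core content of an utterance
• For an efficient semantic web search (방카슈랑스? "bancassurance?")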
• For an efficient human generation of paraphrase
– More human-friendly compared to SQL (non-NL terms) or back-translation (requires
multilingual ability)
• Future work
 Implementation of automatic keyphrase extraction system
 Extension to paraphrasing or sentence similarity task
References (in order of appearance)
• Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252.
• Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
• Bae, S., Kim, T., Kim, J., & Lee, S. G. (2019). Summary Level Training of Sentence Rewriting for Abstractive Summarization. arXiv preprint arXiv:1909.08752.
• Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
• Mallinson, J., Sennrich, R., & Lapata, M. (2017, April). Paraphrasing revisited with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (pp. 881-893).
• Portner, P. (2004, September). The semantics of imperatives within a theory of clause types. In Semantics and linguistic theory (Vol. 14, pp. 235-252).
• Cho, W. I., Lee, H. S., Yoon, J. W., Kim, S. M., & Kim, N. S. (2018). Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency. arXiv preprint arXiv:1811.04231.
• Cho, W. I., Moon, Y. K., Kang, W. H., & Kim, N. S. (2018). Extracting Arguments from Korean Question and Command: An Annotated Corpus for Structured Paraphrasing. arXiv preprint arXiv:1810.04631.
Thank you!
Immunizing Image Classifiers Against Localized Adversary Attacks
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
