Human Interface Laboratory
Keyphrase Extraction from Directive Utterances Using Discourse Components:
Korean Parallel Corpus Construction and Data Augmentation Methodology
2019. 10. 12 @HCLT 2019
조원익, 문영기, 김종인, 김남수
Contents
• Introduction
◦ What is a keyphrase? Keyphrase vs. summary
◦ What is a keyphrase for directives?
• Related work
◦ Keyphrase extraction, sentence generation, and paraphrasing
◦ SQL, bilingual pivoting (BP), and discourse component (DC)
• Corpus construction
• Data augmentation
• Summary
◦ Application
◦ Future work
Introduction
• What is a keyphrase?
◦ A keyphrase as a set of words that stands for a document
• e.g., keywords (topic words) for an abstract
– These can be combined into phrases
» Discourse-component-based keyphrase extraction, Korean parallel corpus for paraphrasing
• But remember: keyphrases are also "phrases"!
– Do they apply only to documents, or also to short texts (sentences)?
Introduction
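• What is a keyphrase?
◦ A keyphrase as a phrase that summarizes a sentence
• e.g., extractive summarization, sometimes accompanied by paraphrasing
– 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다.
("To let you know what many of you have been wondering about: this year, the Hangul and Korean Information Processing Conference (HCLT) will be held at KAIST from October twelfth to thirteenth.")
→ 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최
("HCLT held at KAIST, October twelfth to thirteenth this year")
– 오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다.
("The club after-party will be held at eight this evening at Punggyeongsori near Seoul National University Station.")
→ 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정
("Club after-party scheduled: today, 20:00, Punggyeongsori, Seoul National University Station")
• Remember that paraphrasing is like monolingual translation (there is no exact answer!)
◦ Keyphrase candidates are expected to span a smaller space than the original sentences do!
• 오늘 아침에 사고났대. ("I heard there was an accident this morning.")
• 오늘 아침에 사고났다던데. ("They say there was an accident this morning.")
• 그거 알아? 오늘 아침 사고난거. ("You know what? The accident this morning.")
• 사고 났다더라구 오늘 아침에. ("There was an accident, I heard, this morning.")
→ 오늘 아침 사고 발생 (사고 남) ("Accident this morning (an accident occurred)")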
Introduction
• Keyphrase vs. summary
◦ Summarization of a document is conventionally either:
• Extractive [Cheng and Lapata, 2016]
– The document contains several candidate sentences to extract
• Abstractive [Rush et al., 2015]
– Documents without a representative sentence can be summarized abstractively
• Hybrid methodologies are in progress [Bae et al., 2019]
◦ In keyphrase extraction from sentences:
• Both extractive and abstractive approaches can be utilized
– Extractive: for the keywords
– Abstractive: for plausible expressions (sentence style, word-level paraphrasing)
오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다.
→ 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정
(The keywords are extracted as-is, while 여덟 시 "eight o'clock" is abstractively rewritten as 이십 시 "20:00" and the sentence ending is nominalized.)
Introduction
• Keyphrases for directives (questions/commands)?
◦ What should the keyphrases be?
• For questions: what the speaker asks for
– 내일 서울에 비 얼마나 올지 좀 검색해봐. ("Search how much it will rain in Seoul tomorrow.")
→ 질문: 내일 서울 강수량 ("Question: tomorrow's precipitation in Seoul")
• For commands: what the speaker requests
– 물이 끓으면 불을 제일 약한 걸로 돌려줘 ("When the water boils, turn the heat down to the lowest.")
→ 요구: 물이 끓으면 불을 제일 약한 것으로 하기 ("Request: turning the heat to the lowest setting when the water boils")
• A simplified but representative nominalized version of the core content
• Keyphrases are sometimes longer than the original sentence
→ which is why the process differs from summarization
• Discourse components revisited!
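To make the question/command keyphrase format concrete, here is a minimal Python sketch of how an annotated pair might be stored. The `KeyphrasePair` record, the `DirectiveType` enum, and the English `Question:`/`Request:` prefixes are illustrative assumptions, not the released corpus format; the two examples are the ones from this slide.

```python
from dataclasses import dataclass
from enum import Enum


class DirectiveType(Enum):
    """Hypothetical labels for the two directive types discussed here."""
    QUESTION = "question"  # keyphrase = what the speaker asks for
    COMMAND = "command"    # keyphrase = what the speaker requests


@dataclass(frozen=True)
class KeyphrasePair:
    """Hypothetical record pairing a directive utterance with its keyphrase."""
    utterance: str
    act: DirectiveType
    keyphrase: str  # nominalized core content of the utterance

    def labeled(self) -> str:
        # Prefix the keyphrase with its directive type, mirroring the
        # "질문: ..." (question) / "요구: ..." (request) notation on the slide.
        prefix = "Question" if self.act is DirectiveType.QUESTION else "Request"
        return f"{prefix}: {self.keyphrase}"


pairs = [
    KeyphrasePair("내일 서울에 비 얼마나 올지 좀 검색해봐.",
                  DirectiveType.QUESTION, "내일 서울 강수량"),
    KeyphrasePair("물이 끓으면 불을 제일 약한 걸로 돌려줘",
                  DirectiveType.COMMAND, "물이 끓으면 불을 제일 약한 것으로 하기"),
]

for pair in pairs:
    print(pair.labeled())
```

Keeping the directive type separate from the keyphrase string means the same nominalized phrase can later be re-realized as either a question or a command.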
Introduction
• Research questions
◦ How does the discourse component (DC) compare with structured query language (SQL) and bilingual pivoting (BP) from the viewpoint of paraphrasing?
◦ How can we extract the keyphrase of a directive utterance in the form of a DC?
◦ How can DCs be utilized to compose paraphrases of questions and commands?
Related work
• Keyphrase extraction, sentence generation, and paraphrasing
– Original sentence → Core content (SQL or keyphrase): Seq2SQL / keyphrase extraction
– Core content → Paraphrase: rule-based / learning-based / human generation
– Original sentence → Paraphrase (directly): bilingual pivoting / word swapping / human paraphrase
Related work
• 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다. (the HCLT announcement example from the introduction)
– How can we obtain the core content for paraphrasing (possibly by humans)?
• Structured query language (SQL) [Zhong et al., 2017]
◦ {기간: 올해 시월 십이일부터 십삼일, 장소: 카이스트, 이벤트: 한글 및 한국어 정보처리 학술대회}
({period: this year, October 12 to 13; place: KAIST; event: HCLT})
• A kind of semantic parsing
• Enables structured extraction of information
• Does not guarantee human-friendly data generation
• Categorization can be limited
• Bilingual pivoting (BP) [Mallinson et al., 2017]
◦ "As many of you may have been waiting for, we will hold the HCLT conference at KAIST from the twelfth to the thirteenth of this coming October."
• Back-translation through other languages may yield diverse expressions
• A one-to-one correspondence does not help extract the core content of the sentence
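The "categorization can be limited" point about SQL-style representations can be illustrated with a toy slot-filling frame. This is a hypothetical sketch: the three-slot `SCHEMA` and the `to_slots` helper are invented for illustration and are not the Seq2SQL formalism of Zhong et al.; it only shows that a fixed schema rejects information it has no category for, whereas a natural-language keyphrase has no such restriction.

```python
from typing import Dict

# Hypothetical fixed schema for a SQL-style (slot-filling) frame.
SCHEMA = {"period", "place", "event"}


def to_slots(fields: Dict[str, str]) -> Dict[str, str]:
    """Return a frame, refusing any field the fixed schema has no slot for."""
    unknown = set(fields) - SCHEMA
    if unknown:
        raise ValueError(f"no slot for: {sorted(unknown)}")
    return dict(fields)


# The HCLT announcement fits the schema.
frame = to_slots({
    "period": "올해 시월 십이일부터 십삼일",      # this year, October 12 to 13
    "place": "카이스트",                          # KAIST
    "event": "한글 및 한국어 정보처리 학술대회",  # HCLT
})

# The after-party announcement carries a time of day, which has no slot.
try:
    to_slots({"event": "동아리 뒷풀이", "time": "오늘 이십 시"})
    rejected = None
except ValueError as err:
    rejected = str(err)

print(rejected)  # no slot for: ['time']
```

Extending the schema fixes one case but not the general problem, which is the motivation for keeping the core content in natural language.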
Related work
• 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다. (the HCLT announcement example from the introduction)
– How can we obtain the core content for paraphrasing (possibly by humans)?
• Discourse component [Portner, 2004]
◦ This approach involves human generation, but can be efficient
• E.g., the following can serve as the discourse component of the declarative:
– 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최 (Common Ground)
("HCLT held at KAIST, October 12 to 13 this year")
• Core content information in a monolingual natural-language format
Corpus construction
• Annotating keyphrases on a Korean corpus labeled with speech acts
– How can it be utilized?
방카슈랑스란 무엇입니까 ("What is bancassurance?")
→ Intention identification: Question?
→ Keyphrase extraction: 방카슈랑스의 의미 ("the meaning of bancassurance")
Corpus construction
• Annotating keyphrases on a Korean corpus labeled with speech acts
◦ Corpus: Intention identification for Korean (3i4K) [Cho et al., 2018]
◦ Composition
• Question
• Command
• Rhetorical question
• Rhetorical command
• Statement
• Intonation-dependent utterances
• Fragments
• Includes only utterances whose speech act determination was not affected by the sentence form
• Utterances are non-canonical and colloquial
• Covers various topics and situations
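The pipeline shown for 방카슈랑스란 무엇입니까 (identify the intention first, then extract a keyphrase only for directives) can be sketched as a filter over the 3i4K categories. The `Act` member names are our own labels for the categories listed above, and restricting keyphrase annotation to plain questions and commands is an assumption based on this work's focus on directives.

```python
from enum import Enum
from typing import List, Tuple


class Act(Enum):
    """The seven 3i4K utterance categories listed above (member names are ours)."""
    QUESTION = "question"
    COMMAND = "command"
    RHETORICAL_QUESTION = "rhetorical question"
    RHETORICAL_COMMAND = "rhetorical command"
    STATEMENT = "statement"
    INTONATION_DEPENDENT = "intonation-dependent utterance"
    FRAGMENT = "fragment"


# Assumption: only plain directives go on to question/command keyphrase extraction.
DIRECTIVES = {Act.QUESTION, Act.COMMAND}


def directives_only(corpus: List[Tuple[str, Act]]) -> List[Tuple[str, Act]]:
    """Keep the (utterance, act) pairs whose act is a directive."""
    return [(utt, act) for utt, act in corpus if act in DIRECTIVES]


toy_corpus = [
    ("방카슈랑스란 무엇입니까", Act.QUESTION),              # What is bancassurance?
    ("오늘 아침에 사고났대.", Act.STATEMENT),                # I heard there was an accident this morning.
    ("물이 끓으면 불을 제일 약한 걸로 돌려줘", Act.COMMAND),  # Turn the heat to the lowest when the water boils.
]

for utt, act in directives_only(toy_corpus):
    print(act.value, "|", utt)
```

In a real system the labels would come from an intention classifier rather than being given, so classification errors would propagate into which utterances receive keyphrases.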
Corpus construction
• Annotating keyphrases on a Korean corpus labeled with speech acts
Data augmentation
• Generating questions and commands from keyphrases
◦ The prototype model [Cho et al., 2018] lacks alternative questions, prohibitions, and strong requirements (REQs)
◦ These are scarce in the corpus, but frequently used in real life
• Augmentation is required! But HOW?
Data augmentation
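• Generating questions and commands from keyphrases
◦ For the discourse component (keyphrase) of a statement, we can think of:
오늘 아침 사고 발생 (사고 남) ("Accident this morning (an accident occurred)")
• 오늘 아침에 사고났대. ("I heard there was an accident this morning.")
• 오늘 아침에 사고났다던데. ("They say there was an accident this morning.")
• 그거 알아? 오늘 아침 사고난거. ("You know what? The accident this morning.")
• 사고 났다더라구 오늘 아침에. ("There was an accident, I heard, this morning.")
◦ Similarly for questions and commands:
• Question set >> Question?
• To-do list >> Command!
• Generating questions/commands differs from expressing a thought in interrogative/imperative sentence form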
Data augmentation
• Generating questions and commands from keyphrases
◦ Question/command types needed:
• Alternative questions, prohibitions, strong requirements (scarce in the corpus)
• Wh-questions (needed in greater quantity for practical use)
◦ Prepared phrases:
• Total number of phrases: 2,000
– 400 for alternative questions
– 800 for wh-questions
– 400 for prohibitions
– 400 for strong requirements
• Sentences to be generated per phrase: 10
• Topics:
– 1,000 phrases on free topics
– 250 phrases each for mail, house control, schedule, and weather
◦ Only utterances that more than 3 native speakers agree on are kept
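The phrase budget above can be checked with a few lines of arithmetic, together with a sketch of the native-speaker consensus filter. The per-type and per-topic counts come from the slide; the `passes_consensus` helper, its threshold semantics, and the approval counts in the example are hypothetical.

```python
# Phrase counts per question/command type, as listed on the slide.
PHRASES_BY_TYPE = {
    "alternative question": 400,
    "wh-question": 800,
    "prohibition": 400,
    "strong requirement": 400,
}
# The same 2,000 phrases split by topic: 1,000 free-topic + 250 for each of four domains.
PHRASES_BY_TOPIC = {
    "free": 1000,
    "mail": 250,
    "house control": 250,
    "schedule": 250,
    "weather": 250,
}
SENTENCES_PER_PHRASE = 10

total_phrases = sum(PHRASES_BY_TYPE.values())
assert total_phrases == sum(PHRASES_BY_TOPIC.values()) == 2000

# Upper bound on generated sentences, before any consensus filtering.
max_sentences = total_phrases * SENTENCES_PER_PHRASE


def passes_consensus(approvals: int, threshold: int = 3) -> bool:
    """Hypothetical filter: keep a sentence if more than `threshold` natives accept it.

    The slide says "more than 3"; use >= instead if the rule is meant as "at least 3".
    """
    return approvals > threshold


# Hypothetical approval counts for three generated sentences.
approvals = {"sentence A": 5, "sentence B": 3, "sentence C": 4}
kept = [s for s, n in approvals.items() if passes_consensus(n)]
print(max_sentences, kept)  # 20000 ['sentence A', 'sentence C']
```

The 20,000 figure is only a ceiling: the released count depends on how many generated sentences survive the consensus check.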
Data augmentation
• Generating questions and commands from keyphrases
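◦ Guidelines for the participants
• Write the ten sentences in styles as different from one another as possible. Here, "style" covers everything, including honorific level and tone.
• You need not repeat the words in the keyphrase; other words/phrases/predicates that fit the situation may be used. Expressions should be suitable for spoken utterance.
• Diversifying sentence form through inversion (word-order change) is also encouraged.
• Wh-questions must contain a wh-word, and one may also be inserted in alternative questions where appropriate. Neither type needs to be written as an interrogative sentence.
• A prohibition must be a sentence that keeps the listener from doing some action they could do, and it must be more forceful than merely saying that it is fine not to do it. If prohibiting the action is practically equivalent to requiring another action, substituting that expression is acceptable.
• Neither prohibitions nor strong requirements need to be imperatives, but they must aim to prevent or compel the listener's action. Strong suggestions are also allowed.
• For keyphrases that include the speaker/listener, use the corresponding pronoun expressions. This yields both a corpus with and a corpus without speaker/listener expressions.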
Data augmentation
• Generating questions and commands from keyphrases
Data augmentation
• Generating questions and commands from keyphrases
◦ Will be distributed via https://github.com/warnikchow/sae4k
◦ A baseline system for automatic extraction is yet to be developed!
Summary
• Applications of the "keyphrase" concept
◦ Analysis of questions and commands in human-friendly conversation
• Classification of non-canonical directive utterances
• Pre-processing for semantic parsing of non-canonical utterances
• Composing a response that continues the dialog
– e.g., 오늘 비 언제까지 온대냐? ("Until when is it supposed to rain today?") >> 오늘 비 오는 시간대가 궁금하신가요? ("Are you asking about the hours it will rain today?")
– (if inferred correctly...)
◦ As the core content of an utterance
• For efficient semantic web search (방카슈랑스? "bancassurance?")
• For efficient human generation of paraphrases
– More human-friendly than SQL (non-NL terms) or back-translation (requires multilingual ability)
• Future work
◦ Implementation of an automatic keyphrase extraction system
◦ Extension to paraphrasing and sentence similarity tasks
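The dialog-continuation example under Applications suggests that a question keyphrase can seed a clarifying reply. A minimal sketch, assuming a hypothetical template "keyphrase + subject particle + 궁금하신가요?" and assuming the keyphrase of 오늘 비 언제까지 온대냐? is 오늘 비 오는 시간대. The one non-trivial step is choosing the subject particle: 이 after a syllable with a final consonant, 가 otherwise. That can be read off the Unicode Hangul syllable block (U+AC00 to U+D7A3), where each syllable is encoded as 0xAC00 + (initial × 21 + medial) × 28 + final, and final index 0 means "no final consonant".

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
FINALS_PER_BLOCK = 28  # 27 final consonants + "no final consonant"


def subject_particle(word: str) -> str:
    """Return 이 after a syllable with a final consonant, 가 otherwise."""
    code = ord(word.rstrip()[-1])
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return "이(가)"  # not a Hangul syllable: fall back to the neutral written form
    return "이" if (code - HANGUL_FIRST) % FINALS_PER_BLOCK else "가"


def clarifying_reply(keyphrase: str) -> str:
    # Hypothetical template: echo the question keyphrase back as a confirmation.
    return f"{keyphrase}{subject_particle(keyphrase)} 궁금하신가요?"


print(clarifying_reply("오늘 비 오는 시간대"))  # 오늘 비 오는 시간대가 궁금하신가요?
print(clarifying_reply("내일 서울 강수량"))     # 내일 서울 강수량이 궁금하신가요?
```

A fixed template only works when the keyphrase is a nominalized phrase, which is one practical reason for annotating keyphrases in that form; inferring the keyphrase itself remains the open problem noted above.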
References (in order of appearance)
• Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252.
• Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
• Bae, S., Kim, T., Kim, J., & Lee, S. G. (2019). Summary level training of sentence rewriting for abstractive summarization. arXiv preprint arXiv:1909.08752.
• Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
• Mallinson, J., Sennrich, R., & Lapata, M. (2017, April). Paraphrasing revisited with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (pp. 881-893).
• Portner, P. (2004, September). The semantics of imperatives within a theory of clause types. In Semantics and Linguistic Theory (Vol. 14, pp. 235-252).
• Cho, W. I., Lee, H. S., Yoon, J. W., Kim, S. M., & Kim, N. S. (2018). Speech intention understanding in a head-final language: A disambiguation utilizing intonation-dependency. arXiv preprint arXiv:1811.04231.
• Cho, W. I., Moon, Y. K., Kang, W. H., & Kim, N. S. (2018). Extracting arguments from Korean question and command: An annotated corpus for structured paraphrasing. arXiv preprint arXiv:1810.04631.
Thank you!
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 

1910 HCLT

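  • 1. Human Interface Laboratory
    Keyphrase Extraction from Directive Utterances Using Discourse Components: Korean Parallel Corpus Construction and a Data Augmentation Methodology
    (담화 성분을 활용한 지시 발화의 키 프레이즈 추출: 한국어 병렬 코퍼스 구축 및 데이터 증강 방법론)
    2019. 10. 12 @HCLT 2019
    조원익, 문영기, 김종인, 김남수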
  • 2. Contents
    • Introduction
      ◦ What is a keyphrase? Keyphrase vs. summary
      ◦ What is a keyphrase for directives?
    • Related work
      ◦ Keyphrase extraction, sentence generation, and paraphrasing
      ◦ SQL, bilingual pivoting (BP), and discourse component (DC)
    • Corpus construction
    • Dataset augmentation
    • Summary
      ◦ Application
      ◦ Future work
  • 3. Introduction
    • What is a keyphrase?
      ◦ A keyphrase as a set of words that stands for a document
        • e.g., keywords (topic words) for an abstract
          – These can be combined into phrases
            » 담화성분 기반의 키프레이즈 추출, 패러프레이징을 위한 한국어 병렬 코퍼스 (discourse-component-based keyphrase extraction; a Korean parallel corpus for paraphrasing)
        • But remember: keyphrases are also phrases!
          – Do they apply only to documents, or also to short texts such as sentences?
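  • 4. Introduction
    • What is a keyphrase?
      ◦ A keyphrase as a phrase that summarizes a sentence
        • e.g., extractive summarization, sometimes accompanied by paraphrasing
          – 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다.
            ("To answer what many of you have been wondering: this year, HCLT will be held at KAIST from October 12 to 13.")
            → 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최
            ("HCLT held at KAIST, October 12 to 13 this year")
          – 오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다.
            ("The club after-party is scheduled for 8 p.m. tonight at 풍경소리 near Seoul National University Station.")
            → 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정
            ("Club after-party scheduled at 20:00 today, 풍경소리, Seoul National University Station")
        • Remember that paraphrasing is like monolingual translation: there is no single exact answer!
      ◦ Keyphrase candidates are expected to occupy a smaller space than the original sentences do. These four colloquial variants
        • 오늘 아침에 사고났대. ("I heard there was an accident this morning.")
        • 오늘 아침에 사고났다던데. ("Apparently there was an accident this morning.")
        • 그거 알아? 오늘 아침 사고난거. ("You know what? The accident this morning.")
        • 사고 났다더라구 오늘 아침에. ("There was an accident, they said, this morning.")
        all share a single keyphrase: 오늘 아침 사고 발생 (사고 남) ("an accident occurred this morning")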
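Slide 4's claim that keyphrases occupy a smaller space than their source sentences can be made concrete as a many-to-one mapping. The sketch below (Python 3) is only an illustration: the dictionary and the `space_sizes` helper are hypothetical and are not part of the released corpus. It maps the four colloquial variants to their shared keyphrase and counts both sides.

```python
from __future__ import annotations

# Hypothetical toy mapping built from the slide's four colloquial variants;
# every variant shares the same keyphrase.
PARAPHRASE_TO_KEYPHRASE = {
    "오늘 아침에 사고났대.": "오늘 아침 사고 발생",
    "오늘 아침에 사고났다던데.": "오늘 아침 사고 발생",
    "그거 알아? 오늘 아침 사고난거.": "오늘 아침 사고 발생",
    "사고 났다더라구 오늘 아침에.": "오늘 아침 사고 발생",
}


def space_sizes(mapping: dict[str, str]) -> tuple[int, int]:
    """Return (number of distinct sentences, number of distinct keyphrases)."""
    return len(set(mapping)), len(set(mapping.values()))


if __name__ == "__main__":
    n_sentences, n_keyphrases = space_sizes(PARAPHRASE_TO_KEYPHRASE)
    print(n_sentences, n_keyphrases)  # 4 1
```

Collecting more paraphrases grows the left-hand count, while the right-hand count stays fixed as long as the core content does not change.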
  • 5. Introduction
    • Keyphrase vs. summary
      ◦ Summarization of a document is conventionally either:
        • Extractive [Cheng and Lapata, 2016]
          – The document contains several candidate sentences to select from
        • Abstractive [Rush et al., 2015]
          – Documents without a representative sentence can be summarized abstractively
        • Hybrid methodologies are also being developed [Bae et al., 2019]
      ◦ In keyphrase extraction from sentences:
        • Both extractive and abstractive approaches can be used
          – Extractive: for the keywords
          – Abstractive: for a plausible expression (sentence style, word-level paraphrasing)
      ◦ Example: 오늘 저녁 여덟 시에 서울대입구 풍경소리에서 동아리 뒷풀이가 있을 예정입니다. → 오늘 이십 시 서울대입구 풍경소리에서 동아리 뒷풀이 예정
        • 서울대입구, 풍경소리, and 동아리 뒷풀이 are kept verbatim (extractive), while 저녁 여덟 시 → 이십 시 and 있을 예정입니다 → 예정 are rewritten (abstractive)
  • 6. Introduction
    • Keyphrases for directives (questions/commands)?
      ◦ What should the keyphrases be?
        • For questions: what the speaker asks for
          – 내일 서울에 비 얼마나 올지 좀 검색해봐. ("Look up how much it will rain in Seoul tomorrow.") → 질문: 내일 서울 강수량 (Question: precipitation in Seoul tomorrow)
        • For commands: what the speaker requests
          – 물이 끓으면 불을 제일 약한 걸로 돌려줘 ("When the water boils, turn the heat down to the lowest.") → 요구: 물이 끓으면 불을 제일 약한 것으로 하기 (Requirement: setting the heat to the lowest once the water boils)
        • A simplified but representative nominalized version of the core content
        • Keyphrases are sometimes longer than the original sentence, which is why the process differs from summarization
      ◦ Discourse component revisited!
  • 7. Introduction
    • Research questions
      ◦ How does the discourse component (DC) compare with structured query language (SQL) and bilingual pivoting (BP) as a basis for paraphrasing?
      ◦ How can we extract the keyphrase of a directive utterance in the form of a DC?
      ◦ How can DCs be used to build paraphrases of questions and commands?
  • 8. Related work
    • Keyphrase extraction, sentence generation, and paraphrasing
      [Figure: three nodes: original sentence, core content (SQL or keyphrase), and paraphrase. Original sentence → core content: Seq2SQL / keyphrase extraction. Core content → paraphrase: rule-based / learning-based / human generation. Original sentence → paraphrase directly: bilingual pivoting / word swapping / human paraphrase.]
  • 9. Related work
    • 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다. (the HCLT announcement from slide 4)
      – How can we obtain the core content for paraphrasing (possibly by human annotators)?
    • Structured query language (SQL) [Zhong et al., 2017]
      ◦ {기간: 올해 시월 십이일부터 십삼일, 장소: 카이스트, 이벤트: 한글 및 한국어 정보처리 학술대회} ({period: October 12 to 13 this year, place: KAIST, event: HCLT})
        • A kind of semantic parsing
        • Information can be extracted in a structured form
        • Human-friendly data generation is not guaranteed
        • The predefined categories (slots) can be limiting
    • Bilingual pivoting (BP) [Mallinson et al., 2017]
      ◦ "As many of you may have been waiting for, we will hold the HCLT conference at KAIST from the twelfth to the thirteenth of this coming October."
        • Back-translation through other languages may yield varied expressions
        • A one-to-one sentence correspondence does not help extract the core content of the sentence
  • 10. Related work
    • 많이들 궁금해하셨던 내용을 알려드리면, 올해에는 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회가 개최됩니다.
      – How can we obtain the core content for paraphrasing (possibly by human annotators)?
    • Discourse component [Portner, 2004]
      ◦ This approach involves human generation, but can still be efficient
        • E.g., the following can serve as the discourse component of this declarative:
          – 올해 시월 십이일부터 십삼일까지 카이스트에서 한글 및 한국어 정보처리 학술대회 개최 (Common Ground)
        • In Portner's framework, a declarative adds its content to the Common Ground; questions and commands correspond to the Question Set and the To-Do List, respectively
        • Core content information in a monolingual natural-language format
  • 11. Corpus construction
    • Annotating keyphrases on a Korean corpus labeled for speech act
      – How can it be used?
      [Figure: 방카슈랑스란 무엇입니까 ("What is bancassurance?") → intention identification: question → keyphrase extraction: 방카슈랑스의 의미 ("the meaning of bancassurance")]
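The two-stage use shown on slide 11 (identify the speech act first, then extract a keyphrase only for directives) can be sketched as follows. This is a minimal sketch under stated assumptions: plain lookup tables stand in for trained models, since the slides note that an automatic extraction system is not yet built, and `INTENT_TABLE`, `KEYPHRASE_TABLE`, `Analysis`, and `analyze` are hypothetical names. The 질문/요구 prefixes follow the labeling shown on slide 6.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stand-ins for a speech-act classifier and a keyphrase extractor.
INTENT_TABLE = {
    "방카슈랑스란 무엇입니까": "question",
    "물이 끓으면 불을 제일 약한 걸로 돌려줘": "command",
    "오늘 아침에 사고났대.": "statement",
}
KEYPHRASE_TABLE = {
    "방카슈랑스란 무엇입니까": "방카슈랑스의 의미",
    "물이 끓으면 불을 제일 약한 걸로 돌려줘": "물이 끓으면 불을 제일 약한 것으로 하기",
}
# Label prefixes for directive keyphrases (question -> 질문, command -> 요구).
LABELS = {"question": "질문", "command": "요구"}


@dataclass
class Analysis:
    utterance: str
    speech_act: str
    keyphrase: str | None  # None when no keyphrase applies


def analyze(utterance: str) -> Analysis:
    """Stage 1: look up the speech act; stage 2: extract a keyphrase for directives only."""
    act = INTENT_TABLE.get(utterance, "unknown")
    phrase = KEYPHRASE_TABLE.get(utterance) if act in LABELS else None
    labeled = f"{LABELS[act]}: {phrase}" if phrase is not None else None
    return Analysis(utterance, act, labeled)


if __name__ == "__main__":
    print(analyze("방카슈랑스란 무엇입니까").keyphrase)  # 질문: 방카슈랑스의 의미
```

Keeping the stages separate mirrors the slide: a directive keyphrase is only defined once the utterance is known to be a question or a command, so statements and unrecognized input return no keyphrase.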
  • 12. Corpus construction
    • Annotating keyphrases on a Korean corpus labeled for speech act
      ◦ Corpus: Intention identification for Korean (3i4K) [Cho et al., 2018]
      ◦ Composition
        • Question
        • Command
        • Rhetorical question
        • Rhetorical command
        • Statement
        • Intonation-dependent utterances
        • Fragments
      ◦ Includes only utterances whose speech-act determination was not affected by sentence form
        • Utterances are non-canonical and colloquial
        • Covers various topics and situations
  • 13. Corpus construction
    • Annotating keyphrases on a Korean corpus labeled for speech act
  • 14. Data augmentation
    • Generating questions and commands from keyphrases
      ◦ The prototype model [Cho et al., 2018] lacks alternative questions, prohibitions, and strong requirements
      ◦ These are scarce in the corpus but frequently used in real life
    • Augmentation is required! But how?
  • 15. Data augmentation
    • Generating questions and commands from keyphrases
      ◦ For the discourse component (keyphrase) of a statement, we can think of several realizations:
        오늘 아침 사고 발생 (사고 남) → 오늘 아침에 사고났대. / 오늘 아침에 사고났다던데. / 그거 알아? 오늘 아침 사고난거. / 사고 났다더라구 오늘 아침에.
      ◦ Similarly for questions and commands:
        • Question set >> question?
        • To-do list >> command!
        • Generating questions/commands differs from expressing a thought in interrogative/imperative sentence form
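Generation on slide 15 runs from one keyphrase to many surface sentences, and the same pairs, read in reverse, are training data for keyphrase extraction. A minimal sketch, assuming plain Python lists; `to_parallel_pairs` is a hypothetical helper, and dropping blank entries and exact duplicates is an illustrative choice, not a step stated in the slides.

```python
from __future__ import annotations


def to_parallel_pairs(keyphrase: str, sentences: list[str]) -> list[tuple[str, str]]:
    """Pair each generated sentence with its source keyphrase.

    Generation runs keyphrase -> sentences; reading each pair as
    (sentence, keyphrase) gives data for keyphrase extraction.
    Blank entries are skipped and exact duplicates are kept only once.
    """
    stripped = (s.strip() for s in sentences)
    unique = list(dict.fromkeys(s for s in stripped if s))  # order-preserving dedup
    return [(s, keyphrase) for s in unique]


if __name__ == "__main__":
    generated = [
        "오늘 아침에 사고났대.",
        "오늘 아침에 사고났다던데.",
        "그거 알아? 오늘 아침 사고난거.",
        "사고 났다더라구 오늘 아침에.",
        "오늘 아침에 사고났대.",  # exact repeat, dropped
        "   ",                    # blank, dropped
    ]
    for sentence, key in to_parallel_pairs("오늘 아침 사고 발생", generated):
        print(sentence, "->", key)
```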
  • 16. Data augmentation
    • Generating questions and commands from keyphrases
      ◦ Question/command types needed:
        • Alternative questions, prohibitions, strong requirements (in deficit)
        • Wh-questions (needed more for practical use)
      ◦ Phrases prepared:
        • Total number of phrases: 2,000
          – 400 for alternative questions
          – 800 for wh-questions
          – 400 for prohibitions
          – 400 for strong requirements
        • Sentences to be generated per phrase: 10
        • Topics:
          – 1,000 phrases on free topics
          – 250 phrases each on mail, house control, schedule, and weather
      ◦ Only utterances with the consensus of more than three native speakers are kept
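The numbers on slide 16 fix the size of the collection before filtering: 2,000 phrases × 10 sentences = 20,000 candidate utterances, and both the per-type and per-topic breakdowns sum to 2,000 phrases. The sketch below recomputes this and applies the consensus rule. The vote-count representation, the `threshold` parameter, and the function names are assumptions for illustration; the slides do not state the corpus size after filtering.

```python
from __future__ import annotations

# Phrase counts from slide 16.
PHRASES_BY_TYPE = {
    "alternative_question": 400,
    "wh_question": 800,
    "prohibition": 400,
    "strong_requirement": 400,
}
PHRASES_BY_TOPIC = {
    "free": 1000,
    "mail": 250,
    "house_control": 250,
    "schedule": 250,
    "weather": 250,
}
SENTENCES_PER_PHRASE = 10


def planned_sentences() -> int:
    """Sentences to collect before filtering: phrases x sentences per phrase."""
    total_phrases = sum(PHRASES_BY_TYPE.values())
    # Both breakdowns on the slide should describe the same 2,000 phrases.
    if total_phrases != sum(PHRASES_BY_TOPIC.values()):
        raise ValueError("type and topic breakdowns disagree")
    return total_phrases * SENTENCES_PER_PHRASE


def filter_by_consensus(approvals: dict[str, int], threshold: int = 3) -> list[str]:
    """Hypothetical filter: keep utterances approved by more than `threshold` native speakers."""
    return [utt for utt, votes in approvals.items() if votes > threshold]


if __name__ == "__main__":
    print(planned_sentences())  # 20000
    print(filter_by_consensus({"utt_a": 4, "utt_b": 3, "utt_c": 5}))  # ['utt_a', 'utt_c']
```

The threshold is a parameter because "more than three" is read literally here; if the intended rule is "at least three", passing `threshold=2` gives that behavior.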
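  • 17. Data augmentation
    • Generating questions and commands from keyphrases
      ◦ Guidelines for the participants
        • Write the ten sentences in styles as different from one another as possible. "Style" here covers honorific level, tone, and so on.
        • You need not repeat the words in the keyphrase; other words, phrases, or predicates that fit the situation are allowed. Expressions should be natural for spoken utterance.
        • Varying the sentence form through inversion (changing word order) is also encouraged.
        • Wh-questions must contain a wh-word; alternative questions may also contain one, depending on the case. Neither type has to be written as an interrogative sentence.
        • Prohibitions must direct the listener not to perform some action the listener could perform, and must be more forceful than merely saying it is fine not to do it. If prohibiting the action is practically equivalent to requesting another action, that equivalent expression may be used instead.
        • Neither prohibitions nor strong requirements need to be imperatives, but they must aim to prevent or compel the listener's action. Strong suggestions are also acceptable.
        • For keyphrases that mention the speaker or the listener, use the corresponding pronouns. This yields both a corpus that includes speaker/listener expressions and one that does not.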
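One guideline on slide 17 is mechanically checkable: a wh-question must contain a wh-word even when it is not written as an interrogative, so looking for a final question mark is not enough. The sketch below is a naive check under a loud assumption: the list of Korean wh-word stems is hand-picked and non-exhaustive, and both it and `has_wh_word` are hypothetical, not part of the annotation process described in the slides.

```python
from __future__ import annotations

# Hypothetical, non-exhaustive list of Korean wh-word stems.
WH_STEMS = (
    "무엇", "뭐", "누구", "누가", "언제", "어디", "왜",
    "어떻게", "어떤", "얼마", "몇", "어느", "무슨",
)


def has_wh_word(sentence: str) -> bool:
    """Naive substring check; can misfire when a stem appears inside an unrelated word."""
    return any(stem in sentence for stem in WH_STEMS)


if __name__ == "__main__":
    # A wh-question written as a command, with no question mark.
    print(has_wh_word("내일 서울에 비 얼마나 올지 좀 검색해봐"))  # True
    print(has_wh_word("오늘 아침에 사고났대."))  # False
```

Substring matching is deliberately crude: a word such as 왜곡 ("distortion") contains 왜 and would be a false positive, so a morphological analyzer would be the natural next step if such a check were used in practice.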
  • 18. Data augmentation
    • Generating questions and commands from keyphrases
  • 19. Data augmentation
    • Generating questions and commands from keyphrases
      ◦ Will be distributed via https://github.com/warnikchow/sae4k
      ◦ The baseline system for automatic extraction is yet to be developed!
  • 20. Summary
    • Applications of the "keyphrase" concept
      ◦ Analysis of questions and commands in human-friendly conversation
        • Classification of non-canonical directive utterances
        • Pre-processing for semantic parsing of non-canonical utterances
        • Composing an answer that continues the dialog
          – e.g., 오늘 비 언제까지 온대냐? ("Until when is it supposed to rain today?") >> 오늘 비 오는 시간대가 궁금하신가요? ("Are you asking about the hours it will rain today?")
          – (if the keyphrase is inferred correctly)
      ◦ As the core content of an utterance
        • For efficient semantic web search (e.g., 방카슈랑스?)
        • For efficient human generation of paraphrases
          – More human-friendly than SQL (non-natural-language terms) or back-translation (requires multilingual ability)
    • Future work
      ◦ Implementation of an automatic keyphrase extraction system
      ◦ Extension to paraphrasing and sentence similarity tasks
  • 21. References (in order of appearance)
    • Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252.
    • Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
    • Bae, S., Kim, T., Kim, J., & Lee, S. G. (2019). Summary level training of sentence rewriting for abstractive summarization. arXiv preprint arXiv:1909.08752.
    • Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
    • Mallinson, J., Sennrich, R., & Lapata, M. (2017, April). Paraphrasing revisited with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (pp. 881-893).
    • Portner, P. (2004, September). The semantics of imperatives within a theory of clause types. In Semantics and Linguistic Theory (Vol. 14, pp. 235-252).
    • Cho, W. I., Lee, H. S., Yoon, J. W., Kim, S. M., & Kim, N. S. (2018). Speech intention understanding in a head-final language: A disambiguation utilizing intonation-dependency. arXiv preprint arXiv:1811.04231.
    • Cho, W. I., Moon, Y. K., Kang, W. H., & Kim, N. S. (2018). Extracting arguments from Korean question and command: An annotated corpus for structured paraphrasing. arXiv preprint arXiv:1810.04631.
