Keracorn: A Counseling Chatbot for Adolescents
2019. 11. 16
Presenter: 조원익, Keracorn Counsel Team
Contents
• Introduction
▪ Members
▪ Contribution
▪ Motivation
▪ Overview
• Proposed scheme
▪ Dataset construction
▪ Sentence similarity test
▪ Chatbot specification
• Summary
Introduction
• Members
▪ 김슬기 (mentor)
▪ 윤연숙 (chatbot ideation, counseling data collection)
▪ 전유현 (chatbot ideation, counseling data collection)
▪ 안태경 (chatbot ideation, scenario implementation)
▪ 반태형 (chatbot ideation, deployment)
▪ 민재옥 (chatbot ideation, sharing related research)
▪ 조원익 (chatbot ideation, similarity measurement model)
▪ 황재희 (Telegram integration, scenario implementation)
▪ 김하림 (chatbot ideation)
▪ 김선진 (chatbot ideation)
▪ 조혜영 (chatbot ideation, counseling data collection)
Introduction
• Contribution
▪ Implemented a deep-learning-based counseling chatbot for adolescents using Keras
• Collected and cleaned data from real counseling casebooks and online counseling materials
• Built a similarity measurement model using a Korean corpus
• Implemented a chatbot prototype using the Telegram API
▪ Regular meetings
• Online and offline meetings, with meeting minutes
▪ Deliverables shared
• https://github.com/Keracorn/Counsel
Introduction
• Motivation
▪ A chatbot where teenagers, who are psychologically vulnerable and have many worries but few places to ask, can confide their worries and hear an answer
▪ The scope is narrower than counseling for the general public, which makes answering more tractable
▪ Designed to find content similar to the user's query rather than to hold everyday question-and-answer conversation
▪ Aims to empathize with the person's worries rather than to present a "correct answer"
▪ Education and counseling are fields with plenty of reference material; the counseling experience of team members and acquaintances, and the posts on the youth cyber counseling center, demonstrate the need for this chatbot
▪ (Issue) The team is still at a beginner level with chatbot concepts, so there are concerns about the study direction and the practical feasibility of reaching the goal; we judged, however, that open ideas and varied discussion in themselves fit the purpose of the Contributhon
▪ (Next steps) Assign a topic to each team member and proceed with data collection and cleaning
Introduction
• Overview
User query: "I'm a first-year high school student. I work hard, but my grades just won't go up and I'm worried. Is there a way to raise them quickly?"
Q&A SET:
• …
• I'm in serious conflict with my parents. What should I do to get along with them again?
• Even when I study hard, my grades don't go up. I want to raise them quickly.
• My boyfriend and I drifted apart because we were both busy. Is there a way to recover?
• …
Similarity measurement: the user query is compared against the questions in the Q&A SET
ANSWER: "There are times when your grades seem to be going up and yet don't, and that worries you. Probably many students' …"
Proposed scheme
• Dataset construction
▪ Excerpted and adapted from public data such as the Wee psychological counseling service, Naver 지식 IN, and casebooks of adolescent psychological counseling (the use is non-profit and the text is adapted, but additional license checks are still needed)
Dataset collection → Data integration → Data cleaning
Proposed scheme
• Dataset construction
▪ Topic categories: academics/career, school maladjustment, family, school violence, sexuality, interpersonal relationships/bullying/dating, personality, mental health
Proposed scheme
• Dataset construction
▪ Corpus for keyphrase extraction (will be expanded to sentence similarity)
Proposed scheme
• Sentence similarity test
▪ Core of model development: identifying which QA set in our DB best matches the user's input
• (1) Collect and process the parts of the user's input that matter for counseling into a sentence or list of sentences that can be matched well against the QA sets
– Training a separate summarization system would be ideal, but summarizing unstructured spoken language is hard to implement in a short time. Instead, when the collection of input sentences that the client is assumed to say in one turn (INPUT) is 100 syllables or longer in total (spaces included), only the last 100 syllables of INPUT are used. If INPUT is shorter than 100 syllables, all of it is used.
• (2) Match the final input sentence or sentence list to the utterance with the most similar topic and intention (here, an element of the counseling Q&A SET in the DB)
▪ Featurization:
• How should a sentence be turned into numbers?
• How should the numeric sentences be arranged?
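The 100-syllable rule in step (1) can be sketched as follows. This is a minimal illustration, not the team's code: the function name `clip_turn`, joining sentences with a single space, and the NFC normalization step are all assumptions. In Python each precomposed Hangul syllable is one code point, so slicing by character counts syllables and spaces together.

```python
import unicodedata

def clip_turn(sentences, max_syllables=100):
    """Join the sentences of one turn and keep only the last max_syllables characters.

    Assumptions (not from the slides): sentences are joined with one space,
    and text is NFC-normalized so each Hangul syllable is a single code point.
    """
    text = " ".join(s.strip() for s in sentences if s.strip())
    text = unicodedata.normalize("NFC", text)
    # A slice with a negative start returns the whole string when it is shorter.
    return text[-max_syllables:]
```

A turn shorter than the limit is passed through unchanged, which matches the "use all of it" case.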
Proposed scheme
• Sentence similarity test
▪ Character-level embedding
Example: 반, a character (pan)
First sound (cho-seng), second sound (cung-seng), third sound (cong-seng)
Structure: {Syllable: CV(C)}
# First sound (C): 19
# Second sound (V): 21
# Third sound (C): 27 + ' ' (blank when there is no final consonant)
Total 19 * 21 * 28 = 11,172 characters!
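The CV(C) structure maps directly onto the Unicode Hangul Syllables block, which starts at U+AC00 and is laid out as 19 first sounds × 21 second sounds × 28 third-sound slots (27 consonants plus "none"). The sketch below decomposes a syllable into its three indices and builds a multi-hot vector from them. The 68-dimensional layout (19 + 21 + 28) is an assumption made for illustration, not necessarily the embedding used in the project.

```python
N_CHO, N_JUNG, N_JONG = 19, 21, 28   # first, second, third sound (third includes "none")
BASE = 0xAC00                        # code point of the first Hangul syllable
N_SYLLABLES = N_CHO * N_JUNG * N_JONG

def decompose(syllable):
    """Return (first, second, third) sound indices of one precomposed Hangul syllable."""
    index = ord(syllable) - BASE
    if not 0 <= index < N_SYLLABLES:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    cho, rest = divmod(index, N_JUNG * N_JONG)
    jung, jong = divmod(rest, N_JONG)
    return cho, jung, jong

def multi_hot(syllable):
    """Hypothetical 68-dim multi-hot vector: one active slot per sound position."""
    cho, jung, jong = decompose(syllable)
    vector = [0] * (N_CHO + N_JUNG + N_JONG)
    vector[cho] = 1
    vector[N_CHO + jung] = 1
    vector[N_CHO + N_JUNG + jong] = 1
    return vector
```

For the example character 반, `decompose` gives first sound 7 (ㅂ), second sound 0 (ㅏ), and third sound 4 (ㄴ).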
Proposed scheme
• Sentence similarity test
▪ Series vs. parallel arrangement
[Figure: existing dataset → model training → module integration. A sentence pair, arranged in series as "S1 [SEP] S2" or in parallel, is fed to a self-attentive BiLSTM that decides: Non-related? Related (topic/act)? Paraphrase? At run time the Query is compared with the DB Questions to select the Proper Answer.]
조원익, 문영기, 김종인, 김남수, "담화 성분을 활용한 지시 발화의 키 프레이즈 추출: 한국어 병렬 코퍼스 구축 및 데이터 증강 방법론," 제31회 한글 및 한국어 정보처리 학술대회, 2019, pp. 241-245.
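The run-time step in the figure (Query against DB Questions, then Proper Answer) can be sketched as a best-match lookup. The scoring function here is a stand-in: a character-bigram Jaccard overlap replaces the trained self-attentive BiLSTM classifier so that the example runs on its own. All names (`similarity`, `best_answer`, `qa_set`) are hypothetical.

```python
def char_bigrams(text):
    """Set of character bigrams, ignoring whitespace."""
    compact = "".join(text.split())
    return {compact[i:i + 2] for i in range(len(compact) - 1)} or {compact}

def similarity(a, b):
    """Stand-in score in [0, 1]; a trained pair classifier would replace this."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)

def best_answer(query, qa_set, score=similarity):
    """Return the answer whose stored question scores highest against the query."""
    _question, answer = max(qa_set, key=lambda qa: score(query, qa[0]))
    return answer
```

Passing the scorer as a parameter keeps the lookup unchanged if the stand-in is later swapped for a model that outputs a paraphrase or relatedness probability.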
Proposed scheme
• Sentence similarity (dataset will be distributed)
▪ Mail, schedule, house control, weather (4 topics)
▪ Alt. Q, Wh- Q, Prohibition, Requirement (4 intentions)
▪ 10,000 utterances, expanded to about 550K pairs
[Figure: input arrangements (a), (b), (c), drawn with the series form "S1 [SEP] S2" and with S1 and S2 as separate inputs]
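One plausible reading of the series and parallel arrangements, sketched at the character level. The `[PAD]` token, the fixed length, and the function names are assumptions for illustration. The slides do not say which of (a), (b), (c) corresponds to which arrangement.

```python
SEP = "[SEP]"
PAD = "[PAD]"  # hypothetical padding token

def pad(tokens, length):
    """Right-pad (or truncate) a token list to a fixed length."""
    return (tokens + [PAD] * length)[:length]

def series_input(s1, s2, length):
    """Series: one sequence holding S1, a separator, then S2."""
    return pad(list(s1) + [SEP] + list(s2), length)

def parallel_input(s1, s2, length):
    """Parallel: two equal-length sequences, one per sentence."""
    return pad(list(s1), length), pad(list(s2), length)
```

In the series form a single encoder reads both sentences. In the parallel form each sentence gets its own input, and the two encodings are combined afterwards.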
Proposed scheme
• Chatbot specification
▪ (Platform) After surveying several platforms, we decided to use Telegram
• The bot is connected with a token; requests and responses go through the API using a webhook
• In the scenario, the bot asks for the client's name at suitable time intervals and passes the client's concern to the model as a string
• The model finds the stored question most similar to the concern and returns the answer to that question
▪ (Version control) Source code is uploaded to a shared GitHub repository, and the history of continuous updates and commits builds up problem-solving know-how
• Continuous update and commit activity matches the spirit of the Contributhon, so detailed explanations are needed so that anyone can apply it easily
• Contributes to the AI ecosystem through sharing and the use of open-source software
• Builds individual skills through reading others' source code and collaborating
Proposed scheme
• Chatbot specification
Actual chatbot configuration
* The user interface is built by integrating with the Telegram platform
* Messages are received through a webhook at the URL registered with Telegram and exchanged over the HTTP API
* To keep the conversation natural, the bot waits a set interval, asks for the name, and then passes the question to the model
[Figure: User → (sends message) → Telegram Server → (POST request) → Keracorn Server → (answer in the response)]
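The webhook exchange in the diagram can be sketched with the standard library alone. Telegram POSTs an Update as JSON to the registered URL, and the server may answer in the HTTP response itself by returning a JSON object with a `method` field (here `sendMessage`). The function `find_answer`, the handler name, and serving plain HTTP behind a separate TLS terminator are assumptions. This is not the Keracorn server code.

```python
import json
from http.server import BaseHTTPRequestHandler

def find_answer(text):
    """Hypothetical placeholder for the similarity model lookup."""
    return "..."

def build_reply(update, answer_fn=find_answer):
    """Turn a Telegram Update dict into a sendMessage payload, or None if not a text message."""
    message = update.get("message") or {}
    chat = message.get("chat") or {}
    text = message.get("text")
    if text is None or "id" not in chat:
        return None
    return {"method": "sendMessage", "chat_id": chat["id"], "text": answer_fn(text)}

class WebhookHandler(BaseHTTPRequestHandler):
    """Receives Telegram's POST request and replies in the HTTP response body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        update = json.loads(self.rfile.read(length) or b"{}")
        reply = build_reply(update)
        body = json.dumps(reply).encode("utf-8") if reply else b""
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Keeping `build_reply` separate from the handler lets the message logic be checked without a running server. The same payload could instead be sent as a separate request to the Bot API `sendMessage` method using the bot token.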
Proposed scheme
• Chatbot specification
▪ Main scenario
• Greetings
• Ask for nickname and age
• "<client's nickname>, why did you come to see me?"
– Receive the question
– How do we tell that the question has ended?
» For now, decide by elapsed time!
» Even if the reply is delayed, backchannel fillers ("Hmm…", "I see", etc.) should make it acceptable! (not yet)
– Answer once the question is complete
» Run inference with the question similarity model over the dataset (not yet)
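The "decide by elapsed time" rule can be sketched as a small buffer that collects the client's messages and treats the turn as finished after a quiet period. The 10-second default, the class name, and passing the clock value in explicitly are assumptions. The slides do not give the actual interval.

```python
class TurnBuffer:
    """Collects messages of one turn; the turn ends after timeout_s seconds of silence."""

    def __init__(self, timeout_s=10.0):  # hypothetical default interval
        self.timeout_s = timeout_s
        self.messages = []
        self.last_seen = None

    def add(self, text, now):
        """Store a message and remember when it arrived (now is in seconds)."""
        self.messages.append(text)
        self.last_seen = now

    def is_finished(self, now):
        """True when at least one message exists and the quiet period has elapsed."""
        return bool(self.messages) and now - self.last_seen >= self.timeout_s

    def flush(self):
        """Return the whole turn as one string and reset the buffer."""
        turn = " ".join(self.messages)
        self.messages = []
        self.last_seen = None
        return turn
```

In use, the server would call `add` for every incoming message with a timestamp such as `time.monotonic()`, poll `is_finished`, and hand the result of `flush` to the similarity model.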
Summary
Background (Why)
• Emotionally vulnerable adolescents have few places to seek counseling for their worries
• Some topics are sensitive or hard to discuss face to face
• Confiding in an AI rather than a person may feel psychologically safer
Chatbot (What)
• Listens to adolescents' worries
• After hearing the whole concern, finds the answer most likely to help
What if we could listen to the worries adolescents have nowhere to voice and give a simple answer?
Process (How)
• Collected question-and-answer data on adolescent counseling (about 180 types)
• Validated and cleaned the data and simplified the query sentences
• Trained a Keras model using an existing sentence similarity corpus
• Built a module that aggregates the input sentences and finds the closest query
• Using that module, composed a scenario that carries the conversation on from the first greeting
• Built the chatbot system using the Telegram API
Keracorn, a counseling bot for adolescents
Reference (order of appearance)
• Cho, W. I., Kim, S. M., & Kim, N. S. (2019). Investigating an Effective Character-level Embedding in Korean Sentence Classification. arXiv preprint arXiv:1905.13656.
• Cho, W. I., Moon, Y. K., Kang, W. H., & Kim, N. S. (2018). Extracting Arguments from Korean Question and Command: An Annotated Corpus for Structured Paraphrasing. arXiv preprint arXiv:1810.04631.
• 조원익, 문영기, 김종인, 김남수 (2019). 담화 성분을 활용한 지시 발화의 키 프레이즈 추출: 한국어 병렬 코퍼스 구축 및 데이터 증강 방법론. 제31회 한글 및 한국어 정보처리 학술대회, pp. 241-245.
• Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681.
• Lin, Z., Feng, M., Santos, C. N. D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130.
• Chollet, F. (2015). Keras.
• Cho, W. I., Cho, J., Kang, W. H., & Kim, N. S. (2019). Disambiguating Speech Intention via Audio-Text Co-attention Framework: A Case of Prosody-semantics Interface. arXiv preprint arXiv:1910.09275.
Thank you!