SK T Academy Lecture Note

NLP and AI Chatbot for developers.

  1. 1. Lecture by Session 1 : SeungWoo Kim tmddno1@gmail.com Session 2 : SuSang Kim healess1@gmail.com AI Chatbot Development with Python and TensorFlow
  2. 2. 1. Docker runtime environment https://github.com/TensorMSA/tensormsa_docker.git ./tensormsa_docker/docker_compose_cpu 2. Source walkthrough code - Jupyter: git clone https://github.com/TensorMSA/tensormsa_jupyter.git Session 1 : chap05_nlp Session 2 : chap13_chatbot_lecture Setting up the hands-on environment before we start
  3. 3. ●ML&DL Engineer (2014 ~ 2017) ○ POSCO Smart Factory Machine Learning Based Scheduling (2014~2015) ○ POSCO AI ChatBot (2016 ~ 2017) ○ Deep Learning Open Source Framework - TensorMSA (2016~2017) ●Android Developer - POSCO Mobile system (2010 ~ 2014) ○ LBS, IPS Vehicle & Navigation System ○ IPS with Deep Learning - Patent (2016) ●Awards ○ OSS world Challenge 2017 (on top 12 , on progress now) ○ Employee of the year 2015, 2017 on POSCO ICT ●Woori Bank AI (‘17.11.1 ~) Session 1 : SeungWoo Kim tmddno1@gmail.com
  4. 4. Session 1 - Understand NLP
  5. 5. Session 1 - Lecture Goals: By walking through the overall ChatBot architecture and the background knowledge needed to build such a service, this session aims to help you better understand the hands-on chatbot development covered in Session 2. The focus is on understanding how chatbots, natural language processing, deep learning, and implementation relate to one another! Session 1 - Understand NLP
  6. 6. About ChatBot Session 1 - Understand NLP Natural Language Understanding Natural Language Generation User System natural language -> Semantic Frame, Semantic Frame -> natural language Why do we need NLP in a ChatBot system?
  7. 7. About ChatBot Session 1 - Understand NLP Types of Chatbots Easy Hard Retrieval-based model Generative model Traditional algorithms Deep Learning algorithms Short Conversation Long Conversation Closed Domain Open Domain
  8. 8. About ChatBot Session 1 - Understand NLP Retrieval-Based vs Generative Models Retrieval-based models (easier) use a repository of predefined responses and some kind of heuristic to pick an appropriate response based on the input and context. The heuristic could be as simple as a rule-based expression match, or as complex as an ensemble of Machine Learning classifiers. These systems don’t generate any new text, they just pick a response from a fixed set. Generative models (harder) don’t rely on pre-defined responses. They generate new responses from scratch. Generative models are typically based on Machine Translation techniques, but instead of translating from one language to another, we “translate” from an input to an output (response).
  9. 9. About ChatBot Session 1 - Understand NLP Use Deep Learning or Not. Using Deep Learning: deep learning does not guarantee better performance in every case compared with traditional techniques, and it is more expensive to gather enough data and train a heavy model. Using traditional algorithms: most current chatbot systems are based on these traditional algorithms, and they have their own strong points compared with DL algorithms. Morphological analysis, POS tagging, pattern matching, syntactic analysis, semantic analysis, sentiment analysis, dialog processing CharCNN BiLSTMCrf Seq2Seq Word2Vec RNN DMN E2E MMN Attention DNN TFIDF SVM Dictionary Bayesian Logistic LSA HMM USE BOTH
  10. 10. About ChatBot Session 1 - Understand NLP Long Conversation vs Short Conversation. Short Conversation: the goal is to create a single response to a single input. For example, you may receive a specific question from a user and reply with an appropriate answer. Long Conversation: goes through multiple turns and needs to keep track of what has been said. Customer support conversations are typically long conversational threads with multiple questions.
  11. 11. About ChatBot Session 1 - Understand NLP Open Domain vs Closed Domain “Closed Domain You can ask a limited set of questions on specific topics. (Easier). What is the Weather in Miami?” “Open Domain I can ask a question about any topic… and expect a relevant response. (Harder) Think of a long conversation around refinancing my mortgage where I could ask anything.” Mark Clark
  12. 12. OverView Session 1 - Understand NLP Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning BasicNLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers SyntaxNet Scenario Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론[Retrieval Based] Chat-Bot System ChatBot Server Numpy Pandas Tensorflow 파이프 라인 데이터 처리 ML & DL Library Scikit Learn Konlpy 개발 관련 데이터 수집 데이터 전처리 모델 훈련 모델 평가 모델 서비스 BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services message 1 2 3 기본 이론 관련 딥러닝 이론 설명 예제를 통한 구현 설명 Session 1 - Understand NLP Memory Network Seq2SeqResponse Generation Ontology DM Legacy Data Base [AI Based] Chat-Bot Research Environment Data MartMonitoring Summary Result Train Data AI Model Pipe Line
  13. 13. Session 1 - Contents 1. NLP theory > the linguistic theory generally needed to process natural language 2. Deep learning theory > the deep learning theory that addresses the problems raised by NLP theory 3. Implementation > implementation of the theory using deep learning and libraries
  14. 14. About NLP (Natural Language Process) Session 1 - Understand NLP Mostly Solved Making Good Progress Still Really Hard Spam Detection (스팸분석) Text Categorization (텍스트 분류) Part of Speech Tagging (단어 분석) Named Entity Recognition (의미 구분 분석) Information Extraction (정보 추출) Sentiment Analysis (감정분석) Coreference Resolution (같은 단어 복수 참조) Word Sense Disambiguation (복수 의미 분류) Syntactic Parsing (구문해석) Machine Translation (기계번역) Semantic Search (의미 분석 검색) Question & Answer (질의 응답) Textual inference (문장 추론) Summarization (텍스트 요약) Discourse & Dialog (대화 & 토론)
  15. 15. About NLP (Natural Language Process) Session 1 - Understand NLP Text Categorization Text Classification assigns one or more classes to a document according to its content. Classes are selected from a previously established taxonomy (a hierarchy of categories or classes). Spam Detection Spam Detection is also part of the Text Classification problem. Part of Speech Tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context
  16. 16. About NLP (Natural Language Process) Session 1 - Understand NLP Low Level Information Extraction
  17. 17. About NLP (Natural Language Process) Session 1 - Understand NLP Information Extraction on Broader view https://www.google.co.kr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwievZKlmMzVAhVCgrwKHbM_D88QFggyMAE&url=https%3A %2F%2Fweb.stanford.edu%2Fclass%2Fcs124%2Flec%2FInformation_Extraction_and_Named_Entity_Recognition.pptx&usg=AFQjCNFUT9ZjvrDrx F9su0J9KiWobVP4Kg Rule Based Extraction Named Entity recognition Syntax Anal Relation Search Ontology Information Extraction
  18. 18. About NLP (Natural Language Process) Session 1 - Understand NLP Coreference Resolution I did not vote for the Donald Trump because I think he is too reckless Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction. Deep Reinforcement Learning for Mention-Ranking Coreference Models Improving Coreference Resolution by Learning Entity-Level Distributed Representations https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
  19. 19. About NLP (Natural Language Process) Session 1 - Understand NLP Word Sense Disambiguation [Example] 1. a type of fish 2. tones of low frequency and the sentences: 1. I went fishing for some sea bass. 2. The bass line of the song is too weak. http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf supervised way: labeled data example / semi-supervised way
  20. 20. About NLP (Natural Language Process) Session 1 - Understand NLP Syntactic Parsing Syntactic parsing finds structural relationships between words in a sentence https://web.stanford.edu/~jurafsky/slp3/12.pdf
  21. 21. About NLP (Natural Language Process) Session 1 - Understand NLP Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish). Machine Translation
  22. 22. About NLP (Natural Language Process) Session 1 - Understand NLP Semantic Search Semantic search seeks to improve search accuracy by understanding a searcher’s intent through contextual meaning. Question and Answer Able to answer questions in natural language based on knowledge data (usually an ontology); the best-known example is IBM Watson. Textual Inference Recognize, generate, or extract pairs <T,H> of natural language expressions, such that a human who reads (and trusts) T would infer that H is most likely also true. Summarization Extract the interesting parts of a text, create a summary from those parts, and allow rephrasings to make the summary more grammatically correct. Discourse & Dialog Carry on a conversation while understanding the whole dialog history and the semantic meaning of the speaker.
  23. 23. Standard Natural Language Process Session 1 - Understand NLP Spoken Utterance Lexical (어휘) Analysis : Word Structure Speech Recognition Written Utterance Syntactic (구문) Analysis : Sentence Structure Morphemes, Word Semantic (의미) Analysis : Meaning of Words & Sentence Sentence Discourse (대화) Analysis : Relationship between sentence Context beyond Sentence
  24. 24. Lexical Analysis Syntactic Analysis Semantic Analysis NLU Server (Understand) NLG Server (Generate) Voice Recognition Discourse Analysis 자연어 처리 이론 기본 이론 Session 1 - Understand NLP Session 1 - Now We are Here! Response Generation
  25. 25. Session 1 - Understand NLP AI Speaker Alexa Alexa Microphone System NLP - Voice Recognition
  26. 26. Session 1 - Understand NLP Deep Learning for Classification Hidden Markov Model for Language Model NLP - Voice Recognition
  27. 27. Lexical Analysis Syntactic Analysis Semantic Analysis NLU Server (Understand) NLG Server (Generate) Voice Recognition Discourse Analysis 자연어 처리 이론 기본 이론 Session 1 - Understand NLP Session 1 - Now We are Here! Response Generation
  28. 28. Session 1 - Understand NLP NLP - Lexical Analysis Main Factors in Lexical Analysis 1. Sentence Splitting 2. Tokenizing 3. Morphological Analysis 4. Part-of-Speech Tagging
  29. 29. Session 1 - Understand NLP NLP - Lexical Analysis Lexical Analysis What if there is no line-break character (‘\n’)? Where is the EOS point? What if a sentence is not properly separated into words by spaces? [Examples] [Problems]
  30. 30. Session 1 - Understand NLP NLP - Lexical Analysis Word / stemming / lemmatization: Love Lov Love, Loves Lov Love, Loved Lov Love, Loving Lov Love, Innovation Innovat Innovation, Innovations Innovat Innovation, Innovate Innovat Innovate, Innovates Innovat Innovate, Innovative Innovat Innovative. Morphing Examples: Stemming & Lemmatization. Morphology is the process of finding morphemes, the smallest units that carry meaning (lexical meaning or grammatical function), and other features such as stems in a language. Lexical Analysis
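As a rough illustration of the stemming vs. lemmatization columns above, here is a minimal Python sketch using NLTK. NLTK is not part of the lecture stack; it is chosen here only because it ships an English stemmer and lemmatizer, and it assumes the WordNet data has been downloaded.
# pip install nltk, then: import nltk; nltk.download('wordnet')
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["loves", "loved", "loving", "innovations", "innovative"]:
    # stemming chops suffixes by rule; lemmatization maps to a dictionary form
    print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))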
  31. 31. Session 1 - Understand NLP NLP - Lexical Analysis Lexical Analysis Ambiguity “that” can be a subordinating conjunction or a relative pronoun - The fact that/IN you’re here - A man that/WDT I know “Around” can be a preposition, particle, or adverb - I bought it at the shop around/IN the corner. - I never got around/RP to getting a car. - A new Toyota Prius costs around/RB $25K. Degree of ambiguity (in the Brown corpus) - 11.5% of word types (40% of word tokens) are ambiguous. # of Tags 1 2 3 4 5 6 7 / # of Words 35340 3760 264 61 12 2 1. The ambiguity problem is even more serious in Korean. Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into their parts of speech and labels them according to the tagset, the collection of tags used for POS tagging. Parts of speech are also known as word classes or lexical categories.
  32. 32. Session 1 - Understand NLP NLP - Lexical Analysis Lexical Analysis Hannanum Kkma Komoran Mecab Twitter 하늘 / N 하늘 / NNG 하늘 / NNG 하늘 / NNG 하늘 / Noun 을 / J 을 / JKO 을 / JKO 을 / JKO 을 / Josa 나 / N 날 / VV 나 / NP 나 / NP 나 / Noun 는 / J 는 / ETD 는 / JX 는 / JX 는 / Josa 자동차 / N 자동차 / NNG 자동차 / NNG 자동차 / NNG 자동차 / Noun Analysis Result Comparison Library Performance Comparison
  33. 33. Session 1 - Understand NLP NLP - Lexical Analysis Lexical Analysis [Code]
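A minimal sketch of the kind of lexical analysis compared in the table above, using KoNLPy (which is listed in the lecture's library stack). Which tagger classes are available depends on your local KoNLPy/Java installation, and Okt was called Twitter in older KoNLPy versions.
from konlpy.tag import Kkma, Okt

sentence = "하늘을 나는 자동차"
print(Kkma().pos(sentence))   # e.g. [('하늘', 'NNG'), ('을', 'JKO'), ...]
print(Okt().pos(sentence))    # tag sets differ between analyzers, as the table above shows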
  34. 34. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  35. 35. Session 1 - Understand NLP NLP - Lexical Analysis (1) Word Segmentation (2) POS Tagging (3) Chunking (4) Clause Identification (5) Named Entity Recognition (6) Semantic Role Labeling (7) Information Extraction What we can do with sequence labeling What’s sequence labeling Sequence Labeling
  36. 36. Session 1 - Understand NLP NLP - Lexical Analysis Word POS Chunk NE West NNP B-NP B-MISC Indian NNP I-NP I-MISC all-around NN I-NP O Phil NNP I-NP B-PER Simons NNP I-NP I-PER took VBD B-VP O four CD B-NP O for IN B-PP O 38 CD B-NP O on IN B-PP O Friday NNP B-NP O <iob data set example> POS tag meanings: https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing Chunk tag meanings: B : Begin of Chunk I : Continuation of Chunk E : End of Chunk NP : Noun VP : Verb NER BIO tag meanings: B : Start of new Chunk I : word inside Chunk O : Outside of Chunk Sequence Labeling
  37. 37. Session 1 - Understand NLP NLP - Lexical Analysis BiLSTM-CRF Description Sequence Labeling with Deep Learning Deep Learning Basic Word Embedding DL FrameWorks Prerequisite
  38. 38. Session 1 - Understand NLP NLP - Lexical Analysis VIDEO Deep Learning Basic
  39. 39. Session 1 - Understand NLP New Algorithms Back Propagation CNN, RNN .. etc Big Data HDFS MapReduce Hardware GPU Parallel Execution Cloud Service NLP - Lexical Analysis Deep Learning Basic
  40. 40. Session 1 - Understand NLP 3 5 7 9 (1) Problem (2) Algorithm (3) Programming Y = 2 * X + 1 function(x) { return x*2 + 1 } NLP - Lexical Analysis Deep Learning Basic
  41. 41. Session 1 - Understand NLP 3 5 7 9 (1) Problem (2) Algorithm (3) Programming Y = w * X + b 3 5 7 9 initial optimized NLP - Lexical Analysis Deep Learning Basic
  42. 42. Session 1 - Understand NLP Supervised Learning Unsupervised Learning Reinforcement Learning CAT CAT CAT DOG DOG DOG Deep Learning Basic NLP - Lexical Analysis
  43. 43. Session 1 - Understand NLP 1. Perceptron 2. Activation Function 3. Cost 4. Gradient Descent 5. Back Propagation 6. Optimizers Deep Learning Basic NLP - Lexical Analysis
  44. 44. Session 1 - Understand NLP Deep Learning Basic - Perceptron wX + b NLP - Lexical Analysis
  45. 45. Session 1 - Understand NLP Deep Learning Basic - Perceptron wX + b Activation Function NLP - Lexical Analysis
  46. 46. Session 1 - Understand NLP Deep Learning Basic - Activation Function Logistic Regression Nonlinear Problems NLP - Lexical Analysis
  47. 47. Session 1 - Understand NLP Deep Learning Basic - Activation Function NLP - Lexical Analysis
  48. 48. Session 1 - Understand NLP Deep Learning Basic - Loss (Error) Initial Optimized LOSS x y y~ 0 3 7 1 5 9 2 7 11 3 9 13 4 11 15 5 13 17 6 15 19 Y X0 1 2 3 Y = wX + b NLP - Lexical Analysis
  49. 49. Session 1 - Understand NLP x y init opt 0 3 7 3 1 5 9 5 2 7 11 7 init : ((7-3)^2 + (9-5)^2 + (11-7)^2) / 3 = 16 opt : ((3-3)^2 + (5-5)^2 + (7-7)^2) / 3 = 0 HOW? Deep Learning Basic - Loss (Error) W, b Cost(W, b) NLP - Lexical Analysis
  50. 50. Session 1 - Understand NLP Deep Learning Basic - Gradient Descent weight Learning Rate gradient NLP - Lexical Analysis
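To make the loss and gradient-descent slides concrete, here is a small NumPy sketch (not the lecture code) that fits y = w*x + b to the toy data from the loss table (y = 2x + 3), starting from the deliberately bad initial guess w = 0, b = 7.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = 2 * x + 3                       # target values from the loss-table slide
w, b = 0.0, 7.0                     # bad initial guess (the "init" column)
lr = 0.01                           # learning rate

for step in range(2000):
    pred = w * x + b
    loss = np.mean((pred - y) ** 2)          # mean squared error
    grad_w = np.mean(2 * (pred - y) * x)     # dLoss/dw
    grad_b = np.mean(2 * (pred - y))         # dLoss/db
    w -= lr * grad_w                         # weight := weight - lr * gradient
    b -= lr * grad_b

print(w, b, loss)   # w and b should approach 2 and 3, loss should approach 0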
  51. 51. Session 1 - Understand NLP Output Hidden Input Train Data Forward Propagation y-y~ (Error) Back Propagation Update Each Weight partial derivative chain rule Deep Learning Basic - BackPropagation NLP - Lexical Analysis
  52. 52. Session 1 - Understand NLP Deep Learning Basic - Optimizer NLP - Lexical Analysis https://www.youtube.com/watch?v=hMLUgM6kTp8
  53. 53. Session 1 - Understand NLP NLP - Lexical Analysis SGD Adagrad RMSProp Momentum NAG Adadelta Adam (the adaptive family of algorithms) Deep Learning Basic - Optimizer. Momentum: keeps the previous update direction, adding a notion of acceleration. NAG: similar to Momentum, but the gradient is evaluated at the look-ahead position. Adagrad: uses accumulated squared gradients so that slowly-moving parameters get larger updates and fast-moving ones get smaller, more careful updates. RMSProp: replaces the accumulated sum of gradients with an exponential moving average so that G cannot grow without bound. Adadelta: uses exponential averages together with the square of the step-size changes. Adam: combines the characteristics of Adadelta/RMSProp and Momentum. http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html
  54. 54. Session 1 - Understand NLP NLP - Lexical Analysis https://arxiv.org/pdf/1705.08292.pdf "Solutions found with Gradient Descent (GD) or Stochastic Gradient Descent (SGD) generalize far better than solutions found with adaptive methods (e.g. AdaGrad, RMSProp, and Adam)." The Marginal Value of Adaptive Gradient Methods in Machine Learning - Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, and Benjamin Recht; University of California, Berkeley / Toyota Technological Institute at Chicago, May 24, 2017. There is no optimizer that is best for all cases!! When should you use an adaptive optimizer? If the input embedding vectors are sparse, an adaptive optimizer is usually the better choice! Deep Learning Basic - Optimizer
  55. 55. Session 1 - Understand NLP Deep Learning Basic NLP - Lexical Analysis
# tf Graph input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([784, 256])),
    'h2': tf.Variable(tf.random_normal([256, 256])),
    'out': tf.Variable(tf.random_normal([256, 10]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([256])),
    'b2': tf.Variable(tf.random_normal([256])),
    'out': tf.Variable(tf.random_normal([10]))
}
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation, followed by softmax
pred = tf.matmul(layer_2, weights['out']) + biases['out']
hypothesis = tf.nn.softmax(pred)
# Define loss (cross entropy) and optimizer; learning_rate is assumed to be defined elsewhere
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
(Network diagram: input 784 > hidden 256 > hidden 256 > output 10 > SOFTMAX; Y = Activation(W*x + b); [Error] Cross Entropy)
  56. 56. Session 1 - Understand NLP START 오늘 날씨 는 ? PAD PAD END START 오늘 날씨 는 어때 ? PAD END START 오늘 비가 오 려 나 ? END In the case of long sentences, the vanishing gradient problem appears. Variable-length data also wastes computing power, which is where the concept of Dynamic RNN comes in. A Bidirectional LSTM additionally learns the given data in the backward direction. Long Short Term Memory Cell: Cell State https://brunch.co.kr/@chris-song/9 update / forget / output gates acting on the cell state https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html NLP - Lexical Analysis Deep Learning Basic
  57. 57. Session 1 - Understand NLP NLP - Lexical Analysis Deep Learning Basic Overfitting Fine Tuning Multi Tasking Ensemble Data Preprocessing Drop Out Batch Normalization Network Compression https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf https://arxiv.org/pdf/1510.00149.pdf Adam+SGD Learning Rate Decaying Fully Convolutional 1by1 Convolutional Filter Quantize Neural Networks AutoML Hyper Parameter Random Search Grid Search Genetic Algorithm
  58. 58. Session 1 - Understand NLP Session 1 - Now We are Here ! Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Numpy Pandas Tensorflow 데이터 처리 ML & DL Library Scikit Learn Konlpy 개발 관련 구현 Response Generation Memory Network Seq2Seq
  59. 59. Session 1 - Understand NLP https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software#cite_note-29 NLP - Lexical Analysis - Implementation Deep Learning Framework comparison pytorch
  60. 60. Session 1 - Understand NLP NLP - Lexical Analysis - Implementation Deep Learning Framework comparison  dynamic vs static graph definition Debugging Visualization Deployment VS
  61. 61. Session 1 - Understand NLP NLP - Lexical Analysis - Implementation Deep Learning Framework - Tensorflow
import tensorflow as tf
import numpy
rng = numpy.random
# train_X, train_Y, n_samples, learning_rate, training_epochs, logs_path are assumed to be defined elsewhere
with tf.Graph().as_default():
    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")
    pred = tf.add(tf.multiply(X, W), b)
    cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        # Fit all training data
        for epoch in range(training_epochs):
            for (x, y) in zip(train_X, train_Y):
                sess.run(optimizer, feed_dict={X: x, Y: y})
Tensorflow : static graph definition / Pytorch : dynamic graph definition
  62. 62. Session 1 - Understand NLP https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106 NLP - Lexical Analysis - Implementation Deep Learning Framework comparison https://blog.paperspace.com/which-ml-framework-should-i-use/
  63. 63. Session 1 - Understand NLP NLP - Lexical Analysis - Implementation Deep Learning Framework - Tensorflow Graph (Edge + Node) + Session
  64. 64. Session 1 - Understand NLP NLP - Lexical Analysis - Implementation Deep Learning Framework - Tensorflow https://github.com/TensorMSA/tensormsa_jupyter/blob/master/chap03_basic_models/linear_regressions.ipynb
  65. 65. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here! Response Generation Memory Network Seq2Seq
  66. 66. Session 1 - Understand NLP What is Word Embedding? A way of representing the units that make up a text, such as phonemes, syllables, words, sentences, or documents, as numeric values.
  67. 67. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding Word Representation Discrete Representation WordNet OneHot Vector Distributed Representation Direct Prediction Word2Vec Count Based Full Document Windows LSA SVD of x Glove FastText
  68. 68. Session 1 - Understand NLP WordNet NLP - Lexical Analysis - Word Embedding In the past, approaches such as WordNet were used. WordNet is a tree-structured graph that encodes relations between words (hypernyms, synonyms). Of course, all of it was built by hand, so it is subjective and takes a great deal of labor to maintain.
  69. 69. Session 1 - Understand NLP OneHot Vector NLP - Lexical Analysis - Word Embedding
  70. 70. Session 1 - Understand NLP LSA (Latent Semantic Analysis) with SVD (Singular Value Decomposition) NLP - Lexical Analysis - Word Embedding https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/06/pcasvdlsa/ - doc1 doc2 doc3 나 1 0 0 는 1 1 2 학교 1 1 0 에 1 1 0 가 1 1 0 ㄴ 1 0 0 다 1 0 1 영희 0 1 1 좋 0 0 1 truncated SVD / SVD / LSA (Latent Semantic Analysis)
  71. 71. Session 1 - Understand NLP SVD of X NLP - Lexical Analysis - Word Embedding https://swalloow.github.io/cs224d-lecture2 This method slides a window (typically of length 5 - 10) symmetrically across the text. ● I like deep learning. ● I like NLP. ● I enjoy flying Given a corpus like the one above, it can be represented as a matrix as follows; simply put, the frequency of each co-occurring word is counted, and the dimensionality is then reduced with SVD. (Window-size co-occurrence counts, then SVD.)
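A small NumPy sketch of the window-based co-occurrence matrix plus SVD idea for the three example sentences above (window size 1). The vocabulary order and the 2-dimensional truncation are arbitrary choices for illustration.
import numpy as np

corpus = ["I like deep learning .", "I like NLP .", "I enjoy flying ."]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

X = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):                 # window of size 1 on each side
            if 0 <= j < len(sent):
                X[idx[w], idx[sent[j]]] += 1     # count the co-occurrence

U, s, Vt = np.linalg.svd(X)                      # full SVD of the co-occurrence matrix
word_vectors = U[:, :2] * s[:2]                  # keep the first 2 dimensions (truncated SVD)
print(dict(zip(vocab, word_vectors.round(2))))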
  72. 72. Session 1 - Understand NLP https://www.tensorflow.org/tutorials/word2vec http://w.elnn.kr/search/ Word2Vector Demo Site. Advantages: dimensionality reduction, expresses semantic similarity. Disadvantages: homonyms are not handled, and the training signal for the neural network is weak when data is scarce. NLP - Lexical Analysis - Word Embedding Word2Vec
  73. 73. Session 1 - Understand NLP C-Bow the quick brown fox jumped over the lazy dog ([brown, jumped], fox) window size : 1 brown jumped over the . . brown jumped over fox . . Input OutputHidden Hidden Size Hidden Size Vocab Size Data Set Original Text NLP - Lexical Analysis - Word Embedding Word2Vec
  74. 74. Session 1 - Understand NLP the quick brown fox jumped over the lazy dog (fox, brown), (fox, jumped) window size : 1 brown jumped over the . . brown jumped over fox . . Input OutputHidden Hidden Size Hidden Size Vocab Size Data Set Original Text Skip-Gram NLP - Lexical Analysis - Word Embedding Word2Vec
  75. 75. Session 1 - Understand NLP (1)PV-DM (2)PV-DBOW (3)DM + DBOW (Vector Concat) W2V W2V W2V (4)AVG(TF-IDF * W2V) the quick brown fox jumped over the lazy dog (paragraph, the) (paragraph, quick) (paragraph, brown) (paragraph, fox) (paragraph, jumped) ([paragraph, quick, brown, fox, jumped], over) ([paragraph, quick, brown, fox, jumped, over], the) vector vector vector TF-IDF TF-IDF TF-IDF X X X vector AVG NLP - Lexical Analysis - Word Embedding Doc2Vec
  76. 76. Session 1 - Understand NLP tfidf(t,d,D) = tf(t,d) x idf(t,D) https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/ http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/ Not exactly a word embedding, but used quite often in NLP together with deep learning: - document similarity - word importance within a document - search engines (like Elasticsearch, although it uses BM25 now) NLP - Lexical Analysis - Word Embedding TF-IDF
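A minimal TF-IDF sketch with scikit-learn (listed in the lecture's ML library stack). The three toy documents are made up for illustration; cosine similarity over the TF-IDF rows gives the document-similarity use case mentioned above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the pizza was good",
        "the pizza was bad",
        "delivery was fast"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)          # tf(t,d) * idf(t,D), one row per document
print(vectorizer.get_feature_names_out())       # column order (get_feature_names() in older scikit-learn)
print(tfidf.toarray().round(2))
print(cosine_similarity(tfidf).round(2))        # pairwise document similarity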
  77. 77. Session 1 - Understand NLP - Introduce several ways to embed char as vector 안 녕 하 세 요 1 가 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 나 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 다 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 라 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 마 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 바 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 사 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 아 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 자 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 An Neung Ha Se Yo (ㅇ ㅏ ㄴ) (ㄴ ㅕ ㅇ) . . . . 2 a 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 e 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 f 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 g 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 h 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 i 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 3 ㄱ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ㄴ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ㄷ 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ㄹ 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ㅁ 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ㅂ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ㅅ 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ㅇ 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ㅈ 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 NLP - Lexical Analysis - Word Embedding Char Embeding
  78. 78. Session 1 - Understand NLP the quick brown fox jumped over the lazy dog 0.2 0.1 0.4 0.21 0 0 0 f o x fox Word2Vector 0 1 0 0 0 0 1 0 OneHot Encoding OneHot Encoding OneHot Encoding 1. Word2Vec-style embeddings express semantic relatedness well 2. One-hot encodings give a strong signal that is effective for training 3. Word-level embeddings memorize words well 4. Char-level embeddings handle unseen words well NLP - Lexical Analysis - Word Embedding + Char + Word Concat
  79. 79. Session 1 - Understand NLP Words not exactly matching the pretrained dictionary return “UNKNOWN”, so FastText (by Facebook) uses character n-grams in its word embedding algorithm. Comparing 에어컨 and 에어조단: 에어컨 ['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5 에어조단 ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6 matches ['$$에', '$에어'] => 2 score: 2 matches / 9 unique n-grams after deduplication => 0.2222 NLP - Lexical Analysis - Word Embedding FastText
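A short Python sketch reproducing the character n-gram overlap above (FastText-style subword matching with '$' padding). The Jaccard-style score here is a simplification for illustration, not the exact scoring FastText itself uses.
def char_ngrams(word, n=3, pad="$"):
    # pad both ends so that boundary characters also form full n-grams
    padded = pad * (n - 1) + word + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

a = char_ngrams("에어컨")      # {'$$에', '$에어', '에어컨', '어컨$', '컨$$'}
b = char_ngrams("에어조단")    # {'$$에', '$에어', '에어조', '어조단', '조단$', '단$$'}

score = len(a & b) / len(a | b)   # 2 shared / 9 unique n-grams ≈ 0.2222
print(sorted(a), sorted(b), round(score, 4))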
  80. 80. Session 1 - Understand NLP Glove NLP - Lexical Analysis - Word Embedding (their dot product equals the logarithm of the words’ probability of co-occurrence) The core goal of GloVe can be stated as: make similarity measurement between embedded word vectors easy while better reflecting the statistics of the whole corpus. Co-occurrence probability https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/09/glove/ GloVe embeds words so that, given a particular context word, the dot product of two embedded word vectors corresponds to the ratio of the two words’ co-occurrence probabilities.
  81. 81. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here! Numpy Pandas Tensorflow 데이터 처리 ML & DL Library Scikit Learn Konlpy 개발 관련 구현 Response Generation Memory Network Seq2Seq
  82. 82. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding OneHot Encoding : simple test code showing the concept of one-hot http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ [Code]
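A minimal sketch of the one-hot concept demonstrated in the notebook: each vocabulary word gets its own index and a vector with a single 1 at that index. The toy vocabulary reuses words from the BiLSTM-CRF example later in these slides.
import numpy as np

vocab = ["김승우", "전화번호", "이메일", "검색"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0   # a single 1 at the word's index
    return vec

print(one_hot("전화번호"))   # [0. 1. 0. 0.]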
  83. 83. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding Word2Vector : Using Gensim word2vec package http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
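A minimal gensim Word2Vec sketch along the lines of the notebook above. The toy corpus and hyperparameters are made up for illustration; note that gensim >= 4.0 uses vector_size where older versions used size.
from gensim.models import Word2Vec

sentences = [["오늘", "날씨", "어때"],
             ["오늘", "메뉴", "뭐", "지"],
             ["내일", "날씨", "알려줘"]]

# sg=1 selects skip-gram; vector_size is called size in gensim < 4.0
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)

print(model.wv["날씨"])                # the learned vector for a word
print(model.wv.most_similar("날씨"))    # nearest neighbours in the embedding space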
  84. 84. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding FastText : FaceBook fasttext with gensim wrapper http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
  85. 85. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding FastText : it is possible to use a pretrained vector and do fine tuning on it http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
  86. 86. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding N-grams are simply all combinations of adjacent words or letters of length n that you can find in your source text.
  87. 87. Session 1 - Understand NLP NLP - Lexical Analysis - Word Embedding For word2vec training on large datasets, GPU acceleration is needed. You can also consider using Tensorflow or Keras to train the model. https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
  88. 88. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  89. 89. Session 1 - Understand NLP NLP - Lexical Analysis - DL ALgorithms Paper Model CoNLL 2003 (F1 %) Collobert et al.(2011) MLP with word embeddings+gazetteer 89.59 Passos et al.(2014) Lexicon Infused Phrase Embeddings 90.90 Chiu and Nichols(2015) Bi-LSTM with word+char+lexicon embeddings 90.77 Luo et al.(2015) Semi-CRF jointly trained with linking 91.20 Lample et al.(2016) Bi-LSTM-CRF with word+char embeddings 90.94 Lample et al.(2016) Bi-LSTM with word+char embeddings 89.15 https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/ https://arxiv.org/pdf/1708.02709.pdf NER (Named Entity Recognition) Algorithm Performance
  90. 90. NLP - Lexical Analysis - DL ALgorithms what do we want to do with this algorithm?
  91. 91. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf 김승우 B-PERSON 전화번호 B-TARGET 검색 O 김승우 B-PERSON 이메일 B-TARGET 검색 O 김승우 B-PERSON 이미지 B-TARGET 검색 O IOB Data 김승우 전화번호 검색 김승우 이메일 검색 김승우 이미지 검색 Plain Data Sentence Splitting Token Morphing Part of Speech Tagging Lexical Analysis Word2Vector OneHot Encoding 1 0 0 0 0 1 0 0 0 0 1 0 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List
  92. 92. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List [Code]
  93. 93. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf 김 우 승 김승우 전화번호 이메일 Concat Vector [Code]
  94. 94. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf Concat Vector 김승우 전화번호 이메일 검색 B-PERSONB-TARGET BiLstm Fully Connected Layer B-? B-? B-? [Code]
  95. 95. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf Conditional Random Field Soft Max [Code]
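A rough TensorFlow 1.x sketch (the API generation used throughout this lecture) of the BiLSTM-CRF pipeline shown above, not the notebook code itself: embedded tokens go through a bidirectional LSTM, a fully connected layer produces per-tag scores, and a CRF layer supplies the loss and Viterbi decoding. All shapes and hyperparameters are illustrative.
import tensorflow as tf

num_tags, hidden, emb_dim = 5, 64, 50
word_emb = tf.placeholder(tf.float32, [None, None, emb_dim])   # [batch, time, embedding]
tags = tf.placeholder(tf.int32, [None, None])                  # gold IOB tag indices
seq_len = tf.placeholder(tf.int32, [None])                     # true lengths (to ignore padding)

cell_fw = tf.nn.rnn_cell.LSTMCell(hidden)
cell_bw = tf.nn.rnn_cell.LSTMCell(hidden)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, word_emb, sequence_length=seq_len, dtype=tf.float32)
bilstm_out = tf.concat([out_fw, out_bw], axis=-1)              # [batch, time, 2*hidden]

# fully connected layer producing a score for every tag at every time step
logits = tf.layers.dense(bilstm_out, num_tags)                 # [batch, time, num_tags]

# CRF layer: log-likelihood of the gold tag sequence given the emission scores
log_likelihood, trans_params = tf.contrib.crf.crf_log_likelihood(logits, tags, seq_len)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# at prediction time, Viterbi decoding picks the best global tag sequence
decoded, _ = tf.contrib.crf.crf_decode(logits, trans_params, seq_len)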
  96. 96. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf Probabilistic model for segmenting and labeling sequence data https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2 The first method makes local choices. In other words, even if we capture some information from the context in our hidden states thanks to the bi-LSTM, the tagging decision is still local. We don’t make use of the neighboring tagging decisions. For instance, in New York, the fact that we are tagging York as a location should help us decide that New corresponds to the beginning of a location. Given a sequence of words w1,…,wm, a sequence of score vectors s1,…,sm and a sequence of tags y1,…,ym, a linear-chain CRF defines a global score s ∈ R
  97. 97. Session 1 - Understand NLP NLP - Lexical Analysis - BiLstmCrf Real project BiLSTM result / sample code prediction test result: test data not included in the train set is still predicted well http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
  98. 98. Lexical Analysis Syntactic Analysis Semantic Analysis NLU Server (Understand) NLG Server (Generate) Voice Recognition Discourse Analysis 자연어 처리 이론 기본 이론 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation
  99. 99. Session 1 - Understand NLP NLP - Lexical Analysis - SyntaxNet Syntactic parsing decomposes a sentence into its constituents and analyzes the hierarchical relations between them to determine the structure of the sentence. Graph-Based Models Transition-Based Models CYK Style Parsing MST finding Algorithm Projective & Non Projective Model
  100. 100. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models Sentence W Repeat until all words have their head - Select two target words in the data structure (one dependent & one head candidate) - Deterministically predict the next parsing action from the parsing model - Modify the structure according to the parsing action C0 -> C1 -> C2 -> ……..C8 -> C9 -> C10 -> .… -> Cm D-tree t1 t2 t3 t8 t9 t10 tm Oracle (Classifier) Predict the best transition
  101. 101. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System
  102. 102. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Assume that we are given an oracle : - for any non-terminal configuration, it can predict the correct transition (for deterministic parsing) - That is, it takes two words & magically gives us the dependency relation between them if one exists
  103. 103. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move Economic from buffer B to stack S
  104. 104. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (had, news, nsubj) to A Remove news from stack (since it now has head in A)
  105. 105. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (ROOT, had, root) to A keep had in stack : because it can have other dependents on the right
  106. 106. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (effect, little, amod) to A Remove little from stack (since it now has head in A)
  107. 107. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, effect, dobj) to A Keep effect in stack : because it can have other dependents on right
  108. 108. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (effect, on, prep) to A Keep on in stack : because it can have other dependents on the right
  109. 109. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move financial from buffer B to stack S
  110. 110. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (market, financial, amod) to A Remove financial from stack (since it now has head in A)
  111. 111. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (on, markets, pmod) to A Keep markets in stack : because it can have other dependents on the right
  112. 112. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Reduce : Remove markets, on, effect from stack (since they already have head in A) ※ All decisions like right-arc, left-arc, reduce, shift will be made by oracle
  113. 113. Session 1 - Understand NLP NLP - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, period, p) to A Keep period in stack Done !
  114. 114. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  115. 115. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet Parsing type Paper Model WSJ Dependency Parsing Chen and Manning(2014) Fully-connected NN with features including POS 91.8/89.6 (UAS/LAS) Dependency Parsing Weiss et al.(2015) Deep fully-connected NN with features including POS 94.3/92.4 (UAS/LAS) Dependency Parsing Dyer et al.(2015) Stack LSTM 93.1/90.9 (UAS/LAS) Constituency Parsing Petrov et al.(2006) Probabilistic context-free grammars (PCFG) 91.8 (F1 Score) Constituency Parsing Zhu et al.(2013) Feature-based transition parsing 91.3 (F1 Score) Constituency Parsing Vinyals et al.(2015b) seq2seq learning with LSTM+Attention 93.5 (F1 Score) Syntax Parsing Algorithm Performance. There are two types of parsing: dependency parsing, which links individual words while taking the relations between them into account, and constituency parsing, which repeatedly splits the text into sub-phrases.
  116. 116. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized below for both the POS and the dependency parsing task) is used to extract sparse features, which are fed into the network in groups. We show only a small subset of the features to simplify the presentation in the schematic Google SyntaxNet with Deep Learning - Pos Tagging
  117. 117. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks https://arxiv.org/pdf/1603.06042.pdf 1 2 3 1 I _ PRP PRP _ 2 nsubj _ _ 2 knew _ VBD VBD _ 0 ROOT _ _ 3 I _ PRP PRP _ 5 nsubj _ _ 4 could _ MD MD _ 5 aux _ _ 5 do _ VB VB _ 2 ccomp _ _ 6 it _ PRP PRP _ 5 dobj _ _ 7 properly _ RB RB _ 5 advmod _ _ 8 if _ IN IN _ 9 mark _ _ 9 given _ VBN VBN _ 5 advcl _ _ 10 the _ DT DT _ 12 det _ _ 11 right _ JJ JJ _ 12 amod _ _ 12 kind _ NN NN _ 9 dobj _ _ 13 of _ IN IN _ 12 prep _ _ 14 support _ NN NN _ 13 pobj _ _ 15 . _ . . _ 2 punct _ _ 18 units (1),(2),(3) 18 units (1),(2),(3) 12 units (2),(3) (1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6 (2) The first and second leftmost / rightmost children of the top two words on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8 (3) The leftmost of leftmost / rightmost of rightmost children of the top two words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
  118. 118. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet Google SyntaxNet with Deep Learning - Local Parser 1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to the stack. 2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc pointing to the left. Push the first word back on the stack. 3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc point to the right. Push the second word back on the stack.
  119. 119. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet As we describe in the paper, there are several problems with the locally normalized models we just trained. The most important is the label-bias problem: the model doesn't learn what a good parse looks like, only what action to take given a history of gold decisions. This is because the scores are normalized locally using a softmax for each decision. Google SyntaxNet with Deep Learning - Global Training
  120. 120. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet What’s the Beam Search Algorithm on an RNN? https://www.youtube.com/watch?v=UXW6Cs82UKo Instead of keeping only the single best option at every iteration, we would like to try all cases to the end and choose the one whose total score is maximum. But computing all cases would be far too heavy, so we keep only the best few candidates at every step and remove the others (pruning). This is how we look for the globally best prediction.
  121. 121. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet What’s the Beam Search Algorithm on an RNN? Following only the best option at every step may miss the chance to find the globally optimal case
  122. 122. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet What’s the Beam Search Algorithm on an RNN? Considering all cases would require too much computing power
  123. 123. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet What’s the Beam Search Algorithm on an RNN? Remove low-score cases at every step (pruning)
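A toy Python sketch of the beam-search pruning idea above: at every step, expand all current candidates, then keep only the best beam_width partial sequences by total log probability. The two-step toy distribution is made up for illustration.
import math

def beam_search(step_log_probs, beam_width=2):
    """step_log_probs: list over time steps; each item maps token -> log probability."""
    beams = [([], 0.0)]                          # (sequence so far, total log prob)
    for probs in step_log_probs:
        candidates = []
        for seq, score in beams:
            for token, logp in probs.items():    # expand every beam with every token
                candidates.append((seq + [token], score + logp))
        # pruning: keep only the top beam_width candidates by total score
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

steps = [{"a": math.log(0.6), "b": math.log(0.4)},
         {"a": math.log(0.1), "b": math.log(0.9)}]
print(beam_search(steps, beam_width=2))          # best-scoring sequences first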
  124. 124. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet http://universaldependencies.org/ Google SyntaxNet does not support Korean as a default language. But as we can see below, we can train the model with Sejong corpus data, though we have to convert the format into one SyntaxNet understands. Google SyntaxNet with Deep Learning - How about Korean
  125. 125. Session 1 - Understand NLP NLP - Syntactic Analysis - SyntaxNet Demo Site (we also use samples on this site) http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm SyntaxNet Korean with Docker (We pretrained Korean corpus and set up webserver for service) https://github.com/TensorMSA/tensormsa_syntax_docker Google SyntaxNet with Deep Learning - Test it by yourself
  126. 126. Lexical Analysis Syntactic Analysis Semantic Analysis NLU Server (Understand) NLG Server (Generate) Voice Recognition Discourse Analysis 자연어 처리 이론 기본 이론 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation
  127. 127. Session 1 - Understand NLP NLP - Semantic Analysis Sentential semantics - Semantic role labeling (SRL) - Phrase similarity (=paraphrase) - Sentence classification, sentence emotion analysis, etc. What is Semantics in the study of language? Three perspectives on meaning - Lexical semantics : individual words - Sentential semantics : individual sentences - Discourse or Pragmatics : longer pieces of text or conversation NLP Tasks for Semantics
  128. 128. Session 1 - Understand NLP NLP - Semantic Analysis What is Semantic Role Labeling (SRL) SRL = Semantic roles express the abstract role that arguments of a predicate can take in the event. The police arrested the suspect in the park last night Agent predicate Theme Location Time Who did what to whom where when Can we figure out that these sentences have the same meaning? Can we figure out that bought, sold, and purchase are used in sentences with the same meaning? XYZ corporation bought the stock. They sold the stock to XYZ corporation. The stock was bought by XYZ corporation. The purchase of the stock by XYZ corporation.
  129. 129. Session 1 - Understand NLP NLP - Semantic Analysis - Semantic Role Labeling Common Semantic Role Labeling Architecture http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf Syntactic Parse Argument Identification Argument Classification Structural Inference Prune Constituents Candidates Semantic roles Arguments Step-1 Candidate Selection - Parse the sentence - Prune/filter the parse tree (eliminate some tree constituents to speed up the execution) Step-2 Argument Identification - A binary classification of each node as Argument or NONE - Local scoring Step-3 Argument Classification - A multi class (one-of-N) classification of all the argument candidates - Global / joint scoring ML ML ML
  130. 130. Session 1 - Understand NLP Paper Model CoNLL2005 (F1 %) CoNLL2012 (F1 %) Collobert et al.(2011) CNN with parsing features 76.06 Tackstrom et al.(2015) Manual features with DP for inference 78.6 79.4 Zhou and Xu(2015) Bidirectional LSTM 81.07 81.27 He et al.(2017) Bidirectional LSTM with highway connections 83.2 83.4 Semantic Role Labeling (SRL) aims to discover the predicate-argument structure of a sentence. For each target verb (predicate), all constituents of the sentence that take a semantic role of that verb are recognized. Typical semantic arguments are agent, theme, instrument, and so on, and also include location, time, manner, cause, etc. (Zhou and Xu, 2015). The table above shows the performance of several models on the CoNLL 2005 and 2012 datasets. Traditional SRL systems consist of several stages: producing a parse tree, identifying which parse-tree nodes represent the arguments of a given verb, and then classifying those nodes to determine the corresponding SRL tags. Each classification step usually entails extracting many features and feeding them into a statistical model (Collobert et al., 2011). Tackstrom et al. (2015) scored constituent spans and their possible semantic roles for a given predicate with a series of features based on the parse tree, and proposed a dynamic programming algorithm for efficient inference. Collobert et al. (2011) achieved comparable results with a CNN augmented by parsing information provided in the form of additional lookup tables. Zhou and Xu (2015) proposed a bidirectional LSTM to model arbitrarily long context, which proved successful even without parse-tree information. He et al. (2017) extended this work further by introducing highway connections. NLP - Semantic Analysis - Semantic Role Labeling LSTM is effective for the SRL problem too!
  131. 131. Session 1 - Understand NLP NLP - Semantic Analysis - Semantic Role Labeling Bidirectional LSTM with highway connections Stack more layers on RNN with highway technique ! https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf
  132. 132. Session 1 - Understand NLP NLP - Semantic Analysis - Semantic Role Labeling Semantic Role Labeling Applications Information : Anna is friend of mine. http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb Who WhoWhat session.run("MATCH (you:Person {name:'You'})" "FOREACH (name in ['Anna'] |" " CREATE (you)-[:FRIEND]->(:Person {name:name}))") result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends)" "RETURN you, yourFriends") Neo4j Insert Query Neo4j Jupyter example & visualize
  133. 133. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  134. 134. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN What kind of problem do we want to solve? Can we figure out whether these sentences are positive or negative? 돈이 아깝지 않다 (positive) 다시는 오지 않을 거야 (negative) 음식이 정말 맛이 없다 (negative) 이 식당은 정말 맛있다 (positive) Analyzing negative vs. positive with a dictionary: the word “않다” is usually negative, but? 돈이 아깝지 않다 => Positive 다시는 오지 않을 거야 => Negative
  135. 135. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN There are many ways of doing text classification.. Traditional Rule based Machine Learning - Logistic & SVM Deep Learning - CharCNN, RNN, Etc..
  136. 136. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN Paper Model SST-1 SST-2 Socher et al.(2013) Recursive Neural Tensor Network 45.7 85.4 Kim(2014) Multichannel CNN 47.4 88.1 Kalchbrenner et al.(2014) DCNN with k-max pooling 48.5 86.8 Tai et al.(2015) Bidirectional LSTM 48.5 87.2 Le and Mikolov(2014) Paragraph Vector 48.7 87.8 Tai et al.(2015) Constituency Tree-LSTM 51.0 88.0 Kumar et al.(2015) DMN 52.1 88.6 https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/ https://arxiv.org/pdf/1708.02709.pdf Semantic Analysis - CharCNN
  137. 137. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Deep Learning Method CharCNN can be a solution for this kind of problem. 1 2
  138. 138. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Preparing the data for embedding is pretty similar to other neural networks 1. Word Embedding & OneHot didn’t show much difference. 2. Personally, I prefer to concat char onehot + word2vector. 오늘 메뉴 는 뭐 지? PAD PAD 1. Need to define a sentence max length 2. Need padding like other NLP neural networks
  139. 139. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Using Multi Convolution Filter Size
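A rough TensorFlow 1.x sketch of the multi-filter-size convolution idea above (not the notebook code): several 1-D convolutions with different kernel sizes are run over the embedded sentence, each is max-pooled over time, and the results are concatenated before the fully connected output layer. All sizes are illustrative.
import tensorflow as tf

max_len, emb_dim, num_filters, num_classes = 20, 50, 32, 2
x = tf.placeholder(tf.float32, [None, max_len, emb_dim])    # embedded characters/words

pooled = []
for kernel_size in (2, 3, 4):                                # several filter sizes
    conv = tf.layers.conv1d(x, filters=num_filters,
                            kernel_size=kernel_size,
                            activation=tf.nn.relu)           # [batch, time', num_filters]
    pooled.append(tf.reduce_max(conv, axis=1))               # max-pool over time
features = tf.concat(pooled, axis=1)                         # [batch, 3 * num_filters]

logits = tf.layers.dense(features, num_classes)              # e.g. positive / negative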
  140. 140. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Other steps are same (fully connected > softmax > loss> optimizer)
  141. 141. Session 1 - Understand NLP NLP - Semantic Analysis - CharCNN You can see Char CNN can distinguish two sentences
  142. 142. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  143. 143. Session 1 - Understand NLP NLP - Discourse Analysis https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/ Paper Model bAbI (Mean accuracy %) Farbes (Accuracy %) Fader et al.(2013) Paraphrase-driven lexicon learning 0.54 Bordes et al.(2014) Weekly supervised embedding 0.73 Weston et al.(2014) Memory Networks 93.3 0.83 Sukhbaatar et al.(2015) End-to-end Memory Networks 88.4 Kumar et al.(2015) DMN 93.6 Discourse Analysis - End2End Memory Network
  144. 144. Session 1 - Understand NLP Discourse Analysis - End2End Memory Network https://arxiv.org/pdf/1503.08895v4.pdf NLP - Discourse Analysis
  145. 145. Session 1 - Understand NLP Here is the network architecture of end2end memory network https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/ https://www.slideshare.net/mobile/carpedm20/ss-63116251 NLP - Discourse Analysis - Memory Network
  146. 146. Session 1 - Understand NLP (1) Feed data (“Sentences”, “Question”, “Target”) 1 2 3 NLP - Discourse Analysis - Memory Network
  147. 147. Session 1 - Understand NLP Convert word index to embedding vector (Training target vector A,B,C) 1 3 Vocab Size 2 Dim Size vocab size Mem Size NLP - Discourse Analysis - Memory Network
  148. 148. Session 1 - Understand NLP Embedding A is applied to the given context sentences and multiplied with the input question embedding (using embedding B, which is not shown in this code) ※ in the first layer the question embedding is used; in later layers it is the output of the (t-1)-th layer 1 2 1 2 multiply NLP - Discourse Analysis - Memory Network
  149. 149. Session 1 - Understand NLP NLP - Lexical Analysis - Memory Network Set embedding C (in the code it is called B); this is also a target variable for training
  150. 150. Session 1 - Understand NLP Embedding C (in the code it is called B) is multiplied by the softmax result NLP - Discourse Analysis - Memory Network
  151. 151. Session 1 - Understand NLP Finally, multiply the question and the output of the memory network together again NLP - Discourse Analysis - Memory Network
  152. 152. Session 1 - Understand NLP stack more memory layers NLP - Discourse Analysis - Memory Network
  153. 153. Session 1 - Understand NLP Set fully connected layer and calculate error with softmax cross entropy NLP - Discourse Analysis - Memory Network
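A compressed NumPy sketch of one memory hop summarizing the steps above: embed the context sentences with A, match them against the embedded question with a softmax, read out with C, and add the result to the question representation before the final fully connected layer. Dimensions and random values are toy placeholders, not the lecture code.
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

mem_size, dim, num_answers = 4, 8, 5
np.random.seed(0)
A = np.random.randn(mem_size, dim)     # embedded context sentences (embedding A)
C = np.random.randn(mem_size, dim)     # output embedding of the same sentences (embedding C)
u = np.random.randn(dim)               # embedded question (embedding B)
W = np.random.randn(num_answers, dim)  # final fully connected layer

p = softmax(A @ u)                     # attention over the memory slots
o = C.T @ p                            # weighted read-out from memory
u_next = u + o                         # input to the next hop (or to the answer layer)
answer_logits = W @ u_next             # scored against the answer vocabulary (softmax cross entropy)
print(p.round(2), answer_logits.round(2))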
  154. 154. Session 1 - Understand NLP In the given code I removed 90% of the data set because we are using a CPU for this class, so the result may be poor. NLP - Discourse Analysis - Memory Network
  155. 155. Session 1 - Understand NLP https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/ https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano Dynamic Memory Networks Episodic Memory Other types of memory networks .. NLP - Discourse Analysis - Memory Network
  156. 156. Lexical Analysis Syntactic Analysis Semantic Analysis Word Embedding BilstmCrf CharCNN Deep Learning Basic NLU Server (Understand) NLG Server (Generate) SyntaxNet Voice Recognition Discourse Analysis 자연어 처리 이론 ML & DL 이론 기본 이론 관련 딥러닝 이론 설명 Session 1 - Understand NLP Session 1 - Now We are Here ! Response Generation Memory Network Seq2Seq
  157. 157. Session 1 - Understand NLP The Seq2Seq model can be applied to any case where both the input and the output are sequence data, such as machine translation, summarization, and simple question answering, and with a simple trick it can also be used to generate responses. - Input : 딥 러닝 재미 즐거운 일 - Output : 딥 러닝은 재미있고 즐거운 일이다 https://arxiv.org/pdf/1406.1078.pdf https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow NLP - Response Generator - Seq2Seq https://nlp.stanford.edu/pubs/emnlp15_attn.pdf
  158. 158. Session 1 - Understand NLP NLP - Response Generator - Attention Mechanism Attention Mechanism on Machine Translation https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb
  159. 159. Session 1 - Understand NLP NLP - Response Generator - Attention Mechanism Attention Mechanism on Machine Translation Bahdanau http://aclweb.org/anthology/D15-1166 Luong https://blog.heuritech.com/2016/01/20/attention-mechanism LocalGlobal Input Feeding
  160. 160. Session 1 - Understand NLP NLP - Response Generator - Bahdanau https://blog.heuritech.com/2016/01/20/attention-mechanism/ Without Attention Mechanism With Attention Mechanism
  161. 161. Session 1 - Understand NLP NLP - Response Generator - Bahdanau 1. embedding layer over the inputs ○ embedded = embedding(last_rnn_output) 2. attention layer over the encoder outputs, normalized to create the attention weights ○ attn_energies[j] = attn_layer(last_hidden, encoder_outputs[j]) ○ attn_weights = normalize(attn_energies) 3. context vector as an attention-weighted average of encoder outputs ○ context = sum(attn_weights * encoder_outputs) 4. RNN layer(s) with inputs and internal hidden state, outputting ○ rnn_input = concat(embedded, context) ○ rnn_output, rnn_hidden = rnn(rnn_input, last_hidden) 5. an output layer producing the next-token distribution ○ output = out(embedded, rnn_output, context)
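A NumPy sketch of the attention steps listed above. The Bahdanau scoring function is simplified here to a plain dot product, so this only illustrates the shape of the computation (score, normalize, weighted average, concatenate), not the full learned attention layer.
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

src_len, hidden = 6, 16
np.random.seed(1)
encoder_outputs = np.random.randn(src_len, hidden)   # one vector per source token
last_hidden = np.random.randn(hidden)                # current decoder hidden state

attn_energies = encoder_outputs @ last_hidden        # attn_layer(last_hidden, encoder_outputs[j]), simplified
attn_weights = softmax(attn_energies)                # normalize(attn_energies)
context = attn_weights @ encoder_outputs             # sum(attn_weights * encoder_outputs)

rnn_input = np.concatenate([last_hidden, context])   # concat(embedded, context) fed to the decoder RNN
print(attn_weights.round(2), context.shape, rnn_input.shape)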
  162. 162. Session 1 - Understand NLP NLP - Response Generator - Implementation http://localhost:8888/tree/chap05_nlp/attention_seq2seq data_util (1)Data Processing & Feed Data
  163. 163. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (2)Word Embedding
  164. 164. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (3)Encoder
  165. 165. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (4)Attention
  166. 166. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (5)Decoder & Attention
  167. 167. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (6)Loss & Optimization
  168. 168. Session 1 - Understand NLP http://localhost:8888/tree/chap05_nlp/attention_seq2seq NLP - Response Generator - Implementation (7)Inference Task
  169. 169. Session 1 - Understand NLP NLP - Response Generator - Seq2Seq Pointer Network https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264 The authors propose a new neural architecture called the “pointer network.” A pointer network is a seq2seq architecture with an attention mechanism that outputs “indices” into the input. Since the output vocabulary depends on the length of the input sequence, it can handle inputs of varying sizes. (Note: conventional seq2seq or neural Turing machines could only handle fixed lengths.) The attention mechanism used here is a slight variation on the standard seq2seq attention and has O(n^2) time complexity. To evaluate the architecture, the authors used tasks whose answers are positions (orderings) over the input, such as the convex hull, Delaunay triangulation, and the traveling salesman problem (TSP). The pointer network worked well, and it even worked on sequences longer than those seen in the training data. What else?
  170. 170. Session 2 - Make ChatBot
  171. 171. Session 2 - Lecture Goal: Building on the understanding of NLP from Session 1, understand the overall architecture with AI applied, and, starting from a pizza-ordering bot, each student should aim to build their own ChatBot. Session 2 - Make ChatBot
  172. 172. Session 2 : Susang Kim healess1@gmail.com ●Chatbot Developer ○ Released in POSCO (find people using NLP/AI) ○ Deep Learning MSA (ML, DNN, CNN, RNN) ●Agile Developer (worked at Pivotal Labs) ○ TDD, CI, Pair programming, User Story ●iOS Developer (app ranked 100th in the App Store - 2011, Korea) ●Front-End Developer (React, D3, Typescript and ES6) ●OSS World Challenge 2017 (in the top 12, in progress now) ●POSCO MES ... (working at POSCO ICT for 10 years)
  173. 173. Facebook AI shut down after creating their own language Paper: https://arxiv.org/abs/1706.05125
  174. 174. Remind of Session 1 [Architecture recap diagram] NLP theory: Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis. ML & DL theory: Word Embedding, CharCNN, BiLSTM-CRF, SyntaxNet, Memory Network, Seq2Seq (Response Generation), Ontology DM. [Retrieval Based] Chat-Bot System: Messaging Platform → ChatBot Server (NLU Server (Understand), DM Server, NLG Server (Generate), Scenario, Voice Recognition) → BackEnd Service Servers; messages flow as Semantic Frames carrying intent & slot information. Development: pipeline / data handling (NumPy, Pandas), ML & DL libraries (TensorFlow, Scikit Learn, KoNLPy); data collection → preprocessing → model training → evaluation → serving. [AI Based] Chat-Bot Research Environment: Data Mart, Monitoring, Train Data, AI Model Pipe Line, Legacy Data Base, Summary Result. Session 2 - Make ChatBot
  175. 175. Session 2 - Make Chatbot [Source: Deview 2016 - https://deview.kr/2016/schedule#session/176] Why are chatbots taking off these days? Intuitive UX / consistent experience / can be connected to voice / no separate app install needed / can connect to many services / fast feedback / platform-independent
  176. 176. Characteristics of chatbots • Many technologies are needed (NLP, AI, F/W, text mining and various development skills) • For someone studying deep learning, results come quickly - text-based, so fast feedback with little computing • It is fun (less business dependency than micro-data processing) - less of a data-processing burden than images (CNN) or structured data (DNN), assuming preprocessing is easy with a morphological analyzer • Many application areas (API-based connections to various services, smart management) - once the intent and slots are filled, it can connect to any service • Few related open-source projects, so it is a blue ocean (for Korean, most things must be built in-house) - fortunately many language-independent deep-learning text algorithms are published and can be reused • Bot services exist, but they are costly, handle Korean poorly, and cannot be customized Session 2 - Make ChatBot
  177. 177. Session 2 - Understand Chatbot What is a chatbot? AI (patterns, context), linguistics (natural language processing), programming (data handling - Python), Bot F/W (story/slot design), architecture (response time), text mining (data construction). Building a chatbot requires a wide range of skills from many different fields.
  178. 178. Session 2 - Make Chatbot Various chatbot platforms already exist. Building a chatbot without coding using API.AI: https://calyfactory.github.io/api.ai-chatbot/ Every chatbot has intent and entity recognition, and for both of them data is what matters! Signing up for api.ai and building a chatbot there is a helpful way to understand the principles.
  179. 179. Session 2 - Make Chatbot Closed Domain vs Open Domain Rule Based General (abstract) Open Closed Retrieval (accuracy) Impossible Strong AI Weak AI level of difficulty Start with a small business domain, raise accuracy, then keep adding further business domains.
  180. 180. Session 2 - Make Chatbot Rule Based vs AI Computer: Input + Program → Output (Rule). Rules for names, locations, teams and so on must be registered one by one per condition - accuracy goes up, but can every question really be registered?? (only if you register a million rules) Computer: Input + Output → Program (AI: ML, DL). With nothing but labeled data you can build a model that produces the answer, and it also handles similar data fairly well (Word2Vec, GloVe). intent = "판교에 근무하는 김수상 찾아줘" (find Susang Kim who works in Pangyo) => Intent: find a person in a specific location; NER = "판교에 근무하는 김수상 찾아줘" => NER: B-Loc O O B-Name O. Rules give exact answers but cannot cover every question; AI finds reasonably good answers for similar question types, and accuracy improves as the data grows (learning effect). Rule example (see the sketch below): if (loc == 판교 and comp == 포스코ICT): person = 김수상; elif (loc == 판교 and comp == SK): person = 가나다; else: person = 홍길동
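For illustration, a hypothetical version of the rule-based lookup on the left of this slide might look like the following Python sketch (names and the fallback answer are made up). Every (location, company) pair has to be enumerated by hand, which is exactly what the ML/DL approach avoids.

def find_person(loc, comp):
    # hand-written rule table: one entry per condition
    rules = {
        ("판교", "포스코ICT"): "김수상",
        ("판교", "SK"): "가나다",
    }
    return rules.get((loc, comp), "홍길동")  # default answer when no rule matches

print(find_person("판교", "포스코ICT"))  # -> 김수상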
  181. 181. Make ChatBot Now [Same architecture recap diagram as slide 174, with the components covered in this lesson marked "This Lesson"] Session 2 - Make ChatBot
  182. 182. Session 2 - Make Chatbot Let's build our own ChatBot. How do we build a pizza-ordering chatbot? There are many kinds of pizza, various sizes, plus location, date, side menus and more - how can all of that become a ChatBot? ⇒ A story for pizza ordering has to be composed ⇒ Let's build a pizza-ordering bot with deep learning plus a reasonable amount of logic
  183. 183. Session 2 - Make Chatbot NLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers Scenario Chat-Bot System ChatBot Server BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services message Session 2 - Make Chatbot Question: "판교에 포스코ICT에 배달해줘" (deliver to POSCO ICT in Pangyo) Answer: Please choose a size. Answer: Please enter the location. Answer: Your pizza order has been completed. Text(Message) 1 3 4 2
  184. 184. Session 2 - Make Chatbot Chatbot Interface Flow NLP → Context Analyzer → Decision Maker: "판교에 포스코ICT에 배달해줘" → Intent: pizza order, Entity: location = 판교 포스코ICT. Service Manager / Response Generator: menu = null, time = null → analyse the delivery-related slots (Knowledge Base / Scenario), Entity: menu: null, time: null → "Which menu would you like?" ("어떤 메뉴를 원해?" via Tone Gen). Once the pizza-order slots are complete (Slot OK): "Your pizza order has been completed."
  185. 185. Session 2 - Make Chatbot Composition of a story slot (Frame-based DM) "피자 주문하고 싶어" (I want to order a pizza) → the pizza-order intent is detected. Pizza slot: Size / Type / Side menu. Story of the pizza bot: 1) What size would you like? 2) What type would you like? 3) Do you need a side menu? User answer: "페파로니 피자로 라지 사이즈에 콜라 추가해주세요" (pepperoni pizza, large, add a cola) → NER fills the slot: Size = Large, Type = Pepperoni, Side menu = cola → connect the service (Slot API call). Displaying the slots so the user can simply pick them is also an option (does that mean UX skills are needed too??). A minimal slot-filling sketch follows.
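A minimal frame-based dialog-manager sketch for the pizza story might look like the following (slot names and prompts are assumptions; the real bot builder keeps these in its DB).

PIZZA_SLOTS = {"size": None, "type": None, "side_menu": None}
PROMPTS = {"size": "What size would you like?",
           "type": "Which pizza would you like?",
           "side_menu": "Would you like a side menu item?"}

def next_action(slots):
    for name, value in slots.items():
        if value is None:
            return ("ask", PROMPTS[name])   # keep asking until the frame is full
    return ("call_service", slots)          # all slots filled -> order API call

slots = dict(PIZZA_SLOTS)
print(next_action(slots))                                     # ('ask', 'What size would you like?')
slots.update(size="Large", type="Pepperoni", side_menu="cola")
print(next_action(slots))                                     # ('call_service', {...})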
  186. 186. Session 2 - Make Chatbot 1. "맥북 프로 검색해줘" (search for a MacBook Pro) 2. Preprocessing -> "맥북 프로" NER 3. 맥북프로 -> mapped to the canonical entity -> MacBook Pro API call 4. Search results displayed 5. Slots displayed for drilling into the service 6. If a new consultation is wanted, the user clicks "new consultation". Displaying the slots as selectable options on screen can greatly improve the chatbot's accuracy (choices are restricted to that frame...) e.g. type "삼성 노트북" (Samsung laptop) and you get per-slot choices. 바로봇 http://www.11st.co.kr/toc/bridge.tmall?method=chatPage Slot Trigger API
  187. 187. Session 2 - Make Chatbot NLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers Scenario Chat-Bot System ChatBot Server BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services message Session 2 - Make Chatbot "판교에 포스코ICT에 배달해줘" How does the NLU actually work? => To apply AI, the text must be converted into vectors. 1
  188. 188. Session 2 - Make Chatbot Defining the word representation (so the computer can understand it well) - One-hot gives a strong per-word signal and trains effectively (when the scope is small - sparse) - Word-level embeddings remember words well (but sparse) / W2V (similarity) - GloVe can distinguish even fine-grained word types (caracal vs. cat) - Character-level embeddings handle unseen words well (romanizing to English to shrink the vector) - Character-level embedding of romanized Korean keeps the vector count small while also handling English. Word representation for training. 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구.pdf
  189. 189. Session 2 - Make Chatbot Business-specific text usually exists, but implementing deep learning requires a very large amount of cleaned, taggable text. For a Korean corpus the Sejong corpus is commonly used, and additional business vocabulary is trained on top of it (manual grind work). - Corpus (annotation): Sejong corpus (2007) https://ithub.korean.go.kr/user/main.do - Mulgyeol21 (2001~2014, source not available) http://corpus.korea.ac.kr/ - Web crawling or downloads (Wiki, Namu Wiki) - For domain-specific cases the text data must be created by hand (augmentation), and specialized words have to be newly trained (ㅎㅇ?, 방가방가) ※ New vocabulary such as proper nouns must be registered as it appears. How do we get the data?
  190. 190. Session 2 - Make Chatbot The Ministry of Culture and the National Institute of Korean Language push the "2nd Sejong Plan". One core element of AI, the foundation of the 4th industrial revolution, is free communication between people and machines. For a computer to properly understand and respond to human speech or writing, it needs a massive language database capable of processing the natural language people speak and write. Such a language database is called a corpus. The accuracy of the rapidly spreading speech-recognition AIs depends on how rich and precise these corpora are. The Ministry of Culture, Sports and Tourism and the National Institute of Korean Language announced on the 9th that they have drawn up a language-informatization plan to build a corpus of 15.47 billion eojeol in total between 2018 and 2022 to advance Korean AI technology.
  191. 191. Session 2 - Make Chatbot After choosing the training vector, features must be extracted: Cleansing -> Feature Engineering -> Train (remove special characters per situation, extract meaningful words - tagging). Extracting only the words relevant to the intent or entities improves performance (it lowers training cost and improves the model) and also reduces the embedding dimension (dense representation - SVD). Shrinking the vector: roughly 70 symbols - a~z, 0~9, ?, !, (, ), ', space, etc.; splitting Korean into initial/medial/final jamo is hard, and using .lower() is another option. Composition of the data to train on.
  192. 192. Session 2 - Make Chatbot "판교에 포스코ICT에 배달해줘" The amount of data is small - how do we get clean data? [AI Based] Chat-Bot Research Environment Data Mart Monitoring AI Model Pipe Line Session 2 - Make ChatBot Session 2 - Make Chatbot 1
  193. 193. Session 2 - Make Chatbot Data Augmentation for AI (Intent - tag) "판교에 오늘 피자 주문해줘" → Story Definition → Intent Mapping: 주문해줘 → intent: pizza order (주문); Entity Mapping: menu: 피자, location: 판교, date: 오늘 → Preprocessing: 판교 오늘 피자 주문 → Story key value (주문): tagloc tagdate tagmenu 주문 → Pattern Generation: tagloc tagdate tagmenu 주문 / tagloc tagdate 주문 / tagdate tagmenu 주문 / tagloc tagmenu 주문 → Model Train (Char-CNN) → Evaluation (30% of train data held out) → Hyper-parameter Selection → Prediction: tagloc tagdate tagmenu 주문 → intent = 주문. A pattern-generation sketch follows.
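A possible sketch of the tag-based pattern generation described above (tag names follow the slide; the real generation rules in the lecture code may differ). Entities in a story sentence are replaced by placeholder tags, then combinations of those tags are emitted as extra training patterns for the intent model.

from itertools import combinations

def augment(tags, intent_word):
    # emit every non-empty combination of entity tags followed by the intent word
    patterns = []
    for r in range(1, len(tags) + 1):
        for combo in combinations(tags, r):
            patterns.append(" ".join(combo) + " " + intent_word)
    return patterns

print(augment(["tagloc", "tagdate", "tagmenu"], "주문"))
# ['tagloc 주문', 'tagdate 주문', ..., 'tagloc tagdate tagmenu 주문']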
  194. 194. Session 2 - Make Chatbot Data flow for Model in AI (NER - BIO) "판교에 오늘 피자 주문해줘" → Story Definition: tagloc tagdate tagmenu 주문 → BIO Mapping → Preprocessing: 판교 오늘 피자 주문 → B_Loc / B_Date / B_Menu → Text Generator / Pattern Matching: tagloc tagdate tagmenu 주문, tagloc tagdate 주문, tagdate tagmenu 주문, tagloc tagmenu 주문 → W2V → Model Train (Bi-LSTM): B-loc B-date B-menu 주문 / B-loc B-date 주문 / B-date B-menu 주문 / B-loc B-menu 주문 → Evaluation (30% of train data) → Hyper-parameter Selection → Prediction: 판교 오늘 피자 주문 → entity recognition (e.g. 피자: 0.12, 장소: 0.7, 메뉴: 0.3) → B_Loc O B_Date B_Menu 주문 O. A minimal BIO-conversion sketch follows.
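A minimal sketch of converting tokens to BIO tags using an entity dictionary (the dictionary content is assumed; in the real pipeline it comes from the story definition, and multi-token entities would also need I- tags).

ENTITY_DICT = {"판교": "Loc", "오늘": "Date", "피자": "Menu"}

def to_bio(tokens):
    # B-<type> for tokens found in the entity dictionary, O otherwise
    return [("B-" + ENTITY_DICT[t]) if t in ENTITY_DICT else "O" for t in tokens]

print(to_bio(["판교", "오늘", "피자", "주문"]))   # ['B-Loc', 'B-Date', 'B-Menu', 'O']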
  195. 195. Session 2 - Make Chatbot NLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers Scenario Chat-Bot System ChatBot Server BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services message Session 2 - Make Chatbot "판교에 포스코ICT에 배달해줘" We have the data - now how do we figure out the intent? 1
  196. 196. How to find the intent (Text Classification) "피자주문 하고 싶어" / "여행 정보 알려줘" / "호텔 예약해줘" - three intents: order, information, reservation. You could look for individual words inside the sentence, but that has limits, e.g. "피쟈 시켜먹고 싶어" / "여행 좋은데 알려줘"... With deep learning these problems can be solved. Let's classify with Char + CNN (CNN features: order, information, reservation; word similarity: 피자 ≈ 피쟈, 정보 ≈ 갈만한데)
  197. 197. How to find the intent (Text Classification - composing the data) Word: 피자 주문 하고 싶어. If the vector would be too large, romanize: PIJA JUMUN HAGO SIPO (digits, special characters and spaces must all be considered). W2V (pretrained): 피자 (0.12, 0.54, 0.72) 주문 (0.56, 0.65, 0.64) 하고 (0.67, 0.91, 0.13) 싶어 (0.89, 0.14, 0.11). One-hot encoding (word level or character level): (0100000000) (0000010000) (0010000000) (0000000100). One-hot encoding (A~Z vector): (0100000000) (0000010000) (0010000000) (0000000100). A small encoding sketch follows.
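A small sketch of the two encodings above: word-level one-hot vectors and a roughly 70-symbol character vocabulary (the exact vocabulary and romanization are assumptions for illustration).

import numpy as np

words = ["피자", "주문", "하고", "싶어"]
word_onehot = np.eye(len(words))            # one one-hot row per word in this (tiny) vocab

CHAR_VOCAB = list("abcdefghijklmnopqrstuvwxyz0123456789?! ")
char_to_idx = {c: i for i, c in enumerate(CHAR_VOCAB)}

def encode_chars(text):
    # romanized, lower-cased input as suggested in the slide ("PIJA JUMUN ...")
    return [char_to_idx[c] for c in text.lower() if c in char_to_idx]

print(encode_chars("PIJA JUMUN HAGO SIPO"))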
  198. 198. Char CNN? CNNs are usually used to extract and recognize features in images, but since an image is ultimately a vector and text is a vector too, a CNN can extract features from text as well.
  199. 199. Text Classification - Char CNN "지금 피자 주문 하고 싶어" [Paper: Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882] Features: 예약 / 주문 / 정보. Number of words seen per filter [3, 4, 5 filters]; vector (W2V) length / dimension / window; static / non-static / random embeddings; pooling for abstraction; classification for the final label. Let's detect the intent with a Char-CNN - a minimal sketch follows.
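A minimal Yoon Kim-style CNN intent classifier sketch in tf.keras; the vocabulary size, sequence length, filter counts and number of intents are placeholder assumptions, not the values used in the lecture notebook.

import tensorflow as tf

VOCAB, MAXLEN, EMB, N_INTENTS = 70, 50, 128, 3        # assumed sizes
inp = tf.keras.Input(shape=(MAXLEN,))
emb = tf.keras.layers.Embedding(VOCAB, EMB)(inp)
pooled = []
for width in (3, 4, 5):                               # "number of words looked at" per filter
    c = tf.keras.layers.Conv1D(100, width, activation="relu")(emb)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))
x = tf.keras.layers.Concatenate()(pooled)
out = tf.keras.layers.Dense(N_INTENTS, activation="softmax")(x)   # order / info / reservation
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")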
  200. 200. Why Char-CNN?? Char-CNN shows better performance than other commonly used algorithms. Paper: Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882
  201. 201. Text Classification (Multi-class SVM) The intent can also be detected more simply than with a Char-CNN by using classic machine learning, for example as sketched below.
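A lighter-weight intent classifier sketch with scikit-learn (character TF-IDF + linear SVM); the three training sentences are made-up examples, and a real model would of course need far more data.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

X = ["피자 주문 하고 싶어", "여행 정보 알려줘", "호텔 예약 해줘"]
y = ["order", "info", "reserve"]

# character n-grams work reasonably well for Korean without a morphological analyzer
clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)), LinearSVC())
clf.fit(X, y)
print(clf.predict(["호텔 예약 부탁해"]))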
  202. 202. Session 2 - Make Chatbot NLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers Scenario Chat-Bot System ChatBot Server BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services message Session 2 - Make Chatbot "판교에 포스코ICT에 배달해줘" How do we extract the entities? 1
  203. 203. Understanding RNNs Useful for modeling sequential data; because the input is a sequence, backpropagation is also performed through time (BPTT). http://aikorea.org/blog/rnn-tutorial-3/
  204. 204. http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf Understanding Seq2Seq (RNN + RNN) In a chatbot it plays the role of the generator (Sentence Generator); it can be trained on movie subtitles or novels (with a morphological analyzer defining the input/output).
  205. 205. http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf Understanding LSTM - Cell State https://brunch.co.kr/@chris-song/9 update / forget / output gates and the cell state
  206. 206. http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf ResNet and the LSTM used in RNNs are similar concepts (additive shortcut connections vs. the cell-state path).
  207. 207. Named Entity Recognition - Bidirectional LSTM (bidirectional layers): an RNN-based model that is useful for tagging the word at a specific position, and an effective way to handle meaning that depends on a word's position within the sentence. [한국어 정보처리 학술대회 - https://sites.google.com/site/2016hclt/jalyosil] A minimal tagger sketch follows.
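A minimal bidirectional-LSTM sequence-tagger sketch in tf.keras (the CRF layer used in the lecture is omitted for brevity; vocabulary and tag counts are placeholders).

import tensorflow as tf

VOCAB, MAXLEN, EMB, N_TAGS = 5000, 20, 100, 7      # assumed sizes (B-/I-/O tag set)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, EMB, input_length=MAXLEN, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(N_TAGS, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")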
  208. 208. Why Bi-LSTM CRF ? [ Bidirectional LSTM-CRF Models for Sequence Tagging - https://arxiv.org/pdf/1508.01991.pdf ]
  209. 209. Named Entity Recognition - BIO tagging using brat. 피자 주문하고 싶어 → B-Pizza B-Order O O / 여행 정보 알려줘 → B-Travel B-Information O / 호텔 예약해줘 → B-Hotel B-Reserve O. B - first token of an entity, I - continuation of the entity, O - not an entity / whitespace (OUT), U - Unknown (when no word embedding exists) ※ New York?, 수상하다? Brat - http://brat.nlplab.org/examples.html / https://wapiti.limsi.fr/
  210. 210. Reinforcing the dictionary with the Bi-LSTM -> model training. 피자 주문하고 싶어 B-Pizza B-Order O O / 여행 정보 알려줘 B-Travel B-Info O / 호텔 예약해줘 B-Hotel B-Reserve O ← 피이쟈 주문하고 싶어 / 놀러갈 정보 알려줘 / 숙소 예약해줘 → 피자 / 여행 / 호텔. New vocabulary discovered by the Bi-LSTM is fed back into the training data so the model's performance keeps improving.
  211. 211. Session 2 - Make Chatbot NLU Server (Understand) NLG Server (Generate) DM Server Messaging Platform BackEnd Service Servers Scenario Chat-Bot System ChatBot Server BackEnd Service Servers message intent & slot information message message Semantic Frame Semantic Frame connect services "판교에 포스코ICT에 배달해줘" The intent is detected and the entities are extracted - now let's build the service. message Session 2 - Make Chatbot 1 2 3
  212. 212. Session 2 - Make Chatbot ChatBot Layer Log File Chatbot Architecture: an application layer such as the ChatBot Layer sits on top of the Deep Learning Layer, and each application layer calls into the DL Layer for the functions it needs. DeepLearning Layer: Bi-LSTM CRF, Char-CNN, SVM, Seq2Seq, Attention; NAS File, Model, Bot DB; Residual, VGG. NLP / Context Analyzer / Decision Maker / Response Generator ※ models such as residual networks are used for image search. Bot Builder, GPU Deeplearning Predict, Dict File, Bot config, Train Intent / NER
  213. 213. Session 2 - Make Chatbot NLP Architecture Preprocessing: Python, Konlpy, Mecab (Sejong Corpus), Tensorflow, SVM, Char-CNN, Bi-LSTM CRF, Gensim, FastText, User-Dic, Synonym, Voting, Python API Service (Swagger). "판교 근무하는 포스코ICT에 김수상한테 피자 주문하고 싶어..." → [Intent detected] 피자 주문 [NER detected] 판교 - Loc, 포스코ICT - Loc, 김수상 - Name. Proper nouns: ('포스코ICT', 'NNP'), ('김수상', 'NNP') ※ link on registering proper nouns in Mecab. Sentence-length check, removal of special symbols (...). Noun extraction: [('판교', 'NNG'), ('근무', 'NNG'), ('하', 'XSV'), ('는', 'ETM'), ('포스코ICT', 'NNP'), ('에', 'JKB'), ('김수상', 'NNP'), ('한테', 'JKB'), ('피자', 'NNG'), ('주문', 'NNG'), ('하', 'XSV'), ('고', 'EC'), ('싶', 'VX'), ('어', 'EC')]. Intent slot and model comparison → entity values of the 피자주문 slot and the NER results. Meta: Input Data="판교 근무하는 포스코ICT에 김수상한테 피자 주문하고 싶어…", Intent='피자주문', Intent_History=['피자주문', ''], story_slot_entity {'메뉴': '피자', '지역': '판교 포스코ICT', '이름': '김수상'}, request_type='text', service_type='pizza order', output_data=''
  214. 214. Session 2 - Make Chatbot Docker (Ubuntu) in AWS EC2 (c4.8xlarge / p2.xlarge GPU) NAS DB Server Bot Builder (analysis) React Chatbot Server (Django) Python Tensorflow Postgres SQL Bootstrap Web Service Architecture(MSA) D3 SCSS Konlpy Nginx Celery Log File Model File Rabbit MQ Service Java Node Python Rest Gensim Front-End Java (Trigger) Rest LB Rest AP2 GPU Server (HDF5) GPU Server (HDF5) Dict File Hbase
  215. 215. Session 2 - Make Chatbot Bot Builder and UX (Story)
  216. 216. Session 2 - Make Chatbot ChatBot Definition, ChatBot Intent, ChatBot Service, ChatBot Intent Entity, ChatBot Story, ChatBot Response, ChatBot Model, ChatBot Tagging, ChatBot Entity Relation, ChatBot Synonym - Bot Builder DB. To make the service easy to extend, keep the schema as common (generic) as possible.
  217. 217. Session 2 - Make Chatbot Rest API Client request: { Input Data: "페파로니 피자 주문할께", Intent: '', Intent_History: ['', ''], story_slot_entity: { 메뉴: '', 사이즈: '', 사이드: '' }, request_type: text, service_type: '', output_data: '' } Server response: { Input Data: "페파로니 피자 주문할께", Intent: 피자주문, Intent_History: ['피자주문', ''], story_slot_entity: { 메뉴: 피자, 사이즈: 라지, 사이드: 콜라 }, request_type: text, service_type: '', output_data: 주문완료 } Chatbot API ※ Only the required values are exchanged as JSON; other values are managed by the Dialog Manager (Log).
  218. 218. Session 2 - Make Chatbot Test coverage per case: 1. Logic changes (unit tests) 2. Model changes (hyper-parameters) 3. Data changes (slots, dictionaries, entities, synonyms) 4. Property changes (thresholds, rule criteria). Unlike simple logic changes, data and model changes need a way to be verified continuously; to keep accuracy up while the bot is live, Continuous Integration is essential (Jenkins / Travis CI, etc.). Test codes for the chatbot: 피자주문, 호텔예약, 여행정보 - check intent -> check NER -> check slots. input "판교에 피자주문할께" -> intent: 피자주문, slot: {메뉴, 크기, 사이드-extra}
  219. 219. Problems that come up in practice, and tips for solving them
  220. 220. To raise the model's accuracy, supplement it with multiple models and extra logic (scoring / voting). When detecting the intent, several models are compared and the closest value is chosen; combining text mining with an ensemble raises accuracy (fine tuning). "포스코ICT에 지금 피자 배달해줘" → Char-CNN / SVM (multi-class) / naive_bayes.MultinomialNB run in parallel → per-model weights → Voting → compare the slots per intent (for delivery, location and time are mandatory) → 여행정보 vs 메뉴배달 → Result: 메뉴배달 (피자 배달). Ensemble and Voting - slot comparison, parallel execution. A small voting sketch follows.
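A possible sketch of score-weighted voting between the intent models (model names, scores and weights are assumptions for illustration; the real system also compares slots per intent afterwards).

def vote(candidates, weights):
    # candidates: {model_name: (intent, score)}; accumulate weighted scores per intent
    totals = {}
    for name, (intent, score) in candidates.items():
        totals[intent] = totals.get(intent, 0.0) + score * weights[name]
    return max(totals, key=totals.get)

print(vote({"char_cnn": ("메뉴배달", 0.8), "svm": ("여행정보", 0.6)},
           {"char_cnn": 0.6, "svm": 0.4}))        # -> 메뉴배달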
  221. 221. Trigger handling (love, image search) 1. When the word "love" appears <actual production example> Employee: "XXX 사원에게 사랑한다고 포스톡 보내줘" (send XXX a PosTalk saying I love them) Chatbot: "너무 쉽게 사랑하지 마세요." (Don't fall in love so easily.) Employee: "니가 먼제 내 사랑을 논해" (Who are you to talk about my love?) Chatbot: "학습중이라 아직 잘 모르는게 많아요." (I'm still learning, so there's a lot I don't know yet.) Employee: ㅋㅋㅋㅋ Chatbot: ㅋㅋㅋ Triggers are applied to words like [안녕, 사랑, ㅋㅋㅋ], and the data collected this way is used to train a Seq2Seq model that serves as an NLP preprocessing model. https://www.youtube.com/watch?v=x9bvkXJ-JeQ 2. On image search, a ResNet model is called.
  222. 222. Use a Tone Generator when needed: it changes the speaking style (regional, honorific, subordinate tone). 주문이 완료되었습니다 (plain) / 주문이 완료되었단다 (gentle) / 주문이 완료되었어요 (polite) / 주문이 완료되었다니깐 (annoyed). It uses a Seq2Seq model - the encoder takes the nouns etc. and the decoder produces noun + particle; for the Response Generator this is an application of the morphological analyzer.
  223. 223. Synonym handling (N-gram): 페파로니 - Pepperoni, 폐파로니, 페파피자... / Mac Book Pro - 맥프로, 맥북프로... Customers use many different spellings, but the API call needs the canonical value. N-grams are used to match learned synonyms against a dictionary (trigrams are typical). Link: https://www.simplicity.be/article/throwing-dices-recognizing-west-flemish-and-other-languages/ Tune N and the threshold per entity ※ threshold: the smaller it is, the closer the match must be. A small trigram sketch follows.
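A possible trigram-similarity sketch for mapping a user's spelling to the canonical entity value (the dictionary and the threshold are assumptions; the slide's own implementation may use a distance-style threshold instead, where smaller means stricter).

def ngrams(text, n=3):
    text = " " + text + " "
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def similarity(a, b, n=3):
    # Jaccard overlap of character n-gram sets
    ga, gb = ngrams(a.lower(), n), ngrams(b.lower(), n)
    return len(ga & gb) / max(len(ga | gb), 1)

CANONICAL = ["Pepperoni", "MacBook Pro"]

def normalize(word, threshold=0.3):
    best = max(CANONICAL, key=lambda c: similarity(word, c))
    return best if similarity(word, best) >= threshold else word

print(normalize("pepperony"))   # -> 'Pepperoni'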
  224. 224. Response Speed: configure an LB, use Nginx, an appropriate number of threads and APs, caching of data (in memory - via API), and enforce the maximum response time the chatbot can accept.
  225. 225. Coding for parallel processing during training: use tf.device to pin operations to a specific device and balance the work between CPU and GPU appropriately - having many GPUs does not automatically mean it is faster... A small sketch follows.
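A small tf.device sketch (TensorFlow 1.x style, matching the lecture-era environment); the CPU/GPU split shown is only an example of the kind of balancing the slide describes.

import tensorflow as tf

with tf.device("/cpu:0"):
    # input pipeline / preprocessing ops stay on the CPU
    x = tf.placeholder(tf.float32, [None, 128], name="input")

with tf.device("/gpu:0"):
    # the heavy matrix math runs on the first GPU
    w = tf.Variable(tf.random_normal([128, 64]))
    y = tf.matmul(x, w)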
  226. 226. Introduction to TensorMSA
  227. 227. Wrap-up ● When building a chatbot, using the hottest technologies matters, but above all you must understand what the data in each domain means and make it easy for the computer to understand ● Keeping the patterns of the training data and the prediction data consistent is important ● Deep learning depends on securing a large amount of cleaned data ● Deep learning can be a sufficient solution for improving performance
  228. 228. When the singularity comes... Google IO17 : https://www.youtube.com/watch?v=Y2VF8tmLFHw
  229. 229. Reference 모두를 위한 딥러닝 - http://hunkim.github.io/ml/ / 제28회 한글 및 한국어 정보처리 학술대회 - 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구 / Stanford University CS231n - http://cs231n.stanford.edu/ / Creating AI chat bot with Python 3 and Tensorflow [신정규] - https://speakerdeck.com/inureyes/building-ai-chat-bot-using-python-3-and-tensorflow / 파이썬으로 챗봇 만들기 [김선동] - https://www.slideshare.net/KimSungdong1/20170227-72644192?next_slideshow=1 / 딥러닝을 이용한 지역 컨텍스트 검색 [김진호] - http://www.slideshare.net/deview/221-67605830 / Developing Korean Chatbot 101 [조재민] - https://www.slideshare.net/JaeminCho6/developing-korean-chatbot-101-71013451 / TensorFlow-Tutorials - https://github.com/golbin/TensorFlow-Tutorials
