NLP Deep Learning with Tensorflow

Understand what natural language processing is and how we can approach it with deep learning, especially using Google TensorFlow.


  1. 1. Understanding NLP and Applying It in Practice with TensorFlow WRITTEN BY SeungWoo Kim tmddno1@gmail.com
  2. 2. Currently leader of the AI TFT in the POSCO IT division; development leader of the framework supporting AI projects in the POSCO IT division; development leader of the POSCO AI chatbot pilot service; in-house BigData & AI instructor at POSCO ICT; B.S. in Computer Engineering, Sungkyunkwan University. tmddno1@gmail.com
  3. 3. 1. Lecture Docker environment: https://github.com/TensorMSA/skp_edu_docker 2. Lecture source code: git clone https://github.com/TensorMSA/tensormsa_jupyter.git
  4. 4. Course goal: "I want to serve pizza ordering through a chatbot messenger. What data should I collect, which neural networks should I use, and how should I structure the architecture to reach that goal?" Given an NLP problem like the one above, the aim is to gain the insight to approach it from a data and deep-learning perspective. [Next session] A session on applying and combining the material covered here at the application level, from an architecture perspective.
  5. 5. 1.NLP & Deep Learning 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition 2-3.Syntactic Analysis 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis 2-5-1.RNN for Understanding Global Conversation
  6. 6. 3.Language Generation 3-1.Basic Seq2Seq 3-2.Other types of Seq2Seq (Attention, Pointer) 4.Tips 4-1.Hyper Parameter Random Search 4-2.Genetic Algorithm for Hyper Parameter Search 4-3.Auto Hyper Parameter Search with Multi GPU Server
  7. 7. 1.NLP & Deep Learning
  8. 8. NLP and Deep Learning Today’s Focus As in other fields such as image processing, DL shows good performance, but because of the nature of the field it cannot replace existing approaches 100%; understanding the existing research areas remains important. https://www.slideshare.net/ssuser06e0c5/ss-64417928
  9. 9. What’s NLP (Natural Language Processing)? Let’s find out with examples
  10. 10. NLP Applications Mostly Solved Making Good Progress Still Really Hard Spam Detection (스팸분석) Text Categorization (텍스트 분류) Part of Speech Tagging (단어 분석) Named Entity Recognition (의미 구분 분석) Information Extraction (정보 추출) Sentiment Analysis (감정분석) Coreference Resolution (같은 단어 복수 참조) Word Sense Disambiguation (복수 의미 분류) Syntactic Parsing (구문해석) Machine Translation (기계번역) Semantic Search (의미 분석 검색) Question & Answer (질의 응답) Textual inference (문장 추론) Summarization (텍스트 요약) Discourse & Dialog (대화 & 토론)
  11. 11. NLP Applications Text Categorization Text classification assigns one or more classes to a document according to its content. Classes are selected from a previously established taxonomy (a hierarchy of categories or classes). Spam Detection Spam detection is also a text classification problem. Part of Speech Tagging Grammatical tagging, or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.
  12. 12. NLP Applications Low Level Information Extraction
  13. 13. NLP Applications Information Extraction on a Broader View https://web.stanford.edu/class/cs124/lec/Information_Extraction_and_Named_Entity_Recognition.pptx Rule Based Extraction · Named Entity Recognition · Syntax Analysis · Relation Search · Ontology · Information Extraction
  14. 14. NLP Applications Coreference Resolution I did not vote for Donald Trump because I think he is too reckless. Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher-level NLP tasks that involve natural language understanding, such as document summarization, question answering, and information extraction. Deep Reinforcement Learning for Mention-Ranking Coreference Models · Improving Coreference Resolution by Learning Entity-Level Distributed Representations https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
  15. 15. NLP Applications Word Sense Disambiguation [Example] 1. a type of fish 2. tones of low frequency and the sentences: 1. I went fishing for some sea bass. 2. The bass line of the song is too weak. http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf supervised way: labeled data examples; semi-supervised way: ontology based
  16. 16. NLP Applications Syntactic Parsing Syntactic parsing finds structural relationships between words in a sentence. https://web.stanford.edu/~jurafsky/slp3/12.pdf
  17. 17. NLP Applications Machine Translation Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish).
  18. 18. NLP Applications Semantic Search Semantic search seeks to improve search accuracy by understanding a searcher’s intent through contextual meaning. Question and Answer Able to answer questions in natural language based on knowledge data (usually an ontology); the best-known example is IBM Watson. Textual Inference Recognize, generate, or extract pairs <T,H> of natural language expressions, such that a human who reads (and trusts) T would infer that H is most likely also true. Summarization Extract interesting parts of the text and create a summary by using those parts, allowing rephrasings to make the summary more grammatically correct. Discourse & Dialog Hold a conversation while understanding the whole history of the dialog and the semantic meaning of the speaker.
  19. 19. Level of NLP ○ Pragmatics : use of language ○ Semantics : meaning of words & sentences ○ (Surface) Syntax : phrase & sentence ○ Morphology : morpheme, word ○ Phonology : phoneme (abstract unit of speech sound) ○ Phonetics : phone (acoustic unit of speech sound) (From low to high: sounds and words, word composition, word order, word & sentence meaning, conversational intent & context)
  20. 20. 2.Language Analysis Process
  21. 21. Spoken Utterance Lexical (어휘) Analysis : Word Structure Speech Recognition Written Utterance Syntactic (구문) Analysis : Sentence Structure Morphemes, Word Semantic (의미) Analysis : Meaning of Words & Sentence Sentence Discourse (대화) Analysis : Relationship between sentence Context beyond Sentence Language Analysis
  22. 22. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis
  23. 23. Language Analysis - Speech Recognition AI Speaker Alexa Alexa Microphone System
  24. 24. Language Analysis - Speech Recognition Deep Learning for Classification Hidden Markov Model for Language Model
  25. 25. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  26. 26. Language Analysis - Lexical Analysis Main factors of lexical analysis: Sentence Splitting, Tokenizing, Morphological Analysis, Part of Speech Tagging
  27. 27. Lexical Analysis - Sentence Splitting & Tokenizing [Problems] What if there is no newline character (‘\n’)? Where is the EOS point? What if the sentence is not properly separated into words with spaces? [Examples]
  28. 28. Language Analysis - Lexical Analysis - Morphological Analysis Morphology is the process of finding the morpheme, the smallest meaningful unit (lexical meaning or grammatical function), and other features such as the stem, that carry information in a language. Stemming & lemmatization examples (word / stem / lemma): Love / Lov / Love · Loves / Lov / Love · Loved / Lov / Love · Loving / Lov / Love · Innovation / Innovat / Innovation · Innovations / Innovat / Innovation · Innovate / Innovat / Innovate · Innovates / Innovat / Innovate · Innovative / Innovat / Innovative
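A minimal sketch of the stemming vs. lemmatization contrast above, assuming NLTK is available (the WordNet corpus may need a one-time nltk.download('wordnet')); the exact stems depend on the stemmer, so the outputs may differ slightly from the table:

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()
    for word in ['loves', 'loved', 'loving', 'innovation', 'innovations', 'innovative']:
        # stem: crude suffix stripping; lemma: dictionary form (here lemmatized as a verb)
        print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos='v'))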
  29. 29. Language Analysis - Lexical Analysis - Part of Speech Tagging Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into their part of speech (also known as word classes or lexical categories) and labels them according to a tagset, the collection of tags used for POS tagging. Ambiguity: “that” can be a subordinating conjunction or a relative pronoun - The fact that/IN you’re here - A man that/WDT I know. “Around” can be a preposition, particle, or adverb - I bought it at the shop around/IN the corner. - I never got around/RP to getting a car. - A new Toyota Prius costs around/RB $25K. Degree of ambiguity (in the Brown corpus): 11.5% of word types (40% of word tokens) are ambiguous (# of tags: 1, 2, 3, 4, 5, 6, 7 / # of words: 35340, 3760, 264, 61, 12, 2, 1). The ambiguity problem is much more serious in Korean.
  30. 30. Language Analysis - Lexical Analysis - Implementation Analysis result comparison (Hannanum / Kkma / Komoran / Mecab / Twitter): 하늘: N / NNG / NNG / NNG / Noun · 을: J / JKO / JKO / JKO / Josa · 나(날): N / VV / NP / NP / Noun · 는: J / ETD / JX / JX / Josa · 자동차: N / NNG / NNG / NNG / Noun. Library performance comparison.
  31. 31. Language Analysis - Lexical Analysis - Implementation [Code]
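A minimal sketch of the analyzer comparison on the previous slide, assuming KoNLPy is installed (Mecab needs a separate system install; Twitter was renamed Okt in later KoNLPy versions):

    from konlpy.tag import Hannanum, Kkma, Komoran, Okt   # Okt == the old Twitter analyzer

    sentence = u'하늘을 나는 자동차'
    for analyzer in [Hannanum(), Kkma(), Komoran(), Okt()]:
        # each analyzer returns (morpheme, POS tag) pairs with its own tagset
        print(analyzer.__class__.__name__, analyzer.pos(sentence))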
  32. 32. Language Analysis - Lexical Analysis - Implementation [Code]
  33. 33. Language Analysis - Lexical Analysis - Implementation [Code]
  34. 34. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  35. 35. Language Analysis - Lexical Analysis [Deep Learning - Sequence Labeling - BiLSTM-CRF] What is sequence labeling? Assigning a categorical label to each element of a sequence. What we can do with sequence labeling: (1) Word Segmentation (2) POS Tagging (3) Chunking (4) Clause Identification (5) Named Entity Recognition (6) Semantic Role Labeling (7) Information Extraction
  36. 36. Language Analysis - Lexical Analysis [Deep Learning - Sequence Labeling - BiLSTM-CRF] IOB data set example (Word / POS / Chunk / NE): West NNP B-NP B-MISC · Indian NNP I-NP I-MISC · all-around NN I-NP O · Phil NNP I-NP B-PER · Simons NNP I-NP I-PER · took VBD B-VP O · four CD B-NP O · for IN B-PP O · 38 CD B-NP O · on IN B-PP O · Friday NNP B-NP O. POS tag meanings: https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing. Chunk tag meanings: B = begin of chunk, I = continuation of chunk, E = end of chunk, NP = noun phrase, VP = verb phrase. NER BIO tag meanings: B = start of a new chunk, I = word inside a chunk, O = outside of any chunk.
  37. 37. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] BiLSTM-CRF Description Before we talk about BiLstmCrf, which is a really important algorithm for sequence labelling, let’s briefly go over the background knowledge we need.
  38. 38. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3. Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  39. 39. Language Analysis - Lexical Analysis - Check Prerequisite [Those will be needed to understand what I am trying to explain] Concept of perceptron & Deep Neural Network Concept of SoftMax DNN & Matrix Gradient Descent Back Propagation Activation Functions
  40. 40. Language Analysis - Brief Explanation (The diagram shows a 784-256-256-10 fully connected network: each layer computes activation(W*x + b), with a softmax output and cross-entropy error.)
    # tf Graph input
    x = tf.placeholder("float", [None, 784])
    y = tf.placeholder("float", [None, 10])
    # Store layers weight & bias
    weights = {
        'h1': tf.Variable(tf.random_normal([784, 256])),
        'h2': tf.Variable(tf.random_normal([256, 256])),
        'out': tf.Variable(tf.random_normal([256, 10]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([256])),
        'b2': tf.Variable(tf.random_normal([256])),
        'out': tf.Variable(tf.random_normal([10]))
    }
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation, then softmax
    pred = tf.matmul(layer_2, weights['out']) + biases['out']
    hypothesis = tf.nn.softmax(pred)
    # Define loss (cross entropy) and optimizer
    learning_rate = 0.001  # example value; not specified on the slide
    cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
  41. 41. Language Analysis - Lexical Analysis - Check Prerequisite [Those will be needed to understand what I am trying to explain] Dynamic RNN, BiDirectional LSTM, Word Embedding, Recurrent Neural Network, LSTM (Long Short Term Memory)
  42. 42. Language Analysis - Brief Explanation Padded examples: START 오늘 날씨 는 ? PAD PAD END / START 오늘 날씨 는 어때 ? PAD END / START 오늘 비가 오 려 나 ? END — In the case of long sentences the vanishing-gradient problem happens, and padding variable-length data to one length wastes computing power; this is where the concept of a dynamic RNN comes in. A bidirectional LSTM also learns the given data from the backward direction. Long Short Term Memory cell: a cell state with forget, update, and output gates. https://brunch.co.kr/@chris-song/9 https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html
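A minimal sketch of a dynamic RNN over padded batches, assuming TensorFlow 1.x; sequence_length tells the RNN where each real sentence ends so the PAD steps are skipped (sizes here are toy assumptions):

    import numpy as np
    import tensorflow as tf

    max_len, dim, hidden = 7, 8, 16
    x = tf.placeholder(tf.float32, [None, max_len, dim])      # padded word vectors
    seq_len = tf.placeholder(tf.int32, [None])                # real length of each sentence
    cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    outputs, state = tf.nn.dynamic_rnn(cell, x, sequence_length=seq_len, dtype=tf.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.random.rand(2, max_len, dim).astype(np.float32)
        out = sess.run(outputs, feed_dict={x: batch, seq_len: [4, 7]})
        print(out[0, 4:])   # zeros: steps beyond the first sentence's real length are not computed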
  43. 43. Language Analysis - Word embedding What is word embedding? A way of representing units of text — phonemes, syllables, words, sentences, documents — as numeric vectors. Advantages: dimensionality reduction, representation of semantic similarity. Disadvantages: handling of homonyms, weak training signal for the neural network when data is scarce.
  44. 44. Language Analysis - Word embedding - OneHot Encoding Concept of OneHot Encoding
  45. 45. Language Analysis - Word embedding - Word2Vec https://www.tensorflow.org/tutorials/word2vec http://w.elnn.kr/search/ Concept of Word2Vector Word2Vector Demo Site
  46. 46. Language Analysis - Word embedding - Word2Vec CBOW: the quick brown fox jumped over the lazy dog → data set ([brown, jumped], fox) with window size 1. Input (context words) → Hidden (hidden size) → Output (vocab size): the network predicts the center word from its context.
  47. 47. Language Analysis - Word embedding - Word2Vec Skip-Gram: the quick brown fox jumped over the lazy dog → data set (fox, brown), (fox, jumped) with window size 1. Input (center word) → Hidden (hidden size) → Output (vocab size): the network predicts the context words from the center word.
  48. 48. Language Analysis - Word embedding - Doc2Vec Original text: the quick brown fox jumped over the lazy dog. (1) PV-DM: ([paragraph, quick, brown, fox, jumped], over), ([paragraph, quick, brown, fox, jumped, over], the) (2) PV-DBOW: (paragraph, the), (paragraph, quick), (paragraph, brown), (paragraph, fox), (paragraph, jumped), … (3) DM + DBOW (vector concat) (4) AVG(TF-IDF * W2V): multiply each word vector by its TF-IDF weight and average.
  49. 49. Language Analysis - Word embedding - TF-IDF tfidf(t,d,D) = tf(t,d) x idf(t,D) Not exactly a word embedding, but used quite often in NLP with deep learning: document similarity, word importance within a document, and search engines (like Elasticsearch, though it uses BM25 now). https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/ http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/
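A minimal TF-IDF sketch with scikit-learn (an assumption — the course notebooks may compute it differently); each row is a document vector whose cosine similarity to other rows measures document similarity:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ['the quick brown fox', 'the lazy dog', 'the quick dog jumped']
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)     # tf(t, d) * idf(t, D) per term and document
    print(vectorizer.get_feature_names())
    print(tfidf.toarray())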
  50. 50. Language Analysis - Word embedding - Char Embedding Several ways to embed a character as a vector, illustrated with one-hot tables for ‘안 녕 하 세 요’: (1) syllable-level one-hot over 가, 나, 다, 라, 마, 바, 사, 아, 자, …; (2) romanized character-level one-hot over a, b, c, d, … for ‘An Neung Ha Se Yo’; (3) jamo-level one-hot over ㄱ, ㄴ, ㄷ, ㄹ, … after decomposing syllables, e.g. 안 → (ㅇ ㅏ ㄴ), 녕 → (ㄴ ㅕ ㅇ).
  51. 51. Language Analysis - Word embedding - Word+Char the quick brown fox jumped over the lazy dog — e.g. ‘fox’ as a Word2Vec vector concatenated with one-hot encodings of the characters f, o, x. 1. Word2Vec-style embeddings represent semantic relatedness well. 2. One-hot encodings give a strong signal that is effective for training. 3. Word-level embeddings memorize words well. 4. Char-level embeddings handle untrained (unseen) words well.
  52. 52. Language Analysis - Word embedding - NGram Word2Vec can only represent trained words: words that do not exactly match the pretrained dictionary return “UNKNOWN”. So FastText (by Facebook) uses character n-grams in its word embedding algorithm. Comparing 에어컨 and 에어조단: 에어컨 → ['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5; 에어조단 → ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6; matches → ['$$에', '$에어'] => 2; score: 2 matches against 7 non-matching n-grams after deduplication => 2 / 9 ≈ 0.2222.
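A minimal sketch of the character n-gram comparison above (a Jaccard-style overlap; FastText's actual subword scoring is more involved, this only reproduces the counting on the slide):

    def char_ngrams(word, n=3, pad='$'):
        padded = pad * (n - 1) + word + pad * (n - 1)
        return {padded[i:i + n] for i in range(len(padded) - n + 1)}

    a, b = char_ngrams(u'에어컨'), char_ngrams(u'에어조단')   # 5 and 6 n-grams
    common = a & b                                            # {'$$에', '$에어'} => 2 matches
    print(round(len(common) / float(len(a | b)), 4))          # 2 of 9 distinct n-grams ≈ 0.2222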
  53. 53. Language Analysis - Word embedding - vector distance Cosine Similarity http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
    from math import sqrt

    def square_rooted(x):
        return round(sqrt(sum([a * a for a in x])), 3)

    def cosine_similarity(x, y):
        numerator = sum(a * b for a, b in zip(x, y))
        denominator = square_rooted(x) * square_rooted(y)
        return round(numerator / float(denominator), 3)

    print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
  54. 54. Language Analysis - Word embedding - Implementation OneHot Encoding: simple test code showing the concept of one-hot encoding http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ [Code]
  55. 55. Language Analysis - Word embedding - Implementation Word2Vector : Using Gensim word2vec package http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
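A minimal gensim Word2Vec sketch (assuming gensim 3.x, where the dimension argument is still called size; in gensim 4.x it became vector_size):

    from gensim.models import Word2Vec

    sentences = [['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog'],
                 ['the', 'quick', 'dog', 'jumped', 'over', 'the', 'lazy', 'fox']]
    model = Word2Vec(sentences, size=50, window=1, min_count=1, sg=0)   # sg=0: CBOW, sg=1: skip-gram
    print(model.wv['fox'])                        # 50-dimensional vector for 'fox'
    print(model.wv.most_similar('fox', topn=3))   # nearest words by cosine similarity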
  56. 56. Language Analysis - Word embedding - Implementation FastText : FaceBook fasttext with gensim wrapper http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
  57. 57. Language Analysis - Word embedding - Implementation FastText: it is possible to use a pretrained vector and do fine-tuning on it http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
  58. 58. Language Analysis - Word embedding - Implementation N-grams are simply all combinations of adjacent words or letters of length n that you can find in your source text.
  59. 59. Language Analysis - Word embedding - Implementation For training word2vec on large datasets, GPU acceleration is needed. You can also think about using TensorFlow or Keras to train the model. https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
  60. 60. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3. Other prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  61. 61. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] BiLSTM-CRF Description http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
  62. 62. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김승우 B-PERSON 전화번호 B-TARGET 검색 O 김승우 B-PERSON 이메일 B-TARGET 검색 O 김승우 B-PERSON 이미지 B-TARGET 검색 O IOB Data 김승우 전화번호 검색 김승우 이메일 검색 김승우 이미지 검색 Plain Data Sentence Splitting Token Morphing Part of Speech Tagging Lexical Analysis Word2Vector OneHot Encoding 1 0 0 0 0 1 0 0 0 0 1 0 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List
  63. 63. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List [Code]
  64. 64. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김 우 승 김승우 전화번호 이메일 Concat Vector [Code]
  65. 65. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Concat Vector 김승우 전화번호 이메일 검색 B-PERSONB-TARGET BiLstm Fully Connected Layer B-? B-? B-? [Code]
  66. 66. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Conditional Random Field Soft Max [Code]
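A minimal sketch of the BiLSTM-CRF tagger built up over these slides, assuming TensorFlow 1.x with tf.contrib.crf (all sizes are toy assumptions; the sequence_tagging notebook is the reference implementation):

    import tensorflow as tf

    vocab_size, emb_dim, hidden, num_tags = 1000, 50, 64, 5
    word_ids = tf.placeholder(tf.int32, [None, None])          # [batch, max_len]
    labels = tf.placeholder(tf.int32, [None, None])            # IOB tag ids
    seq_len = tf.placeholder(tf.int32, [None])

    emb = tf.Variable(tf.random_uniform([vocab_size, emb_dim], -1.0, 1.0))
    x = tf.nn.embedding_lookup(emb, word_ids)
    cell_fw = tf.nn.rnn_cell.LSTMCell(hidden)
    cell_bw = tf.nn.rnn_cell.LSTMCell(hidden)
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, x, sequence_length=seq_len, dtype=tf.float32)
    out = tf.concat([out_fw, out_bw], axis=-1)                  # forward + backward context
    logits = tf.layers.dense(out, num_tags)                     # per-token tag scores

    # CRF layer: a global sequence score instead of an independent per-token softmax
    log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(logits, labels, seq_len)
    loss = tf.reduce_mean(-log_likelihood)
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)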
  67. 67. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] A CRF is a probabilistic model for segmenting and labeling sequence data. http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2 The first method (a softmax per time step) makes local choices. In other words, even if we capture some information from the context in our hidden states thanks to the bi-LSTM, the tagging decision is still local: we don’t make use of the neighboring tagging decisions. For instance, in “New York”, the fact that we are tagging “York” as a location should help us decide that “New” corresponds to the beginning of a location. Given a sequence of words w1, …, wm, a sequence of score vectors s1, …, sm and a sequence of tags y1, …, ym, a linear-chain CRF defines a global score s ∈ R.
  68. 68. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Gradient Descent Momentum NAG Adagrad Adadelta Rmsprop Adam [Code]
  69. 69. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] https://arxiv.org/pdf/1705.08292.pdf “Solutions found with gradient descent (GD) or stochastic gradient descent (SGD) generalize much better than solutions found with adaptive methods (e.g. AdaGrad, RMSProp, and Adam).” — The Marginal Value of Adaptive Gradient Methods in Machine Learning, Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, and Benjamin Recht, University of California, Berkeley / Toyota Technological Institute at Chicago, May 24, 2017. There is no optimizer that is best for all cases! When to use an adaptive optimizer? If the input embedding vectors are sparse, it’s better to use an adaptive optimizer.
  70. 70. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Real-project BiLSTM result and sample-code prediction test result: test data not included in the train set is still predicted well. http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
  71. 71. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker
  72. 72. Language Analysis - Syntactic Analysis Syntactic parsing (구문 분석) decomposes a sentence into its constituents and analyzes the hierarchical relations between them to determine the structure of the sentence. Approaches include graph-based models and transition-based models, CYK-style parsing, MST-finding algorithms, and projective & non-projective models.
  73. 73. Language Analysis - Syntactic Analysis Transition-Based Models Given a sentence W, repeat until all words have their head: - Select two target words in the data structure (one dependent & one head candidate) - Deterministically predict the next parsing action from the parsing model - Modify the structure according to the parsing action. C0 -> C1 -> C2 -> …….. C8 -> C9 -> C10 -> .… -> Cm (D-tree), with transitions t1, t2, t3, … tm. The oracle (a classifier) predicts the best transition.
  74. 74. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System
  75. 75. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Assume that we are given an oracle: for any non-terminal configuration it can predict the correct transition (for deterministic parsing). That is, it takes two words and magically gives us the dependency relation between them if one exists.
  76. 76. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move Economic from buffer B to stack S
  77. 77. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (news, Economic, amod) to arc set A Remove Economic from stack (since it now has head in A)
  78. 78. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move news from buffer B to stack S
  79. 79. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (had, news, nsubj) to A Remove news from stack (since it now has head in A)
  80. 80. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (ROOT, had, root) to A keep had in stack : because it can have other dependents on the right
  81. 81. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (effect, little, amod) to A Remove little from stack (since it now has head in A)
  82. 82. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, effect, dobj) to A Keep effect in stack : because it can have other dependents on right
  83. 83. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (effect, on, prep) to A Keep on in stack : because it can have other dependents on the right
  84. 84. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move financial from buffer B to stack S
  85. 85. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (market, financial, amod) to A Remove financial from stack (since it now has head in A)
  86. 86. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (on, markets, pmod) to A Keep markets in stack : because it can have other dependents on the right
  87. 87. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Reduce : Remove markets, on, effect from stack (since they already have head in A) ※ All decisions like right-arc, left-arc, reduce, shift will be made by oracle
  88. 88. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, period, p) to A Keep period in stack Done !
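A minimal sketch of the arc-eager walkthrough above, shortened to "Economic news had little effect ." and driven by a scripted oracle (in a real parser the oracle is the trained classifier):

    def arc_eager(words, oracle_actions):
        stack, buffer, arcs = ['ROOT'], list(words), []
        for action in oracle_actions:
            if action == 'SHIFT':                          # move the next word onto the stack
                stack.append(buffer.pop(0))
            elif action.startswith('LEFT'):                # head = front of buffer, dependent = stack top
                label = action.split(':')[1]
                arcs.append((buffer[0], stack.pop(), label))
            elif action.startswith('RIGHT'):               # head = stack top, dependent = front of buffer
                label = action.split(':')[1]
                arcs.append((stack[-1], buffer[0], label))
                stack.append(buffer.pop(0))                # keep it: it may still take dependents
            elif action == 'REDUCE':                       # pop a word that already has its head
                stack.pop()
        return arcs

    words = ['Economic', 'news', 'had', 'little', 'effect', '.']
    actions = ['SHIFT', 'LEFT:amod', 'SHIFT', 'LEFT:nsubj', 'RIGHT:root',
               'SHIFT', 'LEFT:amod', 'RIGHT:dobj', 'REDUCE', 'RIGHT:p']
    print(arc_eager(words, actions))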
  89. 89. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker
  90. 90. Language Analysis - Syntactic Analysis - Syntax Net We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized below for both the POS and the dependency parsing task) is used to extract sparse features, which are fed into the network in groups. We show only a small subset of the features to simplify the presentation in the schematic Google SyntaxNet with Deep Learning - Pos Tagging http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf
  91. 91. Language Analysis - Syntactic Analysis - Syntax Net Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks https://arxiv.org/pdf/1603.06042.pdf 1 2 3 1 I _ PRP PRP _ 2 nsubj _ _ 2 knew _ VBD VBD _ 0 ROOT _ _ 3 I _ PRP PRP _ 5 nsubj _ _ 4 could _ MD MD _ 5 aux _ _ 5 do _ VB VB _ 2 ccomp _ _ 6 it _ PRP PRP _ 5 dobj _ _ 7 properly _ RB RB _ 5 advmod _ _ 8 if _ IN IN _ 9 mark _ _ 9 given _ VBN VBN _ 5 advcl _ _ 10 the _ DT DT _ 12 det _ _ 11 right _ JJ JJ _ 12 amod _ _ 12 kind _ NN NN _ 9 dobj _ _ 13 of _ IN IN _ 12 prep _ _ 14 support _ NN NN _ 13 pobj _ _ 15 . _ . . _ 2 punct _ _ 18 units (1),(2),(3) 18 units (1),(2),(3) 12 units (2),(3) (1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6 (2) The first and second leftmost / rightmost children of the top two words on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8 (3) The leftmost of leftmost / rightmost of rightmost children of the top two words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
  92. 92. Language Analysis - Syntactic Analysis - Syntax Net Google SyntaxNet with Deep Learning - Local Parser 1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to the stack. 2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc pointing to the left. Push the first word back on the stack. 3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc pointing to the right. Push the second word back on the stack.
  93. 93. Language Analysis - Syntactic Analysis - Syntax Net As we describe in the paper, there are several problems with the locally normalized models we just trained. The most important is the label-bias problem: the model doesn't learn what a good parse looks like, only what action to take given a history of gold decisions. This is because the scores are normalized locally using a softmax for each decision. Google SyntaxNet with Deep Learning - Global Training
  94. 94. Language Analysis - Syntactic Analysis - Syntax Net What’s the beam search algorithm on an RNN? https://www.youtube.com/watch?v=UXW6Cs82UKo Instead of taking only the best choice at every iteration, expand several candidates to the end and choose the sequence whose total score is maximum. Trying all cases would be far too expensive, so at every step keep only the best few candidates and remove the others (pruning). This is how we look for a globally better prediction.
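A minimal beam search sketch over toy per-step scores (a stand-in for the parser's or RNN's output distribution); only the beam_width best partial sequences survive each step:

    import math

    def beam_search(step_scores, beam_width=2):
        beams = [([], 0.0)]                                  # (token sequence, cumulative log prob)
        for scores in step_scores:
            candidates = []
            for seq, logp in beams:
                for token, p in scores.items():
                    candidates.append((seq + [token], logp + math.log(p)))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams = candidates[:beam_width]                  # pruning
        return beams

    steps = [{'a': 0.6, 'b': 0.4}, {'c': 0.55, 'd': 0.45}, {'e': 0.9, 'f': 0.1}]
    print(beam_search(steps, beam_width=2))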
  95. 95. Language Analysis - Syntactic Analysis - Syntax Net http://universaldependencies.org/ Google SyntaxNet with Deep Learning - How about Korean? Google SyntaxNet does not support Korean as a default language, but as we can see below, we can train the model with the Sejong corpus data, though we have to convert the format into something SyntaxNet understands.
  96. 96. Language Analysis - Syntactic Analysis - Syntax Net Demo Site (we also use samples on this site) http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm SyntaxNet Korean with Docker (We pretrained Korean corpus and set up webserver for service) https://github.com/TensorMSA/tensormsa_syntax_docker Google SyntaxNet with Deep Learning - Test it by yourself
  97. 97. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis
  98. 98. Language Analysis - Semantic Analysis What is semantics in the study of language? Three perspectives on meaning - Lexical semantics: individual words - Sentential semantics: individual sentences - Discourse or pragmatics: longer pieces of text or conversation. NLP tasks for sentential semantics - Semantic role labeling (SRL) - Phrase similarity (= paraphrase) - Sentence classification, sentence emotion analysis, etc.
  99. 99. Language Analysis - Semantic Analysis - SRL What is Semantic Role Labeling (SRL)? Semantic roles express the abstract role that the arguments of a predicate can take in the event: The police arrested the suspect in the park last night — Agent, predicate, Theme, Location, Time: who did what to whom, where, when. Can we figure out that these sentences have the same meaning? Can we figure out that bought, sold, and purchase are used in sentences with the same meaning? XYZ corporation bought the stock. They sold the stock to XYZ corporation. The stock was bought by XYZ corporation. The purchase of the stock by XYZ corporation.
  100. 100. Language Analysis - Semantic Analysis - SRL Common Semantic Role Labeling Architecture http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf Syntactic Parse → Prune Constituents → Argument Identification → Argument Classification → Structural Inference → Semantic roles. Step 1, Candidate Selection: parse the sentence, prune/filter the parse tree (eliminate some tree constituents to speed up execution). Step 2, Argument Identification: a binary classification of each node as Argument or NONE (local scoring). Step 3, Argument Classification: a multi-class (one-of-N) classification of all the argument candidates (global/joint scoring).
  101. 101. Language Analysis - Semantic Analysis - SRL Exceptions to the Standard Architecture 1. Specialized parsing for SRL - Syntactic parser trained to predict argument candidates - Semantic parsing = parsing + SRL - SRL based on dependency parsing 2. Sequential labeling (instead of tree traversing) - Motivated by Lack of full parse trees
  102. 102. Language Analysis - Semantic Analysis - SRL Semantic Role Labeling Applications Information: Anna is a friend of mine (Name, Relation, Name). http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb Neo4j insert query, Jupyter example & visualization:
    session.run("MATCH (you:Person {name:'You'})"
                "FOREACH (name in ['Anna'] |"
                " CREATE (you)-[:FRIEND]->(:Person {name:name}))")
    result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends)"
                         "RETURN you, yourFriends")
  103. 103. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis
  104. 104. Language Analysis - Semantic Analysis - Text Classification Can we figure out whether these sentences are positive or negative? 돈이 아깝지 않다 (positive: “it wasn’t a waste of money”) 다시는 오지 않을 거야 (negative: “I’ll never come back”) 음식이 정말 맛이 없다 (negative: “the food is really bad”) 이 식당은 정말 맛있다 (positive: “this restaurant is really good”). Analyzing negative and positive with a dictionary: the word “않다” is usually negative, but 돈이 아깝지 않다 => Positive while 다시는 오지 않을 거야 => Negative.
  105. 105. Language Analysis - Semantic Analysis - Text Classification There are many ways of doing text classification: traditional rule-based approaches, machine learning (logistic regression & SVM), and deep learning (CharCNN, RNN, etc.).
  106. 106. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Deep Learning Method: CharCNN can be a solution for this kind of problem.
  107. 107. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Preparing data for embedding is pretty similar to other neural networks, e.g. 오늘 메뉴 는 뭐 지? PAD PAD: 1. You need to define a maximum sentence length. 2. You need padding, like other NLP neural networks. Notes: 1. Word embedding & one-hot didn’t show much difference. 2. Personally, I prefer to concatenate char one-hot + word2vec.
  108. 108. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Using Multi Convolution Filter Size
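A minimal sketch of the multi-filter convolution idea, assuming TensorFlow 1.x and already-embedded character inputs (filter sizes and dimensions are toy assumptions; the charcnn notebook is the reference):

    import tensorflow as tf

    seq_len, embed_dim, num_classes, num_filters = 20, 32, 2, 16
    x = tf.placeholder(tf.float32, [None, seq_len, embed_dim])   # embedded (and padded) characters
    y = tf.placeholder(tf.int32, [None])
    x4 = tf.expand_dims(x, -1)                                   # [batch, seq, dim, 1] for conv2d

    pooled = []
    for fs in [2, 3, 4, 5]:                                      # several n-gram-like filter sizes
        conv = tf.layers.conv2d(x4, num_filters, [fs, embed_dim], activation=tf.nn.relu)
        pool = tf.layers.max_pooling2d(conv, [seq_len - fs + 1, 1], [1, 1])   # max over time
        pooled.append(tf.reshape(pool, [-1, num_filters]))
    features = tf.concat(pooled, axis=1)                         # concatenate all filter outputs

    logits = tf.layers.dense(features, num_classes)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)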
  109. 109. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Other steps are same (fully connected > softmax > loss> optimizer)
  110. 110. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb You can see Char CNN can distinguish two sentences
  111. 111. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis 2-5-1.RNN for understand global Conversation 2-5-2.Memory Network for global context
  112. 112. Language Analysis - Dialogue Understand https://research.fb.com/publications Getting to a natural language dialogue state with a chatbot remains a challenge and will require a number of research breakthroughs. At FAIR we have chosen to tackle the problem from both ends: general AI and reasoning by machines through communication as well as conducting research grounded in current dialog systems, using lessons learned from exposing actual chatbots to people. The attempt to understand and interpret dialogue is not a new one. As far back as 20 years, there were several efforts to build a machine a person could talk to and teach how to have a conversation. These incorporated technology and engineering, but were single purposed with a very narrow focus, using pre-programmed scripted responses. Thanks to progress in machine learning, particularly in the last few years, having AI agents being able to converse with people in natural language has become a more realistic endeavor that is garnering attention from both the research community and industry. However, most of today’s dialogue systems continue to be scripted: their natural language understanding module may be based on machine learning, but what they execute or answer is in general dictated by if/then statements or rules engines. While they are improvement on what existed decades ago, it is in large part due to the large databases of content used to create and script their responses. Amazing free papers!! read it right now!
  113. 113. Discourse Analysis with RNN In conversation the topic changes often, so keeping track of the topic of the conversation is important. Example dialogue: 안녕 / 안녕 (State: initial) · 넌 뭐할줄 아니? / 기능은 XX 가 있어요 (State: help) · 사람 좀 찾아볼까해 / 누구를 찾아드려요? · 포항 제강부 IT담당 홍길동 팀장의 그룹장을 좀 찾아줘 / (지역:포항), 부서(제강부), 업무(IT), 이름(홍길동), 직급(팀장), 상위자(그룹장) 을 검색합니다. (State: person search) · 내일 점심 먹자고 문자 보내줘 / “내일 점심 먹자고” 로 전송합니다. (State: send an SMS to the person found) · 아냐. 수고했어. 나가서 먹지 / 대화를 초기화 합니다. (State: back to initial)
  114. 114. Discourse Analysis with RNN Dialogue State Tracking Challenge and accepted papers: http://www.phontron.com/paper/yoshino16iwsds.pdf http://www.colips.org/workshop/dstc4/papers.html * Dialogue State Tracking using Long Short Term Memory Neural Networks — Koichiro Yoshino, Takuya Hiraoka, Graham Neubig and Satoshi Nakamura
  115. 115. Dialogue state tracking with LSTM Let’s predict the intent of each sentence in the conversation. The basic idea is to keep the RNN state info and continue prediction from that point: Doc2Vec → Intent at every turn along the timeline.
  116. 116. Discourse Analysis with RNN The key point of this code is using the RNN state vector as memory. http://localhost:8888/tree/chap05_nlp/state_tracking
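A minimal sketch of carrying the RNN state across turns, assuming TensorFlow 1.x with a GRU cell (a GRU keeps its state in a single tensor, which makes the feed-back loop easy to show; the state_tracking notebook is the reference):

    import numpy as np
    import tensorflow as tf

    dim, num_intents = 16, 5
    utter_vec = tf.placeholder(tf.float32, [None, 1, dim])    # one Doc2Vec-style vector per utterance
    prev_state = tf.placeholder(tf.float32, [None, dim])      # dialogue state carried between turns
    cell = tf.nn.rnn_cell.GRUCell(dim)
    outputs, next_state = tf.nn.dynamic_rnn(cell, utter_vec, initial_state=prev_state)
    intent = tf.argmax(tf.layers.dense(outputs[:, -1, :], num_intents), axis=1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        state = np.zeros([1, dim], dtype=np.float32)           # empty history at the start
        for turn in range(3):
            vec = np.random.rand(1, 1, dim).astype(np.float32) # stand-in for a real utterance vector
            state, pred = sess.run([next_state, intent],
                                   feed_dict={utter_vec: vec, prev_state: state})
            print(turn, pred)                                  # the state is fed back on the next turn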
  117. 117. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis 2-5-1.RNN for understand global Conversation 2-5-2.Memory Network for global context
  118. 118. Memory Network for Dialogue Understanding The goal of dialogue understanding and memory networks. https://arxiv.org/pdf/1503.08895v4.pdf
  119. 119. Memory Network for Dialogue Understanding Here is the network architecture of the end-to-end memory network. https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/ https://www.slideshare.net/mobile/carpedm20/ss-63116251
  120. 120. Memory Network for Dialogue Understanding (1) Feed data (“Sentences”, “Question”, “Target”)
  121. 121. Memory Network for Dialogue Understanding Convert word indexes to embedding vectors (the training target embedding matrices A, B, C, each of shape [vocab size, embedding dim], over a memory of mem-size sentences).
  122. 122. Memory Network for Dialogue Understanding Multiply embedding A of the given context sentences by the input question embedding (using embedding B, which is not defined in this code). ※ This holds for the first layer; otherwise it would be the output of the (t-1)-th layer.
  123. 123. Memory Network for Dialogue Understanding Set embedding C (in the code it is B); this is also a target variable for training.
  124. 124. Memory Network for Dialogue Understanding Multiply embedding C (in the code it is B) by the softmax result.
  125. 125. Memory Network for Dialogue Understanding For the last step, multiply the question embedding and the output of the memory network together again.
  126. 126. Memory Network for Dialogue Understanding Stack more memory layers.
  127. 127. Memory Network for Dialogue Understanding Set a fully connected layer and calculate the error with softmax cross-entropy.
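A minimal single-hop sketch of the end-to-end memory network walked through above, assuming TensorFlow 1.x (toy sizes; the course code differs in naming, e.g. its B plays the role of C here):

    import tensorflow as tf

    V, d, mem_size, sent_size = 30, 20, 10, 6                 # vocab, embedding dim, #memories, words/sentence
    stories = tf.placeholder(tf.int32, [None, mem_size, sent_size])
    question = tf.placeholder(tf.int32, [None, sent_size])
    answer = tf.placeholder(tf.int32, [None])

    A = tf.Variable(tf.random_normal([V, d], stddev=0.1))     # memory input embedding
    B = tf.Variable(tf.random_normal([V, d], stddev=0.1))     # question embedding
    C = tf.Variable(tf.random_normal([V, d], stddev=0.1))     # memory output embedding

    m = tf.reduce_sum(tf.nn.embedding_lookup(A, stories), 2)  # [batch, mem, d] memory vectors
    c = tf.reduce_sum(tf.nn.embedding_lookup(C, stories), 2)  # [batch, mem, d] output vectors
    u = tf.reduce_sum(tf.nn.embedding_lookup(B, question), 1) # [batch, d] question vector

    p = tf.nn.softmax(tf.reduce_sum(m * tf.expand_dims(u, 1), 2))   # attention over memories
    o = tf.reduce_sum(c * tf.expand_dims(p, 2), 1)                  # weighted memory output

    logits = tf.layers.dense(o + u, V)                        # final fully connected layer over the vocab
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=answer, logits=logits))
    train_op = tf.train.AdamOptimizer(0.01).minimize(loss)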
  128. 128. Memory Network for Dialogue Understanding In the given code I removed 90% of the data set because we are using a CPU for this class, so the results may be poor.
  129. 129. Memory Network for Dialogue Understanding bAbI test results (comparing DMN & MemNN). https://research.fb.com/downloads/babi/
  130. 130. Memory Network for Dialogue Understanding Other types of memory networks: Dynamic Memory Networks (episodic memory). https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/ https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano
  131. 131. 1.NLP & Deep Learning 2.Language Analysis Process 3.Language Generation 3-1.Basic Seq2Seq 3-2.Other types of Seq2Seq (Attention, Pointer)
  132. 132. Response Generator - Seq2Seq Model The Seq2Seq model can be applied to any case where both input and output are sequence data — machine translation, summarization, simple question answering — and with a simple trick it can also be used to generate responses. - Input: 딥 러닝 재미 즐거운 일 - Output: 딥 러닝은 재미있고 즐거운 일이다 https://arxiv.org/pdf/1406.1078.pdf https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
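A minimal encoder-decoder sketch of the basic Seq2Seq idea, assuming TensorFlow 1.x (teacher forcing with a GO-shifted decoder input; toy sizes, no attention):

    import tensorflow as tf

    V, d = 100, 64
    enc_in = tf.placeholder(tf.int32, [None, None])            # source word ids
    dec_in = tf.placeholder(tf.int32, [None, None])            # target shifted right, starting with GO
    dec_out = tf.placeholder(tf.int32, [None, None])           # target word ids to predict

    emb = tf.Variable(tf.random_uniform([V, d], -1.0, 1.0))
    with tf.variable_scope('encoder'):
        _, enc_state = tf.nn.dynamic_rnn(
            tf.nn.rnn_cell.BasicLSTMCell(d), tf.nn.embedding_lookup(emb, enc_in), dtype=tf.float32)
    with tf.variable_scope('decoder'):                          # decoder starts from the encoder state
        dec_outputs, _ = tf.nn.dynamic_rnn(
            tf.nn.rnn_cell.BasicLSTMCell(d), tf.nn.embedding_lookup(emb, dec_in),
            initial_state=enc_state)

    logits = tf.layers.dense(dec_outputs, V)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=dec_out, logits=logits))
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)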
  133. 133. Response Generator - Seq2Seq Model Variant forms of Seq2Seq: the attention mechanism and the pointer network. https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264 ※ Details are omitted here since they will be covered in depth in the next lecture. http://localhost:8888/tree/chap05_nlp/attention_seq2seq
  134. 134. Conclusion In the end, Natural Language Processing is a huge combination of “existing NLP algorithms”, “deep learning algorithms”, and various kinds of “software architecture”: existing NLP theory + deep learning theory + software architecture.
  135. 135. Conclusion Let’s connect everything discussed so far into one example: Web Document → Web Crawler → Lexical Analysis → Syntactic Analysis → Semantic Analysis → Ontology (with human filtering) as the information source; and on the serving side, IN → Lexical Analysis → Syntactic Analysis → Semantic Analysis → Dialogue Analysis → Web Server → Response Generation → OUT.
  136. 136. 4.Tips 4-1.Hyper Parameter Random Search 4-2.Genetic Algorithm 4-3.Using multiple GPU Server
  137. 137. Hyper Parameter Optimization An explanation of the genetic algorithm for hyperparameter search: sets of graph flows are drawn from the hyperparameter range and refined through hyperparameter random search, a genetic algorithm, and approximation.
  138. 138. Hyper Parameter Optimization An explanation of hyperparameter random search. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf “In this more challenging optimization problem random search is still effective, but not superior as it was in the case of neural network optimization. Comparing to the 3-layer DBN results in Larochelle et al. (2007), random search found a better model than the manual search in one data set (convex), an equally good model in four (mnist basic, mnist rotated, rectangles, and rectangles images), and an inferior model in three (mnist background images, mnist background random, mnist rotated background images).”
  139. 139. Hyper Parameter Optimization [1Layer] - Grid vs Random [3Layer] - Grid+Manual vs Random
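A minimal random search sketch; train_and_score is a hypothetical stand-in for training a model and returning its validation score:

    import random

    space = {'lr': [0.1, 0.01, 0.001, 0.0001],
             'hidden': [64, 128, 256, 512],
             'dropout': [0.3, 0.5, 0.7]}

    def train_and_score(params):
        return random.random()                       # replace with real training + evaluation

    best = None
    for _ in range(20):                              # 20 random trials instead of the full grid
        params = {k: random.choice(v) for k, v in space.items()}
        score = train_and_score(params)
        if best is None or score > best[1]:
            best = (params, score)
    print(best)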
  140. 140. Hyper Parameter Optimization Genetic Algorithm on Hyperparameter Optimization (Approximation) https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164 Let’s say it takes five minutes to train and evaluate a network on your dataset. And let’s say we have four parameters with five possible settings each. To try them all would take (5**4) * 5 minutes, or 3,125 minutes, or about 52 hours. Now let’s say we use a genetic algorithm to evolve 10 generations with a population of 20 (more on what this means below), with a plan to keep the top 25% plus a few more, so ~8 per generation. This means that in our first generation we score 20 networks (20 * 5 = 100 minutes). Every generation after that only requires around 12 runs, since we don’t have to score the ones we keep. That’s 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or 11 hours. https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html You can also use multi-GPU cluster servers for hyperparameter random search.
  141. 141. Hyper Parameter Optimization Let’s see how hyperparameter optimization with a genetic algorithm works. http://localhost:8888/tree/chap05_nlp/automl
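A minimal genetic-algorithm sketch over the same kind of hyperparameter space; train_and_score is again a hypothetical stand-in, and the automl notebook is the reference:

    import random

    space = {'lr': [0.1, 0.01, 0.001], 'hidden': [64, 128, 256], 'layers': [1, 2, 3]}

    def random_individual():
        return {k: random.choice(v) for k, v in space.items()}

    def train_and_score(params):
        return random.random()                        # replace with real training + evaluation

    def evolve(population, retain=0.25, mutate_prob=0.2):
        ranked = sorted(population, key=train_and_score, reverse=True)
        parents = ranked[:max(2, int(len(ranked) * retain))]       # keep roughly the top 25%
        children = []
        while len(parents) + len(children) < len(population):
            mom, dad = random.sample(parents, 2)
            child = {k: random.choice([mom[k], dad[k]]) for k in space}   # crossover
            if random.random() < mutate_prob:
                gene = random.choice(list(space))
                child[gene] = random.choice(space[gene])                  # mutation
            children.append(child)
        return parents + children

    population = [random_individual() for _ in range(10)]
    for generation in range(5):
        population = evolve(population)
    print(population[0])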
  142. 142. Goal of the next lecture: This lecture covered the data and the models needed to apply deep learning from an NLP perspective. In the next session we will gather these ingredients and cover how to apply and combine them from an architecture perspective. Thank you.
