An Overview of Natural Language Processing

Published at the AI Tech 2019 2nd meetup


  1. An Overview of Natural Language Processing. 2019.2.17, 杜岳華
  2. Empiricism vs. rationalism. Empiricism: statistics-based. Rationalism: rule-based.
  3. Outline: Markov's research; automata, the development of rationalism; computers and programming languages; probability-based natural language processing; problems in natural language processing; recent advances in natural language technology (Issues, Neural Language Model, Distributed Representations, Neural Networks in NLP, Unsupervised Learning, Deep Generative Models, Other networks).
  4. Markov's research: analyzing literary works from a mathematical angle. Markov studied the verse novel Eugene Onegin by the Russian poet Aleksander Sergeyevich Pushkin (1799-1837), tabulating vowel/consonant transition frequencies: 0.128, 0.872, 0.337, 0.663 (arranged into a matrix on the next slide).
  5. Markov's research. The transition probabilities P(vowel → vowel), P(vowel → consonant), P(consonant → vowel), P(consonant → consonant) form the matrix (rows: current letter, vowel then consonant; columns: next letter)
         M = [ 0.128  0.872 ]
             [ 0.337  0.663 ]
     Ref. A Brief Introduction of Markov's Life: Part 1 (http://highscope.ch.ntu.edu.tw/wordpress/?p=51032); Part 2 (http://highscope.ch.ntu.edu.tw/wordpress/?p=51034)
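A minimal sketch of this chain in code (NumPy assumed; the matrix is the one above, and the stationary-distribution computation is standard linear algebra rather than anything from the slides):

```python
import numpy as np

# Transition matrix from the slide: state 0 = vowel, state 1 = consonant;
# M[i, j] = P(next letter is j | current letter is i).
M = np.array([[0.128, 0.872],
              [0.337, 0.663]])

def simulate(steps, state=0, seed=0):
    """Sample a vowel/consonant sequence by walking the chain."""
    rng = np.random.default_rng(seed)
    seq = [state]
    for _ in range(steps):
        state = rng.choice(2, p=M[state])
        seq.append(state)
    return seq

print(simulate(10))

# The long-run vowel/consonant frequencies solve pi = pi @ M:
eigvals, eigvecs = np.linalg.eig(M.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print(pi / pi.sum())  # roughly [0.28, 0.72] for this matrix
```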
  6. Automata: the development of rationalism.
  7. Finite state machine. [state diagram: states S₁ and S₂ with transitions on 0 and 1]
     Alphabet: Σ = {a, b, c, d, e}
     States: S = {s0, s1, ..., sn}
     Initial state: s0 ∈ S
     State-transition function: δ : S × Σ → S
     Set of final states: F ⊆ S
  8. Finite state machine: a turnstile. [state diagram: Locked and Unlocked, with Coin and Push transitions]
     Actions (alphabet): Σ = {Coin, Push}
     States: S = {Locked, Unlocked}
     Ref. Finite-state machine (https://en.wikipedia.org/wiki/Finite-state_machine)
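To make the turnstile concrete, here is a minimal sketch of its transition table as a plain Python dictionary (state and action names follow the slide):

```python
# Transition function delta : S x Sigma -> S for the turnstile FSM.
TRANSITIONS = {
    ("Locked", "Coin"): "Unlocked",    # inserting a coin unlocks
    ("Locked", "Push"): "Locked",      # pushing while locked does nothing
    ("Unlocked", "Push"): "Locked",    # passing through re-locks
    ("Unlocked", "Coin"): "Unlocked",  # extra coins change nothing
}

def run(actions, state="Locked"):
    """Feed a sequence of actions to the machine and return the final state."""
    for action in actions:
        state = TRANSITIONS[(state, action)]
    return state

print(run(["Coin", "Push", "Push"]))  # -> Locked
```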
  9. The sequences of actions a finite state machine accepts form strings.
  10. Formal language.
      Alphabet: Σ = {a, b, c, d, e}
      Sentences: abc, abb, a, bcdaeadea
      A language (here finite): L = {abc, abb, a, bcdaeadea, daea}; abck ∉ L
  11. String operations. Let w = a1 a2 ⋯ an and v = b1 b2 ⋯ bn.
      Concatenation: wv = a1 a2 ⋯ an b1 b2 ⋯ bn
      Reverse: w^R = an ⋯ a2 a1
      Length: |w| = n
      Empty string: λ
  12. String operations.
      Repeat: w^n = ww⋯w (n copies), w^0 = λ
      The * operator: for Σ = {a, b}, Σ* = {λ, a, b, aa, ab, ba, bb, ⋯}
      The + operator: Σ⁺ = Σ* − {λ} = {a, b, aa, ab, ba, bb, ⋯}
  13. Formal language. A grammar is the set of rules that generates a language. Sentence = noun + verb: "Car run.", "Mary cry.", ...
      Grammar: S (sentence) → A (noun) B (verb); A → car | Mary; B → run | cry
  14. Formal grammar.
      Finite set of variables: V = {S, A, B}
      Finite set of terminals: T = {car, Mary, run, cry}
      Start variable: S
      A finite set of production rules.
      Example: S → aSb, S → λ gives L = {λ, ab, aabb, aaabbb, ...}
  15. Regular grammar. S → aS | bA; A → cA | λ, which derives strings of the form a^n b c^m. Example strings: ab, abccc, aaabcc, cb, bcccc, aaabbcc (the membership check below sorts them).
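Because the grammar derives exactly a^n b c^m, a regular expression can decide membership; a small sketch using Python's re module:

```python
import re

# S -> aS | bA, A -> cA | lambda derives strings of the form a^n b c^m,
# so the regex a*bc* decides membership in the language.
LANG = re.compile(r"a*bc*")

for s in ["ab", "abccc", "aaabcc", "cb", "bcccc", "aaabbcc"]:
    print(s, bool(LANG.fullmatch(s)))
# ab, abccc, aaabcc, bcccc are in the language; cb and aaabbcc are not.
```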
  16. Computers and programming languages.
  17. A regular grammar (e.g. S → a | aB | λ) produces a regular language L. Regular expressions describe such languages:
      [A-Z]\d{9} matches "A123456789"
      09\d{8} matches "0912345678"
      \d{4}-\d{2}-\d{2} matches "1996-08-06"
      .*@gmail.com matches "test@gmail.com"
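The same patterns, checked with Python's re module (re.fullmatch requires the whole string to match; the e-mail pattern's dot is escaped here, a small tightening of the slide's version):

```python
import re

# Pattern -> example string, as on the slide.
patterns = {
    r"[A-Z]\d{9}": "A123456789",         # ID-number style string
    r"09\d{8}": "0912345678",            # Taiwanese mobile number
    r"\d{4}-\d{2}-\d{2}": "1996-08-06",  # date
    r".*@gmail\.com": "test@gmail.com",  # naive e-mail check
}

for pattern, example in patterns.items():
    print(pattern, "->", bool(re.fullmatch(pattern, example)))
```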
  18. A regular language L can be accepted by a deterministic finite automaton (DFA). [state diagram: states S₁ and S₂ with transitions on 0 and 1]
  19. Why can computers understand programming languages? Grammar ⇔ Language ⇔ Automata: a formal grammar corresponds to a formal language, which corresponds to an automaton. Likewise, a programming language is defined by a formal grammar, and the compiler plays the automaton's role for the computer.
  20. Chomsky hierarchy:
      Regular grammar ⇔ Regular language ⇔ DFA
      Context-free grammar ⇔ Context-free language ⇔ Pushdown automaton
      Context-sensitive grammar ⇔ Context-sensitive language ⇔ Linear bounded automaton
      Unrestricted grammar ⇔ Recursively enumerable language ⇔ Turing machine
  21. Chomsky hierarchy. Ref. Artificial grammar learning meets formal language theory: an overview (https://openi.nlm.nih.gov/detailedresult.php?img=PMC3367694_rstb20120103-g2&req=4)
  22. Noam Chomsky. Ref. Noam Chomsky: the contemporary master who dominates both linguistics and computer science (https://hk.thenewslens.com/article/72714); Wikipedia - Noam Chomsky (https://en.wikipedia.org/wiki/Noam_Chomsky)
  23. [figure]
  24. Probability-based natural language processing.
      1950s-1960s, the rise of empiricism:
      Naive Bayes - 1951
      Brown Corpus - 1961, Brown University (the world's first corpus)
      Maximum Entropy - 1963
      Hidden Markov Model (HMM) - 1966
      Viterbi algorithm - 1967
      1990s, empiricism flourishes; probability and data become the standard approach:
      N-gram model - 1992
      Probabilistic Latent Semantic Analysis (PLSA) - 1999
      Conditional Random Fields (CRF) - 2001
  25. Problems in natural language processing.
  26. Word: word segmentation.
      Text: "The cat sat on the mat."
      ↓
      Tokens: "The", "cat", "sat", "on", "the", "mat", "."
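A minimal sketch of this segmentation with a regular expression (this works for whitespace-delimited languages; Chinese segmentation needs dictionaries or statistical models):

```python
import re

text = "The cat sat on the mat."
# Runs of word characters form one token; each punctuation mark is its own token.
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)  # ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
```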
  27. Word: stemming and lemmatization.
      "fishing", "fished", "fish", "fisher" → "fish"
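One way to reproduce this mapping is with NLTK (an assumed tool, not named on the slides; nltk must be installed and the WordNet data fetched with nltk.download("wordnet")):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()          # rule-based suffix stripping
lemmatizer = WordNetLemmatizer()   # dictionary lookup against WordNet

for word in ["fishing", "fished", "fish", "fisher"]:
    print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))
```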
  28. Word: terminology extraction. Ref. FiveFilters (https://fivefilters.org/term-extraction/)
  29. [figure]
  30. Syntax: part-of-speech tagging (POS tagging).
  31. Ref. Parts of speech and functions: "Bob made a book collector happy the other day" (https://english.stackexchange.com/questions/218058/parts-of-speech-and-functions-bob-made-a-book-collector-happy-the-other-day)
  32. Syntax: parsing.
      Dependency parsing: the relationships between words in a sentence (marking things like primary objects and predicates).
      Constituency parsing: building out the parse tree using a probabilistic context-free grammar (PCFG).
  33. Ref. Parsing (https://www.cs.bgu.ac.il/~elhadad/nlp11/nlp03.html)
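A dependency-parsing sketch with spaCy (assumed tooling, not from the slides; requires `python -m spacy download en_core_web_sm` first):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

# Each token is linked to its syntactic head with a dependency label.
for token in doc:
    print(f"{token.text:<5} --{token.dep_}--> {token.head.text}")
```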
  34. Semantics: named entity recognition (NER).
  35. Ref. Named Entity Recognition: Milestone Models, Papers and Technologies (https://blog.paralleldots.com/data-science/named-entity-recognition-milestone-models-papers-and-technologies/)
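A matching NER sketch, again with spaCy as the assumed tool (same model as above):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# The pipeline marks entity spans and their types.
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. Apple/ORG, U.K./GPE, $1 billion/MONEY
```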
  36. Semantics: textual entailment.
      Text: "The sun is strong today" / "It is not raining today"
      Hypothesis "It is hot today": positive TE (entailment)
      Hypothesis "It is raining today": negative TE (contradiction)
      Hypothesis "It will not rain tomorrow": non-TE (independent)
  37. Semantics: relationship extraction.
  38. Ref. Natural language question answering over RDF: a graph data driven approach (https://www.researchgate.net/figure/Relationship-Extraction-DEFINITION-5-Let-us-consider-a-dependency-tree-Y-of-a-natural_fig2_266656635)
  39. Semantics: sentiment analysis.
  40. Ref. Social media sentiment analysis: 4 ways models can improve marketing (http://www.simafore.com/blog/bid/113465/Social-media-sentiment-analysis-4-ways-models-can-improve-marketing)
  41. Natural language understanding. [diagram placing tasks on the NLP-NLU spectrum] Tasks shown: named entity recognition (NER), part-of-speech tagging (POS tagging), reading comprehension, topic modeling, machine translation, dialogue systems, QA systems, summarization, textual entailment, word segmentation, parsing, sentiment analysis, relationship extraction, topic segmentation.
  42. NLP & NLU.
      Natural Language Processing (NLP): words, syntax (grammar), semantics (meaning).
      Natural Language Understanding (NLU), for the utterance "My tire blew out":
      Literal meaning, what the words directly convey: my tire is damaged.
      Intended meaning, behind the words: my tire is unusable, so the car cannot be used.
      Intent, what the speaker wants the agent to do: help arrange a tire replacement.
      Context.
      Ref. Natural Language Processing (NLP) vs. Natural Language Understanding (NLU) (https://blog.csdn.net/ZLJ925/artic...)
  43. Natural language generation. Ref. The Ultimate Guide to Natural Language Generation (https://medium.com/@AutomatedInsights/the-ultimate-guide-to-natural-language-generation-bdcb457423d6)
  44. Application: machine translation.
  45. Application: summarization. Ref. Text Summarization based on Semantic Graph (http://www.nlp.cs.ucf.edu/research/)
  46. [figure]
  47. Application: question answering. Ref. Automatic Question Answering (https://towardsdatascience.com/automatic-question-answering-ac7593432842)
  48. Application: dialogue system. Ref. Expert analysis: three key chatbot technologies and the latest research directions (https://www.ithome.com.tw/news/113445)
  49. [figure]
  50. Related fields of linguistics.
      Syntax: the study of the rules of grammatical structure.
      Semantics: finding the regularities of semantic expression, their internal explanations, and what is language-specific versus universal in how meaning is expressed.
      Pragmatics: how context influences and contributes to meaning.
      Psycholinguistics: how language is represented and operates in the mind.
      Neurolinguistics: how language is represented in the brain.
      Computational linguistics: finding the regularities of natural language and building computational models, so that computers can ultimately analyze, understand, and process natural language as humans do.
      Ref. Wikipedia - Linguistics (https://zh.wikipedia.org/wiki/%E8%AF%AD%E8%A8%80%E5%AD%A6)
  51. Recent advances in natural language technology.
  52. Issues: Chinese word segmentation, part-of-speech tagging, parsing, natural language generation, text categorization, information retrieval, information extraction, text proofing, question answering, machine translation, automatic summarization, reading comprehension.
  53. Yoshua Bengio. Full Professor, Department of Computer Science and Operations Research, Université de Montréal; Canada Research Chair in Statistical Learning Algorithms.
  54. Neural Language Model - 2001. [architecture diagram: a shared look-up table C maps the context words w_{t-n+1}, ..., w_{t-2}, w_{t-1} to embeddings C(w_{t-n+1}), ..., C(w_{t-1}); these feed a tanh hidden layer and a softmax output layer, where the i-th output = P(w_t = i | context)]
      Bengio et al., A Neural Probabilistic Language Model, NIPS Proceedings, 2001; Journal of Machine Learning Research (JMLR), 2003.
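A minimal PyTorch sketch of this architecture (all dimensions are made up, and the paper's direct input-to-output connections are omitted):

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """Embed the n-1 context words, concatenate, tanh, softmax over the vocab."""
    def __init__(self, vocab=10_000, context=3, embed=100, hidden=128):
        super().__init__()
        self.C = nn.Embedding(vocab, embed)            # shared look-up table
        self.h = nn.Linear(context * embed, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, context_ids):                    # (batch, context)
        x = self.C(context_ids).flatten(start_dim=1)   # concatenated embeddings
        return self.out(torch.tanh(self.h(x)))         # logits over the vocabulary

model = NPLM()
logits = model(torch.randint(0, 10_000, (2, 3)))       # two 3-word contexts
probs = logits.softmax(dim=-1)                         # i-th entry = P(w_t = i | context)
```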
  55. Distributed representations. Goal: to embed syntactic or semantic information in a distributed representation (a vector). Why and where can we find such information? What defines the meaning of a word? Context! Approaches: word embeddings, character embeddings, contextualized word embeddings.
  56. Word2vec - 2013. Ref. Vector Representations of Words (https://www.tensorflow.org/tutorials/representation/word2vec)
  57. Word2vec. Compositionality: queen = king − man + woman. Ref. Vector Representations of Words (https://www.tensorflow.org/tutorials/representation/word2vec)
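The analogy can be checked with gensim's pretrained vectors (an assumed setup, not from the slides; the model download is large, roughly 1.6 GB):

```python
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained KeyedVectors
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" is typically the nearest neighbor of king - man + woman
```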
  58. Word2vec. [figure]
  59. Ref. Distributed Representations of Words and Phrases and their Compositionality (https://dl.acm.org/citation.cfm?id=2999959); Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/abs/1301.3781)
  60. Character embeddings. They capture intra-word morphological and shape information, which can be useful for parts-of-speech (POS) tagging and named-entity recognition (NER). Santos and Guimaraes [31] applied character-level representations, along with word embeddings, for NER, achieving state-of-the-art results on Portuguese and Spanish corpora. Advantage: handles out-of-vocabulary (OOV) words.
  61. Contextualized word embeddings. Disadvantage of global word embeddings (Word2vec and GloVe): polysemy, e.g.
      1. "The bank will not be accepting cash on Saturdays."
      2. "The river overflowed the bank."
      Deep contextual word embeddings: Embeddings from Language Models (ELMo) extracts the intermediate-layer representations from the biLM.
      Pre-trained language models: ELMo, Generative Pre-Training (GPT).
  62. Neural networks in NLP: convolutional neural networks, recurrent neural networks, recursive neural networks.
  63. Convolutional Neural Networks (CNN) in NLP. Purpose: extract higher-level features from constituting words or n-grams. Tasks: sentiment analysis, summarization, machine translation, question answering (QA).
  64. CNN in NLP: able to extract salient n-gram features to create an informative latent semantic representation. [architecture diagram: input sentence w₁ ... w_{N−1} → look-up table → convolution layer (features 1 ... k) → max-pool over time → fully connected layer → softmax classification]
  65. Basic CNN. [figure]
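A minimal PyTorch sketch of such a text CNN (in the spirit of Kim's 2014 sentence-classification model; all sizes are illustrative):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Embeddings -> parallel n-gram convolutions -> max-pool over time -> classifier."""
    def __init__(self, vocab=10_000, embed=128, filters=100, classes=2):
        super().__init__()
        self.lookup = nn.Embedding(vocab, embed)
        # One Conv1d per n-gram width: 2-, 3-, and 4-gram feature extractors.
        self.convs = nn.ModuleList(nn.Conv1d(embed, filters, k) for k in (2, 3, 4))
        self.fc = nn.Linear(3 * filters, classes)

    def forward(self, ids):                        # (batch, seq_len)
        x = self.lookup(ids).transpose(1, 2)       # (batch, embed, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # class logits

logits = TextCNN()(torch.randint(0, 10_000, (4, 20)))  # batch of 4 sentences
```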
  66. Applications. Convolutional kernels act as specific n-gram feature extractors.
      Tasks: sentence classification, sentiment classification, subjectivity classification, question type classification.
      Time-delay neural network (TDNN): convolutions are performed across all windows throughout the sentence at the same time.
      Dynamic multi-pooling CNN (DMCNN): dynamic k-max pooling.
  67. Recurrent Neural Networks (RNN) in NLP. Idea and purpose: processing sequential information; encoding a range of sequential information into a fixed-size vector; the output depends on previous results and the current input. Applications: language models, machine translation, speech recognition, image captioning.
  68. RNN. The recurrence: h_t = tanh(U x_t + W h_{t-1}). Ref. Everything you need to know about Recurrent Neural Networks (https://medium.com/ai-journal/lstm-gru-recurrent-neural-networks-81fe2bcdf1f9)
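The recurrence, unrolled in a few lines of NumPy (toy dimensions; bias terms omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 4))          # input-to-hidden weights
W = rng.normal(size=(8, 8))          # hidden-to-hidden weights

h = np.zeros(8)                      # initial hidden state
for x_t in rng.normal(size=(5, 4)):  # a length-5 sequence of 4-d inputs
    h = np.tanh(U @ x_t + W @ h)     # h_t = tanh(U x_t + W h_{t-1})
print(h)                             # fixed-size summary of the whole sequence
```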
  69. RNN. Advantages: able to capture the inherent sequential nature of language; able to model variable-length data; performs time-distributed joint processing. Disadvantage: the vanishing gradient problem.
  70. Long Short-Term Memory (LSTM): overcomes the vanishing and exploding gradient problems.
  71. Ref. Simple RNN vs GRU vs LSTM: Difference lies in More Flexible control (https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)
  72. Gated Recurrent Unit (GRU): a less complex, more efficient recurrent unit than the LSTM.
  73. Ref. Simple RNN vs GRU vs LSTM: Difference lies in More Flexible control (https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)
  74. RNN for word-level classification: bidirectional LSTM for NER. [diagram: concatenated backward/forward hidden states [h_b; h_f] over the sentence "This is a book"]
      RNN for sentence-level classification: LSTM for sentiment classification.
  75. RNN for generating language. [diagram: image captioning, where a CNN encodes the image and an LSTM emits words w₁ ... w_{N−1} with output probabilities p₁ ... p_{N−1}, trained against the true image description]
  76. Types of RNN. Ref. Recurrent Neural Networks in DL4J (https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent)
  77. Sequence-to-sequence model - 2014. Ref. Sequence to sequence model: Introduction and concepts (https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d)
  78. Attention Mechanism - 2015. Ref. Attention? Attention! (https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)
  79. Attention Mechanism. Ref. Attention? Attention! (https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)
  80. Attention Mechanism. [figure]
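A minimal sketch of scaled dot-product attention, the core operation behind these mechanisms and the Transformer that follows (NumPy, toy sizes):

```python
import numpy as np

def attention(Q, K, V):
    """Each query attends over all keys; returns a weighted sum of the values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 16)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 16): one attended vector per query
```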
  81. Transformer. Issue addressed: the sequential processing at the encoding step; the Transformer is more parallelizable and requires less training time.
  82. Transformer. [figure]
  83. WaveNet - Google. Ref. WaveNet: A Generative Model for Raw Audio (https://deepmind.com/blog/wavenet-generative-model-raw-audio/)
  84. Recursive neural networks in NLP. Idea: a natural way to model sequences, since language exhibits a natural recursive structure; a compositional function computes the representation of a higher-level phrase from the representations of its constituent phrases or words.
  85. Unsupervised sentence representation learning: sentence encoders, seq2seq models; the encoder can be seen as a generic feature extractor. Ref. Sequence to sequence model: Introduction and concepts (https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d)
  86. Deep generative models. Purpose: to discover rich structure in natural language while generating realistic sentences from a latent code space. Variational autoencoders (VAEs); generative adversarial networks (GANs).
  87. Deep generative models. [diagram: a sequence VAE, where an LSTM encoder reads x₁ x₂ x₃ <EOS>, linear layers produce μ and σ for the latent z, and an LSTM decoder emits y₁ y₂ <EOS>]
  88. Other networks. Memory-augmented networks: Neural Turing Machines, Memory Networks. Reinforcement learning.
  89. Thank you for your attention. Q & A.
      References:
      Recent Trends in Deep Learning Based Natural Language Processing (https://arxiv.o...)
      Deep Learning for NLP: An Overview of Recent Trends (https://medium.com/dair-ai/...overview-of-recent-trends-d0d8f40a776d)
      8 major milestones in 15 years of natural language processing history (https://zhuanlan.zhihu.com/p/47239...)
      Exclusive: Understanding natural language processing (NLP) in one article, with learning resources (https://tw.saowen.com/a/0c1d7d1765b999218654702c1e4d2d0e71c5e138141e...)
