An Overview of Natural Language Processing
2019.2.17 杜岳華
Empiricism vs. Rationalism
Empiricism: statistics-based
Rationalism: rule-based
Outline
Markov's research
Automata: the development of rationalism
Computers and programming languages
Probability-based natural language processing
Problems in natural language processing
Recent advances in natural language technology
Issues
Neural Language Model
Distributed Representations
Neural Network in NLP
Unsupervised Learning
Deep Generative Models
Other networks
Markov's Research
Analyzing literary works from a mathematical perspective
Eugene Onegin, a novel in verse by the Russian poet Aleksander Sergeyevich Pushkin (1799~1837)
[Transition diagram between vowels (⺟⾳) and consonants (⼦⾳): P(vowel → vowel) = 0.128, P(vowel → consonant) = 0.872, P(consonant → vowel) = 0.663, P(consonant → consonant) = 0.337]
Markov's Research
P(vowel → vowel), P(vowel → consonant), P(consonant → vowel), P(consonant → consonant)

M = [ 0.128  0.872 ]
    [ 0.663  0.337 ]

(rows: current state, vowel then consonant; columns: next state, vowel then consonant)
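A minimal numpy sketch (not from the slides) of what this matrix buys us: iterating π ↦ πM converges to the stationary distribution, which recovers the roughly 43% vowel frequency Markov counted in the text.

    import numpy as np

    # Transition matrix from the slide (rows: current letter class,
    # columns: next letter class; order: vowel, consonant).
    M = np.array([[0.128, 0.872],
                  [0.663, 0.337]])

    # The stationary distribution pi satisfies pi @ M = pi; power iteration:
    pi = np.array([0.5, 0.5])
    for _ in range(100):
        pi = pi @ M
    print(pi)  # ~[0.432, 0.568]: about 43% vowels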
Ref. ⾺可夫⽣平簡介(1) (A Brief Introduction of Markov's Life: Part 1) (http://highscope.ch.ntu.edu.tw/wordpress/?p=51032)
Ref. ⾺可夫⽣平簡介(2) (A Brief Introduction of Markov's Life: Part 2) (http://highscope.ch.ntu.edu.tw/wordpress/?p=51034)
Automata: The Development of Rationalism
Finite state machine
[State diagram: two states S₁ and S₂ with transitions labeled 0 and 1]
Alphabet: Σ = {a, b, c, d, e}
States: S = {s₀, s₁, …, sₙ}
Initial state: s₀ ∈ S
State-transition function: δ : S × Σ → S
Set of final states: F ⊆ S
Finite state machine
[State diagram: a turnstile with states Locked and Unlocked; Coin moves Locked → Unlocked, Push moves Unlocked → Locked; Coin in Unlocked and Push in Locked are self-loops]
Actions:
Alphabet: Σ = {Coin, Push}
States: S = {Locked, Unlocked}
Example action sequence: P P C P
Ref. Finite-state machine (https://en.wikipedia.org/wiki/Finite-state_machine)
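A sketch of the turnstile written out as the 5-tuple from the previous slide; treating Locked as the single accepting state is an assumption made for illustration.

    SIGMA = {"Coin", "Push"}
    STATES = {"Locked", "Unlocked"}
    START = "Locked"
    DELTA = {
        ("Locked", "Coin"): "Unlocked",
        ("Locked", "Push"): "Locked",
        ("Unlocked", "Coin"): "Unlocked",
        ("Unlocked", "Push"): "Locked",
    }
    FINAL = {"Locked"}  # assumption: accept runs that leave the turnstile locked

    def accepts(actions):
        state = START
        for a in actions:
            state = DELTA[(state, a)]
        return state in FINAL

    print(accepts(["Push", "Push", "Coin", "Push"]))  # True (the P P C P sequence)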
The actions accepted by a finite state machine form strings
Formal language
Alphabet
Σ = {a, b, c, d, e}
Sentences
abc, abb, a, bcdaeadea
Language
L = {abc, abb, a, bcdaeadea, daea}
L is finite.
abck ∉ L
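When a language is given extensionally like this, membership is just set membership; a tiny Python illustration:

    SIGMA = {"a", "b", "c", "d", "e"}
    L = {"abc", "abb", "a", "bcdaeadea", "daea"}  # a finite language over SIGMA

    print("abc" in L)   # True
    print("abck" in L)  # False: abck ∉ L ('k' is not even in SIGMA)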
String operations
w = a₁a₂⋯aₙ, v = b₁b₂⋯bₙ
Concatenation
wv = a₁a₂⋯aₙb₁b₂⋯bₙ
Reverse
wᴿ = aₙ⋯a₂a₁
Length
|w| = n
Empty string
λ
String operations
Repeat
wⁿ = ww⋯w, w⁰ = λ
* operator
Σ = {a, b}
Σ* = {λ, a, b, aa, ab, ba, bb, ⋯}
+ operator
Σ = {a, b}
Σ⁺ = Σ* − {λ} = {a, b, aa, ab, ba, bb, ⋯}
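These operations map directly onto Python string operations; a small sketch (kleene_star and its max_len cutoff are illustrative helpers, since Σ* itself is infinite):

    from itertools import product

    w, v = "abc", "de"
    print(w + v)      # concatenation wv -> 'abcde'
    print(w[::-1])    # reverse w^R -> 'cba'
    print(len(w))     # length |w| -> 3
    lam = ""          # the empty string, lambda
    print(w * 3)      # repeat w^3 -> 'abcabcabc'; w * 0 == lam

    # Sigma* is infinite, so enumerate only up to a cutoff length.
    def kleene_star(sigma, max_len):
        for n in range(max_len + 1):          # n = 0 yields lambda
            for t in product(sorted(sigma), repeat=n):
                yield "".join(t)

    print(list(kleene_star({"a", "b"}, 2)))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
    # Sigma+ = Sigma* - {lambda}: the same enumeration with n starting at 1.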
Formal language
Grammar
The rules that generate a language
Sentence = Noun + Verb
Car run.
Mary cry.
...
Grammar
S (sentence) → A (noun) B (verb)
A → car | Mary
B → run | cry
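A sketch of a tiny random generator for this grammar (GRAMMAR and generate are illustrative names, not from the slides):

    import random

    # The grammar from this slide: S -> A B, A -> car | Mary, B -> run | cry.
    GRAMMAR = {
        "S": [["A", "B"]],
        "A": [["car"], ["Mary"]],
        "B": [["run"], ["cry"]],
    }

    def generate(symbol="S"):
        if symbol not in GRAMMAR:                  # terminal symbol
            return symbol
        production = random.choice(GRAMMAR[symbol])
        return " ".join(generate(s) for s in production)

    print(generate())  # e.g. 'Mary cry' or 'car run'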
Formal grammar
Grammar
Finite set of variables: V = {S, A, B}
Finite set of terminals: T = {car, Mary, run, cry}
Start variable: S
Finite set of production rules
Example
S → aSb
S → λ
L = {λ, ab, aabb, aaabbb, …}
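A sketch of a recognizer for this language: each application of S → aSb adds one a at the front and one b at the end, so membership can be checked by peeling matched ends until λ remains.

    # Recognizer for L = {a^n b^n : n >= 0}, generated by S -> aSb | lambda.
    def in_L(s):
        while s:
            if s[0] == "a" and s[-1] == "b":
                s = s[1:-1]                 # undo one S -> aSb step
            else:
                return False
        return True                         # reached lambda

    for s in ["", "ab", "aabb", "aab", "ba"]:
        print(repr(s), in_L(s))  # '', 'ab', 'aabb' -> True; 'aab', 'ba' -> False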
Regular grammar
S → aS | bA
A → cA | λ
This grammar generates exactly a*bc*, so:
ab ∈ L
abccc ∈ L
aaabcc ∈ L
cb ∉ L
bcccc ∈ L
aaabbcc ∉ L
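S produces the leading a's and the single b, and A produces the trailing c's, which is why a regular expression can check membership:

    import re

    # S -> aS | bA and A -> cA | lambda generate exactly a*bc*.
    pattern = re.compile(r"a*bc*")
    for s in ["ab", "abccc", "aaabcc", "cb", "bcccc", "aaabbcc"]:
        print(s, bool(pattern.fullmatch(s)))
    # ab, abccc, aaabcc, bcccc match; cb and aaabbcc do not.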
Computers and Programming Languages
Regular grammars generate regular languages
S → a | aB | λ
↓
L
Regular expression
[A-Z]\d{9}
"A123456789"
09\d{8}
"0912345678"
\d{4}-\d{2}-\d{2}
"1996-08-06"
.*@gmail.com
"test@gmail.com"
Regular languages are accepted by deterministic finite automata (DFA)
L
↓
[State diagram: two states S₁ and S₂ with transitions labeled 0 and 1]
Why can computers understand programming languages?
Grammar × Language × Automata
Grammar ⇔ Language ⇔ Automata
[Diagram: a formal grammar defines a formal language, which is accepted by an automaton; a programming language is processed by a compiler and run on a computer]
Chomsky hierarchy
Regular grammar ⇔ regular language ⇔ DFA
Context-free grammar ⇔ context-free language ⇔ pushdown automaton
Context-sensitive grammar ⇔ context-sensitive language ⇔ linear bounded automaton
Unrestricted grammar ⇔ recursively enumerable language ⇔ Turing machine
Chomsky hierarchy
Ref. Artificial grammar learning meets formal language theory: an overview (https://openi.nlm.nih.gov/detailedresult.php?img=PMC3367694_rstb20120103-g2&req=4)
Noam Chomsky
Ref. Noam Chomsky──稱霸語⾔與電腦科學的當代學術⼤師 (https://hk.thenewslens.com/article/72714)
Ref. Wikipedia - Noam Chomsky (https://en.wikipedia.org/wiki/Noam_Chomsky)
Probability-based natural language processing
1950s~1960s: the rise of empiricism
Naive Bayes - 1951
Brown Corpus - 1961, Brown University (the world's first corpus)
Maximum Entropy - 1963
Hidden Markov Model (HMM) - 1966
Viterbi algorithm - 1967
1990s: empiricism flourishes; probability and data become the standard approach
N-gram model - 1992
Probabilistic Latent Semantic Analysis (PLSA) - 1999
Conditional Random Fields (CRF) - 2001
Problems in Natural Language Processing
Word
Word segmentation
Text: "The cat sat on the mat."
↓
Tokens: "The", "cat", "sat", "on", "the", "mat", "."
Word
Stemming (詞幹提取) & Lemmatization (詞形還原)
"fishing", "fished", "fish", "fisher" → "fish"
Word
Terminology extraction
Ref. FiveFilters (https://fivefilters.org/term-extraction/)
Syntax
Part-of-speech tagging (POS tagging)
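A POS-tagging sketch with NLTK, assuming the punkt and averaged_perceptron_tagger data packages have been downloaded:

    import nltk

    tokens = nltk.word_tokenize("The cat sat on the mat.")
    print(nltk.pos_tag(tokens))
    # [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'),
    #  ('the', 'DT'), ('mat', 'NN'), ('.', '.')]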
Ref. Parts of speech and functions: "Bob made a book collector happy the other day" (https://english.stackexchange.com/questions/218058/parts-of-speech-and-functions-bob-made-a-book-collector-happy-the-other-day)
Syntax
Parsing
Dependency parsing
the relationships between words in a sentence (marking things like primary objects and predicates)
Constituency parsing
building the parse tree using a probabilistic context-free grammar (PCFG)
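A dependency-parsing sketch with spaCy, assuming the library and its en_core_web_sm model are installed:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The cat sat on the mat.")
    for token in doc:
        print(token.text, token.dep_, token.head.text)
    # e.g. 'cat' is the nsubj of 'sat'; 'mat' is the pobj of 'on'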
Ref. Parsing (https://www.cs.bgu.ac.il/~elhadad/nlp11/nlp03.html)
Semantics
Named entity recognition (NER)
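An NER sketch, again with spaCy's en_core_web_sm (the example sentence is the one from spaCy's documentation):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
    for ent in doc.ents:
        print(ent.text, ent.label_)
    # Apple ORG / U.K. GPE / $1 billion MONEY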
Ref. Named Entity Recognition: Milestone Models, Papers and Technologies (https://blog.paralleldots.com/data-science/named-entity-recognition-milestone-models-papers-and-technologies/)
Semantics
Textual entailment (⽂字蘊涵)
Text: 今天太陽很⼤,今天沒有下⾬ ("The sun is strong today; it is not raining today.")
Hypotheses:
今天很熱 ("It is hot today") → positive TE: the text entails the hypothesis
今天下⾬ ("It is raining today") → negative TE: the text contradicts the hypothesis
明天沒有下⾬ ("It will not rain tomorrow") → non-TE: the hypothesis is independent of the text
Semantics
Relationship extraction
Ref. Natural language question answering over RDF - A graph data driven approach (https://www.researchgate.net/figure/Relationship-Extraction-DEFINITION-5-Let-us-consider-a-dependency-tree-Y-of-a-natural_fig2_266656635)
Semantics
Sentiment analysis
Ref. Social media sentiment analysis: 4 ways models can improve marketing (http://www.simafore.com/blog/bid/113465/Social-media-sentiment-analysis-4-ways-models-can-improve-marketing)
Natural language understanding
[Venn diagram contrasting NLP and NLU tasks: word segmentation, part-of-speech tagging (POS tagging), named entity recognition (NER), parsing, topic segmentation, machine translation, topic modeling, summarization, sentiment analysis, relationship extraction, textual entailment, reading comprehension, QA systems, dialogue systems]
NLP & NLU
Natural Language Processing (NLP)
word
syntax (grammar)
semantics (meaning)
Natural Language Understanding (NLU)
"我的輪胎爆胎了" ("My tire went flat")
Literal meaning: what the words directly convey
my tire is damaged
Intended meaning: the meaning behind the words
my tire is unusable, so the car cannot be used
Intent: what the speaker wants the agent to do
hopes to have the tire replaced
Context
Ref. ⾃然语⾔处理(NLP)vs ⾃然语⾔理解(NLU) (https://blog.csdn.net/ZLJ925/artic
Natural language generation
Ref. The Ultimate Guide to Natural Language Generation (https://medium.com/@AutomatedInsights/the-ultimate-guide-to-natural-language-generation-bdcb457423d6)
Application
Machine translation
Application
Summarization
Ref. Text Summarization based on Semantic Graph (http://www.nlp.cs.ucf.edu/research/)
Application
Question answering
Ref. Automatic Question Answering (https://towardsdatascience.com/automatic-question-answering-ac7593432842)
Application
Dialogue system
Ref. 【專家剖析】Chatbot三⼤技術關鍵與最新研究⽅向 (https://www.ithome.com.tw/news/113445)
Related fields of linguistics
Syntax - the study of the rules governing grammatical structure
Semantics - uncovering the regularities of semantic expression, their internal explanations, and what is individual and shared in how different languages express meaning
Pragmatics - the study of how context shapes and contributes to meaning
Psycholinguistics - how language is represented and operates in the mind
Neurolinguistics - how language is represented in the brain
Computational linguistics - finding the regularities of natural language and building computational models, so that computers can ultimately analyze, understand, and process natural language as humans do
Ref. Wikipedia - 語⾔學 (https://zh.wikipedia.org/wiki/%E8%AF%AD%E8%A8%80%E5%AD%A6)
Recent Advances in Natural Language Technology
Issues
Chinese word segmentation
Part-of-speech tagging
Parsing
Natural language generation
Text categorization
Information retrieval
Information extraction
Text-proofing
Question answering
Machine translation
Automatic summarization
Reading comprehension
Yoshua Bengio
Full Professor, Department of Computer Science and Operations Research, Université de Montréal
Canada Research Chair in Statistical Learning Algorithms
Neural Language Model - 2001
[Architecture diagram: the i-th output is P(wₜ = i | context); indices for wₜ₋ₙ₊₁, …, wₜ₋₂, wₜ₋₁ are mapped through a shared look-up table C (parameters shared across words); the concatenated features C(wₜ₋ₙ₊₁), …, C(wₜ₋₁) feed a tanh hidden layer and a softmax output layer (most computation here)]
Bengio et al., A Neural Probabilistic Language Model, NIPS Proceedings, 2001; Journal of Machine Learning Research (JMLR), 2003
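A toy numpy forward pass of this architecture (the dimensions are arbitrary assumptions, and the paper's optional direct connection from x to the output is omitted):

    import numpy as np

    # Toy sizes: vocabulary V, embedding size m, context length n, hidden h.
    V, m, n, h = 10, 4, 3, 8
    rng = np.random.default_rng(0)
    C = rng.normal(size=(V, m))             # shared word-feature look-up table
    H = rng.normal(size=(h, n * m)); d = np.zeros(h)
    U = rng.normal(size=(V, h)); b = np.zeros(V)

    def next_word_probs(context):           # indices of the n previous words
        x = C[context].ravel()              # concatenated embeddings
        y = b + U @ np.tanh(d + H @ x)      # (the direct W x path is omitted)
        e = np.exp(y - y.max())
        return e / e.sum()                  # P(w_t = i | context) via softmax

    print(next_word_probs([1, 5, 2]).sum())  # 1.0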
Distributed Representations
To embed syntactic or semantic information in a distributed representation (in a vector)
Why and where can we find such information?
What defines the meaning of a word?
Context!
Approach
Word Embeddings
Character Embeddings
Contextualized Word Embeddings
Word2vec - 2013
Ref. Vector Representations of Words (https://www.tensorflow.org/tutorials/representation/word2vec)
Word2vec
Compositionality: queen = king − man + woman
Ref. Vector Representations of Words (https://www.tensorflow.org/tutorials/representation/word2vec)
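A gensim sketch of training word2vec, assuming gensim ≥ 4 (the vector_size argument was called size in older versions); the two-sentence corpus only demonstrates the API, since the analogy arithmetic needs a large corpus to emerge:

    from gensim.models import Word2Vec

    corpus = [["king", "is", "a", "man"], ["queen", "is", "a", "woman"]]
    model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

    # With enough data, vector arithmetic recovers analogies:
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"]))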
Word2vec
Ref. Distributed Representations of Words and Phrases and their Compositionality (https://dl.acm.org/citation.cfm?id=2999959)
Ref. Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/abs/1301.3781)
Character Embeddings
Capturing intra-word morphological and shape information can be useful for:
part-of-speech (POS) tagging
named-entity recognition (NER)
Santos and Guimaraes [31] applied character-level representations, along with word embeddings, for NER, achieving state-of-the-art results on Portuguese and Spanish corpora.
Advantage
handles out-of-vocabulary (OOV) words
Contextualized Word Embeddings
Disadvantage of global word embeddings (Word2vec and GloVe): polysemy (⼀詞多義)
1. "The bank will not be accepting cash on Saturdays."
2. "The river overflowed the bank."
Deep contextual word embeddings
Embedding from Language Model (ELMo): extracts the intermediate-layer representations from the biLM
Pre-trained language models
Embedding from Language Model (ELMo)
Generative pre-training (GPT)
Neural Network in NLP
convolutional neural networks
recurrent neural networks
recursive neural networks
Convolutional Neural Network (CNN) in NLP
Purpose
Extracts higher-level features from constituent words or n-grams
Tasks
sentiment analysis
summarization
machine translation
question answering (QA)
CNN in NLP
Able to extract salient n-gram features to create an informative latent semantic representation.
[Architecture diagram: input sentence w₁ … w_{N−1} → look-up table → convolution layer (features 1…k) → max-pooling over time → fully connected layer → softmax classification]
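A numpy sketch of the convolution plus max-over-time-pooling step in the figure (toy sizes and random weights, purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, k, F = 7, 5, 3, 4                  # words, embed dim, window, filters
    X = rng.normal(size=(N, d))              # the sentence after the look-up table
    W = rng.normal(size=(F, k * d))          # each filter spans a k-gram window

    conv = np.stack([W @ X[i:i + k].ravel() for i in range(N - k + 1)])
    features = conv.max(axis=0)              # max-pooling over time -> shape (F,)
    print(features)                          # input to the fully connected softmax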
Basic CNN
Applications
Convolutional kernels → specific n-gram feature extractors
Tasks
sentence classification
sentiment classification
subjectivity classification
question type classification
Time-delay neural network (TDNN)
Convolutions are performed across all windows throughout the sentence at the same time
Dynamic multi-pooling CNN (DMCNN)
dynamic k-max pooling
Recurrent Neural Network (RNN) in NLP
Idea and purpose
Processing sequential information
Encode a range of sequential information into a fixed-size vector
Output depends on previous results and the current input
Applications
Language modeling
Machine translation
Speech recognition
Image captioning
RNN
hₜ = tanh(U xₜ + W hₜ₋₁)
Ref. Everything you need to know about Recurrent Neural Networks (https://medium.com/ai-journal/lstm-gru-recurrent-neural-networks-81fe2bcdf1f9)
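A vanilla RNN cell in numpy, directly implementing the update above (toy dimensions, random weights):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h = 3, 5
    U = rng.normal(size=(d_h, d_in))
    W = rng.normal(size=(d_h, d_h))

    h = np.zeros(d_h)                        # h_0
    for x in rng.normal(size=(4, d_in)):     # a length-4 input sequence
        h = np.tanh(U @ x + W @ h)           # the same U, W reused at every step
    print(h)                                 # a fixed-size summary of the sequence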
RNN
Advantages
Able to capture the inherently sequential nature of language
Able to model variable-length data
Performs time-distributed joint processing
Disadvantage
vanishing gradient problem
Long Short-Term Memory
Overcomes the vanishing and exploding gradient problems
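For reference, the standard LSTM update (a common formulation, not copied from the slides; σ is the logistic sigmoid, ⊙ is element-wise product, [hₜ₋₁; xₜ] is concatenation):

    fₜ = σ(W_f [hₜ₋₁; xₜ] + b_f)    (forget gate)
    iₜ = σ(W_i [hₜ₋₁; xₜ] + b_i)    (input gate)
    oₜ = σ(W_o [hₜ₋₁; xₜ] + b_o)    (output gate)
    c̃ₜ = tanh(W_c [hₜ₋₁; xₜ] + b_c) (candidate cell state)
    cₜ = fₜ ⊙ cₜ₋₁ + iₜ ⊙ c̃ₜ
    hₜ = oₜ ⊙ tanh(cₜ)

The additive update of the cell state cₜ gives gradients a path that does not repeatedly pass through a squashing nonlinearity, which is what mitigates the vanishing-gradient problem.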
Ref. Simple RNN vs GRU vs LSTM: Difference lies in More Flexible control (https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)
Gated Recurrent Unit
A less complex, more efficient RNN variant than the LSTM
Ref. Simple RNN vs GRU vs LSTM: Difference lies in More Flexible control (https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)
RNN for word-level classification
Bidirectional LSTM for NER
[Figure: a bidirectional LSTM over "This is a book", concatenating backward and forward hidden states [h_b; h_f] at each position]
RNN for sentence-level classification
LSTM for sentiment classification
RNN for generating language
[Figure: image captioning: a CNN encodes the image, then an LSTM generates the words w₁ … w_{N−1} with output probabilities p₁ … p_{N−1}, trained against the true image description]
Types of RNN
Ref. Recurrent Neural Networks in DL4J (https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent)
Sequence-to-sequence model - 2014
Ref. Sequence to sequence model: Introduction and concepts (https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d)
Attention Mechanism - 2015
Ref. Attention? Attention! (https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)
Attention Mechanism
Ref. Attention? Attention! (https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)
Attention Mechanism
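A numpy sketch of scaled dot-product attention, the core computation underlying these mechanisms (the attention function and its shapes are illustrative assumptions):

    import numpy as np

    # Score every encoder state against the decoder query, softmax the
    # scores, and return the weighted sum of the values as the context.
    def attention(query, keys, values):
        scores = keys @ query / np.sqrt(query.size)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()              # softmax over source positions
        return weights @ values               # the context vector

    rng = np.random.default_rng(0)
    enc = rng.normal(size=(6, 8))   # 6 encoder hidden states of dimension 8
    q = rng.normal(size=8)          # one decoder hidden state
    print(attention(q, enc, enc).shape)  # (8,): context fed to the decoder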
Transformer
Issue
sequential processing at the encoding step
The Transformer is more parallelizable and requires less training time.
Transformer
WaveNet - Google
Ref. WaveNet: A Generative Model for Raw Audio (https://deepmind.com/blog/wavenet-generative-model-raw-audio/)
Recursive Neural Network in NLP
Idea
language exhibits a natural recursive structure, so trees (not just sequences) are a natural way to model it
a compositional function on the representations of phrases or words computes the representation of a higher-level phrase
Unsupervised sentence representation learning
sentence encoders
seq2seq model
the encoder can be seen as a generic feature extractor
Ref. Sequence to sequence model: Introduction and concepts (https://towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42d)
Deep Generative Models
Purpose
To discover rich structure in natural language while generating realistic sentences from a latent code space.
variational autoencoders (VAEs)
generative adversarial networks (GANs)
Deep Generative Models
[Architecture diagram: a sentence VAE: an LSTM encoder reads x₁ x₂ x₃ <EOS>; linear layers produce μ and σ for the latent code z; an LSTM decoder generates y₁ y₂ <EOS>]
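A sketch of the reparameterization step that connects this encoder and decoder: z = μ + σ ⊙ ε with ε ~ N(0, I), so gradients flow through μ and σ (toy numpy, illustrative names and sizes):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = rng.normal(size=16)           # from the encoder LSTM's linear head
    log_sigma = rng.normal(size=16)    # predicting log sigma keeps sigma > 0
    eps = rng.standard_normal(16)      # eps ~ N(0, I)
    z = mu + np.exp(log_sigma) * eps   # latent code handed to the decoder LSTM
    print(z.shape)                     # (16,)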
Other networks
Memory-Augmented Networks
Neural Turing Machines
Memory Networks
Reinforcement Learning
Thank you for your attention
Q & A
Reference
Recent Trends in Deep Learning Based Natural Language Processing (https://arxiv.o
Deep Learning for NLP: An Overview of Recent Trends (https://medium.com/dair-ai/overview-of-recent-trends-d0d8f40a776d)
15年来,⾃然语⾔处理发展史上的8⼤⾥程碑 (https://zhuanlan.zhihu.com/p/47239
獨家| ⼀⽂讀懂⾃然語⾔處理NLP(附學習資料) (https://tw.saowen.com/a/0c1d7d1765b999218654702c1e4d2d0e71c5e138141e