#02 Next RNN

Deep Learning Basic
#02 Next RNN
Terence Huang

自我介紹
學經歷
緯創資通高級工程師 (2015/3 ~)
長庚醫院研究助理 (2013/9 ~2014/11)
中山大學應數系統計碩士 (2011/9 ~ 2013/7)
高雄師大數學系學士 (2005/9 ~ 2009/7)
獲獎
2017 台南智慧黑客松智慧醫療組第二名
2012 中山高大統計新秀
特殊經歷
訪問 MIT 的工程師 (2017/10 ~ 2018/10)
Pytorch Kaohsiung 讀書會的社長
2
Terence Huang

進入 deep learning 文字分析前
• 怎麼把文字塞進 neural network
3
這
是
貓

• One-hot vector
4
這
是
貓
1,0,0
0,1,0
0,0,1

• One-hot vector by character
5
這
是
大
1,0,0,0
0,1,0,0
0,0,1,0
象 0,0,1,0

• One-hot vector by token
6
這
是
大象
1,0,0
0,1,0
0,0,1

• 只能放 one-hot vector 的 feature?
7
Feature 意義
BOW, Bag of word 文中每個字出現的次數/比例
TF-IDF 文中每個字的獨特程度
Word2vec 考慮上下文對每個字的意義影響，以向量做表示
Doc2vec 每個文章的意義，以向量做表示
LDA, Latent Dirichlet Allocation 每個文章中的主題比例

BOW, Bag of words
• 意義：文中每個字出現的次數/比例
• 用途：用詞習慣比較、文章抄襲、…
8

BOW, Bag of words
• 計算方式: Counting the number of times each meaningful word appears in each
document. Remove stop words.
9
"With all this stuff going down at the moment with MJ i've
started listening to his music, watching the odd
documentary here and there, watched The Wiz and
watched Moonwalker again. Maybe i just want to get a
certain insight into this guy who i thought was really cool
in the eighties just to maybe make up my mind whether he
is guilty or innocent. Moonwalker is part biography, part
feature film which i remember going to see at the cinema
when it was originally released. Some of it has subtle
messages about MJ's feeling towards the press and also
the obvious message of drugs are bad m'kay.
<br/><br/>..."
feature Count
also 2
bad 3
film 2
get 1
like 3
made 1
make 1
…
…
Document 1
𝑡1
𝑡2
𝑡7
⋮
⋮

BOW, Bag of words
• 計算方式: We have many documents, each document has different meaningful
words.
10
also bad film get like made …
2 3 2 1 3 1 …
film films like made man many …
1 2 1 1 1 2 …
also film get good movie one …
1 4 1 1 1 2 …
even film made movie much time …
3 1 1 1 1 1 …
Document 1
Document 2
Document 3
Document 4

BOW, Bag of words
• 計算方式: Use the most frequent words in all documents.
11
film movie one like good even …
Document 1 2 3 4 3 0 0 …
Document 2 1 1 2 1 0 0 …
Document 3 4 1 2 0 1 0 …
Document 4 1 1 0 0 0 3 …
…
…
…
…
…
…
…
Document 1000 0 2 0 3 1 2 …
1653 1630 1088 769 664 501 …
𝑡1 𝑡2 𝑡6⋯ ⋯

BOW, Bag of words
• 計算方式: Use the most frequent words in all documents.
12
film movie one like good even …
Document 1 2% 3% 4% 3% 0% 0% … 100%
Document 2 1% 1% 2% 1% 0% 0% … 100%
Document 3 4% 1% 2% 0% 1% 0% … 100%
Document 4 1% 1% 0% 0% 0% 3% … 100%
…
…
…
…
…
…
…
100%
Document 1000 0% 2% 0% 3% 1% 2% … 100%

TFIDF, Term frequency–inverse document frequency
• 意義：文中每個字的獨特程度，獨特性是跟其他的文一起比
• 用途：用來找出一篇文章中, 足以代表這篇文章的關鍵字的方法
13

• 計算方式：尋找這樣的關鍵字, 要考慮以下兩個要素
- TF, Term Frequency
• 這個字在這篇文章中出現的頻率
- IDF, Inverse-Document Frequency
• 在所有的文章中,有幾篇文章有這個字
• TF-IDF 值越大的, 越可以作為代表這篇文章的關鍵字
14
𝑇𝐹𝐼𝐷𝐹 = 𝑇𝐹 × Inv 𝐷𝐹 = 𝑛 𝑤
𝑑
× log2
𝑁
𝑁 𝑤
𝑛 𝑤
𝑑
：文章 𝑑 中，文字 𝑤 出現的次數
𝑁：表示總共有幾篇文章
𝑁 𝑤：表示有文字 𝑤 的文章有幾篇

• Example：Reuters Corpus (10788 篇文章)
- 其中某篇新聞內容
• GRAIN SHIPS LOADING AT PORTLAND
There were three grain ships loading and two ships were waiting to load at Portland , according to
the Portland Merchants Exchange .
- 𝑇𝐹𝐼𝐷𝐹portland = 3 × log2
10788
28
≈ 25.769, 10788 篇中有 28 篇出現 portland
- 𝑇𝐹𝐼𝐷𝐹𝑡𝑜 = 2 × log2
10788
6944
≈ 1.271, 10788 篇中有 6944 篇出現 to
15

Word2vec
• 意義：考慮上下文對每個字的意義影響，以向量做表示
• 用途：相似字、雙語翻譯、Analogies (ex."king - man + woman = queen.")
16
之前編碼 Word2vec
dog Id222 0,1,0,0,0,0,0,0 0.12,0.13,0.01,0.01,0.01
cat Id357 0,0,0,0,0,0,1,0 0.12,0.12,0.01,0.01,0.01
car Id358 0,0,0,0,0,0,0,1 0.01,0.01,0.33,0.40,0.25

Word2vec
• 計算方式: Skip-Gram
- ex. Doc1 = I have a cat. I love cat and dog.
- 7 個不同單字
8 個 pair
17
pair * i have i have a have a cat cat and dog
字 i have a ⋯ and
上下文 [*, have] [i, a] [have, cat] [cat, dog]
範例

Word2vec
• 計算方式: Skip-Gram
- ex. have a cat → a, [have, cat]
18
0
0
1
0
0
0
0
0.1
⋮
0.2
a
0
1
0
0
0
0
0
0
0
0
1
0
0
0
have
cat
0.1
0.9
0.0
0.0
0.1
0.3
0.2
0.1
0.1
0.0
0.0
0.1
0.4
0.2
∑
∑
∑
∑
∑
越像越好
轉出的向量
0.1
⋮
0.2
one hot
vector
one hot
vector
one hot
vector
越像越好
∑
∑
∑

Word2vec
• 意義：考慮上下文對每個字的意義影響，以向量做表示
• 用途：相似字、雙語翻譯、Analogies (ex."king - man + woman = queen.")
19
之前編碼 Word2vec
dog Id222 0,1,0,0,0,0,0,0 0.12,0.13,0.01,0.01,0.01
cat Id357 0,0,0,0,0,0,1,0 0.12,0.12,0.01,0.01,0.01
car Id358 0,0,0,0,0,0,0,1 0.01,0.01,0.33,0.40,0.25

Doc2vec
• 意義：每個文章的意義，以向量做表示
20

Doc2vec
• 計算方式: 借用 word2vec 的結果，平均內文 meaningful word 的向量
21
I have a cat. I love cat and dog.
Document 1
0.12,0.12,0.01,0.01,0.01
0.12,0.13,0.01,0.01,0.01
I
have
a
cat
cat
I
love
and
dog
0.12,0.12,0.01,0.01,0.01
0.01,0.01,0.01,0.01,0.95
0.01,0.01,0.01,0.01,0.95
0.01,0.01,0.01,0.95,0.01
0.39,0.40,0.06,1.00,1.00
0.39,0.40,0.06,1.00,1.00
6
文章的意義不單一？

LDA, Latent Dirichlet Allocation
• 意義：每個主題佔內文的比例
22
0.2
0.1
0.4
0.3
0.3
0.5
0.1
0.1
The William Randolph
Hearst Foundation will
give $1.25 million to
Lincoln Center,
Metropolitan …
''Our board felt that we
had a real opportunity to
make a mark on the
future of …
Documents
…
LDA
…
Arts
Budgets
Children
Education

23
Arts
0.461 * music +
0.134 * movie + …
Budgets
0.211 * tax +
0.035 * money + …
Children
0.309 * children +
0.241 * family + …
Education
0.217 * school +
0.222 * teacher + …
Lincoln Center,
Metropolitan …
make a mark on the
future of …
Topics
Documents
…
LDA

24
Arts
0.461 * music +
0.134 * movie + …
Budgets
0.211 * tax +
0.035 * money + …
Children
0.309 * children +
0.241 * family + …
Education
0.217 * school +
0.222 * teacher + …
0.2
0.1
0.4
0.3
0.3
0.5
0.1
0.1
Lincoln Center,
Metropolitan …
make a mark on the
future of …
Topics
Documents
…
LDA

• 只能放 one-hot vector 的 feature?
25
Feature 意義
BOW, Bag of word 文中每個字出現的次數/比例
TF-IDF 文中每個字的獨特程度
Word2vec 考慮上下文對每個字的意義影響，以向量做表示
Doc2vec 每個文章的意義，以向量做表示
LDA, Latent Dirichlet Allocation 每個文章中的主題比例

動手時間 Q&A
1. 程式碼理解
- Copy & Paste 範例程式: https://github.com/Terence0408/Teach_code
- 解析程式
- 改動程式
- 做點應用
26

正式進入 RNN/LSTM
• Motivation: Why Learn recurrent neural network?
- Language Modeling
- Key: speech recognition, machine translation, image captioning, …
27
RNN are popular models that have
shown great promise
in many Language Modeling tasks

正式進入 RNN/LSTM
• Task1: predict {Yes, No} by sentence
• Task2: predict next words by previous words in sentence
• Ponder the question “how is that even possible?”
28
Input Output
the clouds are in the sky
I grew up in France…… I speak fluent French
Input Output
the clouds are in the sky Yes
the clouds are in the ground No

RNN , recurrent neural networks
• RNN (Recurrent Neural Networks)
29
𝑿 𝟏
𝒉 𝟏
𝑿 𝟐
𝒉 𝟐
𝒉 𝟎 𝒉 𝟏
Pass
weight
……..
……..

RNN, recurrent neural networks
30
In this moment
– Input: 𝑋𝑡 & 𝒉 𝒕−𝟏
– Output: 𝒉 𝒕

RNN, recurrent neural networks
31
In this moment
– Input: 𝑋𝑡+1 & 𝒉 𝒕
– Output: 𝒉 𝒕+𝟏
In this moment
– Input: 𝑋𝑡 & 𝒉 𝒕−𝟏
– Output: 𝒉 𝒕

LSTM, Long Short Term Memory Networks
• RNN is Good. But …
- Sometimes, we only need to look at recent information to perform the present task.
- Long-Term Dependencies problem
32
Input Output
I grew up in France…… I speak fluent French
Important information, but …
Too far to pass information/weight to output

33
RNN
LSTM

• LSTM (Long Short Term Memory Networks)
34
Throw old info. Store new info. Update memory
𝑓𝑡: 上一步 memory cell
使用比例
C 𝑡: 更新這一步的
memory cell
𝑜𝑡: 下一步的 memory cell
使用比例
ℎ 𝑡: 輸入給下一步的資訊

• 權重計算
- Backpropagation: 傳遞預測誤差來更新權重
1. 計算預測誤差
2. 傳遞誤差讓前一層用 SGD 找最適權重
3. 計算更新前一層後的預測誤差
4. 一直把誤差傳遞給前面層做更新
35

• 修正權重: Backpropagation through time
• Problem: Compute cost & vanishing gradient
36

• 修正權重: Truncated Backpropagation through time
37
True
Sequence length: 6
Errors truncated to 3 steps
Tensorflow-style
Sequence length: 3
Errors truncated to 3 steps

解析程式
• imdb_lstm.py 電影評論預測
1. 引入 Keras 套件
2. 模型結構
3. 資料 (Image & label) 擺放
4. 預測結果
5. 參數調教
38

解析範例程式
2. 模型結構
39
建立 deep learning 環境叫 model
設定預測誤差參數
放入資料開始跑模型
同時預測 Test set 結果
與訓練模型間關係：獨立
批次跑，一次跑多少筆

解析範例程式
3. 資料 (Image & label) 擺放
40
Token 轉化向量的維度
一個句子要有幾個 token
從 imbd dataset
取出現次數前 20000的 token
處裡句子 token 數不足
Pad(通常補空格)

動手時間 Q&A
• Copy & Paste 範例程式:
https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py
• 嘗試各種參數組合
• 換筆資料
• 換別的模型結構
• 記得，把Epoch 設 1 & 資料切小一點
41

RNN/LSTM 應用
42
Signal, stock
Seq2seq: 對話機器人
POS/NER tagging

Q&A
Thanks for your attention!
44

#02 Next RNN

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to #02 Next RNN

Similar to #02 Next RNN (20)

More from Terence Huang

More from Terence Huang (8)

Recently uploaded

Recently uploaded (20)

#02 Next RNN