Sequence Model
Orozco Hsu
2025-04-18
1
About me
2
• Education
• NCU (MIS), NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• LLM Architecture & Development
• Research
• Data Ops (ML Ops)
• Generative AI research
• Business Data Analysis, AI
Tutorial Content
3
Homework
Sequence Models
Sequence Data & Sequence Models
• Vanilla RNN (Recurrent Neural Network)
• LSTM (Long short term memory)
• GRU (Gated recurrent unit)
• Transformer Model
Code
• Sample code
• https://drive.google.com/drive/folders/1Cd8mI72yzKhdIBm2OykSzwAC2uZF044-?usp=sharing
4
Sequence Data: Types
• Sequence Data:
• The order of elements is significant.
• It can have variable lengths. In natural language, sentences can be of different lengths, and in genomics, DNA sequences can vary in length depending on the organism (a sketch of padding variable-length sequences for batching follows below).
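To make the variable-length point concrete, here is a minimal sketch of one common way such sequences are padded into a batch in PyTorch; the toy index tensors are illustrative assumptions, not data from the course notebooks.

import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of different lengths (e.g., tokenized sentences as index tensors)
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]

# Pad to a common length so the sequences can be batched for a sequence model
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch.shape)   # torch.Size([3, 4])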
5
Sequence Data: Examples
• Examples:
• Image Captioning.
6
The image is run through a CNN, and the resulting feature vector is used to initialize the LSTM hidden state. The LSTM is a recurrent structure with memory: it automatically retains the context (the image content plus the words generated so far). The LSTM then generates the word "A", which is fed back into the LSTM hidden state at the next step, and the rest of the sentence is generated word by word.
Sequence Data: Examples
• Examples:
• Speech Signals.
7
The language model then organizes these words and outputs sentences that humans can understand.
Sequence Data: Examples
• Examples:
• Time Series Data, such as time stamped transactional data.
8
1. In classical time-series analysis we usually build lagged features, which help capture patterns, trends, and seasonality and let the model learn how past values influence the present; an LSTM, however, has a built-in temporal memory mechanism, so lagging is implicit.
2. At every time step the LSTM takes the previous hidden state and cell state into account (its memory).
• Examples:
• Language Translation (Natural Language Text).
• Chatbot.
• Text summarization.
• Text categorization.
• Parts of speech tagging.
• Stemming.
• Text mining.
Sequence Data: Examples
9
Sequence Models: applications
• Sequence models are a class of machine learning models designed for
tasks that involve sequential data, where the order of elements in the
input is important.
• Model applications:
10
one to one: Fixed length input/output, a general neural network model.
one to many: Image captioning.
many to one: Sentiment analysis.
many to many: machine translation.
Sequence Model: Recurrent Neural Networks (RNNs)
• RNNs are a fundamental type of sequence model.
• They process sequences one element at a time, maintaining an internal hidden state that stores information about the previous elements in the sequence.
• Traditional RNNs suffer from the Vanishing gradient problem, which
limits their ability to capture long-range dependencies.
11
Sequence Model: Recurrent Neural Networks (RNNs)
12
The hidden state automatically remembers the context, which makes RNNs stronger than traditional NLP approaches such as n-grams. Feeding in one word to generate the next word is called an autoregressive strategy (a small sketch of this recurrence follows below).
h(0): the initial state, all zeros, like a blank page; the more text the model has seen, the more memory it accumulates.
Formula: h(1) = σ( Wh * h(0) + We * e(1) + b )
h(1): remembers "the"
h(2): remembers "the student"
h(3): remembers "the student opened"
h(4): remembers "the student opened their"
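Below is a minimal sketch of this recurrence written out in PyTorch. The embedding size, hidden size, and the random weights Wh, We, b are illustrative assumptions mirroring the formula above, not values from the course notebooks.

import torch

torch.manual_seed(0)
emb_dim, hid_dim = 5, 3
Wh = torch.randn(hid_dim, hid_dim)   # recurrent weights
We = torch.randn(hid_dim, emb_dim)   # input (embedding) weights
b = torch.zeros(hid_dim)

h = torch.zeros(hid_dim)             # h(0): the blank-page initial state
for word_embedding in torch.randn(4, emb_dim):            # e.g., "the student opened their"
    h = torch.sigmoid(Wh @ h + We @ word_embedding + b)   # h(t) from h(t-1) and e(t)
print(h)                             # h(4): summarizes the whole prefix seen so far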
• A stacked RNN improves sequence-modelling capacity by passing information through multiple layers, with each layer processing features at a different level of abstraction.
• The number of layers should be chosen according to the sequence length and the task requirements, to avoid a model so complex that it cannot be trained effectively.
13
Sequence Model: Recurrent Neural Networks (RNNs)
(Task requirements matter, e.g., sentiment analysis vs. machine translation)
14
Sequence Model: Recurrent Neural Networks (RNNs)
Vanilla_RNN02.ipynb (adjust the num_layers parameter and compare the convergence results; see the sketch below)
A vanilla RNN is the most basic RNN architecture: a single layer of RNN cells, passing only one hidden state (h) from one time step to the next.
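A minimal sketch of a stacked RNN in PyTorch, illustrating the num_layers parameter the notebook asks you to vary; the sizes used here (input_size=8, hidden_size=32, seq_length=19, num_layers=2) are illustrative assumptions, not the notebook's actual configuration.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(4, 19, 8)       # (batch, seq_length, input_size)
h0 = torch.zeros(2, 4, 32)      # (num_layers, batch, hidden_size): the all-zero h(0)
output, hn = rnn(x, h0)

print(output.shape)             # torch.Size([4, 19, 32]) - top layer's hidden state at every step
print(hn.shape)                 # torch.Size([2, 4, 32])  - final hidden state of each layer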
Sequence Model: Long Short-Term Memory Networks (LSTM)
• They are a type of RNNs designed to overcome the Vanishing gradient
problem.
• They introduce specialized memory cells and gating mechanisms that
allow them to capture and preserve information over long sequences.
• Gates (input, forget, and output) to regulate the flow of information.
• In traditional NLP, approaches that generate text by relying only on the previous state include:
• Hidden Markov Models (HMM)
• N-Gram
15
(The vanishing gradient problem still exists in LSTMs; it is merely less severe than in vanilla RNNs)
(These approaches do not consider broader context)
(LSTMs additionally introduce a cell state that preserves long-term memory, which lets them handle long-range dependencies effectively)
Sequence Model: Long Short-Term Memory Networks (LSTM)
16
Pytorch_lstm_02.ipynb
• Forget gate: decides how much to forget, preventing the cell state from accumulating useless information without bound
• Input gate: decides whether new information is written into the cell state
• Output gate: decides which part of the cell state is exposed to the hidden state
• Cell input: the content to be written into memory, passed through tanh() (a sketch of one LSTM step follows after the equations below)
f(t) = σ(W_f * x(t) + U_f * h(t-1) + b_f) ← forget gate
i(t) = σ(W_i * x(t) + U_i * h(t-1) + b_i) ← input gate
o(t) = σ(W_o * x(t) + U_o * h(t-1) + b_o) ← output gate
c̃(t) = tanh(W_c * x(t) + U_c * h(t-1) + b_c) ← cell input
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ c̃(t)
h(t) = o(t) ⊙ tanh(c(t))
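To make the equations concrete, here is a minimal sketch of a single LSTM time step written directly from the formulas above; the dictionary-based weight names (W, U, b keyed by 'f', 'i', 'o', 'c') and the toy sizes are illustrative assumptions, and PyTorch's built-in nn.LSTM (used in Pytorch_lstm_02.ipynb) fuses all of this internally.

import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM time step, following the slide's equations
    f_t = torch.sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate
    i_t = torch.sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate
    o_t = torch.sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate
    c_tilde = torch.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # cell input
    c_t = f_t * c_prev + i_t * c_tilde                              # new cell state
    h_t = o_t * torch.tanh(c_t)                                     # new hidden state
    return h_t, c_t

# Tiny usage example with hypothetical sizes (input dim 4, hidden dim 3)
hid, inp = 3, 4
W = {k: torch.randn(hid, inp) for k in 'fioc'}
U = {k: torch.randn(hid, hid) for k in 'fioc'}
b = {k: torch.zeros(hid) for k in 'fioc'}
h, c = lstm_step(torch.randn(inp), torch.zeros(hid), torch.zeros(hid), W, U, b)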
Sequence Model: Gated Recurrent Units (GRUs)
• They are another variant of RNNs that are similar to LSTM but with a
simplified structure.
• They also use gating mechanisms to control the flow of information
within the network.
• Gates (reset gate and update gate) to regulate the flow of information
• They are computationally more efficient than LSTM while still being
able to capture dependencies in sequential data.
17
(GRUs have fewer weight parameters, so they tend to converge faster; a parameter-count comparison follows below)
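A minimal sketch comparing the parameter counts of an LSTM and a GRU of the same size; the sizes (input_size=8, hidden_size=32) are illustrative assumptions.

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=8, hidden_size=32, num_layers=1)
gru = nn.GRU(input_size=8, hidden_size=32, num_layers=1)

print("LSTM parameters:", n_params(lstm))   # 4 gate weight sets
print("GRU parameters:", n_params(gru))     # 3 gate weight sets, roughly 3/4 of the LSTM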
18
Sequence Model: Gated Recurrent Units (GRUs)
A GRU outputs only a single hidden state, so it is simpler than an LSTM and has fewer parameters; it is often compared against the LSTM (a sketch of one GRU step follows after the notebook reference below).
• Reset gate: decides how the new input is combined with the previous hidden state
• Update gate: acts like a combined input gate and forget gate, controlling the mix of old and new knowledge
It decides how much new memory to accept: the new state is a weighted average of the old memory and the new candidate memory, with the weights set by the update gate.
If z is the proportion of the old memory that is kept, then 1 − z is the proportion of the new memory that is accepted.
pytorch_gru_01.ipynb
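A minimal sketch of a single GRU time step matching the reset/update-gate description above; the dictionary-based weight names and toy sizes are illustrative assumptions, and it follows the same convention as the slide (and as PyTorch's nn.GRU), where z keeps the old memory and 1 − z accepts the new one.

import torch

def gru_step(x_t, h_prev, W, U, b):
    r_t = torch.sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])           # reset gate
    z_t = torch.sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])           # update gate
    h_tilde = torch.tanh(W['h'] @ x_t + U['h'] @ (r_t * h_prev) + b['h'])  # candidate memory
    # New state = weighted average: z keeps the old memory, 1 - z accepts the new one
    return z_t * h_prev + (1 - z_t) * h_tilde

# Tiny usage example with hypothetical sizes (input dim 4, hidden dim 3)
hid, inp = 3, 4
W = {k: torch.randn(hid, inp) for k in 'rzh'}
U = {k: torch.randn(hid, hid) for k in 'rzh'}
b = {k: torch.zeros(hid) for k in 'rzh'}
h = gru_step(torch.randn(inp), torch.zeros(hid), W, U, b)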
Homework
• Modify HW02.ipynb to compare how RMSE changes across the different models, and report the parameters you used (an RMSE sketch follows after the table below).
19
RNN Types   | Train Score | Test Score | Parameters
Vanilla RNN | 4.25        | 7.3        | hidden_size=32, seq_length=19, num_layers=1, num_epochs=500
LSTM        |             |            |
GRU         |             |            |
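A minimal sketch of how RMSE could be computed when filling in the table; the prediction and target tensors are assumed to come from your own training loop in HW02.ipynb.

import torch

def rmse(y_pred, y_true):
    # Root mean squared error between predictions and targets
    return torch.sqrt(torch.mean((y_pred - y_true) ** 2)).item()

# Hypothetical usage, assuming train/test predictions and targets from HW02.ipynb:
# for name, (train_pred, test_pred) in {"Vanilla RNN": ..., "LSTM": ..., "GRU": ...}.items():
#     print(name, rmse(train_pred, y_train), rmse(test_pred, y_test))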
Supplement
20
Common ways of handling matrices and vectors:
In plain Python they are containers (e.g., lists).
In NumPy they are arrays (ndarray).
In PyTorch they are tensors.
Matrices and vectors can be converted among these three formats,
but GPU execution is only available with PyTorch tensors (see the conversion sketch below).
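A minimal sketch of converting among the three formats and moving a tensor to the GPU; the toy data is an illustrative assumption.

import numpy as np
import torch

data = [[1.0, 2.0], [3.0, 4.0]]      # plain Python container (list of lists)
arr = np.array(data)                 # NumPy ndarray
t = torch.from_numpy(arr)            # PyTorch tensor (shares memory with arr)

back_to_numpy = t.numpy()            # tensor -> ndarray
back_to_list = t.tolist()            # tensor -> nested Python list

# Only PyTorch tensors can be moved to the GPU for computation
if torch.cuda.is_available():
    t_gpu = t.to("cuda")
    print(t_gpu.device)              # cuda:0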
Pytorch tutorial
• Pytorch Tutorials
• Welcome to PyTorch Tutorials — PyTorch Tutorials 2.6.0+cu124 documentation
21
