Sequence Model
Orozco Hsu
2025-04-18
1
About me
2
• Education
• NCU (MIS), NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• LLM Architecture & Development
• Research
• Data Ops (ML Ops)
• Generative AI research
• Business Data Analysis, AI
Tutorial Content
3
Homework
Sequence Models
Sequence Data & Sequence Models
• Vanilla RNN (Recurrent Neural Network)
• LSTM (Long short term memory)
• GRU (Gated recurrent unit)
• Transformer Model
Code
• Sample code
• https://drive.google.com/drive/folders/1Cd8mI72yzKhdIBm2OykSzwAC2uZF044-?usp=sharing
4
Sequence Data: Types
• Sequence Data:
• The order of elements is significant.
• It can have variable lengths. In natural language, sentences can be of different lengths, and in genomics, DNA sequences can vary in length depending on the organism (a sketch of padding variable-length sequences for batching follows below).
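To make the variable-length point concrete, here is a minimal sketch of one common way such sequences are padded into a batch in PyTorch; the toy index tensors are illustrative assumptions, not data from the course notebooks.

import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of different lengths (e.g., tokenized sentences as index tensors)
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]

# Pad to a common length so the sequences can be batched for a sequence model
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch.shape)   # torch.Size([3, 4])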
5
Sequence Data: Examples
• Examples:
• Image Captioning.
6
The image is run through a CNN, and the resulting feature vector is used to initialize the LSTM hidden state. The LSTM is a recurrent structure with memory: it automatically retains the context (the image content plus the words generated so far). The LSTM then generates the word "A", which is fed back into the LSTM hidden state at the next step, and the rest of the sentence is generated word by word.
Sequence Data: Examples
• Examples:
• Speech Signals.
7
The language model then organizes these words and outputs sentences that humans can understand.
Sequence Data: Examples
• Examples:
• Time Series Data, such as time stamped transactional data.
8
1. In classical time-series analysis we usually build lagged features, which help capture patterns, trends, and seasonality and let the model learn how past values influence the present; an LSTM, however, has a built-in temporal memory mechanism, so lagging is implicit.
2. At every time step the LSTM takes the previous hidden state and cell state into account (its memory).
• Examples:
• Language Translation (Natural Language Text).
• Chatbot.
• Text summarization.
• Text categorization.
• Parts of speech tagging.
• Stemming.
• Text mining.
Sequence Data: Examples
9
Sequence Models: applications
• Sequence models are a class of machine learning models designed for
tasks that involve sequential data, where the order of elements in the
input is important.
• Model applications:
10
one to one: Fixed length input/output, a general neural network model.
one to many: Image captioning.
many to one: Sentiment analysis.
many to many: machine translation.
Sequence Model: Recurrent Neural Networks (RNNs)
• RNNs are a fundamental type of sequence model.
• They process sequences one element at a time, maintaining an internal hidden state that stores information about the previous elements in the sequence.
• Traditional RNNs suffer from the Vanishing gradient problem, which
limits their ability to capture long-range dependencies.
11
Sequence Model: Recurrent Neural Networks (RNNs)
12
The hidden state automatically remembers the context, which makes RNNs stronger than traditional NLP approaches such as n-grams. Feeding in one word to generate the next word is called an autoregressive strategy (a small sketch of this recurrence follows below).
h(0): the initial state, all zeros, like a blank page; the more text the model has seen, the more memory it accumulates.
Formula: h(1) = σ( Wh * h(0) + We * e(1) + b )
h(1): remembers "the"
h(2): remembers "the student"
h(3): remembers "the student opened"
h(4): remembers "the student opened their"
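Below is a minimal sketch of this recurrence written out in PyTorch. The embedding size, hidden size, and the random weights Wh, We, b are illustrative assumptions mirroring the formula above, not values from the course notebooks.

import torch

torch.manual_seed(0)
emb_dim, hid_dim = 5, 3
Wh = torch.randn(hid_dim, hid_dim)   # recurrent weights
We = torch.randn(hid_dim, emb_dim)   # input (embedding) weights
b = torch.zeros(hid_dim)

h = torch.zeros(hid_dim)             # h(0): the blank-page initial state
for word_embedding in torch.randn(4, emb_dim):            # e.g., "the student opened their"
    h = torch.sigmoid(Wh @ h + We @ word_embedding + b)   # h(t) from h(t-1) and e(t)
print(h)                             # h(4): summarizes the whole prefix seen so far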
• A stacked RNN improves sequence-modelling capacity by passing information through multiple layers, with each layer processing features at a different level of abstraction.
• The number of layers should be chosen according to the sequence length and the task requirements, to avoid a model so complex that it cannot be trained effectively.
13
Sequence Model: Recurrent Neural Networks (RNNs)
(Task requirements matter, e.g., sentiment analysis vs. machine translation)
14
Sequence Model: Recurrent Neural Networks (RNNs)
Vanilla_RNN02.ipynb (adjust the num_layers parameter and compare the convergence results; see the sketch below)
A vanilla RNN is the most basic RNN architecture: a single layer of RNN cells, passing only one hidden state (h) from one time step to the next.
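A minimal sketch of a stacked RNN in PyTorch, illustrating the num_layers parameter the notebook asks you to vary; the sizes used here (input_size=8, hidden_size=32, seq_length=19, num_layers=2) are illustrative assumptions, not the notebook's actual configuration.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(4, 19, 8)       # (batch, seq_length, input_size)
h0 = torch.zeros(2, 4, 32)      # (num_layers, batch, hidden_size): the all-zero h(0)
output, hn = rnn(x, h0)

print(output.shape)             # torch.Size([4, 19, 32]) - top layer's hidden state at every step
print(hn.shape)                 # torch.Size([2, 4, 32])  - final hidden state of each layer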
Sequence Model: Long Short-Term Memory Networks (LSTM)
• They are a type of RNNs designed to overcome the Vanishing gradient
problem.
• They introduce specialized memory cells and gating mechanisms that
allow them to capture and preserve information over long sequences.
• Gates (input, forget, and output) to regulate the flow of information.
• In traditional NLP, approaches that generate text by relying only on the previous state include:
• Hidden Markov Models (HMM)
• N-Gram
15
(The vanishing gradient problem still exists in LSTMs; it is merely less severe than in vanilla RNNs)
(These approaches do not consider broader context)
(LSTMs additionally introduce a cell state that preserves long-term memory, which lets them handle long-range dependencies effectively)
Sequence Model: Long Short-Term Memory Networks (LSTM)
16
Pytorch_lstm_02.ipynb
• Forget gate: decides how much to forget, preventing the cell state from accumulating useless information without bound
• Input gate: decides whether new information is written into the cell state
• Output gate: decides which part of the cell state is exposed to the hidden state
• Cell input: the content to be written into memory, passed through tanh() (a sketch of one LSTM step follows after the equations below)
f(t) = σ(W_f * x(t) + U_f * h(t-1) + b_f) ← forget gate
i(t) = σ(W_i * x(t) + U_i * h(t-1) + b_i) ← input gate
o(t) = σ(W_o * x(t) + U_o * h(t-1) + b_o) ← output gate
c̃(t) = tanh(W_c * x(t) + U_c * h(t-1) + b_c) ← cell input
c(t) = f(t) ⊙ c(t-1) + i(t) ⊙ c̃(t)
h(t) = o(t) ⊙ tanh(c(t))
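To make the equations concrete, here is a minimal sketch of a single LSTM time step written directly from the formulas above; the dictionary-based weight names (W, U, b keyed by 'f', 'i', 'o', 'c') and the toy sizes are illustrative assumptions, and PyTorch's built-in nn.LSTM (used in Pytorch_lstm_02.ipynb) fuses all of this internally.

import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM time step, following the slide's equations
    f_t = torch.sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate
    i_t = torch.sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate
    o_t = torch.sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate
    c_tilde = torch.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # cell input
    c_t = f_t * c_prev + i_t * c_tilde                              # new cell state
    h_t = o_t * torch.tanh(c_t)                                     # new hidden state
    return h_t, c_t

# Tiny usage example with hypothetical sizes (input dim 4, hidden dim 3)
hid, inp = 3, 4
W = {k: torch.randn(hid, inp) for k in 'fioc'}
U = {k: torch.randn(hid, hid) for k in 'fioc'}
b = {k: torch.zeros(hid) for k in 'fioc'}
h, c = lstm_step(torch.randn(inp), torch.zeros(hid), torch.zeros(hid), W, U, b)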
Sequence Model: Gated Recurrent Units (GRUs)
• They are another variant of RNNs that are similar to LSTM but with a
simplified structure.
• They also use gating mechanisms to control the flow of information
within the network.
• Gates (reset gate and update gate) to regulate the flow of information
• They are computationally more efficient than LSTM while still being
able to capture dependencies in sequential data.
17
(GRUs have fewer weight parameters, so they tend to converge faster; a parameter-count comparison follows below)
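A minimal sketch comparing the parameter counts of an LSTM and a GRU of the same size; the sizes (input_size=8, hidden_size=32) are illustrative assumptions.

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=8, hidden_size=32, num_layers=1)
gru = nn.GRU(input_size=8, hidden_size=32, num_layers=1)

print("LSTM parameters:", n_params(lstm))   # 4 gate weight sets
print("GRU parameters:", n_params(gru))     # 3 gate weight sets, roughly 3/4 of the LSTM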
18
Sequence Model: Gated Recurrent Units (GRUs)
A GRU outputs only a single hidden state, so it is simpler than an LSTM and has fewer parameters; it is often compared against the LSTM (a sketch of one GRU step follows after the notebook reference below).
• Reset gate: decides how the new input is combined with the previous hidden state
• Update gate: acts like a combined input gate and forget gate, controlling the mix of old and new knowledge
It decides how much new memory to accept: the new state is a weighted average of the old memory and the new candidate memory, with the weights set by the update gate.
If z is the proportion of the old memory that is kept, then 1 − z is the proportion of the new memory that is accepted.
pytorch_gru_01.ipynb
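A minimal sketch of a single GRU time step matching the reset/update-gate description above; the dictionary-based weight names and toy sizes are illustrative assumptions, and it follows the same convention as the slide (and as PyTorch's nn.GRU), where z keeps the old memory and 1 − z accepts the new one.

import torch

def gru_step(x_t, h_prev, W, U, b):
    r_t = torch.sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])           # reset gate
    z_t = torch.sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])           # update gate
    h_tilde = torch.tanh(W['h'] @ x_t + U['h'] @ (r_t * h_prev) + b['h'])  # candidate memory
    # New state = weighted average: z keeps the old memory, 1 - z accepts the new one
    return z_t * h_prev + (1 - z_t) * h_tilde

# Tiny usage example with hypothetical sizes (input dim 4, hidden dim 3)
hid, inp = 3, 4
W = {k: torch.randn(hid, inp) for k in 'rzh'}
U = {k: torch.randn(hid, hid) for k in 'rzh'}
b = {k: torch.zeros(hid) for k in 'rzh'}
h = gru_step(torch.randn(inp), torch.zeros(hid), W, U, b)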
Homework
• Modify HW02.ipynb to compare how RMSE changes across the different models, and report the parameters you used (an RMSE sketch follows after the table below).
19
RNN Types   | Train Score | Test Score | Parameters
Vanilla RNN | 4.25        | 7.3        | hidden_size=32, seq_length=19, num_layers=1, num_epochs=500
LSTM        |             |            |
GRU         |             |            |
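A minimal sketch of how RMSE could be computed when filling in the table; the prediction and target tensors are assumed to come from your own training loop in HW02.ipynb.

import torch

def rmse(y_pred, y_true):
    # Root mean squared error between predictions and targets
    return torch.sqrt(torch.mean((y_pred - y_true) ** 2)).item()

# Hypothetical usage, assuming train/test predictions and targets from HW02.ipynb:
# for name, (train_pred, test_pred) in {"Vanilla RNN": ..., "LSTM": ..., "GRU": ...}.items():
#     print(name, rmse(train_pred, y_train), rmse(test_pred, y_test))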
Supplement
20
Common ways of handling matrices and vectors:
In plain Python they are containers (e.g., lists).
In NumPy they are arrays (ndarray).
In PyTorch they are tensors.
Matrices and vectors can be converted among these three formats,
but GPU execution is only available with PyTorch tensors (see the conversion sketch below).
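A minimal sketch of converting among the three formats and moving a tensor to the GPU; the toy data is an illustrative assumption.

import numpy as np
import torch

data = [[1.0, 2.0], [3.0, 4.0]]      # plain Python container (list of lists)
arr = np.array(data)                 # NumPy ndarray
t = torch.from_numpy(arr)            # PyTorch tensor (shares memory with arr)

back_to_numpy = t.numpy()            # tensor -> ndarray
back_to_list = t.tolist()            # tensor -> nested Python list

# Only PyTorch tensors can be moved to the GPU for computation
if torch.cuda.is_available():
    t_gpu = t.to("cuda")
    print(t_gpu.device)              # cuda:0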
Pytorch tutorial
• Pytorch Tutorials
• Welcome to PyTorch Tutorials — PyTorch Tutorials 2.6.0+cu124 documentation
21
