SlideShare a Scribd company logo
1 of 30
Download to read offline
Sequence Model
Orozco Hsu
2024-04-17
1
About me
• Education
• NCU (MIS)、NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• Research
• Data Ops (ML Ops)
• Business Data Analysis, AI
2
Tutorial
Content
3
Homework
Sequence Models
Sequence Data & Sequence Models
• Vanilla RNN (Recurrent Neural Network)
• LSTM (Long short term memory)
• GRU (Gated recurrent unit)
• Transformer Model
Pytorch tutorial
• Pytorch Tutorials
• Welcome to PyTorch Tutorials — PyTorch Tutorials 2.2.1+cu121
documentation
4
Sequence Data: Types
• Sequence Data:
• The order of elements is significant.
• It can have variable lengths. In natural language, sentences can be of different
lengths, and in genomics, DNA sequences can vary in length depending on the
organism.
5
Sequence Data: Examples
• Examples:
• Image Captioning.
6
Sequence Data: Examples
• Examples:
• Speech Signals.
7
Sequence Data: Examples
• Examples:
• Time Series Data, such as time stamped transactional data.
8
Lagged features can help you capture the patterns, trends, and seasonality
in your time series data, as well as the effects of external factors or events.
• Examples:
• Language Translation (Natural Language Text).
• Chatbot.
• Text summarization.
• Text categorization.
• Parts of speech tagging.
• Stemming.
• Text mining.
Sequence Data: Examples
9
補充
• Pytorch 模型 forward 練習
10
補充
• RNNs 輸入輸出
11
RNN — PyTorch 2.2 documentation
補充
• 激活值計算
12
Sequence Models: applications
• Sequence models are a class of machine learning models designed for
tasks that involve sequential data, where the order of elements in the
input is important.
• Model applications:
13
one to one: Fixed length input/output, a general neural network model.
one to many: Image captioning.
many to one: Sentiment analysis.
many to many: machine translation.
Sequence Model: Recurrent Neural Networks (RNNs)
• RNNs are a fundamental type of sequence model.
• They process sequences one element at a time sequentially while
maintaining an internal hidden state that stores information about
previous elements in the sequence.
• Traditional RNNs suffer from the Vanishing gradient problem, which
limits their ability to capture long-range dependencies.
14
Sequence Model: Recurrent Neural Networks (RNNs)
15
• Stacked RNNs also called Deep RNNs.
• The hidden state is responsible for
memorizing the information from the
previous timestep and using that for
further adjustment of weights in
Training a model.
16
Sequence Model: Recurrent Neural Networks (RNNs)
If your data sequence is short then don’t use more that 2-3
layers because un-necessarily extra training time may lead to
make your model un-optimized.
Vanilla_RNN02.ipynb (調整參數 (num_layers),比較收斂的結果 )
Sequence Model: Long Short-Term Memory Networks (LSTM)
• They are a type of RNNs designed to overcome the Vanishing gradient
problem.
• They introduce specialized memory cells and gating mechanisms that
allow them to capture and preserve information over long sequences.
• Gates (input, forget, and output) to regulate the flow of information.
• They usually perform better than Hidden Markov Models (HMM).
17
(梯度消失問題仍存在,只是相較 RNNs 會好一些而已)
Sequence Model: Long Short-Term Memory Networks (LSTM)
18
Pytorch_lstm_02.ipynb
Sequence Model: Gated Recurrent Units (GRUs)
• They are another variant of RNNs that are similar to LSTM but with a
simplified structure.
• They also use gating mechanisms to control the flow of information
within the network.
• Gates (rest and update) to regulate the flow of information
• They are computationally more efficient than LSTM while still being
able to capture dependencies in sequential data.
19
pytorch_gru_01.ipynb
(權重參數的數量比較小一些,所以收斂速度上會比較快)
Sequence Model: Seq2seq
20
Sequence Model: Transformer Models
• They are a more recent and highly effective architecture for sequence
modeling.
• They move away from recurrence and rely on a self-attention
mechanism to process sequences in parallel and capture long-term
dependencies in data, making them more efficient than traditional
RNNs.
• Self-attention mechanisms to weight the importance of different parts of
input data.
• They have been particularly successful in NLP tasks and have led to
models like BERT, GPT, and others.
21
Sequence Model: Transformer Models
• In the paper Attention is all you need (2017).
• It abandons traditional CNNs and RNNs and
entire network structure composed of Attention
mechanisms and FNNs.
22
https://arxiv.org/abs/1706.03762
• It includes Encoder and
Decoder.
• Bidirectional Encoder Representations from Transformers
• 使用 Encoder 進行編碼,採用 self-attention 機制在編碼 token 同時考
慮上下文的 token,上下文的意思就是雙向的意思 (bidirectional)。
• 採用 Self-Attention多頭機制提取全面性信息。
• 是一種 pre-trained模型,可視為一種特徵提取器。
• 使用 Transformer 的 Encoder 進行特徵提取,可說 Encoder 就是 BERT
模型。
23
Sequence Model: Transformer Models (BERT)
Sequence Model: Transformer Models (Encoder)
• How are you? • 你好嗎?
Tokenizer
<start> “How” “are” “you” “?” <end>
Vocabulary
How 1
are 10
you 300
? 4
Word to index mapping
d=512 Add PE
(Positional Encoding)
Multi-Head-
Attention (HMA)
In
parallel
Feed
forward
Residual
N
O
R
M
Residual
N
O
R
M
Block Block Block
vectors
Sequence Model: Multi-head-attention
• Attention mechanism:
• Select more useful information from words.
• Q, K and V are obtained by applying a linear transformation to the input word
vector x.
• Each matrix W can be learned through training.
25
Q: Information to be queried
K: Vectors being queried
V: Values obtained from the query
(多頭的意思,想像成 CNN 中多個卷積核的作用)
1. MLP/CNN的情況不同,不同層有不同的參數,因此有各自的梯度。
2. RNN權重參數是共享,最終的計算方式是梯度總和下更新;而梯度不會消失,只是距
離遠的梯度會消失而已,所以梯度被近距離的主導,缺乏遠距離的依賴關係。
3. 難以學到遠距離的依賴關係,因此需要導入 attention 機制。
Sequence Model: Transformer Models (Decoder)
26
vectors
Q,K,V
B
O
S
你 好 嗎
你 好 嗎 ?
max max max max
你 0.8
好 0
嗎 0.1
… …
END 0
Size V
(Common characters)
?
E
N
D
max
Add PE
Masked MHA
Cross Attention
Feed Forward
你 1
好 0
嗎 0
… …
… 0
Ground truth
Minimize
cross
entropy
Sequence Model: Summarized
27
Sequence Model: Transformer Models
28
Homework
• 改寫 HW02.ipynb,新增 LSTM、GRU 兩種模型,比較 RMSE
變化與您使用的參數
29
RNN Types Train Score Test Score Parameters
Vanilla RNN 4.25 7.3 hidden_size=32, seq_length=19, num_layers=1, num_epochs=500
LSTM
GRU
補充
30

More Related Content

Similar to Sequence Model pytorch at colab with gpu.pdf

Lecture 2
Lecture 2Lecture 2
Lecture 2Mr SMAK
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learningPoo Kuan Hoong
 
Anthill Talk Aditya
Anthill Talk AdityaAnthill Talk Aditya
Anthill Talk AdityaAditya Patel
 
IRJET- Survey on Text Error Detection using Deep Learning
IRJET-  	  Survey on Text Error Detection using Deep LearningIRJET-  	  Survey on Text Error Detection using Deep Learning
IRJET- Survey on Text Error Detection using Deep LearningIRJET Journal
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningDr. Ananth Krishnamoorthy
 
Deep learning fundamentals workshop
Deep learning fundamentals workshopDeep learning fundamentals workshop
Deep learning fundamentals workshopSatnam Singh
 
Parallel architecture-programming
Parallel architecture-programmingParallel architecture-programming
Parallel architecture-programmingShaveta Banda
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Intro.to RNN (Recurrent Neural Network).pdf
Intro.to RNN (Recurrent Neural Network).pdfIntro.to RNN (Recurrent Neural Network).pdf
Intro.to RNN (Recurrent Neural Network).pdfomardesoky789
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningJeff Heaton
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Impetus Technologies
 

Similar to Sequence Model pytorch at colab with gpu.pdf (20)

Lecture 2
Lecture 2Lecture 2
Lecture 2
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
DEEP LEARNING.docx
DEEP LEARNING.docxDEEP LEARNING.docx
DEEP LEARNING.docx
 
Deep learning internals
Deep learning internalsDeep learning internals
Deep learning internals
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Anthill Talk Aditya
Anthill Talk AdityaAnthill Talk Aditya
Anthill Talk Aditya
 
IRJET- Survey on Text Error Detection using Deep Learning
IRJET-  	  Survey on Text Error Detection using Deep LearningIRJET-  	  Survey on Text Error Detection using Deep Learning
IRJET- Survey on Text Error Detection using Deep Learning
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learning
 
Deep learning fundamentals workshop
Deep learning fundamentals workshopDeep learning fundamentals workshop
Deep learning fundamentals workshop
 
presentation.ppt
presentation.pptpresentation.ppt
presentation.ppt
 
Deep learning
Deep learningDeep learning
Deep learning
 
Parallel architecture-programming
Parallel architecture-programmingParallel architecture-programming
Parallel architecture-programming
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 
Machine Learning @NECST
Machine Learning @NECSTMachine Learning @NECST
Machine Learning @NECST
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Intro.to RNN (Recurrent Neural Network).pdf
Intro.to RNN (Recurrent Neural Network).pdfIntro.to RNN (Recurrent Neural Network).pdf
Intro.to RNN (Recurrent Neural Network).pdf
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
rsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morningrsec2a-2016-jheaton-morning
rsec2a-2016-jheaton-morning
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 

More from FEG

學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdfFEG
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdfFEG
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318FEG
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practicesFEG
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratchFEG
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratchFEG
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratchFEG
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_RulesFEG
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)FEG
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis VisualizationFEG
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)FEG
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)FEG
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)FEG
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised LearningFEG
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning ClusteringFEG
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in ExcelFEG
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdfFEG
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdfFEG
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdfFEG
 
3_Decision_tree.pdf
3_Decision_tree.pdf3_Decision_tree.pdf
3_Decision_tree.pdfFEG
 

More from FEG (20)

學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in Excel
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdf
 
3_Decision_tree.pdf
3_Decision_tree.pdf3_Decision_tree.pdf
3_Decision_tree.pdf
 

Recently uploaded

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 

Recently uploaded (20)

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

Sequence Model pytorch at colab with gpu.pdf

  • 2. About me • Education • NCU (MIS)、NCCU (CS) • Experiences • Telecom big data Innovation • Retail Media Network (RMN) • Customer Data Platform (CDP) • Know-your-customer (KYC) • Digital Transformation • Research • Data Ops (ML Ops) • Business Data Analysis, AI 2
  • 3. Tutorial Content 3 Homework Sequence Models Sequence Data & Sequence Models • Vanilla RNN (Recurrent Neural Network) • LSTM (Long short term memory) • GRU (Gated recurrent unit) • Transformer Model
  • 4. Pytorch tutorial • Pytorch Tutorials • Welcome to PyTorch Tutorials — PyTorch Tutorials 2.2.1+cu121 documentation 4
  • 5. Sequence Data: Types • Sequence Data: • The order of elements is significant. • It can have variable lengths. In natural language, sentences can be of different lengths, and in genomics, DNA sequences can vary in length depending on the organism. 5
  • 6. Sequence Data: Examples • Examples: • Image Captioning. 6
  • 7. Sequence Data: Examples • Examples: • Speech Signals. 7
  • 8. Sequence Data: Examples • Examples: • Time Series Data, such as time stamped transactional data. 8 Lagged features can help you capture the patterns, trends, and seasonality in your time series data, as well as the effects of external factors or events.
  • 9. • Examples: • Language Translation (Natural Language Text). • Chatbot. • Text summarization. • Text categorization. • Parts of speech tagging. • Stemming. • Text mining. Sequence Data: Examples 9
  • 10. 補充 • Pytorch 模型 forward 練習 10
  • 11. 補充 • RNNs 輸入輸出 11 RNN — PyTorch 2.2 documentation
  • 13. Sequence Models: applications • Sequence models are a class of machine learning models designed for tasks that involve sequential data, where the order of elements in the input is important. • Model applications: 13 one to one: Fixed length input/output, a general neural network model. one to many: Image captioning. many to one: Sentiment analysis. many to many: machine translation.
  • 14. Sequence Model: Recurrent Neural Networks (RNNs) • RNNs are a fundamental type of sequence model. • They process sequences one element at a time sequentially while maintaining an internal hidden state that stores information about previous elements in the sequence. • Traditional RNNs suffer from the Vanishing gradient problem, which limits their ability to capture long-range dependencies. 14
  • 15. Sequence Model: Recurrent Neural Networks (RNNs) 15
  • 16. • Stacked RNNs also called Deep RNNs. • The hidden state is responsible for memorizing the information from the previous timestep and using that for further adjustment of weights in Training a model. 16 Sequence Model: Recurrent Neural Networks (RNNs) If your data sequence is short then don’t use more that 2-3 layers because un-necessarily extra training time may lead to make your model un-optimized. Vanilla_RNN02.ipynb (調整參數 (num_layers),比較收斂的結果 )
  • 17. Sequence Model: Long Short-Term Memory Networks (LSTM) • They are a type of RNNs designed to overcome the Vanishing gradient problem. • They introduce specialized memory cells and gating mechanisms that allow them to capture and preserve information over long sequences. • Gates (input, forget, and output) to regulate the flow of information. • They usually perform better than Hidden Markov Models (HMM). 17 (梯度消失問題仍存在,只是相較 RNNs 會好一些而已)
  • 18. Sequence Model: Long Short-Term Memory Networks (LSTM) 18 Pytorch_lstm_02.ipynb
  • 19. Sequence Model: Gated Recurrent Units (GRUs) • They are another variant of RNNs that are similar to LSTM but with a simplified structure. • They also use gating mechanisms to control the flow of information within the network. • Gates (rest and update) to regulate the flow of information • They are computationally more efficient than LSTM while still being able to capture dependencies in sequential data. 19 pytorch_gru_01.ipynb (權重參數的數量比較小一些,所以收斂速度上會比較快)
  • 21. Sequence Model: Transformer Models • They are a more recent and highly effective architecture for sequence modeling. • They move away from recurrence and rely on a self-attention mechanism to process sequences in parallel and capture long-term dependencies in data, making them more efficient than traditional RNNs. • Self-attention mechanisms to weight the importance of different parts of input data. • They have been particularly successful in NLP tasks and have led to models like BERT, GPT, and others. 21
  • 22. Sequence Model: Transformer Models • In the paper Attention is all you need (2017). • It abandons traditional CNNs and RNNs and entire network structure composed of Attention mechanisms and FNNs. 22 https://arxiv.org/abs/1706.03762 • It includes Encoder and Decoder.
  • 23. • Bidirectional Encoder Representations from Transformers • 使用 Encoder 進行編碼,採用 self-attention 機制在編碼 token 同時考 慮上下文的 token,上下文的意思就是雙向的意思 (bidirectional)。 • 採用 Self-Attention多頭機制提取全面性信息。 • 是一種 pre-trained模型,可視為一種特徵提取器。 • 使用 Transformer 的 Encoder 進行特徵提取,可說 Encoder 就是 BERT 模型。 23 Sequence Model: Transformer Models (BERT)
  • 24. Sequence Model: Transformer Models (Encoder) • How are you? • 你好嗎? Tokenizer <start> “How” “are” “you” “?” <end> Vocabulary How 1 are 10 you 300 ? 4 Word to index mapping d=512 Add PE (Positional Encoding) Multi-Head- Attention (HMA) In parallel Feed forward Residual N O R M Residual N O R M Block Block Block vectors
  • 25. Sequence Model: Multi-head-attention • Attention mechanism: • Select more useful information from words. • Q, K and V are obtained by applying a linear transformation to the input word vector x. • Each matrix W can be learned through training. 25 Q: Information to be queried K: Vectors being queried V: Values obtained from the query (多頭的意思,想像成 CNN 中多個卷積核的作用) 1. MLP/CNN的情況不同,不同層有不同的參數,因此有各自的梯度。 2. RNN權重參數是共享,最終的計算方式是梯度總和下更新;而梯度不會消失,只是距 離遠的梯度會消失而已,所以梯度被近距離的主導,缺乏遠距離的依賴關係。 3. 難以學到遠距離的依賴關係,因此需要導入 attention 機制。
  • 26. Sequence Model: Transformer Models (Decoder) 26 vectors Q,K,V B O S 你 好 嗎 你 好 嗎 ? max max max max 你 0.8 好 0 嗎 0.1 … … END 0 Size V (Common characters) ? E N D max Add PE Masked MHA Cross Attention Feed Forward 你 1 好 0 嗎 0 … … … 0 Ground truth Minimize cross entropy
  • 29. Homework • 改寫 HW02.ipynb,新增 LSTM、GRU 兩種模型,比較 RMSE 變化與您使用的參數 29 RNN Types Train Score Test Score Parameters Vanilla RNN 4.25 7.3 hidden_size=32, seq_length=19, num_layers=1, num_epochs=500 LSTM GRU