Copyright © 2019-2021 JianKai Wang
Deep Learning to Text
JianKai Wang, 王建凱
https://jiankaiwang.no-ip.biz/
https://www.linkedin.com/in/wangjiankai/
https://github.com/jiankaiwang/sophia
from RNN to Transformer
Tasks on Text
Input Sequence → Model → one of:
● Classification
● Extraction
● Output Sequence
● Part of the input to a bigger model
● …
What is Seq2Seq, and why use it?
Input Sequence → Seq2Seq Model → Output Sequence
● [NMT-From] How are you? → [NMT-To] 你好嗎?
● [Q&A-Questions] Where is the application form? → [Q&A-Answers] Under the doc button.
● [Text Header] About seq2seq, he said: → [Text Generation] he said: It is awesome.
● [History Values] [0.98, 0.97, 0.96] → [Multistep Values] [0.95, 0.94, 0.93]
● ...
Goals
● Train a model that takes an input sequence and outputs a corresponding sequence.
● Depending on what each output step produces:
○ Categories: the task is classification.
■ Character-based (alphabet-based) outputs
■ Token-based outputs: subwords (Tensor → Ten + sor) or whole words (Tensor)
○ Numbers: the task is regression.
○ Feature vectors: the task can be either of the above.
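The character, subword, and word granularities above decide what the output categories are. The pure-Python sketch below shows all three. The subword split uses a greedy longest-match rule over a tiny hypothetical vocabulary (`SUBWORD_VOCAB`), chosen only to reproduce "Ten + sor". Real subword tokenizers such as WordPiece or BPE learn their vocabularies from data and differ in the details.

```python
# Hypothetical subword vocabulary, chosen only to reproduce the slide's example.
SUBWORD_VOCAB = {"Ten", "sor"}


def char_tokens(word):
    """Character-based: every character is one output category."""
    return list(word)


def word_tokens(text):
    """Word-based: split on whitespace; each whole word is one category."""
    return text.split()


def subword_tokens(word, vocab):
    """Greedy longest-match-first split of `word` into pieces from `vocab`.

    Falls back to a single character when no vocabulary entry matches.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary (or one char long).
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


print(char_tokens("Tensor"))
print(subword_tokens("Tensor", SUBWORD_VOCAB))
print(word_tokens("Tensor"))
```

The vocabulary size, and therefore the number of classes, grows from characters to subwords to words.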
Backbone of Seq2Seq
● Encoder: reads the input <S> How are you ? <E> and compresses it into an intermediate context.
● Decoder: starts from <S> and, conditioned on that context, generates 你 好 嗎 ? <E> one token at a time.
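One common way to train such a decoder is teacher forcing. The decoder input is the ground-truth target shifted right by one step, so at each step it sees the correct previous token and learns to predict the next one. The sketch below assumes the <S>/<E> markers from the slide; `make_decoder_pair` is a hypothetical helper, not code from the deck.

```python
def make_decoder_pair(target_tokens, start="<S>", end="<E>"):
    """Build (decoder_input, decoder_target) for teacher forcing.

    decoder_input is the target shifted right by one step (prefixed with the
    start token); decoder_target is the target followed by the end token.
    At step t the decoder sees decoder_input[: t + 1] and is trained to
    predict decoder_target[t].
    """
    decoder_input = [start] + list(target_tokens)
    decoder_target = list(target_tokens) + [end]
    return decoder_input, decoder_target


dec_in, dec_out = make_decoder_pair(["你", "好", "嗎", "?"])
for seen, predicted in zip(dec_in, dec_out):
    print(f"{seen} -> {predicted}")
```

At inference time there is no ground truth. The decoder instead feeds its own prediction back in as the next input until it emits <E>.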
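Time Series using RNN Cells
● Standard (folded) form: an RNN cell reads the input x_i and produces the hidden state h_i, which is fed back into the same cell.
● Unfolded form: h0 → RNN(x1) → h1 → RNN(x2) → h2 → … → RNN(xn) → hn.
● The same weights are shared at every step: U maps the input x_i, and W maps the previous hidden state.
Notebook: From_RNN_to_LSTM.ipynb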
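The unfolded diagram corresponds to the recurrence h_t = tanh(U·x_t + W·h_{t-1} + b). The NumPy sketch below uses that standard convention, with U as input-to-hidden and W as hidden-to-hidden. The bias b and all sizes are assumptions for illustration, not values from the notebook.

```python
import numpy as np


def rnn_forward(xs, U, W, b, h0):
    """Unrolled vanilla RNN: h_t = tanh(U @ x_t + W @ h_{t-1} + b).

    The same U and W are reused at every time step; that weight sharing is
    what the folded diagram shows. Returns every hidden state, shape
    (n_steps, hidden_size).
    """
    h = h0
    hs = []
    for x in xs:  # xs has shape (n_steps, input_size)
        h = np.tanh(U @ x + W @ h + b)
        hs.append(h)
    return np.stack(hs)


rng = np.random.default_rng(0)
input_size, hidden_size, n_steps = 3, 4, 5
U = rng.normal(scale=0.5, size=(hidden_size, input_size))
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
xs = rng.normal(size=(n_steps, input_size))
hs = rnn_forward(xs, U, W, b, h0=np.zeros(hidden_size))
print(hs.shape)  # (5, 4)
```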
RNN Issues
● Exploding gradients: mitigated by gradient clipping.
● Vanishing gradients: mitigated by gated cells (LSTM, GRU).
Notebook: From_RNN_to_LSTM.ipynb
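Gradient clipping by global norm rescales all gradients together whenever their combined L2 norm exceeds a threshold. This keeps the update direction but bounds its size. TensorFlow ships this as `tf.clip_by_global_norm`. The NumPy sketch below only illustrates the idea and is not that function's implementation. Clipping addresses exploding gradients only; vanishing gradients need the gated cells.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm <= max_norm.

    Returns (clipped_grads, global_norm). If the global norm is already within
    the limit, gradients are returned unchanged; otherwise every gradient is
    multiplied by max_norm / global_norm.
    """
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm


grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]  # global norm = 5
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)
print(clipped)
```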
Gated RNN Cell
Diagrams: RNN cell, LSTM cell, GRU cell, …
Notebook: From_RNN_to_LSTM.ipynb
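As an example of a gated cell, here is a NumPy sketch of one step of a standard LSTM. The parameters of the input gate i, forget gate f, candidate g, and output gate o are stacked into single matrices. That gate order is an arbitrary choice for this sketch and is not necessarily the layout Keras uses. The additive cell-state update c = f·c_prev + i·g is what lets gradients flow across many steps. A GRU uses a similar idea with fewer gates; it is not shown here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One step of a standard LSTM cell.

    Wx: (4*H, D), Wh: (4*H, H), b: (4*H,) stack the parameters of the input
    gate, forget gate, candidate, and output gate (in that order).
    """
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b
    i = sigmoid(z[0:H])          # how much new information to write
    f = sigmoid(z[H:2 * H])      # how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate values
    o = sigmoid(z[3 * H:4 * H])  # how much of the cell state to expose
    c = f * c_prev + i * g       # additive update eases gradient flow
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
D, H = 3, 2
Wx = rng.normal(size=(4 * H, D))
Wh = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # five time steps
    h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape, c.shape)
```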
Bidirectional Model
Notebook: From_RNN_to_LSTM.ipynb
Roadmap: RNN Cell → Gated Cell (LSTM, GRU) → Bidirectional models → Attention models
● LSTM + Bidirectional
● Embedding: an m × n matrix
○ m: the encoder's vocabulary size, encoder.vocabulary.size()
○ n: the number of representative features per token
● Encoder with decoder
● Bidirectional: process the sequence in both the forward and the backward direction, then concatenate both results
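The embedding lookup and the forward/backward concatenation can be sketched in NumPy. A simple tanh RNN stands in for the LSTM to keep the sketch short. The token ids and sizes are hypothetical. Keras's `Bidirectional` wrapper also concatenates by default (`merge_mode="concat"`).

```python
import numpy as np


def simple_rnn(xs, U, W):
    """Vanilla RNN over xs (T, n); returns all hidden states (T, H)."""
    h = np.zeros(W.shape[0])
    out = []
    for x in xs:
        h = np.tanh(U @ x + W @ h)
        out.append(h)
    return np.stack(out)


def bidirectional(xs, fwd_params, bwd_params):
    """Run one RNN forward and another over the reversed sequence, flip the
    backward outputs so time steps line up, then concatenate along the
    feature axis. Output shape: (T, 2H)."""
    forward = simple_rnn(xs, *fwd_params)
    backward = simple_rnn(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(0)
m, n, H = 10, 4, 3                   # vocab size, embedding features, hidden size
embedding = rng.normal(size=(m, n))  # one row of n features per token id
token_ids = np.array([2, 7, 5, 3])   # hypothetical token ids
xs = embedding[token_ids]            # (T, n) embedding lookup
fwd = (rng.normal(size=(H, n)), rng.normal(size=(H, H)))
bwd = (rng.normal(size=(H, n)), rng.normal(size=(H, H)))
out = bidirectional(xs, fwd, bwd)
print(out.shape)  # (4, 6)
```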
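More Stacks
● Different types of models (figure from Andrej Karpathy's blog, 2015)
● Bidirectional model with more stacks. A recurrent layer that feeds another recurrent layer must return the whole sequence, and return_sequences belongs to the LSTM, not to the Bidirectional wrapper:
tf.keras.layers.Bidirectional(
    layer=tf.keras.layers.LSTM(..., return_sequences=True),
)
Notebook: From_RNN_to_LSTM.ipynb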
It’s your turn! https://github.com/jiankaiwang/sophia
Seq2Seq: how does attention work?
Encoder and decoder (each sequence begins with <start>; the decoder output ends with <end>)
(1) Weighted?
(2) Return sequence?
(3) Reference to the encoder?
(4) Predictive position?
Additive and Multiplicative Attention
Notebook: TF2Keras_NMT_Attention.ipynb
Dzmitry Bahdanau, et al. (2014)
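The two families differ only in how they score a decoder state s against each encoder output h_j. Additive (Bahdanau-style) scoring uses v·tanh(W1·s + W2·h_j). Multiplicative (Luong "general"-style) scoring uses s·(Wa·h_j). The NumPy sketch below shows only the score functions and the weighted sum. It assumes the encoder returns its full output sequence, and all weights and sizes are random placeholders.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def additive_scores(s, H, W1, W2, v):
    """Additive: score_j = v . tanh(W1 s + W2 h_j), for every row h_j of H."""
    return np.tanh(s @ W1.T + H @ W2.T) @ v


def multiplicative_scores(s, H, Wa):
    """Multiplicative ('general'): score_j = s . (Wa h_j)."""
    return (H @ Wa.T) @ s


def attend(scores, H):
    """Turn scores into weights and take the weighted sum of encoder states."""
    weights = softmax(scores)
    return weights @ H, weights


rng = np.random.default_rng(0)
T, d, a = 5, 4, 6            # source length, state size, attention size
H = rng.normal(size=(T, d))  # encoder outputs h_1..h_T
s = rng.normal(size=d)       # current decoder state
W1, W2, v = rng.normal(size=(a, d)), rng.normal(size=(a, d)), rng.normal(size=a)
Wa = rng.normal(size=(d, d))
ctx_add, w_add = attend(additive_scores(s, H, W1, W2, v), H)
ctx_mul, w_mul = attend(multiplicative_scores(s, H, Wa), H)
print(ctx_add.shape, ctx_mul.shape)
```

With Wa set to the identity, the multiplicative score reduces to a plain dot product, which is Luong's "dot" variant.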
It’s your turn! https://github.com/jiankaiwang/sophia
Global and Local Attention Model
Minh-Thang Luong, et al. (2015)
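Global attention scores every source position. Local attention restricts the softmax to a window of width 2D + 1 around an aligned position p_t. The NumPy sketch below shows the simplest local variant, "local-m", which assumes monotonic alignment (p_t = t). Dot-product scoring is used for brevity. Luong et al. also describe a predictive "local-p" variant with Gaussian weighting, which is not shown here.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def global_attention(h_t, H):
    """Global: align the target state with every source state."""
    weights = softmax(H @ h_t)
    return weights @ H, weights


def local_m_attention(h_t, H, t, D):
    """Local-m: assume the aligned source position is p_t = t and attend only
    to the window [t - D, t + D], clipped to the sequence. Positions outside
    the window receive zero weight."""
    lo, hi = max(0, t - D), min(len(H), t + D + 1)
    weights = np.zeros(len(H))
    weights[lo:hi] = softmax(H[lo:hi] @ h_t)
    return weights @ H, weights


rng = np.random.default_rng(0)
H = rng.normal(size=(8, 4))  # 8 source states
h_t = rng.normal(size=4)     # decoder state at target step t
g_ctx, g_w = global_attention(h_t, H)
l_ctx, l_w = local_m_attention(h_t, H, t=2, D=1)
print(np.round(l_w, 3))
```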
Here comes Transformer!
Roadmap so far: RNN Cell → Gated Cell (LSTM, GRU) → Bidirectional models → Attention models
Input Sequence → Encoder → Decoder → Output Sequence
(1) Why RNN? Could another unit replace it?
(2) Is the encoder-decoder model still needed?
Positional Encoding
● The positional encoding is added to the embedding vector.
● Not strictly necessary, but recommended for text and time-series tasks: attention by itself ignores the order of its inputs, and in these tasks order matters.
Notebook: TF2Keras_Transformer_LanguageUnderstanding.ipynb
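The sinusoidal encoding from Vaswani et al. (2017) is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The NumPy sketch below builds that matrix and adds it to a batch of random stand-in embeddings. The sizes are placeholders, and the notebook's exact code may differ.

```python
import numpy as np


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding, shape (max_len, d_model)."""
    pos = np.arange(max_len)[:, np.newaxis]  # (max_len, 1)
    i = np.arange(d_model)[np.newaxis, :]    # (1, d_model)
    # Dimensions 2i and 2i+1 share the same frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates               # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])    # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])    # odd dimensions: cosine
    return pe


seq_len, d_model = 10, 16
embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
pe = positional_encoding(seq_len, d_model)
x = embeddings + pe  # added to the embedding, not concatenated
```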
Masking
Notebook: TF2Keras_Transformer_LanguageUnderstanding.ipynb
Padding mask
Input sequences:
[[2, 7, 36, 25, 14, 3, 0, 0, 0],
 [2, 19, 20, 13, 7, 9, 3, 0, 0],
 …]
Padding mask:
[[0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1],
 …]
● Mask all pad tokens in the batch of sequences.
● This ensures the model does not attend to the padded tokens, so they do not affect training.
Look-ahead mask
[[0, 1, 1, 1, 1],
 [0, 0, 1, 1, 1],
 [0, 0, 0, 1, 1],
 [0, 0, 0, 0, 1],
 [0, 0, 0, 0, 0]]
● Mask all future tokens during training.
● Each token is predicted from the previous tokens only.
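Both masks can be generated directly from the data, as in the NumPy sketch below (1 = masked, matching the slide). In the decoder, the two are typically combined with an element-wise maximum so that a position is hidden if it is either padding or in the future. The TensorFlow tutorial code additionally reshapes the padding mask so that it broadcasts over attention heads; that step is omitted here.

```python
import numpy as np


def create_padding_mask(seq, pad_id=0):
    """1 where the token is padding (masked out), 0 where it is real."""
    return (np.asarray(seq) == pad_id).astype(np.float32)


def create_look_ahead_mask(size):
    """1 above the diagonal: position i may not attend to positions j > i."""
    return np.triu(np.ones((size, size), dtype=np.float32), k=1)


batch = [[2, 7, 36, 25, 14, 3, 0, 0, 0],
         [2, 19, 20, 13, 7, 9, 3, 0, 0]]
padding = create_padding_mask(batch)        # (batch, seq_len)
look_ahead = create_look_ahead_mask(9)      # (seq_len, seq_len)
# Decoder self-attention: hide future tokens and padded tokens.
combined = np.maximum(look_ahead, padding[:, np.newaxis, :])  # (batch, 9, 9)
print(padding.astype(int))
print(create_look_ahead_mask(5).astype(int))
```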
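Scaled dot-product attention
Notebook: TF2Keras_Transformer_LanguageUnderstanding.ipynb
Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k) · V, with masked positions set to a large negative value before the softmax
● Query (Q): in encoder-decoder attention, the output of the decoder's previous attention layer.
● Key (K): the encoder output.
● Value (V): the encoder output (the same tensor as K).
● In self-attention, Q, K, and V all come from the same previous layer.
● Mask: the padding mask or the look-ahead mask.
● Scale: dividing by √d_k keeps large dot products from pushing the softmax into regions with extremely small gradients.
● The softmax turns the scaled Q·Kᵀ scores into attention weights, which are then applied to V.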
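The NumPy sketch below implements the formula above. It follows a common convention of adding mask × -1e9 to the logits, so that masked positions receive effectively zero weight after the softmax. Shapes follow the docstring, and the inputs are random placeholders.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k) + mask * -1e9) V.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v)
    mask: broadcastable to (..., len_q, len_k); 1 marks positions to hide.
    Returns (output, weights).
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask * -1e9  # masked logits -> ~0 after softmax
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))    # 3 queries
k = rng.normal(size=(4, 8))    # 4 keys
v = rng.normal(size=(4, 5))    # 4 values
pad_mask = np.array([0.0, 0.0, 1.0, 1.0])  # last two keys are padding
out, w = scaled_dot_product_attention(q, k, v, pad_mask)
print(out.shape, np.round(w, 3))
```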
Multi-head attention and feed-forward network
● Consists of h scaled dot-product attention heads run in parallel.
● V, K, and Q are each passed through a linear layer first.
● In the example code, the model dimension is split across the h heads (h in the diagram), e.g., 512 (dimensions) / 8 (heads) = 64 (depth per head).
● The results of the heads are concatenated back together with a transpose and a reshape, which undo the split, and then passed through a final linear layer.
● A position-wise feed-forward network (FFN) follows the multi-head attention and further transforms each position's output.
● The output of the encoder's FFN plays a role similar to the hidden state passed to the decoder in RNN-based seq2seq models.
Notebook: TF2Keras_Transformer_LanguageUnderstanding.ipynb
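The split and concatenation steps are pure reshaping, so they can be checked in isolation. The NumPy sketch below uses the 512 / 8 = 64 example. It omits the Q/K/V and output linear layers and adds a minimal position-wise FFN. The FFN's inner size of 2048 follows Vaswani et al. (2017), and its weights here are random placeholders.

```python
import numpy as np


def split_heads(x, num_heads):
    """(batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)."""
    batch, seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    depth = d_model // num_heads
    x = x.reshape(batch, seq_len, num_heads, depth)
    return x.transpose(0, 2, 1, 3)


def combine_heads(x):
    """(batch, num_heads, seq_len, depth) -> (batch, seq_len, d_model)."""
    batch, num_heads, seq_len, depth = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, seq_len, num_heads * depth)


def point_wise_ffn(x, W1, b1, W2, b2):
    """Same two-layer network applied independently at every position."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2  # ReLU between the layers


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 10, 512))  # (batch, seq_len, d_model)
heads = split_heads(x, num_heads=8)
print(heads.shape)  # (2, 8, 10, 64)
W1, b1 = rng.normal(scale=0.02, size=(512, 2048)), np.zeros(2048)
W2, b2 = rng.normal(scale=0.02, size=(2048, 512)), np.zeros(512)
y = point_wise_ffn(combine_heads(heads), W1, b1, W2, b2)
```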
Transformer with Encoder and Decoder
● The decoder's second multi-head attention layer uses the padding mask.
● It receives the encoder output as the key (K) and the value (V).
● It receives the output of the decoder's first (masked) attention layer as the query (Q).
● Final output shape: [batch_size, dec_input, dec_vocab_size], where dec_input is the length of the decoder input sequence.
Notebook: TF2Keras_Transformer_LanguageUnderstanding.ipynb
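The NumPy sketch below traces the shapes through the decoder's encoder-decoder attention and the final linear layer. Q comes from the decoder, K and V come from the encoder output, and padded source tokens are masked. It uses a single head with no learned projections, and all sizes are hypothetical placeholders. It is meant only to show why the output ends up as [batch_size, dec_input, dec_vocab_size].

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(dec_q, enc_out, enc_padding_mask):
    """Q from the decoder; K = V = encoder output.

    dec_q: (batch, tgt_len, d_model), enc_out: (batch, src_len, d_model),
    enc_padding_mask: (batch, src_len), 1 marks padded source tokens.
    """
    d = dec_q.shape[-1]
    scores = dec_q @ enc_out.transpose(0, 2, 1) / np.sqrt(d)  # (batch, tgt, src)
    scores = scores + enc_padding_mask[:, np.newaxis, :] * -1e9
    return softmax(scores) @ enc_out                          # (batch, tgt, d_model)


rng = np.random.default_rng(0)
batch, src_len, tgt_len, d_model, dec_vocab_size = 2, 9, 5, 16, 30
enc_out = rng.normal(size=(batch, src_len, d_model))
dec_q = rng.normal(size=(batch, tgt_len, d_model))  # from the first (masked) attention
pad = np.zeros((batch, src_len))
pad[0, 6:] = 1.0
pad[1, 7:] = 1.0
attended = cross_attention(dec_q, enc_out, pad)
W_out = rng.normal(size=(d_model, dec_vocab_size))  # final linear layer
logits = attended @ W_out
print(logits.shape)  # (2, 5, 30)
```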
It’s your turn! https://github.com/jiankaiwang/sophia
BERT Input Representation
BERT (Bidirectional Encoder Representations from Transformers)
References
● H. Sak, et al. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. INTERSPEECH.
● Dzmitry Bahdanau, et al. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
● Minh-Thang Luong, et al. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv:1508.04025.
● Ashish Vaswani, et al. (2017). Attention Is All You Need. arXiv:1706.03762v5.
● Jacob Devlin, et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
● TensorFlow documentation. https://www.tensorflow.org/ (retrieved 2022).
Take-Home Message
● From RNN and LSTM to attention and the Transformer: this is the path deep learning on text has taken.
● Studying this earlier research shows how a unit or a layer in a model can be enhanced.
JianKai Wang, 王建凱
gljankai@gmail.com
https://jiankaiwang.no-ip.biz/
https://www.linkedin.com/in/wangjiankai/

More Related Content

Similar to Deep Learning to Text

Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
Ferdin Joe John Joseph PhD
 
secure lazy binding, and the 64bit time_t development process by Philip Guenther
secure lazy binding, and the 64bit time_t development process by Philip Guenthersecure lazy binding, and the 64bit time_t development process by Philip Guenther
secure lazy binding, and the 64bit time_t development process by Philip Guenther
eurobsdcon
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Databricks
 
BALAJI K _Resume
BALAJI K _ResumeBALAJI K _Resume
BALAJI K _ResumeBalaji K
 
High Performance FPGA Based Decimal-to-Binary Conversion Schemes
High Performance FPGA Based Decimal-to-Binary Conversion SchemesHigh Performance FPGA Based Decimal-to-Binary Conversion Schemes
High Performance FPGA Based Decimal-to-Binary Conversion Schemes
Silicon Mentor
 
IRJET - From C to JAVA
IRJET -  	  From C to JAVAIRJET -  	  From C to JAVA
IRJET - From C to JAVA
IRJET Journal
 
Lte epc trial experience
Lte epc trial experienceLte epc trial experience
Lte epc trial experience
Hussien Mahmoud
 
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and NoteworthyJava EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
Peter Pilgrim
 
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.PilgrimJavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
Payara
 
2011.02.18 marco parenzan - case study. conversione di una applicazione for...
2011.02.18   marco parenzan - case study. conversione di una applicazione for...2011.02.18   marco parenzan - case study. conversione di una applicazione for...
2011.02.18 marco parenzan - case study. conversione di una applicazione for...
Marco Parenzan
 
CSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.pptCSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.ppt
RashedurRahman18
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
Mark Stoodley
 
C Interview Basic Q&A- 1
C Interview Basic Q&A- 1C Interview Basic Q&A- 1
C Interview Basic Q&A- 1
Jyoti Rawat
 
Duel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & SkiaDuel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & Skia
Samsung Open Source Group
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging System
Melissa Luster
 
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On CloudSecure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
IRJET Journal
 
CDK - The next big thing - Quang Phuong
CDK - The next big thing - Quang PhuongCDK - The next big thing - Quang Phuong
CDK - The next big thing - Quang Phuong
Vietnam Open Infrastructure User Group
 
Dynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdfDynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdf
AgneshShetty
 
How to Use OpenGL/ES on Native Activity
How to Use OpenGL/ES on Native ActivityHow to Use OpenGL/ES on Native Activity

Similar to Deep Learning to Text (20)

Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
 
secure lazy binding, and the 64bit time_t development process by Philip Guenther
secure lazy binding, and the 64bit time_t development process by Philip Guenthersecure lazy binding, and the 64bit time_t development process by Philip Guenther
secure lazy binding, and the 64bit time_t development process by Philip Guenther
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
 
BALAJI K _Resume
BALAJI K _ResumeBALAJI K _Resume
BALAJI K _Resume
 
High Performance FPGA Based Decimal-to-Binary Conversion Schemes
High Performance FPGA Based Decimal-to-Binary Conversion SchemesHigh Performance FPGA Based Decimal-to-Binary Conversion Schemes
High Performance FPGA Based Decimal-to-Binary Conversion Schemes
 
IRJET - From C to JAVA
IRJET -  	  From C to JAVAIRJET -  	  From C to JAVA
IRJET - From C to JAVA
 
Lte epc trial experience
Lte epc trial experienceLte epc trial experience
Lte epc trial experience
 
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and NoteworthyJava EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
Java EE & Glass Fish User Group: Digital JavaEE 7 - New and Noteworthy
 
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.PilgrimJavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
JavaEE & GlassFish UG - Digital JavaEE 7 New & Noteworthy by P.Pilgrim
 
2011.02.18 marco parenzan - case study. conversione di una applicazione for...
2011.02.18   marco parenzan - case study. conversione di una applicazione for...2011.02.18   marco parenzan - case study. conversione di una applicazione for...
2011.02.18 marco parenzan - case study. conversione di una applicazione for...
 
CSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.pptCSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.ppt
 
verification resume
verification resumeverification resume
verification resume
 
Oh the compilers you'll build
Oh the compilers you'll buildOh the compilers you'll build
Oh the compilers you'll build
 
C Interview Basic Q&A- 1
C Interview Basic Q&A- 1C Interview Basic Q&A- 1
C Interview Basic Q&A- 1
 
Duel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & SkiaDuel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & Skia
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging System
 
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On CloudSecure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
Secure Text Transfer Using Diffie-Hellman Key Exchange Based On Cloud
 
CDK - The next big thing - Quang Phuong
CDK - The next big thing - Quang PhuongCDK - The next big thing - Quang Phuong
CDK - The next big thing - Quang Phuong
 
Dynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdfDynamic sorting algorithm vizualizer.pdf
Dynamic sorting algorithm vizualizer.pdf
 
How to Use OpenGL/ES on Native Activity
How to Use OpenGL/ES on Native ActivityHow to Use OpenGL/ES on Native Activity
How to Use OpenGL/ES on Native Activity
 

More from Jian-Kai Wang

Kubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and ServicesKubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and Services
Jian-Kai Wang
 
Tools for the Reality Technology (實境技術工具介紹)
Tools for the Reality Technology (實境技術工具介紹)Tools for the Reality Technology (實境技術工具介紹)
Tools for the Reality Technology (實境技術工具介紹)
Jian-Kai Wang
 
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
Jian-Kai Wang
 
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
Jian-Kai Wang
 
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
Jian-Kai Wang
 
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
Jian-Kai Wang
 
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
Jian-Kai Wang
 
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
Jian-Kai Wang
 
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
Jian-Kai Wang
 
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
Jian-Kai Wang
 
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
Jian-Kai Wang
 

More from Jian-Kai Wang (11)

Kubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and ServicesKubernetes Basis: Pods, Deployments, and Services
Kubernetes Basis: Pods, Deployments, and Services
 
Tools for the Reality Technology (實境技術工具介紹)
Tools for the Reality Technology (實境技術工具介紹)Tools for the Reality Technology (實境技術工具介紹)
Tools for the Reality Technology (實境技術工具介紹)
 
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
Tensorflow Extended: 端至端機器學習框架: 從概念到實作 (Tensorflow Extended: An end-to-end ML...
 
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
從圖像辨識到物件偵測,進階的圖影像人工智慧 (From Image Classification to Object Detection, Advance...
 
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
使用 Keras, Tensorflow 進行分散式訓練初探 (Distributed Training in Keras and Tensorflow)
 
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
2017 更新版 : 使用 Power BI 資料分析工具於傳染病應用 (Power BI Platform for Communicable Disea...
 
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
自動化資料準備供分析與視覺化應用 : 理論與實作 (automatic data preparation for data analyzing and v...
 
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
自動化系統建立 : 理論與實作 (Automatic Manufacturing System in Data Analysis)
 
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
CKAN : 資料開放平台技術介紹 (CAKN : Technical Introduction to Open Data Portal)
 
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
疾病管制署資料開放平台介紹 (Introduction to Taiwan Centers for Disease Control Open Data P...
 
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
Power BI 工具於傳染病應用 (Power BI Platform for Communicable Diseases)
 

Recently uploaded

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Deep Learning to Text

  • 1. Copyright © 2019-2021 JianKai Wang Deep Learning to Text JianKai Wang, 王建凱 https://jiankaiwang.no-ip.biz/ https://www.linkedin.com/in/wangjiankai/ https://github.com/jiankaiwang/sophia from RNN to Transformer
  • 2. Copyright © 2019-2021 JianKai Wang Tasks on Text 2 Model Input Sequence ● Classification ● Extraction ● Output Sequence ● Parts of Inputs for the bigger model ● …
  • 3. What is Seq2Seq, and why use it? A Seq2Seq model maps an input sequence to an output sequence, for example: ● [NMT] "How are you?" → "你好嗎?" (the same question in Chinese) ● [Q&A] question "Where is the application form?" → answer "Under the doc button." ● [Text generation] header "About seq2seq, he said:" → "he said: It is awesome." ● [Multistep forecasting] history values [0.98, 0.97, 0.96, ...] → future values [0.95, 0.94, 0.93, ...]
  • 4. Goals: ● Train a model that takes an input sequence and outputs a corresponding sequence. ● If the output sequence is meant to: ○ select categories, the task is classification, either at the character level (alphabet-based) or at the token level, using subwords (Tensor: Ten + sor) or whole words (Tensor); ○ predict numbers, the task is regression; ○ generate feature vectors, the task can be either of the above.
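The three token granularities on this slide can be contrasted with a small pure-Python sketch. The subword splitter is a simplified greedy longest-match scheme (in the spirit of WordPiece, but without its continuation markers), and VOCAB is a hypothetical two-entry vocabulary chosen only to reproduce the "Ten + sor" split; real subword vocabularies are learned from data.

```python
def char_tokenize(text):
    """Character-based tokens: every character is one category."""
    return list(text)


def word_tokenize(text):
    """Word-based tokens: split on whitespace."""
    return text.split()


def subword_tokenize(word, vocab):
    """Greedy longest-match subword split (simplified, WordPiece-like).

    Falls back to a single character when no vocabulary entry matches.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one char long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen to reproduce the slide's split.
VOCAB = {"Ten", "sor"}

print(char_tokenize("Tensor"))            # ['T', 'e', 'n', 's', 'o', 'r']
print(subword_tokenize("Tensor", VOCAB))  # ['Ten', 'sor']
print(word_tokenize("Tensor flows"))      # ['Tensor', 'flows']
```

The trade-off: characters give a tiny output vocabulary but long sequences, words give short sequences but a huge vocabulary, and subwords sit in between.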
  • 5. Backbone of Seq2Seq: the encoder reads "<S> How are you ? <E>" and compresses it into an intermediate context; the decoder starts from "<S>" and, conditioned on that context, generates "你 好 嗎 ? <E>" one token at a time.
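During training, the decoder in this backbone is commonly fed the target sequence shifted by one position (teacher forcing; an assumption here, since the slide does not name a training scheme). The sketch below only builds those input/output pairs for the slide's target "<S> 你 好 嗎 ? <E>" ("How are you?" in Chinese); the encoder and the context are not modelled.

```python
TARGET = ["<S>", "你", "好", "嗎", "?", "<E>"]


def teacher_forcing_pair(target):
    """Split a target sequence into decoder input and expected decoder output.

    At step t the decoder sees target[t] and is trained to emit target[t + 1].
    """
    decoder_input = target[:-1]   # everything except the final <E>
    decoder_output = target[1:]   # everything except the leading <S>
    return decoder_input, decoder_output


dec_in, dec_out = teacher_forcing_pair(TARGET)
for seen, expected in zip(dec_in, dec_out):
    print(f"given {seen} -> predict {expected}")
```

At inference time there is no target to shift, so the decoder feeds its own previous prediction back in until it emits <E>.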
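  • 6. Time Series using RNN Cells: in the standard (rolled) view, one RNN cell reads input X_i and produces hidden state h_i. Unfolded over time, the cell starts from h_0, reads X_1 to produce h_1, reads X_2 to produce h_2, and so on up to h_n after X_n. The same weights U and W are shared across all steps. (From_RNN_to_LSTM.ipynb)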
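The unfolded view maps directly onto a loop. This NumPy sketch assumes the common convention h_t = tanh(U x_t + W h_{t-1} + b), with U as the input-to-hidden and W as the hidden-to-hidden weights; the slide labels U and W but does not spell out the equation, and the sizes are arbitrary.

```python
import numpy as np


def rnn_forward(xs, U, W, b, h0):
    """Unfold a vanilla RNN over time.

    xs: (T, input_dim) inputs X_1..X_T
    U:  (hidden_dim, input_dim) input-to-hidden weights, shared by every step
    W:  (hidden_dim, hidden_dim) hidden-to-hidden weights, shared by every step
    Returns the hidden states h_1..h_T with shape (T, hidden_dim).
    """
    h = h0
    states = []
    for x in xs:
        # The same U, W and b are reused at every time step.
        h = np.tanh(U @ x + W @ h + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 4, 3, 5
xs = rng.normal(size=(T, input_dim))
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
hs = rnn_forward(xs, U, W, b, h0=np.zeros(hidden_dim))
print(hs.shape)  # (4, 5)
```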
  • 7. RNN Issues: exploding gradients are addressed by gradient clipping; vanishing gradients are addressed by gated cells. (From_RNN_to_LSTM.ipynb)
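Gradient clipping is often applied by global norm: if the combined L2 norm of all gradients exceeds a threshold, every gradient is scaled down by the same factor. The NumPy sketch below illustrates that rule with made-up gradient values. In TensorFlow the corresponding function is tf.clip_by_global_norm, and Keras optimizers accept global_clipnorm (joint norm) or clipnorm (per variable).

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    The update direction is kept; only its length is capped.
    """
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    if global_norm <= max_norm:
        return grads, global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm


# Made-up gradients for two parameter tensors; joint norm sqrt(30^2 + 40^2) = 50.
grads = [np.array([30.0, 40.0]), np.array([0.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm)        # 50.0
print(clipped[0])  # [3. 4.]
```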
  • 8. Gated RNN Cell: RNN cell, GRU cell, LSTM cell, … (From_RNN_to_LSTM.ipynb)
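To show what "gated" means, here is a NumPy sketch of one LSTM step using the standard gate equations. The parameter layout (four stacked blocks in the order input, forget, output, candidate) is an arbitrary choice for this sketch; libraries order the blocks differently. The additive cell update c = f * c_prev + i * g is what lets gradients pass through many steps without vanishing as quickly as in a plain RNN. A GRU follows the same idea with two gates (update and reset) and no separate cell state.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM time step.

    Wx: (4*H, D), Wh: (4*H, H), b: (4*H,) stack the parameters of the
    input (i), forget (f) and output (o) gates and the candidate (g).
    """
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b
    i = sigmoid(z[0:H])          # how much new information to write
    f = sigmoid(z[H:2 * H])      # how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # how much of the cell state to expose
    g = np.tanh(z[3 * H:])       # candidate values
    c = f * c_prev + i * g       # additive update of the cell state
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(1)
D, H = 3, 4
Wx = rng.normal(scale=0.3, size=(4 * H, D))
Wh = rng.normal(scale=0.3, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape, c.shape)  # (4,) (4,)
```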
  • 9. Bidirectional Model: the roadmap runs RNN cell → gated cell (LSTM, GRU) → bidirectional models → attention models; this slide covers LSTM + bidirectional. Embedding: an m × n matrix, where m is encoder.vocabulary.size() and n is the number of representative features; the encoder is paired with a decoder. Bidirectional: the sequence is processed in both the forward and the backward direction, and the two results are concatenated. (From_RNN_to_LSTM.ipynb)
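A NumPy sketch of the two ideas on this slide: an m × n embedding lookup followed by a bidirectional layer whose forward and backward outputs are concatenated at each time step. Plain tanh RNN cells stand in for LSTMs to keep it short, and the vocabulary size, feature count, and token ids are hypothetical. In Keras, tf.keras.layers.Bidirectional concatenates the two directions by default (merge_mode="concat").

```python
import numpy as np


def embed(token_ids, table):
    """Look up rows of an (m, n) table: m = vocabulary size, n = features."""
    return table[token_ids]


def simple_rnn(xs, U, W):
    """Vanilla RNN that returns the hidden state at every step."""
    h = np.zeros(W.shape[0])
    out = []
    for x in xs:
        h = np.tanh(U @ x + W @ h)
        out.append(h)
    return np.stack(out)


def bidirectional(xs, fwd_params, bwd_params):
    """Run one RNN left-to-right and another right-to-left, then concatenate."""
    forward = simple_rnn(xs, *fwd_params)
    # Reverse the input, run, then reverse the outputs so step t lines up.
    backward = simple_rnn(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(2)
m, n, hidden = 10, 4, 3           # hypothetical vocab size, features, units
table = rng.normal(size=(m, n))
tokens = np.array([1, 5, 7, 2])   # hypothetical token ids
xs = embed(tokens, table)         # (4, n)
fwd = (rng.normal(size=(hidden, n)), rng.normal(size=(hidden, hidden)))
bwd = (rng.normal(size=(hidden, n)), rng.normal(size=(hidden, hidden)))
out = bidirectional(xs, fwd, bwd)
print(out.shape)  # (4, 6): forward and backward states side by side
```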
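  • 10. More Stacks: different types of models (Andrej Karpathy blog, 2015), and a bidirectional model with more stacks, e.g. tf.keras.layers.Bidirectional(layer=tf.keras.layers.LSTM(..., return_sequences=True)). Every recurrent layer that feeds another recurrent layer needs return_sequences=True, so the next layer receives the whole sequence rather than only the last state. (From_RNN_to_LSTM.ipynb)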
  • 11. It’s your turn! https://github.com/jiankaiwang/sophia
  • 12. Seq2Seq and how attention works: an encoder and a decoder, with <start> and <end> tokens marking the sequences. Questions: (1) Weighted? (2) Return sequence? (3) Reference to the encoder? (4) Predictive position?
  • 13. Additive and Multiplicative Attention (Dzmitry Bahdanau, et al., 2014). (TF2Keras_NMT_Attention.ipynb)
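The two score functions differ only in how a decoder state s is compared with each encoder state h_j: additive attention (Bahdanau et al.) uses v^T tanh(W1 s + W2 h_j), while multiplicative attention (the "general" score of Luong et al., 2015) uses s^T W_a h_j. Either way, a softmax turns the scores into weights and the context vector is the weighted sum of the encoder states. This NumPy sketch uses random weights and arbitrary sizes; which decoder state feeds the score (previous or current) differs between the two papers and is left abstract here.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()


def additive_scores(s, H, W1, W2, v):
    """Additive scores: v^T tanh(W1 s + W2 h_j) for every row h_j of H."""
    return np.tanh(s @ W1.T + H @ W2.T) @ v


def multiplicative_scores(s, H, Wa):
    """Multiplicative ('general') scores: s^T Wa h_j for every row h_j of H."""
    return H @ (Wa.T @ s)


def attend(scores, H):
    """Turn scores into weights and build the context vector."""
    weights = softmax(scores)   # one weight per encoder step, summing to 1
    context = weights @ H       # weighted sum of encoder states
    return weights, context


rng = np.random.default_rng(7)
T, d_enc, d_dec, units = 6, 4, 5, 8
H = rng.normal(size=(T, d_enc))   # encoder states h_1..h_T
s = rng.normal(size=d_dec)        # one decoder state
W1 = rng.normal(size=(units, d_dec))
W2 = rng.normal(size=(units, d_enc))
v = rng.normal(size=units)
Wa = rng.normal(size=(d_dec, d_enc))

w_add, ctx_add = attend(additive_scores(s, H, W1, W2, v), H)
w_mul, ctx_mul = attend(multiplicative_scores(s, H, Wa), H)
print(w_add.round(3), ctx_add.shape)
print(w_mul.round(3), ctx_mul.shape)
```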
  • 14. It’s your turn! https://github.com/jiankaiwang/sophia
  • 15. Global and Local Attention Model (Minh-Thang Luong, et al., 2015)
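Global attention scores every source position, while local attention (Luong et al., 2015) looks only at a window [p_t - D, p_t + D] around an aligned position p_t and further favours positions near p_t with a Gaussian of standard deviation D / 2. The NumPy sketch below uses the dot-product score and takes p_t as given (p_t = t corresponds to the paper's local-m variant, while local-p predicts p_t); sizes and values are arbitrary. Note that the Gaussian-weighted local weights no longer sum to 1.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def global_weights(h_t, H):
    """Global attention: dot-product scores over every source position."""
    return softmax(H @ h_t)


def local_weights(h_t, H, p_t, D):
    """Local attention around aligned position p_t with half-window D.

    Positions outside [p_t - D, p_t + D] get weight 0; inside, the softmax
    alignment is multiplied by exp(-(s - p_t)^2 / (2 sigma^2)), sigma = D / 2.
    """
    S = H.shape[0]
    centre = int(round(p_t))
    lo, hi = max(0, centre - D), min(S, centre + D + 1)
    positions = np.arange(lo, hi)
    align = softmax(H[lo:hi] @ h_t)
    sigma = D / 2.0
    gauss = np.exp(-((positions - p_t) ** 2) / (2.0 * sigma ** 2))
    weights = np.zeros(S)
    weights[lo:hi] = align * gauss
    return weights


rng = np.random.default_rng(8)
H = rng.normal(size=(10, 4))   # 10 source states
h_t = rng.normal(size=4)       # current decoder state
gw = global_weights(h_t, H)
lw = local_weights(h_t, H, p_t=4, D=2)
print(gw.round(3))
print(lw.round(3))             # non-zero only at positions 2..6
```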
  • 16. Here comes the Transformer! So far: RNN cell → gated cell (LSTM, GRU) → bidirectional models → attention models, all mapping an input sequence through an encoder and a decoder to an output sequence. Questions: (1) Why RNNs? Could another unit replace them? (2) Do we keep the encoder-decoder model?
  • 17. Positional Encoding: ● The encoding is added to the embedding vector. ● Self-attention by itself ignores token order, so the encoding injects position information. ● It is not strictly necessary, but it is recommended for text and time-series tasks. (TF2Keras_Transformer_LanguageUnderstanding.ipynb)
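The sinusoidal encoding from Vaswani et al. (2017) is one common choice: even dimensions use sin(pos / 10000^(2i/d_model)) and odd dimensions use the matching cos. This NumPy sketch builds the table and adds it to a few made-up embeddings; the notebook's TensorFlow version may differ in details such as scaling the embeddings first.

```python
import numpy as np


def positional_encoding(max_len, d_model):
    """Sinusoidal table of shape (max_len, d_model).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = pos * angle_rates                         # (max_len, d_model)
    pe = np.empty_like(angles)
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions
    return pe


d_model = 8
embeddings = np.random.default_rng(3).normal(size=(5, d_model))  # 5 made-up tokens
x = embeddings + positional_encoding(5, d_model)  # added element-wise, not concatenated
print(x.shape)  # (5, 8)
```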
  • 18. Masking (TF2Keras_Transformer_LanguageUnderstanding.ipynb): Padding mask: for input sequences [[2,7,36,25,14,3,0,0,0], [2,19,20,13,7,9,3,0,0], …] the padding mask is [[0,0,0,0,0,0,1,1,1], [0,0,0,0,0,0,0,1,1], …]. ● It marks every pad token in the batch. ● It ensures the model does not attend to these padded positions. Look-ahead mask: [[0,1,1,1,1], [0,0,1,1,1], [0,0,0,1,1], [0,0,0,0,1], [0,0,0,0,0]]. ● It masks all future tokens during training. ● Each token is therefore predicted from the preceding tokens only.
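A NumPy sketch that reproduces both masks from the slide, assuming token id 0 is the padding id (as in the example sequences). The last line combines them the way a decoder's self-attention typically needs: a position is hidden if it is padding or lies in the future.

```python
import numpy as np

PAD_ID = 0  # assumption: id 0 is reserved for padding, as in the slide's example


def create_padding_mask(seqs):
    """1 where the token is padding, 0 elsewhere (1 means 'do not attend')."""
    return (np.asarray(seqs) == PAD_ID).astype(np.float32)


def create_look_ahead_mask(size):
    """Strictly upper-triangular ones: position i may not see positions > i."""
    return np.triu(np.ones((size, size), dtype=np.float32), k=1)


seqs = [[2, 7, 36, 25, 14, 3, 0, 0, 0],
        [2, 19, 20, 13, 7, 9, 3, 0, 0]]
padding = create_padding_mask(seqs)      # (batch, seq_len)
look_ahead = create_look_ahead_mask(5)   # (5, 5)
print(padding.astype(int))
print(look_ahead.astype(int))

# Combined decoder mask, shape (batch, seq_len, seq_len): hide future OR padding.
combined = np.maximum(create_look_ahead_mask(9), padding[:, None, :])
```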
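  • 19. Scaled Dot-Product Attention (TF2Keras_Transformer_LanguageUnderstanding.ipynb): ● Query (Q): in the decoder's encoder-decoder attention, the output of the decoder's previous attention layer. ● Key (K) and Value (V): in that same layer, the encoder output. (In self-attention, Q, K, and V all come from the same sequence.) ● Mask: the padding mask or the look-ahead mask. ● Scale: the logits are divided by √d_k, which keeps large dot products from pushing the softmax into regions with very small gradients. ● Softmax turns the scaled Q·K scores into weights describing how each query relates to each key; these weights are then applied to V.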
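The computation on this slide is softmax(Q K^T / √d_k + mask) V. The NumPy sketch follows the common convention of adding mask × (-1e9) to the logits so masked positions receive (numerically) zero weight; shapes and values in the usage lines are arbitrary.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k) + mask * -1e9) V.

    q: (..., seq_q, d_k), k: (..., seq_k, d_k), v: (..., seq_k, d_v).
    mask: broadcastable to (..., seq_q, seq_k); 1 marks a position to hide.
    """
    d_k = q.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        logits = logits + mask * -1e9          # hidden positions get ~0 weight
    weights = softmax(logits, axis=-1)         # how strongly each query attends to each key
    return weights @ v, weights


rng = np.random.default_rng(9)
q = rng.normal(size=(3, 4))                    # 3 queries, d_k = 4
k = rng.normal(size=(5, 4))                    # 5 keys
v = rng.normal(size=(5, 6))                    # 5 values, d_v = 6
pad = np.array([0, 0, 0, 1, 1])                # hide the last two keys (padding)
out, w = scaled_dot_product_attention(q, k, v, mask=pad)
print(out.shape)   # (3, 6)
print(w.round(3))  # the last two columns are 0
```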
  • 20. Multi-Head Attention and Feed-Forward Network: ● Multi-head attention runs h scaled dot-product attentions (heads) in parallel. ● V, K, and Q each pass through a linear layer first. ● In the example code, the model dimension is split across the heads (h in the diagram), e.g. 512 (dimensions) / 8 (heads) = 64 (depth per head). ● Because the dimension was split, the heads' results are concatenated back with a transpose and a reshape. ● A position-wise feed-forward network (FFN) follows the multi-head attention and transforms each position's output independently. ● The encoder's final output, produced by its last FFN, plays a role similar to the RNN encoder's hidden state: it is what the decoder consumes. (TF2Keras_Transformer_LanguageUnderstanding.ipynb)
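The split and merge described on this slide are pure reshapes. This NumPy sketch shows them for 512 dimensions and 8 heads, plus the position-wise FFN max(0, x W1 + b1) W2 + b2 with the inner size 2048 used by the base model in Vaswani et al. (2017). The learned Q/K/V projections and the attention itself are omitted, and all weights are random.

```python
import numpy as np


def split_heads(x, num_heads):
    """(batch, seq, d_model) -> (batch, num_heads, seq, depth)."""
    batch, seq, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    depth = d_model // num_heads            # e.g. 512 // 8 = 64
    x = x.reshape(batch, seq, num_heads, depth)
    return x.transpose(0, 2, 1, 3)


def combine_heads(x):
    """(batch, num_heads, seq, depth) -> (batch, seq, num_heads * depth)."""
    batch, num_heads, seq, depth = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, seq, num_heads * depth)


def point_wise_ffn(x, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied to every position independently."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2


rng = np.random.default_rng(4)
batch, seq, d_model, num_heads, dff = 2, 10, 512, 8, 2048
x = rng.normal(size=(batch, seq, d_model))
heads = split_heads(x, num_heads)
print(heads.shape)                # (2, 8, 10, 64)
merged = combine_heads(heads)
print(np.array_equal(merged, x))  # True: split followed by combine is lossless
W1 = rng.normal(scale=0.02, size=(d_model, dff))
b1 = np.zeros(dff)
W2 = rng.normal(scale=0.02, size=(dff, d_model))
b2 = np.zeros(d_model)
print(point_wise_ffn(merged, W1, b1, W2, b2).shape)  # (2, 10, 512)
```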
  • 21. Transformer with Encoder and Decoder: the decoder's second multi-head attention layer uses the padding mask, receives the encoder output as the key (K) and value (V), and receives the output of the decoder's first attention layer as the query (Q). The final output shape is [batch_size, dec_input (target sequence length), dec_vocab_size]. (TF2Keras_Transformer_LanguageUnderstanding.ipynb)
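A compact NumPy sketch of the data flow through one decoder layer: masked self-attention, then encoder-decoder attention with Q from the first block and K, V from the encoder output, then the FFN, each wrapped in a residual connection and layer normalization, followed by a final linear layer to vocabulary logits. To keep it short it is single-head, has no learned Q/K/V projections, and its layer normalization has no learned scale or shift; all sizes are hypothetical.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention (no learned projections)."""
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        logits = logits + mask * -1e9
    return softmax(logits) @ v


def layer_norm(x, eps=1e-6):
    """Layer normalization without learned scale/shift (simplification)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def decoder_layer(y, enc_out, look_ahead_mask, padding_mask, ffn):
    # Block 1: masked self-attention; Q, K and V all come from the decoder input.
    out1 = layer_norm(y + attention(y, y, y, look_ahead_mask))
    # Block 2: Q is the block-1 output; K and V are the encoder output.
    out2 = layer_norm(out1 + attention(out1, enc_out, enc_out, padding_mask))
    # Block 3: position-wise feed-forward network.
    return layer_norm(out2 + ffn(out2))


rng = np.random.default_rng(10)
batch, src_len, tar_len, d_model, dec_vocab_size = 2, 7, 5, 16, 30
enc_out = rng.normal(size=(batch, src_len, d_model))  # stand-in encoder output
y = rng.normal(size=(batch, tar_len, d_model))        # stand-in embedded decoder input
look_ahead = np.triu(np.ones((tar_len, tar_len)), k=1)
src_padding = np.zeros((batch, 1, src_len))
src_padding[:, :, -2:] = 1                            # pretend the last 2 source tokens are padding

W1 = rng.normal(scale=0.1, size=(d_model, 32))
W2 = rng.normal(scale=0.1, size=(32, d_model))


def ffn(x):
    return np.maximum(0.0, x @ W1) @ W2


dec = decoder_layer(y, enc_out, look_ahead, src_padding, ffn)
W_out = rng.normal(scale=0.1, size=(d_model, dec_vocab_size))  # final linear layer
logits = dec @ W_out
print(logits.shape)  # (2, 5, 30): [batch_size, dec_input length, dec_vocab_size]
```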
  • 22. It’s your turn! https://github.com/jiankaiwang/sophia
  • 23. BERT Input Representation
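In the BERT paper (Devlin et al., 2018), each input token is represented by the sum of three embeddings: token, segment (sentence A or B), and position, with [CLS] at the start and [SEP] closing each sentence. The NumPy sketch below uses a tiny hypothetical vocabulary, illustrative sentences, and random tables; real BERT uses a learned WordPiece vocabulary of about 30,000 tokens and learned position embeddings.

```python
import numpy as np

rng = np.random.default_rng(11)
hidden = 8
# Hypothetical toy vocabulary (real BERT learns a WordPiece vocabulary).
vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "my": 3, "dog": 4,
         "is": 5, "cute": 6, "he": 7, "likes": 8, "play": 9}
token_table = rng.normal(size=(len(vocab), hidden))
segment_table = rng.normal(size=(2, hidden))    # row 0: sentence A, row 1: sentence B
position_table = rng.normal(size=(16, hidden))  # one vector per position (max 16 here)


def bert_input(sent_a, sent_b):
    """Build [CLS] A [SEP] B [SEP] and sum token + segment + position embeddings."""
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    ids = np.array([vocab[t] for t in tokens])
    segments = np.array([0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1))
    positions = np.arange(len(tokens))
    x = token_table[ids] + segment_table[segments] + position_table[positions]
    return x, tokens, segments


x, tokens, segments = bert_input(["my", "dog", "is", "cute"], ["he", "likes", "play"])
print(tokens)
print(segments)  # [0 0 0 0 0 0 1 1 1 1]
print(x.shape)   # (10, 8)
```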
  • 24. BERT (Bidirectional Encoder Representations from Transformers)
  • 25. References ● H. Sak, et al. (2014). Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. INTERSPEECH. ● Dzmitry Bahdanau, et al. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473. ● Minh-Thang Luong, et al. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv:1508.04025. ● Ashish Vaswani, et al. (2017). Attention Is All You Need. arXiv:1706.03762v5. ● Jacob Devlin, et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805. ● TensorFlow. https://www.tensorflow.org/ (retrieved 2022).
  • 26. Take-Home Message: ● RNN, LSTM, attention, and then the Transformer: this is the path deep learning on text has followed. ● Studying the earlier research shows how each step improved the unit or the layer in the model. JianKai Wang, 王建凱, gljankai@gmail.com, https://jiankaiwang.no-ip.biz/, https://www.linkedin.com/in/wangjiankai/