國立臺北護理健康大學 NTUNHS
RNN, LSTM
(On text mining)
Orozco Hsu
2022-05-23
1
About me
• Education
• NCU (MIS)、NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
2
Tutorial
Content
3
RNN, LSTM
Homework
Word embedding
Pre-trained word embedding
Code
• Download code
• https://github.com/orozcohsu/ntunhs_2022_01.git
• Folder/file
• 20220523_inter_master/run.ipynb
4
Code
5
• Click the button
• Open it with Colab
• Copy it to your Google Drive
• Check your Google Drive
NLP vs. NLU
6
• NLP
• Parsing
• Stop-word removal
• Part-of-speech (POS) tagging
• Tokenization
• Many more
• NLU
• Interpret the natural language
• Derive meaning
• Identify context
• Draw insights
NLU
• In NLU, various ML algorithms are used to identify sentiment,
perform Named Entity Recognition (NER), process semantics, etc.
• NLU algorithms often operate on text that has already been
standardized by text pre-processing steps.
7
Word embedding
• Word embedding refers to the representation of words for text
analysis
• Typically in the form of a real-valued vector that encodes the meaning of the
word
• Words that are closer in the vector space are expected to be similar in
meaning
8
Word embedding
9
Processing Categorical Features
10
Age Gender Nationality
35 Male US
31 Male China
29 Female India
27 Male US
Age is a numeric feature because its values are ordered
35 years old is older than 31 years old
Gender is a binary feature
Represent Female by 0
Represent Male by 1
Nationality is a categorical feature
There are 197 countries in the world
Represent Nationality by numeric vectors
Processing Categorical Features
11
Age Gender Nationality
35 1 [1,0,0,0…0]
31 1 [0,1,0,0…0]
29 0 [0,0,1,0…0]
27 1 [1,0,0,0…0]
Apply one-hot encoding to Nationality
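The encoding in the table above can be reproduced with a short sketch (a minimal NumPy example; the three-country vocabulary is illustrative, a full version would have 197 entries):

```python
import numpy as np

def one_hot(index, size):
    """Return a one-hot vector with a 1 at position `index`."""
    v = np.zeros(size, dtype=int)
    v[index] = 1
    return v

# Toy vocabulary of nationalities (a real one would have 197 entries)
countries = ["US", "China", "India"]
country_to_idx = {c: i for i, c in enumerate(countries)}

encoded = one_hot(country_to_idx["China"], len(countries))
print(encoded)  # [0 1 0]
```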
Processing Text Data (Step1)
• Step1: Tokenization (Text to words)
• We are given a piece of text (string)
S = … to be or not to be …
• Split the string into a list of words
L = […, to, be, or, not, to, be, …]
12
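Step 1 can be sketched as follows (a minimal example; real tokenizers such as those in NLTK or spaCy also handle punctuation, casing, and special tokens):

```python
def tokenize(text):
    """Step 1: split a string into a list of lowercase word tokens."""
    return text.lower().split()

s = "to be or not to be"
tokens = tokenize(s)
print(tokens)  # ['to', 'be', 'or', 'not', 'to', 'be']
```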
Processing Text Data (Step2)
• Step2: Count word frequencies
• Build a dictionary to count word frequencies
• Initially, the dictionary is empty
• If word w is not in the dictionary, add (w, 1) to
the dictionary
• If word w is in the dictionary, increment its
frequency counter
13
Key
(word)
Value
(frequency)
a 219
to 398
hamlet 5
be 131
not 499
prince 12
kill 31
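The counting loop described above, as a direct sketch:

```python
def count_words(tokens):
    """Step 2: build a word -> frequency dictionary."""
    freq = {}                 # initially, the dictionary is empty
    for w in tokens:
        if w not in freq:
            freq[w] = 1       # add (w, 1) if the word is unseen
        else:
            freq[w] += 1      # otherwise increment its frequency counter
    return freq

freq = count_words(["to", "be", "or", "not", "to", "be"])
print(freq)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```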
Processing Text Data (Step2)
• Sort the dictionary so that the frequencies are
in descending order
• Replace each frequency by an index (starting from 1)
• If the vocabulary is too big, keep only the
most frequent words
• Infrequent words are usually meaningless;
typos are a common example
• A bigger vocabulary means higher-dimensional one-hot
vectors (heavier computation)
• More parameters in the word-embedding layer
14
Key
(word)
Index
not 1
to 2
a 3
be 4
kill 5
prince 6
hamlet 7
or 8
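Sorting and re-indexing can be sketched as follows; the frequencies are taken from the earlier slide's table, so the resulting indices match the table shown here:

```python
def build_index(freq, max_vocab=None):
    """Sort words by descending frequency and assign indices from 1."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    if max_vocab is not None:
        ranked = ranked[:max_vocab]   # keep only the most frequent words
    return {w: i for i, w in enumerate(ranked, start=1)}

freq = {"a": 219, "to": 398, "hamlet": 5, "be": 131,
        "not": 499, "prince": 12, "kill": 31}
index = build_index(freq)
print(index)  # {'not': 1, 'to': 2, 'a': 3, 'be': 4, 'kill': 5, 'prince': 6, 'hamlet': 7}
```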
Processing Text Data (Step3)
• Step3: One-hot encoding
• Mapping every word to its index
• For example:
Words: [to, be, or, not, to, be]
Indices: [2, 4, 8, 1, 2 ,4]
15
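The mapping in this example, checked in code (`word_index` is taken from the table on the previous slide):

```python
# Index table from the previous slide
word_index = {"not": 1, "to": 2, "a": 3, "be": 4,
              "kill": 5, "prince": 6, "hamlet": 7, "or": 8}

words = ["to", "be", "or", "not", "to", "be"]
indices = [word_index[w] for w in words]
print(indices)  # [2, 4, 8, 1, 2, 4]
```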
Processing Text Data (Step3)
16
Text Processing and Word Embedding (Step4)
• Step4: Aligning sequences
• Truncate longer texts to keep only w words (e.g. w = 7)
• Pad shorter texts with NULL, either at the front (pre) or at the back (post)
17
Truncation (w = 7): the fat cat sat still on the big red mat → the fat cat sat still on the
Pre padding: on the big red mat → NULL NULL on the big red mat
Post padding: on the big red mat → on the big red mat NULL NULL
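A minimal sketch of truncation and padding (modeled on the pre/post behavior of utilities like Keras's `pad_sequences`; the function name is illustrative):

```python
def align(tokens, w, mode="pre", pad="NULL"):
    """Truncate to at most w tokens, then pad up to exactly w tokens.
    mode='pre' truncates/pads at the front, 'post' at the back."""
    if len(tokens) > w:
        tokens = tokens[-w:] if mode == "pre" else tokens[:w]
    padding = [pad] * (w - len(tokens))
    return padding + tokens if mode == "pre" else tokens + padding

long_text = "the fat cat sat still on the big red mat".split()
short_text = "on the big red mat".split()
print(align(long_text, 7, "post"))   # ['the', 'fat', 'cat', 'sat', 'still', 'on', 'the']
print(align(short_text, 7, "pre"))   # ['NULL', 'NULL', 'on', 'the', 'big', 'red', 'mat']
print(align(short_text, 7, "post"))  # ['on', 'the', 'big', 'red', 'mat', 'NULL', 'NULL']
```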
Word embedding
• Mapping the one-hot vectors to low-dimensional vectors
18
e_i is the one-hot vector of the i-th word in the dictionary
P is a parameter matrix that is learned from the training data
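The key point, that multiplying a one-hot vector by P is just a row lookup, can be checked directly (dimensions here are illustrative, and P is random rather than learned):

```python
import numpy as np

vocab_size, embed_dim = 8, 3
rng = np.random.default_rng(0)
P = rng.normal(size=(vocab_size, embed_dim))  # in practice, learned from training data

e = np.zeros(vocab_size)   # one-hot vector e_i for the word with index 1
e[1] = 1.0

x = e @ P                    # multiplying by a one-hot vector...
assert np.allclose(x, P[1])  # ...just selects row 1 of P: an embedding lookup
print(x.shape)  # (3,)
```

This is why embedding layers are implemented as table lookups instead of actual matrix multiplications.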
Word embedding
19
RNN
• Recurrent Neural Networks (RNNs) work in three stages
• In the first stage, the network moves forward through the hidden layer
and makes a prediction
• In the second stage, it compares its prediction with the true
value using the loss function. The loss function shows how
well the model is performing: the lower its value, the better
the model
• In the final stage, it uses the error values in back-
propagation to calculate the gradient at each
point (node). The gradient is the value used to adjust the
weights of the network at each point
20
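The first stage (the forward pass) of a vanilla RNN can be sketched in NumPy as follows; the loss and back-propagation stages are omitted, and all names and dimensions are illustrative:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Vanilla RNN forward pass: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])
    for x in xs:                       # move forward through the sequence
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                           # final hidden state (input to a many-to-one prediction)

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

xs = rng.normal(size=(5, d_in))        # a sequence of 5 input vectors
h_final = rnn_forward(xs, W_xh, W_hh, b_h)
print(h_final.shape)  # (4,)
```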
RNN
21
One to one (e.g. image classification: one image as input, one class probability as output)
Many to one (e.g. a text sequence as input, a sentiment result or next-character prediction as output)
Many to many (e.g. text translation)
RNN
• RNNs are well suited to modeling sequential data
• Text/speech data
• Time-series data
• Message-bot agents
22
RNN
23
RNN
24
RNN
• Gradient vanishing/exploding
• In an RNN, gradients may vanish or explode during
back-propagation
• If the recurrent weight is < 1, the gradient decays to zero exponentially fast in t′−t;
if it is > 1, it grows exponentially fast
25
Ref: https://prvnk10.medium.com/vanishing-and-exploding-gradients-52af750ede32
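A quick illustration of why the gradient vanishes or explodes: the factor propagated through t′ − t steps scales roughly like the recurrent weight raised to that power:

```python
# The gradient propagated through t' - t steps scales roughly like w ** (t' - t).
steps = 50
for w in (0.9, 1.1):
    factor = w ** steps
    print(f"w = {w}: gradient factor after {steps} steps ~ {factor:.3g}")
# w < 1 decays toward zero (vanishing); w > 1 blows up (exploding)
```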
Exercise
• Try to run this code
26
RNN01
Long short-term memory (LSTM)
• A Long Short-Term Memory network is an advanced RNN, a sequential
network, that allows information to persist
• LSTM/GRU use selective read, write and forget operations to pass only the
relevant information on in the state vector
• To resolve the RNN's short-memory problem, the LSTM uses forget and input
gates to selectively ignore incoming information and to mitigate the gradient
vanishing/exploding problem
27
28
LSTM: Conveyor belt
• Information on the conveyor belt (the cell state) flows directly on to the next time step
29
LSTM: Forget gate
30
LSTM: Input gate
• Decides which values of the conveyor belt will be updated
31
LSTM: New value
• To be added to the conveyor belt
32
LSTM: Update the conveyor belt
33
LSTM: Update the conveyor belt
34
LSTM: Output gate
• Decides what flows from the conveyor belt to the state ht
35
LSTM: update the state
36
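The gate slides above can be summarized as one LSTM step in NumPy (a sketch with illustrative names and dimensions; the four gates are stacked into single weight matrices, as many implementations do):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the four gates
    (forget f, input i, candidate g, output o) stacked row-wise."""
    d = h.size
    z = W @ x + U @ h + b
    f = sigmoid(z[0*d:1*d])        # forget gate: what to drop from the conveyor belt
    i = sigmoid(z[1*d:2*d])        # input gate: which values to update
    g = np.tanh(z[2*d:3*d])        # new candidate values to add
    o = sigmoid(z[3*d:4*d])        # output gate: what flows to the state h_t
    c_new = f * c + i * g          # update the conveyor belt (cell state)
    h_new = o * np.tanh(c_new)     # update the state
    return h_new, c_new

rng = np.random.default_rng(2)
d_in, d_h = 3, 4
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```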
More
• Stacked RNN
37
More
• Bidirectional RNN
38
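A bidirectional RNN can be sketched by running one pass in each direction and concatenating the states (an illustrative NumPy sketch; a stacked RNN would instead feed these states into a second recurrent layer):

```python
import numpy as np

def rnn_states(xs, W_xh, W_hh, b_h):
    """Return the hidden state at every time step of a vanilla RNN."""
    h, states = np.zeros(W_hh.shape[0]), []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(3)
d_in, d_h, T = 3, 4, 5

def make_params():
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))

xs = rng.normal(size=(T, d_in))
fwd = rnn_states(xs, *make_params())           # left-to-right pass
bwd = rnn_states(xs[::-1], *make_params())[::-1]  # right-to-left pass, re-reversed
bidir = np.concatenate([fwd, bwd], axis=1)     # concat both directions per time step
print(bidir.shape)  # (5, 8)
```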
Exercise
• Try to run this code
39
RNN_LSTM_01 (Day master only)
RNN_LSTM_02 (International master class only)
Pre-trained word embedding
• Word2vec (2013) from Google team (CBOW and Skip-gram)
• https://code.google.com/archive/p/word2vec/
• GloVe (Global Vectors for Word Representation) (2014)
• https://nlp.stanford.edu/projects/glove/
• fastText from Facebook (2017)
• https://fasttext.cc/
• spaCy (Industrial-Strength Natural Language Processing) (2015)
• https://spacy.io/
40
spaCy
• Support for 64+ languages
• Pre-trained word vectors
• State-of-the-art speed
• Linguistically-motivated tokenization
• Components for named entity recognition, part-of-speech tagging,
dependency parsing, sentence segmentation, text classification,
lemmatization, morphological analysis, entity linking and more
• https://spacy.io/models/zh (Chinese words embedding)
41
Pre-trained the embedding layer
42
Ref: https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
• Pre-trained word embedding is an example of Transfer Learning
• The main idea behind it is to use public embeddings that are already trained on large datasets
43
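The transfer-learning idea can be sketched as follows: copy public pre-trained vectors into the embedding layer's weight matrix (the tiny vectors and words here are made up for illustration; in practice they come from word2vec, GloVe, fastText, or spaCy):

```python
import numpy as np

# A tiny, made-up "pre-trained" embedding table (real ones are trained on large corpora)
pretrained = {"not": [0.1, 0.9], "to": [0.8, 0.2], "be": [0.7, 0.3]}

word_index = {"not": 1, "to": 2, "be": 3}   # index 0 reserved for padding
embed_dim = 2

# Copy the public vectors into the embedding layer's weight matrix;
# words without a pre-trained vector keep a zero (or random) initialization.
E = np.zeros((len(word_index) + 1, embed_dim))
for w, i in word_index.items():
    if w in pretrained:
        E[i] = pretrained[w]

# During training, this matrix would typically be frozen (not updated).
print(E[2])  # [0.8 0.2]
```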
Exercise
• Try to run this code
44
LSTM_04_pretrain_spacy
Homework
• Explain what RNN, LSTM, and word embedding are
• Try to create a (Chinese, word-level) embedding and make a
sentiment prediction (Day master class)
• Prepare your own sentiment dataset with (text, level) columns
• Follow hw01 to complete the rest of the work
• Refer to RNN01 to parse your sentences
• Refer to LSTM_04_pretrain_spacy to build your model and make predictions
45
More
• About spaCy Chinese NLP
• https://zhuanlan.zhihu.com/p/353110681
• About Chinese NLP tutorial
• https://drive.google.com/file/d/1LdHs0vPlc7MWbM-emv8Wwz1lQsSKP7Zi/view?usp=sharing
46

More Related Content

Similar to 5_RNN_LSTM.pdf

Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
AI Frontiers
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
Natasha Latysheva
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
Zachary S. Brown
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
Afaq Mansoor Khan
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
Kyuri Kim
 
Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
Olga Zinkevych
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
Darshan Patel
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
Rimzim Thube
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
Sanghamitra Deb
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
Xavier Ochoa
 
Introduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics ResearchersIntroduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics Researchers
Vitomir Kovanovic
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptx
ssuser2624f71
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
Association for Computational Linguistics
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
odsc
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David Martínez Rego
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
CastLabKAIST
 
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Jeongkyu Shin
 
NLP from scratch
NLP from scratch NLP from scratch
NLP from scratch
Bryan Gummibearehausen
 

Similar to 5_RNN_LSTM.pdf (20)

Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
 
Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Introduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics ResearchersIntroduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics Researchers
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptx
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
 
NLP from scratch
NLP from scratch NLP from scratch
NLP from scratch
 

More from FEG

Sequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdfSequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdf
FEG
 
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
FEG
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
FEG
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318
FEG
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices
FEG
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch
FEG
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch
FEG
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
FEG
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
FEG
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)
FEG
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization
FEG
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
FEG
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
FEG
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)
FEG
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
FEG
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
FEG
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in Excel
FEG
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf
FEG
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf
FEG
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdf
FEG
 

More from FEG (20)

Sequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdfSequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdf
 
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in Excel
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdf
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
5_RNN_LSTM.pdf

  • 12. Processing Text Data (Step 1)
    • Step 1: Tokenization (text to words)
    • We are given a piece of text (string): S = … to be or not to be …
    • Split the string into a list of words: L = […, to, be, or, not, to, be, …]
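Step 1 can be sketched in a few lines of Python. This is a simple whitespace split; real tokenizers also handle punctuation and other edge cases:

```python
def tokenize(text):
    """Split a text (string) into a list of lowercase word tokens."""
    return text.lower().split()

words = tokenize("to be or not to be")
# words == ['to', 'be', 'or', 'not', 'to', 'be']
```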
  • 13. Processing Text Data (Step 2)
    • Step 2: Count word frequencies
    • Build a dictionary to count each word's frequency
    • Initially, the dictionary is empty
    • If word w is not in the dictionary, add (w, 1) to the dictionary
    • If word w is in the dictionary, increment its frequency counter

      Key (word)   Value (frequency)
      a            219
      to           398
      hamlet       5
      be           131
      not          499
      prince       12
      kill         31
  • 14. Processing Text Data (Step 2)
    • Sort the dictionary so that the frequencies are in descending order
    • Replace each frequency by an index (starting from 1)
    • If the vocabulary is too big, keep only the most frequent words
      • Infrequent words are usually meaningless (typos, for example)
      • A bigger vocabulary causes higher-dimensional one-hot vectors (heavier computation)
      • More parameters in the word-embedding layer

      Key (word)   Index
      not          1
      to           2
      a            3
      be           4
      kill         5
      prince       6
      hamlet       7
      or           8
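The counting, sorting, and index-building described above can be sketched with `collections.Counter`; the `max_size` parameter is my own illustrative addition for keeping only the most frequent words:

```python
from collections import Counter

def build_vocab(words, max_size=None):
    """Count word frequencies, sort them in descending order,
    and map each kept word to an index starting from 1."""
    counts = Counter(words)                     # step 2: word -> frequency
    most_common = counts.most_common(max_size)  # sorted by frequency, truncated
    return {w: i for i, (w, _) in enumerate(most_common, start=1)}

vocab = build_vocab(["to", "be", "or", "not", "to", "be"])
# the most frequent words ("to", "be") get the smallest indices
```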
  • 15. Processing Text Data (Step 3)
    • Step 3: One-hot encoding
    • Map every word to its index
    • For example:
      Words:   [to, be, or, not, to, be]
      Indices: [2, 4, 8, 1, 2, 4]
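Using the example dictionary from slide 14, the mapping can be sketched as follows (the helper names `encode` and `one_hot` are my own):

```python
vocab = {"not": 1, "to": 2, "a": 3, "be": 4,
         "kill": 5, "prince": 6, "hamlet": 7, "or": 8}

def encode(words, vocab):
    """Map every word to its index."""
    return [vocab[w] for w in words]

def one_hot(index, vocab_size):
    """Turn an index (starting from 1, as in the slides) into a one-hot vector."""
    v = [0] * vocab_size
    v[index - 1] = 1
    return v

indices = encode(["to", "be", "or", "not", "to", "be"], vocab)
# indices == [2, 4, 8, 1, 2, 4]
```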
  • 16. Processing Text Data (Step 3) (figure)
  • 17. Text Processing and Word Embedding (Step 4)
    • Step 4: Aligning sequences
    • Cut off the text to keep w words (e.g., w = 7), keeping either the first (pre) or the last (post) w words
    • Pad shorter texts to w words with NULL, again either pre or post
      Original:        the fat cat sat still on the big red mat
      Truncated to 7:  the fat cat sat still on the
      Padded post:     on the big red mat NULL NULL
      Padded pre:      NULL NULL on the big red mat
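A minimal sketch of this alignment step; the function name and `mode` convention below are my own (Keras users would typically reach for `pad_sequences` with its `padding`/`truncating` arguments instead):

```python
def align(seq, w, mode="post", pad="NULL"):
    """Truncate or pad a token list to exactly w tokens.

    mode="post" keeps the first w words / pads at the end;
    mode="pre"  keeps the last w words / pads at the front.
    """
    if len(seq) >= w:
        return seq[-w:] if mode == "pre" else seq[:w]
    padding = [pad] * (w - len(seq))
    return padding + seq if mode == "pre" else seq + padding

align("on the big red mat".split(), 7)
# ['on', 'the', 'big', 'red', 'mat', 'NULL', 'NULL']
```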
  • 18. Word embedding
    • Map the one-hot vectors to low-dimensional vectors
    • ei is the one-hot vector of the i-th word in the dictionary
    • P is the parameter matrix, which is learned from the training data
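The lookup can be illustrated in NumPy (the vocabulary size, dimension, and random P below are made up for the sketch): multiplying a one-hot vector by P simply selects one row of the matrix, which is why embedding layers are implemented as table lookups.

```python
import numpy as np

vocab_size, embed_dim = 8, 3
rng = np.random.default_rng(0)
P = rng.normal(size=(vocab_size, embed_dim))  # one embedding row per word

i = 3                        # position of some word in the vocabulary
e = np.zeros(vocab_size)
e[i] = 1.0                   # the one-hot vector e_i
x = e @ P                    # the low-dimensional embedding of word i
assert np.allclose(x, P[i])  # identical to just selecting row i of P
```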
  • 20. RNN
    • A Recurrent Neural Network (RNN) works in three stages
    • In the first stage, it moves forward through the hidden layer and makes a prediction
    • In the second stage, it compares its prediction with the true value using the loss function; the loss function shows how well the model is performing (the lower the loss, the better the model)
    • In the final stage, it uses the error values in back-propagation, which calculates the gradient for each node; the gradient is the value used to adjust the weights of the network
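The first (forward) stage of a plain RNN can be sketched in NumPy; the weight names and sizes below are illustrative, not taken from the course code:

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Run a simple RNN over a sequence of input vectors xs
    and return the final hidden state (used for the prediction)."""
    h = np.zeros(Wh.shape[0])
    for x in xs:                          # the same weights are reused
        h = np.tanh(Wx @ x + Wh @ h + b)  # at every time step
    return h
```

The loss and back-propagation stages would then compare a prediction derived from `h` with the true value and push gradients back through every time step.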
  • 21. RNN
    • One to one (e.g., image classification: one image as input, one prediction probability as output)
    • Many to one (e.g., a text sequence as input, a sentiment result or next-character prediction as output)
    • Many to many (e.g., text translation)
  • 22. RNN
    • RNNs are well suited to modeling sequential data
      • Text/speech data
      • Time-series data
      • Message-bot agents
  • 25. RNN
    • Gradient vanishing/exploding
    • While back-propagating through an RNN, gradients may vanish or explode
    • If the recurrent weight is < 1, the gradient decays to zero exponentially fast in t′ − t; if it is > 1, it grows exponentially fast
    • Ref: https://prvnk10.medium.com/vanishing-and-exploding-gradients-52af750ede32
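The scalar intuition behind the bullet above: back-propagating through t steps multiplies the gradient by the recurrent weight t times, so the product either shrinks toward zero or blows up.

```python
t = 50
vanish = 0.9 ** t    # a weight below 1 shrinks the gradient toward zero
explode = 1.1 ** t   # a weight above 1 blows the gradient up
```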
  • 26. Exercise
    • Try to run this code: RNN01
  • 27. Long Short-Term Memory (LSTM)
    • A Long Short-Term Memory network is an advanced RNN: a sequential network that allows information to persist
    • LSTM/GRU use selective read, write, and forget operations to pass only the relevant information on to the state vector
    • To resolve the RNN's short-memory problem, the LSTM uses forget/input gates to selectively ignore passed-along information and to avoid the gradient vanishing/exploding problem
  • 28. (figure)
  • 29. LSTM: Conveyor belt
    • Passed information flows directly to the next step
  • 31. LSTM: Input gate
    • Decides which values of the conveyor belt will be updated
  • 32. LSTM: New value
    • To be added to the conveyor belt
  • 33. LSTM: Update the conveyor belt
  • 34. LSTM: Update the conveyor belt
  • 35. LSTM: Output gate
    • Decides what flows from the conveyor belt to the state ht
  • 36. LSTM: Update the state
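The gate slides above fit into a single cell update. This NumPy sketch packs the four gate weight blocks into one matrix W; the layout and names are my own, not the course's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. c is the conveyor belt (cell state), h the output state."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate new value
    c = f * c + i * g      # update the conveyor belt: forget, then add
    h = o * np.tanh(c)     # output gate decides what reaches the state h_t
    return h, c
```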
  • 39. Exercise
    • Try to run this code: RNN_LSTM_01 (Day master class only), RNN_LSTM_02 (International master class only)
  • 40. Pre-trained word embedding
    • Word2vec (2013) from the Google team (CBOW and Skip-gram)
      • https://code.google.com/archive/p/word2vec/
    • GloVe (Global Vectors for Word Representation) (2014)
      • https://nlp.stanford.edu/projects/glove/
    • fastText from Facebook (2017)
      • https://fasttext.cc/
    • spaCy (Industrial-Strength Natural Language Processing) (2015)
      • https://spacy.io/
  • 41. spaCy
    • Support for 64+ languages
    • Pre-trained word vectors
    • State-of-the-art speed
    • Linguistically-motivated tokenization
    • Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking, and more
    • https://spacy.io/models/zh (Chinese word embeddings)
  • 42. Pre-training the embedding layer
    • Pre-trained word embedding is an example of transfer learning
    • The main idea is to use public embeddings that have already been trained on large datasets
    • Ref: https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
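A common way to wire a public embedding into a model is to copy its vectors into an embedding matrix by vocabulary index. The tiny 2-dimensional `pretrained` table below is a toy stand-in for real GloVe or spaCy vectors:

```python
import numpy as np

# Toy stand-in for a real pre-trained table (e.g., GloVe or spaCy vectors).
pretrained = {"cat": np.array([0.1, 0.2]),
              "mat": np.array([0.3, 0.4])}
vocab = {"cat": 1, "sat": 2, "mat": 3}     # word -> index (0 kept for padding)

embed_dim = 2
E = np.zeros((len(vocab) + 1, embed_dim))  # one row per index
for word, idx in vocab.items():
    if word in pretrained:                 # out-of-vocabulary rows stay zero
        E[idx] = pretrained[word]
```

In Keras, a matrix built this way is typically passed as the initial weights of an Embedding layer, often with the layer frozen so the pre-trained vectors are not overwritten during training.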
  • 43. (figure)
  • 44. Exercise
    • Try to run this code: LSTM_04_pretrain_spacy
  • 45. Homework
    • Explain what RNN, LSTM, and word embedding are
    • Try to create a (Chinese, word-level) embedding and make a sentiment prediction (Day master class)
      • Prepare your own sentiment dataset with (text, level) columns
      • Follow hw01 to continue the rest of the work
      • Refer to RNN01 to parse your sentences
      • Refer to LSTM_04_pretrain_spacy to build your model and make predictions
  • 46. More
    • About spaCy Chinese NLP
      • https://zhuanlan.zhihu.com/p/353110681
    • About a Chinese NLP tutorial
      • https://drive.google.com/file/d/1LdHs0vPlc7MWbM-emv8Wwz1lQsSKP7Zi/view?usp=sharing