國立臺北護理健康大學 NTUNHS
RNN, LSTM
(On text mining)
Orozco Hsu
2022-05-23
1
About me
• Education
• NCU (MIS)、NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
2
Tutorial
Content
3
RNN, LSTM
Homework
Word embedding
Pre-trained word embedding
Code
• Download code
• https://github.com/orozcohsu/ntunhs_2022_01.git
• Folder/file
• 20220523_inter_master/run.ipynb
4
Code
5
• Click the button
• Open it with Colab
• Copy it to your Google Drive
• Check your Google Drive
NLP vs. NLU
6
• NLP
• Parsing
• Stop-word removal
• Part-of-speech (POS) tagging
• Tokenization
• Many more
• NLU
• Interpret the natural language
• Derive meaning
• Identify context
• Draw insights
NLU
• In NLU, various ML algorithms are used to identify sentiment,
perform Named Entity Recognition (NER), process semantics, etc.
• NLU algorithms often operate on text that has already been
standardized by text pre-processing steps.
7
Word embedding
• Word embedding refers to the representation of words for text
analysis
• Typically in the form of a real-valued vector that encodes the meaning of the
word
• Words that are closer in the vector space are expected to be similar in
meaning
8
Word embedding
9
Processing Categorical Features
10
Age Gender Nationality
35 Male US
31 Male China
29 Female India
27 Male US
Age is a numeric feature because its values are ordered
35 years old is older than 31 years old
Gender is a binary feature
Represent Female by 0
Represent Male by 1
Nationality is a categorical feature
There are 197 countries in the world
Represent Nationality by numeric vectors
Processing Categorical Features
11
Age Gender Nationality
35 1 [1,0,0,0…0]
31 1 [0,1,0,0…0]
29 0 [0,0,1,0…0]
27 1 [1,0,0,0…0]
Apply one-hot encoding to Nationality
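The encoding in the table above can be reproduced with a short sketch (a minimal NumPy example; the three-country vocabulary is illustrative, a full version would have 197 entries):

```python
import numpy as np

def one_hot(index, size):
    """Return a one-hot vector with a 1 at position `index`."""
    v = np.zeros(size, dtype=int)
    v[index] = 1
    return v

# Toy vocabulary of nationalities (a real one would have 197 entries)
countries = ["US", "China", "India"]
country_to_idx = {c: i for i, c in enumerate(countries)}

encoded = one_hot(country_to_idx["China"], len(countries))
print(encoded)  # [0 1 0]
```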
Processing Text Data (Step1)
• Step1: Tokenization (Text to words)
• We are given a piece of text (string)
S = … to be or not to be …
• Split the string into a list of words
L = […, to, be, or, not, to, be, …]
12
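Step 1 can be sketched as follows (a minimal example; real tokenizers such as those in NLTK or spaCy also handle punctuation, casing, and special tokens):

```python
def tokenize(text):
    """Step 1: split a string into a list of lowercase word tokens."""
    return text.lower().split()

s = "to be or not to be"
tokens = tokenize(s)
print(tokens)  # ['to', 'be', 'or', 'not', 'to', 'be']
```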
Processing Text Data (Step2)
• Step2: Count word frequencies
• Build a dictionary to count word frequencies
• Initially, the dictionary is empty
• If word w is not in the dictionary, add (w, 1) to
the dictionary
• If word w is in the dictionary, increment its
frequency counter
13
Key
(word)
Value
(frequency)
a 219
to 398
hamlet 5
be 131
not 499
prince 12
kill 31
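The counting loop described above, as a direct sketch:

```python
def count_words(tokens):
    """Step 2: build a word -> frequency dictionary."""
    freq = {}                 # initially, the dictionary is empty
    for w in tokens:
        if w not in freq:
            freq[w] = 1       # add (w, 1) if the word is unseen
        else:
            freq[w] += 1      # otherwise increment its frequency counter
    return freq

freq = count_words(["to", "be", "or", "not", "to", "be"])
print(freq)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```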
Processing Text Data (Step2)
• Sort the dictionary so that the frequencies are
in descending order
• Replace each frequency by an index (starting from 1)
• If the vocabulary is too big, keep only the
most frequent words
• Infrequent words are usually meaningless;
typos are a common example
• A bigger vocabulary means higher-dimensional one-hot
vectors (heavier computation)
• More parameters in the word-embedding layer
14
Key
(word)
Index
not 1
to 2
a 3
be 4
kill 5
prince 6
hamlet 7
or 8
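Sorting and re-indexing can be sketched as follows; the frequencies are taken from the earlier slide's table, so the resulting indices match the table shown here:

```python
def build_index(freq, max_vocab=None):
    """Sort words by descending frequency and assign indices from 1."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    if max_vocab is not None:
        ranked = ranked[:max_vocab]   # keep only the most frequent words
    return {w: i for i, w in enumerate(ranked, start=1)}

freq = {"a": 219, "to": 398, "hamlet": 5, "be": 131,
        "not": 499, "prince": 12, "kill": 31}
index = build_index(freq)
print(index)  # {'not': 1, 'to': 2, 'a': 3, 'be': 4, 'kill': 5, 'prince': 6, 'hamlet': 7}
```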
Processing Text Data (Step3)
• Step3: One-hot encoding
• Mapping every word to its index
• For example:
Words: [to, be, or, not, to, be]
Indices: [2, 4, 8, 1, 2 ,4]
15
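The mapping in this example, checked in code (`word_index` is taken from the table on the previous slide):

```python
# Index table from the previous slide
word_index = {"not": 1, "to": 2, "a": 3, "be": 4,
              "kill": 5, "prince": 6, "hamlet": 7, "or": 8}

words = ["to", "be", "or", "not", "to", "be"]
indices = [word_index[w] for w in words]
print(indices)  # [2, 4, 8, 1, 2, 4]
```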
Processing Text Data (Step3)
16
Text Processing and Word Embedding (Step4)
• Step4: Aligning sequences
• Truncate longer texts to keep only w words (e.g. w = 7)
• Pad shorter texts with NULL, either at the front (pre) or at the back (post)
17
Truncation (w = 7): the fat cat sat still on the big red mat → the fat cat sat still on the
Pre padding: on the big red mat → NULL NULL on the big red mat
Post padding: on the big red mat → on the big red mat NULL NULL
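A minimal sketch of truncation and padding (modeled on the pre/post behavior of utilities like Keras's `pad_sequences`; the function name is illustrative):

```python
def align(tokens, w, mode="pre", pad="NULL"):
    """Truncate to at most w tokens, then pad up to exactly w tokens.
    mode='pre' truncates/pads at the front, 'post' at the back."""
    if len(tokens) > w:
        tokens = tokens[-w:] if mode == "pre" else tokens[:w]
    padding = [pad] * (w - len(tokens))
    return padding + tokens if mode == "pre" else tokens + padding

long_text = "the fat cat sat still on the big red mat".split()
short_text = "on the big red mat".split()
print(align(long_text, 7, "post"))   # ['the', 'fat', 'cat', 'sat', 'still', 'on', 'the']
print(align(short_text, 7, "pre"))   # ['NULL', 'NULL', 'on', 'the', 'big', 'red', 'mat']
print(align(short_text, 7, "post"))  # ['on', 'the', 'big', 'red', 'mat', 'NULL', 'NULL']
```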
Word embedding
• Mapping the one-hot vectors to low-dimensional vectors
18
e_i is the one-hot vector of the i-th word in the dictionary
P is a parameter matrix that is learned from the training data
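The key point, that multiplying a one-hot vector by P is just a row lookup, can be checked directly (dimensions here are illustrative, and P is random rather than learned):

```python
import numpy as np

vocab_size, embed_dim = 8, 3
rng = np.random.default_rng(0)
P = rng.normal(size=(vocab_size, embed_dim))  # in practice, learned from training data

e = np.zeros(vocab_size)   # one-hot vector e_i for the word with index 1
e[1] = 1.0

x = e @ P                    # multiplying by a one-hot vector...
assert np.allclose(x, P[1])  # ...just selects row 1 of P: an embedding lookup
print(x.shape)  # (3,)
```

This is why embedding layers are implemented as table lookups instead of actual matrix multiplications.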
Word embedding
19
RNN
• Recurrent Neural Networks (RNNs) work in three stages
• In the first stage, the network moves forward through the hidden layer
and makes a prediction
• In the second stage, it compares its prediction with the true
value using the loss function. The loss function shows how
well the model is performing: the lower its value, the better
the model
• In the final stage, it uses the error values in back-
propagation to calculate the gradient at each
point (node). The gradient is the value used to adjust the
weights of the network at each point
20
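The first stage (the forward pass) of a vanilla RNN can be sketched in NumPy as follows; the loss and back-propagation stages are omitted, and all names and dimensions are illustrative:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Vanilla RNN forward pass: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])
    for x in xs:                       # move forward through the sequence
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                           # final hidden state (input to a many-to-one prediction)

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

xs = rng.normal(size=(5, d_in))        # a sequence of 5 input vectors
h_final = rnn_forward(xs, W_xh, W_hh, b_h)
print(h_final.shape)  # (4,)
```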
RNN
21
One to one (e.g. image classification: one image as input, one class probability as output)
Many to one (e.g. a text sequence as input, a sentiment result or next-character prediction as output)
Many to many (e.g. text translation)
RNN
• RNNs are well suited to modeling sequential data
• Text/speech data
• Time-series data
• Message-bot agents
22
RNN
23
RNN
24
RNN
• Gradient vanishing/exploding
• In an RNN, gradients may vanish or explode during
back-propagation
• If the recurrent weight is < 1, the gradient decays to zero exponentially fast in t′−t;
if it is > 1, it grows exponentially fast
25
Ref: https://prvnk10.medium.com/vanishing-and-exploding-gradients-52af750ede32
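A quick illustration of why the gradient vanishes or explodes: the factor propagated through t′ − t steps scales roughly like the recurrent weight raised to that power:

```python
# The gradient propagated through t' - t steps scales roughly like w ** (t' - t).
steps = 50
for w in (0.9, 1.1):
    factor = w ** steps
    print(f"w = {w}: gradient factor after {steps} steps ~ {factor:.3g}")
# w < 1 decays toward zero (vanishing); w > 1 blows up (exploding)
```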
Exercise
• Try to run this code
26
RNN01
Long short-term memory (LSTM)
• A Long Short-Term Memory network is an advanced RNN, a sequential
network, that allows information to persist
• LSTM/GRU use selective read, write and forget operations to pass only the
relevant information on in the state vector
• To resolve the RNN's short-memory problem, the LSTM uses forget and input
gates to selectively ignore incoming information and to mitigate the gradient
vanishing/exploding problem
27
28
LSTM: Conveyor belt
• Information on the conveyor belt (the cell state) flows directly on to the next time step
29
LSTM: Forget gate
30
LSTM: Input gate
• Decides which values of the conveyor belt will be updated
31
LSTM: New value
• To be added to the conveyor belt
32
LSTM: Update the conveyor belt
33
LSTM: Update the conveyor belt
34
LSTM: Output gate
• Decides what flows from the conveyor belt to the state ht
35
LSTM: update the state
36
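The gate slides above can be summarized as one LSTM step in NumPy (a sketch with illustrative names and dimensions; the four gates are stacked into single weight matrices, as many implementations do):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the four gates
    (forget f, input i, candidate g, output o) stacked row-wise."""
    d = h.size
    z = W @ x + U @ h + b
    f = sigmoid(z[0*d:1*d])        # forget gate: what to drop from the conveyor belt
    i = sigmoid(z[1*d:2*d])        # input gate: which values to update
    g = np.tanh(z[2*d:3*d])        # new candidate values to add
    o = sigmoid(z[3*d:4*d])        # output gate: what flows to the state h_t
    c_new = f * c + i * g          # update the conveyor belt (cell state)
    h_new = o * np.tanh(c_new)     # update the state
    return h_new, c_new

rng = np.random.default_rng(2)
d_in, d_h = 3, 4
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```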
More
• Stacked RNN
37
More
• Bidirectional RNN
38
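A bidirectional RNN can be sketched by running one pass in each direction and concatenating the states (an illustrative NumPy sketch; a stacked RNN would instead feed these states into a second recurrent layer):

```python
import numpy as np

def rnn_states(xs, W_xh, W_hh, b_h):
    """Return the hidden state at every time step of a vanilla RNN."""
    h, states = np.zeros(W_hh.shape[0]), []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(3)
d_in, d_h, T = 3, 4, 5

def make_params():
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))

xs = rng.normal(size=(T, d_in))
fwd = rnn_states(xs, *make_params())           # left-to-right pass
bwd = rnn_states(xs[::-1], *make_params())[::-1]  # right-to-left pass, re-reversed
bidir = np.concatenate([fwd, bwd], axis=1)     # concat both directions per time step
print(bidir.shape)  # (5, 8)
```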
Exercise
• Try to run this code
39
RNN_LSTM_01 (Day master only)
RNN_LSTM_02 (International master class only)
Pre-trained word embedding
• Word2vec (2013) from Google team (CBOW and Skip-gram)
• https://code.google.com/archive/p/word2vec/
• GloVe (Global Vectors for Word Representation) (2014)
• https://nlp.stanford.edu/projects/glove/
• fastText from Facebook (2017)
• https://fasttext.cc/
• spaCy (Industrial-Strength Natural Language Processing) (2015)
• https://spacy.io/
40
spaCy
• Support for 64+ languages
• Pre-trained word vectors
• State-of-the-art speed
• Linguistically-motivated tokenization
• Components for named entity recognition, part-of-speech tagging,
dependency parsing, sentence segmentation, text classification,
lemmatization, morphological analysis, entity linking and more
• https://spacy.io/models/zh (Chinese words embedding)
41
Pre-trained the embedding layer
42
Ref: https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
• Pre-trained word embedding is an example of Transfer Learning
• The main idea behind it is to use public embeddings that are already trained on large datasets
43
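The transfer-learning idea can be sketched as follows: copy public pre-trained vectors into the embedding layer's weight matrix (the tiny vectors and words here are made up for illustration; in practice they come from word2vec, GloVe, fastText, or spaCy):

```python
import numpy as np

# A tiny, made-up "pre-trained" embedding table (real ones are trained on large corpora)
pretrained = {"not": [0.1, 0.9], "to": [0.8, 0.2], "be": [0.7, 0.3]}

word_index = {"not": 1, "to": 2, "be": 3}   # index 0 reserved for padding
embed_dim = 2

# Copy the public vectors into the embedding layer's weight matrix;
# words without a pre-trained vector keep a zero (or random) initialization.
E = np.zeros((len(word_index) + 1, embed_dim))
for w, i in word_index.items():
    if w in pretrained:
        E[i] = pretrained[w]

# During training, this matrix would typically be frozen (not updated).
print(E[2])  # [0.8 0.2]
```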
Exercise
• Try to run this code
44
LSTM_04_pretrain_spacy
Homework
• Explain what RNN, LSTM, and word embedding are
• Try to create a (Chinese, word-level) embedding and make a
sentiment prediction (Day master class)
• Prepare your own sentiment dataset with (text, level) columns
• Follow hw01 to complete the rest of the work
• Refer to RNN01 to parse your sentences
• Refer to LSTM_04_pretrain_spacy to build your model and make predictions
45
More
• About spaCy Chinese NLP
• https://zhuanlan.zhihu.com/p/353110681
• About Chinese NLP tutorial
• https://drive.google.com/file/d/1LdHs0vPlc7MWbM-emv8Wwz1lQsSKP7Zi/view?usp=sharing
46

More Related Content

Similar to 5_RNN_LSTM.pdf

Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
AI Frontiers
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
Natasha Latysheva
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
Zachary S. Brown
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
Afaq Mansoor Khan
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
Kyuri Kim
 
Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
Olga Zinkevych
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
Darshan Patel
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
Rimzim Thube
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
Sanghamitra Deb
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
Xavier Ochoa
 
Introduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics ResearchersIntroduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics Researchers
Vitomir Kovanovic
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptx
ssuser2624f71
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
Association for Computational Linguistics
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
odsc
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David Martínez Rego
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
CastLabKAIST
 
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Jeongkyu Shin
 
NLP from scratch
NLP from scratch NLP from scratch
NLP from scratch
Bryan Gummibearehausen
 

Similar to 5_RNN_LSTM.pdf (20)

Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
 
Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...Overview of text classification approaches algorithms &amp; software v lyubin...
Overview of text classification approaches algorithms &amp; software v lyubin...
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Introduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics ResearchersIntroduction to R for Learning Analytics Researchers
Introduction to R for Learning Analytics Researchers
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Neural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptxNeural machine translation by jointly learning to align and translate.pptx
Neural machine translation by jointly learning to align and translate.pptx
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth...
 
NLP from scratch
NLP from scratch NLP from scratch
NLP from scratch
 

More from FEG

Sequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdfSequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdf
FEG
 
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
FEG
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
FEG
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318
FEG
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices
FEG
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch
FEG
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch
FEG
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
FEG
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
FEG
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)
FEG
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization
FEG
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
FEG
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
FEG
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)
FEG
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
FEG
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
FEG
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in Excel
FEG
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf
FEG
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf
FEG
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdf
FEG
 

More from FEG (20)

Sequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdfSequence Model pytorch at colab with gpu.pdf
Sequence Model pytorch at colab with gpu.pdf
 
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
學院碩士班_非監督式學習_使用Orange3直接使用_分群_20240417.pdf
 
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
資料視覺化_透過Orange3進行_無須寫程式直接使用_碩士學程_202403.pdf
 
Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318Pytorch cnn netowork introduction 20240318
Pytorch cnn netowork introduction 20240318
 
2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices2023 Decision Tree analysis in business practices
2023 Decision Tree analysis in business practices
 
2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch2023 Clustering analysis using Python from scratch
2023 Clustering analysis using Python from scratch
 
2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch2023 Data visualization using Python from scratch
2023 Data visualization using Python from scratch
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
 
2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules2023 Supervised_Learning_Association_Rules
2023 Supervised_Learning_Association_Rules
 
202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)202312 Exploration Data Analysis Visualization (English version)
202312 Exploration Data Analysis Visualization (English version)
 
202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization202312 Exploration of Data Analysis Visualization
202312 Exploration of Data Analysis Visualization
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
 
Google CoLab (20230321)
Google CoLab (20230321)Google CoLab (20230321)
Google CoLab (20230321)
 
Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
UnSupervised Learning Clustering
UnSupervised Learning ClusteringUnSupervised Learning Clustering
UnSupervised Learning Clustering
 
Data Visualization in Excel
Data Visualization in ExcelData Visualization in Excel
Data Visualization in Excel
 
6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf6_Association_rule_碩士班第六次.pdf
6_Association_rule_碩士班第六次.pdf
 
5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf5_Neural_network_碩士班第五次.pdf
5_Neural_network_碩士班第五次.pdf
 
4_Regression_analysis.pdf
4_Regression_analysis.pdf4_Regression_analysis.pdf
4_Regression_analysis.pdf
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
5_RNN_LSTM.pdf

  • 12. Processing Text Data (Step 1)
    • Step 1: Tokenization (text to words)
    • We are given a piece of text (string): S = … to be or not to be …
    • Split the string into a list of words: L = […, to, be, or, not, to, be, …]
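Step 1 can be sketched in a few lines of Python. This is a simple whitespace split; real tokenizers also handle punctuation and other edge cases:

```python
def tokenize(text):
    """Split a text (string) into a list of lowercase word tokens."""
    return text.lower().split()

words = tokenize("to be or not to be")
# words == ['to', 'be', 'or', 'not', 'to', 'be']
```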
  • 13. Processing Text Data (Step 2)
    • Step 2: Count word frequencies
    • Build a dictionary to count each word's frequency
    • Initially, the dictionary is empty
    • If word w is not in the dictionary, add (w, 1) to the dictionary
    • If word w is in the dictionary, increment its frequency counter

      Key (word)   Value (frequency)
      a            219
      to           398
      hamlet       5
      be           131
      not          499
      prince       12
      kill         31
  • 14. Processing Text Data (Step 2)
    • Sort the dictionary so that the frequencies are in descending order
    • Replace each frequency by an index (starting from 1)
    • If the vocabulary is too big, keep only the most frequent words
      • Infrequent words are usually meaningless (typos, for example)
      • A bigger vocabulary causes higher-dimensional one-hot vectors (heavier computation)
      • More parameters in the word-embedding layer

      Key (word)   Index
      not          1
      to           2
      a            3
      be           4
      kill         5
      prince       6
      hamlet       7
      or           8
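The counting, sorting, and index-building described above can be sketched with `collections.Counter`; the `max_size` parameter is my own illustrative addition for keeping only the most frequent words:

```python
from collections import Counter

def build_vocab(words, max_size=None):
    """Count word frequencies, sort them in descending order,
    and map each kept word to an index starting from 1."""
    counts = Counter(words)                     # step 2: word -> frequency
    most_common = counts.most_common(max_size)  # sorted by frequency, truncated
    return {w: i for i, (w, _) in enumerate(most_common, start=1)}

vocab = build_vocab(["to", "be", "or", "not", "to", "be"])
# the most frequent words ("to", "be") get the smallest indices
```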
  • 15. Processing Text Data (Step 3)
    • Step 3: One-hot encoding
    • Map every word to its index
    • For example:
      Words:   [to, be, or, not, to, be]
      Indices: [2, 4, 8, 1, 2, 4]
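Using the example dictionary from slide 14, the mapping can be sketched as follows (the helper names `encode` and `one_hot` are my own):

```python
vocab = {"not": 1, "to": 2, "a": 3, "be": 4,
         "kill": 5, "prince": 6, "hamlet": 7, "or": 8}

def encode(words, vocab):
    """Map every word to its index."""
    return [vocab[w] for w in words]

def one_hot(index, vocab_size):
    """Turn an index (starting from 1, as in the slides) into a one-hot vector."""
    v = [0] * vocab_size
    v[index - 1] = 1
    return v

indices = encode(["to", "be", "or", "not", "to", "be"], vocab)
# indices == [2, 4, 8, 1, 2, 4]
```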
  • 16. Processing Text Data (Step 3) (figure)
  • 17. Text Processing and Word Embedding (Step 4)
    • Step 4: Aligning sequences
    • Cut off the text to keep w words (e.g., w = 7), keeping either the first (pre) or the last (post) w words
    • Pad shorter texts to w words with NULL, again either pre or post
      Original:        the fat cat sat still on the big red mat
      Truncated to 7:  the fat cat sat still on the
      Padded post:     on the big red mat NULL NULL
      Padded pre:      NULL NULL on the big red mat
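A minimal sketch of this alignment step; the function name and `mode` convention below are my own (Keras users would typically reach for `pad_sequences` with its `padding`/`truncating` arguments instead):

```python
def align(seq, w, mode="post", pad="NULL"):
    """Truncate or pad a token list to exactly w tokens.

    mode="post" keeps the first w words / pads at the end;
    mode="pre"  keeps the last w words / pads at the front.
    """
    if len(seq) >= w:
        return seq[-w:] if mode == "pre" else seq[:w]
    padding = [pad] * (w - len(seq))
    return padding + seq if mode == "pre" else seq + padding

align("on the big red mat".split(), 7)
# ['on', 'the', 'big', 'red', 'mat', 'NULL', 'NULL']
```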
  • 18. Word embedding
    • Map the one-hot vectors to low-dimensional vectors
    • ei is the one-hot vector of the i-th word in the dictionary
    • P is the parameter matrix, which is learned from the training data
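The lookup can be illustrated in NumPy (the vocabulary size, dimension, and random P below are made up for the sketch): multiplying a one-hot vector by P simply selects one row of the matrix, which is why embedding layers are implemented as table lookups.

```python
import numpy as np

vocab_size, embed_dim = 8, 3
rng = np.random.default_rng(0)
P = rng.normal(size=(vocab_size, embed_dim))  # one embedding row per word

i = 3                        # position of some word in the vocabulary
e = np.zeros(vocab_size)
e[i] = 1.0                   # the one-hot vector e_i
x = e @ P                    # the low-dimensional embedding of word i
assert np.allclose(x, P[i])  # identical to just selecting row i of P
```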
  • 20. RNN
    • A Recurrent Neural Network (RNN) works in three stages
    • In the first stage, it moves forward through the hidden layer and makes a prediction
    • In the second stage, it compares its prediction with the true value using the loss function; the loss function shows how well the model is performing (the lower the loss, the better the model)
    • In the final stage, it uses the error values in back-propagation, which calculates the gradient for each node; the gradient is the value used to adjust the weights of the network
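The first (forward) stage of a plain RNN can be sketched in NumPy; the weight names and sizes below are illustrative, not taken from the course code:

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Run a simple RNN over a sequence of input vectors xs
    and return the final hidden state (used for the prediction)."""
    h = np.zeros(Wh.shape[0])
    for x in xs:                          # the same weights are reused
        h = np.tanh(Wx @ x + Wh @ h + b)  # at every time step
    return h
```

The loss and back-propagation stages would then compare a prediction derived from `h` with the true value and push gradients back through every time step.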
  • 21. RNN
    • One to one (e.g., image classification: one image as input, one prediction probability as output)
    • Many to one (e.g., a text sequence as input, a sentiment result or next-character prediction as output)
    • Many to many (e.g., text translation)
  • 22. RNN
    • RNNs are well suited to modeling sequential data
      • Text/speech data
      • Time-series data
      • Message-bot agents
  • 25. RNN
    • Gradient vanishing/exploding
    • While back-propagating through an RNN, gradients may vanish or explode
    • If the recurrent weight is < 1, the gradient decays to zero exponentially fast in t′ − t; if it is > 1, it grows exponentially fast
    • Ref: https://prvnk10.medium.com/vanishing-and-exploding-gradients-52af750ede32
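The scalar intuition behind the bullet above: back-propagating through t steps multiplies the gradient by the recurrent weight t times, so the product either shrinks toward zero or blows up.

```python
t = 50
vanish = 0.9 ** t    # a weight below 1 shrinks the gradient toward zero
explode = 1.1 ** t   # a weight above 1 blows the gradient up
```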
  • 26. Exercise
    • Try to run this code: RNN01
  • 27. Long Short-Term Memory (LSTM)
    • A Long Short-Term Memory network is an advanced RNN: a sequential network that allows information to persist
    • LSTM/GRU use selective read, write, and forget operations to pass only the relevant information on to the state vector
    • To resolve the RNN's short-memory problem, the LSTM uses forget/input gates to selectively ignore passed-along information and to avoid the gradient vanishing/exploding problem
  • 28. (figure)
  • 29. LSTM: Conveyor belt
    • Passed information flows directly to the next step
  • 31. LSTM: Input gate
    • Decides which values of the conveyor belt will be updated
  • 32. LSTM: New value
    • To be added to the conveyor belt
  • 33. LSTM: Update the conveyor belt
  • 34. LSTM: Update the conveyor belt
  • 35. LSTM: Output gate
    • Decides what flows from the conveyor belt to the state ht
  • 36. LSTM: Update the state
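The gate slides above fit into a single cell update. This NumPy sketch packs the four gate weight blocks into one matrix W; the layout and names are my own, not the course's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. c is the conveyor belt (cell state), h the output state."""
    z = W @ np.concatenate([h, x]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate new value
    c = f * c + i * g      # update the conveyor belt: forget, then add
    h = o * np.tanh(c)     # output gate decides what reaches the state h_t
    return h, c
```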
  • 39. Exercise
    • Try to run this code: RNN_LSTM_01 (Day master class only), RNN_LSTM_02 (International master class only)
  • 40. Pre-trained word embedding
    • Word2vec (2013) from the Google team (CBOW and Skip-gram)
      • https://code.google.com/archive/p/word2vec/
    • GloVe (Global Vectors for Word Representation) (2014)
      • https://nlp.stanford.edu/projects/glove/
    • fastText from Facebook (2017)
      • https://fasttext.cc/
    • spaCy (Industrial-Strength Natural Language Processing) (2015)
      • https://spacy.io/
  • 41. spaCy
    • Support for 64+ languages
    • Pre-trained word vectors
    • State-of-the-art speed
    • Linguistically-motivated tokenization
    • Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking, and more
    • https://spacy.io/models/zh (Chinese word embeddings)
  • 42. Pre-training the embedding layer
    • Pre-trained word embedding is an example of transfer learning
    • The main idea is to use public embeddings that have already been trained on large datasets
    • Ref: https://towardsdatascience.com/pre-trained-word-embedding-for-text-classification-end2end-approach-5fbf5cd8aead
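A common way to wire a public embedding into a model is to copy its vectors into an embedding matrix by vocabulary index. The tiny 2-dimensional `pretrained` table below is a toy stand-in for real GloVe or spaCy vectors:

```python
import numpy as np

# Toy stand-in for a real pre-trained table (e.g., GloVe or spaCy vectors).
pretrained = {"cat": np.array([0.1, 0.2]),
              "mat": np.array([0.3, 0.4])}
vocab = {"cat": 1, "sat": 2, "mat": 3}     # word -> index (0 kept for padding)

embed_dim = 2
E = np.zeros((len(vocab) + 1, embed_dim))  # one row per index
for word, idx in vocab.items():
    if word in pretrained:                 # out-of-vocabulary rows stay zero
        E[idx] = pretrained[word]
```

In Keras, a matrix built this way is typically passed as the initial weights of an Embedding layer, often with the layer frozen so the pre-trained vectors are not overwritten during training.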
  • 43. (figure)
  • 44. Exercise
    • Try to run this code: LSTM_04_pretrain_spacy
  • 45. Homework
    • Explain what RNN, LSTM, and word embedding are
    • Try to create a (Chinese, word-level) embedding and make a sentiment prediction (Day master class)
      • Prepare your own sentiment dataset with (text, level) columns
      • Follow hw01 to continue the rest of the work
      • Refer to RNN01 to parse your sentences
      • Refer to LSTM_04_pretrain_spacy to build your model and make predictions
  • 46. More
    • About spaCy Chinese NLP
      • https://zhuanlan.zhihu.com/p/353110681
    • About a Chinese NLP tutorial
      • https://drive.google.com/file/d/1LdHs0vPlc7MWbM-emv8Wwz1lQsSKP7Zi/view?usp=sharing