Deep learning basics for NLP
Arvind Devaraj (Grouply.org)
Lead Data Scientist, Embibe
Agenda
Classical ML vs Deep Learning
What is Deep learning for NLP
Learning Resources
How Machine translation (NMT) works
ML Tools
Scikit-learn
Jupyter
Pandas
TensorFlow
Google Colab
NLTK / spaCy
ML Courses
mlcourse.ai
CS231n (Stanford)
fast.ai
DeepLearning.AI
Coursera (NLP)
Analytics Vidhya
Edureka
Simplilearn
Siraj Raval
Youtube channels and Books
Krish Naik (deployment)
StatQuest
DeepLearning.TV
CodeEmporium
3blue1brown
Bob Trenwith
Data School
Cognitive Class
The Hundred-page Machine Learning Book
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Deep Learning (Ian Goodfellow)
Refer to Grouply.org
ML Types
ML Algorithms
1. Linear Regression
2. Logistic Regression
3. Decision Trees
4. Random Forest
5. Clustering (K-means, Hierarchical); see the scikit-learn sketch below
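Below is a minimal sketch of two of the algorithms above using scikit-learn (listed under ML Tools). It is only illustrative: the synthetic data and the hyperparameters are made up.

```python
# Minimal scikit-learn sketch of two algorithms from the list above.
# The synthetic data is illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Logistic regression on a synthetic binary-classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression accuracy:", clf.score(X_test, y_test))

# K-means clustering on the same features (the labels are ignored)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [(kmeans.labels_ == k).sum() for k in range(2)])
```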
Machine learning skills
1. Math skills
Probability and statistics
Linear Algebra
Calculus
2. Programming in Python/R
3. Data engineering skills
1. Ability to work with large amounts of data
2. Data preprocessing
3. Knowledge of SQL and NoSQL
4. ETL (Extract Transform and Load) operations
5. Data Analysis and Visualization skills
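As one small illustration of the preprocessing and analysis skills above, here is a hedged Pandas sketch; the in-memory table and its column names (age, city, salary) are hypothetical.

```python
# Illustrative Pandas preprocessing sketch; the data and the column
# names ("age", "city", "salary") are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 40, 31],
    "city": ["Bangalore", "Delhi", None, "Bangalore"],
    "salary": [50000, 62000, 75000, None],
})

df["age"] = df["age"].fillna(df["age"].median())      # impute missing values
df["salary"] = df["salary"].fillna(df["salary"].mean())
df = df.dropna(subset=["city"])                       # drop rows with no city
print(df.groupby("city")["salary"].mean())            # simple analysis step
```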
Project ideas
NLP
Sentiment analysis
Spam classifier
Text classifier (tagging documents); see the sketch after this list
Smartbook
Kaggle
Classical machine learning
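A possible starting point for the spam classifier / sentiment analysis projects above is a TF-IDF + Naive Bayes pipeline in scikit-learn. This is only a toy sketch: the four training texts and their labels are made up.

```python
# Toy spam-classifier sketch with scikit-learn (TF-IDF + Naive Bayes).
# The training texts and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "limited offer, click here",
         "meeting at 10 am tomorrow", "please review the attached report"]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free offer, click now", "see you at the meeting"]))
```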
More info
Deep learning for NLP
Introduction to Deep learning
Difference between classical ML and DL
Architectures - MLP, CNN, RNN
Frameworks - TensorFlow, Keras, PyTorch
Model 1
Neural Networks
Neural Network
A neural network acts as a ‘black box’ that takes inputs and predicts an output.
Neural Network
Neural Network Training
Basic Neural Network
Neural Network
‘Black box’ that takes inputs and predicts an output.
Trained using known (input, output) pairs; it approximates the underlying function and maps new inputs to outputs
Learns the function mapping inputs to outputs by adjusting its internal parameters (weights); see the Keras sketch below
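A minimal Keras sketch of such a network, assuming TensorFlow (one of the frameworks listed earlier); the layer sizes and the random (input, output) pairs are illustrative only.

```python
# Minimal Keras neural network: it learns a mapping from inputs to outputs
# by adjusting its weights. The data and layer sizes are illustrative.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 8).astype("float32")   # 1000 known inputs, 8 features
y = (X.sum(axis=1) > 4).astype("float32")       # known outputs for training

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)            # adjust weights on (input, output) pairs

print(model.predict(np.random.rand(3, 8).astype("float32")))  # map new, unseen inputs
```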
Word2vec using neural networks
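One common way to train word2vec is with the gensim library; gensim is not in the tools list above, so treat it as an assumed choice, and the tiny tokenized corpus here is made up.

```python
# Word2vec sketch with gensim; the tiny tokenized corpus is made up.
from gensim.models import Word2Vec

sentences = [["deep", "learning", "for", "nlp"],
             ["neural", "networks", "learn", "word", "vectors"],
             ["word", "vectors", "capture", "meaning"]]

# sg=1 selects the skip-gram variant of word2vec
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["word"][:5])                  # 50-dim embedding (first 5 values)
print(model.wv.most_similar("word", topn=2))
```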
Types of Neural Nets
Model 2
RNN
RNNs are used when the inputs have some state information
Examples include time series and word sequences
Can capture the essence of the sequence in its hidden state (context)
https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9
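A small Keras sketch of this idea: the RNN reads a sequence step by step and summarizes it in its final hidden state. The shapes and the random input sequence are illustrative.

```python
# An RNN reads a sequence step by step and summarizes it in its hidden state.
# Shapes and the random input sequence are illustrative only.
import numpy as np
import tensorflow as tf

seq = np.random.rand(1, 10, 4).astype("float32")  # 1 sequence, 10 time steps, 4 features each

rnn = tf.keras.layers.SimpleRNN(8)                # 8-dimensional hidden state (the "context")
context = rnn(seq)
print(context.shape)                              # (1, 8): one vector for the whole sequence
```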
RNN - unrolled in time
RNN
RNN can learn sentence representations
RNNs can predict
RNN can learn a probability function p(word | previous-words)
RNN - Summary
RNN can encode sentence meaning
Can predict the next word given a sequence of words
Can be trained to learn a language model (sketch below)
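A toy Keras sketch of an RNN language model estimating p(word | previous-words). The vocabulary size, sequence length, and random token ids are made up, so it only shows the shape of the idea.

```python
# Tiny next-word language-model sketch: an RNN learns p(word | previous words).
# Vocabulary size, sequence length and the random token ids are made up.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 5
X = np.random.randint(0, vocab_size, size=(200, seq_len))   # previous words
y = np.random.randint(0, vocab_size, size=(200,))           # next word

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.LSTM(64),                                # sentence/context vector
    tf.keras.layers.Dense(vocab_size, activation="softmax"), # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1]).shape)                            # (1, 1000): p(word | context)
```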
Translation
Two RNNs are jointly trained (one on English, another on German)
Training uses parallel corpora (aligned sentence pairs)
Models
1. Neural Networks
2. Recurrent Neural Networks
3. Encoder-Decoder
4. Attention Models
5. Transformer
Model 3
Encoder Decoder
http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
Encoder Decoder
Encoder-Decoder in Translation
Encoder-Decoder summary
Seq2seq NMT appeared around 2014; by 2016 Google had replaced its statistical translation system with NMT
Due to its flexibility, it is the go-to framework for NLG, with different models taking the roles of encoder and decoder
The decoder can be conditioned not only on a sequence but on an arbitrary representation, enabling many use cases (like generating a caption from an image)
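A minimal Keras sketch of the encoder-decoder pattern for translation, assuming LSTM layers and made-up vocabulary sizes; a real system would add tokenization, training on real parallel data, and a decoding loop.

```python
# Minimal encoder-decoder (seq2seq) sketch in Keras: one LSTM encodes the
# source sentence into a context (its final states), a second LSTM decodes
# the target conditioned on that context. Vocabulary sizes are made up.
import tensorflow as tf

src_vocab, tgt_vocab, units = 5000, 6000, 128

# Encoder: reads the source sequence, keeps only its final states (the context)
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(src_vocab, units)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: generates the target sequence, initialized with the encoder's states
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, units)(dec_in)
dec_out, _, _ = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
probs = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```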
Translation using Encoder-Decoder
Issues
● Meaning is crammed into a single fixed-size context vector
● Long term dependencies
● Word alignment
● Word interdependencies
Translation issues (long term dependencies)
The model cannot remember enough;
it becomes difficult to retain sequences longer than about 30 words.
"Born in France..went to EPFL Switzerland..I speak fluent ..."
Pay selective attention
First introduced in Image Recognition
Translation with Attention
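A NumPy sketch of the attention idea: score each encoder hidden state against the current decoder state, softmax the scores, and build a weighted context vector instead of relying on a single fixed one. The dimensions and random vectors are illustrative only.

```python
# Attention sketch in NumPy: score each encoder hidden state against the
# current decoder state, softmax the scores, and take a weighted average.
# Dimensions and random vectors are illustrative only.
import numpy as np

enc_states = np.random.rand(6, 8)                 # 6 source words, 8-dim hidden states
dec_state = np.random.rand(8)                     # current decoder state (the "query")

scores = enc_states @ dec_state                   # dot-product scores, shape (6,)
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
context = weights @ enc_states                    # weighted average, shape (8,)

print(weights.round(2))                           # how much attention each source word gets
print(context.shape)
```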
Translation with transformer
Machine translation
NN can encode words
RNN can encode sentences
Long sentences need changes to RNN architecture
Two RNNs can act as encoder and decoder (of any representation)
Encoding everything into a single context vector loses information
Selectively pay attention to the inputs we need.
Get rid of the RNN and use only the attention mechanism, making it parallelizable
Richer representations of the inputs using self-attention
Use encoder-decoder attention as usual for translation (see the self-attention sketch below)
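A short sketch of self-attention using Keras' MultiHeadAttention layer: every word attends to every other word in the sentence with no recurrence, so all positions are processed in parallel. The head count, dimensions, and random embeddings are assumptions for illustration.

```python
# Self-attention sketch: each word attends to every other word in the same
# sentence, producing richer, context-aware representations in parallel.
# Shapes and the random embeddings are illustrative.
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 7, 64).astype("float32")   # 1 sentence, 7 words, 64-dim embeddings

self_attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
out, attn_weights = self_attn(query=x, value=x, key=x, return_attention_scores=True)

print(out.shape)            # (1, 7, 64): context-aware word representations
print(attn_weights.shape)   # (1, 4, 7, 7): per-head attention between word pairs
```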
Word2vec to BERT
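To make the word2vec-to-BERT shift concrete, here is a hedged sketch using the Hugging Face transformers library (not in the tools list above): unlike word2vec, the same word gets a different vector depending on its context. It assumes the bert-base-uncased checkpoint can be downloaded.

```python
# Contextual embeddings with BERT via the Hugging Face `transformers` library.
# Unlike word2vec, the vector for "bank" differs depending on the sentence.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["He sat by the river bank", "She went to the bank to deposit money"]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (2, seq_len, 768)

print(hidden.shape)   # one 768-dim vector per token, per sentence, in context
```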

Deep learning for NLP and Transformer