Deep learning basics for NLP
Arvind Devaraj (Grouply.org)
Lead Data Scientist, Embibe
Agenda
Classical ML vs Deep Learning
What is Deep learning for NLP
Learning Resources
How Machine translation (NMT) works
ML Tools
Scikit-learn
Jupyter
Pandas
TensorFlow
Google Colab
NLTK / spaCy
ML Courses
mlcourse.ai
CS231n (Stanford)
fast.ai
DeepLearning.AI
Coursera (NLP)
Analytics Vidhya
Edureka
Simplilearn
Siraj Raval
Youtube channels and Books
Krish Naik (deployment)
StatQuest
DeepLearning.TV
CodeEmporium
3blue1brown
Bob Trenwith
Data School
Cognitive Class
The Hundred-page Machine Learning Book
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Deep Learning (Ian Goodfellow)
Refer to Grouply.org
ML Types
ML Algorithms
1. Linear Regression
2. Logistic Regression
3. Decision Trees
4. Random Forest
5. Clustering (K-means, Hierarchical); see the scikit-learn sketch below
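Below is a minimal sketch of two of the algorithms above using scikit-learn (listed under ML Tools). It is only illustrative: the synthetic data and the hyperparameters are made up.

```python
# Minimal scikit-learn sketch of two algorithms from the list above.
# The synthetic data is illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Logistic regression on a synthetic binary-classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression accuracy:", clf.score(X_test, y_test))

# K-means clustering on the same features (the labels are ignored)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [(kmeans.labels_ == k).sum() for k in range(2)])
```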
Machine learning skills
1. Math skills
Probability and statistics
Linear Algebra
Calculus
2. Programming in Python/R
3. Data engineering skills
1. Ability to work with large amounts of data
2. Data preprocessing
3. Knowledge of SQL and NoSQL
4. ETL (Extract Transform and Load) operations
5. Data Analysis and Visualization skills
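As one small illustration of the preprocessing and analysis skills above, here is a hedged Pandas sketch; the in-memory table and its column names (age, city, salary) are hypothetical.

```python
# Illustrative Pandas preprocessing sketch; the data and the column
# names ("age", "city", "salary") are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 40, 31],
    "city": ["Bangalore", "Delhi", None, "Bangalore"],
    "salary": [50000, 62000, 75000, None],
})

df["age"] = df["age"].fillna(df["age"].median())      # impute missing values
df["salary"] = df["salary"].fillna(df["salary"].mean())
df = df.dropna(subset=["city"])                       # drop rows with no city
print(df.groupby("city")["salary"].mean())            # simple analysis step
```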
Project ideas
NLP
Sentiment analysis
Spam classifier
Text classifier (tagging documents); see the sketch after this list
Smartbook
Kaggle
Classical machine learning
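A possible starting point for the spam classifier / sentiment analysis projects above is a TF-IDF + Naive Bayes pipeline in scikit-learn. This is only a toy sketch: the four training texts and their labels are made up.

```python
# Toy spam-classifier sketch with scikit-learn (TF-IDF + Naive Bayes).
# The training texts and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "limited offer, click here",
         "meeting at 10 am tomorrow", "please review the attached report"]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free offer, click now", "see you at the meeting"]))
```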
More info
Deep learning for NLP
Introduction to Deep learning
Difference between classical ML and DL
Architectures - MLP, CNN, RNN
Frameworks - TensorFlow, Keras, PyTorch
Model 1
Neural Networks
Neural Network
A neural network acts as a ‘black box’ that takes inputs and predicts an output.
Neural Network
Neural Network Training
Basic Neural Network
Neural Network
‘Black box’ that takes inputs and predicts an output.
Trained using known (input, output) pairs; it approximates the underlying function and maps new inputs to outputs
Learns the function mapping inputs to outputs by adjusting its internal parameters (weights); see the Keras sketch below
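A minimal Keras sketch of such a network, assuming TensorFlow (one of the frameworks listed earlier); the layer sizes and the random (input, output) pairs are illustrative only.

```python
# Minimal Keras neural network: it learns a mapping from inputs to outputs
# by adjusting its weights. The data and layer sizes are illustrative.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 8).astype("float32")   # 1000 known inputs, 8 features
y = (X.sum(axis=1) > 4).astype("float32")       # known outputs for training

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)            # adjust weights on (input, output) pairs

print(model.predict(np.random.rand(3, 8).astype("float32")))  # map new, unseen inputs
```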
Word2vec using neural networks
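One common way to train word2vec is with the gensim library; gensim is not in the tools list above, so treat it as an assumed choice, and the tiny tokenized corpus here is made up.

```python
# Word2vec sketch with gensim; the tiny tokenized corpus is made up.
from gensim.models import Word2Vec

sentences = [["deep", "learning", "for", "nlp"],
             ["neural", "networks", "learn", "word", "vectors"],
             ["word", "vectors", "capture", "meaning"]]

# sg=1 selects the skip-gram variant of word2vec
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["word"][:5])                  # 50-dim embedding (first 5 values)
print(model.wv.most_similar("word", topn=2))
```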
Types of Neural Nets
Model 2
RNN
RNNs are used when the inputs have some state information
Examples include time series and word sequences
Can capture the essence of the sequence in its hidden state (context)
https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9
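A small Keras sketch of this idea: the RNN reads a sequence step by step and summarizes it in its final hidden state. The shapes and the random input sequence are illustrative.

```python
# An RNN reads a sequence step by step and summarizes it in its hidden state.
# Shapes and the random input sequence are illustrative only.
import numpy as np
import tensorflow as tf

seq = np.random.rand(1, 10, 4).astype("float32")  # 1 sequence, 10 time steps, 4 features each

rnn = tf.keras.layers.SimpleRNN(8)                # 8-dimensional hidden state (the "context")
context = rnn(seq)
print(context.shape)                              # (1, 8): one vector for the whole sequence
```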
RNN - unrolled in time
RNN
RNN can learn sentence representations
RNNs can predict
RNN can learn a probability function p(word | previous-words)
RNN - Summary
RNN can encode sentence meaning
Can predict the next word given a sequence of words
Can be trained to learn a language model (sketch below)
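A toy Keras sketch of an RNN language model estimating p(word | previous-words). The vocabulary size, sequence length, and random token ids are made up, so it only shows the shape of the idea.

```python
# Tiny next-word language-model sketch: an RNN learns p(word | previous words).
# Vocabulary size, sequence length and the random token ids are made up.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 5
X = np.random.randint(0, vocab_size, size=(200, seq_len))   # previous words
y = np.random.randint(0, vocab_size, size=(200,))           # next word

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.LSTM(64),                                # sentence/context vector
    tf.keras.layers.Dense(vocab_size, activation="softmax"), # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1]).shape)                            # (1, 1000): p(word | context)
```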
Translation
Two RNNs are jointly trained (one on English, another on German)
Training uses parallel corpora (aligned sentence pairs)
Models
1. Neural Networks
2. Recurrent Neural Networks
3. Encoder-Decoder
4. Attention Models
5. Transformer
Model 3
Encoder Decoder
http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
Encoder Decoder
Encoder-Decoder in Translation
Encoder-Decoder summary
Seq2seq NMT appeared around 2014; by 2016 Google had replaced its statistical translation system with NMT
Due to its flexibility, it is the go-to framework for NLG, with different models taking the roles of encoder and decoder
The decoder can be conditioned not only on a sequence but on an arbitrary representation, enabling many use cases (like generating a caption from an image)
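A minimal Keras sketch of the encoder-decoder pattern for translation, assuming LSTM layers and made-up vocabulary sizes; a real system would add tokenization, training on real parallel data, and a decoding loop.

```python
# Minimal encoder-decoder (seq2seq) sketch in Keras: one LSTM encodes the
# source sentence into a context (its final states), a second LSTM decodes
# the target conditioned on that context. Vocabulary sizes are made up.
import tensorflow as tf

src_vocab, tgt_vocab, units = 5000, 6000, 128

# Encoder: reads the source sequence, keeps only its final states (the context)
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(src_vocab, units)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: generates the target sequence, initialized with the encoder's states
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, units)(dec_in)
dec_out, _, _ = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
probs = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```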
Translation using Encoder-Decoder
Issues
● Meaning is crammed into a single fixed-size context vector
● Long term dependencies
● Word alignment
● Word interdependencies
Translation issues (long term dependencies)
The model cannot remember enough;
it becomes difficult to retain sequences longer than about 30 words.
"Born in France..went to EPFL Switzerland..I speak fluent ..."
Pay selective attention
First introduced in Image Recognition
Translation with Attention
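A NumPy sketch of the attention idea: score each encoder hidden state against the current decoder state, softmax the scores, and build a weighted context vector instead of relying on a single fixed one. The dimensions and random vectors are illustrative only.

```python
# Attention sketch in NumPy: score each encoder hidden state against the
# current decoder state, softmax the scores, and take a weighted average.
# Dimensions and random vectors are illustrative only.
import numpy as np

enc_states = np.random.rand(6, 8)                 # 6 source words, 8-dim hidden states
dec_state = np.random.rand(8)                     # current decoder state (the "query")

scores = enc_states @ dec_state                   # dot-product scores, shape (6,)
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
context = weights @ enc_states                    # weighted average, shape (8,)

print(weights.round(2))                           # how much attention each source word gets
print(context.shape)
```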
Translation with transformer
Machine translation
NN can encode words
RNN can encode sentences
Long sentences need changes to RNN architecture
Two RNNs can act as encoder and decoder (of any representation)
Encoding everything into a single context vector loses information
Selectively pay attention to the inputs we need.
Get rid of the RNN and use only the attention mechanism, making it parallelizable
Richer representations of the inputs using self-attention
Use encoder-decoder attention as usual for translation (see the self-attention sketch below)
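A short sketch of self-attention using Keras' MultiHeadAttention layer: every word attends to every other word in the sentence with no recurrence, so all positions are processed in parallel. The head count, dimensions, and random embeddings are assumptions for illustration.

```python
# Self-attention sketch: each word attends to every other word in the same
# sentence, producing richer, context-aware representations in parallel.
# Shapes and the random embeddings are illustrative.
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 7, 64).astype("float32")   # 1 sentence, 7 words, 64-dim embeddings

self_attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
out, attn_weights = self_attn(query=x, value=x, key=x, return_attention_scores=True)

print(out.shape)            # (1, 7, 64): context-aware word representations
print(attn_weights.shape)   # (1, 4, 7, 7): per-head attention between word pairs
```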
Word2vec to BERT
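To make the word2vec-to-BERT shift concrete, here is a hedged sketch using the Hugging Face transformers library (not in the tools list above): unlike word2vec, the same word gets a different vector depending on its context. It assumes the bert-base-uncased checkpoint can be downloaded.

```python
# Contextual embeddings with BERT via the Hugging Face `transformers` library.
# Unlike word2vec, the vector for "bank" differs depending on the sentence.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["He sat by the river bank", "She went to the bank to deposit money"]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (2, seq_len, 768)

print(hidden.shape)   # one 768-dim vector per token, per sentence, in context
```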

Deep learning for NLP and Transformer