
Deep Learning for Natural Language Processing


This talk is about how we applied deep learning techniques to achieve state-of-the-art results on various NLP tasks, such as sentiment analysis and aspect identification, and how we deployed these models at Flipkart.



  1. Prerana Singhal
  2. THE NEED FOR NATURAL LANGUAGE PROCESSING • Number of internet users: huge and growing • A treasure chest of data in the form of natural language
  3. APPLICATIONS • Search • Customer Support • Q & A • Summarization
  4. • Sentiment Analysis
  5. NATURAL LANGUAGE PROCESSING • Rule-based systems (since the 1960s) • Statistical machine learning (since the late 1980s) • Naïve Bayes, SVM, HMM, LDA, … • Spam classifiers, Google News, Google Translate
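  For a concrete reference point, here is a minimal sketch of the kind of statistical baseline listed above: a Naïve Bayes classifier over word counts, written with scikit-learn. The tiny training set is invented purely for illustration.

      # Naive Bayes over bag-of-words counts (scikit-learn); toy data only.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      train_texts = ["great product, fast delivery", "terrible service, never again"]
      train_labels = ["pos", "neg"]

      clf = make_pipeline(CountVectorizer(), MultinomialNB())
      clf.fit(train_texts, train_labels)
      print(clf.predict(["fast delivery and great service"]))  # classify a new sentence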
  6. WHY IS NLP HARD? “Flipkart is a good website” (Easy)
  7. “I didn’t receive the product on time” (Negation)
  8. “Really shoddy service” (Rare words)
  9. “It’s gr8 to see this” (Misspellings)
  10. “Well played Flipkart! You’re giving IRCTC a run for their money” (Sarcasm)
  11. Accuracy is sometimes not good enough for production
  12. EXCITING DEEP LEARNING RESULTS • Amazing results, especially in the image and speech domains • ImageNet: 6% error rate • Facial recognition: 97.35% accuracy • Speech recognition: 25% error reduction • Handwriting recognition (ICDAR)
  13. IMAGE MODELS
  14. SENSIBLE ERRORS
  15. DEEP LEARNING FOR NLP • Positive/Negative sentiment analysis • Accuracy increase: 85% to 96% • 73% error reduction • State-of-the-art results on various text classification tasks (same model) • Tweets, reviews, emails • Beyond text classification
  16. Why does it outperform statistical models?
  17. STATISTICAL CLASSIFIERS
  18. RAW DATA “Flipkart! You need to improve your delivery”
  19. FEATURE ENGINEERING • Functions which transform input (raw) data into a feature space • Discriminative: shaped for the decision boundary • Feature engineering is painful (see the sketch below) • Deep neural networks identify the features automatically
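  A minimal sketch of what such hand-engineered feature functions look like; the specific features here are illustrative assumptions, not the talk's actual feature set.

      # Hand-crafted feature extraction: each entry maps raw text to one feature.
      import re

      def extract_features(text):
          tokens = text.lower().split()
          return {
              "num_tokens": len(tokens),
              "has_negation": any(t in {"not", "no", "never", "didn't"} for t in tokens),
              "num_exclamations": text.count("!"),
              "has_url": bool(re.search(r"https?://", text)),
          }

      print(extract_features("Flipkart! You need to improve your delivery"))

  Every such feature must be designed, implemented, and tuned by hand, which is what makes the process painful; a deep network learns its own feature detectors from the raw input instead.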
  20. Neural Networks
  21. DEEP NEURAL NETWORKS Higher layers form higher levels of abstraction.
  22. DEEP NEURAL NETWORKS Unsupervised pre-training
  23. DEEP LEARNING FOR NLP • Why deep learning? • Problems with applying deep learning to natural language
  24. PROBLEMS WITH STATISTICAL MODELS
  25. BAG OF WORDS “FLIPKART IS BETTER THAN AMAZON”
  26. PROBLEMS WITH STATISTICAL MODELS • Word-ordering information lost (illustrated in the sketch below) • Data sparsity • Words as atomic symbols • Very hard to find higher-level features • Features other than BOW
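  A quick sketch of the word-ordering problem: two sentences with opposite meanings produce the identical bag-of-words vector (scikit-learn used for illustration).

      # Bag-of-words discards order: opposite meanings, same count vector.
      from sklearn.feature_extraction.text import CountVectorizer

      sents = ["Flipkart is better than Amazon", "Amazon is better than Flipkart"]
      X = CountVectorizer().fit_transform(sents).toarray()
      print((X[0] == X[1]).all())  # True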
  27. HOW TO ENCODE THE MEANING OF A WORD? • WordNet: a dictionary of synonyms • Synonyms: adept, expert, good, practiced, proficient, skillful
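  A sketch of a WordNet synonym lookup with NLTK; this assumes the WordNet corpus has already been fetched via nltk.download("wordnet").

      # Collect all lemma names across the WordNet synsets of "good".
      from nltk.corpus import wordnet as wn

      synonyms = {lemma.name() for synset in wn.synsets("good")
                  for lemma in synset.lemmas()}
      print(sorted(synonyms))  # includes adept, expert, practiced, proficient, skillful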
  28. WORD EMBEDDINGS: THE FIRST BREAKTHROUGH
  29. NEURAL LANGUAGE MODEL
  30. WORD EMBEDDINGS: VISUALIZATIONS
  31. CAPTURE RELATIONSHIPS
  32. WORD EMBEDDINGS: VISUALIZATIONS
  33. WORD EMBEDDINGS: VISUALIZATIONS
  34. WORD EMBEDDINGS: VISUALIZATIONS • Trained in a completely unsupervised way • Reduce data sparsity • Semantic hashing • Appear to carry semantic information about the words • Freely available for out-of-the-box usage (see the sketch below)
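  A sketch of that out-of-the-box usage with gensim; the file name below is a placeholder for whichever pretrained vector file you download (for example the public word2vec Google News vectors).

      # Load pretrained word vectors and query them (gensim).
      from gensim.models import KeyedVectors

      vectors = KeyedVectors.load_word2vec_format("pretrained-vectors.bin", binary=True)

      # Nearby vectors carry related meanings...
      print(vectors.most_similar("good", topn=3))
      # ...and vector arithmetic captures relationships: king - man + woman = queen
      print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))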
  35. COMPOSITIONALITY • How do we go beyond words (to sentences and paragraphs)? • This turns out to be a very hard problem • Simple approaches (sketched below): word vector averaging, weighted word vector averaging
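  A minimal sketch of the averaging approaches, assuming an embedding dict that maps each word to a vector; the toy vectors are invented.

      # Sentence vector = (optionally weighted) mean of its word vectors.
      import numpy as np

      def sentence_vector(words, embedding, weights=None):
          vecs = np.stack([embedding[w] for w in words])
          if weights is None:
              return vecs.mean(axis=0)               # plain averaging
          w = np.asarray(weights, dtype=float)[:, None]
          return (vecs * w).sum(axis=0) / w.sum()    # weighted averaging

      embedding = {"great": np.array([0.9, 0.1, 0.0]),
                   "service": np.array([0.2, 0.8, 0.1])}
      print(sentence_vector(["great", "service"], embedding))

  Averaging is cheap but discards word order entirely, which is why the talk moves on to convolutional models next.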
  36. CONVOLUTIONAL NEURAL NETWORKS • Excellent feature extractors for images • Features are detected regardless of their position in the image • “Natural Language Processing (Almost) from Scratch”: Collobert et al., 2011 • First applied CNNs to NLP
  37. CNN FOR TEXT
  38–45. Worked example (convolution over word vectors): the sentence is represented by five 3-dimensional word vectors, [-0.33 0.56 0.98], [-0.13 -0.81 -0.01], [0.17 0.64 -0.16], [0.97 0.99 0.90] and [-0.23 0.16 0.68]. A 3 x 9 weight matrix composes each window of three consecutive word vectors (9 concatenated values) into a single 3-dimensional feature vector; the three windows yield [0.46 0.04 -0.09], [-0.57 0.81 0.25] and [-0.13 0.26 0.40]. Element-wise max pooling over these feature vectors gives [0.46 0.81 0.40], which the final layer classifies as Neutral.
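  A numpy sketch of the walkthrough above. The 3 x 9 weight matrix here is random and untrained, so the outputs will not match the slide numbers, but the mechanics (sliding window, matrix multiply, element-wise max pooling) are the same.

      import numpy as np

      # The five 3-dimensional word vectors from the slides.
      words = np.array([[-0.33,  0.56,  0.98],
                        [-0.13, -0.81, -0.01],
                        [ 0.17,  0.64, -0.16],
                        [ 0.97,  0.99,  0.90],
                        [-0.23,  0.16,  0.68]])

      window = 3                                    # words per convolution window
      rng = np.random.default_rng(0)
      W = rng.uniform(-1, 1, (3, 3 * window))       # 3 x 9 weight matrix (untrained)

      # Each window of 3 word vectors is concatenated into a 9-vector and
      # composed into one 3-dimensional feature vector.
      features = np.stack([W @ words[i:i + window].ravel()
                           for i in range(len(words) - window + 1)])

      # Max pooling: element-wise maximum over all windows -> sentence vector,
      # which would then be fed to the classifier (e.g. a softmax layer).
      print(features.max(axis=0))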
  46. DEMYSTIFYING MAX POOLING • Finds the most important part(s) of the sentence
  47. CNN FOR TEXT • Window sizes: 3, 4, 5 • Static mode • Non-static mode • Multichannel mode • Multiclass classification (a sketch of this architecture follows below)
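  The talk's own implementation used Theano (see the open-sourced repository later in the deck); purely as an illustration, here is a sketch of the same multi-window architecture in today's TensorFlow/Keras, with placeholder sizes. Static mode corresponds to freezing the embedding layer (trainable=False), non-static mode fine-tunes it, and multichannel mode uses one frozen and one fine-tuned copy.

      import tensorflow as tf
      from tensorflow.keras import layers

      vocab_size, embed_dim, seq_len, n_classes = 20000, 300, 50, 3  # placeholders

      inp = layers.Input(shape=(seq_len,))
      emb = layers.Embedding(vocab_size, embed_dim, trainable=False)(inp)  # static mode

      # One convolution + max-pooling branch per window size (3, 4, 5).
      branches = [layers.GlobalMaxPooling1D()(
                      layers.Conv1D(100, size, activation="relu")(emb))
                  for size in (3, 4, 5)]

      out = layers.Dense(n_classes, activation="softmax")(layers.Concatenate()(branches))
      model = tf.keras.Model(inp, out)
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")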
  48. RESULTS
      Dataset                            Source              Labels                                      Statistical Models   CNN
      Flipkart Twitter Sentiment         Twitter             Pos, Neg                                    85%                  96%
      Flipkart Twitter Sentiment         Twitter             Pos, Neg, Neu                               76%                  89%
      Fine-grained sentiment in emails   Emails              Angry, Sad, Complaint, Request              55%                  68%
      SST2                               Movie reviews       Pos, Neg                                    79.4%                87.5%
      SemEval Task 4                     Restaurant reviews  food / service / ambience / price / misc   88.5%                89.6%
  49. SENTIMENT: ANECDOTES
  50. DRAWBACKS & LEARNINGS • Computationally expensive • How to scale training? • How to scale prediction? • Libraries for deep learning: Theano, PyLearn2, Torch
  51. “I THINK YOU SHOULD BE MORE EXPLICIT HERE IN STEP TWO”
  52. OPEN SOURCED • https://github.com/flipkart-incubator/optimus
  53. BEYOND TEXT CLASSIFICATION • Text classification covers a lot of NLP problems (or problems can be reduced to it) • Word embedding • Unsupervised learning • Sequence learning • RNN, LSTM
  54. RECURRENT MODELS • RNNs, LSTMs • Machine translation, chat, classification (a sketch follows below)
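  A minimal sketch of a recurrent text classifier, again in TensorFlow/Keras with placeholder sizes; machine translation and chat use the same recurrent core in an encoder-decoder arrangement rather than this single-output form.

      import tensorflow as tf
      from tensorflow.keras import layers

      vocab_size, embed_dim, n_classes = 20000, 300, 3  # placeholders

      model = tf.keras.Sequential([
          layers.Embedding(vocab_size, embed_dim),
          layers.LSTM(128),                             # reads the sequence in order
          layers.Dense(n_classes, activation="softmax"),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")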
  55. ANY QUESTIONS?
