Prerana Singhal
THE NEED FOR NATURAL
LANGUAGE PROCESSING
 No. of internet users: huge and growing
 A treasure chest of data in the form of natural language
APPLICATIONS
 Search
 Customer Support
 Q & A
 Summarization
 Sentiment Analysis
NATURAL LANGUAGE
PROCESSING
 Rule-based systems (since the 1960s)
 Statistical machine learning (since the late 1980s)
 Naïve Bayes, SVM, HMM, LDA, …
 Spam classifiers, Google News, Google Translate
WHY IS NLP HARD?
“Flipkart is a good website” (Easy)
“I didn’t receive the product on time” (Negation)
“Really shoddy service” (Rare words)
“It’s gr8 to see this” (Misspellings)
“Well played Flipkart! You’re giving IRCTC a run for their money” (Sarcasm)
Accuracy sometimes not good enough for production
EXCITING DEEP LEARNING RESULTS
 Amazing results, especially in the image and speech domains
 ImageNet: 6% error rate
 Facial recognition: 97.35% accuracy
 Speech recognition: 25% error reduction
 Handwriting recognition (ICDAR)
IMAGE MODELS
SENSIBLE ERRORS
DEEP LEARNING FOR NLP
 Positive – Negative Sentiment Analysis
 Accuracy increase: 85% to 96%
 73% error reduction
 State-of-the-art results on various text classification tasks (same model)
 Tweets, reviews, emails
 Beyond Text Classification
Why does it outperform statistical models?
STATISTICAL CLASSIFIERS
RAW DATA: “Flipkart! You need to improve your delivery”
FEATURE ENGINEERING
 Functions which transform input (raw) data into a feature space (a hand-written sketch follows this list)
 Discriminative: shapes the decision boundary
 Feature engineering is painful
 Deep neural networks identify the features automatically
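To make the pain concrete, here is a minimal Python sketch of manual feature engineering; the feature names and the negation lexicon are hypothetical illustrations, not the production feature set:

NEGATION_WORDS = {"not", "didn't", "never", "no"}  # hypothetical lexicon

def extract_features(text):
    """Hand-written function mapping raw text into a feature space."""
    tokens = text.lower().split()
    return {
        "num_tokens": len(tokens),
        "has_negation": any(t in NEGATION_WORDS for t in tokens),
        "num_exclamations": text.count("!"),
        # ...a real system needs dozens more such hand-written functions
    }

print(extract_features("I didn't receive the product on time"))
# {'num_tokens': 7, 'has_negation': True, 'num_exclamations': 0}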
NEURAL NETWORKS
DEEP NEURAL NETWORKS
Higher layers form higher levels of abstraction.
DEEP NEURAL NETWORKS
Unsupervised pre-training
DEEP LEARNING FOR NLP
 Why Deep Learning?
 Problems with applying deep learning to natural language
PROBLEMS WITH STATISTICAL
MODELS
BAG OF WORDS: “FLIPKART IS BETTER THAN AMAZON”
 Word ordering information is lost (see the sketch below)
 Data sparsity
 Words treated as atomic symbols
 Very hard to find higher-level features
 Features other than BOW (POS tags, lexicons, …) must be hand-crafted
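A quick Python illustration of the first point: under bag of words, two sentences with opposite meanings become identical feature vectors.

from collections import Counter

# Word counts ignore order, so these opposite claims look the same.
a = Counter("flipkart is better than amazon".split())
b = Counter("amazon is better than flipkart".split())
print(a == b)  # True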
HOW TO ENCODE THE
MEANING OF A WORD?
 WordNet: a dictionary of synonyms (a small lookup sketch follows)
 Synonyms of “good”: adept, expert, practiced, proficient, skillful
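The slide names WordNet; as one concrete way to query it, here is a sketch using NLTK's WordNet interface (NLTK itself is an assumption, and the corpus must be downloaded once):

import nltk
nltk.download("wordnet", quiet=True)  # one-time corpus download
from nltk.corpus import wordnet as wn

# Each sense of "good" has its own synonym set (synset).
for synset in wn.synsets("good")[:3]:
    print(synset.name(), synset.lemma_names())
# A fixed synonym dictionary still misses nuance, new words,
# and misspellings like "gr8".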
WORD EMBEDDINGS: THE FIRST
BREAKTHROUGH
NEURAL LANGUAGE MODEL
WORD EMBEDDINGS: VISUALIZATIONS
CAPTURE RELATIONSHIPS
 Trained in a completely unsupervised way
 Reduce data sparsity
 Semantic hashing
 Appear to carry semantic information about the words (see the sketch below)
 Freely available for out-of-the-box usage
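A sketch of these properties using gensim and pretrained vectors; gensim and the file name are assumptions (the speaker notes mention vectors trained on the Google News dataset):

from gensim.models import KeyedVectors

# Assumes the standard pretrained Google News vectors are on disk.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Relationships appear to be captured: king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=1))
print(vectors.similarity("cat", "dog"))        # high: semantically close
print(vectors.similarity("cat", "ambulance"))  # low: semantically distant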
COMPOSITIONALITY
 How do we go beyond words, to sentences and paragraphs?
 This turns out to be a very hard problem
 Simple approaches (sketched below):
 Word vector averaging
 Weighted word vector averaging
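A minimal NumPy sketch of both simple approaches; `embedding` (word to vector) and `weights` (word to scalar, e.g. tf-idf scores) are assumed lookups:

import numpy as np

def sentence_vector(tokens, embedding, weights=None):
    """Compose a sentence vector by (weighted) word vector averaging."""
    vecs = np.array([embedding[t] for t in tokens if t in embedding])
    if weights is None:
        return vecs.mean(axis=0)  # plain averaging
    w = np.array([weights[t] for t in tokens if t in embedding])
    return (vecs * w[:, None]).sum(axis=0) / w.sum()  # weighted averaging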
CONVOLUTIONAL NEURAL
NETWORKS
 Excellent feature extractors in images
 Features are detected regardless of their position in the image
 “Natural Language Processing (Almost) from Scratch”: Collobert et al., 2011
 First applied CNNs to NLP
CNN FOR TEXT
[Figure: a worked convolution over a 5-word sentence, each word a 3-dim embedding]
Word vectors (one per word):
[-0.33 0.56 0.98], [-0.13 -0.81 -0.01], [0.17 0.64 -0.16], [0.97 0.99 0.90], [-0.23 0.16 0.68]
A weight matrix (3 x 9) is applied to each window of 3 consecutive words, whose vectors are concatenated into a 9-dim vector (the “composition” step):
Window 1: [-0.33 0.56 0.98 -0.13 -0.81 -0.01 0.17 0.64 -0.16] → [0.46 0.04 -0.09]
Window 2: → [-0.57 0.81 0.25]
Window 3: → [-0.18 0.26 0.40]
Max pooling takes the per-dimension maximum over the three windows, giving [0.46 0.81 0.40], which the final layer classifies as “Neutral”.
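The mechanics of the figure, reproduced in NumPy. The slide's 3 x 9 weight matrix is not recoverable, so a random stand-in is used; with the slide's own matrix the three windows would yield the feature vectors and pooled result shown above.

import numpy as np

# The five 3-dim word vectors from the figure, one row per word.
words = np.array([
    [-0.33,  0.56,  0.98],
    [-0.13, -0.81, -0.01],
    [ 0.17,  0.64, -0.16],
    [ 0.97,  0.99,  0.90],
    [-0.23,  0.16,  0.68],
])

W = np.random.default_rng(0).standard_normal((3, 9))  # stand-in weights

# Slide a window of 3 words, concatenate into a 9-dim vector, apply W.
features = np.stack([W @ words[i:i + 3].reshape(-1) for i in range(3)])
print(features.shape)        # (3, 3): one 3-dim feature per window

# Max pooling over time: per-dimension max across all windows.
print(features.max(axis=0))  # the 3-dim sentence representation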
DEMYSTIFYING MAX POOLING
 Finds the most important part(s) of the sentence
CNN FOR TEXT
 Window sizes: 3, 4, 5 (see the sketch below)
 Static mode (pretrained word embeddings kept frozen)
 Non-static mode (embeddings fine-tuned during training)
 Multichannel mode (one frozen channel plus one fine-tuned channel)
 Multiclass classification
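The deck lists Theano, PyLearn2, and Torch as its libraries; purely as an illustration (not the deck's implementation), here is a PyTorch sketch of this Kim (2014)-style architecture with window sizes 3, 4, 5 and a static/non-static switch:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Sketch of a Kim (2014)-style CNN for text classification."""
    def __init__(self, vocab_size, embed_dim=300, num_classes=3,
                 window_sizes=(3, 4, 5), num_filters=100, static=False):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.embed.weight.requires_grad = not static  # static vs non-static
        self.convs = nn.ModuleList([
            nn.Conv1d(embed_dim, num_filters, kernel_size=w)
            for w in window_sizes])
        self.fc = nn.Linear(num_filters * len(window_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq)
        # Convolve, then max-pool over time for each window size.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_classes)

model = TextCNN(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 40)))   # 8 sentences, 40 tokens

The multichannel mode (omitted here for brevity) would use two copies of the embedding table, one frozen and one fine-tuned, feeding the same convolutions.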
RESULTS
Dataset | Source | Labels | Statistical models | CNN
Flipkart Twitter sentiment | Twitter | Pos, Neg | 85% | 96%
Flipkart Twitter sentiment | Twitter | Pos, Neg, Neu | 76% | 89%
Fine-grained sentiment in emails | Emails | Angry, Sad, Complaint, Request | 55% | 68%
SST2 | Movie reviews | Pos, Neg | 79.4% | 87.5%
SemEval Task 4 | Restaurant reviews | Food / service / ambience / price / misc | 88.5% | 89.6%
SENTIMENT: ANECDOTES
DRAWBACKS & LEARNINGS
 Computationally Expensive
 How to scale training?
 How to scale prediction?
 Libraries for Deep Learning
 Theano
 PyLearn2
 Torch
“I THINK YOU SHOULD BE MORE EXPLICIT HERE IN STEP TWO”
OPEN SOURCED
 https://github.com/flipkart-incubator/optimus
BEYOND TEXT CLASSIFICATION
 Text classification covers a lot of NLP problems (or many problems can be reduced to it)
 Word Embedding
 Unsupervised Learning
 Sequence Learning
 RNN, LSTM
RECURRENT MODELS
 RNNs, LSTMs (a minimal classifier sketch follows below)
 Machine translation, chat, classification
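For completeness, a minimal recurrent classifier in the same illustrative PyTorch style (again an assumption, not the deck's code): the LSTM reads tokens in order, so word ordering is preserved, and its final hidden state summarizes the sentence.

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Minimal recurrent text classifier: the last hidden state
    summarizes the token sequence."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.fc(h_n[-1])                # (batch, num_classes)

model = LSTMClassifier(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 40)))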
ANY QUESTIONS?

Deep Learning for Natural Language Processing

Editor's Notes

  • #6 A very hard problem for computers: the science of deriving meaning from natural language. Still not enough good systems in production.
  • #13 Loosely inspired by what (little) we know about the biological brain
  • #14 Why are images hard?
  • #17, #18 Real life: feature spaces with thousands of dimensions.
  • #20 Elaborate more on the pain of feature engineering; hundreds of thousands of features in real life.
  • #23 Put unsupervised pre-training chart.
  • #24 How to solve classification problems and get semantic representations of natural language using deep learning? Revise.
  • #26 Bigrams, trigrams.
  • #27 Disadvantages of manual feature engineering: not generic. POS tags, Brown clusters, negation handling, manually created lexicons, …
  • #28 Mention LSA
  • #29 “Cat” and “dog” have a lot of semantic similarity compared to, say, “cat” and “ambulance”.
  • #35 Trained on the Google News dataset.