
Deep Learning for Natural Language Processing


This talk is about how we applied deep learning techniques to achieve state-of-the-art results on various NLP tasks, such as sentiment analysis and aspect identification, and how we deployed these models at Flipkart.



  1. Prerana Singhal
  2. THE NEED FOR NATURAL LANGUAGE PROCESSING • Number of internet users: huge and growing • A treasure chest of data in the form of natural language
  3. APPLICATIONS • Search • Customer Support • Q & A • Summarization
  4. • Sentiment Analysis
  5. NATURAL LANGUAGE PROCESSING • Rule-based systems (since the 1960s) • Statistical machine learning (since the late 1980s) • Naïve Bayes, SVM, HMM, LDA, … • Spam classifiers, Google News, Google Translate
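  For a concrete reference point, here is a minimal sketch of the kind of statistical baseline listed above: a Naïve Bayes classifier over word counts, written with scikit-learn. The tiny training set is invented purely for illustration.

      # Naive Bayes over bag-of-words counts (scikit-learn); toy data only.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      train_texts = ["great product, fast delivery", "terrible service, never again"]
      train_labels = ["pos", "neg"]

      clf = make_pipeline(CountVectorizer(), MultinomialNB())
      clf.fit(train_texts, train_labels)
      print(clf.predict(["fast delivery and great service"]))  # classify a new sentence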
  6. WHY IS NLP HARD? “Flipkart is a good website” (Easy)
  7. “I didn’t receive the product on time” (Negation)
  8. “Really shoddy service” (Rare words)
  9. “It’s gr8 to see this” (Misspellings)
  10. “Well played Flipkart! You’re giving IRCTC a run for their money” (Sarcasm)
  11. Accuracy is sometimes not good enough for production
  12. EXCITING DEEP LEARNING RESULTS • Amazing results, especially in the image and speech domains • ImageNet: 6% error rate • Facial recognition: 97.35% accuracy • Speech recognition: 25% error reduction • Handwriting recognition (ICDAR)
  13. IMAGE MODELS
  14. SENSIBLE ERRORS
  15. DEEP LEARNING FOR NLP • Positive/Negative sentiment analysis • Accuracy increase: 85% to 96% • 73% error reduction • State-of-the-art results on various text classification tasks (same model) • Tweets, reviews, emails • Beyond text classification
  16. Why does it outperform statistical models?
  17. STATISTICAL CLASSIFIERS
  18. RAW DATA “Flipkart! You need to improve your delivery”
  19. FEATURE ENGINEERING • Functions which transform input (raw) data into a feature space • Discriminative: shaped for the decision boundary • Feature engineering is painful (see the sketch below) • Deep neural networks identify the features automatically
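  A minimal sketch of what such hand-engineered feature functions look like; the specific features here are illustrative assumptions, not the talk's actual feature set.

      # Hand-crafted feature extraction: each entry maps raw text to one feature.
      import re

      def extract_features(text):
          tokens = text.lower().split()
          return {
              "num_tokens": len(tokens),
              "has_negation": any(t in {"not", "no", "never", "didn't"} for t in tokens),
              "num_exclamations": text.count("!"),
              "has_url": bool(re.search(r"https?://", text)),
          }

      print(extract_features("Flipkart! You need to improve your delivery"))

  Every such feature must be designed, implemented, and tuned by hand, which is what makes the process painful; a deep network learns its own feature detectors from the raw input instead.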
  20. Neural Networks
  21. DEEP NEURAL NETWORKS Higher layers form higher levels of abstraction.
  22. DEEP NEURAL NETWORKS Unsupervised pre-training
  23. DEEP LEARNING FOR NLP • Why deep learning? • Problems with applying deep learning to natural language
  24. PROBLEMS WITH STATISTICAL MODELS
  25. BAG OF WORDS “FLIPKART IS BETTER THAN AMAZON”
  26. PROBLEMS WITH STATISTICAL MODELS • Word-ordering information lost (illustrated in the sketch below) • Data sparsity • Words as atomic symbols • Very hard to find higher-level features • Features other than BOW
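  A quick sketch of the word-ordering problem: two sentences with opposite meanings produce the identical bag-of-words vector (scikit-learn used for illustration).

      # Bag-of-words discards order: opposite meanings, same count vector.
      from sklearn.feature_extraction.text import CountVectorizer

      sents = ["Flipkart is better than Amazon", "Amazon is better than Flipkart"]
      X = CountVectorizer().fit_transform(sents).toarray()
      print((X[0] == X[1]).all())  # True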
  27. HOW TO ENCODE THE MEANING OF A WORD? • WordNet: a dictionary of synonyms • Synonyms: adept, expert, good, practiced, proficient, skillful
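  A sketch of a WordNet synonym lookup with NLTK; this assumes the WordNet corpus has already been fetched via nltk.download("wordnet").

      # Collect all lemma names across the WordNet synsets of "good".
      from nltk.corpus import wordnet as wn

      synonyms = {lemma.name() for synset in wn.synsets("good")
                  for lemma in synset.lemmas()}
      print(sorted(synonyms))  # includes adept, expert, practiced, proficient, skillful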
  28. WORD EMBEDDINGS: THE FIRST BREAKTHROUGH
  29. NEURAL LANGUAGE MODEL
  30. WORD EMBEDDINGS: VISUALIZATIONS
  31. CAPTURE RELATIONSHIPS
  32. WORD EMBEDDINGS: VISUALIZATIONS
  33. WORD EMBEDDINGS: VISUALIZATIONS
  34. WORD EMBEDDINGS: VISUALIZATIONS • Trained in a completely unsupervised way • Reduce data sparsity • Semantic hashing • Appear to carry semantic information about the words • Freely available for out-of-the-box usage (see the sketch below)
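  A sketch of that out-of-the-box usage with gensim; the file name below is a placeholder for whichever pretrained vector file you download (for example the public word2vec Google News vectors).

      # Load pretrained word vectors and query them (gensim).
      from gensim.models import KeyedVectors

      vectors = KeyedVectors.load_word2vec_format("pretrained-vectors.bin", binary=True)

      # Nearby vectors carry related meanings...
      print(vectors.most_similar("good", topn=3))
      # ...and vector arithmetic captures relationships: king - man + woman = queen
      print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))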
  35. COMPOSITIONALITY • How do we go beyond words (to sentences and paragraphs)? • This turns out to be a very hard problem • Simple approaches (sketched below): word vector averaging, weighted word vector averaging
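  A minimal sketch of the averaging approaches, assuming an embedding dict that maps each word to a vector; the toy vectors are invented.

      # Sentence vector = (optionally weighted) mean of its word vectors.
      import numpy as np

      def sentence_vector(words, embedding, weights=None):
          vecs = np.stack([embedding[w] for w in words])
          if weights is None:
              return vecs.mean(axis=0)               # plain averaging
          w = np.asarray(weights, dtype=float)[:, None]
          return (vecs * w).sum(axis=0) / w.sum()    # weighted averaging

      embedding = {"great": np.array([0.9, 0.1, 0.0]),
                   "service": np.array([0.2, 0.8, 0.1])}
      print(sentence_vector(["great", "service"], embedding))

  Averaging is cheap but discards word order entirely, which is why the talk moves on to convolutional models next.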
  36. CONVOLUTIONAL NEURAL NETWORKS • Excellent feature extractors for images • Features are detected regardless of their position in the image • “Natural Language Processing (Almost) from Scratch”: Collobert et al., 2011 • First applied CNNs to NLP
  37. CNN FOR TEXT
  38–45. Worked example (convolution over word vectors): the sentence is represented by five 3-dimensional word vectors, [-0.33 0.56 0.98], [-0.13 -0.81 -0.01], [0.17 0.64 -0.16], [0.97 0.99 0.90] and [-0.23 0.16 0.68]. A 3 x 9 weight matrix composes each window of three consecutive word vectors (9 concatenated values) into a single 3-dimensional feature vector; the three windows yield [0.46 0.04 -0.09], [-0.57 0.81 0.25] and [-0.13 0.26 0.40]. Element-wise max pooling over these feature vectors gives [0.46 0.81 0.40], which the final layer classifies as Neutral.
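  A numpy sketch of the walkthrough above. The 3 x 9 weight matrix here is random and untrained, so the outputs will not match the slide numbers, but the mechanics (sliding window, matrix multiply, element-wise max pooling) are the same.

      import numpy as np

      # The five 3-dimensional word vectors from the slides.
      words = np.array([[-0.33,  0.56,  0.98],
                        [-0.13, -0.81, -0.01],
                        [ 0.17,  0.64, -0.16],
                        [ 0.97,  0.99,  0.90],
                        [-0.23,  0.16,  0.68]])

      window = 3                                    # words per convolution window
      rng = np.random.default_rng(0)
      W = rng.uniform(-1, 1, (3, 3 * window))       # 3 x 9 weight matrix (untrained)

      # Each window of 3 word vectors is concatenated into a 9-vector and
      # composed into one 3-dimensional feature vector.
      features = np.stack([W @ words[i:i + window].ravel()
                           for i in range(len(words) - window + 1)])

      # Max pooling: element-wise maximum over all windows -> sentence vector,
      # which would then be fed to the classifier (e.g. a softmax layer).
      print(features.max(axis=0))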
  46. DEMYSTIFYING MAX POOLING • Finds the most important part(s) of the sentence
  47. CNN FOR TEXT • Window sizes: 3, 4, 5 • Static mode • Non-static mode • Multichannel mode • Multiclass classification (a sketch of this architecture follows below)
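  The talk's own implementation used Theano (see the open-sourced repository later in the deck); purely as an illustration, here is a sketch of the same multi-window architecture in today's TensorFlow/Keras, with placeholder sizes. Static mode corresponds to freezing the embedding layer (trainable=False), non-static mode fine-tunes it, and multichannel mode uses one frozen and one fine-tuned copy.

      import tensorflow as tf
      from tensorflow.keras import layers

      vocab_size, embed_dim, seq_len, n_classes = 20000, 300, 50, 3  # placeholders

      inp = layers.Input(shape=(seq_len,))
      emb = layers.Embedding(vocab_size, embed_dim, trainable=False)(inp)  # static mode

      # One convolution + max-pooling branch per window size (3, 4, 5).
      branches = [layers.GlobalMaxPooling1D()(
                      layers.Conv1D(100, size, activation="relu")(emb))
                  for size in (3, 4, 5)]

      out = layers.Dense(n_classes, activation="softmax")(layers.Concatenate()(branches))
      model = tf.keras.Model(inp, out)
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")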
  48. RESULTS
      Dataset                            Source              Labels                                      Statistical Models   CNN
      Flipkart Twitter Sentiment         Twitter             Pos, Neg                                    85%                  96%
      Flipkart Twitter Sentiment         Twitter             Pos, Neg, Neu                               76%                  89%
      Fine-grained sentiment in emails   Emails              Angry, Sad, Complaint, Request              55%                  68%
      SST2                               Movie reviews       Pos, Neg                                    79.4%                87.5%
      SemEval Task 4                     Restaurant reviews  food / service / ambience / price / misc   88.5%                89.6%
  49. SENTIMENT: ANECDOTES
  50. DRAWBACKS & LEARNINGS • Computationally expensive • How to scale training? • How to scale prediction? • Libraries for deep learning: Theano, PyLearn2, Torch
  51. “I THINK YOU SHOULD BE MORE EXPLICIT HERE IN STEP TWO”
  52. OPEN SOURCED • https://github.com/flipkart-incubator/optimus
  53. BEYOND TEXT CLASSIFICATION • Text classification covers a lot of NLP problems (or problems can be reduced to it) • Word embedding • Unsupervised learning • Sequence learning • RNN, LSTM
  54. RECURRENT MODELS • RNNs, LSTMs • Machine translation, chat, classification (a sketch follows below)
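  A minimal sketch of a recurrent text classifier, again in TensorFlow/Keras with placeholder sizes; machine translation and chat use the same recurrent core in an encoder-decoder arrangement rather than this single-output form.

      import tensorflow as tf
      from tensorflow.keras import layers

      vocab_size, embed_dim, n_classes = 20000, 300, 3  # placeholders

      model = tf.keras.Sequential([
          layers.Embedding(vocab_size, embed_dim),
          layers.LSTM(128),                             # reads the sequence in order
          layers.Dense(n_classes, activation="softmax"),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")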
  55. ANY QUESTIONS?
