
ODSC London 2018

My ODSC presentation about deep learning techniques applied to NER



  1. ODSC, London, Sep 2018. Inside the Black Box: How Does a Neural Network Understand Names? Kfir Bar, Chief Scientist, Basis Technology
  2. Named entity recognition (NER): automatically find names of people, organizations, locations, and more in text, across many languages.
  3. According to Elon Musk, Mars rocket will fly ‘short flights’ next year.
  4. ?
  5. Context is important. The checker shadow illusion (Edward Adelson, neuroscientist, MIT): the squares labeled A and B are the same color.
  6. Context is important. The checker shadow illusion (Edward Adelson, neuroscientist, MIT): the squares labeled A and B are the same color.
  7. But sometimes it gets ambiguous... “Can't play Spain? Improve your playing via easy step-by-step video lessons!”
  8. But sometimes it gets ambiguous... “Can't play Spain? Improve your playing via easy step-by-step video lessons!”
  9. But sometimes it gets ambiguous... “Mom is a great TV show.”
  10. But sometimes it gets ambiguous... “Mom is a great TV show.” (Mom the TV show, not “mother”.)
  11. NER as a sequence-labeling problem: ➔ Processing one word after another ➔ Assigning a label to each word, based on local as well as global features ➔ Labels are B-PER, I-PER, B-LOC, I-LOC, OTHER, etc. (a.k.a. IOB). Example: I/O am/O working/O for/O Basis/B-ORG Technology/I-ORG (see the sketch below).
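To make the IOB convention from slide 11 concrete, here is a minimal, self-contained Python sketch (my illustration, not code from the talk) that parses a token/tag string and groups B-/I- tags back into entity spans:

```python
# Minimal IOB example (illustrative only, not code from the talk).
# Each token carries a tag: B-X starts an entity of type X, I-X continues it,
# and O marks tokens outside any entity.

tagged = "I/O am/O working/O for/O Basis/B-ORG Technology/I-ORG"

tokens, tags = zip(*(pair.rsplit("/", 1) for pair in tagged.split()))

entities = []           # list of (entity_type, [tokens])
for token, tag in zip(tokens, tags):
    if tag.startswith("B-"):
        entities.append((tag[2:], [token]))        # start a new entity span
    elif tag.startswith("I-") and entities:
        entities[-1][1].append(token)              # continue the current span
    # "O" tokens are skipped

for etype, span in entities:
    print(etype, " ".join(span))                   # -> ORG Basis Technology
```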
  12. Use multiple engines: dictionaries, a rule-based engine, and an AI-based engine, combined to make the final decisions.
  13. Traditional ML vs. deep learning. Traditional pipeline: “I love this movie” → feature extraction (words, part-of-speech tags, lemmas, Brown clusters) → vectorization into a sparse binary vector [00010010110000101001…001] → modeling → ☺ Positive. Deep learning pipeline: “I love this movie” → embeddings lookup, one dense vector per word ([0.323, -0.3434, 0.901, …, -0.267], [-0.4923, 0.554, 0.001, …, -0.365], …) → modeling → ☺ Positive.
  14. Word embeddings: related words cluster together (Germany, German, Europe, European, Africa, …), and vector arithmetic captures analogies, e.g. Berlin - Germany + Japan = Tokyo (see the sketch below).
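The analogy on slide 14 can be reproduced with any pre-trained word vectors. A small sketch using gensim and the public GloVe vectors (my choice for illustration; the talk does not name a specific toolkit or embedding set):

```python
# Illustration of the Berlin - Germany + Japan ≈ Tokyo analogy from slide 14.
# gensim's downloader and the "glove-wiki-gigaword-100" vectors are assumptions
# made for this sketch; any pre-trained embeddings would behave similarly.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # downloads the vectors on first use

# most_similar performs exactly the vector arithmetic on the slide:
# v(berlin) - v(germany) + v(japan), then nearest neighbours by cosine similarity.
print(vectors.most_similar(positive=["berlin", "japan"],
                           negative=["germany"], topn=3))
# 'tokyo' is expected to appear among the top results.
```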
  15. Feed-forward network for NER (Natural Language Processing (Almost) from Scratch, Collobert et al., 2011): a window of words (“… while I listen to Spain …”) is fed through Layer 1 → Layer 2 → Output, which scores each label (B-PER, I-PER, B-LOC, …); see the sketch below.
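A minimal PyTorch sketch of such a window-based classifier (my illustration; sizes, vocabulary, and word ids are invented and not from the paper or the talk):

```python
# Sketch of a Collobert-style window classifier: embed a fixed window of words
# around the target word, concatenate, and pass through two dense layers.
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, LABELS, WINDOW = 10_000, 50, 100, 9, 5  # toy sizes

class WindowTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.layer1 = nn.Linear(WINDOW * EMB, HIDDEN)   # "Layer 1" on the slide
        self.layer2 = nn.Linear(HIDDEN, HIDDEN)         # "Layer 2"
        self.out = nn.Linear(HIDDEN, LABELS)            # "Output": one score per label

    def forward(self, window_ids):                      # window_ids: (batch, WINDOW)
        x = self.emb(window_ids).flatten(start_dim=1)   # concatenate the window embeddings
        x = torch.tanh(self.layer1(x))
        x = torch.tanh(self.layer2(x))
        return self.out(x)                              # logits over B-PER, B-LOC, ...

# One window of word ids for "... while I listen to Spain ..." (ids are fake):
logits = WindowTagger()(torch.tensor([[11, 42, 7, 93, 256]]))
print(logits.shape)  # torch.Size([1, 9])
```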
  16. Recurrent neural network (RNN): the window (“… while I listen to Spain …”) now flows through a recurrent Layer 1 → Output that scores the labels (B-PER, I-PER, B-LOC, …).
  17. Recurrent neural network (RNN): same diagram (animation step).
  18. Recurrent neural network (RNN): same diagram (animation step).
  19. Recurrent neural network (RNN): ➔ At each time step we process one word, concatenated with the output from the previous time step ➔ It can remember information across many time steps.
  20. Recurrent neural network (RNN): the same two points, illustrated over time steps t-1, t, t+1 with output labels B-PER, I-PER, OTHER (see the sketch below).
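A small sketch of that recurrence in PyTorch (my illustration, with toy sizes and random stand-in embeddings; nn.RNNCell combines the current word vector with the previous hidden state internally):

```python
# At each time step the cell combines the current word vector with the hidden
# state carried over from the previous step, so information persists in time.
import torch
import torch.nn as nn

EMB, HIDDEN, LABELS = 50, 64, 9
cell = nn.RNNCell(EMB, HIDDEN)        # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
out = nn.Linear(HIDDEN, LABELS)       # per-step label scores (B-PER, I-PER, OTHER, ...)

words = torch.randn(6, EMB)           # stand-in embeddings for a 6-word sentence
h = torch.zeros(1, HIDDEN)            # initial hidden state
for x_t in words:                     # one word per time step
    h = cell(x_t.unsqueeze(0), h)     # new hidden state depends on word + history
    print(out(h).argmax(dim=-1).item())   # predicted label index at this step
```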
  21. Long Short-Term Memory (LSTM): like the RNN, but it can forget information when necessary; diagram over time steps t-1, t, t+1 with labels B-PER, I-PER, OTHER.
  22. LSTM for sequence labeling: one LSTM step per word of “Washington said in Chicago last …”, predicting B-PER, OTHER, OTHER, B-LOC, OTHER, … (see the sketch below).
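A minimal PyTorch sketch of the slide-22 setup (my illustration; sizes and word ids are invented):

```python
# An LSTM reads the sentence word by word and a linear layer maps each
# hidden state to a label score vector.
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, LABELS = 10_000, 100, 128, 9

emb = nn.Embedding(VOCAB, EMB)
lstm = nn.LSTM(EMB, HIDDEN, batch_first=True)     # unidirectional, single layer
to_labels = nn.Linear(HIDDEN, LABELS)

# "Washington said in Chicago last ..." as fake word ids:
word_ids = torch.tensor([[101, 17, 5, 230, 44]])  # (batch=1, seq_len=5)
hidden_states, _ = lstm(emb(word_ids))            # (1, 5, HIDDEN)
label_scores = to_labels(hidden_states)           # (1, 5, LABELS), one row per word
print(label_scores.argmax(dim=-1))                # greedy per-word label indices
```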
  23. Bidirectional LSTM for sequence labeling (Bidirectional LSTM-CRF Models for Sequence Tagging, Huang et al., 2015): a forward and a backward LSTM run over “Washington said in Chicago last …”, and their outputs are combined (+) at each word before predicting the label.
  24. Multilayer LSTM for sequence labeling: a second bidirectional LSTM layer is stacked on top of the first; same sentence and labels.
  25. Multilayer LSTM for sequence labeling: a third bidirectional LSTM layer is stacked on top (see the sketch below).
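In PyTorch terms (my illustration, toy sizes), the backward direction of slide 23 and the stacking of slides 24-25 are just arguments to nn.LSTM:

```python
# The two directions' outputs are concatenated per word, so the label layer
# sees 2 * HIDDEN features for each token.
import torch.nn as nn

HIDDEN = 128
bilstm = nn.LSTM(input_size=100, hidden_size=HIDDEN,
                 num_layers=3,          # the stacked layers of slides 24-25
                 bidirectional=True,    # forward + backward LSTM of slide 23
                 batch_first=True)
to_labels = nn.Linear(2 * HIDDEN, 9)    # forward and backward states concatenated
```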
  26. Alternative decoding using Conditional Random Fields (CRF): the BiLSTM states for “Washington said in Chicago last …” feed a CRF layer that outputs the label sequence B-PER, OTHER, OTHER, B-LOC, OTHER.
  27. Alternative decoding using Conditional Random Fields (CRF): at every word the model now scores all candidate labels (B-PER, I-PER, B-LOC, I-LOC, OTHER), forming a lattice over the sentence.
  28. Decoding with CRF: the global score of a specific sequence of labels is computed over the whole lattice of candidate labels (B-PER, I-PER, B-LOC, I-LOC, OTHER at every position).
  29. Decoding with CRF: the global score also includes label-to-label transition scores, e.g. T[O, I-PER] < T[B-PER, I-PER] (continuing a person name right after OTHER should score lower than continuing it after B-PER).
  30. Decoding with CRF: prediction takes the argmax over all candidate label sequences in the lattice (see the sketch below).
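A toy numpy sketch of the decoding described on slides 28-30 (all numbers are invented): the global score of a label sequence is the sum of per-word emission scores plus transition scores T[prev, next], and the argmax sequence is found with the Viterbi dynamic program.

```python
import numpy as np

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
n = len(labels)

rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, n))        # 5 words x 5 labels (from the BiLSTM)
T = rng.normal(size=(n, n))                # transition scores T[prev, next]
T[labels.index("O"), labels.index("I-PER")] = -10.0   # e.g. T[O, I-PER] << T[B-PER, I-PER]

def viterbi(emissions, T):
    """Return the highest-scoring label sequence under emissions + transitions."""
    score = emissions[0].copy()                       # best score ending in each label
    back = []
    for e in emissions[1:]:
        cand = score[:, None] + T + e[None, :]        # prev-label x next-label scores
        back.append(cand.argmax(axis=0))              # best predecessor per label
        score = cand.max(axis=0)
    path = [int(score.argmax())]                      # best final label
    for b in reversed(back):                          # follow back-pointers
        path.append(int(b[path[-1]]))
    return [labels[i] for i in reversed(path)]

print(viterbi(emissions, T))               # highest-scoring label sequence
```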
  31. Character encoding: each word (e.g. “said” → s, a, i, d) is additionally encoded character by character, and the resulting character vector is concatenated (+) with the word representation before the BiLSTM tagger (labels B-PER, OTHER, OTHER, B-LOC, OTHER, …).
  32. Character encoding results (F score measured over Basis’ evaluation set): English: BiLSTM 83.5 → BiLSTM+Char 85.1; Arabic: 80.3 → 82.5; Korean: 82.3 → 86.0.
  33. Char encode, word encode, decode: the full pipeline is character encoding → word encoding → decoding, mapping “Washington said in Chicago last …” to labels (see the sketch below).
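A PyTorch sketch of that three-stage stack (my illustration; all sizes, ids, and module choices are assumptions, and a softmax layer stands in for the decoder):

```python
# A character-level LSTM produces one vector per word, which is concatenated
# with the word embedding before the word-level BiLSTM; a linear layer decodes.
import torch
import torch.nn as nn

CHARS, CHAR_EMB, CHAR_HID = 100, 25, 25
VOCAB, WORD_EMB, WORD_HID, LABELS = 10_000, 100, 128, 9

char_emb = nn.Embedding(CHARS, CHAR_EMB)
char_lstm = nn.LSTM(CHAR_EMB, CHAR_HID, batch_first=True)    # char encoder
word_emb = nn.Embedding(VOCAB, WORD_EMB)
word_lstm = nn.LSTM(WORD_EMB + CHAR_HID, WORD_HID,
                    bidirectional=True, batch_first=True)    # word encoder
decode = nn.Linear(2 * WORD_HID, LABELS)                     # simple per-word decoder

def encode_word(char_ids, word_id):
    """Word vector = word embedding concatenated with the char-LSTM summary."""
    _, (h, _) = char_lstm(char_emb(char_ids).unsqueeze(0))   # last char hidden state
    return torch.cat([word_emb(word_id), h[-1].squeeze(0)])

# "said" -> its characters s, a, i, d (ids are fake), plus its word id:
vec = encode_word(torch.tensor([4, 6, 2, 9]), torch.tensor(17))
print(vec.shape)   # WORD_EMB + CHAR_HID = 125; these vectors feed word_lstm, then decode
```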
  34. Reported combinations of char encoder / word encoder / decoder (borrowed from Shen et al., 2018):
      Collobert et al. (2011): None / CNN / CRF
      Mesnil et al. (2013): None / RNN / RNN
      Nguyen et al. (2016): None / RNN / GRU
      Huang et al. (2015): None / LSTM / CRF
      Lample et al. (2016): LSTM / LSTM / CRF
      Chiu & Nichols (2016): CNN / LSTM / CRF
      Zhai et al. (2017): CNN / LSTM / LSTM
      Yang et al. (2016): GRU / GRU / CRF
      Strubell et al. (2017): None / Dilated CNN / CRF
      Shen et al. (2018): CNN / CNN / LSTM
  35. What does the LSTM actually learn?
  36. “The dying algorithm”, which predicts death for oncology patients (Siddhartha Mukherjee, Jan 2018): “Here is the strange rub of such a deep learning system: It learns, but it cannot tell us why it has learned… …the algorithm looks vacantly at us when we ask, Why? It is, like death, another black box.”
  37. Bidirectional LSTM for NER: the BiLSTM tagger over “Washington said in Chicago last …” again, as the model to inspect.
  38. What does the LSTM actually learn? The same BiLSTM tagger over “Washington said in Chicago last …”.
  39. What does the LSTM actually learn? Let’s look at this cell vector over time…
  40. What does the LSTM actually learn?
  41. Neuron 280 gets positive around some punctuation marks.
  42. Neuron 189 gets negative around potential locations (see the sketch below for how such neurons can be inspected).
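The neuron indices 280 and 189 come from the talk’s trained NER model; the sketch below (my illustration, untrained and therefore meaningless activations) only shows the mechanics of recording the cell state c_t over time and watching a single coordinate, which is how such plots can be produced:

```python
# Record the LSTM cell state at every time step and track one neuron of it.
import torch
import torch.nn as nn

EMB, HIDDEN = 50, 300
cell = nn.LSTMCell(EMB, HIDDEN)                   # untrained, for illustration only

words = ["Washington", "said", "in", "Chicago", "last"]
x = torch.randn(len(words), EMB)                  # stand-in word embeddings
h = c = torch.zeros(1, HIDDEN)

for word, x_t in zip(words, x):
    h, c = cell(x_t.unsqueeze(0), (h, c))         # c is the cell-state vector at this step
    print(f"{word:12s} cell[280] = {c[0, 280]: .3f}")   # one neuron's value over time
```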
  43. Thank you! Questions? kfir@basistech.com @kfirbar
