
DataXDay - The wonders of deep learning: how to leverage it for natural language processing

In recent years, deep learning (DL) has proven to be a transformative force that has made impressive advances in different fields. In fact, within the area of natural language processing (NLP), deep learning has outperformed many former state-of-the-art approaches, such as in machine translation or named entity recognition (NER). In this talk I will present various deep learning algorithms and architectures for NLP, with examples of how they can be leveraged in real-world applications.


Ana Peleteiro - Tendam
https://dataxday.fr

video available: https://www.youtube.com/watch?v=qpkt1sVHzd0



  1. 1. THE WONDERS OF DEEP LEARNING: HOW TO LEVERAGE IT FOR NLP DATAXDAY 2018 Paris 17/05/2018 DR. ANA PELETEIRO RAMALLO DATA SCIENCE DIRECTOR @PeleteiroAna @TendamRetail @DataXDay
  2. 2. 1880 Founding year 2,000 Physical shops 89 Countries 10,394 Employees 1,299 Own shops 683 Franchises 2 @DataXDay
  3. 3. DEEP LEARNING FOR NLP Deep learning is having a transformative impact in many areas where machine learning has been applied. NLP was somewhat behind other fields in terms of adopting deep learning for applications. However, this has changed over the last few years, thanks to the use of RNNs, specifically LSTMs, as well as word embeddings. There are distinct areas in which deep learning can be beneficial for NLP tasks, such as named entity recognition, machine translation, language modelling, parsing, chunking, and POS tagging, amongst others. 3 @DataXDay
  4. 4. WORD EMBEDDINGS 4 Representing words as ids: encodings are arbitrary, carry no information about the relationships between words, and lead to data sparsity. https://www.tensorflow.org/tutorials/word2vec Word embeddings are a better representation: words live in a continuous vector space where semantically similar words are mapped to nearby points, and we learn dense embedding vectors. Skip-gram and CBOW • CBOW predicts target words from the context. E.g., Tendam ?? Talk • Skip-gram predicts source context words from the target words. E.g., ?? conference ?? A standard preprocessing step for NLP, also used as features in downstream approaches (e.g., clustering). There are several parameters we can experiment with, e.g., the size of the word embedding or the context window (see the word2vec sketch after the slides). @DataXDay
  5. 5. CHARACTER EMBEDDINGS Word embeddings are able to capture syntactic and semantic information, but for tasks such as POS-tagging and NER this is not enough: they do not capture intra-word morphological and shape information or sub-token patterns (suffixes, prefixes, etc.), and they suffer from the out-of-vocabulary (OOV) word issue. They are also problematic in languages where text is composed of individual characters rather than separated words (e.g., Chinese). We can overcome these problems by using character embeddings (see the character-level encoder sketch after the slides). 5 @DataXDay
  6. 6. CNNs in NLP CNNs have proven effective in computer vision tasks. In NLP, their ability to extract salient n-gram features from the input sentence creates an informative latent semantic representation of the sentence for downstream tasks. They are used for several tasks, such as sentence classification and summarization (see the 1D-CNN sketch after the slides). 6 @DataXDay
  7. 7. RECURRENT NEURAL NETWORKS 7 @DataXDay
  8. 8. 8 Why not basic Deep Nets or CNNs? @DataXDay Traditional neural networks and CNNs do not use information from the past; each input is independent. This is fine for several applications, such as classifying images. However, other applications, such as video or language modelling, rely on what has happened in the past to predict the future. Recurrent Neural Networks (RNNs) are capable of conditioning the model on previous units in the corpus, and of handling inputs of arbitrary length.
  9. 9. RNNs Make use of sequential information: the output is dependent on the previous information. An RNN shares the same parameters W at each step, so there are fewer parameters to learn (see the minimal RNN sketch after the slides). 9 @DataXDay http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
  10. 10. 10 @DataXDay http://torch.ch/blog/2016/07/25/nce.html
  11. 11. In theory, RNNs are absolutely capable of handling such long-term dependencies. Practice is "a bit" different. 11 RNNs (II) Parameters are shared by all time steps in the network, so the gradient at each output depends not only on the calculations of the current time step, but also on the previous time steps. This leads to two problems. Exploding gradients are easier to spot; a common remedy is to clip the gradient to a maximum (see the gradient-clipping sketch after the slides). Vanishing gradients are harder to identify; remedies include using ReLUs instead of sigmoids and initializing the weight matrix to the identity matrix. @DataXDay
  12. 12. The oversized mannish coats looked positively edible over the bun-skimming dresses while combined with novelty knitwear such as punk-like fisherman's sweaters. As another look, the ballet pink Elizabeth and James jacket provides a cozy cocoon for the 20-year-old to top off her ensemble of a T-shirt and Parker Smith jeans. But I have to admit that my favorite is the bun-skimming dresses with the ?? • In theory, RNNs are capable of handling such long-term dependencies. 12 @DataXDay • However, in reality, they cannot. • LSTMs and GRUs avoid the long-term dependency problem (see the LSTM/GRU sketch after the slides). • They remove or add information to the cell state, carefully regulated by structures called gates. • Gates are a way to optionally let information through.
  13. 13. 13 @DataXDay LSTMs http://cs224d.stanford.edu/lecture_notes/notes4.pdf http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  14. 14. 14 @DataXDay GRUs http://cs224d.stanford.edu/lecture_notes/notes4.pdf
  15. 15. 15 @DataXDay RNN architectures http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  16. 16. 16 @DataXDay ATTENTION MECHANISM (see the dot-product attention sketch after the slides) http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/ https://medium.com/@Synced/a-brief-overview-of-attention-mechanism-13c578ba9129
  17. 17. APPLICATIONS Word-level classification: NER. Sentence classification: tweet sentiment polarity. Semantic matching between texts. Text classification. Language modelling. Speech recognition. Caption generation. Machine translation. Document summarization. Question answering. 17
  18. 18. EX1: TEXT GENERATION All text from Shakespeare (4.4 MB), modelled with a 3-layer RNN with 512 hidden nodes on each layer (see the character-level generation sketch after the slides). http://karpathy.github.io/2015/05/21/rnn-effectiveness/ https://github.com/martin-gorner/tensorflow-rnn-shakespeare 18 @DataXDay
  19. 19. Q&A 19 Pedro del Hierro SS18 How can I help you today? I was wondering what is trending this spring This spring is all about new wave slip, in for example jumpsuits Is that appropriate for a work dinner? Yes, it totally works! I would recommend you to use this chilly oil jumpsuit. You can combine it with a dark brown belt and cherry tomato heels. All from Pedro del Hierro That sounds great! @DataXDay
  20. 20. 20 PLENTY OF RESOURCES OUT THERE! • https://distill.pub/2016/misread-tsne/ • http://www.wildml.com • https://arxiv.org/pdf/1708.02709.pdf • http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf • http://colah.github.io/posts/2015-08-Understanding-LSTMs/ • https://nlp.stanford.edu/courses/NAACL2013/ • http://cs224d.stanford.edu/syllabus.html • https://github.com/kjw0612/awesome-rnn • https://lvdmaaten.github.io/tsne/ • https://github.com/oxford-cs-deepnlp-2017 @DataXDay
  21. 21. THANKS! @PeleteiroAna 21
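
Word embeddings (slide 4): the word2vec sketch below is a minimal skip-gram example, assuming gensim >= 4.0 is available; the toy corpus and the vector_size, window and min_count values are illustrative choices, not values from the talk. Setting sg=0 instead would train CBOW rather than skip-gram.

```python
# Minimal skip-gram word2vec sketch (assumes gensim >= 4.0; toy corpus and
# hyperparameters are illustrative, not from the talk).
from gensim.models import Word2Vec

corpus = [
    ["deep", "learning", "for", "natural", "language", "processing"],
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["semantically", "similar", "words", "end", "up", "close", "together"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # dimensionality of the embedding space
    window=2,        # context window size
    min_count=1,     # keep every word in this tiny corpus
    sg=1,            # 1 = skip-gram, 0 = CBOW
)

vector = model.wv["words"]                   # dense vector for one word
neighbours = model.wv.most_similar("words")  # nearest words in embedding space
print(vector.shape, neighbours[:3])
```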
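Character embeddings (slide 5): a character-level word-encoder sketch, assuming TensorFlow 2.x / Keras; the model name char_word_encoder and all sizes (character vocabulary, maximum word length, dimensions) are illustrative. The point is that any word, including an out-of-vocabulary one, gets a vector built from its characters.

```python
# Character-level word encoder sketch (assumes TensorFlow 2.x / Keras): a word
# is fed in as a sequence of character ids and encoded into a fixed-size vector,
# so OOV words and sub-token patterns (prefixes, suffixes) are still covered.
import tensorflow as tf

NUM_CHARS = 128     # e.g. an ASCII-sized character vocabulary (illustrative)
MAX_WORD_LEN = 20   # words padded/truncated to this many characters (illustrative)
CHAR_DIM = 16
WORD_DIM = 64

char_ids = tf.keras.Input(shape=(MAX_WORD_LEN,), dtype="int32")
chars = tf.keras.layers.Embedding(NUM_CHARS, CHAR_DIM, mask_zero=True)(char_ids)
# A bidirectional LSTM over the characters produces the word representation.
word_vec = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(WORD_DIM // 2))(chars)

char_word_encoder = tf.keras.Model(char_ids, word_vec)
char_word_encoder.summary()
```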
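CNNs in NLP (slide 6): the 1D-CNN sketch below is a sentence classifier built on convolutions over word embeddings, assuming TensorFlow 2.x / Keras; vocabulary size, sequence length and filter settings are illustrative. A kernel size of 3 extracts trigram-like features, and global max-pooling keeps the most salient activation per filter.

```python
# 1D-CNN sentence classifier sketch (assumes TensorFlow 2.x / Keras): convolutions
# over word embeddings extract n-gram features; max-pooling keeps the salient ones.
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, EMB_DIM = 10000, 100, 100

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
x = tf.keras.layers.Conv1D(filters=128, kernel_size=3, activation="relu")(x)  # trigram-like features
x = tf.keras.layers.GlobalMaxPooling1D()(x)                                   # most salient activation per filter
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)                   # e.g. tweet sentiment polarity

cnn_classifier = tf.keras.Model(inputs, outputs)
cnn_classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
cnn_classifier.summary()
```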
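RNNs (slides 8-9): the minimal RNN sketch below, assuming TensorFlow 2.x / Keras, illustrates the two properties from the slides: the same recurrent weights are applied at every time step, so the parameter count does not grow with sequence length, and the input length is left as None so sequences of arbitrary length can be processed. All sizes are illustrative.

```python
# Minimal RNN sketch (assumes TensorFlow 2.x / Keras): one shared weight matrix
# is reused at every time step, and the sequence length is left unspecified.
import tensorflow as tf

VOCAB, EMB_DIM, HIDDEN = 5000, 32, 64

inputs = tf.keras.Input(shape=(None,), dtype="int32")          # arbitrary-length token-id sequence
x = tf.keras.layers.Embedding(VOCAB, EMB_DIM, mask_zero=True)(inputs)
h = tf.keras.layers.SimpleRNN(HIDDEN)(x)                       # same weights applied at every step
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(h)

rnn = tf.keras.Model(inputs, outputs)
rnn.summary()   # note: the parameter count does not depend on the sequence length
```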
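Exploding gradients (slide 11): the gradient-clipping sketch below shows the usual remedy of capping gradients in the optimizer, assuming TensorFlow 2.x / Keras; the clipnorm value of 1.0 is an illustrative choice.

```python
# Gradient-clipping sketch (assumes TensorFlow 2.x / Keras): clip gradients so
# that their norm does not exceed the threshold before the update is applied.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
# Alternatively, clipvalue=0.5 would clip each gradient element individually.
# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```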
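LSTMs and GRUs (slides 12-14): the LSTM/GRU sketch below, assuming TensorFlow 2.x / Keras, shows both gated cells as drop-in replacements for a plain recurrent layer; the helper name gated_classifier and all sizes are illustrative. For the same hidden size the GRU has fewer parameters than the LSTM, which is visible in the model summaries.

```python
# LSTM / GRU sketch (assumes TensorFlow 2.x / Keras): both are drop-in
# replacements for a plain RNN layer; their gates regulate what is written to,
# kept in, and read from the recurrent state.
import tensorflow as tf

VOCAB, EMB_DIM, HIDDEN = 5000, 64, 128

def gated_classifier(cell="lstm"):
    """Illustrative helper: a binary text classifier with a gated recurrent layer."""
    inputs = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(VOCAB, EMB_DIM, mask_zero=True)(inputs)
    recurrent = tf.keras.layers.LSTM(HIDDEN) if cell == "lstm" else tf.keras.layers.GRU(HIDDEN)
    h = recurrent(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(h)
    return tf.keras.Model(inputs, outputs)

lstm_model = gated_classifier("lstm")
gru_model = gated_classifier("gru")   # fewer parameters than the LSTM for the same hidden size
lstm_model.summary()
gru_model.summary()
```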
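Attention mechanism (slide 16): the dot-product attention sketch below is written in plain NumPy to show the core idea that each query position produces a weighted sum of the values, with weights given by a softmax over query-key similarities; the attention function and the toy shapes are illustrative, not a specific variant from the talk.

```python
# Scaled dot-product attention sketch in plain NumPy: each query produces a
# weighted sum of the values, with weights from a softmax over query-key scores.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # similarity of each query with every key
    weights = softmax(scores, axis=-1)         # attention distribution over positions
    return weights @ values, weights           # context vectors and the weights themselves

# Toy example: 2 decoder queries attending over 5 encoder states of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
context, weights = attention(Q, K, V)
print(context.shape, weights.sum(axis=-1))     # (2, 8), and each row of weights sums to 1
```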
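Text generation (slide 18): the character-level generation sketch below, assuming TensorFlow 2.x / Keras, mirrors the slide's three recurrent layers of 512 units; the character vocabulary size, window length, the sample_next_char helper and the omission of corpus loading and training are all illustrative simplifications.

```python
# Character-level text-generation sketch (assumes TensorFlow 2.x / Keras), in the
# spirit of the char-rnn examples linked on the slide; data loading and training
# are omitted, and the sizes below are illustrative.
import numpy as np
import tensorflow as tf

NUM_CHARS = 98   # size of the character vocabulary (illustrative)
SEQ_LEN = 40     # length of the input character window (illustrative)

inputs = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(NUM_CHARS, 64)(inputs)
x = tf.keras.layers.LSTM(512, return_sequences=True)(x)   # three recurrent layers of
x = tf.keras.layers.LSTM(512, return_sequences=True)(x)   # 512 units, as on the slide
x = tf.keras.layers.LSTM(512)(x)
outputs = tf.keras.layers.Dense(NUM_CHARS, activation="softmax")(x)  # next-character distribution

char_lm = tf.keras.Model(inputs, outputs)
char_lm.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def sample_next_char(seed_ids, temperature=1.0):
    """Sample the next character id from a window of SEQ_LEN previous character ids."""
    probs = char_lm.predict(np.array([seed_ids]), verbose=0)[0]
    logits = np.log(probs + 1e-9) / temperature   # temperature reshapes the distribution
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.random.choice(NUM_CHARS, p=probs))
```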
