DataXDay - The wonders of deep learning: how to leverage it for natural language processing

THE WONDERS OF DEEP LEARNING: HOW TO LEVERAGE IT FOR NLP
DATAXDAY 2018
Paris
17/05/2018
DR. ANA PELETEIRO RAMALLO
DATA SCIENCE DIRECTOR
@PeleteiroAna
@TendamRetail
@DataXDay

Founding year: 1880
Physical shops: 2,000
Countries: 89
Employees: 10,394
Own shops: 1,299
Franchises: 683

DEEP LEARNING FOR NLP
Deep learning is having a transformative impact in many areas where machine learning has been
applied.
NLP was somewhat behind other fields in terms of adopting deep learning for applications.
However, this has changed over the last few years, thanks to the use of RNNs, specifically LSTMs,
as well as word embeddings.
Deep learning can be beneficial for many distinct NLP tasks, such as named entity
recognition, machine translation, language modelling, parsing, chunking and POS tagging,
amongst others.

WORD EMBEDDINGS
Representing words as ids has drawbacks:
• Encodings are arbitrary.
• No information about the relationship between words.
• Data sparsity.
https://www.tensorflow.org/tutorials/word2vec
Better representation for words.
Words in a continuous vector space where semantically similar words are mapped to nearby points.
Learn dense embedding vectors.
Skip-gram and CBOW
• CBOW predicts target words from the context. E.g., Tendam ?? Talk
• Skip-gram predicts source context-words from the target words. E.g., ?? conference ??
Standard preprocessing step for NLP.
Also used as a feature in unsupervised approaches (e.g., clustering).
Several parameters we can experiment with, e.g., the size of the word
embedding or the context window.
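
To make the difference between CBOW and skip-gram concrete, here is a minimal sketch of how training examples could be built from a tokenized sentence. The function name `training_pairs` and the toy sentence are assumptions for illustration; real toolkits add subsampling, negative sampling and much more.

```python
def training_pairs(tokens, window=2):
    """Build CBOW examples (context -> target) and skip-gram pairs (target -> context)."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        cbow.append((context, target))                 # CBOW: predict target from context
        skipgram.extend((target, c) for c in context)  # skip-gram: predict each context word
    return cbow, skipgram


# Toy sentence (an assumption), mirroring the "Tendam ?? Talk" example above.
tokens = "tendam conference talk".split()
cbow, skipgram = training_pairs(tokens, window=1)
```

The `window` argument is the context window mentioned above; widening it produces more pairs per target word.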

CHARACTER EMBEDDINGS
Word embeddings are able to capture syntactic and semantic information.
For tasks such as POS tagging and NER, this is not enough:
• They do not capture intra-word morphological and shape information, such as sub-token patterns (suffixes, prefixes).
• Out-of-vocabulary (OOV) word issue.
• In some languages, text is not composed of separated words but of individual characters (e.g., Chinese).
We can overcome these problems by using character embeddings.
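
A minimal sketch of why character-level input sidesteps the OOV issue: a word missing from the word vocabulary collapses to a single "unknown" id, while its characters can still be encoded one by one. The vocabularies, the helper names and the convention that id 0 means "unknown" are all assumptions made for this illustration.

```python
def build_vocab(items):
    """Map each distinct item to an integer id; id 0 is reserved for unknowns."""
    return {item: idx for idx, item in enumerate(sorted(set(items)), start=1)}


train_words = ["jumpsuit", "dress", "jacket", "sweater"]  # toy training vocabulary
word_vocab = build_vocab(train_words)
char_vocab = build_vocab("".join(train_words))


def encode_word(word):
    return word_vocab.get(word, 0)  # 0 = unknown word


def encode_chars(word):
    return [char_vocab.get(ch, 0) for ch in word]  # 0 = unknown character


oov = "dresses"                # never seen as a whole word
word_id = encode_word(oov)     # falls back to the unknown id
char_ids = encode_chars(oov)   # every character is still known
```

A character embedding layer would then look up one vector per id in `char_ids`, so shared prefixes and suffixes share parameters.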

CNNs in NLP
CNNs: effectiveness in computer vision tasks.
Ability to extract salient n-gram features from the input sentence to create an informative latent semantic representation of the sentence for downstream tasks.
Several tasks: sentence classification, summarization.
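
The idea of extracting salient n-gram features can be sketched as a one-dimensional convolution over word vectors followed by max-over-time pooling. This is a toy, dependency-free illustration: the embeddings and the filter are made-up values, and the function assumes the sentence is at least as long as the filter.

```python
def conv1d_max_over_time(sentence_vectors, filt):
    """Slide an n-gram filter over the sentence and keep the strongest response."""
    n = len(filt)  # filter width = n-gram size
    responses = []
    for start in range(len(sentence_vectors) - n + 1):
        window = sentence_vectors[start:start + n]
        score = sum(w * x
                    for f_row, v_row in zip(filt, window)
                    for w, x in zip(f_row, v_row))
        responses.append(max(0.0, score))  # ReLU non-linearity
    return max(responses), responses      # max-over-time pooling


# Toy 2-dimensional embeddings for a 4-word sentence (values are made up).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # responds to a "dim-0 word, then dim-1 word" bigram
feature, responses = conv1d_max_over_time(sentence, bigram_filter)
```

Each filter yields one pooled feature regardless of sentence length; a real model would learn many filters of several widths.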

RECURRENT NEURAL NETWORKS

Why not basic Deep Nets or CNNs?
Traditional neural networks and CNNs do not use information from the past: each entry is independent.
This is fine for several applications, such as classifying images.
However, several applications, such as video or language modelling, rely on
what has happened in the past to predict the future.
Recurrent Neural Networks (RNNs) are capable of conditioning the model on
previous units in the corpus.
They are also capable of handling inputs of arbitrary length.

RNNs
Make use of sequential information.
Output is dependent on the previous information.
An RNN shares the same parameter W across all time steps,
so there are fewer parameters to learn.
http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
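
A minimal sketch of the recurrence, assuming the common formulation h_t = tanh(W_hh h_{t-1} + W_xh x_t). The weight values and helper names are made up for illustration; the point is that the same two matrices are reused at every time step, for any input length.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(inputs, W_hh, W_xh, h0):
    """Run h_t = tanh(W_hh h_{t-1} + W_xh x_t) with shared weights at every step."""
    h, states = h0, []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_hh, h), matvec(W_xh, x))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# Toy weights (assumptions): 2 hidden units, 1-dimensional inputs.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
states = rnn_forward([[1.0], [0.0], [0.0]], W_hh, W_xh, h0=[0.0, 0.0])
```

Even though the last two inputs are zero, the final hidden state is non-zero: it still carries information from the first input.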

http://torch.ch/blog/2016/07/25/nce.html

RNNs (II)
In theory, RNNs are absolutely capable of handling such long-term dependencies. Practice is "a bit" different.
1. Parameters are shared by all time steps in the network, so the gradient at each output depends not only on the calculations of the current time step, but also on the previous time steps.
2. Exploding gradients:
2.1. Easier to spot.
2.2. Clip the gradient to a maximum.
3. Vanishing gradients:
3.1. Harder to identify.
3.2. Initialization of the matrix to the identity matrix.
3.3. ReLUs instead of sigmoid.
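
A small sketch of both problems and of gradient clipping. The first helper assumes a simplified linear recurrence with a single scalar weight, where the gradient passed back through T steps is scaled by |w|^T; real RNNs involve matrices and non-linearities, so this only shows the trend. The second helper rescales a gradient vector whose norm exceeds a chosen maximum. Both function names are illustrative.

```python
import math


def backprop_factor(w, steps):
    """Scale applied to a gradient sent back through `steps` scalar linear steps."""
    return abs(w) ** steps


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


vanishing = backprop_factor(0.5, 50)   # shrinks towards zero
exploding = backprop_factor(1.5, 50)   # grows very large
clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)  # norm 50 is rescaled to norm 5
```

Clipping keeps the direction of the gradient and only limits its size, which is why it helps with exploding gradients but does nothing for vanishing ones.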

The oversized mannish coats looked positively edible over the bun-skimming
dresses while combined with novelty knitwear such as punk-like
fisherman's sweaters. As another look, the ballet pink Elizabeth and
James jacket provides a cozy cocoon for the 20-year-old to top off her
ensemble of a T-shirt and Parker Smith jeans. But I have to admit that
my favorite is the bun-skimming dresses with the ??
• In theory, RNNs are capable of handling such long-term dependencies.
• However, in practice, they struggle to learn them.
• LSTMs and GRUs avoid the long-term dependency problem.
• They remove or add information to the cell state, carefully regulated by
structures called gates.
• Gates are a way to optionally let information through.
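
As a rough sketch of how gates regulate the cell state, here is a single LSTM step reduced to scalars, following the standard formulation (forget, input and output gates plus a candidate value). The parameter values are hypothetical, chosen so that the forget gate stays almost fully open and the input gate almost closed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate information
    c = f * c_prev + i * g           # remove and add information
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: forget gate ~1, input gate ~0, output gate ~1.
params = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
          "output": (0.0, 0.0, 10.0), "candidate": (1.0, 0.0, 0.0)}
h, c = 0.0, 1.0
for x in [0.3, -0.7, 0.9] * 10:  # 30 time steps of arbitrary input
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell state stays close to its initial value after 30 steps, which is the behaviour that lets an LSTM carry information across long spans.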

LSTMs
http://cs224d.stanford.edu/lecture_notes/notes4.pdf
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

GRUs
http://cs224d.stanford.edu/lecture_notes/notes4.pdf

RNN architectures
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

ATTENTION MECHANISM
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://medium.com/@Synced/a-brief-overview-of-attention-mechanism-13c578ba9129

APPLICATIONS
Word level classification: NER
Sentence classification: tweet sentiment polarity. Semantic matching between texts
Text classification
Language modelling
Speech recognition
Caption generation
Machine translation
Document summarization
Question answering

EX1: TEXT GENERATION
All text from Shakespeare (4.4MB)
3-layer RNN with 512 hidden nodes on each layer.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
https://github.com/martin-gorner/tensorflow-rnn-shakespeare
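
A character-level language model of this kind is typically trained to predict the next character at every position. The sketch below shows one plausible way to prepare such training examples; it is an assumption about the general setup, not the code from the linked repositories, and the short quote stands in for the full corpus.

```python
def char_lm_examples(text, seq_len):
    """Split text into (inputs, targets) id sequences, targets shifted by one character."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    ids = [char_to_id[ch] for ch in text]
    examples = []
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]  # next character at each position
        examples.append((inputs, targets))
    return examples, char_to_id


text = "to be, or not to be"  # stand-in for the full corpus
examples, char_to_id = char_lm_examples(text, seq_len=5)
```

At generation time, the trained network is fed one character, a next character is sampled from its output distribution, and that character is fed back in as the next input.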

Q&A
Pedro del Hierro SS18
How can I help you today?
I was wondering what is trending this spring
This spring is all about new wave slip, in for example jumpsuits
Is that appropriate for a work dinner?
Yes, it totally works! I would recommend you to use this chilly oil jumpsuit. You can combine it with a dark brown belt and cherry tomato heels. All from Pedro del Hierro
That sounds great!

PLENTY OF RESOURCES OUT THERE!
• https://distill.pub/2016/misread-tsne/
• http://www.wildml.com
• https://arxiv.org/pdf/1708.02709.pdf
• http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• https://nlp.stanford.edu/courses/NAACL2013/
• http://cs224d.stanford.edu/syllabus.html
• https://github.com/kjw0612/awesome-rnn
• https://lvdmaaten.github.io/tsne/
• https://github.com/oxford-cs-deepnlp-2017
