2. OUTLINE
• Why Neural Methods for Natural Language Processing?
• Local vs. Global Word Representations
• Unsupervised Learning for Word Representations
• Including Sub-word Information
• Language Models for Contextual Representations
4. WHY NEURAL METHODS FOR NATURAL LANGUAGE PROCESSING?
• Natural Language Processing (NLP) has moved largely to neural methods in recent years
5. WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING?
• Natural Language Processing (NLP) has moved largely to neural methods in recent years
• Traditional NLP builds on years of research into language representation
• Theoretical foundations can lead to model rigidity
• Tasks often rely on manually generated and curated dictionaries and thesauruses
• Built upon local word representations
6. WHAT IS "MODERN" NATURAL LANGUAGE PROCESSING?
• Natural Language Processing (NLP) has moved largely to neural methods in recent years
• Few to no assumptions need to be made
• Model architectures purpose-built for tasks
• Very active area of research, with most work open-source
• Ability to learn global and contextualized word representations
18. UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS
• We generate a seemingly endless amount of text data each day
• ~460k Tweets every minute
• ~510k Facebook posts every minute
• We have accumulated vast amounts of text data in online repositories
• Wikipedia has 5.8M (English) articles
https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#36a98a9260ba
19. UNSUPERVISED LEARNING FOR WORD REPRESENTATIONS
• We generate a seemingly endless amount of text data each day
• ~460k Tweets every minute
• ~510k Facebook posts every minute
• We have accumulated vast amounts of text data in online repositories
• Wikipedia has 5.8M (English) articles
1.6M words!
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
32. COMPUTING EFFICIENT WORD REPRESENTATIONS
• Our goal is to learn all of the elements in the dense matrix on the right
33. COMPUTING EFFICIENT WORD REPRESENTATIONS
• To do this, we treat this matrix as a matrix of model weights, which will be optimized
34. COMPUTING EFFICIENT WORD REPRESENTATIONS
• To optimize, we start with a single example from our training set:
(fox, quick)
35. COMPUTING EFFICIENT WORD REPRESENTATIONS
• Grab the current vector of weights for those words:
(fox, quick)
36. COMPUTING EFFICIENT WORD REPRESENTATIONS
• Use these vectors to calculate the probability of the context word, given the center word
(fox, quick)
37. COMPUTING EFFICIENT WORD REPRESENTATIONS
• Then compare that probability to the true label (0 or 1), and update the weights accordingly (a minimal sketch of this training step follows below)
(fox, quick)
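The update described on slides 34–37 can be made concrete with a short NumPy sketch of one skip-gram training step under the negative-sampling (0/1 label) view. The vocabulary, embedding size, learning rate, and the train_step helper below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
word_to_id = {w: i for i, w in enumerate(vocab)}

dim = 8  # illustrative embedding size
# Two weight matrices: one for center words, one for context words
W_center = rng.normal(scale=0.1, size=(len(vocab), dim))
W_context = rng.normal(scale=0.1, size=(len(vocab), dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(center, context, label, lr=0.05):
    """One (center, context) update: predict P(context | center), compare to
    the true label (1 = observed pair, 0 = negative sample), nudge both vectors."""
    c = word_to_id[center]
    o = word_to_id[context]
    v_c, v_o = W_center[c], W_context[o]

    p = sigmoid(v_c @ v_o)   # predicted probability that the pair is real
    error = p - label        # gradient of the logistic loss w.r.t. the score

    W_center[c]  -= lr * error * v_o
    W_context[o] -= lr * error * v_c
    return p

# True pair from the running example, plus one sampled "negative" pair
train_step("fox", "quick", label=1)
train_step("fox", "lazy", label=0)
```

Repeating this step over all training pairs (and sampled negatives) is what gradually fills in the dense weight matrix described on slides 32–33.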
41. INCLUDING SUB-WORD INFORMATION
• Word2vec really opened the doors for computing global representations of words, which can then be used for a variety of different tasks, but…
… is it possible to make it better?
42. INCLUDING SUB-WORD INFORMATION
• Word2vec really opened the doors for computing global representations of words, which can then be used for a variety of different tasks, but…
… is it possible to make it better?
• YES! What if we included sub-word information?
https://fasttext.cc/
45. INCLUDING SUB-WORD INFORMATION
• FastText seeks to learn a vector for each sequence of characters (character n-gram) within a word, plus the bracketed word itself
<quick>
<qu
qui
uic
ick
ck>
46. INCLUDING SUB-WORD INFORMATION
• Each word vector is then the sum of its sub-word vectors (see the sketch below):
v(<quick>) + v(<qu) + v(qui) + v(uic) + v(ick) + v(ck>) = quick
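A minimal NumPy sketch of this decomposition and summation, under illustrative assumptions: fixed 3-character n-grams, randomly initialized sub-word vectors, and hypothetical helper names (the real FastText uses a range of n-gram lengths and learns these vectors during training):

```python
import numpy as np

def char_ngrams(word, n=3):
    """Character n-grams of a word, with < and > marking the word boundaries,
    plus the whole bracketed word itself (as in the <quick> example above)."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
    return grams + [wrapped]

rng = np.random.default_rng(0)
dim = 8
subword_vectors = {}  # hypothetical lookup table; normally learned weights

def subword_vector(gram):
    if gram not in subword_vectors:
        subword_vectors[gram] = rng.normal(scale=0.1, size=dim)
    return subword_vectors[gram]

def word_vector(word):
    """The word's vector is the sum of its sub-word vectors."""
    return np.sum([subword_vector(g) for g in char_ngrams(word)], axis=0)

print(char_ngrams("quick"))        # ['<qu', 'qui', 'uic', 'ick', 'ck>', '<quick>']
print(word_vector("quick").shape)  # (8,)
```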
47. INCLUDING SUB-WORD INFORMATION
(the, quick)
(the, brown)
(quick, the)
(quick, brown)
(quick, fox)
(brown, the)
(brown, quick)
(brown, fox)
(brown, jumps)
(fox, quick)
(fox, brown)
(fox, jumps)
(fox, over)
...
• From there, FastText uses the same skip-gram model for training as word2vec (with low-level optimizations); generating the (center, context) pairs above is sketched below
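The pair list above is consistent with a context window of 2; a small Python sketch of that pair generation (the window size and helper name are inferred, not stated on the slide):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs like those listed above,
    pairing each word with its neighbors within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
for pair in skipgram_pairs(sentence)[:8]:
    print(pair)
# ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ...
```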
48. INCLUDING SUB-WORD INFORMATION
• Benefits of FastText
• Learns useful representations of prefixes and suffixes
• Learns useful word roots
• Out-of-vocabulary inference (see the sketch below)!
• Drawbacks of FastText
• Very large number of model parameters
• Known to be difficult to tune
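One way to see the out-of-vocabulary benefit in practice is with the gensim library's FastText implementation (gensim is not mentioned in the slides; the toy corpus and parameter values below are purely illustrative, assuming the gensim 4.x API):

```python
from gensim.models import FastText

# Toy corpus; real models are trained on far larger text collections
sentences = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]

model = FastText(vector_size=50, window=2, min_count=1, min_n=3, max_n=5)
model.build_vocab(corpus_iterable=sentences)
model.train(corpus_iterable=sentences, total_examples=len(sentences), epochs=10)

# "quickest" never appears in the corpus, but it shares character n-grams
# with "quick", so FastText can still produce a vector for it
print("quickest" in model.wv.key_to_index)   # False: out of vocabulary
print(model.wv["quickest"].shape)            # (50,): built from sub-word vectors
```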
50. THE NEED FOR CONTEXTUAL REPRESENTATIONS
• Skip-gram models are fantastic for computing a single, fixed representation for a given word or token, but…
… words can have multiple meanings depending on the context…
51. THE NEED FOR CONTEXTUAL REPRESENTATIONS
• Skip-gram models are fantastic for computing a single, fixed representation for a given word or token, but…
… words can have multiple meanings depending on the context…
Context matters.
55. DEEP CONTEXTUAL REPRESENTATIONS
• Neural language models can also be stacked, creating multiple representations
56. DEEP CONTEXTUAL REPRESENTATIONS
• Neural language models can also be stacked, creating multiple representations
fox = (layer 1 vector) + (layer 2 vector) + (layer 3 vector) + …
57. DEEP CONTEXTUAL REPRESENTATIONS
• Neural language models can also be stacked, creating multiple representations
fox = (layer 1 vector) + (layer 2 vector) + (layer 3 vector) + … (combination sketched below)
https://allennlp.org/elmo, https://arxiv.org/abs/1810.04805
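A rough NumPy illustration of combining stacked layer outputs into one token vector, in the spirit of ELMo's softmax-weighted sum over layers; the layer count, dimensions, and random values below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

num_layers, dim = 3, 8          # e.g. an embedding layer plus two stacked LM layers
# Hypothetical per-layer representations of the token "fox" in one sentence
layer_outputs = rng.normal(size=(num_layers, dim))

# Scalar mixing weights (learned in ELMo, random here), softmax-normalized,
# plus a global scale factor gamma
s = rng.normal(size=num_layers)
weights = np.exp(s) / np.exp(s).sum()
gamma = 1.0

fox_vector = gamma * (weights[:, None] * layer_outputs).sum(axis=0)
print(fox_vector.shape)   # (8,): one contextual vector for "fox"
```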