Deep Learning for Natural Language Processing: Word Embeddings
Roelof Pieters (@graphific)
KTH, 3 December 2015

Guest Lecture on NLP & Deep Learning (Word Embeddings) at the course Language technology at KTH, Stockholm, 3 December 2015

  1. 1. @graphific Roelof Pieters. Deep Learning for Natural Language Processing: Word Embeddings. 3 December 2015, KTH. www.csc.kth.se/~roelof/ roelof@kth.se
  2. 2. Language Understanding 2
  3. 3. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 3. Language is culturally specific Some of the challenges in Language Understanding: 3
  4. 4. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions • plays well with others VB ADV P NN NN NN P DT • fruit flies like a banana NN NN VB DT NN NN VB P DT NN NN NN P DT NN NN VB VB DT NN • the students went to class DT NN VB P NN 4 Some of the challenges in Language Understanding:
  5. 5. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 5 Some of the challenges in Language Understanding:
  6. 6. [Karlgren 2014, NLP Sthlm Meetup]6
  7. 7. Can we understand Language ? 1. Language is ambiguous:
 Every sentence has many possible interpretations. 2. Language is productive:
 We will always encounter new words or new constructions 3. Language is culturally specific Some of the challenges in Language Understanding: 7
  8. 8. ML: Traditional Approach 1. Gather as much LABELED data as you can get 2. Throw some algorithms at it (mainly put in an SVM and keep it at that) 3. If you actually have tried more algos: Pick the best 4. Spend hours hand engineering some features / feature selection / dimensionality reduction (PCA, SVD, etc) 5. Repeat… For each new problem/question: 8
  9. 9. Machine Learning for NLP Data Classic Approach: Data is fed into a learning algorithm: Learning 
 Algorithm 9
  10. 10. Machine Learning for NLP some of the (many) treebank datasets source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks ! 10
  11. 11. Penn Treebank That’s a lot of “manual” work: 11
  12. 12. • the students went to class DT NN VB P NN • plays well with others VB ADV P NN NN NN P DT • fruit flies like a banana NN NN VB DT NN NN VB P DT NN NN NN P DT NN NN VB VB DT NN With a lot of issues: Penn Treebank 12
  13. 13. Machine Learning for NLP Learning 
 Algorithm Data “Features” Prediction Prediction/
 Classifier train set test set 13
  14. 14. Machine Learning for NLP Learning 
 Algorithm “Features” Prediction Prediction/
 Classifier train set test set 14
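As a concrete illustration of this classic "features + learning algorithm" pipeline, here is a minimal sketch using scikit-learn (my own toy example and library choice; the slides do not prescribe one): bag-of-words features are extracted from a labeled train set, fed to an SVM, and the resulting classifier makes predictions on a held-out test set.

```python
# Minimal sketch of the classic feature-based pipeline (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Tiny toy dataset: labeled sentences (Data + labels)
texts = ["I love this movie", "great acting and plot",
         "terrible film", "I hated every minute"]
labels = ["pos", "pos", "neg", "neg"]

# Split into train set / test set
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)

# "Features": sparse bag-of-words counts; "Learning Algorithm": a linear SVM
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(X_train, y_train)           # train the Prediction/Classifier
print(model.predict(X_test))          # Prediction on the test set
```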
  15. 15. One Model rules them all ?
 
 DL approaches have been successfully applied to: Deep Learning: Why for NLP ? Automatic summarization Coreference resolution Discourse analysis Machine translation Morphological segmentation Named entity recognition (NER) Natural language generation Natural language understanding Optical character recognition (OCR) Part-of-speech tagging Parsing Question answering Relationship extraction sentence boundary disambiguation Sentiment analysis Speech recognition Speech segmentation Topic segmentation and recognition Word segmentation Word sense disambiguation Information retrieval (IR) Information extraction (IE) Speech processing 15
  16. 16. Deep Learning: Why for NLP ? 16
  17. 17. • What is the meaning of a word?
 (Lexical semantics) • What is the meaning of a sentence?
 ([Compositional] semantics) • What is the meaning of a longer piece of text? (Discourse semantics) Semantics: Meaning 18
  18. 18. • NLP treats words mainly (rule-based/statistical approaches at least) as atomic symbols:
 • or in vector space:
 • also known as “one hot” representation. • Its problem ? Word Representation Love Candy Store [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 ! 19
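A quick numeric sketch of the problem with "one hot" vectors (my own toy illustration, not from the slides): every pair of distinct words is orthogonal, so their dot product is 0 and the representation encodes no notion of similarity at all.

```python
import numpy as np

vocab = ["love", "candy", "store", "shop"]          # toy vocabulary (assumed)

def one_hot(word, vocab):
    """Return the 'one hot' vector for a word: all zeros except a single 1."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1
    return v

candy, store, shop = (one_hot(w, vocab) for w in ["candy", "store", "shop"])
print(candy @ store)   # 0.0 -> "Candy AND Store = 0"
print(store @ shop)    # 0.0 -> even near-synonyms look maximally dissimilar
```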
  19. 19. Word Representation 20
  20. 20. • Structure corresponds to meaning: Structure and Meaning 21
  21. 21. • Semantics • Syntax 22 NLP: what can we work with?
  22. 22. • Language models define probability distributions over (natural language) strings or sentences • Joint and Conditional Probability Language Model 23
  23. 23. • Language models define probability distributions over (natural language) strings or sentences Language Model 24
  24. 24. • Language models define probability distributions over (natural language) strings or sentences Language Model 25
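To make "probability distribution over sentences" concrete, here is a minimal sketch (my own toy example) of a count-based bigram language model: the joint probability of a sentence is factored by the chain rule into conditional probabilities P(w_i | w_{i-1}) estimated from corpus counts.

```python
from collections import Counter

# Toy corpus (assumed); <s> and </s> mark sentence boundaries.
corpus = [["<s>", "the", "dog", "barks", "</s>"],
          ["<s>", "the", "dog", "sleeps", "</s>"],
          ["<s>", "a", "cat", "sleeps", "</s>"]]

bigrams = Counter((w1, w2) for s in corpus for w1, w2 in zip(s, s[1:]))
contexts = Counter(w for s in corpus for w in s[:-1])

def sentence_prob(sentence):
    """Joint probability via the chain rule with a bigram (Markov) assumption."""
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= bigrams[(w1, w2)] / contexts[w1]   # conditional P(w2 | w1)
    return p

print(sentence_prob(["<s>", "the", "dog", "sleeps", "</s>"]))  # 2/3 * 1 * 1/2 * 1 = 1/3
```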
  25. 25. Word senses What is the meaning of words? • Most words have many different senses:
 dog = animal or sausage? How are the meanings of different words related? • - Specific relations between senses:
 Animal is more general than dog. • - Semantic fields:
 money is related to bank 26
  26. 26. Word senses Polysemy: • A lexeme is polysemous if it has different related senses • bank = financial institution or building Homonyms: • Two lexemes are homonyms if their senses are unrelated, but they happen to have the same spelling and pronunciation • bank = (financial) bank or (river) bank 27
  27. 27. Word senses: relations Symmetric relations: • Synonyms: couch/sofa
 Two lemmas with the same sense • Antonyms: cold/hot, rise/fall, in/out
 Two lemmas with the opposite sense Hierarchical relations: • Hypernyms and hyponyms: pet/dog
 The hyponym (dog) is more specific than the hypernym (pet) • Holonyms and meronyms: car/wheel
 The meronym (wheel) is a part of the holonym (car) 28
  28. 28. Distributional representations “You shall know a word by the company it keeps”
 (J. R. Firth 1957) One of the most successful ideas of modern statistical NLP! these words represent banking • Hard (class based) clustering models • Soft clustering models 29
  29. 29. Distributional hypothesis He filled the wampimuk, passed it around and we all drank some We found a little, hairy wampimuk sleeping behind the tree (McDonald & Ramscar 2001) 30
  30. 30. Distributional semantics Landauer and Dumais (1997), Turney and Pantel (2010), … 31
  31. 31. Distributional semantics Distributional meaning as co-occurrence vector: 32
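A minimal sketch (my own toy example) of how such a co-occurrence vector is built: for each target word, count which context words appear within a small window around it; the row of counts for a word is its distributional representation.

```python
from collections import Counter, defaultdict

# Toy corpus (assumed) and a symmetric context window of +/- 2 words.
corpus = [["he", "drinks", "beer", "in", "the", "pub"],
          ["she", "drinks", "wine", "in", "the", "bar"]]
window = 2

cooc = defaultdict(Counter)
for sent in corpus:
    for i, target in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[target][sent[j]] += 1

# The distributional meaning of "drinks" is its vector of co-occurrence counts:
print(cooc["drinks"])   # e.g. Counter({'in': 2, 'he': 1, 'beer': 1, ...})
```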
  32. 32. Distributional representations • Taking it further: • Continuous word embeddings • Combine vector space semantics with the prediction of probabilistic models • Words are represented as a dense vector: Candy = 33
  33. 33. Word Embeddings: Socher / Vector Space Model, adapted from Bengio, “Representation Learning and Deep Learning”, July 2012, UCLA. In a perfect world: 34
  34. 34. Word Embeddings: Socher / Vector Space Model, adapted from Bengio, “Representation Learning and Deep Learning”, July 2012, UCLA. In a perfect world: the country of my birth the place where I was born 35
  35. 35. • Can theoretically (given enough units) approximate “any” function • and fit to “any” kind of data • Efficient for NLP: hidden layers can be used as word lookup tables • Dense distributed word vectors + efficient NN training algorithms: • Can scale to billions of words ! Why Neural Networks for NLP? 36
  36. 36. • Representation of words as continuous vectors has a long history (Hinton et al. 1986; Rumelhart et al. 1986; Elman 1990) • First neural network language model: NNLM (Bengio et al. 2001; Bengio et al. 2003) based on earlier ideas of distributed representations for symbols (Hinton 1986) How? 37
  37. 37. Word Embeddings: Socher / Vector Space Model. Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July 2012, UCLA. In a perfect world: the country of my birth the place where I was born ? … 38
  38. 38. Compositionality Principle of compositionality: the “meaning (vector) of a complex expression (sentence) is determined by: - the meanings of its constituent expressions (words) and - the rules (grammar) used to combine them” (Gottlob Frege, 1848 - 1925) 39
  39. 39. • How do we handle the compositionality of language in our models? 40 Compositionality
  40. 40. • How do we handle the compositionality of language in our models? • Recursion :
 the same operator (same parameters) is applied repeatedly on different components 41 Compositionality
  41. 41. • How do we handle the compositionality of language in our models? • Option 1: Recurrent Neural Networks (RNN) 42 RNN 1: Recurrent Neural Networks
  42. 42. • How do we handle the compositionality of language in our models? • Option 2: Recursive Neural Networks (also sometimes called RNN) 43 RNN 2: Recursive Neural Networks
  43. 43. • achieved SOTA in 2011 on Language Modeling (WSJ AR task) (Mikolov et al., INTERSPEECH 2011): • and again at ASRU 2011: 44 Recurrent Neural Networks “Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements increase with more training data.” “[RNN LM trained on a] single core on 400M words in a few days, with 1% absolute improvement in WER on state of the art setup” Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011)
 Recurrent neural network based language model
  44. 44. 45 Recurrent Neural Networks (simple recurrent 
 neural network for LM) input hidden layer(s) output layer + sigmoid activation function + softmax function: Mikolov, T., Karafiat, M., Burget, L., Cernocky, J.H., Khudanpur, S. (2011)
 Recurrent neural network based language model
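As a rough sketch of the architecture on this slide (a simple/Elman recurrent network for language modeling: one-hot input word, recurrent hidden layer with a sigmoid, softmax output over the vocabulary), here is a minimal numpy forward step; the sizes and random initialization are my own illustrative assumptions, not Mikolov's code.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 8                        # vocabulary size, hidden size (assumed)
W_in  = rng.normal(0, 0.1, (H, V))  # input word -> hidden layer
W_rec = rng.normal(0, 0.1, (H, H))  # previous hidden -> hidden (the recurrence)
W_out = rng.normal(0, 0.1, (V, H))  # hidden -> output scores

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, h_prev):
    """One time step: returns P(next word | history) and the new hidden state."""
    x = np.zeros(V); x[word_id] = 1.0            # one-hot input word
    h = sigmoid(W_in @ x + W_rec @ h_prev)       # sigmoid hidden layer
    p = softmax(W_out @ h)                       # softmax over the vocabulary
    return p, h

h = np.zeros(H)
for w in [3, 1, 4]:                              # toy word-id sequence
    p, h = step(w, h)
print(p.sum())                                   # 1.0: a proper distribution
```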
  45. 45. 46 Recurrent Neural Networks backpropagation through time
  46. 46. 47 Recurrent Neural Networks backpropagation through time class based recurrent NN [code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/ ]
  47. 47. • Recursive Neural Network for LM (Socher et al. 2011; Socher 2014) • achieved SOTA on new Stanford Sentiment Treebank dataset (but comparing it to many other models): Recursive Neural Network 48 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
  48. 48. Recursive Neural Tensor Network 49 code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011) 
 Parsing Natural Scenes and Natural Language with Recursive Neural Networks
  49. 49. Recursive Neural Tensor Network 50
  50. 50. • RNN (Socher et al. 2011a) Recursive Neural Network 51 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
  51. 51. • RNN (Socher et al. 2011a) • Matrix-Vector RNN (MV-RNN) (Socher et al., 2012) Recursive Neural Network 52 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
  52. 52. • RNN (Socher et al. 2011a) • Matrix-Vector RNN (MV-RNN) (Socher et al., 2012) • Recursive Neural Tensor Network (RNTN) (Socher et al. 2013) Recursive Neural Network 53
  53. 53. • negation detection: Recursive Neural Network 54 Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank info & code: http://nlp.stanford.edu/sentiment/
  54. 54. NP PP/IN NP DT NN PRP$ NN Parse Tree Recurrent NN for Vector Space 55
  55. 55. NP PP/IN NP DT NN PRP$ NN Parse Tree IN DT NN PRP NN Compositionality 56 Recurrent NN: Compositionality / Recurrent NN for Vector Space
  56. 56. NP IN NP PRP NN Parse Tree DT NN Compositionality 57 Recurrent NN: Compositionality / Recurrent NN for Vector Space
  57. 57. NP IN NP DT NN PRP NN PP NP (S / ROOT) “rules” “meanings” Compositionality 58 Recurrent NN: Compositionality / Recurrent NN for Vector Space
  58. 58. Vector Space + Word Embeddings: Socher 59 Recurrent NN: Compositionality / Recurrent NN for Vector Space
  59. 59. Vector Space + Word Embeddings: Socher 60 Recurrent NN for Vector Space
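To make the “rules”/“meanings” idea on the preceding slides concrete, here is a minimal sketch (my own toy numpy example, not Socher's code) of the basic recursive composition step: two child vectors are combined by one shared weight matrix into a parent vector of the same size, applied bottom-up along the parse tree.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                    # embedding size (assumed)
W = rng.normal(0, 0.1, (d, 2 * d))       # one shared composition matrix (the "rule")
b = np.zeros(d)

def compose(c1, c2):
    """Parent meaning = tanh(W [c1; c2] + b); the same parameters at every node."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# Toy word vectors (the "meanings") for "the", "dirt", "road" (assumed values)
the, dirt, road = (rng.normal(0, 1, d) for _ in range(3))

np_phrase = compose(dirt, road)          # (dirt road)
full      = compose(the, np_phrase)      # (the (dirt road)) -> same dimension d
print(full.shape)                        # (4,)
```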
  60. 60. Word Embeddings: Turian (2010) Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning code & info: http://metaoptimize.com/projects/wordreprs/ 61
  61. 61. Word Embeddings: Turian (2010) Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning code & info: http://metaoptimize.com/projects/wordreprs/ 62
  62. 62. Word Embeddings: Collobert & Weston (2011) Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011) . Natural Language Processing (almost) from Scratch 63
  63. 63. Multi-embeddings: Stanford (2012) Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012)
 Improving Word Representations via Global Context and Multiple Word Prototypes 64
  64. 64. Linguistic Regularities: Mikolov (2013) code & info: https://code.google.com/p/word2vec/ Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations 65
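The famous regularity from this work is that analogies fall out of simple vector arithmetic, e.g. vec(king) - vec(man) + vec(woman) ≈ vec(queen). A minimal sketch with made-up 3-dimensional vectors (illustrative numbers only, not real trained embeddings):

```python
import numpy as np

# Hypothetical toy embeddings; real ones come from word2vec/GloVe training.
emb = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.3, 0.7]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(emb[w], target))
print(best)   # "queen" for these toy numbers
```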
  65. 65. Word Embeddings for MT: Mikolov (2013) Mikolov, T., Le, Q.V., Sutskever, I. (2013). 
 Exploiting Similarities among Languages for Machine Translation 66
  66. 66. Word Embeddings for MT: Kiros (2014) 67
  67. 67. Recursive Deep Models & Sentiment: Socher (2013) Socher, R., Perelygin, A., Wu, J., Chuang, J.,Manning, C., Ng, A., Potts, C. (2013) 
 Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. code & demo: http://nlp.stanford.edu/sentiment/index.html 68
  68. 68. Paragraph Vectors: Le & Mikolov (2014) Le, Q., Mikolov, T. (2014) Distributed Representations of Sentences and Documents 69 • add context (sentence, paragraph, document) to word vectors during training ! Results on Stanford Sentiment 
 Treebank dataset:
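For readers who want to try paragraph vectors directly, a minimal sketch using gensim's Doc2Vec implementation (an assumption on my part: gensim >= 4.0 installed; parameter and attribute names such as vector_size and dv differ in older versions):

```python
# Minimal paragraph-vector (Doc2Vec) sketch; assumes gensim >= 4.0 is installed.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = ["the movie was great fun",
        "a boring and predictable film",
        "wonderful acting and a great plot"]
tagged = [TaggedDocument(words=d.split(), tags=[i]) for i, d in enumerate(docs)]

model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

# Each training document now has a dense vector; unseen text gets one via inference.
vec = model.infer_vector("a great and fun movie".split())
print(model.dv.most_similar([vec], topn=1))   # most similar training paragraph
```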
  69. 69. Paragraph Vectors: Dai et al. (2014) 70
  70. 70. Paragraph Vectors: Dai et al. (2014) 71
  71. 71. Paragraph Vectors: Dai et al. (2014) 72
  72. 72. Global Vectors, GloVe: Stanford (2014) Pennington, J., Socher, R., Manning, C.D. (2014). 
 GloVe: Global Vectors for Word Representation code & demo: http://nlp.stanford.edu/projects/glove/ vs results on the word analogy task “similar accuracy” 73
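The pre-trained vectors from the GloVe page ship as plain text, one word per line followed by its vector components. A minimal loading sketch (the file name below is only an example of the distributed archives, not something the slides specify):

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file: 'word v1 v2 ... vd' per line -> dict of arrays."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# e.g. vectors = load_glove("glove.6B.50d.txt")
# print(vectors["king"][:5])
```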
  73. 73. Dependency-based Embeddings: Levy & Goldberg (2014) Levy, O., Goldberg, Y. (2014). Dependency-Based Word Embeddings code & demo: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/ - Syntactic Dependency Context vs. Bag of Words (BoW) Context, example: “Australian scientist discovers star with telescope” [precision vs. recall plot] “Dependency-based embeddings have more functional similarities” 74
  74. 74. • LSTMs • Attention Wanna Play ? Recent breakthroughs 75
  75. 75. • LSTMs • Attention Wanna Play ? Recent breakthroughs 76
  76. 76. Wanna Play ? LSTM 77
  77. 77. • LSTMs • Attention Wanna Play ? Recent breakthroughs 78
  78. 78. Attention Gregor et al (2015) DRAW: A Recurrent Neural Network For Image Generation (arxiv) (code)
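The slide only names the idea, so here is a minimal sketch of the core mechanism behind (soft) attention, my own toy numpy example rather than the DRAW model itself: scores between a query and a set of encoder states are turned into softmax weights, and the output is the weighted average of the states.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 4, 5                         # state size and number of encoder states (assumed)
states = rng.normal(0, 1, (T, d))   # e.g. hidden states of an encoder RNN
query  = rng.normal(0, 1, d)        # e.g. the current decoder state

scores  = states @ query                        # how relevant is each state?
weights = np.exp(scores) / np.exp(scores).sum() # softmax attention weights
context = weights @ states                      # weighted average of the states

print(weights.round(2), context.shape)          # weights sum to 1; context is (4,)
```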
  79. 79. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 80
  80. 80. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 81
  81. 81. Wanna Play ? QA & Memory 82 • Memory Networks (Weston et al 2015) • Dynamic Memory Network (Kumar et al 2015) • Neural Turing Machine (Graves et al 2014) Facebook Metamind DeepMind Weston et al (2015) Memory Networks (arxiv)
  82. 82. QA & Memory 83 Iyyer et al. (2014) A Neural Network for Factoid Question Answering over Paragraphs (paper)
  83. 83. Wanna Play ? QA & Memory 84 • Memory Networks (Weston et al 2015) • Dynamic Memory Network (Kumar et al 2015) • Neural Turing Machine (Graves et al 2014) Facebook Metamind DeepMind Zaremba & Sutskever (2015) Learning to Execute (arxiv)
  84. 84. Wanna Play ? QA & Memory 85 bAbI Dataset
  85. 85. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 86
  86. 86. Wanna Play ? Text generation 87 Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
  87. 87. Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)
  88. 88. • Question-Answering Systems (&Memory) • Summarization • Text Generation • Dialogue Systems • Image Captioning & other multimodal tasks Wanna Play ? Recent breakthroughs 91
  89. 89. Image-Text Embeddings 92 Socher et al (2013) Zero Shot Learning Through Cross-Modal Transfer (info)
  90. 90. Image-Captioning • Andrej Karpathy, Li Fei-Fei, 2015. 
 Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code) • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015. Show and Tell: A Neural Image Caption Generator (arxiv) • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arxiv) (info) (code)
  91. 91. “A person riding a motorcycle on a dirt road.”??? Image-Captioning
  92. 92. “Two hockey players are fighting over the puck.”??? Image-Captioning
  93. 93. “A stop sign is flying in blue skies.” “A herd of elephants flying in the blue skies.” Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples) Image-Captioning
  94. 94. • TensorFlow: Recently released library by Google. 
 http://tensorflow.org • Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/ • Caffe - Computer Vision oriented Deep Learning framework: caffe.berkeleyvision.org • Torch - Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/ • more info: http://deeplearning.net/software_links/ Wanna Play ? General Deep Learning 97
  95. 95. • RNNLM (Mikolov)
 http://rnnlm.org • NB-SVM
 https://github.com/mesnilgr/nbsvm • Word2Vec (skipgrams/cbow)
 https://code.google.com/p/word2vec/ (original)
 http://radimrehurek.com/gensim/models/word2vec.html (python) • GloVe
 http://nlp.stanford.edu/projects/glove/ (original)
 https://github.com/maciejkula/glove-python (python) • Socher et al / Stanford RNN Sentiment code:
 http://nlp.stanford.edu/sentiment/code.html • Deep Learning without Magic Tutorial:
 http://nlp.stanford.edu/courses/NAACL2013/ Wanna Play ? NLP 98
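To tie these pointers together, a minimal sketch of training skipgram word vectors with the gensim implementation linked above (an assumption: gensim >= 4.0, where the parameter is vector_size; older versions call it size):

```python
# Toy skipgram training run with gensim's Word2Vec (assumes gensim >= 4.0).
from gensim.models import Word2Vec

sentences = [["the", "dog", "barks"],
             ["the", "cat", "meows"],
             ["dogs", "and", "cats", "are", "animals"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skipgram
print(model.wv["dog"][:5])            # dense vector for "dog"
print(model.wv.most_similar("dog"))   # nearest neighbours in this toy space
```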
  96. 96. Questions? roelof@kth.se www.csc.kth.se/~roelof/ 99 Code & Papers: Collaborative Open Computer Science .com @graphific
