What Do Neural Models "Know" About Natural Language?
Ekaterina Vylomova
1943: Artificial Neuron (McCulloch-Pitts)
... or, in other words, $\hat{y} = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$,
and the activation function $f$ might be the sigmoid: $\mathrm{sig}(x) = \frac{1}{1 + e^{-x}}$
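As a minimal sketch (Python/NumPy; the weights and inputs below are made-up illustrative values), such a neuron is just a dot product followed by a non-linearity:

```python
import numpy as np

def sigmoid(x):
    # sig(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def neuron(x, w, b):
    # y_hat = f(sum_i w_i * x_i + b), with f = sigmoid
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, 0.5, -0.2])   # inputs (illustrative)
w = np.array([0.4, -0.6, 0.9])   # weights (illustrative)
print(neuron(x, w, b=0.1))       # a value in (0, 1)
```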
1957: Simple Perceptron
The Perceptron: A Probabilistic Model for Information Storage and
Organization in the Brain
Trained with a trial-and-error method
It can:
– generalize over characters
– discover character-specific features
But:
– it failed to recognize badly written, differently sized, or partially occluded characters
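One common reading of "trial and error" here is the error-driven perceptron update rule; a minimal sketch on a hypothetical linearly separable (AND-like) toy dataset, not the original hardware implementation:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt-style learning: nudge the weights only on mistakes."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                     # yi in {-1, +1}
            pred = 1 if np.dot(w, xi) + b > 0 else -1
            if pred != yi:                           # "trial and error": update on error
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Linearly separable toy data (AND-like labels)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b > 0 else -1 for xi in X])  # [-1, -1, -1, 1]
```

For separable data the perceptron convergence theorem guarantees this loop terminates with a correct separator.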
1960s: Single Layer Perceptron
The Perceptron: A Probabilistic Model for Information Storage and
Organization in the Brain
Minsky & Papert (1969): Perceptrons: An Introduction to Computational Geometry
The XOR problem: a single-layer perceptron cannot compute XOR, because the two classes are not linearly separable
1980s: Multi-Layer Perceptrons with Back-Propagation
Learning Internal Representations by Error Propagation
Solving problems with non-linearly separable classes, such as XOR
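A minimal sketch of a one-hidden-layer network trained with back-propagation on XOR (Python/NumPy; the layer size, learning rate, and seed are illustrative choices, and an unlucky seed can occasionally get stuck in a local minimum):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

# One hidden layer is enough to carve out the non-linearly separable regions.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sig = lambda z: 1 / (1 + np.exp(-z))

for step in range(5000):
    h = sig(X @ W1 + b1)                 # forward pass
    out = sig(h @ W2 + b2)
    # backward pass: propagate the squared error through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = d_out @ W2.T * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2).ravel())              # approximately [0, 1, 1, 0]
```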
1980s: The Past Tense Debate
Rumelhart & McClelland (1986): On learning the past tenses of English verbs
Pinker & Prince (1988): Extremely poor empirical performance!
1990s: RNNs
Finding structure in time
Exploring
– context-dependent learning
– structure in letter sequences
– learning lexical classes from word order
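A minimal sketch of one step of Elman's simple recurrent network (Python/NumPy; the dimensions and the random toy sequence are illustrative):

```python
import numpy as np

def elman_step(x_t, h_prev, Wxh, Whh, b):
    """One step of a simple recurrent network: the hidden state depends on
    the current input and the previous hidden state ("context units")."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
Wxh = rng.normal(scale=0.1, size=(d_h, d_in))
Whh = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)                          # initial context
for x_t in rng.normal(size=(5, d_in)):     # a toy sequence of 5 input vectors
    h = elman_step(x_t, h, Wxh, Whh, b)
print(h.shape)                             # (16,)
```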
1990s: CNNs
Backpropagation Applied to Handwritten Zip Code Recognition
Training data: 9,298 segmented numerals from U.S. mail
Misclassified: training – 0.14%; test – 5.0%
Meanwhile in NLP: Language Modelling (mostly n-grams with Kneser-Ney smoothing)
OK, Marvin, which word comes next: Two cats are ___
Hmmm, let me guess ...
sitting $3.01 \times 10^{-4}$
play $2.87 \times 10^{-4}$
running $2.53 \times 10^{-4}$
nice $2.32 \times 10^{-4}$
lost $1.97 \times 10^{-4}$
playing $1.66 \times 10^{-4}$
sat $1.54 \times 10^{-4}$
plays $1.32 \times 10^{-4}$
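A minimal sketch of count-based next-word prediction, assuming a hypothetical toy corpus and plain maximum-likelihood bigram estimates; systems of that era additionally smoothed the counts (e.g., with Kneser-Ney) to assign mass to unseen n-grams:

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative); real n-gram LMs were estimated on huge corpora.
corpus = ("two cats are sitting . two dogs are running . "
          "two cats are playing .").split()

counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def next_word_probs(context):
    # Unsmoothed MLE; assumes the context was seen in training.
    total = sum(counts[context].values())
    return {w: c / total for w, c in counts[context].most_common()}

print(next_word_probs("are"))  # sitting/running/playing, each 1/3
```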
2013: Word2Vec Skip-Gram
Distributed Representations of Words and Phrases and their
Compositionality
Training Objective
$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \neq 0} \log p(w_{t+j} \mid w_t)$, where
$p(w_o \mid w_i) = \frac{\exp(v_{w_o}^{\top} v_{w_i})}{\sum_{w=1}^{W} \exp(v_w^{\top} v_{w_i})}$
For efficiency, the softmax was replaced with Negative Sampling.
Levy & Goldberg (2014) experimented with a positive pointwise mutual information (PPMI) matrix and showed that Word2Vec Skip-Gram with negative sampling is implicit matrix factorization.
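Training such a model is a one-liner today; a sketch with gensim's Word2Vec (the toy sentences are illustrative stand-ins, as the original models were trained on billions of tokens):

```python
from gensim.models import Word2Vec

sentences = [
    ["two", "cats", "are", "sitting", "on", "the", "mat"],
    ["two", "dogs", "are", "running", "in", "the", "park"],
]

# sg=1 selects Skip-Gram; negative=5 replaces the full softmax with
# five negative samples per positive (word, context) pair.
model = Word2Vec(sentences, vector_size=50, window=2, sg=1,
                 negative=5, min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("cats", topn=3))
```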
2013: Word2Vec CBOW
Efficient Estimation of Word Representations in Vector Space
Training Objective
$\frac{1}{T} \sum_{t=1}^{T} \log p(w_t \mid w_{t-c}, \dots, w_{t+c})$, where
$p(w_o \mid w_i) = \frac{\exp\left(v_{w_o}^{\top} \sum_{-c \le j \le c,\, j \neq 0} v_{w_{i+j}}\right)}{\sum_{w=1}^{W} \exp\left(v_w^{\top} \sum_{-c \le j \le c,\, j \neq 0} v_{w_{i+j}}\right)}$
2013: Word2Vec
Linear Relations and Compositionality
2013: Word2Vec: Word Analogies
Linear Relations and Compositionality:
– Russia + river ≈ Volga_river
– king − man + woman ≈ queen?
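A sketch of the analogy query with gensim's pre-trained Google News vectors (a large download; the exact neighbours and scores may vary):

```python
import gensim.downloader as api

# Loads the pre-trained Word2Vec Google News vectors.
wv = api.load("word2vec-google-news-300")

# king - man + woman ≈ ?  (cosine similarity over normalized vectors)
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' is expected near the top of the list
```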
Word Analogies on other embeddings
Word Embeddings, Analogies, and Machine Learning: Beyond King − Man + Woman = Queen
Pre-trained Word2Vec (Google News): Bias and Stereotypes
Man is to Computer Programmer as Woman is to Homemaker?
Word2Vec trained on Reddit data: Bias and Stereotypes
Black is to Criminal as Caucasian is to Police
Data Bias and Stereotypes
Gendered Language
Positive adjectives describing women are often related to their bodies, while positive adjectives
describing men are often related to their behavior.
Word2Vec and similar models
What do the models learn?
Morphology
– Capable of learning inflections, but not so much derivations (which are less regular and less compositional)
Lexical Semantics
– Challenging, especially meronyms, antonyms, synonyms
Major Difficulties
– Polysemy (all word senses collapsed into a single vector)
– Negation
Broader context – back to RNNs!
Neural Machine Translation: Seq2Seq Models (Sutskever et al., 2014)
The resulting LSTM has 384M parameters, of which 64M are pure recurrent connections
BUT: longer contexts mean lower quality (vanishing gradients)
Long Short-Term Memory (LSTM) will solve it!
Neural Machine Translation: Seq2Seq Models (Sutskever et al., 2014)
PCA projection of the LSTM hidden states for the corresponding sequences
We can also use both directions (to encode the source language)
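A minimal encoder-decoder skeleton in the spirit of Sutskever et al. (PyTorch; the vocabulary sizes, dimensions, and random batch are illustrative, and training code is omitted):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """The encoder LSTM compresses the source into its final (h, c) state,
    which then initializes the decoder LSTM."""
    def __init__(self, src_vocab, tgt_vocab, d=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.LSTM(d, d, batch_first=True)
        self.decoder = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))   # only the final state survives
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                     # logits over the target vocab

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))   # a batch of 2 source sentences
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```

Note that the whole source sentence must pass through that single final state, which is exactly the bottleneck attention removes on the next slide.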
Neural Machine Translation: Seq2Seq Models w/ Attention (Bahdanau et al., 2014)
A whole sentence shouldn't be compressed into a single vector! Use Attention!
Neural Machine Translation: Seq2Seq Models w/ Attention (Bahdanau et al., 2014)
It learns alignment, and the alignment can be visualized!
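A sketch of Bahdanau-style additive attention (Python/NumPy; all weights and states are random illustrative values). The alignment weights computed here are what such visualizations show:

```python
import numpy as np

def additive_attention(dec_h, enc_hs, Wa, Ua, va):
    """Additive scoring: e_j = v^T tanh(W s + U h_j); the softmax over the
    scores gives alignment weights, and the context is their weighted sum."""
    scores = np.array([va @ np.tanh(Wa @ dec_h + Ua @ h_j) for h_j in enc_hs])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax -> alignment
    context = (weights[:, None] * enc_hs).sum(0)    # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
d = 8
enc_hs = rng.normal(size=(6, d))        # six encoder hidden states
dec_h = rng.normal(size=d)              # current decoder state
Wa, Ua = rng.normal(size=(d, d)), rng.normal(size=(d, d))
va = rng.normal(size=d)
context, weights = additive_attention(dec_h, enc_hs, Wa, Ua, va)
print(weights.round(3), weights.sum())  # the weights form one row of the heatmap
```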
Neural Machine Translation: Seq2Seq Models w/ Attention (Bahdanau et al., 2014)
What do the models learn?
Belinkov et al., 2018a, 2018b
– Higher layers are better at learning semantics, while lower layers tend to be better for part-of-speech tagging
– Lower layers of the network are better at capturing morphology
Linzen et al., 2018, 2020
English subject-verb agreement:
– LSTMs were able to learn the verb-number agreement task in most cases, although their error rate increased on particularly difficult sentences
– The LM objective is not by itself sufficient for learning structure-sensitive dependencies; the authors suggest a joint training objective
Neural Machine Translation: Seq2Seq Models w/ Attention (Bahdanau et al., 2014)
What do the models learn?
Vylomova et al., 2019
– Contextual inflection in 10 languages: "Three little kittens were _sit_ on the mat." Predict: sitting
– Agreement: adjective-noun is OK; subject-verb is more challenging
– Morphological complexity matters (Uralic languages are more challenging than Germanic ones)
– Inherent vs. contextual categories: inherent ones (e.g., tense or noun number, with no agreement or extra signal to rely on) cannot be predicted
Back to the Past Tense Debate: Seq2Seq Models w/ Attention
Kirov & Cotterell, 2018: The model obviates most of Pinker and Prince's criticisms
SIGMORPHON 2016 Shared Task
Task 1: run + V;PRES;3SG → runs
On Arabic, Finnish, Georgian, German, Hungarian, Maltese, Navajo, Russian, Spanish
Lake et al., 2018: Compositionality of RNNs
Simplified version of the CommAI Navigation tasks
Zero-shot generalization succeeds when the differences between training and test commands are small
Trained on "run", "jump", and "run twice", the model fails on "jump twice"
Contextualized Embeddings: Addressing the polysemy problem!
Context matters! ELMo: Let's make context-specific embeddings!
Features
– Two independent(!) LSTMs (forward and backward language models)
– Pre-trained embeddings
– A weighted, task-specific sum of embeddings (two hidden states + the word vector)
Self-Attention (Cheng et al., 2016)
Relate different parts of a single sequence to one another to compute its representation
Shows similarity to other parts!
Helpful for coreference resolution!
Contextualized Embeddings
Transformer: Attention Is All You Need
Features
– No recurrence, but a wide context window (somewhat similar to CNNs)
– Positional embeddings (to access token positions)
– Self-attention with several heads and separate key, query, and value matrices (plus masking)
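A single-head, scaled dot-product sketch of the key/query/value computation (Python/NumPy; the dimensions are illustrative, and the multi-head projections and masking are omitted):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017):
    every position attends to every position in the same sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                         # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 16)
```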
Contextualized Embeddings
BERT: Deep Bidirectional Transformers
Features
– Trained on masked-token prediction + next-sentence prediction (a binary task)
– BPE tokenization
– Window of 512 tokens; the CLS token is used for classification
Contextualized Embeddings
BERTs
BERT-Base: L=12, H=768, A=12, total parameters = 110M
BERT-Large: L=24, H=1024, A=16, total parameters = 340M
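The masked-token objective can be probed directly; a sketch using the Hugging Face pipeline API with the standard bert-base-uncased checkpoint (the example sentence is illustrative):

```python
from transformers import pipeline

# Masked-token prediction, one of BERT's two pre-training objectives.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Two cats are [MASK] on the mat.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```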
Contextualized Embeddings: Word Sense Disambiguation
"A mouse consists of an object held in one's hand, with one or more buttons."
"Mouse" – an electronic device
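A sketch of sense disambiguation via contextual vectors, assuming the standard bert-base-uncased checkpoint (the sentences and the pooling choice are illustrative): the same surface form gets different embeddings in different contexts, so similarity in embedding space can separate the senses:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Contextual embedding of `word`: mean of its WordPiece vectors from
    BERT's last hidden layer (assumes the word occurs in the sentence)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    ids = tok(word, add_special_tokens=False)["input_ids"]
    pos = [i for i, t in enumerate(enc["input_ids"][0].tolist()) if t in ids]
    return hidden[pos].mean(0)

device_use = embed_word("A mouse is held in one hand and has buttons.", "mouse")
animal_use = embed_word("A mouse was hiding from the cat in the barn.", "mouse")
cos = torch.nn.functional.cosine_similarity
print(cos(device_use, animal_use, dim=0))  # lower than for two same-sense uses
```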
Contextualized Embeddings: Coreference Resolution
Coreference resolution task
The secretary called the physician and told _him_ about a new patient.
him → physician
Contextualized Embeddings: Coreference Resolution
Gender Bias in Coreference Resolution
WinoBias: Winograd-schema-style sentences with entities corresponding to people referred to by their occupation
Contextualized Embeddings: Bias, bias, bias
Zhao et al., 2019
– A SOTA coreference system that depends on ELMo inherits its bias and demonstrates significant bias on WinoBias
– The training data for ELMo contains significantly more male than female entities
– The trained ELMo embeddings systematically encode gender information
– ELMo unequally encodes gender information about male and female entities
Contextualized Embeddings: What does BERT know (Rogers et al., 2020)?
Syntax
– Representations are hierarchical rather than linear and encode POS and syntactic roles (Liu et al., 2019a,b)
– Does not "understand" negation and is insensitive to malformed input (Ettinger, 2019)
Semantics
– Has some knowledge of semantic roles (Ettinger, 2019)
– Struggles with representations of numbers (floating point; Wallace et al., 2019b)
World Knowledge
– Cannot reason based on its world knowledge ("A dog entered the room" does not yield "the room is larger than the dog")
Extra resources
NLP Progress
Hugging Face – Models
"Embeddings in Natural Language Processing" book
"Dive into Deep Learning" interactive book
Thank you! Questions?