Quantitative models of neural language representation
Journal club
2020.02.10
Takuya Koumura
Research paradigm
Assumption:
A good quantitative model of neural representation should
⚫be able to linearly encode
⚫be linearly decodable from
⚫have high similarity with
the brain activities.
But see: Raman, R. & Hosoya, H. CNN explains
tuning properties of anterior, but not middle, face-
processing areas in macaque IT. bioRxiv 1–33 (2019).
(Diagram: a corpus is used to train a language representation; the stimulus is passed through the trained representation, and the result is compared with the recorded brain activity via linear encoding, linear decoding, or representational similarity.)
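To make the paradigm concrete, here is a minimal sketch (not any single paper's pipeline; the data are random stand-ins) of fitting a linear encoding model from language features to voxel responses, and a linear decoding model in the opposite direction:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))    # language representation: 200 stimuli x 50 features
Y = rng.standard_normal((200, 1000))  # brain activity: 200 stimuli x 1000 voxels

# Linear encoding: predict each voxel's response from the language features
encoder = Ridge(alpha=1.0).fit(X[:150], Y[:150])
Y_pred = encoder.predict(X[150:])     # evaluated against held-out brain activity

# Linear decoding: predict the language features from the voxel responses
decoder = Ridge(alpha=1.0).fit(Y[:150], X[:150])
X_pred = decoder.predict(Y[150:])     # evaluated against held-out features
```

The third option, representational similarity, fits no map in either direction; a sketch appears under Abnar et al. 2019 below.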
Papers
⚫ Mitchell, T. M. et al. Predicting human brain activity associated with the meanings of nouns. Science. 320,
1191–1195 (2008).
⚫ Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the
semantic maps that tile human cerebral cortex. Nature. 532, 453–458 (2016).
⚫ Pereira, F. et al. Toward a universal decoder of linguistic meaning from brain activation. Nat. Commun. 9,
(2018).
⚫ Wehbe, L., Vaswani, A., Knight, K. & Mitchell, T. Aligning context-based statistical models of language
with brain activity during reading. EMNLP. 233–243 (2014).
⚫ Qian, P., Qiu, X. & Huang, X. Bridging LSTM Architecture and the Neural Dynamics during Reading. IJCAI.
1953–1959 (2016).
⚫ Jain, S. & Huth, A. Incorporating Context into Language Encoding Models for fMRI. NeurIPS. 6628–6637
(2018).
⚫ Abnar, S., Beinborn, L., Choenni, R. & Zuidema, W. Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains. arXiv (2019).
⚫ Gauthier, J. & Ivanova, A. Does the brain represent words? An evaluation of brain decoding studies of language understanding. arXiv (2018).
⚫ Sun, J., Wang, S., Zhang, J. & Zong, C. Towards Sentence-Level Brain Decoding with Distributed Representations. AAAI. 33, 7047–7054 (2019).
⚫ Gauthier, J. & Levy, R. Linking artificial and human neural representations of language. arXiv (2019).
Summary (Model and Training data describe the language representation; Recording and Stimulus describe the brain activities)

| Paper | Paradigm | Model | Training data | Recording | Stimulus | Evaluation |
|---|---|---|---|---|---|---|
| Mitchell 2008 | Encoding | Co-occurrence frequency (25 words) | LDC2006T13 | fMRI | 60 noun-picture pairs (visual) | Pairwise classification |
| Huth 2016 | Encoding | Co-occurrence frequency (985 words) | Moth stories, books, Wikipedia pages, reddit.com | fMRI | The Moth Radio Hour (audio) | Pearson correlation for each voxel |
| Pereira 2018 | Decoding | GloVe | Pre-trained | fMRI | Sentence, word & picture, word cloud (visual) | Pairwise classification, rank accuracy |
| Wehbe 2014 | Encoding | RNN, CNN | Harry Potter fan fiction database | MEG | Chapter 9 of Harry Potter and the Philosopher’s Stone (visual, word-by-word) | Pairwise classification |
| Qian 2016 | Encoding | LSTM | Harry Potter and the Philosopher’s Stone | fMRI | Chapter 9 of Harry Potter and the Philosopher’s Stone (visual, word-by-word) | Cosine distance |
| Jain 2018 | Encoding | LSTM | reddit.com | Huth 2016 | Huth 2016 | Sum of r2 across voxels |
| Abnar 2019 | RSA, encoding | GloVe, ELMo, GoogleLM, UniSentEnc, BERT | Pre-trained | fMRI (from a different Wehbe et al. 2014 study) | Chapter 9 of Harry Potter and the Sorcerer’s Stone | Representational similarity, r2 |
| Gauthier 2018 | Decoding | GloVe, LSTM, BiLSTM, CNN+attention | Pre-trained | Pereira 2018 | Pereira 2018 | Average rank |
| Sun 2019 | Decoding (similarity-based, linear, MLP) | Average, max, FastSent, SIF, Skip-thought, Quick-Thought, InferSent, GenSen | Pre-trained | Pereira 2018 | Pereira 2018 | Pairwise matching, ranking |
| Gauthier 2019 | Decoding | BERT fine-tuned for various tasks | Pre-trained | Pereira 2018 | Pereira 2018 | MSE, average ranking |
My conclusions & impressions
⚫Context often improves modeling of neural representation
⚫Model complexity and intelligence do not always improve modeling of neural representation
⚫No study attempted raw stimulus reconstruction (as far as I read)
Mitchell et al. 2008: Predicting human brain activity associated with the meanings of nouns
Methods
⚫ Encoding model
⚫ Language representation
⚪ Distributional word
representation (25 dimensional)
⚫ The frequency with which a
word co-occurs with the 25
chosen verbs: “see, hear, listen,
taste, smell, eat, touch, rub, lift,
manipulate, run, push, fill,
move, ride, say, fear, open,
approach, near, enter, drive,
wear, break, clean”
⚫ Counted in a very large text corpus (LDC2006T13); see the sketch after this slide
⚫Brain activities
⚪fMRI
⚪60 noun-picture pairs
⚪Visually presented
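A minimal sketch of this kind of feature construction, assuming a ±5-token co-occurrence window and normalization by the total count (the paper's exact counting scheme may differ):

```python
from collections import Counter

VERBS = ["see", "hear", "eat", "push", "run"]  # 5 of the 25 verbs, for brevity

def cooccurrence_vector(noun, corpus_sentences, window=5):
    """Count how often `noun` occurs within `window` tokens of each verb."""
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != noun:
                continue
            context = tokens[max(0, i - window):i + window + 1]
            for verb in VERBS:
                if verb in context:
                    counts[verb] += 1
    total = sum(counts.values()) or 1
    return [counts[v] / total for v in VERBS]  # normalized frequencies

print(cooccurrence_vector("celery", ["You can eat celery raw", "I see celery"]))
```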
Methods
⚫Evaluation
⚪Pairwise classification between two words (chance accuracy = 0.5)
⚪A randomly chosen pair is held out; the predictions are matched to the two measured activity patterns by asking which assignment is closer
⚪(The similarity metric is not described?)
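A minimal sketch of the pairwise test, assuming cosine similarity as the matching metric (an assumption, since the slide notes the metric is not described):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def pairwise_correct(pred1, pred2, true1, true2):
    """Correct if the right pairing of predictions to measured patterns
    is more similar than the swapped pairing."""
    matched = cosine(pred1, true1) + cosine(pred2, true2)
    swapped = cosine(pred1, true2) + cosine(pred2, true1)
    return matched > swapped
```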
Results
⚫Accuracy for 9 participants = 0.83, 0.76, 0.78, 0.72, 0.78, 0.85, 0.73, 0.68, 0.82
⚫The manually selected 25 verbs performed best; the figure compares them with randomly selected sets of 25 words
Results
⚫“push” activates the right postcentral gyrus (premotor planning)
⚫“run” activates the posterior portion of the right superior
temporal sulcus (perception of biological motion)
Results
⚫Locations of most accurately predicted voxels.
Huth et al. 2016: Natural speech reveals the semantic maps that tile human cerebral cortex
Methods
⚫ Encoding model
⚪ L2-regularized linear regression
⚫ Language representation
⚪ Distributional word representation
(985 dimensional)
⚫ Normalized co-occurrence between each word and a set of 985 common English words
⚫ Wikipedia’s List of 1000 Basic Words
(contrary to the title, this list contained only 985
unique words at the time it was accessed)
⚫ Dataset
⚪ 13 Moth stories (including the
stimuli for fMRI)
⚪ 604 popular books
⚪ 2,405,569 Wikipedia pages
⚪ 36,333,459 user comments from
reddit.com
⚫ Brain activities
⚪ fMRI
⚪ > 2 hours of The Moth Radio Hour
⚫ Evaluation
⚪ Pearson correlation for each
voxel
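A minimal sketch of the per-voxel evaluation, assuming time-by-voxel arrays: the Pearson correlation between predicted and measured responses, computed for all voxels at once:

```python
import numpy as np

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson r per voxel (columns) between measured and predicted responses."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt(
        (yt**2).sum(axis=0) * (yp**2).sum(axis=0))
```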
Results
They also analyzed the regression weights (skipped today)
Pereira et al. 2018: Toward a universal decoder of linguistic meaning from brain activation
Methods
⚫Decoding model
⚪L2-regularized linear regression
⚫Language representation
⚪GloVe (300 dimensional word representation)
⚫ Pennington, J., Socher, R. & Manning, C.D. GloVe: Global Vectors for Word Representation. Proc. Conf.
Emp. Meth. Nat. Lang. Proc. 1532–1543 (2014)
⚪For sentences: the average of the vectors of all words in the sentence (sketch below)
⚫Brain activities
⚪fMRI
⚪Stimuli
⚫Experiment 1: 180 manually selected words
⚫Experiment 2: 24 manually selected concepts
⚫Experiment 3: 24 manually selected concepts
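A minimal sketch of the sentence representation by averaging, assuming `glove` is a dict from words to 300-dimensional vectors (the lookup table itself is not shown):

```python
import numpy as np

def sentence_vector(sentence, glove, dim=300):
    """Average the GloVe vectors of the in-vocabulary words of a sentence."""
    vecs = [glove[w] for w in sentence.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```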
Methods: Experiment 1
⚫180 manually selected
words
⚪Selected from 30,000 words by clustering based on the word representations (sketch after this slide)
⚫ 30,000 words: Brysbaert, M.,
Warriner, A. B. & Kuperman, V.
Concreteness ratings for 40
thousand generally known English
word lemmas. Behav. Res.
Methods 46, 904–911 (2014).
⚪128 nouns, 22 verbs,
23 adjectives, 6
adverbs, 1 function
word
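A hedged sketch of selecting representative words by clustering, assuming k-means over the word vectors and picking the word nearest each cluster center (the paper's exact procedure may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_words(words, vectors, k=180):
    """Pick the word closest to each of k cluster centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    return [words[int(np.argmin(np.linalg.norm(vectors - c, axis=1)))]
            for c in km.cluster_centers_]
```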
Methods: Experiment 1
⚫180 manually selected words
⚫Presented
⚪In a sentence
⚪With a picture
⚪With other related words
Methods: Experiment 2 & 3
⚫Experiment 2
⚪24 manually selected concepts
⚪A sentence that provided basic information about the concept
⚫Experiment 3
⚪24 manually selected concepts
⚪A passage related to the concept
Methods
⚫Evaluation
⚪Pairwise classification
⚪Rank accuracy (sketch below)
⚫1 if the true sentence is ranked at the top
⚫0 if it is ranked at the bottom
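A minimal sketch of rank accuracy under this definition, assuming cosine similarity ranks the decoded vector against all candidates:

```python
import numpy as np

def rank_accuracy(decoded, candidates, true_index):
    """1 if the true candidate ranks first, 0 if last, linear in between."""
    sims = candidates @ decoded / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(decoded))
    rank = np.argsort(-sims).tolist().index(true_index)  # 0 = top
    return 1.0 - rank / (len(candidates) - 1)
```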
Results
Results
⚫Distribution of the informative voxels
⚪Determined to maximize the prediction accuracy within the
training set
⚪Color = the fraction of subjects
Results
⚫Distribution of the informative voxels
⚪Language: the frontotemporal language-selective network
⚫ Fedorenko, E., Behr, M. K. & Kanwisher, N. Functional specificity for high-level linguistic processing in the human brain. PNAS 108, 16248–16433 (2011)
⚪Default: the default mode network
⚫ Buckner, R. L., Andrews-Hanna, J. R. & Schacter, D. L. The brain’s default network: anatomy, function, and relevance to disease. Ann. N. Y. Acad. Sci. 1124, 1–38 (2008).
⚫ Binder, J. R., Desai, R. H., Graves, W. W. & Conant, L. L. Where is the semantic system? A critical review
and meta-analysis of 120 functional neuroimaging studies. Cereb. Cortex 19, 2767–2796 (2009).
⚪Task: the task-positive network
⚫ Power, J. D. et al. Functional network organization of the human brain. Neuron 72, 665–678 (2011).
⚫ Buckner 2008 (above)
⚫ Binder 2009 (above)
⚪Visual: the visual network
⚫ Power 2011 (above)
⚫ Buckner 2008 (above)
Wehbe et al. 2014: Aligning context-based statistical models of language with brain activity during reading
Methods
⚫Encoding model (linear, ridge)
⚫Language representation
⚪RNN (sketch below)
⚫w: one-hot word vector
⚪CNN (they call it a neural probabilistic LM)
⚫u: one-hot word vector
⚪Dataset: Harry Potter fan fiction database
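A minimal sketch of a recurrent language representation, assuming a simple Elman-style update over one-hot word vectors (the paper's actual architectures are richer):

```python
import numpy as np

def rnn_states(onehots, W_in, W_rec):
    """Hidden state per word: h_t = tanh(W_in @ w_t + W_rec @ h_{t-1})."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for w in onehots:  # each w is a one-hot vocabulary vector
        h = np.tanh(W_in @ w + W_rec @ h)
        states.append(h.copy())
    return np.stack(states)  # these per-word states serve as language features
```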
Methods
⚫Brain activities
⚪MEG
⚪Stimulus
⚫Chapter 9 of Harry Potter and the Philosopher’s Stone
⚫Words were presented one by one at the center of the screen for
0.5 s
⚫Evaluation
⚪Pairwise classification
Results
⚫Accuracy using all time windows and sensors
They also show accuracy for each MEG sensor (skipped today)
Results: timing
(Figure: MEG signal and prediction time window; 0 = onset of word i)
Qian et al. 2016: Bridging LSTM Architecture and the Neural Dynamics during Reading
Methods
⚫ Encoding model (linear)
⚫ Language representation
⚪ LSTM
⚪ Dataset: Harry Potter and the Philosopher’s Stone (excluding chapter 9)
⚫ Brain activities
⚪ fMRI
⚪ Chapter 9 of Harry Potter and the Philosopher’s Stone
⚪ Words presented one by one for
0.5 s
⚫ Evaluation
⚪ Cosine distance, transformed to [0,
1]
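The slide does not give the transform; one common choice, shown here purely as an assumption, maps cosine similarity from [-1, 1] to [0, 1]:

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity rescaled to [0, 1] (one possible transform;
    the paper's exact formula is not shown on the slide)."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return (cos + 1) / 2
```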
Results
Results
⚫They also tested other language representations for comparison
⚪tf-idf: term frequency-inverse document frequency, classical features for document retrieval
⚪AveEmbedding: average embeddings of a word sequence
They also conducted an ablation study (skipped today)
Results
⚫Color: correlation between true
and predicted brain activities
⚫For a single subject
Jain & Huth 2018: Incorporating Context into Language Encoding Models for fMRI
Methods
⚫Encoding model (linear, ridge)
⚫Language representation
⚪LSTM
⚪Dataset: reddit.com
⚫Brain activities
⚪fMRI
⚪> 2 hours of The Moth Radio Hour
⚪ Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech
reveals the semantic maps that tile human cerebral cortex. Nature. 532, 453–458 (2016).
⚫Evaluation
⚪Sum of r2 across voxels
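A minimal sketch of the evaluation, assuming r2 is the squared per-voxel Pearson correlation over time-by-voxel arrays:

```python
import numpy as np

def total_r2(Y_true, Y_pred):
    """Sum over voxels of the squared Pearson correlation between
    measured and predicted responses (voxels in columns)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / np.sqrt((yt**2).sum(axis=0) * (yp**2).sum(axis=0))
    return (r**2).sum()
```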
Results
⚫Context formed by random words
⚫Context of a different sentence
(Three additional results slides show figures only.)

Abnar et al. 2019: Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains
Methods
⚫Representational similarity analysis & encoding model
⚫Language representation (GloVe, ELMo, GoogleLM, UniSentEnc, BERT; see the summary table)
⚫Brain activities
⚪fMRI
⚪Stimulus: chapter 9 of Harry Potter and the Sorcerer’s Stone
⚪ Wehbe L, Murphy B, Talukdar P, Fyshe A, Ramdas A, Mitchell T (2014) Simultaneously Uncovering the
Patterns of Brain Regions Involved in Different Story Reading Subprocesses. PLoS One 9:e112575
⚫Evaluation
⚪Representational similarity analysis
⚪R2
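A minimal sketch of representational similarity analysis: build a representational dissimilarity matrix (RDM) over stimuli for each system, then correlate the two RDMs (correlation distance and Spearman's rho are my assumptions; the paper's choices may differ):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_reps, brain_reps):
    """Compare the pairwise-dissimilarity structure of two systems.
    Inputs are (n_stimuli, n_features); feature counts may differ."""
    rdm_model = pdist(model_reps, metric="correlation")  # condensed RDM
    rdm_brain = pdist(brain_reps, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return rho
```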
Results: Representational similarity
Results
⚫Representational similarity in one subject
Results: R2
Gauthier & Ivanova 2018: Does the brain represent words? An evaluation of brain decoding studies of language understanding
Methods
⚫Decoding model (L2-regularized)
⚫Language representation (GloVe, LSTM, BiLSTM, CNN+attention; see the summary table)
⚫Brain activities
⚪fMRI
⚪ Pereira, F. et al. Toward a universal decoder of linguistic meaning from brain activation. Nat. Commun.
9, (2018).
⚫Evaluation
⚪Mean average rank
Results
(Figure: decoding performance; axis runs from Better to Worse)
Sun et al. 2019: Towards Sentence-Level Brain Decoding with Distributed Representations
Methods
⚫Decoding model
⚪Similarity-based
⚪L2-regularized linear
⚪Multilayer perceptron
Methods
⚫Language representation
⚪Unstructured model
⚫Simple pooling of word representations (see the sketch after this slide)
⚪Average
⚪Max-pooling
⚪Concatenation of averaging & max-pooling
⚫Parameterized pooling
⚪FastSent (Hill, F.; Cho, K.; and Korhonen, A. 2016. Learning distributed representations of
sentences from unlabelled data. NAACL-HLT)
⚪SIF (Arora, S.; Liang, Y.; and Ma, T. 2016. A simple but tough-to-beat baseline for sentence embeddings. ICLR)
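A minimal sketch of the unstructured (simple pooling) variants listed above, given a (n_words, dim) array of word vectors:

```python
import numpy as np

def pool_sentence(word_vectors):
    """Unstructured sentence representations from a (n_words, dim) array."""
    avg = word_vectors.mean(axis=0)      # average pooling
    mx = word_vectors.max(axis=0)        # max pooling
    cat = np.concatenate([avg, mx])      # concatenation of both
    return avg, mx, cat
```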
Methods
⚫Language representation
⚪Structured model
⚫Unsupervised methods
⚪Skip-thought (Kiros, R.; Zhu, Y.; Salakhutdinov, R. R.; Zemel, R.; Urtasun, R.; Torralba, A.;
and Fidler, S. 2015. Skip-thought vectors. NeurIPS, 3294–3302)
⚪Quick-Thought (Logeswaran, L., and Lee, H. 2018. An efficient framework for learning
sentence representations. arXiv:1803.02893)
⚫Supervised methods
⚪InferSent (Conneau, A.; Kiela, D.; Schwenk, H.; Barrault, L.; and Bordes, A. 2017. Supervised
learning of universal sentence representations from natural language inference data. EMNLP)
⚪GenSen (Subramanian, S.; Trischler, A.; Bengio, Y.; and Pal, C. J. 2018. Learning general purpose
distributed sentence representations via large scale multi-task learning. ICLR)
Methods
⚫Brain activities
⚪fMRI
⚪ Pereira, F. et al. Toward a universal decoder of linguistic meaning from brain activation. Nat.
Commun. 9, (2018).
⚫Evaluation
⚪Pairwise matching
⚪Ranking
Results
(Figure panels: pairwise matching of similarity-based decoding; ranking)
Results
Gauthier & Levy 2019: Linking artificial and human neural representations of language
Methods
⚫Decoding model (L2-regularized linear)
⚫Language representation
⚪Sentence representation in BERT
⚪Fine-tuned on several tasks
⚪And custom tasks (modifications of masked
language model pre-training)
⚫Scrambled within sentences
⚫Scrambled within paragraphs
⚫Predicting only part-of-speech
BERT: Bidirectional Encoder Representations from Transformers
⚫ Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding.
⚫Architecture
⚪Stacked self-attention
⚫Pre-training
⚪Masked language model
⚪Next sentence prediction
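A hedged illustration of extracting a sentence vector from pretrained BERT, using the Hugging Face transformers API and mean pooling over tokens (both are my choices; the paper's extraction layer and pooling may differ):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The dog chased the ball.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, n_tokens, 768)
sentence_vec = hidden.mean(dim=1).squeeze(0)    # mean pooling over tokens
```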
BERT: Bidirectional Encoder Representations from Transformers
⚫Fine-tuning for various tasks
⚫Performance
⚪State-of-the-art on 11 tasks
Methods
⚫Brain activities
⚪fMRI (Pereira, F. et al. Toward a universal decoder of
linguistic meaning from brain activation. Nat. Commun. 9,
(2018).)
⚫Evaluation
⚪Mean squared error
⚪Average ranking
Results
⚫Pretrained BERT without fine-tuning is marked in the figure (axis runs from Better to Worse)