Word Embedding Interpretation through Sense Weight Training
Xinyi Jiang
Emory University
xinyi.jiang2@emory.edu
July 13, 2019
Xinyi Jiang (Emory University) July 13, 2019 1 / 31
Introduction
Word embeddings, representations of words based on their
distributional semantics and context, have been an important tool
in various NLP tasks.
However, the dense nature of embeddings also makes it hard to
interpret the meaning associated with each dimension, leaving word
embeddings black-box-like.
What is in word embeddings?
Motivations
great improvements in recent word embeddings
increase in dimension size of word embeddings
popularity of context-based embedding models
Objectives
evaluate word embeddings’ ability to infer sense information
determine the value of embedding dimensions
Background
Distributional Word Embeddings
Word2Vec, GloVe, fastText
Token as input
Dictionary-based: one embedding per token type
Fast
Contextualized Word Embeddings
ELMo, Flair, BERT
Sentence as input
A distinct embedding for every token occurrence
Usually better performance in extrinsic evaluations
Background
Word Sense
WordNet
Intrinsic Evaluations
Word Similarity Test
Word Analogy Test: given pairs (a, b) and (a*, b*), b* can be
predicted by b + a* − a. Example: queen ≈ king − man + woman
Subconscious Intrinsic Evaluations
Extrinsic Evaluations
Downstream NLP tasks:
POS tagging
Named Entity Recognition
Sentiment Analysis
Coreference Resolution
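The word analogy test above can be sketched with toy vectors; the embeddings below are invented for illustration only (a real test uses pre-trained embeddings over a full vocabulary):

```python
import numpy as np

# Toy vectors for illustration only; a real test uses pre-trained embeddings.
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.6, 0.2, 0.1]),
    "woman": np.array([0.7, 0.3, 0.9]),
    "queen": np.array([0.9, 1.0, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# queen ≈ king − man + woman: predict the held-out word by nearest cosine.
target = emb["king"] - emb["man"] + emb["woman"]
predicted = max((w for w in emb if w not in {"king", "man", "woman"}),
                key=lambda w: cosine(emb[w], target))
print(predicted)  # → queen (the toy vectors are built so this holds)
```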
Related Work on the Interpretability of Word Embeddings
Non-Negative Sparse Embedding (NNSE): a variant of sparse matrix
factorization [Murphy et al., 2012].
Modify the learning procedure of current embedding
models [Luo et al., 2015, Jang and Myaeng, 2017, Koç et al., 2018].
Utilization of pre-trained word embeddings [Arora et al., 2016,
Zobnin, 2017, Park et al., 2017, Subramanian et al., 2017]
Approach
Figure: ELMo [Peters et al., 2017]
Approach
Figure: Flair [Akbik et al., 2018]
Figure: BERT [Devlin et al., 2018]
Approach
Approach
Contextualized Word Embeddings on a Conversational Dataset:
Dataset: the Friends TV show transcripts
Method:
1 train an ELMo model (256) and a Flair model (2048) on the dataset
2 tune pre-trained ELMo (256) and Flair (2048) models on the dataset
3 rank pair-wise similarities between word embeddings
4 cluster with Affinity Propagation
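The pair-wise similarity ranking step can be sketched as follows; the vocabulary and random vectors are stand-ins for actual trained/tuned ELMo or Flair outputs over the transcripts:

```python
import numpy as np
from itertools import combinations

# Stand-in token embeddings; the experiment uses trained/tuned ELMo and
# Flair outputs over the Friends transcripts instead.
rng = np.random.default_rng(0)
vocab = ["realise", "realize", "he", "she", "could", "can"]
vecs = {w: rng.normal(size=8) for w in vocab}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank every word pair by cosine similarity, most similar first.
# (The next step would feed a similarity matrix to Affinity Propagation.)
ranked = sorted(combinations(vocab, 2),
                key=lambda p: cosine(vecs[p[0]], vecs[p[1]]),
                reverse=True)
print(ranked[:3])
```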
Approach Test Result
Trained Tuned
Flair
preparation presentation
2 1
explanation explosion
connection collection
intimidating interesting
description decision
confidence coincidence
1 2
realise realize
he she
could can
are were
They You
somebody someone
ELMo
Barber Farber
Tribbiani Trrrribbiani
Yeller Geller
Lou You
The She
Bing Ring
friend girlfriend
a an
why Why
Where where
Of of
The the
What what
If if
Correlation: 0.000748, 0.001497
Approach Test Result
Flair
Trained
forgot forgive follow forgotten forget
whoever however whenever whatever
really finally
talking telling thinking calling taking
Tuned
gets announces brings starts takes saves wins goes pulls
choosing tryin hoping deciding trying working starting
daughter uncle sister sisters
left spent lost kept slept ate hit stole stayed died ended
Approach
Models Used & Datasets
Pre-trained Word Embeddings: ELMo with dimensions 256, 512, and
1024 with various output layers; Flair with dimensions 2048 and 4096;
BERT with dimensions 768 and 1024.
Datasets:
1 SemCor: an annotated corpus of 360,000 words, over 200,000 of which
are sense-annotated
2 SCWS: 2,003 word pairs along with their sentential contexts and human
ratings of paired word similarity
Approach
Hypothesis
Some dimensions in word embeddings do not contribute to the meaning of
a word sense.
Preliminary Tests
cosine similarity and Normalized Euclidean Distance tests on sense
groups from the SemCor dataset
correlation coefficient test on the SCWS dataset
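These group-level measures can be sketched as below; the sense group is random toy data, and the normalization convention for the Euclidean distance (unit-length vectors) is our assumption, since several definitions exist:

```python
import numpy as np
from itertools import combinations

# Toy sense group: embeddings of five tokens annotated with the same sense.
rng = np.random.default_rng(1)
group = rng.normal(size=(5, 16))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_euclidean(a, b):
    # One common convention: Euclidean distance between unit-length vectors.
    return float(np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b)))

pairs = list(combinations(range(len(group)), 2))
avg_cos = float(np.mean([cosine(group[i], group[j]) for i, j in pairs]))
avg_dist = float(np.mean([normalized_euclidean(group[i], group[j])
                          for i, j in pairs]))
print(avg_cos, avg_dist)
```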
Approach
PCA approach for extracting common vectors within sense groups.
Algorithm 1 PCA for extracting common elements in sense groups
Input: sense group S, embeddings v(w) for w ∈ S, and threshold
parameter d
Compute the mean vector: µ ← (1/|S|) Σ_{w∈S} v(w), ṽ(w) ← v(w) − µ
Step 1: compute the top d+1 PCA components u_i
Step 2: get the common embedding v′(w) ← µ + Σ_{i=1}^{d} (u_iᵀ ṽ(w)) u_i
Output: v′(w)
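A minimal numpy sketch of this PCA extraction, using SVD of the centered group matrix to obtain the principal directions (the data is random toy input):

```python
import numpy as np

def common_sense_vectors(V, d):
    """Project each centered embedding of a sense group onto the group's
    top-d principal directions and add the group mean back (Algorithm 1)."""
    mu = V.mean(axis=0)                    # µ: mean vector of the group
    V_tilde = V - mu                       # ṽ(w): centered embeddings
    _, _, Vt = np.linalg.svd(V_tilde, full_matrices=False)
    U = Vt[:d]                             # top-d principal directions u_i
    return mu + V_tilde @ U.T @ U          # v'(w) = µ + Σ (u_iᵀ ṽ(w)) u_i

V = np.random.default_rng(2).normal(size=(6, 10))  # 6 tokens, 10-dim vectors
Vc = common_sense_vectors(V, d=2)
print(Vc.shape)  # → (6, 10)
```

After subtracting the mean, every row of the result lies in the d-dimensional span of the top components, which is what makes the vectors "common" to the group.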
Approach Test Result
Test results for PCA method
Model Dim Out ρ ρ_mean
BERT 768 -2 0.56530 0.591308
BERT 1024 -2 0.40665 0.475711
ELMo 256 0 0.55436 0.608066
ELMo 256 1 0.64007 0.633749
ELMo 256 2 0.68201 0.652321
ELMo 256 av 0.64918 0.635427
ELMo 512 0 0.59265 0.644203
ELMo 512 1 0.73306 0.597286
ELMo 512 2 0.74948 0.671543
ELMo 512 av 0.72212 0.64714
ELMo 1024 0 0.60177 0.672977
ELMo 1024 1 0.62365 0.686678
ELMo 1024 2 0.69659 0.721001
ELMo 1024 av 0.61454 0.699179
Approach
Algorithm
Training algorithm for embedding dimension weight interpretation and
sense vector extraction
Evaluation Methods
Mask out dimensions with low importance and perform intrinsic
evaluations
Approach
Algorithm 2 algorithm for learning dimension weights
for each sense group SG do
    initialize weights w, learning rate γ_0
    initialize S_pre ← Σ_{v_i,v_j∈SG} Cosine(v_i, v_j)
    for each epoch i do
        if i < n then
            randomly generate N dimension indices N_1, …, N_N
        else
            generate N dimension indices N_1, …, N_N according to an
            explore-exploitation policy
        end if
        mask the embeddings on dimensions N_1, …, N_N
        S_cur ← Σ_{v_i,v_j∈SG} Cosine(v_i, v_j) on the masked embeddings
        grad ← (S_pre − S_cur) · (w − 1) − λ · sign(w)
        w ← w + grad · γ_i
        update γ_i
    end for
end for
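One literal reading of this training loop can be sketched as follows. Several details are our assumptions for illustration: only random exploration (no explore-exploitation policy), a fixed learning rate, the gradient applied only to the masked dimensions, and toy random data in place of a SemCor sense group:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

def group_similarity(V, mask):
    """Sum of pair-wise cosine similarities with masked dimensions zeroed."""
    Vm = V * mask
    return sum(
        float(Vm[i] @ Vm[j] /
              (np.linalg.norm(Vm[i]) * np.linalg.norm(Vm[j]) + 1e-12))
        for i, j in combinations(range(len(V)), 2))

def learn_weights(V, epochs=50, n_mask=3, lam=0.01, gamma=0.05):
    dim = V.shape[1]
    w = np.ones(dim)                              # dimension weights
    s_pre = group_similarity(V, np.ones(dim))     # unmasked baseline
    for _ in range(epochs):
        masked = rng.choice(dim, size=n_mask, replace=False)  # exploration
        mask = np.ones(dim)
        mask[masked] = 0.0
        s_cur = group_similarity(V, mask)
        grad = np.zeros(dim)
        # Update the masked dimensions: (s_pre - s_cur) measures how much
        # group similarity drops without them; λ·sign(w) is an L1 penalty.
        grad[masked] = (s_pre - s_cur) * (w[masked] - 1) - lam * np.sign(w[masked])
        w += gamma * grad                         # fixed γ in this sketch
    return w

w = learn_weights(rng.normal(size=(5, 12)))       # 5 tokens, 12 dimensions
print(w.shape)
```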
Test and evaluations
Model dim out Ave Dis C Dis Ave Cos C Cos co-eff
BERT 768 -2 34.3856 19.30123 0.497062 0.683877 0.647989
BERT 1024 -2 39.74155 21.47615 0.58236 0.864297 0.594294
ELMo 256 0 10.20746 17.06137 0.596261 0.058296 0.647328
ELMo 256 1 17.01257 15.38437 0.511956 0.283216 0.63676
ELMo 256 2 18.35877 12.79827 0.451013 0.429331 0.560226
ELMo 256 av 17.06634 14.2695 0.493568 0.341095 0.62173
ELMo 512 0 13.58388 23.34382 0.595626 0.0423 0.670698
ELMo 512 1 24.3524 20.71935 0.541471 0.439504 0.646561
ELMo 512 2 26.29227 17.36125 0.440618 0.463392 0.586672
ELMo 512 av 24.49851 19.42738 0.493186 0.403634 0.650462
ELMo 1024 0 18.40521 31.8018 0.592719 0.035835 0.679799
ELMo 1024 1 34.63281 27.59308 0.478008 0.356505 0.680234
ELMo 1024 2 37.62374 23.28052 0.39284 0.421985 0.601395
ELMo 1024 av 34.65655 26.45345 0.454141 0.347544 0.67123
Flair 2048 -1 49.67542 38.48288 0.559736 0.508835 0.555379
Flair 4096 -1 71.76762 52.52496 0.433499 0.332308 0.498259
Table: Test results with model, embedding dimension size, output layer, average of group
average pair-wise Normalized Euclidean Distance, average of pair-wise center vector Euclidean
Distance, Average of group average pair-wise cosine similarity, average of pair-wise center vector
cosine similarity and Spearman’s coefficient on SCWS dataset
Test and evaluations
Figure: ELMo models
Test and evaluations
Figure: ELMo models
Test and evaluations
Figure: BERT models
Test and evaluations
For the ELMo models, the later the output layer, the better the
embedding performs in all three tests: a lower within-group
Euclidean distance, a larger between-group Euclidean distance, a
higher within-group similarity, a lower between-group similarity,
and a higher correlation coefficient.
The BERT models showed a higher within-group distance than
between-group distance and a lower within-group similarity than
between-group cosine similarity. However, both BERT models still
outperformed the Flair models on the coefficient test, and the
768-dimensional BERT model ranked 6th on that test.
Test and evaluations
Figure: ”obtain.v.01”
Figure: ”have.v.01”
Test and evaluations
Figure: ”obtain.v.01”
Figure: ”have.v.01”
Test and evaluations
pair-wise cosine similarity within each sense group
the Spearman’s Rank-Order Correlation Coefficient ρ between the
cosine similarity of sense vectors extracted and path similarity scores
provided by WordNet
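The rank-correlation step can be sketched with a self-contained Spearman implementation; the two arrays below are stand-ins for the real quantities (sense-vector cosine similarities and WordNet path-similarity scores for the same sense pairs):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank-order correlation (toy data, no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Stand-ins: cosine similarities between extracted sense vectors vs.
# WordNet path-similarity scores for the same sense pairs.
cos_sims = np.array([0.9, 0.4, 0.7, 0.2, 0.6])
path_sims = np.array([0.5, 0.1, 0.33, 0.08, 0.25])
rho = spearman_rho(cos_sims, path_sims)
print(rho)  # → 1.0 here, since the two toy lists have identical rank order
```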
Test and evaluations
Model Dim Out Dim_masked ρ_unmasked ρ_masked cos_unmasked cos_masked
BERT 768 -2 125 0.26814 0.26286 0.4971 0.5187
BERT 1024 -2 146 0.27423 0.26575 0.5811 0.5983
ELMo 256 0 95 0.12016 0.16017 0.5959 0.6134
ELMo 256 1 105 0.30903 0.37377 0.5119 0.5787
ELMo 256 2 218 0.2852 0.3042 0.4507 0.6914
ELMo 256 av 199 0.26553 0.36945 0.4932 0.6729
ELMo 512 0 136 0.17058 0.17336 0.5957 0.6051
ELMo 512 1 181 0.27967 0.25318 0.5414 0.5908
ELMo 512 2 281 0.29577 0.36943 0.4404 0.5346
ELMo 512 av 207 0.2949 0.30047 0.4930 0.5470
ELMo 1024 0 179 0.18504 0.17263 0.5930 0.5945
ELMo 1024 1 198 0.30897 0.30175 0.4783 0.4971
ELMo 1024 2 608 0.28406 0.30675 0.3927 0.4915
ELMo 1024 av 406 0.28331 0.27204 0.4542 0.5086
Flair 2048 -1 670 0.24891 0.28516 0.5560 0.6084
Table: Cosine similarity and correlation test results for unmasked and masked
word embeddings
Test and evaluations
Model Dim Out Sense N_masked Sense N_masked Sense N_masked
BERT 768 -2 ask.v.01 75 three.s.01 208 man.n.01 58
BERT 1024 -2 ask.v.01 40 three.s.01 211 man.n.01 116
ELMo 256 0 ask.v.01 78 three.s.01 182 man.n.01 242
ELMo 256 1 ask.v.01 78 three.s.01 99 man.n.01 212
ELMo 256 2 ask.v.01 241 three.s.01 243 man.n.01 212
ELMo 256 av ask.v.01 155 three.s.01 120 man.n.01 227
ELMo 512 0 ask.v.01 103 three.s.01 28 man.n.01 295
ELMo 512 1 ask.v.01 144 three.s.01 105 man.n.01 156
ELMo 512 2 ask.v.01 334 three.s.01 300 man.n.01 311
ELMo 512 av ask.v.01 205 three.s.01 122 man.n.01 317
ELMo 1024 0 ask.v.01 174 three.s.01 44 man.n.01 331
ELMo 1024 1 ask.v.01 60 three.s.01 106 man.n.01 220
ELMo 1024 2 ask.v.01 568 three.s.01 827 man.n.01 708
ELMo 1024 av ask.v.01 181 three.s.01 351 man.n.01 426
Flair 2048 -1 ask.v.01 193 three.s.01 862 man.n.01 1883
Table: common sense groups with the number of embedding dimensions
masked out per model
Conclusions
Some dimensions do not contribute to the representation of sense
groups; this conclusion holds for all word embeddings tested in
this work.
Our algorithm is effective in learning the weights for grouped word
embeddings.
Limitations and Future Directions
For evaluating the sense vectors, the path similarity provided by
WordNet [Fellbaum and Miller, 1998] may not best fit human
judgements.
The current tests were limited by the corpora we used.
Future directions: generalize our algorithm to analyze grouped
embeddings
Applications of the sense vectors extracted by our algorithm.
word embeddings as combinations of concept vectors
Akbik, A., Blythe, D., and Vollgraf, R. (2018).
Contextual string embeddings for sequence labeling.
In Proceedings of the 27th International Conference on Computational
Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA.
Association for Computational Linguistics.
Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. (2016).
Linear algebraic structure of word senses, with applications to
polysemy.
CoRR, abs/1601.03764.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018).
BERT: pre-training of deep bidirectional transformers for language
understanding.
CoRR, abs/1810.04805.
Fellbaum, C. and Miller, G. (1998).
Building Semantic Concordances.
MITP.
Jang, K.-R. and Myaeng, S.-H. (2017).
Elucidating conceptual properties from word embeddings.
In Proceedings of the 1st Workshop on Sense, Concept and Entity
Representations and their Applications, pages 91–95. Association for
Computational Linguistics.
Koç, A., Utlu, I., Senel, L. K., and Özaktas, H. M. (2018).
Imparting interpretability to word embeddings.
CoRR, abs/1807.07279.
Luo, H., Liu, Z., Luan, H.-B., and Sun, M. (2015).
Online learning of interpretable word embeddings.
In EMNLP.
Murphy, B., Talukdar, P., and Mitchell, T. (2012).
Learning effective and interpretable semantic models using
non-negative sparse embedding.
In Proceedings of COLING 2012, pages 1933–1950. The COLING
2012 Organizing Committee.
Park, S., Bak, J., and Oh, A. (2017).
Rotated word vector representations and their interpretability.
In Proceedings of the 2017 Conference on Empirical Methods in
Natural Language Processing, pages 401–411, Copenhagen, Denmark.
Association for Computational Linguistics.
Peters, M. E., Ammar, W., Bhagavatula, C., and Power, R. (2017).
Semi-supervised sequence tagging with bidirectional language models.
CoRR, abs/1705.00108.
Subramanian, A., Pruthi, D., Jhamtani, H., Berg-Kirkpatrick, T., and
Hovy, E. H. (2017).
SPINE: sparse interpretable neural embeddings.
CoRR, abs/1711.08792.
Zobnin, A. (2017).
Rotations and interpretability of word embeddings: the case of the
Russian language.
CoRR, abs/1707.04662.

When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Incremental Sense Weight Training for Contextualized Word Embedding Interpretation

  • 1. Word Embedding Interpretation through Sense Weights Training
Xinyi Jiang
Emory University
xinyi.jiang2@emory.edu
July 13, 2019
Xinyi Jiang (Emory University) July 13, 2019 1 / 31
  • 2. Introduction
Word embeddings, representations of words produced from their distributional semantics and context, have become an important tool in various NLP tasks.
However, the density of these embeddings makes it hard to interpret the meaning associated with each dimension, leaving word embeddings black-box-like.
What is in word embeddings?
  • 3. Motivations
great improvements in recent word embeddings
increase in the dimension size of word embeddings
popularity of context-based embedding models
  • 4. Objectives
evaluate word embeddings' ability to infer sense information
determine the value of embedding dimensions
  • 5. Background
Distributional Word Embeddings
Word2Vec, GloVe, fastText
Token as input
Dictionary based: one embedding for every token type
Fast
Contextualized Word Embeddings
ELMo, Flair, BERT
Sentence as input
One embedding for every token occurrence
Usually better performance in extrinsic evaluations
  • 6. Background
Word Sense
WordNet
Intrinsic Evaluations
Word Similarity Test
Word Analogy Test: given an analogy x : y :: x∗ : y∗, the answer y∗ can be predicted by y − x + x∗. Example: queen ≈ woman − man + king
Subconscious Intrinsic Evaluations
Extrinsic Evaluations
Downstream NLP tasks:
POS tagging
Named Entity Recognition
Sentiment Analysis
Coreference Resolution
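The analogy test above can be sketched with toy vectors; the words and 2-d values below are hypothetical stand-ins for real embeddings:

```python
import numpy as np

def analogy(a, b, c, vocab):
    """Return the vocab word whose vector is closest (by cosine similarity)
    to v(b) - v(a) + v(c), excluding the three input words."""
    target = vocab[b] - vocab[a] + vocab[c]
    best, best_sim = None, -2.0
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy 2-d embeddings chosen so the analogy holds (hypothetical values).
vocab = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.2]),
    "queen": np.array([3.0, 1.2]),
}
print(analogy("man", "king", "woman", vocab))  # -> queen
```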
  • 7. Related Works in interpretability of word embeddings
Non-Negative Sparse Embedding (NNSE): a variant of sparse matrix factorization [Murphy et al., 2012].
Modifying the learning procedure of current embedding models [Luo et al., 2015, Jang and Myaeng, 2017, Koç et al., 2018].
Utilizing pre-trained word embeddings [Arora et al., 2016, Zobnin, 2017, Park et al., 2017, Subramanian et al., 2017].
  • 8. Approach
Figure: ELMo [Peters et al., 2017]
  • 9. Approach
Figure: Flair [Akbik et al., 2018]
Figure: BERT [Devlin et al., 2018]
  • 10. Approach
  • 11. Approach
Contextualized Word Embeddings on a Conversational Dataset
Dataset: the Friends TV show transcripts
Method:
1. an ELMo model (256) and a Flair model (2048) are trained on the dataset
2. a pre-trained ELMo model (256) and a Flair model (2048) are tuned on the dataset
3. pair-wise similarities are ranked
4. Affinity Propagation clustering
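Steps 3 and 4 can be sketched as follows; the word list and random embeddings are stand-ins for vectors from a trained or tuned model, and scikit-learn's AffinityPropagation stands in for whatever clustering implementation was actually used:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
words = ["realise", "realize", "he", "she", "could", "can"]
# Stand-in embeddings; real ones would come from a trained/tuned model.
emb = rng.normal(size=(len(words), 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sim = emb @ emb.T                      # pairwise cosine similarity matrix
pairs = [(words[i], words[j], sim[i, j])
         for i in range(len(words)) for j in range(i + 1, len(words))]
pairs.sort(key=lambda p: -p[2])        # step 3: rank word pairs by similarity

# Step 4: cluster directly on the precomputed similarity matrix.
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(sim)
print(pairs[0])            # most similar pair under this toy model
print(ap.labels_)          # cluster assignment per word
```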
  • 12. Approach
Test Result: Trained vs. Tuned
Flair (trained): preparation presentation, explanation explosion, connection collection, intimidating interesting, description decision, confidence coincidence
Flair (tuned): realise realize, he she, could can, are were, They You, somebody someone
ELMo (trained): Barber Farber, Tribbiani Trrrribbiani, Yeller Geller, Lou You, The She, Bing Ring, friend girlfriend, a an
ELMo (tuned): why Why, Where where, Of of, The the, What what, If if
Correlation: 0.000748, 0.001497
  • 13. Approach
Test Result: Flair
Trained: forgot forgive follow forgotten forget; whoever however whenever whatever; really finally talking telling thinking calling taking
Tuned: gets announces brings starts takes saves wins goes pulls; choosing tryin hoping deciding trying working starting; daughter uncle sister sisters; left spent lost kept slept ate hit stole stayed died ended
  • 14. Approach
Models Used & Datasets
Pre-trained Word Embeddings:
ELMo with dimensions 256, 512 and 1024, with various output layers
Flair with dimensions 2048 and 4096
BERT with dimensions 768 and 1024
Datasets:
1. SemCor: an annotated corpus of 360,000 words, over 200,000 of them sense-annotated
2. SCWS: 2,003 word pairs along with their sentential contexts and human ratings of paired word similarity
  • 15. Approach
Hypothesis: some dimensions in word embeddings do not contribute to the meaning of a word sense.
Preliminary Tests:
cosine similarity and Normalized Euclidean Distance tests on sense groups from the SemCor dataset
correlation coefficient on the SCWS dataset
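A minimal sketch of the two group-level metrics, assuming "Normalized Euclidean" means Euclidean distance between length-normalized vectors (one possible reading); the random vectors stand in for the embeddings of one sense group:

```python
import numpy as np

def pairwise_cosine(group):
    """Average pairwise cosine similarity within a sense group."""
    g = group / np.linalg.norm(group, axis=1, keepdims=True)
    sims = [g[i] @ g[j] for i in range(len(g)) for j in range(i + 1, len(g))]
    return float(np.mean(sims))

def pairwise_norm_euclidean(group):
    """Average pairwise Euclidean distance between length-normalized vectors."""
    g = group / np.linalg.norm(group, axis=1, keepdims=True)
    dists = [np.linalg.norm(g[i] - g[j])
             for i in range(len(g)) for j in range(i + 1, len(g))]
    return float(np.mean(dists))

rng = np.random.default_rng(1)
group = rng.normal(size=(5, 32))   # stand-in for 5 tokens of one sense group
print(pairwise_cosine(group), pairwise_norm_euclidean(group))
```

A tight sense group should show high average cosine similarity and low average normalized distance.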
  • 16. Approach
PCA approach for extracting common vectors within sense groups.
Algorithm 1: PCA for extracting common elements in sense groups
Input: a sense group S, embeddings v(w) for w ∈ S, and a threshold parameter d
Compute the mean vector: µ ← (1/|S|) Σ_{w∈S} v(w); center: ṽ(w) ← v(w) − µ
Step 1: compute the top d + 1 PCA components u_1, u_2, ...
Step 2: get the common embedding v′(w) ← µ + Σ_{i=1}^{d} (u_i^T ṽ(w)) u_i
Output: v′(w)
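Algorithm 1 can be sketched with NumPy; the SVD-based PCA and the toy sense group below are illustrative stand-ins:

```python
import numpy as np

def common_embedding(group, d):
    """Sketch of Algorithm 1: project a sense group's embeddings onto their
    top-d principal directions and add the mean back, keeping the part of
    each vector shared across the group."""
    mu = group.mean(axis=0)
    centered = group - mu                 # v~(w) = v(w) - mu
    # Principal directions via SVD of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u = vt[:d]                            # top-d components u_1..u_d
    # v'(w) = mu + sum_i (u_i^T v~(w)) u_i
    return mu + centered @ u.T @ u

rng = np.random.default_rng(2)
group = rng.normal(size=(10, 8))          # stand-in sense group
out = common_embedding(group, d=2)
print(out.shape)                          # same shape as the input group
```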
  • 17. Approach
Test results for the PCA method
Model  Dim   Out  ρ        ρ_mean
BERT   768   -2   0.56530  0.591308
BERT   1024  -2   0.40665  0.475711
ELMo   256   0    0.55436  0.608066
ELMo   256   1    0.64007  0.633749
ELMo   256   2    0.68201  0.652321
ELMo   256   av   0.64918  0.635427
ELMo   512   0    0.59265  0.644203
ELMo   512   1    0.73306  0.597286
ELMo   512   2    0.74948  0.671543
ELMo   512   av   0.72212  0.64714
ELMo   1024  0    0.60177  0.672977
ELMo   1024  1    0.62365  0.686678
ELMo   1024  2    0.69659  0.721001
ELMo   1024  av   0.61454  0.699179
  • 18. Approach
Algorithm: training algorithm for embedding-dimension weight interpretation and sense-vector extraction
Evaluation Methods: mask out dimensions with low importance and perform intrinsic evaluations
  • 19. Approach
Algorithm 2: algorithm for learning dimension weights
for each sense group SG do
    initialize weights w and learning rate γ_0
    initialize S_pre ← Σ_{v_i, v_j ∈ SG} Cosine(v_i, v_j)
    for each epoch i do
        if i < n then
            randomly generate N numbers N_1, ..., N_N
        else
            generate N numbers N_1, ..., N_N according to an explore-exploit policy
        end if
        mask arrays on dimensions N_1, ..., N_N
        S_cur ← Σ_{v_i, v_j ∈ SG} Cosine(v_i, v_j) on the masked embeddings
        grad ← (S_pre − S_cur) · (w − 1) − λ · sign(w)
        w ← w + grad · γ_i
        update γ_i
    end for
end for
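A simplified sketch of Algorithm 2, assuming the weight update is applied only to the masked dimensions and skipping the explore-exploit phase (random masking throughout); all hyperparameter values are hypothetical:

```python
import numpy as np

def group_cosine_sum(vecs, mask):
    """Sum of pairwise cosine similarities with some dimensions zeroed out."""
    v = vecs * mask
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)
    sim = v @ v.T
    return float(np.triu(sim, k=1).sum())

def learn_dimension_weights(vecs, epochs=50, n_mask=4, lr=0.1, lam=1e-3, seed=0):
    """Simplified Algorithm 2: repeatedly mask random dimensions, compare the
    group's pairwise similarity before and after, and nudge the masked
    dimensions' weights by grad = (S_pre - S_cur) * (w - 1) - lambda * sign(w)."""
    rng = np.random.default_rng(seed)
    dim = vecs.shape[1]
    w = np.ones(dim)
    s_pre = group_cosine_sum(vecs, np.ones(dim))
    for _ in range(epochs):
        masked = rng.choice(dim, size=n_mask, replace=False)
        mask = np.ones(dim)
        mask[masked] = 0.0
        s_cur = group_cosine_sum(vecs, mask)
        grad = (s_pre - s_cur) * (w[masked] - 1.0) - lam * np.sign(w[masked])
        w[masked] += lr * grad
    return w

rng = np.random.default_rng(3)
vecs = rng.normal(size=(6, 16))   # stand-in embeddings for one sense group
weights = learn_dimension_weights(vecs)
print(weights.round(3))
```

Dimensions whose masking barely changes the group's similarity end up with weights far from 1, flagging them as candidates to mask out in the later evaluations.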
  • 20. Test and evaluations
Model  Dim   Out  Ave Dis   C Dis     Ave Cos   C Cos     co-eff
BERT   768   -2   34.3856   19.30123  0.497062  0.683877  0.647989
BERT   1024  -2   39.74155  21.47615  0.58236   0.864297  0.594294
ELMo   256   0    10.20746  17.06137  0.596261  0.058296  0.647328
ELMo   256   1    17.01257  15.38437  0.511956  0.283216  0.63676
ELMo   256   2    18.35877  12.79827  0.451013  0.429331  0.560226
ELMo   256   av   17.06634  14.2695   0.493568  0.341095  0.62173
ELMo   512   0    13.58388  23.34382  0.595626  0.0423    0.670698
ELMo   512   1    24.3524   20.71935  0.541471  0.439504  0.646561
ELMo   512   2    26.29227  17.36125  0.440618  0.463392  0.586672
ELMo   512   av   24.49851  19.42738  0.493186  0.403634  0.650462
ELMo   1024  0    18.40521  31.8018   0.592719  0.035835  0.679799
ELMo   1024  1    34.63281  27.59308  0.478008  0.356505  0.680234
ELMo   1024  2    37.62374  23.28052  0.39284   0.421985  0.601395
ELMo   1024  av   34.65655  26.45345  0.454141  0.347544  0.67123
Flair  2048  -1   49.67542  38.48288  0.559736  0.508835  0.555379
Flair  4096  -1   71.76762  52.52496  0.433499  0.332308  0.498259
Table: test results with model, embedding dimension size, output layer, average of group-average pair-wise Normalized Euclidean Distance, average of pair-wise center-vector Euclidean Distance, average of group-average pair-wise cosine similarity, average of pair-wise center-vector cosine similarity, and Spearman's coefficient on the SCWS dataset
  • 21. Test and evaluations
Figure: ELMo models
  • 22. Test and evaluations
Figure: ELMo models
  • 23. Test and evaluations
Figure: BERT models
  • 24. Test and evaluations
For the ELMo models, the closer the layer is to the output, the better the embedding performs in all three tests: a lower within-group Euclidean distance, a larger between-group Euclidean distance, a higher within-group similarity, a lower between-group similarity, and a higher coefficient.
The BERT models showed a higher within-group distance than between-group distance and a lower within-group cosine similarity than between-group cosine similarity. However, both BERT models still outperformed the Flair models in the coefficient test, and the 768-dimensional BERT model ranks 6th in the coefficient test.
  • 25. Test and evaluations
Figure: "obtain.v.01"
Figure: "have.v.01"
  • 26. Test and evaluations
Figure: "obtain.v.01"
Figure: "have.v.01"
  • 27. Test and evaluations
pair-wise cosine similarity within each sense group
Spearman's Rank-Order Correlation Coefficient ρ between the cosine similarity of the extracted sense vectors and the path-similarity scores provided by WordNet
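The second evaluation can be sketched as below; the sense vectors and path-similarity scores are random stand-ins (real path similarities would come from WordNet), and Spearman's ρ is implemented directly for a tie-free toy case:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank-order correlation, assuming no tied values."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    n = len(x)
    d = rx - ry
    return 1.0 - 6.0 * float((d ** 2).sum()) / (n * (n ** 2 - 1))

rng = np.random.default_rng(4)
sense_vecs = rng.normal(size=(5, 16))          # stand-in sense vectors
v = sense_vecs / np.linalg.norm(sense_vecs, axis=1, keepdims=True)
cos = np.array([v[i] @ v[j] for i in range(5) for j in range(i + 1, 5)])
# Stand-in WordNet path similarities for the same sense pairs (hypothetical).
path = rng.uniform(0.0, 1.0, size=cos.shape)
print(round(spearman_rho(cos, path), 4))
```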
  • 28. Test and evaluations
Model  Dim   Out  Dim_masked  ρ_unmasked  ρ_masked  cos_unmasked  cos_masked
BERT   768   -2   125         0.26814     0.26286   0.4971        0.5187
BERT   1024  -2   146         0.27423     0.26575   0.5811        0.5983
ELMo   256   0    95          0.12016     0.16017   0.5959        0.6134
ELMo   256   1    105         0.30903     0.37377   0.5119        0.5787
ELMo   256   2    218         0.2852      0.3042    0.4507        0.6914
ELMo   256   av   199         0.26553     0.36945   0.4932        0.6729
ELMo   512   0    136         0.17058     0.17336   0.5957        0.6051
ELMo   512   1    181         0.27967     0.25318   0.5414        0.5908
ELMo   512   2    281         0.29577     0.36943   0.4404        0.5346
ELMo   512   av   207         0.2949      0.30047   0.4930        0.5470
ELMo   1024  0    179         0.18504     0.17263   0.5930        0.5945
ELMo   1024  1    198         0.30897     0.30175   0.4783        0.4971
ELMo   1024  2    608         0.28406     0.30675   0.3927        0.4915
ELMo   1024  av   406         0.28331     0.27204   0.4542        0.5086
Flair  2048  -1   670         0.24891     0.28516   0.5560        0.6084
Table: cosine similarity and correlation test results for unmasked and masked word embeddings
  • 29. Test and evaluations
Model  Dim   Out  ask.v.01  three.s.01  man.n.01
BERT   768   -2   75        208         58
BERT   1024  -2   40        211         116
ELMo   256   0    78        182         242
ELMo   256   1    78        99          212
ELMo   256   2    241       243         212
ELMo   256   av   155       120         227
ELMo   512   0    103       28          295
ELMo   512   1    144       105         156
ELMo   512   2    334       300         311
ELMo   512   av   205       122         317
ELMo   1024  0    174       44          331
ELMo   1024  1    60        106         220
ELMo   1024  2    568       827         708
ELMo   1024  av   181       351         426
Flair  2048  -1   193       862         1883
Table: common sense groups with the number of embedding dimensions masked out (N_masked) per sense group
  • 30. Conclusions
For every word embedding tested in this work, some dimensions do not contribute to the representation of sense groups.
Our algorithm is effective in learning the weights for grouped word embeddings.
  • 31. Limitations and Future Directions
For evaluating the sense vectors, the path similarity provided by WordNet [Fellbaum and Miller, 1998] may not be the best fit to human judgements.
The current tests were limited by the corpora we used.
Future directions:
generalize our algorithm to analyze grouped embeddings
applications of the sense vectors extracted by our algorithm
word embeddings as combinations of concept vectors
  • 32. References
Akbik, A., Blythe, D., and Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. (2016). Linear algebraic structure of word senses, with applications to polysemy. CoRR, abs/1601.03764.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
Fellbaum, C. and Miller, G. (1998). Building Semantic Concordances. MITP.
Jang, K.-R. and Myaeng, S.-H. (2017). Elucidating conceptual properties from word embeddings. In Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 91–95. Association for Computational Linguistics.
Koç, A., Utlu, İ., Şenel, L. K., and Özaktaş, H. M. (2018). Imparting interpretability to word embeddings. CoRR, abs/1807.07279.
Luo, H., Liu, Z., Luan, H.-B., and Sun, M. (2015). Online learning of interpretable word embeddings. In EMNLP.
Murphy, B., Talukdar, P., and Mitchell, T. (2012). Learning effective and interpretable semantic models using non-negative sparse embedding. In Proceedings of COLING 2012, pages 1933–1950. The COLING 2012 Organizing Committee.
Park, S., Bak, J., and Oh, A. (2017). Rotated word vector representations and their interpretability. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 401–411, Copenhagen, Denmark. Association for Computational Linguistics.
Peters, M. E., Ammar, W., Bhagavatula, C., and Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. CoRR, abs/1705.00108.
Subramanian, A., Pruthi, D., Jhamtani, H., Berg-Kirkpatrick, T., and Hovy, E. H. (2017). SPINE: sparse interpretable neural embeddings. CoRR, abs/1711.08792.
Zobnin, A. (2017). Rotations and interpretability of word embeddings: the case of the Russian language. CoRR, abs/1707.04662.