In this work, we propose a new training procedure for learning the importance of word embedding dimensions in representing word meanings. Our algorithm advances the interpretation of word embeddings, which is critical in NLP given how poorly understood embeddings remain despite their success across NLP tasks. Previous work has pursued interpretability either by building it into embedding training models or by post-processing pre-trained embeddings; our algorithm instead offers a new perspective on dimension interpretation in which each dimension is evaluated and can be visualized. Our algorithm also adopts a novel assumption, not attempted in previous studies: not all dimensions are necessary for representing a word sense (word meaning), and negligible dimensions can be discarded.
Incremental Sense Weight Training for Contextualized Word Embedding Interpretation
1. Word Embedding Interpretation through Sense Weights Training
Xinyi Jiang
Emory University
xinyi.jiang2@emory.edu
July 13, 2019
Xinyi Jiang (Emory University) July 13, 2019 1 / 31
2. Introduction
Word embeddings, representations of words produced based on their distributional semantics and context, have been an important tool in various NLP tasks.
However, the density of embeddings also makes it hard to interpret the meaning associated with each dimension, leaving word embeddings black-box-like.
What is in word embeddings?
3. Motivations
great improvements in recent word embeddings
increase in dimension size of word embeddings
popularity of context-based embedding models
4. Objectives
evaluate word embeddings’ ability to infer sense information
determine the value of embedding dimensions
5. Background
Distributional Word Embeddings
Word2Vec, GloVe, fastText
Token as input
Dictionary-based: one embedding per token type
Fast
Contextualized Word Embeddings
ELMo, Flair, BERT
Sentence as input
Embedding for every token
Usually better performance in extrinsic evaluations
6. Background
Word Sense
WordNet
Intrinsic Evaluations
Word Similarity Test
Word Analogy Test: for an analogy "x is to y as x* is to y*", y* can be predicted as y − x + x*. Example:
queen ≈ king − man + woman
Subconscious Intrinsic Evaluations
Extrinsic Evaluations
Downstream NLP tasks:
POS tagging
Named Entity Recognition
Sentiment Analysis
Coreference Resolution
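The word analogy test can be illustrated with a toy sketch. The 3-dimensional vectors below are hand-made for illustration and are not real embeddings:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d vectors chosen so that king - man + woman lands exactly on queen.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.8, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def analogy(a, b, c, vocab):
    """Return the word (other than a, b, c) closest to v_a - v_b + v_c."""
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = [t for t in vocab if t not in (a, b, c)]
    return max(candidates, key=lambda t: cosine(target, vocab[t]))

print(analogy("king", "man", "woman", emb))   # -> queen
```

Real analogy tests run this offset arithmetic over a full vocabulary; the mechanism is the same.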
7. Related Work on the Interpretability of Word Embeddings
Non-Negative Sparse Embedding (NNSE): a variant of sparse matrix
factorization [Murphy et al., 2012].
Modify the learning procedure of current embedding models [Luo et al., 2015, Jang and Myaeng, 2017, Koç et al., 2018].
Utilization of pre-trained word embeddings [Arora et al., 2016,
Zobnin, 2017, Park et al., 2017, Subramanian et al., 2017]
11. Approach
Contextualized Word Embedding in Conversational Dataset:
Dataset: the Friends TV show transcript
Method:
1. ELMo (256) and Flair (2048) models are trained on the dataset
2. Pre-trained ELMo (256) and Flair (2048) models are tuned on the dataset
3. Pair-wise similarities are ranked
4. Affinity Propagation clustering
12. Approach: Test Results
Nearest-neighbour word pairs, trained vs. tuned models:
Flair (trained): preparation/presentation, explanation/explosion, connection/collection, intimidating/interesting, description/decision, confidence/coincidence
Flair (tuned): realise/realize, he/she, could/can, are/were, They/You, somebody/someone
ELMo (trained): Barber/Farber, Tribbiani/Trrrribbiani, Yeller/Geller, Lou/You, The/She, Bing/Ring
ELMo (tuned): friend/girlfriend, a/an, why/Why, Where/where, Of/of, The/the, What/what, If/if
Correlation: 0.000748, 0.001497
13. Approach: Test Results
Flair
Trained
forgot forgive follow forgotten forget
whoever however whenever whatever
really finally
talking telling thinking calling taking
Tuned
gets announces brings starts takes saves wins goes pulls
choosing tryin hoping deciding trying working starting
daughter uncle sister sisters
left spent lost kept slept ate hit stole stayed died ended
14. Approach
Models Used & Datasets
Pre-trained Word Embeddings: ELMo with dimensions 256, 512, and 1024 with various output layers; Flair with dimensions 2048 and 4096; BERT with dimensions 768 and 1024.
Datasets:
1. SemCor: an annotated corpus containing 360,000 words, of which over 200,000 are sense-annotated
2. SCWS: 2,003 word pairs along with their sentential contexts and human ratings of paired word similarity
15. Approach
Hypothesis
Some dimensions in word embeddings do not contribute to the meaning of
word sense.
Preliminary Tests
cosine similarity test and Normalized Euclidean test on sense groups
based on the SemCor dataset
correlation coefficient based on the SCWS dataset
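The preliminary tests can be sketched as follows. The sense group and the similarity/rating lists below are stand-ins for the SemCor and SCWS data, and the `spearman` helper is a minimal reimplementation (no tie handling):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_euclidean(a, b):
    """Euclidean distance after length-normalizing both vectors."""
    return float(np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b)))

def avg_pairwise_cosine(group):
    """Average pairwise cosine similarity within one sense group."""
    sims = [cosine(group[i], group[j])
            for i in range(len(group)) for j in range(i + 1, len(group))]
    return sum(sims) / len(sims)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (no tie handling)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Stand-in sense group: three context embeddings of the same sense.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
group = [base + 0.1 * rng.normal(size=50) for _ in range(3)]

# Stand-in SCWS-style check: model similarities vs. human ratings.
model_sims = [0.9, 0.7, 0.5, 0.3]
human_ratings = [9.1, 6.8, 5.2, 2.9]
rho = spearman(model_sims, human_ratings)
```

In the actual tests the group statistics are computed per sense group over SemCor, and rho is computed over the 2,003 SCWS pairs.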
16. Approach
PCA approach for extracting common vectors within sense groups.
Algorithm 1: PCA for extracting common elements in sense groups
Input: sense group S, embeddings v(w) for w ∈ S, and threshold parameter d
    Compute the mean vector and center the embeddings:
        µ ← (1/|S|) Σ_{w∈S} v(w),   ṽ(w) ← v(w) − µ
    Step 1: Compute the top d+1 PCA components u1, ..., u_{d+1}
    Step 2: Get the common embedding:
        v′(w) ← µ + Σ_{i=1}^{d} (ui^T ṽ(w)) ui
Output: v′(w)
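Algorithm 1 can be sketched with NumPy's SVD. A minimal sketch, assuming the embeddings are stacked as rows; the random group below is a stand-in for a real sense group:

```python
import numpy as np

def common_sense_vector(embeddings, d):
    """Sketch of Algorithm 1: keep only the component of each embedding
    that lies in the top-d principal directions of its sense group,
    then add back the group mean."""
    V = np.asarray(embeddings, dtype=float)   # shape: (|S|, dim)
    mu = V.mean(axis=0)                       # group mean
    V_centered = V - mu                       # v~(w) = v(w) - mu
    # Rows of Vt are the principal directions u_1, u_2, ... (unit vectors).
    _, _, Vt = np.linalg.svd(V_centered, full_matrices=False)
    U = Vt[:d]                                # top-d components
    # v'(w) = mu + sum_{i<=d} (u_i^T v~(w)) u_i
    return mu + V_centered @ U.T @ U

rng = np.random.default_rng(1)
group = rng.normal(size=(5, 16))              # 5 embeddings of one sense
common = common_sense_vector(group, d=2)
```

With d equal to the group size, the projection spans the full row space and returns the original embeddings, which is a handy sanity check.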
17. Approach: Test Results
Test results for PCA method
Model Dim Out ρ ρmean
BERT 768 -2 0.56530 0.591308
BERT 1024 -2 0.40665 0.475711
ELMo 256 0 0.55436 0.608066
ELMo 256 1 0.64007 0.633749
ELMo 256 2 0.68201 0.652321
ELMo 256 av 0.64918 0.635427
ELMo 512 0 0.59265 0.644203
ELMo 512 1 0.73306 0.597286
ELMo 512 2 0.74948 0.671543
ELMo 512 av 0.72212 0.64714
ELMo 1024 0 0.60177 0.672977
ELMo 1024 1 0.62365 0.686678
ELMo 1024 2 0.69659 0.721001
ELMo 1024 av 0.61454 0.699179
18. Approach
Algorithm
Training algorithm for embedding dimension weight interpretation and
sense vector extraction
Evaluation Methods
Mask out dimensions with low importance and perform intrinsic
evaluations
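The masking evaluation can be sketched as follows; the `keep_ratio` parameter and the weight vector below are hypothetical, chosen only to illustrate the mechanism:

```python
import numpy as np

def mask_low_weight_dims(embedding, weights, keep_ratio=0.5):
    """Zero out the dimensions whose learned weights rank lowest,
    keeping the top keep_ratio fraction of dimensions."""
    k = int(len(weights) * keep_ratio)   # number of dimensions to keep
    keep = np.argsort(weights)[-k:]      # indices of the top-k weights
    masked = np.zeros_like(embedding, dtype=float)
    masked[keep] = embedding[keep]
    return masked

v = np.array([0.5, -0.2, 0.9, 0.1])
w = np.array([0.9, 0.1, 0.8, 0.2])       # hypothetical learned weights
masked = mask_low_weight_dims(v, w)      # keeps dimensions 0 and 2
```

The masked embeddings are then fed through the same intrinsic evaluations to check how much performance survives the discarded dimensions.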
19. Approach
Algorithm 2: Algorithm for learning dimension weights
for each sense group SG do
    initialize weights w, learning rate γ0
    initialize Spre ← Σ_{vi,vj ∈ SG} Cosine(vi, vj)
    for each epoch i do
        if i < n then
            randomly generate N dimension indices N1, ..., NN
        else
            generate N dimension indices N1, ..., NN according to an explore-exploitation policy
        end if
        mask the embeddings on dimensions N1, ..., NN
        Scur ← Σ_{vi,vj ∈ SG} Cosine(vi, vj) over the masked embeddings
        grad ← (Spre − Scur) · (w − 1) − λ · sign(w)
        w ← w + grad · γi
        update γi
    end for
end for
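A minimal NumPy sketch of Algorithm 2, with uniform random masking standing in for the explore-exploitation policy; the sense group and hyperparameters are illustrative, and only the masked dimensions' weights are updated:

```python
import numpy as np

def avg_pairwise_cosine(V):
    """Average pairwise cosine similarity over the rows of V."""
    unit = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = unit @ unit.T
    iu = np.triu_indices(len(V), k=1)
    return float(sims[iu].mean())

def learn_dimension_weights(group, epochs=200, n_mask=4, lr=0.05, lam=0.01, seed=0):
    """Sketch of Algorithm 2: mask random dimensions, measure how the
    group's average cosine similarity changes, and use that change to
    update the weights of the masked dimensions."""
    rng = np.random.default_rng(seed)
    dim = group.shape[1]
    w = np.ones(dim)                          # dimension weights
    s_pre = avg_pairwise_cosine(group)        # similarity before masking
    for _ in range(epochs):
        dims = rng.choice(dim, size=n_mask, replace=False)
        masked = group.copy()
        masked[:, dims] = 0.0                 # mask the chosen dimensions
        s_cur = avg_pairwise_cosine(masked)   # similarity after masking
        # grad = (Spre - Scur) * (w - 1) - lambda * sign(w), on masked dims
        grad = (s_pre - s_cur) * (w[dims] - 1.0) - lam * np.sign(w[dims])
        w[dims] += lr * grad                  # fixed learning rate here;
        # the deck also updates the learning rate gamma_i each epoch
    return w

rng = np.random.default_rng(42)
base = rng.normal(size=8)
group = np.stack([base + 0.05 * rng.normal(size=8) for _ in range(6)])
weights = learn_dimension_weights(group)
```

Dimensions whose masking barely moves the group similarity end up with weights driven toward zero by the λ·sign(w) regularizer, marking them as negligible.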
20. Test and evaluations
Model dim out Ave Dis C Dis Ave Cos C Cos co-eff
BERT 768 -2 34.3856 19.30123 0.497062 0.683877 0.647989
BERT 1024 -2 39.74155 21.47615 0.58236 0.864297 0.594294
ELMo 256 0 10.20746 17.06137 0.596261 0.058296 0.647328
ELMo 256 1 17.01257 15.38437 0.511956 0.283216 0.63676
ELMo 256 2 18.35877 12.79827 0.451013 0.429331 0.560226
ELMo 256 av 17.06634 14.2695 0.493568 0.341095 0.62173
ELMo 512 0 13.58388 23.34382 0.595626 0.0423 0.670698
ELMo 512 1 24.3524 20.71935 0.541471 0.439504 0.646561
ELMo 512 2 26.29227 17.36125 0.440618 0.463392 0.586672
ELMo 512 av 24.49851 19.42738 0.493186 0.403634 0.650462
ELMo 1024 0 18.40521 31.8018 0.592719 0.035835 0.679799
ELMo 1024 1 34.63281 27.59308 0.478008 0.356505 0.680234
ELMo 1024 2 37.62374 23.28052 0.39284 0.421985 0.601395
ELMo 1024 av 34.65655 26.45345 0.454141 0.347544 0.67123
Flair 2048 -1 49.67542 38.48288 0.559736 0.508835 0.555379
Flair 4096 -1 71.76762 52.52496 0.433499 0.332308 0.498259
Table: Test results with model, embedding dimension size, output layer, average of group
average pair-wise Normalized Euclidean Distance, average of pair-wise center vector Euclidean
Distance, Average of group average pair-wise cosine similarity, average of pair-wise center vector
cosine similarity and Spearman’s coefficient on SCWS dataset
24. Test and evaluations
For the ELMo models, the closer the output layer is to the input, the better
the embedding performs on all three tests: a lower within-group Euclidean
distance, a larger between-group Euclidean distance, a higher within-group
cosine similarity, a lower between-group cosine similarity, and a higher
Spearman coefficient.
The BERT models showed a higher within-group distance than between-group
distance and a lower within-group cosine similarity than between-group
cosine similarity. However, both BERT models still outperformed the Flair
models on the coefficient test, and the BERT model with dimension 768
ranked 6th on that test.
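The group-level measures discussed above can be sketched as follows. This is an illustrative reconstruction rather than the authors' evaluation code: the slides do not spell out the exact normalization, so here the within-group distance is computed on unit-normalized vectors, matching the "Normalized Euclidean Distance" label in the table caption.

```python
import itertools
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def within_group_stats(group):
    """Average pairwise normalized Euclidean distance and average
    pairwise cosine similarity among the vectors of one sense group."""
    dists, coss = [], []
    for a, b in itertools.combinations(group, 2):
        na, nb = a / np.linalg.norm(a), b / np.linalg.norm(b)
        dists.append(float(np.linalg.norm(na - nb)))
        coss.append(cosine(a, b))
    return float(np.mean(dists)), float(np.mean(coss))

def between_group_stats(groups):
    """Pairwise Euclidean distance and cosine similarity between the
    center (mean) vectors of different sense groups."""
    centers = [np.mean(g, axis=0) for g in groups]
    dists = [float(np.linalg.norm(a - b))
             for a, b in itertools.combinations(centers, 2)]
    coss = [cosine(a, b)
            for a, b in itertools.combinations(centers, 2)]
    return float(np.mean(dists)), float(np.mean(coss))
```

Under this reading, a good embedding keeps vectors of the same sense close (low within-group distance, high within-group cosine) while keeping different sense centers apart (high between-group distance, low between-group cosine).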
25. Test and evaluations
Figure: ”obtain.v.01”
Figure: ”have.v.01”
26. Test and evaluations
Figure: ”obtain.v.01”
Figure: ”have.v.01”
27. Test and evaluations
pair-wise cosine similarity within each sense group
the Spearman's Rank-Order Correlation Coefficient ρ between the
cosine similarities of the extracted sense vectors and the path
similarity scores provided by WordNet
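The coefficient can be computed as plain rank correlation. Below is a minimal numpy-only sketch with no tie correction; in practice the path-similarity side would be filled with values from NLTK's WordNet interface (e.g. wordnet.synset("ask.v.01").path_similarity(...)), which is assumed here rather than shown in the slides.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank-order correlation coefficient (no tie correction).
    x: cosine similarities between pairs of extracted sense vectors;
    y: WordNet path similarities for the same synset pairs."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)  # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    # Pearson correlation of the rank vectors
    return float((rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))
```

A perfectly monotone relation between the two score lists yields ρ = 1, and a perfectly reversed one yields ρ = −1.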
29. Test and evaluations
| Model | Dim  | Out | Sense    | N masked | Sense      | N masked | Sense    | N masked |
|-------|------|-----|----------|----------|------------|----------|----------|----------|
| BERT  | 768  | -2  | ask.v.01 | 75       | three.s.01 | 208      | man.n.01 | 58       |
| BERT  | 1024 | -2  | ask.v.01 | 40       | three.s.01 | 211      | man.n.01 | 116      |
| ELMo  | 256  | 0   | ask.v.01 | 78       | three.s.01 | 182      | man.n.01 | 242      |
| ELMo  | 256  | 1   | ask.v.01 | 78       | three.s.01 | 99       | man.n.01 | 212      |
| ELMo  | 256  | 2   | ask.v.01 | 241      | three.s.01 | 243      | man.n.01 | 212      |
| ELMo  | 256  | av  | ask.v.01 | 155      | three.s.01 | 120      | man.n.01 | 227      |
| ELMo  | 512  | 0   | ask.v.01 | 103      | three.s.01 | 28       | man.n.01 | 295      |
| ELMo  | 512  | 1   | ask.v.01 | 144      | three.s.01 | 105      | man.n.01 | 156      |
| ELMo  | 512  | 2   | ask.v.01 | 334      | three.s.01 | 300      | man.n.01 | 311      |
| ELMo  | 512  | av  | ask.v.01 | 205      | three.s.01 | 122      | man.n.01 | 317      |
| ELMo  | 1024 | 0   | ask.v.01 | 174      | three.s.01 | 44       | man.n.01 | 331      |
| ELMo  | 1024 | 1   | ask.v.01 | 60       | three.s.01 | 106      | man.n.01 | 220      |
| ELMo  | 1024 | 2   | ask.v.01 | 568      | three.s.01 | 827      | man.n.01 | 708      |
| ELMo  | 1024 | av  | ask.v.01 | 181      | three.s.01 | 351      | man.n.01 | 426      |
| Flair | 2048 | -1  | ask.v.01 | 193      | three.s.01 | 862      | man.n.01 | 1883     |
Table: Common sense groups and the number of embedding dimensions masked out (N masked) for each model
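The N masked counts in the table can be derived from the learned weight vector by thresholding near-zero weights. The slides do not state the exact cutoff, so the tolerance below is a placeholder assumption.

```python
import numpy as np

def count_masked(w, tol=1e-2):
    """Number of dimensions judged negligible for a sense group:
    those whose learned weight has been driven to (near) zero.
    The tolerance is a placeholder, not the authors' criterion."""
    return int(np.sum(np.abs(w) < tol))
```

For example, a 1024-dimensional weight vector in which 568 entries fall below the threshold would report 568 masked dimensions, matching how the table's entries are read.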
30. Conclusions
Some dimensions do not contribute to the representation of sense
groups; this conclusion holds for all word embeddings tested in this
work.
Our algorithm is effective in learning the weights for grouped word
embeddings.
31. Limitations and Future Directions
For the evaluation of the sense vectors, the path similarity provided by
WordNet [Fellbaum and Miller, 1998] may not align well with human
judgements.
The current tests were limited by the corpus we used.
Future directions: generalize our algorithm to analyze grouped
embeddings.
Applications of the sense vectors extracted by our algorithm.
Word embeddings as combinations of concept vectors.
32. Akbik, A., Blythe, D., and Vollgraf, R. (2018).
Contextual string embeddings for sequence labeling.
In Proceedings of the 27th International Conference on Computational
Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA.
Association for Computational Linguistics.
Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. (2016).
Linear algebraic structure of word senses, with applications to
polysemy.
CoRR, abs/1601.03764.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018).
BERT: pre-training of deep bidirectional transformers for language
understanding.
CoRR, abs/1810.04805.
Fellbaum, C. and Miller, G. (1998).
Building Semantic Concordances.
MITP.
Jang, K.-R. and Myaeng, S.-H. (2017).
33. Elucidating conceptual properties from word embeddings.
In Proceedings of the 1st Workshop on Sense, Concept and Entity
Representations and their Applications, pages 91–95. Association for
Computational Linguistics.
Koç, A., Utlu, I., Şenel, L. K., and Özaktaş, H. M. (2018).
Imparting interpretability to word embeddings.
CoRR, abs/1807.07279.
Luo, H., Liu, Z., Luan, H.-B., and Sun, M. (2015).
Online learning of interpretable word embeddings.
In EMNLP.
Murphy, B., Talukdar, P., and Mitchell, T. (2012).
Learning effective and interpretable semantic models using
non-negative sparse embedding.
In Proceedings of COLING 2012, pages 1933–1950. The COLING
2012 Organizing Committee.
Park, S., Bak, J., and Oh, A. (2017).
Rotated word vector representations and their interpretability.
34. In Proceedings of the 2017 Conference on Empirical Methods in
Natural Language Processing, pages 401–411, Copenhagen, Denmark.
Association for Computational Linguistics.
Peters, M. E., Ammar, W., Bhagavatula, C., and Power, R. (2017).
Semi-supervised sequence tagging with bidirectional language models.
CoRR, abs/1705.00108.
Subramanian, A., Pruthi, D., Jhamtani, H., Berg-Kirkpatrick, T., and
Hovy, E. H. (2017).
SPINE: sparse interpretable neural embeddings.
CoRR, abs/1711.08792.
Zobnin, A. (2017).
Rotations and interpretability of word embeddings: the case of the
Russian language.
CoRR, abs/1707.04662.