Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks

Deep Learning for Semantic Relation Classification
Sneha Rajana
Amazon

Natural Language Understanding

Natural Language Understanding
Human language is a rich, varied, and
growing source of information

Challenges of NLU
• Multiple words with the same meaning (synonyms)
• Words with multiple meanings (polysemy), some of which are entirely opposite in nature (auto-antonyms)
• Words which behave differently when used as a noun and as a verb

Challenges of NLU
Hot water/Cool water
Hot topic/Cool topic

Challenges of NLU
Words make sense in context in natural language. Humans can comprehend and distinguish them easily, but machines can’t.

Recognizing various semantic
relations between entity pairs in
sentences is an important task in
Natural Language Processing (NLP).

Lexical Semantic Relations
hot/cold
good/bad
friend/enemy
enemy/fan
cold/lukewarm
ascend/slip
up/down
spirited/spiritless
cool/not cool
honest/dishonest
penguin/clown
cold/chilly
boat/rudder
agree/disagree
absence/presence
acceptable/intolerable
hello/fine
voice/silent
love/hate
like/love
mouse/animal
nose/human
happy/joy

Lexical Semantic Relations
hypernym
meronym
synonym
Antonyms: Semantically related but not
semantically similar!

What are Antonyms?
strongly antonymous
not antonymous
semantically contrasting

Goal: Antonym Detection
Given two terms x and y, decide whether x
and y are antonyms of each other
Main Contributions:
• Learning antonyms with paraphrases
• Learning antonyms with a morphology-aware neural network
University of Pennsylvania

Deriving Antonyms from
Paraphrases
not allowed in here ~ not permitted
did not plan ~ had no intention
never mind about that ~ it matters not
Paraphrases are phrases expressing the same meaning. They usually occur in similar textual contexts or have common translations in other languages.

PPDB: The Paraphrase Database
An automatically extracted database containing
millions of paraphrases
• 22 different languages
• ~100M word and phrase pairs
• Big and noisy
• Currently the largest available collection of paraphrases

Step 1: WordNet Seed Set
WordNet: a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms, or synsets.
• Direct antonyms, e.g. clean/dirty
• Indirect antonyms, e.g. clean/foul, clean/grime

Step 2: Antonyms from
Paraphrases
Used PPDB to retrieve paraphrase mappings of 2 types:
• Negating word: (Not X, Y) -> (X, Y)
  E.g. (not happy, unhappy) -> (happy, unhappy)
• Negating prefix: (Neg-Prefix(X), Y) -> (X, Y)
  E.g. (unjustifiable, unreasonable) -> (justifiable, unreasonable)
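The two mappings in Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the dataset: the word and prefix lists, the `vocabulary` check, and all function names are assumptions made for the example.

```python
# Hypothetical, deliberately small lists; a real system would need larger ones.
NEGATING_WORDS = ("not", "no", "never")
NEGATING_PREFIXES = ("un", "non", "in", "dis", "de", "anti-")


def strip_negating_word(phrase):
    """Return X if the phrase has the form 'not X', else None."""
    tokens = phrase.lower().split()
    if len(tokens) >= 2 and tokens[0] in NEGATING_WORDS:
        return " ".join(tokens[1:])
    return None


def strip_negating_prefix(word, vocabulary):
    """Return X if the word is Neg-Prefix(X) and X is a known word, else None."""
    word = word.lower()
    for prefix in NEGATING_PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in vocabulary:
            return word[len(prefix):]
    return None


def antonym_candidate(paraphrase_pair, vocabulary):
    """Map a paraphrase pair (~X, Y) to an antonym candidate (X, Y), or None."""
    left, right = paraphrase_pair
    x = strip_negating_word(left) or strip_negating_prefix(left, vocabulary)
    return (x, right.lower()) if x else None


vocabulary = {"happy", "justifiable", "valuable"}
print(antonym_candidate(("not happy", "unhappy"), vocabulary))
print(antonym_candidate(("unjustifiable", "unreasonable"), vocabulary))
```

A plain prefix check also fires on seemingly negated words: with this toy vocabulary, ("invaluable", "priceless") would yield the wrong candidate (valuable, priceless), which is one reason the later slides add morphology-aware handling.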

Step 3: Indirect Antonyms
via Expansion
Synsets from WordNet and paraphrases from PPDB:
(~X, Y) -> (X, Y) -> (X, synonyms(Y)) -> (synonyms(X), Y)
• E.g. with synsets: (happy, unhappy) -> (happy, synsets(unhappy)) -> (synsets(happy), unhappy)
• E.g. with paraphrases: (justifiable, unreasonable) -> (justifiable, paraphrases(unreasonable)) -> (paraphrases(justifiable), unreasonable)
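The expansion in Step 3 can be read as a set operation over a synonym lookup. The sketch below assumes a plain dictionary stands in for WordNet synsets or PPDB paraphrases; the function name and the toy entries are made up for illustration.

```python
def expand_antonyms(seed_pairs, synonyms):
    """Expand each (X, Y) with (X, s) for s in synonyms(Y) and (s, Y) for s in synonyms(X)."""
    expanded = set()
    for x, y in seed_pairs:
        expanded.add((x, y))
        for s in synonyms.get(y, ()):
            expanded.add((x, s))
        for s in synonyms.get(x, ()):
            expanded.add((s, y))
    return expanded


# Toy synonym table (hypothetical entries, not taken from WordNet or PPDB).
toy_synonyms = {"happy": ["glad"], "unhappy": ["sad", "miserable"]}
for pair in sorted(expand_antonyms({("happy", "unhappy")}, toy_synonyms)):
    print(pair)
```

Each seed pair is expanded one side at a time, so a pair such as (glad, sad) is not produced; whether to also combine both expansions is a design choice this sketch leaves out.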

Antonym Generation
WordNet expansion
Direct Antonyms    Indirect Antonyms
clean/dirty        clean/foul
rise/fall          rise/downfall
sleep/wake         sleep/rise
above/below        above/under
Removal of negating prefix
Paraphrase Pair               Antonym Pair
deactivated/turned off        activated/turned off
unjustifiable/unreasonable    justifiable/unreasonable
deforestation/destruction     forestation/destruction
anti-hatred/non-hatred        hatred/non-hatred
Removal of negating word
Paraphrase Pair                 Antonym Pair
not true/untrue                 true/untrue
not identical/different         identical/different
not acceptable/objectionable    acceptable/objectionable
not sufficient/insufficient     sufficient/insufficient

Antonyms derived from PPDB
[Bar chart: number of unique antonyms generated, by source: WordNet (direct), WordNet (indirect), (X,Y) from (~X,Y), Synset Expansion, Paraphrase Expansion. Bar labels as extracted: 81,221; 35,686; 80,669; 14,969; 3,337.]

Classification of non-antonyms
• Unrelated: long/rare, much/worthless, disability/present/, equality/gap
• Paraphrases: simply/merely, correct/that’s right, till/until, right/alright
• Other: twinkle/dark, access/available, valuable/premium, naw/not gonna
• Entailment: valid/equally valid, significant/statistically
• Categories: Africa/Asia, Jan/Feb, Black/Red, Blonde/Brunette

Learning Antonyms with Paraphrases and a Morphology-aware Neural Network
*Sem 2017, Vancouver, Canada
Sneha Rajana*, Chris Callison-Burch*, Marianna Apidianaki* 𝛹, Vered Shwartzϕ
*Computer and Information Science Department, University of Pennsylvania, USA
𝛹LIMSI, CNRS, University Paris-Saclay, 91403 Orsay
ϕComputer Science Department, Bar-Ilan University, Israel

Background
• Prior work: Path-based, Distributional
• Integrated neural path-based (improved path-based) and distributional method for detecting hypernymy: HypeNET [Shwartz et al., 2016]
• Integrated neural path-based (improved path-based) and distributional method for detecting multiple semantic relations: LexNET [Shwartz and Dagan, 2016]

Distributional Approach
Recognize the relation between x and y based on
their separate occurrences in the corpus
Distributional Hypothesis
Words that occur in similar contexts have similar meanings
Using x and y's word embeddings [Mikolov et al., 2013,
Pennington et al. 2014] as distributional vector representations

Supervised Distributional
Methods
• Represent (x, y) as a feature vector, based on the terms’ embeddings
• Train a classifier to predict whether y is a <relation> of x
Concatenation [Baroni et al. 2012]: x ⊕ y
They don’t learn the relation between x and y, but mostly whether y is a prototypical <relation>!
E.g. (x, fruit), (x, animal) are always hypernyms
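One way to see why concatenation features tend to memorize prototypical words is that a linear score over x ⊕ y splits into one term that depends only on x and one that depends only on y. The sketch below uses made-up two-dimensional embeddings and weights; none of the numbers or names come from the talk.

```python
def concat_features(x, y, embeddings):
    """Represent the pair (x, y) as the concatenation of the two word vectors."""
    return embeddings[x] + embeddings[y]  # list concatenation, not vector addition


def linear_score(weights, features):
    """Dot product between a weight vector and a feature vector."""
    return sum(w * f for w, f in zip(weights, features))


# Toy values, invented for the example.
embeddings = {"apple": [1.0, 0.0], "fruit": [0.0, 1.0], "dog": [0.5, 0.5]}
weights = [0.1, 0.2, 0.3, 0.9]

pair_score = linear_score(weights, concat_features("apple", "fruit", embeddings))
x_only = linear_score(weights[:2], embeddings["apple"])
y_only = linear_score(weights[2:], embeddings["fruit"])
print(pair_score, x_only + y_only)
```

Because the y term is the same for every x, a large weight on "fruit"-like directions scores (dog, fruit) highly as well, regardless of how x and y relate.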

Path-based Approach
Recognize the relation between x and y based on
their joint occurrences in the corpus
Hearst Patterns [Hearst, 1992]
Patterns connecting x and y may indicate
that x is a <relation> of y
X is a Y (Hypernym)
Neither X nor Y
(Antonym)
Patterns can be represented using
dependency paths
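As a rough illustration of the two example patterns, the sketch below matches them on raw text with regular expressions. This is a simplification made for the example: the slides describe patterns over dependency paths, and the pattern set and function name here are hypothetical.

```python
import re

# Surface-level stand-ins for the two patterns named on the slide.
PATTERNS = {
    "hypernym": re.compile(r"\b(\w+) is an? (\w+)\b"),
    "antonym": re.compile(r"\bneither (\w+) nor (\w+)\b"),
}


def match_patterns(sentence):
    """Return (relation, x, y) triples for every pattern match in the sentence."""
    hits = []
    for relation, pattern in PATTERNS.items():
        for x, y in pattern.findall(sentence.lower()):
            hits.append((relation, x, y))
    return hits


print(match_patterns("A penguin is a bird."))
print(match_patterns("The soup was neither hot nor cold."))
```

Surface patterns like these break as soon as modifiers intervene ("is a very large bird"), which is part of the motivation for representing patterns as dependency paths.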

Supervised Path-based Method
• Features: all dependency paths that connect x and
y in a corpus
• Supervised: Labelled training data (word pairs)
• Trained a logistic regression classifier to predict a relation
Feature space is too sparse!
Similar paths share no information
X inc. is a Y, X group is a Y, X organization is a Y

Neural path-based method
HypeNET
• Split each path between X and Y into edges
• Each edge consists of 4 components: lemma/POS/dependency label/direction
• Learn embedding vectors for each component

• Feed the edges sequentially to an LSTM
• Use the last output vector as the path embedding
• The LSTM may focus on edges that are more informative for the classification task, while ignoring others

• The LSTM encodes a single path
• Each pair of terms occurs in multiple paths
• Represent a term-pair as its averaged path embedding
• Classify for hypernym (or other lexical relationship)
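The averaging step can be sketched as follows, assuming each path has already been encoded into a fixed-size vector by the LSTM. Weighting each path by how often it occurs for the term pair is an assumption of this sketch, and the function and variable names are hypothetical.

```python
def average_path_embedding(path_counts, path_vectors):
    """Frequency-weighted mean of the path vectors observed for one term pair."""
    if not path_counts:
        return []  # no joint occurrences, so no path information
    total = sum(path_counts.values())
    dim = len(path_vectors[next(iter(path_counts))])
    averaged = [0.0] * dim
    for path, count in path_counts.items():
        for i, value in enumerate(path_vectors[path]):
            averaged[i] += count * value / total
    return averaged


# Toy inputs: two paths, one seen three times and one seen once.
counts = {"path_a": 3, "path_b": 1}
vectors = {"path_a": [1.0, 0.0], "path_b": [0.0, 1.0]}
print(average_path_embedding(counts, vectors))
```

The result has the same dimensionality as a single path embedding, so the classifier input size does not depend on how many paths connect the pair.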

LexNET: Multiple Semantic
Relations
• LexNET: An extension of HypeNET to classify multiple semantic relations (e.g. meronymy, synonymy, antonymy)

Term-pair Classification
• Screenshot

AntNET
• Variant of LexNET
• Morphology-aware path features
• Handles multi-word expressions

Improvement
LexNET edge: lemma/POS/dep/direction
AntNET edge: non-negated(lemma)/POS/dep/direction/neg
Example AntNET path: X/NOUN/pobj/^/1 alongside/ADP/prep/V/0 non-negated(Y)/NOUN/conj/</2
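To make the slash-separated edge notation concrete, the sketch below parses a path string into its components. It assumes edges are separated by spaces and that lemmas contain no "/" character; the `Edge` type and function names are hypothetical and are not taken from the AntNET code.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    lemma: str
    pos: str
    dep: str
    direction: str
    neg: Optional[str]  # None for 4-component edges without the neg feature


def parse_edge(text):
    """Split one edge such as 'alongside/ADP/prep/V/0' into its components."""
    parts = text.split("/")
    if len(parts) == 4:
        return Edge(*parts, None)
    if len(parts) == 5:
        return Edge(*parts)
    raise ValueError(f"expected 4 or 5 components, got {len(parts)}: {text!r}")


def parse_path(text):
    """Parse a space-separated sequence of edges."""
    return [parse_edge(edge) for edge in text.split()]


path = parse_path("X/NOUN/pobj/^/1 alongside/ADP/prep/V/0 non-negated(Y)/NOUN/conj/</2")
for edge in path:
    print(edge)
```

Keeping the negation marker as a separate field lets the lemma slot hold the non-negated form, so related paths can share a lemma embedding.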

Replacement of word
embeddings
• Rare paths: neither happy nor sad vs. neither happy nor unhappy
• Seemingly negated words: valuable/invaluable
• Multi-word expressions: not happy

AntNET: Network Architecture
• Screenshot
Term-Pair Classification (Binary or Multiclass)

Integrated Model
• Add distributional information with path information
• Concatenate x and y’s word embeddings to the averaged path embedding
• Classify for antonymy (integrated network)
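The integrated input can be sketched as a concatenation followed by a softmax classifier. The ordering [x; paths; y], the single linear layer, and the toy weights are assumptions of this example rather than details confirmed by the slides.

```python
import math


def integrated_features(x_vec, path_vec, y_vec):
    """Concatenate x's embedding, the averaged path embedding, and y's embedding."""
    return x_vec + path_vec + y_vec  # list concatenation


def softmax(scores):
    """Turn raw scores into probabilities, shifted by the max for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weight_rows):
    """One linear layer (one weight row per class) followed by softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weight_rows]
    return softmax(scores)


# Toy vectors and weights, invented for the example.
features = integrated_features([1.0, 0.0], [0.5], [0.0, 1.0])
probs = classify(features, [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]])
print(features, probs)
```

With two weight rows this acts as a binary classifier; adding rows gives the multiclass variant without changing the feature construction.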

Corpus and Dataset
Knowledge resources
• Wikipedia dump: English, May 2015
• GloVe (Global Vectors for Word Representation): unsupervised learning algorithm for obtaining vector representations of words
• Paths: computed between the most frequent unigrams, bigrams, and trigrams in Wikipedia, based on the GloVe vocabulary and the most frequent 100K bigrams and trigrams
• GloVe embeddings: used pre-trained word embeddings of 50, 100, and 200 dimensions
• Vocabulary: PPDB words that were contained in the most common 400k words and the most common 100k bigrams and trigrams in Wikipedia
Dataset
• Generated from PPDB
• Size so far: ~4000 pairs
• Train/Test/Validation: 70/25/5

AntNET: Results
Metric      Model        Binary    Multiclass
Precision   Path-based   0.732     0.652
Precision   Combined     0.803     0.746
Recall      Path-based   0.724     0.687
Recall      Combined     0.788     0.757
F1          Path-based   0.713     0.661
F1          Combined     0.802**   0.746**
paired t-test, *p<0.1, **p<0.05

Effect of the negation-marking
feature
[Bar chart: Performance (F1 score), Binary vs. Multiclass, for four edge representations]
• LexNET: lemma/POS/dep/direction
• AntNET-neg: lemma/POS/dep/direction/neg
• AntNET: non-negated(lemma)/POS/dep/direction/neg
• AntNET-distance: non-negated(lemma)/POS/dep/distance/neg
Bar labels as extracted: 0.734, 0.746, 0.740, 0.738 and 0.788, 0.802, 0.793, 0.788

AntNET: Evaluation
[Bar chart: Performance (F1 score), Binary vs. Multiclass, for Majority Class, Word Embedding + SVM, LexNET, and AntNET. Bar labels as extracted: 0.75, 0.74, 0.34, 0.30 and 0.80, 0.79, 0.44, 0.39.]

AntNET: Evaluation
Normalized Confusion Matrix

AntNET: Evaluation
[2x2 grid of example pairs: gold (T/F) vs. predicted (T/F); one line per cell, as extracted]
• absence-presence, absolute-relative, unfashionable-fashionable, duck-stand up, imperviousness-perviousness
• ascertain-unclear, spiritless-spirited, ripe-rotten, turn-straight, cisc-risc
• sawtoothed-toothless, interchange-unaltered, polite-sassy, black-white, large-minimum
• indeterminate-influence, pear shaped-square, appropriately-ghastly, salutary-scary, irrelevant-discipline

Artifacts
Code and Data
https://github.com/srajana/AntNET
Publication
http://www.aclweb.org/anthology/S/S17/S17-1002.pdf

Improvements
• In recent years, SOTA performance has been achieved using neural models that incorporate lexical and syntactic features such as POS tags and dependency trees.
• Although syntactic features are no doubt helpful, a known challenge is that parsers are not available for every language, and even when available, they may not be sufficiently robust, especially for out-of-domain text, where they may even hurt performance.
• Recently, the NLP community has seen excitement around neural models that make heavy use of pre-training based on language modeling.
• Without using any external features, a simple BERT-based model can achieve SOTA performance for Relation Extraction and Semantic Role Labeling [Shi et al. 2019, You et al. 2019].

BERT-based models for multi-way classification of semantic relations (SemEval)
The task is, given a sentence and two tagged nominals, to predict the relation between those nominals and the direction of the relation.
Model                                                          F1 Score
Matching-the-Blanks (Baldini Soares et al., 2019)              89.5
R-BERT (Wu et al. 2019)                                        89.25
Multi-Attention CNN (Wang et al. 2016)                         88.0
Entity Attention Bi-LSTM (Lee et al., 2019), RNN-based model   85.2
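To illustrate what "two tagged nominals" and "direction of the relation" look like as data, the sketch below parses one example. It assumes SemEval-2010 Task 8 style markup (`<e1>`, `<e2>` tags and labels such as `Cause-Effect(e2,e1)`); the slide only says "SemEval", and the sentence and function name are made up.

```python
import re


def parse_semeval_example(tagged_sentence, label):
    """Extract both nominals, the relation name, and its direction."""
    e1 = re.search(r"<e1>(.*?)</e1>", tagged_sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", tagged_sentence).group(1)
    match = re.fullmatch(r"(.+)\((e[12]),(e[12])\)", label)
    if match is None:
        # Undirected labels such as "Other" carry no argument order.
        return {"e1": e1, "e2": e2, "relation": label, "direction": None}
    relation, first, second = match.groups()
    return {"e1": e1, "e2": e2, "relation": relation, "direction": (first, second)}


example = parse_semeval_example(
    "The <e1>fire</e1> was caused by the <e2>spark</e2>.",
    "Cause-Effect(e2,e1)",
)
print(example)
```

A classifier for this task therefore predicts the full label string, so the same relation with the arguments swapped counts as a different class.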

Thank You!
• Questions? Email srajana@amazon.com
• Twitter: @sneha_rajana
• Medium: @sneharajana
• LinkedIn: www.linkedin.com/in/sneha-rajana
