Human Interface Lab.
Detecting Oxymoron in a
Single Statement
Won Ik Cho
Nov. 01, 2017
Contents
• Introduction
 Word vector representation
 Word analogy test
• Proposed methods
 Oxymoron detection
 Overall scheme and flow chart
• Experiment and discussion
• Conclusion
Introduction
Introduction
• Word meaning for computers
 Use a taxonomy like WordNet that has hypernyms (is-a)
relationships and synonym sets
 Problems with discreteness
Missing nuances
Missing new words
Subjective
Requires human labor
Hard to compute accurate word similarity
ex) One-hot representation
hotel = [0 0 0 … 1 0 0 … 0 0 0]
motel = [0 0 0 … 0 1 0 … 0 0 0]
Semantically hotel ≈ motel, yet the one-hot vectors are orthogonal (hotel ⊥ motel)
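To make the orthogonality problem concrete, here is a minimal sketch (illustrative, not from the original slides): the dot product of any two distinct one-hot vectors is zero, so no similarity can be read off.

```python
import numpy as np

vocab_size = 10
hotel = np.zeros(vocab_size)
motel = np.zeros(vocab_size)
hotel[3] = 1.0  # arbitrary vocabulary index for "hotel"
motel[4] = 1.0  # arbitrary vocabulary index for "motel"

# Distinct one-hot vectors are always orthogonal, so their cosine
# similarity is 0 regardless of how related the words actually are.
print(hotel @ motel)  # 0.0
```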
Introduction
• In statistical NLP…
“You shall know a word by the company it keeps” (J. R. Firth 1957:11)
1) Capture co-occurrence counts directly (count-based)
2) Go through each word of the whole corpus and
predict surrounding words of each word (direct prediction)
Introduction
• Count-based vs. direct prediction
Word vector representation
• Basic idea
 Define a model that predicts the context of a center
word $w_t$: $P(\text{context} \mid w_t)$
 Loss function $J = 1 - P(w_{-t} \mid w_t)$, where $w_{-t}$
denotes the words surrounding $w_t$
 Keep adjusting the vector representations of words to
minimize the loss
[Figure: feedforward neural network based LM, by Y. Bengio and H. Schwenk (2003)]
Main idea of word2vec
• Mikolov et al., 2013
• Two algorithms
 Skip-grams (SG)
Predict context words given target (position independent)
 Continuous bag of words (CBOW)
Predict target word from BOW context
• Two (moderately efficient) training methods
 Hierarchical softmax
 Negative sampling
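As an aside not on the original slides, the two algorithms and two training methods map directly onto parameters of gensim's Word2Vec; a minimal sketch, assuming gensim 4.x and a toy corpus:

```python
from gensim.models import Word2Vec

sentences = [["sugar", "free", "but", "sweet"],
             ["legalized", "robbery"]]  # toy corpus for illustration

# sg=1 selects skip-gram (sg=0 would select CBOW);
# hs=0 together with negative=5 selects negative sampling
# (hs=1 would select hierarchical softmax).
model = Word2Vec(sentences, vector_size=50, window=5,
                 sg=1, hs=0, negative=5, min_count=1)
print(model.wv["sweet"].shape)  # (50,)
```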
Main idea of GloVe
• Pennington et al., 2014
• Count-based:
 Primarily used to capture word similarities
 Do poorly on word analogy tasks
(sub-optimal vector space structure)
• Direct prediction:
 Learn word embeddings by making predictions in local
context windows
 Demonstrate the capacity to capture complex linguistic
patterns
 Fail to make use of the global co-occurrence statistics
How about combining the advantages of each approach?
Word analogy test
• Performed to test how well the representation
describes the relations between words
 Pennington et al. (2014); an example sketch follows below
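A typical test asks whether $F(king) - F(man) + F(woman)$ lands nearest to $F(queen)$. A minimal sketch, assuming `embeddings` is a dict from words to numpy vectors (e.g., loaded from GloVe as on the experiment slide):

```python
import numpy as np

def cos_sim(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(embeddings, a, b, c, topn=1):
    """Rank words by closeness to F(b) - F(a) + F(c), excluding the query words."""
    query = embeddings[b] - embeddings[a] + embeddings[c]
    scored = [(w, cos_sim(query, vec))
              for w, vec in embeddings.items() if w not in (a, b, c)]
    return sorted(scored, key=lambda x: -x[1])[:topn]

# e.g., analogy(embeddings, "man", "king", "woman") should rank "queen" first
```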
Proposed methods
Oxymoron detection
• Detecting contradiction caused by semantic
discrepancy between a pair of words
• Includes word analogies of:
antonyms, synonyms (with negation), or
words with an entailment error
• Differs from detecting paradox
 “There’s a pattern of unpredictability.” (oxymoron)
 “I am a compulsive liar.” (paradox)
Oxymoron detection
• Basic idea
 People recognize an oxymoron in a text by the
incongruity between its words
Antonym (ex) Sugar-free/Sweet
Words with entailment error (ex) Legalized/Robbery
Synonym with negation (ex) Much/not Enough
 Finding these relations (with some structural options) in
a single statement may imply the existence of an oxymoron
(especially for short sentences)
 Let's find the relation by comparing word vector offsets!
Proposed scheme
• Offset vector set construction
 Offset vector of words $a$, $b$:
For a word embedding function $F$, the offset vector $rel_{a,b}$
is defined as $rel_{a,b} = F(a) - F(b)$
 Offset vector set for antonyms:
For the set of antonym word pairs $Ant$, the $i$-th antonym
offset vector $ant_i$ for the $i$-th antonym pair $(a_i, b_i)$
is defined as $ant_i = F(a_i) - F(b_i)$
 $Ant$ includes word pairs with an entailment error as well
 The same process is repeated for the synonym pairs $Syn$
(a construction sketch follows below)
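A minimal construction sketch, assuming `F` is a dict from words to numpy vectors and `ant_pairs` / `syn_pairs` are lists of word pairs from the collected data (the names are illustrative, not from the slides):

```python
def build_offset_set(F, pairs):
    """Offset vector rel_{a,b} = F(a) - F(b) for each pair (a, b) in the set."""
    return [F[a] - F[b] for a, b in pairs if a in F and b in F]

# Ant also includes pairs with an entailment error, per the slide above
ant_offsets = build_offset_set(F, ant_pairs)
syn_offsets = build_offset_set(F, syn_pairs)
```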
Proposed scheme
• Antonym/synonym checking
 For an input word pair $(x, y)$, $ant(x, y)$ is defined to
check antonymy/synonymy
 Define $d_{ant,i} = Cos(rel_{x,y}, ant_i)$ for the cosine distance
$Cos(u, v) = 1 - \frac{u \cdot v}{|u||v|}$
 $(x, y)$ is considered an antonym pair if $d = \min_i d_i < D$
for a threshold value $D$
 $D$ is varied in the implementation (a checking sketch
follows below)
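A sketch of the check under the same assumptions as above; the return convention (1 for antonym, 0 for synonym, -1 for neither) is inferred from the mod-2 rule on the negation-counting slide, not stated explicitly:

```python
import numpy as np

def cos_dist(u, v):
    """Cosine distance Cos(u, v) = 1 - (u . v) / (|u||v|)."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def ant(x, y, F, ant_offsets, syn_offsets, D):
    """1 if (x, y) looks antonymous, 0 if synonymous, -1 if neither is near."""
    rel = F[x] - F[y]
    d_ant = min(cos_dist(rel, off) for off in ant_offsets)
    d_syn = min(cos_dist(rel, off) for off in syn_offsets)
    if min(d_ant, d_syn) >= D:  # no stored offset is within the threshold
        return -1
    return 1 if d_ant < d_syn else 0
```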
Proposed scheme
• Checking invalid cases
 Assumptions:
(1) Only lexical words (not grammatical ones) can have an
antonym/synonym relationship
(2) A contradiction occurs only if the antonyms indicate the
same object/situation simultaneously
 For (1), only verbs, nouns, adjectives, and adverbs are
analyzed, with lemmatization (see the sketch below)
 For (2), dependency parsing could be applied (not in the
current implementation)
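A sketch of the lexical-word filter with NLTK; the Penn-to-WordNet tag mapping is one reasonable choice, not prescribed by the slides:

```python
import nltk
from nltk.stem import WordNetLemmatizer

# One-time setup: nltk.download("punkt"),
# nltk.download("averaged_perceptron_tagger"), nltk.download("wordnet")
TAG_MAP = {"V": "v", "N": "n", "J": "a", "R": "r"}  # verbs, nouns, adjectives, adverbs
lemmatizer = WordNetLemmatizer()

def lexical_words(sentence):
    """Return (lemma, position) for verbs/nouns/adjectives/adverbs only."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [(lemmatizer.lemmatize(word.lower(), TAG_MAP[tag[0]]), i)
            for i, (word, tag) in enumerate(tagged) if tag[0] in TAG_MAP]
```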
Proposed scheme
• Negation counting
 Usually negation terms (ex: no, not, never, n't) come a few
words before the word they negate
 Define an indicator $neg(w)$: 1 if a negation term appears a
few words before $w$, and 0 otherwise
• For every word pair $(w_i, w_j)$:
If $ant(w_i, w_j) = ant_{ij} \geq 0$ and both $w_i$, $w_j$ are valid,
$(w_i, w_j)$ is decided to be contradictory if
$ant_{ij} + neg(w_i) + neg(w_j) \equiv 1 \pmod{2}$
If any word pair is decided to be contradictory, the
statement contains an oxymoron (a sketch follows below)
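A sketch of the negation indicator and the parity rule; the 3-word look-back window is an illustrative choice for "a few words before":

```python
NEGATIONS = {"no", "not", "never", "n't"}

def neg(tokens, idx, window=3):
    """1 if a negation term occurs within `window` tokens before tokens[idx]."""
    return int(any(t.lower() in NEGATIONS for t in tokens[max(0, idx - window):idx]))

def contains_oxymoron(tokens, valid, F, ant_offsets, syn_offsets, D):
    """valid: (lemma, position) pairs from lexical_words(); uses ant() above."""
    for a in range(len(valid)):
        for b in range(a + 1, len(valid)):
            (wi, i), (wj, j) = valid[a], valid[b]
            a_ij = ant(wi, wj, F, ant_offsets, syn_offsets, D)
            # An antonym pair with an even number of negations, or a synonym
            # pair with an odd number, makes the pair contradictory.
            if a_ij >= 0 and (a_ij + neg(tokens, i) + neg(tokens, j)) % 2 == 1:
                return True
    return False
```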
Flow chart
Experiment and discussion
Experiment
• Python implementation with the NLTK library (for
tokenizing, POS tagging, and lemmatization)
• Pre-trained word vectors based on GloVe
 glove.6B.50d (see the loading sketch after this slide)
50 dimensions, trained on Wikipedia 2014 and Gigaword 5
• Dataset: constructed by manual search
 For antonym/synonym pairs
Michigan Proficiency Exams (http://www.michigan-
proficiencyexams.com/)
 For test sentences
Oxymoron List (http://www.oxymoronlist.com/)
1001 Truisms! (http://1001truisms.webs.com/truisms.htm)
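A sketch of loading the pre-trained vectors into the dict used by the earlier sketches (glove.6B.50d.txt is the file name in the standard GloVe 6B download; the path is illustrative):

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    """Each line holds a word followed by 50 space-separated float values."""
    F = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            F[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return F

F = load_glove()
print(len(F), F["sweet"].shape)  # 400000 words in the 6B vocabulary, (50,)
```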
Experiment
Result
• Relatively low performance
 The word vectors were not trained for the purpose of
capturing antonym/synonym relations
 Dependency parsing was not applied
 Determination of a proper $D$ value is necessary
A high $D$ alone can improperly inflate recall, so
F-measure or accuracy should be used as the evaluation
measure
Discussion
• Advantages
 Easy to construct the dataset (many open sources, a
manageable amount of words/phrases)
 Does not need any additional training on sentences
(depends largely on the word vectors)
 Checks how well the word vectors capture semantic relations
• To enhance the accuracy
 Set a (sub)optimal $D$ value via optimization such as
bisection methods (Boyd and Vandenberghe, 2004); see the
sketch below
 Use dependency parsers (Chen, 2014; Andor, 2016) to
check whether the contradictory words really indicate the
same object/situation
 Use word embeddings that account for antonymy
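A sketch of tuning $D$ on a labeled development set; the slides cite bisection (Boyd and Vandenberghe, 2004), and the bracket-shrinking variant below assumes the F-measure is unimodal in $D$, which is an assumption rather than a guarantee:

```python
def f_measure(D, dev_set, detect):
    """dev_set: (sentence, label) pairs; detect(sentence, D) -> bool."""
    tp = fp = fn = 0
    for sentence, label in dev_set:
        pred = detect(sentence, D)
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_D(dev_set, detect, lo=0.0, hi=2.0, iters=30):
    """Shrink the bracket [lo, hi]; cosine distance lies in [0, 2]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f_measure(m1, dev_set, detect) < f_measure(m2, dev_set, detect):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```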
Future work
• Applying dependency parsing
 Calculating the distance from the root for the
lexical words (e.g., nouns)
 Checking whether two words are directly dependent
(a sketch follows below)
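A sketch of both checks; spaCy is an illustrative choice of parser, since the slides cite Chen (2014) and Andor (2016) but do not fix an implementation:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

def depth_from_root(token):
    """Number of head links between a token and the sentence root."""
    depth = 0
    while token.head is not token:  # in spaCy, the root is its own head
        token = token.head
        depth += 1
    return depth

def directly_dependent(t1, t2):
    """True if one token is the immediate syntactic head of the other."""
    return t1.head is t2 or t2.head is t1

doc = nlp("There's a pattern of unpredictability.")
for tok in doc:
    print(tok.text, tok.dep_, depth_from_root(tok))
```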
Future work
• Using word embeddings regarding antonymy
 M. Ono, M. Miwa, and Y. Sasaki, “Word Embedding-
based Antonym Detection using Thesauri and
Distributional Information,” In Proceedings of the
Human Language Technologies: The 2015 Annual
Conference of the North American Chapter of the ACL,
2015, pp. 984–989.
 J. Kim, M. De Marneffe, and E. Fosler-Lussier, “Adjusting
Word Embeddings with Semantic Intensity Orders,” In
Proceedings of the 1st Workshop on Representation
Learning for NLP, 2016, pp. 62–69.
Conclusion
• A deterministic scheme to detect oxymora and
evaluate the word vector representation
• Suitable for word vectors that capture
antonym/synonym relations
• Several advantages over other contradiction
detection approaches
 Produces stable results once a few options are fixed
 Does not need training
 Also shows how far other word relations are from
the target relations
Thank you!
