Semantic_properties-BlackboxNLP

Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not:
Testing whether Word Embeddings Can Tell
Pia Sommerauer & Antske Fokkens

Motivation
Are individual semantic properties encoded in (patterns of) dimensions?
Man:woman ≈ king:queen (Mikolov et al. 2013)
King - man + woman ≈ queen
[If yes, we assume they can be learned by supervised machine learning]
Heavily criticized (e.g. Linzen 2016)
[Figure: toy binary word vectors with the dimensions female, male, royal]
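
The analogy arithmetic can be illustrated with a toy example over the three dimensions female, male, royal. This is only a sketch: the vector values and the extra words (boy, table) are assumptions made up for illustration, not real embeddings.

```python
# Toy illustration only: hand-made binary vectors, not real embeddings.
# Assumed dimension order: [female, male, royal]
toy_vectors = {
    "man":   [0, 1, 0],
    "woman": [1, 0, 0],
    "king":  [0, 1, 1],
    "queen": [1, 0, 1],
    "boy":   [0, 1, 0],   # hypothetical distractor
    "table": [0, 0, 0],   # hypothetical distractor
}


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vec(c) - vec(a) + vec(b)."""
    target = [vc - va + vb
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]

    def squared_distance(word):
        return sum((x - t) ** 2 for x, t in zip(vectors[word], target))

    return min(vectors, key=squared_distance)


# king - man + woman
result = analogy("man", "woman", "king", toy_vectors)
print(result)
```

In this toy space each property sits in exactly one dimension, which is the idealized case the question on this slide asks about.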

Contributions
● Method to test if information is in the vectors
● First steps towards a dataset
● Specific hypotheses about semantic information in word vectors
● Initial tendencies

Method
Binary classification:
Does a word have the target property, given its vector?
Supervised classification
vs.
n nearest neighbors (of the property vector, i.e. the centroid of the positive training examples)
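
The two approaches compared on this slide could be sketched as follows. This is a minimal sketch in plain Python, not the authors' implementation: the two-dimensional vectors, the perceptron standing in for the supervised classifier, and the value of n are all assumptions for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_neighbor_predict(train_pos, test, n):
    """Label the n test words closest to the positive centroid as positive."""
    property_vector = centroid(list(train_pos.values()))
    ranked = sorted(test, key=lambda w: cosine(test[w], property_vector),
                    reverse=True)
    top = set(ranked[:n])
    return {word: word in top for word in test}


def train_perceptron(train_pos, train_neg, epochs=20):
    """Stand-in supervised classifier (assumption): a plain perceptron."""
    data = ([(v, 1) for v in train_pos.values()]
            + [(v, -1) for v in train_neg.values()])
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for vec, label in data:
            score = sum(w * x for w, x in zip(weights, vec)) + bias
            if label * score <= 0:  # misclassified: move the boundary
                weights = [w + label * x for w, x in zip(weights, vec)]
                bias += label
    return weights, bias


def perceptron_predict(model, test):
    weights, bias = model
    return {word: sum(w * x for w, x in zip(weights, vec)) + bias > 0
            for word, vec in test.items()}


# Hypothetical 2-d "embeddings" for the property is_dangerous.
train_pos = {"tiger": [0.9, 0.1], "pistol": [0.8, 0.3]}
train_neg = {"zebra": [0.1, 0.9], "spoon": [0.2, 0.8]}
test = {"lion": [0.85, 0.2], "giraffe": [0.15, 0.85]}

nn_labels = nearest_neighbor_predict(train_pos, test, n=1)
clf_labels = perceptron_predict(train_perceptron(train_pos, train_neg), test)
print(nn_labels, clf_labels)
```

The nearest-neighbor approach only sees the positive examples, while the classifier also learns from the negatives, which is why the two can diverge on real data.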

Data set
“Ideal” data set:
● Positive examples of a property
● Negative examples of a property
● Words in P (positive) and N (negative) are similar to each other
New data set:
● CSLB property norms (Devereux et al. 2014)
● Logical implications
● Crowd verification

CSLB property norms
● Human-elicited properties of concrete, mostly monosemous concepts
● 638 concepts
● Features listed by at least 2 participants
● 30 participants per concept
(Devereux et al. 2014: 1121)
No negative examples

Extension of the CSLB norms
Step 1:
→ Look for logical implications to find clear negative examples
e.g. is_food excludes has_wheels
Problem:
Overrepresentation of categories
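
Step 1 could be sketched as below. Only the rule "is_food excludes has_wheels" comes from the slide; the concepts, their property lists, and the function name are hypothetical and serve only as illustration.

```python
# Exclusion rules: a concept with the property on the left cannot have
# the properties on the right. Only this one rule is taken from the slide.
EXCLUDES = {"is_food": {"has_wheels"}}

# Hypothetical excerpt of property norms: concept -> listed properties.
norms = {
    "apple": {"is_food", "is_red"},
    "bread": {"is_food"},
    "car": {"has_wheels"},
    "hammer": {"made_of_wood"},
}


def negative_examples(target, norms, excludes):
    """Concepts with at least one property that logically excludes the target."""
    return sorted(
        concept
        for concept, properties in norms.items()
        if any(target in excludes.get(p, set()) for p in properties)
    )


negatives = negative_examples("has_wheels", norms, EXCLUDES)
print(negatives)
```

In this toy run every negative example for has_wheels is a food item, which mirrors the overrepresentation problem named on the slide.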

Step 2:
→ Look for potential negative concepts similar to the positive examples and verify them with the crowd
Does X apply to Y?
● yes
● no
→ disagreements
Hat is blue?
Beer is yellow?

Crowd task
Does property X apply to concept Y?
❏ Yes
❏ Mostly
❏ Possibly
❏ No
Remaining disagreement:
● Salience, knowledge
● Interpretation of the property
Tomato is purple?
Chocolate is brown?

How do we find out whether a vector has a property?
Possible outcomes
Represented by the context   Example        Supervised classification   Nearest neighbors
yes (category)               is_a_bird      high                        high
yes                          is_dangerous   high                        low
no                           is_yellow      low                         low
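
The table can be read as a small decision rule. The sketch below only restates the three rows; the function name and the wording of the returned labels are assumptions, and the combination the table does not list is left explicitly open.

```python
def interpret(supervised_high, neighbors_high):
    """Map the two performance outcomes to the table's interpretation."""
    if supervised_high and neighbors_high:
        return "represented by the context (category)"  # e.g. is_a_bird
    if supervised_high and not neighbors_high:
        return "represented by the context"             # e.g. is_dangerous
    if not supervised_high and not neighbors_high:
        return "not represented by the context"         # e.g. is_yellow
    return "not covered by the table"                   # low / high


print(interpret(supervised_high=True, neighbors_high=False))
```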

Results
Property             Hypothesis: represented by the context?   Confirmed?
is_dangerous         yes                                       ✔
does_kill            yes                                       ✔
is_used_in_cooking   yes                                       ✔
has_wheels           possibly                                  ✔
is_found_in_seas     possibly                                  ✔
is_black             no                                        ❌
is_red               no                                        ✔
is_yellow            no                                        ✔
made_of_wood         no                                        ❌

Results
has_wheels
● Correct positives: unicycle, limousine, train, carriage, ambulance, porsche
● Correct negatives: sled, skidoo
is_dangerous
● Correct positives: rhinoceros (but not giraffe or zebra), meth, cocaine, oxycodone, hepatitis C, allergy
● Correct negatives: imitation pistol, screwdriver
is_found_in_seas
● Correct positives: seabird, gannet (in contrast to many birds and freshwater fish)

Discussion
Limitations
● Limited selection of properties
● Small size of datasets
● Possible over-representation of a category
● No parameter tuning
Future work
● Increase datasets
● More datasets for other properties
● Parameter-tuning
● Trace information from context to vector

Conclusion
Contributions
● Method to investigate semantic information
● Dataset
● Specific hypotheses
● Exploratory experiments
Insights
● Some properties are encoded in embeddings
○ Visual properties ❌
○ Function/interaction-related properties ✔

https://github.com/cltl/semantic_space_navigation/tree/master/projects/semantic_property_space

References
Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. "Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 238-247. 2014.
Devereux, Barry J., Lorraine K. Tyler, Jeroen Geertzen, and Billi Randall. "The Centre for Speech, Language and the Brain (CSLB) concept property norms." Behavior Research Methods 46, no. 4 (2014): 1119-1127.
Firth, John R. "A synopsis of linguistic theory, 1930-1955." Studies in linguistic analysis (1957).
Harris, Zellig S. "Distributional structure." Word 10, no. 2-3 (1954): 146-162.
Linzen, Tal. "Issues in evaluating semantic spaces using word analogies." ACL 2016 (2016): 13.
Wittgenstein, Ludwig, and G. E. M. Anscombe. Philosophical investigations. Oxford: Basil Blackwell, 1953.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient Estimation of Word Representations in Vector Space." arXiv preprint arXiv:1301.3781 (2013a).
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. "Linguistic regularities in continuous space word representations." In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746-751. 2013b.

Image sources
tiger_attack: https://www.flickr.com/photos/claudiogennari/3186012706
zebra: https://www.whats-your-sign.com/zebra-facts-and-symbolic-meaning.html
Hippo: https://pixabay.com/en/hippo-nature-animal-world-safari-3647749/
Elephant: https://www.tz.de/leben/tiere/afrikanischer-elefant-aussterben-bedroht-4845006.html
Pelican: https://pixabay.com/en/photos/pink%20pelican/
Grapefruit: https://balancebydeborahhutton.com.au/pink-grapefruit-and-lychee-salad/
Cheetah: https://sco.m.wikipedia.org/wiki/File:Cheetah_chase.jpg
Gun: https://pixabay.com/en/pistol-weapon-hand-gun-gun-2515496/
Heroin: https://commons.wikimedia.org/wiki/File:Heroin_Narcotic_drug.jpg
Beer-colors: https://www.flickr.com/photos/quinndombrowski/5200218267
Pink lemon: https://www.maxpixel.net/Acid-Fruit-Background-Juicy-Citrus-Lemon-Lime-3303842
Chocolate-mixed: https://commons.wikimedia.org/wiki/File:Chocolate.jpg
Purple tomato: https://www.flickr.com/photos/mjhbixby6/9175400555/
orange_wheels: https://commons.wikimedia.org/wiki/File:Outspan_Orange.jpg
Horizon: https://pixabay.com/en/infinity-blue-sea-horizon-sky-2211659/
Future: http://www.picserver.org/f/future.html
Text_magnifying_glass: https://pixnio.com/objects/books/paper-document-book-text-learning-reading-magnifying-glass
Crown: https://pixabay.com/en/crown-black-silhouette-symbol-312109/
Male: https://en.wikipedia.org/wiki/Male
Female: https://en.wikipedia.org/wiki/Female
