2. Semantic space models are computational models of
human semantic representation that typically operate on
distributional data (co-occurrence statistics)
A common criticism: Not grounded in perception
and action
3. Emergence of “perceptually grounded” computational
models integrating experiential and distributional data
• Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and
distributional data to learn semantic representations.
• Durda, K., Buchanan, L., & Caron, R. (2009). Grounding co-occurrence: Identifying
features in a lexical co-occurrence model of semantic memory.
• Jones, M. N. & Recchia, G. (2010). You can't wear a coat rack: A binding framework
to avoid illusory feature migrations in perceptually grounded semantic models.
• Steyvers, M. (2010). Combining feature norms and text data with topic models.
• Vigliocco, G., Vinson, D. P., Lewis, W., & Garrett, M. F. (2004). Representing the
meanings of object and action words: The featural and unitary semantic space
hypothesis.
4. Where do features come from?
• For humans: Experience with the real world
• For models: Human-generated property norms
9. Issues with “grounded” distributional models
• Not enough grounded concepts
• Features represented as discrete entities
10. How to get data…
“In this experiment, you will describe various words…”
11. How to get data…
“In this experiment, you will describe various words…”
fun game
12. von Ahn, L. and L. Dabbish. (2004). Labeling images with a computer game.
ACM Conference on Human Factors in Computing Systems, CHI 2004.
Baroni, M. & Lenci, A. (2008). Concepts and properties in word spaces.
16. • 45 subjects generated ten features for each of 16 to 48
words, resulting in at least 30 subjects having generated
features for each of the 48 words
• For comparison to McRae norms, features manually
remapped
(“gives bad breath” → beh_-_causes_bad_breath,
“is a fruit” → a_fruit, etc.)
• Word-by-feature matrix constructed: cell at
<w, f> contains the number of participants listing feature
f for word w
• Square word-by-word matrix constructed: cell at
<w1, w2> contains the cosine between the rows for
word w1 and word w2 in the word-by-feature matrix
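The two-matrix construction above can be sketched in a few lines. This is a toy illustration, not the actual norming pipeline: the `responses` data, word list, and feature labels are invented for the example; only the counting rule (cell <w, f> = number of participants listing f for w) and the cosine step follow the slide.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: one dict per participant, word -> features listed.
# (Illustrative only; not the real norming responses.)
responses = [
    {"apple": ["a_fruit", "is_red"], "dog": ["an_animal", "has_fur"]},
    {"apple": ["a_fruit", "is_round"], "dog": ["an_animal", "barks"]},
]

# Word-by-feature matrix: cell <w, f> = number of participants listing f for w.
counts = {}
for subject in responses:
    for word, feats in subject.items():
        c = counts.setdefault(word, Counter())
        for f in set(feats):  # count each feature at most once per participant
            c[f] += 1

features = sorted({f for c in counts.values() for f in c})
matrix = {w: [counts[w][f] for f in features] for w in counts}

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Square word-by-word matrix: cell <w1, w2> = cosine of the two
# word-by-feature rows.
sim = {(w1, w2): cosine(matrix[w1], matrix[w2])
       for w1 in matrix for w2 in matrix}
```

Counting each feature once per participant keeps a cell from exceeding the number of subjects even if someone lists the same feature twice.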
17. Do participants in the “game” task generate usable data?
• Word-by-feature matrix: Rows had high correlations, on
average, with the corresponding rows in McRae matrix
(M = .83, SD = .08)
• Word-by-word matrix correlations similarly high
(M = .96, SD = .03)

19. Do participants in the “game” task generate usable data?
• Word-by-feature matrix: Rows had high correlations, on
average, with the corresponding rows in McRae matrix
(M = .83, SD = .08)
• Word-by-word matrix correlations similarly high
(with diagonal removed: M = .82, SD = .23)
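The drop from M = .96 to M = .82 comes from removing the diagonal: every word's self-similarity is 1.0 in both matrices, which inflates the row correlations. A minimal sketch of that comparison, with invented toy matrices (the real matrices are the game-derived and McRae word-by-word matrices):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def row_correlations(m1, m2, drop_diagonal=True):
    """Correlate corresponding rows of two square similarity matrices
    (same word order in both); optionally exclude the diagonal cell,
    whose value is 1.0 in both matrices and inflates the correlation."""
    rs = []
    for i, (r1, r2) in enumerate(zip(m1, m2)):
        if drop_diagonal:
            r1 = r1[:i] + r1[i + 1:]
            r2 = r2[:i] + r2[i + 1:]
        rs.append(pearson(r1, r2))
    return rs

# Toy similarity matrices standing in for the game and McRae matrices.
game = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.1],
        [0.2, 0.1, 1.0]]
```

Averaging the returned per-row correlations (with `drop_diagonal=True`) gives the kind of summary statistic reported on the slide.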
21. Still not much of a game…
• Participant testimonials
– “It was hard”
– “Took too long”
– “After a while I just wanted it to be done”
• Can something like this be made into
something people would willingly do?
22. Using Verbosity: Common Sense Data
from Games with a Purpose
(Speer, Havasi, & Surana, 2010)
Speer, Havasi, & Surana (2010), Fig. 2
26. leg has lower limb
toy is a kind of little
sail is a boat
servant has paid help
produce is a type of fruits vegetables
attack is a tack
belief is a kind of be leaf
chord is typically in rhymes sword
heat looks like feat meat
machine looks like mush sheen
passion looks like fashion
wander is a type of wonder
27. Desiderata
• Open-ended, as opposed to restricting the
player to predefined frames
• Incentives for player to provide actual
features, as opposed to associates or
sound-alikes
• Minimize the effect that teammates’
guesses have on player’s descriptions
29. Challenges
• Two main types of players…
– Descriptions are single-word associates
(can’t be normed automatically)
– Descriptions are rich and many words long
(can’t be normed automatically)
• Possible approaches: Restrict to two/three word
descriptions? Classify semantic relations via
another game?
• Other data of interest?
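The "restrict to two/three word descriptions" idea above could be prototyped as a trivial length filter; anything shorter is likely a single-word associate and anything longer is too rich to norm automatically. The function name and thresholds here are hypothetical:

```python
def usable(description, min_words=2, max_words=3):
    """Hypothetical filter implementing the two/three-word restriction:
    reject single-word associates and long free-form descriptions."""
    n = len(description.split())
    return min_words <= n <= max_words
```

For example, `usable("barks")` and `usable("gives people really bad breath")` are rejected, while `usable("is a fruit")` passes; a relation-classification game could then handle the surviving short descriptions.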