
Perceptually Grounded Selectional Preferences – Using Flickr Image and Video tags for Natural Language Semantics


Selectional preferences (SPs) are widely used in natural language processing as a rich source of semantic information. While SPs have been traditionally induced from textual data, human lexical acquisition is known to rely on both linguistic and perceptual experience. We present the first SP learning method that simultaneously draws knowledge from text, images and videos, using image and video descriptions to obtain visual features. Our results show that it outperforms linguistic and visual models in isolation, as well as the existing SP induction approaches.


Perceptually Grounded Selectional Preferences

Ekaterina Shutova (University of Cambridge), Niket Tandon (Max Planck Institute for Informatics), Gerard de Melo (Tsinghua University)

Contact
● Katia Shutova: es407@cam.ac.uk, https://www.cl.cam.ac.uk/~es407/
● Niket Tandon: ntandon@mpi-inf.mpg.de, https://www.mpi-inf.mpg.de/~ntandon/
● Gerard de Melo: gdm@demelo.org, http://gerard.demelo.org

References
1. Philip Resnik (1993). Selection and information: A class-based approach to lexical relationships. Technical report, University of Pennsylvania.
2. Frank Keller & Mirella Lapata (2003). Using the Web to obtain frequencies for unseen bigrams. Computational Linguistics 29(3):459–484.
3. Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll & Franz Beil (1999). Inducing a semantically annotated lexicon via EM-based clustering. Proc. ACL 1999.
4. Sebastian Padó, Ulrike Padó & Katrin Erk (2007). Flexible, corpus-based modelling of human plausibility judgements. Proc. EMNLP-CoNLL 2007.
5. Diarmuid Ó Séaghdha (2010). Latent variable models of selectional preference. Proc. ACL 2010.
6. Ekaterina Shutova (2010). Automatic metaphor interpretation as a paraphrasing task. Proc. NAACL 2010.

What are Selectional Preferences?
Selectional preferences are the semantic constraints a predicate places on its arguments:
● "The authors wrote a new paper." ✔ high plausibility
● "The paper wrote a new author." ✘ very low plausibility
● "The cat is eating your sausage." ✔ high plausibility
● "The carrot is eating your keys." ✘ very low plausibility
Knowledge of selectional preferences is useful in many NLP tasks:
● word sense disambiguation
● parsing (resolving attachments)
● semantic role labelling
● natural language inference
● detecting multi-word expressions
● etc.

Our Approach: Use Multimodal Data
Previous work induces selectional preferences from text alone, which suffers from topic bias and figurative uses of words: in the BNC, "cut" mainly occurs with "cost" and "price" as arguments, skewing the learned preferences towards abstract uses that differ from our daily-life experience of cutting. We therefore combine two sources:
● the BNC for textual data (parsed with the RASP parser)
● 100 million Flickr images and videos from the Yahoo! Webscope Flickr-100M dataset

Collecting Multimodal Correlations
Challenge: turning a set of Flickr tags into noun–verb pairs. Example tag sets (from https://www.flickr.com/photos/seandreilinger/465827703/ and https://www.flickr.com/photos/pysanchis/2521372121/):
● mother, sitting, baby, lap, rachel lind, wristwatch, pajamas, clothes, etc.
● canon rebel 400D, ball, portfolio, yellow, serve, website, racket, roland garros, etc.
Approach:
1) Stemming
2) Filtering: remove rare words and named entities
3) POS tagging: jointly disambiguate tags to WordNet synsets, combining WordNet priors and synset similarities so as to maximize coherence
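The poster does not give implementation details for these three steps, so the following is only a rough sketch of the tag-processing stage, assuming NLTK's WordNet interface; the frequency cut-off, the use of a lemmatizer in place of a stemmer, and the greedy noun/verb assignment (rather than the joint, coherence-maximizing disambiguation described above) are all illustrative assumptions.

```python
# Rough sketch of the Flickr-tag processing stage (normalisation, filtering,
# WordNet-based POS assignment). The threshold and the greedy POS heuristic
# are assumptions for illustration only.
from collections import Counter
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()  # stands in for the stemming step so WordNet lookups stay valid

def tag_sets_to_pairs(tag_sets, min_freq=5):
    """Map raw Flickr tag sets to (noun, verb) co-occurrence pairs."""
    # 1) Normalise tags
    norm = [[lemmatizer.lemmatize(t.lower()) for t in tags] for tags in tag_sets]
    # 2) Filter out rare tags (named-entity filtering is omitted in this sketch)
    freq = Counter(t for tags in norm for t in tags)
    kept = [[t for t in tags if freq[t] >= min_freq] for tags in norm]
    # 3) Assign a part of speech via WordNet and pair every noun with every verb
    pairs = []
    for tags in kept:
        nouns = [t for t in tags if wn.synsets(t, pos=wn.NOUN)]
        verbs = [t for t in tags if wn.synsets(t, pos=wn.VERB) and t not in nouns]
        pairs.extend((n, v) for n in nouns for v in verbs)
    return pairs
```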
Features
● Visual features: verb lemmas co-occurring with nouns
● Linguistic features: grammatical relations

Selectional Preference Model
Step 1: Acquisition of argument classes. The observed data is sparse, so we need to generalize: spectral clustering of nouns, using Jensen–Shannon divergence as the similarity measure.
Step 2: Quantifying selectional preferences. (A minimal code sketch of both steps appears after the Conclusions below.)

Direct Evaluation
Results on the Keller & Lapata (2003) datasets (Spearman's rho):

Method                      Seen dataset   Unseen dataset
Rooth et al. (1999), EM     0.487          0.520
Padó et al. (2007), VSM     0.490          0.430
Ó Séaghdha (2010), LDA      0.548          0.605
Visual model                0.126          0.132
Linguistic model            0.688          0.559
Interpolated model          0.728          0.430

Application to Metaphor Interpretation
Shutova (2010) treats metaphor interpretation as a paraphrasing task, e.g. "a carelessly leaked report" → "a carelessly disclosed report":
1) Take the maximum-likelihood candidate verbs
2) Filter them by semantic similarity to the target verb
3) Filter for a strong selectional preference fit (assuming it indicates literalness or conventionality) so as to remove figurative uses
(A sketch of this filtering pipeline also appears below.)

Mean Average Precision (MAP) on the Shutova (2010) gold-standard data:

       Shutova (2010)   LSP    ISP
MAP    0.62             0.62   0.65

Conclusions
Multimodal selectional preferences outperform:
● purely linguistic and visual models in isolation, and
● previous state-of-the-art models.
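For concreteness, here is a minimal sketch of Steps 1 and 2 of the selectional preference model above, assuming NumPy, SciPy and scikit-learn. The number of clusters and the Resnik-style association score in Step 2 are illustrative assumptions: the poster cites Resnik (1993) but does not spell out its exact scoring formula.

```python
# Sketch of the selectional preference model: Step 1 clusters nouns by the
# similarity of their verb co-occurrence distributions (Jensen-Shannon
# divergence turned into an affinity); Step 2 scores how strongly a verb
# prefers a given noun cluster. Cluster count and scoring are assumptions.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import SpectralClustering

def cluster_nouns(noun_verb_counts, n_clusters=200):
    """noun_verb_counts: (n_nouns, n_verbs) co-occurrence count matrix."""
    probs = noun_verb_counts / noun_verb_counts.sum(axis=1, keepdims=True)
    n = probs.shape[0]
    affinity = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            jsd = jensenshannon(probs[i], probs[j]) ** 2     # squared distance = JS divergence
            affinity[i, j] = affinity[j, i] = 1.0 - jsd      # turn divergence into similarity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed').fit_predict(affinity)
    return labels  # cluster id (argument class) for each noun

def selectional_association(verb_class_counts, verb, cls):
    """Resnik-style association of a verb with an argument class (assumed scoring)."""
    if verb_class_counts[verb, cls] == 0:
        return 0.0
    p_cls = verb_class_counts[:, cls].sum() / verb_class_counts.sum()
    p_cls_given_verb = verb_class_counts[verb, cls] / verb_class_counts[verb].sum()
    return p_cls_given_verb * np.log(p_cls_given_verb / p_cls)
```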
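Similarly, a sketch of the three-stage paraphrase filtering used in the metaphor-interpretation experiment; the similarity function, the cluster lookup, the association score and both thresholds are placeholders, since the poster only names the three stages.

```python
# Sketch of the Shutova (2010)-style paraphrasing pipeline reused above:
# rank candidate verbs by likelihood, keep those similar to the target verb,
# then keep those with a strong selectional preference for the noun's class.
# `similarity`, `noun_cluster`, `association` and both thresholds are placeholders.
def interpret_metaphor(verb, noun, candidates, similarity, noun_cluster, association,
                       sim_threshold=0.3, sp_threshold=0.1):
    """candidates: iterable of (paraphrase_verb, likelihood) pairs."""
    # 1) Maximum-likelihood candidate verbs, most likely first
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # 2) Keep candidates semantically similar to the target verb
    similar = [(v, p) for v, p in ranked if similarity(v, verb) >= sim_threshold]
    # 3) Keep candidates whose selectional preferences strongly fit the argument,
    #    assuming a strong fit indicates a literal / conventional reading,
    #    e.g. retaining "disclose" for ("leak", "report")
    cls = noun_cluster(noun)
    return [(v, p) for v, p in similar if association(v, cls) >= sp_threshold]
```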
