Because metaphors are ubiquitous in language, metaphor processing is an important task in the field of natural language processing. The first step towards metaphor processing, and probably the most difficult one, is metaphor detection. In the first part of this paper, we review the theoretical background for metaphors and the models and implementations that have been proposed for their detection. We then build corpora for detecting three types of metaphors: IS-A metaphors, metaphors formed with the preposition ‘of’, and metaphors formed with a verb. For the first two tasks, we train supervised classifiers using semantic features. For the third task, we use features commonly used in text categorization.
2. Contents
• The importance of metaphors
• Theoretical approaches to metaphor
detection
• The state of the art
• Detected metaphor types
• Semantic features
• Metaphor detection methodology
• Advantages and disadvantages
• Possible further research directions
3. The importance of metaphors
• Metaphors play an essential role in the way we
understand the world and form the basis of our
conceptual system.
• They are an omnipresent phenomenon, hence
their importance for natural language processing
(NLP).
• Metaphor detection can be useful for other NLP
tasks, such as machine translation, automatic
summarization, information extraction, etc.
4. Theoretical approaches to metaphor
detection
• Lakoff and Johnson (in ‘Metaphors We Live By’, 1980)
suggested that there is directionality in metaphor, in the
understanding of one concept in terms of another one:
the less concrete (and vaguer, more abstract) concept is
understood in terms of the more concrete one, which is
better delineated in our experience.
• So far, the most influential account for automatic
metaphor recognition in text is that of Wilks
(‘Making preferences more active’, 1978),
according to which metaphors represent a violation
of selectional restrictions (the semantic constraints that a
verb places on its arguments in a given context).
5. The state of the art
• The first and, probably, the most difficult step in
metaphor processing is metaphor detection.
• In recent years, many methods have been
proposed for this task.
• The main disadvantage of early metaphor
detection systems was the fact that they either
used a great quantity of manually-input
information or they could only detect some
restricted metaphor patterns.
• Recently, many unsupervised methods have been
proposed.
6. Detected metaphor types
• IS-A metaphors: two nouns, or a personal
pronoun and a noun, linked together
by the verb ‘to be’ (e.g., ‘That lawyer is a
shark’).
• ‘OF’ metaphors: two nouns linked together by
the preposition ‘of’ (e.g., ‘child of evil’).
• Verb metaphors (metaphors formed with a
verb, other than ‘to be’).
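The first two patterns can be illustrated with a rough candidate extractor. This is only a sketch under strong assumptions: it uses surface regular expressions instead of a parser and part-of-speech tags, so it will also match non-nouns, and the names IS_A_PATTERN, OF_PATTERN, is_a_candidates and of_candidates are hypothetical, not taken from the system described here.

```python
import re

# Hypothetical surface patterns; a real system would use a parser and POS tags
# to make sure both captured words are nouns (or a personal pronoun and a noun).
IS_A_PATTERN = re.compile(
    r"\b(\w+)\s+(?:is|are|was|were)\s+(?:an?\s+|the\s+)?(\w+)\b", re.IGNORECASE
)
OF_PATTERN = re.compile(r"\b(\w+)\s+of\s+(?:an?\s+|the\s+)?(\w+)\b", re.IGNORECASE)


def is_a_candidates(sentence):
    """Return (subject, complement) word pairs linked by a form of 'to be'."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in IS_A_PATTERN.finditer(sentence)]


def of_candidates(phrase):
    """Return (head, modifier) word pairs linked by the preposition 'of'."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in OF_PATTERN.finditer(phrase)]


print(is_a_candidates("That lawyer is a shark"))
print(of_candidates("child of evil"))
```

Each extracted pair is only a candidate; deciding whether it is metaphorical is the job of the classifier.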
7. Semantic features
• Similarity measures in WordNet: Leacock-
Chodorow, Resnik, Wu-Palmer, Jiang-Conrath,
Lin, Path Distance Similarity.
• Other similarity measures, using Google’s
search engine: normalized Google distance,
pointwise mutual information.
• Concreteness measures using WordNet.
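The two search-engine measures can be written down from hit counts alone. The sketch below uses the standard definitions of normalized Google distance and pointwise mutual information; the argument names (fx, fy, fxy, n) and the handling of zero counts are assumptions made for illustration, and no search engine is actually queried.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: fx and fy for each term alone, fxy for both
    terms together, n for the (assumed) total number of indexed pages."""
    if min(fx, fy, fxy) <= 0:
        return float("inf")  # assumption: no (co-)occurrence means maximal distance
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def pointwise_mutual_information(fx, fy, fxy, n):
    """PMI = log2(p(x, y) / (p(x) * p(y))), probabilities estimated as count / n."""
    if min(fx, fy, fxy) <= 0:
        return float("-inf")  # assumption: no co-occurrence means minimal association
    return math.log2((fxy / n) / ((fx / n) * (fy / n)))


# Made-up counts, for illustration only.
print(normalized_google_distance(1000, 2000, 100, 10**6))
print(pointwise_mutual_information(1000, 2000, 100, 10**6))
```

A low distance (or a high PMI) suggests that the two nouns frequently appear together. The WordNet measures listed above are available in NLTK as methods on synsets (for example wup_similarity and path_similarity), provided the WordNet corpus is installed.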
8. IS-A and OF metaphor detection
methodology
• The dataset is built up using the Master
Metaphor List (Lakoff et al., 1991).
• New metaphorical senses are labeled as
metaphorical, whereas conventional metaphorical
senses and literal senses are labeled as literal.
• To perform the supervised classification, we use
SVMs; the final performance is obtained using
10-fold cross-validation (taking the average of
the classifier accuracies) on the dataset.
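The evaluation protocol (10-fold cross-validation with averaged accuracies) can be sketched as follows. To keep the example free of external dependencies, a trivial majority-class classifier stands in for the SVM; MajorityClassifier and cross_validated_accuracy are hypothetical names, and the feature vectors and labels are toy data.

```python
import random
from collections import Counter


class MajorityClassifier:
    """Stand-in classifier (NOT an SVM): always predicts the most frequent
    training label. Any object with fit/predict could be plugged in instead."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def cross_validated_accuracy(features, labels, make_classifier, k=10, seed=0):
    """Average accuracy over k folds; each example is held out exactly once.
    Assumes there are at least k examples, so that no fold is empty."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        model = make_classifier().fit(
            [features[i] for i in train], [labels[i] for i in train]
        )
        predicted = model.predict([features[i] for i in held_out])
        correct = sum(p == labels[i] for p, i in zip(predicted, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Toy data: 20 examples with a single dummy feature each.
toy_features = [[0.0]] * 20
toy_labels = ["metaphorical"] * 12 + ["literal"] * 8
print(cross_validated_accuracy(toy_features, toy_labels, MajorityClassifier))
```

With a real SVM implementation, only the make_classifier argument would change; the fold construction and the averaging stay the same.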
9. Verb metaphor detection
methodology
• For building up the dataset, we use the TroFi
Example Base (Birke and Sarkar, 2006).
• The labels assigned are those in TroFi.
• For classification, we test SVMs, Maximum
Entropy, Naïve Bayes and Decision Trees
classifiers, using features commonly used in text
categorization (like the presence or absence of a
word, grouping together a set of symbols, etc.).
• Feature selection is performed using chi-square
statistics, in order to reduce possible overfitting.
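The feature pipeline described above (binary word-presence features followed by chi-square feature selection) can be sketched in plain Python. This assumes the usual 2x2 contingency-table form of the chi-square statistic; the function names and the toy sentences are hypothetical and are not taken from the TroFi Example Base.

```python
from collections import Counter


def binary_features(tokens, vocabulary):
    """1 if the vocabulary word occurs in the example, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def chi_square_scores(documents, labels, positive_label):
    """Chi-square statistic of each word against a binary label."""
    n = len(documents)
    n_pos = sum(1 for label in labels if label == positive_label)
    doc_freq, pos_freq = Counter(), Counter()
    for tokens, label in zip(documents, labels):
        for word in set(tokens):
            doc_freq[word] += 1
            if label == positive_label:
                pos_freq[word] += 1
    scores = {}
    for word, df in doc_freq.items():
        a = pos_freq[word]   # word present, positive class
        b = df - a           # word present, other class
        c = n_pos - a        # word absent, positive class
        d = n - n_pos - b    # word absent, other class
        denominator = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denominator if denominator else 0.0
    return scores


def select_top_features(scores, k):
    """Keep the k highest-scoring words (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy contexts for one verb, with made-up labels.
docs = [["grasp", "idea"], ["grasp", "rope"], ["grasp", "concept"], ["grasp", "handle"]]
labels = ["nonliteral", "literal", "nonliteral", "literal"]
vocabulary = select_top_features(chi_square_scores(docs, labels, "nonliteral"), 2)
print(vocabulary, binary_features(docs[0], vocabulary))
```

Words that occur in every example, such as the target verb itself, score zero and are dropped first, which is the intended effect of the selection step.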
10. Results
• IS-A metaphors: 76% accuracy (S.
Krishnakumaran and X. Zhu, 2007: 73.6%;
majority baseline: 59.6%).
• OF metaphors: 75% accuracy, majority
baseline: 52.77%.
• Verb metaphors: 67.3% accuracy, majority
baseline for all the verbs: 50.8%, majority
baseline for each verb: 66.7%.
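The difference between the two verb baselines can be made concrete with a small sketch. It assumes that "majority baseline for all the verbs" means always predicting the single most frequent label in the whole dataset, and that "majority baseline for each verb" means predicting each verb's own most frequent label; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label overall."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def per_verb_majority_baseline(examples):
    """examples: (verb, label) pairs. Accuracy of predicting, for every
    example, the most frequent label of its own verb."""
    by_verb = defaultdict(list)
    for verb, label in examples:
        by_verb[verb].append(label)
    correct = sum(Counter(ls).most_common(1)[0][1] for ls in by_verb.values())
    return correct / len(examples)


# Toy data, not the actual label distribution of the dataset.
toy = [("grasp", "nonliteral"), ("grasp", "nonliteral"), ("grasp", "literal"),
       ("eat", "literal"), ("eat", "literal"), ("eat", "nonliteral")]
print(majority_baseline([label for _, label in toy]))
print(per_verb_majority_baseline(toy))
```

The per-verb baseline can never be lower than the global one, which is why it is the harder reference point for the verb classifier.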
11. Advantages and disadvantages
• Advantages: relatively fast and easy to
implement using standard tools (Stanford
parser, Python Natural Language Toolkit).
• Disadvantages: only certain metaphor types are
detected (the metaphor ‘a budding artist’, for
example, would not be detected); the datasets
for the first two tasks are quite small (57 and 108
examples), so an evaluation on larger datasets
would be beneficial.
12. Possible further research
directions
• Combining the two types of features
(semantic and text categorization) for each
one of the tasks.
• Using TF/IDF scores instead of binary text
categorization features.
• Adding significant bigrams and trigrams to the
text categorization features.
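The last two directions can be sketched briefly. The example assumes one common TF-IDF variant (relative term frequency multiplied by log(N / df)) and plain contiguous n-grams; the function names are hypothetical, and the filtering of n-grams down to the "significant" ones is left out.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """One {term: weight} dict per document, with weight = tf * log(N / df).
    Terms may be words or n-gram tuples."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Toy contexts: unigrams extended with bigrams before weighting.
contexts = [["grasp", "the", "idea"], ["grasp", "the", "rope"]]
extended = [tokens + ngrams(tokens, 2) for tokens in contexts]
for weights in tf_idf(extended):
    print(weights)
```

Terms shared by every example receive a weight of zero, so the real-valued features emphasize the words and word sequences that distinguish one context from another.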