Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown
Department of Computer Science
450 Computer Science Building
Columbia University
New York, N.Y. 10027, USA
{vh, kathy}@cs.columbia.edu
Abstract
We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achieving 82% accuracy in this task when each conjunction is considered independently. Combining the constraints across many adjectives, a clustering algorithm separates the adjectives into groups of different orientations, and finally, adjectives are labeled positive or negative. Evaluations on real data and simulation experiments indicate high levels of performance: classification precision is more than 90% for adjectives that occur in a modest number of conjunctions in the corpus.
1 Introduction
The semantic orientation or polarity of a word indicates the direction the word deviates from the norm for its semantic group or lexical field (Lehrer, 1974). It also constrains the word's usage in the language (Lyons, 1977), due to its evaluative characteristics (Battistella, 1990). For example, some nearly synonymous words differ in orientation because one implies desirability and the other does not (e.g., simple versus simplistic). In linguistic constructs such as conjunctions, which impose constraints on the semantic orientation of their arguments (Anscombre and Ducrot, 1983; Elhadad and McKeown, 1990), the choices of arguments and connective are mutually constrained, as illustrated by:

    The tax proposal was
        simple and well-received
        simplistic but well-received
        *simplistic and well-received
    by the public.
In addition, almost all antonyms have different semantic orientations.1 If we know that two words relate to the same property (for example, members of the same scalar group such as hot and cold) but have different orientations, we can usually infer that they are antonyms. Given that semantically similar words can be identified automatically on the basis of distributional properties and linguistic cues (Brown et al., 1992; Pereira et al., 1993; Hatzivassiloglou and McKeown, 1993), identifying the semantic orientation of words would allow a system to further refine the retrieved semantic similarity relationships, extracting antonyms.
Unfortunately, dictionaries and similar sources (thesauri, WordNet (Miller et al., 1990)) do not include semantic orientation information.2 Explicit links between antonyms and synonyms may also be lacking, particularly when they depend on the domain of discourse; for example, the opposition bear-bull appears only in stock market reports, where the two words take specialized meanings.
In this paper, we present and evaluate a method that automatically retrieves semantic orientation information using indirect information collected from a large corpus. Because the method relies on the corpus, it extracts domain-dependent information and automatically adapts to a new domain when the corpus is changed. Our method achieves high precision (more than 90%), and, while our focus to date has been on adjectives, it can be directly applied to other word classes. Ultimately, our goal is to use this method in a larger system to automatically identify antonyms and distinguish near synonyms.
2 Overview of Our Approach
Our approach relies on an analysis of textual corpora that correlates linguistic features, or indicators, with semantic orientation. While no direct indicators of positive or negative semantic orientation have been proposed,3 we demonstrate that conjunctions between adjectives provide indirect information about orientation. For most connectives, the conjoined adjectives usually are of the same orientation: compare fair and legitimate and corrupt and brutal, which actually occur in our corpus, with *fair and brutal and *corrupt and legitimate (or the other cross-products of the above conjunctions), which are semantically anomalous. The situation is reversed for but, which usually connects two adjectives of different orientations.

1 Exceptions include a small number of terms that are both negative from a pragmatic viewpoint and yet stand in an antonymic relationship; such terms frequently lexicalize two unwanted extremes, e.g., verbose-terse.

2 Except implicitly, in the form of definitions and usage examples.
The system identifies and uses this indirect information in the following stages:

1. All conjunctions of adjectives are extracted from the corpus along with relevant morphological relations.

2. A log-linear regression model combines information from different conjunctions to determine if each two conjoined adjectives are of same or different orientation. The result is a graph with hypothesized same- or different-orientation links between adjectives.

3. A clustering algorithm separates the adjectives into two subsets of different orientation. It places as many words of same orientation as possible into the same subset.

4. The average frequencies in each group are compared and the group with the higher frequency is labeled as positive.
In the following sections, we first present the set
of adjectives used for training and evaluation. We
next validate our hypothesis that conjunctions con-
strain the orientation of conjoined adjectives and
then describe the remaining three steps of the algo-
rithm. After presenting our results and evaluation,
we discuss simulation experiments that show how
our method performs under different conditions of
sparseness of data.
3 Data Collection
For our experiments, we use the 21 million word
1987 Wall Street Journal corpus,4 automatically annotated with part-of-speech tags using the PARTS tagger (Church, 1988).
In order to verify our hypothesis about the ori-
entations of conjoined adjectives, and also to train
and evaluate our subsequent algorithms, we need a
3 Certain words inflected with negative affixes (such
as in- or un-) tend to be mostly negative, but this rule
applies only to a fraction of the negative words. Further-
more, there are words so inflected which have positive
orientation, e.g., independent and unbiased.
4 Available from the ACL Data Collection Initiative as CD ROM 1.
Positive: adequate central clever famous
intelligent remarkable reputed
sensitive slender thriving
Negative: contagious drunken ignorant lanky
listless primitive strident troublesome
unresolved unsuspecting
Figure 1: Randomly selected adjectives with positive
and negative orientations.
set of adjectives with predetermined orientation labels. We constructed this set by taking all adjectives appearing in our corpus 20 times or more, then removing adjectives that have no orientation. These are typically members of groups of complementary, qualitative terms (Lyons, 1977), e.g., domestic or medical.

We then assigned an orientation label (either + or −) to each adjective, using an evaluative approach. The criterion was whether the use of this adjective ascribes in general a positive or negative quality to the modified item, making it better or worse than a similar unmodified item. We were unable to reach a unique label out of context for several adjectives which we removed from consideration; for example, cheap is positive if it is used as a synonym of inexpensive, but negative if it implies inferior quality. The operations of selecting adjectives and assigning labels were performed before testing our conjunction hypothesis or implementing any other algorithms, to avoid any influence on our labels. The final set contained 1,336 adjectives (657 positive and 679 negative terms). Figure 1 shows randomly selected terms from this set.
To further validate our set of labeled adjectives, we subsequently asked four people to independently label a randomly drawn sample of 500 of these adjectives. They agreed with us that the positive/negative concept applies to 89.15% of these adjectives on average. For the adjectives where a positive or negative label was assigned by both us and the independent evaluators, the average agreement on the label was 97.38%. The average inter-reviewer agreement on labeled adjectives was 96.97%. These results are extremely significant statistically and compare favorably with validation studies performed for other tasks (e.g., sense disambiguation) in the past. They show that positive and negative orientation are objective properties that can be reliably determined by humans.
To extract conjunctions between adjectives, we
used a two-level finite-state grammar, which covers
complex modification patterns and noun-adjective
apposition. Running this parser on the 21 million word corpus, we collected 13,426 conjunctions of adjectives, expanding to a total of 15,431 conjoined adjective pairs. After morphological transformations, the remaining 15,048 conjunction tokens involve 9,296 distinct pairs of conjoined adjectives (types).

Conjunction category               Types     % same-       % same-       P-value
                                   analyzed  orientation   orientation   (for types)
                                             (types)       (tokens)
All conjunctions                   2,748     77.84%        72.39%        < 1 · 10^-16
All and conjunctions               2,294     81.73%        78.07%        < 1 · 10^-16
All or conjunctions                  305     77.05%        60.97%        < 1 · 10^-16
All but conjunctions                 214     30.84%        25.94%        2.09 · 10^-8
All attributive and conjunctions   1,077     80.04%        76.82%        < 1 · 10^-16
All predicative and conjunctions     860     84.77%        84.54%        < 1 · 10^-16
All appositive and conjunctions       30     70.00%        63.64%        0.04277

Table 1: Validation of our conjunction hypothesis. The P-value is the probability that similar or more extreme results would have been obtained if same- and different-orientation conjunction types were actually equally distributed.

Each conjunction token is classified by the
parser according to three variables: the conjunction
used (and, or, but, either-or, or neither-nor), the
type of modification (attributive, predicative, appos-
itive, resultative), and the number of the modified
noun (singular or plural).
4 Validation of the Conjunction Hypothesis
Using the three attributes extracted by the parser, we constructed a cross-classification of the conjunctions in a three-way table. We counted types and tokens of each conjoined pair that had both members in the set of pre-selected labeled adjectives discussed above; 2,748 (29.56%) of all conjoined pairs (types) and 4,024 (26.74%) of all conjunction occurrences (tokens) met this criterion. We augmented this table with marginal totals, arriving at 90 categories, each of which represents a triplet of attribute values, possibly with one or more "don't care" elements.

We then measured the percentage of conjunctions in each category with adjectives of same or different orientations. Under the null hypothesis of same proportions of adjective pairs (types) of same and different orientation in a given category, the number of same- or different-orientation pairs follows a binomial distribution with p = 0.5 (Conover, 1980).
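This significance computation can be sketched as an exact two-sided binomial (sign) test. The 66-of-214 example below is illustrative, chosen only to match the roughly 30.84% same-orientation rate reported for but conjunctions; it is not taken directly from the paper's tables.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Probability, under Binomial(trials, p=0.5), of a count at least as
    far from trials/2 as the one observed (two-sided sign test)."""
    deviation = abs(successes - trials / 2)
    mass = sum(comb(trials, k) for k in range(trials + 1)
               if abs(k - trials / 2) >= deviation)
    return min(mass * 0.5 ** trials, 1.0)

# Roughly 66 same-orientation types out of 214 "but" conjunctions (~30.84%):
# far too skewed to arise from equally distributed types.
print(binomial_two_sided_p(66, 214) < 1e-6)  # True
```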
We show in Table 1 the results for several representative categories, and summarize all results below:

• Our conjunction hypothesis is validated overall and for almost all individual cases. The results are extremely significant statistically, except for a few cases where the sample is small.

• Aside from the use of but with adjectives of different orientations, there are, rather surprisingly, small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes). There are a few exceptions, e.g., appositive and conjunctions modifying plural nouns are evenly split between same and different orientation. But in these exceptional cases the sample is very small, and the observed behavior may be due to chance.

• Further analysis of different-orientation pairs in conjunctions other than but shows that conjoined antonyms are far more frequent than expected by chance, in agreement with (Justeson and Katz, 1991).
5 Prediction of Link Type
The analysis in the previous section suggests a baseline method for classifying links between adjectives: since 77.84% of all links from conjunctions indicate same orientation, we can achieve this level of performance by always guessing that a link is of the same-orientation type. However, we can improve performance by noting that conjunctions using but exhibit the opposite pattern, usually involving adjectives of different orientations. Thus, a revised but still simple rule predicts a different-orientation link if the two adjectives have been seen in a but conjunction, and a same-orientation link otherwise, assuming the two adjectives were seen connected by at least one conjunction.
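A minimal sketch of this revised rule follows; the list-of-connectives input format is an assumption for illustration, not the system's actual data structure.

```python
def predict_link(pair_conjunctions):
    """Revised baseline rule: predict different orientation iff the pair
    was seen in at least one `but` conjunction, same orientation otherwise.

    pair_conjunctions: connectives observed for one adjective pair,
    e.g. ["and", "and", "but"] (illustrative format)."""
    if not pair_conjunctions:
        return None  # pair never seen conjoined: no prediction
    if "but" in pair_conjunctions:
        return "different-orientation"
    return "same-orientation"

print(predict_link(["and", "but"]))  # different-orientation
print(predict_link(["and", "or"]))   # same-orientation
```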
Prediction method   Morphology   Accuracy on reported      Accuracy on reported          Overall
                    used?        same-orientation links    different-orientation links   accuracy
Always predict      No           77.84%                    --                            77.84%
same orientation    Yes          78.18%                    97.06%                        78.86%
But rule            No           81.81%                    69.16%                        80.82%
                    Yes          82.20%                    78.16%                        81.75%
Log-linear model    No           81.53%                    73.70%                        80.97%
                    Yes          82.00%                    82.44%                        82.05%

Table 2: Accuracy of several link prediction models.

Morphological relationships between adjectives also play a role. Adjectives related in form (e.g., adequate-inadequate or thoughtful-thoughtless) almost always have different semantic orientations. We implemented a morphological analyzer which matches adjectives related in this manner. This process is highly accurate, but unfortunately does not apply to many of the possible pairs: in our set of 1,336 labeled adjectives (891,780 possible pairs), 102 pairs are morphologically related; among them, 99 are of different orientation, yielding 97.06% accuracy for the morphology method. This information is orthogonal to that extracted from conjunctions: only 12 of the 102 morphologically related pairs have been observed in conjunctions in our corpus. Thus, we add to the predictions made from conjunctions the different-orientation links suggested by morphological relationships.
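The analyzer itself is not specified in detail; the sketch below shows the kind of matching involved, with a hypothetical affix inventory that is an assumption, not the paper's rule set.

```python
# Illustrative negation prefixes; the actual analyzer's inventory is unknown.
NEG_PREFIXES = ("in", "un", "im", "ir", "dis")

def morphologically_opposed(a: str, b: str) -> bool:
    """Heuristic sketch: b is a negation-prefixed variant of a, or the
    two are a -ful/-less pair (checked in both directions)."""
    for x, y in ((a, b), (b, a)):
        for prefix in NEG_PREFIXES:
            if y == prefix + x:
                return True
        if x.endswith("ful") and y == x[:-3] + "less":
            return True
    return False

print(morphologically_opposed("adequate", "inadequate"))     # True
print(morphologically_opposed("thoughtful", "thoughtless"))  # True
print(morphologically_opposed("simple", "simplistic"))       # False
```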
We improve the accuracy of classifying links derived from conjunctions as same or different orientation with a log-linear regression model (Santner and Duffy, 1989), exploiting the differences between the various conjunction categories. This is a generalized linear model (McCullagh and Nelder, 1989) with a linear predictor

    η = w^T x

where x is the vector of the observed counts in the various conjunction categories for the particular adjective pair we try to classify and w is a vector of weights to be learned during training. The response y is non-linearly related to η through the inverse logit function,

    y = e^η / (1 + e^η).

Note that y ∈ (0, 1), with each of these endpoints associated with one of the possible outcomes.
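The predictor and response can be sketched directly (the example weights and counts are made up for illustration):

```python
import math

def predicted_response(w, x):
    """Inverse-logit response y = exp(eta) / (1 + exp(eta)), eta = w . x."""
    eta = sum(wi * xi for wi, xi in zip(w, x))
    return math.exp(eta) / (1.0 + math.exp(eta))

# eta = 0.5*1.0 + (-0.25)*2.0 = 0 gives the neutral response 0.5
print(predicted_response([0.5, -0.25], [1.0, 2.0]))  # 0.5
```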
We have 90 possible predictor variables, 42 of
which are linearly independent. Since using all the
42 independent predictors invites overfitting (Duda
and Hart, 1973), we have investigated subsets of the
full log-linear model for our data using the method of iterative stepwise refinement: starting with an ini-
tial model, variables are added or dropped if their
contribution to the reduction or increase of the resid-
ual deviance compares favorably to the resulting loss
or gain of residual degrees of freedom. This process
led to the selection of nine predictor variables.
We evaluated the three prediction models dis-
cussed above with and without the secondary source
of morphology relations. For the log-linear model,
we repeatedly partitioned our data into equally sized
training and testing sets, estimated the weights on
the training set, and scored the model's performance
on the testing set, averaging the resulting scores. 5
Table 2 shows the results of these analyses. Although the log-linear model offers only a small improvement in pair classification over the simpler but prediction rule, it confers the important advantage of rating each prediction between 0 and 1. We make extensive use of this in the next phase of our algorithm.

5 When morphology is to be used as a supplementary predictor, we remove the morphologically related pairs from the training and testing sets.
6 Finding Groups of Same-Oriented
Adjectives
The third phase of our method assigns the adjectives into groups, placing adjectives of the same (but unknown) orientation in the same group. Each pair of adjectives has an associated dissimilarity value between 0 and 1; adjectives connected by same-orientation links have low dissimilarities, and conversely, different-orientation links result in high dissimilarities. Adjective pairs with no connecting links are assigned the neutral dissimilarity 0.5.
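The mapping from link predictions to dissimilarities described here and in the next two paragraphs can be sketched as follows (the 0.82/0.78 accuracies passed in are illustrative values, not estimates from the paper's training data):

```python
def dissimilarity(prediction, p_correct=None, y=None):
    """Map a link prediction to a dissimilarity in [0, 1].

    prediction: "same", "different", or None (pair never conjoined).
    p_correct:  estimated accuracy of this link type (baseline / but rule).
    y:          log-linear response in (0, 1), where 1 means same orientation.
    """
    if prediction is None:
        return 0.5                 # neutral: no evidence either way
    if y is not None:
        return 1.0 - y             # graded confidence from the regression
    if prediction == "same":
        return 1.0 - p_correct     # confident same link -> low dissimilarity
    return p_correct               # confident different link -> high dissimilarity

print(dissimilarity("same", p_correct=0.82))       # ~0.18
print(dissimilarity("different", p_correct=0.78))  # 0.78
print(dissimilarity("same", y=0.93))               # ~0.07
```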
The baseline and but methods make qualitative distinctions only (i.e., same-orientation, different-orientation, or unknown); for them, we define dissimilarity for same-orientation links as one minus the probability that such a classification link is correct and dissimilarity for different-orientation links as the probability that such a classification is correct. These probabilities are estimated from separate training data. Note that for these prediction models, dissimilarities are identical for similarly classified links.
The log-linear model, on the other hand, offers an estimate of how good each prediction is, since it produces a value y between 0 and 1. We construct the model so that 1 corresponds to same-orientation, and define dissimilarity as one minus the produced value.
Same- and different-orientation links between adjectives form a graph. To partition the graph nodes into subsets of the same orientation, we employ an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm (Späth, 1985). We define an objective function Φ scoring each possible partition P of the adjectives into two subgroups C1 and C2 as

    Φ(P) = Σ_{i=1,2} (1/|C_i|) Σ_{x,y ∈ C_i} d(x,y)
α   Number of adjectives   Number of links      Average number of        Accuracy   Ratio of average
    in test set (|A_α|)    in test set (|L_α|)  links for each adjective            group frequencies
2   730                    2,568                7.04                     78.08%     1.8699
3   516                    2,159                8.37                     82.56%     1.9235
4   369                    1,742                9.44                     87.26%     1.3486
5   236                    1,238                10.49                    92.37%     1.4040

Table 3: Evaluation of the adjective classification and labeling methods.
where |C_i| stands for the cardinality of cluster i, and d(x, y) is the dissimilarity between adjectives x and y. We want to select the partition P_min that minimizes Φ, subject to the additional constraint that for each adjective x in a cluster C,

    (1/(|C| − 1)) Σ_{y ∈ C, y ≠ x} d(x,y)  <  (1/|C̄|) Σ_{y ∈ C̄} d(x,y)    (1)

where C̄ is the complement of cluster C, i.e., the other member of the partition. This constraint, based on Rousseeuw's (1987) silhouettes, helps correct wrong cluster assignments.
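Constraint (1) can be checked per adjective as follows; this is a sketch, with `d` a symmetric dissimilarity lookup (dict of dicts) and the toy values chosen for illustration.

```python
def violates_constraint(x, cluster, other, d):
    """Constraint (1): x's mean dissimilarity within its own cluster must be
    strictly smaller than its mean dissimilarity to the other cluster."""
    within = [d[x][y] for y in cluster if y != x]
    across = [d[x][y] for y in other]
    if not within or not across:
        return False  # singleton or empty complement: constraint is vacuous
    return sum(within) / len(within) >= sum(across) / len(across)

# "bad" sits closer (on average) to the other cluster than to its own:
d = {"bad":  {"good": 0.9, "nice": 0.85},
     "good": {"bad": 0.9,  "nice": 0.1},
     "nice": {"good": 0.1, "bad": 0.85}}
print(violates_constraint("bad", {"good", "bad"}, {"nice"}, d))   # True
print(violates_constraint("good", {"good", "nice"}, {"bad"}, d))  # False
```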
To find P_min, we first construct a random partition of the adjectives, then locate the adjective that will most reduce the objective function if it is moved from its current cluster. We move this adjective and proceed with the next iteration until no movements can improve the objective function. At the final iteration, the cluster assignment of any adjective that violates constraint (1) is changed. This is a steepest-descent hill-climbing method, and thus is guaranteed to converge. However, it will in general find a local minimum rather than the global one; the problem is NP-complete (Garey and Johnson, 1979). We can arbitrarily increase the probability of finding the globally optimal solution by repeatedly running the algorithm with different starting partitions.
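A compact sketch of this procedure (omitting the final constraint-repair step, and storing dissimilarities as a dict of dicts; the four toy adjectives are illustrative):

```python
import itertools
import random

def phi(partition, d):
    """Objective: for each cluster, sum of within-cluster dissimilarities
    divided by cluster size."""
    total = 0.0
    for cluster in partition:
        if cluster:
            pairs = itertools.combinations(sorted(cluster), 2)
            total += sum(d[x][y] for x, y in pairs) / len(cluster)
    return total

def exchange_cluster(nodes, d, seed=0):
    """Steepest-descent exchange method: repeatedly move the single
    adjective whose relocation most reduces phi, until no move improves."""
    rng = random.Random(seed)
    part = [set(), set()]
    for n in nodes:
        part[rng.randrange(2)].add(n)
    while True:
        base, best = phi(part, d), None
        for i in (0, 1):
            for n in list(part[i]):
                part[i].remove(n); part[1 - i].add(n)   # try the move
                score = phi(part, d)
                if best is None or score < best[0]:
                    best = (score, n, i)
                part[1 - i].remove(n); part[i].add(n)   # undo it
        if best is None or best[0] >= base - 1e-12:
            return part
        _, n, i = best
        part[i].remove(n); part[1 - i].add(n)

# Toy dissimilarities: low within {good, nice} and {bad, awful}, high across.
adjs = ["good", "nice", "bad", "awful"]
d = {a: {} for a in adjs}
for a, b in itertools.combinations(adjs, 2):
    close = {a, b} <= {"good", "nice"} or {a, b} <= {"bad", "awful"}
    d[a][b] = d[b][a] = 0.1 if close else 0.9
print(sorted(sorted(c) for c in exchange_cluster(adjs, d)))
# [['awful', 'bad'], ['good', 'nice']]
```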
7 Labeling the Clusters as Positive or Negative
The clustering algorithm separates each component of the graph into two groups of adjectives, but does not actually label the adjectives as positive or negative. To accomplish that, we use a simple criterion that applies only to pairs or groups of words of opposite orientation. We have previously shown (Hatzivassiloglou and McKeown, 1995) that in oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the most frequent one about 81% of the time. This is relevant to our task because semantic markedness exhibits a strong correlation with orientation, the unmarked member almost always having positive orientation (Lehrer, 1985; Battistella, 1990).
We compute the average frequency of the words in each group, expecting the group with higher average frequency to contain the positive terms. This aggregation operation increases the precision of the labeling dramatically since indicators for many pairs of words are combined, even when some of the words are incorrectly assigned to their group.
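A sketch of the labeling criterion; the frequency counts below are invented for illustration, not drawn from the corpus.

```python
def label_groups(group_a, group_b, freq):
    """Return (positive, negative): the group with the higher average corpus
    frequency is labeled positive (unmarked members tend to be both more
    frequent and positively oriented). Ties default to group_a."""
    avg = lambda g: sum(freq[w] for w in g) / len(g)
    return (group_a, group_b) if avg(group_a) >= avg(group_b) else (group_b, group_a)

# Illustrative counts only.
freq = {"good": 500, "clever": 80, "bad": 300, "listless": 10}
pos, neg = label_groups({"good", "clever"}, {"bad", "listless"}, freq)
print(sorted(pos))  # ['clever', 'good']
```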
8 Results and Evaluation
Since graph connectivity affects performance, we devised a method of selecting test sets that makes this dependence explicit. Note that the graph density is largely a function of corpus size, and thus can be increased by adding more data. Nevertheless, we report results on sparser test sets to show how our algorithm scales up.
We separated our sets of adjectives A (containing 1,336 adjectives) and conjunction- and morphology-based links L (containing 2,838 links) into training and testing groups by selecting, for several values of the parameter α, the maximal subset of A, A_α, which includes an adjective x if and only if there exist at least α links from L between x and other elements of A_α. This operation in turn defines a subset of L, L_α, which includes all links between members of A_α. We train our log-linear model on L − L_α (excluding links between morphologically related adjectives), compute predictions and dissimilarities for the links in L_α, and use these to classify and label the adjectives in A_α. α must be at least 2, since we need to leave some links for training.
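The construction of A_α is a maximal-core computation: repeatedly dropping under-connected adjectives yields the unique maximal subset in which every member keeps at least α links. A sketch, on a toy graph:

```python
def alpha_core(adjectives, links, alpha):
    """Maximal subset in which every adjective has at least `alpha` links
    to other members, plus the links surviving between members."""
    members = set(adjectives)
    changed = True
    while changed:
        changed = False
        for a in list(members):
            degree = sum(1 for (x, y) in links
                         if a in (x, y) and x in members and y in members)
            if degree < alpha:
                members.discard(a)  # under-connected: drop and re-check
                changed = True
    kept_links = [(x, y) for (x, y) in links if x in members and y in members]
    return members, kept_links

# "d" has only one link, so it is pruned for alpha = 2.
members, kept = alpha_core(["a", "b", "c", "d"],
                           [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
                           alpha=2)
print(sorted(members))  # ['a', 'b', 'c']
```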
Table 3 shows the results of these experiments for α = 2 to 5. Our method produced the correct classification between 78% of the time on the sparsest test set up to more than 92% of the time when a higher number of links was present. Moreover, in all cases, the ratio of the two group frequencies correctly identified the positive subgroup. These results are extremely significant statistically (P-value less than 10^-16) when compared with the baseline method of randomly assigning orientations to adjectives, or the baseline method of always predicting the most frequent (for types) category (50.82% of the adjectives in our collection are classified as negative). Figure 2 shows some of the adjectives in set A_4 and their classifications.
Classified as positive:
bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty

Classified as negative:
ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful

Figure 2: Sample retrieved classifications of adjectives from set A_4. Correctly matched adjectives are shown in bold.
9 Graph Connectivity and Performance
A strong point of our method is that decisions on individual words are aggregated to provide decisions on how to group words into a class and whether to label the class as positive or negative. Thus, the overall result can be much more accurate than the individual indicators. To verify this, we ran a series of simulation experiments. Each experiment measures how our algorithm performs for a given level of precision P for identifying links and a given average number of links k for each word. The goal is to show that even when P is low, given enough data (i.e., high k), we can achieve high performance for the grouping.
As we noted earlier, the corpus data is eventually represented in our system as a graph, with the nodes corresponding to adjectives and the links to predictions about whether the two connected adjectives have the same or different orientation. Thus the parameter P in the simulation experiments measures how well we are able to predict each link independently of the others, and the parameter k measures the number of distinct adjectives each adjective appears with in conjunctions. P therefore directly represents the precision of the link classification algorithm, while k indirectly represents the corpus size.
To measure the effect of P and k (which are reflected in the graph topology), we need to carry out a series of experiments where we systematically vary their values. For example, as k (or the amount of data) increases for a given level of precision P for individual links, we want to measure how this affects overall accuracy of the resulting groups of nodes. Thus, we need to construct a series of data sets, or graphs, which represent different scenarios corresponding to a given combination of values of P and k. To do this, we construct a random graph by randomly assigning 50 nodes to the two possible orientations. Because we don't have frequency and morphology information on these abstract nodes, we cannot predict whether two nodes are of the same or different orientation. Rather, we randomly assign links between nodes so that, on average, each node participates in k links and 100 × P% of all links connect nodes of the same orientation. Then we consider these links as identified by the link prediction algorithm as connecting two nodes with the same orientation (so that 100 × P% of these predictions will be correct). This is equivalent to the baseline link classification method, and provides a lower bound on the performance of the algorithm actually used in our system (Section 5).
Because of the lack of actual measurements such as frequency on these abstract nodes, we also decouple the partitioning and labeling components of our system and score the partition found under the best matching conditions for the actual labels. Thus the simulation measures only how well the system separates positive from negative adjectives, not how well it determines which is which. However, in all the experiments performed on real corpus data (Section 8), the system correctly found the labels of the groups; any misclassifications came from misplacing an adjective in the wrong group. The whole procedure of constructing the random graph and finding and scoring the groups is repeated 200 times for any given combination of P and k, and the results are averaged, thus avoiding accidentally evaluating our system on a graph that is not truly representative of graphs with the given P and k.
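One way to sketch the graph construction is shown below. As a simplification, it fixes the link counts exactly (n·k/2 links, a fraction P of them same-orientation) rather than matching those proportions only on average.

```python
import itertools
import random

def simulation_graph(n=50, k=7, p=0.8, seed=0):
    """Random instance for one (P, k) setting: n nodes with random
    orientations, n*k/2 links, a fraction p of which join nodes of the
    same orientation (the rest join nodes of opposite orientation)."""
    rng = random.Random(seed)
    orient = {i: rng.choice((-1, 1)) for i in range(n)}
    same = [e for e in itertools.combinations(range(n), 2)
            if orient[e[0]] == orient[e[1]]]
    diff = [e for e in itertools.combinations(range(n), 2)
            if orient[e[0]] != orient[e[1]]]
    m = n * k // 2                     # total number of links
    m_same = round(m * p)              # links marked "same orientation"
    links = rng.sample(same, m_same) + rng.sample(diff, m - m_same)
    return orient, links

orient, links = simulation_graph()
print(len(links))  # 175
```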
We observe (Figure 3) that even for relatively low P, our ability to correctly classify the nodes approaches very high levels with a modest number of links. For P = 0.8, we need only about 7 links per adjective for classification performance over 90% and only 12 links per adjective for performance over 99%.6 The difference between low and high values of P is in the rate at which increasing data increases overall precision. These results are somewhat more optimistic than those obtained with real data (Section 8), a difference which is probably due to the uniform distributional assumptions in the simulation. Nevertheless, we expect the trends to be similar to the ones shown in Figure 3 and the results of Table 3 on real data support this expectation.
10 Conclusion and Future Work
We have proposed and verified from corpus data constraints on the semantic orientations of conjoined adjectives. We used these constraints to automatically construct a log-linear regression model, which, combined with supplementary morphology rules, predicts whether two conjoined adjectives are of same

6 12 links per adjective for a set of n adjectives requires 6n conjunctions between the n adjectives in the corpus.
[Figure 3 consists of four plots of classification accuracy (50-100%) against the average number of neighbors per node, for panels (a) P = 0.75, (b) P = 0.8, (c) P = 0.85, and (d) P = 0.9.]

Figure 3: Simulation results obtained on 50 nodes. In each figure, the last x coordinate indicates the (average) maximum possible value of k for this P, and the dotted line shows the performance of a random classifier.
or different orientation with 82% accuracy. We then
classified several sets of adjectives according to the
links inferred in this way and labeled them as posi-
tive or negative, obtaining 92% accuracy on the clas-
sification task for reasonably dense graphs and 100%
accuracy on the labeling task. Simulation experi-
ments establish that very high levels of performance
can be obtained with a modest number of links per
word, even when the links themselves are not always
correctly classified.
As part of our clustering algorithm's output, a "goodness-of-fit" measure for each word is computed, based on Rousseeuw's (1987) silhouettes. This measure ranks the words according to how well they fit in their group, and can thus be used as a quantitative measure of orientation, refining the binary positive-negative distinction. By restricting the labeling decisions to words with high values of this measure we can also increase the precision of our system, at the cost of sacrificing some coverage.

We are currently combining the output of this system with a semantic group finding system so that we can automatically identify antonyms from the corpus, without access to any semantic descriptions. The learned semantic categorization of the adjectives can also be used in the reverse direction, to help in interpreting the conjunctions they participate in. We will also extend our analyses to nouns and verbs.
Acknowledgements

This work was supported in part by the Office of Naval Research under grant N00014-95-1-0745, jointly by the Office of Naval Research and the Advanced Research Projects Agency under grant N00014-89-J-1782, by the National Science Foundation under grant GER-90-24069, and by the New York State Center for Advanced Technology under contracts NYSSTF-CAT(95)-013 and NYSSTF-CAT(96)-013. We thank Ken Church and the AT&T Bell Laboratories for making the PARTS part-of-speech tagger available to us. We also thank Dragomir Radev, Eric Siegel, and Gregory Sean McKinley who provided models for the categorization of the adjectives in our training and testing sets as positive and negative.
References
Jean-Claude Anscombre and Oswald Ducrot. 1983. L'Argumentation dans la Langue. Philosophie et Langage. Pierre Mardaga, Brussels, Belgium.
Edwin L. Battistella. 1990. Markedness: The Eval-
uative Superstructure of Language. State Univer-
sity of New York Press, Albany, New York.
Peter F. Brown, Vincent J. della Pietra, Peter V.
de Souza, Jennifer C. Lai, and Robert L. Mercer.
1992. Class-based n-gram models of natural lan-
guage. Computational Linguistics, 18(4):467-479.
Kenneth W. Church. 1988. A stochastic parts
program and noun phrase parser for unrestricted
text. In Proceedings of the Second Conference on
Applied Natural Language Processing (ANLP-88),
pages 136-143, Austin, Texas, February. Associa-
tion for Computational Linguistics.
W. J. Conover. 1980. Practical Nonparametric
Statistics. Wiley, New York, 2nd edition.
Richard O. Duda and Peter E. Hart. 1973. Pattern
Classification and Scene Analysis. Wiley, New
York.
Michael Elhadad and Kathleen R. McKeown. 1990.
A procedure for generating connectives. In Pro-
ceedings of COLING, Helsinki, Finland, July.
Michael R. Garey and David S. Johnson. 1979.
Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman, San
Francisco, California.
Vasileios Hatzivassiloglou and Kathleen R. McKe-
own. 1993. Towards the automatic identification
of adjectival scales: Clustering adjectives accord-
ing to meaning. In Proceedings of the 31st Annual
Meeting of the ACL, pages 172-182, Columbus,
Ohio, June. Association for Computational Lin-
guistics.
Vasileios Hatzivassiloglou and Kathleen R. McKe-
own. 1995. A quantitative evaluation of linguis-
tic tests for the automatic prediction of semantic
markedness. In Proceedings of the 33rd Annual
Meeting of the ACL, pages 197-204, Boston, Mas-
sachusetts, June. Association for Computational
Linguistics.
John S. Justeson and Slava M. Katz. 1991. Co-
occurrences of antonymous adjectives and their
contexts. Computational Linguistics, 17(1):1-19.
Adrienne Lehrer. 1974. Semantic Fields and Lexical
Structure. North Holland, Amsterdam and New
York.
Adrienne Lehrer. 1985. Markedness and antonymy.
Journal of Linguistics, 21(3):397-429, September.
John Lyons. 1977. Semantics, volume 1. Cambridge
University Press, Cambridge, England.
Peter McCullagh and John A. Nelder. 1989. Gen-
eralized Linear Models. Chapman and Hall, Lon-
don, 2nd edition.
George A. Miller, Richard Beckwith, Christiane Fell-
baum, Derek Gross, and Katherine J. Miller.
1990. Introduction to WordNet: An on-line lexi-
cal database. International Journal of Lexicogra-
phy (special issue), 3(4):235-312.
Fernando Pereira, Naftali Tishby, and Lillian Lee.
1993. Distributional clustering of English words.
In Proceedings of the 31st Annual Meeting of the
ACL, pages 183-190, Columbus, Ohio, June. As-
sociation for Computational Linguistics.
Peter J. Rousseeuw. 1987. Silhouettes: A graphical
aid to the interpretation and validation of cluster
analysis. Journal of Computational and Applied
Mathematics, 20:53-65.
Thomas J. Santner and Diane E. Duffy. 1989. The
Statistical Analysis of Discrete Data. Springer-
Verlag, New York.
Helmuth Späth. 1985. Cluster Dissection and Anal-
ysis: Theory, FORTRAN Programs, Examples.
Ellis Horwood, Chichester, West Sussex, England.
Thumbs Up or Thumbs Down? Semantic Orientation Applied to
Unsupervised Classification of Reviews
Peter D. Turney
Institute for Information Technology
National Research Council of Canada
Ottawa, Ontario, Canada, K1A 0R6
[email protected]
Abstract
This paper presents a simple unsupervised
learning algorithm for classifying reviews
as recommended (thumbs up) or not rec-
ommended (thumbs down). The classifi-
cation of a review is predicted by the
average semantic orientation of the
phrases in the review that contain adjec-
tives or adverbs. A phrase has a positive
semantic orientation when it has good as-
sociations (e.g., “subtle nuances”) and a
negative semantic orientation when it has
bad associations (e.g., “very cavalier”). In
this paper, the semantic orientation of a
phrase is calculated as the mutual infor-
mation between the given phrase and the
word “excellent” minus the mutual
information between the given phrase and
the word “poor”. A review is classified as
recommended if the average semantic ori-
entation of its phrases is positive. The al-
gorithm achieves an average accuracy of
74% when evaluated on 410 reviews from
Epinions, sampled from four different
domains (reviews of automobiles, banks,
movies, and travel destinations). The ac-
curacy ranges from 84% for automobile
reviews to 66% for movie reviews.
1 Introduction
If you are considering a vacation in Akumal, Mex-
ico, you might go to a search engine and enter the
query “Akumal travel review”. However, in this
case, Google1 reports about 5,000 matches. It
would be useful to know what fraction of these
matches recommend Akumal as a travel destina-
tion. With an algorithm for automatically classify-
ing a review as “thumbs up” or “thumbs down”, it
would be possible for a search engine to report
such summary statistics. This is the motivation for
the research described here. Other potential appli-
cations include recognizing “flames” (abusive
newsgroup messages) (Spertus, 1997) and develop-
ing new kinds of search tools (Hearst, 1992).
In this paper, I present a simple unsupervised
learning algorithm for classifying a review as rec-
ommended or not recommended. The algorithm
takes a written review as input and produces a
classification as output. The first step is to use a
part-of-speech tagger to identify phrases in the in-
put text that contain adjectives or adverbs (Brill,
1994). The second step is to estimate the semantic
orientation of each extracted phrase (Hatzivassi-
loglou & McKeown, 1997). A phrase has a posi-
tive semantic orientation when it has good
associations (e.g., “romantic ambience”) and a
negative semantic orientation when it has bad as-
sociations (e.g., “horrific events”). The third step is
to assign the given review to a class, recommended
or not recommended, based on the average seman-
tic orientation of the phrases extracted from the re-
view. If the average is positive, the prediction is
that the review recommends the item it discusses.
Otherwise, the prediction is that the item is not
recommended.
The PMI-IR algorithm is employed to estimate
the semantic orientation of a phrase (Turney,
2001). PMI-IR uses Pointwise Mutual Information
(PMI) and Information Retrieval (IR) to measure
the similarity of pairs of words or phrases. The se-
1 http://www.google.com
mantic orientation of a given phrase is calculated
by comparing its similarity to a positive reference
word (“excellent”) with its similarity to a negative
reference word (“poor”). More specifically, a
phrase is assigned a numerical rating by taking the
mutual information between the given phrase and
the word “excellent” and subtracting the mutual
information between the given phrase and the word
“poor”. In addition to determining the direction of
the phrase’s semantic orientation (positive or nega-
tive, based on the sign of the rating), this numerical
rating also indicates the strength of the semantic
orientation (based on the magnitude of the num-
ber). The algorithm is presented in Section 2.
Hatzivassiloglou and McKeown (1997) have
also developed an algorithm for predicting seman-
tic orientation. Their algorithm performs well, but
it is designed for isolated adjectives, rather than
phrases containing adjectives or adverbs. This is
discussed in more detail in Section 3, along with
other related work.
The classification algorithm is evaluated on 410
reviews from Epinions2, randomly sampled from
four different domains: reviews of automobiles,
banks, movies, and travel destinations. Reviews at
Epinions are not written by professional writers;
any person with a Web browser can become a
member of Epinions and contribute a review. Each
of these 410 reviews was written by a different au-
thor. Of these reviews, 170 are not recommended
and the remaining 240 are recommended (these
classifications are given by the authors). Always
guessing the majority class would yield an accu-
racy of 59%. The algorithm achieves an average
accuracy of 74%, ranging from 84% for automo-
bile reviews to 66% for movie reviews. The ex-
perimental results are given in Section 4.
The interpretation of the experimental results,
the limitations of this work, and future work are
discussed in Section 5. Potential applications are
outlined in Section 6. Finally, conclusions are pre-
sented in Section 7.
2 Classifying Reviews
The first step of the algorithm is to extract phrases
containing adjectives or adverbs. Past work has
demonstrated that adjectives are good indicators of
subjective, evaluative sentences (Hatzivassiloglou
2 http://www.epinions.com
& Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2001).
However, although an isolated adjective may indi-
cate subjectivity, there may be insufficient context
to determine semantic orientation. For example,
the adjective “unpredictable” may have a negative
orientation in an automotive review, in a phrase
such as “unpredictable steering”, but it could have
a positive orientation in a movie review, in a
phrase such as “unpredictable plot”. Therefore the
algorithm extracts two consecutive words, where
one member of the pair is an adjective or an adverb
and the second provides context.
First a part-of-speech tagger is applied to the
review (Brill, 1994).3 Two consecutive words are
extracted from the review if their tags conform to
any of the patterns in Table 1. The JJ tags indicate
adjectives, the NN tags are nouns, the RB tags are
adverbs, and the VB tags are verbs.4 The second
pattern, for example, means that two consecutive
words are extracted if the first word is an adverb
and the second word is an adjective, but the third
word (which is not extracted) cannot be a noun.
NNP and NNPS (singular and plural proper nouns)
are avoided, so that the names of the items in the
review cannot influence the classification.
Table 1. Patterns of tags for extracting two-word
phrases from reviews.

     First Word           Second Word            Third Word (Not Extracted)
  1. JJ                   NN or NNS              anything
  2. RB, RBR, or RBS      JJ                     not NN nor NNS
  3. JJ                   JJ                     not NN nor NNS
  4. NN or NNS            JJ                     not NN nor NNS
  5. RB, RBR, or RBS      VB, VBD, VBN, or VBG   anything
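The patterns in Table 1 can be sketched directly in code. The sketch below is my own (not the paper's implementation) and assumes the review has already been tagged; the example sentence and its tags are invented. Note how the third-word restriction correctly blocks “very talented” when a noun follows, exactly as pattern 2 requires.

```python
# Sketch of the two-word phrase extraction step from Table 1.
# Input: a list of (word, Penn-Treebank-tag) pairs.

PATTERNS = [
    # (first-tag set, second-tag set, forbidden third-tag set)
    ({"JJ"}, {"NN", "NNS"}, set()),                               # pattern 1
    ({"RB", "RBR", "RBS"}, {"JJ"}, {"NN", "NNS"}),                # pattern 2
    ({"JJ"}, {"JJ"}, {"NN", "NNS"}),                              # pattern 3
    ({"NN", "NNS"}, {"JJ"}, {"NN", "NNS"}),                       # pattern 4
    ({"RB", "RBR", "RBS"}, {"VB", "VBD", "VBN", "VBG"}, set()),   # pattern 5
]

def extract_phrases(tagged):
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # The third word is checked but never extracted.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, not_third in PATTERNS:
            if t1 in first and t2 in second and t3 not in not_third:
                phrases.append(w1 + " " + w2)
                break
    return phrases

tagged = [("the", "DT"), ("very", "RB"), ("talented", "JJ"),
          ("actor", "NN"), ("gave", "VBD"), ("a", "DT"),
          ("romantic", "JJ"), ("ambience", "NN")]
print(extract_phrases(tagged))  # ['talented actor', 'romantic ambience']
```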
The second step is to estimate the semantic ori-
entation of the extracted phrases, using the PMI-IR
algorithm. This algorithm uses mutual information
as a measure of the strength of semantic associa-
tion between two words (Church & Hanks, 1989).
PMI-IR has been empirically evaluated using 80
synonym test questions from the Test of English as
a Foreign Language (TOEFL), obtaining a score of
74% (Turney, 2001). For comparison, Latent Se-
mantic Analysis (LSA), another statistical measure
of word association, attains a score of 64% on the
3 http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z
4 See Santorini (1995) for a complete description of the tags.
same 80 TOEFL questions (Landauer & Dumais,
1997).
The Pointwise Mutual Information (PMI) be-
tween two words, word1 and word2, is defined as
follows (Church & Hanks, 1989):
PMI(word1, word2) = log2 [ p(word1 & word2) / ( p(word1) p(word2) ) ]   (1)
Here, p(word1 & word2) is the probability that
word1 and word2 co-occur. If the words are statisti-
cally independent, then the probability that they
co-occur is given by the product p(word1)
p(word2). The ratio between p(word1 & word2) and
p(word1) p(word2) is thus a measure of the degree
of statistical dependence between the words. The
log of this ratio is the amount of information that
we acquire about the presence of one of the words
when we observe the other.
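Equation (1) can be illustrated with a toy corpus in place of a search index. Everything below (the four miniature “documents” and the counting scheme) is invented for illustration; PMI-IR itself estimates these probabilities from search-engine hit counts rather than a local corpus.

```python
import math

# Toy illustration of Equation (1): PMI from document co-occurrence counts.

docs = [
    {"excellent", "service"},
    {"excellent", "food", "service"},
    {"poor", "service"},
    {"food", "poor"},
]

def p(*words):
    # Fraction of documents containing all the given words.
    return sum(all(w in d for w in words) for d in docs) / len(docs)

def pmi(w1, w2):
    return math.log2(p(w1, w2) / (p(w1) * p(w2)))

# "excellent" and "service" co-occur more often than chance -> positive PMI.
print(pmi("excellent", "service"))
```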
The Semantic Orientation (SO) of a phrase,
phrase, is calculated here as follows:
SO(phrase) = PMI(phrase, “excellent”) − PMI(phrase, “poor”)   (2)
The reference words “excellent” and “poor” were
chosen because, in the five star review rating sys-
tem, it is common to define one star as “poor” and
five stars as “excellent”. SO is positive when
phrase is more strongly associated with “excellent”
and negative when phrase is more strongly associ-
ated with “poor”.
PMI-IR estimates PMI by issuing queries to a
search engine (hence the IR in PMI-IR) and noting
the number of hits (matching documents). The fol-
lowing experiments use the AltaVista Advanced
Search engine5, which indexes approximately 350
million web pages (counting only those pages that
are in English). I chose AltaVista because it has a
NEAR operator. The AltaVista NEAR operator
constrains the search to documents that contain the
words within ten words of one another, in either
order. Previous work has shown that NEAR per-
forms better than AND when measuring the
strength of semantic association between words
(Turney, 2001).
Let hits(query) be the number of hits returned,
given the query query. The following estimate of
SO can be derived from equations (1) and (2) with
5 http://www.altavista.com/sites/search/adv
some minor algebraic manipulation, if co-
occurrence is interpreted as NEAR:
SO(phrase) = log2 [ ( hits(phrase NEAR “excellent”) hits(“poor”) ) /
                    ( hits(phrase NEAR “poor”) hits(“excellent”) ) ]   (3)
Equation (3) is a log-odds ratio (Agresti, 1996).
To avoid division by zero, I added 0.01 to the hits.
I also skipped phrase when both hits(phrase
NEAR “excellent”) and hits(phrase NEAR
“poor”) were (simultaneously) less than four.
These numbers (0.01 and 4) were arbitrarily cho-
sen. To eliminate any possible influence from the
testing data, I added “AND (NOT host:epinions)”
to every query, which tells AltaVista not to include
the Epinions Web site in its searches.
The third step is to calculate the average seman-
tic orientation of the phrases in the given review
and classify the review as recommended if the av-
erage is positive and otherwise not recommended.
Table 2 shows an example for a recommended
review and Table 3 shows an example for a not
recommended review. Both are reviews of the
Bank of America. Both are in the collection of 410
reviews from Epinions that are used in the experi-
ments in Section 4.
Table 2. An example of the processing of a review that
the author has classified as recommended.6

Extracted Phrase         Part-of-Speech Tags   Semantic Orientation
online experience        JJ NN                  2.253
low fees                 JJ NNS                 0.333
local branch             JJ NN                  0.421
small part               JJ NN                  0.053
online service           JJ NN                  2.780
printable version        JJ NN                 -0.705
direct deposit           JJ NN                  1.288
well other               RB JJ                  0.237
inconveniently located   RB VBN                -1.541
other bank               JJ NN                 -0.850
true service             JJ NN                 -0.732
Average Semantic Orientation                    0.322
6 The semantic orientation in the following tables is calculated
using the natural logarithm (base e), rather than base 2. The
natural log is more common in the literature on log-odds ratio.
Since all logs are equivalent up to a constant factor, it makes
no difference for the algorithm.
Table 3. An example of the processing of a review that
the author has classified as not recommended.

Extracted Phrase         Part-of-Speech Tags   Semantic Orientation
little difference        JJ NN                 -1.615
clever tricks            JJ NNS                -0.040
programs such            NNS JJ                 0.117
possible moment          JJ NN                 -0.668
unethical practices      JJ NNS                -8.484
low funds                JJ NNS                -6.843
old man                  JJ NN                 -2.566
other problems           JJ NNS                -2.748
probably wondering       RB VBG                -1.830
virtual monopoly         JJ NN                 -2.050
other bank               JJ NN                 -0.850
extra day                JJ NN                 -0.286
direct deposits          JJ NNS                 5.771
online web               JJ NN                  1.936
cool thing               JJ NN                  0.395
very handy               RB JJ                  1.349
lesser evil              RBR JJ                -2.288
Average Semantic Orientation                   -1.218
3 Related Work
This work is most closely related to Hatzivassi-
loglou and McKeown’s (1997) work on predicting
the semantic orientation of adjectives. They note
that there are linguistic constraints on the semantic
orientations of adjectives in conjunctions. As an
example, they present the following three sen-
tences (Hatzivassiloglou & McKeown, 1997):
1. The tax proposal was simple and well-
received by the public.
2. The tax proposal was simplistic but well-
received by the public.
3. (*) The tax proposal was simplistic and
well-received by the public.
The third sentence is incorrect, because we use
“and” with adjectives that have the same semantic
orientation (“simple” and “well-received” are both
positive), but we use “but” with adjectives that
have different semantic orientations (“simplistic”
is negative).
Hatzivassiloglou and McKeown (1997) use a
four-step supervised learning algorithm to infer the
semantic orientation of adjectives from constraints
on conjunctions:
1. All conjunctions of adjectives are extracted
from the given corpus.
2. A supervised learning algorithm combines
multiple sources of evidence to label pairs of
adjectives as having the same semantic orienta-
tion or different semantic orientations. The re-
sult is a graph where the nodes are adjectives
and links indicate sameness or difference of
semantic orientation.
3. A clustering algorithm processes the graph
structure to produce two subsets of adjectives,
such that links across the two subsets are
mainly different-orientation links, and links in-
side a subset are mainly same-orientation links.
4. Since it is known that positive adjectives
tend to be used more frequently than negative
adjectives, the cluster with the higher average
frequency is classified as having positive se-
mantic orientation.
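The input and goal of steps 2 and 3 can be illustrated with a toy example. The sketch below is not the authors' clustering algorithm (the real problem is NP-hard and needs an iterative method); it simply searches exhaustively for the two-way split that violates the fewest same/different links. The links themselves are invented here.

```python
from itertools import product

# Toy illustration of step 3: split adjectives into two groups so that
# "same"-orientation links stay inside a group and "diff" links cross.
links = [  # (adjective, adjective, "same" or "diff") -- invented examples
    ("simple", "well-received", "same"),
    ("simplistic", "well-received", "diff"),
    ("simple", "simplistic", "diff"),
    ("poor", "simplistic", "same"),
]
words = sorted({w for a, b, _ in links for w in (a, b)})

def violations(assign):
    bad = 0
    for a, b, kind in links:
        same_side = assign[a] == assign[b]
        bad += (kind == "same") != same_side
    return bad

# Exhaustive search over all 2^n assignments (fine for a toy example).
best = min((dict(zip(words, bits))
            for bits in product([0, 1], repeat=len(words))),
           key=violations)
groups = [sorted(w for w in words if best[w] == g) for g in (0, 1)]
print(sorted(groups))  # [['poor', 'simplistic'], ['simple', 'well-received']]
```

Step 4 would then label whichever group has the higher average corpus frequency as the positive one.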
This algorithm classifies adjectives with accuracies
ranging from 78% to 92%, depending on the
amount of training data that is available. The algo-
rithm can go beyond a binary positive-negative dis-
tinction, because the clustering algorithm (step 3
above) can produce a “goodness-of-fit” measure
that indicates how well an adjective fits in its as-
signed cluster.
Although they do not consider the task of clas-
sifying reviews, it seems their algorithm could be
plugged into the classification algorithm presented
in Section 2, where it would replace PMI-IR and
equation (3) in the second step. However, PMI-IR
is conceptually simpler, easier to implement, and it
can handle phrases and adverbs, in addition to iso-
lated adjectives.
As far as I know, the only prior published work
on the task of classifying reviews as thumbs up or
down is Tong’s (2001) system for generating sen-
timent timelines. This system tracks online discus-
sions about movies and displays a plot of the
number of positive sentiment and negative senti-
ment messages over time. Messages are classified
by looking for specific phrases that indicate the
sentiment of the author towards the movie (e.g.,
“great acting”, “wonderful visuals”, “terrible
score”, “uneven editing”). Each phrase must be
manually added to a special lexicon and manually
tagged as indicating positive or negative sentiment.
The lexicon is specific to the domain (e.g., movies)
and must be built anew for each new domain. The
company Mindfuleye7 offers a technology called
Lexant™ that appears similar to Tong’s (2001)
system.
Other related work is concerned with determin-
ing subjectivity (Hatzivassiloglou & Wiebe, 2000;
Wiebe, 2000; Wiebe et al., 2001). The task is to
distinguish sentences that present opinions and
evaluations from sentences that objectively present
factual information (Wiebe, 2000). Wiebe et al.
(2001) list a variety of potential applications for
automated subjectivity tagging, such as recogniz-
ing “flames” (Spertus, 1997), classifying email,
recognizing speaker role in radio broadcasts, and
mining reviews. In several of these applications,
the first step is to recognize that the text is subjec-
tive and then the natural second step is to deter-
mine the semantic orientation of the subjective
text. For example, a flame detector cannot merely
detect that a newsgroup message is subjective, it
must further detect that the message has a negative
semantic orientation; otherwise a message of praise
could be classified as a flame.
Hearst (1992) observes that most search en-
gines focus on finding documents on a given topic,
but do not allow the user to specify the directional-
ity of the documents (e.g., is the author in favor of,
neutral, or opposed to the event or item discussed
in the document?). The directionality of a docu-
ment is determined by its deep argumentative
structure, rather than a shallow analysis of its ad-
jectives. Sentences are interpreted metaphorically
in terms of agents exerting force, resisting force,
and overcoming resistance. It seems likely that
there could be some benefit to combining shallow
and deep analysis of the text.
4 Experiments
Table 4 describes the 410 reviews from Epinions
that were used in the experiments. 170 (41%) of
the reviews are not recommended and the remain-
ing 240 (59%) are recommended. Always guessing
the majority class would yield an accuracy of 59%.
The third column shows the average number of
phrases that were extracted from the reviews.
Table 5 shows the experimental results. Except
for the travel reviews, there is surprisingly little
variation in the accuracy within a domain. In addi-
7 http://www.mindfuleye.com/
tion to recommended and not recommended, Epin-
ions reviews are classified using the five star rating
system. The third column shows the correlation be-
tween the average semantic orientation and the
number of stars assigned by the author of the re-
view. The results show a strong positive correla-
tion between the average semantic orientation and
the author’s rating out of five stars.
Table 4. A summary of the corpus of reviews.

Domain of Review        Number of Reviews   Average Phrases per Review
Automobiles                    75                 20.87
  Honda Accord                 37                 18.78
  Volkswagen Jetta             38                 22.89
Banks                         120                 18.52
  Bank of America              60                 22.02
  Washington Mutual            60                 15.02
Movies                        120                 29.13
  The Matrix                   60                 19.08
  Pearl Harbor                 60                 39.17
Travel Destinations            95                 35.54
  Cancun                       59                 30.02
  Puerto Vallarta              36                 44.58
All                           410                 26.00
Table 5. The accuracy of the classification and the cor-
relation of the semantic orientation with the star rating.
Domain of Review Accuracy Correlation
Automobiles 84.00 % 0.4618
Honda Accord 83.78 % 0.2721
Volkswagen Jetta 84.21 % 0.6299
Banks 80.00 % 0.6167
Bank of America 78.33 % 0.6423
Washington Mutual 81.67 % 0.5896
Movies 65.83 % 0.3608
The Matrix 66.67 % 0.3811
Pearl Harbor 65.00 % 0.2907
Travel Destinations 70.53 % 0.4155
Cancun 64.41 % 0.4194
Puerto Vallarta 80.56 % 0.1447
All 74.39 % 0.5174
5 Discussion of Results
A natural question, given the preceding results, is
what makes movie reviews hard to classify? Table
6 shows that classification by the average SO tends
to err on the side of guessing that a review is not
recommended, when it is actually recommended.
This suggests the hypothesis that a good movie
will often contain unpleasant scenes (e.g., violence,
death, mayhem), and a recommended movie re-
view may thus have its average semantic orienta-
tion reduced if it contains descriptions of these un-
pleasant scenes. However, if we add a constant
value to the average SO of the movie reviews, to
compensate for this bias, the accuracy does not
improve. This suggests that, just as positive re-
views mention unpleasant things, so negative re-
views often mention pleasant scenes.
Table 6. The confusion matrix for movie classifications.

                               Author’s Classification
Average Semantic Orientation   Thumbs Up   Thumbs Down       Sum
Positive                        28.33 %      12.50 %       40.83 %
Negative                        21.67 %      37.50 %       59.17 %
Sum                             50.00 %      50.00 %      100.00 %
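As a quick consistency check (my own, not from the paper), the diagonal of Table 6, i.e. the correctly classified reviews, should reproduce the 65.83% movie accuracy reported in Table 5:

```python
# Diagonal of the Table 6 confusion matrix = movie classification accuracy.
confusion = {("positive", "up"): 28.33, ("positive", "down"): 12.50,
             ("negative", "up"): 21.67, ("negative", "down"): 37.50}
accuracy = confusion[("positive", "up")] + confusion[("negative", "down")]
print(accuracy)
```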
Table 7 shows some examples that lend support
to this hypothesis. For example, the phrase “ more
evil” does have negative connotations, thus an SO
of -4.384 is appropriate, but an evil character does
not make a bad movie. The difficulty with movie
reviews is that there are two aspects to a movie, the
events and actors in the movie (the elements of the
movie), and the style and art of the movie (the
movie as a gestalt; a unified whole). This is likely
also the explanation for the lower accuracy of the
Cancun reviews: good beaches do not necessarily
add up to a good vacation. On the other hand, good
automotive parts usually do add up to a good
automobile and good banking services add up to a
good bank. It is not clear how to address this issue.
Future work might look at whether it is possible to
tag sentences as discussing elements or wholes.
Another area for future work is to empirically
compare PMI-IR and the algorithm of Hatzivassi-
loglou and McKeown (1997). Although their algo-
rithm does not readily extend to two-word phrases,
I have not yet demonstrated that two-word phrases
are necessary for accurate classification of reviews.
On the other hand, it would be interesting to evalu-
ate PMI-IR on the collection of 1,336 hand-labeled
adjectives that were used in the experiments of
Hatzivassiloglou and McKeown (1997). A related
question for future work is the relationship of ac-
curacy of the estimation of semantic orientation at
the level of individual phrases to accuracy of re-
view classification. Since the review classification
is based on an average, it might be quite resistant
to noise in the SO estimate for individual phrases.
But it is possible that a better SO estimator could
produce significantly better classifications.
Table 7. Sample phrases from misclassified reviews.

Movie:               The Matrix
Author’s Rating:     recommended (5 stars)
Average SO:          -0.219 (not recommended)
Sample Phrase:       more evil [RBR JJ]
SO of Sample Phrase: -4.384
Context of Sample Phrase: The slow, methodical way he
    spoke. I loved it! It made him seem more arrogant
    and even more evil.

Movie:               Pearl Harbor
Author’s Rating:     recommended (5 stars)
Average SO:          -0.378 (not recommended)
Sample Phrase:       sick feeling [JJ NN]
SO of Sample Phrase: -8.308
Context of Sample Phrase: During this period I had a sick
    feeling, knowing what was coming, knowing what was
    part of our history.

Movie:               The Matrix
Author’s Rating:     not recommended (2 stars)
Average SO:          0.177 (recommended)
Sample Phrase:       very talented [RB JJ]
SO of Sample Phrase: 1.992
Context of Sample Phrase: Well as usual Keanu Reeves is
    nothing special, but surprisingly, the very talented
    Laurence Fishbourne is not so good either, I was
    surprised.

Movie:               Pearl Harbor
Author’s Rating:     not recommended (3 stars)
Average SO:          0.015 (recommended)
Sample Phrase:       blue skies [JJ NNS]
SO of Sample Phrase: 1.263
Context of Sample Phrase: Anyone who saw the trailer in
    the theater over the course of the last year will never
    forget the images of Japanese war planes swooping out
    of the blue skies, flying past the children playing
    baseball, or the truly remarkable shot of a bomb
    falling from an enemy plane into the deck of the USS
    Arizona.
Equation (3) is a very simple estimator of se-
mantic orientation. It might benefit from more so-
phisticated statistical analysis (Agresti, 1996). One
possibility is to apply a statistical significance test
to each estimated SO. There is a large statistical
literature on the log-odds ratio, which might lead
to improved results on this task.
This paper has focused on unsupervised classification, but average semantic orientation could be supplemented by other features in a supervised classification system. The other features could be based on the presence or absence of specific words, as is common in most text classification work. This could yield higher accuracies, but the intent here was to study this one feature in isolation, to simplify the analysis, before combining it with other features.
Table 5 shows a high correlation between the average semantic orientation and the star rating of a review. I plan to experiment with ordinal classification of reviews in the five-star rating system, using the algorithm of Frank and Hall (2001). For ordinal classification, the average semantic orientation would be supplemented with other features in a supervised classification system.
A limitation of PMI-IR is the time required to send queries to AltaVista. Inspection of Equation (3) shows that it takes four queries to calculate the semantic orientation of a phrase. However, I cached all query results, and since there is no need to recalculate hits("poor") and hits("excellent") for every phrase, each phrase requires an average of slightly less than two queries. As a courtesy to AltaVista, I used a five-second delay between queries.[8] The 410 reviews yielded 10,658 phrases, so the total time required to process the corpus was roughly 106,580 seconds, or about 30 hours.
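The caching scheme can be sketched as follows. Here `fetch` stands in for the live search-engine query (a hypothetical callable, since the actual AltaVista interface is not shown), and the courtesy delay is applied only when a query actually goes out to the engine.

```python
import time

class HitCounter:
    """Cache hit counts so repeated queries such as hits("poor") and
    hits("excellent") are answered from memory rather than re-fetched."""

    def __init__(self, fetch, delay=5.0):
        self.fetch = fetch          # callable: query string -> hit count
        self.delay = delay          # courtesy delay between live queries
        self.cache = {}
        self.live_queries = 0

    def hits(self, query):
        if query not in self.cache:
            if self.live_queries > 0:
                time.sleep(self.delay)   # pause only before a live query
            self.cache[query] = self.fetch(query)
            self.live_queries += 1
        return self.cache[query]
```

With this cache, hits("poor") and hits("excellent") are fetched once for the whole corpus, so each phrase costs roughly only its two NEAR queries, matching the "slightly less than two queries" figure above.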
This might appear to be a significant limitation, but extrapolation of current trends in computer memory capacity suggests that, in about ten years, the average desktop computer will be able to easily store and search AltaVista's 350 million Web pages. This will reduce the processing time to less than one second per review.
6 Applications
There are a variety of potential applications for
automated review rating. As mentioned in the introduction, one application is to provide summary statistics for search engines. Given the query "Akumal travel review", a search engine could report, "There are 5,000 hits, of which 80% are thumbs up and 20% are thumbs down." The search results could be sorted by average semantic orientation, so that the user could easily sample the most extreme reviews. Similarly, a search engine could allow the user to specify the topic and the rating of the desired reviews (Hearst, 1992).

[8] This line of research depends on the good will of the major search engines. For a discussion of the ethics of Web robots, see http://www.robotstxt.org/wc/robots.html. For query robots, the proposed extended standard for robot exclusion would be useful. See http://www.conman.org/people/spc/robots2.html.
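Such a report is a one-line aggregation once each hit has been rated; the sketch below is illustrative (the wording and integer rounding are my own, not from the paper).

```python
def thumbs_summary(labels):
    """Summarize per-review thumbs up / thumbs down labels for a result set."""
    n = len(labels)
    up = sum(1 for label in labels if label == "thumbs up")
    return (f"There are {n} hits, of which {100 * up // n}% are "
            f"thumbs up and {100 * (n - up) // n}% are thumbs down.")
```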
Preliminary experiments indicate that semantic orientation is also useful for summarization of reviews. A positive review could be summarized by picking out the sentence with the highest positive semantic orientation, and a negative review could be summarized by extracting the sentence with the lowest negative semantic orientation.
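Assuming each sentence has already been assigned an average SO (for instance, by averaging the SO of its extracted phrases), the extraction step is a simple argmax/argmin; the function name and interface here are illustrative.

```python
def summary_sentence(sentences, sentence_so, positive=True):
    """Pick the sentence with the most extreme semantic orientation.

    sentences:   list of sentence strings
    sentence_so: parallel list of average SO values per sentence
    """
    pick = max if positive else min
    best = pick(range(len(sentences)), key=lambda i: sentence_so[i])
    return sentences[best]
```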
Epinions asks its reviewers to provide a short description of pros and cons for the reviewed item. A pro/con summarizer could be evaluated by measuring the overlap between the reviewer's pros and cons and the phrases in the review that have the most extreme semantic orientation.
Another potential application is filtering "flames" for newsgroups (Spertus, 1997). There could be a threshold, such that a newsgroup message is held for verification by the human moderator when the semantic orientation of a phrase drops below the threshold. A related use might be a tool for helping academic referees when reviewing journal and conference papers. Ideally, referees are unbiased and objective, but sometimes their criticism can be unintentionally harsh. It might be possible to highlight passages in a draft referee's report where the choice of words should be modified towards a more neutral tone.
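The moderation threshold described above amounts to a single predicate over the phrase SO values of a message; a minimal sketch (the threshold value is illustrative and would be tuned per newsgroup):

```python
def hold_for_moderator(phrase_so_values, threshold=-5.0):
    """Flag a message for human review when any phrase's semantic
    orientation drops below the threshold."""
    return any(so < threshold for so in phrase_so_values)
```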
Tong's (2001) system for detecting and tracking opinions in on-line discussions could benefit from the use of a learning algorithm, instead of (or in addition to) a hand-built lexicon. With automated review rating (opinion rating), advertisers could track advertising campaigns, politicians could track public opinion, reporters could track public response to current events, stock traders could track financial opinions, and trend analyzers could track entertainment and technology trends.
7 Conclusions
This paper introduces a simple unsupervised learning algorithm for rating a review as thumbs up or down. The algorithm has three steps: (1) extract phrases containing adjectives or adverbs, (2) estimate the semantic orientation of each phrase, and (3) classify the review based on the average semantic orientation of the phrases. The core of the algorithm is the second step, which uses PMI-IR to calculate semantic orientation (Turney, 2001).
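Condensing the three steps into code, and assuming steps (1) and (2) have already produced a list of per-phrase SO values, the final classification (step 3) is just a sign test on the average:

```python
def classify_review(phrase_so_values):
    """Step (3): thumbs up if the average semantic orientation
    of the extracted phrases is positive, thumbs down otherwise."""
    average_so = sum(phrase_so_values) / len(phrase_so_values)
    return "thumbs up" if average_so > 0 else "thumbs down"
```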
In experiments with 410 reviews from Epinions, the algorithm attains an average accuracy of 74%. It appears that movie reviews are difficult to classify, because the whole is not necessarily the sum of the parts; thus the accuracy on movie reviews is about 66%. On the other hand, for banks and automobiles, it seems that the whole is the sum of the parts, and the accuracy is 80% to 84%. Travel reviews are an intermediate case.
Previous work on determining the semantic orientation of adjectives has used a complex algorithm that does not readily extend beyond isolated adjectives to adverbs or longer phrases (Hatzivassiloglou and McKeown, 1997). The simplicity of PMI-IR may encourage further work with semantic orientation.
The limitations of this work include the time required for queries and, for some applications, the level of accuracy that was achieved. The former difficulty will be eliminated by progress in hardware. The latter difficulty might be addressed by using semantic orientation combined with other features in a supervised classification algorithm.
Acknowledgements
Thanks to Joel Martin and Michael Littman for
helpful comments.
References
Agresti, A. 1996. An introduction to categorical data analysis. New York: Wiley.

Brill, E. 1994. Some advances in transformation-based part of speech tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (pp. 722-727). Menlo Park, CA: AAAI Press.

Church, K.W., & Hanks, P. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL (pp. 76-83). New Brunswick, NJ: ACL.

Frank, E., & Hall, M. 2001. A simple approach to ordinal classification. Proceedings of the Twelfth European Conference on Machine Learning (pp. 145-156). Berlin: Springer-Verlag.

Hatzivassiloglou, V., & McKeown, K.R. 1997. Predicting the semantic orientation of adjectives. Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL (pp. 174-181). New Brunswick, NJ: ACL.

Hatzivassiloglou, V., & Wiebe, J.M. 2000. Effects of adjective orientation and gradability on sentence subjectivity. Proceedings of the 18th International Conference on Computational Linguistics. New Brunswick, NJ: ACL.

Hearst, M.A. 1992. Direction-based text interpretation as an information access refinement. In P. Jacobs (Ed.), Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Mahwah, NJ: Lawrence Erlbaum Associates.

Landauer, T.K., & Dumais, S.T. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Santorini, B. 1995. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania.

Spertus, E. 1997. Smokey: Automatic recognition of hostile messages. Proceedings of the Conference on Innovative Applications of Artificial Intelligence (pp. 1058-1065). Menlo Park, CA: AAAI Press.

Tong, R.M. 2001. An operational system for detecting and tracking opinions in on-line discussions. Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification (pp. 1-6). New York, NY: ACM.

Turney, P.D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (pp. 491-502). Berlin: Springer-Verlag.

Wiebe, J.M. 2000. Learning subjective adjectives from corpora. Proceedings of the 17th National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

Wiebe, J.M., Bruce, R., Bell, M., Martin, M., & Wilson, T. 2001. A corpus study of evaluative and speculative language. Proceedings of the Second ACL SIG on Dialogue Workshop on Discourse and Dialogue. Aalborg, Denmark.

Accessible design: Minimum effort, maximum impact
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Predicting the Semantic Orientation of Adjectives

Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown
Department of Computer Science
450 Computer Science Building
Columbia University
New York, N.Y. 10027, USA
{vh, kathy}@cs.columbia.edu

Abstract

We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achieving 82% accuracy in this task when each conjunction is considered independently. Combining the constraints across many adjectives, a clustering algorithm separates the adjectives into groups of different orientations, and finally, adjectives are labeled positive or negative. Evaluations on real
data and simulation experiments indicate high levels of performance: classification precision is more than 90% for adjectives that occur in a modest number of conjunctions in the corpus.

1 Introduction

The semantic orientation or polarity of a word indicates the direction the word deviates from the norm for its semantic group or lexical field (Lehrer, 1974). It also constrains the word's usage in the language (Lyons, 1977), due to its evaluative characteristics (Battistella, 1990). For example, some nearly synonymous words differ in orientation because one implies desirability and the other does not (e.g., simple versus simplistic). In linguistic constructs such as conjunctions, which impose constraints on the semantic orientation of their arguments (Anscombre and Ducrot, 1983; Elhadad and McKeown, 1990), the choices of arguments and connective are mutually constrained, as illustrated by:

The tax proposal was
    simple and well-received
    simplistic but well-received
    *simplistic and well-received
by the public.

In addition, almost all antonyms have different semantic orientations.¹ If we know that two words relate to the same property (for example, members of the same scalar group such as hot and cold) but have different orientations, we can usually infer that
they are antonyms. Given that semantically similar words can be identified automatically on the basis of distributional properties and linguistic cues (Brown et al., 1992; Pereira et al., 1993; Hatzivassiloglou and McKeown, 1993), identifying the semantic orientation of words would allow a system to further refine the retrieved semantic similarity relationships, extracting antonyms.

Unfortunately, dictionaries and similar sources (thesauri, WordNet (Miller et al., 1990)) do not include semantic orientation information.² Explicit links between antonyms and synonyms may also be lacking, particularly when they depend on the domain of discourse; for example, the opposition bear-bull appears only in stock market reports, where the two words take specialized meanings.

In this paper, we present and evaluate a method that automatically retrieves semantic orientation information using indirect information collected from a large corpus. Because the method relies on the corpus, it extracts domain-dependent information and automatically adapts to a new domain when the corpus is changed. Our method achieves high precision (more than 90%), and, while our focus to date has been on adjectives, it can be directly applied to other word classes. Ultimately, our goal is to use this method in a larger system to automatically identify antonyms and distinguish near synonyms.

¹ Exceptions include a small number of terms that are both negative from a pragmatic viewpoint and yet stand in an antonymic relationship; such terms frequently lexicalize two unwanted extremes, e.g., verbose-terse.
² Except implicitly, in the form of definitions and usage examples.

2 Overview of Our Approach

Our approach relies on an analysis of textual corpora that correlates linguistic features, or indicators, with semantic orientation. While no direct indicators of positive or negative semantic orientation have been proposed,³ we demonstrate that conjunctions between adjectives provide indirect information about orientation. For most connectives, the conjoined adjectives usually are of the same orientation: compare fair and legitimate and corrupt and brutal, which actually occur in our corpus, with *fair and brutal and *corrupt and legitimate (or the other cross-products of the above conjunctions), which are semantically anomalous. The situation is reversed for but, which usually connects two adjectives of different orientations.

The system identifies and uses this indirect information in the following stages:

1. All conjunctions of adjectives are extracted from the corpus along with relevant morphological relations.

2. A log-linear regression model combines information from different conjunctions to determine if each two conjoined adjectives are of same
or different orientation. The result is a graph with hypothesized same- or different-orientation links between adjectives.

3. A clustering algorithm separates the adjectives into two subsets of different orientation. It places as many words of same orientation as possible into the same subset.

4. The average frequencies in each group are compared and the group with the higher frequency is labeled as positive.

In the following sections, we first present the set of adjectives used for training and evaluation. We next validate our hypothesis that conjunctions constrain the orientation of conjoined adjectives and then describe the remaining three steps of the algorithm. After presenting our results and evaluation, we discuss simulation experiments that show how our method performs under different conditions of sparseness of data.

3 Data Collection

For our experiments, we use the 21 million word 1987 Wall Street Journal corpus,⁴ automatically annotated with part-of-speech tags using the PARTS tagger (Church, 1988).

In order to verify our hypothesis about the orientations of conjoined adjectives, and also to train and evaluate our subsequent algorithms, we need a

³ Certain words inflected with negative affixes (such as in- or un-) tend to be mostly negative, but this rule
applies only to a fraction of the negative words. Furthermore, there are words so inflected which have positive orientation, e.g., independent and unbiased.
⁴ Available from the ACL Data Collection Initiative as CD ROM 1.

Positive: adequate central clever famous intelligent remarkable reputed sensitive slender thriving
Negative: contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting

Figure 1: Randomly selected adjectives with positive and negative orientations.

set of adjectives with predetermined orientation labels. We constructed this set by taking all adjectives appearing in our corpus 20 times or more, then removing adjectives that have no orientation. These are typically members of groups of complementary, qualitative terms (Lyons, 1977), e.g., domestic or medical.

We then assigned an orientation label (either + or −) to each adjective, using an evaluative approach. The criterion was whether the use of this adjective ascribes in general a positive or negative quality to the modified item, making it better or worse than a similar unmodified item. We were unable to reach a unique label out of context for several adjectives which we removed from consideration; for example, cheap is positive if it is used as a synonym of inexpensive, but negative if it implies inferior quality.
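A candidate-selection pass of the kind described above can be sketched as follows; this is a minimal illustration (the function name, the Penn Treebank "JJ" tag convention, and the toy corpus are ours, not the authors' code):

```python
from collections import Counter

def select_candidates(tagged_tokens, min_count=20):
    """Collect adjectives occurring at least `min_count` times.

    `tagged_tokens` is an iterable of (word, pos) pairs, with
    adjectives tagged "JJ" as in the Penn Treebank tag set.
    """
    counts = Counter(w.lower() for w, pos in tagged_tokens if pos == "JJ")
    return {w for w, c in counts.items() if c >= min_count}

# Toy corpus: "simple" clears the threshold, "simplistic" does not.
corpus = [("simple", "JJ"), ("tax", "NN")] * 25 + [("simplistic", "JJ")]
print(select_candidates(corpus))  # {'simple'}
```

In the paper, the surviving adjectives were then hand-labeled; the filtering step only bounds the candidate set.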
The operations of selecting adjectives and assigning labels were performed before testing our conjunction hypothesis or implementing any other algorithms, to avoid any influence on our labels. The final set contained 1,336 adjectives (657 positive and 679 negative terms). Figure 1 shows randomly selected terms from this set.

To further validate our set of labeled adjectives, we subsequently asked four people to independently label a randomly drawn sample of 500 of these adjectives. They agreed with us that the positive/negative concept applies to 89.15% of these adjectives on average. For the adjectives where a positive or negative label was assigned by both us and the independent evaluators, the average agreement on the label was 97.38%. The average inter-reviewer agreement on labeled adjectives was 96.97%. These results are extremely significant statistically and compare favorably with validation studies performed for other tasks (e.g., sense disambiguation) in the past. They show that positive and negative orientation are objective properties that can be reliably determined by humans.

To extract conjunctions between adjectives, we used a two-level finite-state grammar, which covers complex modification patterns and noun-adjective apposition. Running this parser on the 21 million word corpus, we collected 13,426 conjunctions of adjectives, expanding to a total of 15,431 conjoined adjective pairs. After morphological transformations, the remaining 15,048 conjunction tokens involve 9,296 distinct pairs of conjoined adjectives (types). Each conjunction token is classified by the parser according to three variables: the conjunction used (and, or, but, either-or, or neither-nor), the type of modification (attributive, predicative, appositive, resultative), and the number of the modified noun (singular or plural).

Conjunction category               Conjunction      % same-         % same-         P-value
                                   types analyzed   orientation     orientation     (for types)
                                                    (types)         (tokens)
All conjunctions                   2,748            77.84%          72.39%          < 1·10⁻¹⁶
All and conjunctions               2,294            81.73%          78.07%          < 1·10⁻¹⁶
All or conjunctions                305              77.05%          60.97%          < 1·10⁻¹⁶
All but conjunctions               214              30.84%          25.94%          2.09·10⁻¹⁰
All attributive and conjunctions   1,077            80.04%          76.82%          < 1·10⁻¹⁶
All predicative and conjunctions   860              84.77%          84.54%          < 1·10⁻¹⁶
All appositive and conjunctions    30               70.00%          63.64%          0.04277

Table 1: Validation of our conjunction hypothesis. The P-value is the probability that similar extreme results would have been obtained if same- and different-orientation conjunction types were equally distributed.
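The P-values in Table 1 follow from an exact binomial test under the null hypothesis p = 0.5. A small sketch (our own illustration, not the authors' code) reproduces the appositive-and entry, where 21 of 30 pairs are same-orientation:

```python
from math import comb

def binom_two_sided(successes, n):
    """Exact two-sided binomial test for p = 0.5: doubles the one-sided
    tail probability of a result at least as extreme as the one observed."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Appositive-and row of Table 1: 21 of 30 pairs same-orientation.
print(round(binom_two_sided(21, 30), 5))  # 0.04277
```

For the larger categories the tail probabilities underflow any printable precision, which is why Table 1 reports them simply as < 1·10⁻¹⁶.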
4 Validation of the Conjunction Hypothesis

Using the three attributes extracted by the parser, we constructed a cross-classification of the conjunctions in a three-way table. We counted types and tokens of each conjoined pair that had both members in the set of pre-selected labeled adjectives discussed above; 2,748 (29.56%) of all conjoined pairs (types) and 4,024 (26.74%) of all conjunction occurrences (tokens) met this criterion. We augmented this table with marginal totals, arriving at 90 categories, each of which represents a triplet of attribute values, possibly with one or more "don't care" elements. We then measured the percentage of conjunctions in each category with adjectives of same or different orientations. Under the null hypothesis of same proportions of adjective pairs (types) of same and different orientation in a given category, the number of same- or different-orientation pairs follows a binomial distribution with p = 0.5 (Conover, 1980).

We show in Table 1 the results for several representative categories, and summarize all results below:

• Our conjunction hypothesis is validated overall and for almost all individual cases. The results are extremely significant statistically, except for a few cases where the sample is small.

• Aside from the use of but with adjectives of different orientations, there are, rather surprisingly, small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes). There are a few exceptions, e.g., appositive and conjunctions modifying plural nouns are evenly split between same and different orientation. But in these exceptional cases the sample is very small, and the observed behavior may be due to chance.

• Further analysis of different-orientation pairs in conjunctions other than but shows that conjoined antonyms are far more frequent than expected by chance, in agreement with (Justeson and Katz, 1991).

5 Prediction of Link Type

The analysis in the previous section suggests a baseline method for classifying links between adjectives: since 77.84% of all links from conjunctions indicate same orientation, we can achieve this level of performance by always guessing that a link is of the same-orientation type. However, we can improve performance by noting that conjunctions using but exhibit the opposite pattern, usually involving adjectives of different orientations. Thus, a revised but still simple rule predicts a different-orientation link if the two adjectives have been seen in a but conjunction, and a same-orientation link otherwise, assuming the two adjectives were seen connected by at least one conjunction.

Morphological relationships between adjectives also play a role. Adjectives related in form (e.g., adequate-inadequate or thoughtful-thoughtless) almost always have different semantic orientations. We implemented a morphological analyzer which matches adjectives related in this manner. This process is highly accurate, but unfortunately does not apply to many of the possible pairs: in our set of 1,336 labeled adjectives (891,780 possible pairs), 102 pairs are morphologically related; among them, 99 are of different orientation, yielding 97.06% accuracy for the morphology method. This information is orthogonal to that extracted from conjunctions: only 12 of the 102 morphologically related pairs have been observed in conjunctions in our corpus. Thus, we add to the predictions made from conjunctions the different-orientation links suggested by morphological relationships.

Prediction method                 Morphology   Accuracy on reported      Accuracy on reported          Overall
                                  used?        same-orientation links    different-orientation links   accuracy
Always predict same orientation   No           77.84%                    —                             77.84%
Always predict same orientation   Yes          78.18%                    97.06%                        78.86%
But rule                          No           81.81%                    69.16%                        80.82%
But rule                          Yes          82.20%                    78.16%                        81.75%
Log-linear model                  No           81.53%                    73.70%                        80.97%
Log-linear model                  Yes          82.00%                    82.44%                        82.05%

Table 2: Accuracy of several link prediction models.
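The simple rules of this section can be sketched as follows (the data structures and names are ours, for illustration only):

```python
def predict_link(pair, but_pairs, morph_pairs):
    """Classify an adjective pair as 'same' or 'different' orientation.

    but_pairs: pairs observed in at least one `but` conjunction.
    morph_pairs: pairs related by a morphological rule (e.g. the
    affixes in-/un-), which almost always signal opposite orientation.
    """
    if pair in morph_pairs or pair in but_pairs:
        return "different"
    return "same"  # the majority class for conjoined adjectives

but_pairs = {("simplistic", "well-received")}
morph_pairs = {("adequate", "inadequate")}
print(predict_link(("fair", "legitimate"), but_pairs, morph_pairs))      # same
print(predict_link(("adequate", "inadequate"), but_pairs, morph_pairs))  # different
```

The "always same" baseline is the special case where both sets are empty.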
We improve the accuracy of classifying links derived from conjunctions as same or different orientation with a log-linear regression model (Santner and Duffy, 1989), exploiting the differences between the various conjunction categories. This is a generalized linear model (McCullagh and Nelder, 1989) with a linear predictor

    η = wᵀx

where x is the vector of the observed counts in the various conjunction categories for the particular adjective pair we try to classify and w is a vector of weights to be learned during training. The response y is non-linearly related to η through the inverse logit function,

    y = e^η / (1 + e^η)

Note that y ∈ (0, 1), with each of these endpoints associated with one of the possible outcomes.

We have 90 possible predictor variables, 42 of which are linearly independent. Since using all the 42 independent predictors invites overfitting (Duda and Hart, 1973), we have investigated subsets of the full log-linear model for our data using the method of iterative stepwise refinement: starting with an initial model, variables are added or dropped if their contribution to the reduction or increase of the residual deviance compares favorably to the resulting loss or gain of residual degrees of freedom. This process led to the selection of nine predictor variables.
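The linear predictor and inverse logit above can be sketched directly; the weights and counts below are made-up illustrations, not the fitted model:

```python
from math import exp

def inverse_logit(eta):
    """Map the linear predictor eta = w.x to a response y in (0, 1)."""
    return exp(eta) / (1.0 + exp(eta))

def linear_predictor(w, x):
    """Dot product of learned weights and conjunction-category counts."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical weights and counts for one adjective pair:
w = [0.8, -1.5]     # e.g. an `and`-category weight and a `but`-category weight
x = [3, 0]          # the pair was seen in three `and` conjunctions, no `but`
y = inverse_logit(linear_predictor(w, x))
print(round(y, 3))  # 0.917 -> confident same-orientation prediction
```

With y near 1 read as same-orientation and y near 0 as different-orientation, the model's output doubles as a confidence score, which the clustering phase exploits.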
We evaluated the three prediction models discussed above with and without the secondary source of morphology relations. For the log-linear model, we repeatedly partitioned our data into equally sized training and testing sets, estimated the weights on the training set, and scored the model's performance on the testing set, averaging the resulting scores.⁵

Table 2 shows the results of these analyses. Although the log-linear model offers only a small improvement in pair classification over the simpler but prediction rule, it confers the important advantage of rating each prediction between 0 and 1. We make extensive use of this in the next phase of our algorithm.

⁵ When morphology is to be used as a supplementary predictor, we remove the morphologically related pairs from the training and testing sets.

α   Number of adjectives    Number of links       Average number of          Accuracy   Ratio of average
    in test set (|A_α|)     in test set (|L_α|)   links for each adjective              group frequencies
2   730                     2,568                 7.04                       78.08%     1.8699
3   516                     2,159                 8.37                       82.56%     1.9235
4   369                     1,742                 9.44                       87.26%     1.3486
5   236                     1,238                 10.49                      92.37%     1.4040

Table 3: Evaluation of the adjective classification and labeling methods.

6 Finding Groups of Same-Oriented Adjectives

The third phase of our method assigns the adjectives into groups, placing adjectives of the same (but unknown) orientation in the same group. Each pair of adjectives has an associated dissimilarity value between 0 and 1; adjectives connected by same-orientation links have low dissimilarities, and conversely, different-orientation links result in high dissimilarities. Adjective pairs with no connecting links are assigned the neutral dissimilarity 0.5.

The baseline and but methods make qualitative distinctions only (i.e., same-orientation, different-orientation, or unknown); for them, we define dissimilarity for same-orientation links as one minus the probability that such a classification link is correct and dissimilarity for different-orientation links as the probability that such a classification is correct. These probabilities are estimated from separate training data. Note that for these prediction models, dissimilarities are identical for similarly classified links. The log-linear model, on the other hand, offers an estimate of how good each prediction is, since it produces a value y between 0 and 1. We construct the model so that 1 corresponds to same-orientation, and define dissimilarity as one minus the produced value.

Same- and different-orientation links between adjectives form a graph. To partition the graph nodes into subsets of the same orientation, we employ an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm (Späth, 1985). We define an objective function Φ scoring each possible partition P of the adjectives into two subgroups C₁ and C₂ as

    Φ(P) = Σᵢ₌₁² (1/|Cᵢ|) Σ_{x,y ∈ Cᵢ} d(x, y)

where |Cᵢ| stands for the cardinality of cluster i, and d(x, y) is the dissimilarity between adjectives x and y. We want to select the partition P_min that minimizes Φ, subject to the additional constraint that for each adjective x in a cluster C,

    (1/(|C| − 1)) Σ_{y ∈ C} d(x, y) < (1/|C̄|) Σ_{y ∈ C̄} d(x, y)    (1)

where C̄ is the complement of cluster C, i.e., the other member of the partition. This constraint, based on Rousseeuw's (1987) silhouettes, helps correct wrong cluster assignments.

To find P_min, we first construct a random partition of the adjectives, then locate the adjective that will most reduce the objective function if it is moved from its current cluster. We move this adjective and proceed with the next iteration until no movements can improve the objective function. At the final iteration, the cluster assignment of any adjective that violates constraint (1) is changed. This is a steepest-descent hill-climbing method, and thus is guaranteed to converge. However, it will in general find a local minimum rather than the global one; the problem is NP-complete (Garey and Johnson, 1979). We can arbitrarily increase the probability of finding the
globally optimal solution by repeatedly running the algorithm with different starting partitions.

7 Labeling the Clusters as Positive or Negative

The clustering algorithm separates each component of the graph into two groups of adjectives, but does not actually label the adjectives as positive or negative. To accomplish that, we use a simple criterion that applies only to pairs or groups of words of opposite orientation. We have previously shown (Hatzivassiloglou and McKeown, 1995) that in oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the most frequent one about 81% of the time. This is relevant to our task because semantic markedness exhibits a strong correlation with orientation, the unmarked member almost always having positive orientation (Lehrer, 1985; Battistella, 1990).

We compute the average frequency of the words in each group, expecting the group with higher average frequency to contain the positive terms. This aggregation operation increases the precision of the labeling dramatically since indicators for many pairs of words are combined, even when some of the words are incorrectly assigned to their group.

8 Results and Evaluation

Since graph connectivity affects performance, we devised a method of selecting test sets that makes this dependence explicit. Note that the graph density is
largely a function of corpus size, and thus can be increased by adding more data. Nevertheless, we report results on sparser test sets to show how our algorithm scales up.

We separated our sets of adjectives A (containing 1,336 adjectives) and conjunction- and morphology-based links L (containing 2,838 links) into training and testing groups by selecting, for several values of the parameter α, the maximal subset of A, A_α, which includes an adjective x if and only if there exist at least α links from L between x and other elements of A_α. This operation in turn defines a subset of L, L_α, which includes all links between members of A_α. We train our log-linear model on L − L_α (excluding links between morphologically related adjectives), compute predictions and dissimilarities for the links in L_α, and use these to classify and label the adjectives in A_α. α must be at least 2, since we need to leave some links for training.

Table 3 shows the results of these experiments for α = 2 to 5. Our method produced the correct classification from 78% of the time on the sparsest test set up to more than 92% of the time when a higher number of links was present. Moreover, in all cases, the ratio of the two group frequencies correctly identified the positive subgroup. These results are extremely significant statistically (P-value less than 10⁻¹⁶) when compared with the baseline method of randomly assigning orientations to adjectives, or the baseline method of always predicting the most frequent (for types) category (50.82% of the adjectives in our collection are classified as negative). Figure 2 shows some of the adjectives in set A₄ and their classifications.
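The exchange-method partitioning (Section 6) and the frequency-based labeling (Section 7) can be sketched together on a toy dissimilarity matrix. This illustration omits the silhouette-based correction of constraint (1), and the words, dissimilarities, and frequencies are invented, not the paper's data:

```python
import random

def phi(partition, d):
    """Objective Phi: sum over clusters of average intra-cluster dissimilarity."""
    return sum(sum(d[x][y] for x in c for y in c) / len(c)
               for c in partition if c)

def exchange_cluster(words, d, seed=0):
    """Steepest-descent exchange method: repeatedly move the single
    adjective whose reassignment most reduces the objective."""
    rng = random.Random(seed)
    part = [set(), set()]
    for w in words:                       # random starting partition
        part[rng.randrange(2)].add(w)
    while True:
        base, best = phi(part, d), None
        for i, j in ((0, 1), (1, 0)):
            for w in list(part[i]):
                part[i].remove(w); part[j].add(w)   # try the move
                score = phi(part, d)
                part[j].remove(w); part[i].add(w)   # undo it
                if score < base - 1e-12 and (best is None or score < best[0]):
                    best = (score, w, i, j)
        if best is None:
            return part                   # no move improves Phi: local minimum
        _, w, i, j = best
        part[i].remove(w); part[j].add(w)

def label_positive(partition, freq):
    """Label the group with the higher average corpus frequency as positive."""
    avg = [sum(freq[w] for w in c) / len(c) for c in partition]
    return partition[avg.index(max(avg))]

# Toy graph: a-b and c-d are same-orientation (low dissimilarity),
# cross pairs are different-orientation (high dissimilarity).
words = ["a", "b", "c", "d"]
low = {("a", "b"): 0.1, ("c", "d"): 0.2}
d = {x: {y: 0.0 if x == y else low.get((x, y), low.get((y, x), 0.9))
         for y in words} for x in words}
groups = exchange_cluster(words, d)
freq = {"a": 50, "b": 40, "c": 5, "d": 5}
print(sorted(label_positive(groups, freq)))  # ['a', 'b']
```

In the paper this procedure is run once per connected component of the link graph, and restarted from several random partitions to escape local minima.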
Classified as positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty

Classified as negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful

Figure 2: Sample retrieved classifications of adjectives from set A₄. Correctly matched adjectives are shown in bold.

9 Graph Connectivity and Performance

A strong point of our method is that decisions on individual words are aggregated to provide decisions on how to group words into a class and whether to label the class as positive or negative. Thus, the overall result can be much more accurate than the individual indicators. To verify this, we ran a series of simulation experiments. Each experiment measures how our algorithm performs for a given level of precision P for identifying links and a given average number of links k for each word. The goal is to show that even when P is low, given enough data (i.e., high k), we can achieve high performance for the grouping.

As we noted earlier, the corpus data is eventually represented in our system as a graph, with the nodes corresponding to adjectives and the links to predictions about whether the two connected adjectives have the same or different orientation. Thus the parameter P in the simulation experiments measures how well we are able to predict each link independently of the others, and the parameter k measures the number of distinct adjectives each adjective appears with in conjunctions. P therefore directly represents the precision of the link classification algorithm, while k indirectly represents the corpus size.

To measure the effect of P and k (which are reflected in the graph topology), we need to carry out a series of experiments where we systematically vary their values. For example, as k (or the amount of data) increases for a given level of precision P for individual links, we want to measure how this affects overall accuracy of the resulting groups of nodes. Thus, we need to construct a series of data sets, or graphs, which represent different scenarios corresponding to a given combination of values of P and k. To do this, we construct a random graph by randomly assigning 50 nodes to the two possible orientations. Because we don't have frequency and morphology information on these abstract nodes, we cannot predict whether two nodes are of the same or different orientation. Rather, we randomly assign links between nodes so that, on average, each
node participates in k links and 100 × P% of all links connect nodes of the same orientation. Then we consider these links as identified by the link prediction algorithm as connecting two nodes with the same orientation (so that 100 × P% of these predictions will be correct). This is equivalent to the baseline link classification method, and provides a lower bound on the performance of the algorithm actually used in our system (Section 5).

Because of the lack of actual measurements such as frequency on these abstract nodes, we also decouple the partitioning and labeling components of our system and score the partition found under the best matching conditions for the actual labels. Thus the simulation measures only how well the system separates positive from negative adjectives, not how well it determines which is which. However, in all the experiments performed on real corpus data (Section 8), the system correctly found the labels of the groups; any misclassifications came from misplacing an adjective in the wrong group. The whole procedure of constructing the random graph and finding and scoring the groups is repeated 200 times for any given combination of P and k, and the results are averaged, thus avoiding accidentally evaluating our system on a graph that is not truly representative of graphs with the given P and k.

We observe (Figure 3) that even for relatively low P, our ability to correctly classify the nodes approaches very high levels with a modest number of links. For P = 0.8, we need only about 7 links per adjective for classification performance over 90% and only 12 links per adjective for performance over 99%.⁸ The difference between low and high values
of P is in the rate at which increasing data increases overall precision. These results are somewhat more optimistic than those obtained with real data (Section 8), a difference which is probably due to the uniform distributional assumptions in the simulation. Nevertheless, we expect the trends to be similar to the ones shown in Figure 3, and the results of Table 3 on real data support this expectation.

[8] 12 links per adjective for a set of n adjectives requires 6n conjunctions between the n adjectives in the corpus.

[Figure 3: Simulation results obtained on 50 nodes, shown in four panels: (a) P = 0.75, (b) P = 0.8, (c) P = 0.85, and (d) P = 0.9. Each panel plots classification performance against the average number of neighbors per node. In each figure, the last x coordinate indicates the (average) maximum possible value of k for this P, and the dotted line shows the performance of a random classifier.]

10 Conclusion and Future Work

We have proposed and verified from corpus data constraints on the semantic orientations of conjoined adjectives. We used these constraints to automatically construct a log-linear regression model which, combined with supplementary morphology rules, predicts whether two conjoined adjectives are of same or different orientation with 82% accuracy. We then classified several sets of adjectives according to the links inferred in this way and labeled them as positive or negative, obtaining 92% accuracy on the classification task for reasonably dense graphs and 100% accuracy on the labeling task. Simulation experiments establish that very high levels of performance can be obtained with a modest number of links per word, even when the links themselves are not always correctly classified.

As part of our clustering algorithm's output, a
"goodness-of-fit" measure for each word is computed, based on Rousseeuw's (1987) silhouettes. This measure ranks the words according to how well they fit in their group, and can thus be used as a quantitative measure of orientation, refining the binary positive-negative distinction. By restricting the labeling decisions to words with high values of this measure we can also increase the precision of our system, at the cost of sacrificing some coverage.

We are currently combining the output of this system with a semantic group finding system so that we can automatically identify antonyms from the corpus, without access to any semantic descriptions. The learned semantic categorization of the adjectives can also be used in the reverse direction, to help in interpreting the conjunctions they participate in. We will also extend our analyses to nouns and verbs.

Acknowledgements

This work was supported in part by the Office of Naval Research under grant N00014-95-1-0745, jointly by the Office of Naval Research and the Advanced Research Projects Agency under grant N00014-89-J-1782, by the National Science Foundation under grant GER-90-24069, and by the New York State Center for Advanced Technology under contracts NYSSTF-CAT(95)-013 and NYSSTF-CAT(96)-013. We thank Ken Church and the AT&T Bell Laboratories for making the PARTS part-of-speech tagger available to us. We also thank Dragomir Radev, Eric Siegel, and Gregory Sean McKinley, who provided models for the categorization of the adjectives in our training and testing sets as positive and negative.

References

Jean-Claude Anscombre and Oswald Ducrot. 1983. L'Argumentation dans la Langue. Philosophie et Langage. Pierre Mardaga, Brussels, Belgium.

Edwin L. Battistella. 1990. Markedness: The Evaluative Superstructure of Language. State University of New York Press, Albany, New York.

Peter F. Brown, Vincent J. della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.

Kenneth W. Church. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing (ANLP-88), pages 136-143, Austin, Texas, February. Association for Computational Linguistics.

W. J. Conover. 1980. Practical Nonparametric Statistics. Wiley, New York, 2nd edition.

Richard O. Duda and Peter E. Hart. 1973. Pattern Classification and Scene Analysis. Wiley, New York.
Michael Elhadad and Kathleen R. McKeown. 1990. A procedure for generating connectives. In Proceedings of COLING, Helsinki, Finland, July.

Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, California.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1993. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proceedings of the 31st Annual Meeting of the ACL, pages 172-182, Columbus, Ohio, June. Association for Computational Linguistics.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1995. A quantitative evaluation of linguistic tests for the automatic prediction of semantic markedness. In Proceedings of the 33rd Annual Meeting of the ACL, pages 197-204, Boston, Massachusetts, June. Association for Computational Linguistics.

John S. Justeson and Slava M. Katz. 1991. Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics, 17(1):1-19.

Adrienne Lehrer. 1974. Semantic Fields and Lexical Structure. North Holland, Amsterdam and New York.

Adrienne Lehrer. 1985. Markedness and antonymy.
Journal of Linguistics, 31(3):397-429, September.

John Lyons. 1977. Semantics, volume 1. Cambridge University Press, Cambridge, England.

Peter McCullagh and John A. Nelder. 1989. Generalized Linear Models. Chapman and Hall, London, 2nd edition.

George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. 1990. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography (special issue), 3(4):235-312.

Fernando Pereira, Naftali Tishby, and Lillian Lee. 1993. Distributional clustering of English words. In Proceedings of the 31st Annual Meeting of the ACL, pages 183-190, Columbus, Ohio, June. Association for Computational Linguistics.

Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53-65.

Thomas J. Santner and Diane E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.

Helmuth Späth. 1985. Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples. Ellis Horwood, Chichester, West Sussex, England.
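The random-graph simulation of Section 9 (50 abstract nodes, an average of k links per node, a fraction P of links whose same/different-orientation prediction is correct, scored under the best matching of group labels and averaged over repeated trials) can be sketched as follows. This is a toy reconstruction under stated assumptions, not the authors' code: `greedy_partition` is a simple local-search stand-in for the clustering algorithm of Section 5.

```python
import random

def greedy_partition(n, links, rng, sweeps=20):
    """Toy partitioner: local search that flips a node's group whenever the
    flip increases agreement with the predicted same/different links."""
    label = [rng.choice((1, -1)) for _ in range(n)]

    def gain(i):
        # Flipping node i toggles agreement on every link incident to i.
        g = 0
        for a, b, same in links:
            if i == a or i == b:
                j = b if i == a else a
                agree = (label[i] == label[j]) == same
                g += -1 if agree else 1
        return g

    for _ in range(sweeps):
        improved = False
        for i in range(n):
            if gain(i) > 0:
                label[i] = -label[i]
                improved = True
        if not improved:
            break
    return label

def simulate(num_nodes=50, k=6, p=0.8, trials=200, seed=0):
    """Each trial draws ~k*n/2 random links whose same/different prediction
    is correct with probability P, partitions the nodes, and scores the
    partition under the better of the two group-to-label matchings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        truth = [rng.choice((1, -1)) for _ in range(num_nodes)]
        links = []
        for _ in range(num_nodes * k // 2):
            a, b = rng.sample(range(num_nodes), 2)
            same = truth[a] == truth[b]
            # The link's predicted label is correct with probability P.
            predicted_same = same if rng.random() < p else not same
            links.append((a, b, predicted_same))
        guess = greedy_partition(num_nodes, links, rng)
        agree = sum(g == t for g, t in zip(guess, truth))
        total += max(agree, num_nodes - agree) / num_nodes
    return total / trials
```

In this sketch, as in Figure 3, classification accuracy at a fixed P rises as the average number of links per node grows; the exact numbers depend on the toy partitioner.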
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney
Institute for Information Technology
National Research Council of Canada
Ottawa, Ontario, Canada, K1A 0R6
[email protected]

Abstract

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

1 Introduction

If you are considering a vacation in Akumal, Mexico, you might go to a search engine and enter the query “Akumal travel review”. However, in this case, Google1 reports about 5,000 matches. It would be useful to know what fraction of these matches recommend Akumal as a travel destination. With an algorithm for automatically classifying a review as “thumbs up” or “thumbs down”, it would be possible for a search engine to report such summary statistics. This is the motivation for the research described here. Other potential applications include recognizing “flames” (abusive newsgroup messages) (Spertus, 1997) and developing new kinds of search tools (Hearst, 1992).

1 http://www.google.com

In this paper, I present a simple unsupervised learning algorithm for classifying a review as recommended or not recommended. The algorithm takes a written review as input and produces a classification as output. The first step is to use a part-of-speech tagger to identify phrases in the input text that contain adjectives or adverbs (Brill, 1994). The second step is to estimate the semantic orientation of each extracted phrase (Hatzivassiloglou & McKeown, 1997). A phrase has a positive semantic orientation when it has good associations (e.g., “romantic ambience”) and a negative semantic orientation when it has bad associations (e.g., “horrific events”). The third step is to assign the given review to a class, recommended or not recommended, based on the average semantic orientation of the phrases extracted from the review. If the average is positive, the prediction is that the review recommends the item it discusses. Otherwise, the prediction is that the item is not recommended.

The PMI-IR algorithm is employed to estimate the semantic orientation of a phrase (Turney, 2001). PMI-IR uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words or phrases. The semantic orientation of a given phrase is calculated by comparing its similarity to a positive reference word (“excellent”) with its similarity to a negative reference word (“poor”). More specifically, a phrase is assigned a numerical rating by taking the mutual information between the given phrase and the word “excellent” and subtracting the mutual information between the given phrase and the word “poor”. In addition to determining the direction of the phrase's semantic orientation (positive or negative, based on the sign of the rating), this numerical rating also indicates the strength of the semantic orientation (based on the magnitude of the number). The algorithm is presented in Section 2.

Hatzivassiloglou and McKeown (1997) have also developed an algorithm for predicting semantic orientation. Their algorithm performs well, but it is designed for isolated adjectives, rather than phrases containing adjectives or adverbs. This is discussed in more detail in Section 3, along with other related work.

The classification algorithm is evaluated on 410 reviews from Epinions2, randomly sampled from four different domains: reviews of automobiles, banks, movies, and travel destinations. Reviews at Epinions are not written by professional writers; any person with a Web browser can become a member of Epinions and contribute a review. Each of these 410 reviews was written by a different author. Of these reviews, 170 are not recommended and the remaining 240 are recommended (these classifications are given by the authors). Always guessing the majority class would yield an accuracy of 59%. The algorithm achieves an average accuracy of 74%, ranging from 84% for automobile reviews to 66% for movie reviews. The experimental results are given in Section 4.
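The three-step procedure just described (extract phrases, estimate each phrase's semantic orientation, average and threshold at zero) can be sketched end to end as follows. This is a minimal illustration rather than the paper's implementation: `toy_extract` and the `TOY_SO` lexicon are hypothetical stand-ins for the POS-pattern extractor of Section 2 and the PMI-IR estimator, with sample values taken from Table 2.

```python
def classify_review(review, extract_phrases, semantic_orientation):
    """Step 1: extract adjective/adverb phrases; step 2: estimate each
    phrase's semantic orientation; step 3: average and compare to zero."""
    phrases = extract_phrases(review)
    scores = [semantic_orientation(p) for p in phrases]
    if not scores:
        return "not recommended"  # no evaluative phrases found
    average = sum(scores) / len(scores)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical stand-ins: a fixed phrase list and a tiny SO lexicon
# (the values are illustrative, taken from Table 2 of the paper).
def toy_extract(review):
    return [p for p in ("low fees", "online service", "true service")
            if p in review]

TOY_SO = {"low fees": 0.333, "online service": 2.780, "true service": -0.732}
```

For example, `classify_review("low fees and online service", toy_extract, TOY_SO.get)` averages 0.333 and 2.780, a positive value, so the review is classified as recommended.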
The interpretation of the experimental results, the limitations of this work, and future work are discussed in Section 5. Potential applications are outlined in Section 6. Finally, conclusions are presented in Section 7.

2 Classifying Reviews

The first step of the algorithm is to extract phrases containing adjectives or adverbs. Past work has demonstrated that adjectives are good indicators of subjective, evaluative sentences (Hatzivassiloglou & Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2001). However, although an isolated adjective may indicate subjectivity, there may be insufficient context to determine semantic orientation. For example, the adjective “unpredictable” may have a negative orientation in an automotive review, in a phrase such as “unpredictable steering”, but it could have a positive orientation in a movie review, in a phrase such as “unpredictable plot”. Therefore the algorithm extracts two consecutive words, where one member of the pair is an adjective or an adverb and the second provides context.

2 http://www.epinions.com

First a part-of-speech tagger is applied to the review (Brill, 1994).3 Two consecutive words are extracted from the review if their tags conform to any of the patterns in Table 1. The JJ tags indicate adjectives, the NN tags are nouns, the RB tags are adverbs, and the VB tags are verbs.4 The second
pattern, for example, means that two consecutive words are extracted if the first word is an adverb and the second word is an adjective, but the third word (which is not extracted) cannot be a noun. NNP and NNPS (singular and plural proper nouns) are avoided, so that the names of the items in the review cannot influence the classification.

Table 1. Patterns of tags for extracting two-word phrases from reviews.

       First Word           Second Word             Third Word (Not Extracted)
    1. JJ                   NN or NNS               anything
    2. RB, RBR, or RBS      JJ                      not NN nor NNS
    3. JJ                   JJ                      not NN nor NNS
    4. NN or NNS            JJ                      not NN nor NNS
    5. RB, RBR, or RBS      VB, VBD, VBN, or VBG    anything

The second step is to estimate the semantic orientation of the extracted phrases, using the PMI-IR algorithm. This algorithm uses mutual information as a measure of the strength of semantic association between two words (Church & Hanks, 1989). PMI-IR has been empirically evaluated using 80
synonym test questions from the Test of English as a Foreign Language (TOEFL), obtaining a score of 74% (Turney, 2001). For comparison, Latent Semantic Analysis (LSA), another statistical measure of word association, attains a score of 64% on the same 80 TOEFL questions (Landauer & Dumais, 1997).

3 http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z
4 See Santorini (1995) for a complete description of the tags.

The Pointwise Mutual Information (PMI) between two words, word1 and word2, is defined as follows (Church & Hanks, 1989):

    PMI(word1, word2) = log2 [ p(word1 & word2) / (p(word1) p(word2)) ]    (1)

Here, p(word1 & word2) is the probability that word1 and word2 co-occur. If the words are statistically independent, then the probability that they co-occur is given by the product p(word1) p(word2). The ratio between p(word1 & word2) and p(word1) p(word2) is thus a measure of the degree of statistical dependence between the words. The log of this ratio is the amount of information that we acquire about the presence of one of the words when we observe the other.
The Semantic Orientation (SO) of a phrase, phrase, is calculated here as follows:

    SO(phrase) = PMI(phrase, “excellent”) - PMI(phrase, “poor”)    (2)

The reference words “excellent” and “poor” were chosen because, in the five star review rating system, it is common to define one star as “poor” and five stars as “excellent”. SO is positive when phrase is more strongly associated with “excellent” and negative when phrase is more strongly associated with “poor”.

PMI-IR estimates PMI by issuing queries to a search engine (hence the IR in PMI-IR) and noting the number of hits (matching documents). The following experiments use the AltaVista Advanced Search engine5, which indexes approximately 350 million web pages (counting only those pages that are in English). I chose AltaVista because it has a NEAR operator. The AltaVista NEAR operator constrains the search to documents that contain the words within ten words of one another, in either order. Previous work has shown that NEAR performs better than AND when measuring the strength of semantic association between words (Turney, 2001).

Let hits(query) be the number of hits returned, given the query query. The following estimate of SO can be derived from equations (1) and (2) with
some minor algebraic manipulation, if co-occurrence is interpreted as NEAR:

    SO(phrase) = log2 [ (hits(phrase NEAR “excellent”) hits(“poor”)) /
                        (hits(phrase NEAR “poor”) hits(“excellent”)) ]    (3)

5 http://www.altavista.com/sites/search/adv

Equation (3) is a log-odds ratio (Agresti, 1996). To avoid division by zero, I added 0.01 to the hits. I also skipped phrase when both hits(phrase NEAR “excellent”) and hits(phrase NEAR “poor”) were (simultaneously) less than four. These numbers (0.01 and 4) were arbitrarily chosen. To eliminate any possible influence from the testing data, I added “AND (NOT host:epinions)” to every query, which tells AltaVista not to include the Epinions Web site in its searches.

The third step is to calculate the average semantic orientation of the phrases in the given review and classify the review as recommended if the average is positive and otherwise not recommended.

Table 2 shows an example for a recommended review and Table 3 shows an example for a not recommended review. Both are reviews of the Bank of America. Both are in the collection of 410 reviews from Epinions that are used in the experiments in Section 4.

Table 2. An example of the processing of a review that the author has classified as recommended.6

    Extracted Phrase           Part-of-Speech Tags    Semantic Orientation
    online experience          JJ NN                   2.253
    low fees                   JJ NNS                  0.333
    local branch               JJ NN                   0.421
    small part                 JJ NN                   0.053
    online service             JJ NN                   2.780
    printable version          JJ NN                  -0.705
    direct deposit             JJ NN                   1.288
    well other                 RB JJ                   0.237
    inconveniently located     RB VBN                 -1.541
    other bank                 JJ NN                  -0.850
    true service               JJ NN                  -0.732
    Average Semantic Orientation                       0.322

6 The semantic orientation in the following tables is calculated using the natural logarithm (base e), rather than base 2. The natural log is more common in the literature on log-odds ratio. Since all logs are equivalent up to a constant factor, it makes no difference for the algorithm.
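The hit-count estimate of SO described above (equation (3) with the 0.01 smoothing and the minimum of four hits) can be sketched as follows. `hits` is a stand-in for the search-engine query call, simulated below with a hand-made table of invented counts; exactly where the 0.01 is added is an assumption of this sketch.

```python
import math

def so_from_hits(phrase, hits, eps=0.01, min_hits=4):
    """Equation (3): log-odds ratio of NEAR co-occurrence counts with the
    reference words.  eps (0.01) guards against zero counts; a phrase
    whose co-occurrence counts with both reference words fall below
    min_hits is skipped (None returned), as in the paper."""
    ex = hits(phrase + ' NEAR "excellent"')
    po = hits(phrase + ' NEAR "poor"')
    if ex < min_hits and po < min_hits:
        return None  # too little evidence; the paper skips such phrases
    return math.log2(((ex + eps) * hits('"poor"')) /
                     ((po + eps) * hits('"excellent"')))
```

With an invented hit table in which a phrase co-occurs 40 times with “excellent” and 10 times with “poor”, while “poor” alone returns twice as many hits as “excellent”, the SO comes out strongly positive, close to 3 bits.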
Table 3. An example of the processing of a review that the author has classified as not recommended.

    Extracted Phrase       Part-of-Speech Tags    Semantic Orientation
    little difference      JJ NN                  -1.615
    clever tricks          JJ NNS                 -0.040
    programs such          NNS JJ                  0.117
    possible moment        JJ NN                  -0.668
    unethical practices    JJ NNS                 -8.484
    low funds              JJ NNS                 -6.843
    old man                JJ NN                  -2.566
    other problems         JJ NNS                 -2.748
    probably wondering     RB VBG                 -1.830
    virtual monopoly       JJ NN                  -2.050
    other bank             JJ NN                  -0.850
    extra day              JJ NN                  -0.286
    direct deposits        JJ NNS                  5.771
    online web             JJ NN                   1.936
    cool thing             JJ NN                   0.395
    very handy             RB JJ                   1.349
    lesser evil            RBR JJ                 -2.288
    Average Semantic Orientation                  -1.218

3 Related Work

This work is most closely related to Hatzivassiloglou and McKeown's (1997) work on predicting the semantic orientation of adjectives. They note that there are linguistic constraints on the semantic
orientations of adjectives in conjunctions. As an example, they present the following three sentences (Hatzivassiloglou & McKeown, 1997):

    1. The tax proposal was simple and well-received by the public.
    2. The tax proposal was simplistic but well-received by the public.
    3. (*) The tax proposal was simplistic and well-received by the public.

The third sentence is incorrect, because we use “and” with adjectives that have the same semantic orientation (“simple” and “well-received” are both positive), but we use “but” with adjectives that have different semantic orientations (“simplistic” is negative).

Hatzivassiloglou and McKeown (1997) use a four-step supervised learning algorithm to infer the semantic orientation of adjectives from constraints on conjunctions:

    1. All conjunctions of adjectives are extracted from the given corpus.
    2. A supervised learning algorithm combines multiple sources of evidence to label pairs of adjectives as having the same semantic orientation or different semantic orientations. The result is a graph where the nodes are adjectives and links indicate sameness or difference of semantic orientation.
    3. A clustering algorithm processes the graph structure to produce two subsets of adjectives, such that links across the two subsets are mainly different-orientation links, and links inside a subset are mainly same-orientation links.
    4. Since it is known that positive adjectives tend to be used more frequently than negative adjectives, the cluster with the higher average frequency is classified as having positive semantic orientation.

This algorithm classifies adjectives with accuracies ranging from 78% to 92%, depending on the amount of training data that is available. The algorithm can go beyond a binary positive-negative distinction, because the clustering algorithm (step 3 above) can produce a “goodness-of-fit” measure that indicates how well an adjective fits in its assigned cluster.

Although they do not consider the task of classifying reviews, it seems their algorithm could be plugged into the classification algorithm presented in Section 2, where it would replace PMI-IR and equation (3) in the second step. However, PMI-IR is conceptually simpler, easier to implement, and it can handle phrases and adverbs, in addition to isolated adjectives.

As far as I know, the only prior published work on the task of classifying reviews as thumbs up or down is Tong's (2001) system for generating sentiment timelines. This system tracks online discussions about movies and displays a plot of the number of positive sentiment and negative sentiment messages over time. Messages are classified by looking for specific phrases that indicate the sentiment of the author towards the movie (e.g., “great acting”, “wonderful visuals”, “terrible score”, “uneven editing”). Each phrase must be manually added to a special lexicon and manually tagged as indicating positive or negative sentiment. The lexicon is specific to the domain (e.g., movies) and must be built anew for each new domain. The company Mindfuleye7 offers a technology called Lexant™ that appears similar to Tong's (2001) system.

Other related work is concerned with determining subjectivity (Hatzivassiloglou & Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2001). The task is to distinguish sentences that present opinions and evaluations from sentences that objectively present factual information (Wiebe, 2000). Wiebe et al. (2001) list a variety of potential applications for automated subjectivity tagging, such as recognizing “flames” (Spertus, 1997), classifying email, recognizing speaker role in radio broadcasts, and mining reviews. In several of these applications, the first step is to recognize that the text is subjective, and then the natural second step is to determine the semantic orientation of the subjective text. For example, a flame detector cannot merely detect that a newsgroup message is subjective; it must further detect that the message has a negative semantic orientation; otherwise a message of praise could be classified as a flame.
Hearst (1992) observes that most search engines focus on finding documents on a given topic, but do not allow the user to specify the directionality of the documents (e.g., is the author in favor of, neutral, or opposed to the event or item discussed in the document?). The directionality of a document is determined by its deep argumentative structure, rather than a shallow analysis of its adjectives. Sentences are interpreted metaphorically in terms of agents exerting force, resisting force, and overcoming resistance. It seems likely that there could be some benefit to combining shallow and deep analysis of the text.

7 http://www.mindfuleye.com/

4 Experiments

Table 4 describes the 410 reviews from Epinions that were used in the experiments. 170 (41%) of the reviews are not recommended and the remaining 240 (59%) are recommended. Always guessing the majority class would yield an accuracy of 59%. The third column shows the average number of phrases that were extracted from the reviews.

Table 5 shows the experimental results. Except for the travel reviews, there is surprisingly little variation in the accuracy within a domain. In addition to recommended and not recommended, Epinions reviews are classified using the five star rating system. The third column shows the correlation between the average semantic orientation and the number of stars assigned by the author of the review. The results show a strong positive correlation between the average semantic orientation and the author's rating out of five stars.

Table 4. A summary of the corpus of reviews.

    Domain of Review        Number of Reviews    Average Phrases per Review
    Automobiles                    75                 20.87
      Honda Accord                 37                 18.78
      Volkswagen Jetta             38                 22.89
    Banks                         120                 18.52
      Bank of America              60                 22.02
      Washington Mutual            60                 15.02
    Movies                        120                 29.13
      The Matrix                   60                 19.08
      Pearl Harbor                 60                 39.17
    Travel Destinations            95                 35.54
      Cancun                       59                 30.02
      Puerto Vallarta              36                 44.58
    All                           410                 26.00

Table 5. The accuracy of the classification and the correlation of the semantic orientation with the star rating.

    Domain of Review        Accuracy    Correlation
    Automobiles             84.00 %      0.4618
      Honda Accord          83.78 %      0.2721
      Volkswagen Jetta      84.21 %      0.6299
    Banks                   80.00 %      0.6167
      Bank of America       78.33 %      0.6423
      Washington Mutual     81.67 %      0.5896
    Movies                  65.83 %      0.3608
      The Matrix            66.67 %      0.3811
      Pearl Harbor          65.00 %      0.2907
    Travel Destinations     70.53 %      0.4155
      Cancun                64.41 %      0.4194
      Puerto Vallarta       80.56 %      0.1447
    All                     74.39 %      0.5174

5 Discussion of Results

A natural question, given the preceding results, is what makes movie reviews hard to classify? Table 6 shows that classification by the average SO tends to err on the side of guessing that a review is not recommended, when it is actually recommended. This suggests the hypothesis that a good movie will often contain unpleasant scenes (e.g., violence, death, mayhem), and a recommended movie review may thus have its average semantic orientation reduced if it contains descriptions of these unpleasant scenes. However, if we add a constant value to the average SO of the movie reviews, to compensate for this bias, the accuracy does not
improve. This suggests that, just as positive reviews mention unpleasant things, so negative reviews often mention pleasant scenes.

Table 6. The confusion matrix for movie classifications.

                                 Author's Classification
Average Semantic Orientation   Thumbs Up   Thumbs Down      Sum
Positive                        28.33 %      12.50 %      40.83 %
Negative                        21.67 %      37.50 %      59.17 %
Sum                             50.00 %      50.00 %     100.00 %

Table 7 shows some examples that lend support to this hypothesis. For example, the phrase "more evil" does have negative connotations, thus an SO of -4.384 is appropriate, but an evil character does not make a bad movie. The difficulty with movie reviews is that there are two aspects to a movie: the events and actors in the movie (the elements of the movie), and the style and art of the movie (the movie as a gestalt; a unified whole). This is likely also the explanation for the lower accuracy of the Cancun reviews: good beaches do not necessarily add up to a good vacation. On the other hand, good automotive parts usually do add up to a good
automobile and good banking services add up to a good bank. It is not clear how to address this issue. Future work might look at whether it is possible to tag sentences as discussing elements or wholes.

Another area for future work is to empirically compare PMI-IR and the algorithm of Hatzivassiloglou and McKeown (1997). Although their algorithm does not readily extend to two-word phrases, I have not yet demonstrated that two-word phrases are necessary for accurate classification of reviews. On the other hand, it would be interesting to evaluate PMI-IR on the collection of 1,336 hand-labeled adjectives that were used in the experiments of Hatzivassiloglou and McKeown (1997). A related question for future work is the relationship of accuracy of the estimation of semantic orientation at the level of individual phrases to accuracy of review classification. Since the review classification is based on an average, it might be quite resistant to noise in the SO estimate for individual phrases. But it is possible that a better SO estimator could produce significantly better classifications.

Table 7. Sample phrases from misclassified reviews.

Movie: The Matrix
Author's Rating: recommended (5 stars)
Average SO: -0.219 (not recommended)
Sample Phrase: more evil [RBR JJ]
SO of Sample Phrase: -4.384
Context of Sample Phrase: The slow, methodical way he spoke. I loved it! It made him seem more arrogant and even more evil.

Movie: Pearl Harbor
Author's Rating: recommended (5 stars)
Average SO: -0.378 (not recommended)
Sample Phrase: sick feeling [JJ NN]
SO of Sample Phrase: -8.308
Context of Sample Phrase: During this period I had a sick feeling, knowing what was coming, knowing what was part of our history.

Movie: The Matrix
Author's Rating: not recommended (2 stars)
Average SO: 0.177 (recommended)
Sample Phrase: very talented [RB JJ]
SO of Sample Phrase: 1.992
Context of Sample Phrase: Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

Movie: Pearl Harbor
Author's Rating: not recommended (3 stars)
Average SO: 0.015 (recommended)
Sample Phrase: blue skies [JJ NNS]
SO of Sample Phrase: 1.263
Context of Sample Phrase: Anyone who saw the trailer in the theater over the course of the last year will never forget the images of Japanese war planes swooping out of the blue skies, flying past the children playing baseball, or the truly remarkable shot of a bomb falling from an enemy plane into the deck of the USS Arizona.

Equation (3) is a very simple estimator of semantic orientation. It might benefit from more sophisticated statistical analysis (Agresti, 1996). One
possibility is to apply a statistical significance test to each estimated SO. There is a large statistical literature on the log-odds ratio, which might lead to improved results on this task.

This paper has focused on unsupervised classification, but average semantic orientation could be supplemented by other features, in a supervised classification system. The other features could be based on the presence or absence of specific words, as is common in most text classification work. This could yield higher accuracies, but the intent here was to study this one feature in isolation, to simplify the analysis, before combining it with other features.

Table 5 shows a high correlation between the average semantic orientation and the star rating of a review. I plan to experiment with ordinal classification of reviews in the five star rating system, using the algorithm of Frank and Hall (2001). For ordinal classification, the average semantic orientation would be supplemented with other features in a supervised classification system.

A limitation of PMI-IR is the time required to send queries to AltaVista. Inspection of Equation (3) shows that it takes four queries to calculate the semantic orientation of a phrase. However, I cached all query results, and since there is no need to recalculate hits("poor") and hits("excellent") for every phrase, each phrase requires an average of slightly less than two queries. As a courtesy to AltaVista, I used a five second delay between queries.8 The 410 reviews yielded 10,658 phrases, so
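The four hit counts behind Equation (3) can be combined into a semantic orientation score as in the following sketch. It assumes the log2 odds-ratio form of the PMI-IR estimator and adds a small smoothing constant to guard against zero hit counts; the example counts are purely illustrative, not real AltaVista results.

```python
import math

def semantic_orientation(hits_phrase_near_excellent, hits_phrase_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """PMI-IR estimate of semantic orientation: the log2 ratio of the
    phrase's association with "excellent" to its association with "poor".
    The smoothing constant (an assumption here) avoids division by zero
    when a phrase never co-occurs with one of the reference words."""
    return math.log2(
        ((hits_phrase_near_excellent + smoothing) * (hits_poor + smoothing)) /
        ((hits_phrase_near_poor + smoothing) * (hits_excellent + smoothing))
    )

# Hypothetical counts: a phrase seen four times as often near "excellent"
# as near "poor" (with equal baseline counts) scores about +2.
print(semantic_orientation(400, 100, 10000, 10000))
```

Since hits("excellent") and hits("poor") are constants across the corpus, only the two phrase-specific counts require fresh queries, which is why caching brings the cost below two queries per phrase.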
the total time required to process the corpus was roughly 106,580 seconds, or about 30 hours. This might appear to be a significant limitation, but extrapolation of current trends in computer memory capacity suggests that, in about ten years, the average desktop computer will be able to easily store and search AltaVista's 350 million Web pages. This will reduce the processing time to less than one second per review.

8 This line of research depends on the good will of the major search engines. For a discussion of the ethics of Web robots, see http://www.robotstxt.org/wc/robots.html. For query robots, the proposed extended standard for robot exclusion would be useful. See http://www.conman.org/people/spc/robots2.html.

6 Applications

There are a variety of potential applications for automated review rating. As mentioned in the introduction, one application is to provide summary statistics for search engines. Given the query "Akumal travel review", a search engine could report, "There are 5,000 hits, of which 80% are thumbs up and 20% are thumbs down." The search results could be sorted by average semantic orientation, so that the user could easily sample the most extreme reviews. Similarly, a search engine could allow the user to specify the topic and the rating of the desired reviews (Hearst, 1992).

Preliminary experiments indicate that semantic orientation is also useful for summarization of reviews. A positive review could be summarized by picking out the sentence with the highest positive semantic orientation and a negative review could be summarized by extracting the sentence with the lowest negative semantic orientation. Epinions asks its reviewers to provide a short description of pros and cons for the reviewed item. A pro/con summarizer could be evaluated by measuring the overlap between the reviewer's pros and cons and the phrases in the review that have the most extreme semantic orientation.

Another potential application is filtering "flames" for newsgroups (Spertus, 1997). There could be a threshold, such that a newsgroup message is held for verification by the human moderator when the semantic orientation of a phrase drops below the threshold. A related use might be a tool for helping academic referees when reviewing journal and conference papers. Ideally, referees are unbiased and objective, but sometimes their criticism can be unintentionally harsh. It might be possible to highlight passages in a draft referee's report, where the choice of words should be modified towards a more neutral tone.

Tong's (2001) system for detecting and tracking opinions in on-line discussions could benefit from the use of a learning algorithm, instead of (or in addition to) a hand-built lexicon. With automated review rating (opinion rating), advertisers could track advertising campaigns, politicians could track public opinion, reporters could track public response to current events, stock traders could track financial opinions, and trend analyzers
could track entertainment and technology trends.

7 Conclusions

This paper introduces a simple unsupervised learning algorithm for rating a review as thumbs up or down. The algorithm has three steps: (1) extract phrases containing adjectives or adverbs, (2) estimate the semantic orientation of each phrase, and (3) classify the review based on the average semantic orientation of the phrases. The core of the algorithm is the second step, which uses PMI-IR to calculate semantic orientation (Turney, 2001).

In experiments with 410 reviews from Epinions, the algorithm attains an average accuracy of 74%. It appears that movie reviews are difficult to classify, because the whole is not necessarily the sum of the parts; thus the accuracy on movie reviews is about 66%. On the other hand, for banks and automobiles, it seems that the whole is the sum of the parts, and the accuracy is 80% to 84%. Travel reviews are an intermediate case.

Previous work on determining the semantic orientation of adjectives has used a complex algorithm that does not readily extend beyond isolated adjectives to adverbs or longer phrases (Hatzivassiloglou and McKeown, 1997). The simplicity of PMI-IR may encourage further work with semantic orientation.

The limitations of this work include the time
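The three-step pipeline can be sketched as follows. The phrase extraction and SO estimation here are stand-ins: a toy lexicon of hypothetical SO values replaces the part-of-speech tagger and the PMI-IR queries, so only the structure of the algorithm is shown, not its actual components.

```python
# Hypothetical SO values standing in for PMI-IR estimates (step 2);
# a real implementation would issue the AltaVista queries per phrase.
TOY_SO = {"very talented": 1.992, "more evil": -4.384, "blue skies": 1.263}

def extract_phrases(review):
    """Step 1 (stub): return the two-word phrases we have SO values for.
    The real algorithm extracts adjective/adverb phrases with a POS tagger."""
    return [p for p in TOY_SO if p in review.lower()]

def classify_review(review):
    """Steps 2 and 3: average the semantic orientation of the extracted
    phrases; thumbs up if the average is positive, thumbs down otherwise."""
    phrases = extract_phrases(review)
    if not phrases:
        return "thumbs down"  # no evidence; arbitrary default
    average_so = sum(TOY_SO[p] for p in phrases) / len(phrases)
    return "thumbs up" if average_so > 0 else "thumbs down"

print(classify_review("The very talented cast under blue skies..."))
```

A single strongly negative phrase can outweigh several mildly positive ones in the average, which is exactly the failure mode the movie-review examples in Table 7 illustrate.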
required for queries and, for some applications, the level of accuracy that was achieved. The former difficulty will be eliminated by progress in hardware. The latter difficulty might be addressed by using semantic orientation combined with other features in a supervised classification algorithm.

Acknowledgements

Thanks to Joel Martin and Michael Littman for helpful comments.

References

Agresti, A. 1996. An introduction to categorical data analysis. New York: Wiley.

Brill, E. 1994. Some advances in transformation-based part of speech tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (pp. 722-727). Menlo Park, CA: AAAI Press.

Church, K.W., & Hanks, P. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL (pp. 76-83). New Brunswick, NJ: ACL.

Frank, E., & Hall, M. 2001. A simple approach to ordinal classification. Proceedings of the Twelfth European Conference on Machine Learning (pp. 145-156). Berlin: Springer-Verlag.

Hatzivassiloglou, V., & McKeown, K.R. 1997. Predicting the semantic orientation of adjectives. Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL (pp. 174-181). New Brunswick, NJ: ACL.

Hatzivassiloglou, V., & Wiebe, J.M. 2000. Effects of adjective orientation and gradability on sentence subjectivity. Proceedings of the 18th International Conference on Computational Linguistics. New Brunswick, NJ: ACL.

Hearst, M.A. 1992. Direction-based text interpretation as an information access refinement. In P. Jacobs (Ed.), Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Mahwah, NJ: Lawrence Erlbaum Associates.

Landauer, T.K., & Dumais, S.T. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Santorini, B. 1995. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania.

Spertus, E. 1997. Smokey: Automatic recognition of hostile messages. Proceedings of the Conference on Innovative Applications of Artificial Intelligence (pp. 1058-1065). Menlo Park, CA: AAAI Press.

Tong, R.M. 2001. An operational system for detecting and tracking opinions in on-line discussions. Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification (pp. 1-6). New York, NY: ACM.

Turney, P.D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (pp. 491-502). Berlin: Springer-Verlag.

Wiebe, J.M. 2000. Learning subjective adjectives from corpora. Proceedings of the 17th National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

Wiebe, J.M., Bruce, R., Bell, M., Martin, M., & Wilson, T. 2001. A corpus study of evaluative and speculative language. Proceedings of the Second ACL SIG on Dialogue Workshop on Discourse and Dialogue. Aalborg, Denmark.