Joint Opinion Relation Detection 
Using One-Class Deep Neural Network 
COLING 2014 reading @ Komachi Lab 
Komachi Lab. M1 Peinan ZHANG
Abstract & Introduction 
A valid opinion relation must satisfy three requirements:
1. an opinion word that carries sentiment polarity
2. an opinion target related to the current domain
3. the opinion word modifies the opinion target
Expressed as a tuple, this becomes
o = (s, t, r) 
where
s = opinion word 
t = opinion target 
r = linking relation between s and t 
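As a minimal illustration, the tuple can be modeled directly; the `OpinionRelation` name below is ours, not the paper's:

```python
from collections import namedtuple

# An opinion relation as the tuple o = (s, t, r).
OpinionRelation = namedtuple("OpinionRelation", ["s", "t", "r"])

# Example 1 from the slides: "This mp3 has a clear screen."
o = OpinionRelation(s="clear", t="screen", r=("clear", "screen"))
```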
Assumption 1 
Terms that are likely to have linking relations with the seed
terms are believed to be opinion words or opinion
targets.
That is, a term linked to a seed term is assumed to be an
opinion word or an opinion target.
Abstract & Introduction 
Example 1. 
This mp3 has a clear screen. 
s = {clear}, t = {screen}, r = {clear, screen} 
Example 2. 
This mp3 has many good things. 
s = {good}, t = {things}, r = {good, things} 
o = (s, t, r) 
s = opinion word 
t = opinion target 
r = linking relation 
between s and t 
Is the word ‘things’ related to the domain ‘mp3’?
NO!!
The Problems 
Of the three requirements above:
1. an opinion word that carries sentiment polarity
2. an opinion target related to the current domain
3. the opinion word modifies the opinion target
previous work focused on only two. As a result, prior
methods extracted miscellaneous noise terms and suffered
from their influence.
Minqing Hu and Bing Liu. 2004. Mining and summarizing
customer reviews. ACM SIGKDD.
Assumption 2 
The three requirements: the opinion word, the
opinion target, and the linking relation between them,
shall all be verified during opinion relation detection.
In other words, the opinion word, the opinion target, and
their linking relation must be used simultaneously.
Approach 
A novel Joint Opinion Relation Detection Method:
opinion words, opinion targets, and linking relations are simultaneously
considered in a classification scenario.
HOW TO
1. Provide a small set of seeds for supervision, which are regarded
as positively labeled examples.
- small set of seeds: opinion words, opinion targets
- negative examples (i.e. noise terms) are hard to acquire, because we
do not know which terms are not opinion words or targets.
2. This leads to a One-Class Classification (OCC) problem.
- the key to OCC is measuring semantic similarity between terms.
- a Deep Neural Network (DNN) with word embeddings is a powerful
tool for this problem.
The Architecture of OCDNN 
Consists of two levels.
Lower Level: learns features
- Left: uses word embeddings to represent opinion words/targets.
- Right: maps linking relations to embedding vectors via a
recursive auto-encoder.
Higher Level: uses the learnt features to perform one-class
classification.
Outline 
1. Abstract & Introduction 
2. Approach 
3. The Architecture of OCDNN 
1. Generate Opinion Seeds 
2. Generate Opinion Relation Candidates 
3. Represent Words 
4. Represent Linking Relation 
5. One-Class Classification 
4. Datasets & Experiments 
5. Conclusion 
Opinion Seed Generation 
Opinion Word Seeds 
We manually pick 186 domain-independent opinion words from
SentiWordNet as the opinion word seed set SS.
Opinion Target Seeds 
We measure the Likelihood Ratio Test (LRT) score between the domain name
and all opinion target candidates. The N terms with the
highest LRT scores are added to the opinion target seed set TS.
Linking Relation Seeds 
We employ an automatic syntactic opinion pattern learning method
called Sentiment Graph Walking and take the 12 opinion patterns with the
highest confidence as the linking relation seed set RS.
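The LRT score used for target seeding can be sketched as Dunning's likelihood ratio over co-occurrence counts. The helper names and count conventions below are assumptions for illustration, not the paper's code:

```python
import math

def log_l(p, k, n):
    """Log-likelihood of k successes in n Bernoulli trials with probability p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def lrt(k1, n1, k2, n2):
    """Likelihood ratio score for the association between two terms.

    k1: co-occurrences of the candidate with the domain name
    n1: contexts containing the domain name
    k2: occurrences of the candidate without the domain name
    n2: contexts without the domain name
    """
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (log_l(p1, k1, n1) + log_l(p2, k2, n2)
                - log_l(p, k1, n1) - log_l(p, k2, n2))

def top_n_targets(counts, n_seeds):
    """counts: {term: (k1, n1, k2, n2)}; returns the N highest-LRT terms."""
    ranked = sorted(counts, key=lambda t: lrt(*counts[t]), reverse=True)
    return ranked[:n_seeds]
```

A term that co-occurs with the domain name far more often than chance (e.g. "screen" with "mp3") gets a high score, while a domain-neutral term scores near zero.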
Opinion Relation Candidate Generation 
Opinion Word Candidates:
- adjectives or verbs
Opinion Target Candidates:
- nouns or noun phrases
Opinion Relation Candidates:
- get the dependency tree of a sentence using the Stanford Parser
- the shortest dependency path between a c_s and a c_t is
taken as a c_r
- to avoid introducing too many noise candidates, we
require that a c_r contains at most 4 terms
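The shortest-path extraction over a dependency tree can be sketched with a plain BFS. The edge-list representation below stands in for actual Stanford Parser output:

```python
from collections import deque

def shortest_dep_path(edges, source, target):
    """BFS over an undirected dependency tree given as (head, dependent)
    pairs. Returns the list of tokens on the shortest path, or None."""
    adj = {}
    for h, d in edges:
        adj.setdefault(h, []).append(d)
        adj.setdefault(d, []).append(h)
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def is_valid_candidate(path, max_terms=4):
    """Keep only relation candidates with at most 4 terms, per the slides."""
    return path is not None and len(path) <= max_terms
```

For "This mp3 has a clear screen", the path between the candidates "clear" and "screen" is direct, so it survives the length constraint.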
Word Representation by 
Word Embedding Learning 
Words are embedded into a vector space in which two words
that are more semantically similar to each other are
located closer together (similar in spirit to word2vec).
See more in the paper below, 
Ronan Collobert et al. 2011. Natural language processing
(almost) from scratch. Journal of Machine Learning Research.
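"Located closer" is typically measured with cosine similarity. A toy sketch with made-up 3-dimensional vectors (illustrative values, not trained embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings: "good" and "great" should lie closer together
# than "good" and "mp3".
emb = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "mp3":   [0.0, 0.1, 0.9],
}
```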
Linking Relation Representation by 
Using Recursive Auto-encoder 
Goal: represent the linking relation between an opinion
word and an opinion target by an n-element vector, as we do
for word representation.
We combine the embedding vectors of the words in a linking
relation with a recursive auto-encoder, following the syntactic
dependency structure.
Linking Relation Representation by 
Using Recursive Auto-encoder 
1. Extract the dependency relations between c_s and c_t.
2. Replace c_s and c_t with [SC] and [ST], respectively.
3. Treat the dotted region as a three-layer auto-encoder
that compresses two n-element vectors into a single
n-element vector in the hidden layer (equation below).
4. Update W until the Euclidean distance between the input
and the output is minimized.
Example: "too loud to listen to the player"
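The compression step (step 3) and the reconstruction error (step 4) can be sketched as follows. The parameter shapes and the tanh nonlinearity are assumptions, and the randomly initialized weights are untrained:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # embedding dimensionality (toy value)

# Hypothetical parameters: W compresses two n-vectors into one,
# W_out reconstructs the pair from the compressed vector.
W = rng.normal(scale=0.1, size=(n, 2 * n))
b = np.zeros(n)
W_out = rng.normal(scale=0.1, size=(2 * n, n))
b_out = np.zeros(2 * n)

def encode(v1, v2):
    """Compress two child vectors into one parent vector (step 3)."""
    return np.tanh(W @ np.concatenate([v1, v2]) + b)

def reconstruction_error(v1, v2):
    """Euclidean distance between the input pair and its reconstruction,
    the quantity minimized when training W (step 4)."""
    parent = encode(v1, v2)
    recon = np.tanh(W_out @ parent + b_out)
    return np.linalg.norm(np.concatenate([v1, v2]) - recon)
```

Applied recursively along the dependency structure, this yields a single n-element vector for the whole linking relation.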
One-Class Classification for 
Opinion Relation Detection 
We represent an opinion relation candidate c_o by a vector
v_o = [v_s; v_t; v_r], and this vector v_o is fed to the
upper-level auto-encoder.
For opinion relation detection, candidates whose error scores
are smaller than a threshold theta are classified as positive.
To estimate theta, we need to introduce a positive
proportion (pp) score as follows,
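A hedged sketch of the thresholding step; the paper's exact definition of the pp score and its theta selection rule may differ from the simple versions below:

```python
import numpy as np

def classify_positive(error_scores, theta):
    """Indices of candidates whose reconstruction error is below theta."""
    return [i for i, e in enumerate(error_scores) if e < theta]

def positive_proportion(error_scores, theta):
    """Fraction of candidates classified as positive at threshold theta
    (a sketch of the pp score)."""
    errors = np.asarray(error_scores)
    return float((errors < theta).mean())

def estimate_theta(error_scores, target_pp, thetas):
    """Pick the threshold whose positive proportion is closest to a
    target proportion (an assumed selection rule, for illustration)."""
    return min(thetas,
               key=lambda t: abs(positive_proportion(error_scores, t) - target_pp))
```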
Opinion Target Expansion 
We apply bootstrapping to iteratively expand the opinion
target seeds,
- because the vocabulary of the seed set is limited and cannot fully
represent the distribution of opinion targets.
After training OCDNN, all opinion relation candidates are
classified, and opinion targets are ranked in descending order
by,
Then the top M candidates are added to the target seed set
TS for the next training iteration.
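The bootstrapping loop can be sketched as below; the `train`, `classify`, and `rank_targets` hooks are hypothetical interfaces standing in for the actual model, not the paper's code:

```python
def bootstrap_targets(train, classify, rank_targets, seeds, m=20, max_iter=10):
    """Iteratively expand the target seed set TS.

    train:        retrains OCDNN on the current seed set (hypothetical hook)
    classify:     returns the opinion relation candidates the model accepts
    rank_targets: orders candidate targets by score, best first
    """
    ts = set(seeds)
    for _ in range(max_iter):
        model = train(ts)
        accepted = classify(model)
        ranked = [t for t in rank_targets(accepted) if t not in ts]
        if not ranked:  # no new targets to add: converged
            break
        ts.update(ranked[:m])  # add the top M candidates to TS
    return ts
```

Each iteration retrains on the enlarged seed set, so later rounds can accept targets the initial seeds could not cover.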
Datasets 
Datasets
- Customer Review Dataset (CRD)
  - contains reviews on five products (denoted by D1 to D5)
- benchmark datasets on MP3 and Hotel
- reviews crawled from www.amazon.com, covering Mattress and
Phone
Annotation
- 10,000 sentences are randomly selected from the reviews, and
annotators are required to judge whether each term is an opinion
word or an opinion target.
- 5,000 sentences are annotated for MP3 and Hotel. Annotators are
required to carefully read through each sentence and find
every opinion relation.
Evaluation Settings 
AdjRule
extracts opinion words/targets using adjacency rules
LRTBOOT
a bootstrapping algorithm which employs the Likelihood Ratio Test as its
co-occurrence statistical measure
DP
denotes the Double Propagation algorithm
DP-HITS
an enhanced version of DP using the HITS algorithm
OCDNN
the proposed method. The target seed size is N=40, the number of opinion
targets expanded in each iteration is M=20, and the maximum bootstrapping
iteration number is X=10.
AdjRule and LRTBOOT are statistical co-occurrence-based
methods; DP and DP-HITS are syntax-based methods.
Experiments 
DP-HITS does not extract opinion words, so its results for
opinion words are not taken into account.
Experiments 
n our method outperforms co-occurrence-based methods AdjRule 
and LRTBOOT 
n but achieves comparable or a little worse results than syntax-based 
methods DP and DP-HITS 
n because CRD is quite small, which only contains several 
hundred sentences for each product review set. In this case, 
methods based on careful-designed syntax rules have 
superiority over those based on statistics. 
n our method outperforms all of the competitors 
n OCDNN vs. DP-HITS: those two use similar term ranking metrics, 
but OCDNN significantly outperforms DP-HITS. Therefore, 
positive proportion is more effective than the importance score. 
n OCDNN vs. LRTBOOT: LRTBOOT is better recall but lower 
precision. This is because LRTBOOT follows Assumption 1, which 
suffers a lot from error propagation, while our joint classification 
approach effectively alleviates this issue.
Assumption 1 vs. Assumption 2 
n OCDNN significantly outperforms all competitors. The average 
improvement of F-measure over the best competitor is 6% on 
CRD and 9% on Hotel and MP3. 
n As Assumption 1 only verifies 2 of the requirements, it would 
inevitably introduce noise terms. 
n For syntax-based method DP, it extracts many false opinion 
relations such as good thing and nice one or objective expressions 
like another mp3 and every mp3. 
n For co-occurrence statistical methods AdjRule and LRTBOOT, it is 
very hard to deal with ambiguous linking relations. For example, in 
phrase this mp3 is very good except the size, co-occurrence 
statistical methods could hardly tell which opinion target does 
good modify (mp3 or size).
Conclusion 
p この論文では joint opinion relation detection を One- 
Class Deep Neural Network に適応させて分類を行った。 
p 特徴的な点は、 opinion words/targets/relations を同時 
に参照して分類することにある。 
p そして実験では、条件の2つしか適応させていなかった 
手法よりも良い結果を示すことが出来た。 
