Paper Review: Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Paper review, 2014/06/05
Nagaoka University of Technology
Natural Language Processing Laboratory
Shohei Okada

Reference
Bishan Yang and Claire Cardie.
Extracting Opinion Expressions with semi-Markov Conditional Random Fields.
In Proceedings of the 2012 Joint Conference on EMNLP and CoNLL, pp. 1335-1345. (2012)
Note: all equations, figures, and tables in these slides are quoted from the paper.

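Overview
• Opinion extraction, previously done at the token level, is performed at the segment level
• The semi-CRF model is extended to handle expressions of arbitrary length
• The results surpass state-of-the-art opinion extraction methods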

Background

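Background | Opinion extraction
• Framed as a labeling problem over two types of opinion expressions
  – direct subjective expressions (DSEs)
    • explicit mentions of private states, or speech events
  – expressive subjective expressions (ESEs)
    • expressions that indicate emotion or the like without stating it explicitly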

Background | Opinion extraction
• The International Committee of the Red Cross, [as usual][ESE], [has refused to make any statements][DSE].
• The Chief Minister [said][DSE] that [the demon they have reared will eat up their own vitals][ESE].

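Background | CRFs
• Previous methods are based on CRFs
  – labeling proceeds sequentially at the token level
  – each label is determined by the current token and the one preceding token
  – segment-based features cannot be used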

Background | semi-CRFs
• semi-CRFs (Sarawagi and Cohen, 2004)
  – labeling is done at the segment level
  – effective for named entity recognition
  – not yet applied to opinion extraction
Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. In Proceedings of NIPS 2004. (2004)

Background | Contribution
• A semi-CRF-based approach to opinion extraction that uses parse information
• It can handle expressions of arbitrary length

Method

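Semi-CRFs
A sentence $x$ is represented as a sequence of consecutive segments
$s = s_1, \dots, s_n$
$s_i = (t_i, u_i, y_i)$
• $t_i, u_i$: start and end positions of the segment ($1 \le u_i - t_i + 1 \le L$)
• $y_i$: label
• $L$: maximum segment length observed in the corpus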

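Semi-CRFs
• Features are generated at the segment level
  – the feature function is $g(i, x, s)$
  – it can also be written $g(x, t_i, u_i, y_i, y_{i-1})$ (by the first-order Markovian assumption)
$p(s \mid x) = \frac{1}{Z(x)} \exp\left\{ \sum_{i} \sum_{k} \lambda_k g_k(i, x, s) \right\}$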
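As a minimal sketch of the distribution above, the following Python enumerates every labeled segmentation of a short sentence by brute force and normalizes the exponentiated scores. The toy feature functions, the weights, the "START" label, and the helper names (`segmentations`, `features`, `score`, `all_labeled`, `prob`) are illustrative assumptions, not the features or code of the paper; a practical implementation would compute $Z(x)$ by dynamic programming rather than by enumeration.

```python
import math
from itertools import product

LABELS = ["NONE", "DSE", "ESE"]

def segmentations(n, max_len):
    """Yield every split of tokens 0..n-1 into inclusive (start, end) spans of length <= max_len."""
    if n == 0:
        yield []
        return
    for length in range(1, min(max_len, n) + 1):
        for rest in segmentations(n - length, max_len):
            yield rest + [(n - length, n - 1)]

def features(x, t, u, y, y_prev):
    """Toy stand-in for g(x, t_i, u_i, y_i, y_{i-1}); these features are hypothetical."""
    words = x[t:u + 1]
    return {
        "said_DSE": float(words == ["said"] and y == "DSE"),
        "long_ESE": float(len(words) >= 2 and y == "ESE"),
        "NONE_after_NONE": float(y == "NONE" and y_prev == "NONE"),
    }

def score(x, spans, labels, weights):
    """Sum over segments i and features k of lambda_k * g_k(i, x, s)."""
    total, y_prev = 0.0, "START"
    for (t, u), y in zip(spans, labels):
        for name, value in features(x, t, u, y, y_prev).items():
            total += weights.get(name, 0.0) * value
        y_prev = y
    return total

def all_labeled(x, max_len):
    """Yield every (spans, labels) pair for sentence x."""
    for spans in segmentations(len(x), max_len):
        for labels in product(LABELS, repeat=len(spans)):
            yield spans, list(labels)

def prob(x, spans, labels, weights, max_len):
    """p(s | x) = exp(score) / Z(x), with Z(x) computed by full enumeration."""
    z = sum(math.exp(score(x, sp, lb, weights)) for sp, lb in all_labeled(x, max_len))
    return math.exp(score(x, spans, labels, weights)) / z
```

For example, `prob(["He", "said", "no"], [(0, 0), (1, 1), (2, 2)], ["NONE", "DSE", "NONE"], {"said_DSE": 2.0}, 3)` gives the probability of labeling only "said" as a DSE under these made-up weights.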

Semi-CRFs
• The correct segmentation is defined as the sequence of the entities to be extracted and the remaining non-entity segments
Example:
(The,NONE),(Chief,NONE),(Minister,NONE),(said,DSE),(that,NONE),(the demon they have reared will eat up their own vitals,ESE),(.,NONE)
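The example segmentation can be reproduced with a short sketch: annotated expressions become segments with their labels, and every remaining token becomes a length-one NONE segment. The function name `to_segmentation` and the span format (inclusive token indices) are assumptions made for illustration.

```python
def to_segmentation(tokens, entities):
    """Build a segmentation from non-overlapping (start, end, label) spans.

    Indices are inclusive token positions. Tokens outside every span
    become single-token NONE segments.
    """
    by_start = {start: (end, label) for start, end, label in entities}
    segments, i = [], 0
    while i < len(tokens):
        if i in by_start:
            end, label = by_start[i]
        else:
            end, label = i, "NONE"
        segments.append((" ".join(tokens[i:end + 1]), label))
        i = end + 1
    return segments

tokens = ("The Chief Minister said that the demon they have reared "
          "will eat up their own vitals .").split()
entities = [(3, 3, "DSE"), (5, 15, "ESE")]
segmentation = to_segmentation(tokens, entities)
```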

Proposed method | Overview
• $L$ is not fixed
  – because an entire sentence can be an opinion expression
• Parse information is used
  – segment units are determined from the parse tree
  – a leaf phrase or a leaf word can be a unit
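One way to read "a leaf phrase or a leaf word can be a unit" is sketched below. It assumes, as an interpretation not confirmed by the slides, that a leaf phrase is a constituent whose children are all words, and that any word outside such a phrase is a unit by itself. The nested-tuple tree format and the names `is_preterminal` and `segment_units` are hypothetical.

```python
def is_preterminal(node):
    """A (POS, word) pair, e.g. ("VBD", "said")."""
    return len(node) == 2 and isinstance(node[1], str)

def segment_units(tree):
    """Return the segment units of a parse tree as lists of words, left to right."""
    if is_preterminal(tree):
        return [[tree[1]]]                      # a leaf word is its own unit
    children = tree[1:]
    if all(is_preterminal(c) for c in children):
        return [[c[1] for c in children]]       # a leaf phrase forms one unit
    units = []
    for child in children:
        units.extend(segment_units(child))      # otherwise descend into the children
    return units

tree = ("S",
        ("NP", ("DT", "The"), ("NNP", "Chief"), ("NNP", "Minister")),
        ("VP", ("VBD", "said"),
               ("NP", ("DT", "the"), ("NN", "truth"))),
        (".", "."))
units = segment_units(tree)
```

Under this reading "The Chief Minister" is a single unit, while "said" and "." are units on their own.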

Proposed method | Segmentation

A segment like this is not possible

Units that share the rightmost boundary can belong to the same segment

• Obtain the correct segmentation for each sentence in the training data
Example:
(The Chief Minister,NONE),(said,DSE),
(that,NONE),(the demon they have reared will eat up their own vitals,ESE),(.,NONE)

Proposed method | Learning
• Train the semi-CRF model
  – find the parameters $\lambda$ that maximize the log-likelihood
  – the limited-memory BFGS algorithm is used
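Written out with the model $p(s \mid x)$ from the Semi-CRFs slide, the objective would take the following form. The training-example index $j$ and the symbol $\mathcal{L}$ are notation assumed here, and any regularization term the paper may use is omitted.

```latex
\mathcal{L}(\lambda)
  = \sum_{j} \log p\bigl(s^{(j)} \mid x^{(j)}\bigr)
  = \sum_{j} \Bigl[ \sum_{i} \sum_{k} \lambda_k \, g_k\bigl(i, x^{(j)}, s^{(j)}\bigr)
      - \log Z\bigl(x^{(j)}\bigr) \Bigr]
```

Here $(x^{(j)}, s^{(j)})$ is the $j$-th training sentence with its correct segmentation.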

Proposed method | Features
• CRF-style features (token-level)
  – the word string, POS, and lexicon-based features
• segment-level features
  – segment position and syntax-based features

• Lexicon-based features
  – subjectivity lexicon (Wilson et al. 2005)
  – a set of words that can act as strong/weak cues to subjectivity
  – token level: $x$ is 푔푔푔푔푔 → segment level: $s$ contains 푔푔푔푔푔
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT '05. (2005)
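The step from a token-level lexicon feature to a segment-level one can be sketched as follows. The three-word `LEXICON` and its strong/weak assignments are invented for illustration and are not entries of the actual subjectivity lexicon; the function names are likewise hypothetical.

```python
# Hypothetical mini-lexicon mapping a word to its cue strength.
LEXICON = {"refused": "strong", "usual": "weak", "demon": "strong"}

def token_feature(word):
    """Token level: the cue strength of this word, or None if it is not in the lexicon."""
    return LEXICON.get(word.lower())

def segment_features(words):
    """Segment level: whether the segment contains a strong or a weak cue anywhere."""
    found = {token_feature(w) for w in words} - {None}
    return {"contains_strong": "strong" in found, "contains_weak": "weak" in found}

dse = "has refused to make any statements".split()
dse_features = segment_features(dse)
```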

• Syntax-based features
  – focus on verb phrases

[Figures: VPROOT, VPLEAF, predicate, argument, verb-cluster segment, VP segment]

• VPcluster: whether the segment is a verb-cluster structure
• VPpred: predicate
• VParg: argument
• VPsubj: whether the segment contains an entry of the subjectivity lexicon

Experiments

Experiments | Setup
• MPQA 1.2 corpus
  – 535 news articles, 11,114 sentences
  – annotated at the phrase level
• 135 articles are used as the development set
• The remaining 400 articles are evaluated with 10-fold cross-validation
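A 10-fold split over the 400 evaluation articles could look like the sketch below. How the paper actually assigned articles to folds is not stated on the slide, so the round-robin assignment and the name `k_folds` are assumptions.

```python
def k_folds(items, k=10):
    """Yield (train, test) pairs; each item appears in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]   # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

articles = list(range(400))          # stand-ins for the 400 article IDs
splits = list(k_folds(articles))
```

Each of the 10 splits then trains on 360 articles and tests on the remaining 40.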

Experiments | Evaluation metrics
• precision, recall, F-measure
  – the boundaries of opinion expressions are not clear-cut
• Binary-Overlap (Breck et al. 2007)
• Proportional-Overlap (Johansson and Moschitti 2010)
Eric Breck, Yejin Choi, and Claire Cardie. Identifying expressions of opinion in context. IJCAI '07. (2007)
Richard Johansson and Alessandro Moschitti. Syntactic and semantic structure for opinion expression detection. In Proceedings of CoNLL '10. (2010)
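The two overlap metrics can be sketched as below, under the usual reading of the cited definitions: Binary-Overlap counts a span as correct if it overlaps any span on the other side, and Proportional-Overlap credits each pair by the fraction of the span that is covered. Spans are inclusive (start, end) token indices, both lists are assumed non-empty, and the function names are hypothetical.

```python
def overlap(a, b):
    """Number of tokens shared by two inclusive (start, end) spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def length(span):
    return span[1] - span[0] + 1

def binary_overlap(gold, pred):
    """Precision and recall where any overlap counts as a match."""
    precision = sum(any(overlap(p, g) for g in gold) for p in pred) / len(pred)
    recall = sum(any(overlap(g, p) for p in pred) for g in gold) / len(gold)
    return precision, recall

def proportional_overlap(gold, pred):
    """Precision and recall weighted by the covered fraction of each span."""
    precision = sum(overlap(g, p) / length(p) for g in gold for p in pred) / len(pred)
    recall = sum(overlap(g, p) / length(g) for g in gold for p in pred) / len(gold)
    return precision, recall

def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 0), (2, 11)]
pred = [(0, 0), (2, 6), (13, 13)]
```

With these spans the half-covered gold expression counts fully toward Binary-Overlap recall but only as 0.5 toward Proportional-Overlap recall.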

Experiments | Baselines
• CRF: token-level CRF-based approach
• segment-CRF: uses the parser output as segments
• syntactic-CRF: token-level CRF with segment-level syntactic information added as features
• semi-CRF: the model of Sarawagi and Cohen

Experiments | Results
• Binary-Overlap metric

Experiments | Results
• Proportional-Overlap metric

Experiments | Results
• Effect of the syntactic features

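Experiments | Discussion
• The precision of semi-CRF(-new) is lower than that of CRF
  – because CRF extracts only about half of the gold-standard expressions
• "said" and "told" used to report facts are wrongly extracted as DSEs
  – adding features is expected to improve this
• "enjoy a relative advantage" is wrongly extracted as an ESE
  – taking the subject into account (here, "products") would improve this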

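Conclusion
• A semi-CRF-based approach to opinion extraction
  – uses parse information
• Performance can be improved by adding features that take subjectivity cues into account
• Future work: application to other opinion analysis tasks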
