13 naacl-a latent variable model-qiu and jiang-slides

A Latent Variable Model for
Viewpoint Discovery
from Threaded Forum Posts

Minghui Qiu and Jing Jiang
School of Information Systems
Singapore Management University

1

Threaded Forums

• Threaded structure
• With "reply-to" relations (user interactions)
• Multiple threads on the same issue
2

Contrastive viewpoints in Threaded Forums
Each Coin Has Two Sides

The Chinese athlete Liu Xiang quit the London Olympic Games

Pro Obama or
Anti Obama?

How to find contrastive viewpoints
from threaded forum posts?
3

Task and Method Overview
Finding viewpoints for posts

Finding viewpoints for users

A corpus on
one controversial issue

Method
• A unified model for finding contrastive viewpoints (two-viewpoint)
from threaded forum posts
• We build our model based on three observations
4

Observation 1: Different Viewpoints Will
Have Different Topic Preference
• Our findings on the “LiuXiang” data set (“Will you
support LiuXiang after he failed in the London Olympic
Games?”)
[Figure: chart with a y-axis from 0 to 0.16 and x-axis labels 21, 34, 39, 28, 22, 6, 19, 31, 4, 37, 14, 8, 16, 12, 13, 30, 17, 11, 7, 18, with one series per viewpoint: Support LiuXiang and Against LiuXiang. Callouts on the chart: “disappointed, athlete, ad sponsors” and “Olympic hero, sympathy on his injury”.]

Topic focus of two viewpoints on “LiuXiang” Data Set
5

Observation 1: Different Viewpoints Will
Have Different Topic Preference
• Framing [1]
– Users with different sentiments/positions focus on
different aspects of the topic. E.g.:
– Users for “iPhone”: “hardware and build”, “siri”, “ios”
– Users against “iPhone”: “physical keyboard”, “android”, “galaxy”

• Model assumption
– Each viewpoint has its own topic distribution

[1] Tversky, A. and Kahneman, D. (1981). The framing of decisions and
the psychology of choice. Science, 211(4481):453–458.
6

Observation 2: the Same User Will Hold
the Same Viewpoint Towards an Issue
• User consistency
– Posts from the same user tend to have the same
viewpoint towards an issue
– A viewpoint can be derived from the set of posts
towards the same issue grouped by the same user ID

– There is a user-level viewpoint distribution
– For each post by a user, its viewpoint is drawn from
the corresponding user's viewpoint distribution
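
The user-consistency assumption can be sketched as a small sampling routine. This is a minimal illustration, not the authors' implementation: the function name `sample_post_viewpoints`, the symmetric Beta prior with parameter `eta`, and the 0/1 encoding of the two viewpoints are all assumptions made for the example.

```python
import random

def sample_post_viewpoints(num_posts, eta=0.1, rng=None):
    """Draw a user-level viewpoint distribution, then one viewpoint per post.

    eta is a hypothetical prior parameter: a small value makes the Beta
    prior sparse, so most users strongly prefer one of the two viewpoints.
    """
    rng = rng or random.Random()
    # Probability that a post by this user expresses viewpoint 1.
    p_view1 = rng.betavariate(eta, eta)
    # Each post's viewpoint is drawn from the user's own distribution.
    viewpoints = [1 if rng.random() < p_view1 else 0 for _ in range(num_posts)]
    return p_view1, viewpoints

p_view1, viewpoints = sample_post_viewpoints(10, rng=random.Random(0))
print(round(p_view1, 3), viewpoints)
```

With a sparse prior, the sampled posts of one user mostly share a single viewpoint, which is the behaviour the observation describes.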

7

Observation 3: User Interactions Reveal
User Viewpoints
• User interaction
– User interaction: a post in reply to another user
– Users with the same viewpoint tend to have positive
interactions among themselves, while users with different
viewpoints tend to have negative interactions

• Sample positive and negative interactions

8

Observation 3: User Interactions Reveal
User Viewpoints
– Interaction polarity is generated based on the
viewpoint of the current post and the viewpoint of
recipient post(s)
[Diagram: an example exchange between User 1 and User 2. Posts with a known viewpoint are labeled v1; the viewpoint of the reply “I agree with your post Dan. Obama is so …” is unknown (?). The reply is a positive interaction.]

p(NEG) = 1 - p(POS)
9

Overview of the Model
• A probabilistic model based on three
observations
– Each viewpoint's topic preference
– User consistency
– User interaction

10

Related Work
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic
are orthogonal
– No user interaction

• Cross-Perspective Topic Model (Fang et al.,
WSDM'12)
– Supervised model

• Subgroup detection
– Mining user opinions (Abu-Jbara et al., ACL'12)
– User interaction (Hassan et al., EMNLP'12)
– Does not model viewpoints
11

A Probabilistic Model
[Plate diagram: nodes w, z, x, y and s, with plates labeled Y, T, L, N and U. Annotated components: topic-specific word distribution, viewpoint-specific topic distribution, user-level viewpoint distribution, interaction type.]

• U: # of users
• N: # of posts
• L: # of words
• z: a topic label
• x: a switch
– x=0: w is a background word
– x=1: w is a topical word
• y: a viewpoint label
• s: an interaction type

The polarity of each interaction type is learnt
beforehand.
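
The variables in the legend can be read as a generative story for one post. The sketch below is a simplified illustration under stated assumptions, not the model as published: it draws one topic per post, uses a fixed switch probability `p_topical`, omits the interaction variable s, and all distributions and names (`generate_post`, `draw`) are made up for the example.

```python
import random

def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def generate_post(user_viewpoint_dist, topic_dist_by_viewpoint,
                  word_dist_by_topic, background_word_dist,
                  num_words, p_topical=0.7, rng=None):
    """Generate (y, z, words) for one post under the simplified process."""
    rng = rng or random.Random()
    y = draw(user_viewpoint_dist, rng)         # viewpoint label of the post
    z = draw(topic_dist_by_viewpoint[y], rng)  # topic label (assumed: one per post)
    words = []
    for _ in range(num_words):
        x = 1 if rng.random() < p_topical else 0  # switch: topical or background
        source = word_dist_by_topic[z] if x == 1 else background_word_dist
        words.append(draw(source, rng))
    return y, z, words

# Toy, made-up distributions loosely themed on the "LiuXiang" example.
user_viewpoint_dist = {"support": 0.9, "against": 0.1}
topic_dist_by_viewpoint = {
    "support": {"injury": 0.8, "sponsors": 0.2},
    "against": {"injury": 0.2, "sponsors": 0.8},
}
word_dist_by_topic = {
    "injury": {"tendon": 0.5, "surgery": 0.5},
    "sponsors": {"ad": 0.5, "money": 0.5},
}
background_word_dist = {"the": 0.6, "is": 0.4}

y, z, words = generate_post(user_viewpoint_dist, topic_dist_by_viewpoint,
                            word_dist_by_topic, background_word_dist,
                            num_words=6, rng=random.Random(1))
print(y, z, words)
```

Because the topic distribution is indexed by the viewpoint y, posts with different viewpoints tend to land on different topics, which is how the sketch encodes the first observation.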
12

Polarity Prediction for Interaction Type
• Supervised learning
– Requiring labeled data

• Unsupervised approach
– Sample sentence: I agree with you
– Finding interaction expressions
• Find sentences that contain mentions of the recipient (user
name or 2nd-person pronoun), e.g. “you”
• Surrounding words: a text window of 8 words, e.g. “I agree”
– Interaction polarity
• Positive if there are more positive sentiment words, otherwise
negative
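
The unsupervised heuristic above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' classifier: the tiny sentiment lexicons are hypothetical, the window is taken as 8 words on each side of the mention, the recipient name is assumed to be a single token, and returning `None` when no mention is found is a choice made for the example.

```python
import re

# Hypothetical miniature sentiment lexicons, for illustration only.
POSITIVE_WORDS = {"agree", "right", "good", "great", "true"}
NEGATIVE_WORDS = {"disagree", "wrong", "bad", "stupid", "nonsense"}
SECOND_PERSON = {"you", "your", "yours"}

def interaction_polarity(post_text, recipient_name, window=8):
    """Return "POS", "NEG", or None if no interaction expression is found."""
    mentions = SECOND_PERSON | {recipient_name.lower()}
    pos = neg = 0
    found = False
    for sentence in re.split(r"[.!?]+", post_text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for i, token in enumerate(tokens):
            if token not in mentions:
                continue
            found = True
            # Words surrounding the mention (window on each side, assumed).
            context = tokens[max(0, i - window): i + window + 1]
            pos += sum(t in POSITIVE_WORDS for t in context)
            neg += sum(t in NEGATIVE_WORDS for t in context)
    if not found:
        return None
    # Positive only if positive sentiment words outnumber negative ones.
    return "POS" if pos > neg else "NEG"

print(interaction_polarity("I agree with you", "Dan"))
```

A sentence with several mentions counts its sentiment words once per mention in this sketch; a more careful version would deduplicate overlapping windows.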

13

Evaluation
• Data Sets
– English Data Sets
• Three most discussed threads from Abu-Jbara et al., ACL'12

– Chinese Data Sets
• Three popular controversial issues in TianYaClub (one of the
most popular Chinese online forums)

• Statistics

14

Data Annotation
• Identification of viewpoints
– 150 randomly sampled posts, two annotators
(Cohen's kappa agreement ≥ 0.61)

• Identification of user groups
– 150 randomly sampled users, two annotators
(Cohen's kappa agreement ≥ 0.70)

Labeling a user's viewpoint is easier
than labeling a post's viewpoint
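
For reference, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance. The sketch below computes it from two label lists; the function name `cohens_kappa` and the toy labels are illustrative and are not the annotation data from the slides.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labeled independently at random
    # according to their own label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.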
15

Baselines
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic
are orthogonal

• Degenerate variants of our model
– UIM: User interaction model (part of our model)
– JVTM: Joint viewpoint-topic model (our model without
interaction)
– JVTM-G: JVTM with a global viewpoint distribution

16

Identification of Viewpoints
• Task
– To identify each post's viewpoint

• Results
– Our model significantly outperforms other models (at the
10% significance level)
– Effectiveness of assumptions
• Each viewpoint's topic preference: JVTM > TAM
• User consistency: JVTM > JVTM-G
• User interaction: JVTM-UI > others
– User interaction is more important than other factors

Averaged results of the models in
identification of viewpoints
17

Identification of User Groups
• Subgroup detection
– To detect ideological subgroups, i.e.: user groups with
different viewpoints

• Results
– Our model significantly outperforms other methods (at the
10% significance level)
– Effectiveness of assumptions
• Each viewpoint's topic preference: JVTM > TAM
• User consistency: JVTM > JVTM-G
• User interaction: JVTM-UI > others

Averaged results of the models in
identification of viewpoints
18

Qualitative Analysis
• User interaction network on “will you vote
obama”

Green (left) and white (right) nodes represent users with two
different viewpoints discovered by our model. Red (thin) edges
represent negative interactions while blue (thick) edges represent
positive interactions
More intra-cluster positive interactions and
More inter-cluster negative interactions
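
The pattern described for the network can be checked by counting interactions within and across the two discovered groups. The following is a small sketch on made-up data; the function name `interaction_summary`, the user IDs, and the edge list are hypothetical and do not come from the “will you vote obama” data set.

```python
from collections import Counter

def interaction_summary(user_viewpoint, interactions):
    """Count interactions by cluster relation (intra/inter) and polarity."""
    counts = Counter()
    for sender, recipient, polarity in interactions:
        same_cluster = user_viewpoint[sender] == user_viewpoint[recipient]
        counts[("intra" if same_cluster else "inter", polarity)] += 1
    return counts

# Hypothetical users with a discovered viewpoint (0 or 1) and labeled replies.
user_viewpoint = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
interactions = [
    ("u1", "u2", "POS"),
    ("u3", "u4", "POS"),
    ("u1", "u3", "NEG"),
    ("u2", "u4", "NEG"),
    ("u2", "u3", "POS"),
]

summary = interaction_summary(user_viewpoint, interactions)
print(summary[("intra", "POS")], summary[("inter", "NEG")])
```

If the observation holds on real data, the intra-cluster positive and inter-cluster negative counts should dominate the other two cells.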
19

• Users with different viewpoints tend to have
different topic focus
[Figure: chart with a y-axis from 0 to 0.16 and x-axis labels 21, 34, 39, 28, 22, 6, 19, 31, 4, 37, 14, 8, 16, 12, 13, 30, 17, 11, 7, 18, with one series per viewpoint: Support LiuXiang and Against LiuXiang.]

Topic focus of two viewpoints on “LiuXiang” Data Set

20

• Top 4 topics for “supporting LiuXiang” viewpoint
Top words across the four topics, as word (translation) pairs:
刘翔 (LiuXiang), 栏 (hurdle), 运动员 (athlete), 第一 (first), 冠军 (champion), 伤 (injury),
奥运会 (Olympic), 时间 (time), 赛后 (after-game), 成绩 (record), 跟腱 (Achilles tendon), 奥运 (Olympic),
田径 (track and field), 摔倒 (fall), 北京 (Beijing), 获得 (achieve), 男子 (man), 13秒 (13s),
脚 (foot), 一个 (one), 最后 (finally), 手术 (surgery), 届 (time), 刘 (Liu),
决赛 (final), 伦敦 (London), 田联 (IAAF), 情况 (condition), 奥运会 (Olympic), 英国 (Britain),
医生 (doctor), 参加 (attend), 受伤 (hurt), 上海 (Shanghai), 训练 (train), 重 (heavy),
跑 (run), 导致 (result in), 记者 (reporter), 好 (good), 已经 (already), 赛场 (field),
断裂 (broken), 遗憾 (pity), 纪录 (record), 英雄 (hero), 团队 (team), 联赛 (league matches),
12秒 (12s), 需要 (need), 夺冠 (champion), 跳 (jump), 跑道 (report), 当时 (that time),
退役 (retire), 预赛 (first heat), 2012年 (2012), 罗伯斯 (Robles), 第二 (2nd), 伟大 (great)

21

• Top 4 topics for “against LiuXiang” viewpoint
Top words across the four topics, as word (translation) pairs:
帖 (post), 社区 (community), 发自 (origin from), 随时 (anytime), 天涯 (tianya), 楼主 (poster),
天涯 (tianya), 抵制 (resist), 热点 (hot), 老板 (boss), 猫 (sneak), 骗子 (liar),
围观 (apathetic), 政协 (CPPCC), 妈 (F**K), 体坛 (sports), 傻逼 (fool), 帮 (those),
水 (spam), 最 (least), 钱 (money), 水军 (spam), 笑 (laugh), 骂 (scold),
孙子 (foolish), 唯金牌论 (gold medal only theory), 微笑 (smile), 顶 (support), 恶心 (nausea), 可口可乐 (Coca Cola),
喝 (drink), 笑话 (joke), 孙子 (foolish), 啤酒 (beer), 杨 (yang), 全家 (whole family),
别有用心 (ulterior motive), 躲 (hide), 提 (mention), 吃 (eat), 牌 (medal), 苦笑 (bitter smile),
高尚 (noble), 有力 (powerful), 你们 (you), 加油 (cheer up), 歪风 (bad tendency), 劳民伤财 (a waste of money and manpower),
多么 (extremely), 脱离 (separate), 有人 (someone), 脸上 (face), 看看 (look), 精神 (spirit),
黄继光 (a hero), 神像 (fame)
Words whose translations could not be matched unambiguously: 枪眼, 神位, 滩, 黑 (translations: force of public opinion, fame, spam, those)
22

Summary
• Conclusion
– A viewpoint discovery model for threaded forums
– Modeling three observations
• Viewpoint-specific topic distribution (Framing)
• User consistency
• Interplay between user interactions and viewpoints

• Future work
– Document representation: complex lexical units
– A more accurate interaction polarity classifier
– Contrastive viewpoint summarization
– Mining controversial issues and finding viewpoints
23

Reference
• [Paul et al., AAAI'10] Paul, M. J. and Girju, R. (2010). A two-dimensional
topic-aspect model for discovering multi-faceted topics. In AAAI.
• [Abu-Jbara et al., ACL'12] Abu-Jbara, A. et al. (2012). Subgroup
detection in ideological discussions. In ACL.
• [Fang et al., WSDM'12] Fang, Y. et al. (2012). Mining contrastive
opinions on political texts using cross-perspective topic model. In
WSDM, pages 63–72.
• [Hassan et al., EMNLP'12] Hassan et al. (2012). Detecting
subgroups in online discussions by modeling positive and negative
relations among participants. In EMNLP.

25
