Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a principled generative latent variable model that captures three important factors: viewpoint-specific topic preference, user ID, and user interactions. Evaluation results show that our model clearly outperforms a number of baseline models in terms of both clustering posts based on viewpoints and clustering users with different viewpoints.
NAACL 2013 slides (Qiu and Jiang)
1. A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts
Minghui Qiu and Jing Jiang
School of Information Systems
Singapore Management University
2. Threaded Forums
• Threaded structure
• With “reply-to” relations (user interactions)
• Multiple threads on the same issue
3. Contrastive Viewpoints in Threaded Forums
Each Coin Has Two Sides
The Chinese athlete Liu Xiang quit the London Olympic Games
Pro-Obama or anti-Obama?
How can we find contrastive viewpoints from threaded forum posts?
4. Task and Method Overview
• Task: given a corpus on one controversial issue,
– Find viewpoints for posts
– Find viewpoints for users
Method
• A unified model for finding contrastive viewpoints (two viewpoints) from threaded forum posts
• We build our model based on three observations
5. Observation 1: Different Viewpoints Will Have Different Topic Preference
• Our findings on the “LiuXiang” data set (“Will you support LiuXiang after he failed in the London Olympic Games?”)
[Figure: Topic focus of two viewpoints on “LiuXiang” data set. Bar chart of topic proportion (y-axis, 0 to 0.16) per topic ID (x-axis) for “Support LiuXiang” vs. “Against LiuXiang”; highlighted topics: “Olympic hero, sympathy on his injury” and “disappointed, athlete, ad sponsors”]
6. Observation 1: Different Viewpoints Will Have Different Topic Preference
• Framing¹
– Users with different sentiments/positions tend to focus on different aspects of the topic, e.g.:
– Pro-iPhone users: “hardware and build”, “siri”, “ios”
– Anti-iPhone users: “physical keyboard”, “android”, “galaxy”
• Model assumption
– Each viewpoint has its own topic distribution
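A minimal sketch of this assumption, assuming each viewpoint y owns one row of a viewpoint-by-topic matrix (here called `theta`, a hypothetical name) and that topical words in a post with viewpoint y draw their topic from that row. The helper names `top_topics` and `sample_topic` are illustrative, and the numbers are toy values rather than learned parameters.

```python
import numpy as np

# Hypothetical toy viewpoint-specific topic distributions:
# theta[y, t] = p(topic t | viewpoint y). Two viewpoints, four topics.
# Illustrative values only, not parameters learned by the model.
theta = np.array([
    [0.50, 0.30, 0.15, 0.05],  # viewpoint 0 prefers topics 0 and 1
    [0.05, 0.15, 0.30, 0.50],  # viewpoint 1 prefers topics 3 and 2
])


def top_topics(theta, k):
    """Return, for each viewpoint, its k most probable topic indices (most probable first)."""
    return [[int(t) for t in np.argsort(row)[::-1][:k]] for row in theta]


def sample_topic(theta, y, rng):
    """Draw a topic label z for a topical word in a post whose viewpoint is y."""
    return int(rng.choice(theta.shape[1], p=theta[y]))


rng = np.random.default_rng(0)
print(top_topics(theta, 2))         # [[0, 1], [3, 2]]
print(sample_topic(theta, 1, rng))  # a topic index drawn from theta[1]
```

Because the two rows differ, the same topic has different probabilities under the two viewpoints, which is the "framing" effect the slide describes.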
¹ A. Tversky and D. Kahneman. The framing of decisions and the psychology of choice. Science, 211:453–458, 1981.
7. Observation 2: The Same User Will Hold the Same Viewpoint Towards an Issue
• User consistency
– Posts from the same user tend to have the same viewpoint towards an issue
– A user's viewpoint can be derived from the set of posts on the same issue grouped by user ID
• Model assumption
– There is a user-level viewpoint distribution
– For each post by a user, its viewpoint is drawn from that user's viewpoint distribution
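The user-level assumption can be sketched as follows, assuming a Dirichlet prior `gamma` over each user's viewpoint distribution pi_u; the prior, the function names `sample_post_viewpoints` and `estimate_user_viewpoint`, and all values are illustrative assumptions. A small `gamma` makes pi_u peaked, so posts by the same user tend to share one viewpoint; the second function recovers a smoothed estimate of pi_u from the viewpoints assigned to one user's posts.

```python
import numpy as np


def sample_post_viewpoints(posts_per_user, gamma, rng):
    """Generative view (assumed Dirichlet prior): for each user u draw
    pi_u ~ Dirichlet(gamma), then draw each of u's post viewpoints from Categorical(pi_u)."""
    all_views = []
    for n_posts in posts_per_user:
        pi_u = rng.dirichlet(gamma)  # user-level viewpoint distribution
        all_views.append(rng.choice(len(gamma), size=n_posts, p=pi_u))
    return all_views


def estimate_user_viewpoint(post_views, num_viewpoints, gamma0):
    """Smoothed estimate of pi_u from the viewpoints assigned to one user's posts
    (posterior mean under a symmetric Dirichlet(gamma0) prior)."""
    counts = np.bincount(post_views, minlength=num_viewpoints)
    return (counts + gamma0) / (counts.sum() + num_viewpoints * gamma0)


rng = np.random.default_rng(0)
# A small gamma gives a peaked pi_u, so each user's posts mostly share one viewpoint.
for views in sample_post_viewpoints([5, 5, 5], [0.1, 0.1], rng):
    print(views)
print(estimate_user_viewpoint(np.array([0, 0, 0, 1]), 2, 1.0))  # approximately [0.667, 0.333]
```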
8. Observation 3: User Interactions Reveal User Viewpoints
• User interaction
– User interaction: a post in reply to another user
– Users with the same viewpoint tend to have positive interactions among themselves, while users with different viewpoints tend to have negative interactions
• Sample positive and negative interactions
9. Observation 3: User Interactions Reveal User Viewpoints
• Model assumption
– Interaction polarity is generated based on the viewpoint of the current post and the viewpoint(s) of the recipient post(s)
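[Figure: example reply. User 2 replies to User 1, whose posts (e.g., post 2) have viewpoint v1, with “I agree with your post Dan. Obama is so …”. Given the unknown viewpoint (?) of User 2's post and the viewpoint of the recipient post, this positive interaction has probability p(POS), and a negative one has p(NEG) = 1 - p(POS)]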
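One way to realize this assumption is a table of Bernoulli parameters indexed by the two viewpoints; the sketch below assumes exactly that (the table `rho`, its values, and the function names are hypothetical). It also shows why an observed polarity carries information: a positive reply to a post with known viewpoint shifts the current post's viewpoint toward the recipient's.

```python
import numpy as np

# Hypothetical Bernoulli table: rho[y, y_r] = p(POS | viewpoint y of the current post,
# viewpoint y_r of the recipient post). Matching viewpoints favor positive replies.
rho = np.array([
    [0.8, 0.2],
    [0.2, 0.8],
])


def p_polarity(s, y, y_r, rho):
    """Probability of polarity s ("POS" or "NEG"); p(NEG) = 1 - p(POS)."""
    p_pos = rho[y, y_r]
    return p_pos if s == "POS" else 1.0 - p_pos


def viewpoint_posterior(prior, s, y_r, rho):
    """Posterior over the current post's viewpoint after observing polarity s
    toward a recipient post whose viewpoint is y_r (Bayes' rule)."""
    likelihood = np.array([p_polarity(s, y, y_r, rho) for y in range(len(prior))])
    unnormalized = np.asarray(prior) * likelihood
    return unnormalized / unnormalized.sum()


# Uniform prior, then a positive reply to a viewpoint-0 post:
print(viewpoint_posterior([0.5, 0.5], "POS", 0, rho))  # approximately [0.8, 0.2]
```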
10. Overview of the Model
• A probabilistic model based on three observations
– Each viewpoint's topic preference
– User consistency
– User interaction
11. Related Work
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic are orthogonal
– No user interaction
• Cross-Perspective Topic Model (Fang et al., WSDM'12)
– Supervised model
• Subgroup detection
– Mining user opinions (Abu-Jbara et al., ACL'12)
– User interaction (Hassan et al., EMNLP'12)
– Does not model viewpoints
12. A Probabilistic Model
[Figure: plate diagram of the model, showing the topic-specific word distributions, the viewpoint-specific topic distributions, the user-level viewpoint distribution, and the interaction type; plate labels Y, T, U, N, and L]
• U: # of users
• N: # of posts
• L: # of words
• z: a topic label
• x: a switch
– x=0: w is a background word
– x=1: w is a topical word
• y: a viewpoint label
• s: an interaction type
The polarity of each interaction type is learned beforehand.
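Putting the pieces together, a toy forward simulation of the generative story might look like the sketch below. The Dirichlet priors, the switch probability `lam`, all hyperparameter values, and the function name `generate` are assumptions for illustration, not the authors' settings. Interaction types s are treated as observed (their polarity is learned beforehand), so they are not generated here.

```python
import numpy as np


def generate(num_users, posts_per_user, words_per_post, num_viewpoints, num_topics,
             vocab_size, rng, gamma=0.5, alpha=0.5, beta=0.1, lam=0.7):
    """Toy forward simulation; priors and hyperparameter values are hypothetical."""
    phi = rng.dirichlet([beta] * vocab_size, size=num_topics)         # topic-specific word distributions
    phi_bg = rng.dirichlet([beta] * vocab_size)                       # background word distribution
    theta = rng.dirichlet([alpha] * num_topics, size=num_viewpoints)  # viewpoint-specific topic distributions
    corpus = []
    for u in range(num_users):
        pi_u = rng.dirichlet([gamma] * num_viewpoints)  # user-level viewpoint distribution
        for _ in range(posts_per_user):
            y = int(rng.choice(num_viewpoints, p=pi_u))  # viewpoint of this post
            words = []
            for _ in range(words_per_post):
                if rng.random() < lam:  # switch x = 1: topical word
                    z = rng.choice(num_topics, p=theta[y])
                    w = rng.choice(vocab_size, p=phi[z])
                else:                   # switch x = 0: background word
                    w = rng.choice(vocab_size, p=phi_bg)
                words.append(int(w))
            corpus.append({"user": u, "viewpoint": y, "words": words})
    return corpus


corpus = generate(num_users=3, posts_per_user=2, words_per_post=5, num_viewpoints=2,
                  num_topics=4, vocab_size=20, rng=np.random.default_rng(0))
print(len(corpus), corpus[0])
```

Inference (e.g., Gibbs sampling) would run this story in reverse to recover y, z, and x from observed words, users, and interactions; that step is not shown.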
13. Polarity Prediction for Interaction Type
• Supervised learning
– Requires labeled data
• Unsupervised approach
– Sample sentence: “I agree with you”
– Finding interaction expressions
• Find sentences that contain mentions of the recipient (user name or second-person pronoun), e.g., “you”
• Surrounding words: a text window of 8 words, e.g., “I agree”
– Interaction polarity
• Positive if there are more positive than negative sentiment words; otherwise negative
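The unsupervised heuristic above can be sketched in a few lines of Python. The tiny sentiment lexicons, the pronoun set, the regex sentence splitter, and reading “a text window of 8 words” as up to 8 tokens on each side of the mention are all assumptions, not the authors' implementation.

```python
import re

# Hypothetical toy lexicons; a real system would use a full sentiment lexicon.
POSITIVE = {"agree", "right", "good", "great", "support"}
NEGATIVE = {"disagree", "wrong", "bad", "stupid", "liar"}
SECOND_PERSON = {"you", "your", "yours"}


def interaction_polarity(post_text, recipient_name, window=8):
    """Label a reply 'POS' or 'NEG' from sentiment words near mentions of the recipient.
    Returns None if no sentence mentions the recipient."""
    targets = SECOND_PERSON | {recipient_name.lower()}
    pos = neg = 0
    found = False
    for sentence in re.split(r"[.!?]+", post_text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok in targets:  # mention of the recipient: user name or 2nd-person pronoun
                found = True
                # up to `window` tokens on each side of the mention (an assumption);
                # overlapping windows of repeated mentions are counted again
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                pos += sum(t in POSITIVE for t in context)
                neg += sum(t in NEGATIVE for t in context)
    if not found:
        return None
    # positive only with strictly more positive than negative words; otherwise negative
    return "POS" if pos > neg else "NEG"


print(interaction_polarity("I agree with you, Dan.", "Dan"))     # POS
print(interaction_polarity("You are wrong about this.", "Dan"))  # NEG
```

Ties fall to negative, matching the “otherwise negative” rule on the slide.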
14. Evaluation
• Data sets
– English data sets
• The three most discussed threads from Abu-Jbara et al., ACL'12
– Chinese data sets
• Three popular controversial issues in TianYaClub (one of the most popular Chinese online forums)
• Statistics
15. Data Annotation
• Identification of viewpoints
– 150 randomly sampled posts, two annotators (Cohen's kappa agreement ≥ 0.61)
• Identification of user groups
– 150 randomly sampled users, two annotators (Cohen's kappa agreement ≥ 0.70)
Labeling a user's viewpoint is easier than labeling a post's viewpoint
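For reference, Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A small sketch of the computation follows; the function name and the toy labels are made up and are not the annotation data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: both annotators independently pick the same label
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy labels (made up): observed agreement 0.75, chance agreement 0.5.
a = ["pro", "pro", "anti", "anti"]
b = ["pro", "anti", "anti", "anti"]
print(cohens_kappa(a, b))  # 0.5
```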
16. Baselines
• Topic-Aspect Model (TAM, Paul et al., AAAI'10)
– A viewpoint-topic model where viewpoint and topic are orthogonal
• Degenerate variants of our model
– UIM: user interaction model (part of our model)
– JVTM: joint viewpoint-topic model (our model without interaction)
– JVTM-G: JVTM with a global viewpoint distribution
17. Identification of Viewpoints
• Task
– To identify each post's viewpoint
• Results
– Our model significantly outperforms the other models (at the 10% significance level)
• Effectiveness of assumptions
– Each viewpoint's topic preference: JVTM > TAM
– User consistency: JVTM > JVTM-G
– User interaction: JVTM-UI > others
• User interaction is more important than the other factors
[Table: Averaged results of the models in identification of viewpoints]
18. Identification of User Groups
• Subgroup detection
– To detect ideological subgroups, i.e., user groups with different viewpoints
• Results
– Our model significantly outperforms the other methods (at the 10% significance level)
• Effectiveness of assumptions
– Each viewpoint's topic preference: JVTM > TAM
– User consistency: JVTM > JVTM-G
– User interaction: JVTM-UI > others
[Table: Averaged results of the models in identification of user groups]
19. Qualitative Analysis
• User interaction network on the “Will you vote Obama?” thread
Green (left) and white (right) nodes represent users with the two different viewpoints discovered by our model. Red (thin) edges represent negative interactions, while blue (thick) edges represent positive interactions.
More intra-cluster positive interactions and more inter-cluster negative interactions
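The pattern in the network can be quantified by counting edges by type. The sketch below assumes interactions are given as (user, user, polarity) triples and viewpoints as a user-to-cluster mapping; both formats, the function name, and the toy network are hypothetical.

```python
from collections import Counter


def interaction_summary(edges, viewpoint):
    """Count interactions by (intra/inter-cluster, polarity).

    edges: iterable of (user_a, user_b, polarity) with polarity "POS" or "NEG".
    viewpoint: dict mapping each user to the viewpoint cluster assigned by a model.
    """
    counts = Counter()
    for a, b, polarity in edges:
        where = "intra" if viewpoint[a] == viewpoint[b] else "inter"
        counts[(where, polarity)] += 1
    return counts


# Toy network (made-up users and edges, not the paper's data).
viewpoint = {"u1": 0, "u2": 0, "u3": 1}
edges = [("u1", "u2", "POS"), ("u1", "u3", "NEG"), ("u2", "u3", "NEG"), ("u3", "u1", "POS")]
summary = interaction_summary(edges, viewpoint)
print(summary[("intra", "POS")], summary[("inter", "NEG")], summary[("inter", "POS")])  # 1 2 1
```

A good viewpoint assignment should make intra-cluster POS and inter-cluster NEG counts large relative to the other two cells.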
20. Qualitative Analysis
• Users with different viewpoints tend to have different topic focus
[Figure: Topic focus of two viewpoints on “LiuXiang” data set. Bar chart of topic proportion (y-axis, 0 to 0.16) per topic ID (x-axis) for “Support LiuXiang” vs. “Against LiuXiang”]
21. Qualitative Analysis
• Top 4 topics for the “supporting LiuXiang” viewpoint
Top words (Chinese word, English translation): 刘翔 (LiuXiang), 栏 (hurdle), 运动员 (athlete), 第一 (first), 冠军 (champion), 伤 (injury), 奥运会 (Olympic), 时间 (time), 赛后 (after-game), 成绩 (record), 跟腱 (Achilles tendon), 奥运 (Olympic), 田径 (track and field), 摔倒 (fall), 北京 (Beijing), 获得 (achieve), 男子 (man), 13秒 (13s), 脚 (foot), 一个 (one), 最后 (finally), 手术 (surgery), 伦敦 (London), 届 (time), 刘 (Liu), 决赛 (final), 田联 (IAAF), 情况 (condition), 英国 (Britain), 医生 (doctor), 训练 (train), 参加 (attend), 受伤 (hurt), 上海 (Shanghai), 重 (heavy), 跑 (run), 赛场 (field), 导致 (result in), 已经 (already), 断裂 (broken), 记者 (reporter), 好 (good), 遗憾 (pity), 纪录 (record), 英雄 (hero), 团队 (team), 联赛 (league matches), 12秒 (12s), 预赛 (first heat), 需要 (need), 当时 (that time), 退役 (retire), 夺冠 (champion), 跳 (jump), 跑道 (track), 2012年 (2012), 罗伯斯 (Robles), 第二 (2nd), 伟大 (great)
22. Qualitative Analysis
• Top 4 topics for the “against LiuXiang” viewpoint
Top words (Chinese word, English translation): 帖 (post), 社区 (community), 发自 (sent from), 随时 (anytime), 天涯 (Tianya), 楼主 (poster), 抵制 (resist), 热点 (hot), 老板 (boss), 猫 (sneak), 骗子 (liar), 围观 (apathetic), 政协 (CPPCC), 妈 (F**K), 体坛 (sports), 傻逼 (fool), 帮 (those), 水 (spam), 最 (most), 钱 (money), 水军 (spam), 笑 (laugh), 骂 (scold), 孙子 (foolish), 唯金牌论 (gold-medal-only theory), 牌 (medal), 微笑 (smile), 顶 (support), 恶心 (nausea), 可口可乐 (Coca Cola), 喝 (drink), 笑话 (joke), 啤酒 (beer), 杨 (Yang), 全家 (whole family), 别有用心 (ulterior motive), 躲 (hide), 提 (mention), 吃 (eat), 苦笑 (bitter smile), 高尚 (noble), 有力 (powerful), 你们 (you), 加油 (cheer up), 歪风 (bad tendency), 劳民伤财 (a waste of money and manpower), 多么 (extremely), 脱离 (separate), 有人 (someone), 脸上 (face), 看看 (look), 精神 (spirit), 神位 (fame), 神像 (fame), 枪眼 (gun embrasure), 黄继光 (Huang Jiguang, a hero)
23. Summary
• Conclusion
– A viewpoint discovery model for threaded forums
– Modeling three observations
• Viewpoint-specific topic distribution (framing)
• User consistency
• Interplay between user interactions and viewpoints
• Future work
– Document representation: complex lexical units
– A more accurate interaction polarity classifier
– Contrastive viewpoint summarization
– Mining controversial issues and finding viewpoints
25. References
• [Paul et al., AAAI'10] Paul, M. J. and Girju, R. (2010). A two-dimensional topic-aspect model for discovering multi-faceted topics. In AAAI.
• [Abu-Jbara et al., ACL'12] Abu-Jbara, A. et al. (2012). Subgroup detection in ideological discussions. In ACL.
• [Fang et al., WSDM'12] Fang, Y. et al. (2012). Mining contrastive opinions on political texts using cross-perspective topic model. In WSDM, pages 63–72.
• [Hassan et al., EMNLP'12] Hassan, A. et al. (2012). Detecting subgroups in online discussions by modeling positive and negative relations among participants. In EMNLP.
Editor's Notes
Users with the same viewpoint tend to have positive interactions among themselves. Users with different viewpoints tend to have negative interactions among themselves.
The polarity of an interaction expression is generated based on the viewpoint of the current post and the viewpoint of the post(s) that the current post replies to