Research on AITV

Unsupervised Model for Topic Viewpoint
Discovery in Online Debates Leveraging
Author Interactions
2018/11/8
SAKATA-MORI LAB M1
SYO KYOJIN

Summary
【Title and Authors】
- Unsupervised Model for Topic Viewpoint Discovery in
Online Debates Leveraging Author Interactions
- Amine Trabelsi, Osmar R. Zaïane
- ICWSM 2018
【Data】
- Six datasets about four controversial issues, extracted from
4Forums.com (Abbott et al. 2016) and CreateDebate.com
(Hasan and Ng 2014).
【Contributions】
- Propose a purely unsupervised model to detect the
viewpoints at the post and author levels from the post
contents and the authors’ interactions.
- Achieve state-of-the-art (SOTA) results in post-level viewpoint
identification and performance close to the supervised SOTA approach in
author-level viewpoint identification.
- Discuss author-level viewpoint clustering.

Backgrounds
【Motivation】
- More and more people now express their opinions on political and social issues on
social media and online forums.
- Contentious issues: legalization of abortion, same-sex marriage, and Donald Trump
- These data are very useful for decision makers and politicians, but their volume is too large to read manually. An automatic
tool that provides a contrasting overview of the main viewpoints and reasons given by both sides is therefore necessary.
- Identify the viewpoints at the post and author levels.
- Detect the relevant discourse used to express opinions and reasoning themes.

Backgrounds
【Challenges】
- Applying sentiment analysis techniques to contentious documents is not sufficient to solve the problem
effectively (Hasan and Ng 2013; Walker et al. 2012b).
- Mohammad, Sobhani, and Kiritchenko (2017) show that both positive and negative lexicons are used in
contentious text to express the same stance.
- Moreover, the stance can be conveyed implicitly through reasons and arguments, not necessarily
through polarity sentiment words.
- The dialogic nature of online debate makes accurate viewpoint detection harder (Hasan and Ng
2014; Boltužić and Šnajder 2014).
- The dialog in online debates contains much unstructured and colloquial language, including non-argumentative
portions and irrelevant exchanges.
- Similar word usage between authors leads to clustering errors when the model relies entirely on textual
features.
- This often happens when a post rephrases the opposing side’s premise while attacking it, or when it asks a
rhetorical question.
- Complementary features, such as the nature of the authors’ interactions at the post level, have been shown to
improve purely text-based approaches to distinguishing viewpoints (Walker et al. 2012b).

Related Works
【Category】
- The Semantic Evaluation series 2016 (SemEval-16) proposed a shared task and datasets
for stance detection on Twitter (Mohammad et al. 2016).
- Viewpoint identification work can be categorized by the type of
- Data : Twitter, Online Debate site
- Feature : Text, Authors’ interaction
- Task : Post level, Author level (Discourse level)
- Methods : Supervised, Unsupervised
【Supervised Learning】
- Sobhani, Inkpen, and Matwin (2015) tackle viewpoint identification
for news comments using topic modeling.
- Hasan and Ng (2013) identify the stance at the post and sentence
levels in online debate corpora. They construct a rich set of
linguistic and semantic features.
- Sridhar et al. (2015) model disagreement and collectively predict
stances at the post and the author levels on online debate corpora.
They try different modeling approaches based on Probabilistic Soft Logic
(PSL).
【Examples of Legalization of Abortion】
<AGAINST>
Life, what a beautiful choice. Adoption. Not
abortion. #SemST
<AGAINST>
If you don't want your kid, put it up for
adoption. #sorrynotsorry #SemST
<FAVOR>
Lmao my school thinks giving women the
right to an abortion is against feminism.
They don't know feminism #wtf #SemST
<FAVOR>
@AsaSoltan putting abortion in with choices
for women just won my heart over.
Yaaaaaas! #shahs #SemST

Related works
【Unsupervised Learning】
- The most popular approach is Topic-Viewpoint modeling, which hypothesizes underlying topic and
viewpoint variables that influence the author’s word choice. It has been applied to many kinds of data:
- Opinion polls, editorials (Paul, Zhai, and Girju 2010)
- Online forums (Trabelsi and Zaïane 2016)
- Twitter (Joshi, Bhattacharyya, and Carman 2016)
- Parliamentary debates (Vilares and He 2017)
- Qiu and Jiang (2013) exploit the authors’ interactions to discover the stances of posts and cluster authors with different
viewpoints. In their work, the polarity of the interactions between authors, positive or negative, is
determined using lexicon-based methods.
- Thonet et al. (2017) propose extended versions of Social Network-LDA (SN-LDA) (Sachan et al. 2014) that model
viewpoint discovery in social networks: the Social Network Viewpoint Discovery Models (SNVDMs), applied to political Twitter data.
- Dong et al. (2017) focus on predicting the author’s stance and propose a weakly guided Stance-based Text
Generative Model with Link Regularization (STML) that uses the text content and the authors’ interactions in news
comments and online debates. The number of turns in a discussion and the presence of agreement or disagreement
signals are used to guide the model.

Method
【Model】
- Propose the Author Interaction Topic Viewpoint (AITV) model.
- For each word $w_{nd}$ in a post $d_a$, the author draws a topic $z_{nd}$ from
$\theta_{d_a}$, then samples the word $w_{nd}$ from the topic-viewpoint
distribution $\phi_{z_{nd} v_{d_a}}$ corresponding to topic $z_{nd}$ and viewpoint $v_{d_a}$.
- $A$ : the authors; $a$ denotes one author
- $D_a$ : the number of posts written by author $a$
- $N_{d_a}$ : the number of words in a post $d_a$
- $K$ : the total number of topics
- $L$ : the total number of viewpoints (= 2)
- $\theta_{d_a}$ : the probability distribution over the $K$ topics for a post $d_a$
- $\psi_a$ : the probability distribution over viewpoints for an author $a$
- $v_{d_a}$ : a viewpoint drawn from the distribution $\psi_a$
- $\phi_{kl}$ : the multinomial probability distribution over words
associated with a topic $k$ and a viewpoint $l$.
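
The generative story above can be made concrete with a small sampler. This is only a sketch under assumptions not stated on the slide: the distributions $\theta$, $\psi$ and $\phi$ are drawn from symmetric Dirichlet priors, the pairing of α, β and γ with those priors is a guess, and the function name `generate_posts` and the vocabulary size `V` are hypothetical. The rebuttal variable is left out.

```python
import numpy as np

def generate_posts(posts_per_author, words_per_post, K=30, L=2, V=500,
                   alpha=0.1, beta=1.0, gamma=1.0, seed=0):
    """Sample toy posts following the topic-viewpoint generative story.

    Assumption: theta ~ Dir(alpha), phi ~ Dir(beta), psi ~ Dir(gamma).
    """
    rng = np.random.default_rng(seed)
    # phi[k, l] is the word distribution for topic k and viewpoint l.
    phi = rng.dirichlet(np.full(V, beta), size=(K, L))
    posts = []
    for a, n_posts in enumerate(posts_per_author):
        # psi_a: viewpoint distribution of author a.
        psi_a = rng.dirichlet(np.full(L, gamma))
        for _ in range(n_posts):
            # theta: topic distribution of this post.
            theta = rng.dirichlet(np.full(K, alpha))
            # One viewpoint per post, drawn from the author's psi_a.
            v = int(rng.choice(L, p=psi_a))
            # One topic per word, then one word per (topic, viewpoint) pair.
            z = rng.choice(K, size=words_per_post, p=theta)
            words = [int(rng.choice(V, p=phi[k, v])) for k in z]
            posts.append({"author": a, "viewpoint": v,
                          "topics": z.tolist(), "words": words})
    return posts
```

Calling `generate_posts([2, 3], words_per_post=5)` yields five posts, two for the first author and three for the second, each with one viewpoint and five word ids.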

Method
【Model】
- $Rb_{id}$ : set to 1 if the reply attacks the previous author, and 0
otherwise.
- $id$ : the index of the current post.
- Parent posts : the posts that the current post’s author is
replying to.
- $Rb_{id}$ depends on the degree of opposition between the viewpoint
$v_{id}$ of the current post and the viewpoints $v_{id}^{par}$
of its parent posts.
- In the probability of $Rb_{id}$, I(condition) equals 1 if the condition is true
and 0 otherwise, and η is a smoothing parameter. This modeling of author
interactions is similar to the user interaction setting presented in Qiu and Jiang
(2013).
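
To illustrate how an indicator function and the smoothing parameter η could combine, here is a minimal sketch. The exact formula is not given in this text, so the form used below (count of parent posts with a different viewpoint, plus η, over the number of parents plus 2η) is an assumption, and `rebuttal_probability` is a hypothetical name.

```python
def rebuttal_probability(v_current, v_parents, eta=0.01):
    """Assumed smoothed probability that a reply is a rebuttal (Rb = 1).

    v_current: viewpoint of the current post.
    v_parents: viewpoints of the posts it replies to.
    """
    if not v_parents:
        raise ValueError("a reply needs at least one parent post")
    # I(v_current != v_parent) summed over the parent posts.
    disagreements = sum(1 for v in v_parents if v != v_current)
    # eta keeps the probability strictly between 0 and 1.
    return (disagreements + eta) / (len(v_parents) + 2 * eta)
```

Under this assumed form, a reply whose viewpoint differs from its single parent gets a probability close to 1, and a reply sharing its parent's viewpoint gets a probability close to 0.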

Method
【Experiment Set Up】
- Identical portions of text in replying posts are removed, along with
stop words and rare words. The stemmed version of the words
is used.
- The 4Forums datasets contain ground-truth stance labels at the
author level, while the CreateDebate datasets do not.
- For CreateDebate, each author is assigned the majority label of their
corresponding posts (Sridhar et al. 2015).
- All AITV results reported in this section are measures
aggregated over 10 runs.
- The number of topics K is 30 unless specified otherwise.
- Hyperparameters are set as follows: α = 0.1; β = 1; γ = 1; η =
0.01.
- The number of Gibbs sampling iterations is 1500.
- Words occurring fewer than 20 times are considered rare
words and are removed.
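
Two of the steps above, rare-word filtering and majority-label assignment, can be sketched as follows. This is an illustration under assumptions: the stop-word list is a tiny placeholder, stemming is omitted, the function names are hypothetical, and ties in the majority vote go to the label seen first.

```python
from collections import Counter

# Tiny placeholder list; a real stop-word list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and"}

def filter_vocabulary(tokenized_posts, min_count=20, stop_words=STOP_WORDS):
    """Drop stop words and words occurring fewer than min_count times."""
    counts = Counter(tok for post in tokenized_posts for tok in post)
    keep = {w for w, c in counts.items()
            if c >= min_count and w not in stop_words}
    return [[tok for tok in post if tok in keep] for post in tokenized_posts]

def majority_author_labels(post_authors, post_labels):
    """Give each author the most frequent stance label among their posts."""
    votes = {}
    for author, label in zip(post_authors, post_labels):
        votes.setdefault(author, Counter())[label] += 1
    return {author: c.most_common(1)[0][0] for author, c in votes.items()}
```

For example, an author with posts labeled pro, con, pro is assigned pro.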

Results
【Post-Level Identification】
- AITV clearly outperforms PSL on each of the datasets.
- The best performance is recorded on the largest and
most highly connected datasets (% reply posts).
- AITV is also compared against its degenerate version,
“AITV-Rebuttal Known”, which approximates
the weakly guided work of Qiu and Jiang (2013).
- Knowing the rebuttal labels does not make a
significant difference.
- This may be due to the automatic initialization based on
authors’ interactions, which helps reduce the variance
of the non-deterministic outputs caused by Gibbs
sampling.

Results
【Author-Level Identification】
- AITV outperforms its rival unsupervised method SNVDM,
especially on the datasets containing many interactions. Its
performance is also close to that of the weakly guided
STML. However, AITV’s performance remains far from that of
the supervised PSL.
- There is a big drop in accuracy between the post level
and the author level for AITV.
- The two unsupervised methods are evaluated with the BCubed
F-Measure. AITV performs better overall than
SNVDM-GPU.
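
For reference, the BCubed F-Measure can be computed as in the sketch below, which follows the standard definition (per-item precision and recall averaged over all items, then combined by harmonic mean). The function name `bcubed_f` is hypothetical, and the slides do not say which variant or weighting was used.

```python
from collections import Counter

def bcubed_f(clusters, labels):
    """BCubed F-measure of a clustering against gold labels.

    clusters[i] is the predicted cluster of item i, labels[i] its gold label.
    """
    n = len(clusters)
    cluster_sizes = Counter(clusters)
    label_sizes = Counter(labels)
    # Number of items sharing both the same cluster and the same label.
    pair_sizes = Counter(zip(clusters, labels))
    # Precision of an item: share of its cluster that has its gold label.
    precision = sum(pair_sizes[(c, l)] / cluster_sizes[c]
                    for c, l in zip(clusters, labels)) / n
    # Recall of an item: share of its gold class that is in its cluster.
    recall = sum(pair_sizes[(c, l)] / label_sizes[l]
                 for c, l in zip(clusters, labels)) / n
    return 2 * precision * recall / (precision + recall)
```

The measure does not depend on how clusters are named, which suits unsupervised models whose cluster ids carry no stance meaning: `bcubed_f([0, 0, 1, 1], ["b", "b", "a", "a"])` is 1.0.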

Results
【Author-Level Identification】
- The Topic-Viewpoint clustering of unigram words produced by AITV
is evaluated.
- Inferring the topic of a cluster from its unigrams is
difficult.
- The vocabulary used may be very similar.
- For a better evaluation, a representative sentence is
retrieved from the original data
for each topic-viewpoint unigram cluster.
- Even within the same position, example 1 and example 2
take different viewpoints.
- Similarly, example 5 and example 7 take
different viewpoints.
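
The slides do not state how the representative sentence is retrieved. As a rough illustration only, the sketch below picks the sentence with the largest share of tokens among a cluster's top unigrams; this overlap rule and the name `representative_sentence` are assumptions, not the authors' procedure.

```python
def representative_sentence(top_words, sentences):
    """Return the sentence whose tokens overlap most with top_words.

    Assumed rule: score = fraction of the sentence's tokens in top_words.
    Raises ValueError if sentences is empty.
    """
    top = set(top_words)

    def score(sentence):
        tokens = sentence.lower().split()
        if not tokens:
            return 0.0
        return sum(tok in top for tok in tokens) / len(tokens)

    return max(sentences, key=score)
```

Normalizing by sentence length keeps long posts from winning simply because they contain more words.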

THANK YOU FOR LISTENING
ANY DISCUSSION?
