Topic Models for Unsupervised Discovery
of Viewpoints on the Web
Modèles thématiques pour la découverte non supervisée
de points de vue sur le Web
PhD defense
Thibaut THONET
Advised by Guillaume CABANAC, Karen PINEL-SAUVAGNAT, Mohand BOUGHANEM
23 November 2017
Talk outline
1. Introduction
2. Literature Review
3. C1: Viewpoint Discovery in Text Documents
4. C2: Viewpoint Discovery in Social Networks
5. Conclusion
Introduction
‘Traditional’ opinion mining
Massive amount of opinions on the Web
=⇒ Need for automated methods to identify,
classify and summarize opinions
Traditional opinion mining research mainly focused on product/service review analysis
=⇒ Identification of a review’s polarity w.r.t. a target: positive/negative
Images and reviews taken from Wikipedia and Amazon.com, February 2016.
Beyond traditional opinion mining: towards viewpoint mining
. . . But need to go beyond plain positive/negative opinions
E.g., to deal with filter bubbles [Pariser, 2011] & echo chambers [Sunstein, 2009]
Image taken from wired.com, October 2017.
. . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions
A viewpoint is defined as the position adopted by a group of people on a given issue (e.g.,
related to policy, society or economy) and underlies a set of specific values, beliefs or principles
Image taken from social.rollins.edu, November 2017.
Application: to build an argument map to help decision makers and the general public
Image taken from shale-gas-information-platform.org, March 2016.
Challenges
Challenges compared to traditional opinion mining:
Viewpoints expressed in a more subtle way
than reviews’ polar opinions (“I like”, “I hate”)
and more domain dependent. . .
. . . Opinion lexicons then less useful for
viewpoint mining
Domain knowledge is not always available
and costly to gather
=⇒ Need for unsupervised approaches
E.g., latent variable models / topic models à la
Latent Dirichlet Allocation [Blei+, NIPS ’01]
Images taken from Wikipedia and jewishjournal.com, February 2016.
Literature Review
Related work: unsupervised viewpoint discovery in text documents
[Table: related work compared on three criteria: (1) learning of viewpoint assignments, (2) identification of viewpoint-specific discourse, (3) words’ viewpoint-dependency guided by parts of speech. Rows: [Paul+, EMNLP ’09]; [Fang+, WSDM ’12]; [Paul+, AAAI ’10] and [Trabelsi+, ICDM ’14]; ours [Thonet+, ECIR ’16].]
Identification of viewpoint-specific discourse given (known) documents’ viewpoints:
Cross-cultural topic model by [Paul+, EMNLP ’09] to study culture-specific discourse
Cross-perspective topic model by [Fang+, WSDM ’12] based on part-of-speech to partition topical
words (topic-specific words) and opinion words (viewpoint/topic-specific words)
Learning documents’ viewpoint assignments based on text content:
Topic-Aspect Model by [Paul+, AAAI ’10] where aspects ≈ viewpoints
Joint Topic Viewpoint model by [Trabelsi+, ICDM ’14] to extract arguing expressions
Our first contribution [Thonet+, ECIR ’16] investigates the utility of topical/opinion
word partitioning based on part-of-speech to learn documents’ viewpoint
assignments
Related work: unsupervised viewpoint discovery in social media
[Table: related work compared on five criteria: (1) learning of viewpoint assignments, (2) identification of viewpoint-specific discourse, (3) words’ viewpoint-dependency guided by parts of speech, (4) designed for social media, (5) leveraging of social network interactions. Rows: [Paul+, EMNLP ’09]; [Fang+, WSDM ’12]; [Paul+, AAAI ’10] and [Trabelsi+, ICDM ’14]; ours [Thonet+, ECIR ’16]; [Qiu+, NAACL ’13] and [Qiu+, CIKM ’13]; [Joshi+, WASSA@NAACL ’16]; [Sachan+, WSDM ’14] and [Liu+, SDM ’14]; [Barberá, Polit. Anal. ’15]; ours [Thonet+, CIKM ’17].]
Identifying users’ viewpoints in social media data:
Viewpoint modeling in forum posts by [Qiu+, NAACL ’13] and [Qiu+, CIKM ’13] based on a topic
model that leverages post reply information
Political affiliation (≈ viewpoint) prediction of Twitter users in [Joshi+, WASSA@NAACL ’16] based
on tweet content and part-of-speech but not social interactions
Community detection in social networks:
Social Network Latent Dirichlet Allocation by [Sachan+, WSDM ’14] to discover communities
(≈ viewpoints) in social networks based on text content and social interactions
Similar model to SN-LDA in [Liu+, SDM ’14] but non-parametric and dynamic
Ideal point model by [Barberá, Polit. Anal. ’15] to identify Twitter users’ (real-valued)
ideology (≈ viewpoint) based on follow interactions
Our second contribution [Thonet+, CIKM ’17] proposes to identify users’ viewpoints
in social networks based on both text content and social interactions
C1: Viewpoint Discovery in Text Documents
Task
Discover topics and viewpoints from documents based on text content
VODUM: the Viewpoint and Opinion Discovery Unification Model
We designed a novel topic model to address our research task: the Viewpoint and Opinion
Discovery Unification Model [Thonet+, ECIR ’16]
Topical words (topic-dependent) and opinion words (viewpoint/topic-dependent) partitioning
Inspired by opinion/viewpoint mining works: e.g., [Turney, ACL ’02], [Fang+, WSDM ’12]
Partition based on part-of-speech
=⇒ A word w is a topical word if its part-of-speech category x is 0 (noun) or an opinion
word if its part-of-speech category x is 1 (adjective, verb, adverb...)
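The part-of-speech partition can be sketched as below. This is only an illustration: it assumes tokens are already tagged with Penn Treebank tags, the helper names `word_category` and `partition` are hypothetical, and discarding categories other than nouns, adjectives, verbs and adverbs is an assumption rather than something the slides state.

```python
# Hypothetical helper names; Penn Treebank tag prefixes are an assumption.
OPINION_PREFIXES = ("JJ", "VB", "RB")  # adjectives, verbs, adverbs


def word_category(pos_tag):
    """Return 0 for a topical word (noun), 1 for an opinion word, None otherwise."""
    if pos_tag.startswith("NN"):
        return 0
    if pos_tag.startswith(OPINION_PREFIXES):
        return 1
    return None  # assumption: other categories are left out of both word sets


def partition(tagged_tokens):
    """Split (word, tag) pairs into topical words and opinion words."""
    topical = [w for w, tag in tagged_tokens if word_category(tag) == 0]
    opinion = [w for w, tag in tagged_tokens if word_category(tag) == 1]
    return topical, opinion
```

For instance, `partition([("settlement", "NNS"), ("illegal", "JJ")])` puts "settlement" in the topical list and "illegal" in the opinion list.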
Sentence-level topic assignments z instead of word-level to better align topical words and
opinion words with the sentence’s topic
Document-level viewpoint assignments v: an opinionated document is usually written by
one author, i.e., according to one viewpoint
Viewpoint-specific topic distributions θ instead of document-specific: [Qiu+, NAACL ’13]
observed that different viewpoints have different dominating topics
Approximate posterior inference using collapsed Gibbs sampling
Dirichlet distributions θ, π, φ0, φ1 integrated out
Successively sample discrete latent variables z, v from their posterior distributions (i.e., given
observations w, x)
=⇒ Provides distributions’ estimators θ̂, π̂, φ̂0, φ̂1 and assignments v and z for all docs/sentences
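As a generic illustration of how such estimators are usually read off the sampler's counts, the sketch below computes the posterior mean of a multinomial under a symmetric Dirichlet prior. It is the textbook Dirichlet-multinomial formula, not necessarily the exact VODUM estimator; the function name and the per-component prior argument are assumptions.

```python
def dirichlet_estimate(counts, prior):
    """Posterior-mean estimate of a multinomial from assignment counts.

    counts: number of assignments per outcome (e.g. sentences per topic).
    prior:  symmetric Dirichlet hyperparameter applied to each outcome (assumed).
    """
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]
```

With counts `[3, 1]` and a prior of 0.5 per outcome, the estimate is 3.5/5 and 1.5/5.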
Experimental setup
Hyperparameters set to fixed values: α = 0.01, β0 = β1 = 0.01, η = 100
Evaluation based on the Bitterlemons collection (http://www.bitterlemons.net/),
introduced by [Lin+, CoNLL ’06], containing essays about the Israeli-Palestinian conflict
=⇒ Number of viewpoints V set to 2
Total number of documents: 594
Number of essays written by Israeli authors: 297
Number of essays written by Palestinian authors: 297
Viewpoint clustering performance measured in terms of Accuracy (≈ to what extent obtained
clusters and groundtruth classes overlap)
Baselines
State-of-the-art baselines:
Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints
Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining
Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V
Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S
[Figures: graphical model of VODUM shown side by side with each degenerate version in turn: VODUM-D, VODUM-O, VODUM-W and VODUM-S.]
Evaluation: viewpoint clustering
Clustering of document-level Israeli/Palestinian viewpoints (each boxplot drawn from 50 chains)
Higher accuracy = better clustering performance
VODUM > TAM, JTV, LDA: VODUM overall beats state-of-the-art baselines
TAM > JTV, LDA: TAM performs the best among state-of-the-art baselines
VODUM > VODUM-D: viewpoint-specific topic distributions slightly improve accuracy
VODUM > VODUM-O: opinion and topical words partitioning considerably improves accuracy
VODUM > VODUM-W: sentence-level topic assignments considerably improve accuracy
VODUM > VODUM-S: document-level viewpoint assignments slightly improve accuracy
[Figure: boxplots of accuracy (y-axis, from 0.50 to 0.85) per model (x-axis): VODUM, TAM, JTV, LDA, VODUM-D, VODUM-O, VODUM-W, VODUM-S.]
Evaluation: qualitative analysis
Most probable topical/opinion words for a topic (manually annotated as “Middle East conflicts”)
Topical words: israel palestinian syria jihad war iraq dai suicid destruct iran
Opinion words (I): islam isra terrorist recent militari intern like heavi close american
Opinion words (P): need win think sai don strong new sure believ commit
Topical words are unbiased towards a viewpoint and clearly reflect Middle East conflicts:
e.g., syria, war, iraq and destruct
Coherent opinion words for the Israeli viewpoint: e.g., islam, terrorist and american
Palestinian viewpoint remains non-specific about conflicts and does not mention islam or
terrorism: e.g., win, strong and commit
C2: Viewpoint Discovery in Social Networks
Task
Discover topics and viewpoints from social networking data, leveraging both posted text
content and social interactions between users
SNVDM: the Social Network Viewpoint Discovery Model
We defined the Social Network Viewpoint Discovery Model to jointly discover topics and
viewpoints from posted text content and social interactions [Thonet+, CIKM ’17]
Text content component
Observed data: tokens occurring
in documents posted by users
=⇒ 3 nested plates
Latent topics assigned to each
token
Latent viewpoints assigned at
document-level
Following the Topic-Aspect Model from [Paul+, AAAI ’10], definition of four word types specified by switch variables ℓ (level) and x (route):
Background words =⇒ ℓ = 0, x = 0
Viewpoint words =⇒ ℓ = 0, x = 1
Topic words =⇒ ℓ = 1, x = 0
Viewpoint-topic words =⇒ ℓ = 1, x = 1
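The four word types can be written as a small lookup on the two switch variables. The sketch below is illustrative only: the names `WORD_TYPES`, `word_type` and `distribution_key` are hypothetical, and the second function encodes a reading of the type names (route 1 means viewpoint-dependent, level 1 means topic-dependent), not code from the thesis.

```python
WORD_TYPES = {
    (0, 0): "background",
    (0, 1): "viewpoint",
    (1, 0): "topic",
    (1, 1): "viewpoint-topic",
}


def word_type(level, route):
    """Name of the word type selected by the switch variables (level, route)."""
    return WORD_TYPES[(level, route)]


def distribution_key(level, route, viewpoint, topic):
    """Which (viewpoint, topic) pair indexes the word distribution (assumed reading).

    A component is None when the word type does not depend on it.
    """
    return (viewpoint if route == 1 else None, topic if level == 1 else None)
```

Under this reading, a viewpoint-topic word of viewpoint 1 and topic 3 is drawn from the distribution indexed by `(1, 3)`, while a background word uses the single distribution indexed by `(None, None)`.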
Social interaction component
Outgoing interactions for user u =
interactions initiated by u on another
user (recipient r)
Following SN-LDA from [Sachan+,
WSDM ’14], viewpoints assigned to
outgoing interactions (homophily)
. . . But outgoing interactions
insufficient for some users
=⇒ We propose to also exploit
incoming interactions
Incoming interactions for user u =
interactions initiated by another user
(sender s) on u
Viewpoint assigned to the document
being interacted upon
Approximate inference based on
Collapsed Gibbs Sampling
Dirichlet/Bernoulli distributions σ,
ψ, θ, π, φ, ξ integrated out
Successively sample discrete
latent variables ℓ, x, z, v, v′ from
their posterior distributions (i.e.,
given observations w, s, r)
Hyperparameters δ, γ, α, η, µ
sampled according to the
auxiliary variable technique
following [Newman+, J. Mach.
Learn. Res. ’09] and β fixed to
0.01
Limits of SNVDM’s social interaction component
Some users have very few social interactions
=⇒ Difficult to identify their viewpoints based on scarce direct interactions
We propose to extend SNVDM to leverage “acquaintances of acquaintances” (≈ friends of friends)
How? =⇒ Generalized Pólya Urn scheme
SNVDM-GPU: extension of SNVDM based on Generalized Pólya Urn
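Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling
E.g., for outgoing interaction o from user u on user u′:
SNVDM:
$$p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}$$
SNVDM-GPU:
$$p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \sum_{u'' \in R^{\tau}_{u'}} \lambda\, n_{vu''} + \mu \frac{1}{U}}{n_{v\bullet} + \sum_{u''=1}^{U} \sum_{u''' \in R^{\tau}_{u''}} \lambda\, n_{vu'''} + \mu}$$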
Experimental setup
Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence
Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican)
Dataset    #Users (Yes/Dem.)  #Users (No/Rep.)  #Tweets   #Tokens    Vocabulary  #Interactions
Indyref    589                575               270,075   2,043,204  38,942      696,654
Midterms   767                778               113,545   975,199    25,312      241,741
State-of-the-art baselines:
Topic-Aspect Model (TAM) from [Paul+, AAAI ’10]
=⇒ Only text content to discover viewpoints and topics
Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14]
=⇒ Text content and outgoing interactions to discover communities (≈ viewpoints) and topics
Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16]
=⇒ Text content to discover viewpoints and topics, and parts of speech to distinguish between
topic words and viewpoint-topic words
Degenerate version of SNVDM: SNVDM-WI (without incoming interactions)
Proposed models:
SNVDM
SNVDM-GPU (τ = 10): only 10 most interacting acquaintances used in Generalized Pólya Urns
SNVDM-GPU (τ = ∞): all acquaintances used in Generalized Pólya Urns
Viewpoint clustering performance measured in terms of Purity (≈ to what extent obtained
clusters are homogeneous) and Normalized Mutual Information (information theoretic
clustering measure)
Evaluation: viewpoint clustering on Indyref
Observation 1: consistent results across different numbers of topics
Observation 2: SNVDM, SNVDM-GPU (τ = 10), SNVDM-GPU (τ = ∞) > all baselines
Observation 3: SN-LDA > TAM, VODUM =⇒ interactions ↑↑↑
Observation 4: SNVDM > SNVDM-WI =⇒ incoming interactions ↑↑
Observation 5: SNVDM-GPU (τ = ∞) > SNVDM-GPU (τ = 10) > SNVDM =⇒ GPU ↑
Evaluation: viewpoint clustering on Midterms
Observation 6: similar trends on Midterms but greater improvement for our models over baselines
Evaluation: impact of social network sparsity
Clustering of users’ viewpoints on Indyref for different degrees of network sparsity (T = 10)
Observation: performance degraded for lower percentage of interactions
Evaluation: qualitative analysis
Most probable topic words and viewpoint-topic words for topics from Indyref and Midterms
Topic: Scottish independence
Neutral        Viewpoint: Yes   Viewpoint: No
#indyref       #voteyes         #indyref
scotland       yes              uk
independence   scotland         salmond
vote           independence     #bettertogether
campaign       westminster      #scotdecides
scottish       vote             separation
uk             independent      currency
people         country          thanks
future         #yes             today
independent    #scotland        say

Topic: Energy and resources
Neutral        Viewpoint: Dem.  Viewpoint: Rep.
energy         #actonclimate    #4jobs
house          climate          #obamacare
new            #p2              #jobs
gas            change           gop
natural        #climatechange   obama
#energy        clean            bills
#ff            oil              jobs
#kxl           energy           house
support        #gop             act
economic       seec             watch
Reasonable coherence of topic words and viewpoint-topic words
Topic words indeed unbiased towards any viewpoints
Use of viewpoint-specific hashtags and mention of different issues for different viewpoints
Conclusion
Summary of contributions
VODUM discovers viewpoints and topics
in text documents, exploiting parts of
speech to distinguish between topical
words and opinion words [Thonet+,
ECIR ’16]
Lessons learned: opinion and topical
words partitioning ↑↑↑, sentence-level
topic assignments ↑↑↑
SNVDM(-GPU) discovers viewpoints and
topics in social networks, leveraging
both posted text content and social
interactions [Thonet+, CIKM ’17]
Lessons learned: social interactions ↑↑↑
Perspectives
Integrate time dimension and geolocation, e.g., to analyze party support during elections
Model viewpoints as real-valued variables to better capture nuanced opinions
Design a viewpoint summarization framework to build argument maps and help mitigate
the filter bubble and echo chamber phenomenon
Thank you!
References
Barberá, P. (2015). Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data. Polit. Anal., 23(1), 76–91.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2001). Latent Dirichlet Allocation. In Proc. of NIPS ’01 (pp. 601–608).
Brigadir, I., Greene, D., & Cunningham, P. (2015). Analyzing Discourse Communities with Distributional Semantic Models. In Proc. of WebSci ’15.
Fang, Y., Si, L., Somasundaram, N., & Yu, Z. (2012). Mining Contrastive Opinions on Political Texts using Cross-Perspective Topic Model. In Proc. of WSDM ’12 (pp. 63–72).
Joshi, A., Bhattacharyya, P., & Carman, M. (2016). Political Issue Extraction Model: A Novel Hierarchical Topic Model That Uses Tweets By Political And Non-Political Authors. In Proc. of WASSA@NAACL-HLT ’16 (pp. 82–90).
Lin, W.-H., Wilson, T., Wiebe, J., & Hauptmann, A. (2006). Which Side are You on? Identifying Perspectives at the Document and Sentence Levels. In Proc. of CoNLL ’06 (pp. 109–116).
Liu, Z., Zheng, Q., Wang, F., Tian, Z., & Li, B. (2014). A Dynamic Nonparametric Model for Characterizing the Topical Communities in Social Streams. In Proc. of SDM ’14 (pp. 379–387).
Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2009). Distributed Algorithms for Topic Models. J. of Mach. Learn. Res., 10, 1801–1828.
Paul, M. J., & Girju, R. (2010). A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics. In Proc. of AAAI ’10 (pp. 545–550).
Paul, M. J., Zhai, C., & Girju, R. (2010). Summarizing Contrastive Viewpoints in Opinionated Text. In Proc. of EMNLP ’10 (pp. 66–76).
Qiu, M., & Jiang, J. (2013). A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts. In Proc. of NAACL-HLT ’13 (pp. 1031–1040).
Qiu, M., Yang, L., & Jiang, J. (2013). Modeling Interaction Features for Debate Side Clustering. In Proc. of CIKM ’13 (pp. 873–878).
Sachan, M., Dubey, A., Srivastava, S., Xing, E. P., & Hovy, E. (2014). Spatial Compactness meets Topical Consistency: Jointly Modeling Links and Content for Community Detection. In Proc. of WSDM ’14 (pp. 503–512).
Thonet, T., Cabanac, G., Boughanem, M., & Pinel-Sauvagnat, K. (2016). VODUM: A Topic Model Unifying Viewpoint, Topic and Opinion Discovery. In Proc. of ECIR ’16 (pp. 533–545).
Thonet, T., Cabanac, G., Boughanem, M., & Pinel-Sauvagnat, K. (2017). Users Are Known by the Company They Keep: Topic Models for Viewpoint Discovery in Social Networks. In Proc. of CIKM ’17 (pp. 87–96).
Trabelsi, A., & Zaiane, O. R. (2014). Mining Contentious Documents Using an Unsupervised Topic Model Based Approach. In Proc. of ICDM ’14 (pp. 550–559).
Turney, P. D. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proc. of ACL ’02 (pp. 417–424).
Appendix: clustering metrics
Given the groundtruth classes S = {S1, S2}, the obtained clusters C = {C1, C2}, and the
document collection D:
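$$\mathrm{Acc}(C, S) = \frac{1}{|D|} \max\big(|C_1 \cap S_1| + |C_2 \cap S_2|,\; |C_1 \cap S_2| + |C_2 \cap S_1|\big)$$
$$\mathrm{Purity}(C, S) = \frac{1}{|D|} \Big(\max\big(|C_1 \cap S_1|, |C_1 \cap S_2|\big) + \max\big(|C_2 \cap S_1|, |C_2 \cap S_2|\big)\Big)$$
$$\mathrm{NMI}(C, S) = \frac{2\, I(C, S)}{H(C) + H(S)}$$
$$\text{with } I(C, S) = \sum_{j,k} \frac{|C_j \cap S_k|}{|D|} \log \frac{|D|\, |C_j \cap S_k|}{|C_j|\, |S_k|} \quad \text{and} \quad H(C) = -\sum_{j} \frac{|C_j|}{|D|} \log \frac{|C_j|}{|D|}$$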
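The three metrics can be sketched in Python as follows, taking one cluster label and one groundtruth label per document. Function names are hypothetical, labels are assumed to be 0 and 1 for the accuracy function (two clusters, two classes, as in the definitions above), and the NMI function assumes neither labeling is constant (otherwise both entropies are zero).

```python
import math
from collections import Counter


def accuracy(clusters, classes):
    """Two-cluster accuracy: best of the two cluster-to-class matchings."""
    n = Counter(zip(clusters, classes))
    direct = n[(0, 0)] + n[(1, 1)]
    swapped = n[(0, 1)] + n[(1, 0)]
    return max(direct, swapped) / len(clusters)


def purity(clusters, classes):
    """Share of documents that belong to the majority class of their cluster."""
    best = Counter()
    for (c, _), count in Counter(zip(clusters, classes)).items():
        best[c] = max(best[c], count)
    return sum(best.values()) / len(clusters)


def entropy(labels):
    """Entropy H of a labeling (natural logarithm)."""
    total = len(labels)
    return -sum(c / total * math.log(c / total) for c in Counter(labels).values())


def nmi(clusters, classes):
    """Normalized mutual information: 2 I(C, S) / (H(C) + H(S))."""
    total = len(clusters)
    size_c, size_s = Counter(clusters), Counter(classes)
    mi = sum(
        n / total * math.log(total * n / (size_c[c] * size_s[s]))
        for (c, s), n in Counter(zip(clusters, classes)).items()
    )
    return 2 * mi / (entropy(clusters) + entropy(classes))
```

Empty intersections contribute nothing to the mutual information sum, which is why the code only iterates over cluster/class pairs that actually occur.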
Appendix: perplexity analysis for VODUM and baselines
Held-out perplexity computed through 10-fold cross validation for T ∈ {5, 10, 15, 20, 30, 50}
Lower perplexity = better generalization
performance
TAM  LDA for small number of topics
( 20) and LDA  TAM for large
number of topics ( 20)
JTV  TAM, LDA for all number of
topics
VODUM  TAM, JTV, LDA for all
number of topics
! VODUM’s vocabulary differs from
that of TAM, JTV and LDA:
partitioning of topical words and
opinion words
[Figure: average perplexity (y-axis, from 400 to 900) against the number of topics T (x-axis, from 10 to 50) for VODUM, TAM, JTV and LDA.]
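For reference, held-out perplexity is commonly computed as the exponential of the negative average log-likelihood per token. The sketch below uses that standard definition; the slides do not spell out the exact formula, so treat the function name and its input (one log-probability per held-out token, as produced by whichever model is evaluated) as assumptions.

```python
import math


def perplexity(token_log_probs):
    """exp(-(1/N) * sum of log p(w)) over N held-out tokens (natural logs)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

A model that assigns probability 0.25 to every held-out token has a perplexity of 4, i.e. it is as uncertain as a uniform choice among four words.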
Appendix: execution time for SNVDM and baselines
Execution time (in seconds) of one Gibbs sampling iteration on Indyref (with T = 10) and
Midterms (with T = 15)
Model                 Indyref  Midterms
TAM                   1.45     0.87
SN-LDA                1.18     0.64
VODUM                 2.78     1.85
SNVDM-WI              2.08     1.08
SNVDM                 2.49     1.15
SNVDM-GPU (τ = 10)    3.47     1.34
SNVDM-GPU (τ = ∞)     14.67    2.56
Appendix: Simple Pólya Urn scheme
The compound Dirichlet-Multinomial distribution (used in LDA-based topic models) can be
interpreted as an urn sampling metaphor with an over-replacement policy
1. Randomly draw a ball from the urn
2. Duplicate the drawn ball
3. Put back in the urn the original ball and its duplicate
[Figure: an urn and an infinite ball generator.]
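The three steps of the simple urn can be simulated in a few lines. This is a toy sketch with hypothetical names (`polya_urn`, colors as dictionary keys); it only illustrates the over-replacement policy, not the topic model itself.

```python
import random


def polya_urn(initial_counts, draws, seed=0):
    """Simulate a simple Pólya urn: each drawn ball returns with one duplicate."""
    rng = random.Random(seed)
    counts = dict(initial_counts)
    for _ in range(draws):
        colors = list(counts)
        # Step 1: draw a ball with probability proportional to its color count.
        color = rng.choices(colors, weights=[counts[c] for c in colors])[0]
        # Steps 2 and 3: the original goes back together with its duplicate.
        counts[color] += 1
    return counts
```

Each draw adds exactly one ball, so colors that were drawn often become more likely to be drawn again.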
Appendix: Generalized Pólya Urn scheme
The Simple Pólya Urn scheme can be generalized by modifying the replacement rule to exploit
similarities between balls’ colors [Mahmoud, 2008]
1. Randomly draw a ball from the urn
2. Duplicate the drawn ball and generate parts of balls for those similar to the drawn ball
3. Put back in the urn the original ball, its duplicate and the parts of similar balls
[Figure: an urn and an infinite ball generator.]
Appendix: SNVDM-GPU
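Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling
E.g., for outgoing interaction o from user u on user u′:
SNVDM:
$$p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}$$
SNVDM-GPU:
$$p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{\sum_{u''=1}^{U} A_{u''u'}\, n_{vu''} + \mu \frac{1}{U}}{\sum_{u''=1}^{U} A_{u''\bullet}\, n_{vu''} + \mu}$$
The addition matrix A defines the weight to put on count n_{vu''} for each u'':
$$A_{u''u'} = \begin{cases} 1 & \text{if } u'' = u', \\ \lambda & \text{if } u'' \text{ is among the top } \tau \text{ acquaintances of } u', \\ 0 & \text{otherwise} \end{cases}$$
with 0 ≤ λ ≤ 1 (λ = 0 =⇒ “vanilla” SNVDM) and τ ∈ N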
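The sampling distribution for one outgoing interaction can be sketched as below. The code reads the addition matrix as indexed `A[u2][u1]` (weight of user u2's count when the recipient is u1) and assumes the counts already exclude the interaction being resampled. All function and argument names are hypothetical, and acquaintance lists are assumed not to contain the user itself.

```python
def addition_matrix(top_acquaintances, num_users, lam):
    """A[u2][u1] = 1 if u2 == u1, lam if u2 is a top acquaintance of u1, else 0."""
    A = [[0.0] * num_users for _ in range(num_users)]
    for u1 in range(num_users):
        A[u1][u1] = 1.0
        for u2 in top_acquaintances.get(u1, ()):
            A[u2][u1] = lam
    return A


def viewpoint_posterior(n_uv, n_vu, recipient, A, eta, mu):
    """Normalized p(v | recipient, rest) for one outgoing interaction.

    n_uv[v]:    viewpoint counts of the sending user u.
    n_vu[v][w]: number of interactions with viewpoint v received by user w.
    """
    V, U = len(n_uv), len(A)
    row_sums = [sum(A[u2]) for u2 in range(U)]  # A[u2][.] summed over recipients
    weights = []
    for v in range(V):
        user_term = (n_uv[v] + eta / V) / (sum(n_uv) + eta)
        num = sum(A[u2][recipient] * n_vu[v][u2] for u2 in range(U)) + mu / U
        den = sum(row_sums[u2] * n_vu[v][u2] for u2 in range(U)) + mu
        weights.append(user_term * num / den)
    total = sum(weights)
    return [w / total for w in weights]
```

With `lam = 0` the matrix is the identity and the function reduces to the plain SNVDM update; with `lam > 0`, counts of the recipient's top acquaintances also pull the interaction towards their viewpoints.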

Topic Models for Unsupervised Discovery of Viewpoints on the Web

  • 1. Topic Models for Unsupervised Discovery of Viewpoints on the Web Modèles thématiques pour la découverte non supervisée de points de vue sur le Web PhD defense Thibaut THONET Advised by Guillaume CABANAC, Karen PINEL-SAUVAGNAT, Mohand BOUGHANEM 23 November 2017
  • 2. Talk outline 1. Introduction 2. Literature Review 3. C1: Viewpoint Discovery in Text Documents 4. C2: Viewpoint Discovery in Social Networks 5. Conclusion 2 / 31
  • 4. Introduction ‘Traditional’ opinion mining Massive amount of opinions on the Web =⇒ Need for automated methods to identify, classify and summarize opinions Traditional opinion mining research mainly focused on product/service review analysis =⇒ Identification of a review’s polarity w.r.t. a target: positive/negative Images and reviews taken from Wikipedia and Amazon.com, February 2016. 4 / 31
  • 5. Introduction Beyond traditional opinion mining: towards viewpoint mining . . . But need to go beyond plain positive/negative opinions E.g., to deal with filter bubbles [Pariser, 2011] & echo chambers [Sunstein, 2009] Image taken from wired.com, October 2017. 5 / 31
  • 6. Introduction Beyond traditional opinion mining: towards viewpoint mining . . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions A viewpoint is defined as the position adopted by a group of people on a given issue (e.g., related to policy, society or economy) and underlies a set of specific values, beliefs or principles Image taken from social.rollins.edu, November 2017. 5 / 31
  • 7. Introduction Beyond traditional opinion mining: towards viewpoint mining . . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions Application: to build an argument map to help decision makers and the general public Image taken from shale-gas-information-platform.org, March 2016. 5 / 31
  • 8. Introduction Beyond traditional opinion mining: towards viewpoint mining . . . But need to go beyond plain positive/negative opinions =⇒ viewpoint-based opinions Application: to build an argument map to help decision makers and the general public Image taken from shale-gas-information-platform.org, March 2016. 5 / 31
  • 9. Introduction Challenges Challenges compared to traditional opinion mining: Viewpoints expressed in a more subtle way than reviews’ polar opinions (“I like”, “I hate”) and more domain dependent. . . . . . Opinion lexicons then less useful for viewpoint mining Domain knowledge is not always available and costly to gather =⇒ Need for unsupervised approaches E.g., latent variable models / topic models a la Latent Dirichlet Allocation [Blei+, NIPS ’01] Images taken from Wikipedia and jewishjournal.com, February 2016. 6 / 31
  • 11. Literature Review Related work: unsupervised viewpoint discovery in text documents Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] Identification of viewpoint-specific discourse given (known) documents’ viewpoints: Cross-cultural topic model by [Paul+, EMNLP ’09] to study culture-specific discourse Cross-perspective topic model by [Fang+, WSDM ’12] based on part-of-speech to partition topical words (topic-specific words) and opinion words (viewpoint/topic-specific words) 8 / 31
  • 12. Literature Review Related work: unsupervised viewpoint discovery in text documents Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] Learning documents’ viewpoint assignments based on text content: Topic-Aspect Model by [Paul+, AAAI ’10] where aspects ≈ viewpoints Joint Topic Viewpoint model by [Trabelsi+, ICDM ’14] to extract arguing expressions 8 / 31
  • 13. Literature Review Related work: unsupervised viewpoint discovery in text documents Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] Our first contribution [Thonet+, ECIR ’16] investigates the utility of topical/opinion word partitioning based on part-of-speech to learn documents’ viewpoint assignments 8 / 31
  • 14. Literature Review Related work: unsupervised viewpoint discovery in social media Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech Designed for social media Leveraging of social network interactions [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] [Qiu+, NAACL ’13]; [Qiu+, CIKM ’13] [Joshi+, WASSA@NAACL ’16] [Sachan+, WSDM ’14]; [Liu+, SDM ’14] [Barberá, Polit. Anal. ’15] Ours [Thonet+, CIKM ’17] Identifying users’ viewpoints in social media data: Viewpoint modeling in forum posts by [Qiu+, NAACL ’13] and [Qiu+, CIKM ’13] based on a topic model that leverages post reply information Political affiliation (≈ viewpoint) prediction of Twitter users in [Joshi+, WASSA@NAACL ’16] based on tweet content and part-of-speech but not social interactions 9 / 31
  • 15. Literature Review Related work: unsupervised viewpoint discovery in social media Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech Designed for social media Leveraging of social network interactions [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] [Qiu+, NAACL ’13]; [Qiu+, CIKM ’13] [Joshi+, WASSA@NAACL ’16] [Sachan+, WSDM ’14]; [Liu+, SDM ’14] [Barberá, Polit. Anal. ’15] Ours [Thonet+, CIKM ’17] Community detection in social networks: Social Network Latent Dirichlet Allocation by [Sachan+, WSDM ’14] to discover communities (≈ viewpoints) in social networks based on text content and social interactions Similar model to SN-LDA in [Liu+, SDM ’14] but non-parametric and dynamic 9 / 31
  • 16. Literature Review Related work: unsupervised viewpoint discovery in social media Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech Designed for social media Leveraging of social network interactions [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] [Qiu+, NAACL ’13]; [Qiu+, CIKM ’13] [Joshi+, WASSA@NAACL ’16] [Sachan+, WSDM ’14]; [Liu+, SDM ’14] [Barberá, Polit. Anal. ’15] Ours [Thonet+, CIKM ’17] Ideal point model by [Barberá, Polit. Anal. ’15] to identify Twitter users’ (real-valued) ideology (≈ viewpoint) based on follow interactions 9 / 31
  • 17. Literature Review Related work: unsupervised viewpoint discovery in social media Reference Learning of viewpoint assignments Identification of viewpoint-specific discourse Words’ viewpoint- dependency guided by parts of speech Designed for social media Leveraging of social network interactions [Paul+, EMNLP ’09] [Fang+, WSDM ’12] [Paul+, AAAI ’10]; [Trabelsi+, ICDM ’14] Ours [Thonet+, ECIR ’16] [Qiu+, NAACL ’13]; [Qiu+, CIKM ’13] [Joshi+, WASSA@NAACL ’16] [Sachan+, WSDM ’14]; [Liu+, SDM ’14] [Barberá, Polit. Anal. ’15] Ours [Thonet+, CIKM ’17] Our second contribution [Thonet+, CIKM ’17] proposes to identify users’ viewpoints in social networks based on both text content and social interactions 9 / 31
  • 18. C1: Viewpoint Discovery in Text Documents C1: Viewpoint Discovery in Text Documents 10 / 31
  • 19. C1: Viewpoint Discovery in Text Documents Task Discover topics and viewpoints from documents based on text content 11 / 31
  • 20. C1: Viewpoint Discovery in Text Documents VODUM: the Viewpoint and Opinion Discovery Unification Model We designed a novel topic model to address our research task: the Viewpoint and Opinion Discovery Unification Model [Thonet+, ECIR ’16] 12 / 31
  • 21. C1: Viewpoint Discovery in Text Documents VODUM: the Viewpoint and Opinion Discovery Unification Model We designed a novel topic model to address our research task: the Viewpoint and Opinion Discovery Unification Model [Thonet+, ECIR ’16] Topical words (topic-dependent) and opinion words (viewpoint/topic-dependent) partitioning Inspired by opinion/viewpoint mining works: e.g., [Turney, ACL ’02], [Fang+, WSDM ’12] Partition based on part-of-speech =⇒ A word w is a topical word if its part-of-speech category x is 0 (noun) or an opinion word if its part-of-speech category x is 1 (adjective, verb, adverb...) 12 / 31
  • 22. C1: Viewpoint Discovery in Text Documents VODUM: the Viewpoint and Opinion Discovery Unification Model We designed a novel topic model to address our research task: the Viewpoint and Opinion Discovery Unification Model [Thonet+, ECIR ’16] Sentence-level topic assignments z instead of word-level to better align topical words and opinion words with the sentence’s topic Document-level viewpoint assignments v: an opinionated document is usually written by one author, i.e., according to one viewpoint Viewpoint-specific topic distributions θ instead of document-specific: [Qiu+, NAACL ’13] observed that different viewpoints have different dominating topics 12 / 31
  • 23. C1: Viewpoint Discovery in Text Documents VODUM: the Viewpoint and Opinion Discovery Unification Model We designed a novel topic model to address our research task: the Viewpoint and Opinion Discovery Unification Model [Thonet+, ECIR ’16] Approximate posterior inference using collapsed Gibbs sampling Dirichlet distributions θ, π, φ0, φ1 integrated out Successively sample discrete latent variables z, v from their posterior distributions (i.e., given observations w, x) =⇒ Provides distributions’ estimators ˆθ, ˆπ, ˆφ0, ˆφ1 and assignments v and z for all docs/sentences 12 / 31
  • 24. C1: Viewpoint Discovery in Text Documents Experimental setup Hyperparameters set to fixed values: α = 0.01, β0 = β1 = 0.01, η = 100 Evaluation based on the Bitterlemons collection (http://www.bitterlemons.net/), introduced by [Lin+, CoNLL ’06], containing essays about the Israeli-Palestinian conflict =⇒ Number of viewpoints V set to 2 Total number of documents Number of essays written by Israeli authors Number of essays written by Palestinian authors 594 297 297 Viewpoint clustering performance measured in terms of Accuracy (≈ to what extent obtained clusters and groundtruth classes overlap) 13 / 31
  • 25. C1: Viewpoint Discovery in Text Documents Baselines State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S VODUM vs . . . . . . VODUM-D 14 / 31
  • 26. C1: Viewpoint Discovery in Text Documents Baselines State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S VODUM vs . . . . . . VODUM-O 14 / 31
  • 27. C1: Viewpoint Discovery in Text Documents Baselines State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S VODUM vs . . . . . . VODUM-W 14 / 31
  • 28. C1: Viewpoint Discovery in Text Documents Baselines State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] where aspects ≈ viewpoints Joint Topic Viewpoint (JTV) from [Trabelsi+, ICDM ’14] for arguing expression mining Latent Dirichlet Allocation (LDA) from [Blei+, NIPS ’01] with T = V Degenerate versions of VODUM: VODUM-D, VODUM-O, VODUM-W and VODUM-S VODUM vs . . . . . . VODUM-S 14 / 31
  • 29. C1: Viewpoint Discovery in Text Documents Evaluation: viewpoint clustering Clustering of document-level Israeli/Palestinian viewpoints (each boxplot drawn from 50 chains) Higher accuracy = better clustering performance VODUM TAM, JTV, LDA: VODUM overall beats state-of-the-art baselines TAM JTV, LDA: TAM performs the best among state-of-the-art baselines VODUM VODUM-D: viewpoint-specific topic distributions slightly improve accuracy VODUM VODUM-O: opinion and topical words partitioning considerably improves accuracy VODUM VODUM-W: sentence-level topic assignments considerably improve accuracy VODUM VODUM-S: document-level viewpoint assignments slightly improve accuracy VODUM TAM JTV LDA VODUM−D VODUM−O VODUM−W VODUM−S 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 Models Accuracy 15 / 31
  • 30. C1: Viewpoint Discovery in Text Documents Evaluation: qualitative analysis Most probable topical/opinion words for a topic (manually annotated as “Middle East conflicts”) Middle East conflicts Topical words israel palestinian syria jihad war iraq dai suicid destruct iran Middle East conflicts Opinion words (I) islam isra terrorist recent militari intern like heavi close american Middle East conflicts Opinion words (P) need win think sai don strong new sure believ commit Topical words are unbiased towards a viewpoint and clearly reflect Middle East conflicts: e.g., syria, war, iraq and destruct Coherent opinion words for the Israeli viewpoint: e.g., islam, terrorist and american Palestinian viewpoint remains non-specific about conflicts and do not mention islam or terrorism: e.g., win, strong and commit 16 / 31
  • 31. C2: Viewpoint Discovery in Social Networks C2: Viewpoint Discovery in Social Networks 17 / 31
  • 32. C2: Viewpoint Discovery in Social Networks Task Discover topics and viewpoints from social networking data, leveraging both posted text content and social interactions between users 18 / 31
  • 33. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] 19 / 31
  • 34. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Text content component 19 / 31
  • 35. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Text content component Observed data: tokens occurring in documents posted by users =⇒ 3 nested plates Latent topics assigned to each token Latent viewpoints assigned at document-level 19 / 31
  • 36. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Text content component Following the Topic-Aspect Model from [Paul+, AAAI ’10], definition of four word types specified by switch variables (level) and x (route): Background words =⇒ = 0, x = 0 Viewpoint words =⇒ = 0, x = 1 Topic words =⇒ = 1, x = 0 Viewpoint-topic words =⇒ = 1, x = 1 19 / 31
  • 37. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Social interaction component 19 / 31
  • 38. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Social interaction component Outgoing interactions for user u = interactions initiated by u on another user (recipient r) Following SN-LDA from [Sachan+, WSDM ’14], viewpoints assigned to outgoing interactions (homophily) 19 / 31
  • 39. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Social interaction component . . . But outgoing interactions insufficient for some users =⇒ We propose to also exploit incoming interactions 19 / 31
  • 40. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Social interaction component Incoming interactions for user u = interactions initiated by another user (sender s) on u Viewpoint assigned to the document being interacted upon 19 / 31
  • 41. C2: Viewpoint Discovery in Social Networks SNVDM: the Social Network Viewpoint Discovery Model We defined the Social Network Viewpoint Discovery Model to jointly discover topics and viewpoints from posted text content and social interactions [Thonet+, CIKM ’17] Approximate inference based on Collapsed Gibbs Sampling Dirichlet/Bernoulli distributions σ, ψ, θ, π, φ, ξ integrated out Successively sample discrete latent variables ℓ, x, z and v (document-level and interaction-level viewpoints) from their posterior distributions (i.e., given observations w, s, r) Hyperparameters δ, γ, α, η, µ sampled according to the auxiliary variable technique following [Newman+, J. Mach. Learn. Res. ’09] and β fixed to 0.01 19 / 31
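Each step of a collapsed Gibbs sampler of this kind draws one discrete variable with probability proportional to unnormalized conditional weights. The sketch below shows only that generic draw, using the Python standard library; `sample_discrete` and the example weights are hypothetical and are not taken from the SNVDM implementation.

```python
import random


def sample_discrete(weights, rng):
    """Draw an index with probability proportional to non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy check: with weights (1, 3), value 1 should be drawn about 75% of the time.
rng = random.Random(0)
draws = [sample_discrete([1.0, 3.0], rng) for _ in range(10_000)]
share_of_value_1 = sum(draws) / len(draws)
print(round(share_of_value_1, 2))
```

In a full sampler the weights would be recomputed from the current count statistics before every draw.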
  • 42. C2: Viewpoint Discovery in Social Networks Limits of SNVDM’s social interaction component Some users have very few social interactions =⇒ Difficult to identify their viewpoints based on scarce direct interactions 20 / 31
  • 43. C2: Viewpoint Discovery in Social Networks Limits of SNVDM’s social interaction component We propose to extend SNVDM to leverage “acquaintances of acquaintances” (≈ friends of friends) How? =⇒ Generalized Pólya Urn scheme 20 / 31
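  • 44. C2: Viewpoint Discovery in Social Networks SNVDM-GPU: extension of SNVDM based on Generalized Pólya Urn Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling E.g., for outgoing interaction o from user u on user u′:
      SNVDM: $p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}$
      vs.
      SNVDM-GPU: $p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \sum_{u'' \in R^{\tau}_{u'}} \lambda\, n_{vu''} + \mu \frac{1}{U}}{n_{v\bullet} + \sum_{u''=1}^{U} \sum_{u''' \in R^{\tau}_{u''}} \lambda\, n_{vu'''} + \mu}$
      21 / 31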
  • 45. C2: Viewpoint Discovery in Social Networks Experimental setup Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican)
      Dataset    #Users (Yes/Dem.)  #Users (No/Rep.)  #Tweets  #Tokens    Vocabulary  #Interactions
      Indyref    589                575               270,075  2,043,204  38,942      696,654
      Midterms   767                778               113,545  975,199    25,312      241,741
      22 / 31
  • 46. C2: Viewpoint Discovery in Social Networks Experimental setup Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican) State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] =⇒ Only text content to discover viewpoints and topics Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14] =⇒ Text content and outgoing interactions to discover communities (≈ viewpoints) and topics Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16] =⇒ Text content to discover viewpoints and topics, and parts of speech to distinguish between topic words and viewpoint-topic words 22 / 31
  • 47. C2: Viewpoint Discovery in Social Networks Experimental setup Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican) State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14] Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16] Degenerate version of SNVDM: SNVDM-WI (without incoming interactions) SNVDM vs . . . . . . SNVDM-WI 22 / 31
  • 48. C2: Viewpoint Discovery in Social Networks Experimental setup Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican) State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14] Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16] Degenerate version of SNVDM: SNVDM-WI (without incoming interactions) Proposed models: SNVDM SNVDM-GPU (τ = 10): only 10 most interacting acquaintances used in Generalized Pólya Urns SNVDM-GPU (τ = ∞): all acquaintances used in Generalized Pólya Urns 22 / 31
  • 49. C2: Viewpoint Discovery in Social Networks Experimental setup Twitter datasets from [Brigadir+, WebSci ’15] on the 2014 Scottish Independence Referendum (v = Yes/No) and the 2014 US Midterm Elections (v = Democrat/Republican) State-of-the-art baselines: Topic-Aspect Model (TAM) from [Paul+, AAAI ’10] Social Network Latent Dirichlet Allocation (SN-LDA) from [Sachan+, WSDM ’14] Viewpoint and Opinion Discovery Unification Model (VODUM) from [Thonet+, ECIR ’16] Degenerate version of SNVDM: SNVDM-WI (without incoming interactions) Proposed models: SNVDM SNVDM-GPU (τ = 10): only 10 most interacting acquaintances used in Generalized Pólya Urns SNVDM-GPU (τ = ∞): all acquaintances used in Generalized Pólya Urns Viewpoint clustering performance measured in terms of Purity (≈ to what extent obtained clusters are homogeneous) and Normalized Mutual Information (information theoretic clustering measure) 22 / 31
  • 50. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Indyref Observation 1: consistent results across different numbers of topics 23 / 31
  • 51. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Indyref Observation 2: SNVDM, SNVDM-GPU (τ = 10) and SNVDM-GPU (τ = ∞) outperform all baselines 23 / 31
  • 52. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Indyref Observation 3: SN-LDA outperforms TAM and VODUM =⇒ interactions ↑↑↑ 23 / 31
  • 53. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Indyref Observation 4: SNVDM outperforms SNVDM-WI =⇒ incoming interactions ↑↑ 23 / 31
  • 54. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Indyref Observation 5: SNVDM-GPU (τ = ∞) and SNVDM-GPU (τ = 10) improve on SNVDM =⇒ GPU ↑ 23 / 31
  • 55. C2: Viewpoint Discovery in Social Networks Evaluation: viewpoint clustering on Midterms Observation 6: similar trends on Midterms but greater improvement for our models over baselines 23 / 31
  • 56. C2: Viewpoint Discovery in Social Networks Evaluation: impact of social network sparsity Clustering of users’ viewpoints on Indyref for different degrees of network sparsity (T = 10) Observation: performance degraded for lower percentage of interactions 24 / 31
  • 57. C2: Viewpoint Discovery in Social Networks Evaluation: qualitative analysis Most probable topic words and viewpoint-topic words for topics from Indyref and Midterms
      Topic: Scottish independence
      Neutral        Viewpoint: Yes   Viewpoint: No
      #indyref       #voteyes         #indyref
      scotland       yes              uk
      independence   scotland         salmond
      vote           independence     #bettertogether
      campaign       westminster      #scotdecides
      scottish       vote             separation
      uk             independent      currency
      people         country          thanks
      future         #yes             today
      independent    #scotland        say
      Topic: Energy and resources
      Neutral        Viewpoint: Dem.  Viewpoint: Rep.
      energy         #actonclimate    #4jobs
      house          climate          #obamacare
      new            #p2              #jobs
      gas            change           gop
      natural        #climatechange   obama
      #energy        clean            bills
      #ff            oil              jobs
      #kxl           energy           house
      support        #gop             act
      economic       seec             watch
      Reasonable coherence of topic words and viewpoint-topic words
      Topic words indeed unbiased towards any viewpoint
      Use of viewpoint-specific hashtags and mention of different issues for different viewpoints
      25 / 31
  • 59. Conclusion Summary of contributions VODUM discovers viewpoints and topics in text documents, exploiting parts of speech to distinguish between topical words and opinion words [Thonet+, ECIR ’16] Lessons learned: opinion and topical words partitioning ↑↑↑, sentence-level topic assignments ↑↑↑ SNVDM(-GPU) discovers viewpoints and topics in social networks, leveraging both posted text content and social interactions [Thonet+, CIKM ’17] Lessons learned: social interactions ↑↑↑ 27 / 31
  • 60. Conclusion Perspectives Integrate time dimension and geolocation, e.g., to analyze party support during elections Model viewpoints as real-valued variables to better capture nuanced opinions Design a viewpoint summarization framework to build argument maps and help mitigate the filter bubble and echo chamber phenomena 28 / 31
  • 62. Conclusion References
      Barberá, P. (2015). Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data. Polit. Anal., 23(1), 76–91.
      Blei, D. M., Ng, A. Y., Jordan, M. I. (2001). Latent Dirichlet Allocation. In Proc. of NIPS ’01 (pp. 601–608).
      Brigadir, I., Greene, D., Cunningham, P. (2015). Analyzing Discourse Communities with Distributional Semantic Models. In Proc. of WebSci ’15.
      Fang, Y., Si, L., Somasundaram, N., Yu, Z. (2012). Mining Contrastive Opinions on Political Texts using Cross-Perspective Topic Model. In Proc. of WSDM ’12 (pp. 63–72).
      Joshi, A., Bhattacharyya, P., Carman, M. (2016). Political Issue Extraction Model: A Novel Hierarchical Topic Model That Uses Tweets By Political And Non-Political Authors. In Proc. of WASSA@NAACL-HLT ’16 (pp. 82–90).
      Lin, W.-H., Wilson, T., Wiebe, J., Hauptmann, A. (2006). Which Side are You on? Identifying Perspectives at the Document and Sentence Levels. In Proc. of CoNLL ’06 (pp. 109–116).
      Liu, Z., Zheng, Q., Wang, F., Tian, Z., Li, B. (2014). A Dynamic Nonparametric Model for Characterizing the Topical Communities in Social Streams. In Proc. of SDM ’14 (pp. 379–387).
      Newman, D., Asuncion, A., Smyth, P., Welling, M. (2009). Distributed Algorithms for Topic Models. J. of Mach. Learn. Res., 10, 1801–1828.
      Paul, M. J., Girju, R. (2010). A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics. In Proc. of AAAI ’10 (pp. 545–550).
      30 / 31
  • 63. Conclusion References (continued)
      Paul, M. J., Zhai, C., Girju, R. (2010). Summarizing Contrastive Viewpoints in Opinionated Text. In Proc. of EMNLP ’10 (pp. 66–76).
      Qiu, M., Jiang, J. (2013). A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts. In Proc. of NAACL-HLT ’13 (pp. 1031–1040).
      Qiu, M., Yang, L., Jiang, J. (2013). Modeling Interaction Features for Debate Side Clustering. In Proc. of CIKM ’13 (pp. 873–878).
      Sachan, M., Dubey, A., Srivastava, S., Xing, E. P., Hovy, E. (2014). Spatial Compactness meets Topical Consistency: Jointly Modeling Links and Content for Community Detection. In Proc. of WSDM ’14 (pp. 503–512).
      Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K. (2016). VODUM: A Topic Model Unifying Viewpoint, Topic and Opinion Discovery. In Proc. of ECIR ’16 (pp. 533–545).
      Thonet, T., Cabanac, G., Boughanem, M., Pinel-Sauvagnat, K. (2017). Users Are Known by the Company They Keep: Topic Models for Viewpoint Discovery in Social Networks. In Proc. of CIKM ’17 (pp. 87–96).
      Trabelsi, A., Zaiane, O. R. (2014). Mining Contentious Documents Using an Unsupervised Topic Model Based Approach. In Proc. of ICDM ’14 (pp. 550–559).
      Turney, P. D. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proc. of ACL ’02 (pp. 417–424).
      31 / 31
  • 64. Conclusion Appendix: clustering metrics Given the groundtruth classes S = {S1, S2}, the obtained clusters C = {C1, C2}, and the document collection D:
      $\mathrm{Acc}(C, S) = \frac{1}{|D|} \max\big(|C_1 \cap S_1| + |C_2 \cap S_2|,\; |C_1 \cap S_2| + |C_2 \cap S_1|\big)$
      $\mathrm{Purity}(C, S) = \frac{1}{|D|} \Big[\max\big(|C_1 \cap S_1|, |C_1 \cap S_2|\big) + \max\big(|C_2 \cap S_1|, |C_2 \cap S_2|\big)\Big]$
      $\mathrm{NMI}(C, S) = \frac{2\, I(C, S)}{H(C) + H(S)}$
      with $I(C, S) = \sum_{j,k} \frac{|C_j \cap S_k|}{|D|} \log \frac{|D|\, |C_j \cap S_k|}{|C_j|\, |S_k|}$ and $H(C) = -\sum_{j} \frac{|C_j|}{|D|} \log \frac{|C_j|}{|D|}$
      31 / 31
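The three metrics on this slide can be computed directly from two label lists. The sketch below assumes binary labels (0/1) for both clusters and classes, as in the two-viewpoint setting, and uses natural logarithms; the function names are hypothetical and this is not the evaluation code used in the thesis.

```python
import math
from collections import Counter


def accuracy_two_clusters(clusters, classes):
    """Best agreement over the two possible cluster-to-class matchings."""
    n = len(classes)
    same = sum(c == s for c, s in zip(clusters, classes))
    return max(same, n - same) / n


def purity(clusters, classes):
    """Sum over clusters of the size of the majority class, divided by |D|."""
    joint = Counter(zip(clusters, classes))
    best = {}
    for (c, _), count in joint.items():
        best[c] = max(best.get(c, 0), count)
    return sum(best.values()) / len(classes)


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def nmi(clusters, classes):
    """2 I(C, S) / (H(C) + H(S)); undefined if both partitions are trivial."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))
    size_c, size_s = Counter(clusters), Counter(classes)
    mi = sum((njk / n) * math.log(n * njk / (size_c[c] * size_s[s]))
             for (c, s), njk in joint.items())
    return 2 * mi / (entropy(clusters) + entropy(classes))


clusters = [0, 0, 0, 1]
classes = [0, 0, 1, 1]
print(accuracy_two_clusters(clusters, classes), purity(clusters, classes))
```

Empty intersections contribute nothing to the mutual information sum, which is why iterating over observed (cluster, class) pairs is sufficient.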
  • 65. Conclusion Appendix: perplexity analysis for VODUM and baselines Held-out perplexity computed through 10-fold cross validation for T ∈ {5, 10, 15, 20, 30, 50} Lower perplexity = better generalization performance TAM LDA for small number of topics ( 20) and LDA TAM for large number of topics ( 20) JTV TAM, LDA for all numbers of topics VODUM TAM, JTV, LDA for all numbers of topics ! VODUM’s vocabulary is different from that of TAM, JTV and LDA: partitioning of topical words and opinion words [Figure: average perplexity against number of topics (T) for VODUM, TAM, JTV and LDA] 31 / 31
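Perplexity is conventionally the exponential of the negative average log-likelihood per held-out token. The sketch below assumes the per-token natural-log probabilities have already been estimated by some model; how the thesis estimates them is not stated on the slide, and `perplexity` is a hypothetical helper.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per held-out token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model assigning probability 1/4 to every held-out token has perplexity 4.
uniform_over_four = perplexity([math.log(0.25)] * 8)
print(uniform_over_four)
```

Because the value depends on the vocabulary the probabilities are defined over, perplexities computed on different vocabularies are not directly comparable.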
  • 66. Conclusion Appendix: execution time for SNVDM and baselines Execution time (in seconds) of one Gibbs sampling iteration on Indyref (with T = 10) and Midterms (with T = 15)
      Model                 Indyref  Midterms
      TAM                   1.45     0.87
      SN-LDA                1.18     0.64
      VODUM                 2.78     1.85
      SNVDM-WI              2.08     1.08
      SNVDM                 2.49     1.15
      SNVDM-GPU (τ = 10)    3.47     1.34
      SNVDM-GPU (τ = ∞)     14.67    2.56
      31 / 31
  • 67. Conclusion Appendix: Simple Pólya Urn scheme The compound Dirichlet-Multinomial distribution (used in LDA-based topic models) can be interpreted as an urn sampling metaphor with an over-replacement policy
      1. Randomly draw a ball from the urn
      2. Duplicate the drawn ball
      3. Put back in the urn the original ball and its duplicate
      [Diagram: urn and infinite ball generator]
      31 / 31
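The three steps of the Simple Pólya Urn can be simulated in a few lines. This is a toy sketch using integer ball counts per color; `simple_polya_urn` and the starting urn are hypothetical illustrations, not code from the thesis.

```python
import random


def simple_polya_urn(initial_counts, n_draws, rng):
    """Simulate draws with over-replacement: each drawn color gains one ball."""
    counts = list(initial_counts)
    for _ in range(n_draws):
        # Step 1: draw a ball, i.e. a color with probability proportional to its count.
        color = rng.choices(range(len(counts)), weights=counts, k=1)[0]
        # Steps 2 and 3: the original ball goes back together with one duplicate.
        counts[color] += 1
    return counts


counts = simple_polya_urn([1, 1, 1], n_draws=100, rng=random.Random(0))
print(counts)
```

Colors drawn early become more likely to be drawn again, which is the "rich get richer" behavior the urn metaphor is meant to convey.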
  • 68. Conclusion Appendix: Generalized Pólya Urn scheme The Simple Pólya Urn scheme can be generalized by modifying the replacement rule to exploit similarities between balls’ colors [Mahmoud, 2008]
      1. Randomly draw a ball from the urn
      2. Duplicate the drawn ball and generate parts of balls for those similar to the drawn ball
      3. Put back in the urn the original ball, its duplicate and the parts of similar balls
      [Diagram: urn and infinite ball generator]
      31 / 31
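The generalized replacement rule can be simulated by adding fractional amounts to colors similar to the drawn one. In this toy sketch the similarity weights are given as a matrix `addition`, where `addition[i][j]` is the amount of color j added after drawing color i; that layout, the function name and the example values are assumptions made for illustration.

```python
import random


def generalized_polya_urn(initial_counts, addition, n_draws, rng):
    """Simulate draws where similar colors also receive fractional balls."""
    counts = [float(c) for c in initial_counts]
    for _ in range(n_draws):
        # Step 1: draw a color with probability proportional to its current mass.
        drawn = rng.choices(range(len(counts)), weights=counts, k=1)[0]
        # Steps 2 and 3: add a full duplicate of the drawn color and
        # fractional amounts for the colors similar to it.
        for color, amount in enumerate(addition[drawn]):
            counts[color] += amount
    return counts


# Colors 0 and 1 are treated as similar (weight 0.5); color 2 is unrelated.
addition = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
gpu_counts = generalized_polya_urn([1, 1, 1], addition, n_draws=100, rng=random.Random(0))
print(gpu_counts)
```

With an identity matrix for `addition`, the simulation reduces to the Simple Pólya Urn.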
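  • 69. Conclusion Appendix: SNVDM-GPU Using Generalized Pólya Urn in SNVDM requires minor changes in collapsed Gibbs sampling E.g., for outgoing interaction o from user u on user u′:
      SNVDM: $p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{n_{vu'} + \mu \frac{1}{U}}{n_{v\bullet} + \mu}$
      vs.
      SNVDM-GPU: $p(v_{uo} = v \mid r_{uo} = u', \text{rest}) \propto \frac{n_{uv} + \eta \frac{1}{V}}{n_{u\bullet} + \eta} \cdot \frac{\sum_{u''=1}^{U} A_{u''u'}\, n_{vu''} + \mu \frac{1}{U}}{\sum_{u''=1}^{U} A_{u''\bullet}\, n_{vu''} + \mu}$
      The addition matrix A defines the weight to put on count $n_{vu''}$ for each u″:
      $A_{u''u'} = \begin{cases} 1 & \text{if } u'' = u', \\ \lambda & \text{if } u'' \text{ is among the top } \tau \text{ acquaintances of } u' \text{ (weaker link)}, \\ 0 & \text{otherwise} \end{cases}$
      with 0 ≤ λ ≤ 1 (λ = 0 =⇒ “vanilla” SNVDM) and τ ∈ ℕ
      31 / 31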
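The recipient-side factor of the SNVDM-GPU conditional can be sketched from the addition matrix. The code below assumes that `n_v[u]` holds the count of viewpoint-v interactions whose recipient is user u, and that `top_acquaintances[u]` is already truncated to the top τ acquaintances of u; these data layouts and both function names are hypothetical, not the author's implementation.

```python
def addition_matrix(top_acquaintances, lam):
    """Build A where A[u2][u1] weights count n_v[u2] when scoring recipient u1."""
    num_users = len(top_acquaintances)
    A = [[0.0] * num_users for _ in range(num_users)]
    for u1 in range(num_users):
        A[u1][u1] = 1.0  # a user's own count has full weight
        for u2 in top_acquaintances[u1]:
            if u2 != u1:
                A[u2][u1] = lam  # acquaintances contribute with weight lambda
    return A


def recipient_factor(n_v, A, recipient, mu):
    """Second factor of the conditional for one viewpoint v and one recipient."""
    num_users = len(n_v)
    numerator = sum(A[u2][recipient] * n_v[u2] for u2 in range(num_users)) + mu / num_users
    denominator = sum(sum(A[u2]) * n_v[u2] for u2 in range(num_users)) + mu
    return numerator / denominator


n_v = [2, 1, 0]
vanilla = recipient_factor(n_v, addition_matrix([[], [], []], lam=0.0), 0, mu=3.0)
A = addition_matrix([[1], [0], []], lam=0.5)
with_gpu = [recipient_factor(n_v, A, r, mu=3.0) for r in range(3)]
print(vanilla, with_gpu)
```

With λ = 0 the matrix is the identity and the factor reduces to the plain SNVDM form; for any λ the factors over all recipients sum to one, since the denominator is the sum of the numerators.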